llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 04:32:44 +01:00

Author	SHA1	Message	Date
Sanjay Patel	d78c67e231	[InstCombine][test] add tests for FP min/max with negated op; NFC	2021-06-22 14:15:04 -04:00
Sanjay Patel	e5a2d28d0d	[InstCombine][test] add tests for FP min/max with negated op; NFC	2021-06-22 14:15:04 -04:00
Joseph Huber	0fbe411307	[Attributor] Add interface to emit remarks in Attributor Summary: This patch adds support for the Attributor to emit remarks on behalf of some other pass. The attributor can now optionally take a callback function that returns an OptimizationRemarkEmitter object when given a Function pointer. If this is availible then a remark will be emitted for the corresponding pass name. Depends on D102197 Reviewed By: sstefan1 thegameg Differential Revision: https://reviews.llvm.org/D102444	2021-06-22 14:12:46 -04:00
David Green	bb402997c3	[ARM] Change some Gather/Scatter interface types to Instructions. NFC These returned Values are cast to an Instruction already, this just cleans up the interface a little to match the expected types.	2021-06-22 19:11:39 +01:00
Matt Arsenault	248de533d0	AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions These used to consistently be zeroed pre-gfx9, but gfx9 made the situation complicated since now some still do and some don't. This also manages to pick up a few cases that the pattern fails to optimize away. We handle some cases with instruction patterns, but some get through. In particular this improves the integer cases.	2021-06-22 13:42:49 -04:00
Matt Arsenault	67612de11d	AMDGPU: Add baseline test for instructions zeroing high bits	2021-06-22 13:27:39 -04:00
Joseph Huber	4df09b164e	[OpenMP] Enable HeapToStack conversion in OpenMPOpt for new RTL globalization calls Summary: The changes to globalization introduced in D97680 introduce a large amount of overhead by default. The old globalization method would always ignore globalization code if executing in SPMD mode. This wasn't strictly correct as data sharing is still possible in SPMD mode. The new interface is correct but introduces globalization code even when unnecessary. This optimization will use the existing HeapToStack transformation in the attributor to allow for unneeded globalization to be replaced with thread-private stack memory. This is done using the newly introduced library instances for the RTL functions added in D102087. Depends on D97818 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102197	2021-06-22 13:23:05 -04:00
Joseph Huber	76b50aa3c4	[OpenMP] Add new OpenMP globalization functions to library info Summary: The changes to globalization introduced in D97680 created two new functions to push / pop shareably memory on the GPU, __kmpc_alloc_shared and __kmpc_free_shared. This patch adds these new runtime functions to the library info so they can be used by the HeapToStack attributor interface. This optimization replaces malloc / free pairs with stack memory if legal. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102087	2021-06-22 13:23:05 -04:00
Patrick Holland	0114258120	[MCA] [In-order pipeline] Fix for 0 latency instruction causing assertion to fail. 0 latency instructions now get processed and retired properly within the in-order pipeline. Had to fix a bug within TimelineView.cpp as well that would show up when a 0 latency instruction was the first instruction in the source. Differential Revision: https://reviews.llvm.org/D104675	2021-06-22 10:18:39 -07:00
Matt Arsenault	daaa1d32d1	AMDGPU: Fix high 16-bit optimization on gfx9 We can do this optimization in the majority of cases, but we currently don't have a way to do it. We do not track/model which instructions have which behavior, the control bit to change the high bit behavior, or making use of preserved bits at all. This is a bit fuzzy since we don't know precisely how the source instruction will be lowered, but that only really matters in one case (for fma_mixlo). We do need to fixup some of these cases after selection, but the pattern helps eliminate many of these zexts.	2021-06-22 13:16:45 -04:00
LLVM GN Syncbot	1bf34b2165	[gn build] Port 40d6d2c49dd1	2021-06-22 17:03:46 +00:00
Sami Tolvanen	df1ab29b62	ThinLTO: Fix inline assembly references to static functions with CFI Create an internal alias with the original name for static functions that are renamed in promoteInternals to avoid breaking inline assembly references to them. Link: https://github.com/ClangBuiltLinux/linux/issues/1354 Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D104058	2021-06-22 10:01:55 -07:00
zhijian	3f6513e8c5	[AIX][XCOFF] generate eh_info when vector registers are saved according to the traceback table. Summary: generate eh_info when vector registers are saved according to the traceback table. struct eh_info_t { unsigned version; /* EH info version 0 / #if defined(64BIT) char _pad[4]; / padding / #endif unsigned long lsda; / Pointer to Language Specific Data Area / unsigned long personality; / Pointer to the personality routine */ }; the value of lsda and personality is zero when the number of vector registers saved is large zero and there is not personality of the function Reviewers: Jason Liu Differential Revision: https://reviews.llvm.org/D103651	2021-06-22 13:01:31 -04:00
Stanislav Mekhanoshin	cd7b7932a8	[AMDGPU] Use performOptimizedStructLayout for LDS sort This gives better packing. Differential Revision: https://reviews.llvm.org/D104331	2021-06-22 09:58:10 -07:00
Fangrui Song	61ffec433f	Improve the diagnostic of DiagnosticInfoResourceLimit (and warn-stack-size in particular) Before: `warning: stack size limit exceeded (888) in main` After: `warning: stack frame size (888) exceeds limit (100) in function 'main'` (the -Wframe-larger-than limit will be mentioned) Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D104667	2021-06-22 09:55:20 -07:00
Nico Weber	afac7e2fd3	[gn build] manually port c747b7d1d9a2 (config.osx_sysroot)	2021-06-22 12:50:58 -04:00
Matt Arsenault	50b757aa13	AMDGPU: Move zeroed FP high bits optimization to patterns	2021-06-22 12:47:56 -04:00
Joseph Huber	cbac628d6a	[OpenMP] Internalize functions in OpenMPOpt to improve IPO passes Summary: Currently the attributor needs to give up if a function has external linkage. This means that the optimization introduced in D97818 will only apply to static functions. This change uses the Attributor to internalize OpenMP device routines by making a copy of each function with private linkage and replacing the uses in the module with it. This allows for the optimization to be applied to any regular function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102824	2021-06-22 12:38:10 -04:00
Steven Wu	f17f759d9f	[llvm] Fix lto tests that requires ld64 Since Xcode 13, ld64 requires linking libSystem for all the executable. Fix the tests that needs to run ld64 by linking libSystem from sysroot. rdar://77332728 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D104332	2021-06-22 09:21:29 -07:00
Fangrui Song	8084959508	[llvm-objcopy] Fix some namespace style issues https://llvm.org/docs/CodingStandards.html#use-namespace-qualifiers-to-implement-previously-declared-functions Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D104693	2021-06-22 09:19:48 -07:00
Bill Wendling	e71f7771a1	[llvm-diff] Constify APIs so that there aren't conflicts Some APIs work with const variables while others don't. This can cause conflicts when calling one from the other. This is NFC. Differential Revision: https://reviews.llvm.org/D104719	2021-06-22 09:17:04 -07:00
Joseph Huber	24e62e819b	[OpenMP] Replace GPU globalization calls with shared memory in the middle-end Summary: The changes introduced in D97680 create a simpler interface to code that needs to be globalized. This interface is used to simplify the globalization calls in the middle end. We can check any globalization call that is only called by a single thread in the team and replace it with a static shared memory buffer. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97818	2021-06-22 11:55:44 -04:00
Nikita Popov	80dafbe344	[OpaquePtr] Handle addrspacecasts in InstCombine This adds support for addrspace casts involving opaque pointers to InstCombine, as well as the isEliminableCastPair() helper (otherwise the assertion failure would just move there). Add PointerType::hasSameElementTypeAs() to hide the element type details. Differential Revision: https://reviews.llvm.org/D104668	2021-06-22 17:45:30 +02:00
Meera Nakrani	6089765b49	[AArch64LoadStoreOptimizer] Recommit: Generate more STPs by renaming registers earlier This is a recommit that fixes unwanted STP generation by checking that the base register has not been modified or used elsewhere. Our initial motivating case was memcpy's with alignments > 16. The loads/stores, to which small memcpy's expand, are kept together in several places so that we get a sequence like this for a 64 bit copy: LD w0 LD w1 ST w0 ST w1 The load/store optimiser can generate a LDP/STP w0, w1 from this because the registers read/written are consecutive. In our case however, the sequence is optimised during ISel, resulting in: LD w0 ST w0 LD w0 ST w0 This instruction reordering allows reuse of registers. Since the registers are no longer consecutive (i.e. they are the same), it inhibits LDP/STP creation. The approach here is to perform renaming: LD w0 ST w0 LD w1 ST w1 to enable the folding of the stores into a STP. We do not yet generate the LDP due to a limitation in the renaming implementation, but plan to look at that in a follow-up so that we fully support this case. While this was initially motivated by certain memcpy's, this is a general approach and thus is beneficial for other cases too, as can be seen in some test changes. Differential Revision: https://reviews.llvm.org/D103597	2021-06-22 15:29:13 +00:00
Jingu Kang	d3ec564d67	[SimpleLoopUnswich] Fixa a bug on ComputeUnswitchedCost with partial unswitch There was a bug from cost calculation for partially invariant unswitch. The costs of non-duplicated blocks are substracted from the total LoopCost, so anything that is duplicated should not be counted. Differential Revision: https://reviews.llvm.org/D103816	2021-06-22 16:18:00 +01:00
Florian Hahn	ed71001ad9	[SCEV] Reduce code to handle predicates in applyLoopGuards (NFC). Hoist out common recurrence check and sink updating the map, to reduce the code required to support additional predicates.	2021-06-22 15:56:45 +01:00
Joseph Huber	3aea5cddbb	[OpenMP] Simplify GPU memory globalization Summary: Memory globalization is required to maintain OpenMP standard semantics for data sharing between worker and master threads. The GPU cannot share data between its threads so must allocate global or shared memory to store the data in. Currently this is implemented fully in the frontend using the `__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard CPU stack sharing. The front-end scans the target region for variables that escape the region and must be shared between the threads. Each variable then has a field created for it in a global record type. This patch replaces this functinality with a single allocation command, effectively mimicing an alloca instruction for the variables that must be shared between the threads. This will be much slower than the current solution, but makes it much easier to optimize as we can analyze each variable independently and determine if it is not captured. In the future, we can replace these calls with an `alloca` and small allocations can be pushed to shared memory. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97680	2021-06-22 10:52:46 -04:00
Rosie Sumpter	96972f2a2c	[SLP][AArch64] Add SLP vectorizer tests for XOR and AND reductions. NFC These regression tests show missed SLP vectorization opportunities, which will be fixed in a future commit (see: https://reviews.llvm.org/D104538). Differential Revision: https://reviews.llvm.org/D104708	2021-06-22 15:16:02 +01:00
Florian Hahn	0966169859	[BitcodeReader] Validate Strtab before accessing. This fixes a crash with invalid bitcode files that have records referencing names in Strtab, but Strtab is not present or the index is out-of-bounds. This fixes the following clusterfuzz issue: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29895 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D95554	2021-06-22 14:52:16 +01:00
Nikita Popov	dca0f3abd1	[ConstantFold] Delay fetching pointer element type Don't do this while stipping pointer casts, instead fetch it at the end. This improves compatibility with opaque pointers for the case where the base object is not opaque.	2021-06-22 15:51:00 +02:00
Nikita Popov	c651fde72b	[ConstantFold] Skip bitcast -> GEP transform for opaque pointers Same as with the InstCombine transform, this is not possible for bitcasts involving opaque pointers, as GEP preserves opaqueness.	2021-06-22 15:50:55 +02:00
Thomas Johnson	c71f850869	Add norm sub-target feature to table gen for ARC This adds the `norm` sub-target feature (without backing implementation for now) to table gen. Differential Revision: https://reviews.llvm.org/D104558	2021-06-22 14:39:29 +03:00
Florian Hahn	e93c4c3336	[SCEV] Retain AddExpr flags when subtracting a foldable constant. Currently we drop wrapping flags for expressions like (A + C1)<flags> - C2. But we can retain flags under certain conditions: * Adding a smaller constant is NUW if the original AddExpr was NUW. * Adding a constant with the same sign and small magnitude is NSW, if the original AddExpr was NSW. This can improve results after using `SimplifyICmpOperands`, which may subtract one in order to use stricter predicates, as is the case for `isKnownPredicate`. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D104319	2021-06-22 11:27:51 +01:00
Martin Storsjö	a307928fe5	[ADT] Add StringRef consume_front_lower and consume_back_lower These serve as a convenient combination of consume_front/back and startswith_lower/endswith_lower, consistent with other existing case insensitive methods named <operation>_lower. Differential Revision: https://reviews.llvm.org/D104218	2021-06-22 12:38:08 +03:00
Nikita Popov	7b3baaea7f	[ConstantFolding] Separate conditions in GEP evaluation (NFC) Handle to gep p, 0-v case separately, and not as part of the loop that ensures all indices are constant integers. Those two things are not really related.	2021-06-22 11:14:47 +02:00
Fraser Cormack	a876d5dd7e	[Utils][vim] Add missing highlights for fast-math flags This patch adds the `afn`, `contract`, and `reassoc` fast-math flags. It also fixes up `fneg`'s order in the alphabetized list. Reviewed By: MaskRay, craig.topper Differential Revision: https://reviews.llvm.org/D104541	2021-06-22 09:39:15 +01:00
Sander de Smalen	fd053d5ffe	[GlobalISel] Add scalable property to LLT types. This patch aims to add the scalable property to LLT. The rest of the patch-series changes the interfaces to take/return ElementCount and TypeSize, which both have the ability to represent the scalable property. The changes are mostly mechanical and aim to be non-functional changes for fixed-width vectors. For scalable vectors some unit tests have been added, but no effort has been put into making any of the GlobalISel algorithms work with scalable vectors yet. That will be left as future work. The work is split into a series of 5 patches to make reviews easier. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D104450	2021-06-22 08:43:34 +01:00
Bjorn Pettersson	a73e012b96	[NewPM] Print passes with params when using "opt -print-passes" Make sure we also print passes with params when using "opt -print-passes". Differential Revision: https://reviews.llvm.org/D104625	2021-06-22 09:01:38 +02:00
Fangrui Song	c8843dd694	[llvm-objcopy] Internalize some symbols	2021-06-21 23:49:25 -07:00
Fangrui Song	b47d0d83bf	[llvm-objcopy] Delete empty namespace. NFC	2021-06-21 23:44:07 -07:00
Max Kazantsev	7cd9c97dcc	Re-land "[LoopDeletion] Handle Phis with similar inputs from different blocks" Patch was reverted due to a bug that existed before it and was exposed by it. Returning after the underlying bug has been fixed. Differential Revision: https://reviews.llvm.org/D103959	2021-06-22 12:28:46 +07:00
Max Kazantsev	d1d62d860b	[LoopDeletion] Require loop to have a predecessor when executing 1st iteration symbolically Two predecessors break the further logic, and the loop may come to the opt in non-canonicalized state.	2021-06-22 12:20:55 +07:00
Heejin Ahn	68b04daf84	[WebAssembly] Make tag attribute's encoding uint8 This changes the encoding of the `attribute` field, which currently only contains the value `0` denoting this tag is for an exception, from `varuint32` to `uint8`. This field is effectively unused at the moment and reserved for future use, and it is not likely to need `varuint32` even in future. See https://github.com/WebAssembly/exception-handling/pull/162. This does not change any encoded binaries because `0` is encoded in the same way both in `varuint32` and `uint8`. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D104571	2021-06-21 21:22:39 -07:00
Eli Friedman	0c81356419	Rename MachineMemOperand::getOrdering -> getSuccessOrdering. Since this method can apply to cmpxchg operations, make sure it's clear what value we're actually retrieving. This will help ensure we don't accidentally ignore the failure ordering of cmpxchg in the future. We could potentially introduce a getOrdering() method on AtomicSDNode that asserts the operation isn't cmpxchg, but not sure that's worthwhile. Differential Revision: https://reviews.llvm.org/D103338	2021-06-21 16:49:27 -07:00
Vitaly Buka	379a6155dd	[NFC] Add getUnderlyingObjects test Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D104585	2021-06-21 16:36:50 -07:00
Eli Friedman	19ad020e28	[ScalarEvolution] Ensure backedge-taken counts are not pointers. A backedge-taken count doesn't refer to memory; returning a pointer type is nonsense. So make sure we always return an integer. The obvious way to do this would be to just convert the operands of the icmp to integers, but that doesn't quite work out at the moment: isLoopEntryGuardedByCond currently gets confused by ptrtoint operations. So we perform the ptrtoint conversion late for lt/gt operations. The test changes are mostly innocuous. The most interesting changes are more complex SCEV expressions of the form "(-1 * (ptrtoint i8* %ptr to i64)) + %ptr)". This is expected: we can't fold this to zero because we need to preserve the pointer base. The call to isLoopEntryGuardedByCond in howFarToZero is less precise because of ptrtoint operations; this shows up in the function pr46786_c26_char in ptrtoint.ll. Fixing it here would require more complex refactoring. It should eventually be fixed by future improvements to isImpliedCond. See https://bugs.llvm.org/show_bug.cgi?id=46786 for context. Differential Revision: https://reviews.llvm.org/D103656	2021-06-21 16:24:16 -07:00
Nick Desaulniers	2aca733d9e	[IR] convert warn-stack-size from module flag to fn attr Otherwise, this causes issues when building with LTO for object files that use different values. Link: https://github.com/ClangBuiltLinux/linux/issues/1395 Reviewed By: dblaikie, MaskRay Differential Revision: https://reviews.llvm.org/D104342	2021-06-21 15:09:25 -07:00
Rong Xu	2d9e36a4c2	[SampleFDO] Make FSDiscriminator flag part of function parameters Add a parameter of IsFSDiscriminator to function getBaseDiscriminatorFromDiscriminator(). This function currently checks the internal flag of --enable-fs-discriminator. This is not good because we might change the default value of the internal flag. Note that we have a default parameter. This is just because create_afdo_tool has a call-site to it. I will remove the default parameter in a later patch. Differential Revision: https://reviews.llvm.org/D104584	2021-06-21 14:37:45 -07:00
Eli Friedman	d98eee18e6	[ARM] Make sure we don't transform unaligned store to stm on Thumb1. This isn't likely to come up in practice; the combination of compiler flags required to hit this issue should be rare. Found by inspection.	2021-06-21 14:32:42 -07:00
Fangrui Song	bb7326c8ea	[AArch64][X86] Allow 64-bit label differences lower to IMAGE_REL_*_REL32 `IMAGE_REL_ARM64_REL64/IMAGE_REL_AMD64_REL64` do not exist and `.quad a - .` is currently not representable. For instrumentation, `.quad a - .` is useful representing a cross-section reference in a metadata section, to allow ELF medium/large code models. The COFF limitation makes such generic instrumentations inconvenient. I plan to make a PGO/coverage metadata section field relative in D104556. Differential Revision: https://reviews.llvm.org/D104564	2021-06-21 14:32:25 -07:00

... 2 3 4 5 6 ...

217651 Commits