llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00

Author	SHA1	Message	Date
Florian Hahn	5f9b78cce6	Recommit "[VectorCombine] Scalarize vector load/extract." This reverts commit 94d54155e2f38b56171811757044a3e6f643c14b. This fixes a sanitizer failure by moving scalarizeLoadExtract(I) before foldSingleElementStore(I), which may remove instructions.	2021-05-24 11:35:07 +01:00
David Green	56d0e7bedd	[ARM] Ensure WLS preheader blocks have branches during memcpy lowering This makes sure that the blocks created for lowering memcpy to loops end up with branches, even if they fall through to the successor. Otherwise IfCvt is getting confused with unanalyzable branches and creating invalid block layouts. The extra branches should be removed as the tail predicated loop is finalized in almost all cases.	2021-05-24 11:26:45 +01:00
David Green	5c433e70b6	[ARM] Fix inline memcpy trip count sequence The trip count for a memcpy/memset will be n/16 rounded up to the nearest integer. So (n+15)>>4. The old code was including a BIC too, to clear one of the bits, which does not seem correct. This removes the extra BIC. Note that ideally this would never actually be generated, as in the creation of a tail predicated loop we will DCE that setup code, letting the WLSTP perform the trip count calculation. So this doesn't usually come up in testing (and apparently the ARMLowOverheadLoops pass does not do any sort of validation on the tripcount). Only if the generation of the WLSTP fails will it use the incorrect BIC instructions. Differential Revision: https://reviews.llvm.org/D102629	2021-05-24 11:01:58 +01:00
Fraser Cormack	f5b495a5df	[RISCV] Prevent store combining from infinitely looping RVV code generation does not successfully custom-lower BUILD_VECTOR in all cases. When it resorts to default expansion it may, on occasion, be expanded to scalar stores through the stack. Unfortunately these stores may then be picked up by the post-legalization DAGCombiner which merges them again. The merged store uses a BUILD_VECTOR which is then expanded, and so on. This patch addresses the issue by overriding the `mergeStoresAfterLegalization` hook. A lack of granularity in this method (being passed the scalar type) means we opt out in almost all cases when RVV fixed-length vector support is enabled. The only exception to this rule are mask vectors, which are always either custom-lowered or are expanded to a load from a constant pool. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102913	2021-05-24 10:19:32 +01:00
Roman Lebedev	53a13ec861	[NFCI][LoopIdiom] 'left-shift until bittest': assert that BaseX is loop-invariant Given that BaseX is an incoming value when coming from the preheader, it should be loop-invariant, but let's just document this assumption.	2021-05-24 12:15:06 +03:00
Roman Lebedev	6aa4ab51ca	[LoopIdiom] 'logical right shift until zero': the value must be loop-invariant As per the reproducer provided by Mikael Holmén in post-commit review.	2021-05-24 12:15:06 +03:00
Florian Hahn	7a9eb4bae0	Revert "[VectorCombine] Scalarize vector load/extract." This reverts commit 86497785d540e59eaca24bed4219ddec183cbc9b. One of the tests causes an ASAN failure. https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio	2021-05-24 10:11:00 +01:00
Simon Pilgrim	4fa97550f8	[CostModel][X86] Improve accuracy of vXi64 MUL costs on AVX2/AVX512 targets By llvm-mca analysis, Haswell/Broadwell has the worst v4i64 recip-throughput cost of the AVX2 targets at 6 (vs the currently used cost of 8). Similarly SkylakeServer (our only AVX512 target model) implements PMULLQ with an average cost of 1.5 (rounded up to 2.0), and the PMULUDQ-sequence (without AVX512DQ) as a cost of 6.	2021-05-24 09:48:32 +01:00
Chen Zheng	7ccc4dd201	[Debug-Info]update section name to match AIX behaviour; nfc	2021-05-24 04:33:41 -04:00
Florian Hahn	930085daca	[VectorCombine] Scalarize vector load/extract. This patch adds a new combine that tries to scalarize chains of `extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is profitable when extracting only a few elements out of a large vector. At the moment, `store (extractelement (load %ptr), %idx), %ptr` operations on large vectors result in huge code in the backend. This can easily be triggered by using the matrix extension, e.g. https://clang.godbolt.org/z/qsccPdPf4 This should complement D98240. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D100273	2021-05-24 09:29:08 +01:00
Johannes Doerfert	e8185bdd21	[Attributor] Introduce a helper to deal with constant type mismatches If we simplify values we sometimes end up with type mismatches. If the value is a constant we can often cast it though to still allow propagation. The logic is now put into a helper and it replaces some ad hoc things we did before. This also introduces the AA namespace for abstract attribute related functions and types.	2021-05-23 23:00:40 -05:00
Johannes Doerfert	637e90a523	[Attributor] Teach AAIsDead about undef values Not only if the branch or switch condition is dead but also if it is assumed `undef` we can delay AAIsDead exploration.	2021-05-23 23:00:40 -05:00
Johannes Doerfert	8fbdaf26cb	[Attributor] Deal with address spaces gracefully When we do value propagation we need to cast address spaces properly.	2021-05-23 23:00:39 -05:00
Johannes Doerfert	3c31dd18ce	[Attributor] Be more careful to not disturb the CG outside the SCC We have seen various problems when the call graph was not updated or the update did not succeed because it involved functions outside the SCC. This patch adds assertions and checks to avoid accidentally changing something outside the SCC that would impact the call graph. It also prevents us from reanalyzing functions outside the current SCC which could cause problems on its own. Note that the transformations we do might cause the CG to be "more precise" but the original one would always be a super set of the most precise one. Since the call graph is by nature an approximation, it is good enough to have a super set of all call edges.	2021-05-23 23:00:39 -05:00
Johannes Doerfert	70d6e43525	[Attributor][FIX] Account for undef in the constant value lattice The constant value lattice looks like this ``` <None> \| <undef> / \| \ ... <0> ... \ \| / <unknown> ``` We did not account for the undef and assumed a value meant we could not change anymore. Now we actually check if we have the same value as before, which will signal CHANGED to the users when we go from undef to a specific constant. This fixes, among other things, the bug exposed by @ipccp4 in `value-simplify.ll`.	2021-05-23 20:47:06 -05:00
Johannes Doerfert	0473f9be3c	[Attributor][FIX] Ensure we replace undef if we see the first "real" value The state of AAPotentialValues tracks if undef is contained. It should fold undef into the first non-undef value. However we missed a case before. There was also a shadowing definition of two variables that caused trouble. The test exposes both problems.	2021-05-23 20:47:06 -05:00
Johannes Doerfert	42ec472f5c	[Attributor][NFC] Precommit test case with branch on undef This test exposes a bug in the module pass as it simplifies ipccp4 to unreachable, which is unfortunately wrong.	2021-05-23 20:47:06 -05:00
Johannes Doerfert	ced56b844d	[Attributor][NFC] Add helpful debug outputs	2021-05-23 20:47:05 -05:00
Johannes Doerfert	06fcc8738f	[Attributor][NFC] Clang format the Attributor source files	2021-05-23 20:47:05 -05:00
Johannes Doerfert	27d2bed19b	[Attributor][NFC] Rerun update_test_checks script on Attributor tests	2021-05-23 20:47:05 -05:00
Fady Ghanim	34efa8f465	[NFC] Removing leftover debug code Removing a missed value::dump() used to debug during development of OMPBuilder atomic.	2021-05-23 19:13:33 -04:00
Fangrui Song	7d66664724	[AArch64] Delete unneeded fixup_aarch64_ldr_pcrel_imm19 VK_GOT special case An AArch64 VK_GOT fixup must have a symbol. MCAssembler::evaluateFixup considers such a fixup not resolved. The code path cannot trigger.	2021-05-23 15:20:56 -07:00
Fady Ghanim	bb0b21b662	[OpenMP][OMPIRBuilder]Adding support for `omp atomic` This patch adds support for generating `omp atomic` for all different atomic clauses	2021-05-23 17:44:09 -04:00
Philipp Krones	df7a8b162e	[MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo This makes it possible for targets to define their own MCObjectFileInfo. This MCObjectFileInfo is then used to determine things like section alignment. This is a follow up to D101462 and prepares for the RISCV backend defining the text section alignment depending on the enabled extensions. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D101921	2021-05-23 14:15:23 -07:00
Nikita Popov	130d03950d	[LoopUnroll] Add test for partial unrolling against non-latch exit (NFC) This test case would get miscompiled by the current version of D102982, because unrolling does not respect the PreserveCondBr flag for partial unrolling.	2021-05-23 23:10:23 +02:00
Fangrui Song	68c6e43bf6	[AArch64][MC] Remove unneeded "in .xxx directive" from diagnostics The prevailing style does not add the message. The directive name is not useful because the next line replicates the error line which includes the directive.	2021-05-23 13:58:16 -07:00
Joerg Sonnenberger	4ecac84bb1	[SPARC] recognize the "rd %pc, reg" special form Differential Revision: https://reviews.llvm.org/D96312	2021-05-23 22:52:59 +02:00
Sander de Smalen	849ddeeded	NFC: cleaned up and renamed scalable-vf-analysis.ll -> scalable-vectorization.ll * Removes unnecessary loop hints. * Use RUN line with '-scalable-vectorization=preferred' instead of 'on' for the maximize-bandwidth behaviour. This prepares the test for enabling scalable vectorization; With a forced instruction-cost of 1, 'on' will always favour fixed-width VF to be chosen, whereas with 'preferred' we can check that the maximize-bandwidth option in combination with scalable-vectorization=preferred actually picks a scalable VF. * Renamed to scalable-vectorization.ll, because a follow-up patch will test more than just analysis.	2021-05-23 19:53:51 +01:00
Roman Lebedev	9f647330f9	[NFC][X86][Costmodel] Add tests with masked loads/stores w/non-power-of-two vectors	2021-05-23 21:45:36 +03:00
Fangrui Song	653c782076	[AArch64] Use \t in AsmStreamer to match the prevailing style	2021-05-23 11:35:42 -07:00
Simon Pilgrim	da358a290c	Fix bugs URL for PR relocations The PR works from llvm.org, not bugs.llvm.org	2021-05-23 17:19:36 +01:00
Simon Pilgrim	f8e2239648	[CostModel][X86] Align v2i64 MUL costs on SSE42+ targets with worst case Based on worst case of sandybridge (which seems to match nehalem for this SSE sequence) (vs btver2 + bdver2) llvm-mca analysis	2021-05-23 16:20:57 +01:00
Nico Weber	9cfe1a59a7	[gn build] (semi-manually) port 0bccdf82f705	2021-05-23 10:01:06 -04:00
Sanjay Patel	645f84951e	[InstSimplify] add more tests for rem-mul-div; NFC See D102864 for discussion.	2021-05-23 09:46:29 -04:00
maekawatoshiki	0614e76510	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-05-23 22:32:01 +09:00
Nikita Popov	8df1db8535	[LoopUnroll] Add test for unrollable non-latch multi-exit (NFC) This test case requires unrolling against a non-latch exit in a multiple-exit loop with exiting latch. It's not covered by existing heuristics or the extension in D102635.	2021-05-23 10:51:45 +02:00
David Green	77bb6ccfb3	[ARM] Add extra debug messages for gather/scatter lowering. NFC	2021-05-23 08:52:13 +01:00
Martin Storsjö	a30052855f	[Windows] Use TerminateProcess to exit without running destructors If exiting using _Exit or ExitProcess, DLLs are still unloaded cleanly before exiting, running destructors and other cleanup in those DLLs. When the caller expects to exit without cleanup, running destructors in some loaded DLLs (which can be either libLLVM.dll or e.g. libc++.dll) can cause deadlocks occasionally. This is an alternative to D102684. Differential Revision: https://reviews.llvm.org/D102944	2021-05-22 23:41:40 +03:00
Martin Storsjö	68aa000dc9	[MinGW] Mark a number of library functions unavailable for mingw targets These functions were marked unavailable for MSVC targets before, within an "T.isOSWindows() && !T.isOSCygMing()" block, but these ones are unavailable on MinGW targets too. This avoids generating calls to stpcpy for MinGW targets, which has been happening since 6dbf0cfcf789365493f70ae69df8a7a59be41c75 (in some cases). This fixes https://github.com/mstorsjo/llvm-mingw/issues/201. Differential Revision: https://reviews.llvm.org/D102946	2021-05-22 23:40:19 +03:00
Simon Pilgrim	f6dc684c0c	[CostModel][X86] Align v4i64 MUL costs on AVX1 targets with worst case Based on worst case of sandybridge (vs btver2 + bdver2) llvm-mca analysis - which is a lot less than what we were predicting (I think based off total uop count).	2021-05-22 20:07:55 +01:00
Nikita Popov	66053da05c	[IR] Optimize no-op removal from AttributeList (NFC) When removing an AttrBuilder from an index of an AttributeList, directly return the original list if no attributes were actually removed.	2021-05-22 19:03:27 +02:00
Nikita Popov	632878de15	[IR] Optimize no-op removal from AttributeSet (NFC) When removing an AttrBuilder from an AttributeSet, first check whether there is any overlap. If nothing is being removed, we can directly return the original set.	2021-05-22 18:55:25 +02:00
Simon Pilgrim	0bdfd66523	[CostModel][X86] Pull out X86/X64 scalar int arithmetic costs from SSE tables. NFCI. These aren't dependent on any SSE level (and don't tend to get quicker either).	2021-05-22 16:13:49 +01:00
Lang Hames	da54af9961	[ORC] Add more synchronization to TestLookupWithUnthreadedMaterialization. Don't run tasks until their corresponding thread has been added to the running threads vector. This is an extension to fda4300da82, which doesn't seem to have been enough to fix the synchronization issues on its own.	2021-05-22 07:59:24 -07:00
Lang Hames	635b337d6d	[JITLink] Move some Block bitfields into Addressable to improve packing. Moving these bitfields from Block to Addressable allows them to be packed with the bitfields at the end of Addressable, reducing the size of Block by eight bytes.	2021-05-22 07:59:24 -07:00
Yaxun (Sam) Liu	80152597c5	[HIP] support ThinLTO Add options -[no-]offload-lto and -foffload-lto=[thin,full] for controlling LTO for offload compilation. Allow LTO for AMDGPU target. AMDGPU target does not support codegen of object files containing call of external functions, therefore the LLVM module passed to AMDGPU backend needs to contain definitions of all the callees. An LLVM option is added to allow function importer to import functions with noinline attribute. HIP toolchain passes proper LLVM options to lld to make sure function importer imports definitions of all the callees. Reviewed by: Teresa Johnson, Artem Belevich Differential Revision: https://reviews.llvm.org/D99683	2021-05-22 10:48:34 -04:00
Nikita Popov	8c07a78a96	Reapply [InstCombine] Fold multiuse shr eq zero This was reverted due to performance regressions in ARM benchmarks, which have since been addressed by D101196 (SCEV analysis improvement) and D101778 (CGP reverse transform). ----- The single-use case is handled implicitly by converting the icmp into a mask check first. When comparing with zero in particular, we don't need the one-use restriction, as we only produce a single icmp. https://alive2.llvm.org/ce/z/MSixcm https://alive2.llvm.org/ce/z/GwpG0M	2021-05-22 14:46:50 +02:00
David Green	aca1951bd7	[ARM] Clean up some tests, removing dead instructions. NFC	2021-05-22 13:38:00 +01:00
Simon Pilgrim	99610346a2	[CostModel][X86] vXi8 MUL is always promoted to vXi16	2021-05-22 11:56:49 +01:00
Florian Hahn	2e9363c644	[Matrix] Bail out early if there are no matrix intrinsics. If there are no matrix intrinsics in a function, we can directly bail out, as there's nothing left to do. Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D102931	2021-05-22 11:37:25 +01:00
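
Commit 5c433e70b6 above states the trip count for an inline memcpy/memset as n/16 rounded up, computed as (n+15)>>4. The sketch below only checks that arithmetic in plain C++; `tripCount` is a hypothetical helper name, not a function from the ARM backend, and it assumes `n + 15` does not overflow the 32-bit type.

```cpp
#include <cassert>
#include <cstdint>

// Number of 16-byte iterations needed to cover n bytes, i.e. ceil(n / 16).
// Hypothetical helper for illustration only; assumes n + 15 does not wrap.
constexpr std::uint32_t tripCount(std::uint32_t n) { return (n + 15) >> 4; }

// A partial final chunk still needs one full iteration.
static_assert(tripCount(0) == 0, "no bytes, no iterations");
static_assert(tripCount(1) == 1, "one byte needs one iteration");
static_assert(tripCount(16) == 1, "exactly one full chunk");
static_assert(tripCount(17) == 2, "one full chunk plus a partial one");
```

Adding 15 before the shift is what rounds up: any remainder from 1 to 15 carries into the next multiple of 16, while an exact multiple is left unchanged.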

216227 Commits
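
Commit 70d6e43525 in the log above describes a constant value lattice in which `<None>` sits above `<undef>`, `<undef>` sits above every concrete constant, and all constants sit above `<unknown>`. A minimal standalone sketch of that ordering follows, assuming a simple "meet" operation. `LatticeValue`, `meet`, and `update` are hypothetical names and this is not the Attributor's actual implementation; `ChangeStatus` is modeled loosely on the CHANGED signal the commit message mentions.

```cpp
#include <cassert>

// Hypothetical model of the lattice: None > Undef > Constant(c) > Unknown.
struct LatticeValue {
  enum Kind { None, Undef, Constant, Unknown };
  Kind K = None;
  long Value = 0; // Only meaningful when K == Constant.

  static LatticeValue undef() { LatticeValue V; V.K = Undef; return V; }
  static LatticeValue unknown() { LatticeValue V; V.K = Unknown; return V; }
  static LatticeValue constant(long C) {
    LatticeValue V;
    V.K = Constant;
    V.Value = C;
    return V;
  }

  bool operator==(const LatticeValue &O) const {
    return K == O.K && (K != Constant || Value == O.Value);
  }
};

enum class ChangeStatus { UNCHANGED, CHANGED };

// Combine two lattice values, moving down the lattice when they disagree.
LatticeValue meet(const LatticeValue &A, const LatticeValue &B) {
  if (A.K == LatticeValue::None) return B;
  if (B.K == LatticeValue::None) return A;
  if (A.K == LatticeValue::Unknown || B.K == LatticeValue::Unknown)
    return LatticeValue::unknown();
  if (A.K == LatticeValue::Undef) return B; // undef folds into the other value
  if (B.K == LatticeValue::Undef) return A;
  // Both are constants: equal constants stay, different ones become unknown.
  return A.Value == B.Value ? A : LatticeValue::unknown();
}

// Compare the state before and after instead of assuming that "has a value"
// means "cannot change anymore".
ChangeStatus update(LatticeValue &State, const LatticeValue &Seen) {
  const LatticeValue Before = State;
  State = meet(State, Seen);
  return State == Before ? ChangeStatus::UNCHANGED : ChangeStatus::CHANGED;
}
```

In this model, a state that starts at `<None>`, then sees `undef`, then sees the constant 0 reports CHANGED on both steps. The second step is the undef-to-constant transition that the commit message says was previously missed.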