llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 03:02:36 +01:00

Author	SHA1	Message	Date
Simon Pilgrim	cf39147d1e	[DAG] MergeInnerShuffle with BinOps - sometimes accept undef mask elements If the inner shuffle already contains undef elements, then accept them in the merged shuffle as well. This helps some X86 HADD/SUB patterns where slow targets were ending up with HADD/SUB because the (un)merged shuffles were stuck either side of the ADD/SUB - meaning we ended up with a total cost much higher than the "2*shuffle+add" that a slow target usually expands a HADD/SUB to.	2021-04-01 14:33:00 +01:00
Alexey Bataev	4a573da3f8	[SLP]Remove `else` after `return`, NFC.	2021-04-01 05:33:01 -07:00
Dmitry Preobrazhensky	985d51d2a5	[AMDGPU][MC][GFX10][GFX90A] Corrected _e32/_e64 suffixes Fixed bugs https://bugs.llvm.org//show_bug.cgi?id=49643, https://bugs.llvm.org//show_bug.cgi?id=49644, https://bugs.llvm.org//show_bug.cgi?id=49645. Differential Revision: https://reviews.llvm.org/D99413	2021-04-01 14:21:00 +03:00
Simon Pilgrim	834f9c5710	[X86][SSE] Fold HOP(HOP(X,X),HOP(Y,Y)) -> HOP(PERMUTE(HOP(X,Y)),PERMUTE(HOP(X,Y)) For slow-hop targets, attempt to merge HADD/SUB pairs used in chains.	2021-04-01 11:54:10 +01:00
Simon Pilgrim	3a797eccba	[X86][SSE] Enable (F)HADD/SUB handling to SimplifyMultipleUseDemandedVectorElts Attempt to bypass unused horiz-op operands. This is very similar to the PACKSS/PACKUS handling - we should try to merge these.	2021-04-01 11:54:09 +01:00
Simon Pilgrim	96cbf2a08f	[X86][SSE] Add isHorizOp helper function. NFCI.	2021-04-01 11:54:09 +01:00
Dmitry Preobrazhensky	f7f94d099c	[AMDGPU][MC] Added flag to identify VOP instructions which have a single variant By convention, VOP1/2/C instructions which can be promoted to VOP3 have _e32 suffix while promoted instructions have _e64 suffix. Instructions which have a single variant should have no _e32/_e64 suffix. Unfortunately there was no simple way to identify single variant instructions - it was implemented by a hack. See bug https://bugs.llvm.org/show_bug.cgi?id=39086. This fix simplifies handling of single VOP instructions by adding a dedicated flag. Differential Revision: https://reviews.llvm.org/D99408	2021-04-01 13:53:12 +03:00
Florian Hahn	ab06955ff7	[SLP] Add test cases for missing SLP vectorization on AArch64.	2021-04-01 11:48:16 +01:00
David Sherwood	f9929db64e	[NFC] Add tests for scalable vectorization of loops with large stride accesses This patch just adds tests that we can vectorize loops such as these: for (i = 0; i < n; i++) dst[i * 7] += 1; and for (i = 0; i < n; i++) if (cond[i]) dst[i * 7] += 1; using scalable vectors, where we expect to use gathers and scatters in the vectorized loop. The vector of pointers used for the gather is identical to those used for the scatter so there should be no memory dependences. Tests are added here: Transforms/LoopVectorize/AArch64/sve-large-strides.ll Differential Revision: https://reviews.llvm.org/D99192	2021-04-01 10:25:06 +01:00
Yevgeny Rouban	5d014af6d9	[LoopFlatten] Do not report CFG analyses as up-to-date Removes CFGAnalyses from the preserved analyses set returned by LoopFlattenPass::run(). Reviewed By: Dave Green, Ta-Wei Tu Differential Revision: https://reviews.llvm.org/D99700	2021-04-01 15:52:36 +07:00
Sam Parker	a0fbbf61ec	[WebAssembly] Invert branch condition on xor input A frequent pattern for floating point conditional branches use an xor to invert the input for the branch. Instead we can fold away the xor by swapping the branch target instead. Differential Revision: https://reviews.llvm.org/D99171	2021-04-01 09:23:28 +01:00
Max Kazantsev	2231fbd3ed	[NFC] Undo some erroneous renamings Some vars renamed by mistake during auto-replacements. Undoing them.	2021-04-01 13:10:10 +07:00
Max Kazantsev	467293274d	[NFC] Disambiguate LI in GVN GVN uses the name 'LI' for two different unrelated things: LoadInst and LoopInfo. This patch renames the variables with the former meaning to 'Load' to disambiguate the code.	2021-04-01 12:40:35 +07:00
Chen Zheng	1c951bda95	[debug-info] support new tuning debugger type DBX for XCOFF DWARF Based on this debugger type, for now, we plan to: 1: use inline string by default for XCOFF DWARF 2: generate no column info for debug line table. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D99400	2021-04-01 00:11:30 -04:00
KAWASHIMA Takahiro	dd33b4b9e5	[GVN] Propagate llvm.access.group metadata of loads Before this change, the `llvm.access.group` metadata was dropped when moving a load instruction in GVN. This prevents vectorizing a C/C++ loop with `#pragma clang loop vectorize(assume_safety)`. This change propagates the metadata as well as other metadata if it is safe (the move-destination basic block and source basic block belong to the same loop). Differential Revision: https://reviews.llvm.org/D93503	2021-04-01 10:00:48 +09:00
KAWASHIMA Takahiro	1bdd822573	[GVN][NFC] Pre-commit test for D93503	2021-04-01 10:00:48 +09:00
qixingxue	df8bbdf70e	[GVN][NFC] Refactor analyzeLoadFromClobberingWrite This commit adjusts the order of two swappable if statements to make code cleaner. Reviewed By: lattner, nikic Differential Revision: https://reviews.llvm.org/D99648	2021-04-01 08:35:35 +08:00
Philip Reames	5a0375405f	[ValueTracking] Handle non-zero ashr/lshr recurrences If we know we don't shift out bits (e.g. exact), all we need to know is that input is non-zero.	2021-03-31 16:48:32 -07:00
Philip Reames	e571959eb5	[tests] Add tests for ashr/lshr recurrences in isKnownNonZero	2021-03-31 16:48:32 -07:00
Philip Reames	1966c159a4	Add debug printers for KnownBits [nfc]	2021-03-31 15:36:07 -07:00
Simonas Kazlauskas	c1d491f5a6	Support {S,U}REMEqFold before legalization This allows these optimisations to apply to e.g. `urem i16` directly before `urem` is promoted to i32 on architectures where i16 operations are not intrinsically legal (such as on Aarch64). The legalization then later can happen more directly and generated code gets a chance to avoid wasting time on computing results in types wider than necessary, in the end. Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88785	2021-04-01 01:35:41 +03:00
Craig Topper	ae21ac30d0	[RISCV] Add UnsupportedSchedZfh multiclass to reduce duplicate lines from RISCVSchedRocket.td and RISCVSchedSiFive7.td. NFC	2021-03-31 15:06:14 -07:00
YangKeao	bc3527f3c1	[X86] add dwarf annotation for inline stack probe While probing the stack, the stack register is moved without dwarf information, which could cause a panic when unwinding the backtrace. This commit only adds annotations for the inline stack probe case. Dwarf information for the loop case should be done in another patch and needs further discussion. Reviewed By: nagisa Differential Revision: https://reviews.llvm.org/D99579	2021-04-01 00:32:50 +03:00
Thomas Preud'homme	737d322a2a	[test, InferFunctionAttrs] Fix use of var defined in CHECK-NOT LLVM test Transforms/InferFunctionAttrs/annotate contains two RUN invocations (UNKNOWN and NVPTX lines) which involve a CHECK-NOT directive with a variable not defined by the enabled CHECK prefixes. This commit fixes that by: - enabling CHECK prefix for unknown target with specialisation when it differs from other targets - checking for absence of bcmp with any attribute for NVPTX Reviewed By: tra Differential Revision: https://reviews.llvm.org/D99589	2021-03-31 22:06:34 +01:00
Roman Lebedev	5b224b04d8	[NFC][LoopRotation] Count the number of instructions hoisted/cloned into preheader	2021-03-31 23:27:36 +03:00
Philip Reames	4739f740ea	Revert "Make TableGenGlobalISel an object library" This reverts commit 2c3cf62d4a26de85aab180bb43a579c913b17f3e. Causes build failures on x86_64, will respond to commit thread with link errors.	2021-03-31 13:27:00 -07:00
Aaron Puchert	936867007c	Make TableGenGlobalISel an object library That's how it was originally intended but that wasn't possible because we still needed to support older CMake versions. The problem here is that the sources in TableGenGlobalISel are meant to be linked into both llvm-tblgen and TableGenTests (a unit test), but not be part of LLVM proper. So they shouldn't be an ordinary LLVM component. Because they are used in llvm-tblgen, they can't draw in the LLVM dylib dependency, but then we'd have to do the same thing in TableGenTests to make sure we don't link both a static Support library and another copy through the LLVM dylib. With an object library we're just reusing the object files and don't have to care about dependencies at all. Reviewed By: beanz Differential Revision: https://reviews.llvm.org/D74588	2021-03-31 22:20:56 +02:00
Philip Reames	b4fffe7b16	[tests] Exercise cases where SCEV can use trip counts to refine ashr/lshr recurrences	2021-03-31 12:48:50 -07:00
Alexey Bataev	43bf28eee8	[SLP]Update test checks, NFC	2021-03-31 12:35:58 -07:00
Craig Topper	1e557f9750	[SelectionDAG] Remove unneeded vector resize from the end of FoldConstantArithmetic. NFC There's an assert right before that makes sure the size already matches. Earlier in this function's life, scalars and vectors shared more code.	2021-03-31 12:33:10 -07:00
George Mitenkov	b6034064b6	[ConstantFolding] Fixing addo/subo with undef When folding addo/subo with undef, the current convention is to use { -1, false } for addo and { 0, false } for subo. This was fixed for InstSimplify in https://reviews.llvm.org/rGf094d65beaa492e845b03561eddd75b5be653a01, but not in ConstantFolding. Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D99564	2021-03-31 21:47:29 +03:00
Alexey Bataev	897d734376	[SLP]Add a test for the bug in `getVectorElementSize()`, NFC.	2021-03-31 11:40:44 -07:00
Huihui Zhang	7272e7d7a3	[LoopVectorize] Use SetVector to track uniform uses to prevent non-determinism. Use SetVector instead of SmallPtrSet to track values with uniform use. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turning on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM test consecutive-ptr-uniforms.ll . Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99549	2021-03-31 11:21:07 -07:00
Thomas Lively	a9b1f02ebd	[WebAssembly] Implement i64x2 comparisons Removes the prototype builtin and intrinsic for i64x2.eq and implements that instruction as well as the other i64x2 comparison instructions in the final SIMD spec. Unsigned comparisons were not included in the final spec, so they still need to be scalarized via a custom lowering. Differential Revision: https://reviews.llvm.org/D99623	2021-03-31 10:46:17 -07:00
Juneyoung Lee	3e94a07500	[ValueTracking] Add with.overflow intrinsics to poison analysis functions This is a patch teaching ValueTracking that `s/u*.with.overflow` intrinsics do not create undef/poison and they propagate poison. I couldn't write a nice example like the one with ctpop; ValueTrackingTest.cpp was simply updated to check these instead. This patch helps reduce regressions while fixing https://llvm.org/pr49688 . Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99671	2021-04-01 02:41:38 +09:00
Philip Reames	4da7b4cf4f	[SCEV] Handle unreachable binop when matching shift recurrence This fixes an issue introduced with my change d4648e, and reported in pr49768. The root problem is that dominance collapses in unreachable code, and that LoopInfo explicitly only models reachable code. Since the recurrence matcher doesn't filter by reachability (and can't easily because not all consumers have domtree), we need to bailout before assuming that finding a recurrence implies we found a loop.	2021-03-31 10:33:34 -07:00
Craig Topper	ba24a9b976	[X86] Improve SMULO/UMULO codegen for vXi8 vectors. The default expansion creates a MUL and either a MULHS/MULHU. Each of those separately expand to sequences that use one or more PMULLW instructions as well as additional instructions to extend the types to vXi16. The MULHS/MULHU expansion computes the whole 16-bit product, but only keeps the high part. We can improve the lowering of SMULO/UMULO for some cases by using the MULHS/MULHU expansion, but keep both the high and low parts. And we can use those parts to calculate the overflow. For AVX512 we might have vXi1 overflow outputs. We can improve those by using vpcmpeqw to produce a k register if AVX512BW is enabled. This is a little better than truncating the high result to use vpcmpeqb. If we don't have avx512bw we can extend up to v16i32 to use vpcmpeqd to produce a k register. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97624	2021-03-31 10:13:50 -07:00
Shimin Cui	af509b77f3	[PowerPC] [MLICM] Enable hoisting of caller preserved registers on AIX On ppc64 linux , MachineLICM will hoist caller preserved registers, including TOC loads of the global variable address, out of loops. This is to enable this on AIX for both ppc64 and ppc32. Differential Revision: https://reviews.llvm.org/D99076	2021-03-31 12:46:25 -04:00
Craig Topper	37baf7a26c	[X86] Improve optimizeCompareInstr for signed comparisons after BMI/TBM instructions We previously couldn't optimize out a TEST if the branch/setcc/cmov used the overflow flag. This patch allows the TEST to be removed if the flag producing instruction is known to clear the OF flag. That's what the TEST instruction would have done so that should be equivalent. Need to add test cases. I'll try to get back to this if I have bandwidth. Fixes PR48768. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D94856	2021-03-31 09:45:29 -07:00
Wael Yehia	1925a6cfe8	[LTO][Legacy] Decouple option parsing from LTOCodeGenerator in this patch we add a new libLTO API to specify debug options independent of an lto_code_gen_t. This allows clients to pass codegen flags (through libLTO) which otherwise today are ignored. Reviewed By: steven_wu Differential Revision: https://reviews.llvm.org/D92611	2021-03-31 16:43:26 +00:00
Craig Topper	c662f4964b	[RISCV] Add RISCVISD opcodes for CLZW and CTZW. Our CLZW isel pattern is quite easily broken by surrounding code preventing it from matching sometimes. This usually results in failing to remove the and X, 0xffffffff inserted by type legalization. The add with -32 that type legalization also inserts will often get combined into other add/sub nodes. That doesn't usually result in extra code when we don't use clzw. CTTZ seems to be less fragile, but I wanted to keep it consistent with CTLZ. Reviewed By: asb, HsiangKai Differential Revision: https://reviews.llvm.org/D99317	2021-03-31 09:40:07 -07:00
Jay Foad	b26eea9c23	[AMDGPU] Add some image tests with enable-prt-strict-null disabled. NFC.	2021-03-31 17:27:20 +01:00
Jay Foad	a35262e5d6	[AMDGPU] Use a common check prefix for some image tests. NFC.	2021-03-31 17:27:20 +01:00
Craig Topper	588b65d9ad	[RISCV] Add isel patterns to select vsub_vx intrinsic to vadd.vi if it uses a small enough immediate Also modify the simm5_plus1 check because Imm-1 is UB if Imm happens to be INT64_MIN. I don't think the compiler would optimize based on that in this usage, but it could fail UBSan or -ftrapv. Reviewed By: HsiangKai, frasercrmck Differential Revision: https://reviews.llvm.org/D99637	2021-03-31 09:26:41 -07:00
Arthur Eubanks	2e601b2808	[llvm-jitlink] Fix -Wunused-function on Windows Reviewed By: sgraenitz Differential Revision: https://reviews.llvm.org/D99604	2021-03-31 09:26:09 -07:00
Heejin Ahn	ac42de94f4	[WebAssembly] Rename a test and fix comments D99627 fixed a decoding bug, not an encoding bug. This renames the test to correct it and fixes comments. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D99644	2021-03-31 09:13:08 -07:00
Sanjay Patel	276f54740e	[InstCombine] fold abs(srem X, 2) This is a missing optimization based on an example in: https://llvm.org/PR49763 As noted there and the test here, we could add a more general fold if that is shown useful. https://alive2.llvm.org/ce/z/xEHdTv https://alive2.llvm.org/ce/z/97dcY5	2021-03-31 11:29:20 -04:00
Sanjay Patel	b2b2a53d04	[InstCombine] add tests for srem+abs; NFC	2021-03-31 11:29:20 -04:00
Bradley Smith	64f8889766	[AArch64][SVE] Add tests for UREM/SREM using fixed SVE types Differential Revision: https://reviews.llvm.org/D99265	2021-03-31 16:09:55 +01:00
Sander de Smalen	772162c240	[SVE] Fix LoopVectorizer test scalalable-call.ll This marks FSIN and other operations to EXPAND for scalable vectors, so that they are not assumed to be legal by the cost-model. Depends on D97470 Reviewed By: dmgreen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97471	2021-03-31 14:52:49 +01:00
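
Note on 276f54740e ([InstCombine] fold abs(srem X, 2)): the commit message in the listing does not spell out the replacement expression. The sketch below assumes the fold rewrites abs(srem X, 2) to and X, 1, and checks that identity exhaustively over the i8 range. The helper names (srem, abs_srem2, folded) are hypothetical and exist only for this illustration; this is a plain-Python model of the arithmetic, not LLVM code.

```python
def srem(x: int, y: int) -> int:
    """Signed remainder modelled on LLVM's `srem`:
    the result takes the sign of the dividend (truncating division)."""
    r = abs(x) % abs(y)
    return -r if x < 0 else r


def abs_srem2(x: int) -> int:
    """The original pattern: abs(srem X, 2)."""
    return abs(srem(x, 2))


def folded(x: int) -> int:
    """The assumed replacement: and X, 1.
    Python's `&` on negative ints behaves like two's complement,
    so the low bit matches a fixed-width integer's low bit."""
    return x & 1


# Compare both forms for every signed 8-bit value.
mismatches = [x for x in range(-128, 128) if abs_srem2(x) != folded(x)]
print("mismatches:", mismatches)
```

Because srem X, 2 can only produce -1, 0, or 1, taking the absolute value never hits the minimum-signed-value corner case, which is why the low-bit test is enough in this model.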


213522 Commits