llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 04:52:54 +02:00

Author	SHA1	Message	Date
Sanjay Patel	66f3d22962	[InstCombine] add isCanonicalPredicate() helper function and use it; NFCI There should be a slight efficiency improvement from handling icmp/fcmp with one matcher and reducing duplicated code. The larger motivation is that there are questions about how predicate canonicalization is handled, and the refactoring should make it easier if we want to change any of that behavior. 1. As noted in the code comment, we've chosen 3 of the 16 FCMP preds as not canonical. Why those 3? It goes back to rL32751 from what I can tell, but I'm not sure if there's a justification for that rule. 2. We currently do not canonicalize integer select conditions. Should we use the same rule that applies to branches for selects? 3. We currently do canonicalize some FP select conditions, and those rules would conflict with the rule shown here. Should one or both be changed? No-functional-change-intended, but adding tests anyway because there's no coverage for most of the predicates. Differential Revision: https://reviews.llvm.org/D33247 llvm-svn: 303261	2017-05-17 14:21:19 +00:00
Gor Nishanov	ef81a105e5	[coroutines] Handle spills before catchswitch If we need to spill the result of the PHI instruction, we insert the spill after all of the PHIs and EHPads, however, in a catchswitch block there is no room to insert the spill. Make room by splitting away catchswitch into a separate block. Before the fix: catch.dispatch: %val = phi i32 [ 1, %if.then ], [ 2, %if.else ] %switch = catchswitch within none [label %catch] unwind label %cleanuppad After: catch.dispatch: %val = phi i32 [ 1, %if.then ], [ 2, %if.else ] %tok = cleanuppad within none [] ; spill goes here cleanupret from %tok unwind label %catch.dispatch.switch catch.dispatch.switch: %switch = catchswitch within none [label %catch] unwind label %cleanuppad https://reviews.llvm.org/D31846 llvm-svn: 303232	2017-05-17 03:09:22 +00:00
Davide Italiano	e6fa4e685f	[NewGVN] Re-enable test now that the nondeterminism has been fixed. llvm-svn: 303217	2017-05-16 22:27:06 +00:00
NAKAMURA Takumi	0fc3371c12	llvm/test/Transforms/InstCombine/debuginfo-skip.ll REQUIRES +asserts. llvm-svn: 303216	2017-05-16 22:19:56 +00:00
Sanjay Patel	5c904052a0	[InstSimplify] add folds for constant mask of value shifted by constant We would eventually catch these via demanded bits and computing known bits in InstCombine, but I think it's better to handle the simple cases as soon as possible as a matter of efficiency. This fold allows further simplifications based on distributed ops transforms. eg: %a = lshr i8 %x, 7 %b = or i8 %a, 2 %c = and i8 %b, 1 InstSimplify can directly fold this now: %a = lshr i8 %x, 7 Differential Revision: https://reviews.llvm.org/D33221 llvm-svn: 303213	2017-05-16 21:51:04 +00:00
Amara Emerson	b4afa9c73c	Re-commit r302678, fixing PR33053. The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions which didn't have a lowering. llvm-svn: 303211	2017-05-16 21:29:22 +00:00
Easwaran Raman	d91313ddb5	[Inliner] Do not mix callsite and callee hotness based updates. Update threshold based on callee's hotness only when BFI is not available. Otherwise use only callsite's hotness. This makes it easier to reason about hotness related threshold updates. Differential revision: https://reviews.llvm.org/D33157 llvm-svn: 303210	2017-05-16 21:18:09 +00:00
Sanjay Patel	95166c2c5b	[InstCombine] auto-generate better checks; NFC llvm-svn: 303203	2017-05-16 20:09:32 +00:00
Dmitry Mikulin	2eecd88090	In debug builds non-trivial amount of time is spent in InstCombine processing @llvm.dbg.* calls in visitCallInst(). They can be safely ignored. llvm-svn: 303202	2017-05-16 20:08:49 +00:00
Sanjay Patel	483e8a2253	[InstCombine] add motivational comment for tests; NFC The referenced tests are derived from: https://bugs.llvm.org/show_bug.cgi?id=32791 and: https://reviews.llvm.org/D33172 The motivation for including negative tests may not be clear, so I'm adding an explanatory comment here. In the post-commit thread for r303133: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170515/453793.html ...it was mentioned that we don't want to add redundant tests. This is a valid point. But in this case, we have a patch under review (D33172) that demonstrates that no existing regression tests are affected by a proposed code change, but these are. Therefore, I think these tests have value not visible in any existing regression tests regardless of whether they show a transform. Differential Revision: https://reviews.llvm.org/D33242 llvm-svn: 303185	2017-05-16 16:30:46 +00:00
Matthew Simpson	be5fce863d	Revert 303174, 303176, and 303178 These commits are breaking the bots. Reverting to investigate. llvm-svn: 303182	2017-05-16 15:50:30 +00:00
Matthew Simpson	f9ca5aa639	Make test target-specific llvm-svn: 303178	2017-05-16 15:33:22 +00:00
Matthew Simpson	2b38ec1ab7	Fix test case to unbreak bots llvm-svn: 303176	2017-05-16 15:20:27 +00:00
Matthew Simpson	fdeda43e2f	[LV] Avoid potential division by zero when selecting IC llvm-svn: 303174	2017-05-16 14:43:55 +00:00
Gor Nishanov	e2a5e02b38	[coroutines] Handle unwind edge splitting Summary: RewritePHIs algorithm used in building of CoroFrame inserts a placeholder ``` %placeholder = phi [%val] ``` on every edge leading to a block starting with PHI node with multiple incoming edges, so that if one of the incoming values was spilled and needs to be reloaded, we have a place to insert a reload. We use SplitEdge helper function to split the incoming edge. SplitEdge function does not deal with unwind edges coming into a block with an EHPad. This patch adds an ehAwareSplitEdge function that can correctly split the unwind edge. For landing pads, we clone the landing pad into every edge block and replace the original landing pad with a PHI collecting the values from all incoming landing pads. For WinEH pads, we keep the original EHPad in place and insert cleanuppad/cleanupret in the edge blocks. Reviewers: majnemer, rnk Reviewed By: majnemer Subscribers: EricWF, llvm-commits Differential Revision: https://reviews.llvm.org/D31845 llvm-svn: 303172	2017-05-16 14:11:39 +00:00
Davide Italiano	c968f4e36b	Revert "[NewGVN] Replace predicate info leftovers." It's breaking the bots. llvm-svn: 303142	2017-05-16 05:51:21 +00:00
Davide Italiano	08da355c1b	[NewGVN] Replace predicate info leftovers. Fixes PR32945. Differential Revision: https://reviews.llvm.org/D33226 llvm-svn: 303141	2017-05-16 05:23:23 +00:00
Sanjay Patel	3f32289f72	[InstCombine] add tests for PR32791; NFC llvm-svn: 303133	2017-05-15 23:59:28 +00:00
Sanjay Patel	6a5f74bb94	[InstSimplify] add tests for unnecessary mask of shifted values; NFC llvm-svn: 303127	2017-05-15 22:54:37 +00:00
David Blaikie	cbf90e5613	PR32288: Describe a bool parameter's DWARF location with a simple register There's no need (& a bit incorrect) to mask off the high bits of the register reference when describing a simple bool value. Reviewers: aprantl Differential Revision: https://reviews.llvm.org/D31062 llvm-svn: 303117	2017-05-15 21:34:01 +00:00
Adam Nemet	f9607f0660	[SLP] Enable 64-bit wide vectorization on AArch64 ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. * Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. * MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116	2017-05-15 21:15:01 +00:00
Hans Wennborg	247e13c637	Revert r302678 "[AArch64] Enable use of reduction intrinsics." This caused PR33053. Original commit message: > The new experimental reduction intrinsics can now be used, so I'm enabling this > for AArch64. We will need this for SVE anyway, so it makes sense to do this for > NEON reductions as well. > > The existing code to match shufflevector patterns are replaced with a direct > lowering of the reductions to AArch64-specific nodes. Tests updated with the > new, simpler, representation. > > Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 303115	2017-05-15 20:59:32 +00:00
Sanjay Patel	612c21f9a9	[InstCombine] restrict icmp fold with 2 sdiv exact operands (PR32949) This is the InstCombine counterpart to D32954. I added some comments about the code duplication in: rL302436 Alive-based verification: http://rise4fun.com/Alive/dPw This is a 2nd fix for the problem reported in: https://bugs.llvm.org/show_bug.cgi?id=32949 Differential Revision: https://reviews.llvm.org/D32970 llvm-svn: 303105	2017-05-15 19:27:53 +00:00
Sanjay Patel	6116bcb0ac	[InstSimplify] restrict icmp fold with 2 sdiv exact operands (PR32949) These folds were introduced with https://reviews.llvm.org/rL127064 as part of solving: https://bugs.llvm.org/show_bug.cgi?id=9343 As shown here: http://rise4fun.com/Alive/C8 ...however, the sdiv exact case needs a stronger predicate. I opted for duplicated code instead of adding another fallthrough because I think that's easier to read (and edit in case we need/want to restrict/loosen the predicates any more). This should fix: https://bugs.llvm.org/show_bug.cgi?id=32949 https://bugs.llvm.org/show_bug.cgi?id=32948 Differential Revision: https://reviews.llvm.org/D32954 llvm-svn: 303104	2017-05-15 19:16:49 +00:00
Evgeny Stupachenko	d11ab9e578	The patch adds CTLZ idiom recognition. Summary: The following loops should be recognized: i = 0; while (n) { n = n >> 1; i++; body(); } use(i); And replaced with builtin_ctlz(n) if body() is empty or for CPUs that have CTLZ instruction converted to countable: for (j = 0; j < builtin_ctlz(n); j++) { n = n >> 1; i++; body(); } use(builtin_ctlz(n)); Reviewers: rengolin, joerg Differential Revision: http://reviews.llvm.org/D32605 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 303102	2017-05-15 19:08:56 +00:00
Davide Italiano	a3335b259b	[NewGVN] Fix verification of MemoryPhis in verifyMemoryCongruency(). verifyMemoryCongruency() filters out trivially dead MemoryDef(s), as we find them immediately dead, before moving from TOP to a new congruence class. This fixes the same problem for PHI(s) skipping MemoryPhis if all the operands are dead. Differential Revision: https://reviews.llvm.org/D33044 llvm-svn: 303100	2017-05-15 18:50:53 +00:00
Simon Pilgrim	a09d9c5af0	[SLPVectorizer][X86] Add vectorization tests for vXi64/vXi32/vXi16/VXi8 add/sub/mul llvm-svn: 303074	2017-05-15 15:48:15 +00:00
Simon Pilgrim	61ca3be831	[SLPVectorizer][X86] Add vectorization tests for vXi64/vXi32/vXi16/VXi8 shifts llvm-svn: 303069	2017-05-15 14:27:11 +00:00
Daniel Jasper	9ff7dd68c1	Fix two tests that weren't correctly copied. One didn't correctly define the regex variable, the other still had a RUN line for FNOBUILTIN-checks, which weren't copied to the file. llvm-svn: 303025	2017-05-14 22:07:50 +00:00
Craig Topper	6c2eb35d5e	[InstSimplify] Add patterns for folding (A & B) \| (~A ^ B) -> (~A ^ B) and its commuted variants. We already had (A & ~B) \| (A ^ B), but we missed the cases where the not was part of the xor. llvm-svn: 303004	2017-05-14 07:54:43 +00:00
Craig Topper	3da63abd9f	foo llvm-svn: 303003	2017-05-14 07:54:40 +00:00
Xinliang David Li	5f25d42892	Re-enable test that was disabled due to cost analysis llvm-svn: 303000	2017-05-14 02:58:39 +00:00
Simon Pilgrim	288ebc9253	[LoopOptimizer][Fix]PR32859, PR24738 The Loop vectorizer pass introduced undef value while it is fixing output of LCSSA form. Here it is: before: %e.0.ph = phi i32 [ 0, %for.inc.2.i ] after: %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ undef, %middle.block ] and after this change we have: %e.0.ph = phi i32 [ 0, %for.inc.2.i ] %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ 0, %middle.block ] Committed on behalf of @dtemirbulatov Differential Revision: https://reviews.llvm.org/D33055 llvm-svn: 302988	2017-05-13 13:25:57 +00:00
Craig Topper	d39e7102a2	[InstCombine] Prevent InstCombine from triggering an extra iteration if something changed in the initial Worklist creation Summary: If the Worklist build causes an IR change this change flag currently factors into the flag for running another iteration of the iteration loop. But only changes during processing should trigger another loop. This patch captures the worklist creation change flag into the outside the loop flag currently used for DbgDeclares and only sends that flag up to the caller. Rerunning the loop only depends on IC.run() now. This uses the debug output of InstCombine to determine if one or two iterations run. I couldn't think of a better way to detect it since the second spurious iteration shouldn't make any visible changes. Just wasted computation. I can do a pre-commit of the test case with the CHECK-NOT as a CHECK if this is an ok way to check this. This is a subset of D31678 as I'm still not sure how to verify the analysis behavior for that. Reviewers: davide, majnemer, spatel, chandlerc Reviewed By: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32453 llvm-svn: 302982	2017-05-13 06:56:04 +00:00
Justin Bogner	104b204907	ConstProp: Split x86 SSE intrinsic tests out of calls.ll This allows us to mark this as `REQUIRES: x86`, since it uses x86 target specific intrinsics. llvm-svn: 302980	2017-05-13 05:52:17 +00:00
Justin Bogner	2d310588a2	InstCombine: Move tests that use target intrinsics into subdirectories Tests with target intrinsics are inherently target specific, so it doesn't actually make sense to run them if we've excluded their target. llvm-svn: 302979	2017-05-13 05:39:46 +00:00
NAKAMURA Takumi	0474b0e5ba	Disable llvm/test/Transforms/NewGVN/pr32934.ll while Davide is investigating. llvm-svn: 302977	2017-05-13 03:05:38 +00:00
Davide Italiano	c5771fbe04	[NewGVN] XFAIL a flaky test until I find out what's going on. I bet the change is correct but this test seems to expose some underlying problem that manifests only on some buildbots, and I'm not able to reproduce locally. Unfortunately I can't debug right now but I don't want to annoy people with spurious failures, so I'll XFAIL until I can take a look (over the weekend). llvm-svn: 302976	2017-05-13 02:45:47 +00:00
Xinliang David Li	28a5d9c340	[PartialInlining] Profile based cost analysis Implemented frequency based cost/saving analysis and related options. The pass is now in a state ready to be turned on in the pipeline (in follow up). Differential Revision: http://reviews.llvm.org/D32783 llvm-svn: 302967	2017-05-12 23:41:43 +00:00
Andrew Kaylor	392b1353f7	[TLI] Add mapping for various '__<func>_finite' forms of the math routines to SVML routines Patch by Chris Chrulski Differential Revision: https://reviews.llvm.org/D31789 llvm-svn: 302957	2017-05-12 22:11:26 +00:00
Andrew Kaylor	0792f83d68	[ConstantFolding] Add folding for various math '__<func>_finite' routines generated from -ffast-math Patch by Chris Chrulski Differential Revision: https://reviews.llvm.org/D31788 llvm-svn: 302956	2017-05-12 22:11:20 +00:00
Andrew Kaylor	95445317bd	[TLI] Add declarations for various math header file routines from math-finite.h that create '__<func>_finite as functions Patch by Chris Chrulski Differential Revision: https://reviews.llvm.org/D31787 llvm-svn: 302955	2017-05-12 22:11:12 +00:00
Davide Italiano	c3a517dde1	[LoopUnroll] Fix a test. REQUIRE should be REQUIRES. Found by inspection. llvm-svn: 302909	2017-05-12 15:30:58 +00:00
Davide Italiano	4d6c0b489d	[NewGVN] Don't incorrectly reset the memory leader. This code was missing a check for stores, so we were thinking the congruency class didn't have any memory members, and reset the memory leader. Differential Revision: https://reviews.llvm.org/D33056 llvm-svn: 302905	2017-05-12 15:22:45 +00:00
Chandler Carruth	a360b737fa	[PM/Unswitch] Teach the new simple loop unswitch to handle loop invariant PHI inputs and to rewrite PHI nodes during the actual unswitching. The checking is quite easy, but rewriting the PHI nodes is somewhat surprisingly challenging. This should handle both branches and switches. I think this is now a full featured trivial unswitcher, and more full featured than the trivial cases in the old pass while still being (IMO) somewhat simpler in how it works. Next up is to verify its correctness in more widespread testing, and then to add non-trivial unswitching. Thanks to Davide and Sanjoy for the excellent review. There is one remaining question that I may address in a follow-up patch (see the review thread for details) but it isn't related to the functionality specifically. Differential Revision: https://reviews.llvm.org/D32699 llvm-svn: 302867	2017-05-12 02:19:59 +00:00
Teresa Johnson	81aa3660f7	Restrict call metadata based hotness detection to Sample PGO mode Summary: Don't use the metadata on call instructions for determining hotness unless we are in sample PGO mode, where it is needed because profile counts are not accurate. In instrumentation mode this is not necessary and does more harm than good when calls have VP metadata that hasn't been properly scaled after transformations or dropped after constant prop based devirtualization (both should be fixed, but we don't need to do this in the first place for instrumentation PGO). This required adjusting a number of tests to distinguish between sample and instrumentation PGO handling, and to add in profile summary metadata so that getProfileCount can get the summary. Reviewers: davidxl, danielcdh Subscribers: aemerson, rengolin, mehdi_amini, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D32877 llvm-svn: 302844	2017-05-11 23:18:05 +00:00
Easwaran Raman	7fb8c288b9	Decrease inlinecold-threshold to 45 I ran the test-suite (including SPEC 2006) in PGO mode comparing cold thresholds of 225 and 45. Here are some stats on the text size: Out of 904 tests that ran, 197 see a change in text size. The average text size reduction (of all the 904 binaries) is 1.07%. Of the 197 binaries, 19 see a text size increase, as high as 18%, but most of them are small single source benchmarks. There are 3 multisource benchmarks with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On the other side of the spectrum, 31 benchmarks see >10% size reduction and 6 of them are MultiSource. I haven't run the test-suite with other values of inlinecold-threshold. Since we have a cold callsite threshold of 45, I picked this value. Differential revision: https://reviews.llvm.org/D33106 llvm-svn: 302829	2017-05-11 21:36:28 +00:00
Adam Nemet	73703d12a4	[SLP] Emit optimization remarks The approach I followed was to emit the remark after getTreeCost concludes that SLP is profitable. I initially tried emitting them after the vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely remember missing a few cases for example in HorizontalReduction::tryToReduce. ORE is placed in BoUpSLP so that it's available from everywhere (notably HorizontalReduction::tryToReduce). We use the first instruction in the root bundle as the locator for the remark. In order to get a sense how far the tree is spanning I've included the size of the tree in the remark. This is not perfect of course but it gives you at least a rough idea about the tree. Then you can follow up with -view-slp-tree to really see the actual tree. llvm-svn: 302811	2017-05-11 17:06:17 +00:00
Sanjay Patel	c4d21ee61b	[InstCombine] remove fold that swaps xor/or with constants; NFCI // (X ^ C1) \| C2 --> (X \| C2) ^ (C1&~C2) This canonicalization was added at: https://reviews.llvm.org/rL7264 By moving xors out/down, we can more easily combine constants. I'm adding tests that do not change with this patch, so we can verify that those kinds of transforms are still happening. This is no-functional-change-intended because there's a later fold: // (X^C)\|Y -> (X\|Y)^C iff Y&C == 0 ...and demanded-bits appears to guarantee that any fold that would have hit the fold we're removing here would be caught by that 2nd fold. Similar reasoning was used in: https://reviews.llvm.org/rL299384 The larger motivation for removing this code is that it could interfere with the fix for PR32706: https://bugs.llvm.org/show_bug.cgi?id=32706 Ie, we're not checking if the 'xor' is actually a 'not', so we could reverse a 'not' optimization and cause an infinite loop by altering an 'xor X, -1'. Differential Revision: https://reviews.llvm.org/D33050 llvm-svn: 302733	2017-05-10 21:33:55 +00:00
Sanjay Patel	9a17eb1902	[InstSimplify, InstCombine] move 'or' simplification tests; NFC Surprisingly, I don't think these are redundant for InstSimplify. They were just misplaced as InstCombine tests. llvm-svn: 302684	2017-05-10 15:57:47 +00:00
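
Several messages in the log above state small bitwise identities (r303213, r303004, r302733) and a loop idiom (r303102). The sketch below checks them by brute force on narrow integers. It is only an illustration and not code from any of these commits: every function name is hypothetical, the three-variable check uses 5-bit values to keep the run short, and the relation "trip count = bit width minus leading zeros" is an inference from the loop shape, not wording taken from the commit message.

```python
MASK8 = 0xFF  # model LLVM's i8 with an explicit 8-bit mask


def fold_mask_of_shift_holds():
    # r303213: %a = lshr i8 %x, 7; %b = or i8 %a, 2; %c = and i8 %b, 1
    # The claim is that %c always equals %a.
    return all((((x >> 7) | 2) & 1) == (x >> 7) for x in range(256))


def fold_or_of_and_xnor_holds():
    # r303004: (A & B) | (~A ^ B) -> (~A ^ B)
    for a in range(256):
        for b in range(256):
            xnor = (~a & MASK8) ^ b
            if ((a & b) | xnor) != xnor:
                return False
    return True


def fold_xor_or_constants_holds(bits=5):
    # r302733 (the removed canonicalization):
    # (X ^ C1) | C2 --> (X | C2) ^ (C1 & ~C2)
    mask = (1 << bits) - 1
    for x in range(mask + 1):
        for c1 in range(mask + 1):
            for c2 in range(mask + 1):
                lhs = (x ^ c1) | c2
                rhs = (x | c2) ^ (c1 & ~c2 & mask)
                if lhs != rhs:
                    return False
    return True


def shift_loop_count(n):
    # r303102 loop shape: i = 0; while (n) { n = n >> 1; i++; }
    i = 0
    while n:
        n >>= 1
        i += 1
    return i


def clz32(n):
    # Count leading zero bits of a 32-bit unsigned value (32 for n == 0).
    count = 0
    for bit in range(31, -1, -1):
        if n & (1 << bit):
            break
        count += 1
    return count


if __name__ == "__main__":
    print(fold_mask_of_shift_holds())
    print(fold_or_of_and_xnor_holds())
    print(fold_xor_or_constants_holds())
    print(all(shift_loop_count(n) == 32 - clz32(n) for n in range(1 << 12)))
```

A brute-force check like this only shows that an identity holds at the tested width. The commits themselves cite Alive-based verification for the folds they change.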


8882 Commits