Commit Graph

2022 Commits

Davide Italiano
572717843f [SimplifyLibCalls] Add test to ensure transform is not executed if fast-math
attribute is not present.

During my refactor in r251595 I changed the behavior of optimizeSqrt(),
skipping the transformation if the function wasn't marked with the
unsafe-fp-math attribute. This fixed a bug, as confirmed by Sanjay (before,
the optimization was silently executed anyway), although that wasn't my primary aim.
This commit adds a test to ensure the code doesn't break again.

Reported by: Marcello Maggioni
Discussed with: Sanjay Patel

llvm-svn: 251747
2015-10-31 20:59:32 +00:00
Silviu Baranga
8fa79f5d2b [InstCombine] Teach instcombine not to create extra PHI nodes when folding GEPs
Summary:
InstCombine tries to transform GEP(PHI(GEP1, GEP2, ..)) into GEP(GEP(PHI(...)))
when possible. However, this may leave the old PHI node around. Even if we
do end up folding the GEPs, having an extra PHI node might not be beneficial.

This change makes the transformation more conservative. We now only do this if
the PHI has only one use, and can therefore be removed after the transformation.

Reviewers: jmolloy, majnemer

Subscribers: mcrosier, mssimpso, llvm-commits

Differential Revision: http://reviews.llvm.org/D13887

llvm-svn: 251281
2015-10-26 10:25:05 +00:00
Hal Finkel
1a64f66683 Handle non-constant shifts in computeKnownBits, and use computeKnownBits for constant folding in InstCombine/Simplify
First, the motivation: LLVM currently does not realize that:

  ((2072 >> (L == 0)) >> 7) & 1 == 0

where L is some arbitrary value. Whether you right-shift 2072 by 7 or by 8, the
lowest-order bit is always zero. There are obviously several ways to go about
fixing this, but the generic solution pursued in this patch is to teach
computeKnownBits something about shifts by a non-constant amount. Previously,
we would give up completely on these. Instead, in cases where we know something
about the low-order bits of the shift-amount operand, we can combine (and
together) the associated restrictions for all shift amounts consistent with
that knowledge. As a further generalization, I refactored all of the logic for
all three kinds of shifts to have this capability. This works well in the above
case, for example, because the dynamic shift amount can only be 0 or 1, and
thus we can say a lot about the known bits of the result.
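
For illustration, here is a hypothetical IR rendering of the motivating
expression (a sketch, not a test from the commit):

  define i32 @motivating(i64 %L) {
    %cmp = icmp eq i64 %L, 0
    %amt = zext i1 %cmp to i32      ; shift amount is known to be 0 or 1
    %sh1 = lshr i32 2072, %amt      ; 2072 = 0b100000011000
    %sh2 = lshr i32 %sh1, 7
    %bit = and i32 %sh2, 1          ; low bit is zero for either shift amount
    ret i32 %bit                    ; foldable to ret i32 0
  }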

This brings us to the second part of this change: Even when we know all of the
bits of a value via computeKnownBits, nothing used to constant-fold the result.
This introduces the necessary code into InstCombine and InstSimplify. I've
added it into both because:

  1. InstCombine won't automatically pick up the associated logic in
     InstSimplify (InstCombine uses InstSimplify, but not via the API that
     passes in the original instruction).

  2. Putting the logic in InstCombine allows the resulting simplifications to become
     part of the iterative worklist.

  3. Putting the logic in InstSimplify allows the resulting simplifications to be
     used by everywhere else that calls SimplifyInstruction (inlining, unrolling,
     and many others).

And this requires a small change to our definition of an ephemeral value so
that we don't break the test case from r246696 (where the icmp feeding the
@llvm.assume is also feeding a br). Under the old definition, the icmp would
not be considered ephemeral (because it is used by the br), but this causes the
assume to remove itself (in addition to simplifying the branch structure), and
it seems more useful to prevent that from happening.

llvm-svn: 251146
2015-10-23 20:37:08 +00:00
Michael Liao
9a38e95740 [InstCombine] Revise the test case to match the full sequence
llvm-svn: 250950
2015-10-21 21:50:58 +00:00
Michael Liao
5efcb99302 [InstCombine] Optimize icmp of inc/dec at RHS
Allow LLVM to optimize a sequence like the following:

  %inc = add nsw i32 %i, 1
  %cmp = icmp slt i32 %n, %inc

into:

  %cmp = icmp sle i32 %n, %i

This case was not handled previously due to the complexity of the computation
of %n; hence, LLVM could not swap the operands of the icmp accordingly.

llvm-svn: 250746
2015-10-19 22:08:14 +00:00
Simon Pilgrim
2df3161736 [InstCombine] SSE4A constant folding and conversion to shuffles.
This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:

1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'bit index' and 'length' operands are constant
2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if it's useful)
3 - Constant folding support
4 - Add zeroinitializer handling

Differential Revision: http://reviews.llvm.org/D13348

llvm-svn: 250609
2015-10-17 11:40:05 +00:00
Philip Reames
1bb1df8eb7 Tighten known bits for ctpop based on zero input bits
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.

The basic idea here is that input bits that are known zero reduce the maximum count that the intrinsic could return. We know that the number of bits required to represent a particular count is at most log2(N)+1.
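
As a hedged illustration (a sketch, not a test from the patch): if at most 4
input bits can be set, the count needs at most 3 bits, so comparisons against
anything outside that range fold away.

  declare i32 @llvm.ctpop.i32(i32)

  define i1 @ctpop_bound(i32 %a) {
    %x = and i32 %a, 15                    ; at most 4 bits can be set
    %c = call i32 @llvm.ctpop.i32(i32 %x)  ; known bits: fits in 3 bits
    %cmp = icmp ugt i32 %c, 7              ; folds to false
    ret i1 %cmp
  }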

Differential Revision: http://reviews.llvm.org/D13253

llvm-svn: 250338
2015-10-14 22:42:12 +00:00
Simon Pilgrim
5a441530c3 [InstCombine][SSE4A] Remove broken INSERTQI range combining optimization
As discussed in D13348 - the INSERTQI range combining code is wrong in that it confuses the insertion bit index with an extraction bit index.

The remaining legal combines are very unlikely (especially once we've converted to shuffles in D13348) so I'm removing the optimization.

llvm-svn: 250160
2015-10-13 14:48:54 +00:00
Simon Pilgrim
977c9904f0 [InstCombine] Tidied up SSE4A tests.
First stage of bugfix discussed in D13348

llvm-svn: 250121
2015-10-12 23:07:06 +00:00
Simon Pilgrim
91bf14e39c [InstCombine][X86][XOP] Combine XOP integer vector comparisons to native IR
We now have lowering support for XOP PCOM/PCOMU instructions.

llvm-svn: 249977
2015-10-11 14:38:34 +00:00
Sanjay Patel
0bf5df4240 [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886)
This is a partial fix for PR24886:
https://llvm.org/bugs/show_bug.cgi?id=24886

Without this IR transform, the backend (x86 at least) was producing inefficient code.

This patch is making 2 assumptions:

    1. The canonical form of a fabs() operation is, in fact, the LLVM fabs() intrinsic.
    2. The high bit of an FP value is always the sign bit; as noted in the bug report, this isn't specified by the LangRef.

Differential Revision: http://reviews.llvm.org/D13076

llvm-svn: 249702
2015-10-08 17:09:31 +00:00
Sanjay Patel
172d0b6bb7 [ValueTracking] teach computeKnownBits that a fabs() clears sign bits
This was requested in D13076: if we're going to canonicalize to fabs(), ValueTracking
should know that fabs() clears sign bits.

In this patch (as in D13076), we're not handling vectors yet even though computeKnownBits'
fabs() case itself should be vector-ready via the splat in this patch. 
Fixing this will require follow-on patches to correct other logic that uses 'getScalarType'.

Differential Revision: http://reviews.llvm.org/D13222

llvm-svn: 249701
2015-10-08 16:56:55 +00:00
Artur Pilipenko
8b3cc14fdc Teach computeKnownBits to use new align attribute/metadata
Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D13470

llvm-svn: 249557
2015-10-07 16:01:18 +00:00
Hans Wennborg
b27c6dcee4 InstCombine: Fold comparisons between unguessable allocas and other pointers
This will allow us to optimize code such as:

  int f(int *p) {
    int x;
    return p == &x;
  }

as well as:

  int *allocate(void);
  int f() {
    int x;
    int *p = allocate();
    return p == &x;
  }

The folding can only be done under certain circumstances. Even though p and &x
cannot alias, the comparison must still return true if the pointer
representations are equal. If a user successfully generates a p that's a
correct guess for &x, the comparison should return true even though p is an invalid
pointer.

This patch argues that if the address of the alloca isn't observable outside the
function, the function can act as-if the address is impossible to guess from the
outside. The tricky part is keeping the act consistent: if we fold p == &x to
false in one place, we must make sure to fold any other comparisons based on
those pointers similarly. To ensure that, we only fold when &x is involved
exactly once in comparison instructions.

Differential Revision: http://reviews.llvm.org/D13358

llvm-svn: 249490
2015-10-07 00:20:07 +00:00
Philip Reames
ca99cf7d94 Extend known bits to understand @llvm.bswap
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.

When writing tests, I was surprised to find that instsimplify apparently doesn't know how to collapse bit test sequences based purely on known bits. This required me to split my tests across both instsimplify and instcombine.

Differential Revision: http://reviews.llvm.org/D13250

llvm-svn: 249453
2015-10-06 20:20:45 +00:00
Andrea Di Biagio
e88e150aaa [InstCombine] Teach SimplifyDemandedVectorElts how to handle ConstantVector select masks with ConstantExpr elements (PR24922)
If the mask of a select instruction is a ConstantVector, method
SimplifyDemandedVectorElts iterates over the mask elements to identify which
values are selected from the select inputs.

Before this patch, method SimplifyDemandedVectorElts always used method
Constant::isNullValue() to check if a value in the mask was zero. Unfortunately
that method always returns false when called on a ConstantExpr.

This patch fixes the problem in SimplifyDemandedVectorElts by adding an explicit
check for ConstantExpr values. Now, if a value in the mask is a ConstantExpr, we
avoid calling isNullValue() on it.

Fixes PR24922.

Differential Revision: http://reviews.llvm.org/D13219

llvm-svn: 249390
2015-10-06 10:34:53 +00:00
Bruno Cardoso Lopes
f80e20287d [SimplifyLibCalls] Fix instruction misplacement in string/memory libcall optimization
When trying to optimize fortified library functions use the right
location to insert new instructions in order to preserve correct
def-use order.

This fixes an issue where a misplaced instruction definition would
happen to be *after* one of its uses after a RAUW, forming invalid IR.
This behavior was introduced by r227250.

Differential Revision: http://reviews.llvm.org/D13301

rdar://problem/22802369

llvm-svn: 249092
2015-10-01 22:43:53 +00:00
Arnaud A. de Grandmaison
3d6b7ea719 [InstCombine] Remove trivially empty lifetime start/end ranges.
Summary:
Some passes may open up opportunities for optimizations, leaving empty
lifetime start/end ranges. For example, with the following code:

    void foo(char *, char *);
    void bar(int Size, bool flag) {
      for (int i = 0; i < Size; ++i) {
        char text[1];
        char buff[1];
        if (flag)
          foo(text, buff); // BBFoo
      }
    }

the loop unswitch pass will create two versions of the loop, one with
flag==true and the other with flag==false, but it always leaves the BBFoo
basic block in place, with lifetime ranges covering the scope of the for
loop. Simplify CFG will then remove BBFoo in the case where flag==false,
but will leave the lifetime markers.

This patch teaches InstCombine to remove trivially empty lifetime marker
ranges, that is ranges ending right after they were started (ignoring
debug info or other lifetime markers in the range).
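
A hedged sketch of the IR shape this removes (using the 2015-era intrinsic
signatures; not a test from the patch):

  declare void @llvm.lifetime.start(i64, i8*)
  declare void @llvm.lifetime.end(i64, i8*)

  define void @empty_range() {
    %buf = alloca i8
    call void @llvm.lifetime.start(i64 1, i8* %buf)
    call void @llvm.lifetime.end(i64 1, i8* %buf)  ; ends right after it starts
    ret void                                       ; both markers can be erased
  }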

This fixes PR24598: excessive compile time after r234581.

Reviewers: reames, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13305

llvm-svn: 249018
2015-10-01 14:54:31 +00:00
Andrea Di Biagio
71ee17fea4 [InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant.
This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a
builtin shuffle if the mask is constant.

Converting byte shuffle intrinsic calls to builtin shuffles can help find
more opportunities for combining shuffles later on in selection dag.

We may end up with byte shuffles with constant masks as the result of inlining.
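
For illustration, a hypothetical fold of this kind (the byte-reverse mask is
my own choice, not from the patch):

  declare <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)

  define <16 x i8> @reverse(<16 x i8> %v) {
    ; no mask byte has its sign bit set, so no lane is zeroed
    %r = call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %v,
           <16 x i8> <i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8,
                      i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
    ret <16 x i8> %r   ; becomes a shufflevector with mask <15, 14, ..., 0>
  }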

Differential Revision: http://reviews.llvm.org/D13252

llvm-svn: 248913
2015-09-30 16:44:39 +00:00
Jeroen Ketema
b9ecf8a3ee [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
Simon Pilgrim
95228d060e [InstCombine] Improve Vector Demanded Bits Through Bitcasts
Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements.

This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases where the vectors alias cleanly (i.e. the number of elements in one vector is an exact multiple of the number in the other).

This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts.
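
A minimal sketch of the clean aliasing this enables (my own example, assuming
an extractelement drives the demanded-elements query):

  define i64 @demanded(<4 x i32> %v) {
    %bc = bitcast <4 x i32> %v to <2 x i64>
    %e = extractelement <2 x i64> %bc, i32 0
    ret i64 %e   ; only elements 0 and 1 of %v are demanded
  }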

Differential Revision: http://reviews.llvm.org/D12935

llvm-svn: 248784
2015-09-29 08:19:11 +00:00
Sanjay Patel
c39648a72a [InstCombine] fold zexts and constants into a phi (PR24766)
This is one step towards solving PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766

We were not producing the same IR for these two C functions because the store
to the temp bool causes extra zexts:

#include <stdbool.h>

bool switchy(char x1, char x2, char condition) {
   bool conditionMet = false;
   switch (condition) {
   case 0: conditionMet = (x1 == x2); break;
   case 1: conditionMet = (x1 <= x2); break;
   }
   return conditionMet;
}

bool switchy2(char x1, char x2, char condition) {
   switch (condition) {
   case 0: return (x1 == x2);
   case 1: return (x1 <= x2);
   }
  return false;
}

As noted in the code comments, this test case manages to avoid the more general existing
phi optimizations where there are only 2 phi inputs or where there are no constant phi 
args mixed in with the cast ops. It seems like a corner case, but if we don't catch it,
then I don't think we can get SimplifyCFG to further optimize towards the canonical form
for this function shown in the bug report.

Differential Revision: http://reviews.llvm.org/D12866

llvm-svn: 248689
2015-09-27 20:34:31 +00:00
Simon Pilgrim
a06843cf1a [InstCombine] Removed unnecessary meta attributes.
llvm-svn: 248672
2015-09-26 17:49:04 +00:00
Chen Li
11db69b00f [Bug 24848] Use range metadata to constant fold comparisons between two values
Summary:
This is the second part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848.

If both operands of a comparison have range metadata, they should be used to constant fold the comparison.
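
For illustration, a hypothetical fold with disjoint ranges (my own example,
not a test from the patch):

  define i1 @disjoint(i32* %p, i32* %q) {
    %a = load i32, i32* %p, !range !0   ; %a in [0, 10)
    %b = load i32, i32* %q, !range !1   ; %b in [20, 30)
    %c = icmp ult i32 %a, %b
    ret i1 %c                           ; ranges are disjoint: folds to true
  }
  !0 = !{i32 0, i32 10}
  !1 = !{i32 20, i32 30}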

Reviewers: sanjoy, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13177

llvm-svn: 248650
2015-09-26 03:26:47 +00:00
Sanjay Patel
61ecb7fad2 [InstCombine] match De Morgan's Law hidden by zext ops (PR22723)
This is a fix for PR22723:
https://llvm.org/bugs/show_bug.cgi?id=22723

My first attempt at this was to change what I thought was the root problem:

xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32

...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop!

My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would
mean potentially returning a different size for the match than what was input. I think
this would require all users of m_Not to check the size of the returned match, so I 
abandoned that idea.

I settled on just fixing the exact case presented in the PR. This patch does allow the
2 functions in PR22723 to compile identically (x86):

bool test(bool x, bool y) { return !x | !y; }
bool test(bool x, bool y) { return !x || !y; }
...
andb	%sil, %dil
xorb	$1, %dil
movb	%dil, %al
retq

Differential Revision: http://reviews.llvm.org/D12705

llvm-svn: 248634
2015-09-25 23:21:38 +00:00
Charlie Turner
4305b8ca2a [InstCombine] Recognize another bswap idiom.
Summary:
The byte-swap recognizer can now notice that this

```
uint32_t bswap(uint32_t x)
{
  x = (x & 0x0000FFFF) << 16 | (x & 0xFFFF0000) >> 16;
  x = (x & 0x00FF00FF) << 8 | (x & 0xFF00FF00) >> 8;
  return x;
}
```
    
is a bswap. Fixes PR23863.

Reviewers: nlewycky, hfinkel, hans, jmolloy, rengolin

Subscribers: majnemer, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D12637

llvm-svn: 248482
2015-09-24 10:24:58 +00:00
Akira Hatanaka
e7bdc0a349 [InstCombine] Preserve metadata when merging loads that are phi
arguments.

Make sure InstCombiner::FoldPHIArgLoadIntoPHI doesn't drop the following
metadata:

MD_tbaa
MD_alias_scope
MD_noalias
MD_invariant_load
MD_nonnull
MD_range

rdar://problem/17617709

Differential Revision: http://reviews.llvm.org/D12710

llvm-svn: 248419
2015-09-23 18:40:57 +00:00
Chen Li
7331c90075 [Bug 24848] Use range metadata to constant fold comparisons with constant values
Summary:
This is the first part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848.

When range metadata is provided, it should be used to constant fold comparisons with constant values.
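
A sketch of the kind of fold this enables (my own example, not a test from
the patch):

  define i1 @bounded(i32* %p) {
    %v = load i32, i32* %p, !range !0   ; %v in [0, 10)
    %c = icmp ult i32 %v, 100
    ret i1 %c                           ; always true given the range
  }
  !0 = !{i32 0, i32 10}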

Reviewers: sanjoy, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12988

llvm-svn: 248402
2015-09-23 17:58:44 +00:00
David Majnemer
906a17fb34 [InstCombine] FoldICmpCstShrCst failed for ashr when comparing against -1
(icmp eq (ashr C1, %V) -1) may have multiple answers if C1 is not a
power of two and has the sign bit set (e.g. for i8 C1 = -64, both %V = 6
and %V = 7 yield -1).

This fixes PR24873.

llvm-svn: 248074
2015-09-19 00:48:31 +00:00
Simon Pilgrim
76acca71c9 [InstCombine] Added vector demanded bits support for SSE4A EXTRQ/INSERTQ instructions
The SSE4A instructions EXTRQ/INSERTQ only use the lower 64-bits (or less) of many of their input vector operands, and all of them produce results with undefined upper 64-bits.

Differential Revision: http://reviews.llvm.org/D12680

llvm-svn: 247934
2015-09-17 20:32:45 +00:00
Sanjoy Das
cbeb36a4cc [InstCombine] Optimize icmp slt signum(x), 1 --> icmp slt x, 1
Summary:
`signum(x)` is sometimes implemented as `(x >> 63) | (-x >>> 63)` (for
an `i64` `x`).  This change adds a matcher for that pattern, and an
instcombine rule to optimize `signum(x) s< 1`.
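
A sketch of the matched pattern in IR (my own rendering, assuming i64 as in
the summary):

  define i1 @signum_slt_1(i64 %x) {
    %sh = ashr i64 %x, 63      ; -1 if %x < 0, else 0
    %neg = sub i64 0, %x
    %uh = lshr i64 %neg, 63    ; sign bit of -%x
    %sig = or i64 %sh, %uh     ; signum(%x): -1, 0, or 1
    %c = icmp slt i64 %sig, 1
    ret i1 %c                  ; becomes: icmp slt i64 %x, 1
  }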

Later, we can also consider optimizing:

  icmp slt signum(x), 0 --> icmp slt x, 0
  icmp sle signum(x), 1 --> true

etc.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12703

llvm-svn: 247846
2015-09-16 20:41:29 +00:00
Arch D. Robison
f5010c4ba7 Broaden optimization of fcmp ([us]itofp x, constant) by instcombine.
The patch extends the optimization to cases where the constant's
magnitude is so small or large that the rounding of the conversion
is irrelevant.  The "so small" case includes negative zero.
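
A sketch of the negative-zero case (my own example, not from the patch):

  define i1 @cmp_neg_zero(i32 %x) {
    %f = sitofp i32 %x to float
    %c = fcmp olt float %f, -0.000000e+00
    ret i1 %c                  ; can become: icmp slt i32 %x, 0
  }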

Differential review: http://reviews.llvm.org/D11210

llvm-svn: 247708
2015-09-15 17:51:59 +00:00
Chen Li
7a880213bc [InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of arguments passed at a callsite. In this way it can handle cases where the argument does not have the nonnull attribute but has a dominating null check from the CFG. It also adds assertions in isKnownNonNull() and isKnownNonNullFromDominatingCondition() to make sure the value checked is of pointer type (as defined in the LLVM documentation). These assertions might trip failures in code not covered under llvm/test, but fixes should be pretty obvious.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12779

llvm-svn: 247587
2015-09-14 18:10:43 +00:00
Simon Pilgrim
4f410cc25c [InstCombine] CVTPH2PS Vector Demanded Elements + Constant Folding
Improved InstCombine support for CVTPH2PS (F16C half 2 float conversion):

<4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16>) - only uses the bottom 4 i16 elements for the conversion.

Added constant folding support.

Differential Revision: http://reviews.llvm.org/D12731

llvm-svn: 247504
2015-09-12 13:39:53 +00:00
David Blaikie
65b92c4f37 [opaque pointer type] Add textual IR support for explicit type parameter for global aliases
update.py:
import fileinput
import sys
import re

alias_match_prefix = r"(.*(?:=|:|^)\s*(?:external |)(?:(?:private|internal|linkonce|linkonce_odr|weak|weak_odr|common|appending|extern_weak|available_externally) )?(?:default |hidden |protected )?(?:dllimport |dllexport )?(?:unnamed_addr |)(?:thread_local(?:\([a-z]*\))? )?alias"
plain = re.compile(alias_match_prefix + r" (.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|addrspacecast|\[\[[a-zA-Z]|\{\{).*$)")
cast  = re.compile(alias_match_prefix + r") ((?:bitcast|inttoptr|addrspacecast)\s*\(.* to (.*?)(| addrspace\(\d+\) *)\*\)\s*(?:;.*)?$)")
gep   = re.compile(alias_match_prefix + r") ((?:getelementptr)\s*(?:inbounds)?\s*\((?P<type>.*), (?P=type)(?:\s*addrspace\(\d+\)\s*)?\* .*\)\s*(?:;.*)?$)")

def conv(line):
  m = re.match(cast, line)
  if m:
    return m.group(1) + " " + m.group(3) + ", " + m.group(2)
  m = re.match(gep, line)
  if m:
    return m.group(1) + " " + m.group(3) + ", " + m.group(2)
  m = re.match(plain, line)
  if m:
    return m.group(1) + ", " + m.group(2) + m.group(3) + "*" + m.group(4) + "\n"
  return line

for line in sys.stdin:
  sys.stdout.write(conv(line))

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

llvm-svn: 247378
2015-09-11 03:22:04 +00:00
Mehdi Amini
2a8d4fb24b Revert "[InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite"
This reverts commit r247356.

Breaks test/Transforms/InstCombine/pr8547.ll with:

Wrong types for attribute: byval inalloca nest noalias nocapture nonnull readnone readonly sret dereferenceable(1) dereferenceable_or_null(1)
  %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str, i64 0, i64 0), i32 nonnull %conv2) #0
LLVM ERROR: Broken function found, compilation aborted!

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247371
2015-09-11 01:33:48 +00:00
Chen Li
b3a586f07f [InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of arguments passed at a callsite. In this way it can handle cases where the argument does not have the nonnull attribute but has a dominating null check from the CFG.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12779

llvm-svn: 247356
2015-09-10 23:04:49 +00:00
Chen Li
040b0ed807 [InstCombineCalls] Use isKnownNonNullAt() to check nullness of gc.relocate return value
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of the gc.relocate return value. In this way it can handle cases where the relocated value does not have the nonnull attribute but has a dominating null check from the CFG.

Reviewers: reames

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D12772

llvm-svn: 247353
2015-09-10 22:35:41 +00:00
Jakub Kuderski
1c398c712b There is a trunc(lshr (zext A), Cst) optimization in InstCombineCasts that
removes the cast by performing the lshr on a smaller type. However, there is
currently no trunc(lshr (sext A), Cst) variant.
This patch adds such an optimization, transforming trunc(lshr (sext A), Cst)
into ashr A, Cst.
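
A sketch of the pattern (my own example; assuming the shift amount is smaller
than the narrow type's width):

  define i8 @sext_lshr_trunc(i8 %a) {
    %s = sext i8 %a to i32
    %sh = lshr i32 %s, 4
    %t = trunc i32 %sh to i8
    ret i8 %t        ; becomes: ashr i8 %a, 4
  }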

Differential Revision: http://reviews.llvm.org/D12520

llvm-svn: 247271
2015-09-10 11:31:20 +00:00
David Majnemer
3926c488cf Revert trunc(lshr (sext A), Cst) to ashr A, Cst
This reverts commit r246997, it introduced a regression (PR24763).

llvm-svn: 247180
2015-09-09 20:20:08 +00:00
Benjamin Kramer
2c58ab02a8 Merge or combine tests and convert to FileCheck.
- Move tests only exercising instsimplify to instsimplify's apint-or.ll
- Actually test the CHECK lines in instsimplify's apint-or.ll
- Merge the remaining tests in apint-or1.ll and apint-or2.ll, use FileCheck

llvm-svn: 247045
2015-09-08 18:36:56 +00:00
Sanjay Patel
044ebbbd9b add tests for De Morgan instcombines based on PR22723
llvm-svn: 247040
2015-09-08 18:13:03 +00:00
Sanjay Patel
4714c9c846 fix typos, remove noise; NFCI
llvm-svn: 247035
2015-09-08 17:58:22 +00:00
Jakub Kuderski
92dba72884 There is a trunc(lshr (zext A), Cst) optimization in InstCombineCasts that
removes the cast by performing the lshr on a smaller type. However, there is
currently no trunc(lshr (sext A), Cst) variant.
This patch adds such an optimization, transforming trunc(lshr (sext A), Cst)
into ashr A, Cst.

Differential Revision: http://reviews.llvm.org/D12520

llvm-svn: 246997
2015-09-08 10:03:17 +00:00
Sanjay Patel
3529fac16b add missing regression tests for De Morgan's Law transform in InstCombine
llvm-svn: 246973
2015-09-07 19:00:38 +00:00
David Majnemer
5e6c715cd2 [InstCombine] Don't divide by zero when evaluating a potential transform
Trivial multiplication by zero may survive the worklist.  We tried to
reassociate the multiplication with a division instruction, causing us
to divide by zero; bail out instead.

This fixes PR24726.

llvm-svn: 246939
2015-09-06 06:49:59 +00:00
David Majnemer
5cca152072 [InstCombine] Don't assume m_Mul gives back an Instruction
This fixes PR24713.

llvm-svn: 246933
2015-09-05 20:44:56 +00:00
Duncan P. N. Exon Smith
0c1aee0b16 DI: Require subprogram definitions to be distinct
As a follow-up to r246098, require `DISubprogram` definitions
(`isDefinition: true`) to be 'distinct'.  Specifically, add an assembler
check, a verifier check, and bitcode upgrading logic to combat testcase
bitrot after the `DIBuilder` change.

While working on the testcases, I realized that
test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore.  Its
purpose was to check for a corner case in PR22792 where two subprogram
definitions match exactly and share the same metadata node.  The new
verifier check, requiring that subprogram definitions are 'distinct',
precludes that possibility.

I updated almost all the IR with the following script:

    git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' |
    grep -v test/Bitcode |
    xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/'

Likely some variant of this would work for out-of-tree testcases.

llvm-svn: 246327
2015-08-28 20:26:49 +00:00
Sanjoy Das
b13b002480 [InstCombine] Fix PR24605.
PR24605 is caused due to an incorrect insert point in instcombine's IR
builder.  When simplifying

  %t = add X Y
  ...
  %m = icmp ... %t

the replacement for %t should be placed before %t, not before %m, as
there could be a use of %t between %t and %m.

llvm-svn: 246315
2015-08-28 19:09:31 +00:00
Chad Rosier
85d4ab5305 Optimize memcmp(x,y,n)==0 for small n and suitably aligned x/y.
http://reviews.llvm.org/D6952
PR20673
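
A hedged sketch of the shape of the rewrite for n == 2 (my own example,
assuming suitably aligned pointers):

  define i1 @memcmp2_eq(i8* %x, i8* %y) {
    %bx = bitcast i8* %x to i16*
    %by = bitcast i8* %y to i16*
    %lx = load i16, i16* %bx, align 2
    %ly = load i16, i16* %by, align 2
    %eq = icmp eq i16 %lx, %ly   ; memcmp(x, y, 2) == 0
    ret i1 %eq
  }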

llvm-svn: 246313
2015-08-28 18:30:18 +00:00
Pete Cooper
7ecbe8117c isKnownNonNull needs to consider globals in non-zero address spaces.
Globals in address spaces other than one may have 0 as a valid address,
so we should not assume that they can be null.

Reviewed by Philip Reames.

llvm-svn: 246137
2015-08-27 03:16:29 +00:00
Sanjoy Das
ead6e3fe61 Re-apply r245635, "[InstCombine] Transform A & (L - 1) u< L --> L != 0"
The original checkin was buggy, this change has a fix.

Original commit message:

[InstCombine] Transform A & (L - 1) u< L --> L != 0

Summary:

This transform is never a pessimization at the IR level (since it
replaces an `icmp` with another), and has potentiall payoffs:

 1. It may make the `icmp` fold away or become loop invariant.
 2. It may make the `A & (L - 1)` computation dead.

This shows up in Java, in range checks generated by array accesses of
the form `a[i & (a.length - 1)]`.
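
A sketch of the transform in IR (my own example, not a test from the patch):

  define i1 @masked_range_check(i64 %A, i64 %L) {
    %Lm1 = sub i64 %L, 1
    %m = and i64 %A, %Lm1
    %c = icmp ult i64 %m, %L
    ret i1 %c        ; becomes: icmp ne i64 %L, 0
  }

If %L is zero, %Lm1 is all-ones and the icmp is false; otherwise the masked
value is at most %L - 1, so the icmp is true.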

Reviewers: reames, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12210

llvm-svn: 245753
2015-08-21 22:22:37 +00:00
Simon Pilgrim
a6ae3df1c8 Line endings fix.
llvm-svn: 245736
2015-08-21 21:09:51 +00:00
NAKAMURA Takumi
9d82bd7d88 Revert r245635, "[InstCombine] Transform A & (L - 1) u< L --> L != 0"
It caused miscompilation in clang.

llvm-svn: 245678
2015-08-21 07:46:07 +00:00
Sanjoy Das
d7d7f13bd6 [InstCombine] Transform A & (L - 1) u< L --> L != 0
Summary:
This transform is never a pessimization at the IR level (since it
replaces an `icmp` with another), and has potential payoffs:

 1. It may make the `icmp` fold away or become loop invariant.
 2. It may make the `A & (L - 1)` computation dead.

This shows up in Java, in range checks generated by array accesses of
the form `a[i & (a.length - 1)]`.

Reviewers: reames, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12210

llvm-svn: 245635
2015-08-20 22:31:55 +00:00
Balaram Makam
9086980655 Optimize bitwise even/odd test (-x&1 -> x&1) to not use negation.
Summary: We know that -x & 1 is equivalent to x & 1; avoid using negation for testing whether an integer is even or odd.
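
A minimal sketch (my own example):

  define i32 @parity(i32 %x) {
    %neg = sub i32 0, %x
    %bit = and i32 %neg, 1
    ret i32 %bit     ; becomes: and i32 %x, 1
  }

Two's-complement negation never changes the lowest bit, so the negation is
unnecessary.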

Reviewers: majnemer

Subscribers: junbuml, mssimpso, gberry, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D12156

llvm-svn: 245569
2015-08-20 15:35:00 +00:00
David Majnemer
fc4172a950 Revert "[InstCombinePHI] Partial simplification of identity operations."
This reverts commit r244887, it caused PR24470.

llvm-svn: 245194
2015-08-17 03:11:26 +00:00
Sanjay Patel
138d4e067f transform fmin/fmax calls when possible (PR24314)
If we can ignore NaNs, fmin/fmax libcalls can become compare and select
(this is what we turn std::min / std::max into).

This IR should then be optimized in the backend to whatever is best for
any given target. Eg, x86 can use minss/maxss instructions.
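
A rough sketch of the rewrite (my own example; the exact fast-math
requirements are per the patch):

  declare float @fminf(float, float)

  define float @min_no_nans(float %a, float %b) {
    %r = call float @fminf(float %a, float %b)
    ret float %r
    ; with NaNs ignorable, becomes roughly:
    ;   %cmp = fcmp olt float %a, %b
    ;   %r   = select i1 %cmp, float %a, float %b
  }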

This should solve PR24314:
https://llvm.org/bugs/show_bug.cgi?id=24314

Differential Revision: http://reviews.llvm.org/D11866

llvm-svn: 245187
2015-08-16 20:18:19 +00:00
David Majnemer
d32daec92c [InstCombine] Replace an and+icmp with a trunc+icmp
Bitwise arithmetic can obscure a simple sign-test.  Replacing the mask with
a truncate is preferable if the type is legal, because it permits us to
rephrase the comparison more explicitly.
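
One plausible instance (my own example, not a test from the commit):

  define i1 @sign_test(i64 %x) {
    %a = and i64 %x, 128
    %c = icmp ne i64 %a, 0
    ret i1 %c
    ; becomes:
    ;   %t = trunc i64 %x to i8
    ;   %c = icmp slt i8 %t, 0
  }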

llvm-svn: 245171
2015-08-16 07:09:17 +00:00
Nick Lewycky
465237af32 Fix a crash where a utility function wasn't aware of fcmp vectors and created a value with the wrong type. Fixes PR24458!
llvm-svn: 245119
2015-08-14 22:46:49 +00:00
Davide Italiano
3672dcbb24 [SimplifyLibCalls] Correctly set the is_zero_undef flag for llvm.cttz
If <src> is non-zero we can safely set the flag to true, and this
results in less code generated for e.g. ffs(x) + 1 on FreeBSD.
Thanks to majnemer for suggesting the fix and reviewing.

Code generated before the patch was applied:


 0:   0f bc c7                bsf    %edi,%eax
 3:   b9 20 00 00 00          mov    $0x20,%ecx
 8:   0f 45 c8                cmovne %eax,%ecx
 b:   83 c1 02                add    $0x2,%ecx
 e:   b8 01 00 00 00          mov    $0x1,%eax
13:   85 ff                   test   %edi,%edi
15:   0f 45 c1                cmovne %ecx,%eax
18:   c3                      retq

Code generated after the patch was applied:

 0:   0f bc cf                bsf    %edi,%ecx
 3:   83 c1 02                add    $0x2,%ecx
 6:   85 ff                   test   %edi,%edi
 8:   b8 01 00 00 00          mov    $0x1,%eax
 d:   0f 45 c1                cmovne %ecx,%eax
10:   c3                      retq

It seems we can still use cmove and save another 'test' instruction, but
that can be tackled separately.

Differential Revision:  http://reviews.llvm.org/D11989	

llvm-svn: 244947
2015-08-13 20:34:26 +00:00
Charlie Turner
ed8f5c45cb [InstCombinePHI] Partial simplification of identity operations.
Consider this code:

BB:
  %i = phi i32 [ 0, %if.then ], [ %c, %if.else ]
  %add = add nsw i32 %i, %b
  ...

In this common case the add can be moved to the %if.else basic block, because
adding zero is an identity operation. If we go through the %if.then branch it's
always a win, because the add is not executed; if not, the number of
instructions stays the same.

This pattern also applies to other instructions (sub, shl, shr, ashr with 0;
mul, sdiv, div with 1), as sketched below.
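
A hedged sketch of the result for the add case (my own rendering):

  define i32 @sink_add(i1 %f, i32 %b, i32 %c) {
  entry:
    br i1 %f, label %if.then, label %if.else
  if.then:
    br label %BB
  if.else:
    %add.else = add nsw i32 %c, %b   ; the add is sunk here
    br label %BB
  BB:
    %add = phi i32 [ %b, %if.then ], [ %add.else, %if.else ]
    ret i32 %add
  }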

Patch by Jakub Kuderski!

llvm-svn: 244887
2015-08-13 12:38:58 +00:00
Simon Pilgrim
6b29a29fa2 [InstCombine] SSE/AVX vector shifts demanded shift amount bits
Most SSE/AVX (non-constant) vector shift instructions only use the lower 64 bits of the 128-bit shift-amount vector operand; this patch calls SimplifyDemandedVectorElts to optimize for this.

I had to refactor some of my recent InstCombiner work on the vector shifts to avoid quite a bit of duplicate code, it means that SimplifyX86immshift now (re)decodes the type of shift.

Differential Revision: http://reviews.llvm.org/D11938

llvm-svn: 244872
2015-08-13 07:39:03 +00:00
Simon Pilgrim
45d6ddee89 [InstCombine] Move SSE/AVX vector blend folding to instcombiner
As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely).

InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask).

I also moved all the relevant combine tests into InstCombine/blend_x86.ll

Differential Revision: http://reviews.llvm.org/D11934

llvm-svn: 244723
2015-08-12 08:08:56 +00:00
Sanjoy Das
2078fb4d89 Fix PR24354.
`InstCombiner::OptimizeOverflowCheck` was asserting an
invariant (operands to binary operations are ordered by decreasing
complexity) that wasn't really an invariant.  Fix this by instead having
`InstCombiner::OptimizeOverflowCheck` establish the invariant if it does
not hold.

llvm-svn: 244676
2015-08-11 21:33:55 +00:00
Mehdi Amini
45a5fad0be Fix InstCombine test: invalid CHECK line slipped in r231270
I incorrectly wrote CHECK-NEXT without following it with ':', so the check
was ignored by FileCheck.
The non-inbounds GEP is folded here because the DataLayout is no longer
optional; the fold was originally guarded with a comment that said:
    We need TD information to know the pointer size unless this is inbounds.
Now we always have "TD information" and perform the fold.

Thanks Jonathan Roelofs for noticing.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 244613
2015-08-11 15:31:17 +00:00
James Molloy
ecd6525b24 Add support for floating-point minnum and maxnum
The select pattern recognition in ValueTracking (as used by InstCombine
and SelectionDAGBuilder) only knew about integer patterns. This teaches
it about minimum and maximum operations.

matchSelectPattern() has been extended to return a struct containing the
existing Flavor and a new enum defining the pattern's behavior when
given one NaN operand.

C minnum() is defined to return the non-NaN operand in this case, but
the idiomatic C "a < b ? a : b" would return the NaN operand.

ARM and AArch64 at least have different instructions for these different cases.

llvm-svn: 244580
2015-08-11 09:12:57 +00:00
Simon Pilgrim
65266a8e22 [InstCombine] Move SSE2/AVX2 arithmetic vector shift folding to instcombiner
As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations.

Differential Revision: http://reviews.llvm.org/D11886

llvm-svn: 244495
2015-08-10 20:21:15 +00:00
Jonathan Roelofs
135252e238 Fix a few more cases of 'CHECK[^:]*$'. NFCI
llvm-svn: 244491
2015-08-10 19:56:39 +00:00
Jonathan Roelofs
3ac128b6b7 Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI
I looked into adding a warning / error for this to FileCheck, but there doesn't
seem to be a good way to avoid it triggering on the instances of it in RUN lines.

llvm-svn: 244481
2015-08-10 19:01:27 +00:00
Simon Pilgrim
c31bcce147 [InstCombine] Fix SSE2/AVX2 vector logical shift by constant
This patch fixes the sse2/avx2 vector shift by constant instcombine call to correctly deal with the fact that the shift amount is formed from the entire lower 64 bits, not just the lowest element as it currently assumes.

e.g.

%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)

In this case, (V)PSRLD doesn't perform a lshr by 15 but in fact attempts to shift by 64424509455 ((15 << 32) | 15) - giving a zero result.

In addition, this patch also recognizes shift-by-zero from a ConstantAggregateZero type (PR23821).

Differential Revision: http://reviews.llvm.org/D11760

llvm-svn: 244341
2015-08-07 18:22:50 +00:00
Simon Pilgrim
99b0393ce0 [InstCombine] Added more specific SSE2/AVX2 vector shift tests.
llvm-svn: 244022
2015-08-05 08:21:38 +00:00
Simon Pilgrim
557b132cad [InstCombine] Split off SSE2/AVX2 vector shift tests.
These aren't vector demanded bits tests. More tests to follow.

llvm-svn: 243963
2015-08-04 08:05:27 +00:00
Duncan P. N. Exon Smith
87c77233df DI: Disallow uniquable DICompileUnits
Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s.
The backend is liable to start relying on that (if it hasn't already),
so make uniquable `DICompileUnit`s illegal and automatically upgrade old
bitcode.  This is a nice cleanup, since we can remove an unnecessary
`DenseSet` (and the associated uniquing info) from `LLVMContextImpl`.

Almost all the testcases were updated with this script:

    git grep -e '= !DICompileUnit' -l -- test |
    grep -v test/Bitcode |
    xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,'

I imagine something similar should work for out-of-tree testcases.

llvm-svn: 243885
2015-08-03 17:26:41 +00:00
Duncan P. N. Exon Smith
08a36a35c8 DI: Remove DW_TAG_arg_variable and DW_TAG_auto_variable
Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags,
using `DW_TAG_variable` in their place. Stop exposing the `tag:` field at
all in the assembly format for `DILocalVariable`.

Most of the testcase updates were generated by the following sed script:

    find test/ -name "*.ll" -o -name "*.mir" |
    xargs grep -l 'DILocalVariable' |
    xargs sed -i '' \
      -e 's/tag: DW_TAG_arg_variable, //' \
      -e 's/tag: DW_TAG_auto_variable, //'

There were only a handful of tests in `test/Assembly` that I needed to
update by hand.

(Note: a follow-up could change `DILocalVariable::DILocalVariable()` to
set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable`
(as appropriate), instead of having that logic magically in the backend
in `DbgVariable`.  I've added a FIXME to that effect.)

llvm-svn: 243774
2015-07-31 18:58:39 +00:00
Simon Pilgrim
e713c640a4 [InstCombine][X86][SSE] Replace sign/zero extension intrinsics with native IR
Now that we are generating sane codegen for vector sext/zext nodes on SSE targets, this patch uses instcombine to replace the SSE41/AVX2 pmovsx and pmovzx intrinsics with the equivalent native IR code.

Differential Revision: http://reviews.llvm.org/D11503

llvm-svn: 243303
2015-07-27 18:52:15 +00:00
Simon Pilgrim
a58dfe82c9 [InstCombine] Split off SSE4a tests.
These aren't vector demanded bits tests. More tests to follow.

llvm-svn: 243223
2015-07-25 17:14:01 +00:00
Karthik Bhat
bceb74b45e Constfold trunc,rint,nearbyint,ceil and floor using APFloat
A patch by Chakshu Grover!
This patch allows constfolding of trunc,rint,nearbyint,ceil and floor intrinsics using APFloat class.
Differential Revision: http://reviews.llvm.org/D11144

llvm-svn: 242763
2015-07-21 08:52:23 +00:00
David Majnemer
c31a55656e [InstCombine] Generalize sub of selects optimization to all BinaryOperators
This exposes further optimization opportunities if the selects are
correlated.

llvm-svn: 242235
2015-07-14 22:39:23 +00:00
Reid Kleckner
15ae77989b Update enforceKnownAlignment after the isWeakForLinker semantic change
Previously we would refrain from attempting to increase the alignment of
available_externally globals because they were considered weak for the
linker. Now they are treated more like a declaration instead of a weak
definition.

This was causing SSE alignment faults in Chromium, when some code
assumed it could increase the alignment of a dllimported global that it
didn't control.  http://crbug.com/509256

llvm-svn: 242091
2015-07-14 00:11:08 +00:00
Bjorn Steinbrink
4de6eb776a [InstCombine] Actually combine AA metadata when replacing one load with another
Fixes PR24083

llvm-svn: 241955
2015-07-10 22:30:17 +00:00
Bjorn Steinbrink
ecb205fc74 [InstCombine] Employ AliasAnalysis in FindAvailableLoadedValue
llvm-svn: 241887
2015-07-10 06:55:49 +00:00
Bjorn Steinbrink
1a4f8db997 [InstCombine] Properly combine metadata when replacing a load with another
Not doing this can lead to misoptimizations down the line, e.g. because
of range metadata on the replacing load excluding values that are valid
for the load that is being replaced.

llvm-svn: 241886
2015-07-10 06:55:44 +00:00
Karthik Bhat
e65d1c7f73 Allow constfolding of llvm.sin.* and llvm.cos.* intrinsics
This patch const folds llvm.sin.* and llvm.cos.* intrinsics whenever feasible.

Differential Revision: http://reviews.llvm.org/D10836

llvm-svn: 241665
2015-07-08 03:55:47 +00:00
Jingyue Wu
35a6e27706 [InstCombine] call SimplifyICmpInst with correct context
Summary:
Fixes PR23809. Without passing the context to SimplifyICmpInst, we would
use the assume to prove that the condition feeding the assume is
trivially true (see isValidAssumeForContext in ValueTracking.cpp),
causing the removal of the assume which may be useful for later
optimizations.

Test Plan: pr23800.ll

Reviewers: hfinkel, majnemer

Reviewed By: hfinkel

Subscribers: henryhu, llvm-commits, wengxt, broune, meheff, eliben

Differential Revision: http://reviews.llvm.org/D10695

llvm-svn: 240683
2015-06-25 20:14:47 +00:00
Artur Pilipenko
ccbbd4db82 Take alignment into account in isSafeToLoadUnconditionally
Reviewed By: hfinkel

Differential Revision: http://reviews.llvm.org/D10475

llvm-svn: 240636
2015-06-25 12:18:43 +00:00
David Majnemer
9f136ce139 [InstCombine] Optimize subtract of selects into a select of a sub
This came up when examining some code generated by clang's IRGen for
certain member pointers.

llvm-svn: 240369
2015-06-23 02:49:24 +00:00
David Majnemer
c8b1f095a3 Move the personality function from LandingPadInst to Function
The personality routine currently lives in the LandingPadInst.

This isn't desirable because:
- All LandingPadInsts in the same function must have the same
  personality routine.  This means that each LandingPadInst beyond the
  first has an operand which produces no additional information.

- There is ongoing work to introduce EH IR constructs other than
  LandingPadInst.  Moving the personality routine off of any one
  particular Instruction and onto the parent function seems a lot better
than having N different places a personality function can sneak onto an
  exceptional function.

Differential Revision: http://reviews.llvm.org/D10429

llvm-svn: 239940
2015-06-17 20:52:32 +00:00
Philip Reames
fc6ddd62bf Reapply 239795 - [InstCombine] Propagate non-null facts to call parameters
The original change broke clang side tests.  I will be submitting those momentarily.  This change includes post-commit feedback on the original change from Pete Cooper.

Original Submission comments:
If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129.

Differential Revision: http://reviews.llvm.org/D9132

llvm-svn: 239849
2015-06-16 20:24:25 +00:00
Philip Reames
2aad4769d2 Revert 239795
I forgot to update some clang test cases.  I'll fix and resubmit tomorrow.

llvm-svn: 239800
2015-06-16 01:20:53 +00:00
Philip Reames
54716a6f5b [InstCombine] Propagate non-null facts to call parameters
If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129.

Differential Revision: http://reviews.llvm.org/D9132

llvm-svn: 239795
2015-06-16 00:43:54 +00:00
David Majnemer
aa3c1f7077 [InstCombine] Don't miscompile select to poison
If we have (select a, b, c), it is sometimes valid to simplify this to a
single select operand.  However, doing so is only valid if the
simplification doesn't inject poison into the computation.

It might be helpful to consider the following example:
  (select (icmp ne %i, INT_MAX), (add nsw %i, 1), INT_MIN)

The select is equivalent to (add %i, 1) but not (add nsw %i, 1).

Self hosting on x86_64 revealed that this occurs very, very rarely so
bailing out is hopefully pretty reasonable.

llvm-svn: 239215
2015-06-06 02:30:43 +00:00
Renato Golin
cdb4b5a579 Revert "[InstCombine] Rephrase fix to SimplifyWithOpReplaced"
This reverts commit r239141. This commit was an attempt to reintroduce
a previous patch that broke many self-hosting bots with clang timeouts,
but it still has slowdown issues, at least  on ARM, increasing the
compilation time (stage 2, clang's) by 5x.

llvm-svn: 239175
2015-06-05 18:24:12 +00:00
Sanjoy Das
71de44f239 [InstCombine] Fix PR23751.
PR23751 was caused by a missing ``break;`` in r234388.

llvm-svn: 239171
2015-06-05 18:04:42 +00:00
David Majnemer
523f1fa033 [InstCombine] Rephrase fix to SimplifyWithOpReplaced
I don't have the IR which is causing the build bot breakage but I can
postulate as to why they are timing out:
1. SimplifyWithOpReplaced was stripping flags from the simplified value.
2. visitSelectInstWithICmp was overriding SimplifyWithOpReplaced because
   its simplification wasn't correct.
3. InstCombine would revisit the add instruction and note that it can
   rederive the flags.
4. By modifying the value, we chose to revisit instructions which reuse
   the value.  One of the instructions is the original select, causing
   LLVM to never reach fixpoint.

Instead, strip the flags only when we are sure we are going to perform
the simplification.

llvm-svn: 239141
2015-06-05 09:57:57 +00:00
Daniel Jasper
85b6ed5297 Revert "[InstCombine] Don't miscompile safe increment idiom"
This is breaking a lot of build bots and is causing very long-running
compiles (infinite loops)?

Likely, we shouldn't return nullptr?

llvm-svn: 239139
2015-06-05 09:31:20 +00:00
David Majnemer
7e0cc86e96 [InstCombine] Don't miscompile safe increment idiom
We cleverly handle cases where computation done in one argument of a select
instruction is suitable for the other operand, thus obviating the need
of the select and the comparison.  However, the other operand cannot
have flags.

This fixes PR23757.

llvm-svn: 239115
2015-06-04 23:11:30 +00:00
Ahmed Bougacha
a689d2cb58 [IR] fptrunc-of-fptrunc isn't an EliminableCastPair.
Double and single rounding can produce different results.
This is the IR counterpart to r228911.

llvm-svn: 238531
2015-05-29 00:04:30 +00:00
David Majnemer
514ab73614 [InstCombine] Fold IntToPtr and PtrToInt into preceding loads.
Currently we only fold a BitCast into a Load when the BitCast is its
only user.

Do the same for any no-op cast.

Differential Revision: http://reviews.llvm.org/D9152

llvm-svn: 238452
2015-05-28 18:39:17 +00:00
David Majnemer
8a231f0374 [InstCombine] Don't eagerly propagate nsw for A*B+A*C => A*(B+C)
InstCombine transforms A *nsw B +nsw A *nsw C to A *nsw (B + C).
This is incorrect -- e.g. if A = -1, B = 1, C = INT_SMAX. Then
nothing in the LHS overflows, but the multiplication in RHS overflows.

We need to first make sure that we won't multiply by INT_SMAX + 1.

Test case `add_of_mul` contributed by Sanjoy Das.

This fixes PR23635.

Differential Revision: http://reviews.llvm.org/D9629

llvm-svn: 238066
2015-05-22 23:02:11 +00:00
David Majnemer
e7a303ac2b [InstSimplify] Handle some overflow intrinsics in InstSimplify
This change does a few things:
- Move some InstCombine transforms to InstSimplify
- Run SimplifyCall from within InstCombine::visitCallInst
- Teach InstSimplify to fold [us]mul_with_overflow(X, undef) to 0.

llvm-svn: 237995
2015-05-22 03:56:46 +00:00
David Majnemer
b6938a4929 [InstCombine] X - 0 is equal to X, not undef
A refactoring made @llvm.ssub.with.overflow.i32(i32 %X, i32 0) transform
into undef instead of %X.

This fixes PR23624.

llvm-svn: 237968
2015-05-21 23:04:21 +00:00
James Molloy
e993a7db93 Reapply r237539 with a fix for the Chromium build.
Make sure if we're truncating a constant that would then be sign extended
that the sign extension of the truncated constant is the same as the
original constant.

> Canonicalize min/max expressions correctly.
>
> This patch introduces a canonical form for min/max idioms where one operand
> is extended or truncated. This often happens when the other operand is a
> constant. For example:
>
> %1 = icmp slt i32 %a, 0
> %2 = sext i32 %a to i64
> %3 = select i1 %1, i64 %2, i64 0
>
> Would now be canonicalized into:
>
> %1 = icmp slt i32 %a, 0
> %2 = select i1 %1, i32 %a, i32 0
> %3 = sext i32 %2 to i64
>
> This builds upon a patch posted by David Majnemer
> (https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
> passively stopped instcombine from ruining canonical patterns. This
> patch additionally actively makes instcombine canonicalize too.
>
> Canonicalization of expressions involving a change in type from int->fp
> or fp->int are not yet implemented.

llvm-svn: 237821
2015-05-20 18:41:25 +00:00
Hans Wennborg
224f420df5 Revert r237539: "Reapply r237520 with another fix for infinite looping"
This caused PR23583.

llvm-svn: 237739
2015-05-19 23:06:30 +00:00
James Molloy
928c38a114 Reapply r237520 with another fix for infinite looping
SimplifyDemandedBits was "simplifying" a constant by removing just sign bits.
This caused a canonicalization race between different parts of instcombine.

Fix and regression test added - third time lucky?

llvm-svn: 237539
2015-05-17 08:27:27 +00:00
James Molloy
56795ba031 Revert commits r237521 and r237520.
The AArch64 LNT bot is unhappy - I've found that the problem is in
SimplifyDemandedBits, but that's going to require another code review
so reverting in the meantime.

llvm-svn: 237528
2015-05-16 21:27:14 +00:00
James Molloy
e351ab828e Update to r237520 - swap order of CHECK-NEXT lines.
... I'd copied the check-next lines from a previous test so they were
slightly wrong, and had managed to test the wrong source tree. D'oh!

llvm-svn: 237521
2015-05-16 13:26:25 +00:00
James Molloy
c45c32fb55 Reapply r237453 with a fix for the test timeouts.
The test timeouts were due to instcombine fighting itself. Regression test added.
Original log message:

Canonicalize min/max expressions correctly.

This patch introduces a canonical form for min/max idioms where one operand
is extended or truncated. This often happens when the other operand is a
constant. For example:

  %1 = icmp slt i32 %a, 0
  %2 = sext i32 %a to i64
  %3 = select i1 %1, i64 %2, i64 0

Would now be canonicalized into:

  %1 = icmp slt i32 %a, 0
  %2 = select i1 %1, i32 %a, i32 0
  %3 = sext i32 %2 to i64

This builds upon a patch posted by David Majnemer
(https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
passively stopped instcombine from ruining canonical patterns. This
patch additionally actively makes instcombine canonicalize too.

Canonicalization of expressions involving a change in type from int->fp
or fp->int are not yet implemented.

llvm-svn: 237520
2015-05-16 13:10:45 +00:00
James Molloy
d400196c9a Revert "Canonicalize min/max expressions correctly."
This reverts r237453 - it was causing timeouts on some bots. Reverting
while I investigate (it's probably InstCombine fighting itself...)

llvm-svn: 237458
2015-05-15 17:45:09 +00:00
James Molloy
d7cc5c99ef Canonicalize min/max expressions correctly.
This patch introduces a canonical form for min/max idioms where one operand
is extended or truncated. This often happens when the other operand is a
constant. For example:

  %1 = icmp slt i32 %a, 0
  %2 = sext i32 %a to i64
  %3 = select i1 %1, i64 %2, i64 0

Would now be canonicalized into:

  %1 = icmp slt i32 %a, 0
  %2 = select i1 %1, i32 %a, i32 0
  %3 = sext i32 %2 to i64

This builds upon a patch posted by David Majnemer
(https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
passively stopped instcombine from ruining canonical patterns. This
patch additionally actively makes instcombine canonicalize too.

Canonicalization of expressions involving a change in type from int->fp
or fp->int are not yet implemented.

llvm-svn: 237453
2015-05-15 16:10:59 +00:00
Sanjoy Das
6d67db8c09 [Statepoints] Support for "patchable" statepoints.
Summary:
This change adds two new parameters to the statepoint intrinsic, `i64 id`
and `i32 num_patch_bytes`.  `id` gets propagated to the ID field
in the generated StackMap section.  If the `num_patch_bytes` is
non-zero then the statepoint is lowered to `num_patch_bytes` bytes of
nops instead of a call (the spill and reload code remains unchanged).
A non-zero `num_patch_bytes` is useful in situations where a language
runtime requires complete control over how a call is lowered.

This change brings statepoints one step closer to patchpoints.  With
some additional work (that is not part of this patch) it should be
possible to get rid of `TargetOpcode::STATEPOINT` altogether.

PlaceSafepoints generates `statepoint` wrappers with `id` set to
`0xABCDEF00` (the old default value for the ID reported in the stackmap)
and `num_patch_bytes` set to `0`.  This can be made more sophisticated
later.

Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9546

llvm-svn: 237214
2015-05-12 23:52:24 +00:00
Sunil Srivastava
e4b73132fd Changed renaming of local symbols by inserting a dot before the numeric suffix.
One code change and several test changes to match; details in
http://reviews.llvm.org/D9481

llvm-svn: 237150
2015-05-12 16:47:30 +00:00
Hal Finkel
bb5d93c15a [InstCombine/PowerPC] Fix single-precision QPX load/store replacement
The QPX single-precision load/store intrinsics have implied
truncation/extension from/to the declared value type of <4 x double> to the
memory type of <4 x float>. When we can prove the alignment of the pointer
argument, and thus replace the intrinsic with a regular load or store, we need
to load or store the correct data type (<4 x float>) instead of (<4 x double>).

llvm-svn: 236973
2015-05-11 06:37:03 +00:00
David Majnemer
3be7693a33 Make buildbots happy
llvm-svn: 236970
2015-05-11 05:33:27 +00:00
David Majnemer
5580620741 [InstCombine] Canonicalize single element array store
Use the element type instead of the aggregate type.

Differential Revision: http://reviews.llvm.org/D9591

llvm-svn: 236969
2015-05-11 05:04:27 +00:00
David Majnemer
9f376fec1c [InstCombine] Canonicalize single element array load
Use the element type instead of the aggregate type.

Differential Revision: http://reviews.llvm.org/D9596

llvm-svn: 236968
2015-05-11 05:04:22 +00:00
Pat Gavlin
c022b8d288 Extend the statepoint intrinsic to allow statepoints to be marked as transitions from GC-aware code to code that is not GC-aware.
This changes the shape of the statepoint intrinsic from:

  @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args)

to:

  @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args)

This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back.

In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation.

Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops.

Differential Revision: http://reviews.llvm.org/D9501

llvm-svn: 236888
2015-05-08 18:07:42 +00:00
Mehdi Amini
7da0eff84e Update InstCombine to transform aggregate loads into scalar loads.
Summary:
One step further toward getting aggregate loads and stores optimized
properly. This will only handle structs with one element at this point.
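
A minimal sketch of the transformation (illustrative IR; the exact
lowering may differ):

  %agg = load { i32 }, { i32 }* %p
  ; becomes a scalar load plus a re-wrap:
  %pelt = getelementptr inbounds { i32 }, { i32 }* %p, i32 0, i32 0
  %elt  = load i32, i32* %pelt
  %agg  = insertvalue { i32 } undef, i32 %elt, 0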

Test Plan: Added unit tests for the new supported cases.

Reviewers: chandlerc, joker-eph, joker.eph, majnemer

Reviewed By: majnemer

Subscribers: pete, llvm-commits

Differential Revision: http://reviews.llvm.org/D8339

Patch by Amaury Sechet.

From: Amaury Sechet <amaury@fb.com>
llvm-svn: 236695
2015-05-07 05:52:40 +00:00
Matthias Braun
bfbcc9d6b2 InstCombineSimplifyDemanded: Remove nsw/nuw flags when optimizing demanded bits
When optimizing demanded bits of the operands of an Add we have to
remove the nsw/nuw flags as we have no guarantee anymore that we don't
wrap.  This is legal here because the top bit is not demanded.  In fact
this operation was already performed, but was missed in the case of an Add
with a constant on the right side.  To fix this, this patch refactors the
code to unify the code paths in SimplifyDemandedUseBits() handling of
Add/Sub:

- The transformation of Add->Or is removed from the simplify demand
  code because the equivalent transformation exists in
  InstCombiner::visitAdd()
- KnownOnes/KnownZero are not adjusted for Add x, C anymore as
  computeKnownBits() already performs these computations.
- The simplification of the operands is unified. In this new version,
  constants on the right side of a Sub are now shrunk as well, as I could
  not find a reason not to do so.
- The special case for clearing nsw/nuw in ShrinkDemandedConstant() is
  not necessary anymore as the caller does that already.
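
A minimal sketch of why the flag has to go (illustrative IR):

  %a = add nsw i32 %x, 5
  %t = trunc i32 %a to i8    ; only the low 8 bits of %a are demanded
  ; if SimplifyDemandedBits rewrites the computation feeding the add based
  ; only on the demanded bits, the no-wrap guarantee may no longer hold:
  %a = add i32 %x, 5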

Differential Revision: http://reviews.llvm.org/D9415

llvm-svn: 236269
2015-04-30 22:05:30 +00:00
Sanjoy Das
dd8fba973b [InstCombine] Add new rule for MIN(MAX(~A, ~B), ~C) et al.
Summary:
Optimizing these well is especially interesting for IRCE since it
"clamps" values by generating this sort of pattern through SCEV
expressions.

Depends on D9352.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9353

llvm-svn: 236203
2015-04-30 04:56:04 +00:00
Sanjoy Das
668a7c9deb [InstCombine] Add a new formula for SMIN.
Summary:
After this change `MatchSelectPattern` recognizes the following form
of SMIN:

  Y >s C ? ~Y : ~C == ~Y <s ~C ? ~Y : ~C = SMIN(~Y, ~C)

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9352

llvm-svn: 236202
2015-04-30 04:56:00 +00:00
Duncan P. N. Exon Smith
09b5c9c24d IR: Give 'DI' prefix to debug info metadata
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`.  The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.

Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one.  It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs.  YMMV of
course.

Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py.  I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three.  It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).

Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.

llvm-svn: 236120
2015-04-29 16:38:44 +00:00
Sanjay Patel
6e05431540 [x86] instcombine more cases of insertps into a shufflevector
This is a follow-on to D8833 (insertps optimization when the zero mask is not used).

In this patch, we check for the case where the zmask is used, but both input vectors
to the insertps intrinsic are the same operand or the zmask overrides the destination
lane. This lets us replace the 2nd shuffle input operand with the zero vector.

Differential Revision: http://reviews.llvm.org/D9257

llvm-svn: 235810
2015-04-25 20:55:25 +00:00
David Blaikie
2fcc0180e4 [opaque pointer type] Add textual IR support for explicit type parameter to the invoke instruction
Same as r235145 for the call instruction - the justification, tradeoffs,
etc are all the same. The conversion script worked the same without any
false negatives (after replacing 'call' with 'invoke').

llvm-svn: 235755
2015-04-24 19:32:54 +00:00
David Majnemer
c54f703dad [InstCombine] Use a more targeted fix instead of r235544
Only clear out the NSW/NUW flags if we are optimizing 'add'/'sub' while
taking advantage of the fact that the sign bit is not set.  We do this
optimization to further shrink the mask, but shrinking the mask isn't
NSW/NUW-preserving in this case.

llvm-svn: 235558
2015-04-22 22:42:05 +00:00
David Majnemer
61d5d8dd8e [InstCombine] Clear out nsw/nuw if we modify computation in the chain
An nsw/nuw operation relies on the values feeding into it to not
overflow if 'poison' is not to be produced.  This means that
optimizations which make modifications to the bottom of a chain (like
SimplifyDemandedBits) must strip out nsw/nuw if they cannot ensure that
they will be preserved.

This fixes PR23309.

llvm-svn: 235544
2015-04-22 20:59:28 +00:00
NAKAMURA Takumi
f0b6c8a839 Remove a zero-length file of llvm/test/Transforms/InstCombine/descale-zero.ll.
llvm-svn: 235457
2015-04-21 23:14:33 +00:00
Wei Mi
24e5246dd6 Limiting gep merging to fix the performance problem described in
https://llvm.org/bugs/show_bug.cgi?id=23163.

Gep merging sometimes behaves like a reverse CSE/LICM optimization,
which has negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such merge is always beneficial.
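
For illustration (hypothetical IR, not from the patch), the still-allowed
constant-index case is:

  %g1 = getelementptr inbounds i32, i32* %p, i64 1
  %g2 = getelementptr inbounds i32, i32* %g1, i64 2
  ; both indexes are constants, so merging is always a win:
  %g2 = getelementptr inbounds i32, i32* %p, i64 3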

The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better results, and we haven't noticed them yet. We will be ready
to further improve it once we see such cases.

Differential Revision: http://reviews.llvm.org/D8911

llvm-svn: 235455
2015-04-21 23:02:15 +00:00
Wei Mi
6a1df2cb86 Revert r235451 since it is attached to a wrong Differential Revision. Sorry.
llvm-svn: 235453
2015-04-21 22:56:09 +00:00
Wei Mi
9c45f921b6 Limiting gep merging to fix the performance problem described in
https://llvm.org/bugs/show_bug.cgi?id=23163.

Gep merging sometimes behaves like a reverse CSE/LICM optimization,
which has negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such merge is always beneficial.

The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better result, and we havn't notice them yet. We will be ready
to further improve it once we see the cases.

Differential Revision: http://reviews.llvm.org/D9007

llvm-svn: 235451
2015-04-21 22:37:09 +00:00
Fiona Glaser
c2e9ee5036 InstCombine: fold (sitofp (zext x)) to (uitofp x)
This is okay because the zext guarantees the high bit is zero,
and so the value is unsigned.
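
For illustration (hypothetical IR):

  %z = zext i8 %x to i32
  %f = sitofp i32 %z to float
  ; the high bit of %z is known zero, so the value is unsigned:
  %f = uitofp i32 %z to float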

llvm-svn: 235364
2015-04-21 00:05:41 +00:00
David Majnemer
fa43478f01 [InstCombine] (mul nsw 1, INT_MIN) != (shl nsw 1, 31)
Multiplying INT_MIN by 1 doesn't trigger nsw.  However, shifting 1 into
the sign bit *does* trigger nsw.
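
A sketch of the distinction (illustrative IR):

  %a = mul nsw i32 %v, 1    ; never overflows, even when %v == INT_MIN
  %b = shl nsw i32 1, %n    ; for %n == 31 this shifts the one into the
                            ; sign bit, which *is* signed overflow (poison)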

llvm-svn: 235250
2015-04-18 04:41:30 +00:00
David Blaikie
dfadb4e9ee [opaque pointer type] Add textual IR support for explicit type parameter to the call instruction
See r230786 and r230794 for similar changes to gep and load
respectively.

Call is a bit different because it often doesn't have a single explicit
type - usually the type is deduced from the arguments, and just the
return type is explicit. In those cases there's no need to change the
IR.

When that's not the case, the IR usually contains the pointer type of
the first operand - but since typed pointers are going away, that
representation is insufficient so I'm just stripping the "pointerness"
of the explicit type away.

This does make the IR a bit weird - it /sort of/ reads like the type of
the first operand: "call void () %x(" but %x is actually of type "void
()*" and will eventually be just of type "ptr". But this seems not too
bad and I don't think it would benefit from repeating the type
("void (), void () * %x(" and then eventually "void (), ptr %x(") as has
been done with gep and load.

This also has a side benefit: since the explicit type is no longer a
pointer, there's no ambiguity between an explicit type and a function
that returns a function pointer. Previously this case needed an explicit
type (eg: a function returning a void() function was written as
"call void () () * @x(" rather than "call void () * @x(" because of the
ambiguity between a function returning a pointer to a void() function
and a function returning void).

No ambiguity means even function pointer return types can just be
written alone, without writing the whole function's type.

This leaves /only/ the varargs case where the explicit type is required.

Given the special type syntax in call instructions, the regex-fu used
for migration was a bit more involved in its own unique way (as every
one of these is) so here it is. Use it in conjunction with the apply.sh
script and associated find/xargs commands I've provided in r230786 to
migrate your out of tree tests. Do let me know if any of this doesn't
cover your cases & we can iterate on a more general script/regexes to
help others with out of tree tests.

About 9 test cases couldn't be automatically migrated - half of those
were functions returning function pointers, where I just had to manually
delete the function argument types now that we didn't need an explicit
function type there. The other half were typedefs of function types used
in calls - just had to manually drop the * from those.

import fileinput
import sys
import re

pat = re.compile(r'((?:=|:|^|\s)call\s(?:[^@]*?))(\s*$|\s*(?:(?:\[\[[a-zA-Z0-9_]+\]\]|[@%](?:(")?[\\\?@a-zA-Z0-9_.]*?(?(3)"|)|{{.*}}))(?:\(|$)|undef|inttoptr|bitcast|null|asm).*$)')
addrspace_end = re.compile(r"addrspace\(\d+\)\s*\*$")
func_end = re.compile("(?:void.*|\)\s*)\*$")

def conv(match, line):
  if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)):
    return line
  return line[:match.start()] + match.group(1)[:match.group(1).rfind('*')].rstrip() + match.group(2) + line[match.end():]

for line in sys.stdin:
  sys.stdout.write(conv(re.search(pat, line), line))

llvm-svn: 235145
2015-04-16 23:24:18 +00:00
Sanjay Patel
f808782564 [X86, SSE] instcombine common cases of insertps intrinsics into shuffles
This is very similar to D8486 / r232852 (vperm2). If we treat insertps intrinsics
as shufflevectors, we can optimize them better.

I've left all but the full zero case of the zero mask variants out of this patch. 
I don't think those can be converted into a single shuffle in all cases, but I'd
be happy to be proven wrong as I was for vperm2f128.

Either way, we'd need to support whatever sequence we come up with for those cases
in the backend before converting them here.
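
A hedged sketch of the base transform (illustrative IR; the decoding of
the insertps immediate below is my reading, so treat it as an assumption):

  %r = call <4 x float> @llvm.x86.sse41.insertps(<4 x float> %a,
                                                 <4 x float> %b, i8 48)
  ; imm 48 = 0b00110000: take %b[0], insert into lane 3 of %a, no zeroing:
  %r = shufflevector <4 x float> %a, <4 x float> %b,
                     <4 x i32> <i32 0, i32 1, i32 2, i32 4>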

Differential Revision: http://reviews.llvm.org/D8833

llvm-svn: 235124
2015-04-16 17:52:13 +00:00
Nick Lewycky
e8c4d44b5f Subtraction is not commutative. Fixes PR23212!
llvm-svn: 234780
2015-04-13 19:17:37 +00:00
Sanjoy Das
68c711fb56 [InstCombine][CodeGenPrep] Create llvm.uadd.with.overflow in CGP.
Summary:
This change moves creating calls to `llvm.uadd.with.overflow` from
InstCombine to CodeGenPrep.  Combining overflow check patterns into
calls to the said intrinsic in InstCombine inhibits optimization because
it introduces an intrinsic call that not all other transforms and
analyses understand.
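
The pattern in question looks roughly like this (illustrative IR):

  %add = add i32 %a, %b
  %cmp = icmp ult i32 %add, %a    ; classic unsigned-overflow check
  ; CodeGenPrep (rather than InstCombine) may now form:
  %aov = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
  %add = extractvalue { i32, i1 } %aov, 0
  %cmp = extractvalue { i32, i1 } %aov, 1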

Depends on D8888.

Reviewers: majnemer, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8889

llvm-svn: 234638
2015-04-10 21:07:09 +00:00
David Majnemer
2af32ca579 Fix a typo
CHECK-LABEL had the wrong function name.

llvm-svn: 234051
2015-04-03 20:56:24 +00:00
David Majnemer
2243ffe08d [InstCombine] Use DataLayout to determine vector element width
InstCombine didn't realize that it needs to use DataLayout to determine
how wide pointers are.  This lead to assertion failures.

This fixes PR23113.

llvm-svn: 234046
2015-04-03 20:18:40 +00:00
Duncan P. N. Exon Smith
7af4fb7b9f Verifier: Call verifyModule() from llc and opt
Change `llc` and `opt` to run `verifyModule()`.  This ensures that we
check the full module before `FunctionPass::doInitialization()` ever
gets called (I was getting crashes in `DwarfDebug` instead of verifier
failures when testing a WIP patch that checks operands of compile
units).  In `opt`, also move up debug-info-stripping so that it still
runs before verification.

There was a fair bit of broken code that was sitting in tree.
Interestingly, some were cases of a `select` that referred to itself in
`-instcombine` tests (apparently an intermediate result).  I split them
off to `*-noverify.ll` tests with RUN lines like this:

    opt < %s -S -disable-verify -instcombine | opt -S | FileCheck %s

This avoids verifying the input file (so we can get the broken code into
`-instcombine`), but still verifies the output with a second call to
`opt` (to verify that `-instcombine` will clean it up like it should).

llvm-svn: 233432
2015-03-27 22:04:28 +00:00
Duncan P. N. Exon Smith
4d29d309de DebugInfo: Fix bad debug info for compile units and types
Fix debug info in these tests, which started failing with a WIP patch to
verify compile units and types.  The problems look like they were all
caused by bitrot.  They fell into these categories:

  - Using `!{i32 0}` instead of `!{}`.
  - Using `!{null}` instead of `!{}`.
  - Using `!MDExpression()` instead of `!{}`.
  - Using `!8` instead of `!{!8}`.
  - `file:` references that pointed at `MDCompileUnit`s instead of the
    same `MDFile` as the compile unit.
  - `file:` references that were numerically off-by-one (or off-by-ten).

llvm-svn: 233415
2015-03-27 20:46:33 +00:00
Philip Reames
007c0af082 Require a GC strategy be specified for functions which use gc.statepoint
This was discussed a while back and I left it optional for migration.  Since it's been far more than the 'week or two' that was discussed, it's time to actually make this mandatory.

llvm-svn: 233357
2015-03-27 05:09:33 +00:00
Benjamin Kramer
ab138a6078 InstCombine: fold (A << C) == (B << C) --> ((A^B) & (~0U >> C)) == 0
Anding and comparing with zero can be done in a single instruction on
most archs so this is a bit cheaper.

llvm-svn: 233291
2015-03-26 17:12:06 +00:00
Sanjay Patel
1a17022bda optimize the AVX2 (integer) version of vperm2 into a shuffle
...because this is what happens when an instruction
set puts its underwear on after its pants.

This is an extension of r232852, r233100, and 233110:
http://llvm.org/viewvc/llvm-project?view=revision&revision=232852
http://llvm.org/viewvc/llvm-project?view=revision&revision=233100
http://llvm.org/viewvc/llvm-project?view=revision&revision=233110

llvm-svn: 233127
2015-03-24 22:39:29 +00:00
David Blaikie
8b46a1bcb2 Revert "Remove an InstCombine that seems to have become redundant."
Assertion fires in compiler-rt. Guess it does fire...

This reverts commit r233116.

llvm-svn: 233121
2015-03-24 21:50:35 +00:00
David Blaikie
0914ef9c6b Remove an InstCombine that seems to have become redundant.
Assert that this doesn't fire - I'll remove all of this later, but just
leaving it in for a while in case this is firing & we just don't have
test coverage.

llvm-svn: 233116
2015-03-24 21:31:31 +00:00
Sanjay Patel
c077161b46 [X86, AVX] instcombine vperm2 intrinsics with zero inputs into shuffles
This is the IR optimizer follow-on patch for D8563: the x86 backend patch
that converts this kind of shuffle back into a vperm2.

This is also a continuation of the transform that started in D8486. 
In that patch, Andrea suggested that we could convert vperm2 intrinsics that
use zero masks into a single shuffle. 

This is an implementation of that suggestion.

Differential Revision: http://reviews.llvm.org/D8567

llvm-svn: 233110
2015-03-24 20:36:42 +00:00
Benjamin Kramer
b1b20b3003 [SimplifyLibCalls] Fix negative shifts being produced by the memchr -> bitfield transform.
llvm-svn: 232903
2015-03-21 22:04:26 +00:00
Benjamin Kramer
facc0a564b [SimplifyLibCalls] Turn memchr(const, C, const) into a bitfield check.
strchr("123!", C) != nullptr is a common pattern to check if C is one
of 1, 2, 3 or !. If the largest element of the string is smaller than
the target's register size we can easily create a bitfield and just
do a simple test for set membership.

int foo(char C) { return strchr("123!", C) != nullptr; } now becomes

	cmpl	$64, %edi ## range check
	sbbb	%al, %al
	movabsq	$0xE000200000001, %rcx
	btq	%rdi, %rcx ## bit test
	sbbb	%cl, %cl
	andb	%al, %cl ## and the two conditions
	andb	$1, %cl
	movzbl	%cl, %eax ## returning an int
	ret

(imho the backend should expand this into a series of branches, but
that's a different story)

The code is currently limited to bit fields that fit in a register, so
usually 64 or 32 bits. Sadly, this misses anything using alpha chars
or {}. This could be fixed by just emitting a i128 bit field, but that
can generate really ugly code so we have to find a better way. To some
degree this is also recreating switch lowering logic, but we can't
simply emit a switch instruction and thus change the CFG within
instcombine.

llvm-svn: 232902
2015-03-21 21:09:33 +00:00
Benjamin Kramer
6c7bd16fc0 SimplifyLibCalls: Add basic optimization of memchr calls.
This is just memchr(x, y, 0) -> nullptr and constant folding.

llvm-svn: 232896
2015-03-21 15:36:21 +00:00
Sanjay Patel
cf8a53f502 [X86, AVX] instcombine common cases of vperm2* intrinsics into shuffles
vperm2* intrinsics are just shuffles. 
In a few special cases, they're not even shuffles.

Optimizing intrinsics in InstCombine is better than
handling this in the front-end for at least two reasons:

1. Optimizing custom-written SSE intrinsic code at -O0 makes vector coders
   really angry (and so I have regrets about some patches from last week).

2. Doing mask conversion logic in header files is hard to write and 
   subsequently read.

There are a couple of TODOs in this patch to complete this optimization.

Differential Revision: http://reviews.llvm.org/D8486

llvm-svn: 232852
2015-03-20 21:47:56 +00:00
Daniel Jasper
7e5f7305cf [InstCombine] Don't fold a GEP into itself through a PHI node
This can only occur (I think) through the back-edge of the loop.

However, folding a GEP into itself means that the value of the previous
iteration needs to be stored in the meantime, thus requiring an
additional register variable to be live, but not actually achieving
anything (the gep still needs to be executed once per loop iteration).

The attached test case is derived from:
  typedef unsigned uint32;
  typedef unsigned char uint8;
  inline uint8 *f(uint32 value, uint8 *target) {
    while (value >= 0x80) {
      value >>= 7;
      ++target;
    }
    ++target;
    return target;
  }
  uint8 *g(uint32 b, uint8 *target) {
    target = f(b, f(42, target));
    return target;
  }

What happens is that the GEP stored in incptr2 is folded into itself
through the loop's back-edge and the phi-node stored in loopptr,
effectively incrementing the ptr by "2" in each iteration instead of "1".

In this case, it is actually increasing the number of GEPs required as
the GEP before the loop can't be folded away anymore. For comparison:

With this patch:
  define i8* @test4(i32 %value, i8* %buffer) {
  entry:
    %cmp = icmp ugt i32 %value, 127
    br i1 %cmp, label %loop.header, label %exit

  loop.header:                                      ; preds = %entry
    br label %loop.body

  loop.body:                                        ; preds = %loop.body, %loop.header
    %buffer.pn = phi i8* [ %buffer, %loop.header ], [ %loopptr, %loop.body ]
    %newval = phi i32 [ %value, %loop.header ], [ %shr, %loop.body ]
    %loopptr = getelementptr inbounds i8, i8* %buffer.pn, i64 1
    %shr = lshr i32 %newval, 7
    %cmp2 = icmp ugt i32 %newval, 16383
    br i1 %cmp2, label %loop.body, label %loop.exit

  loop.exit:                                        ; preds = %loop.body
    br label %exit

  exit:                                             ; preds = %loop.exit, %entry
    %0 = phi i8* [ %loopptr, %loop.exit ], [ %buffer, %entry ]
    %incptr3 = getelementptr inbounds i8, i8* %0, i64 2
    ret i8* %incptr3
  }

Without this patch:
  define i8* @test4(i32 %value, i8* %buffer) {
  entry:
    %incptr = getelementptr inbounds i8, i8* %buffer, i64 1
    %cmp = icmp ugt i32 %value, 127
    br i1 %cmp, label %loop.header, label %exit

  loop.header:                                      ; preds = %entry
    br label %loop.body

  loop.body:                                        ; preds = %loop.body, %loop.header
    %0 = phi i8* [ %buffer, %loop.header ], [ %loopptr, %loop.body ]
    %loopptr = phi i8* [ %incptr, %loop.header ], [ %incptr2, %loop.body ]
    %newval = phi i32 [ %value, %loop.header ], [ %shr, %loop.body ]
    %shr = lshr i32 %newval, 7
    %incptr2 = getelementptr inbounds i8, i8* %0, i64 2
    %cmp2 = icmp ugt i32 %newval, 16383
    br i1 %cmp2, label %loop.body, label %loop.exit

  loop.exit:                                        ; preds = %loop.body
    br label %exit

  exit:                                             ; preds = %loop.exit, %entry
    %ptr2 = phi i8* [ %incptr2, %loop.exit ], [ %incptr, %entry ]
    %incptr3 = getelementptr inbounds i8, i8* %ptr2, i64 1
    ret i8* %incptr3
  }

Review: http://reviews.llvm.org/D8245
llvm-svn: 232718
2015-03-19 11:05:08 +00:00
Duncan P. N. Exon Smith
9f52f307c2 Verifier: Check debug info intrinsic arguments
Verify that debug info intrinsic arguments are valid.  (These checks
will not recurse through the full debug info graph, so they don't need
to be cordoned of in `DebugInfoVerifier`.)

With those checks in place, changing the `DbgIntrinsicInst` accessors to
downcast to `MDLocalVariable` and `MDExpression` is natural (added isa
specializations in `Metadata.h` to support this).

Added tests to `test/Verifier` for the new -verify checks, and fixed the
debug info in all the in-tree tests.

If you have out-of-tree testcases that have started to fail to -verify,
hopefully the verify checks are helpful.  The most likely problem is
that the expression argument is `!{}` (instead of `!MDExpression()`).

llvm-svn: 232296
2015-03-15 01:21:30 +00:00
Mehdi Amini
459bdb4f65 Update InstCombine to transform aggregate stores into scalar stores.
Summary: This is a first step toward getting proper support for aggregate loads and stores.

Test Plan: Added unittests

Reviewers: reames, chandlerc

Reviewed By: chandlerc

Subscribers: majnemer, joker.eph, chandlerc, llvm-commits

Differential Revision: http://reviews.llvm.org/D7780

Patch by Amaury Sechet

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 232284
2015-03-14 22:19:33 +00:00
Duncan P. N. Exon Smith
e5ae5021db instcombine: alloca: Canonicalize scalar allocation array size
As a follow-up to r232200, add an `-instcombine` to canonicalize scalar
allocations to `i32 1`.  Since r232200, `iX 1` (for X != 32) are only
created by RAUWs, so this shouldn't fire too often.  Nevertheless, it's
a cheap check and a nice cleanup.

llvm-svn: 232202
2015-03-13 19:42:09 +00:00
Duncan P. N. Exon Smith
4326380356 AsmWriter: Write alloca array size explicitly (and -instcombine fixup)
Write the `alloca` array size explicitly when it's non-canonical.
Previously, if the array size was `iX 1` (where X is not 32), the type
would mutate to `i32` when round-tripping through assembly.

The testcase I added fails in `verify-uselistorder` (as well as
`FileCheck`), since the use-lists for `i32 1` and `i64 1` change.
(Manman Ren came across this when running `verify-uselistorder` on some
non-trivial, optimized code as part of PR5680.)

The type mutation started with r104911, which allowed array sizes to be
something other than an `i32`.  Starting with r204945, we
"canonicalized" to `i64` on 64-bit platforms -- and then on every
round-trip through assembly, mutated back to `i32`.

I bundled a fixup for `-instcombine` to avoid r204945 on scalar
allocations.  (There wasn't a clean way to sequence this into two
commits, since the assembly change on its own caused testcase churn, and
the `-instcombine` change can't be tested without the assembly changes.)

An obvious alternative fix -- change `AllocaInst::AllocaInst()`,
`AsmWriter` and `LLParser` to treat `intptr_t` as the canonical type for
scalar allocations -- was rejected out of hand, since this required
teaching them each about the data layout.

A follow-up commit will add an `-instcombine` to canonicalize the scalar
allocation array size to `i32 1` rather than leaving `iX 1` alone.

rdar://problem/20075773

llvm-svn: 232200
2015-03-13 19:30:44 +00:00
David Blaikie
3ea2df7c7b [opaque pointer type] Add textual IR support for explicit type parameter to gep operator
Similar to gep (r230786) and load (r230794) changes.

Similar migration script can be used to update test cases, which
successfully migrated all of LLVM and Polly, but about 4 test cases
needed manually changes in Clang.

(this script will read the contents of stdin and massage it into stdout
- wrap it in the 'apply.sh' script shown in previous commits + xargs to
apply it over a large set of test cases)

import fileinput
import sys
import re

rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)

def conv(match):
  line = match.group(1)
  line += match.group(4)
  line += ", "
  line += match.group(2)
  return line

line = sys.stdin.read()
off = 0
for match in re.finditer(rep, line):
  sys.stdout.write(line[off:match.start()])
  sys.stdout.write(conv(match))
  off = match.end()
sys.stdout.write(line[off:])

llvm-svn: 232184
2015-03-13 18:20:45 +00:00
David Majnemer
613caa1eaa InstCombine: Don't fold call bitcast into args if callee is byval
This fixes a bug reported here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150309/265341.html

llvm-svn: 231948
2015-03-11 18:03:05 +00:00
Sanjay Patel
f8b6a29222 Inliner should not add callgraph edges for intrinsic calls (PR22857)
The CallGraphNode function "addCalledFunction()" asserts that edges are not to intrinsics.

This patch makes sure that the Inliner does not add such an edge to the callgraph.

Fix for clang crash by assertion: https://llvm.org/bugs/show_bug.cgi?id=22857

Differential Revision: http://reviews.llvm.org/D8231

llvm-svn: 231927
2015-03-11 15:12:32 +00:00
Philip Reames
d1be160c96 If a conditional branch jumps to the same target, remove the condition
Given that large parts of instcombine are restricted to instructions which have one use, getting rid of a use on the condition can help the effectiveness of the optimizer. Also, it allows the condition to potentially be deleted by instcombine rather than waiting for another pass.
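
For illustration (a sketch; one way to drop the use without touching the
CFG):

  br i1 %cond, label %bb, label %bb
  ; becomes
  br i1 true, label %bb, label %bb    ; the use of %cond goes away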

I noticed this completely by accident in another test case. It's not anything that actually came from a real workload.

p.s. We should probably do the same thing for switch instructions.

Differential Revision: http://reviews.llvm.org/D8220

llvm-svn: 231881
2015-03-10 22:52:37 +00:00
Philip Reames
6b5f658b6c Infer known bits from dominating conditions
This patch adds limited support in ValueTracking for inferring known bits of a value from conditional expressions which must be true to reach the instruction we're trying to optimize. At this time, the feature is off by default. Once landed, I'm hoping for feedback from others on both profitability and compile time impact.

Forms of conditional value propagation have been tried in LLVM before and have failed due to compile time problems.  In an attempt to side step that, this patch only considers conditions where the edge leaving the branch dominates the context instruction. It does not attempt full dataflow.  Even with that restriction, it handles many interesting cases:
 * Early exits from functions
 * Early exits from loops (for context instructions in the loop and after the check)
 * Conditions which control entry into loops, including multi-version loops (such as those produced during vectorization, IRCE, loop unswitch, etc..)
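
A small sketch of the kind of fact this enables (illustrative IR):

  %c = icmp ult i32 %x, 16
  br i1 %c, label %then, label %exit

  then:
    ; the branch edge dominates this block, so %x is known < 16: its top
    ; 28 bits are zero, and the mask below can fold to %x itself:
    %r = and i32 %x, 15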

Possible applications include optimizing using information provided by constructs such as: preconditions, assumptions, null checks, & range checks.

This patch implements two approaches to the problem that need further benchmarking.  Approach 1 is to directly walk the dominator tree looking for interesting conditions.  Approach 2 is to inspect other uses of the value being queried for interesting comparisons.  From initial benchmarking, it appears that Approach 2 is faster than Approach 1, but this needs to be further validated.  

Differential Revision: http://reviews.llvm.org/D7708

llvm-svn: 231879
2015-03-10 22:43:20 +00:00
Owen Anderson
fa465e39c1 Fix a crash in InstCombine where we could try to truncate a switch comparison to zero width.
llvm-svn: 231761
2015-03-10 06:51:39 +00:00
Owen Anderson
c22a907f78 Fix an infinite loop in InstCombine when an instruction with no users and side effects can be constant folded.
ReplaceInstUsesWith needs to return nullptr when the input has no users,
because in that case it does not mutate the program.  Otherwise, we can
get stuck in an infinite loop of repeatedly attempting to constant fold
an instruction with no users.

llvm-svn: 231755
2015-03-10 05:13:47 +00:00
Mehdi Amini
201b59d62e InstCombine: fix fold "fcmp x, undef" to account for NaN
Summary:
See the two test cases.

; Can fold fcmp with undef on one side by choosing NaN for the undef

; Can fold fcmp with undef on both side
;   fcmp u_pred undef, undef -> true
;   fcmp o_pred undef, undef -> false
; because whatever you choose for the first undef
; you can choose NaN for the other undef

Reviewers: hfinkel, chandlerc, majnemer

Reviewed By: majnemer

Subscribers: majnemer, llvm-commits

Differential Revision: http://reviews.llvm.org/D7617

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231626
2015-03-09 03:20:25 +00:00
Owen Anderson
e54101bbb5 Teach DataLayout to infer a plausible alignment for things even when nothing is specified by the user.
llvm-svn: 231613
2015-03-08 21:53:59 +00:00
Nadav Rotem
f095e52b93 Teach ComputeNumSignBits about signed remainder.
This optimization is a continuation of r231140, which reasoned about signed division.

llvm-svn: 231433
2015-03-06 00:23:58 +00:00
Michael Kuperstein
92d61decb9 [InstCombine] Fix an assertion when fmul has a ConstantExpr operand
isNormalFp and isFiniteNonZeroFp should not assume vector operands cannot be constant expressions.

Patch by Pawel Jurek <pawel.jurek@intel.com>
Differential Revision: http://reviews.llvm.org/D8053

llvm-svn: 231359
2015-03-05 08:38:57 +00:00
Mehdi Amini
29ebc2d39f Make DataLayout Non-Optional in the Module
Summary:
DataLayout keeps the string used for its creation.

As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().

Get rid of DataLayoutPass: the DataLayout is in the Module

The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.

Make DataLayout Non-Optional in the Module

Module->getDataLayout() will never return nullptr anymore.

Reviewers: echristo

Subscribers: resistor, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D7992

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
2015-03-04 18:43:29 +00:00
David Majnemer
0a7829af7e InstCombine: Ensure select condition types are identical before merging
Select conditions may be vectors or scalars.  Make sure InstCombine
doesn't indiscriminately assume that a select which is value dependent
on another select has an identical select condition type.

This fixes PR22773.

llvm-svn: 231156
2015-03-03 22:40:36 +00:00
Nadav Rotem
0f7a38b97d Teach ComputeNumSignBits about signed divisions.
http://reviews.llvm.org/D8028
rdar://20023136

llvm-svn: 231140
2015-03-03 21:39:02 +00:00
Duncan P. N. Exon Smith
8d1b74869c DebugInfo: Move new hierarchy into place
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464.  I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned.  Let me know if I'm wrong :).

The code changes are fairly mechanical:

  - Bumped the "Debug Info Version".
  - `DIBuilder` now creates the appropriate subclass of `MDNode`.
  - Subclasses of DIDescriptor now expect to hold their "MD"
    counterparts (e.g., `DIBasicType` expects `MDBasicType`).
  - Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
    for printing comments.
  - Big update to LangRef to describe the nodes in the new hierarchy.
    Feel free to make it better.

Testcase changes are enormous.  There's an accompanying clang commit on
its way.

If you have out-of-tree debug info testcases, I just broke your build.

  - `upgrade-specialized-nodes.sh` is attached to PR22564.  I used it to
    update all the IR testcases.
  - Unfortunately I failed to find a way to script the updates to CHECK
    lines, so I updated all of these by hand.  This was fairly painful,
    since the old CHECKs are difficult to reason about.  That's one of
    the benefits of the new hierarchy.

This work isn't quite finished, BTW.  The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro).  Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers.  I
also expect to make a few schema changes now that it's easier to reason
about everything.

llvm-svn: 231082
2015-03-03 17:24:31 +00:00
David Blaikie
ab043ff680 [opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.

A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)

import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
2015-02-27 21:17:42 +00:00
David Blaikie
0d99339102 [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.

This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.

* This doesn't modify gep operators, only instructions (operators will be
  handled separately)

* Textual IR changes only. Bitcode (including upgrade) and changing the
  in-memory representation will be in separate changes.

* geps of vectors are transformed as:
    getelementptr <4 x float*> %x, ...
  ->getelementptr float, <4 x float*> %x, ...
  Then, once the opaque pointer type is introduced, this will ultimately look
  like:
    getelementptr float, <4 x ptr> %x
  with the unambiguous interpretation that it is a vector of pointers to float.

* address spaces remain on the pointer, not the type:
    getelementptr float addrspace(1)* %x
  ->getelementptr float, float addrspace(1)* %x
  Then, eventually:
    getelementptr float, ptr addrspace(1) %x

Importantly, the massive amount of test case churn has been automated by
the same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.

update.py:
import fileinput
import sys
import re

ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile(       r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

def conv(match, line):
  if not match:
    return line
  line = match.groups()[0]
  if len(match.groups()[5]) == 0:
    line += match.groups()[2]
  line += match.groups()[3]
  line += ", "
  line += match.groups()[1]
  line += "\n"
  return line

for line in sys.stdin:
  if line.find("getelementptr ") == line.find("getelementptr inbounds"):
    if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
      line = conv(re.match(ibrep, line), line)
  elif line.find("getelementptr ") != line.find("getelementptr ("):
    line = conv(re.match(normrep, line), line)
  sys.stdout.write(line)

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).

The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7636

llvm-svn: 230786
2015-02-27 19:29:02 +00:00
Hal Finkel
5cc8afebfd [InstCombine/PowerPC] Convert aligned QPX load/store intrinsics into loads/stores
InstCombine has long had logic to convert aligned Altivec load/store intrinsics
into regular loads and stores. This mirrors that functionality for QPX vector
load/store intrinsics.
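
A sketch of the conversion (illustrative IR; the intrinsic name and the
32-byte alignment shown are assumptions on my part):

  %v = call <4 x double> @llvm.ppc.qpx.qvlfd(i8* %p)
  ; when %p can be proven sufficiently aligned, becomes:
  %pc = bitcast i8* %p to <4 x double>*
  %v  = load <4 x double>, <4 x double>* %pc, align 32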

llvm-svn: 230660
2015-02-26 18:56:03 +00:00
Hal Finkel
13407e3fb6 [InstCombine] Add a test for altivec load/store intrinsic simplification
InstCombine has logic to convert aligned Altivec load/store intrinsics into
regular loads and stores. Unfortunately, there seems to be no regression test
covering this behavior. Adding one...

llvm-svn: 230632
2015-02-26 14:22:41 +00:00
JF Bastien
0f477f42f6 InstCombine: extract instead of shuffle when performing vector/array type punning
Summary: SROA generates code that isn't quite as easy to optimize and contains unusual-sized shuffles, but that code is generally correct. As discussed in D7487 the right place to clean things up is InstCombine, which will pick up the type-punning pattern and transform it into a more obvious bitcast+extractelement, while leaving the other patterns SROA encounters as-is.
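
A sketch of the pattern and its cleanup (illustrative IR; the element/lane
correspondence assumes a little-endian layout):

  %sv = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
  %r  = bitcast <2 x i32> %sv to i64
  ; becomes the more obvious
  %w = bitcast <4 x i32> %v to <2 x i64>
  %r = extractelement <2 x i64> %w, i32 0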

Test Plan: make check

Reviewers: jvoung, chandlerc

Subscribers: llvm-commits
llvm-svn: 230560
2015-02-25 22:30:51 +00:00
Charles Davis
f135d6973b [IC] Turn non-null MD on pointer loads to range MD on integer loads.
Summary:
This change fixes the FIXME that you recently added when you committed
(a modified version of) my patch.  When `InstCombine` combines a load and
store of a pointer into those of an equivalently-sized integer, it currently
drops any `!nonnull` metadata that might be present.  This change replaces
`!nonnull` metadata with `!range !{ 1, -1 }` metadata instead.
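
Roughly (illustrative IR for a 64-bit target; the metadata numbering is
made up):

  %p = load i8*, i8** %pp, !nonnull !0      ; !0 = !{}
  ; when combined into an integer load, the fact is kept as:
  %i = load i64, i64* %ppi, !range !1       ; !1 = !{i64 1, i64 -1}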

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7621

llvm-svn: 230462
2015-02-25 05:10:25 +00:00
Sanjoy Das
de271b53ef New instcombine rule: max(~a,~b) -> ~min(a, b)
This case is interesting because ScalarEvolutionExpander lowers min(a,
b) as ~max(~a,~b).  I think the profitability heuristics can be made
more clever/aggressive, but this is a start.
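
For illustration (hypothetical IR):

  %na = xor i32 %a, -1
  %nb = xor i32 %b, -1
  %c  = icmp sgt i32 %na, %nb
  %m  = select i1 %c, i32 %na, i32 %nb    ; smax(~a, ~b)
  ; becomes
  %c2 = icmp slt i32 %a, %b
  %mn = select i1 %c2, i32 %a, i32 %b     ; smin(a, b)
  %m  = xor i32 %mn, -1                   ; ~smin(a, b)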

Differential Revision: http://reviews.llvm.org/D7821

llvm-svn: 230285
2015-02-24 00:08:41 +00:00
Hal Finkel
09a36680d7 [InstCombine] Remove unnecessary variable indexing into single-element arrays
This change addresses a deficiency pointed out in PR22629. To copy from the bug
report:

[from the bug report]

Consider this code:

int f(int x) {
  int a[] = {12};
  return a[x];
}

GCC knows to optimize this to

movl     $12, %eax
ret

The code generated by recent Clang at -O3 is:

movslq   %edi, %rax
movl     .L_ZZ1fiE1a(,%rax,4), %eax
retq

.L_ZZ1fiE1a:
  .long    12                      # 0xc

[end from the bug report]

This definitely seems worth fixing. I've also seen this kind of code before (as
the base case of generic vector wrapper templates with one element).

The general idea is to look at the GEP feeding a load or a store, which has
some variable as its first non-zero index, and determine if that index must be
zero (or else an out-of-bounds access would occur). We can do this for allocas
and globals with constant initializers where we know the maximum size of the
underlying object. When we find such a GEP, we create a new one for the memory
access with that first variable index replaced with a constant zero.

Even if we can't eliminate the memory access (and sometimes we can't), it is
still useful because it removes unnecessary indexing calculations.
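
In IR terms the rewrite is roughly (illustrative, written in the current
explicit-type gep syntax):

  @arr = internal constant [1 x i32] [i32 12]

  %g = getelementptr inbounds [1 x i32], [1 x i32]* @arr, i64 0, i64 %x
  ; %x must be zero or the access is out of bounds, so:
  %g = getelementptr inbounds [1 x i32], [1 x i32]* @arr, i64 0, i64 0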

llvm-svn: 229959
2015-02-20 03:05:53 +00:00
Akira Hatanaka
913155d842 [InstCombine] Do not insert a GEP instruction before a landingpad instruction.
InstCombiner::visitGetElementPtrInst was using getFirstNonPHI to compute the
insertion point, which caused the verifier to complain when a GEP was inserted
before a landingpad instruction. This commit fixes it to use getFirstInsertionPt
instead.

rdar://problem/19394964

llvm-svn: 229619
2015-02-18 03:30:11 +00:00
Mehdi Amini
ce11b626e7 InstCombine: fold more cases of (fp_to_u/sint (u/sint_to_fp val))
Fixes radar 15486701.
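
One of the newly folded cases, as a sketch (illustrative IR):

  %f = sitofp i16 %x to float
  %i = fptosi float %f to i32
  ; an i16 is exactly representable in float's 24-bit mantissa, so:
  %i = sext i16 %x to i32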

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 229437
2015-02-16 21:47:54 +00:00
Mehdi Amini
1468b597ea Tests: reformat sitofp.ll and use FileCheck
From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 229436
2015-02-16 21:47:50 +00:00
Ramkumar Ramachandra
af4f23c6ae InstCombine: propagate deref via new addDereferenceableAttr
The "dereferenceable" attribute cannot be added via .addAttribute(),
since it also expects a size in bytes. AttrBuilder#addAttribute or
AttributeSet#addAttribute is wrapped by classes Function, InvokeInst,
and CallInst. Add corresponding wrappers to
AttrBuilder#addDereferenceableAttr.

Having done this, propagate the dereferenceable attribute via
gc.relocate, adding a test to exercise it. Note that -datalayout is
required during execution over and above -instcombine, because
InstCombine only optionally requires DataLayoutPass.

Differential Revision: http://reviews.llvm.org/D7510

llvm-svn: 229265
2015-02-14 19:37:54 +00:00
Philip Reames
45adc98469 [InstCombine] When canonicalizing gep indices, prefer zext when possible
If we know that the sign bit of a value being sign extended is zero, we can use a zero extension instead.  This is motivated by the fact that zero extensions are generally cheaper on x86 (and most other architectures?).  We already apply a similar transform in DAGCombine, this just extends that to the IR level.

This comes up when we eagerly canonicalize gep indices to the width of a machine register (i64 on x86_64). To do so, we insert sign extensions (sext) to promote smaller types. 
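
For illustration (hypothetical IR):

  %h   = lshr i16 %v, 1        ; sign bit of %h is known zero
  %idx = sext i16 %h to i64
  ; becomes
  %idx = zext i16 %h to i64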

Differential Revision: http://reviews.llvm.org/D7255

llvm-svn: 229189
2015-02-14 00:05:36 +00:00
Andrea Di Biagio
0c23f4cf6c [InstCombine] Fix regression introduced at r227197.
This patch fixes a problem I accidentally introduced in an instruction combine
on select instructions added at r227197. That revision taught the instruction
combiner how to fold a cttz/ctlz followed by an icmp plus select into a single
cttz/ctlz with the 'is_zero_undef' flag cleared.

However, the new rule added at r227197 would have produced wrong results in the
case where a cttz/ctlz with the 'is_zero_undef' flag cleared was followed by a
zero-extend or truncate. In that case, the folded instruction would have
been inserted in a wrong location thus leaving the CFG in an inconsistent
state.

This patch fixes the problem and adds two reproducible test cases to
the existing test 'InstCombine/select-cmp-cttz-ctlz.ll'.

llvm-svn: 229124
2015-02-13 16:33:34 +00:00
Michael Liao
1e3950b179 [InstCombine] Fix a bug when combining icmp from ptrtoint
- First, there's a crash when we try to combine two pointers into an `icmp`
  directly by creating a `bitcast`, which is invalid if the two pointers are
  from different address spaces.

- It's not always appropriate to cast one pointer to another if they are from
  different address spaces, as that is not a no-op cast. Instead, we only
  combine `icmp` from `ptrtoint` if the two pointers are of the same address
  space.

llvm-svn: 229063
2015-02-13 04:51:26 +00:00
Chandler Carruth
cbd5e9d8c2 [IC] Fix a bug with the instcombine canonicalizing of loads and
propagating of metadata.

We were propagating !nonnull metadata even when the newly formed load is
no longer of a pointer type. This is clearly broken and results in LLVM
failing the verifier and aborting. This patch just restricts the
propagation of !nonnull metadata to when we actually have a pointer
type.

This bug report and the initial version of this patch was provided by
Charles Davis! Many thanks for finding this!

We still need to add logic to round-trip the metadata correctly if we
combine from pointer types to integer types and then back by using range
metadata for the integer type loads. But this is the minimal and safe
version of the patch, which is important so we can backport it into 3.6.

llvm-svn: 229029
2015-02-13 02:30:01 +00:00
Benjamin Kramer
3b1b5b8725 InstCombine: Allow folding of xor into icmp by changing the predicate for vectors
The loop vectorizer can create this pattern.

llvm-svn: 228954
2015-02-12 20:26:46 +00:00
Chandler Carruth
3f645fb28b Revert r228556: InstCombine: propagate nonNull through assume
This commit isn't using the correct context, and is transfoming calls
that are operands to loads rather than calls that are operands to an
icmp feeding into an assume. I've replied on the original review thread
with a very reduced test case and some thoughts on how to rework this.

llvm-svn: 228677
2015-02-10 08:07:32 +00:00
Ramkumar Ramachandra
570fa73130 InstCombine: propagate nonNull through assume
Make assume (load (call|invoke) != null) set nonNull return attribute
for the call and invoke. Also include tests.

Differential Revision: http://reviews.llvm.org/D7107

llvm-svn: 228556
2015-02-09 01:13:13 +00:00
Matthias Braun
7549566067 InstCombine: Combine select sequences into a single select
Normalize
select(C0, select(C1, a, b), b) -> select((C0 & C1), a, b)
select(C0, a, select(C1, a, b)) -> select((C0 | C1), a, b)

This normal form may enable further combines on the And/Or and shortens
paths for the values. Many targets prefer the original form but can
easily recover it in CodeGen.
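
For illustration (hypothetical IR, the first form):

  %s1 = select i1 %c1, i32 %a, i32 %b
  %s0 = select i1 %c0, i32 %s1, i32 %b
  ; becomes
  %cc = and i1 %c0, %c1
  %s0 = select i1 %cc, i32 %a, i32 %b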

Differential Revision: http://reviews.llvm.org/D7399

llvm-svn: 228409
2015-02-06 17:49:36 +00:00
Reid Kleckner
031930cb0b Move EH personality type classification to Analysis/LibCallSemantics.h
Summary:
Also add enum types for __C_specific_handler and _CxxFrameHandler3 for
which we know a few things.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7214

llvm-svn: 227284
2015-01-28 01:17:38 +00:00
Ahmed Bougacha
63ac678e11 [SimplifyLibCalls] Don't confuse strcpy_chk for stpcpy_chk.
This was introduced in a faulty refactoring (r225640, mea culpa):
the tests weren't testing the return values, so, for both
__strcpy_chk and __stpcpy_chk, we would return the end of the
buffer (matching stpcpy) instead of the beginning (for strcpy).

The root cause was the prefix "__" being ignored when comparing,
which made us always pick LibFunc::stpcpy_chk.
Pass the LibFunc::Func directly to avoid this kind of error.
Also, make the testcases as explicit as possible to prevent this.

The now-useful testcases expose another, entangled stpcpy problem
with the further simplification.  This was introduced in a
refactoring (r225640) to match the original behavior.

However, this leads to problems when successive simplifications
generate several similar instructions, none of which are removed
by the custom replaceAllUsesWith.

For instance, InstCombine (the main user) doesn't erase the
instruction in its custom RAUW.  When trying to simplify say
__stpcpy_chk:
- first, an stpcpy is created (fortified simplifier),
- second, a memcpy is created (normal simplifier), but the
  stpcpy call isn't removed.
- third, InstCombine later revisits the instructions,
  and simplifies the first stpcpy to a memcpy.  We now have
  two memcpys.

llvm-svn: 227250
2015-01-27 21:52:16 +00:00
Andrea Di Biagio
a698cd7c36 [InstCombine] Teach how to fold a select into a cttz/ctlz with the 'is_zero_undef' flag.
This patch teaches the Instruction Combiner how to fold a cttz/ctlz followed by
an icmp plus select into a single cttz/ctlz with the 'is_zero_undef' flag cleared.
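
A sketch of the fold (illustrative IR):

  %ct  = call i32 @llvm.cttz.i32(i32 %x, i1 true)
  %cmp = icmp eq i32 %x, 0
  %sel = select i1 %cmp, i32 32, i32 %ct
  ; becomes a single call with is_zero_undef cleared:
  %sel = call i32 @llvm.cttz.i32(i32 %x, i1 false)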

Added test InstCombine/select-cmp-cttz-ctlz.ll.

llvm-svn: 227197
2015-01-27 15:58:14 +00:00
Chandler Carruth
3f06e001ed [PM] Port instcombine to the new pass manager!
This is exciting as this is a much more involved port. This is
a complex, existing transformation pass. All of the core logic is shared
between both old and new pass managers. Only the access to the analyses
is separate because the actual techniques are separate. This also uses
a bunch of different and interesting analyses and is the first time
where we need to use an analysis across an IR layer.

This also paves the way to expose instcombine utility functions. I've
got a static function that implements the core pass logic over
a function which might be mildly interesting, but more interesting is
likely exposing a routine which just uses instructions *already in* the
worklist and combines until empty.

I've switched one of my favorite instcombine tests to run with both as
well to make sure this keeps working.

llvm-svn: 226987
2015-01-24 04:19:17 +00:00
Chandler Carruth
001ed18423 [canonicalize] Teach InstCombine to canonicalize loads which are only
ever stored to always use a legal integer type if one is available.

Regardless of whether this particular type is good or bad, it ensures we
don't get weird differences in generated code (and resulting
performance) from "equivalent" patterns that happen to end up using
a slightly different type.

After some discussion on llvmdev it seems everyone generally likes this
canonicalization. However, there may be some parts of LLVM that handle
it poorly and need to be fixed. I have at least verified that this
doesn't impede GVN and instcombine's store-to-load forwarding powers in
any obvious cases. Subtle cases are exactly what we need to flush out if
they remain.

Also note that this IR pattern should already be hitting LLVM from Clang
at least because it is exactly the IR which would be produced if you
used memcpy to copy a pointer or floating point between memory instead
of a variable.

llvm-svn: 226781
2015-01-22 05:08:12 +00:00
David Majnemer
f4834fe3ed InstCombine: Don't strip bitcasts off of callsites marked 'thunk'
The return type of a thunk is meaningless, we just want the arguments
and return value to be forwarded.

llvm-svn: 226708
2015-01-21 22:32:04 +00:00
Richard Smith
cd37c22fb7 For PR21145: recognise a builtin call to a known deallocation function even if
it's defined in the current module. Clang generates this situation for the
C++14 sized deallocation functions, because it generates a weak definition in
case one isn't provided by the C++ runtime library.

llvm-svn: 226069
2015-01-15 01:00:33 +00:00
Duncan P. N. Exon Smith
4a5feedcaa IR: Move MDLocation into place
This commit moves `MDLocation`, finishing off PR21433.  There's an
accompanying clang commit for frontend testcases.  I'll attach the
testcase upgrade script I used to PR21433 to help out-of-tree
frontends/backends.

This changes the schema for `DebugLoc` and `DILocation` from:

    !{i32 3, i32 7, !7, !8}

to:

    !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)

Note that empty fields (line/column: 0 and inlinedAt: null) don't get
printed by the assembly writer.

llvm-svn: 226048
2015-01-14 22:27:36 +00:00
David Majnemer
b89ab3fd88 InstCombine: Don't take A-B<0 into A<B if A-B has other uses
This fixes PR22226.

llvm-svn: 226023
2015-01-14 19:26:56 +00:00
Ahmed Bougacha
284d7745d1 [SimplifyLibCalls] Don't try to simplify indirect calls.
It turns out, all callsites of the simplifier are guarded by a check for
CallInst::getCalledFunction (i.e., to make sure the callee is direct).

This check wasn't done when trying to further optimize a simplified fortified
libcall, introduced by a refactoring in r225640.

Fix that, add a testcase, and document the requirement.

llvm-svn: 225895
2015-01-14 00:55:05 +00:00
Matt Arsenault
4b66850c40 Fix fcmp + fabs instcombines when using the intrinsic
This was only handling the libcall. This is another example
of why only the intrinsic should ever be used when it exists.

llvm-svn: 225465
2015-01-08 20:09:34 +00:00
Matt Arsenault
0d86cea633 Fix using wrong intrinsic in test
This is a leftover from renaming the intrinsic.
It's surprising the unknown llvm. intrinsic wasn't rejected.

llvm-svn: 225304
2015-01-06 23:00:33 +00:00
Matt Arsenault
416be52fbf Convert fcmp with 0.0 from casted integers to icmp
This is already handled in general when it is known the
conversion can't lose bits with smaller integer types
casted into wider floating point types.

This pattern happens somewhat often in GPU programs that cast
workitem intrinsics to float, which are often compared with 0.

Specifically handle the special case of compares with zero which
should also be known to not lose information. I had a more general
version of this which allows equality compares if the casted float is
exactly representable in the integer, but I'm not 100% confident that
is always correct.

Also fold cases that aren't integers to true / false.
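
The zero-compare special case, as a sketch (illustrative IR):

  %f = uitofp i32 %x to float
  %c = fcmp oeq float %f, 0.0
  ; rounding never maps a nonzero integer to 0.0, so:
  %c = icmp eq i32 %x, 0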

llvm-svn: 225265
2015-01-06 15:50:59 +00:00
David Majnemer
82fa22459b InstCombine: Bitcast call arguments from/to pointer/integer type
Try harder to get rid of bitcast'd calls by ptrtoint/inttoptr'ing
arguments and return values when DataLayout says it is safe to do so.

llvm-svn: 225254
2015-01-06 08:41:31 +00:00
David Majnemer
04c9e5e52d InstCombine: match can find ConstantExprs, don't assume we have a Value
We assumed the output of a match was a Value; this would cause us to
assert because we would fail a cast<>.  Instead, use a helper in the
Operator family to hide the distinction between Value and Constant.

This fixes PR22087.

llvm-svn: 225127
2015-01-04 07:36:02 +00:00
David Majnemer
78198d7245 InstCombine: Detect when llvm.umul.with.overflow always overflows
We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero
and LHSKnownOne * RHSKnownOne overflow.
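
A worked example (assumed, not from the commit):

    ; bit 16 is known one in both operands, so the smallest possible
    ; product is 2^16 * 2^16 = 2^32, which already overflows i32
    %a = or i32 %x, 65536
    %b = or i32 %y, 65536
    %r = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %a, i32 %b)
    ; the i1 overflow bit is known true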

llvm-svn: 225077
2015-01-02 07:29:47 +00:00
Sanjay Patel
657e61ccbf InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, X
Some day the backend may handle instruction-level fast math flags and make
this transform unnecessary, but it's still better practice to use the canonical
representation of fneg when possible (use a -0.0).

This is a partial fix for PR20870 ( http://llvm.org/bugs/show_bug.cgi?id=20870 ).
See also http://reviews.llvm.org/D6723.

Differential Revision: http://reviews.llvm.org/D6731

llvm-svn: 225050
2014-12-31 22:14:05 +00:00
David Majnemer
83939d3744 InstCombine: try to transform A-B < 0 into A < B
We are allowed to move the 'B' to the right hand side if we can prove
there is no signed overflow and if the comparison itself is signed.
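
In IR terms (a sketch under the stated no-overflow assumption):

    ; before:
    %d = sub nsw i32 %a, %b
    %c = icmp slt i32 %d, 0
    ; after -- valid because the nsw sub cannot wrap:
    %c = icmp slt i32 %a, %b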

llvm-svn: 225034
2014-12-31 04:21:41 +00:00
Philip Reames
4527f27ca2 Carry facts about nullness and undef across GC relocation
This change implements four basic optimizations:

    If a relocated value isn't used, it doesn't need to be relocated.
    If the value being relocated is null, relocation doesn't change that. (Technically, this might be collector-specific; I don't know of a collector for which it doesn't hold, though.)
    If the value being relocated is undef, the relocation is meaningless.
    If the value being relocated was known nonnull, the relocated pointer also isn't null. (Since it points to the same source language object.)

I outlined other planned work in comments.

Differential Revision: http://reviews.llvm.org/D6600

llvm-svn: 224968
2014-12-29 23:27:30 +00:00
Philip Reames
a4f427e6e7 Loading from null is valid outside of addrspace 0
This patch fixes a miscompile where we were assuming that loading from null is undefined and thus we could assume it doesn't happen.  This transform is perfectly legal in address space 0, but is not necessarily legal in other address spaces.

We really should introduce a hook to control this property on a per-target, per-address-space basis.  We may be losing valuable optimizations in some address spaces by being too conservative.

Original patch by Thomas P Raoux (submitted to llvm-commits), tests and formatting fixes by me.

llvm-svn: 224961
2014-12-29 22:46:21 +00:00
David Majnemer
c57dbde8fa InstCombine: Infer nuw for multiplies
A multiply cannot unsigned wrap if the two operands have, between them,
bitwidth or more leading zero bits.
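
For example (a sketch, assuming nothing beyond the masks shown):

    %a = and i32 %x, 65535   ; at least 16 leading zero bits
    %b = and i32 %y, 65535   ; at least 16 leading zero bits
    %m = mul i32 %a, %b      ; 16 + 16 >= 32, so this can become mul nuw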

llvm-svn: 224849
2014-12-26 09:50:35 +00:00
David Majnemer
ce5bd510cb InstCombine: Infer nsw for multiplies
We already utilize this logic for reducing overflow intrinsics; it makes
sense to reuse it for normal multiplies as well.

llvm-svn: 224847
2014-12-26 09:10:14 +00:00
Michael Kuperstein
180649bb38 [ValueTracking] Move GlobalAlias handling to be after the max depth check in computeKnownBits()
GlobalAlias handling used to be after GlobalValue handling, which meant it was, in practice, dead code. r220165 moved GlobalAlias handling to be before GlobalValue handling, but also moved it to be before the max depth check, causing an assert due to a recursion depth limit violation. 

This moves GlobalAlias handling forward to where it's safe, and changes the GlobalValue handling to only look at GlobalObjects.

Differential Revision: http://reviews.llvm.org/D6758

llvm-svn: 224765
2014-12-23 11:33:41 +00:00
David Majnemer
ee1d92da94 This should have been part of r224676.
llvm-svn: 224677
2014-12-20 04:48:34 +00:00
David Majnemer
3da9d34415 InstCombine: Squash an icmp+select into bitwise arithmetic
(X & INT_MIN) == 0 ? X ^ INT_MIN : X  into  X | INT_MIN
(X & INT_MIN) != 0 ? X ^ INT_MIN : X  into  X & INT_MAX
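
The first pattern as an IR sketch (illustrative only):

    ; before: (X & INT_MIN) == 0 ? X ^ INT_MIN : X
    %a = and i32 %x, -2147483648
    %c = icmp eq i32 %a, 0
    %f = xor i32 %x, -2147483648
    %r = select i1 %c, i32 %f, i32 %x
    ; after -- either arm has the sign bit set:
    %r = or i32 %x, -2147483648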

This fixes PR21993.

llvm-svn: 224676
2014-12-20 04:45:35 +00:00
Bruno Cardoso Lopes
b25873afd1 Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. Also, fix the code to return the modified switch when only
the truncation is performed.

This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224588
2014-12-19 17:12:35 +00:00
Sanjay Patel
b1bb7100db use -0.0 when creating an fneg instruction
Backends recognize (-0.0 - X) as the canonical form for fneg
and produce better code. Eg, ppc64 with 0.0:

   lis r2, ha16(LCPI0_0)
   lfs f0, lo16(LCPI0_0)(r2)
   fsubs f1, f0, f1
   blr

vs. -0.0:

   fneg f1, f1
   blr

Differential Revision: http://reviews.llvm.org/D6723

llvm-svn: 224583
2014-12-19 16:44:08 +00:00
Bruno Cardoso Lopes
ba090bc5c2 Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr"
Reverts commit r224574 to appease buildbots:

The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

llvm-svn: 224576
2014-12-19 14:36:24 +00:00
Bruno Cardoso Lopes
afb4b15ab3 [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224574
2014-12-19 14:23:15 +00:00
Erik Eckstein
042c032147 Strength reduce intrinsics with overflow into regular arithmetic operations if possible.
Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced.
This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow.
It completes the work on PR20194.
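
A sketch of the kind of reduction this enables (assumed example, not from
the patch):

    ; both operands fit in 8 bits, so the 32-bit product cannot overflow
    %a = and i32 %x, 255
    %b = and i32 %y, 255
    %r = call { i32, i1 } @llvm.smul.with.overflow.i32(i32 %a, i32 %b)
    ; can be replaced by a plain mul plus a constant-false overflow bit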

llvm-svn: 224417
2014-12-17 07:29:19 +00:00
Duncan P. N. Exon Smith
9c5542c040 IR: Make metadata typeless in assembly
Now that `Metadata` is typeless, reflect that in the assembly.  These
are the matching assembly changes for the metadata/value split in
r223802.

  - Only use the `metadata` type when referencing metadata from a call
    intrinsic -- i.e., only when it's used as a `Value`.

  - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
    when referencing it from call intrinsics.

So, assembly like this:

    define void @foo(i32 %v) {
      call void @llvm.foo(metadata !{i32 %v}, metadata !0)
      call void @llvm.foo(metadata !{i32 7}, metadata !0)
      call void @llvm.foo(metadata !1, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{metadata !3}, metadata !0)
      ret void, !bar !2
    }
    !0 = metadata !{metadata !2}
    !1 = metadata !{i32* @global}
    !2 = metadata !{metadata !3}
    !3 = metadata !{}

turns into this:

    define void @foo(i32 %v) {
      call void @llvm.foo(metadata i32 %v, metadata !0)
      call void @llvm.foo(metadata i32 7, metadata !0)
      call void @llvm.foo(metadata i32* @global, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{!3}, metadata !0)
      ret void, !bar !2
    }
    !0 = !{!2}
    !1 = !{i32* @global}
    !2 = !{!3}
    !3 = !{}

I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines).  I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.

This is part of PR21532.

llvm-svn: 224257
2014-12-15 19:07:53 +00:00
David Majnemer
70ded5026c ValueTracking: Don't recurse too deeply in computeKnownBitsFromAssume
Respect the MaxDepth recursion limit, doing otherwise will trigger an
assert in computeKnownBits.

This fixes PR21891.

llvm-svn: 224168
2014-12-12 23:59:29 +00:00
Steven Wu
896d3dd47b Fix another infinite loop in InstCombine
Summary:
InstCombine infinite-loops on the testcase added.
This happens because InstCombine generates instructions that it can
itself optimize again. Fix by not optimizing frem if the optimized
type is the same as the original type.
rdar://problem/19150820
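
A sketch of the shape of the transform (not the actual testcase):

    ; shrinking a double frem whose operands are extended floats:
    %e1 = fpext float %a to double
    %e2 = fpext float %b to double
    %r  = frem double %e1, %e2
    %t  = fptrunc double %r to float
    ; may become %t = frem float %a, %b -- but if the computed "shrunk"
    ; type equals the original type, the transform must bail out or
    ; InstCombine keeps feeding itself new work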

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6634

llvm-svn: 224097
2014-12-12 04:34:07 +00:00
Andrea Di Biagio
6186490ec7 [InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi.
This patch teaches the instruction combiner how to fold a call to 'insertqi' if
the 'length' field (3rd operand) is set to zero, and if the sum of the
'length' and 'bit index' (4th operand) fields is bigger than 64.

From the AMD64 Architecture Programmer's Manual:
1. If the sum of the bit index + length field is greater than 64, then the
   results are undefined;
2. A value of zero in the field length is defined as a length of 64.

This patch improves the existing combining logic for intrinsic 'insertqi'
adding extra checks to address both point 1. and point 2.

Differential Revision: http://reviews.llvm.org/D6583

llvm-svn: 224054
2014-12-11 20:44:59 +00:00
David Majnemer
cde1ba6638 ConstantFold, InstSimplify: undef >>a x can be either -1 or 0, choose 0
Zero is usually a nicer constant to have than -1.

llvm-svn: 223969
2014-12-10 21:58:15 +00:00
Chandler Carruth
2100e2da37 Revert r223764 which taught instcombine about integer-based element extraction
patterns.

This is causing Clang to miscompile itself for 32-bit x86 somehow, and likely
also on ARM and PPC. I really don't know how, but reverting now that I've
confirmed this is actually the culprit. I have a reproduction as well and so
should be able to restore this shortly.

This reverts commit r223764.

Original commit log follows:
Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.

Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and stores which remain, it uses integer math to recombine the values
into a large integer to load or store.

All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaches InstCombine to use this
representation.

With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.

For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
  integer loads and stores. SSA values are tremendously more powerful
  than "copy" intrinsics. Not doing this regresses massive amounts of
  LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
  SROA or every memcpy of a trivially copyable struct will prevent SSA
  formation of the members of that struct. It essentially turns off
  SROA.
- The closest alternative is to actually split the loads and stores when
  partitioning with SROA, but this has all of the downsides historically
  discussed of splitting up loads and stores -- the wide-store
  information is fundamentally lost. We would also see performance
  regressions for bitfield-heavy code and other places where the
  integers aren't really intended to be split without seemingly
  arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
  a choice to make IMO.

llvm-svn: 223813
2014-12-09 19:21:16 +00:00
Sonam Kumari
4fb99c8117 Removal Of Duplicate Test Cases and Addition Of Missing Check Statements
llvm-svn: 223768
2014-12-09 10:46:38 +00:00
Ankur Garg
429f69a1e1 [test/Transforms/InstCombine/shift.ll] Removed duplicate test cases. NFC.
Removed some duplicate test cases from the file /test/Transforms/InstCombine/shift.ll.

test54 and test57 were duplicates of each other.
test55 and test58 were duplicates of each other.

(Removed test57 and test58)

llvm-svn: 223767
2014-12-09 10:35:19 +00:00
Chandler Carruth
318b867199 Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.

Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and stores which remain, it uses integer math to recombine the values
into a large integer to load or store.

All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaches InstCombine to use this
representation.

With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.

For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
  integer loads and stores. SSA values are tremendously more powerful
  than "copy" intrinsics. Not doing this regresses massive amounts of
  LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
  SROA or every memcpy of a trivially copyable struct will prevent SSA
  formation of the members of that struct. It essentially turns off
  SROA.
- The closest alternative is to actually split the loads and stores when
  partitioning with SROA, but this has all of the downsides historically
  discussed of splitting up loads and stores -- the wide-store
  information is fundamentally lost. We would also see performance
  regressions for bitfield-heavy code and other places where the
  integers aren't really intended to be split without seemingly
  arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
  a choice to make IMO.

Differential Revision: http://reviews.llvm.org/D6548

llvm-svn: 223764
2014-12-09 08:55:32 +00:00
Sonam Kumari
0b050fd308 Removal Of Duplicate Test Case from shift.ll file
llvm-svn: 223648
2014-12-08 09:40:43 +00:00
Philip Reames
80180bc27f Add a test case for argument type coercion in an invoke of a vararg function
This would have caught the bug I fixed in 223370.  

llvm-svn: 223378
2014-12-04 19:13:45 +00:00
Simon Pilgrim
0b4002e32c [InstCombine] Minor optimization for bswap with binary ops
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:

OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )
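
For instance, the first fold in IR form (a sketch, not from the patch):

    ; before:
    %bx = call i32 @llvm.bswap.i32(i32 %x)
    %by = call i32 @llvm.bswap.i32(i32 %y)
    %r  = xor i32 %bx, %by
    ; after -- byte swapping commutes with bitwise ops:
    %t = xor i32 %x, %y
    %r = call i32 @llvm.bswap.i32(i32 %t)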

Since it's just a one-liner, I've also added BSWAP to the DAGCombiner equivalent as well:

fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))

Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.

Differential Revision: http://reviews.llvm.org/D6407

llvm-svn: 223349
2014-12-04 09:44:01 +00:00
Matthias Braun
8672c85e5e [SimplifyLibCalls] Improve double->float shrinking to consider constants
This allows cases like float x; fmin(1.0, x); to be optimized to fminf(1.0f, x);
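
The same example in IR form (a sketch; 1.0 is exactly representable as a
float, which is what makes the shrink safe):

    %e = fpext float %x to double
    %r = call double @fmin(double 1.000000e+00, double %e)
    %t = fptrunc double %r to float
    ; shrinks to:
    %t = call float @fminf(float 1.000000e+00, float %x)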

rdar://19049359

Differential Revision: http://reviews.llvm.org/D6496

llvm-svn: 223270
2014-12-03 21:46:33 +00:00
Matthias Braun
3bca19851f [SimplifyLibCalls] Enable double to float shrinking for copysign
rdar://19049359

Differential Revision: http://reviews.llvm.org/D6495

llvm-svn: 223269
2014-12-03 21:46:29 +00:00
Erik Eckstein
b61d06dbaf InstCombine: simplify signed range checks
Try to convert two compares of a signed range check into a single unsigned compare.
Examples:
(icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n
(icmp slt x, 0) | (icmp sgt x, n) --> icmp ugt x, n
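
A concrete instance of the first form (constant chosen for illustration):

    ; 0 <= x && x < 100, as two signed compares:
    %c1 = icmp sge i32 %x, 0
    %c2 = icmp slt i32 %x, 100
    %in = and i1 %c1, %c2
    ; as one unsigned compare -- negative x becomes a huge unsigned value:
    %in = icmp ult i32 %x, 100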

llvm-svn: 223224
2014-12-03 10:39:15 +00:00
Sonam Kumari
c72e2ad442 [signext.ll] Removal Of Duplicate Test Cases
Removed the duplicate test case existing in signext.ll file.

llvm-svn: 223109
2014-12-02 05:29:47 +00:00
Sonam Kumari
b1f24c8a5b Removed extra whitespace. (Testing commit access). NFC.
llvm-svn: 222994
2014-12-01 09:27:46 +00:00
David Majnemer
87a17a0975 InstCombine: FoldOrOfICmps harder
We may be in a situation where the icmps might not be near each other in
a tree of or instructions.  Try to dig out related compare instructions
and see if they combine.

N.B.  This won't fire on deep trees of compares because rewriting the
tree might end up creating a net increase of IR.  We may have to resort
to something more sophisticated if this is a real problem.

llvm-svn: 222928
2014-11-28 19:58:29 +00:00
Suyog Sarda
375e0af627 Use FileCheck instead of grep. Change by Ankur Garg.
Differential Revision: http://reviews.llvm.org/D6430

llvm-svn: 222879
2014-11-27 11:22:49 +00:00
Suyog Sarda
a17e6ed740 Use FileCheck instead of grep. Change by Sonam.
Differential Revision: http://reviews.llvm.org/D6432

llvm-svn: 222876
2014-11-27 10:57:24 +00:00
David Majnemer
9698d3487d InstCombine: Restore optimizations lost in r210006
This restores our ability to optimize:
(X & C) == 0 ? X ^ C : X  into  X | C
(X & C) != 0 ? X ^ C : X  into  X & ~C

llvm-svn: 222871
2014-11-27 07:25:21 +00:00
David Majnemer
d9ae958b9b InstSimplify: Restore optimizations lost in r210006
This restores our ability to optimize:
(X & C) ? X & ~C : X  into  X & ~C
(X & C) ? X : X & ~C  into  X
(X & C) ? X | C : X  into  X
(X & C) ? X : X | C  into  X | C

llvm-svn: 222868
2014-11-27 06:32:46 +00:00
David Majnemer
7e9b94486e Revert "Added inst combine transforms for single bit tests from Chris's note"
This reverts commit r210006; it miscompiled libapr, which is used in who
knows how many projects.

A test has been added to ensure that we don't regress again.

I'll work on a rewrite of what the optimization was trying to do later.

llvm-svn: 222856
2014-11-26 23:00:38 +00:00
Chandler Carruth
6264bc6537 [InstCombine] Change LLVM to canonicalize toward the value type being
stored rather than the pointer type.

This change is analogous to r220138 which changed the canonicalization
for loads. The rationale is the same: memory does not have a type,
operations (and thus the values they produce) have a type. We should
match that type as closely as possible rather than reading some form of
semantics into the pointer type.

With this change, loads and stores should no longer be made with
nonsensical types for the values that they load and store. This is
particularly important when trying to match specific loaded and stored
types in the process of doing other instcombines, which is what led me
down this twisty maze of miscanonicalization.

I've put quite some effort into looking through IR to find places where
LLVM's optimizer was being unreasonably conservative in the face of
mismatched load and store types, however it is possible (let's say,
likely!) I have missed some. If you see regressions here, or from
r220138, the likely cause is some part of LLVM failing to cope with load
and store types differing. Test cases appreciated, it is important that
we root all of these out of LLVM.

llvm-svn: 222748
2014-11-25 10:09:51 +00:00
Suyog Sarda
cbd1299080 Change the test case file to use FileCheck instead of grep. NFC.
Change by Ankur Garg.

Differential Revision: http://reviews.llvm.org/D6382

llvm-svn: 222740
2014-11-25 08:44:56 +00:00
Chandler Carruth
7feb19d89c Revert r220349 to reinstate r220277 with a fix for PR21330 -- quite
clearly, only exactly equal width ptrtoint and inttoptr casts are no-op
casts; it says so right there in the langref. Make the code agree.

Original log from r220277:
Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.

To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.

These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.

I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.

llvm-svn: 222739
2014-11-25 08:20:27 +00:00
Matt Arsenault
454e837bd2 Bug 21610: Canonicalize min/max fcmp selects to use ordered comparisons
llvm-svn: 222705
2014-11-24 23:15:18 +00:00
Matt Arsenault
ae1a2b7c22 Convert test to FileCheck and use CHECK-LABEL
llvm-svn: 222704
2014-11-24 23:03:17 +00:00
David Majnemer
291966cd3b InstCombine: Don't create an unused instruction
We would create an instruction but not insert it.
Leaving the unused instruction uninserted would lead us to a
verification failure.

This fixes PR21653.

llvm-svn: 222659
2014-11-24 16:41:13 +00:00
David Majnemer
2445c6caf5 InstCombine: Don't assume DataLayout is always available
We tried to get the result of DataLayout::getLargestLegalIntTypeSize but
we didn't have a DataLayout.  This resulted in opt crashing.

This fixes PR21651.

llvm-svn: 222645
2014-11-24 07:26:20 +00:00