llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00

Author	SHA1	Message	Date
Matthew Simpson	2374105880	Reapply commit r258404 with fix This patch is the second attempt to reapply commit r258404. There was a bug in the initial patch and subsequent fix (mentioned below). The initial patch caused an assertion because we were computing smaller type sizes for instructions that cannot be demoted. The fix first determines the instructions that will be demoted, and then applies the smaller type size to only those instructions. This should fix PR26239 and PR26307. llvm-svn: 258929	2016-01-27 13:43:27 +00:00
Chen Li	0516a9ad17	[IndVarSimplify] Rewrite loop exit values with their initial values from loop preheader Summary: This is a revised version of D13974, and the following quoted summary is from D13974 "This patch adds support to check if a loop has loop invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passing from loop preheader. We can therefore rewrite the exit value with its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch." D13974 was committed but failed one lnt test. The bug was that we only checked the condition from loop exit's incoming block was a loop invariant. But there could be another condition from loop header to that incoming block not being a loop invariant. This would produce miscompiled code. This patch fixes the issue by checking if the incoming block is loop header, and if not, don't perform the rewrite. This could be further improved by recursively checking all conditions leading to loop exit block, but I'd like to check in this simple version first and improve it with future patches. Reviewers: sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16570 llvm-svn: 258912	2016-01-27 07:40:41 +00:00
David Majnemer	be8a721ed2	Revert "Revert "[SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818)"" This reverts commit r258903 which reverted r255660. r258903 was an accidental commit and should not have been committed. llvm-svn: 258905	2016-01-27 02:59:41 +00:00
David Majnemer	4ee6f6446b	[SimplifyCFG] Don't mistake icmp of and for a tree of comparisons SimplifyCFG tries to turn complex branch conditions into a switch. Some of its logic attempts to reason about bitwise arithmetic produced by InstCombine. InstCombine can turn things like (X == 2) || (X == 3) into (X & ~1) == 2 and so SimplifyCFG tries to detect when this occurs so that it can produce a switch instruction. However, the legality checking was not sufficient to determine whether or not this had occurred. Correctly check this case by requiring that the right-hand side of the comparison be a power of two. This fixes PR26323. llvm-svn: 258904	2016-01-27 02:43:28 +00:00
David Majnemer	991bc63f1d	Revert "[SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818)" This reverts commit r255660. llvm-svn: 258903	2016-01-27 02:43:22 +00:00
Aditya Nandakumar	90ba1bcb73	Reassociate: Reprocess RedoInsts after each inst Previously the RedoInsts was processed at the end of the block. However it was possible that it left behind some instructions that were not canonicalized. This should guarantee that any previous instruction in the basic block is canonicalized before we process a new instruction. llvm-svn: 258830	2016-01-26 18:42:36 +00:00
Sanjay Patel	906306d436	[LibCallSimplifier] fold memset(malloc(x), 0, x) --> calloc(1, x) This is a step towards solving PR25892: https://llvm.org/bugs/show_bug.cgi?id=25892 It won't handle the reported case. As noted by the 'TODO' comments in the patch, we need to relax the hasOneUse() constraint and also match patterns that include memset_chk() and the llvm.memset() intrinsic in addition to memset(). Differential Revision: http://reviews.llvm.org/D16337 llvm-svn: 258816	2016-01-26 16:17:24 +00:00
Matthew Simpson	973e079b66	Revert "Reapply commit r258404 with fix" This commit exposes a crash in computeKnownBits on the Chromium buildbots. Reverting to investigate. Reference: https://llvm.org/bugs/show_bug.cgi?id=26307 llvm-svn: 258812	2016-01-26 15:45:49 +00:00
Haicheng Wu	5302d65f58	[LIR] Add support for structs and hand unrolled loops This is a recommit of r258620 which caused PR26293. The original message: Now LIR can turn the following code into memset: typedef struct foo { int a; int b; } foo_t; void bar(foo_t *f, unsigned n) { for (unsigned i = 0; i < n; ++i) { f[i].a = 0; f[i].b = 0; } } void test(int *f, unsigned n) { for (unsigned i = 0; i < n; i += 2) { f[i] = 0; f[i+1] = 0; } } llvm-svn: 258777	2016-01-26 02:27:47 +00:00
Dan Gohman	2bd89d3994	Followup to 258750; update more tests to use .p2align . llvm-svn: 258755	2016-01-26 00:35:07 +00:00
Evgeniy Stepanov	258db6665b	[cfi] Cross-DSO CFI diagnostic mode (LLVM part). * __cfi_check gets a 3rd argument: ubsan handler data * Instead of trapping on failure, call __cfi_check_fail which must be present in the module (generated in the frontend). llvm-svn: 258746	2016-01-25 23:35:03 +00:00
Lawrence Hu	baafd4c214	Enable loopreroll to reroll loops with a pointer induction variable. Example: while (buf != end) { S += buf[0]; S += buf[1]; buf += 2; }; Differential Revision: http://reviews.llvm.org/D13151 llvm-svn: 258709	2016-01-25 19:43:45 +00:00
Lawrence Hu	0572a631ee	Undo commit 258700 due to missing commit message llvm-svn: 258708	2016-01-25 19:36:30 +00:00
Matthew Simpson	d9e4b63bf8	Reapply commit r258404 with fix We were hitting an assertion because we were computing smaller type sizes for instructions that cannot be demoted. The fix first determines the instructions that will be demoted, and then applies the smaller type size to only those instructions. This should fix PR26239. llvm-svn: 258705	2016-01-25 19:24:29 +00:00
Quentin Colombet	06230e1d45	Speculatively revert r258620 as it is the likely culprit of PR26293. llvm-svn: 258703	2016-01-25 19:12:49 +00:00
Lawrence Hu	1cf7c9fba6	Differential Revision: http://reviews.llvm.org/D13151 llvm-svn: 258700	2016-01-25 18:53:39 +00:00
Igor Breger	66fa90c341	AVX1 : Enable vector masked_load/store to AVX1. Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q). Differential Revision: http://reviews.llvm.org/D16528 llvm-svn: 258675	2016-01-25 10:17:11 +00:00
Haicheng Wu	9d77533d54	[LIR] Add support for structs and hand unrolled loops Now LIR can turn the following code into memset: typedef struct foo { int a; int b; } foo_t; void bar(foo_t *f, unsigned n) { for (unsigned i = 0; i < n; ++i) { f[i].a = 0; f[i].b = 0; } } void test(int *f, unsigned n) { for (unsigned i = 0; i < n; i += 2) { f[i] = 0; f[i+1] = 0; } } llvm-svn: 258620	2016-01-23 06:52:41 +00:00
David Majnemer	f62478a34a	[PruneEH] Don't try to insert a terminator after another terminator LLVM's BasicBlock has a single terminator, it is not valid to have two. llvm-svn: 258616	2016-01-23 06:00:44 +00:00
Matt Arsenault	2d71f7b1ea	AMDGPU: Replace some deprecated intrinsic uses in tests llvm-svn: 258614	2016-01-23 05:42:49 +00:00
David Majnemer	7a3addc91c	[PruneEH] FuncletPads must not have undef operands Instead of RAUW with undef, replace the first non-token instruction with unreachable. This fixes PR26263. llvm-svn: 258611	2016-01-23 05:41:29 +00:00
Matt Arsenault	7a5e15697d	AMDGPU: Rename intrinsics to use amdgcn prefix The intrinsic target prefix should match the target name as it appears in the triple. This is not yet complete, but gets most of the important ones. llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled for compatibility for now. llvm-svn: 258557	2016-01-22 21:30:34 +00:00
Sergei Larin	7b219abac0	Make sure that any new and optimized objects created during GlobalOPT copy all the attributes from the base object. Summary: Make sure that any new and optimized objects created during GlobalOPT copy all the attributes from the base object. A good example of improper behavior in the current implementation is section information associated with the GlobalObject. If a section was set for it, and GlobalOpt is creating/modifying a new object based on this one (often copying the original name), without this change the new object will be placed in a default section, resulting in inappropriate properties of the new variable. The argument here is that if the customer specified a section for a variable, any changes the compiler makes to it should not change that section allocation. Moreover, any other properties worth representing in copyAttributesFrom() should also be propagated. Reviewers: jmolloy, joker-eph, joker.eph Subscribers: slarin, joker.eph, rafael, tobiasvk, llvm-commits Differential Revision: http://reviews.llvm.org/D16074 llvm-svn: 258556	2016-01-22 21:18:20 +00:00
Sanjoy Das	26d6272ad2	[PlaceSafepoints] Introduce a -spp-no-statepoints flag Summary: This change adds a `-spp-no-statepoints` flag to PlaceSafepoints that bypasses the code that wraps newly introduced polls and existing calls in gc.statepoint. With `-spp-no-statepoints` enabled, PlaceSafepoints effectively becomes a safepoint poll insertion pass. The eventual goal is to "constant fold" this option, along with `-rs4gc-use-deopt-bundles` to `true`, once clients using gc.statepoint are okay doing so. Reviewers: pgavlin, reames, JosephTremoulet Subscribers: sanjoy, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D16439 llvm-svn: 258551	2016-01-22 21:02:55 +00:00
David L Kreitzer	28ea778709	Fix for two constant propagation problems in GVN with the assume intrinsic instruction. Patch by Yuanrui Zhang. Differential Revision: http://reviews.llvm.org/D16100 llvm-svn: 258435	2016-01-21 21:32:35 +00:00
Sanjay Patel	1087b8fb2a	[LibCallSimplifier] don't get fooled by a fake fmin() This is similar to the bug/fix: https://llvm.org/bugs/show_bug.cgi?id=26211 http://reviews.llvm.org/rL258325 The fmin() test case reveals another bug caused by sloppy code duplication. It will crash without this patch because fp128 is a valid floating-point type, but we would think that we had matched a function that used doubles. The new helper function can be used to replace similar checks that are used in several other places in this file. llvm-svn: 258428	2016-01-21 20:19:54 +00:00
David Majnemer	4981d2326a	[InstCombine] Simplify (x >> y) <= x This commit extends the patterns recognised by InstSimplify to also handle (x >> y) <= x in the same way as (x /u y) <= x. The missing optimisation was found investigating why LLVM did not optimise away bound checks in a binary search: https://github.com/rust-lang/rust/pull/30917 Patch by Andrea Canciani! Differential Revision: http://reviews.llvm.org/D16402 llvm-svn: 258422	2016-01-21 18:55:54 +00:00
Rong Xu	6c08b3c582	[PGO] IR level instrumentation of indirect call value profiling This patch adds the instrumentation for indirect call value profiling. It finds all the indirect call-sites and generates instrprof_value_profile intrinsic calls. A new opt level option -disable-vp is introduced to disable this instrumentation. Reviewers: davidxl, betulb, vsk Differential Revision: http://reviews.llvm.org/D16016 llvm-svn: 258417	2016-01-21 18:11:44 +00:00
Matthew Simpson	d8f9568a4c	Revert "[SLP] Truncate expressions to minimum required bit width" This reverts commit r258404. llvm-svn: 258408	2016-01-21 17:17:20 +00:00
Vedant Kumar	28de1d0a47	[GCOV] Avoid emitting profile arcs for module and skeleton CUs Do not emit profile arc files and note files for module and skeleton CU's. Our users report seeing unexpected .gcda and .gcno files in their projects when using gcov-style profiling with modules or frameworks. The unwanted files come from these modules. This is not very helpful for end-users. Further, we've seen reports of instrumented programs crashing while writing these files out (due to I/O failures). rdar://problem/22838296 Reviewed-by: aprantl Differential Revision: http://reviews.llvm.org/D15997 llvm-svn: 258406	2016-01-21 17:04:42 +00:00
Matthew Simpson	14b16e7ee1	[SLP] Truncate expressions to minimum required bit width This change attempts to produce vectorized integer expressions in bit widths that are narrower than their scalar counterparts. The need for demotion arises especially on architectures in which the small integer types (e.g., i8 and i16) are not legal for scalar operations but can still be used in vectors. Like similar work done within the loop vectorizer, we rely on InstCombine to perform the actual type-shrinking. We use the DemandedBits analysis and ComputeNumSignBits from ValueTracking to determine the minimum required bit width of an expression. Differential revision: http://reviews.llvm.org/D15815 llvm-svn: 258404	2016-01-21 16:31:55 +00:00
Sanjoy Das	b0b3d4c99d	Add a "gc-transition" operand bundle Summary: This adds a new kind of operand bundle to LLVM denoted by the `"gc-transition"` tag. Inputs to `"gc-transition"` operand bundle are lowered into the "transition args" section of `gc.statepoint` by `RewriteStatepointsForGC`. This removes the last bit of functionality that was unsupported in the deopt bundle based code path in `RewriteStatepointsForGC`. Reviewers: pgavlin, JosephTremoulet, reames Subscribers: sanjoy, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D16342 llvm-svn: 258338	2016-01-20 19:50:25 +00:00
Sanjay Patel	3635b71b45	[LibCallSimplifier] don't get fooled by a fake sqrt() The test case will crash without this patch because the subsequent call to hasUnsafeAlgebra() assumes that the call instruction is an FPMathOperator (ie, returns an FP type). This part of the function signature check was omitted for the sqrt() case, but seems to be in place for all other transforms. Before: http://reviews.llvm.org/rL257400 ...we would have needlessly continued execution in optimizeSqrt(), but the bug was harmless because we'd eventually fail some other check and return without damage. This should fix: https://llvm.org/bugs/show_bug.cgi?id=26211 Differential Revision: http://reviews.llvm.org/D16198 llvm-svn: 258325	2016-01-20 17:41:14 +00:00
Joseph Tremoulet	de5c9a8723	[Inliner/WinEH] Honor implicit nounwinds Summary: Funclet EH tables require that a given funclet have only one unwind destination for exceptional exits. The verifier will therefore reject e.g. two cleanuprets with different unwind dests for the same cleanup, or two invokes exiting the same funclet but to different unwind dests. Because catchswitch has no 'nounwind' variant, and because IR producers are not required to annotate calls which will not unwind as 'nounwind', it is legal to nest a call or an "unwind to caller" catchswitch within a funclet pad that has an unwind destination other than caller; it is undefined behavior for such a call or catchswitch to unwind. Normally when inlining an invoke, calls in the inlined sequence are rewritten to invokes that unwind to the callsite invoke's unwind destination, and "unwind to caller" catchswitches in the inlined sequence are rewritten to unwind to the callsite invoke's unwind destination. However, if such a call or "unwind to caller" catchswitch is located in a callee funclet that has another exceptional exit with an unwind destination within the callee, applying the normal transformation would give that callee funclet multiple unwind destinations for its exceptional exits. There would be no way for EH table generation to determine which is the "true" exit, and the verifier would reject the function accordingly. Add logic to the inliner to detect these cases and leave such calls and "unwind to caller" catchswitches as calls and "unwind to caller" catchswitches in the inlined sequence. This fixes PR26147. Reviewers: rnk, andrew.w.kaylor, majnemer Subscribers: alexcrichton, llvm-commits Differential Revision: http://reviews.llvm.org/D16319 llvm-svn: 258273	2016-01-20 02:15:15 +00:00
Sanjay Patel	691c821001	add tests to show missing memset/malloc optimizations (PR25892) llvm-svn: 258218	2016-01-19 23:07:10 +00:00
Sanjoy Das	c6887c7e27	[SCEV] Fix PR26207 In some cases, the max backedge taken count can be more conservative than the exact backedge taken count (for instance, because ScalarEvolution::getRange is not control-flow sensitive whereas computeExitLimitFromICmp can be). In these cases, computeExitLimitFromCond (specifically the bit that deals with `and` and `or` instructions) can create an ExitLimit instance with a `SCEVCouldNotCompute` max backedge count expression, but a computable exact backedge count expression. This violates an implicit SCEV assumption: a computable exact BE count should imply a computable max BE count. This change - Makes the above implicit invariant explicit by adding an assert to ExitLimit's constructor - Changes `computeExitLimitFromCond` to be more robust around conservative max backedge counts llvm-svn: 258184	2016-01-19 20:53:51 +00:00
Sanjay Patel	a2ab3d6165	[LibCallSimplifier] use instruction-level fast-math-flags to shrink calls This is a continuation of adding FMF to call instructions: http://reviews.llvm.org/rL255555 llvm-svn: 258158	2016-01-19 18:38:52 +00:00
Sanjay Patel	a46637dede	[LibCallSimplifier] use instruction-level fast-math-flags to transform pow(x, [small integer]) calls This is a continuation of adding FMF to call instructions: http://reviews.llvm.org/rL255555 As with D15937, the intent of the patch is to preserve the current behavior of the transform except that we use the pow call's 'fast' attribute as a trigger rather than a function-level attribute. The TODO comment notes a potential follow-on patch that would propagate FMF to the new instructions. Differential Revision: http://reviews.llvm.org/D16122 llvm-svn: 258153	2016-01-19 18:15:12 +00:00
Sanjoy Das	aa011535f3	[IndVars] Fix PR25576 `LCSSASafePhiForRAUW` as computed was incorrect -- in cases like these (this exact example does not actually trigger the bug): define i32 @f(i32 %n, i1* %c) { entry: br label %outer.loop outer.loop: br label %inner.loop inner.loop: %iv = phi i32 [ 0, %outer.loop ], [ %iv.inc, %inner.loop ] %iv.inc = add nuw nsw i32 %iv, 1 %tc = udiv i32 %n, 13 %be.cond = icmp ult i32 %iv, %tc br i1 %be.cond, label %inner.loop, label %inner.exit inner.exit: %iv.lcssa = phi i32 [ %iv, %inner.loop ] %outer.be.cond = load volatile i1, i1* %c br i1 %outer.be.cond, label %outer.loop, label %leave leave: %iv.lcssa.lcssa = phi i32 [ %iv.lcssa, %inner.exit ] ret i32 %iv.lcssa.lcssa } `LCSSASafePhiForRAUW` is true for `%iv.lcssa` when re-rewriting the exit value of `%iv` for `%inner.loop` to `%tc` (this can happen due to `SCEVExpander::findExistingExpansion`), but the RAUW breaks LCSSA. To fix this, instead of computing `SafePhi` with special logic, decide the safety of RAUW directly via `replacementPreservesLCSSAForm`. llvm-svn: 258016	2016-01-17 18:12:52 +00:00
Artur Pilipenko	bb5abf9eb3	Push isDereferenceableAndAlignedPointer down into isSafeToLoadUnconditionally Reviewed By: reames Differential Revision: http://reviews.llvm.org/D16226 llvm-svn: 258010	2016-01-17 12:35:29 +00:00
Igor Laevsky	1f8ac9245d	[BasicAliasAnalysis] Take into account operand bundles in the getModRefInfo function Differential Revision: http://reviews.llvm.org/D16225 llvm-svn: 257991	2016-01-16 12:15:53 +00:00
Matthew Simpson	bcc32afd72	Reapply r257800 with fix The fix uniques the bundle of getelementptr indices we are about to vectorize since it's possible for the same index to be used by multiple instructions. The original commit message is below. [SLP] Vectorize the index computations of getelementptr instructions. This patch seeds the SLP vectorizer with getelementptr indices. The primary motivation in doing so is to vectorize gather-like idioms beginning with consecutive loads (e.g., g[a[0] - b[0]] + g[a[1] - b[1]] + ...). While these cases could be vectorized with a top-down phase, seeding the existing bottom-up phase with the index computations avoids the complexity, compile-time, and phase ordering issues associated with a full top-down pass. Only bundles of single-index getelementptrs with non-constant differences are considered for vectorization. llvm-svn: 257918	2016-01-15 18:51:51 +00:00
Silviu Baranga	777f975cab	Re-commit r257064, after it was reverted in r257340. This contains a fix for the issue that caused the revert: we no longer assume that we can insert instructions after the instruction that produces the base pointer. We previously assumed that this would be ok, because the instruction produces a value and therefore is not a terminator. This is false for invoke instructions. We will now insert these new instructions directly at the location of the users. Original commit message: [InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs Summary: When comparing two GEP instructions which have the same base pointer and one of them has a constant index, it is possible to only compare indices, transforming it to a compare with a constant. This removes one use for the GEP instruction with the constant index, can reduce register pressure and can sometimes lead to removing the comparison entirely. InstCombine was already doing this when comparing two GEPs if the base pointers were the same. However, in the case where we have complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to or from integers, etc) the value of the original base pointer will be hidden to the optimizer and this transformation will be disabled. This change detects when the two sides of the comparison can be expressed as GEPs with the same base pointer, even if they don't appear as such in the IR. The transformation will convert all the pointer arithmetic to arithmetic done on indices and all the relevant uses of GEPs to GEPs with a common base pointer. The GEP comparison will be converted to a comparison done on indices. Reviewers: majnemer, jmolloy Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits Differential Revision: http://reviews.llvm.org/D15146 llvm-svn: 257897	2016-01-15 15:52:05 +00:00
Matthew Simpson	676ccfcd0a	Revert "[SLP] Vectorize the index computations of getelementptr instructions." This reverts commit r257800. llvm-svn: 257888	2016-01-15 13:10:46 +00:00
James Molloy	7697faf6db	[InstCombine] Rewrite bswap/bitreverse handling completely. There are several requirements that ended up with this design; 1. Matching bitreversals is too heavyweight for InstCombine and doesn't really need to be done so early. 2. Bitreversals and byteswaps are very related in their matching logic. 3. We want to implement support for matching more advanced bswap/bitreverse patterns like partial bswaps/bitreverses. 4. Bswaps are best matched early in InstCombine. The result of these is that a new utility function is created in Transforms/Utils/Local.h that can be configured to search for bswaps, bitreverses or both. InstCombine uses it to find only bswaps, CGP uses it to find only bitreversals. We can then extend the matching logic in one place only. llvm-svn: 257875	2016-01-15 09:20:19 +00:00
Keno Fischer	927308d763	Reapply r257105 "[Verifier] Check that debug values have proper size" I originally reapplied this in 257550, but had to revert again due to bot breakage. The only change in this version is to allow either the TypeSize or the TypeAllocSize of the variable to be the one represented in debug info (hopefully in the future we can figure out how to encode the difference). Additionally, several bot failures following r257550 were due to optimizer bugs now fixed in r257787 and r257795. r257550 commit message was: ``` The following extra changes were made to test cases: Manually making the variable be the actual type instead of a pointer to avoid pointer-size differences in generic code: LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll LLVM :: DebugInfo/Generic/varargs.ll Delete sizing information from debug info for the same reason (but the presence of the pointer was important to the test case): LLVM :: DebugInfo/Generic/restrict.ll LLVM :: DebugInfo/Generic/tu-composite.ll LLVM :: Linker/type-unique-type-array-a.ll LLVM :: Linker/type-unique-simple2.ll Fixing an incorrect DW_OP_deref LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll Fixing a missing DW_OP_deref LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll Additionally, clang should no longer complain during bootstrap after r257534. The original commit message was: `` Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized one) is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declare size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref `` ``` llvm-svn: 257850	2016-01-15 00:46:17 +00:00
Matthew Simpson	b2378417a2	[SLP] Vectorize the index computations of getelementptr instructions. This patch seeds the SLP vectorizer with getelementptr indices. The primary motivation in doing so is to vectorize gather-like idioms beginning with consecutive loads (e.g., g[a[0] - b[0]] + g[a[1] - b[1]] + ...). While these cases could be vectorized with a top-down phase, seeding the existing bottom-up phase with the index computations avoids the complexity, compile-time, and phase ordering issues associated with a full top-down pass. Only bundles of single-index getelementptrs with non-constant differences are considered for vectorization. Differential Revision: http://reviews.llvm.org/D14829 llvm-svn: 257800	2016-01-14 20:46:27 +00:00
Keno Fischer	33808751c1	[SROA] Also insert a bit piece expression if only one piece is needed Summary: If SROA creates only one piece (e.g. because the other is not needed), it still needs to create a bit_piece expression if that bit piece is smaller than the original size of the alloca. Reviewers: aprantl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16187 llvm-svn: 257795	2016-01-14 20:06:34 +00:00
Keno Fischer	939b9c069c	[Utils] Fix incorrect dbg.declare store conversion Summary: The dbg.declare -> dbg.value conversion did not check which operand of the store instruction the alloca was passed to. As a result code that stored the address of an alloca, rather than storing to the alloca, would still trigger the conversion routine, leading to the insertion of an incorrect dbg.value intrinsic. Reviewers: aprantl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16169 llvm-svn: 257787	2016-01-14 19:12:27 +00:00
Akira Hatanaka	3525b3a30f	[Inliner] Merge the attributes of the caller and callee functions This patch turns off the fast-math optimization attribute on the caller if the callee's fast-math attribute is not turned on. For example, - before inlining caller: "less-precise-fpmad"="true" callee: "less-precise-fpmad"="false" - after inlining caller: "less-precise-fpmad"="false" Alternatively, it's possible to block inlining if the caller's and callee's attributes don't match. If this approach is preferable to the one in this patch, we can discuss post-commit. rdar://problem/19836465 Differential Revision: http://reviews.llvm.org/D7802 llvm-svn: 257575	2016-01-13 06:02:45 +00:00
Keno Fischer	97a5fb3666	Re-Revert r257105 (Verifier debug info changes) While I investigate some new buildbot failures. This was originally reapplied as r257550 and r257558. llvm-svn: 257563	2016-01-13 02:31:14 +00:00
Keno Fischer	d8825d5008	Reapply r257105 "[Verifier] Check that debug values have proper size" The following extra changes were made to test cases: Manually making the variable be the actual type instead of a pointer to avoid pointer-size differences in generic code: LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll LLVM :: DebugInfo/Generic/varargs.ll Delete sizing information from debug info for the same reason (but the presence of the pointer was important to the test case): LLVM :: DebugInfo/Generic/restrict.ll LLVM :: DebugInfo/Generic/tu-composite.ll LLVM :: Linker/type-unique-type-array-a.ll LLVM :: Linker/type-unique-simple2.ll Fixing an incorrect DW_OP_deref LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll Fixing a missing DW_OP_deref LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll Additionally, clang should no longer complain during bootstrap after r257534. The original commit message was: ``` Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized one) is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declare size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref ``` llvm-svn: 257550	2016-01-13 00:31:44 +00:00
Fiona Glaser	3959ead5a7	CannotBeOrderedLessThanZero: add some missing cases llvm-svn: 257542	2016-01-12 23:37:30 +00:00
Keno Fischer	a2e765d377	[Utils] Insert DW_OP_bit_piece when only describing part of the variable Summary: The dbg.declare -> dbg.value conversion looks through any zext/sext to find a value to describe the variable (in the expectation that those zext/sext instructions will go away later). However, those values do not cover the entire variable and thus need a DW_OP_bit_piece. Reviewers: aprantl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16061 llvm-svn: 257534	2016-01-12 22:46:09 +00:00
Sanjay Patel	489a46e98d	[LibCallSimplifier] use instruction-level fast-math-flags to transform pow(x, 0.5) calls Also, propagate the FMF to the newly created sqrt() call. llvm-svn: 257503	2016-01-12 19:06:35 +00:00
Teresa Johnson	772d13fff8	Fix bot failure from r257493: remove extraneous temp file read This was left from an earlier version of the test. llvm-svn: 257494	2016-01-12 17:53:59 +00:00
Teresa Johnson	10e78a41c7	[ThinLTO] Handle an external call from an import to an alias in dest The findExternalCalls routine ignores calls to functions already defined in the dest module. This was not handling the case where the definition in the current module is actually an alias to a function call. llvm-svn: 257493	2016-01-12 17:48:44 +00:00
Sanjay Patel	91e6a8ee15	[LibCallSimplifier] use instruction-level fast-math-flags to transform pow(exp(x)) calls See also: http://reviews.llvm.org/rL255555 http://reviews.llvm.org/rL256871 http://reviews.llvm.org/rL256964 http://reviews.llvm.org/rL257400 http://reviews.llvm.org/rL257404 http://reviews.llvm.org/rL257414 llvm-svn: 257491	2016-01-12 17:30:37 +00:00
Sanjay Patel	930be29a19	consolidate exp/exp2 tests The transform is identical, so keep the tests together and save some overhead. llvm-svn: 257484	2016-01-12 17:00:38 +00:00
Sanjay Patel	c16495bc10	Add/edit tests to include instruction-level FMF on calls Preparatory patch before changing LibCallSimplifier to use the FMF. Also, tighten the CHECK lines and give the tests more meaningful names. Similar changes to: http://reviews.llvm.org/rL257414 llvm-svn: 257481	2016-01-12 16:50:17 +00:00
Teresa Johnson	d8e00ee37a	[IRMover] Don't copy personality, etc unless creating def Function::copyAttributesFrom will copy the personality function, prefix data and prolog data from the source function to the new function, and is invoked when the IRMover copies the function prototype. This puts a reference to a constant in the source module on a function in the dest module, which causes an error when deleting the source module after importing, since the personality function in the source module still has uses (this would presumably also be an issue for the prologue and prefix data). Remove the copies added to the dest copy when creating the new prototype, as they are mapped properly when/if we link the function body. llvm-svn: 257420	2016-01-12 00:24:24 +00:00
Sanjay Patel	42e7daf81c	[LibCallSimplifier] use instruction-level fast-math-flags to transform log calls Also, add tests to verify that we're checking 'fast' on both calls of each transform pair, tighten the CHECK lines, and give the tests more meaningful names. This is a continuation of: http://reviews.llvm.org/rL255555 http://reviews.llvm.org/rL256871 http://reviews.llvm.org/rL256964 http://reviews.llvm.org/rL257400 http://reviews.llvm.org/rL257404 llvm-svn: 257414	2016-01-11 23:31:48 +00:00
Sanjay Patel	dfd0791d6d	[LibCallSimplifier] don't allow sqrt transform unless all ops are unsafe Fix the FIXME added with: http://reviews.llvm.org/rL257400 llvm-svn: 257404	2016-01-11 22:50:36 +00:00
Justin Bogner	98deb31a78	LoopUnroll: Use the optsize threshold for minsize as well Currently we're unrolling loops more in minsize than in optsize, which means -Oz will have a larger code size than -Os. That doesn't make any sense. This resolves the FIXME about this in LoopUnrollPass and extends the optsize test to make sure we use the smaller threshold for minsize as well. llvm-svn: 257402	2016-01-11 22:39:43 +00:00
Sanjay Patel	9ac7e74796	[LibCallSimplifier] use instruction-level fast-math-flags to transform sqrt calls This is a continuation of adding FMF to call instructions: http://reviews.llvm.org/rL255555 The intent of the patch is to preserve the current behavior of the transform except that we use the sqrt instruction's 'fast' attribute as a trigger rather than the function-level attribute. But this raises a bug noted by the new FIXME comment. In order to do this transform: sqrt((x * x) * y) ---> fabs(x) * sqrt(y) ...we need all of the sqrt, the first fmul, and the second fmul to be 'fast'. If any of those ops is strict, we should bail out. Differential Revision: http://reviews.llvm.org/D15937 llvm-svn: 257400	2016-01-11 22:34:19 +00:00
Silviu Baranga	90360019af	Revert r257164 - it has caused spec2k6 failures in LTO mode llvm-svn: 257340	2016-01-11 16:19:38 +00:00
David Majnemer	cc00affa03	Add test for r257279. llvm-svn: 257280	2016-01-10 07:13:33 +00:00
Chen Li	e4ebcc71ab	[SimplifyCFG] Extend SimplifyResume to handle phi of trivial landing pad. Summary: This is a fix of D13718. D13718 was committed but then reverted because of the following bug: https://llvm.org/bugs/show_bug.cgi?id=25299 This patch fixes the issue shown in the bug. Reviewers: majnemer, reames Subscribers: jevinskie, llvm-commits Differential Revision: http://reviews.llvm.org/D14308 llvm-svn: 257277	2016-01-10 05:48:01 +00:00
Manuel Jacob	3215e2f49f	[RS4GC] Update and simplify handling of Constants in findBaseDefiningValueOfVector(). Summary: This is analogous to r256079, which removed an overly strong assertion, and r256812, which simplified the code by replacing three conditionals by one. Reviewers: reames Subscribers: sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D16019 llvm-svn: 257250	2016-01-09 04:02:16 +00:00
Philip Reames	6204508de8	[rs4gc] Optionally directly relocate vector of pointers This patch teaches rewrite-statepoints-for-gc to relocate vector-of-pointers directly rather than trying to split them. This builds on the recent lowering/IR changes to allow vector typed gc.relocates. The motivation for this is that we recently found a bug in the vector splitting code where depending on visit order, a vector might not be relocated at some safepoint. Specifically, the bug is that the splitting code wasn't updating the side tables (live vector) of other safepoints. As a result, a vector which was live at two safepoints might not be updated at one of them. However, if you happened to visit safepoints in post order over the dominator tree, everything worked correctly. Weirdly, it turns out that post order is actually an incredibly common order in which to visit instructions in practice. Frustratingly, I have not managed to write a test case which actually hits this. I can only reproduce it in large IR files produced by actual applications. Rather than continue to make this code more complicated, we can remove all of the complexity by just representing the relocation of the entire vector natively in the IR. At the moment, the new functionality is hidden behind a flag. To use this code, you need to pass "-rs4gc-split-vector-values=0". Once I have a chance to stress test with this option and get feedback from other users, my plan is to flip the default and remove the original splitting code. I would just remove it now, but given the rareness of the bug, I figured it was better to leave it in place until the new approach has been stress tested. Differential Revision: http://reviews.llvm.org/D15982 llvm-svn: 257244	2016-01-09 01:31:13 +00:00
Haicheng Wu	d23d8b325b	[JumpThreading] Split select that has constant conditions coming from the PHI node Look for PHI/Select in the same BB of the form bb: %p = phi [false, %bb1], [true, %bb2], [false, %bb3], [true, %bb4], ... %s = select p, trueval, falseval And expand the select into a branch structure. This later enables jump-threading over bb in this pass. Using the similar approach of SimplifyCFG::FoldCondBranchOnPHI(), unfold select if the associated PHI has at least one constant. If the unfolded select is not jump-threaded, it will be folded again in the later optimizations. llvm-svn: 257198	2016-01-08 19:39:39 +00:00
Justin Bogner	879f86bb78	LoopInfo: Simplify ownership of Loop objects It's strange that LoopInfo mostly owns the Loop objects, but that it defers deleting them to the loop pass manager. Instead, change the oddly named "updateUnloop" to "markAsRemoved" and have it queue the Loop object for deletion. We can't delete the Loop immediately when we remove it, since we need its pointer identity still, so we'll mark the object as "invalid" so that clients can see what's going on. llvm-svn: 257191	2016-01-08 19:08:53 +00:00
Teresa Johnson	ce9de68594	[ThinLTO] Delay metadata materialization in function importer The function importer was still materializing metadata when modules were loaded for function importing. We only want to materialize it when we are going to invoke the metadata linking postpass. Materializing it before function importing is not only unnecessary, but also causes metadata referenced by imported functions to be mapped in early, and then not connected to the rest of the module level metadata when it is ultimately linked in. Augmented the test case to specifically check for the metadata being properly connected, which it wasn't before this fix. llvm-svn: 257171	2016-01-08 14:17:41 +00:00
Silviu Baranga	93f7373429	Re-commit r257064, this time with a fixed assert In setInsertionPoint if the value is not a PHI, Instruction or Argument it should be a Constant, not a ConstantExpr. Original commit message: [InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs Summary: When comparing two GEP instructions which have the same base pointer and one of them has a constant index, it is possible to only compare indices, transforming it to a compare with a constant. This removes one use for the GEP instruction with the constant index, can reduce register pressure and can sometimes lead to removing the comparison entirely. InstCombine was already doing this when comparing two GEPs if the base pointers were the same. However, in the case where we have complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to or from integers, etc) the value of the original base pointer will be hidden to the optimizer and this transformation will be disabled. This change detects when the two sides of the comparison can be expressed as GEPs with the same base pointer, even if they don't appear as such in the IR. The transformation will convert all the pointer arithmetic to arithmetic done on indices and all the relevant uses of GEPs to GEPs with a common base pointer. The GEP comparison will be converted to a comparison done on indices. Reviewers: majnemer, jmolloy Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits Differential Revision: http://reviews.llvm.org/D15146 llvm-svn: 257164	2016-01-08 11:11:04 +00:00
Chandler Carruth	1b5532dd29	[attrs] Split the late-revisit pattern for deducing norecurse in a top-down manner into a true top-down or RPO pass over the call graph. There are specific patterns of function attributes, notably the norecurse attribute, which are most effectively propagated top-down because all they use is caller information. Walk in RPO over the call graph SCCs takes the form of a module pass run immediately after the CGSCC pass manager's postorder walk of the SCCs, trying again to deduce norecurse for each singular SCC in the call graph. This removes a very legacy pass manager specific trick of using a lazy revisit list traversed during finalization of the CGSCC pass. There is no analogous finalization step in the new pass manager, and a lazy revisit list is just trying to produce an RPO iteration of the call graph. We can do that more directly if more expensively. It seems unlikely that this will be the expensive part of any compilation though as we never examine the function bodies here. Even in an LTO run over a very large module, this should be a reasonably fast set of operations over a reasonably small working set -- the function call graph itself. In the future, if this really is a compile time performance issue, we can look at building support for both post order and RPO traversals directly into a pass manager that builds and maintains the PO list of SCCs. Differential Revision: http://reviews.llvm.org/D15785 llvm-svn: 257163	2016-01-08 10:55:52 +00:00
Sanjay Patel	898b29bc66	[InstCombine] insert a new shuffle in a safe place (PR25999) Limit this transform to a basic block and guard against PHIs. Hopefully, this fixes the remaining failures in PR25999: https://llvm.org/bugs/show_bug.cgi?id=25999 llvm-svn: 257133	2016-01-08 01:39:16 +00:00
Aditya Nandakumar	f2609534e3	Instructions to be redone only if from the same BB While adding instructions(possible roots) to be redone, make sure they are from the same basic block. llvm-svn: 257112	2016-01-07 23:22:55 +00:00
Keno Fischer	37415bceb0	Temporarily revert r257105 "[Verifier] Check that debug values have proper size" Looks like there's a case where clang generates debug info that triggers the new verifier check. Reverting while investigating. llvm-svn: 257107	2016-01-07 22:39:11 +00:00
Keno Fischer	c41229c60a	[Verifier] Check that debug values have proper size Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized one) is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declare size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref Reviewers: aprantl Differential Revision: http://reviews.llvm.org/D14276 llvm-svn: 257105	2016-01-07 22:18:37 +00:00
David Majnemer	2296864d81	[SCCP] Don't violate the lattice invariants We marked values which are 'undef' as constant instead of undefined which violates SCCP's invariants. If we can figure out that a computation results in 'undef', leave it in the undefined state. This fixes PR16052. llvm-svn: 257102	2016-01-07 21:36:16 +00:00
David Majnemer	bda025cd91	Add test for r256912 I forgot to add this with the rest of r256912. llvm-svn: 257088	2016-01-07 19:27:16 +00:00
David Majnemer	dce0498b16	[SCCP] Can't go from overdefined to constant The fix for PR23999 made us mark loads of null as producing the constant undef which upsets the lattice. Instead, keep the load as "undefined". This fixes PR26044. llvm-svn: 257087	2016-01-07 19:25:39 +00:00
Silviu Baranga	aa39f9d643	Revert r257064. It caused failures in some sanitizer tests. llvm-svn: 257069	2016-01-07 15:46:43 +00:00
Silviu Baranga	b0c35664c0	[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs Summary: When comparing two GEP instructions which have the same base pointer and one of them has a constant index, it is possible to only compare indices, transforming it to a compare with a constant. This removes one use for the GEP instruction with the constant index, can reduce register pressure and can sometimes lead to removing the comparison entirely. InstCombine was already doing this when comparing two GEPs if the base pointers were the same. However, in the case where we have complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to or from integers, etc) the value of the original base pointer will be hidden to the optimizer and this transformation will be disabled. This change detects when the two sides of the comparison can be expressed as GEPs with the same base pointer, even if they don't appear as such in the IR. The transformation will convert all the pointer arithmetic to arithmetic done on indices and all the relevant uses of GEPs to GEPs with a common base pointer. The GEP comparison will be converted to a comparison done on indices. Reviewers: majnemer, jmolloy Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits Differential Revision: http://reviews.llvm.org/D15146 llvm-svn: 257064	2016-01-07 14:56:08 +00:00
Mehdi Amini	e9f8479a85	Fix PR26051: Memcpy optimization should introduce a call to memcpy before the store destination position This is a conservative fix, I expect Amaury to relax this. Follow-up for r256923 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 256999	2016-01-06 23:50:22 +00:00
Chen Li	98023a7f09	[SplitLandingPadPredecessors] Create a PHINode for the original landingpad only if it has some uses Summary: This patch adds a check in SplitLandingPadPredecessors to see if the original landingpad instruction has any uses. If not, we don't need to create a PHINode for it in the joint block since it is going to be dead code anyway. The motivation for this patch is that we found a bug where SplitLandingPadPredecessors created a PHINode of token type landingpad, which failed the verifier since a PHINode cannot be of token type. However, the created PHINode will never be used in our code pattern. This patch works around this bug, and we might add support in SplitLandingPadPredecessors to handle token type landingpad with uses in the future. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15835 llvm-svn: 256972	2016-01-06 20:32:05 +00:00
Amaury Sechet	b940cf08ab	Promote aggregate store to memset when possible Summary: As per title. This will allow the optimizer to pick up on it. Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15923 llvm-svn: 256969	2016-01-06 19:47:24 +00:00
Sanjay Patel	20d1d5e75f	[LibCallSimplifier] use instruction-level fast-math-flags for tan/atan transform llvm-svn: 256964	2016-01-06 19:23:35 +00:00
Amaury Sechet	cafefe3116	Improve load/store to memcpy for aggregate Summary: It turns out that if we don't try to do it at the store location, we can do it before any operation that aliases the load, as long as no operation aliases the store. Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15903 llvm-svn: 256923	2016-01-06 09:30:39 +00:00
Philip Reames	780b59a41c	[BasicAA] Remove special casing of memset_pattern16 in favor of generic attribute inference Most of the properties of memset_pattern16 can be now covered by the generic attributes and inferred by InferFunctionAttrs. The only exceptions are: - We don't yet have a writeonly attribute for the first argument. - We don't have an attribute for modeling the access size facts encoded in MemoryLocation.cpp. Differential Revision: http://reviews.llvm.org/D15879 llvm-svn: 256911	2016-01-06 04:53:16 +00:00
Manuel Jacob	2e54a66b93	[Statepoints] Check for the "gc-leaf-function" attribute on call sites as well. Reviewers: sanjoy, reames Subscribers: sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D15900 llvm-svn: 256875	2016-01-05 23:59:08 +00:00
Sanjay Patel	2273c0c2a2	[LibCallSimplifier] use instruction-level fast-math-flags for fmin/fmax transforms llvm-svn: 256871	2016-01-05 20:46:19 +00:00
Amaury Sechet	f8a4963955	Implement load to store => memcpy in MemCpyOpt for aggregates Summary: Most of the tool chain is able to optimize scalar and memcpy-like operations efficiently, while it isn't that good with aggregates. In order to improve the support of aggregates, we try to change aggregate manipulation into either scalar or memcpy-like ones whenever possible without losing information. This is one such opportunity. Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15894 llvm-svn: 256868	2016-01-05 20:17:48 +00:00
Manuel Jacob	bf62c3f251	Correct my last commit (revision 256860). I forgot to save a small wording improvement before committing. llvm-svn: 256862	2016-01-05 19:45:54 +00:00
Manuel Jacob	68b31ef787	[PlaceSafepoints] Add a test. Calls of functions with the "gc-leaf-function" attribute shouldn't be turned into a safepoint. llvm-svn: 256860	2016-01-05 19:40:58 +00:00
Sanjay Patel	2e13cb5de7	[InstCombine] insert a new shuffle before its uses (PR26015) Although this solves the test case in PR26015: https://llvm.org/bugs/show_bug.cgi?id=26015 And may solve PR25999: https://llvm.org/bugs/show_bug.cgi?id=25999 ...I suspect this is not the best solution. I think we want to insert the new shuffle just ahead of the earliest ExtractElementInst that we're replacing, but I don't know how that should be implemented. Differential Revision: http://reviews.llvm.org/D15878 llvm-svn: 256857	2016-01-05 19:09:47 +00:00
David Majnemer	90b554b54f	[SimplifyCFG] Further improve our ability to remove redundant catchpads In r256814, we managed to remove catchpads which were trivially redundant because they were the same SSA value. We can do better using the same algorithm but with a smarter data structure by hashing the SSA values within the catchpad and comparing them structurally. llvm-svn: 256815	2016-01-05 07:42:17 +00:00
David Majnemer	ddc4b71886	[SimplifyCFG] Remove redundant catchpads Remove duplicate catchpad handlers from a catchswitch. llvm-svn: 256814	2016-01-05 06:27:50 +00:00
Joseph Tremoulet	bff6334639	[WinEH] Simplify unreachable catchpads Summary: At least for CoreCLR, a catchpad which immediately executes an `unreachable` instruction indicates that the exception can never have a matching type, and so such catchpads can be removed, and so can their catchswitches if the catchswitch becomes empty. Reviewers: rnk, andrew.w.kaylor, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15846 llvm-svn: 256809	2016-01-05 02:37:41 +00:00
Chen Li	10e521338c	[InstructionCombining] prepareICWorklistFromFunction halts in infinite loop with instructions of token type Summary: This patch fixes a bug in prepareICWorklistFromFunction, where the loop becomes infinite with instructions of token type. The patch checks if the instruction is token type, and if so it updates EndInst with the current instruction. Reviewers: reames, majnemer Subscribers: llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D15859 llvm-svn: 256792	2016-01-04 23:28:57 +00:00
David Majnemer	ff1af8bac5	[LICM] Fix a small oversight introduced in r256763 r256763 had promoteLoopAccessesToScalars check for the existence of a catchswitch when the exit blocks were populated but promoteLoopAccessesToScalars may be called with a prepopulated set of exit blocks which would also need to be checked. This fixes PR26019. llvm-svn: 256788	2016-01-04 23:16:22 +00:00
Philip Reames	a43feccb31	[MemoryBuiltins] Remove isOperatorNewLike by consolidating non-null inference handling This patch removes the isOperatorNewLike predicate since it was only being used to establish a non-null return value and we have attributes specifically for that purpose with generic handling. To keep approximately the same behaviour for existing frontends, I added the various operator new like functions (i.e. instances of operator new) to InferFunctionAttrs. It's not really clear to me why this isn't handled in Clang, but I didn't want to break existing code and any subtle assumptions it might have. Once this patch is in, I'm going to start separating the isAllocLike family of predicates. These appear to be used for a mixture of things which should be more clearly separated and documented. Today, they're being used to indicate (at least) aliasing facts, CSE-ability, and default values from an allocation site. Differential Revision: http://reviews.llvm.org/D15820 llvm-svn: 256787	2016-01-04 22:49:23 +00:00
Aditya Nandakumar	8b72ea100b	Remove dead instructions before Redoing Before reevaluating instructions, iterate over all instructions to be reevaluated and remove the trivially dead ones; if any of an instruction's operands becomes trivially dead, mark it for deletion as well, until all trivially dead instructions have been removed. llvm-svn: 256773	2016-01-04 19:48:14 +00:00
David Majnemer	8c5d1fc2f6	[LICM] Don't insert instructions after a catchswitch when performing loop promotion Inserting after a catchswitch results in verifier errors, bail out on promotion if a catchswitch is a loop exit. llvm-svn: 256763	2016-01-04 17:42:19 +00:00
David Majnemer	403ae568aa	[LICM] Make instruction sinking funclet-aware We had two bugs here: - We might try to sink into a catchswitch, causing verifier failures. - We will succeed in sinking into a cleanuppad but we didn't update the funclet operand bundle. This fixes PR26000. llvm-svn: 256728	2016-01-04 03:37:39 +00:00
Dimitry Andric	0614f2a55e	Fix several accidental DOS line endings in source files Summary: There are a number of files in the tree which have been accidentally checked in with DOS line endings. Convert these to native line endings. There are also a few files which have DOS line endings on purpose, and I have set the svn:eol-style property to 'CRLF' on those. Reviewers: joerg, aaron.ballman Subscribers: aaron.ballman, sanjoy, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D15848 llvm-svn: 256707	2016-01-03 17:22:03 +00:00
Sanjay Patel	2d3c7242d3	[LibCallSimplifier] propagate FMF when shrinking binary calls llvm-svn: 256682	2015-12-31 23:40:59 +00:00
Sanjay Patel	9333af147c	[LibCallSimplifier] propagate FMF when shrinking unary calls llvm-svn: 256679	2015-12-31 21:52:31 +00:00
Sanjay Patel	bc5190f0cb	change function names to avoid accidentally matching the substring llvm-svn: 256678	2015-12-31 21:25:25 +00:00
Sanjay Patel	3ea18b95b7	add 'fast' attribute to calls to show that the flag isn't being propagated llvm-svn: 256677	2015-12-31 21:12:19 +00:00
Geoff Berry	6eaa03403d	[JumpThreading] Fix opcode bonus in getJumpThreadDuplicationCost() The code that was meant to adjust the duplication cost based on the terminator opcode was not being executed in cases where the initial threshold was hit inside the loop. Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D15536 llvm-svn: 256568	2015-12-29 18:10:16 +00:00
Manuel Jacob	b1de405597	[RS4GC] Fix rematerialization of bitcast of bitcast. Summary: Previously, only the outer (last) bitcast was rematerialized, resulting in a use of the unrelocated inner (first) bitcast after the statepoint. See the test case for an example. Reviewers: igor-laevsky, reames Subscribers: reames, alex, llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D15789 llvm-svn: 256520	2015-12-28 20:14:05 +00:00
Chandler Carruth	8beb86a806	[attrs] Extract the pure inference of function attributes into a standalone pass. There is no call graph or even interesting analysis for this part of function attributes -- it is literally inferring attributes based on the target library identification. As such, we can do it using a much simpler module pass that just walks the declarations. This can also happen much earlier in the pass pipeline which has benefits for any number of other passes. In the process, I've cleaned up one particular aspect of the logic which was necessary in order to separate the two passes cleanly. It now counts inferred attributes independently rather than just counting all the inferred attributes as one, and the counts are more clearly explained. The two test cases we had for this code path are both ... woefully inadequate and copies of each other. I've kept the superset test and updated it. We need more testing here, but I had to pick somewhere to stop fixing everything broken I saw here. Differential Revision: http://reviews.llvm.org/D15676 llvm-svn: 256466	2015-12-27 08:41:34 +00:00
Chandler Carruth	cf6f5436f5	[attrs] Split off the forced attributes utility into its own pass that is (by default) run much earlier than FunctionAttrs proper. This allows forcing optnone or other widely impactful attributes. It is also a bit simpler as the force attribute behavior needs no specific iteration order. I've added the pass into the default module pass pipeline and LTO pass pipeline which mirrors where function attrs itself was being run. Differential Revision: http://reviews.llvm.org/D15668 llvm-svn: 256465	2015-12-27 08:13:45 +00:00
Benjamin Kramer	f19aafee12	Fix safepoint intrinsic signatures in test. Should bring back the bots after r256443. llvm-svn: 256450	2015-12-26 11:40:48 +00:00
Chen Li	c60ad3e1fe	[gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types prevents LLVM from merging different gc.statepoint nodes into PHI nodes, which would cause further problems with gc relocations. The patch also changes how gc.relocate and gc.result look for their corresponding gc.statepoint on the unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint. Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob Subscribers: reames, mjacob, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D15662 llvm-svn: 256443	2015-12-26 07:54:32 +00:00
Sanjay Patel	b4b4a9aeb1	[InstCombine] transform more extract/insert pairs into shuffles (PR2109) This is an extension of the shuffle combining from r203229: http://reviews.llvm.org/rL203229 The idea is to widen a short input vector with undef elements so the existing shuffle transform for extract/insert can kick in. The motivation is to finally solve PR2109: https://llvm.org/bugs/show_bug.cgi?id=2109 For that example, the IR becomes: %1 = bitcast <2 x i32>* %P to <2 x float>* %ld1 = load <2 x float>, <2 x float>* %1, align 8 %2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef> %i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5> ret <4 x float> %i2 And x86 SSE output improves from: movq (%rdi), %xmm1 ## xmm1 = mem[0],zero movdqa %xmm1, %xmm2 shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3] shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0] shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2] shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0] shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0] retq To the almost optimal: movhpd (%rdi), %xmm0 Note: There's a tension in the existing transform related to generating arbitrary shufflevector masks. We avoid that in other places in InstCombine because we're scared that codegen can't handle strange masks, but it looks like we're ok with producing those here. I purposely chose weird insert/extract indexes for the regression tests to see the effect in these cases. For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or better for these examples. Differential Revision: http://reviews.llvm.org/D15096 llvm-svn: 256394	2015-12-24 21:17:56 +00:00
David Majnemer	86245dd102	[OperandBundles] Have TailCallElim play nice with operand bundles A call site's use of a Value might not correspond to an argument operand but to a bundle operand. This fixes PR25928. llvm-svn: 256328	2015-12-23 09:58:43 +00:00
David Majnemer	d697901638	[OperandBundles] Have InstCombine play nice with operand bundles Don't assume a call's use corresponds to an argument operand, it might correspond to a bundle operand. llvm-svn: 256327	2015-12-23 09:58:41 +00:00
David Majnemer	22cb4eb850	[OperandBundles] Have DeadArgElim play nice with operand bundles A call site's use of a Value might not correspond to an argument operand but to a bundle operand. llvm-svn: 256326	2015-12-23 09:58:36 +00:00
Manuel Jacob	5f0ac433d3	[RS4GC] Fix base pair printing for constants. Previously, "%" + name of the value was printed for each derived and base pointer. This is correct for instructions, but wrong for e.g. globals. llvm-svn: 256305	2015-12-23 00:19:45 +00:00
Rafael Espindola	067bfb9e99	Also add unnamed_addr to functions. llvm-svn: 256281	2015-12-22 20:43:30 +00:00
Rafael Espindola	f795707790	Delete dead GlobalAliases. llvm-svn: 256276	2015-12-22 19:50:22 +00:00
Cong Hou	50c405416c	[BPI] Replace weights by probabilities in BPI. This patch removes all weight-related interfaces from BPI and replaces them with probability versions. With this patch, we won't use edge weights anymore in either IR or MC passes. Edge probability is a better representation in terms of CFG update and validation. Differential revision: http://reviews.llvm.org/D15519 llvm-svn: 256263	2015-12-22 18:56:14 +00:00
Manuel Jacob	3a4569b878	Remove deprecated llvm.experimental.gc.result.{int,float,ptr} intrinsics. Summary: These were deprecated 11 months ago when a generic llvm.experimental.gc.result intrinsic, which works for all types, was added. Reviewers: sanjoy, reames Subscribers: sanjoy, chenli, llvm-commits Differential Revision: http://reviews.llvm.org/D15719 llvm-svn: 256262	2015-12-22 18:44:45 +00:00
Manuel Jacob	b94cce35d1	[RS4GC] Fix crash in the case that a live variable has a constant base. Summary: Previously, RS4GC crashed in CreateGCRelocates() because it assumed that every base is also in the array of live variables, which isn't true if a live variable has a constant base. This change fixes the crash by making sure CreateGCRelocates() won't try to relocate a live variable with a constant base. This would be unnecessary anyway because anything with a constant base won't move. Reviewers: reames Subscribers: llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D15556 llvm-svn: 256252	2015-12-22 16:50:44 +00:00
Easwaran Raman	66e5fa28c2	Determine callee's hotness and adjust threshold based on that. NFC. This uses the same criteria used in CFE's CodeGenPGO to identify hot and cold callees and uses values of inlinehint-threshold and inlinecold-threshold respectively as the thresholds for such callees. Differential Revision: http://reviews.llvm.org/D15245 llvm-svn: 256222	2015-12-22 00:32:35 +00:00
Evgeniy Stepanov	4c984e7582	[safestack] Add option for non-TLS unsafe stack pointer. This patch adds an option, -safe-stack-no-tls, for using normal storage instead of thread-local storage for the unsafe stack pointer. This can be useful when SafeStack is applied to an operating system kernel. http://reviews.llvm.org/D15673 Patch by Michael LeMay. llvm-svn: 256221	2015-12-22 00:13:11 +00:00
Evgeniy Stepanov	7ed9f33690	[cfi] Fix LowerBitSets on 32-bit targets. This code attempts to truncate IntPtrTy to i32, which may be the same type. llvm-svn: 256205	2015-12-21 22:14:04 +00:00
Sanjoy Das	d080ee893d	Nonnull elements in OperandBundleCallSites are not all Instructions `CloneAndPruneIntoFromInst` sometimes RAUW's dead instructions with `undef` before erasing them (to avoid deleting instructions that still have uses). This changes the `WeakVH` in `OperandBundleCallSites` to hold an `undef`, and we need to guard for this situation in eventuality in `llvm::InlineFunction`. llvm-svn: 256110	2015-12-19 22:40:28 +00:00
Sanjoy Das	378d350613	[Deopt bundles] Fix a test case The `CHECK-NOT` line was incorrect, and would not have caught a breakage. llvm-svn: 256109	2015-12-19 22:40:22 +00:00
Manuel Jacob	bc032094a5	Remove double blanks. NFC. llvm-svn: 256100	2015-12-19 18:26:53 +00:00
Philip Reames	95ac2d26ea	[RS4GC] Remove an overly strong assertion As shown by the included test case, it's reasonable to end up with constant references during base pointer calculation. The code actually handled this case just fine; we only had the assert to help isolate problems under the belief that constant references shouldn't be present in IR generated by managed frontends. This turned out to be wrong on two fronts: 1) Manuel Jacob is working on a language with constant references, and 2) we found a case where the optimizer does create them in practice. llvm-svn: 256079	2015-12-19 02:38:22 +00:00
Keno Fischer	53320e0722	Clean up the processing of dbg.value in various places Summary: First up is instcombine, where in the dbg.declare -> dbg.value conversion, the llvm.dbg.value needs to be called on the actual loaded value, rather than the address (since the whole point of this transformation is to be able to get rid of the alloca). Further, now that that's cleaned up, we can remove a hack in the backend, that would add an implicit OP_deref if the argument to dbg.value was an alloca. This stems from before the existence of DIExpression and is no longer necessary since the deref can be expressed explicitly. Now, in order to make sure that the tests pass with this change, we need to correct the printing of DEBUG_VALUE comments to take into account the expression, which wasn't taken into account before. Unfortunately, for both these changes, there were a number of incorrect test cases (mostly the wrong number of DW_OP_derefs, but also a couple where the test itself was broken more badly). aprantl and I have gone through and adjusted these test case in order to make them pass with these fixes and in some cases to make sure they're actually testing what they are meant to test. Reviewers: aprantl Subscribers: dsanders Differential Revision: http://reviews.llvm.org/D14186 llvm-svn: 256077	2015-12-19 02:02:44 +00:00
Matt Arsenault	db9fbb2b79	AMDGPU: Switch barrier intrinsics to using convergent noduplicate prevents unrolling of small loops that happen to have barriers in them. If a loop has a barrier in it, it is OK to duplicate it for the unroll. llvm-svn: 256075	2015-12-19 01:46:41 +00:00
Jingyue Wu	6c0c800317	[NaryReassociate] allow candidate to have a different type Summary: If Candidate may have a different type from GEP, we should bitcast or pointer cast it to GEP's type so that the later RAUW doesn't complain. Added a test in nary-gep.ll Reviewers: tra, meheff Subscribers: mcrosier, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D15618 llvm-svn: 256035	2015-12-18 21:36:30 +00:00
Andrew Kaylor	5467f8865f	[WinEH] Update LCSSA to handle catchswitch with handlers inside and outside a loop Differential Revision: http://reviews.llvm.org/D15630 llvm-svn: 256005	2015-12-18 18:12:35 +00:00
Philip Reames	46cd55f309	[InstCombine] Extend peephole DSE to handle unordered atomics This extends the same line of reasoning used in EarlyCSE w/http://reviews.llvm.org/D15352 to the DSE implementation in InstCombine. Key points: * We only remove unordered or simple stores. * The loads producing values consumed by dead stores don't influence whether the store is dead. Differential Revision: http://reviews.llvm.org/D15354 llvm-svn: 255932	2015-12-17 22:19:27 +00:00
Philip Reames	a19558aa7c	[EarlyCSE] DSE of atomic unordered stores The rules for removing trivially dead stores are a lot less complicated than loads. Since we know the later store post dominates the former and the former dominates the later, unless the former has side effects other than the actual store, we can remove it. One slightly surprising thing is that we can freely remove atomic stores, even if the later one isn't atomic. There's no guarantee the atomic one was ever visible. For the moment, we don't handle DSE of ordered atomic stores. We could extend the same chain of reasoning to them, but the catch is we'd then have to model the ordering effect without a store instruction. Since our fences are stronger than our operation orderings, simply using a fence isn't an obvious win. This arguably calls for a refinement in our fence specification, but that's (much) later work. Differential Revision: http://reviews.llvm.org/D15352 llvm-svn: 255914	2015-12-17 18:50:50 +00:00
Teresa Johnson	0dce8d436c	[ThinLTO] Metadata linking for imported functions Summary: Second patch split out from http://reviews.llvm.org/D14752. Maps metadata as a post-pass from each module when importing complete, suturing up final metadata to the temporary metadata left on the imported instructions. This entails saving the mapping from bitcode value id to temporary metadata in the importing pass, and from bitcode value id to final metadata during the metadata linking postpass. Depends on D14825. Reviewers: dexonsmith, joker.eph Subscribers: davidxl, llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D14838 llvm-svn: 255909	2015-12-17 17:14:09 +00:00
Charlie Turner	bcb6cfca5d	[NFC] Update horizontal reduction test cases. These testcases no longer need to specify -slp-vectorize-hor, since it was enabled by default in r252733. llvm-svn: 255783	2015-12-16 17:22:24 +00:00
James Molloy	cefbfa53f9	[SimplifyCFG] Don't create unnecessary PHIs In conditional store merging, we were creating PHIs when we didn't need to. If the value to be predicated isn't defined in the block we're predicating, then it doesn't need a PHI at all (because we only deal with triangles and diamonds, any value not in the predicated BB must dominate the predicated BB). This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite. Now with a fix (and fixed tests) for the conformance issue seen in Chromium. llvm-svn: 255767	2015-12-16 14:12:44 +00:00
Philip Reames	de669fec7d	[EarlyCSE] DSE of stores which write back loaded values Extend EarlyCSE with an additional style of dead store elimination. If we write back a value just read from that memory location, we can eliminate the store under the assumption that the value hasn't changed. I'm implementing this mostly because I noticed the omission when looking at the code. It seemed strange to have InstCombine have a peephole which was more powerful than EarlyCSE. :) Differential Revision: http://reviews.llvm.org/D15397 llvm-svn: 255739	2015-12-16 01:01:30 +00:00
Philip Reames	d32c9d5ac8	[IR] Add support for floating point atomic loads and stores This patch allows atomic loads and stores of floating point to be specified in the IR and adds an adapter to allow them to be lowered via existing backend support for bitcast-to-equivalent-integer idiom. Previously, the only way to specify an atomic float operation was to bitcast the pointer to an i32, load the value as an i32, then bitcast to a float. At its most basic, this patch simply moves this expansion step to the point we start lowering to the backend. This patch does not add canonicalization rules to convert the bitcast idioms to the appropriate atomic loads. I plan to do that in the future, but for now, let's simply add the support. I'd like to get instruction selection working through at least one backend (x86-64) without the bitcast conversion before canonicalizing into this form. Similarly, I haven't yet added the target hooks to opt out of the lowering step I added to AtomicExpand. I figured it would make more sense to add those once at least one backend (x86) was ready to actually opt out. As you can see from the included tests, the generated code quality is not great. I plan on submitting some patches to fix this, but help from others along that line would be very welcome. I'm not super familiar with the backend and my ramp up time may be material. Differential Revision: http://reviews.llvm.org/D15471 llvm-svn: 255737	2015-12-16 00:49:36 +00:00
Evgeniy Stepanov	39e538e166	Cross-DSO control flow integrity (LLVM part). An LTO pass that generates a __cfi_check() function that validates a call based on a hash of the call-site-known type and the target pointer. llvm-svn: 255693	2015-12-15 23:00:08 +00:00
Cong Hou	d84dd1150a	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions. (This is the third attempt to check in this patch, and the first two are r255454 and r255460. The once failed test file reg-usage.ll is now moved to test/Transform/LoopVectorize/X86 directory with target datalayout and target triple indicated.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255691	2015-12-15 22:45:09 +00:00
Sanjay Patel	26b1d4568d	[SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818) This is the last general step to allow more IR-level speculation with a safety harness in place in CodeGenPrepare. The intent is to restore the behavior enabled by: http://reviews.llvm.org/rL228826 but prevent bad performance such as: https://llvm.org/bugs/show_bug.cgi?id=24818 Earlier patches in this sequence: D12882 (disable SimplifyCFG speculation for expensive instructions) D13297 (have CGP despeculate expensive ops) D14630 (have CGP despeculate special versions of cttz/ctlz) As shown in the test cases, we only have two instructions currently affected: ctz for some x86 and fdiv generally. Allowing exactly one expensive instruction is a bit of a hack, but it lines up with what is currently implemented in CGP. If we make the despeculation more general in CGP, we can make the speculation here more liberal. A follow-up patch will adjust the cost for sqrt and possibly other typically expensive math intrinsics (currently everything is cheap by default). GPU targets would likely want to override those expensive default costs (just as they probably should already override the cost of div/rem) because just about any math is cheaper than control-flow on those targets. Differential Revision: http://reviews.llvm.org/D15213 llvm-svn: 255660	2015-12-15 17:38:29 +00:00
Nicolai Hahnle	2aeb81a126	AMDGPU: mark ldexp LibCalls as unavailable Summary: The LibCallSimplifier will turn llvm.exp2.* intrinsics into ldexp* libcalls which do not make sense with the AMDGPU backend. In the long run, we'll want an llvm.ldexp.* intrinsic to properly make use of this optimization, but this works around the problem for now. See also: http://reviews.llvm.org/D14327 (suggested llvm.ldexp.* implementation) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709 Reviewers: arsenm, tstellarAMD Differential Revision: http://reviews.llvm.org/D14990 llvm-svn: 255658	2015-12-15 17:24:15 +00:00
Mehdi Amini	b29b50a9dd	Instcombine: destructure loads of structs that do not contain padding For non padded structs, we can just proceed and deaggregate them. We don't want to do this when there is padding in the struct as to not lose information about this padding (the subsequent passes would then try hard to preserve the padding, which is undesirable). Also update extractvalue.ll and cast.ll so that they use structs with padding. Remove the FIXME in the extractvalue of load case as the non padded case is handled when processing the load, and we don't want to do it on the padded case. Patch by: Amaury SECHET <deadalnix@gmail.com> Differential Revision: http://reviews.llvm.org/D14483 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255600	2015-12-15 01:44:07 +00:00
Xinliang David Li	7640a69ad0	[PGO] make profile prefix even shorter and more readable llvm-svn: 255586	2015-12-15 00:32:56 +00:00
Xinliang David Li	10c9ed2b0f	[PGO] Shorten profile symbol prefixes Profile symbols have long prefixes which waste space and create pressure for the linker. This patch shortens the prefixes to minimal length without losing verbosity. Differential Revision: http://reviews.llvm.org/D15503 llvm-svn: 255575	2015-12-14 23:26:27 +00:00
Reid Kleckner	a7d52d8543	Revert "Don't create unnecessary PHIs" This reverts commit r255489. It causes test failures in Chromium and does not appear to respect the AlternativeV parameter. llvm-svn: 255562	2015-12-14 22:36:57 +00:00
Sanjay Patel	14a74b66f7	add fast-math-flags to 'call' instructions (PR21290) This patch adds optional fast-math-flags (the same that apply to fmul/fadd/fsub/fdiv/frem/fcmp) to call instructions in IR. Follow-up patches would use these flags in LibCallSimplifier, add support to clang, and extend FMF to the DAG for calls. Motivating example: %y = fmul fast float %x, %x %z = tail call float @sqrtf(float %y) We'd like to be able to optimize sqrt(x*x) into fabs(x). We do this today using a function-wide attribute for unsafe-math, but we really want to trigger on the instructions themselves: %z = tail call fast float @sqrtf(float %y) because in an LTO build it's possible that calls with fast semantics have been inlined into a function with non-fast semantics. The code changes and tests are based on the recent commits that added "notail": http://reviews.llvm.org/rL252368 and added FMF to fcmp: http://reviews.llvm.org/rL241901 Differential Revision: http://reviews.llvm.org/D14707 llvm-svn: 255555	2015-12-14 21:59:03 +00:00
David Majnemer	49dcd13916	[IR] Remove terminatepad It turns out that terminatepad gives little benefit over a cleanuppad which calls the termination function. This is not sufficient to implement fully generic filters but MSVC doesn't support them which makes terminatepad a little over-designed. Depends on D15478. Differential Revision: http://reviews.llvm.org/D15479 llvm-svn: 255522	2015-12-14 18:34:23 +00:00
Sanjay Patel	0876eb09e0	[InstCombine] fold trunc ([lshr] (bitcast vector) ) --> extractelement (PR25543) This is a fix for PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543 The idea is to take the existing fold of: bitcast ( trunc ( lshr ( bitcast X))) --> extractelement (bitcast X) ( http://reviews.llvm.org/rL112232 ) And break it into less specific transforms so we'll catch more cases such as the example in the bug report: bitcast ( trunc ( lshr ( bitcast X))) --> bitcast ( extractelement (bitcast X)) --> extractelement (bitcast X) Enabling patches for this change: http://reviews.llvm.org/rL255399 (combine bitcasts) http://reviews.llvm.org/rL255433 (canonicalize extractelement(bitcast X)) Differential Revision: http://reviews.llvm.org/D15392 llvm-svn: 255504	2015-12-14 16:16:54 +00:00
James Molloy	699de2f2b0	Don't create unnecessary PHIs In conditional store merging, we were creating PHIs when we didn't need to. If the value to be predicated isn't defined in the block we're predicating, then it doesn't need a PHI at all (because we only deal with triangles and diamonds, any value not in the predicated BB must dominate the predicated BB). This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite. llvm-svn: 255489	2015-12-14 10:57:01 +00:00
Cong Hou	19e67a55f7	Revert r255460, which still causes test failures on some platforms. Further investigation on the failures is ongoing. llvm-svn: 255463	2015-12-13 17:15:38 +00:00
Cong Hou	ea8edf0fdf	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions. (This is the second attempt to check in this patch: REQUIRES: asserts is added to reg-usage.ll now.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255460	2015-12-13 16:55:46 +00:00
Cong Hou	74b6e6397a	Revert r255454 as it leads to several test failures on buildbots. llvm-svn: 255456	2015-12-13 09:28:57 +00:00
Cong Hou	0784bf19f6	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions. LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255454	2015-12-13 08:44:08 +00:00
Xinliang David Li	9a2e877d9a	[PGO] Stop using invalid char in instr variable names. Before the patch, -fprofile-instr-generate compile will fail if no integrated-as is specified when the file contains any static functions (the -S output is also invalid). This is the second try. The fix in this patch is very localized. Only profile symbol names of profile symbols with internal linkage are fixed up while initializers of name syms are not changed. This means there is no format change nor version bump. llvm-svn: 255434	2015-12-12 17:28:03 +00:00
Sanjay Patel	3f624d4650	[InstCombine] canonicalize (bitcast (extractelement X)) --> (extractelement(bitcast X)) This change was discussed in D15392. It allows us to remove the fold that was added in: http://reviews.llvm.org/r255261 ...and it will allow us to generalize this fold: http://reviews.llvm.org/rL112232 while preserving the order of bitcast + extract that it produces and testing shows is better handled by the backend. Note that the existing check for "isVectorTy()" wasn't strong enough in general and specifically because: x86_mmx. It's not a vector, but it's not vectorizable either. So here we check VectorType::isValidElementType() directly before proceeding with the transform. llvm-svn: 255433	2015-12-12 16:44:48 +00:00
David Majnemer	2e1367257e	Move catchpad-phi-cast.ll to the X86 specific subdirectory It is X86 specific and will not be properly exercised unless LLVM is built with the X86 target. llvm-svn: 255426	2015-12-12 06:21:08 +00:00
David Majnemer	bf189bdcd7	[IR] Reformulate LLVM's EH funclet IR While we have successfully implemented a funclet-oriented EH scheme on top of LLVM IR, our scheme has some notable deficiencies: - catchendpad and cleanupendpad are necessary in the current design but they are difficult to explain to others, even to seasoned LLVM experts. - catchendpad and cleanupendpad are optimization barriers. They cannot be split and force all potentially throwing call-sites to be invokes. This has a noticeable effect on the quality of our code generation. - catchpad, while similar in some aspects to invoke, is fairly awkward. It is unsplittable, starts a funclet, and has control flow to other funclets. - The nesting relationship between funclets is currently a property of control flow edges. Because of this, we are forced to carefully analyze the flow graph to see if there might potentially exist illegal nesting among funclets. While we have logic to clone funclets when they are illegally nested, it would be nicer if we had a representation which forbade them upfront. Let's clean this up a bit by doing the following: - Instead, make catchpad more like cleanuppad and landingpad: no control flow, just a bunch of simple operands; catchpad would be splittable. - Introduce catchswitch, a control flow instruction designed to model the constraints of funclet oriented EH. - Make funclet scoping explicit by having funclet instructions consume the token produced by the funclet which contains them. - Remove catchendpad and cleanupendpad. Their presence can be inferred implicitly using coloring information. N.B. The state numbering code for the CLR has been updated but the veracity of its output cannot be spoken for. An expert should take a look to make sure the results are reasonable. Reviewers: rnk, JosephTremoulet, andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D15139 llvm-svn: 255422	2015-12-12 05:38:55 +00:00
Sanjay Patel	ceecde00d5	[InstCombine] allow any pair of bitcasts to be combined This change is discussed in D15392 and should allow us to effectively revert: http://llvm.org/viewvc/llvm-project?view=revision&revision=255261 if we canonicalize bitcasts ahead of extracts. It should be safe to convert any pair of bitcasts into a single bitcast, however, it was mentioned here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110829/127089.html that we're not allowed to bitcast from an x86_mmx to some other types, but I'm not seeing any failures from that, and we have regression tests in CodeGen/X86 that appear to cover all of those cases. Some day we'll get to remove that MMX wart from LLVM IR completely? Differential Revision: http://reviews.llvm.org/D15468 llvm-svn: 255399	2015-12-12 00:33:36 +00:00
Sanjay Patel	6151d885e6	use FileCheck for better checking llvm-svn: 255394	2015-12-12 00:01:10 +00:00
Sanjay Patel	5d71fe5292	Add tests for bitcast-bitcast sequences for all scalar/vector permutations As noted in http://reviews.llvm.org/D15392 , we should be able to improve this. llvm-svn: 255370	2015-12-11 20:26:30 +00:00
Xinliang David Li	2cad28318a	[PGO] Revert r255365: solution incomplete, not handling lambda yet llvm-svn: 255369	2015-12-11 20:23:22 +00:00
Xinliang David Li	de46641441	[PGO] Stop using invalid char in instr variable names. Before the patch, -fprofile-instr-generate compile will fail if no integrated-as is specified when the file contains any static functions (the -S output is also invalid). This patch fixed the issue. With the change, the index format version will be bumped up by 1. Backward compatibility is preserved with this change. Differential Revision: http://reviews.llvm.org/D15243 llvm-svn: 255365	2015-12-11 19:53:19 +00:00
Chad Rosier	2dd4cf7c08	Revert r255247, r255265, and r255286 due to serious compile-time regressions. Revert "[DSE] Disable non-local DSE to see if the bots go green." Revert "[DeadStoreElimination] Use range-based loops. NFC." Revert "[DeadStoreElimination] Add support for non-local DSE." llvm-svn: 255354	2015-12-11 18:39:41 +00:00
James Molloy	f5a3f4d77c	[Mem2Reg] Respect optnone Mem2Reg shouldn't be optimizing a function that is marked optnone. There is a test checking this that fails when mem2reg is explicitly added to the standard pass pipeline. llvm-svn: 255336	2015-12-11 13:36:59 +00:00
James Molloy	d8003c7bf6	[InstCombine] Make MatchBSwap also match bit reversals MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse. llvm-svn: 255334	2015-12-11 10:04:51 +00:00
JF Bastien	b5effc7a01	EarlyCSE: add tests Summary: As a follow-up to rL255054 I wasn't able to convince myself that the code did what I thought, so I wrote more tests. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15371 llvm-svn: 255295	2015-12-10 20:24:34 +00:00
Chad Rosier	f118529c0e	[DSE] Disable non-local DSE to see if the bots go green. I see a few bots timing out, so I'm speculatively disabling r255247. llvm-svn: 255286	2015-12-10 19:23:02 +00:00
Rong Xu	1cf66e30a9	[PGO] Use %t as the temporary profdata filename in the test cases. Using %t rather than %T/<specific_name> as the temporary profdata filename. llvm-svn: 255271	2015-12-10 18:24:44 +00:00
Sanjay Patel	25a4b4195f	[InstCombine] fold bitcasts around an extractelement (3rd try) This is a redo of r255137 (reverted at r255227) which was a redo of r255124 (reverted at r255126) with a fixed check for a scalar source type and an added test for the failure that caused the revert. Original commit message: Example: bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float ---> extractelement <2 x float> %X, i32 1 This is part of fixing PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543 The next step will be to generalize this fold: trunc ( lshr ( bitcast X) ) -> extractelement (X) Ie, I'm hoping to replace the existing transform of: bitcast ( trunc ( lshr ( bitcast X))) added by: http://reviews.llvm.org/rL112232 with 2 less specific transforms to catch the case in the bug report. Differential Revision: http://reviews.llvm.org/D14879 llvm-svn: 255261	2015-12-10 17:09:28 +00:00
Chad Rosier	4a619fca22	[DeadStoreElimination] Add support for non-local DSE. We extend the search for redundant stores to predecessor blocks that unconditionally lead to the block BB with the current store instruction. That also includes single-block loops that unconditionally lead to BB, and if-then-else blocks where then- and else-blocks unconditionally lead to BB. http://reviews.llvm.org/D13363 Patch by Ivan Baev <ibaev@codeaurora.org>! llvm-svn: 255247	2015-12-10 13:51:43 +00:00
Akira Hatanaka	a1488717da	Revert r255137. This commit broke apple's internal bot. llvm-svn: 255227	2015-12-10 08:00:52 +00:00
Rong Xu	59778bc572	[PGO] Rename the profdata filename to avoid the conflict b/w tests. Two tests diag_mismatch.ll and diag_no_funcprofdata.ll generate the same profdata filename which can conflict in current test runs. This patch renames them to have different names. llvm-svn: 255158	2015-12-09 21:27:59 +00:00
Justin Bogner	610820950e	IR: Make ConstantDataArray::getFP actually return a ConstantDataArray The ConstantDataArray::getFP(LLVMContext &, ArrayRef<uint16_t>) overload has had a typo in it since it was written, where it will create a Vector instead of an Array. This obviously doesn't work at all, but it turns out that until r254991 there weren't actually any callers of this overload. Fix the typo and add some test coverage. llvm-svn: 255157	2015-12-09 21:21:07 +00:00
Reid Kleckner	dc26fb037c	[Float2Int] Don't operate on vector instructions This fixes a crash bug. It's also not clear if we'd want to do this transform for vectors. llvm-svn: 255155	2015-12-09 21:08:18 +00:00
Sanjoy Das	f3ba629c4d	Use WeakVH to keep track of calls with operand bundles in CloneCodeInfo `CloneAndPruneIntoFromInst` can DCE instructions after cloning them into the new function, and so an AssertingVH is too strong. This change switches CloneCodeInfo to use a std::vector<WeakVH>. llvm-svn: 255148	2015-12-09 20:33:52 +00:00
Sanjay Patel	de6f59d487	[InstCombine] fold bitcasts around an extractelement (2nd try) This is a redo of r255124 (reverted at r255126) with an added check for a scalar destination type and an added test for the failure seen in Clang's test/CodeGen/vector.c. The extra test shows a different missing optimization. Original commit message: Example: bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float ---> extractelement <2 x float> %X, i32 1 This is part of fixing PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543 The next step will be to generalize this fold: trunc ( lshr ( bitcast X) ) -> extractelement (X) Ie, I'm hoping to replace the existing transform of: bitcast ( trunc ( lshr ( bitcast X))) added by: http://reviews.llvm.org/rL112232 with 2 less specific transforms to catch the case in the bug report. Differential Revision: http://reviews.llvm.org/D14879 llvm-svn: 255137	2015-12-09 18:57:16 +00:00
Rong Xu	2f995f2098	[PGO] Resubmit "MST based PGO instrumentation infrastructure" (r254021) This new patch fixes a few bugs that were exposed in the last submit. It also improves the test cases. --Original Commit Message-- This patch implements a minimum spanning tree (MST) based instrumentation for PGO. The use of MST guarantees minimum number of CFG edges getting instrumented. An additional optimization is to instrument the less executed edges to further reduce the instrumentation overhead. The patch contains both the instrumentation and the use of the profile to set the branch weights. Differential Revision: http://reviews.llvm.org/D12781 llvm-svn: 255132	2015-12-09 18:08:16 +00:00
Mehdi Amini	de04fa6b68	Revert "[InstCombine] fold bitcasts around an extractelement" This reverts commit r255124. Broke http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/4193/steps/test/logs/stdio From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255126	2015-12-09 16:31:39 +00:00
Sanjay Patel	8a5018320c	[InstCombine] fold bitcasts around an extractelement Example: bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float ---> extractelement <2 x float> %X, i32 1 This is part of fixing PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543 The next step will be to generalize this fold: trunc ( lshr ( bitcast X) ) -> extractelement (X) Ie, I'm hoping to replace the existing transform of: bitcast ( trunc ( lshr ( bitcast X))) added by: http://reviews.llvm.org/rL112232 with 2 less specific transforms to catch the case in the bug report. Differential Revision: http://reviews.llvm.org/D14879 llvm-svn: 255124	2015-12-09 16:17:20 +00:00
Mehdi Amini	300ed48d90	Change hasUniqueInitializer() to call isStrongDefinitionForLinker() instead of !isWeakForLinker() Summary: An available_externally global variable with an initializer was considered "hasInitializer()", while obviously it can't match the description: "Whether the global variable has an initializer, and any changes made to the initializer will turn up in the final executable", since modifying the initializer of an externally available variable does not make sense. Reviewers: pcc, rafael Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15351 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255123	2015-12-09 16:17:07 +00:00
Sanjoy Das	9ec731cd34	Don't drop attributes when inlining through "deopt" operand bundles Test case attached (test case also checks that we don't drop the calling convention, but that functionality was correct before this patch). llvm-svn: 255088	2015-12-09 01:01:28 +00:00
Sanjoy Das	cb770fbcb6	[OperandBundles] Have PruneEH work correctly with operand bundles. For an invoke with operand bundles, the [op_begin(), op_end()-3] range can contain things other than invoke arguments. This change teaches PruneEH to use arg_begin() and arg_end() explicitly. llvm-svn: 255073	2015-12-08 23:16:52 +00:00
Reid Kleckner	bc5854b1a9	[CGP] Reimplement r255055 a different way llvm-svn: 255070	2015-12-08 23:00:03 +00:00
Reid Kleckner	d51ed310dc	Revert "[CGP] Check that we have an insert point before moving llvm.dbg.value around" This reverts commit r255055. Breakage has been reported. llvm-svn: 255063	2015-12-08 22:33:23 +00:00
Sanjoy Das	5a8ebaa29b	[OperandBundles] Fix a transform in simplifycfg Reviewers: pcc, majnemer, reames Subscribers: reames, llvm-commits Differential Revision: http://reviews.llvm.org/D15345 llvm-svn: 255062	2015-12-08 22:26:08 +00:00
Reid Kleckner	fb4e05e94a	[CGP] Check that we have an insert point before moving llvm.dbg.value around llvm-svn: 255055	2015-12-08 21:50:52 +00:00
Philip Reames	041ac7b389	[EarlyCSE] Value forwarding for unordered atomics This patch teaches the fully redundant load part of EarlyCSE how to forward from atomic and volatile loads and stores, and how to eliminate unordered atomics (only). This patch does not include dead store elimination support for unordered atomics, that will follow in the near future. The basic idea is that we allow all loads and stores to be tracked by the AvailableLoad table. We store a bit in the table which tracks whether load/store was atomic, and then only replace atomic loads with ones which were also atomic. No attempt is made to refine our handling of ordered loads or stores. Those are still treated as full fences. We could pretty easily extend the release fence handling to release stores, but that should be a separate patch. Differential Revision: http://reviews.llvm.org/D15337 llvm-svn: 255054	2015-12-08 21:45:41 +00:00
Mehdi Amini	e4f5a60024	Revert "Add Available Externally linkage type to isWeakForLinker()" This reverts r255043, as post-review concerns were raised about its correctness. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255045	2015-12-08 19:13:31 +00:00
Mehdi Amini	ca56a93080	Cleanup test: remove useless alignment From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255044	2015-12-08 19:02:55 +00:00
Mehdi Amini	adf4a628c7	Add Available Externally linkage type to isWeakForLinker() Per LangRef: "Globals with available_externally linkage are allowed to be discarded at will, and are otherwise the same as linkonce_odr", since linkonce_odr is in this list it makes sense to have available_externally there as well. Reviewers: rafael Differential Revision: http://reviews.llvm.org/D15323 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255043	2015-12-08 19:01:29 +00:00
Sanjoy Das	2f7aca1668	[IndVars] Have getInsertPointForUses preserve LCSSA Summary: Also add a stricter post-condition for IndVarSimplify. Fixes PR25578. Test case by Michael Zolotukhin. Reviewers: hfinkel, atrick, mzolotukhin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15059 llvm-svn: 254977	2015-12-08 00:13:21 +00:00
Sanjoy Das	ec1f59a19a	[SCEVExpander] Have hoistIVInc preserve LCSSA Summary: (Note: the problematic invocation of hoistIVInc that caused PR24804 came from IndVarSimplify, not from SCEVExpander itself) Fixes PR24804. Test case by David Majnemer. Reviewers: hfinkel, majnemer, atrick, mzolotukhin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15058 llvm-svn: 254976	2015-12-08 00:13:17 +00:00
Sanjoy Das	16ad4f2471	[InstCombine] Call getCmpPredicateForMinMax only with a valid SPF Summary: There are `SelectPatternFlavor`s that don't represent min or max idioms, and we should not be passing those to `getCmpPredicateForMinMax`. Fixes PR25745. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15249 llvm-svn: 254869	2015-12-05 23:44:22 +00:00
Weiming Zhao	84bd343622	[SimplifyLibCalls] Optimization for pow(x, n) where n is some constant Summary: In order to avoid calling the pow function we generate repeated fmuls when n is a positive or negative whole number. For each exponent we pre-compute Addition Chains in order to minimize the number of fmuls. See: http://wwwhomes.uni-bielefeld.de/achim/addition_chain.html We pre-compute addition chains for exponents up to 32 (which results in a max of 7 fmuls). For example: 4 = 2+2, 5 = 2+3, 6 = 3+3, and so on. Hence, pow(x, 4.0) ==> y = fmul x, x; x = fmul y, y; ret x. For negative exponents, we simply compute the reciprocal of the final result. Note: This transformation is only enabled under fast-math. Patch by Mandeep Singh Grang <mgrang@codeaurora.org> Reviewers: weimingz, majnemer, escha, davide, scanon, joerg Subscribers: probinson, escha, llvm-commits Differential Revision: http://reviews.llvm.org/D13994 llvm-svn: 254776	2015-12-04 22:00:47 +00:00
David Majnemer	dc587eeed6	[Analysis] Become aware of MSVC's new/delete functions The compiler can take advantage of the allocation/deallocation function's properties. We knew how to do this for Itanium but had no support for MSVC-style functions. llvm-svn: 254656	2015-12-03 22:45:19 +00:00
Andrew Kaylor	0608b5076b	Move branch folding test to a better location. llvm-svn: 254640	2015-12-03 19:41:25 +00:00
Andrew Kaylor	5b31868aff	Fix buildbot failures llvm-svn: 254636	2015-12-03 19:30:38 +00:00
Andrew Kaylor	f447735445	[WinEH] Avoid infinite loop in BranchFolding for multiple single block funclets Differential Revision: http://reviews.llvm.org/D14996 llvm-svn: 254629	2015-12-03 18:55:28 +00:00
Rafael Espindola	c0e7cab0ab	Delete what is now duplicated code. Having to import an alias as declaration is not thinlto specific. The test differences are because when we already have a decl and we are not importing it, we just leave the decl alone. llvm-svn: 254556	2015-12-02 22:22:24 +00:00
David Majnemer	df4ee5c023	Do (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1 rather than (A == C1 || A == C2) -> (A | (C1 ^ C2)) == C2 when C1 ^ C2 is a power of 2. Differential Revision: http://reviews.llvm.org/D14223 Patch by Amaury SECHET! llvm-svn: 254518	2015-12-02 16:15:07 +00:00
Evgeniy Stepanov	154021c8a2	[safestack] Protect byval function arguments. Detect unsafe byval function arguments and move them to the unsafe stack. llvm-svn: 254353	2015-12-01 00:40:05 +00:00
Evgeniy Stepanov	b189b16a23	[safestack] Fix handling of array allocas. The current code does not take alloca array size into account and, as a result, considers any access past the first array element to be unsafe. llvm-svn: 254350	2015-12-01 00:06:13 +00:00
Sanjay Patel	f9793efc69	[InstCombine] add tests to show potential vector IR shuffle transforms llvm-svn: 254342	2015-11-30 22:39:36 +00:00
Davide Italiano	a4b22c0406	[SimplifyLibCalls] Remove useless bits of this test. llvm-svn: 254318	2015-11-30 19:38:35 +00:00
Davide Italiano	0f427b7147	[SimplifyLibCalls] Transform log(exp2(y)) to y*log(2) under fast-math. llvm-svn: 254317	2015-11-30 19:36:35 +00:00
Davide Italiano	ae7cdf685f	[SimplifyLibCalls] Don't crash if the function doesn't have a name. llvm-svn: 254265	2015-11-29 21:58:56 +00:00
Davide Italiano	85963c8ad6	[SimplifyLibCalls] Transform log(pow(x, y)) -> y*log(x). This one is enabled only under -ffast-math. There are cases where the difference between the value computed and the correct value is huge even for -ffast-math, e.g. as Steven pointed out: for x = -1, y = -4, log(pow(-1, -4)) = 0 but -4*log(-1) = NaN. I checked what GCC does and apparently they do the same optimization (which results in the dramatic difference). Future work might try to make this (slightly) less bad. Differential Revision: http://reviews.llvm.org/D14400 llvm-svn: 254263	2015-11-29 20:58:04 +00:00
Diego Novillo	d08de97276	SamplePGO - Add initial support for inliner annotations. This adds two thresholds to the sample profiler to affect inlining decisions: the concept of global hotness and coldness. Functions that have accumulated more than a certain fraction of samples at runtime, are annotated with the InlineHint attribute. Conversely, functions that accumulate less than a certain fraction of samples, are annotated with the Cold attribute. This is very similar to the hints emitted by Clang when using instrumentation profiles. Notice that this is a very blunt instrument. A function may have globally collected a significant fraction of samples, but that does not necessarily mean that every callsite for that function is hot. Ideally, we would annotate each callsite with the samples collected at that callsite. This way, the inliner can incorporate all these weights into its cost model. Once the inliner offers this functionality, we can change the hints emitted here to a more precise per-callsite annotation. For now, this is providing some measure of speedups with our internal benchmarks. I've observed speedups of up to 23% (though the geo mean is about 3%). I expect these numbers to improve as the inliner gets better annotations. llvm-svn: 254212	2015-11-27 23:14:51 +00:00
Charlie Turner	18cf3a8580	[LoopVectorize] Use MapVector rather than DenseMap for MinBWs. The order in which instructions are truncated in truncateToMinimalBitwidths affects code generation. Switch to a map with a deterministic order, since the iteration order over a DenseMap is not defined. This code is not hot, so the difference in container performance isn't interesting. Many thanks to David Blaikie for making me aware of MapVector! Fixes PR25490. Differential Revision: http://reviews.llvm.org/D14981 llvm-svn: 254179	2015-11-26 20:39:51 +00:00
Rafael Espindola	d215bba299	Disallow aliases to available_externally. They are as much trouble as aliases to declarations. They are requiring the code generator to define a symbol with the same value as another symbol, but the second symbol is undefined. If representing this is important for some optimization, we could add support for available_externally aliases. They would be required to point to a declaration (or available_externally definition). llvm-svn: 254170	2015-11-26 19:22:59 +00:00
Benjamin Kramer	a5c875d940	[SimplifyLibCalls] Don't depend on a called function having a name, it might be an indirect call. Fixes the crasher in PR25651 and related crashers using the same pattern. llvm-svn: 254145	2015-11-26 09:51:17 +00:00
Evgeniy Stepanov	ef8f40a43e	[safestack] Fix alignment of dynamic allocas. Fixes PR25588. llvm-svn: 254109	2015-11-25 22:52:30 +00:00
Sanjoy Das	d16b4e5c5e	[InstCombine] Don't drop operand bundles Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14857 llvm-svn: 254046	2015-11-25 00:42:19 +00:00
Rong Xu	0c33c67e46	Revert r254021 llvm-svn: 254042	2015-11-24 23:57:51 +00:00
Rong Xu	c4f897c441	[PGO] Revert revision r254021,r254028,r254035 Revert the above revision due to multiple issues. llvm-svn: 254040	2015-11-24 23:49:08 +00:00
Teresa Johnson	cbf6e0bf1b	[ThinLTO] Add option to limit importing based on instruction count Add a simple initial heuristic to control importing based on the number of instructions recorded in the function's summary. Add option to control the limit, and test using option. llvm-svn: 254036	2015-11-24 22:55:46 +00:00
Rong Xu	9a7262eb26	[PGO] Relax test cases in PGO instrumentation Fix buildbot failure for clang-x86_64-linux-selfhost-modules. http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/8866 The failing test cases are newly added from r254021. It seems the IR has a different order in this platform. In this patch, I temporarily relax the test case to make the build green. I'll have a complete fix (more robust way to test) soon. llvm-svn: 254035	2015-11-24 22:50:34 +00:00
Diego Novillo	2b7c3c54ab	SamplePGO - Add test for hot/cold inlined functions. When the original binary is executed and sampled, the resulting profile contains information on the original inline stack. We currently follow the original inline plan if we notice that the inlined callsite has more than 0 samples to it. A better way is to determine whether the callsite is actually worth inlining. If the callsite accumulates a small fraction of the samples spent in the parent function, then we don't want to bother inlining it (as it means that the callsite is actually cold). This patch introduces a threshold expressed in percentage of samples in relation to the parent function. If the callsite uses less than N% of the total samples used by its parent, the original inline decision is not re-applied. I've set the threshold to the very arbitrary value of 5%. I'm yet to do any actual experiments to see what's a good value. I wanted to separate the basic mechanism from the tuning. llvm-svn: 254034	2015-11-24 22:38:37 +00:00
Rong Xu	025bf7be0c	[PGO] MST based PGO instrumentation infrastructure This patch implements a minimum spanning tree (MST) based instrumentation for PGO. The use of MST guarantees minimum number of CFG edges getting instrumented. An additional optimization is to instrument the less executed edges to further reduce the instrumentation overhead. The patch contains both the instrumentation and the use of the profile to set the branch weights. Differential Revision: http://reviews.llvm.org/D12781 llvm-svn: 254021	2015-11-24 21:31:25 +00:00
Teresa Johnson	a3214913e6	[ThinLTO] Enable iterative importing in FunctionImport pass Analyze imported function bodies and add any new external calls to the worklist for importing. Currently no controls on the importing so this will end up importing everything possible in the call tree below the importing module. Basic profitability checks coming next. Update test to check for iteratively inlined functions. llvm-svn: 254011	2015-11-24 19:55:04 +00:00
Teresa Johnson	7a187fa24b	[ThinLTO] Handle previously imported and promoted locals in module linker The new function import pass exposed an issue when we import references to local values on multiple importing passes. They are renamed on each import pass, and we need to ensure that the already promoted and renamed references existing in the dest module are correctly identified and updated so that they aren't spuriously renamed again (due to a perceived conflict with the newly linked reference). llvm-svn: 254009	2015-11-24 19:46:58 +00:00
Sanjay Patel	cca965412e	[InstCombine] fix propagation of fast-math-flags Noticed while working on D4583: http://reviews.llvm.org/D4583 llvm-svn: 253997	2015-11-24 17:51:20 +00:00
Teresa Johnson	9c0a1779ce	[ThinLTO] Fix FunctionImport alias checking and test Skip imports for weak_any aliases as well. Fix the test to check non-import of weak aliases and functions, and import of normal alias. llvm-svn: 253991	2015-11-24 16:10:43 +00:00
Mehdi Amini	2fe02188ef	Add a FunctionImporter helper to perform summary-based cross-module function importing Summary: This is a helper to perform cross-module import for ThinLTO. Right now it naively imports every possible called function. Reviewers: tejohnson Subscribers: dexonsmith, llvm-commits Differential Revision: http://reviews.llvm.org/D14914 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 253954	2015-11-24 06:07:49 +00:00
Diego Novillo	d28d079aa7	SamplePGO - Add coverage tracking for samples. The existing coverage tracker counts the number of records that were used from the input profile. An alternative view of coverage is to check how many available samples were applied. This way, if the profile contains several records with few samples, it doesn't really matter much that they were not applied. The more interesting records to apply are the ones that contribute many samples. llvm-svn: 253912	2015-11-23 20:12:21 +00:00
Andrew Kaylor	4859a7de39	[WinEH] Fix a case where GVN could incorrectly PRE a load into an EH pad. Differential Revision: http://reviews.llvm.org/D14842 llvm-svn: 253908	2015-11-23 19:51:41 +00:00
Andrew Kaylor	b08d35fdf1	[WinEH] Fix problem where CodeGenPrepare incorrectly sinks a bitcast into an EH pad. Differential Revision: http://reviews.llvm.org/D14842 llvm-svn: 253902	2015-11-23 19:16:15 +00:00
Rafael Espindola	9cb8841b77	Have a single way for creating unique value names. We had two code paths. One would create names like "foo.1" and the other names like "foo1". For globals it is important to use "foo.1" to help C++ name demangling. For locals there is no strong reason to go one way or the other so I kept the most common mangling (foo1). llvm-svn: 253804	2015-11-22 00:16:24 +00:00
Sanjay Patel	58d25e69b7	move a single test case to where most other instcombine shuffle bug test cases exist llvm-svn: 253784	2015-11-21 16:12:58 +00:00
NAKAMURA Takumi	5946adf23c	Move free-zext.ll to llvm/test/Transforms/CodeGenPrepare/AArch64/ llvm-svn: 253730	2015-11-20 22:55:34 +00:00
Owen Anderson	def6a5c0c6	Fix another infinite loop in Reassociate caused by Constant::isZero(). Not all zero vectors are ConstantDataVector's. llvm-svn: 253723	2015-11-20 22:34:48 +00:00
Geoff Berry	893bbf2bf8	[CodeGenPrepare] Create more extloads and fewer ands Summary: Add `and` instructions immediately after loads that only have their low bits used, assuming that the (and (load x) c) will be matched as an extload and the ands/truncs fed by the extload will be removed by isel. Reviewers: mcrosier, qcolombet, ab Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14584 llvm-svn: 253722	2015-11-20 22:34:39 +00:00
Diego Novillo	8f65a2ac5e	SamplePGO - Tweak RUN command for a test. NFC. llvm-svn: 253717	2015-11-20 21:46:41 +00:00
Diego Novillo	372bf7dc64	SamplePGO - Do not count never-executed inlined functions when computing coverage. If a function was originally inlined but not actually hot at runtime, its samples will not be counted inside the parent function. This throws off the coverage calculation because it expects to find more used records than it should. Fixed by ignoring functions that will not be inlined into the parent. Currently, this is inlined functions with 0 samples. In subsequent patches, I'll change this to mean "cold" functions. llvm-svn: 253716	2015-11-20 21:46:38 +00:00
Diego Novillo	4255fabac8	SamplePGO - Add line offset and discriminator information to sample reports. While debugging some sampling coverage problems, I found this useful: When applying samples from a profile, it helps to also know what line offset and discriminator the sample belongs to. This makes it easy to correlate against the input profile. llvm-svn: 253670	2015-11-20 15:39:42 +00:00
Owen Anderson	b4d0c09caf	Fix a pair of issues that caused an infinite loop in reassociate. Terrifyingly, one of them is a mishandling of floating point vectors in Constant::isZero(). How exactly this issue survived this long is beyond me. llvm-svn: 253655	2015-11-20 08:16:13 +00:00
Peter Collingbourne	e196b98966	ScalarEvolution: do not set nuw when creating exprs of form <expr> + <all-ones>. The nuw constraint will not be satisfied unless <expr> == 0. This bug has been around since r102234 (in 2010!), but was uncovered by r251052, which introduced more aggressive optimization of nuw scev expressions. Differential Revision: http://reviews.llvm.org/D14850 llvm-svn: 253627	2015-11-20 01:26:13 +00:00
Sanjay Patel	c0f869e525	[InstCombine] add tests to show missing trunc optimizations llvm-svn: 253609	2015-11-19 22:11:52 +00:00
Sanjay Patel	c933d364c7	[InstCombine] add tests to show missing bitcast optimizations llvm-svn: 253602	2015-11-19 21:32:25 +00:00
Dehao Chen	52b358f670	Reimplement discriminator assignment algorithm. Summary: The new algorithm is more efficient (O(n), n is number of basic blocks). And it is guaranteed to cover all cases of multiple BB mapped to same line. Reviewers: dblaikie, davidxl, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14738 llvm-svn: 253594	2015-11-19 19:53:05 +00:00
James Molloy	2208ca52dd	[GlobalOpt] Localize some globals that have non-instruction users We currently bail out of global localization if the global has non-instruction users. However, often these can be simple bitcasts or constant-GEPs, which we can easily turn into instructions before localizing. Be a bit more aggressive. llvm-svn: 253584	2015-11-19 18:04:33 +00:00
Sanjay Patel	7659e7f765	this new test file was accidentally left out of r253573 llvm-svn: 253574	2015-11-19 16:39:00 +00:00
James Molloy	b585b0aee8	[FunctionAttrs] Provide a mechanism for adding function attributes from the command line This provides a way to force a function to have certain attributes from the command line. This can be useful when debugging or doing workload exploration, where manually editing IR is tedious or not possible (due to build systems etc). The syntax is -force-attribute=function_name:attribute_name. All function attributes are parsed except alignstack as it requires an argument. llvm-svn: 253550	2015-11-19 08:49:57 +00:00
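
The fold described in commit df4ee5c023 above, (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1, can be checked by brute force. The following is only a sketch in Python rather than LLVM code: the function names are hypothetical, the 8-bit width is arbitrary, and it assumes C1 is the smaller of the two constants, which the commit message does not state. Under that assumption C1 is the constant with the differing bit cleared, so masking that bit out of A and comparing against C1 matches exactly the two original values.

```python
def is_power_of_two(n: int) -> bool:
    """True when exactly one bit of n is set."""
    return n > 0 and (n & (n - 1)) == 0


def fold_eq_or(a: int, c1: int, c2: int, bits: int = 8) -> bool:
    """New form from the commit message: (A & ~(C1 ^ C2)) == C1.

    Assumption (not stated in the commit message): c1 < c2, so c1 is
    the constant with the differing bit cleared.
    """
    mask = (1 << bits) - 1
    return (a & ~(c1 ^ c2) & mask) == c1


def check_fold(bits: int = 8) -> int:
    """Exhaustively compare the original test with both folded forms.

    Returns the number of (C1, C2) pairs that were checked.
    """
    pairs = 0
    for c1 in range(1 << bits):
        for c2 in range(c1 + 1, 1 << bits):
            if not is_power_of_two(c1 ^ c2):
                continue  # the fold only applies when one bit differs
            pairs += 1
            for a in range(1 << bits):
                original = (a == c1) or (a == c2)
                new_form = fold_eq_or(a, c1, c2, bits)
                # Previous form: (A | (C1 ^ C2)) == C2
                old_form = (a | (c1 ^ c2)) == c2
                assert original == new_form == old_form
    return pairs


if __name__ == "__main__":
    print(check_fold(8))
```

Both forms replace two comparisons with one bit operation and one comparison; the check only confirms that they agree with the original test on every 8-bit input, and says nothing about why the commit prefers the and-based form.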

6334 Commits