llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 06:22:56 +02:00

Author	SHA1	Message	Date
Adam Nemet	eb3a2a0c5f	[LoopAccesses] Change LAA:getInfo to return a constant reference As expected, this required a few more const-correctness fixes. Based on Hal's feedback on D7684. llvm-svn: 229899	2015-02-19 19:15:21 +00:00
Adam Nemet	c3fb49ac6a	[LoopAccesses] Split out LoopAccessReport from VectorizerReport The only difference between these two is that VectorizerReport adds a vectorizer-specific prefix to its messages. When LAA is used in the vectorizer context the prefix is added when we promote the LoopAccessReport into a VectorizerReport via one of the constructors. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229897	2015-02-19 19:15:15 +00:00
Adam Nemet	a19312773c	[LoopAccesses] Add missing const to APIs in VectorizationReport When I split out LoopAccessReport from this, I need to create some temps so constness becomes necessary. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229896	2015-02-19 19:15:13 +00:00
Adam Nemet	a98e624489	[LoopAccesses] Change debug messages from LV to LAA Also add pass name as an argument to VectorizationReport::emitAnalysis. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229894	2015-02-19 19:15:07 +00:00
Adam Nemet	6041d14ab3	[LoopAccesses] Create the analysis pass This is a function pass that runs the analysis on demand. The analysis can be initiated by querying the loop access info via LAA::getInfo. It either returns the cached info or runs the analysis. Symbolic stride information continues to reside outside of this analysis pass. We may move it inside later but it's not a priority for me right now. The idea is that Loop Distribution won't support run-time stride checking at least initially. This means that when querying the analysis, symbolic stride information can be provided optionally. Whether stride information is used can invalidate the cache entry and rerun the analysis. Note that if the loop does not have any symbolic stride, the entry should be preserved across Loop Distribution and LV. Since currently the only user of the pass is LV, I just check that the symbolic stride information didn't change when using a cached result. On the LV side, LoopVectorizationLegality requests the info object corresponding to the loop from the analysis pass. A large chunk of the diff is due to LAI becoming a pointer from a reference. A test will be added as part of the -analyze patch. Also tested that with AVX, we generate identical assembly output for the testsuite (including the external testsuite) before and after. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229893	2015-02-19 19:15:04 +00:00
Adam Nemet	2205a68ebc	[LoopAccesses] Cache the result of canVectorizeMemory LAA will be an on-demand analysis pass, so we need to cache the result of the analysis. canVectorizeMemory is renamed to analyzeLoop which computes the result. canVectorizeMemory becomes the query function for the cached result. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229892	2015-02-19 19:15:00 +00:00
Adam Nemet	95a7235607	[LoopAccesses] Stash the report from the analysis rather than emitting it The transformation passes will query this and then emit them as part of their own report. The currently only user LV is modified to do just that. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229891	2015-02-19 19:14:56 +00:00
Adam Nemet	e77467312e	[LoopAccesses] Make VectorizerParams global + fix for cyclic dep As LAA is becoming a pass, we can no longer pass the params to its constructor. This changes the command line flags to have external storage. These can now be accessed both from LV and LAA. VectorizerParams is moved out of LoopAccessInfo in order to shorten the code to access it. This commits also has the fix (D7731) to the break dependence cycle between the analysis and vector libraries. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229890	2015-02-19 19:14:52 +00:00
Adam Nemet	ef703ef4d8	Revert "Reformat." This reverts commit r229651. I'd like to ultimately revert r229650 but this reformat stands in the way. I'll reformat the affected files once the the loop-access pass is fully committed. llvm-svn: 229889	2015-02-19 19:14:34 +00:00
Benjamin Kramer	deb8f2934a	LSR: Move set instead of copying. NFC. llvm-svn: 229871	2015-02-19 17:19:43 +00:00
Benjamin Kramer	c0850fa665	Demote vectors to arrays. No functionality change. llvm-svn: 229861	2015-02-19 15:26:17 +00:00
Michael Gottesman	86672937bd	[objc-arc] Introduce the concept of RCIdentity and rename all relevant functions to use that name. NFC. The RCIdentity root ("Reference Count Identity Root") of a value V is a dominating value U for which retaining or releasing U is equivalent to retaining or releasing V. In other words, ARC operations on V are equivalent to ARC operations on U. This is a useful property to ascertain since we can use this in the ARC optimizer to make it easier to match up ARC operations by always mapping ARC operations to RCIdentityRoots instead of pointers themselves. Then we perform pairing of retains, releases which are applied to the same RCIdentityRoot. In general, the two ways that we see RCIdentical values in ObjC are via: 1. PointerCasts 2. Forwarding Calls that return their argument verbatim. As such in ObjC, two RCIdentical pointers must always point to the same memory location. Previously this concept was implicit in the code and various methods that dealt with this concept were given functional names that did not conform to any name in the "ARC" model. This often times resulted in code that was hard for the non-ARC acquanted to understand resulting in unhappiness and confusion. llvm-svn: 229796	2015-02-19 00:42:38 +00:00
Michael Gottesman	b9380fca86	[objc-arc-contract] Rename contractRelease => tryToContractReleaseIntoStoreStrong. NFC. Makes it clearer what this method is actually supposed to do. llvm-svn: 229795	2015-02-19 00:42:34 +00:00
Michael Gottesman	5352c179b7	[objc-arc-contract] Refactor out tryToPeepholeInstruction into its own method. NFC. The main method of ObjCARCContract is really large and busy. By refactoring this out, it becomes easier to reason about. llvm-svn: 229794	2015-02-19 00:42:30 +00:00
Michael Gottesman	28626ad221	[objc-arc-contract] Reorganize the code a bit and make the debug output easier to read. llvm-svn: 229793	2015-02-19 00:42:27 +00:00
Sanjoy Das	48dc5bb962	Partial fix for bug 22589 Don't spend the entire iteration space in the scalar loop prologue if computing the trip count overflows. This change also gets rid of the backedge check in the prologue loop and the extra check for overflowing trip-count. Differential Revision: http://reviews.llvm.org/D7715 llvm-svn: 229731	2015-02-18 19:32:25 +00:00
Andrew Kaylor	eca1819627	Adding implementation to outline C++ catch handlers for native Windows 64 exception handling. Differential Revision: http://reviews.llvm.org/D7363 llvm-svn: 229715	2015-02-18 18:31:51 +00:00
Mohit K. Bhakkad	ac187bf468	[MSan][MIPS] VarArgHelper for MIPS64 Reviewers: Reviewers: eugenis, kcc, samsonov, petarj Subscribers: dsanders, sagar, llvm-commits Differential Revision: http://reviews.llvm.org/D7182 llvm-svn: 229667	2015-02-18 11:41:24 +00:00
NAKAMURA Takumi	e99052dc84	Reformat. llvm-svn: 229651	2015-02-18 08:36:14 +00:00
NAKAMURA Takumi	922c5e2986	Revert r229622: "[LoopAccesses] Make VectorizerParams global" and others. r229622 brought cyclic dependencies between Analysis and Vector. r229622: "[LoopAccesses] Make VectorizerParams global" r229623: "[LoopAccesses] Stash the report from the analysis rather than emitting it" r229624: "[LoopAccesses] Cache the result of canVectorizeMemory" r229626: "[LoopAccesses] Create the analysis pass" r229628: "[LoopAccesses] Change debug messages from LV to LAA" r229630: "[LoopAccesses] Add canAnalyzeLoop" r229631: "[LoopAccesses] Add missing const to APIs in VectorizationReport" r229632: "[LoopAccesses] Split out LoopAccessReport from VectorizerReport" r229633: "[LoopAccesses] Add -analyze support" r229634: "[LoopAccesses] Change LAA:getInfo to return a constant reference" r229638: "Analysis: fix buildbots" llvm-svn: 229650	2015-02-18 08:34:47 +00:00
Craig Topper	886ea644c2	[X86] Remove AVX512 pslldq/psrldq shift intrinsics. They aren't implemented yet and when they are they should be done with shuffles like SSE2 and AVX2. llvm-svn: 229641	2015-02-18 06:24:49 +00:00
Craig Topper	398dc737fa	[X86] Remove AVX2 and SSE2 pslldq and psrldq intrinsics. We can represent them in IR with vector shuffles now. All their uses have been removed from clang in favor of shuffles. llvm-svn: 229640	2015-02-18 06:24:44 +00:00
Adam Nemet	ec70942f90	[LoopAccesses] Change LAA:getInfo to return a constant reference As expected, this required a few more const-correctness fixes. Based on Hal's feedback on D7684. llvm-svn: 229634	2015-02-18 03:44:33 +00:00
Adam Nemet	213b3dce3c	[LoopAccesses] Split out LoopAccessReport from VectorizerReport The only difference between these two is that VectorizerReport adds a vectorizer-specific prefix to its messages. When LAA is used in the vectorizer context the prefix is added when we promote the LoopAccessReport into a VectorizerReport via one of the constructors. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229632	2015-02-18 03:44:25 +00:00
Adam Nemet	2126adeebf	[LoopAccesses] Add missing const to APIs in VectorizationReport When I split out LoopAccessReport from this, I need to create some temps so constness becomes necessary. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229631	2015-02-18 03:44:20 +00:00
Adam Nemet	d74d7f9eaa	[LoopAccesses] Change debug messages from LV to LAA Also add pass name as an argument to VectorizationReport::emitAnalysis. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229628	2015-02-18 03:43:37 +00:00
Adam Nemet	b645eb3a09	[LoopAccesses] Create the analysis pass This is a function pass that runs the analysis on demand. The analysis can be initiated by querying the loop access info via LAA::getInfo. It either returns the cached info or runs the analysis. Symbolic stride information continues to reside outside of this analysis pass. We may move it inside later but it's not a priority for me right now. The idea is that Loop Distribution won't support run-time stride checking at least initially. This means that when querying the analysis, symbolic stride information can be provided optionally. Whether stride information is used can invalidate the cache entry and rerun the analysis. Note that if the loop does not have any symbolic stride, the entry should be preserved across Loop Distribution and LV. Since currently the only user of the pass is LV, I just check that the symbolic stride information didn't change when using a cached result. On the LV side, LoopVectorizationLegality requests the info object corresponding to the loop from the analysis pass. A large chunk of the diff is due to LAI becoming a pointer from a reference. A test will be added as part of the -analyze patch. Also tested that with AVX, we generate identical assembly output for the testsuite (including the external testsuite) before and after. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229626	2015-02-18 03:43:24 +00:00
Adam Nemet	1c221f2fa8	[LoopAccesses] Make blockNeedsPredication static blockNeedsPredication is in LoopAccess in order to share it with the vectorizer. It's a utility needed by LoopAccess not strictly provided by it but it's a good place to share it. This makes the function static so that it no longer required to create an LoopAccessInfo instance in order to access it from LV. This was actually causing problems because it would have required creating LAI much earlier that LV::canVectorizeMemory(). This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229625	2015-02-18 03:43:19 +00:00
Adam Nemet	bad3bd2042	[LoopAccesses] Cache the result of canVectorizeMemory LAA will be an on-demand analysis pass, so we need to cache the result of the analysis. canVectorizeMemory is renamed to analyzeLoop which computes the result. canVectorizeMemory becomes the query function for the cached result. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229624	2015-02-18 03:42:57 +00:00
Adam Nemet	7664875dc2	[LoopAccesses] Stash the report from the analysis rather than emitting it The transformation passes will query this and then emit them as part of their own report. The currently only user LV is modified to do just that. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229623	2015-02-18 03:42:50 +00:00
Adam Nemet	24750973e2	[LoopAccesses] Make VectorizerParams global As LAA is becoming a pass, we can no longer pass the params to its constructor. This changes the command line flags to have external storage. These can now be accessed both from LV and LAA. VectorizerParams is moved out of LoopAccessInfo in order to shorten the code to access it. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229622	2015-02-18 03:42:43 +00:00
Adam Nemet	292fb2e17f	[LoopAccesses] Rename LoopAccessAnalysis to LoopAccessInfo LoopAccessAnalysis will be used as the name of the pass. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229621	2015-02-18 03:42:35 +00:00
Akira Hatanaka	913155d842	[InstCombine] Do not insert a GEP instruction before a landingpad instruction. InstCombiner::visitGetElementPtrInst was using getFirstNonPHI to compute the insertion point, which caused the verifier to complain when a GEP was inserted before a landingpad instruction. This commit fixes it to use getFirstInsertionPt instead. rdar://problem/19394964 llvm-svn: 229619	2015-02-18 03:30:11 +00:00
Hal Finkel	c078e835f2	[BDCE] Don't forget uses of root instructions seen before the instruction itself When visiting the initial list of "root" instructions (those which must always be alive), for those that are integer-valued (such as invokes returning an integer), we mark their bits as (initially) all dead (we might, obviously, find uses of those bits later, but all bits are assumed dead until proven otherwise). Don't do so, however, if we're already seen a use of those bits by another root instruction (such as a store). Fixes a miscompile of the sanitizer unit tests on x86_64. Also, add a debug line for visiting the root instructions, and remove a debug line which tried to print instructions being removed (printing dead instructions is dangerous, and can sometimes crash). llvm-svn: 229618	2015-02-18 03:12:28 +00:00
Benjamin Kramer	21bee91af5	Prefer SmallVector::append/insert over push_back loops. Same functionality, but hoists the vector growth out of the loop. llvm-svn: 229500	2015-02-17 15:29:18 +00:00
Elena Demikhovsky	b1743d071c	Fixed a bug in store sinking. The problem was in store-sink barrier check. Store sink barrier should be checked for ModRef (read-write) mode. http://llvm.org/bugs/show_bug.cgi?id=22613 llvm-svn: 229495	2015-02-17 13:10:05 +00:00
Hal Finkel	c9890f4fe1	[BDCE] Add a bit-tracking DCE pass BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the "aggressive DCE" pass), with the added capability to track dead bits of integer valued instructions and remove those instructions when all of the bits are dead. Currently, it does not actually do this all-bits-dead removal, but rather replaces the instruction's uses with a constant zero, and lets instcombine (and the later run of ADCE) do the rest. Because we essentially get a run of ADCE "for free" while tracking the dead bits, we also do what ADCE does and removes actually-dead instructions as well (this includes instructions newly trivially dead because all bits were dead, but not all such instructions can be removed). The motivation for this is a case like: int __attribute__((const)) foo(int i); int bar(int x) { x \|= (4 & foo(5)); x \|= (8 & foo(3)); x \|= (16 & foo(2)); x \|= (32 & foo(1)); x \|= (64 & foo(0)); x \|= (128& foo(4)); return x >> 4; } As it turns out, if you order the bit-field insertions so that all of the dead ones come last, then instcombine will remove them. However, if you pick some other order (such as the one above), the fact that some of the calls to foo() are useless is not locally obvious, and we don't remove them (without this pass). I did a quick compile-time overhead check using sqlite from the test suite (Release+Asserts). BDCE took ~0.4% of the compilation time (making it about twice as expensive as ADCE). I've not looked at why yet, but we eliminate instructions due to having all-dead bits in: External/SPEC/CFP2006/447.dealII/447.dealII External/SPEC/CINT2006/400.perlbench/400.perlbench External/SPEC/CINT2006/403.gcc/403.gcc MultiSource/Applications/ClamAV/clamscan MultiSource/Benchmarks/7zip/7zip-benchmark llvm-svn: 229462	2015-02-17 01:36:59 +00:00
Mehdi Amini	ce11b626e7	InstCombine: fold more cases of (fp_to_u/sint (u/sint_to_fp val)) Fixes radar 15486701. From: Fiona Glaser <fglaser@apple.com> llvm-svn: 229437	2015-02-16 21:47:54 +00:00
James Molloy	c4b7d913d4	Run LICM as part of the cleanup phase from the scalar optimizer. Things like LoopUnrolling can produce loop invariant values - make sure we pick them up. llvm-svn: 229419	2015-02-16 18:59:54 +00:00
Hal Finkel	4546f3de43	[ADCE] Don't indent inside an anonymous namespace To be consistent with what clang-format does, don't add extra indentation inside an anonymous namespace. NFC. llvm-svn: 229412	2015-02-16 18:08:00 +00:00
James Molloy	317bb7473f	[LoopReroll] Relax some assumptions a little. We won't find a root with index zero in any loop that we are able to reroll. However, we may find one in a non-rerollable loop, so bail gracefully instead of failing hard. llvm-svn: 229406	2015-02-16 17:02:00 +00:00
James Molloy	382c1caece	[LoopReroll] Don't crash on dead code If a PHI has no users, don't crash; bail gracefully. This shouldn't happen often, but we can make no guarantees that previous passes didn't leave dead code around. llvm-svn: 229405	2015-02-16 17:01:52 +00:00
Evgeniy Stepanov	5ad40ba7e0	[asan] Reuse a common function. Do not reimplement RoundUpToAlignment. llvm-svn: 229397	2015-02-16 14:49:37 +00:00
Aaron Ballman	0b45511a2e	Removing LLVM_DELETED_FUNCTION, as MSVC 2012 was the last reason for requiring the macro. NFC; LLVM edition. llvm-svn: 229340	2015-02-15 22:54:22 +00:00
Hal Finkel	afd400b12e	[ADCE] Convert another loop for a range-based for We can use a range-based for for the operands loop too; NFC. llvm-svn: 229319	2015-02-15 15:51:25 +00:00
Hal Finkel	bc23b1a680	[ADCE] Use inst_range and range-based fors Convert a few loops to range-based fors; NFC. llvm-svn: 229318	2015-02-15 15:51:23 +00:00
Hal Finkel	20dadad9f5	[ADCE] Fix formatting of pointer types We prefer to put the * with the variable, not with the type; NFC. llvm-svn: 229317	2015-02-15 15:47:52 +00:00
Hal Finkel	f67e71db4d	[ADCE] Fix capitalization of another local variable Bring another local variable in compliance with our naming conventions, NFC. llvm-svn: 229316	2015-02-15 15:45:30 +00:00
Hal Finkel	6660cfbeba	[ADCE] Fix capitalization of some local variables Bring some local variables in compliance with our naming conventions, NFC. llvm-svn: 229315	2015-02-15 15:45:28 +00:00
Elena Demikhovsky	79bb080d9c	Enabled cost calculation for masked memory operations. We already have implementation for cost calculation for masked memory operations. I just call it from the loop vectorizer. llvm-svn: 229290	2015-02-15 08:08:48 +00:00
Ramkumar Ramachandra	af4f23c6ae	InstCombine: propagate deref via new addDereferenceableAttr The "dereferenceable" attribute cannot be added via .addAttribute(), since it also expects a size in bytes. AttrBuilder#addAttribute or AttributeSet#addAttribute is wrapped by classes Function, InvokeInst, and CallInst. Add corresponding wrappers to AttrBuilder#addDereferenceableAttr. Having done this, propagate the dereferenceable attribute via gc.relocate, adding a test to exercise it. Note that -datalayout is required during execution over and above -instcombine, because InstCombine only optionally requires DataLayoutPass. Differential Revision: http://reviews.llvm.org/D7510 llvm-svn: 229265	2015-02-14 19:37:54 +00:00
Andrea Di Biagio	d0ba37a3e6	[optnone] Skip pass Constant Hoisting on optnone functions. Added test CodeGen/X86/constant-hoisting-optnone.ll to verify that pass Constant Hoisting is not run on optnone functions. llvm-svn: 229258	2015-02-14 15:11:48 +00:00
Duncan P. N. Exon Smith	a0ef805386	Transforms: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229202	2015-02-14 01:11:29 +00:00
Philip Reames	45adc98469	[InstCombine] When canonicalizing gep indices, prefer zext when possible If we know that the sign bit of a value being sign extended is zero, we can use a zero extension instead. This is motivated by the fact that zero extensions are generally cheaper on x86 (and most other architectures?). We already apply a similar transform in DAGCombine, this just extends that to the IR level. This comes up when we eagerly canonicalize gep indices to the width of a machine register (i64 on x86_64). To do so, we insert sign extensions (sext) to promote smaller types. Differential Revision: http://reviews.llvm.org/D7255 llvm-svn: 229189	2015-02-14 00:05:36 +00:00
Andrea Di Biagio	0c23f4cf6c	[InstCombine] Fix regression introduced at r227197. This patch fixes a problem I accidentally introduced in an instruction combine on select instructions added at r227197. That revision taught the instruction combiner how to fold a cttz/ctlz followed by a icmp plus select into a single cttz/ctlz with flag 'is_zero_undef' cleared. However, the new rule added at r227197 would have produced wrong results in the case where a cttz/ctlz with flag 'is_zero_undef' cleared was follwed by a zero-extend or truncate. In that case, the folded instruction would have been inserted in a wrong location thus leaving the CFG in an inconsistent state. This patch fixes the problem and add two reproducible test cases to existing test 'InstCombine/select-cmp-cttz-ctlz.ll'. llvm-svn: 229124	2015-02-13 16:33:34 +00:00
James Molloy	d75793f030	[SimplifyCFG] Be more aggressive Up the phi node folding threshold from a cheap "1" to a meagre "2". Update tests for extra added selects and slight code churn. llvm-svn: 229099	2015-02-13 10:48:30 +00:00
Chandler Carruth	18e8c62883	[PM] Remove the old 'PassManager.h' header file at the top level of LLVM's include tree and the use of using declarations to hide the 'legacy' namespace for the old pass manager. This undoes the primary modules-hostile change I made to keep out-of-tree targets building. I sent an email inquiring about whether this would be reasonable to do at this phase and people seemed fine with it, so making it a reality. This should allow us to start bootstrapping with modules to a certain extent along with making it easier to mix and match headers in general. The updates to any code for users of LLVM are very mechanical. Switch from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h". Qualify the types which now produce compile errors with "legacy::". The most common ones are "PassManager", "PassManagerBase", and "FunctionPassManager". llvm-svn: 229094	2015-02-13 10:01:29 +00:00
Chandler Carruth	5b9fde4431	[unroll] Concede defeat and disable the unroll analyzer for now. The issues with the new unroll analyzer are more fundamental than code cleanup, algorithm, or data structure changes. I've sent an email to the original commit thread with details and a proposal for how to redesign things. I'm disabling this for now so that we don't spend time debugging issues with it in its current state. llvm-svn: 229064	2015-02-13 05:31:46 +00:00
Michael Liao	1e3950b179	[InstCombine] Fix a bug when combining `icmp` from `ptrtoint` - First, there's a crash when we try to combine that pointers into `icmp` directly by creating a `bitcast`, which is invalid if that two pointers are from different address spaces. - It's not always appropriate to cast one pointer to another if they are from different address spaces as that is not no-op cast. Instead, we only combine `icmp` from `ptrtoint` if that two pointers are of the same address space. llvm-svn: 229063	2015-02-13 04:51:26 +00:00
Chandler Carruth	3982226b25	[unroll] Merge the simplification and DCE estimation methods on the UnrollAnalyzer. Now they share a single worklist and have less implicit state between them. There was no real benefit to separating these two things out. I'm going to subsequently refactor things to share even more code. llvm-svn: 229062	2015-02-13 04:39:05 +00:00
Chandler Carruth	8d0faf7531	[unroll] Remove pointless dyn_cast<>s to Instruction - the users of an instruction must by definition be instructions. llvm-svn: 229061	2015-02-13 04:33:21 +00:00
Chandler Carruth	93d5cf945c	[unroll] Don't check the loop set for whether an instruction is contained in it each time we try to add it to the worklist, just check this when pulling it off the worklist. That way we do it at most once per instruction with the cost of the worklist set we would need to pay anyways. llvm-svn: 229060	2015-02-13 04:30:44 +00:00
Chandler Carruth	efe46ecfb3	[unroll] Change the other worklist in the unroll analyzer to be a set vector. In addition to dramatically reducing the work required for contrived example loops, this also has to correct some serious latent bugs in the cost computation. Previously, we might add an instruction onto the worklist once for every load which it used and was simplified. Then we would visit it many times and accumulate "savings" each time. I mean, fortunately this couldn't matter for things like calls with 100s of operands, but even for binary operators this code seems like it must be double counting the savings. I just noticed this by inspection and due to the runtime problems it can introduce, I don't have any test cases for cases where the cost produced by this routine is unacceptable. llvm-svn: 229059	2015-02-13 04:27:50 +00:00
Chandler Carruth	94b76fa886	[unroll] Replace a boolean, for loop, condition, and break with std::all_of and a lambda. Much cleaner, no functionality changed. llvm-svn: 229058	2015-02-13 04:18:14 +00:00
Chandler Carruth	4ae15f9bf1	[unroll] Directly query for dead instructions. In the unroll analyzer, it is checking each user to see if that user will become dead. However, it first checked if that user was missing from the simplified values map, and then if was also missing from the dead instructions set. We add everything from the simplified values map to the dead instructions set, so the first step is completely subsumed by the second. Moreover, the first step requires inserting something into the simplified value map which isn't what we want at all. This also replaces a dyn_cast with a cast as an instruction cannot be used by a non-instruction. llvm-svn: 229057	2015-02-13 04:14:05 +00:00
Chandler Carruth	c34764a1de	[unroll] Replace a linear time check for no uses with a constant time check. Also hoist this into the enqueue process as it is faster even than testing the worklist set, we should just directly filter these out much like we filter out constants and such. llvm-svn: 229056	2015-02-13 04:06:08 +00:00
Chandler Carruth	68e0d0db9b	[unroll] Rather than an operand set, use a setvector for the worklist. We don't just want to handle duplicate operands within an instruction, but also duplicates across operands of different instructions. I should have gone straight to this, but I had convinced myself that it wasn't going to be necessary briefly. I've come to my senses after chatting more with Nick, and am now happier here. llvm-svn: 229054	2015-02-13 03:57:40 +00:00
Chandler Carruth	636bd67dfd	[unroll] Extract the code to enqueue operansd for the worklist in the unroll analysis into a lambda and call it. That's much simpler than duplicating all the code. llvm-svn: 229053	2015-02-13 03:49:41 +00:00
Chandler Carruth	5f5321daa3	[unroll] Use a small set to de-duplicate operands prior to putting them into the worklist. This avoids allocating lots of worklist memory for them when there are large numbers of repeated operands. llvm-svn: 229052	2015-02-13 03:48:38 +00:00
Chandler Carruth	ad0d6361a5	[unroll] Make the unroll cost analysis terminate deterministically and reasonably quickly. I don't have a reduced test case, but for a version of FFMPEG, this makes the loop unroller start finishing at all (after over 15 minutes of running, it hadn't terminated for me, no idea if it was a true infloop or just exponential work). The key thing here is to check the DeadInstructions set when pulling things off the worklist. Without this, we would re-walk the user list of already dead instructions again and again and again. Consider phi nodes with many, many operands and other patterns. The other important aspect of this is that because we would keep re-visiting instructions that were already known dead, we kept adding their cost savings to this! This would cause our cost savings to be insanely inflated from this. While I was here, I also rotated the operand walk out of the worklist loop to make the code easier to read. There is still work to be done to minimize worklist traffic because we don't de-duplicate operands. This means we may add the same instruction onto the worklist 1000s of times if it shows up in 1000s of operansd to a PHI node for example. Still, with this patch, the ffmpeg testcase I have finishes quickly and I can't measure the runtime impact of the unroll analysis any more. I'll probably try to do a few more cleanups to this code, but not sure how much cleanup I can justify right now. llvm-svn: 229038	2015-02-13 03:40:58 +00:00
Chandler Carruth	0249146912	[unroll] Make range based for loops a bit more explicit and more readable. The biggest thing that was causing me problems is recognizing the references vs. poniters here. I also found that for maps naming the loop variable as KeyValue helps make it obvious why you don't actually use it directly. Finally, using 'auto' instead of 'User *' doesn't seem like a good tradeoff. Much like with the other cases, I like to know its a pointer, and 'User' is just as long and tells the reader a lot more. llvm-svn: 229033	2015-02-13 02:45:17 +00:00
Chandler Carruth	cbd5e9d8c2	[IC] Fix a bug with the instcombine canonicalizing of loads and propagating of metadata. We were propagating !nonnull metadata even when the newly formed load is no longer of a pointer type. This is clearly broken and results in LLVM failing the verifier and aborting. This patch just restricts the propagation of !nonnull metadata to when we actually have a pointer type. This bug report and the initial version of this patch was provided by Charles Davis! Many thanks for finding this! We still need to add logic to round-trip the metadata correctly if we combine from pointer types to integer types and then back by using range metadata for the integer type loads. But this is the minimal and safe version of the patch, which is important so we can backport it into 3.6. llvm-svn: 229029	2015-02-13 02:30:01 +00:00
Chandler Carruth	079dd5c550	[unroll] Avoid the "Insn" abbreviation of Instruction. This is quite hard to type and read for me, and is inconsistent with the other abbreviation in the base class "Inst". For most of these (where they are used widely) I prefer just spelling it out as Instruction. I've changed two of the short-lived variables to use "Inst" to match the base class. llvm-svn: 229028	2015-02-13 02:17:39 +00:00
Chandler Carruth	fc05ca53a5	[unroll] Tidy up the integer we use to accumululate the number of instructions optimized. NFC, just separating this out from the functionality changing commit. llvm-svn: 229026	2015-02-13 02:10:56 +00:00
Chandler Carruth	dc2a7e767f	[unroll] Don't use a map from pointer to bool. Use a set. This is much more efficient. In particular, the query with the user instruction has to insert a false for every missing instruction into the set. This is just a cleanup a long the way to fixing the underlying algorithm problems here. llvm-svn: 228994	2015-02-13 00:29:39 +00:00
Michael Zolotukhin	c6b3be0511	Prevent division by 0. When we try to estimate number of potentially removed instructions in loop unroller, we analyze first N iterations and then scale the computed number by TripCount/N. We should bail out early if N is 0. llvm-svn: 228988	2015-02-13 00:17:03 +00:00
Chandler Carruth	43545494aa	[unroll] Update the new analysis logic from r228265 to use modern coding conventions for function names consistently. Some were already using this but not all. llvm-svn: 228987	2015-02-13 00:00:24 +00:00
Benjamin Kramer	3b1b5b8725	InstCombine: Allow folding of xor into icmp by changing the predicate for vectors The loop vectorizer can create this pattern. llvm-svn: 228954	2015-02-12 20:26:46 +00:00
James Molloy	a6fbcd3b92	[LoopRerolling] Be more forgiving with instruction order. We can't solve the full subgraph isomorphism problem. But we can allow obvious cases, where for example two instructions of different types are out of order. Due to them having different types/opcodes, there is no ambiguity. llvm-svn: 228931	2015-02-12 15:54:14 +00:00
Dmitry Vyukov	149a56946f	tsan: do not instrument not captured values I've built some tests in WebRTC with and without this change. With this change number of __tsan_read/write calls is reduced by 20-40%, binary size decreases by 5-10% and execution time drops by ~5%. For example: $ ls -l old/modules_unittests new/modules_unittests -rwxr-x--- 1 dvyukov 41708976 Jan 20 18:35 old/modules_unittests -rwxr-x--- 1 dvyukov 38294008 Jan 20 18:29 new/modules_unittests $ objdump -d old/modules_unittests \| egrep "callq.__tsan_(read\|write\|unaligned)" \| wc -l 239871 $ objdump -d new/modules_unittests \| egrep "callq.__tsan_(read\|write\|unaligned)" \| wc -l 148365 http://reviews.llvm.org/D7069 llvm-svn: 228917	2015-02-12 09:55:28 +00:00
Chandler Carruth	2af75e99bb	[slp] Fix a nasty bug in the SLP vectorizer that Joerg pointed out. Apparently some code finally started to tickle this after my canonicalization changes to instcombine. The bug stems from trying to form a vector type out of scalars that aren't compatible at all. In this example, from x86_mmx values. The code in the vectorizer that checks for reasonable types whas checking for aggregates or vectors, but there are lots of other types that should just never reach the vectorizer. Debugging this was made more confusing by the lie in an assert in VectorType::get() -- it isn't that the types are primitive. The types must be integer, pointer, or floating point types. No other types are allowed. I've improved the assert and added a helper to the vectorizer to handle the element type validity checks. It now re-uses the VectorType static function and then further excludes weird target-specific types that we probably shouldn't be touching here (x86_fp80 and ppc_fp128). Neither of these are really reachable anyways (neither 80-bit nor 128-bit things will get vectorized) but it seems better to just eagerly exclude such nonesense. I've added a test case, but while it definitely covers two of the paths through this code there may be more paths that would benefit from test coverage. I'm not familiar enough with the SLP vectorizer to synthesize test cases for all of these, but was able to update the code itself by inspection. llvm-svn: 228899	2015-02-12 02:30:56 +00:00
Tim Northover	f976f969cd	DeadArgElim: aggregate Return assessment properly. I mistakenly thought the liveness of each "RetVal(F, i)" depended only on F. It actually depends on the index too, which means we need to be careful about how the results are combined before return. In particular if a single Use returns Live, that counts for the entire object, at the granularity we're considering. llvm-svn: 228885	2015-02-11 23:13:11 +00:00
Mehdi Amini	3c8f7ac243	Reassociate: cannot negate a INT_MIN value Summary: When trying to canonicalize negative constants out of multiplication expressions, we need to check that the constant is not INT_MIN which cannot be negated. Reviewers: mcrosier Reviewed By: mcrosier Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7286 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 228872	2015-02-11 19:54:44 +00:00
James Molloy	ba8cd33738	[SimplifyCFG] Swap to using TargetTransformInfo for cost analysis. We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness" heuristic and use TTI directly. Generally NFC intended, but we're using a slightly different heuristic now so there is a slight test churn. Test changes: * combine-comparisons-by-cse.ll: Removed unneeded branch check. * 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq. * coalesce-subregs.ll: Superfluous block check. * 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv. * PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present. * select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI. llvm-svn: 228826	2015-02-11 12:15:41 +00:00
James Molloy	c9ce650708	[LoopReroll] Introduce the concept of DAGRootSets. A DAGRootSet models an induction variable being used in a rerollable loop. For example: x[i3+0] = y1 x[i3+1] = y2 x[i3+2] = y3 Base instruction -> i3 +---+----+ / \| \ ST[y1] +1 +2 <-- Roots \| \| ST[y2] ST[y3] There may be multiple DAGRootSets, for example: x[i2+0] = ... (1) x[i2+1] = ... (1) x[i2+4] = ... (2) x[i2+5] = ... (2) x[(i+1234)2+5678] = ... (3) x[(i+1234)2+5679] = ... (3) This concept is similar to the "Scale" member used previously, but allows multiple independent sets of roots based off the same induction variable. llvm-svn: 228821	2015-02-11 09:19:47 +00:00
Zachary Turner	76143c865c	Use ADDITIONAL_HEADER_DIRS in all LLVM CMake projects. This allows IDEs to recognize the entire set of header files for each of the core LLVM projects. Differential Revision: http://reviews.llvm.org/D7526 Reviewed By: Chris Bieneman llvm-svn: 228798	2015-02-11 03:28:02 +00:00
Justin Bogner	8b002ea8d6	InstrProf: Lower coverage mappings by setting their sections appropriately Add handling for __llvm_coverage_mapping to the InstrProfiling pass. We need to make sure the constant and any profile names it refers to are in the correct sections, which is easier and cleaner to do here where we have to know about profiling sections anyway. This is really tricky to test without a frontend, so I'm committing the test for the fix in clang. If anyone knows a good way to test this within LLVM, please let me know. Fixes PR22531. llvm-svn: 228793	2015-02-11 02:52:44 +00:00
Reid Kleckner	86643b627c	Don't promote asynch EH invokes of nounwind functions to calls If the landingpad of the invoke is using a personality function that catches asynch exceptions, then it can catch a trap. Also add some landingpads to invalid LLVM IR test cases that lack them. Over-the-shoulder reviewed by David Majnemer. llvm-svn: 228782	2015-02-11 01:23:16 +00:00
David Majnemer	530afeca43	EarlyCSE: It isn't safe to CSE across synchronization boundaries This fixes PR22514. llvm-svn: 228760	2015-02-10 23:09:43 +00:00
Tim Northover	0eba2eb04d	DeadArgElim: arguments affect all returned sub-values by default. Unless we meet an insertvalue on a path from some value to a return, that value will be live if any of the return's components are live, so all of those components must be added to the MaybeLiveUses. Previously we were deleting arguments if sub-value 0 turned out to be dead. llvm-svn: 228731	2015-02-10 19:49:18 +00:00
Chandler Carruth	3f645fb28b	Revert r228556: InstCombine: propagate nonNull through assume This commit isn't using the correct context, and is transfoming calls that are operands to loads rather than calls that are operands to an icmp feeding into an assume. I've replied on the original review thread with a very reduced test case and some thoughts on how to rework this. llvm-svn: 228677	2015-02-10 08:07:32 +00:00
Philip Reames	da2562bb8a	Adjust how we avoid poll insertion inside the poll function (NFC) I realized that my early fix for this was overly complicated. Rather than scatter checks around in a bunch of places, just exit early when we visit the poll function itself. Thinking about it a bit, the whole inlining mechanism used with gc.safepoint_poll could probably be cleaned up a bit. Originally, poll insertion was fused with gc relocation rewriting. It might be worth going back to see if we can simplify the chain of events now that these two are seperated. As one thought, maybe it makes sense to rewrite calls inside the helper function before inlining it to the many callers. This would require us to visit the poll function before any other functions though.. llvm-svn: 228634	2015-02-10 00:04:53 +00:00
Adrian Prantl	c70d198b1d	Debug info: When updating debug info during SROA, do not emit debug info for any padding introduced by SROA. In particular, do not emit debug info for an alloca that represents only the padding introduced by a previous iteration. Fixes PR22495. llvm-svn: 228632	2015-02-09 23:57:22 +00:00
Adrian Prantl	f10ec50249	Debug info: Use DW_OP_bit_piece instead of DW_OP_piece in the intermediate representation. This - increases consistency by using the same granularity everywhere - allows for pieces < 1 byte - DW_OP_piece didn't actually allow storing an offset. Part of PR22495. llvm-svn: 228631	2015-02-09 23:57:15 +00:00
Ramkumar Ramachandra	545e586a0e	[Statepoint] Improve two asserts, fix some style (NFC) Summary: It's important that our users immediately know what gc.safepoint_poll is. Also fix the style of the declaration of CreateGCStatepoint, in preparation for another change that will wrap it. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7517 llvm-svn: 228626	2015-02-09 23:02:10 +00:00
Ramkumar Ramachandra	054737a0c9	PlaceSafepoints: modernize gc.result.* -> gc.result Differential Revision: http://reviews.llvm.org/D7516 llvm-svn: 228625	2015-02-09 23:00:40 +00:00
Philip Reames	1372127d53	Update file comment to clarify points highlighted in review (NFC) llvm-svn: 228621	2015-02-09 22:44:03 +00:00
Philip Reames	9655ce0aa3	Use range for loops in PlaceSafepoints (NFC) llvm-svn: 228620	2015-02-09 22:26:11 +00:00
Duncan P. N. Exon Smith	ec76782680	IR: Take uint64_t in DIBuilder::createExpression() `DIExpression` deals with `uint64_t`, so it doesn't make sense that `createExpression()` is created from `int64_t`. Switch to `uint64_t` to unify them. I've temporarily left in the `int64_t` version, which forwards to the `uint64_t` version. I'll delete it once I've updated the callers. llvm-svn: 228619	2015-02-09 22:13:27 +00:00
Philip Reames	4ae49786df	Add basic tests for PlaceSafepoints This is just adding really simple tests which should have been part of the original submission. When doing so, I discovered that I'd mistakenly removed required pieces when preparing the patch for upstream submission. I fixed two such bugs in this submission. llvm-svn: 228610	2015-02-09 21:48:05 +00:00

1 2 3 4 5 ...

12617 Commits