llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00

Author	SHA1	Message	Date
Tim Northover	8a4cb5ce31	Loop strength reduce: fix function name. llvm-svn: 199801	2014-01-22 13:27:00 +00:00
Chandler Carruth	e90b399e43	[SROA] Fix a bug which could cause the common type finding to return inconsistent results for different orderings of alloca slices. The fundamental issue is that it is just always a mistake to return early from this function. There is no effective early exit to leverage. This patch stops trying to do so and simplifies the code a bit as a consequence. Original diagnosis and patch by James Molloy with some name tweaks by me in part reflecting feedback from Duncan Smith on the mailing list. llvm-svn: 199771	2014-01-21 23:16:05 +00:00
Owen Anderson	e0205fdcd8	Fix all the remaining lost-fast-math-flags bugs I've been able to find. The most important of these are cases in the generic logic for combining BinaryOperators. This logic hadn't been updated to handle FastMathFlags, and it took me a while to detect it because it doesn't show up in a simple search for CreateFAdd. llvm-svn: 199629	2014-01-20 07:44:53 +00:00
Benjamin Kramer	813eb189fa	InstCombine: Modernize a bunch of cast combines. Also make them vector-aware. llvm-svn: 199608	2014-01-19 20:05:13 +00:00
Benjamin Kramer	319cbf6707	InstCombine: Hoist 3 copies of AddOne/SubOne into a header. llvm-svn: 199605	2014-01-19 16:56:10 +00:00
Benjamin Kramer	47d4c4c113	InstCombine: Replace a hand-rolled version of isKnownToBeAPowerOfTwo with the real thing. llvm-svn: 199604	2014-01-19 16:48:41 +00:00
Benjamin Kramer	0de38fdc6a	InstCombine: Teach most integer add/sub/mul/div combines how to deal with vectors. llvm-svn: 199602	2014-01-19 15:24:22 +00:00
Benjamin Kramer	b864b5d907	InstCombine: Refactor fmul/fdiv combines to handle vectors. llvm-svn: 199598	2014-01-19 13:36:27 +00:00
Chandler Carruth	8b7504e0a3	Fix a really nasty SROA bug with how we handled out-of-bounds memcpy intrinsics. Reported on the list by Evan with a couple of attempts to fix, but it took a while to dig down to the root cause. There are two overlapping bugs here, both centering around the circumstance of discovering a memcpy operand which is known to be completely outside the bounds of the alloca. First, we need to kill the other side of the memcpy if it was added to this alloca. Otherwise we'll factor it into our slicing and try to rewrite it even though we know for a fact that it is dead. This is made more tricky because we can visit the sides in either order. So we have to both kill the other side and skip instructions marked as dead. The latter really should be goodness in every case, but here is a matter of correctness. Second, we need to actually remove the uses of the alloca by the memcpy when queuing it for later deletion. Otherwise it may still be using the alloca when we go to promote it (if the rewrite re-uses the existing alloca instruction). Do this by factoring out the use-clobbering used for nixing a Phi argument and re-using it across the operands of a to-be-deleted instruction. llvm-svn: 199590	2014-01-19 12:16:54 +00:00
Arnold Schwaighofer	2c67b7dc58	LoopVectorizer: A reduction that has multiple uses of the reduction value is not a reduction. Really. Under certain circumstances (the use list of an instruction has to be set up right - hence the extra pass in the test case) we would not recognize when a value in a potential reduction cycle was used multiple times by the reduction cycle. Fixes PR18526. radar://15851149 llvm-svn: 199570	2014-01-19 03:18:31 +00:00
Nick Lewycky	f31f7a5863	Don't refuse to transform constexpr(call(arg, ...)) to call(constexpr(arg), ...) just because the function has multiple return values even if their return types are the same. Patch by Eduard Burtescu! llvm-svn: 199564	2014-01-18 22:47:12 +00:00
Benjamin Kramer	ace2801d74	InstCombine: Make the (fmul X, -1.0) -> (fsub -0.0, X) transform handle vectors too. PR18532. llvm-svn: 199553	2014-01-18 16:43:14 +00:00
Owen Anderson	8750294bae	Fix more instances of dropped fast math flags when optimizing FADD instructions. All found by inspection (aka grep). llvm-svn: 199528	2014-01-18 00:48:14 +00:00
Kostya Serebryany	88b5111b60	[asan] extend asan-coverage (still experimental).
- Add a mode for collecting per-block coverage (-asan-coverage=2). So far the implementation is naive (all blocks are instrumented); the performance overhead on top of asan could be as high as 30%.
- Make sure the one-time calls to __sanitizer_cov are moved to the function bottom, which in turn required copying the original debug info into the call insn.
Here is the performance data on SPEC 2006 (train data, comparing asan with asan-coverage={0,1,2}):
benchmark, asan+cov0, asan+cov1, diff 0-1, asan+cov2, diff 0-2, diff 1-2
400.perlbench, 65.60, 65.80, 1.00, 76.20, 1.16, 1.16
401.bzip2, 65.10, 65.50, 1.01, 75.90, 1.17, 1.16
403.gcc, 1.64, 1.69, 1.03, 2.04, 1.24, 1.21
429.mcf, 21.90, 22.60, 1.03, 23.20, 1.06, 1.03
445.gobmk, 166.00, 169.00, 1.02, 205.00, 1.23, 1.21
456.hmmer, 88.30, 87.90, 1.00, 91.00, 1.03, 1.04
458.sjeng, 210.00, 222.00, 1.06, 258.00, 1.23, 1.16
462.libquantum, 1.73, 1.75, 1.01, 2.11, 1.22, 1.21
464.h264ref, 147.00, 152.00, 1.03, 160.00, 1.09, 1.05
471.omnetpp, 115.00, 116.00, 1.01, 140.00, 1.22, 1.21
473.astar, 133.00, 131.00, 0.98, 142.00, 1.07, 1.08
483.xalancbmk, 118.00, 120.00, 1.02, 154.00, 1.31, 1.28
433.milc, 19.80, 20.00, 1.01, 20.10, 1.02, 1.01
444.namd, 16.20, 16.20, 1.00, 17.60, 1.09, 1.09
447.dealII, 41.80, 42.20, 1.01, 43.50, 1.04, 1.03
450.soplex, 7.51, 7.82, 1.04, 8.25, 1.10, 1.05
453.povray, 14.00, 14.40, 1.03, 15.80, 1.13, 1.10
470.lbm, 33.30, 34.10, 1.02, 34.10, 1.02, 1.00
482.sphinx3, 12.40, 12.30, 0.99, 13.00, 1.05, 1.06
llvm-svn: 199488	2014-01-17 11:00:30 +00:00
Quentin Colombet	b42dbc5117	[opt][PassInfo] Allow opt to run passes that need target machine. When registering a pass, a pass can now specify a second constructor that takes as argument a pointer to TargetMachine. The PassInfo class has been updated to reflect that possibility. If such a constructor exists opt will use it instead of the default constructor when instantiating the pass. Since such IR passes are supposed to be rare, no specific support has been added to this commit to allow an easy registration of such a pass. In other words, for such a pass, the initialization function has to be hand-written (see CodeGenPrepare for instance). Now, codegenprepare can be tested using opt: opt -codegenprepare -mtriple=mytriple input.ll llvm-svn: 199430	2014-01-16 21:44:34 +00:00
Owen Anderson	9c1a615059	Fix two cases where we could lose fast math flags when optimizing FADD expressions. llvm-svn: 199427	2014-01-16 21:26:02 +00:00
Owen Anderson	dbdd830886	Fix an instance where we would drop fast math flags when performing an fdiv to reciprocal multiply transformation. llvm-svn: 199425	2014-01-16 21:07:52 +00:00
Owen Anderson	2c40c9a6c0	Fix a bug in InstCombine where we failed to preserve fast math flags when optimizing an FMUL expression. llvm-svn: 199424	2014-01-16 20:59:41 +00:00
Owen Anderson	a218b5b798	Teach InstCombine that (fmul X, -1.0) can be simplified to (fneg X), which LLVM expresses as (fsub -0.0, X). llvm-svn: 199420	2014-01-16 20:36:42 +00:00
Evgeniy Stepanov	5b1a672532	[asan] Remove -fsanitize-address-zero-base-shadow command line flag from clang, and disable zero-base shadow support on all platforms where it is not the default behavior. - It is completely unused, as far as we know. - It is ABI-incompatible with non-zero-base shadow, which means all objects in a process must be built with the same setting. Failing to do so results in a segmentation fault at runtime. - It introduces a backward dependency of compiler-rt on user code, which is uncommon and complicates testing. This is the LLVM part of a larger change. llvm-svn: 199371	2014-01-16 10:19:12 +00:00
Hans Wennborg	efa9ef0e63	Switch-to-lookup tables: set threshold to 3 cases. There has been an old FIXME to find the right cut-off for when it's worth analyzing and potentially transforming a switch to a lookup table. The switches always have two or more cases. I could not measure any speed-up by transforming a switch with two cases. A switch with three cases gets a nice speed-up, and I couldn't measure any compile-time regression, so I think this is the right threshold. In a Clang self-host, this causes 480 new switches to be transformed, and reduces the final binary size by 8 KB. llvm-svn: 199294	2014-01-15 05:00:27 +00:00
Arnold Schwaighofer	9fb94754bd	LoopVectorize: Only strip casts from integer types when replacing symbolic strides. Fixes PR18480. llvm-svn: 199291	2014-01-15 03:35:46 +00:00
Matt Arsenault	babc737d7b	Do pointer cast simplifications on addrspacecast llvm-svn: 199254	2014-01-14 20:00:45 +00:00
Matt Arsenault	a5adc47c53	Remove a check for an illegal condition. Bitcasts can't be between address spaces anymore. llvm-svn: 199253	2014-01-14 19:56:57 +00:00
Matt Arsenault	50ba8b89a7	Make nocapture analysis work with addrspacecast llvm-svn: 199246	2014-01-14 19:11:52 +00:00
Duncan P. N. Exon Smith	bb847bd59e	Reapply "LTO: add API to set strategy for -internalize" Reapply r199191, reverted in r199197 because it carelessly broke Other/link-opts.ll. The problem was that calling createInternalizePass("main") would select createInternalizePass(bool("main")) instead of createInternalizePass(ArrayRef<const char *>("main")). This commit fixes the bug. The original commit message follows. Add API to LTOCodeGenerator to specify a strategy for the -internalize pass. This is a new attempt at Bill's change in r185882, which he reverted in r188029 due to problems with the gold linker. This puts the onus on the linker to decide whether (and what) to internalize. In particular, running internalize before outputting an object file may change a 'weak' symbol into an internal one, even though that symbol could be needed by an external object file --- e.g., with arclite. This patch enables three strategies: - LTO_INTERNALIZE_FULL: the default (and the old behaviour). - LTO_INTERNALIZE_NONE: skip -internalize. - LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden visibility. LTO_INTERNALIZE_FULL should be used when linking an executable. Outputting an object file (e.g., via ld -r) is more complicated, and depends on whether hidden symbols should be internalized. E.g., for ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and LTO_INTERNALIZE_HIDDEN can be used otherwise. However, LTO_INTERNALIZE_FULL is inappropriate, since the output object file will eventually need to link with others. lto_codegen_set_internalize_strategy() sets the strategy for subsequent calls to lto_codegen_write_merged_modules() and lto_codegen_compile*(). <rdar://problem/14334895> llvm-svn: 199244	2014-01-14 18:52:17 +00:00
Nico Rieck	964a13bb4e	Decouple dllexport/dllimport from linkage Representing dllexport/dllimport as distinct linkage types prevents using these attributes on templates and inline functions. Instead of introducing further mixed linkage types to include linkonce and weak ODR, the old import/export linkage types are replaced with a new separate visibility-like specifier: define available_externally dllimport void @f() {} @Var = dllexport global i32 1, align 4 Linkage for dllexported globals and functions is now equal to their linkage without dllexport. Imported globals and functions must be either declarations with external linkage, or definitions with AvailableExternallyLinkage. llvm-svn: 199218	2014-01-14 15:22:47 +00:00
Nico Rieck	e8a579c6bc	Revert "Decouple dllexport/dllimport from linkage" Revert this for now until I fix an issue in Clang with it. This reverts commit r199204. llvm-svn: 199207	2014-01-14 12:38:32 +00:00
Nico Rieck	6203d44313	Decouple dllexport/dllimport from linkage Representing dllexport/dllimport as distinct linkage types prevents using these attributes on templates and inline functions. Instead of introducing further mixed linkage types to include linkonce and weak ODR, the old import/export linkage types are replaced with a new separate visibility-like specifier: define available_externally dllimport void @f() {} @Var = dllexport global i32 1, align 4 Linkage for dllexported globals and functions is now equal to their linkage without dllexport. Imported globals and functions must be either declarations with external linkage, or definitions with AvailableExternallyLinkage. llvm-svn: 199204	2014-01-14 11:55:03 +00:00
NAKAMURA Takumi	068c8352f7	Revert r199191, "LTO: add API to set strategy for -internalize" Please also update Other/link-opts.ll next time. llvm-svn: 199197	2014-01-14 09:40:18 +00:00
Duncan P. N. Exon Smith	95dadb39e4	LTO: add API to set strategy for -internalize Add API to LTOCodeGenerator to specify a strategy for the -internalize pass. This is a new attempt at Bill's change in r185882, which he reverted in r188029 due to problems with the gold linker. This puts the onus on the linker to decide whether (and what) to internalize. In particular, running internalize before outputting an object file may change a 'weak' symbol into an internal one, even though that symbol could be needed by an external object file --- e.g., with arclite. This patch enables three strategies: - LTO_INTERNALIZE_FULL: the default (and the old behaviour). - LTO_INTERNALIZE_NONE: skip -internalize. - LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden visibility. LTO_INTERNALIZE_FULL should be used when linking an executable. Outputting an object file (e.g., via ld -r) is more complicated, and depends on whether hidden symbols should be internalized. E.g., for ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and LTO_INTERNALIZE_HIDDEN can be used otherwise. However, LTO_INTERNALIZE_FULL is inappropriate, since the output object file will eventually need to link with others. lto_codegen_set_internalize_strategy() sets the strategy for subsequent calls to lto_codegen_write_merged_modules() and lto_codegen_compile*(). <rdar://problem/14334895> llvm-svn: 199191	2014-01-14 06:37:26 +00:00
Chandler Carruth	98adff6224	[PM] Split DominatorTree into a concrete analysis result object which can be used by both the new pass manager and the old. This removes it from any of the virtual mess of the pass interfaces and lets it derive cleanly from the DominatorTreeBase<> template. In turn, tons of boilerplate interface can be nuked and it turns into a very straightforward extension of the base DominatorTree interface. The old analysis pass is now a simple wrapper. The names and style of this split should match the split between CallGraph and CallGraphWrapperPass. All of the users of DominatorTree have been updated to match using many of the same tricks as with CallGraph. The goal is that the common type remains the resulting DominatorTree rather than the pass. This will make subsequent work toward the new pass manager significantly easier. Also in numerous places things became cleaner because I switched from re-running the pass (!!! mid way through some other passes run!!!) to directly recomputing the domtree. llvm-svn: 199104	2014-01-13 13:07:17 +00:00
Chandler Carruth	59e885531a	[PM] Pull the generic graph algorithms and data structures for dominator trees into the Support library. These are all expressed in terms of the generic GraphTraits and CFG, with no reliance on any concrete IR types. Putting them in support clarifies that and makes the fact that the static analyzer in Clang uses them much more sane. When moving the Dominators.h file into the IR library I claimed that this was the right home for it but not something I planned to work on. Oops. So why am I doing this? It happens to be one step toward breaking the requirement that IR verification can only be performed from inside of a pass context, which completely blocks the implementation of verification for the new pass manager infrastructure. Fixing it will also allow removing the concept of the "preverify" step (WTF???) and allow the verifier to cleanly flag functions which fail verification in a way that precludes even computing dominance information. Currently, that results in a fatal error even when you ask the verifier to not fatally error. It's awesome like that. The yak shaving will continue... llvm-svn: 199095	2014-01-13 10:52:56 +00:00
Chandler Carruth	ee051af6e2	[cleanup] Move the Dominators.h and Verifier.h headers into the IR directory. These passes are already defined in the IR library, and it doesn't make any sense to have the headers in Analysis. Long term, I think there is going to be a much better way to divide these matters. The dominators code should be fully separated into the abstract graph algorithm and have that put in Support where it becomes obvious that even Clang's CFGBlocks can use it. Then the verifier can manually construct dominance information from the Support-driven interface while the Analysis library can provide a pass which both caches, reconstructs, and supports a nice update API. But those are very long term, and so I don't want to leave the really confusing structure until that day arrives. llvm-svn: 199082	2014-01-13 09:26:24 +00:00
Chandler Carruth	03b6c941a3	Re-sort #include lines again, prior to moving headers around. llvm-svn: 199080	2014-01-13 08:04:33 +00:00
Hans Wennborg	f5c5f6e123	Switch-to-lookup tables: Don't require a result for the default case when the lookup table doesn't have any holes. This means we can build a lookup table for switches like this: switch (x) { case 0: return 1; case 1: return 2; case 2: return 3; case 3: return 4; default: exit(1); } The default case doesn't yield a constant result here, but that doesn't matter, since a default result is only necessary for filling holes in the lookup table, and this table doesn't have any holes. This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB off the resulting clang binary. llvm-svn: 199025	2014-01-12 00:44:41 +00:00
Arnold Schwaighofer	15e9d90974	LoopVectorizer: Enable strided memory accesses versioning by default. I saw no compile or execution time regressions on x86_64 -mavx -O3. radar://13075509 llvm-svn: 199015	2014-01-11 20:40:34 +00:00
NAKAMURA Takumi	fbff75f61d	LoopVectorize.cpp: Appease MSC16. Excuse me, I hope msc16 builders would be fine till its end day. Introduce nullptr then. ;) llvm-svn: 199001	2014-01-11 09:59:27 +00:00
Diego Novillo	f47aa4d47f	Extend and simplify the sample profile input file. 1- Use the line_iterator class to read profile files. 2- Allow comments in profile file. Lines starting with '#' are completely ignored while reading the profile. 3- Add parsing support for discriminators and indirect call samples. Our external profiler can emit more profile information that we are currently not handling. This patch does not add new functionality to support this information, but it allows profile files to provide it. I will add actual support later on (for at least one of these features, I need support for DWARF discriminators in Clang). A sample line may contain the following additional information: Discriminator. This is used if the sampled program was compiled with DWARF discriminator support (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This is currently only emitted by GCC and we just ignore it. Potential call targets and samples. If present, this line contains a call instruction. This models both direct and indirect calls. Each called target is listed together with the number of samples. For example, 130: 7 foo:3 bar:2 baz:7 The above means that at relative line offset 130 there is a call instruction that calls one of foo(), bar() and baz(). With baz() being the relatively more frequent call target. Differential Revision: http://llvm-reviews.chandlerc.com/D2355 4- Simplify format of profile input file. This implements earlier suggestions to simplify the format of the sample profile file. The symbol table is not necessary and function profiles do not need to know the number of samples in advance. Differential Revision: http://llvm-reviews.chandlerc.com/D2419 llvm-svn: 198973	2014-01-10 23:23:51 +00:00
Diego Novillo	9e8454b3fe	Propagation of profile samples through the CFG. This adds a propagation heuristic to convert instruction samples into branch weights. It implements a similar heuristic to the one implemented by Dehao Chen on GCC. The propagation proceeds in 3 phases: 1- Assignment of block weights. All the basic blocks in the function are initially assigned the same weight as their most frequently executed instruction. 2- Creation of equivalence classes. Since samples may be missing from blocks, we can fill in the gaps by setting the weights of all the blocks in the same equivalence class to the same weight. To compute the concept of equivalence, we use dominance and loop information. Two blocks B1 and B2 are in the same equivalence class if B1 dominates B2, B2 post-dominates B1 and both are in the same loop. 3- Propagation of block weights into edges. This uses a simple propagation heuristic. The following rules are applied to every block B in the CFG: - If B has a single predecessor/successor, then the weight of that edge is the weight of the block. - If all the edges are known except one, and the weight of the block is already known, the weight of the unknown edge will be the weight of the block minus the sum of all the known edges. If the sum of all the known edges is larger than B's weight, we set the unknown edge weight to zero. - If there is a self-referential edge, and the weight of the block is known, the weight for that edge is set to the weight of the block minus the weight of the other incoming edges to that block (if known). Since this propagation is not guaranteed to finalize for every CFG, we only allow it to proceed for a limited number of iterations (controlled by -sample-profile-max-propagate-iterations). It currently uses the same GCC default of 100. Before propagation starts, the pass builds (for each block) a list of unique predecessors and successors. This is necessary to handle identical edges in multiway branches. 
Since we visit all blocks and all edges of the CFG, it is cleaner to build these lists once at the start of the pass. Finally, the patch fixes the computation of relative line locations. The profiler emits lines relative to the function header. To discover it, we traverse the compilation unit looking for the subprogram corresponding to the function. The line number of that subprogram is the line where the function begins. That becomes line zero for all the relative locations. llvm-svn: 198972	2014-01-10 23:23:46 +00:00
Arnold Schwaighofer	702d83d3d8	LoopVectorizer: Handle strided memory accesses by versioning. for (i = 0; i < N; ++i) A[i * Stride1] += B[i * Stride2]; We take loops like this and check that the symbolic strides 'Stride1/2' are one and drop to the scalar loop if they are not. This is currently disabled by default and hidden behind the flag 'enable-mem-access-versioning'. radar://13075509 llvm-svn: 198950	2014-01-10 18:20:32 +00:00
Chandler Carruth	53468087f3	Put the functionality for printing a value to a raw_ostream as an operand into the Value interface just like the core print method is. That gives a more consistent organization to the IR printing interfaces -- they are all attached to the IR objects themselves. Also, update all the users. This removes the 'Writer.h' header which contained only a single function declaration. llvm-svn: 198836	2014-01-09 02:29:41 +00:00
Hao Liu	8c08e05c81	Fix a bug about generating undef operand when optimising shuffle vector and insert element in instruction combine. llvm-svn: 198730	2014-01-08 03:06:15 +00:00
Chandler Carruth	7aa902a488	Move the LLVM IR asm writer header files into the IR directory, as they are part of the core IR library in order to support dumping and other basic functionality. Rename the 'Assembly' include directory to 'AsmParser' to match the library name and the only functionality left there -- printing has been in the core IR library for quite some time. Update all of the #includes to match. All of this started because I wanted to have the layering in good shape before I started adding support for printing LLVM IR using the new pass infrastructure, and commandline support for the new pass infrastructure. llvm-svn: 198688	2014-01-07 12:34:26 +00:00
Chandler Carruth	87f14b4eec	Re-sort all of the includes with ./utils/sort_includes.py so that subsequent changes are easier to review. About to fix some layering issues, and wanted to separate out the necessary churn. Also comment and sink the include of "Windows.h" in three .inc files to match the usage in Memory.inc. llvm-svn: 198685	2014-01-07 11:48:04 +00:00
Andrew Trick	bb6ce38639	Reapply r198654 "indvars: sink truncates outside the loop." This doesn't seem to have actually broken anything. It was paranoia on my part. Trying again now that bots are more stable. This is a follow up of the r198338 commit that added truncates for lcssa phi nodes. Sinking the truncates below the phis cleans up the loop and simplifies subsequent analysis within the indvars pass. llvm-svn: 198678	2014-01-07 06:59:12 +00:00
Andrew Trick	6d854ef50f	Revert "indvars: sink truncates outside the loop." This reverts commit r198654. One of the bots reported a SciMark failure. llvm-svn: 198659	2014-01-07 01:50:58 +00:00
Andrew Trick	7621f7c6a3	indvars: sink truncates outside the loop. This is a follow up of the r198338 commit that added truncates for lcssa phi nodes. Sinking the truncates below the phis cleans up the loop and simplifies subsequent analysis within the indvars pass. llvm-svn: 198654	2014-01-07 01:02:55 +00:00
Andrew Trick	7236fefab6	80 col. comment. llvm-svn: 198653	2014-01-07 01:02:52 +00:00
Andrew Trick	12dfc32452	Reapply r198478 "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things." Now with a fix for PR18384: ValueHandleBase::ValueIsDeleted. We need to invalidate SCEV's loop info when we delete a block, even if no values are hoisted. llvm-svn: 198631	2014-01-06 19:43:14 +00:00
Alp Toker	b20c031b7a	Add missed cleanup from r198456. All other uses of this macro in LLVM/clang have been moved to the function definition so follow suit (and the usage advice) here too for consistency. llvm-svn: 198516	2014-01-04 22:47:48 +00:00
Alp Toker	2d17611e90	Revert "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things." This commit was the source of crasher PR18384: While deleting: label %for.cond127 An asserting value handle still pointed to this value! UNREACHABLE executed at llvm/lib/IR/Value.cpp:671! Reverting to get the builders green, feel free to re-land after fixing up. (Renato has a handy isolated repro if you need it.) This reverts commit r198478. llvm-svn: 198503	2014-01-04 17:00:45 +00:00
Andrew Trick	45ef495b91	Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things. getSCEV for an ashr instruction creates an intermediate zext expression when it truncates its operand. The operand is initially inside the loop, so the narrow zext expression has a non-loop-invariant loop disposition. LoopSimplify then runs on an outer loop, hoists the ashr operand, and properly invalidates the SCEVs that are mapped to value. The SCEV expression for the ashr is now an AddRec with the hoisted value as the now loop-invariant start value. The LoopDisposition of this wide value was properly invalidated during LoopSimplify. However, if we later get the ashr SCEV again, we again try to create the intermediate zext expression. We get the same SCEV that we did earlier, and it is still cached because it was never mapped to a Value. When we try to create a new AddRec we abort because we're using the old non-loop-invariant LoopDisposition. I don't have a solution for this other than to clear LoopDisposition when LoopSimplify hoists things. I think the long-term strategy should be to perform LoopSimplify on all loops before computing SCEV and before running any loop opts on individual loops. It's possible we may want to rerun LoopSimplify on individual loops, but it should rarely do anything, so rarely require invalidating SCEV. llvm-svn: 198478	2014-01-04 05:52:49 +00:00
Nico Weber	7e53ec0698	Add a LLVM_DUMP_METHOD macro. The motivation is to mark dump methods as used in debug builds so that they can be called from lldb, but to not do so in release builds so that they can be dead-stripped. There's lots of potential follow-up work suggested in the thread "Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on cfe-dev, but everyone seems to agree on this subset. Macro name chosen by fair coin toss. llvm-svn: 198456	2014-01-03 22:53:37 +00:00
David Peixotto	2028917754	Fix loop rerolling pass failure with non-constant loop lower bound. The loop rerolling pass was failing with an assertion failure from a failed cast on loops like this: void foo(int *A, int *B, int m, int n) { for (int i = m; i < n; i+=4) { A[i+0] = B[i+0] * 4; A[i+1] = B[i+1] * 4; A[i+2] = B[i+2] * 4; A[i+3] = B[i+3] * 4; } } The code was casting the SCEV-expanded code for the new induction variable to a phi-node. When the loop had a non-constant lower bound, the SCEV expander would end the code expansion with an add instead of a phi node and the cast would fail. It looks like the cast to a phi node was only needed to get the induction variable value coming from the backedge to compute the end of loop condition. This patch changes the loop reroller to compare the induction variable to the number of times the backedge is taken instead of the iteration count of the loop. In other words, we stop the loop when the current value of the induction variable == IterationCount-1. Previously, the comparison was comparing the induction variable value from the next iteration == IterationCount. This problem only seems to occur on 32-bit targets. For some reason, the loop is not rerolled on 64-bit targets. PR18290 llvm-svn: 198425	2014-01-03 17:20:01 +00:00
Hal Finkel	df8016f76f	Disable compare sinking in CodeGenPrepare when multiple condition registers are available. As noted in the comment above CodeGenPrepare::OptimizeInst, which aggressively sinks compares to reduce pressure on the condition register(s), for targets such as PowerPC with multiple condition registers, this may not be the right thing to do. This adds a HasMultipleConditionRegisters boolean to TLI, and CodeGenPrepare::OptimizeInst is skipped when HasMultipleConditionRegisters is true. This functionality will be used by the PowerPC backend in an upcoming commit. Especially when the PowerPC backend starts tracking individual condition register bits as separate allocatable entities (which will happen in this upcoming commit), this sinking from CodeGenPrepare::OptimizeInst is significantly suboptimal. llvm-svn: 198354	2014-01-02 21:13:43 +00:00
Andrew Trick	9bdab3f1b3	indvars: cleanup the IV visitor. It does more than gather sext/zext info. llvm-svn: 198353	2014-01-02 21:12:11 +00:00
Matt Arsenault	e28f607079	Delete unread globals through addrspacecast llvm-svn: 198346	2014-01-02 20:01:43 +00:00
Matt Arsenault	090fe5a92a	Fix addrspacecast with metadata globals llvm-svn: 198345	2014-01-02 19:53:49 +00:00
Andrew Trick	5f76ab650f	indvars: insert truncate at loop boundary to avoid redundant IVs. When widening an IV to remove s/zext, we generally try to eliminate the original narrow IV. However, LCSSA phi nodes outside the loop were still using the original IV. Clean this up more aggressively to avoid redundancy in generated code. llvm-svn: 198338	2014-01-02 19:29:38 +00:00
Nico Weber	10bf32e628	Set LLVM_EXPORTED_SYMBOL_FILE in CMakeLists whose corresponding Makefiles do so. (unittests/ExecutionEngine/JIT/CMakeLists.txt is still missing for now, since it handles export files in a strange way: It generates a .exports file from a .def file instead of the other way round.) llvm-svn: 198183	2013-12-29 23:06:49 +00:00
Alexander Potapenko	7da398bcae	[ASan] Fix the test for __asan_gen_ globals and actually fix http://llvm.org/bugs/show_bug.cgi?id=17976 by setting the correct linkage (as stated in the bug). llvm-svn: 198018	2013-12-25 16:46:27 +00:00
Alexander Potapenko	53694d2efb	[ASan] Make sure none of the __asan_gen_ global strings end up in the symbol table, add a test. This should fix http://llvm.org/bugs/show_bug.cgi?id=17976 Another test checking for the global variables' locations and prefixes on Darwin will be committed separately. llvm-svn: 198017	2013-12-25 14:22:15 +00:00
Andrew Trick	e7f9f5556d	Add support to indvars for optimizing sadd.with.overflow. Split sadd.with.overflow into add + sadd.with.overflow to allow analysis and optimization. This should ideally be done after InstCombine, which can perform code motion (eventually indvars should run after all canonical instcombines). We want ISEL to recombine the add and the check, at least on x86. This is currently under an option for reducing live induction variables: -liv-reduce. The next step is reducing liveness of IVs that are live out of the overflow check paths. Once the related optimizations are fully developed, reviewed and tested, I do expect this to become default. llvm-svn: 197926	2013-12-23 23:31:49 +00:00
Richard Sandiford	f367c783a7	Fix Scalarizer insertion point when replacing PHIs with insertelements If the Scalarizer scalarized a vector PHI but could not scalarize all uses of it, it would insert a series of insertelements to reconstruct the vector PHI value from the scalar ones. The problem was that it would emit these insertelements immediately after the PHI, even if there were other PHIs after it. llvm-svn: 197909	2013-12-23 14:51:56 +00:00
Richard Sandiford	27fc4a21a8	Fix Scalarizer handling of vector GEPs with multiple index operands The old code only worked for one index operand. Also handle "inbounds". llvm-svn: 197908	2013-12-23 14:45:00 +00:00
Kostya Serebryany	a148c8c9ed	[asan] don't unpoison redzones on function exit in use-after-return mode. Summary: Before this change the instrumented code before Ret instructions looked like: <Unpoison Frame Redzones> if (Frame != OriginalFrame) // I.e. Frame is fake <Poison Complete Frame> Now the instrumented code looks like: if (Frame != OriginalFrame) // I.e. Frame is fake <Poison Complete Frame> else <Unpoison Frame Redzones> Reviewers: eugenis Reviewed By: eugenis CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2458 llvm-svn: 197907	2013-12-23 14:15:08 +00:00
Kostya Serebryany	911683de1d	[asan] produce fewer stores when poisoning stack shadow llvm-svn: 197904	2013-12-23 09:24:36 +00:00
Justin Bogner	3b4e34606e	Transforms: Don't create bad weights when eliminating dead cases If we happen to eliminate every case in a switch that has branch weights, we currently try to create metadata for the one remaining branch, triggering an assert. Instead, we need to check that the metadata we're trying to create is sensible. llvm-svn: 197791	2013-12-20 08:21:30 +00:00
Kay Tiong Khoo	86f36f1147	Stay classy (and legal) LLVM. Remove links to 3rd party SMT solver whose links may not be permanent. llvm-svn: 197713	2013-12-19 18:35:54 +00:00
Kay Tiong Khoo	304e305b5c	Improved fix for PR17827 (instcombine of shift/and/compare). This change fixes the case of arithmetic shift right - do not attempt to fold that case. This change also relaxes the conditions when attempting to fold the logical shift right and shift left cases. No additional IR-level test cases included at this time. See http://llvm.org/bugs/show_bug.cgi?id=17827 for proofs that these are correct transformations. llvm-svn: 197705	2013-12-19 18:07:17 +00:00
Evgeniy Stepanov	301154310a	[dfsan] Simplify code after r197677. llvm-svn: 197679	2013-12-19 14:37:03 +00:00
Evgeniy Stepanov	0cd4eea1b6	Add an explicit insert point argument to SplitBlockAndInsertIfThen. Currently SplitBlockAndInsertIfThen requires that branch condition is an Instruction itself, which is very inconvenient, because it is sometimes an Operator, or even a Constant. llvm-svn: 197677	2013-12-19 13:29:56 +00:00
Arnold Schwaighofer	e4d65aae7d	LoopVectorizer: Don't if-convert constant expressions that can trap A phi node operand or an instruction operand could be a constant expression that can trap (division). Check that we don't vectorize such cases. PR16729 radar://15653590 llvm-svn: 197449	2013-12-17 01:11:01 +00:00
Yi Jiang	67f2e8e3f8	Enable double to float shrinking optimizations for binary functions like 'fmin/fmax'. Fix radar:15283121 llvm-svn: 197434	2013-12-16 22:42:40 +00:00
Hal Finkel	89ba3023da	Fix a use-after-free error in GlobalOpt CleanupConstantGlobalUsers GlobalOpt's CleanupConstantGlobalUsers function uses a worklist array to manage constant users to be visited. The pointers in this array need to be weak handles because when we delete a constant array, we may also be holding a pointer to one of its elements (or an element of one of its elements if we're dealing with an array of arrays) in the worklist. Fixes PR17347. llvm-svn: 197178	2013-12-12 20:45:24 +00:00
Hal Finkel	4492bf3e5d	Initialize the barrier pass llvm::initializeIPO The barrier pass is a temporary hack, and should go away soon. Nevertheless, if we don't initialize it, then opt will not understand -barrier, and this will break bugpoint (because when it dumps the passes from the default pass manager -barrier will be there). llvm-svn: 197177	2013-12-12 20:45:08 +00:00
Yi Jiang	f648b7ec82	Resubmit r196544: Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) -> __exp10(x) llvm-svn: 197109	2013-12-12 01:55:04 +00:00
NAKAMURA Takumi	54fa39136d	Prune redundant dependencies in LLVMBuild.txt. llvm-svn: 196988	2013-12-11 00:30:57 +00:00
Reid Kleckner	29793e54b1	[asan] Fix the coverage.cc test broken by r196939 It was failing because ASan was adding all of the following to one function: - dynamic alloca - stack realignment - inline asm This patch avoids making the static alloca dynamic when coverage is used. ASan should probably not be inserting empty inline asm blobs to inhibit duplicate tail elimination. llvm-svn: 196973	2013-12-10 21:49:28 +00:00
NAKAMURA Takumi	3fda77b6c7	Add proper dependencies to LLVMBuild.txt in llvm/lib. I'll prune redundant deps in LLVMBuild.txt, later. llvm-svn: 196881	2013-12-10 05:39:34 +00:00
NAKAMURA Takumi	b2c60b7ca7	Whitespaces. llvm-svn: 196880	2013-12-10 05:39:12 +00:00
Justin Bogner	42f9d0cf93	Transforms: Don't create bad branch weights when folding a switch This avoids creating branch weight metadata of length one when we fold cases into the default of a switch instruction, which was triggering an assert. llvm-svn: 196845	2013-12-10 00:13:41 +00:00
Manman Ren	4fcc808139	Revert 196544 due to internal bot failures. llvm-svn: 196732	2013-12-08 20:28:33 +00:00
Mark Seaborn	72517bc221	Fix inlining to not lose the "cleanup" clause from landingpads This fixes PR17872. This bug can lead to C++ destructors not being called when they should be, when an exception is thrown. llvm-svn: 196711	2013-12-08 00:51:21 +00:00
Mark Seaborn	2d856cb007	Fix inlining to not produce duplicate landingpad clauses Before this change, inlining one "invoke" into an outer "invoke" call site can lead to the outer landingpad's catch/filter clauses being copied multiple times into the resulting landingpad. This happens: * when the inlined function contains multiple "resume" instructions, because forwardResume() copies the clauses but is called multiple times; * when the inlined function contains a "resume" and a "call", because HandleCallsInBlockInlinedThroughInvoke() copies the clauses but is redundant with forwardResume(). Fix this by deduplicating the code. This problem doesn't lead to any incorrect execution; it's only untidy. This change will make fixing PR17872 a little easier. llvm-svn: 196710	2013-12-08 00:50:58 +00:00
Jakub Staszak	11e1c882f7	Don't #include heavy Dominators.h file in LoopInfo.h. This change reduces overall time of LLVM compilation by ~1%. llvm-svn: 196667	2013-12-07 21:20:17 +00:00
Matt Arsenault	db406f2a95	Fix assert with copy from global through addrspacecast llvm-svn: 196638	2013-12-07 02:58:45 +00:00
Duncan P. N. Exon Smith	de77610c42	Don't use isNullValue to evaluate ConstantExpr ConstantExpr can evaluate to false even when isNullValue gives false. Fixes PR18143. llvm-svn: 196611	2013-12-06 21:48:36 +00:00
Kostya Serebryany	f0897919b4	[asan] fix ndebug build with strict warnings (-Wunused-variable) llvm-svn: 196574	2013-12-06 09:26:09 +00:00
Kostya Serebryany	fc32a3e5d2	[asan] rewrite asan's stack frame layout Summary: Rewrite asan's stack frame layout. First, most of the stack layout logic is moved into a separate file to make it more testable and (potentially) useful for other projects. Second, make the frames more compact by using adaptive redzones (smaller for small objects, larger for large objects). Third, try to minimize gaps due to large alignments (this is hypothetical since today we don't see many stack vars aligned by more than 32). The frames indeed become more compact, but I'll still need to run more benchmarks before committing, but I am asking for review now to get early feedback. This change will be accompanied by a trivial change in compiler-rt tests to match the new frame sizes. Reviewers: samsonov, dvyukov Reviewed By: samsonov CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2324 llvm-svn: 196568	2013-12-06 09:00:17 +00:00
Yi Jiang	0bd0569be5	Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) -> __exp10(x) llvm-svn: 196544	2013-12-05 22:42:50 +00:00
Renato Golin	a4d4a4c44f	Add #pragma vectorize enable/disable to LLVM The intended behaviour is to force vectorization on the presence of the flag (either turn on or off), and to continue the behaviour as expected in its absence. Tests were added to make sure all the cases are covered in opt. No tests were added in other tools with the assumption that they should use the PassManagerBuilder in the same way. This patch also removes the outdated -late-vectorize flag, which was on by default and not helping much. The pragma metadata is being attached to the same place as other loop metadata, but nothing forbids one from attaching it to a function (to enable #pragma optimize) or basic blocks (to hint the basic-block vectorizers), etc. The logic should be the same all around. Patches to Clang to produce the metadata will be produced after the initial implementation is agreed upon and committed. Patches to other vectorizers (such as SLP and BB) will be added once we're happy with the pass manager changes. llvm-svn: 196537	2013-12-05 21:20:02 +00:00
Michael Gottesman	f2e7969964	Change std::deque => std::vector. No functionality change. There is no reason to use std::deque here over std::vector. Thus given the performance differences between the two it makes sense to change deque to vector. llvm-svn: 196524	2013-12-05 18:42:12 +00:00
Rafael Espindola	2ad993fb14	Fix non-deterministic behavior. We use CSEBlocks to initialize a worklist: SmallVector<BasicBlock *, 8> CSEWorkList(CSEBlocks.begin(), CSEBlocks.end()); so it must have a deterministic order. llvm-svn: 196520	2013-12-05 18:28:01 +00:00
Arnold Schwaighofer	120880c780	SLPVectorizer: An in-tree vectorized entry cannot also be a scalar external use We were creating external uses for scalar values in MustGather entries that also had a ScalarToTreeEntry (they also are present in a vectorized tuple). This meant we would keep a value 'alive' as a scalar and vectorized causing havoc. This is not necessary because when we create a MustGather vector we explicitly create external uses entries for the insertelement instructions of the MustGather vector elements. Fixes PR18129. radar://15582184 llvm-svn: 196508	2013-12-05 15:14:40 +00:00
Kostya Serebryany	eb57b3e248	[tsan] fix PR18146: sometimes a variable written into vptr could have an integer type (after other optimizations) llvm-svn: 196507	2013-12-05 15:03:02 +00:00
Alp Toker	e845f8af67	Correct word hyphenations This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. llvm-svn: 196471	2013-12-05 05:44:44 +00:00
Yuchen Wu	9559c6af5d	llvm-cov: Replace size() with empty() in bool check. llvm-svn: 196400	2013-12-04 19:18:23 +00:00
Daniel Jasper	ca41e63412	Un-revert r196358: "llvm-cov: Added support for function checksums." And add the proper fix. llvm-svn: 196367	2013-12-04 08:57:17 +00:00


11109 Commits