llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 13:02:52 +02:00

Author	SHA1	Message	Date
Renato Golin	a4d4a4c44f	Add #pragma vectorize enable/disable to LLVM The intended behaviour is to force vectorization on the presence of the flag (either turn on or off), and to continue the behaviour as expected in its absence. Tests were added to make sure all the cases are covered in opt. No tests were added in other tools with the assumption that they should use the PassManagerBuilder in the same way. This patch also removes the outdated -late-vectorize flag, which was on by default and not helping much. The pragma metadata is being attached to the same place as other loop metadata, but nothing forbids one from attaching it to a function (to enable #pragma optimize) or basic blocks (to hint the basic-block vectorizers), etc. The logic should be the same all around. Patches to Clang to produce the metadata will be produced after the initial implementation is agreed upon and committed. Patches to other vectorizers (such as SLP and BB) will be added once we're happy with the pass manager changes. llvm-svn: 196537	2013-12-05 21:20:02 +00:00
Hal Finkel	a22a21165f	Disable unrolling in the loop vectorizer when disabled in the pass manager When unrolling is disabled in the pass manager, the loop vectorizer should also not unroll loops. This will allow the -fno-unroll-loops option in Clang to behave as expected (even for vectorizable loops). The loop vectorizer's -force-vector-unroll option will (continue to) override the pass-manager setting (including -force-vector-unroll=0 to force use of the internal auto-selection logic). In order to test this, I added a flag to opt (-disable-loop-unrolling) to force disable unrolling through opt (the analog of -fno-unroll-loops in Clang). Also, this fixes a small bug in opt where the loop vectorizer was enabled only after the pass manager populated the queue of passes (the global_alias.ll test needed a slight update to the RUN line as a result of this fix). llvm-svn: 189499	2013-08-28 18:33:10 +00:00
Nadav Rotem	96f8f45bd5	Add support for bottom-up SLP vectorization infrastructure. This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations. The infrastructure has three potential users: 1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]). 2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute. 3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization. This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code: void SAXPY(int *x, int *y, int a, int i) { x[i] = a * x[i] + y[i]; x[i+1] = a * x[i+1] + y[i+1]; x[i+2] = a * x[i+2] + y[i+2]; x[i+3] = a * x[i+3] + y[i+3]; } llvm-svn: 179117	2013-04-09 19:44:35 +00:00
Hal Finkel	5e320e9019	BBVectorize: Cap the number of candidate pairs in each instruction group For some basic blocks, it is possible to generate many candidate pairs for relatively few pairable instructions. When many (tens of thousands) of these pairs are generated for a single instruction group, the time taken to generate and rank the different vectorization plans can become quite large. As a result, we now cap the number of candidate pairs within each instruction group. This is done by closing out the group once the threshold is reached (set now at 3000 pairs). Although this will limit the overall compile-time impact, this may not be the best way to achieve this result. It might be better, for example, to prune excessive candidate pairs after the fact to prevent the generation of short, but highly-connected groups. We can experiment with this in the future. This change reduces the overall compile-time slowdown of the csa.ll test case in PR15222 to ~5x. If 5x is still considered too large, a lower limit can be used as the default. This represents a functionality change, but only for very large inputs (thus, there is no regression test). llvm-svn: 175251	2013-02-15 04:28:42 +00:00
Nadav Rotem	2c25a05088	LoopVectorizer: Use the "optsize" attribute to decide if we are allowed to increase the function size. llvm-svn: 170004	2012-12-12 19:29:45 +00:00
Nadav Rotem	edcebe4904	LoopVectorizer: When -Os is used, vectorize only loops that don't require a tail loop. There is no testcase because I don't know of a way to initialize the loop vectorizer pass without adding an additional hidden flag. llvm-svn: 169950	2012-12-12 01:11:46 +00:00
Nadav Rotem	7a8cea8699	minor renaming, documentation and cleanups. llvm-svn: 169175	2012-12-03 22:57:09 +00:00
Chandler Carruth	a490793037	Use the new script to sort the includes of every file under lib. Sooooo many of these had incorrect or strange main module includes. I have manually inspected all of these, and fixed the main module include to be the nearest plausible thing I could find. If you own or care about any of these source files, I encourage you to take some time and check that these edits were sensible. I can't have broken anything (I strictly added headers, and reordered them, never removed), but they may not be the headers you'd really like to identify as containing the API being implemented. Many forward declarations and missing includes were added to header files to allow them to parse cleanly when included first. The main module rule does in fact have its merits. =] llvm-svn: 169131	2012-12-03 16:50:05 +00:00
Nadav Rotem	8303c909c7	Add a loop vectorizer. llvm-svn: 166112	2012-10-17 18:25:06 +00:00
Hal Finkel	89ff4e2b47	Allow BBVectorize to form non-2^n-length vectors. The original algorithm only used recursive pair fusion of equal-length types. This is now extended to allow pairing of any types that share the same underlying scalar type. Because we would still generally prefer the 2^n-length types, those are formed first. Then a second set of iterations form the non-2^n-length types. Also, a call to SimplifyInstructionsInBlock has been added after each pairing iteration. This takes care of DCE (and a few other things) that make the following iterations execute somewhat faster. For the same reason, some of the simple shuffle-combination cases are now handled internally. There is some additional refactoring work to be done, but I've had many requests for this feature, so additional refactoring will come soon in future commits (as will additional test cases). llvm-svn: 159330	2012-06-28 05:42:42 +00:00
Hal Finkel	409cab2a0a	Allow controlling vectorization of boolean values separately from other integer types. These are used as the result of comparisons, and often handled differently from larger integer types. llvm-svn: 159111	2012-06-24 13:28:01 +00:00
Hal Finkel	d0a65988d8	Allow BBVectorize to fuse compare instructions. llvm-svn: 159088	2012-06-23 21:52:50 +00:00
Hal Finkel	c55edb7b35	Enhance BBVectorize to more-properly handle pointer values and vectorize GEPs. llvm-svn: 154734	2012-04-14 07:32:43 +00:00
Hal Finkel	12b4c41203	Add support to BBVectorize for vectorizing selects. llvm-svn: 154700	2012-04-13 20:45:45 +00:00
Hongbin Zheng	48758c581f	Refactor: Use positive field names in VectorizeConfig. llvm-svn: 154249	2012-04-07 03:56:23 +00:00
Hongbin Zheng	7a4e40f87f	Introduce the VectorizeConfig class, with which we can control the behavior of the BBVectorizePass without using command-line options. As pointed out by Hal, we can ask the TargetLoweringInfo for the architecture specific VectorizeConfig to perform vectorizing with architecture specific information. llvm-svn: 154096	2012-04-05 15:46:55 +00:00
Hongbin Zheng	8d380b332d	Add the function "vectorizeBasicBlock", which allows users to vectorize a BasicBlock in other passes, e.g. we can call vectorizeBasicBlock in the loop unroll pass right after the loop is unrolled. llvm-svn: 154089	2012-04-05 08:05:16 +00:00
Hal Finkel	8cf5de5774	Add a basic-block autovectorization pass. This is the initial checkin of the basic-block autovectorization pass along with some supporting vectorization infrastructure. Special thanks to everyone who helped review this code over the last several months (especially Tobias Grosser). llvm-svn: 149468	2012-02-01 03:51:43 +00:00

18 Commits