llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 04:52:54 +02:00

Author	SHA1	Message	Date
Dehao Chen	997f895ee0	Increases full-unroll threshold. Summary: The default threshold for fully unroll is too conservative. This patch doubles the full-unroll threshold This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge): Performance: 403 0.11% 433 0.51% 445 0.48% 447 3.50% 453 1.49% 464 0.75% Code size: 403 0.56% 433 0.96% 445 2.16% 447 2.96% 453 0.94% 464 8.02% The compiler time overhead is similar with code size. Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc Reviewed By: hfinkel, chandlerc Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D28368 llvm-svn: 295538	2017-02-18 03:46:51 +00:00
Matthew Simpson	bf448c694c	[LV] Remove constant restriction for vector phi creation We previously only created a vector phi node for an induction variable if its step had a constant integer type. However, the step actually only needs to be loop-invariant. We only handle inductions having loop-invariant steps, so this patch should enable vector phi node creation for all integer induction variables that will be vectorized. Differential Revision: https://reviews.llvm.org/D29956 llvm-svn: 295456	2017-02-17 16:09:07 +00:00
Matthew Simpson	96843032a3	Reapply "[LV] Extend trunc optimization to all IVs with constant integer steps" This reapplies commit r294967 with a fix for the execution time regressions caught by the clang-cmake-aarch64-quick bot. We now extend the truncate optimization to non-primary induction variables only if the truncate isn't already free. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 295063	2017-02-14 16:28:32 +00:00
Karl-Johan Karlsson	9b88013fdb	Revert "[LoopVectorize] Added address space check when analysing interleaved accesses" This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed. I'm reverting to investigate. llvm-svn: 295042	2017-02-14 10:06:16 +00:00
Karl-Johan Karlsson	31668a8c6e	[LoopVectorize] Added address space check when analysing interleaved accesses Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 llvm-svn: 295038	2017-02-14 08:14:06 +00:00
Matthew Simpson	c00928de84	Revert "[LV] Extend trunc optimization to all IVs with constant integer steps" This reverts commit r294967. This patch caused execution time slowdowns in a few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot. I'm reverting to investigate. llvm-svn: 294973	2017-02-13 18:02:35 +00:00
Matthew Simpson	658ae0c60e	[LV] Extend trunc optimization to all IVs with constant integer steps This patch extends the optimization of truncations whose operand is an induction variable with a constant integer step. Previously we were only applying this optimization to the primary induction variable. However, the cost model assumes the optimization is applied to the truncation of all integer induction variables (even regardless of step type). The transformation is now applied to the other induction variables, and I've updated the cost model to ensure it is better in sync with the transformation we actually perform. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 294967	2017-02-13 16:48:00 +00:00
Dorit Nuzman	15c8d7c6d1	[LV/LoopAccess] Check statically if an unknown dependence distance can be proven larger than the loop-count This fixes PR31098: Try to resolve statically data-dependences whose compile-time-unknown distance can be proven larger than the loop-count, instead of resorting to runtime dependence checking (which are not always possible). For vectorization it is sufficient to prove that the dependence distance is >= VF; But in some cases we can prune unknown dependence distances early, and even before selecting the VF, and without a runtime test, by comparing the distance against the loop iteration count. Since the vectorized code will be executed only if LoopCount >= VF, proving distance >= LoopCount also guarantees that distance >= VF. This check is also equivalent to the Strong SIV Test. Reviewers: mkuper, anemet, sanjoy Differential Revision: https://reviews.llvm.org/D28044 llvm-svn: 294892	2017-02-12 09:32:53 +00:00
Ahmed Bougacha	6d1fcfb76b	[ARM] Make f16 interleaved accesses expensive. There are no vldN/vstN f16 variants, even with +fullfp16. We could use the i16 variants, but, in practice, even with +fullfp16, the f16 sequence leading to the i16 shuffle usually gets scalarized. We'd need to improve our support for f16 codegen before getting there. Teach the cost model to consider f16 interleaved operations as expensive. Otherwise, we are all but guaranteed to end up with a large block of scalarized vector code. llvm-svn: 294819	2017-02-11 01:53:04 +00:00
Dehao Chen	a75059ebaa	Encode duplication factor from loop vectorization and loop unrolling to discriminator. Summary: This patch starts the implementation as discuss in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html When optimization duplicates code that will scale down the execution count of a basic block, we will record the duplication factor as part of discriminator so that the offline process tool can find the duplication factor and collect the accurate execution frequency of the corresponding source code. Two important optimization that fall into this category is loop vectorization and loop unroll. This patch records the duplication factor for these 2 optimizations. The recording will be guarded by a flag encode-duplication-in-discriminators, which is off by default. Reviewers: probinson, aprantl, davidxl, hfinkel, echristo Reviewed By: hfinkel Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D26420 llvm-svn: 294782	2017-02-10 21:09:07 +00:00
Matthew Simpson	d972d92018	[LV] Remove type restriction for vector phi creation We previously only created a vector phi node for an induction variable if its type matched the type of the canonical induction variable. Differential Revision: https://reviews.llvm.org/D29776 llvm-svn: 294755	2017-02-10 16:15:26 +00:00
Elena Demikhovsky	edaa790008	[Loop Vectorizer] Cost-based decision for vectorization form of memory instruction. Making the cost model selecting between Interleave, GatherScatter or Scalar vectorization form of memory instruction. The right decision should be done for non-consecutive memory access instrcuctions that may have more than one vectorization solution. This patch includes the following changes: - Cost Model calculates the cost of Load/Store vector form and choose the better option between Widening, Interleave, GatherScactter and Scalarization. Cost Model keeps the widening decision. - Arrays of Uniform and Scalar values are moved from Legality to Cost Model. - Cost Model collects Uniforms and Scalars per VF. The collection is based on CM decision map of Loadis/Stores vectorization form. - Vectorization of memory instruction is performed according to the CM decision. Differential Revision: https://reviews.llvm.org/D27919 llvm-svn: 294503	2017-02-08 19:25:23 +00:00
Craig Topper	1cb14e7d5f	[X86] Remove PCOMMIT instruction support since Intel has deprecated this instruction with no plans to release products with it. Intel's documentation for the deprecation https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction llvm-svn: 294405	2017-02-08 05:45:39 +00:00
Matthew Simpson	df3b8cddb2	[LV] Add new ARM/AArch64 interleaved access cost model tests (NFC) llvm-svn: 294342	2017-02-07 19:34:24 +00:00
Matthew Simpson	307380d151	[LV] Simplify ARM/AArch64 interleaved access cost model tests (NFC) This patch removes unneeded instructions from the existing ARM/AArch64 interleaved access cost model tests. I'll be adding a similar set of tests in a follow-on patch to increase coverage. llvm-svn: 294336	2017-02-07 19:17:44 +00:00
Adam Nemet	835438236b	[LV] Also port failure remarks to new OptimizationRemarkEmitter API llvm-svn: 293866	2017-02-02 05:41:51 +00:00
Chandler Carruth	b3230220b9	[LV] Fix an issue where forming LCSSA in the place that we did would change the set of uniform instructions in the loop causing an assert failure. The problem is that the legalization checking also builds data structures mapping various facts about the loop body. The immediate cause was the set of uniform instructions. If these then change when LCSSA is formed, the data structures would already have been built and become stale. The included test case triggered an assert in loop vectorize that was reduced out of the new PM's pipeline. The solution is to form LCSSA early enough that no information is cached across the changes made. The only really obvious position is outside of the main logic to vectorize the loop. This also has the advantage of removing one case where forming LCSSA could mutate the loop but we wouldn't track that as a "Changed" state. If it is significantly advantageous to do some legalization checking prior to this, we can do a more careful positioning but it seemed best to just back off to a safe position first. llvm-svn: 293168	2017-01-26 10:41:09 +00:00
Mohammed Agabaria	ee333a2999	[X86] enable memory interleaving for X86\SLM arch. Differential Revision: https://reviews.llvm.org/D28547 llvm-svn: 293040	2017-01-25 09:14:48 +00:00
Michael Kuperstein	8718b01fcb	[LV] Run loop-simplify and LCSSA explicitly instead of "requiring" them This changes the vectorizer to explicitly use the loopsimplify and lcssa utils, instead of "requiring" the transformations as if they were analyses. This is not NFC, since it changes the LCSSA behavior - we no longer run LCSSA for all loops, but rather only for the loops we expect to modify. Differential Revision: https://reviews.llvm.org/D28868 llvm-svn: 292456	2017-01-19 00:42:28 +00:00
Michael Kuperstein	d5f940eb70	[LV] Allow reductions that have several uses outside the loop We currently check whether a reduction has a single outside user. We don't really need to require that - we just need to make sure a single value is used externally. The number of external users of that value shouldn't actually matter. Differential Revision: https://reviews.llvm.org/D28830 llvm-svn: 292424	2017-01-18 19:02:52 +00:00
Matthew Simpson	6f0aa3e381	[LV] Add requires asserts to test case llvm-svn: 292280	2017-01-17 22:21:33 +00:00
Matthew Simpson	90e56de383	[LV] Mark non-consecutive-like pointers non-uniform If a memory instruction will be vectorized, but it's pointer operand is non-consecutive-like, the instruction is a gather or scatter operation. Its pointer operand will be non-uniform. This should fix PR31671. Reference: https://llvm.org/bugs/show_bug.cgi?id=31671 Differential Revision: https://reviews.llvm.org/D28819 llvm-svn: 292254	2017-01-17 20:51:39 +00:00
Mohammed Agabaria	007bbd87af	[X86] fixing failed test in commit: r291657 Missing Requires asserts. llvm-svn: 291659	2017-01-11 09:03:11 +00:00
Mohammed Agabaria	df301aa885	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch. updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657	2017-01-11 08:23:37 +00:00
Michael Kuperstein	e2a21ff3e2	[LV] Don't panic when encountering the IV of an outer loop. Bail out instead of asserting when we encounter this situation, which can actually happen. The reason the test uses the new PM is that the "bad" phi, incidentally, gets cleaned up by LoopSimplify. But LICM can create this kind of phi and preserve loop simplify form, so the cleanup has no chance to run. This fixes PR31190. We may want to solve this in a less conservative manner, since this phi is actually uniform within the inner loop (or we may want LICM to output a cleaner promotion to begin with). Differential Revision: https://reviews.llvm.org/D28490 llvm-svn: 291589	2017-01-10 19:32:30 +00:00
Matthew Simpson	d47c4174f6	[LV] Fix-up external IV users after updating dominator tree This patch delays the fix-up step for external induction variable users until after the dominator tree has been properly updated. This should fix PR30742. The SCEVExpander in InductionDescriptor::transform can generate code in the wrong location if the dominator tree is not up-to-date. We should work towards keeping the dominator tree up-to-date throughout the transformation. Reference: https://llvm.org/bugs/show_bug.cgi?id=30742 Differential Revision: https://reviews.llvm.org/D28168 llvm-svn: 291462	2017-01-09 19:05:29 +00:00
Mohammed Agabaria	caef091029	Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and need some extra cost for address computation handling. This code seems to be target dependent which may not be the same for all targets. Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'. Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general. Differential Revision: https://reviews.llvm.org/D27518 llvm-svn: 291106	2017-01-05 14:03:41 +00:00
Adrian Prantl	1f3fc31b75	Renumber testcase metadata nodes after r290153. This patch renumbers the metadata nodes in debug info testcases after https://reviews.llvm.org/D26769. This is a separate patch because it causes so much churn. This was implemented with a python script that pipes the testcases through llvm-as - \| llvm-dis - and then goes through the original and new output side-by side to insert all comments at a close-enough location. Differential Revision: https://reviews.llvm.org/D27765 llvm-svn: 290292	2016-12-22 00:45:21 +00:00
Adrian Prantl	cf7e846d11	[IR] Remove the DIExpression field from DIGlobalVariable. This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariables holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGLobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. This reapplies r289902 with additional testcase upgrades and a change to the Bitcode record for DIGlobalVariable, that makes upgrading the old format unambiguous also for variables without DIExpressions. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 290153	2016-12-20 02:09:43 +00:00
Florian Hahn	f1e8664904	[LoopVersioning] Require loop-simplify form for loop versioning. Summary: Requiring loop-simplify form for loop versioning ensures that the runtime check block always dominates the exit block. This patch closes #30958 (https://llvm.org/bugs/show_bug.cgi?id=30958). Reviewers: silviu.baranga, hfinkel, anemet, ashutosh.nema Subscribers: ashutosh.nema, mzolotukhin, efriedma, hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D27469 llvm-svn: 290116	2016-12-19 17:13:37 +00:00
Adrian Prantl	0ab6669d6d	Revert "[IR] Remove the DIExpression field from DIGlobalVariable." This reverts commit 289920 (again). I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable has not DIExpression. Unfortunately it is not possible to safely upgrade these variables without adding a flag to the bitcode record indicating which version they are. My plan of record is to roll the planned follow-up patch that adds a unit: field to DIGlobalVariable into this patch before recomitting. This way we only need one Bitcode upgrade for both changes (with a version flag in the bitcode record to safely distinguish the record formats). Sorry for the churn! llvm-svn: 289982	2016-12-16 19:39:01 +00:00
Matthew Simpson	7fb206b6f3	Reapply "[LV] Enable vectorization of loops with conditional stores by default" This patch reapplies r289863. The original patch was reverted because it exposed a bug causing the loop vectorizer to crash in the Python runtime on PPC. The underlying issue was fixed with r289958. llvm-svn: 289975	2016-12-16 19:12:02 +00:00
Matthew Simpson	add3b6cc60	[LV] Don't attempt to type-shrink scalarized instructions After r288909, instructions feeding predicated instructions may be scalarized if profitable. Since these instructions will remain scalar, we shouldn't attempt to type-shrink them. We should only truncate vector types to their minimal bit widths. This bug was exposed by enabling the vectorization of loops containing conditional stores by default. llvm-svn: 289958	2016-12-16 16:52:35 +00:00
Chandler Carruth	9362cf3865	Revert r289863: [LV] Enable vectorization of loops with conditional stores by default This uncovers a crasher in the loop vectorizer on PPC when building the Python runtime. I'll send the testcase to the review thread for the original commit. llvm-svn: 289934	2016-12-16 11:31:39 +00:00
Adrian Prantl	2345112c5b	[IR] Remove the DIExpression field from DIGlobalVariable. This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariables holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGLobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. This reapplies r289902 with additional testcase upgrades. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 289920	2016-12-16 04:25:54 +00:00
Adrian Prantl	daf4fef1f9	Revert "[IR] Remove the DIExpression field from DIGlobalVariable." This reverts commit 289902 while investigating bot berakage. llvm-svn: 289906	2016-12-16 01:00:30 +00:00
Adrian Prantl	0eee52640f	[IR] Remove the DIExpression field from DIGlobalVariable. This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariables holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGLobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 289902	2016-12-16 00:36:43 +00:00
Matthew Simpson	765604b11b	[LV] Enable vectorization of loops with conditional stores by default This patch sets the default value of the "-enable-cond-stores-vec" command line option to "true". Differential Revision: https://reviews.llvm.org/D27814 llvm-svn: 289863	2016-12-15 20:11:05 +00:00
Michael Kuperstein	2c0f3dbd6f	[LV] Don't vectorize when we have a small static bound on trip count We currently check if the exact trip count is known and is smaller than the "tiny loop" bound. We should be checking the maximum bound on the trip count instead. Differential Revision: https://reviews.llvm.org/D27690 llvm-svn: 289583	2016-12-13 20:38:18 +00:00
Sanjoy Das	a0e8011216	[Verifier] Add verification for TBAA metadata Summary: This change adds some verification in the IR verifier around struct path TBAA metadata. Other than some basic sanity checks (e.g. we get constant integers where we expect constant integers), this checks: - That by the time an struct access tuple `(base-type, offset)` is "reduced" to a scalar base type, the offset is `0`. For instance, in C++ you can't start from, say `("struct-a", 16)`, and end up with `("int", 4)` -- by the time the base type is `"int"`, the offset better be zero. In particular, a variant of this invariant is needed for `llvm::getMostGenericTBAA` to be correct. - That there are no cycles in a struct path. - That struct type nodes have their offsets listed in an ascending order. - That when generating the struct access path, you eventually reach the access type listed in the tbaa tag node. Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D26438 llvm-svn: 289402	2016-12-11 20:07:15 +00:00
Matthew Simpson	aa3b094d0c	[LV] Scalarize operands of predicated instructions This patch attempts to scalarize the operand expressions of predicated instructions if they were conditionally executed in the original loop. After scalarization, the expressions will be sunk inside the blocks created for the predicated instructions. The transformation essentially performs un-if-conversion on the operands. The cost model has been updated to determine if scalarization is profitable. It compares the cost of a vectorized instruction, assuming it will be if-converted, to the cost of the scalarized instruction, assuming that the instructions corresponding to each vector lane will be sunk inside a predicated block, possibly avoiding execution. If it's more profitable to scalarize the entire expression tree feeding the predicated instruction, the expression will be scalarized; otherwise, it will be vectorized. We only consider the cost of the entire expression to accurately estimate the cost of the required insertelement and extractelement instructions. Differential Revision: https://reviews.llvm.org/D26083 llvm-svn: 288909	2016-12-07 15:03:32 +00:00
Guozhi Wei	27571a5d37	[ppc] Correctly compute the cost of loading 32/64 bit memory into VSR VSX has instructions lxsiwax/lxsdx that can load 32/64 bit value into VSX register cheaply. That patch makes it known to memory cost model, so the vectorization of the test case in pr30990 is beneficial. Differential Revision: https://reviews.llvm.org/D26713 llvm-svn: 288560	2016-12-03 00:41:43 +00:00
Robert Lougher	75ef160b89	[LoopVectorizer] When estimating reg usage, unused insts may "end" another use The register usage algorithm incorrectly treats instructions whose value is not used within the loop (e.g. those that do not produce a value). The algorithm first calculates the usages within the loop. It iterates over the instructions in order, and records at which instruction index each use ends (in fact, they're actually recorded against the next index, as this is when we want to delete them from the open intervals). The algorithm then iterates over the instructions again, adding each instruction in turn to a list of open intervals. Instructions are then removed from the list of open intervals when they occur in the list of uses ended at the current index. The problem is, instructions which are not used in the loop are skipped. However, although they aren't used, the last use of a value may have been recorded against that instruction index. In this case, the use is not deleted from the open intervals, which may then bump up the estimated register usage. This patch fixes the issue by simply moving the "is used" check after the loop which erases the uses at the current index. Differential Revision: https://reviews.llvm.org/D26554 llvm-svn: 286969	2016-11-15 14:27:33 +00:00
Simon Pilgrim	bf629ee6cc	[X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason. This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit) llvm-svn: 286832	2016-11-14 14:45:16 +00:00
Adam Nemet	2d303ea2a2	[LV] Stop saying "use -Rpass-analysis=loop-vectorize" This is PR28376. Unfortunately given the current structure of optimization diagnostics we lack the capability to tell whether the user has passed -Rpass-analysis=loop-vectorize since this is local to the front-end (BackendConsumer::OptimizationRemarkHandler). So rather than printing this even if the user has already passed -Rpass-analysis, this patch just punts and stops recommending this option. I don't think that getting this right is worth the complexity. Differential Revision: https://reviews.llvm.org/D26563 llvm-svn: 286662	2016-11-11 22:51:46 +00:00
Dehao Chen	d5e59904e2	Update vectorization debug info unittest. Summary: The change will test the change in r286159. The idea behind the change: Make the dbg location different between loop header and preheader/exit. Originally, dbg location 21 exists in 3 BBs: preheader, header, critical edge (exit). Update the debug location of inside the loop header from !21 to !22 so that it will reflect the correct location. Reviewers: probinson Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26428 llvm-svn: 286403	2016-11-09 22:25:19 +00:00
Dorit Nuzman	78297c81ee	Second attempt at r285517. llvm-svn: 285568	2016-10-31 13:17:31 +00:00
Dorit Nuzman	bea49f1580	Revert r285517 due to build failures. llvm-svn: 285518	2016-10-30 14:34:57 +00:00
Dorit Nuzman	2fc6728513	[LoopVectorize] Make interleaved-accesses analysis less conservative about possible pointer-wrap-around concerns, in some cases. Before this patch, collectConstStridedAccesses (part of interleaved-accesses analysis) called getPtrStride with [Assume=false, ShouldCheckWrap=true] when examining all candidate pointers. This is too conservative. Instead, this patch makes collectConstStridedAccesses use an optimistic approach, calling getPtrStride with [Assume=true, ShouldCheckWrap=false], and then, once the candidate interleave groups have been formed, revisits the pointer-wrapping analysis but only where it matters: namely, in groups that have gaps, and where the gaps are not at the very end of the group (in which case the loop is peeled). This second time getPtrStride is called with [Assume=false, ShouldCheckWrap=true], but this could further be improved to using Assume=true, once we also add the logic to track that we are not going to meet the scev runtime checks threshold. Differential Revision: https://reviews.llvm.org/D25276 llvm-svn: 285517	2016-10-30 12:23:26 +00:00
Matthew Simpson	ca1b8b1ae6	[LV] Correct misleading comments in test (NFC) llvm-svn: 285402	2016-10-28 14:27:45 +00:00
Matthew Simpson	47bcdf8c60	[LV] Sink scalar operands of predicated instructions When we predicate an instruction (div, rem, store) we place the instruction in its own basic block within the vectorized loop. If a predicated instruction has scalar operands, it's possible to recursively sink these scalar expressions into the predicated block so that they might avoid execution. This patch sinks as much scalar computation as possible into predicated blocks. We previously were able to sink such operands only if they were extractelement instructions. Differential Revision: https://reviews.llvm.org/D25632 llvm-svn: 285097	2016-10-25 18:59:45 +00:00
Michael Kuperstein	ae8887d98b	[X86] Enable interleaved memory access by default This lets the loop vectorizer generate interleaved memory accesses on x86. Differential Revision: https://reviews.llvm.org/D25350 llvm-svn: 284779	2016-10-20 21:04:31 +00:00
Matthew Simpson	f64cb1c3a7	[LV] Avoid emitting trivially dead instructions Some instructions from the original loop, when vectorized, can become trivially dead. This happens because of the way we structure the new loop. For example, we create new induction variables and induction variable "steps" in the new loop. Thus, when we go to vectorize the original induction variable update, it may no longer be needed due to the instructions we've already created. This patch prevents us from creating these redundant instructions. This reduces code size before simplification and allows greater flexibility in code generation since we have fewer unnecessary instruction uses. Differential Revision: https://reviews.llvm.org/D25631 llvm-svn: 284631	2016-10-19 19:22:02 +00:00
Matthew Simpson	fc6d2c527b	[LV] Account for predicated stores in instruction costs This patch ensures that we scale the estimated cost of predicated stores by block probability. This is a follow-on patch for r284123. llvm-svn: 284126	2016-10-13 14:54:31 +00:00
Matthew Simpson	c7b4269af0	[LV] Avoid rounding errors for predicated instruction costs This patch modifies the cost calculation of predicated instructions (div and rem) to avoid the accumulation of rounding errors due to multiple truncating integer divisions. The calculation for predicated stores will be addressed in a follow-on patch since we currently don't scale the cost of predicated stores by block probability. Differential Revision: https://reviews.llvm.org/D25333 llvm-svn: 284123	2016-10-13 14:19:48 +00:00
Matthew Simpson	c4d75790e7	[LV] Don't mark multi-use branch conditions uniform Previously, we marked the branch conditions of latch blocks uniform after vectorization if they were instructions contained in the loop. However, if a condition instruction has users other than the branch, it may not remain uniform. This patch ensures the conditions we mark uniform are only used by the branch. This should fix PR30627. Reference: https://llvm.org/bugs/show_bug.cgi?id=30627 llvm-svn: 283563	2016-10-07 15:20:13 +00:00
Michael Kuperstein	e1ab454f4d	[LV] Remove triples from target-independent vectorizer tests. NFC. Vectorizer tests in the target-independent directory should not have a target triple. If a test really needs to query a specific backend, it belongs in the right target subdirectory (which "REQUIRES" the right backend). Otherwise, it should not specify a triple. llvm-svn: 283512	2016-10-06 23:57:25 +00:00
Matthew Simpson	f6e1d8dedb	[LV] Build all scalar steps for non-uniform induction variables When building the steps for scalar induction variables, we previously attempted to determine if all the scalar users of the induction variable were uniform. If they were, we would only emit the step corresponding to vector lane zero. This optimization was too aggressive. We generally don't know the entire set of induction variable users that will be scalar. We have isScalarAfterVectorization, but this is only a conservative estimate of the instructions that will be scalarized. Thus, an induction variable may have scalar users that aren't already known to be scalar. To avoid emitting unused steps, we can only check that the induction variable is uniform. This should fix PR30542. Reference: https://llvm.org/bugs/show_bug.cgi?id=30542 llvm-svn: 282863	2016-09-30 15:13:52 +00:00
Matthew Simpson	894b941fee	[LV] Scalarize instructions marked scalar after vectorization This patch ensures that we actually scalarize instructions marked scalar after vectorization. Previously, such instructions may have been vectorized instead. Differential Revision: https://reviews.llvm.org/D23889 llvm-svn: 282418	2016-09-26 17:08:37 +00:00
Matthew Simpson	90550420f2	[LV] Don't emit unused scalars for uniform instructions If we identify an instruction as uniform after vectorization, we know that we should only use the value corresponding to the first vector lane of each unroll iteration. However, when scalarizing such instructions, we still produce values for the other vector lanes. This patch prevents us from generating the unused scalars. Differential Revision: https://reviews.llvm.org/D24275 llvm-svn: 282087	2016-09-21 16:50:24 +00:00
Adam Nemet	f1965e4c24	[LV] When reporting about a specific instruction without debug location use loop's This can occur for example if some optimization drops the debug location. llvm-svn: 282048	2016-09-21 03:14:20 +00:00
Elena Demikhovsky	fdf2d14b30	[Loop Vectorizer] Consecutive memory access - fixed and simplified Amended consecutive memory access detection in Loop Vectorizer. Load/Store were not handled properly without preceding GEP instruction. Differential Revision: https://reviews.llvm.org/D20789 llvm-svn: 281853	2016-09-18 13:56:08 +00:00
Matthew Simpson	23308c98a2	[LV] Process pointer IVs with PHINodes in collectLoopUniforms This patch moves the processing of pointer induction variables in collectLoopUniforms from the consecutive pointer phase of the analysis to the phi node phase. Previously, if a pointer induction variable was used by both a scalarized non-memory instruction as well as a vectorized memory instruction, we would incorrectly identify the pointer as uniform. Pointer induction variables should be treated the same as other phi nodes. That is, they are uniform if all users of the induction variable and induction variable update are uniform. Differential Revision: https://reviews.llvm.org/D24511 llvm-svn: 281485	2016-09-14 14:47:40 +00:00
Peter Collingbourne	a9fbb2813b	DebugInfo: New metadata representation for global variables. This patch reverses the edge from DIGlobalVariable to GlobalVariable. This will allow us to more easily preserve debug info metadata when manipulating global variables. Fixes PR30362. A program for upgrading test cases is attached to that bug. Differential Revision: http://reviews.llvm.org/D20147 llvm-svn: 281284	2016-09-13 01:12:59 +00:00
Matthew Simpson	aa17e4a420	[LV] Ensure proper handling of multi-use case when collecting uniforms The test case included in r280979 wasn't checking what it was supposed to be checking for the predicated store case. Fixing the test revealed that the multi-use case (when a pointer is used by both vectorized and scalarized memory accesses) wasn't being handled properly. We can't skip over non-consecutive-like pointers since they may have looked consecutive-like with a different memory access. llvm-svn: 280992	2016-09-08 21:38:26 +00:00
Matthew Simpson	a98b17ce43	[LV] Don't mark pointers used by scalarized memory accesses uniform Previously, all consecutive pointers were marked uniform after vectorization. However, if a consecutive pointer is used by a memory access that is eventually scalarized, the pointer won't remain uniform after all. An example is predicated stores. Even though a predicated store may be consecutive, it will still be scalarized, making it's pointer operand non-uniform. This patch updates the logic in collectLoopUniforms to consider the cases where a memory access may be scalarized. If a memory access may be scalarized, its pointer operand is not marked uniform. The determination of whether a given memory instruction will be scalarized or not has been moved into a common function that is used by the vectorizer, cost model, and legality analysis. Differential Revision: https://reviews.llvm.org/D24271 llvm-svn: 280979	2016-09-08 19:11:07 +00:00
Matthew Simpson	a8cb85d1b1	[LV] Ensure reverse interleaved group GEPs remain uniform For uniform instructions, we're only required to generate a scalar value for the first vector lane of each unroll iteration. Thus, if we have a reverse interleaved group, computing the member index off the scalar GEP corresponding to the last vector lane of its pointer operand technically makes the GEP non-uniform. We should compute the member index off the first scalar GEP instead. I've added the updated member index computation to the existing reverse interleaved group test. llvm-svn: 280497	2016-09-02 16:19:22 +00:00
Michael Kuperstein	39c8e8b0c7	[LoopVectorizer] Predicate instructions in blocks with several incoming edges We don't need to limit predication to blocks that have a single incoming edge, we just need to use the right mask. This fixes PR30172. Differential Revision: https://reviews.llvm.org/D24009 llvm-svn: 280148	2016-08-30 20:22:21 +00:00
Matthew Simpson	5aacf5b410	[LV] Move insertelement sequence after scalar definitions After r279649 when getting a vector value from VectorLoopValueMap, we create an insertelement sequence on-demand if the value has been scalarized instead of vectorized. We previously inserted this insertelement sequence before the value's first vector user. However, this insert location is problematic if that user is the phi node of a first-order recurrence. With this patch, we move the insertelement sequence after the last scalar instruction we created when scalarizing the value. Thus, the value's vector definition in the new loop will immediately follow its scalar definitions. This should fix PR30183. Reference: https://llvm.org/bugs/show_bug.cgi?id=30183 llvm-svn: 280001	2016-08-29 20:14:04 +00:00
Elena Demikhovsky	eed0bb3d71	[Loop Vectorizer] Fixed memory confilict checks. Fixed a bug in run-time checks for possible memory conflicts inside loop. The bug is in Low <-> High boundaries calculation. The High boundary should be calculated as "last memory access pointer + element size". Differential revision: https://reviews.llvm.org/D23176 llvm-svn: 279930	2016-08-28 08:53:53 +00:00
Matthew Simpson	e16dab59f1	[LV] Unify vector and scalar maps This patch unifies the data structures we use for mapping instructions from the original loop to their corresponding instructions in the new loop. Previously, we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap. WidenMap maintained the vector values each instruction from the old loop was represented with, and ScalarIVMap maintained the scalar values each scalarized induction variable was represented with. With this patch, all values created for the new loop are maintained in VectorLoopValueMap. The change allows for several simplifications. Previously, when an instruction was scalarized, we had to insert the scalar values into vectors in order to maintain the mapping in WidenMap. Then, if a user of the scalarized value was also scalar, we had to extract the scalar values from the temporary vector we created. We now aovid these unnecessary scalar-to-vector-to-scalar conversions. If a scalarized value is used by a scalar instruction, the scalar value is used directly. However, if the scalarized value is needed by a vector instruction, we generate the needed insertelement instructions on-demand. A common idiom in several locations in the code (including the scalarization code), is to first get the vector values an instruction from the original loop maps to, and then extract a particular scalar value. This patch adds getScalarValue for this purpose along side getVectorValue as an interface into VectorLoopValueMap. These functions work together to return the requested values if they're available or to produce them if they're not. The mapping has also be made less permissive. Entries can be added to VectorLoopValue map with the new initVector and initScalar functions. getVectorValue has been modified to return a constant reference to the mapped entries. There's no real functional change with this patch; however, in some cases we will generate slightly different code. For example, instead of an insertelement sequence following the definition of an instruction, it will now precede the first use of that instruction. This can be seen in the test case changes. Differential Revision: https://reviews.llvm.org/D23169 llvm-svn: 279649	2016-08-24 18:23:17 +00:00
Gil Rapaport	06e1775b78	[Loop Vectorizer] Support predication of div/rem div/rem instructions in basic blocks that require predication currently prevent vectorization. This patch extends the existing mechanism for predicating stores to handle other instructions and leverages it to predicate divs and rems. Differential Revision: https://reviews.llvm.org/D22918 llvm-svn: 279620	2016-08-24 11:37:57 +00:00
Tim Shen	5263838307	[LoopVectorize] Detect loops in the innermost loop before creating InnerLoopVectorizer InnerLoopVectorizer shouldn't handle a loop with cycles inside the loop body, even if that cycle isn't a natural loop. Fixes PR28541. Differential Revision: https://reviews.llvm.org/D22952 llvm-svn: 278573	2016-08-12 22:47:13 +00:00
David Majnemer	cbcaf6adec	[ValueTracking] Teach computeKnownBits about [su]min/max Reasoning about a select in terms of a min or max allows us to derive a tigher bound on the result. llvm-svn: 277914	2016-08-06 08:16:00 +00:00
Michael Kuperstein	7044572182	[LV, X86] Be more optimistic about vectorizing shifts. Shifts with a uniform but non-constant count were considered very expensive to vectorize, because the splat of the uniform count and the shift would tend to appear in different blocks. That made the splat invisible to ISel, and we'd scalarize the shift at codegen time. Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we are able to select the appropriate vector shifts. This updates the cost model to to take this into account by making shifts by a uniform cheap again. Differential Revision: https://reviews.llvm.org/D23049 llvm-svn: 277782	2016-08-04 22:48:03 +00:00
Wei Mi	b2fe0c32d3	[LoopVectorize] Change comment for isOutOfScope in collectLoopUniforms, NFC Update comment for isOutOfScope and add a testcase for uniform value being used out of scope. Differential Revision: https://reviews.llvm.org/D23073 llvm-svn: 277515	2016-08-02 20:27:49 +00:00
Matthew Simpson	d277a64fa7	[LV] Generate both scalar and vector integer induction variables This patch enables the vectorizer to generate both scalar and vector versions of an integer induction variable for a given loop. Previously, we only generated a scalar induction variable if we knew all its users were going to be scalar. Otherwise, we generated a vector induction variable. In the case of a loop with both scalar and vector users of the induction variable, we would generate the vector induction variable and extract scalar values from it for the scalar users. With this patch, we now generate both versions of the induction variable when there are both scalar and vector users and select which version to use based on whether the user is scalar or vector. Differential Revision: https://reviews.llvm.org/D22869 llvm-svn: 277474	2016-08-02 15:25:16 +00:00
Matthew Simpson	9e386a7206	[LV] Untangle the concepts of uniform and scalar This patch refactors the logic in collectLoopUniforms and collectValuesToIgnore, untangling the concepts of "uniform" and "scalar". It adds isScalarAfterVectorization along side isUniformAfterVectorization to distinguish the two. Known scalar values include those that are uniform, getelementptr instructions that won't be vectorized, and induction variables and induction variable update instructions whose users are all known to be scalar. This patch includes the following functional changes: - In collectLoopUniforms, we mark uniform the pointer operands of interleaved accesses. Although non-consecutive, these pointers are treated like consecutive pointers during vectorization. - In collectValuesToIgnore, we insert a value into VecValuesToIgnore if it isScalarAfterVectorization rather than isUniformAfterVectorization. This differs from the previous functionaly in that we now add getelementptr instructions that will not be vectorized into VecValuesToIgnore. This patch also removes the ValuesNotWidened set used for induction variable scalarization since, after the above changes, it is now equivalent to isScalarAfterVectorization. Differential Revision: https://reviews.llvm.org/D22867 llvm-svn: 277460	2016-08-02 14:29:41 +00:00
Igor Breger	438f956c4f	[AVX512] Don't use i128 masked gather/scatter/load/store. Do more accurately dataWidth check. Differential Revision: http://reviews.llvm.org/D23055 llvm-svn: 277435	2016-08-02 09:15:28 +00:00
Craig Topper	381a6f48a2	[AVX-512] Fix a test missed in r277327. llvm-svn: 277330	2016-08-01 08:15:30 +00:00
Matt Masten	6edcc04bd0	Initial support for vectorization using svml (short vector math library). Differential Revision: https://reviews.llvm.org/D19544 llvm-svn: 277166	2016-07-29 16:42:44 +00:00
Wei Mi	df7bf7d6f4	Fix the assertion error in collectLoopUniforms caused by empty Worklist before expanding. Contributed-by: David Callahan Differential Revision: https://reviews.llvm.org/D22886 llvm-svn: 276943	2016-07-27 23:53:58 +00:00
Elena Demikhovsky	6936839d54	[Loop Vectorizer] Handling loops FP induction variables. Allowed loop vectorization with secondary FP IVs. Like this: float *A; float x = init; for (int i=0; i < N; ++i) { A[i] = x; x -= fp_inc; } The auto-vectorization is possible when the induction binary operator is "fast" or the function has "unsafe" attribute. Differential Revision: https://reviews.llvm.org/D21330 llvm-svn: 276554	2016-07-24 07:24:54 +00:00
Matthew Simpson	2eab139cb5	[LV] Move vector int induction update to end of latch This patch moves the update instruction for vectorized integer induction phi nodes to the end of the latch block. This ensures consistent placement of all induction updates across all the kinds of int inductions we create (scalar, splat vector, or vector phi). Differential Revision: https://reviews.llvm.org/D22416 llvm-svn: 276339	2016-07-21 21:20:15 +00:00
Adam Nemet	377d292ea8	[OptDiag,LV] Add hotness attribute to applied-optimization remarks Test coverage is provided by modifying the function in the FP-math testcase that we are allowed to vectorize. llvm-svn: 276223	2016-07-21 01:07:13 +00:00
Adam Nemet	2a94ac8820	[OptDiag,LV] Add hotness attribute to the derived analysis remarks This includes FPCompute and Aliasing. Testcase is based on no_fpmath.ll. llvm-svn: 276211	2016-07-20 23:50:32 +00:00
Adam Nemet	46bb1fa09e	[OptDiag,LV] Add hotness attribute to analysis remarks The earlier change added hotness attribute to missed-optimization remarks. This follows up with the analysis remarks (the ones explaining the reason for the missed optimization). llvm-svn: 276192	2016-07-20 21:44:26 +00:00
Adam Nemet	5c2a8f1a0c	[LV] Add hotness attribute to missed-optimization remarks The new OptimizationRemarkEmitter analysis pass is hooked up to both new and old PM passes. llvm-svn: 276080	2016-07-20 04:03:43 +00:00
Wei Mi	a63472cac9	Recommit the patch "Use uniforms set to populate VecValuesToIgnore". For instructions in uniform set, they will not have vector versions so add them to VecValuesToIgnore. For induction vars, those only used in uniform instructions or consecutive ptrs instructions have already been added to VecValuesToIgnore above. For those induction vars which are only used in uniform instructions or non-consecutive/non-gather scatter ptr instructions, the related phi and update will also be added into VecValuesToIgnore set. The change will make the vector RegUsages estimation less conservative. Differential Revision: https://reviews.llvm.org/D20474 The recommit fixed the testcase global_alias.ll. llvm-svn: 275936	2016-07-19 00:50:43 +00:00
Wei Mi	34c2e98656	Revert rL275912. llvm-svn: 275915	2016-07-18 21:14:43 +00:00
Wei Mi	e3b5e56180	Use uniforms set to populate VecValuesToIgnore. For instructions in uniform set, they will not have vector versions so add them to VecValuesToIgnore. For induction vars, those only used in uniform instructions or consecutive ptrs instructions have already been added to VecValuesToIgnore above. For those induction vars which are only used in uniform instructions or non-consecutive/non-gather scatter ptr instructions, the related phi and update will also be added into VecValuesToIgnore set. The change will make the vector RegUsages estimation less conservative. Differential Revision: https://reviews.llvm.org/D20474 llvm-svn: 275912	2016-07-18 20:59:53 +00:00
Matthew Simpson	b1999c4c21	[LV] Allow interleaved accesses in loops with predicated blocks This patch allows the formation of interleaved access groups in loops containing predicated blocks. However, the predicated accesses are prevented from forming groups. Differential Revision: https://reviews.llvm.org/D19694 llvm-svn: 275471	2016-07-14 20:59:47 +00:00
Matthew Simpson	9e14b27894	[LV] Avoid unnecessary IV scalar-to-vector-to-scalar conversions This patch prevents increases in the number of instructions, pre-instcombine, due to induction variable scalarization. An increase in instructions can lead to an increase in the compile-time required to simplify the induction variables. We now maintain a new map for scalarized induction variables to prevent us from converting between the scalar and vector forms. This patch should resolve compile-time regressions seen after r274627. llvm-svn: 275419	2016-07-14 14:36:06 +00:00
Michael Kuperstein	989812aece	[LV] Remove wrong assumption about LCSSA The LCSSA pass itself will not generate several redundant PHI nodes in a single exit block. However, such redundant PHI nodes don't violate LCSSA form, and may be introduced by passes that preserve LCSSA, and/or preserved by the LCSSA pass itself. So, assuming a single PHI node per exit block is not safe. llvm-svn: 275217	2016-07-12 21:24:06 +00:00
Michael Kuperstein	7e6a08b33c	[X86] Make some cast costs more precise Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604). Differential Revision: http://reviews.llvm.org/D22064 llvm-svn: 275106	2016-07-11 21:39:44 +00:00
Sean Silva	8238108d59	[PM] Port LoopVectorize to the new PM. llvm-svn: 275000	2016-07-09 22:56:50 +00:00
Elena Demikhovsky	744b1499a5	Fixed a bug in vectorizing GEP before gather/scatter intrinsic. Vectorizing GEP was incorrect and broke SSA in some cases. The patch fixes PR27997 https://llvm.org/bugs/show_bug.cgi?id=27997. Differential revision: http://reviews.llvm.org/D22035 llvm-svn: 274735	2016-07-07 06:06:46 +00:00
Michael Kuperstein	d59d4568d1	[TTI] The cost model should not assume vector casts get completely scalarized The cost model should not assume vector casts get completely scalarized, since on targets that have vector support, the common case is a partial split up to the legal vector size. So, when a vector cast gets split, the resulting casts end up legal and cheap. Instead of pessimistically assuming scalarization, base TTI can use the costs the concrete TTI provides for the split vector, plus a fudge factor to account for the cost of the split itself. This fudge factor is currently 1 by default, except on AMDGPU where inserts and extracts are considered free. Differential Revision: http://reviews.llvm.org/D21251 llvm-svn: 274642	2016-07-06 17:30:56 +00:00
Matthew Simpson	5671f9d0a9	[LV] Don't widen trivial induction variables We currently always vectorize induction variables. However, if an induction variable is only used for counting loop iterations or computing addresses with getelementptr instructions, we don't need to do this. Vectorizing these trivial induction variables can create vector code that is difficult to simplify later on. This is especially true when the unroll factor is greater than one, and we create vector arithmetic when computing step vectors. With this patch, we check if an induction variable is only used for counting iterations or computing addresses, and if so, scalarize the arithmetic when computing step vectors instead. This allows for greater simplification. This patch addresses the suboptimal pointer arithmetic sequence seen in PR27881. Reference: https://llvm.org/bugs/show_bug.cgi?id=27881 Differential Revision: http://reviews.llvm.org/D21620 llvm-svn: 274627	2016-07-06 14:26:59 +00:00
Matt Arsenault	f316d56e2f	SLPVectorizer: Move propagateMetadata to VectorUtils This will be re-used by the LoadStoreVectorizer. Fix handling of range metadata and testcase by Justin Lebar. llvm-svn: 274281	2016-06-30 21:17:59 +00:00

1 2 3 4 5 ...

597 Commits