1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 21:42:54 +02:00
Commit Graph

569 Commits

Author SHA1 Message Date
Elena Demikhovsky
c7f80a2ab8 Masked Store in Loop Vectorizer - bugfix
Fixed a bug in loop vectorization with conditional store.

Differential Revision: http://reviews.llvm.org/D19532

llvm-svn: 267597
2016-04-26 20:18:04 +00:00
Hal Finkel
7956f36dac [LoopVectorize] Don't consider conditional-load dereferenceability for marked parallel loops
I really thought we were doing this already, but we were not. Given this input:

void Test(int *res, int *c, int *d, int *p) {
  for (int i = 0; i < 16; i++)
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}

we did not vectorize the loop. Even with "assume_safety" the check that we
don't if-convert conditionally-executed loads (to protect against
data-dependent deferenceability) was not elided.

One subtlety: As implemented, it will still prefer to use a masked-load
instrinsic (given target support) over the speculated load. The choice here
seems architecture specific; the best option depends on how expensive the
masked load is compared to a regular load. Ideally, using the masked load still
reduces unnecessary memory traffic, and so should be preferred. If we'd rather
do it the other way, flipping the order of the checks is easy.

The LangRef is updated to make explicit that llvm.mem.parallel_loop_access also
implies that if conversion is okay.

Differential Revision: http://reviews.llvm.org/D19512

llvm-svn: 267514
2016-04-26 02:00:36 +00:00
Andrew Kaylor
653d361880 Re-commit optimization bisect support (r267022) without new pass manager support.
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).

Differential Revision: http://reviews.llvm.org/D19172

llvm-svn: 267231
2016-04-22 22:06:11 +00:00
Vedant Kumar
b6cc52b7d8 Revert "Initial implementation of optimization bisect support."
This reverts commit r267022, due to an ASan failure:

  http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/1549

llvm-svn: 267115
2016-04-22 06:51:37 +00:00
Andrew Kaylor
fd49f275f8 Initial implementation of optimization bisect support.
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.

The bisection is enabled using a new command line option (-opt-bisect-limit).  Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit.  A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.

The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check.  Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute.  A new function call has been added for module and SCC passes that behaves in a similar way.

Differential Revision: http://reviews.llvm.org/D19172

llvm-svn: 267022
2016-04-21 17:58:54 +00:00
David Majnemer
d93b1bf8b1 [ValueTracking, VectorUtils] Refactor getIntrinsicIDForCall
The functionality contained within getIntrinsicIDForCall is two-fold: it
checks if a CallInst's callee is a vectorizable intrinsic.  If it isn't
an intrinsic, it attempts to map the call's target to a suitable
intrinsic.

Move the mapping functionality into getIntrinsicForCallSite and rename
getIntrinsicIDForCall to getVectorIntrinsicIDForCall while
reimplementing it in terms of getIntrinsicForCallSite.

llvm-svn: 266801
2016-04-19 19:10:21 +00:00
Michael Kuperstein
3e5d8ebde9 Port DemandedBits to the new pass manager.
Differential Revision: http://reviews.llvm.org/D18679

llvm-svn: 266699
2016-04-18 23:55:01 +00:00
Renato Golin
bf4b959200 [ARM] Adding IEEE-754 SIMD detection to loop vectorizer
Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON.

This patch teaches the loop vectorizer to only allow transformations of loops
that either contain no floating-point operations or have enough allowance
flags supporting lack of precision (ex. -ffast-math, Darwin).

For that, the target description now has a method which tells us if the
vectorizer is allowed to handle FP math without falling into unsafe
representations, plus a check on every FP instruction in the candidate loop
to check for the safety flags.

This commit makes LLVM behave like GCC with respect to ARM NEON support, but
it stops short of fixing the underlying problem: sub-normals. Neither GCC
nor LLVM have a flag for allowing sub-normal operations. Before this patch,
GCC only allows it using unsafe-math flags and LLVM allows it by default with
no way to turn it off (short of not using NEON at all).

As a first step, we push this change to make it safe and in sync with GCC.
The second step is to discuss a new sub-normal's flag on both communitues
and come up with a common solution. The third step is to improve the FastMath
flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.

Fixes PR16275.

llvm-svn: 266363
2016-04-14 20:42:18 +00:00
David Majnemer
d2ed420815 [CodeGen] Teach LLVM how to lower @llvm.{min,max}num to {MIN,MAX}NAN
The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only
one of the inputs is NaN: -NUM will return the non-NaN argument while
-NAN would return NaN.

It is desirable to lower to @llvm.{min,max}num to -NAN if they don't
have a native instruction for -NUM.  Notably, ARMv7 NEON's vmin has the
-NAN semantics.

N.B.  Of course, it is only safe to do this if the intrinsic call is
marked nnan.

llvm-svn: 266279
2016-04-14 07:13:24 +00:00
Elena Demikhovsky
8410bf33a6 Loop vectorization with uniform load
Vectorization cost of uniform load wasn't correctly calculated.
As a result, a simple loop that loads a uniform value wasn't vectorized.

Differential Revision: http://reviews.llvm.org/D18940

llvm-svn: 265901
2016-04-10 16:53:19 +00:00
David Majnemer
a4716b737d [LoopVectorize] Register cloned assumptions
InstCombine cannot effectively remove redundant assumptions without them
registered in the assumption cache.  The vectorizer can create identical
assumptions but doesn't register them with the cache, resulting in
slower compile times because InstCombine tries to reason about a lot
more assumptions.

Fix this by registering the cloned assumptions.

llvm-svn: 265800
2016-04-08 16:37:10 +00:00
Silviu Baranga
a0999051a2 Re-commit [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
This re-commits r265535 which was reverted in r265541 because it
broke the windows bots. The problem was that we had a PointerIntPair
which took a pointer to a struct allocated with new. The problem
was that new doesn't provide sufficient alignment guarantees.
This pattern was already present before r265535 and it just happened
to work. To fix this, we now separate the PointerToIntPair from the
ExitNotTakenInfo struct into a pointer and a bool.

Original commit message:

Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.

However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.

In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.

We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.

Reviewers: anemet, mzolotukhin, hfinkel, sanjoy

Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17201

llvm-svn: 265786
2016-04-08 14:29:09 +00:00
Silviu Baranga
aab80ed89c Revert r265535 until we know how we can fix the bots
llvm-svn: 265541
2016-04-06 14:06:32 +00:00
Silviu Baranga
1fce9f5629 [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.

However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.

In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.

We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.

Reviewers: anemet, mzolotukhin, hfinkel, sanjoy

Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17201

llvm-svn: 265535
2016-04-06 13:18:26 +00:00
Hal Finkel
e18e63d6a5 [LoopVectorize] Don't vectorize loops when everything will be scalarized
This change prevents the loop vectorizer from vectorizing when all of the vector
types it generates will be scalarized. I've run into this problem on the PPC's QPX
vector ISA, which only holds floating-point vector types. The loop vectorizer
will, however, happily vectorize loops with purely integer computation. Here's
an example:

  LV: The Smallest and Widest types: 32 / 32 bits.
  LV: The Widest register is: 256 bits.
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 1 for VF 1 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 1 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 1 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 1 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Scalar loop costs: 3.
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 2 for VF 2 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 2 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 2 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 2 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Vector loop of width 2 costs: 2.
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 4 for VF 4 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 4 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 4 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 4 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Vector loop of width 4 costs: 1.
  ...
  LV: Selecting VF: 8.
  LV: The target has 32 registers
  LV(REG): Calculating max register usage:
  LV(REG): At #0 Interval # 0
  LV(REG): At #1 Interval # 1
  LV(REG): At #2 Interval # 2
  LV(REG): At #4 Interval # 1
  LV(REG): At #5 Interval # 1
  LV(REG): VF = 8

The problem is that the cost model here is not wrong, exactly. Since all of
these operations are scalarized, their cost (aside from the uniform ones) are
indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF
picked, the lower the relative overhead from the loop itself (and the
induction-variable update and check), and so in a sense, picking the largest VF
here is the right thing to do.

The problem is that vectorizing like this, where all of the vectors will be
scalarized in the backend, isn't really vectorizing, but rather interleaving.
By itself, this would be okay, but then the vectorizer itself also interleaves,
and that's where the problem manifests itself. There's aren't actually enough
scalar registers to support the normal interleave factor multiplied by a factor
of VF (8 in this example). In other words, the problem with this is that our
register-pressure heuristic does not account for scalarization.

While we might want to improve our register-pressure heuristic, I don't think
this is the right motivating case for that work. Here we have a more-basic
problem: The job of the vectorizer is to vectorize things (interleaving aside),
and if the IR it generates won't generate any actual vector code, then
something is wrong. Thus, if every type looks like it will be scalarized (i.e.
will be split into VF or more parts), then don't consider that VF.

This is not a problem specific to PPC/QPX, however. The problem comes up under
SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's
reduced test case from PR26837 to this commit.

Differential Revision: http://reviews.llvm.org/D18537

llvm-svn: 264904
2016-03-30 19:37:08 +00:00
Nirav Dave
7a0e387b04 Remove HasFnAttribute guards to getFnAttribute calls
These checks are redundant and can be removed

Reviewers: hans

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D18564

llvm-svn: 264872
2016-03-30 15:41:12 +00:00
Adam Nemet
43c0d4789b [LoopVectorize] Annotate versioned loop with noalias metadata
Summary:
Use the new LoopVersioning facility (D16712) to add noalias metadata in
the vector loop if we versioned with memchecks.  This can enable some
optimization opportunities further down the pipeline (see the included
test or the benchmark improvement quoted in D16712).

The test also covers the bug I had in the initial version in D16712.

The vectorizer did not previously use LoopVersioning.  The reason is
that the vectorizer performs its transformations in single shot.  It
creates an empty single-block vector loop that it then populates with
the widened, if-converted instructions.  Thus creating an intermediate
versioned scalar loop seems wasteful.

So this patch (rather than bringing in LoopVersioning fully) adds a
special interface to LoopVersioning to allow the vectorizer to add
no-alias annotation while still performing its own versioning.

As the vectorizer propagates metadata from the instructions in the
original loop to the vector instructions we also check the pointer in
the original instruction and see if LoopVersioning can add no-alias
metadata based on the issued memchecks.

Reviewers: hfinkel, nadav, mzolotukhin

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17191

llvm-svn: 263744
2016-03-17 20:32:37 +00:00
Adam Nemet
269fc43aff [LV] Preserve LoopInfo when store predication is used
This was a latent bug that got exposed by the change to add LoopSimplify
as a dependence to LoopLoadElimination.  Since LoopInfo was corrupted
after LV, LoopSimplify mis-compiled nbench in the test-suite (more
details in the PR).

The problem was that when we create the blocks for predicated stores we
didn't add those to any loops.

The original testcase for store predication provides coverage for this
assuming we verify LI on the way out of LV.

Fixes PR26952.

llvm-svn: 263565
2016-03-15 18:06:20 +00:00
Hans Wennborg
8cdb9f2953 Revert r255691 "[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions."
It caused PR26509.

llvm-svn: 261368
2016-02-19 21:40:12 +00:00
Matthew Simpson
2ebb736740 [LV] Vectorize first-order recurrences
This patch enables the vectorization of first-order recurrences. A first-order
recurrence is a non-reduction recurrence relation in which the value of the
recurrence in the current loop iteration equals a value defined in the previous
iteration. The load PRE of the GVN pass often creates these recurrences by
hoisting loads from within loops.

In this patch, we add a new recurrence kind for first-order phi nodes and
attempt to vectorize them if possible. Vectorization is performed by shuffling
the values for the current and previous iterations. The vectorization cost
estimate is updated to account for the added shuffle instruction.

Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D16197

llvm-svn: 261346
2016-02-19 17:56:08 +00:00
Silviu Baranga
7a8e19daa2 [LV] Fix PR26600: avoid out of bounds loads for interleaved access vectorization
Summary:
If we don't have the first and last access of an interleaved load group,
the first and last wide load in the loop can do an out of bounds
access. Even though we discard results from speculative loads,
this can cause problems, since it can technically generate page faults
(or worse).

We now discard interleaved load groups that don't have the first and
load in the group.

Reviewers: hfinkel, rengolin

Subscribers: rengolin, llvm-commits, mzolotukhin, anemet

Differential Revision: http://reviews.llvm.org/D17332

llvm-svn: 261331
2016-02-19 15:46:10 +00:00
Elena Demikhovsky
fdd98d5776 Create masked gather and scatter intrinsics in Loop Vectorizer.
Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access.

The feature is enabled on AVX-512 target.
Differential Revision: http://reviews.llvm.org/D15690

llvm-svn: 261140
2016-02-17 19:23:04 +00:00
Silviu Baranga
3bf0593dd6 [LV] Add support for insertelt/extractelt processing during type truncation
Summary:
While shrinking types according to the required bits, we can
encounter insert/extract element instructions. This will cause us to
reach an llvm_unreachable statement.

This change adds support for truncating insert/extract element
operations, and adds a regression test.

Reviewers: jmolloy

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17078

llvm-svn: 260893
2016-02-15 15:38:17 +00:00
Silviu Baranga
6e6031d4bf [SCEV][LAA] Re-commit r260085 and r260086, this time with a fix for the memory
sanitizer issue. The PredicatedScalarEvolution's copy constructor
wasn't copying the Generation value, and was leaving it un-initialized.

Original commit message:

[SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection

Summary:
This change adds no wrap SCEV predicates with:
  - support for runtime checking
  - support for expression rewriting:
      (sext ({x,+,y}) -> {sext(x),+,sext(y)}
      (zext ({x,+,y}) -> {zext(x),+,sext(y)}

Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.

We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.

Reviewers: mzolotukhin, sanjoy, anemet

Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel

Differential Revision: http://reviews.llvm.org/D15412

llvm-svn: 260112
2016-02-08 17:02:45 +00:00
Silviu Baranga
bb2840e653 Revert r260086 and r260085. They have broken the memory
sanitizer bots.

llvm-svn: 260087
2016-02-08 11:56:15 +00:00
Silviu Baranga
f0b51949f8 [SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection
Summary:
This change adds no wrap SCEV predicates with:
  - support for runtime checking
  - support for expression rewriting:
      (sext ({x,+,y}) -> {sext(x),+,sext(y)}
      (zext ({x,+,y}) -> {zext(x),+,sext(y)}

Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.

We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.

Reviewers: mzolotukhin, sanjoy, anemet

Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel

Differential Revision: http://reviews.llvm.org/D15412

llvm-svn: 260085
2016-02-08 10:45:50 +00:00
Wei Mi
00d0d9c981 [SCEV] Try to reuse existing value during SCEV expansion
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.

This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.

The original commit triggered regressions in Polly tests. The regressions
exposed two problems which have been fixed in current version.

1. Polly will generate a new function based on the old one. To generate an
instruction for the new function, it builds SCEV for the old instruction,
applies some tranformation on the SCEV generated, then expands the transformed
SCEV and insert the expanded value into new function. Because SCEV expansion
may reuse value cached in ExprValueMap, the value in old function may be
inserted into new function, which is wrong.
   In SCEVExpander::expand, there is a logic to check the cached value to
be used should dominate the insertion point. However, for the above
case, the check always passes. That is because the insertion point is
in a new function, which is unreachable from the old function. However
for unreachable node, DominatorTreeBase::dominates thinks it will be
dominated by any other node.
   The fix is to simply add a check that the cached value to be used in
expansion should be in the same function as the insertion point instruction.

2. When the SCEV is of scConstant type, expanding it directly is cheaper than
reusing a normal value cached. Although in the cached value set in ExprValueMap,
there is a Constant type value, but it is not easy to find it out -- the cached
Value set is not sorted according to the potential cost. Existing reuse logic
in SCEVExpander::expand simply chooses the first legal element from the cached
value set.
   The fix is that when the SCEV is of scConstant type, don't try the reuse
logic. simply expand it.

Differential Revision: http://reviews.llvm.org/D12090

llvm-svn: 259736
2016-02-04 01:27:38 +00:00
Junmo Park
bff0483f1c Minor code cleanups. NFC.
llvm-svn: 259725
2016-02-03 23:16:39 +00:00
Wei Mi
1ef051b016 Revert r259662, which caused regressions on polly tests.
llvm-svn: 259675
2016-02-03 18:05:57 +00:00
Wei Mi
4fc93e70ac [SCEV] Try to reuse existing value during SCEV expansion
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.

This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.

Differential Revision: http://reviews.llvm.org/D12090

llvm-svn: 259662
2016-02-03 17:05:12 +00:00
Matthew Simpson
7c41d95a48 [LV] Rename RdxPHIsToFix to PHIsToFix (NFC)
In the future, we will vectorize recurrences other than reductions. This patch
renames a few variables and updates their associated comments to enable them to
be reused for non-reduction PHI nodes.

This change was requested in the review for D16197.

llvm-svn: 259364
2016-02-01 16:07:01 +00:00
Matthew Simpson
18b4a9537b [LV] Avoid creating empty reduction entries (NFC)
This patch prevents us from unintentionally creating entries in the reductions
map for PHIs that are not actually reductions. This is currently not an issue
since we bail out if we encounter PHIs other than inductions or reductions.
However the behavior could become problematic as we add support for additional
recurrence types.

llvm-svn: 256930
2016-01-06 12:50:29 +00:00
Sanjoy Das
5292411857 [SCEV] Add and use SCEVConstant::getAPInt; NFCI
llvm-svn: 255921
2015-12-17 20:28:46 +00:00
Cong Hou
d84dd1150a [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
(This is the third attempt to check in this patch, and the first two are r255454
and r255460. The once failed test file reg-usage.ll is now moved to
test/Transform/LoopVectorize/X86 directory with target datalayout and target
triple indicated.)

LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177

llvm-svn: 255691
2015-12-15 22:45:09 +00:00
Cong Hou
19e67a55f7 Revert r255460, which still causes test failures on some platforms.
Further investigation on the failures is ongoing.

llvm-svn: 255463
2015-12-13 17:15:38 +00:00
Cong Hou
ea8edf0fdf [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
(This is the second attempt to check in this patch: REQUIRES: asserts is added
to reg-usage.ll now.)

LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177

llvm-svn: 255460
2015-12-13 16:55:46 +00:00
Cong Hou
74b6e6397a Revert r255454 as it leads to several test failers on buildbots.
llvm-svn: 255456
2015-12-13 09:28:57 +00:00
Cong Hou
0784bf19f6 [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177

llvm-svn: 255454
2015-12-13 08:44:08 +00:00
Silviu Baranga
d19d7b747a Re-commit r255115, with the PredicatedScalarEvolution class moved to
ScalarEvolution.h, in order to avoid cyclic dependencies between the Transform
and Analysis modules:

[LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions

Summary:
This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the
usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge
by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is
that both LAA and LV should use this interface everywhere.

This also solves a problem involving the result of SCEV expression rewritting when
the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates
  P1: {a,+,b} has nsw
  P2: b = 1.

Applying P1 and then P2 gives us {a,+,1}, while applying P2 and the P1 gives us
sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies).
The SCEVPredicatedLayer maintains the order of transformations by feeding back
the results of previous transformations into new transformations, and therefore
avoiding this issue.

The SCEVPredicatedLayer maintains a cache to remember the results of previous
SCEV rewritting results. This also has the benefit of reducing the overall number
of expression rewrites.

Reviewers: mzolotukhin, anemet

Subscribers: jmolloy, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D14296

llvm-svn: 255122
2015-12-09 16:06:28 +00:00
Silviu Baranga
ba0669cbca Revert r255115 until we figure out how to fix the bot failures.
llvm-svn: 255117
2015-12-09 15:25:28 +00:00
Silviu Baranga
f6006f41f7 [LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions
Summary:
This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the
usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge
by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is
that both LAA and LV should use this interface everywhere.

This also solves a problem involving the result of SCEV expression rewritting when
the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates
  P1: {a,+,b} has nsw
  P2: b = 1.

Applying P1 and then P2 gives us {a,+,1}, while applying P2 and the P1 gives us
sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies).
The SCEVPredicatedLayer maintains the order of transformations by feeding back
the results of previous transformations into new transformations, and therefore
avoiding this issue.

The SCEVPredicatedLayer maintains a cache to remember the results of previous
SCEV rewritting results. This also has the benefit of reducing the overall number
of expression rewrites.

Reviewers: mzolotukhin, anemet

Subscribers: jmolloy, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D14296

llvm-svn: 255115
2015-12-09 15:03:52 +00:00
Cong Hou
30b1a3deac Fix a typo in LoopVectorize.cpp. NFC.
llvm-svn: 254813
2015-12-05 01:00:22 +00:00
Cong Hou
ae07b50dd6 Fix a typo in LoopVectorize.cpp. NFC.
llvm-svn: 254549
2015-12-02 21:33:47 +00:00
Charlie Turner
18cf3a8580 [LoopVectorize] Use MapVector rather than DenseMap for MinBWs.
The order in which instructions are truncated in truncateToMinimalBitwidths
effects code generation. Switch to a map with a determinisic order, since the
iteration order over a DenseMap is not defined.

This code is not hot, so the difference in container performance isn't
interesting.

Many thanks to David Blaikie for making me aware of MapVector!

Fixes PR25490.

Differential Revision: http://reviews.llvm.org/D14981

llvm-svn: 254179
2015-11-26 20:39:51 +00:00
Chad Rosier
7d466a8b34 [LV] Add a helper function, isReductionVariable. NFC.
llvm-svn: 253565
2015-11-19 14:19:06 +00:00
Cong Hou
7900e5e76e Fix several long lines (>80) in LoopVectorize.cpp. NFC.
llvm-svn: 253527
2015-11-19 00:32:30 +00:00
Chad Rosier
1d3f2c21a9 Typo.
llvm-svn: 253336
2015-11-17 13:58:10 +00:00
James Molloy
697ec724f3 [LoopVectorize] Address post-commit feedback on r250032
Implemented as many of Michael's suggestions as were possible:
  * clang-format the added code while it is still fresh.
  * tried to change Value* to Instruction* in many places in computeMinimumValueSizes - unfortunately there are several places where Constants need to be handled so this wasn't possible.
  * Reduce the pass list on loop-vectorization-factors.ll.
  * Fix a bug where we were querying MinBWs for I->getOperand(0) but using MinBWs[I].

llvm-svn: 252469
2015-11-09 14:32:05 +00:00
Elena Demikhovsky
45c8421c3f LoopVectorizer - skip 'bitcast' between GEP and load.
Skipping 'bitcast' in this case allows to vectorize load:

  %arrayidx = getelementptr inbounds double*, double** %in, i64 %indvars.iv
  %tmp53 = bitcast double** %arrayidx to i64*
  %tmp54 = load i64, i64* %tmp53, align 8

Differential Revision http://reviews.llvm.org/D14112

llvm-svn: 251907
2015-11-03 10:29:34 +00:00
Cong Hou
0b6d5e284f Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using larger vectorization factor.
To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers.

This is the second attempt to submit this patch. The first attempt got a test failure on ARM. This patch is updated to try to fix the failure (more specifically, by handling the case when VF=1).

Differential revision: http://reviews.llvm.org/D8943

llvm-svn: 251850
2015-11-02 22:53:48 +00:00