1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 20:12:56 +02:00
Commit Graph

1287 Commits

Author SHA1 Message Date
Reid Kleckner
73e1a13fdc [IR] De-virtualize ~Value to save a vptr
Summary:
Implements PR889

Removing the virtual table pointer from Value saves 1% of RSS when doing
LTO of llc on Linux. The impact on time was positive, but too noisy to
conclusively say that performance improved. Here is a link to the
spreadsheet with the original data:

https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing

This change makes it invalid to directly delete a Value, User, or
Instruction pointer. Instead, such code can be rewritten to a null check
and a call Value::deleteValue(). Value objects tend to have their
lifetimes managed through iplist, so for the most part, this isn't a big
deal.  However, there are some places where LLVM deletes values, and
those places had to be migrated to deleteValue.  I have also created
llvm::unique_value, which has a custom deleter, so it can be used in
place of std::unique_ptr<Value>.

I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which
derives from User outside of lib/IR. Code in IR cannot include MemorySSA
headers or call the MemoryAccess object destructors without introducing
a circular dependency, so we need some level of indirection.
Unfortunately, no class derived from User may have any virtual methods,
because adding a virtual method would break User::getHungOffOperands(),
which assumes that it can find the use list immediately prior to the
User object. I've added a static_assert to the appropriate OperandTraits
templates to help people avoid this trap.

Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv

Reviewed By: chandlerc

Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D31261

llvm-svn: 303362
2017-05-18 17:24:10 +00:00
Matthew Simpson
be5fce863d Revert 303174, 303176, and 303178
These commits are breaking the bots. Reverting to investigate.

llvm-svn: 303182
2017-05-16 15:50:30 +00:00
Matthew Simpson
fdeda43e2f [LV] Avoid potentential division by zero when selecting IC
llvm-svn: 303174
2017-05-16 14:43:55 +00:00
Adam Nemet
f9607f0660 [SLP] Enable 64-bit wide vectorization on AArch64
ARM Neon has native support for half-sized vector registers (64 bits).  This
is beneficial for example for 2D and 3D graphics.  This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.

*** Performance Analysis

This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.

The results are with -O3 and PGO.  A negative percentage is an improvement.
The testsuite was run with a sample size of 4.

** SPEC

* CFP2006/482.sphinx3  -3.34%

A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.

* CFP2000/177.mesa     -3.34%
* CINT2000/256.bzip2   +6.97%

My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%.  There are also other problems with the codegen in
this loop so there is further room for improvement.

** LLVM testsuite

* SingleSource/Benchmarks/Misc/ReedSolomon               -10.75%

There are multiple small SLP vectorizations outside the hot code.  It's a bit
surprising that it adds up to 10%.  Some of this may be code-layout noise.

* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%

The opt-viewer screenshot can be seen at F3218284.  We start at a colder store
but the tree leads us into the hottest loop.

* MultiSource/Applications/lambda-0.1.3/lambda            -2.68%
* MultiSource/Benchmarks/Bullet/bullet                    -2.18%

This is using 3D vectors.

* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%

Noise, binary is unchanged.

* MultiSource/Benchmarks/Ptrdist/anagram/anagram          +4.90%

There is an additional SLP in the cold code.  The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.

* MultiSource/Applications/aha/aha                        +1.63%
* MultiSource/Applications/JM/lencod/lencod               +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark         +1.15%

Differential Revision: https://reviews.llvm.org/D31965

llvm-svn: 303116
2017-05-15 21:15:01 +00:00
Craig Topper
be2ad5e5e7 [ValueTracking] Replace all uses of ComputeSignBit with computeKnownBits.
This patch finishes off the conversion of ComputeSignBit to computeKnownBits.

Differential Revision: https://reviews.llvm.org/D33166

llvm-svn: 303035
2017-05-15 06:39:41 +00:00
Simon Pilgrim
288ebc9253 [LoopOptimizer][Fix]PR32859, PR24738
The Loop vectorizer pass introduced undef value while it is fixing output of LCSSA form.
Here it is:

before: %e.0.ph = phi i32 [ 0, %for.inc.2.i ]
after: %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ undef, %middle.block ]

and after this change we have:

%e.0.ph = phi i32 [ 0, %for.inc.2.i ]
%e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ 0, %middle.block ]

Committed on behalf of @dtemirbulatov

Differential Revision: https://reviews.llvm.org/D33055

llvm-svn: 302988
2017-05-13 13:25:57 +00:00
Craig Topper
7ebd2e3f5b [KnownBits] Add bit counting methods to KnownBits struct and use them where possible
This patch adds min/max population count, leading/trailing zero/one bit counting methods.

The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.

Differential Revision: https://reviews.llvm.org/D32931

llvm-svn: 302925
2017-05-12 17:20:30 +00:00
Adam Nemet
73703d12a4 [SLP] Emit optimization remarks
The approach I followed was to emit the remark after getTreeCost concludes
that SLP is profitable.  I initially tried emitting them after the
vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely
remember missing a few cases for example in HorizontalReduction::tryToReduce.

ORE is placed in BoUpSLP so that it's available from everywhere (notably
HorizontalReduction::tryToReduce).

We use the first instruction in the root bundle as the locator for the remark.
In order to get a sense how far the tree is spanning I've include the size of
the tree in the remark.  This is not perfect of course but it gives you at
least a rough idea about the tree.  Then you can follow up with -view-slp-tree
to really see the actual tree.

llvm-svn: 302811
2017-05-11 17:06:17 +00:00
Ayal Zaks
4c2e1fba70 [LV] Refactor ILV.vectorize{Loop}() by introducing LVP.executePlan(); NFC
Introduce LoopVectorizationPlanner.executePlan(), replacing ILV.vectorize() and
refactoring ILV.vectorizeLoop(). Method collectDeadInstructions() is moved from
ILV to LVP. These changes facilitate building VPlans and using them to generate
code, following https://reviews.llvm.org/D28975 and its tentative breakdown.

Method ILV.createEmptyLoop() is renamed ILV.createVectorizedLoopSkeleton() to
improve clarity; it's contents remain intact.

Differential Revision: https://reviews.llvm.org/D32200

llvm-svn: 302790
2017-05-11 11:36:33 +00:00
Matthew Simpson
383f076bfc [AArch64] Consider widening instructions in cost calculations
The AArch64 instruction set has a few "widening" instructions (e.g., uaddl,
saddl, uaddw, etc.) that take one or more doubleword operands and produce
quadword results. The operands are automatically sign- or zero-extended as
appropriate. However, in LLVM IR, these extends are explicit. This patch
updates TTI to consider these widening instructions as single operations whose
cost is attached to the arithmetic instruction. It marks extends that are part
of a widening operation "free" and applies a sub-target specified overhead
(zero by default) to the arithmetic instructions.

Differential Revision: https://reviews.llvm.org/D32706

llvm-svn: 302582
2017-05-09 20:18:12 +00:00
Anna Thomas
3580c4d010 [LV] Fix insertion point for shuffle vectors in first order recurrence
Summary:
In first order recurrence vectorization, when the previous value is a phi node, we need to
set the insertion point to the first non-phi node.
We can have the previous value being a phi node, due to the generation of new
IVs as part of trunc optimization [1].

[1] https://reviews.llvm.org/rL294967

Reviewers: mssimpso, mkuper

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32969

llvm-svn: 302532
2017-05-09 14:29:33 +00:00
Amara Emerson
59ff6c8c60 Introduce experimental generic intrinsics for horizontal vector reductions.
- This change allows targets to opt-in to using them instead of the log2
  shufflevector algorithm.
- The SLP and Loop vectorizers have the common code to do shuffle reductions
  factored out into LoopUtils, and now have a unified interface for generating
  reductions regardless of the preference of the target. LoopUtils now uses TTI
  to determine what kind of reductions the target wants to handle.
- For CodeGen, basic legalization support is added.

Differential Revision: https://reviews.llvm.org/D30086

llvm-svn: 302514
2017-05-09 10:43:25 +00:00
Jonas Paulsson
fd1a4ff9b5 Use right function in LoopVectorize.
-    unsigned AS = getMemInstAlignment(I);
+    unsigned AS = getMemInstAddressSpace(I);

Review: Hal Finkel
llvm-svn: 302114
2017-05-04 05:31:56 +00:00
Sanjoy Das
19757d9ec3 Rename WeakVH to WeakTrackingVH; NFC
This relands r301424.

llvm-svn: 301812
2017-05-01 17:07:49 +00:00
Craig Topper
c5d014c133 [ValueTracking] Introduce a KnownBits struct to wrap the two APInts for computeKnownBits
This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit.

Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch.

I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. Once place default constructed the APInts so I added a default constructor for those cases.

Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero|One) so we don't write it out everywhere. Maybe a method for (Zero|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with.

Differential Revision: https://reviews.llvm.org/D32376

llvm-svn: 301432
2017-04-26 16:39:58 +00:00
Sanjoy Das
732f091d68 Reverts commit r301424, r301425 and r301426
Commits were:

"Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
"Add a new WeakVH value handle; NFC"
"Rename WeakVH to WeakTrackingVH; NFC"

The changes assumed pointers are 8 byte aligned on all architectures.

llvm-svn: 301429
2017-04-26 16:37:05 +00:00
Matthew Simpson
1c83e0e38f [LV] Handle external uses of floating-point induction variables
Reference: https://bugs.llvm.org/show_bug.cgi?id=32758
Differential Revision: https://reviews.llvm.org/D32445

llvm-svn: 301428
2017-04-26 16:23:02 +00:00
Sanjoy Das
e226969b1c Rename WeakVH to WeakTrackingVH; NFC
Summary:
I plan to use WeakVH to mean "nulls itself out on deletion, but does
not track RAUW" in a subsequent commit.

Reviewers: dblaikie, davide

Reviewed By: davide

Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D32266

llvm-svn: 301424
2017-04-26 16:20:52 +00:00
Stanislav Mekhanoshin
19be2234f0 Skip bitcasts while looking for GEP in LoadStoreVectorizer
Differential Revisison: https://reviews.llvm.org/D32101

llvm-svn: 301343
2017-04-25 18:00:08 +00:00
Craig Topper
d59a3da3b6 [APInt] Use isSubsetOf, intersects, and bit counting methods to reduce temporary APInts
This patch uses various APInt methods to reduce temporary APInt creation.

This should be all of the unrelated cleanups that got buried in D32376(creating a KnownBits struct) as well as some pointed out by Simon during the review of that. Plus a few improvements to use counting instead of masking.

I've left out any places where we do something like (KnownZero & KnownOne) != 0 as I plan to add a helper method to KnownBits to ask that question and didn't want to thrash that code an additional time.

Differential Revision: https://reviews.llvm.org/D32495

llvm-svn: 301338
2017-04-25 17:46:30 +00:00
Gil Rapaport
a2e72f19da [LV] Remove redundant basic block split
This patch is part of D28975's breakdown.

Genreating the control-flow to guard predicated instructions modified to
only use SplitBlockAndInsertIfThen() for producing the if-then construct.

Differential Revision: https://reviews.llvm.org/D32224

llvm-svn: 301293
2017-04-25 05:57:22 +00:00
Matthew Simpson
ca617e5f11 [LV] Model if-converted phi node costs
Phi nodes in non-header blocks are converted to select instructions after
if-conversion. This patch updates the cost model to account for the selects.

Differential Revision: https://reviews.llvm.org/D31906

llvm-svn: 300980
2017-04-21 14:14:54 +00:00
Easwaran Raman
7b83777c46 [SLP vectorizer] Allow phi node reordering in tryToVectorizeList.
In tryToVectorizeList, under a very limited circumstance (when entered
from tryToVectorizePair), the values may be reordered (swapped) and the
SLP tree is built with the new order. This extends that to the case when
starting from phis in vectorizeChainsInBlock when there are exactly two
phis. The textual order of phi nodes shouldn't really matter. Without
this change, the loop body in the accompnaying test case is fully vectorized
when we swap the orde of the phis but not with this order. While this
doesn't solve the phi-ordering problem in a general way (for more than 2
phis), this is simple fix that piggybacks on an existing mechanism and
is useful in cases like multiplying two complex numbers.

Differential revision: https://reviews.llvm.org/D32065

llvm-svn: 300574
2017-04-18 18:16:57 +00:00
Gil Rapaport
e43ac15a7c [LV] Cache block mask values
This patch is part of D28975's breakdown.

Add caching for block masks similar to the cache already used for edge masks,
replacing generation per user with reusing the first generated value which
dominates all uses.

Differential Revision: https://reviews.llvm.org/D32054

llvm-svn: 300557
2017-04-18 14:43:43 +00:00
Gil Rapaport
f30248c36d [LV] Remove implicit single basic block assumption
This patch is part of D28975's breakdown - no change in output intended.

LV's code currently assumes the vectorized loop is a single basic block up
until predicateInstructions() is called. This patch removes two manifestations
of this assumption (loop phi incoming values, dominator tree update) by
replacing the use of vectorLoopBody with the vectorized loop's latch/header.

Differential Revision: https://reviews.llvm.org/D32040

llvm-svn: 300310
2017-04-14 07:30:23 +00:00
Anna Thomas
926acfc84a [LV] Fix the vector code generation for first order recurrence
Summary:
In first order recurrences where phi's are used outside the loop,
we should generate an additional vector.extract of the second last element from
the vectorized phi update.
This is because we require the phi itself (which is the value at the second last
iteration of the vector loop) and not the phi's update within the loop.
Also fix the code gen when we just unroll, but don't vectorize.
Fixes PR32396.

Reviewers: mssimpso, mkuper, anemet

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D31979

llvm-svn: 300238
2017-04-13 18:59:25 +00:00
Ayal Zaks
25f1259181 [LV] Refactor ILV to provide vectorizeInstruction(); NFC
Refactoring InnerLoopVectorizer's vectorizeBlockInLoop() to provide
vectorizeInstruction(). Aligning DeadInstructions with its only user.
Facilitates driving the transformation by VPlan - follows
https://reviews.llvm.org/D28975 and its tentative breakdown.

Differential Revision: https://reviews.llvm.org/D31997

llvm-svn: 300183
2017-04-13 09:07:23 +00:00
Jonas Paulsson
eedad8c536 [SLPVectorizer] Pass the right type argument to getCmpSelInstrCost()
In getEntryCost(), make the scalar type for a compare instruction that of the
operands, not i1. This is needed in order to call getCmpSelInstrCost() for a
compare in a sensible way, the same way as the LoopVectorizer does.

New test: test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll

Review: Matthew Simpson
https://reviews.llvm.org/D31601

llvm-svn: 300061
2017-04-12 13:29:25 +00:00
Jonas Paulsson
a16794b9f4 [LoopVectorizer] Improve handling of branches during cost estimation.
The cost for a branch after vectorization is very different depending on if
the vectorizer will if-convert the block (branch is eliminated), or if
scalarized and predicated blocks will be produced (branch duplicated before
each block). There is also the case of remaining scalar branches, such as the
back-edge branch.

This patch handles these cases differently with TTI based cost estimates.

Review: Matthew Simpson
https://reviews.llvm.org/D31175

llvm-svn: 300058
2017-04-12 13:13:15 +00:00
Jonas Paulsson
b15d643e4a [LoopVectorizer, TTI] New method supportsEfficientVectorElementLoadStore()
Since SystemZ supports vector element load/store instructions, there is no
need for extracts/inserts if a vector load/store gets scalarized.

This patch lets Target specify that it supports such instructions by means of
a new TTI hook that defaults to false.

The use for this is in the LoopVectorizer getScalarizationOverhead() method,
which will with this patch produce a smaller sum for a vector load/store on
SystemZ.

New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll

Review: Adam Nemet
https://reviews.llvm.org/D30680

llvm-svn: 300056
2017-04-12 12:41:37 +00:00
Jonas Paulsson
90b172efa0 [SystemZ] TargetTransformInfo cost functions implemented.
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.

Interleaved access vectorization enabled.

BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.

Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631

llvm-svn: 300052
2017-04-12 11:49:08 +00:00
Anna Thomas
b2455558f1 [LV] Avoid vectorizing first order recurrence when phi uses are outside loop
In the vectorization of first order recurrence, we vectorize such
that the last element in the vector will be the one extracted to pass into the
scalar remainder loop. However, this is not true when there is a phi (other
than the primary induction variable) is used outside the loop.
In such a case, we need the value from the second last iteration (i.e.
the phi value), not the last iteration (which would be the phi update).
I've added a test case for this. Also see PR32396.

A follow up patch would generate the correct code gen for such cases,
and turn this vectorization on.

Differential Revision: https://reviews.llvm.org/D31910

Reviewers: mssimpso
llvm-svn: 299985
2017-04-11 21:02:00 +00:00
Matthew Simpson
7a552e6e70 Reapply r298620: [LV] Vectorize GEPs
This patch reapplies r298620. The original patch was reverted because of two
issues. First, the patch exposed a bug in InstCombine that caused the Chromium
builds to fail (PR32414). This issue was fixed in r299017. Second, the patch
introduced a bug in the vectorizer's scalars analysis that caused test suite
builds to fail on SystemZ. The scalars analysis was too aggressive and marked a
memory instruction scalar, even though it was going to be vectorized. This
issue has been fixed in the current patch and several new test cases for the
scalars analysis have been added.

llvm-svn: 299770
2017-04-07 14:15:34 +00:00
Matthew Simpson
a754d575fa [LV] Transform truncations of non-primary induction variables
The vectorizer tries to replace truncations of induction variables with new
induction variables having the smaller type. After r295063, this optimization
was applied to all integer induction variables, including non-primary ones.
When optimizing the truncation of a non-primary induction variable, we still
need to transform the new induction so that it has the correct start value.
This should fix PR32419.

Reference: https://bugs.llvm.org/show_bug.cgi?id=32419
llvm-svn: 298882
2017-03-27 20:07:38 +00:00
Ivan Krasin
2fef6ee778 Revert r298620: [LV] Vectorize GEPs
Reason: breaks linking Chromium with LLD + ThinLTO (a pass crashes)
LLVM bug: https://bugs.llvm.org//show_bug.cgi?id=32413

Original change description:

[LV] Vectorize GEPs

This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.

With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.

The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. And the patch
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before vectoring the memory accesses.

Original Differential Revision: https://reviews.llvm.org/D30710

llvm-svn: 298735
2017-03-24 20:49:43 +00:00
Matthew Simpson
6e2f8c9035 [LV] Vectorize GEPs
This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.

With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.

The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. And the patch
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before vectoring the memory accesses.

Differential Revision: https://reviews.llvm.org/D30710

llvm-svn: 298620
2017-03-23 16:29:58 +00:00
Matthew Simpson
0ed5ec7509 [LV] Delete unneeded scalar GEP creation code
The code for generating scalar base pointers in vectorizeMemoryInstruction is
not needed. We currently scalarize all GEPs and maintain the scalarized values
in VectorLoopValueMap. The GEP cloning in this unneeded code is the same as
that in scalarizeInstruction. The test cases that changed as a result of this
patch changed because we were able to reuse the scalarized GEP that we
previously generated instead of cloning a new one.

Differential Revision: https://reviews.llvm.org/D30587

llvm-svn: 298615
2017-03-23 16:07:21 +00:00
Yi Kong
9f5d4ec9d0 Test commit access
Remove some trailing whitespaces.

llvm-svn: 298379
2017-03-21 14:49:19 +00:00
Gil Rapaport
efbbc3ed40 [LV] Refactor cross-iteration phi's back-patching; NFC
This patch refactors the PHisToFix loop as follows:

- The loop itself now resides in its own method.
- The new method iterates on scalar-loop's header; the PHIsToFix map formerly
  propagated as an output parameter and filled during phi widening is removed.
- The code handling reductions is moved into its own method, similar to the
  existing fixFirstOrderRecurrence().

Differential Revision: https://reviews.llvm.org/D30755

llvm-svn: 297740
2017-03-14 13:50:47 +00:00
Ayal Zaks
91ca0c753e [LV] Refactor Cost Model's selectVectorizationFactor(); NFC
Refactoring Cost Model's selectVectorizationFactor() so that it handles only the
selection of the best VF from a pre-computed range of candidate VF's, extracting
early-exit criteria and the computation of a MaxVF upper-bound to other methods,
all driven by a newly introduced LoopVectorizationPlanner.

Differential Revision: https://reviews.llvm.org/D30653

llvm-svn: 297737
2017-03-14 13:07:04 +00:00
Jonas Paulsson
42e7a2d74b [TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved
getIntrinsicInstrCost() used to only compute scalarization cost based on types.
This patch improves this so that the actual arguments are checked when they are
available, in order to handle only unique non-constant operands.

Tests updates:

Analysis/CostModel/X86/arith-fp.ll
Transforms/LoopVectorize/AArch64/interleaved_cost.ll
Transforms/LoopVectorize/ARM/interleaved_cost.ll

The improvement in getOperandsScalarizationOverhead() to differentiate on
constants made it necessary to update the interleaved_cost.ll tests even
though they do not relate to intrinsics.

Review: Hal Finkel
https://reviews.llvm.org/D29540

llvm-svn: 297705
2017-03-14 06:35:36 +00:00
Gil Rapaport
52cac734e2 [LV] Set memcheck metadata also for VF==1
This commit is a follow-up on r297580. It fixes the FIXME added temporarily
by that commit to keep the removal of Unroller's specialized version of
scalarizeInstruction() an NFC. See https://reviews.llvm.org/D30715 for details.

llvm-svn: 297610
2017-03-13 10:23:46 +00:00
Gil Rapaport
95732c0a6d [LV] A unified scalarizeInstruction() for Vectorizer and Unroller; NFC
Unroller's specialized scalarizeInstruction() is mostly duplicating Vectorizer's
variant. OTOH Vectorizer's scalarizeInstruction() already supports the special
case of VF==1 except for avoiding mask-bit extraction in that case. This patch
removes Unroller's specialized version in favor of a unified method.

The only functional difference between the two variants seems to be setting
memcheck metadata for loads and stores only in Vectorizer's variant, which is a
bug in Unroller. To keep this patch an NFC the unified method doesn't set
memcheck metadata for VF==1.

Differential Revision: https://reviews.llvm.org/D30715

llvm-svn: 297580
2017-03-12 12:31:38 +00:00
Ayal Zaks
0f64e83c30 Test commit.
llvm-svn: 297579
2017-03-12 09:48:06 +00:00
Michael Kuperstein
0a5d356cf1 [SLP] Revert everything that has to do with memory access sorting.
This reverts r293386, r294027, r294029 and r296411.

Turns out the SLP tree isn't actually a "tree" and we don't handle
accessing the same packet of loads in several different orders well,
causing miscompiles.

Revert until we can fix this properly.

llvm-svn: 297493
2017-03-10 18:59:07 +00:00
Daniel Berlin
294f6c0958 Add support for DenseMap/DenseSet count and find using const pointers
Summary:
Similar to SmallPtrSet, this makes find and count work with both const
referneces and const pointers.

Reviewers: dblaikie

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D30713

llvm-svn: 297424
2017-03-10 00:25:26 +00:00
Adam Nemet
d57520ae79 [SLP] Mark values in Dot that need to be extracted
llvm-svn: 297361
2017-03-09 05:48:03 +00:00
Adam Nemet
c956f416cc [SLP] Visualize SLP trees with -view-slp-tree
Analyzing larger trees is extremely difficult with the current debug output so
this adds GraphTraits and DOTGraphTraits on top of the VectorizableTree data
structure.  We can now display the SLP trees with Graphviz as in
https://reviews.llvm.org/F3132765.

I decorated the graph where a value needs to be gathered for one reason or
another.  These are the red nodes.

There are other improvement I am planning to make as I work through my case
here.  For example, I would also like to mark nodes that need to be extracted.

Differential Revision: https://reviews.llvm.org/D30731

llvm-svn: 297303
2017-03-08 18:47:50 +00:00
Matthew Simpson
9d666ee035 [LV] Select legal insert point when fixing first-order recurrences
Because IRBuilder performs constant-folding, it's not guaranteed that an
instruction in the original loop map to an instruction in the vector loop. It
could map to a constant vector instead. The handling of first-order recurrences
was incorrectly making this assumption when setting the IRBuilder's insert
point.

llvm-svn: 297302
2017-03-08 18:18:20 +00:00
Matthew Simpson
ca6ac576f1 [LV] Consider users that are memory accesses in uniforms expansion step
When expanding the set of uniform instructions beyond the seed instructions
(e.g., consecutive pointers), we mark a new instruction uniform if all its
loop-varying users are uniform. We should also allow users that are consecutive
or interleaved memory accesses. This fixes cases where we have an instruction
that is used as the pointer operand of a consecutive access but also used by a
non-memory instruction that later becomes uniform as part of the expansion.

llvm-svn: 297179
2017-03-07 18:47:30 +00:00