1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 14:33:02 +02:00
Commit Graph

1338 Commits

Author SHA1 Message Date
Andrea Di Biagio
9c99df5e6c [DAG] Refactor the shuffle combining logic in DAGCombiner. NFC.
This patch simplifies the logic that combines a pair of shuffle nodes into
a single shuffle if there is a legal mask. Also added comments to better
describe the algorithm. No functional change intended.

llvm-svn: 222522
2014-11-21 11:33:07 +00:00
Hao Liu
9cb82be410 DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)

A hook is added to allow the target to control whether it needs to do such combine.

Reviewed in http://reviews.llvm.org/D6334

llvm-svn: 222510
2014-11-21 06:39:58 +00:00
David Blaikie
60e6c80905 Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
Oliver Stannard
2efad103c3 Fix optimisations of SELECT_CC which assumed result is boolean
Some optimisations in DAGCombiner cause miscompilations for targets that use
TargetLowering::UndefinedBooleanContent, because they assume that the results
of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and
XORed. These optimisations are only valid for targets that use
ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.

This is a follow-up to D6210/r221693.

llvm-svn: 222123
2014-11-17 10:49:31 +00:00
Andrea Di Biagio
5475bc1d1b [DAG] Improved target independent vector shuffle folding logic.
This patch teaches the DAGCombiner how to combine shuffles according to rules:
   shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2)
   shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2)
   shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2)

llvm-svn: 222090
2014-11-15 22:56:25 +00:00
Oliver Stannard
553b791f8f LLVM incorrectly folds xor into select
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
(set_cc !cc x y), which is only correct when the xor has type i1.
Instead, we should check that the constant operand to the xor is all
ones.

llvm-svn: 221693
2014-11-11 17:36:01 +00:00
Andrea Di Biagio
38de0c92c6 [X86] Teach method 'isVectorClearMaskLegal' how to check for legal blend masks.
This patch improves the folding of vector AND nodes into blend operations for
targets that feature SSE4.1. A vector AND node where one of the operands is
a constant build_vector with elements that are either zero or all-ones can be
converted into a blend.

This allows for example to simplify the following code:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
  %2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
  %3 = or <4 x i32> %1, %2
  ret <4 x i32> %3
}

Before this patch llc (-mcpu=corei7) generated:
        andps  LCPI1_0(%rip), %xmm0, %xmm0
        andps  LCPI1_1(%rip), %xmm1, %xmm1
        orps   %xmm1, %xmm0, %xmm0
        retq

With this patch we generate a single 'vpblendw'.

llvm-svn: 221343
2014-11-05 13:04:14 +00:00
Paul Robinson
2b27bd26b6 Normally an 'optnone' function goes through fast-isel, which does not
call DAGCombiner. But we ran into a case (on Windows) where the
calling convention causes argument lowering to bail out of fast-isel,
and we end up in CodeGenAndEmitDAG() which does run DAGCombiner.
So, we need to make DAGCombiner check for 'optnone' after all.

Commit includes the test that found this, plus another one that got
missed in the original optnone work.

llvm-svn: 221168
2014-11-03 18:19:26 +00:00
Louis Gerbarg
6f92b8978d Fix incorrect invariant check in DAG Combine
Earlier this summer I fixed an issue where we were incorrectly combining
multiple loads that had different constraints such alignment, invariance,
temporality, etc. Apparently in one case I made copt paste error and swapped
alignment and invariance.

Tests included.

rdar://18816719

llvm-svn: 220933
2014-10-30 22:21:03 +00:00
NAKAMURA Takumi
ee3b3d3d09 Whitespace.
llvm-svn: 220857
2014-10-29 15:23:11 +00:00
Sanjay Patel
d9b7837012 Use rsqrt (X86) to speed up reciprocal square root calcs
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.

For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).

This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..

See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900

Differential Revision: http://reviews.llvm.org/D5658

llvm-svn: 220570
2014-10-24 17:02:16 +00:00
Benjamin Kramer
a6c059251d Strength reduce constant-sized vectors into arrays. No functionality change.
llvm-svn: 220412
2014-10-22 19:55:26 +00:00
Matt Arsenault
2257f6b589 Add minnum / maxnum codegen
llvm-svn: 220342
2014-10-21 23:01:01 +00:00
Jan Vesely
31f817808d SelectionDAG: Add sext_inreg optimizations
v2: use dyn_cast
    fixup comments
v3: use cast

Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220044
2014-10-17 14:45:25 +00:00
Sanjay Patel
78e4aafd3f Improve sqrt estimate algorithm (fast-math)
This patch changes the fast-math implementation for calculating sqrt(x) from:
y = 1 / (1 / sqrt(x))
to:
y = x * (1 / sqrt(x))

This has 2 benefits: less code / faster code and one less estimate instruction 
that may lose precision.

The only target that will be affected (until http://reviews.llvm.org/D5658 is approved)
is PPC. The difference in codegen for PPC is 2 less flops for a single-precision sqrtf
or vector sqrtf and 4 less flops for a double-precision sqrt. 
We also eliminate a constant load and extra register usage.

Differential Revision: http://reviews.llvm.org/D5682

llvm-svn: 219445
2014-10-09 21:26:35 +00:00
Eric Christopher
fb763ff585 Remove unnecessary include.
llvm-svn: 219368
2014-10-08 23:38:40 +00:00
Eric Christopher
c47c81ea7a Use both the cached TLI and the subtarget off of the DAG in
the DAG combiner.

llvm-svn: 219367
2014-10-08 23:38:39 +00:00
Hal Finkel
0d252ab8da [DAGCombine] Remove SIGN_EXTEND-related inf-loop
The patch's author points out that, despite the function's documentation,
getSetCCResultType is only used to get the SETCC result type (with one
here-removed problematic exception). In one case, getSetCCResultType was being
used to get the predicate type to use for a SELECT node, and then
SIGN_EXTENDing (or truncating) to get the input predicate to match that type.
Unfortunately, this was happening inside visitSIGN_EXTEND, and creating new
SIGN_EXTEND nodes was causing an infinite loop. In addition, this behavior was
wrong if a target was not using ZeroOrNegativeOneBooleanContent. Lastly, the
extension/truncation seems unnecessary here: SELECT is defined as:

  Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not i1
  then the high bits must conform to getBooleanContents.

So here we remove this use of getSetCCResultType and update
getSetCCResultType's documentation to reflect its actual uses.

Patch by deadal nix!

llvm-svn: 219141
2014-10-06 20:19:47 +00:00
Sanjay Patel
7477a9a155 Fast-math fold: x / (y * sqrt(z)) -> x * (rsqrt(z) / y)
The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c:

float distance = sqrt(dx * dx + dy * dy + dz * dz);
float mag = dt / (distance * distance * distance);

Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces:

   addis 3, 2, .LCPI4_2@toc@ha
   lfs 4, .LCPI4_2@toc@l(3)
   addis 3, 2, .LCPI4_1@toc@ha
   lfs 0, .LCPI4_1@toc@l(3)
   fcmpu 0, 1, 4
   beq 0, .LBB4_2
# BB#1:
   frsqrtes 4, 1
   addis 3, 2, .LCPI4_0@toc@ha
   lfs 5, .LCPI4_0@toc@l(3)
   fnmsubs 13, 1, 5, 1
   fmuls 6, 4, 4
   fmadds 1, 13, 6, 5
   fmuls 1, 4, 1
   fres 4, 1                <--- reciprocal of reciprocal square root
   fnmsubs 1, 1, 4, 0
   fmadds 4, 4, 1, 4
.LBB4_2:
   fmuls 1, 4, 2
   fres 2, 1
   fnmsubs 0, 1, 2, 0
   fmadds 0, 2, 0, 2
   fmuls 1, 3, 0
   blr

After the patch, this simplifies to:

frsqrtes 0, 1
addis 3, 2, .LCPI4_1@toc@ha
fres 5, 2
lfs 4, .LCPI4_1@toc@l(3)
addis 3, 2, .LCPI4_0@toc@ha
lfs 7, .LCPI4_0@toc@l(3)
fnmsubs 13, 1, 4, 1
fmuls 6, 0, 0
fnmsubs 2, 2, 5, 7
fmadds 1, 13, 6, 4
fmadds 2, 5, 2, 5
fmuls 0, 0, 1
fmuls 0, 0, 2
fmuls 1, 3, 0
blr

Differential Revision: http://reviews.llvm.org/D5628

llvm-svn: 219139
2014-10-06 19:31:18 +00:00
Chandler Carruth
a54972693a [x86, dag] Teach the DAG combiner to prune inputs toa vector_shuffle
that are unused.

This allows the combiner to delete math feeding shuffles where the math
isn't actually necessary. This improves some of the vperm2x128 tests
that regressed when the vector shuffle lowering started actually
generating vperm instructions rather than forcibly decomposing them.

Sadly, this isn't enough to get this *really* right because we still
form a completely unnecessary permutation. To fix that, we also need to
fold shuffles which just rearrange concatenated or inserted subvectors.

llvm-svn: 219086
2014-10-05 19:14:34 +00:00
Sanjay Patel
abcb3acee5 Use the target-specified iteration count to opt out of any further refinement of an estimate. NFC.
llvm-svn: 218700
2014-09-30 20:44:23 +00:00
Sanjay Patel
c095454ca2 Split the estimate() interface into separate functions for each type. NFC.
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use it's 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...

llvm-svn: 218698
2014-09-30 20:28:48 +00:00
Andrea Di Biagio
a5f619dff6 [DAG] Check in advance if a build_vector has a legal type before attempting to convert it into a shuffle.
Currently, the DAG Combiner only tries to convert type-legal build_vector nodes
into shuffles. This patch simply moves the logic that checks if a
build_vector has a legal value type up before we even start analyzing the
operands. This allows to early exit immediately from method
'visitBUILD_VECTOR' if the node type is known to be illegal.

No functional change intended.

llvm-svn: 218677
2014-09-30 15:30:22 +00:00
James Molloy
3c39c7c1b4 [AArch64] Redundant store instructions should be removed as dead code
If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed.

This problem is found in spec2006-197.parser.

For example,
  stur    w10, [x11, #-4]
  stur    w10, [x11, #-4]
Then one of the two stur instructions can be removed.

Patch by David Xu!

llvm-svn: 218569
2014-09-27 17:02:54 +00:00
Sanjay Patel
98a98574c5 Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2).
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

And:

z = y / x

into:

z = y * rcpe(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.

Differential Revision: http://reviews.llvm.org/D5484

llvm-svn: 218553
2014-09-26 23:01:47 +00:00
David Xu
c351ac640d Revert patch ofr218493
llvm-svn: 218494
2014-09-26 02:28:03 +00:00
David Xu
43a5d5bdc1 Redundant store instructions should be removed as dead code
llvm-svn: 218493
2014-09-26 02:02:09 +00:00
Sanjay Patel
14a8712c3d Use SDValue bool operator to reduce code. No functional change.
llvm-svn: 218314
2014-09-23 16:24:20 +00:00
Sanjay Patel
1235f854b3 Refactor reciprocal square root estimate into target-independent function; NFC.
This is purely a plumbing patch. No functional changes intended.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner.

Next steps:

    The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added.
    Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner.
    X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added.

Differential Revision: http://reviews.llvm.org/D5425

llvm-svn: 218219
2014-09-21 15:19:15 +00:00
Hal Finkel
0c0c256ad7 Optionally enable more-aggressive FMA formation in DAGCombine
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction, and furthermore, forming the FMA leaving the FMUL can still allow
for higher overall throughput and decreased critical-path length.

Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.

Patch by Olivier Sallenave, thanks!

llvm-svn: 218120
2014-09-19 11:42:56 +00:00
Sanjay Patel
7c6f056447 Replace dead links to "Hacker's Delight" with general references. NFC.
llvm-svn: 217814
2014-09-15 19:47:44 +00:00
Matt Arsenault
dbc82e3483 Add DAG combine for shl + add of constants.
Do
 (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)

This is already done for multiplies, but since multiplies
by powers of two are turned into shifts, we also need
to handle it here.

This might want checks for isLegalAddImmediate to avoid
transforming an add of a legal immediate with one that isn't.

llvm-svn: 217610
2014-09-11 17:34:19 +00:00
Sanjay Patel
099c1958cc Combine fmul vector FP constants when unsafe math is allowed.
This is an extension of the change made with r215820:
http://llvm.org/viewvc/llvm-project?view=revision&revision=215820

That patch allowed combining of splatted vector FP constants that are multiplied.

This patch allows combining non-uniform vector FP constants too by relaxing the
check on the type of vector. Also, canonicalize a vector fmul in the
same way that we already do for scalars - if only one operand of the fmul is a
constant, make it operand 1. Otherwise, we miss potential folds.

This fold is also done by -instcombine, but it's possible that extra
fmuls may have been generated during lowering.

Differential Revision: http://reviews.llvm.org/D5254

llvm-svn: 217599
2014-09-11 15:45:27 +00:00
David Xu
4d7f013423 Build correct vector filled with undef nodes
llvm-svn: 217570
2014-09-11 05:10:28 +00:00
Sanjay Patel
b6481eb090 Group unsafe fmul math folds together for easier reading. No functional change.
llvm-svn: 217399
2014-09-08 20:16:42 +00:00
Sanjay Patel
3803a37086 Fix the FIXME that was just added in r217390 - remove a bunch of redundant fold permutations.
The testcases for these folds already exist in test/CodeGen/X86/fp-fast.ll.

llvm-svn: 217393
2014-09-08 18:22:51 +00:00
Sanjay Patel
510f5b3b43 group unsafe math folds together for easier reading
Also added a FIXME regarding redundant folds for non-canonicalized constants.

llvm-svn: 217390
2014-09-08 17:32:19 +00:00
Sanjay Patel
5c895f97e3 Allow vector fsub ops with constants to get the same optimizations as scalars.
This problem is bigger than just fsub, but this is the minimum fix to solve
fneg for PR20556 ( http://llvm.org/bugs/show_bug.cgi?id=20556 ), and we solve
zero subtraction with the same change.

llvm-svn: 217286
2014-09-05 22:26:22 +00:00
Sanjay Patel
0fe727a669 clean up; NFC
llvm-svn: 217278
2014-09-05 20:55:46 +00:00
Matt Arsenault
6635d39450 Fix interference caused by fmul 2, x -> fadd x, x
If an fmul was introduced by lowering, it wouldn't be folded
into a multiply by a constant since the earlier combine would
have replaced the fmul with the fadd.

llvm-svn: 216932
2014-09-02 19:02:53 +00:00
Matt Arsenault
2e1586e561 Fix comment and unnecessary check for FP build_vectors.
This was copy-paste from the integer version, but
FP build_vectors don't truncate.

llvm-svn: 216928
2014-09-02 18:33:51 +00:00
Hal Finkel
9b47ecfb28 Enable splitting indexing from loads with TargetConstants
When I recommitted r208640 (in r216898) I added an exclusion for TargetConstant
offsets, as there is no guarantee that a backend can handle them on generic
ADDs (even if it generates them during address-mode matching) -- and,
specifically, applying this transformation directly with TargetConstants caused
a self-hosting failure on PPC64. Ignoring all TargetConstants, however, is less
than ideal. Instead, for non-opaque constants, we can convert them into regular
constants for use with the generated ADD (or SUB).

llvm-svn: 216908
2014-09-02 16:05:23 +00:00
Hal Finkel
afd5700add Revert "Revert '[DAGCombiner] Split up an indexed load if only the base pointer value is live'"
I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The
underlying cause of the failure is that pre-inc loads with increments
represented by ISD::TargetConstants were being transformed into ISD:::ADDs with
ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they
were selected as invalid r+r adds.

This recommits r208640, rebased and with an exclusion for ISD::TargetConstant
increments. This behavior seems correct, although in the future we might want
to ask the target to split out the indexing that uses ISD::TargetConstants.

Unfortunately, I don't yet have small test case where the relevant invalid
'add' instruction is not itself dead (and thus eliminated by
DeadMachineInstructionElim -- sometimes bugpoint is too good at removing things)

Original commit message (by Adam Nemet):

Right now the load may not get DCE'd because of the side-effect of updating
the base pointer.

This can happen if we lower a read-modify-write of an illegal larger type
(e.g. i48) such that the modification only affects one of the subparts (the
lower i32 part but not the higher i16 part).  See the testcase.

In order to spot the dead load we need to revisit it when SimplifyDemandedBits
decided that the value of the load is masked off.  This is the
CommitTargetLoweringOpt piece.

I checked compile time with ARM64 by sending SPEC bitcode files through llc.
No measurable change.

Fixes <rdar://problem/16031651>

llvm-svn: 216898
2014-09-02 06:24:04 +00:00
Sanjay Patel
644b751800 Move FNEG next to FABS and make them more similar, so it's easier that they can be refactored. NFC.
llvm-svn: 216688
2014-08-28 21:51:37 +00:00
Owen Anderson
667ba798e8 Do not introduce new shuffle patterns after operation legalization if SHUFFLE_VECTOR
was marked custom.  The target independent DAG combine has no way to know if
the shuffles it is introducing are ones that the target could support or not.

llvm-svn: 216678
2014-08-28 17:49:58 +00:00
Sanjay Patel
ec50268a80 Janitorial services: "Don’t duplicate function or class name at the beginning of the comment."
llvm-svn: 216674
2014-08-28 16:29:51 +00:00
Sanjay Patel
be9c641d8e Remove local TLI vars that are just duplicates of the class var. No functional change.
llvm-svn: 216673
2014-08-28 16:01:50 +00:00
Sanjay Patel
049add0c08 Use local vars to improve readability. No functional change.
Completes what was started in r216611 and r216623. 
Used const refs instead of pointers; not sure if one is preferable to the other.

llvm-svn: 216672
2014-08-28 15:53:16 +00:00
Sanjay Patel
df00ebc59f Use local variable in visitFADD. No functional change.
llvm-svn: 216623
2014-08-27 21:42:42 +00:00
Sanjay Patel
aa5b60a6f9 Group unsafe-math optimizations for fsub into one block. No functional change.
llvm-svn: 216616
2014-08-27 20:57:52 +00:00
Sanjay Patel
62aafe2e1c Use local variable to improve readability.
No functional change intended.

llvm-svn: 216611
2014-08-27 20:40:31 +00:00
Craig Topper
43cee2f5fc Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created.
llvm-svn: 216525
2014-08-27 05:25:25 +00:00
Craig Topper
c2e0ae6754 Use range based for loops to avoid needing to re-mention SmallPtrSet size.
llvm-svn: 216351
2014-08-24 23:23:06 +00:00
Sanjay Patel
7498546daf name change: isPow2DivCheap -> isPow2SDivCheap
isPow2DivCheap

That name doesn't specify signed or unsigned.

Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong:

   srl/add/sra

is not the general sequence for signed integer division by power-of-2. We need one more 'sra':

   sra/srl/add/sra

That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case.

This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods.

No functional change intended.

Differential Revision: http://reviews.llvm.org/D5010

llvm-svn: 216237
2014-08-21 22:31:48 +00:00
Benjamin Kramer
54a0e07417 DAGCombiner: Make concat_vector combine safe for EVTs and concat_vectors with many arguments.
PR20677

llvm-svn: 216175
2014-08-21 13:28:02 +00:00
Matt Arsenault
71bb8180a2 Fix fmul combines with constant splat vectors
Fixes things like fmul x, 2 -> fadd x, x

llvm-svn: 215820
2014-08-16 10:14:19 +00:00
Andrea Di Biagio
e85b8c6f4a [DAGCombiner] Improve the folding of target independet shuffles to Undef.
When combining a pair of shuffle nodes, check if the combined shuffle mask is
trivially Undef. In case, immediately fold that pair of shuffles to Undef.

The lack of checks for undef masks was the root-cause of a poor-codegen bug
in the dag combiner.

Example:
  %1 = shufflevector <4 x i32> %A, <4 x i32> %B, <4 x i32> <i32 4, i32 1, i32 1, i32 6>
  %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 1, i32 6>
  %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 3>

Before this patch, on x86 (with -mcpu=corei7) we failed to fold the entire
sequence to Undef value and therefore we generated:
  shufps $-123, %xmm1, $xmm0
  pshufd $-46, %xmm0, %xmm0

With this patch, the entire shuffle sequence is folded to Undef and no
shuffles are generated in the output assembly.

Added new test cases to test 'combine-vec-shuffle-5.ll'.

llvm-svn: 215797
2014-08-16 00:29:44 +00:00
Sanjay Patel
e7588bf533 optimize vector fneg of bitcasted integer value
This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops.

This patch is very similar to a fabs patch committed at r214892.

Differential Revision: http://reviews.llvm.org/D4852

llvm-svn: 215646
2014-08-14 15:15:28 +00:00
Chandler Carruth
b65d715ac5 [SDAG] Fix a bug in the DAG combiner where we would fail to return the
input node after manually adding it to the worklist and using CombineTo.

Once we use CombineTo the input node may have been deleted. Despite this
being *completely confusing* and somewhat broken, the only way to
"correctly" return from a DAG combine after potentially deleting the
input node is to return *that exact node*....

But really, this code should just never have used CombineTo. It won't do
what it wants (returning the node as mentioned above just causes the
combine to infloop). The correct way to combine away a casted load to
a load of the correct type is to RAUW the chain directly and then return
the loaded value to replace the actual value node.

I managed to find this with the vector shuffle fuzzer even though it
clearly has nothing at all to do with vector shuffles and rather those
happen to trigger a load of a constant pool that hits this combine *just
right*. I've included the test as it is small and a nice stress test
that the infrastructure isn't asserting.

llvm-svn: 215622
2014-08-14 08:18:34 +00:00
Andrea Di Biagio
2e80a6ad4d [DAGCombiner] Improved target independent vector shuffle combine rule.
This patch improves the existing algorithm in DAGCombiner that
attempts to fold shuffles according to rule:
  shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3)

Before this change, there were cases where the DAGCombiner conservatively
avoided folding shuffles even if the resulting mask would have been legal.
That is because the algorithm wrongly assumed that commuting
an illegal shuffle mask would always produce an illegal mask.

With this change, we now correctly compute the commuted shuffle mask before
calling method 'isShuffleMaskLegal' on it.
On X86, this improves for example the codegen for the following function:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7>
  %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3>
  ret <4 x i32> %2
}

Before this change the X86 backend (-mcpu=corei7) generated
the following assembly code for function @test:
  shufps $-23, %xmm0, %xmm1  # xmm1 = xmm1[1,2],xmm0[2,3]
  movhlps %xmm1, %xmm1       # xmm1 = xmm1[1,1]
  movaps %xmm1, %xmm0

Now we produce:
  movhlps %xmm0, %xmm0       # xmm0 = xmm0[1,1]

Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly
fold according to the above-mentioned rule.

llvm-svn: 215555
2014-08-13 16:09:40 +00:00
Michael J. Spencer
045df0bbd3 [x86] Fold extract_vector_elt of a load into the Load's address computation.
llvm-svn: 215409
2014-08-11 23:49:33 +00:00
Sanjay Patel
1f56d11869 Optimize vector fabs of bitcasted constant integer values.
Allow vector fabs operations on bitcasted constant integer values to be optimized
in the same way that we already optimize scalar fabs.

So for code like this:
%bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
%ret = bitcast <2 x float> %fabs to i64

Instead of generating something like this:

movabsq (constant pool loadi of mask for sign bits)
vmovq   (move from integer register to vector/fp register)
vandps  (mask off sign bits)
vmovq   (move vector/fp register back to integer return register)

We should generate:

mov     (put constant value in return register)

I have also removed a redundant clause in the first 'if' statement:
N0.getOperand(0).getValueType().isInteger()

is the same thing as:
IntVT.isInteger()

Testcases for x86 and ARM added to existing files that deal with vector fabs.
One existing testcase for x86 removed because it is no longer ideal.

For more background, please see:
http://reviews.llvm.org/D4770

And:
http://llvm.org/bugs/show_bug.cgi?id=20354

Differential Revision: http://reviews.llvm.org/D4785

llvm-svn: 214892
2014-08-05 17:35:22 +00:00
Chandler Carruth
c3af9263ee [SDAG] Fix a really, really terrible bug in the DAG combiner.
This code is completely wrong. It is also dead, as if it were to *ever*
run, it would crash. Fortunately, after my work to the combiner, it is
at least *possible* to reach the code, and llvm-stress has found a test
case. Thanks to Patrick for reporting.

It would be really good if anyone who remembers how this code works and
what it was intended to do could add some more obvious test coverage
instead of my completely contrived and reduced test case. My test case
was so brittle I left a bread crumb comment in it to help the next
person to stumble on it and not know what it was actually testing for.

llvm-svn: 214785
2014-08-04 21:29:59 +00:00
Eric Christopher
99307e99a2 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

llvm-svn: 214781
2014-08-04 21:25:23 +00:00
Chandler Carruth
9918929946 [x86] Don't add nodes to the combined set (and prune subsequent
combines) until they are legal.

Doing it the old way could, when the stars align *just* right, cause
a node to get into the combine set prior to being legalized. Then, when
the same node showed up as an operand to another node later on (but not
so much later on that it had been deleted as dead) we would fail to add
it back to the worklist thinking it had already been combined. This
would in turn cause it to not be legalized. Fortunately, we can also
walk the operands looking for uncombined (and thus potentially
un-legalized) nodes late. It will still ensure that we walk all operands
of all nodes and send all of them through both the legalizer without
changes and the combiner at least once. (Which was the original goal of
this).

I have a test case for this bug, but it is terribly brittle. For
example, it will stop finding the bug the moment I enable the new
shuffle lowering. I don't yet have any test case that reliably exercises
this bug, and it isn't clear that it will be possible to craft one. It
is entirely possible that with the new shuffle lowering the two forms of
doing this are precisely equivalent. That doesn't mean we shouldn't take
the more conservative approach of insisting on things in the combined
set having survived the legalizer.

llvm-svn: 214673
2014-08-03 23:10:59 +00:00
Sanjay Patel
70baef2f02 fix for PR20354 - Miscompile of fabs due to vectorization
This is intended to be the minimal change needed to fix PR20354 ( http://llvm.org/bugs/show_bug.cgi?id=20354 ). The check for a vector operation was wrong; we need to check that the fabs itself is not a vector operation.

This patch will not generate the optimal code. A constant pool load and 'and' op will be generated instead of just returning a value that we can calculate in advance (as we do for the scalar case). I've put a 'TODO' comment for that here and expect to have that patch ready soon.

There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there and expect to have another patch for that too.

llvm-svn: 214670
2014-08-03 22:48:23 +00:00
James Molloy
240087c61a [AArch64] Teach DAGCombiner that converting two consecutive loads into a vector load is not a good transform when paired loads are available.
The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers!

llvm-svn: 214634
2014-08-02 14:51:24 +00:00
Chandler Carruth
f8f86e7b92 [SDAG] Refactor the code which deletes nodes in the DAG combiner to do
so using a single helper which adds operands back onto the worklist.

Several places didn't rigorously do this but a couple already did.
Factoring them together and doing it rigorously is important to delete
things recursively early on in the combiner and get a chance to see
accurate hasOneUse values. While no existing test cases change, an
upcoming patch to add DAG combining logic for PSHUFB requires this to
work correctly.

llvm-svn: 214623
2014-08-02 10:02:07 +00:00
Owen Anderson
6c991acdcf Fix issues with ISD::FNEG and ISD::FMA SDNodes where they would not be constant-folded
during DAGCombine in certain circumstances.  Unfortunately, the circumstances required
to trigger the issue seem to require a pretty specific interaction of DAGCombines,
and I haven't been able to find a testcase that reproduces on X86, ARM, or AArch64.
The functionality added here is replicated in essentially every other DAG combine,
so it seems pretty obviously correct.

llvm-svn: 214622
2014-08-02 08:45:33 +00:00
Louis Gerbarg
751c622bb4 White space fix.
llvm-svn: 214455
2014-07-31 22:57:46 +00:00
Louis Gerbarg
8048e52537 Make sure no loads resulting from load->switch DAGCombine are marked invariant
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.

This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.

llvm-svn: 214449
2014-07-31 21:45:05 +00:00
Louis Gerbarg
909cb65fe7 Retain alignment requirements for load->selects modified by DAGCombine
DAGCombine may choose to rewrite graphs where two loads feed a select into
graphs where a select of two addresses feed a load. While it sanity checks the
loads to make sure they are broadly equivalent it currently just uses the
alignment restriction of the left node. In cases where the right node has
stronger alignment requiresment this may lead to bad codegen, such as generating
an aligned load where an unaligned load is required. This patch makes the
combine generate a load with an alignment that is the same as whichever is more
restrictive of the two alignments.

Tests included.

rdar://17762530

llvm-svn: 214322
2014-07-30 18:24:41 +00:00
Chandler Carruth
4123a2c212 [SDAG] Add DEBUG logging to the legalizer, fixing a "bug" found by
inspection in the proccess, and shuffle the logging in the DAG combiner
around a bit.

With this it is much easier to follow what the legalizer is doing. It
should even accurately present most of the strange legalization
operations where a single node is replaced by multiple nodes, etc. There
is still some information lost (we log SDNodes not SDValues so we don't
log which result is used for which thing), but I think this is much
closer to a usable system. Notably, this will make it *much* more
apparant when legalization is actually happening inside the combiner, or
when there is a cycle caused by interactions of the legalizer and the
combiner.

The "bug" I fixed here I'm not sure is remotely possible to trigger. We
were only adding one of the nodes in a replacement to the updated set
rather than all of the nodes in the replacement. Realistically, the
worst result of this are nodes not getting back onto the worklist in the
DAG combiner. I doubt it is possible to trigger this today, and
I certainly don't have any ideas about how, but this at least brings the
code into alignment with the principled operation of the routine.

llvm-svn: 214105
2014-07-28 17:55:07 +00:00
Chandler Carruth
581bf74660 [SDAG] When performing post-legalize DAG combining, run the legalizer
over each node in the worklist prior to combining.

This allows the combiner to produce new nodes which need to go back
through legalization. This is particularly useful when generating
operands to target specific nodes in a post-legalize DAG combine where
the operands are significantly easier to express as pre-legalized
operations. My immediate use case will be PSHUFB formation where we need
to build a constant shuffle mask with a build_vector node.

This also refactors the relevant functionality in the legalizer to
support this, and updates relevant tests. I've spoken to the R600 folks
and these changes look like improvements to them. The avx512 change
needs to be investigated, I suspect there is a disagreement between the
legalizer and the DAG combiner there, but it seems a minor issue so
leaving it to be re-evaluated after this patch.

Differential Revision: http://reviews.llvm.org/D4564

llvm-svn: 214020
2014-07-26 05:49:40 +00:00
Matt Arsenault
977e6a229d Store nodes only have 1 result.
llvm-svn: 213928
2014-07-25 07:56:42 +00:00
Chandler Carruth
033ee8b5b3 [SDAG] Start plumbing an assert into SDValues that we don't form one
with a result number outside the range of results for the node.

I don't know how we managed to not really check this very basic
invariant for so long, but the code is *very* broken at this point.
I have over 270 test failures with the assert enabled. I'm committing it
disabled so that others can join in the cleanup effort and reproduce the
issues. I've also included one of the obvious fixes that I already
found. More fixes to come.

llvm-svn: 213926
2014-07-25 07:23:23 +00:00
Chandler Carruth
8182e043d9 [SDAG] Introduce a combined set to the DAG combiner which tracks nodes
which have successfully round-tripped through the combine phase, and use
this to ensure all operands to DAG nodes are visited by the combiner,
even if they are only added during the combine phase.

This is critical to have the combiner reach nodes that are *introduced*
during combining. Previously these would sometimes be visited and
sometimes not be visited based on whether they happened to end up on the
worklist or not. Now we always run them through the combiner.

This fixes quite a few bad codegen test cases lurking in the suite while
also being more principled. Among these, the TLS codegeneration is
particularly exciting for programs that have this in the critical path
like TSan-instrumented binaries (although I think they engineer to use
a different TLS that is faster anyways).

I've tried to check for compile-time regressions here by running llc
over a merged (but not LTO-ed) clang bitcode file and observed at most
a 3% slowdown in llc. Given that this is essentially a worst case (none
of opt or clang are running at this phase) I think this is tolerable.
The actual LTO case should be even less costly, and the cost in normal
compilation should be negligible.

With this combining logic, it is possible to re-legalize as we combine
which is necessary to implement PSHUFB formation on x86 as
a post-legalize DAG combine (my ultimate goal).

Differential Revision: http://reviews.llvm.org/D4638

llvm-svn: 213898
2014-07-24 22:15:28 +00:00
Hal Finkel
9be4aefa57 AA metadata refactoring (introduce AAMDNodes)
In order to enable the preservation of noalias function parameter information
after inlining, and the representation of block-level __restrict__ pointer
information (etc.), additional kinds of aliasing metadata will be introduced.
This metadata needs to be carried around in AliasAnalysis::Location objects
(and MMOs at the SDAG level), and so we need to generalize the current scheme
(which is hard-coded to just one TBAA MDNode*).

This commit introduces only the necessary refactoring to allow for the
introduction of other aliasing metadata types, but does not actually introduce
any (that will come in a follow-up commit). What it does introduce is a new
AAMDNodes structure to hold all of the aliasing metadata nodes associated with
a particular memory-accessing instruction, and uses that structure instead of
the raw MDNode* in AliasAnalysis::Location, etc.

No functionality change intended.

llvm-svn: 213859
2014-07-24 12:16:19 +00:00
Chad Rosier
2cb7fd5dae [AArch64] Lower sdiv x, pow2 using add + select + shift.
The target-independent DAGcombiner will generate:
asr w1, X, #31 w1 = splat sign bit.
add X, X, w1, lsr #28 X = X + 0 or pow2-1
asr w0, X, asr #4 w0 = X/pow2

However, the add + shifts is expensive, so generate:
add w0, X, 15 w0 = X + pow2-1
cmp X, wzr X - 0
csel X, w0, X, lt X = (X < 0) ? X + pow2-1 : X;
asr w0, X, asr 4 w0 = X/pow2

llvm-svn: 213758
2014-07-23 14:57:52 +00:00
Chandler Carruth
908e62868c [SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate
insertions.

The old behavior could cause arbitrarily bad memory usage in the DAG
combiner if there was heavy traffic of adding nodes already on the
worklist to it. This commit switches the DAG combine worklist to work
the same way as the instcombine worklist where we null-out removed
entries and only add new entries to the worklist. My measurements of
codegen time shows slight improvement. The memory utilization is
unsurprisingly dominated by other factors (the IR and DAG itself
I suspect).

This change results in subtle, frustrating churn in the particular order
in which DAG combines are applied which causes a number of minor
regressions where we fail to match a pattern previously matched by
accident. AFAICT, all of these should be using AddToWorklist to directly
or should be written in a less brittle way. None of the changes seem
drastically bad, and a few of the changes seem distinctly better.

A major change required to make this work is to significantly harden the
way in which the DAG combiner handle nodes which become dead
(zero-uses). Previously, we relied on the ability to "priority-bump"
them on the combine worklist to achieve recursive deletion of these
nodes and ensure that the frontier of remaining live nodes all were
added to the worklist. Instead, I've introduced a routine to just
implement that precise logic with no indirection. It is a significantly
simpler operation than that of the combiner worklist proper. I suspect
this will also fix some other problems with the combiner.

I think the x86 changes are really minor and uninteresting, but the
avx512 change at least is hiding a "regression" (despite the test case
being just noise, not testing some performance invariant) that might be
looked into. Not sure if any of the others impact specific "important"
code paths, but they didn't look terribly interesting to me, or the
changes were really minor. The consensus in review is to fix any
regressions that show up after the fact here.

Thanks to the other reviewers for checking the output on other
architectures. There is a specific regression on ARM that Tim already
has a fix prepped to commit.

Differential Revision: http://reviews.llvm.org/D4616

llvm-svn: 213727
2014-07-23 07:08:53 +00:00
Chandler Carruth
a3d43d8f9b [SDAG,cleanup] Switch the DAG combiner over to use the spelling
'Worklist' consistently rather than a deeply confusing mixture of
'WorkList' and 'Worklist'.

Notably, the very 'WorkList' of the DAG combiner was exposed to target
specific DAG combines under an interface 'AddToWorklist' which was
implemented by in turn calling 'AddToWorkList' in the combiner. This has
sent me circling with the wrong case in grep one too many times.

I chose to normalize on 'Worklist' because that one won the grep-vote
for llvm/lib/... by a hundered hits or so, and it is used in places
relatively "canonical" such as InstCombine's Worklist. Let's all jsut
pick this casing, whether "correct", "good", or "bad" and be
consistent...

llvm-svn: 213506
2014-07-21 08:56:44 +00:00
Chandler Carruth
04f31dc299 [SDAG] Rather than using a narrow test against the one dummy node on the
stack, filter all handle nodes from the DAG combiner worklist.

This will also handle cases where other handle nodes might be
(erroneously) added to the worklist and then cause bugs and explosions
when deleted. For example, when running the legalizer within the DAG
combiner, there are times when other handle nodes are used and can end
up here.

llvm-svn: 213505
2014-07-21 08:32:31 +00:00
Andrea Di Biagio
626e5271d3 [DAGCombiner] Improve the shuffle-vector folding logic.
Canonicalize shuffles according to rules:
 *  shuffle(A, shuffle(A, B)) -> shuffle(shuffle(A,B), A)
 *  shuffle(B, shuffle(A, B)) -> shuffle(shuffle(A,B), B)
 *  shuffle(B, shuffle(A, Undef)) -> shuffle(shuffle(A, Undef), B)

This patch helps identifying more shuffle pairs that could be combined reusing
the already existing rules in the DAGCombiner.

Added new test 'combine-vec-shuffle-5.ll' to verify that the canonicalized
shuffles are now folded into a single shuffle node by the DAGCombiner.
Added more test cases to 'combine-vec-shuffle-4.ll'.

llvm-svn: 213504
2014-07-21 07:30:54 +00:00
Michael J. Spencer
ff351b2bdb Revert "[x86] Fold extract_vector_elt of a load into the Load's address computation."
There's a bug where this can create cycles in the DAG. It will take a bit
to fix, so I'm backing it out for now.

llvm-svn: 213339
2014-07-18 00:15:50 +00:00
Tim Northover
c6c02a43ba CodeGen: don't form illegail EXTLOAD operations.
It turns out that in most cases (the main exception being i1-related
types) once these operations are formed we cannot separate them and
the targets end up having to deal with them whether they want to or
not.

This is not a good situation, and a more reasonable default can be
formed by ackowledging this and having targets leave them as Legal.
Only x86 seems to be affected (other targets don't even try marking
the operation Expand).

Mostly there's no visible change here yet, but it will be useful to
have truly expanded EXTLOADS for MVT::f16 softening support.

llvm-svn: 213162
2014-07-16 15:37:24 +00:00
Andrea Di Biagio
454620d57b [DAGCombiner] Add more rules to fold shuffles.
This patch adds two new rules to the DAGCombiner:
 1.  shuffle (shuffle A, Undef, M0), B, M1 -> shuffle A, B, M2
 2.  shuffle (shuffle A, Undef, M0), A, M1 -> shuffle A, Undef, M2

We only do this if the combined shuffle is legal for the target.

Example:
;;
define <4 x float> @test(<4 x float> %a, <4 x float> %b) {
  %1 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32><i32 6, i32 0, i32 1, i32 7>
  %2 = shufflevector <4 x float> %1, <4 x float> %b, <4 x i32><i32 1, i32 2, i32 4, i32 5>
  ret <4 x i32> %2
}
;;

(using llc -mcpu=corei7 -march=x86-64)
Before, the x86 backend generated:
  pshufd $120, %xmm0, %xmm0
  shufps $-108, %xmm0, %xmm1
  movaps %xmm1, %xmm0

Now the x86 backend generates:
  movsd %xmm1, %xmm0

llvm-svn: 213069
2014-07-15 13:26:28 +00:00
Andrea Di Biagio
167e00fc99 [DAGCombiner] Avoid calling method 'isShuffleMaskLegal' on illegal vector types.
This patch fixes a crasher in method 'DAGCombiner::visitOR' due to an invalid
call to method 'isShuffleMaskLegal'. On x86, method 'isShuffleMaskLegal'
always expects a legal vector value type in input.

With this patch, we immediately check if the input OR dag node has a legal
vector type; we only try to fold a OR dag node into a single shufflevector
if we know that the resulting shuffle will have a legal type.
This is to avoid calling method 'isShuffleMaskLegal' on a potentially
illegal vector value type.

Added a new test-case to file 'CodeGen/X86/combine-or.ll' to verify that
DAGCombiner doesn't crash in the attempt to check/combine an OR between shuffles
with illegal types.

llvm-svn: 213020
2014-07-15 00:02:32 +00:00
Andrea Di Biagio
1b83284869 [DAGCombiner] Add more rules to combine shuffle vector dag nodes.
This patch teaches the DAGCombiner how to fold a pair of shuffles
according to rules:
  1.  shuffle(shuffle A, B, M0), B, M1) -> shuffle(A, B, M2)
  2.  shuffle(shuffle A, B, M0), A, M1) -> shuffle(A, B, M3)

The new rules would only trigger if the resulting shuffle has legal type and
legal mask.

Added test 'combine-vec-shuffle-3.ll' to verify that DAGCombiner correctly
folds shuffles on x86 when the resulting mask is legal. Also added some negative
cases to verify that we avoid introducing illegal shuffles.

llvm-svn: 213001
2014-07-14 22:46:26 +00:00
Andrea Di Biagio
4242d12adc [DAGCombiner] Fix a crash caused by a missing check for legal type when trying to fold shuffles.
Verify that DAGCombiner does not crash when trying to fold a pair of shuffles
according to rule (added at r212539):
  (shuffle (shuffle A, Undef, M0), Undef, M1) -> (shuffle A, Undef, M2)

The DAGCombiner avoids folding shuffles if the resulting shuffle dag node
is not legal for the target. That means, the resulting shuffle must have
legal type and legal mask.

Before, the DAGCombiner only called method
'TargetLowering::isShuffleMaskLegal' to check if it was "safe" to fold according
to the above-mentioned rule. However, this caused a crash in the x86 backend
since method 'isShuffleMaskLegal' always expects to be called on a
legal vector type.

llvm-svn: 212915
2014-07-13 21:02:14 +00:00
Matt Arsenault
49a604853d Revert "Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine.""
Don't try to convert the select condition type.

llvm-svn: 212750
2014-07-10 18:21:04 +00:00
Andrea Di Biagio
29af13b774 [DAG] Further improve the logic in DAGCombiner that folds a pair of shuffles into a single shuffle if the resulting mask is legal.
This patch teaches the DAGCombiner how to fold shuffles according to the
following new rules:
  1. shuffle(shuffle(x, y), undef) -> x
  2. shuffle(shuffle(x, y), undef) -> y
  3. shuffle(shuffle(x, y), undef) -> shuffle(x, undef)
  4. shuffle(shuffle(x, y), undef) -> shuffle(y, undef)

The backend avoids to combine shuffles according to rules 3. and 4. if
the resulting shuffle does not have a legal mask. This is to avoid introducing
illegal shuffles that are potentially expanded into a sub-optimal sequence of
target specific dag nodes during vector legalization.

Added test case combine-vec-shuffle-2.ll to verify that we correctly triggers
the new rules when combining shuffles.

llvm-svn: 212748
2014-07-10 18:04:55 +00:00
NAKAMURA Takumi
ea850a0edc Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine."
This caused miscompilation on, at least, x86-64. SExt(i1 cond) confused other optimizations.

llvm-svn: 212708
2014-07-10 11:37:28 +00:00
Daniel Sanders
a59d10a761 Make it possible for ints/floats to return different values from getBooleanContents()
Summary:
On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer
comparisons return 0 or 1.

Updated the various uses of getBooleanContents. Two simplifications had to be
disabled when float and int boolean contents differ:
- ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially
  discoverable (i.e. when the condition of the VSELECT is a SETCC node).
- visitVSELECT (select C, 0, 1) -> (xor C, 1).
  Come to think of it, this one could test for the common case of 'C'
  being a SETCC too.

Preserved existing behaviour for all other targets and updated the affected
MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low'
variable was counting in the wrong direction because it thought it could simply
add the result of the comparison.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D4389

llvm-svn: 212697
2014-07-10 10:18:12 +00:00
Hao Liu
cb23c5a35e [AArch64]Fix an assertion failure in DAG Combiner about concating 2 build_vector.
llvm-svn: 212677
2014-07-10 03:41:50 +00:00
Matt Arsenault
479a1f90e1 Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine.
Do this if the truncate is free and the select is legal.

llvm-svn: 212640
2014-07-09 19:12:07 +00:00
Chandler Carruth
14de70ee01 [SDAG] At the suggestion of Hal, switch to an output parameter that
tracks which elements of the build vector are in fact undef.

This should make actually inpsecting them (likely in my next patch)
reasonably pretty. Also makes the output parameter optional as it is
clear now that *most* users are happy with undefs in their splats.

llvm-svn: 212581
2014-07-09 00:41:34 +00:00
Andrea Di Biagio
ac6b7b5b82 [DAG] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.
This patch teaches how to fold a shuffle according to rule:
  shuffle (shuffle (x, undef, M0), undef, M1) -> shuffle(x, undef, M2)

We do this only if the resulting mask M2 is legal; this is to avoid introducing
illegal shuffles that are potentially expanded into a sub-optimal sequence
of target specific dag nodes.

This patch has the advantage of being target independent, since it works on ISD
nodes. Therefore, all targets (not only x86) can take advantage of this rule.
The idea behind this patch is that most shuffle pairs can be safely combined
before we run the legalizer on vector operations. This allows us to
combine/simplify dag nodes earlier in the process and not only immediately
before instruction selection stage.

That said. This patch is not meant to replace any existing target specific
combine rules; backends might still introduce new shuffles during legalization
stage. Also, this rule is very simple and avoids to aggressively optimize
shuffles.

llvm-svn: 212539
2014-07-08 15:22:29 +00:00
Chandler Carruth
0764373bcc [SDAG] Build up a more rich set of APIs for querying build-vector SDAG
nodes about whether they are splats. This is factored out and improved
from r212324 which got reverted as it was far too aggressive. The new
API should help more conservatively handle buildvectors that are
a mixture of splatted and undef values.

No functionality change at this point. The hope is to slowly
re-introduce the undef-tolerant optimization of splats, but each time
being forced to make a concious decision about how to handle the undefs
in a way that doesn't lead to contradicting assumptions about the
collapsed value.

Hal has pointed out in discussions that this may not end up being the
desired API and instead it may be more convenient to get a mask of the
undef elements or something similar. I'm starting simple and will expand
the API as I adapt actual callers and see exactly what they need.

llvm-svn: 212514
2014-07-08 07:19:55 +00:00
Chandler Carruth
9fd1035842 [x86] Revert r212324 which was too aggressive w.r.t. allowing undef
lanes in vector splats.

The core problem here is that undef lanes can't *unilaterally* be
considered to contribute to splats. Their handling needs to be more
cautious. There is also a reported failure of the nightly testers
(thanks Tobias!) that may well stem from the same core issue. I'm going
to fix this theoretical issue, factor the APIs a bit better, and then
verify that I don't see anything bad with Tobias's reduction from the
test suite before recommitting.

Original commit message for r212324:
  [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
  any constant, constant FP, or undef splat and to tolerate any undef
  lanes in a splat, then replace all uses of isSplatVector in X86's
  lowering with it.

  This fixes issues where undef lanes in an otherwise splat vector would
  prevent the splat logic from firing. It is a touch more awkward to use
  this interface, but it is much more accurate. Suggestions for better
  interface structuring welcome.

  With this fix, the code generated with the widening legalization
  strategy for widen_cast-4.ll is *dramatically* improved as the special
  lowering strategies for a v16i8 SRA kick in even though the high lanes
  are undef.

  We also get a slightly different choice for broadcasting an aligned
  memory location, and use vpshufd instead of vbroadcastss. This looks
  like a minor win for pipelining and domain crossing, but a minor loss
  for the number of micro-ops. I suspect its a wash, but folks can
  easily tweak the lowering if they want.

llvm-svn: 212475
2014-07-07 19:03:32 +00:00
Chandler Carruth
ab7167839a [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
any constant, constant FP, or undef splat and to tolerate any undef
lanes in a splat, then replace all uses of isSplatVector in X86's
lowering with it.

This fixes issues where undef lanes in an otherwise splat vector would
prevent the splat logic from firing. It is a touch more awkward to use
this interface, but it is much more accurate. Suggestions for better
interface structuring welcome.

With this fix, the code generated with the widening legalization
strategy for widen_cast-4.ll is *dramatically* improved as the special
lowering strategies for a v16i8 SRA kick in even though the high lanes
are undef.

We also get a slightly different choice for broadcasting an aligned
memory location, and use vpshufd instead of vbroadcastss. This looks
like a minor win for pipelining and domain crossing, but a minor loss
for the number of micro-ops. I suspect its a wash, but folks can easily
tweak the lowering if they want.

llvm-svn: 212324
2014-07-04 08:11:49 +00:00
Ulrich Weigand
7f6ccb0182 Fix ppcf128 component access on little-endian systems
The PowerPC 128-bit long double data type (ppcf128 in LLVM) is in fact a
pair of two doubles, where one is considered the "high" or
more-significant part, and the other is considered the "low" or
less-significant part.  When a ppcf128 value is stored in memory or a
register pair, the high part always comes first, i.e. at the lower
memory address or in the lower-numbered register, and the low part
always comes second.  This is true both on big-endian and little-endian
PowerPC systems.  (Similar to how with a complex number, the real part
always comes first and the imaginary part second, no matter the byte
order of the system.)

This was implemented incorrectly for little-endian systems in LLVM.
This commit fixes three related issues:

- When printing an immediate ppcf128 constant to assembler output
  in emitGlobalConstantFP, emit the high part first on both big-
  and little-endian systems.

- When lowering a ppcf128 type to a pair of f64 types in SelectionDAG
  (which is used e.g. when generating code to load an argument into a
  register pair), use correct low/high part ordering on little-endian
  systems.

- In a related issue, because lowering ppcf128 into a pair of f64 must
  operate differently from lowering an int128 into a pair of i64,
  bitcasts between ppcf128 and int128 must not be optimized away by the
  DAG combiner on little-endian systems, but must effect a word-swap.

Reviewed by Hal Finkel.

llvm-svn: 212274
2014-07-03 15:06:47 +00:00
Tom Stellard
5f887c9493 Revert "SelectionDAG: Enable (and (setcc x), (setcc y)) -> (setcc (and x, y)) for vectors"
This reverts commit r210540, adds a testcase for the regression it
caused, and marks the R600 test it was supposed to fix as XFAIL.

llvm-svn: 210792
2014-06-12 16:04:47 +00:00
Tom Stellard
ad2d29f10e SelectionDAG: Don't use MVT::Other to determine legality of ISD::SELECT_CC
The SelectionDAG bad a special case for ISD::SELECT_CC, where it would
allow targets to specify:

setOperationAction(ISD::SELECT_CC, MVT::Other, Expand);

to indicate that they wanted to expand ISD::SELECT_CC for all types.
This wasn't applied correctly everywhere, and it makes writing new
DAG patterns with ISD::SELECT_CC difficult.

llvm-svn: 210541
2014-06-10 16:01:29 +00:00
Tom Stellard
ba1210e7ac SelectionDAG: Enable (and (setcc x), (setcc y)) -> (setcc (and x, y)) for vectors
This prevents a future commit from regressing:

test/CodeGen/R600/setcc-equivalent.ll

llvm-svn: 210540
2014-06-10 16:01:25 +00:00
Andrea Di Biagio
23548cb631 [X86] Add target combine rules for horizontal add/sub.
This patch adds new target specific combine rules to identify horizontal
add/sub idioms from BUILD_VECTOR dag nodes.

This patch also teaches the DAGCombiner how to canonicalize sequences of
insert_vector_elt dag nodes according to the following rule:

  (insert_vector_elt (insert_vector_elt A, I0), I1) ->
    (insert_vecto_elt (insert_vector_elt A, I1), I0)

This new canonicalization rule only triggers if the inner insert_vector
dag node has exactly one use; also, both indices must be known constants,
and I1 < I0.
This last rule made it possible to write a simpler algorithm to identify
horizontal add/sub patterns because now we don't have to worry about the
ordering of insert_vector_elt dag nodes.

llvm-svn: 210477
2014-06-09 16:54:41 +00:00
Andrea Di Biagio
a2490b5eaa [DAG] Expose NoSignedWrap, NoUnsignedWrap and Exact flags to SelectionDAG.
This patch modifies SelectionDAGBuilder to construct SDNodes with associated
NoSignedWrap, NoUnsignedWrap and Exact flags coming from IR BinaryOperator
instructions.

Added a new SDNode type called 'BinaryWithFlagsSDNode' to allow accessing
nsw/nuw/exact flags during codegen.

Patch by Marcello Maggioni.

llvm-svn: 210467
2014-06-09 12:32:53 +00:00
Andrea Di Biagio
3a03708285 [X86] Add two combine rules to simplify dag nodes introduced during type legalization when promoting nodes with illegal vector type.
This patch teaches the backend how to simplify/canonicalize dag node
sequences normally introduced by the backend when promoting certain dag nodes
with illegal vector type.

This patch adds two new combine rules:
1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) ->
        (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>)

2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) ->
        (shuffle (BINOP A, B), Undef, <Mask>).

Both rules are only triggered on the type-legalized DAG.
In particular, rule 1. is a target specific combine rule that attempts
to sink a bitconvert into the operands of a binary operation.
Rule 2. is a target independet rule that attempts to move a shuffle
immediately after a binary operation.

llvm-svn: 209930
2014-05-30 23:17:53 +00:00
Filipe Cabecinhas
8abf11ea97 Convert a vselect into a concat_vector if possible
Summary:
If both vector args to vselect are concat_vectors and the condition is
constant and picks half a vector from each argument, convert the vselect
into a concat_vectors.

Added a test.

The ConvertSelectToConcatVector is assuming it doesn't get vselects with
arguments of, for example, <undef, undef, true, true>. Those get taken
care of in the checks above its call.

Reviewers: nadav, delena, grosbach, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3916

llvm-svn: 209929
2014-05-30 23:03:11 +00:00
Michael J. Spencer
836291b5ff [x86] Fold extract_vector_elt of a load into the Load's address computation.
An address only use of an extract element of a load can be simplified to a
load. Without this the result of the extract element is spilled to the
stack so that an address is available.

llvm-svn: 209788
2014-05-29 01:42:45 +00:00
Hal Finkel
530ec71f07 Revert "[DAGCombiner] Split up an indexed load if only the base pointer value is live"
This reverts r208640 (I've just XFAILed the test) because it broke ppc64/Linux
self-hosting. Because nearly every regression test triggers a segfault, I hope
this will be easy to fix.

llvm-svn: 209747
2014-05-28 15:33:19 +00:00
Jay Foad
e0eac700cb Rename ComputeMaskedBits to computeKnownBits. "Masked" has been
inappropriate since it lost its Mask parameter in r154011.

llvm-svn: 208811
2014-05-14 21:14:37 +00:00
Adam Nemet
78e81b5109 [DAGCombiner] Split up an indexed load if only the base pointer value is live
Right now the load may not get DCE'd because of the side-effect of updating
the base pointer.

This can happen if we lower a read-modify-write of an illegal larger type
(e.g. i48) such that the modification only affects one of the subparts (the
lower i32 part but not the higher i16 part).  See the testcase.

In order to spot the dead load we need to revisit it when SimplifyDemandedBits
decided that the value of the load is masked off.  This is the
CommitTargetLoweringOpt piece.

I checked compile time with ARM64 by sending SPEC bitcode files through llc.
No measurable change.

Fixes <rdar://problem/16031651>

llvm-svn: 208640
2014-05-12 23:00:03 +00:00
Tim Northover
4aa1f54c61 DAGCombine: prevent formation of illegal ConstantFP nodes.
llvm-svn: 207850
2014-05-02 17:25:02 +00:00
Weiming Zhao
3625856a33 [ARM64] Prevent bit extraction to be adjusted by following shift
For pattern like ((x >> C1) & Mask) << C2, DAG combiner may convert it
into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx
more difficult.
For example:
Given
  %shr = lshr i64 %x, 4
  %and = and i64 %shr, 15
  %arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, %i64 2, i64 %and
  %0 = load i64* %arrayidx
With current shift folding, it takes 3 instrs to compute base address:
  lsr x8, x0, #1
  and x8, x8, #0x78
  add x8, x9, x8

If using ubfx, it only needs 2 instrs:
  ubfx  x8, x0, #4, #4
  add x8, x9, x8, lsl #3

This fixes bug 19589

llvm-svn: 207702
2014-04-30 21:07:24 +00:00
Jim Grosbach
f4913dabb0 Tidy up whitespace.
llvm-svn: 207583
2014-04-29 22:41:50 +00:00
Craig Topper
aec1381207 Convert AddNodeIDNode and SelectionDAG::getNodeIfExiists to use ArrayRef<SDValue>
llvm-svn: 207383
2014-04-27 23:22:43 +00:00
Craig Topper
e5c6e7f4ea Convert one last signature of getNode to take an ArrayRef of SDUse.
llvm-svn: 207376
2014-04-27 19:21:06 +00:00
Benjamin Kramer
e6a357c0fc DAGCombiner: Simplify code a bit, make more transforms work with vectors.
llvm-svn: 207338
2014-04-26 23:09:49 +00:00
Craig Topper
1b1f54bcca Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>.
llvm-svn: 207327
2014-04-26 18:35:24 +00:00
Benjamin Kramer
89fb3dd5a4 Rip out X86-specific vector SDIV lowering, make the corresponding DAGCombiner transform work on vectors.
llvm-svn: 207316
2014-04-26 13:00:53 +00:00
Benjamin Kramer
163df6bc62 DAGCombiner: Turn divs of vector splats into vectorized multiplications.
Otherwise the legalizer would just scalarize everything. Support for
mulhi in the targets isn't that great yet so on most targets we get
exactly the same scalarized output. Add a test for x86 vector udiv.

I had to disable the mulhi nodes on ARM because there aren't any patterns
for it. As far as I know ARM has instructions for getting the high part of
a multiply so this should be fixed.

llvm-svn: 207315
2014-04-26 12:06:28 +00:00
Hao Liu
6daf5ecff4 Fix an infinite loop bug in DAG Combine about keeping transfering between ANY_EXTEND and SIGN_EXTEND.
llvm-svn: 206873
2014-04-22 09:57:06 +00:00
Chandler Carruth
2361db41db [Modules] Remove potential ODR violations by sinking the DEBUG_TYPE
define below all header includes in the lib/CodeGen/... tree. While the
current modules implementation doesn't check for this kind of ODR
violation yet, it is likely to grow support for it in the future. It
also removes one layer of macro pollution across all the included
headers.

Other sub-trees will follow.

llvm-svn: 206837
2014-04-22 02:02:50 +00:00
Tim Northover
dcc9d1cb89 DAGCombiner: don't optimise non-existant litpool load
This particular DAG combine is designed to kick in when both ConstantFPs will
end up being loaded via a litpool, however those nodes have a semi-legal
status, dictated by isFPImmLegal so in some cases there wouldn't have been a
litpool in the first place. Don't try to be clever in those circumstances.

Picked up while merging some AArch64 tests.

llvm-svn: 206365
2014-04-16 09:03:09 +00:00
Robert Lougher
eaacc6b6ba Revert r191049/r191059 as it can produce wrong code (see PR17975).
It has already been reverted on the 3.4 branch in r196521.

llvm-svn: 206311
2014-04-15 18:34:24 +00:00
Nick Lewycky
82ad9fc7c8 Break PseudoSourceValue out of the Value hierarchy. It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* and Value* is strongly encouraged to use a MachinePointerInfo instead.
llvm-svn: 206255
2014-04-15 07:22:52 +00:00
Craig Topper
30281a67fb [C++11] More 'nullptr' conversion. In some cases just using a boolean check instead of comparing to nullptr.
llvm-svn: 206142
2014-04-14 00:51:57 +00:00
Hal Finkel
70fab0cd6d Reenable use of TBAA during CodeGen
We had disabled use of TBAA during CodeGen (even when otherwise using AA)
because the ptrtoint/inttoptr used by CGP for address sinking caused BasicAA to
miss basic type punning that it should catch (and, thus, we'd fail to override
TBAA when we should).

However, when AA is in use during CodeGen, CGP now uses normal GEPs and
bitcasts, instead of ptrtoint/inttoptr, when doing address sinking. As a
result, BasicAA should be able to make us do the right thing in the face of
type-punning, and it seems safe to enable use of TBAA again. self-hosting seems
fine on PPC64/Linux on the P7, with TBAA enabled and -misched=shuffle.

Note: We still don't update TBAA when merging stack slots, although because
BasicAA should now catch all such cases, this is no longer a blocking issue.
Nevertheless, I plan to commit code to deal with this properly in the near
future.

llvm-svn: 206093
2014-04-12 01:26:00 +00:00
Jim Grosbach
6f9873ee9f [c++11] Range'ify use list loops in DAGCombiner.
llvm-svn: 206014
2014-04-11 01:13:13 +00:00
Quentin Colombet
95c5120626 [DAGCombiner] DAG combine does not know how to combine indexed loads with
sign/zero/any extensions. However a few places were not checking properly the
property of the load and were turning an indexed load into a regular extended
load. Therefore the indexed value was lost during the process and this was
triggering an assertion.

<rdar://problem/16389332>

llvm-svn: 205923
2014-04-09 20:03:05 +00:00
Matt Arsenault
b8e729fdf2 Bug 19348: Check for legal ExtLoad operation before folding
(aext (zextload x)) -> (aext (truncate (*extload x)))

Patch by Stanislav Mekhanoshin!

llvm-svn: 205805
2014-04-08 21:40:37 +00:00
Matt Arsenault
0062eb7871 Make isSetCCEquivalent respect the TargetBooleanContents
llvm-svn: 205336
2014-04-01 18:13:26 +00:00
Hal Finkel
724ed34f6e Look at shuffles of build_vectors in DAGCombiner::visitEXTRACT_VECTOR_ELT
When the loop vectorizer vectorizes code that uses the loop induction variable,
we often end up with IR like this:

  %b1 = insertelement <2 x i32> undef, i32 %v, i32 0
  %b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer
  %i = add <2 x i32> %b2, <i32 2, i32 3>

If the add in this example is not legal (as is the case on PPC with VSX), it
will be scalarized, and we'll end up with a number of extract_vector_elt nodes
with the vector shuffle as the input operand, and that vector shuffle is fed by
one or more build_vector nodes. By the time that vector operations are
expanded, visitEXTRACT_VECTOR_ELT will not create new extract_vector_elt by
looking through the vector shuffle (to make sure that no illegal operations are
created), and so the extract_vector_elt -> vector shuffle -> build_vector is
never simplified to an operand of the build vector.

By looking at build_vectors through a shuffle we fix this particular situation,
preventing a vector from being built, only to be deconstructed again (for the
scalarized add) -- an expensive proposition when this all needs to be done via
the stack. We probably want a more comprehensive fix here where we look back
recursively through any shuffles to any build_vectors or scalar_to_vectors,
etc. but that can come later.

llvm-svn: 205179
2014-03-31 11:43:19 +00:00
Andrea Di Biagio
84fdff1b7f [DAG] Fix an assertion failure caused by an invalid cast in method 'BuildVectorSDNode::isConstantSplat'
This patch renames method 'isConstantSplat' as 'getConstantSplatValue'
(mainly for consistency reasons), and rewrites its logic to ensure
that we always perform a legal 'cast<ConstantSDNode>'.

Added test shift-combine-crash.ll to verify that DAGCombiner no longer crashes with an assertion failure in the attempt to simplify a vector shift by a vector of all undef counts.

llvm-svn: 204536
2014-03-22 01:47:22 +00:00
Andrea Di Biagio
b329156058 [DAGCombiner] teach how to simplify xor/and/or nodes according to the following rules:
1)  (AND (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (AND (A, B), C, Mask)
 2)  (OR  (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (OR  (A, B), C, Mask)
 3)  (XOR (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (XOR (A, B), V_0, Mask)

 4)  (AND (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, AND (A, B), Mask)
 5)  (OR  (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, OR  (A, B), Mask)
 6)  (XOR (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (V_0, XOR (A, B), Mask)

llvm-svn: 204160
2014-03-18 17:12:59 +00:00
Matt Arsenault
c95c06bda9 Make DAGCombiner work on vector bitshifts with constant splat vectors.
llvm-svn: 204071
2014-03-17 18:58:01 +00:00
Craig Topper
c2c1be655d [C++11] Add 'override' keyword to virtual methods that override their base class.
llvm-svn: 203339
2014-03-08 06:31:39 +00:00
Adam Nemet
08154e7b50 [DAGCombiner] Distribute TRUNC through AND in rotation amount
This is already done for shifts.  Allow it for rotations as well. E.g.:

   (rotl:i32 x, (trunc (and y, 31))) -> (rotl:i32 x, (and (trunc y), 31))

Use the newly factored-out distributeTruncateThroughAnd.

With this patch and some X86.td tweaks we should be able to remove redundant
masking of the rotation amount like in the example above.  HW implicitly
performs this masking.

The testcase will be added as part of the X86 patch.

llvm-svn: 203316
2014-03-07 23:56:30 +00:00
Adam Nemet
f591fbc1db [DAGCombiner] Recognize another rotation idiom
This is the new idiom:

  x<<(y&31) | x>>((0-y)&31)

which is recognized as:

  x ROTL (y&31)

The change refines matchRotateSub.  In
Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos is
Pos' & (OpSize - 1) we can just use Pos' instead of Pos.

llvm-svn: 203315
2014-03-07 23:56:28 +00:00
Adam Nemet
e4f5b8836b [DAGCombiner] Slightly improve readability of matchRotateSub
Slightly change the wording in the function comment. Originally, it can be
misunderstood as we turned the input into two subsequent rotates.

Better connect the comment which talks about Mask and the code which used
LoBits.  Renamed variable to MaskLoBits.

llvm-svn: 203314
2014-03-07 23:56:24 +00:00
Andrea Di Biagio
4586b3b78c [X86] Teach the DAGCombiner how to fold a OR of two shufflevector nodes.
This patch teaches the DAGCombiner how to fold a binary OR between two
shufflevector into a single shuffle vector when possible.

The rules are:
  1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1)
  2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2)

The DAGCombiner can take advantage of the fact that OR is commutative and
compute two possible shuffle masks (Mask1 and Mask2) for the resulting
shuffle node.

Before folding a dag according to either rule 1 or 2, DAGCombiner verifies
that the resulting shuffle mask is legal for the target.
DAGCombiner would firstly try to fold according to 1.; If not possible
then it will try to fold according to 2.
If both Mask1 and Mask2 are illegal then we conservatively don't fold
the OR instruction.

llvm-svn: 203156
2014-03-06 20:19:52 +00:00
Adam Nemet
432d40e686 [DAGCombiner] Factor out distributeTruncateThroughAnd
Currently this code is duplicated across visitSHL, visitSRA and visitSRL.  The
plan is to add rotates as clients to this new function.

There is no functional change intended here.

llvm-svn: 202908
2014-03-04 23:28:31 +00:00
Benjamin Kramer
3ac154a395 [C++11] Replace llvm::tie with std::tie.
The old implementation is no longer needed in C++11.

llvm-svn: 202644
2014-03-02 13:30:33 +00:00
Benjamin Kramer
803ba41365 Now that we have C++11, turn simple functors into lambdas and remove a ton of boilerplate.
No intended functionality change.

llvm-svn: 202588
2014-03-01 11:47:00 +00:00
Hal Finkel
41d36f7b16 Fix visitTRUNCATE for legal i1 values
This extract-and-trunc vector optimization cannot work for i1 values as
currently implemented, and so I'm disabling this for now for i1 values. In the
future, this can be fixed properly.

Soon I'll commit support for i1 CR bit tracking in the PowerPC backend, and
this will be covered by one of the existing regression tests.

llvm-svn: 202449
2014-02-28 00:26:45 +00:00
Matt Arsenault
9ba8b6c51a Trivial code simplification
llvm-svn: 202073
2014-02-24 21:01:15 +00:00
Quentin Colombet
60b53fae3c [DAGCombiner] PCMP* sets its result to all ones or zeros so we can AND with the
shifted mask rather than masking and shifting separately.

The patch adds this transformation to the DAGCombiner:

  (shl (and (setcc:i8v16 ...) N01C) N1C) -> (and (setcc:i8v16 ...) N01C<<N1C)

<rdar://problem/16054492>

Patch by Adam Nemet <anemet@apple.com>

llvm-svn: 201906
2014-02-21 23:42:41 +00:00
Robert Lougher
834d664400 Teach the DAGCombiner how to fold concat_vector nodes when the input is two
BUILD_VECTOR nodes, e.g.:

(concat_vectors (BUILD_VECTOR a1, a2, a3, a4), (BUILD_VECTOR b1, b2, b3, b4))
->
(BUILD_VECTOR a1, a2, a3, a4, b1, b2, b3, b4)

This fixes an issue with AVX, where a sequence was not recognized as a 256-bit
vbroadcast due to the concat_vectors.

llvm-svn: 201158
2014-02-11 15:42:46 +00:00
Juergen Ributzka
a5769c5abc [DAG] Don't pull the binary operation though the shift if the operands have opaque constants.
During DAGCombine visitShiftByConstant assumes that certain binary operations
with only constant operands can always be folded successfully. This is no longer
true when the constant is opaque. This commit fixes visitShiftByConstant by not
performing the optimization for opaque constants. Otherwise we would end up in
an infinite DAGCombine loop.

llvm-svn: 200900
2014-02-06 04:09:06 +00:00
Manman Ren
0552af6547 This patch teaches the DAGCombiner how to fold insert_subvector nodes
when the input is a concat_vectors and the insert replaces one of the
concat halves:

Lower half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors Z, Y)
Upper half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors X, Z)

This can be seen with the following IR:

define <8 x float> @lower_half(<4 x float> %v1, <4 x float> %v2, <4 x
float> %v3) {
  %1 = shufflevector <4 x float> %v1, <4 x float> %v2, <8 x i32> <i32
0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %2 = tail call <8 x float> @llvm.x86.avx.vinsertf128.ps.256(<8 x
float> %1, <4 x float> %v3, i8 0)

The vinsertf128 intrinsic is converted into an insert_subvector node
in SelectionDAGBuilder.cpp.

Using AVX, without the patch this generates two vinsertf128 instructions:

vinsertf128 $1, %xmm1, %ymm0, %ymm0
vinsertf128 $0, %xmm2, %ymm0, %ymm0

With the patch this is optimized into:

vinsertf128 $1, %xmm1, %ymm2, %ymm0

Patch by Robert Lougher.

llvm-svn: 200506
2014-01-31 01:10:35 +00:00