1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

11645 Commits

Author SHA1 Message Date
Volkan Keles
cc07d01a19 [InstCombine] Combine nested min/max intrinsics with constants
Reviewers: arsenm, spatel

Reviewed By: spatel

Subscribers: lebedev.ri, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D53774

llvm-svn: 345751
2018-10-31 17:50:52 +00:00
Sanjay Patel
936915cfd1 [InstCombine] refactor fabs+fcmp fold; NFC
Also, remove/replace/minimize/enhance the tests for this fold.
The code drops FMF, so it needs more tests and at least 1 fix.

llvm-svn: 345734
2018-10-31 16:34:43 +00:00
Sanjay Patel
7a21856af1 [InstSimplify] fold 'fcmp nnan ult X, 0.0' when X is not negative
This is the inverted case for the transform added with D53874 / rL345725.

llvm-svn: 345728
2018-10-31 15:35:46 +00:00
Sanjay Patel
0520773727 [InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative
This re-raises some of the open questions about how to apply and use fast-math-flags in IR from PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
...but given the current implementation (no FMF on casts), this is likely the only way to predicate the 
transform.

This is part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

Differential Revision: https://reviews.llvm.org/D53874

llvm-svn: 345725
2018-10-31 14:57:23 +00:00
Fedor Sergeev
3984a320d2 [LoopUnroll] allow customization for new-pass-manager version of LoopUnroll
Unlike its legacy counterpart new pass manager's LoopUnrollPass does
not provide any means to select which flavors of unroll to run
(runtime, peeling, partial), relying on global defaults.

In some cases having ability to run a restricted LoopUnroll that
does more than LoopFullUnroll is needed.

Introduced LoopUnrollOptions to select optional unroll behaviors.
Added 'unroll<peeling>' to PassRegistry mainly for the sake of testing.

Reviewers: chandlerc, tejohnson
Differential Revision: https://reviews.llvm.org/D53440

llvm-svn: 345723
2018-10-31 14:33:14 +00:00
Sanjay Patel
5297adacba [InstSimplify] add tests for fcmp and known positive; NFC
llvm-svn: 345722
2018-10-31 14:29:21 +00:00
Sanjay Patel
dfc4cdc406 [InstSimplify] fold icmp based on range of abs/nabs
This is a fix for PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

We managed to get some of these patterns using computeKnownBits in D47041, but that 
can't be used for nabs(). Instead, put in some range-based logic, so we can fold 
both abs/nabs with icmp with a constant value.

Alive proofs:
https://rise4fun.com/Alive/21r

Name: abs_nsw_is_positive
  %cmp = icmp slt i32 %x, 0
  %negx = sub nsw i32 0, %x
  %abs = select i1 %cmp, i32 %negx, i32 %x
  %r = icmp sgt i32 %abs, -1
    =>
  %r = i1 true
 
Name: abs_nsw_is_not_negative
  %cmp = icmp slt i32 %x, 0
  %negx = sub nsw i32 0, %x
  %abs = select i1 %cmp, i32 %negx, i32 %x
  %r = icmp slt i32 %abs, 0
    =>
  %r = i1 false
 
Name: nabs_is_negative_or_0
  %cmp = icmp slt i32 %x, 0
  %negx = sub i32 0, %x
  %nabs = select i1 %cmp, i32 %x, i32 %negx
  %r = icmp slt i32 %nabs, 1
    =>
  %r = i1 true

Name: nabs_is_not_over_0
  %cmp = icmp slt i32 %x, 0
  %negx = sub i32 0, %x
  %nabs = select i1 %cmp, i32 %x, i32 %negx
  %r = icmp sgt i32 %nabs, 0
    =>
  %r = i1 false

Differential Revision: https://reviews.llvm.org/D53844

llvm-svn: 345717
2018-10-31 13:25:10 +00:00
Max Kazantsev
7bf507eea1 [NFC] Add tests for loop-simplifycfg for further development
llvm-svn: 345713
2018-10-31 11:28:23 +00:00
Max Kazantsev
a704bd2a52 [IndVars] Strengthen restricton in rewriteLoopExitValues
For some unclear reason rewriteLoopExitValues considers recalculation
after the loop profitable if it has some "soft uses" outside the loop (i.e. any
use other than call and return), even if we have proved that it has a user inside
the loop which we think will not be optimized away.

There is no existing unit test that would explain this. This patch provides an
example when rematerialisation of exit value is not profitable but it passes
this check due to presence of a "soft use" outside the loop.

It makes no sense to recalculate value on exit if we are going to compute it
due to some irremovable within the loop. This patch disallows applying this
transform in the described situation.

Differential Revision: https://reviews.llvm.org/D51581
Reviewed By: etherzhhb

llvm-svn: 345708
2018-10-31 10:30:50 +00:00
Dorit Nuzman
a2771a93ac [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668

llvm-svn: 345705
2018-10-31 09:57:56 +00:00
Quentin Colombet
b5e9d9e242 [InstCombine] Teach the move free before null test opti how to deal with noop casts
InstCombine features an optimization that essentially replaces:
if (a)
  free(a)
into:
free(a)

Right now, this optimization is gated by the minsize attribute and therefore
we only perform it if we can prove that we are going to be able to eliminate
the branch and the destination block.

However when casts are involved the optimization would fail to apply, because
the optimization was not smart enough to realize that it is possible to also
move the casts away from the destination block and that is harmless to the
performance since they are just noops.
E.g.,
foo(int *a)
if (a)
  free((char*)a)

Wouldn't be optimized by instcombine, because
- We would refuse to hoist the `bitcast i32* %a to i8` in the source block
- We would fail to see that `bitcast i32* %a to i8` and %a are the same value.

This patch fixes both these problems:
- It teaches the pattern matching of the comparison how to look
  through casts.
- It checks that whether the additional instruction in the destination block
  can be hoisted and are harmless performance-wise.
- It hoists all the code of the destination block in the source block.

Differential Revision: D53356

llvm-svn: 345644
2018-10-30 20:51:04 +00:00
Volkan Keles
11e0c69e3a [InstCombine] Add preliminary tests for nested min/max combines. NFC
Summary: As requested in D53774.

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53875

llvm-svn: 345616
2018-10-30 17:51:14 +00:00
Sanjay Patel
21296e663b [InstSimplify] add tests for fcmp folds; NFC
This is part of a problem noted in PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

llvm-svn: 345615
2018-10-30 16:58:43 +00:00
Sanjay Patel
e35e74d2a8 [InstCombine] try to turn shuffle into insertelement
shuffle (insert ?, Scalar, IndexC), V1, Mask --> insert V1, Scalar, IndexC'

The motivating case is at least a couple of steps away: I noticed that
SLPVectorizer does not analyze shuffles as well as sequences of 
insert/extract in PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724
...so SLP may fail to vectorize when source code has shuffles to start 
with or instcombine has converted insert/extract to shuffles.

Independent of that, an insertelement is always a simpler op for IR 
analysis vs. a shuffle, so we should transform to insert when possible.

I don't think there's any codegen concern here - if a target can't insert 
a scalar directly to some fixed element in a vector (x86?), then this 
should get expanded to the insert+shuffle that we started with.

Differential Revision: https://reviews.llvm.org/D53507

llvm-svn: 345607
2018-10-30 15:26:39 +00:00
Jonas Paulsson
a12facfaec [LoopVectorizer] Fix for cost values of memory accesses.
This commit is a combination of two patches:

* "Fix in getScalarizationOverhead()"

   If target returns false in TTI.prefersVectorizedAddressing(), it means the
   address registers will not need to be extracted. Therefore, there should
   be no operands scalarization overhead for a load instruction.

* "Don't pass the instruction pointer from getMemInstScalarizationCost."

   Since VF is always > 1, this is a cost query for an instruction in the
   vectorized loop and it should not be evaluated within the scalar
   context of the instruction.

Review: Ulrich Weigand, Hal Finkel
https://reviews.llvm.org/D52351
https://reviews.llvm.org/D52417

llvm-svn: 345603
2018-10-30 14:34:15 +00:00
Nicola Zaghen
51d91937b5 [SROA] Use offset sizes from the DataLayout instead of the pointer siezes.
This fixes an assertion when constant folding a GEP when the part of the offset
was in i32 (IndexSize, as per DataLayout) and part in the i64 (PointerSize) in
the newly created test case.

Differential Revision: https://reviews.llvm.org/D52609

llvm-svn: 345585
2018-10-30 11:15:04 +00:00
Sanjay Patel
7b52ff9cf4 [InstSimplify] add tests for abs/nabs+icmp folding; NFC
llvm-svn: 345541
2018-10-29 21:05:41 +00:00
Fedor Sergeev
8100e4074d [LoopUnroll] NFC. Factor out runtime-loop.ll common test behavior.
Adding COMMON prefix to get common part handled there.
Needed to simplify test changes for D53440.

llvm-svn: 345538
2018-10-29 20:38:23 +00:00
Vedant Kumar
c3c1b791ca [HotColdSplitting] Allow outlining single-block cold regions
It can be profitable to outline single-block cold regions because they
may be large.

Allow outlining single-block regions if they have over some threshold of
non-debug, non-terminator instructions. I chose 3 as the threshold after
experimenting with several internal frameworks.

In practice, reducing the threshold further did not give much
improvement, whereas increasing it resulted in substantial regressions.

Differential Revision: https://reviews.llvm.org/D53824

llvm-svn: 345524
2018-10-29 19:15:39 +00:00
Renato Golin
b4d299c31b Revert r344172: [LV] Add a new reduction pattern match
This patch has caused fast-math issues in the reduction pattern.

Will re-work and land again.

llvm-svn: 345465
2018-10-27 22:13:43 +00:00
Florian Hahn
0cc56f7cae [Local] Keep K's range if K does not move when combining metadata.
As K has to dominate I, IIUC I's range metadata must be a subset of
K's. After Eli's recent clarification to the LangRef, loading a value
outside of the range is undefined behavior.
Therefore if I's range contains elements outside of K's range and we would load
one such value, K would cause undefined behavior.

In cases like hoisting/sinking, we still want the most generic range
over all code paths to/from the hoist/sink point. As suggested in the
patches related to D47339, I will refactor the handling of those
scenarios and try to decouple it from this function as follow up, once
we switched to a similar handling of metadata in most of
combineMetadata.

I updated some tests checking mostly the merging of metadata to keep the
metadata of to dominating load. The most interesting one is probably test8 in
test/Transforms/JumpThreading/thread-loads.ll. It contained a comment
about the alias metadata preventing us to eliminate the branch, but it
seem like the actual problem currently is that we merge the ranges of
both loads and cannot eliminate the icmp afterwards. With this patch, we
manage to eliminate the icmp, as the range of the first load excludes 8.

Reviewers: efriedma, nlopes, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D51629

llvm-svn: 345456
2018-10-27 16:53:45 +00:00
Sanjay Patel
b34960fe47 [ValueTracking] peek through shuffles in ComputeNumSignBits (PR37549)
The motivating case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

The analysis improvement allows us to form a vector 'select' out of 
bitwise logic (the use of ComputeNumSignBits was added at rL345149).

The smaller test shows another InstCombine improvement - we use 
ComputeNumSignBits to add 'nsw' to shift-left. But the negative
test shows an example where we must not add 'nsw' - when the shuffle
mask contains undef elements.

Differential Revision: https://reviews.llvm.org/D53659

llvm-svn: 345429
2018-10-26 21:05:14 +00:00
Christy Lee
bee2de7843 Pointer types were treated as zero-size by MergeICmps
Summary:
The visitICmp analysis function would record compares of pointer types, as size 0. This causes the resulting memcmp() call to have the wrong total size.
Found with "self-build" of clang/LLVM on Windows.

Reviewers: christylee, trentxintong, courbet

Reviewed By: courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53536

llvm-svn: 345413
2018-10-26 18:02:06 +00:00
Max Kazantsev
9b8cbc08e6 [SimpleLoopUnswitch] Unswitch by experimental.guard intrinsics
This patch adds support of `llvm.experimental.guard` intrinsics to non-trivial
simple loop unswitching. These intrinsics represent implicit control flow which
has pretty much the same semantics as usual conditional branches. The
algorithm of dealing with them is following:

- Consider guards as unswitching candidates;
- If a guard is considered the best candidate, turn it into a branch;
- Apply normal unswitching algorithm on this branch.

The patch has no compile time effect on code that does not contain any guards.

Differential Revision: https://reviews.llvm.org/D53744
Reviewed By: chandlerc

llvm-svn: 345387
2018-10-26 14:20:11 +00:00
Cameron McInally
90a586c915 [FPEnv] Last BinaryOperator::isFNeg(...) to m_FNeg(...) changes
Replacing BinaryOperator::isFNeg(...) to avoid regressions when we
separate FNeg from the FSub IR instruction.

Differential Revision: https://reviews.llvm.org/D53650

llvm-svn: 345295
2018-10-25 18:09:33 +00:00
Sam Parker
e4b39c84d1 [ARM] Use Cortex-A57 sched model for Cortex-A72
This mirrors what we already do for AArch64 as the cores are similar.
As discussed in the review, enabling the machine scheduler causes
more variations in performance changes so it is not enabled for now.
This patch improves LNT scores by a geomean of 1.57% at -O3.

Differential Revision: https://reviews.llvm.org/D53562

llvm-svn: 345272
2018-10-25 15:08:29 +00:00
Simon Pilgrim
c6ef0b8381 [CostModel][X86] Add realistic vXi64 uitofp vXf64 costs
Match codegen improvements from D53649/rL345256

llvm-svn: 345263
2018-10-25 13:06:20 +00:00
Simon Pilgrim
0fa453abed [CostModel][X86] Add realistic i64 uitofp f64 scalar costs
llvm-svn: 345261
2018-10-25 12:42:10 +00:00
Gabor Buella
5eb75bc24f Add -instcombine-code-sinking option
Reviewers: craig.topper, andrew.w.kaylor, efriedma

Reviewed By: craig.topper, andrew.w.kaylor, efriedma

Differential Revision: https://reviews.llvm.org/D52709

llvm-svn: 345248
2018-10-25 08:32:29 +00:00
Alina Sbirlea
91e00bac96 Update MemorySSA in LoopRotate.
Summary:
Teach LoopRotate to preserve MemorySSA.
Enable tests for correctness, dependency disabled by default.

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D51718

llvm-svn: 345216
2018-10-24 22:46:45 +00:00
Vedant Kumar
8b8c6ce97c [HotColdSplitting] Identify larger cold regions using domtree queries
The current splitting algorithm works in three stages:

  1) Identify cold blocks, then
  2) Use forward/backward propagation to mark hot blocks, then
  3) Grow a SESE region of blocks *outside* of the set of hot blocks and
  start outlining.

While testing this pass on Apple internal frameworks I noticed that some
kinds of control flow (e.g. loops) are never outlined, even though they
unconditionally lead to / follow cold blocks. I noticed two other issues
related to how cold regions are identified:

  - An inconsistency can arise in the internal state of the hotness
  propagation stage, as a block may end up in both the ColdBlocks set
  and the HotBlocks set. Further inconsistencies can arise as these sets
  do not match what's in ProfileSummaryInfo.

  - It isn't necessary to limit outlining to single-exit regions.

This patch teaches the splitting algorithm to identify maximal cold
regions and outline them. A maximal cold region is defined as the set of
blocks post-dominated by a cold sink block, or dominated by that sink
block. This approach can successfully outline loops in the cold path. As
a side benefit, it maintains less internal state than the current
approach.

Due to a limitation in CodeExtractor, blocks within the maximal cold
region which aren't dominated by a single entry point (a so-called "max
ancestor") are filtered out.

Results:
  - X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to
  47KB pre-patch, or a ~3x improvement. Did not see a performance impact
  across two runs.
  - AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB
  of TEXT were outlined. Ditto re: performance impact.
  - Outlining results improve marginally in the internal frameworks I
  tested.

Follow-ups:
  - Outline more than once per function, outline large single basic
  blocks, & try to remove unconditional branches in outlined functions.

Differential Revision: https://reviews.llvm.org/D53627

llvm-svn: 345209
2018-10-24 22:15:41 +00:00
Sanjay Patel
47c53ff9be [InstCombine] add test for fptrunc with vector with undef elt; NFC
This should be fixed with D53650.

llvm-svn: 345206
2018-10-24 22:02:05 +00:00
Teresa Johnson
6eeef9bcbf [hot-cold-split] Name split functions with ".cold" suffix
Summary:
The current default of appending "_"+entry block label to the new
extracted cold function breaks demangling. Change the deliminator from
"_" to "." to enable demangling. Because the header block label will
be empty for release compile code, use "extracted" after the "." when
the label is empty.

Additionally, add a mechanism for the client to pass in an alternate
suffix applied after the ".", and have the hot cold split pass use
"cold."+Count, where the Count is currently 1 but can be used to
uniquely number multiple cold functions split out from the same function
with D53588.

Reviewers: sebpop, hiraditya

Subscribers: llvm-commits, erik.pilkington

Differential Revision: https://reviews.llvm.org/D53534

llvm-svn: 345178
2018-10-24 18:53:47 +00:00
Sanjay Patel
e004590e92 [InstCombine] add test for ComputeNumSignBits with shuffle; NFC
llvm-svn: 345162
2018-10-24 17:01:42 +00:00
Sanjay Patel
998b51454a [InstCombine] add test for select with shuffled condition (PR37549); NFC
llvm-svn: 345156
2018-10-24 16:21:23 +00:00
Sanjay Patel
37e302a718 [InstCombine] try harder to form select from logic ops (2nd try)
The original patch was committed here:
rL344609
...and reverted:
rL344612
...because it did not properly check/test data types before calling
ComputeNumSignBits(). 

The tests that caused bot failures for the previous commit are 
over-reaching front-end tests that run the entire -O optimizer 
pipeline: 
    Clang :: CodeGen/builtins-systemz-zvector.c
    Clang :: CodeGen/builtins-systemz-zvector2.c

I've added a negative test here to ensure coverage for that case.
The new early exit check also tests the type of the 'B' parameter,
so we don't waste time on matching if either value is unsuitable.

Original commit message:

This is part of solving PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

The patterns shown here are a special case of something
that we already convert to select. Using ComputeNumSignBits()
catches that case (but not the more complicated motivating
patterns yet).

The backend has hooks/logic to convert back to logic ops
if that's better for the target.

llvm-svn: 345149
2018-10-24 15:17:56 +00:00
Cameron McInally
94421559c1 [FPEnv] Convert more BinaryOperator::isFNeg(...) to m_FNeg(...)
This work is to avoid regressions when we seperate FNeg from the FSub IR instruction. 

Differential Revision: https://reviews.llvm.org/D53205

llvm-svn: 345146
2018-10-24 14:45:18 +00:00
Gil Rapaport
375b47405c Revert r345114
Investigating fails.

llvm-svn: 345123
2018-10-24 08:41:22 +00:00
Dorit Nuzman
10d1cff193 [LV] Don't have fold-tail under optsize invalidate interleave-groups when
masked-interleaving is enabled

Enable interleave-groups under fold-tail scenario for Opt for size compilation;
D50480 added support for vectorizing loops of arbitrary trip-count without a
remiander, which in turn makes everything in the loop conditional, including
interleave-groups if any. It therefore invalidated all interleave-groups
because we didn't have support for vectorizing predicated interleaved-groups
at the time. In the meantime, D53011 introduced this support, so we don't
have to invalidate interleave-groups when masked-interleaved support is enabled.

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: hsaito

Differential Revision: https://reviews.llvm.org/D53559

llvm-svn: 345115
2018-10-24 07:11:38 +00:00
Gil Rapaport
20e5de3e82 [LSR] Combine unfolded offset into invariant register
LSR reassociates constants as unfolded offsets when the constants fit as
immediate add operands, which currently prevents such constants from being
combined later with loop invariant registers.
This patch modifies GenerateCombinations() to generate a second formula which
includes the unfolded offset in the combined loop-invariant register.

Differential Revision: https://reviews.llvm.org/D51861

llvm-svn: 345114
2018-10-24 07:08:38 +00:00
Wei Mi
09801d8a06 [PM] keeping history when original SCC split and then merge into itself
in the same round of SCC update.

In https://reviews.llvm.org/rL309784, inline history is added to prevent
infinite inlining across multiple run of inliner and SCC update, but the
history will only be kept when new SCC is actually generated during SCC update.

We found a case that SCC can be split and then merge into itself in the same
round of SCC update, so the same SCC will be pop out from UR.CWorklist and
then added back immediately, without any new SCC generated, that is why the
existing patch cannot catch the infinite inline case.

What the patch does is even if no new SCC is generated, if only the current
SCC appears in UR.CWorklist again, then keep the inline history.

Differential Revision: https://reviews.llvm.org/D52915

llvm-svn: 345103
2018-10-23 23:29:45 +00:00
Vedant Kumar
842c78c9af [HotColdSplitting] Attach MinSize to outlined code
Outlined code is cold by assumption, so it makes sense to optimize it
for minimal code size rather than performance.

After r344869 moved the splitting pass to the end of the IR pipeline,
this does not result in much of a code size reduction. This is probably
because a comparatively small number backend transforms make use of the
MinSize hint.

Running LNT on x86_64, I see that 33/1020 binaries shrink for a total of
919 bytes of TEXT reduction. I didn't measure a significant performance
impact.

Differential Revision: https://reviews.llvm.org/D53518

llvm-svn: 345072
2018-10-23 19:41:12 +00:00
Jordan Rupprecht
48c08c639e [DebugInfo][GlobalOpt] Fix -debugify for globalopt shrinking globals to booleans.
Summary:
TryToShrinkGlobalToBoolean, when possible, will split store <value> + load <value> into store <bool> + select <bool ? value : 0>. This preserves DebugLoc during that pass.

Fixes PR37959. The test case here is the simplified .ll for:

```
static int foo;
int bar() {
  foo = 5;
  return foo;
}
```

Reviewers: dblaikie, gbedwell, aprantl

Reviewed By: dblaikie

Subscribers: mehdi_amini, JDevlieghere, dexonsmith, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D53531

llvm-svn: 345046
2018-10-23 16:35:51 +00:00
Sanjay Patel
874543a4ff [Reassociate] replace fake binop queries with 'match' API
We need to update this code before introducing an 'fneg' instruction in IR,
so we might as well kill off the integer neg/not queries too.

This is no-functional-change-intended for scalar code and most vector code. 
For vectors, we can see that the 'match' API allows for undef elements in 
constants, so we optimize those cases better.

Ideally, there would be a test for each code diff, but I don't see evidence
of that for the existing code, so I didn't try very hard to come up with new 
vector tests for each code change.

Differential Revision: https://reviews.llvm.org/D53533

llvm-svn: 345042
2018-10-23 15:55:06 +00:00
Simon Pilgrim
775ebccf6b [SLPVectorizer] Add basic support for mul/and/or/xor horizontal reductions
Expand arithmetic reduction to include mul/and/or/xor instructions.

This patch just fixes the SLPVectorizer - the effective reduction costs for AVX1+ are still poor (see rL344846) and will need to be improved before SLP sees this as a valid transform - but we can already see the effect on SSE2 tests.

This partially helps PR37731, but doesn't fix it all as it still falls over on the extraction/reduction order for some reason.

Differential Revision: https://reviews.llvm.org/D53473

llvm-svn: 345037
2018-10-23 15:13:09 +00:00
Sanjay Patel
1425492643 [InstCombine] use 'match' to handle vectors and simplify code
This is another step towards completely removing the fake 
binop queries for not/neg/fneg.

llvm-svn: 345036
2018-10-23 15:05:12 +00:00
Sanjay Patel
e4ec3df7e4 [InstCombine] swap select profile metadata when swapping select ops
llvm-svn: 345034
2018-10-23 14:43:31 +00:00
Sanjay Patel
59f372e3e0 [InstCombine] add/move tests for select with inverted condition; NFC
The transform is broken in 2 ways - it doesn't correct metadata (or even drop it),
and it doesn't work with vectors with undef elements.

llvm-svn: 345033
2018-10-23 14:37:29 +00:00
Sanjay Patel
fce34b5b85 [SLSR] use 'match' to simplify code; NFC
This pass could probably be modified slightly to allow
vector splat transforms for practically no cost, but
it only works on scalars for now. So the use of the
newer 'match' API should make no functional difference. 

llvm-svn: 345030
2018-10-23 14:07:39 +00:00
Sanjay Patel
5b20aa2364 [SLSR] auto-generate full test assertions; NFC
llvm-svn: 345028
2018-10-23 13:39:40 +00:00