1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 13:02:52 +02:00
Commit Graph

12040 Commits

Author SHA1 Message Date
Simon Pilgrim
ce1717761f Added test case for PR22678 (check CONCAT_VECTORS DAG combiner pass doesn't introduce illegal types)
llvm-svn: 230386
2015-02-24 21:46:23 +00:00
Andrew Kaylor
914fdbf483 Fixing eol-style
llvm-svn: 230378
2015-02-24 20:49:35 +00:00
Eric Christopher
db420cc1dd Revert:
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Mon Feb 23 23:04:28 2015 +0000

    Fix based on post-commit comment on D7816 & rL230177 - BUILD_VECTOR operand truncation was using the the BV's output scalar type instead of the input type.

and

Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Sun Feb 22 18:17:28 2015 +0000

    [DagCombiner] Generalized BuildVector Vector Concatenation

    The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.

    This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.

    This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.

    Differential Revision: http://reviews.llvm.org/D7816

as the root cause of PR22678 which is causing an assertion inside the DAG combiner.

I'll follow up to the main thread as well.

llvm-svn: 230358
2015-02-24 19:11:00 +00:00
Matthias Braun
45ead6c770 AArch64: Relax assert about large shift sizes.
The reason why these large shift sizes happen is because OpaqueConstants
currently inhibit alot of DAG combining, but that has to be addressed in
another commit (like the proposal in D6946).

Differential Revision: http://reviews.llvm.org/D6940

llvm-svn: 230355
2015-02-24 18:52:04 +00:00
Tom Stellard
266ed1f528 R600/SI: Remove isel mubuf legalization
We legalize mubuf instructions post-instruction selection, so this
code is no longer needed.

llvm-svn: 230352
2015-02-24 17:59:19 +00:00
Tim Northover
afcf85da25 ARM: treat [N x i32] and [N x i64] as AAPCS composite types
The logic is almost there already, with our special homogeneous aggregate
handling. Tweaking it like this allows front-ends to emit AAPCS compliant code
without ever having to count registers or add discarded padding arguments.

Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to
apply the logic to all integer arrays for more consistency.

llvm-svn: 230348
2015-02-24 17:22:34 +00:00
Hans Wennborg
fb5af71543 Revert r230280: "Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap"
This caused PR22674, failing this assert:

Instructions.h:2281: llvm::Value* llvm::PHINode::getOperand(unsigned int) const: Assertion `i_nocapture < OperandTraits<PHINode>::operands(this) && "getOperand() out of range!"' failed.

llvm-svn: 230341
2015-02-24 16:19:29 +00:00
Michael Kuperstein
0bcb785609 [x32] x32 should use ebx as the base pointer.
This fixes the original issue in PR22655, but not the secondary one.

llvm-svn: 230334
2015-02-24 15:27:13 +00:00
Reed Kotler
05834965d0 Beginning of alloca implementation for Mips fast-isel
Summary: Begin to add various address modes; including alloca.

Test Plan: Make sure there are no regressions in test-suite at O0/02 in mips32r1/r2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: echristo, rfuhler, llvm-commits

Differential Revision: http://reviews.llvm.org/D6426

llvm-svn: 230300
2015-02-24 02:36:45 +00:00
David Majnemer
f29778c502 X86: Only use 'lea' in Win64 epilogues if a frame pointer exists
We can only use 'add' in epilogues, 'lea' is not permitted unless we've
established a frame pointer in the prologue.

llvm-svn: 230286
2015-02-24 00:11:32 +00:00
Sanjoy Das
9dddcf7f33 Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.

This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.

Apart from the attached test case, another (more realistic) manifestation
of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll.

NOTE: this change was landed with an incorrect commit message in
rL230275 and was reverted for that reason in rL230279.  This commit
message is the correct one.

Differential Revision: http://reviews.llvm.org/D7778

llvm-svn: 230280
2015-02-23 23:22:58 +00:00
Sanjoy Das
e5ca05754e Revert 230275.
230275 got committed with an incorrect commit message due to a mixup
on my side.  Will re-land in a few moments with the correct commit
message.

llvm-svn: 230279
2015-02-23 23:13:22 +00:00
Andrea Di Biagio
fc5bcff188 [X86] Teach how to custom lower double-to-half conversions under fast-math.
This patch teaches the backend how to expand a double-half conversion into
a double-float conversion immediately followed by a float-half conversion.
We do this only under fast-math, and if float-half conversions are legal
for the target.

Added test CodeGen/X86/fastmath-float-half-conversion.ll

Differential Revision: http://reviews.llvm.org/D7832

llvm-svn: 230276
2015-02-23 22:59:02 +00:00
Sanjoy Das
421baaa45f Fix bug 22641
The bug was a result of getPreStartForExtend interpreting nsw/nuw
flags on an add recurrence more strongly than is legal.  {S,+,X}<nsw>
implies S+X is nsw only if the backedge of the loop is taken at least
once.

Differential Revision: http://reviews.llvm.org/D7808

llvm-svn: 230275
2015-02-23 22:55:13 +00:00
David Majnemer
b6ac6360c4 X86: Use a smaller 'mov' instruction for stack probe calls
Prologue emission, in some cases, requires calls to a stack probe helper
function.  The amount of stack to probe is passed as a register
argument in the Win64 ABI but the instruction sequence used is
pessimistic: it assumes that the number of bytes to probe is greater
than 4 GB.

Instead, select a more appropriate opcode depending on the number of
bytes we are going to probe.

llvm-svn: 230270
2015-02-23 21:50:30 +00:00
David Majnemer
5bffee412d X86: Use 'mov' instead of 'lea' in Win64 SEH prologues when possible
'mov' and 'lea' are equivalent when the displacement applied with 'lea'
is zero.  However, 'mov' should encode smaller.

llvm-svn: 230269
2015-02-23 21:50:27 +00:00
Bruno Cardoso Lopes
a8c6f58e91 [X86][MMX] Fix test to reflect current codegen
This test failed in several buildbots, a bit unclear how that happen
since this was the previous behavior before r230248.

llvm-svn: 230258
2015-02-23 20:57:46 +00:00
Andrew Kaylor
a496164a47 Adding test for Windows EH frame variable remapping.
llvm-svn: 230250
2015-02-23 20:04:51 +00:00
Andrew Kaylor
4e424eb088 Remap frame variables for native Windows exception handling.
Differential Revision: http://reviews.llvm.org/D7770

llvm-svn: 230249
2015-02-23 20:01:56 +00:00
Bruno Cardoso Lopes
390bb4572c Revert "[X86][MMX] Add MMX instructions to foldable tables"
This reverts commit r230226 since it breaks win buildbots.

llvm-svn: 230248
2015-02-23 19:53:37 +00:00
Daniel Sanders
0eca0162a3 [mips] Honour -mno-odd-spreg for vector insert/extract when MSA is enabled.
Summary:
-mno-odd-spreg prohibits the use of odd-numbered single-precision floating
point registers. However, vector insert/extract was still using them when
manipulating the subregisters of an MSA register. Fixed this by ensuring
that insertion/extraction is only performed on even-numbered vector
registers when -mno-odd-spreg is given.

Reviewers: vmedic, sstankovic

Reviewed By: sstankovic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7672

llvm-svn: 230235
2015-02-23 17:22:16 +00:00
Bruno Cardoso Lopes
01a01f5b4a [X86] Add specific mtriple in order to appease builbots
llvm-svn: 230229
2015-02-23 15:33:40 +00:00
Bruno Cardoso Lopes
ae9a9b4601 [X86][MMX] Add MMX instructions to foldable tables
Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.

llvm-svn: 230226
2015-02-23 15:23:22 +00:00
Bruno Cardoso Lopes
8fe8621194 [X86][MMX] Support folding loads in psll, psrl and psra intrinsics
llvm-svn: 230225
2015-02-23 15:23:14 +00:00
Bruno Cardoso Lopes
8b65bbca30 [X86][MMX] Add tests for pslli, psrli and psrai intrinsics
Add tests to cover the RR form of the pslli, psrli and psrai intrinsics.
In the next commit, the loads are going to be folded and the
instructions use the RM form.

llvm-svn: 230224
2015-02-23 15:23:06 +00:00
Elena Demikhovsky
b15d81ba19 AVX-512: recommitted 229837 + bugfix + test
llvm-svn: 230223
2015-02-23 15:12:31 +00:00
Simon Pilgrim
fb2ad0f430 [DagCombiner] Generalized BuildVector Vector Concatenation
The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.

This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.

This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.

Differential Revision: http://reviews.llvm.org/D7816

llvm-svn: 230177
2015-02-22 18:17:28 +00:00
Matt Arsenault
18256b1e8b R600/SI: Use v_madmk_f32
llvm-svn: 230149
2015-02-21 21:29:10 +00:00
Matt Arsenault
b126670f9a R600/SI: Try to use v_madak_f32
This is a code size optimization when the constant
only has one use.

llvm-svn: 230148
2015-02-21 21:29:07 +00:00
Simon Pilgrim
1794ce916b [X86][SSE] Added shuffle based integer zero extension tests.
llvm-svn: 230145
2015-02-21 21:25:16 +00:00
David Majnemer
15414a7819 Win64: Stack alignment constraints aren't applied during SET_FPREG
Stack realignment occurs after the prolog, not during, for Win64.
Because of this, don't factor in the maximum stack alignment when
establishing a frame pointer.

This fixes PR22572.

llvm-svn: 230113
2015-02-21 01:04:47 +00:00
Rafael Espindola
2661d3d69a Use short names for jumptable sections.
Also refactor code to remove some duplication.

llvm-svn: 230087
2015-02-20 23:28:28 +00:00
Matt Arsenault
d59f9a3d0d R600/SI: Remove v_sub_f64 pseudo
The expansion code does the same thing. Since
the operands were not defined with the correct
types, this has the side effect of fixing operand
folding since the expanded pseudo would never use
SGPRs or inline immediates.

llvm-svn: 230072
2015-02-20 22:10:45 +00:00
Matt Arsenault
82f523f0fa R600: Use new fmad node.
This enables a few useful combines that used to only
use fma.

Also since v_mad_f32 apparently does not support denormals,
disable the existing cases that are custom handled if they are
requested.

llvm-svn: 230071
2015-02-20 22:10:41 +00:00
Jozef Kolek
6455b0cb7f Reversed revision 229706. The reason is regression, which is caused by the
usage of instruction ADDU16 by CodeGen. For this instruction an improper
register is allocated, i.e. the register that is not from register set defined
for the instruction.

llvm-svn: 230053
2015-02-20 20:26:52 +00:00
Andrea Di Biagio
bd0813ded2 [X86][FastIsel] Teach how to select float-half conversion intrinsics.
This patch teaches X86FastISel how to select intrinsic 'convert_from_fp16' and
intrinsic 'convert_to_fp16'.
If the target has F16C, we can select VCVTPS2PHrr for a float-half conversion,
and VCVTPH2PSrr for a half-float conversion.

Differential Revision: http://reviews.llvm.org/D7673

llvm-svn: 230043
2015-02-20 19:37:14 +00:00
Kit Barton
e41ef2390a I incorrectly marked the VORC instruction as isCommutable when I added it.
This fix removes the VORC instruction definition from the isCommutable block.

Phabricator review: http://reviews.llvm.org/D7772

llvm-svn: 230020
2015-02-20 15:54:58 +00:00
Hal Finkel
03acdd5b32 [PowerPC] Loop Data Prefetching for the BG/Q
The IBM BG/Q supercomputer's A2 cores have a hardware prefetching unit, the
L1P, but it does not prefetch directly into the A2's L1 cache. Instead, it
prefetches into its own L1P buffer, and the latency to access that buffer is
significantly higher than that to the L1 cache (although smaller than the
latency to the L2 cache). As a result, especially when multiple hardware
threads are not actively busy, explicitly prefetching data into the L1 cache is
advantageous.

I've been using this pass out-of-tree for data prefetching on the BG/Q for well
over a year, and it has worked quite well. It is enabled by default only for
the BG/Q, but can be enabled for other cores as well via a command-line option.

Eventually, we might want to add some TTI interfaces and move this into
Transforms/Scalar (there is nothing particularly target dependent about it,
although only machines like the BG/Q will benefit from its simplistic
strategy).

llvm-svn: 229966
2015-02-20 05:08:21 +00:00
Chandler Carruth
ef5f7bdec3 [x86] Remove the old vector shuffle lowering code and its flag.
The new shuffle lowering has been the default for some time. I've
enabled the new legality testing by default with no really blocking
regressions. I've fuzz tested this very heavily (many millions of fuzz
test cases have passed at this point). And this cleans up a ton of code.
=]

Thanks again to the many folks that helped with this transition. There
was a lot of work by others that went into the new shuffle lowering to
make it really excellent.

In case you aren't using a diff algorithm that can handle this:
  X86ISelLowering.cpp: 22 insertions(+), 2940 deletions(-)

llvm-svn: 229964
2015-02-20 04:25:04 +00:00
Chandler Carruth
55eca885d4 [x86] Now that the new vector shuffle legality is enabled and everything
is going well, remove the flag and the code for the old legality tests.

This is the first step toward removing the entire old vector shuffle
lowering. *Much* more code to delete coming up next.

llvm-svn: 229963
2015-02-20 03:59:35 +00:00
Chandler Carruth
6b218ec380 [x86] Make the new vector shuffle legality test on by default, which
reflects the fact that the x86 backend can in fact lower any shuffle you
want it to with reasonably high code quality.

My recent work on the new vector shuffle has made this regress *very*
little. The diff in the test cases makes me very, very happy.

llvm-svn: 229958
2015-02-20 03:05:47 +00:00
Chandler Carruth
74f60d7a7d [x86] Clean up a couple of test cases with the new update script. Split
one test case that is only partially tested in 32-bits into two test
cases so that the script doesn't generate massive spews of tests for the
cases we don't care about.

llvm-svn: 229955
2015-02-20 02:44:13 +00:00
Chandler Carruth
4c2aad4a26 Revert r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation
This doesn't pass 'ninja check-llvm' for me. Lots of tests, including
the ones updated, fail with crashes and other explosions.

llvm-svn: 229952
2015-02-20 02:15:36 +00:00
Reid Kleckner
13eabcdf32 EH: Prune unreachable resume instructions during Dwarf EH preparation
Today a simple function that only catches exceptions and doesn't run
destructor cleanups ends up containing a dead call to _Unwind_Resume
(PR20300). We can't remove these dead resume instructions during normal
optimization because inlining might introduce additional landingpads
that do have cleanups to run. Instead we can do this during EH
preparation, which is guaranteed to run after inlining.

Fixes PR20300.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7744

llvm-svn: 229944
2015-02-20 01:00:19 +00:00
Eric Christopher
c93875565e Revert "AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics."
The instructions were being generated on architectures that don't support avx512.

This reverts commit r229837.

llvm-svn: 229942
2015-02-20 00:45:28 +00:00
Ahmed Bougacha
5f490e6f09 [ARM] Re-re-apply VLD1/VST1 base-update combine.
This re-applies r223862, r224198, r224203, and r224754, which were
reverted in r228129 because they exposed Clang misalignment problems
when self-hosting.

The combine caused the crashes because we turned ISD::LOAD/STORE nodes
to ARMISD::VLD1/VST1_UPD nodes.  When selecting addressing modes, we
were very lax for the former, and only emitted the alignment operand
(as in "[r1:128]") when it was larger than the standard alignment of
the memory type.

However, for ARMISD nodes, we just used the MMO alignment, no matter
what.  In our case, we turned ISD nodes to ARMISD nodes, and this
caused the alignment operands to start being emitted.

And that's how we exposed alignment problems that were ignored before
(but I believe would have been caught with SCTRL.A==1?).

To fix this, we can just mirror the hack done for ISD nodes:  only
take into account the MMO alignment when the access is overaligned.

Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

rdar://19717869, rdar://14062261.

llvm-svn: 229932
2015-02-19 23:52:41 +00:00
Sanjay Patel
3da4617a80 add X86 load folding tests for unary math ops
X86 load folding is fragile; eg, the tests here
don't work without AVX even though they should. This
is because we have a mix of tablegen patterns that have
been added over time, and we have a load folding table
used by the peephole optimizer that has to be kept in 
sync with the ever-changing ISA and tablegen defs.

llvm-svn: 229870
2015-02-19 16:59:11 +00:00
Chandler Carruth
6d618f4a18 [x86] Delete still more piles of complex code now that we have a good
systematic lowering of v8i16.

This required a slight strategy shift to prefer unpack lowerings in more
places. While this isn't a cut-and-dry win in every case, it is in the
overwhelming majority. There are only a few places where the old
lowering would probably be a touch faster, and then only by a small
margin.

In some cases, this is yet another significant improvement.

llvm-svn: 229859
2015-02-19 15:21:57 +00:00
Chandler Carruth
fad94c932a [x86] Teach the unpack lowering how to lower with an initial unpack in
addition to lowering to trees rooted in an unpack.

This saves shuffles and or registers in many various ways, lets us
handle another class of v4i32 shuffles pre SSE4.1 without domain
crosses, etc.

llvm-svn: 229856
2015-02-19 15:06:13 +00:00
Chandler Carruth
6f2050671d [x86] Dramatically improve v8i16 shuffle lowering by not using its
terribly complex partial blend logic.

This code path was one of the more complex and bug prone when it first
went in and it hasn't faired much better. Ultimately, with the simpler
basis for unpack lowering and support bit-math blending, this is
completely obsolete. In the worst case without this we generate
different but equivalent instructions. However, in many cases we
generate much better code. This is especially true when blends or pshufb
is available.

This does expose one (minor) weakness of the unpack lowering that I'll
try to address.

In case you were wondering, this is actually a big part of what I've
been trying to pull off in the recent string of commits.

llvm-svn: 229853
2015-02-19 14:08:24 +00:00