1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 19:12:56 +02:00
Commit Graph

71 Commits

Author SHA1 Message Date
Tim Northover
8de0d64b13 AArch64/ARM64: mark fmul intrinsic as commutative.
This gives DAG patterns matching indexed patterns where either side is an
indexed vector.

llvm-svn: 206875
2014-04-22 10:10:14 +00:00
Hao Liu
6daf5ecff4 Fix an infinite loop bug in DAG Combine about keeping transfering between ANY_EXTEND and SIGN_EXTEND.
llvm-svn: 206873
2014-04-22 09:57:06 +00:00
Quentin Colombet
df419d429a [CodeGenPrepare] Use APInt to check the value of the immediate in a and
while checking candidate for bit field extract.
Otherwise the value may not fit in uint64_t and this will trigger an
assertion.

This fixes PR19503.

llvm-svn: 206834
2014-04-22 01:20:34 +00:00
Yi Jiang
b1c450606d ARM64: Combine shifts and uses from different basic block to bit-extract instruction
llvm-svn: 206774
2014-04-21 19:34:27 +00:00
Michael Zolotukhin
c7f992f9a3 Reapply r206732. This time without optimization of branches.
llvm-svn: 206749
2014-04-21 12:01:33 +00:00
Chandler Carruth
164ee32140 Revert r206732 which is causing llc to crash on most of the build bots.
Original commit message:
  Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN,
  safe.srem.iN, safe.urem.iN (iN = i8, i61, i32, or i64).

llvm-svn: 206735
2014-04-21 07:11:15 +00:00
Michael Zolotukhin
f5ebd83e24 Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN, safe.srem.iN,
safe.urem.iN (iN = i8, i16, i32, or i64).

llvm-svn: 206732
2014-04-21 05:33:09 +00:00
Chad Rosier
0edf159537 [ARM64] Ports the Cortex-A53 Machine Model description from AArch64.
Summary:
This port includes the rudimentary latencies that were provided for
the Cortex-A53 Machine Model in the AArch64 backend. It also changes
the SchedAlias for COPY in the Cyclone model to an explicit
WriteRes mapping to avoid conflicts in other subtargets.

Differential Revision: http://reviews.llvm.org/D3427
Patch by Dave Estes <cestes@codeaurora.org>!

llvm-svn: 206652
2014-04-18 21:22:04 +00:00
Tim Northover
51c071ce73 AArch64/ARM64: add more NEON tests.
Mostly no testing this time, since they were just wrangling
target-specific intrinsics.

llvm-svn: 206613
2014-04-18 14:54:53 +00:00
Tim Northover
6da78986c9 ARM64: disable generation of .loh directives outside MachO.
Part of PR19455.

llvm-svn: 206611
2014-04-18 14:54:46 +00:00
Tim Northover
f264cc2d60 ARM64: don't emit .subsections_via_symbols on ELF.
Part of PR19455.

llvm-svn: 206610
2014-04-18 14:54:41 +00:00
Tim Northover
f3e0ceb127 ARM64: add extra NEG pattern.
llvm-svn: 206609
2014-04-18 14:54:35 +00:00
Tim Northover
c327e9b2cf AArch64/ARM64: port more AArch64 tests to ARM64.
llvm-svn: 206592
2014-04-18 13:16:55 +00:00
Tim Northover
56351e91d9 AArch64/ARM64: add non-scalar lowering for more FCVT operations.
llvm-svn: 206591
2014-04-18 13:16:42 +00:00
Tim Northover
de9624364e AArch64/ARM64: improve spotting of EXT instructions from VECTOR_SHUFFLE.
We couldn't cope if the first mask element was UNDEF before, which
isn't ideal.

llvm-svn: 206588
2014-04-18 12:50:58 +00:00
Tim Northover
23ca911ed1 AArch64/ARM64: spot a greater variety of concat_vector operations.
Code mostly copied from AArch64, just tidied up a trifle and plumbed
into the ARM64 way of doing things.

This also enables the AArch64 tests which inspired the previous
untested commits.

llvm-svn: 206574
2014-04-18 09:31:27 +00:00
Tim Northover
1584cfd1c5 ARM64: implement cunning optimisation from AArch64
A vector extract followed by a dup can become a single instruction even if the
types don't match. AArch64 handled this in ISelLowering, but a few reasonably
simple patterns can take care of it in TableGen, so that's where I've put it.

llvm-svn: 206573
2014-04-18 09:31:20 +00:00
Louis Gerbarg
7dd641b366 Make test/CodeGen/ARM64/vector-insertion.ll explicitly select neon syntax
Change the command line vector-insertion.ll to explicitly set the neon syntax
to apple so that buildbots that default to other syntaxes won't fail.

llvm-svn: 206502
2014-04-17 21:32:41 +00:00
Louis Gerbarg
b9b4d34ddb Improve ARM64 vector creation
This patch improves the performance of vector creation in caseiswhere where
several of the lanes in the vector are a constant floating point value. It
also includes new patterns to fold together some of the instructions when the
value is 0.0f. Test cases included.

rdar://16349427

llvm-svn: 206496
2014-04-17 20:51:50 +00:00
Jim Grosbach
a428f4ce70 ARM64: [su]xtw use W regs as inputs, not X regs.
Update the SXT[BHW]/UXTW instruction aliases and the shifted reg addressing
mode handling.

PR19455 and rdar://16650642

llvm-svn: 206495
2014-04-17 20:47:31 +00:00
Tim Northover
77edcc9a3a ARM64: switch to IR-based atomic operations.
Goodbye code!

(Game: spot the bug fixed by the change).

llvm-svn: 206490
2014-04-17 20:00:33 +00:00
Tim Northover
d47e9a6e0d ARM64: add acquire/release versions of the existing atomic intrinsics.
These will be needed to support IR-level lowering of atomic
operations.

llvm-svn: 206489
2014-04-17 20:00:24 +00:00
Adam Nemet
3430c6a131 [ARM64] Fix "Cannot select" for vector ctpop
The commit of r205855:

Author: Arnold Schwaighofer <aschwaighofer@apple.com>
Date:   Wed Apr 9 14:20:47 2014 +0000

    SLPVectorizer: Only vectorize intrinsics whose operands are widened equally

    The vectorizer only knows how to vectorize intrinics by widening all operands by
    the same factor.

    Patch by Tyler Nowicki!

exposed a backend bug causing a regression (Cannot select ctpop).

The commit msg is a bit confusing because the patch actually changes the
behavior for the loop-vectorizer as well.  As things got refactored into a
helper ctpop got snuck in to the trivially-vectorizable helper which is now
used by both vectorizers.  In other words, we started seeing vector-ctpops in
the backend.

This change makes ctpop LegalizeAction::Expand for the types not supported by
the byte-only CNT instruction.  We may be able to custom-lower these later to
a single CNT but this is to fix the compiler crash first.

Fixes <rdar://problem/16578951>

llvm-svn: 206433
2014-04-17 01:01:37 +00:00
Tim Northover
e2f8ef0ee7 AArch64/ARM64: port some NEON tests to ARM64
These ones used completely different sets of intrinsics, so the only way to do
it is create a separate ARM64 copy and change them all.

Other than that, CodeGen was straightforward, no deficiencies detected here.

llvm-svn: 206392
2014-04-16 15:28:02 +00:00
Tim Northover
881dc32be3 ARM64: specify triple so that Linux tests pass
Now that Linux is trying to reparse all inline asm it chokes on the different
comment character in this test.

llvm-svn: 206382
2014-04-16 12:03:56 +00:00
Tim Northover
9be7d39fd7 ARM64: use 32-bit moves for constants where possible.
If we know that a particular 64-bit constant has all high bits zero, then we
can rely on the fact that 32-bit ARM64 instructions automatically zero out the
high bits of an x-register. This gives the expansion logic less constraints to
satisfy and so sometimes allows it to pick better sequences.

Came up while porting test/CodeGen/AArch64/movw-consts.ll: this will allow a
32-bit MOVN to be used in @test8 soon.

llvm-svn: 206379
2014-04-16 11:52:51 +00:00
Tim Northover
0ee9eb6ded ARM64: explicitly ask for Apple NEON syntax so test passes on Linux
llvm-svn: 206368
2014-04-16 09:13:44 +00:00
Tim Northover
0610b92b97 ARM64: mark x7 as used when an i128 gets shunted onto the stack.
The second half of a split i128 was ending up in x7, which is not a good thing.

This is another part of PR19432.

llvm-svn: 206366
2014-04-16 09:03:25 +00:00
Tim Northover
dcc9d1cb89 DAGCombiner: don't optimise non-existant litpool load
This particular DAG combine is designed to kick in when both ConstantFPs will
end up being loaded via a litpool, however those nodes have a semi-legal
status, dictated by isFPImmLegal so in some cases there wouldn't have been a
litpool in the first place. Don't try to be clever in those circumstances.

Picked up while merging some AArch64 tests.

llvm-svn: 206365
2014-04-16 09:03:09 +00:00
Quentin Colombet
d23a7863fe [ARM64] Set default CPU to generic instead of cyclone.
llvm-svn: 206313
2014-04-15 19:08:46 +00:00
Robert Lougher
eaacc6b6ba Revert r191049/r191059 as it can produce wrong code (see PR17975).
It has already been reverted on the 3.4 branch in r196521.

llvm-svn: 206311
2014-04-15 18:34:24 +00:00
Tim Northover
19b6db94a6 ARM64: add constraints to various FastISel operations
llvm-svn: 206284
2014-04-15 13:59:53 +00:00
Louis Gerbarg
3eecd92d3d Fix for codegen bug that could cause illegal cmn instruction generation
In rare cases the dead definition elimination pass code can cause illegal cmn
instructions when it replaces dead registers on instructions that use
unmaterialized frame indexes. This patch disables the dead definition
optimization for instructions which include frame index operands.

rdar://16438284

llvm-svn: 206208
2014-04-14 21:05:05 +00:00
Louis Gerbarg
c06e27404e Add a flag to disable the ARM64DeadRegisterDefinitionsPass
This patch adds a -arm64-dead-def-elimination flag so that it is possible to
disable dead definition elimination. Includes test case.

llvm-svn: 206207
2014-04-14 21:05:02 +00:00
Tim Northover
1ddf6e933e ARM64: remove buggy REV16 pattern.
The 32-bit pattern is still valid: 0123 -> 3210 -> 1032.

llvm-svn: 206172
2014-04-14 12:59:52 +00:00
Hal Finkel
f4336e3866 Add the ability to use GEPs for address sinking in CGP
The current memory-instruction optimization logic in CGP, which sinks parts of
the address computation that can be adsorbed by the addressing mode, does this
by explicitly converting the relevant part of the address computation into
IR-level integer operations (making use of ptrtoint and inttoptr). For most
targets this is currently not a problem, but for targets wishing to make use of
IR-level aliasing analysis during CodeGen, the use of ptrtoint/inttoptr is a
problem for two reasons:
  1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr
  2. In cases where type-punning was used, and BasicAA was used
     to override TBAA, BasicAA may no longer do so. (this had forced us to disable
     all use of TBAA in CodeGen; something which we can now enable again)

This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled by
default (except for those targets that use AA during CodeGen), and so aside
from some PowerPC subtargets and SystemZ, there should be no change in
behavior. We may be able to switch completely away from the ptrtoint/inttoptr
sinking on all targets, but further testing is required.

I've doubled-up on a number of existing tests that are sensitive to the
address sinking behavior (including some store-merging tests that are
sensitive to the order of the resulting ADD operations at the SDAG level).

llvm-svn: 206092
2014-04-12 00:59:48 +00:00
Louis Gerbarg
e84c6abeec Add ARM64 CLS patterns
This patch adds patterns to generate the cls instruction ARM64. Includes tests
for 64 bit and 32 bit operands.

rdar://15611957

llvm-svn: 206079
2014-04-11 22:27:58 +00:00
Quentin Colombet
95c5120626 [DAGCombiner] DAG combine does not know how to combine indexed loads with
sign/zero/any extensions. However a few places were not checking properly the
property of the load and were turning an indexed load into a regular extended
load. Therefore the indexed value was lost during the process and this was
triggering an assertion.

<rdar://problem/16389332>

llvm-svn: 205923
2014-04-09 20:03:05 +00:00
Alp Toker
111bd28e59 Fix some doc and comment typos
llvm-svn: 205899
2014-04-09 14:47:27 +00:00
Bradley Smith
c33112b3a6 [ARM64] Rename LR to the UAL-compliant 'X30'.
llvm-svn: 205885
2014-04-09 14:43:59 +00:00
Bradley Smith
8923d46955 [ARM64] Rename FP to the UAL-compliant 'X29'.
llvm-svn: 205884
2014-04-09 14:43:50 +00:00
Tim Northover
06d767ccba ARM64: scalarize v1i64 mul operation
This is the second part of fixing PR19367.

llvm-svn: 205836
2014-04-09 07:07:02 +00:00
Tim Northover
2dd1fcef1d ARM64: add pattern for <1 x i64> custom not node.
This should fix PR19367.

llvm-svn: 205835
2014-04-09 06:55:39 +00:00
Juergen Ributzka
ef9790c95c [Constant Hoisting][ARM64] Enable constant hoisting for ARM64.
This implements the target-hooks for ARM64 to enable constant hoisting.

This fixes <rdar://problem/14774662> and <rdar://problem/16381500>.

llvm-svn: 205791
2014-04-08 20:39:59 +00:00
Tim Northover
0fdbe96fff ARM64: fix fmsub patterns which assumed accum operand was first
Confusingly, the NEON fmla instructions put the accumulator first but the
scalar versions put it at the end (like the fma lib function & LLVM's
intrinsic).

This should fix PR19345, assuming there's only one issue.

llvm-svn: 205758
2014-04-08 12:23:51 +00:00
Tim Northover
9ea26aa436 DAGLegalize: add last-ditch type-legalization for VSELECT.
When LLVM sees something like (v1iN (vselect v1i1, v1iN, v1iN)) it can
decide that the result is OK (v1i64 is legal on AArch64, for example)
but it still need scalarising because of that v1i1. There was no code
to do this though.

AArch64 and ARM64 have DAG combines to produce efficient code and
prevent that occuring in *most* such situations, but there are edge
cases that they miss. This adds a legalization to cope with that.

llvm-svn: 205626
2014-04-04 14:49:30 +00:00
Tim Northover
421793ce9a ARM64: handle v1i1 types arising from setcc properly.
There were several overlapping problems here, and this solution is
closely inspired by the one adopted in AArch64 in r201381.

Firstly, scalarisation of v1i1 setcc operations simply fails if the
input types are legal. This is fixed in LegalizeVectorTypes.cpp this
time, and allows AArch64 code to be simplified slightly.

Second, vselect with such a setcc feeding into it ends up in
ScalarizeVectorOperand, where it's not handled. I experimented with an
implementation, but found that whatever DAG came out was rather
horrific. I think Hao's DAG combine approach is a good one for
quality, though there are edge cases it won't catch (to be fixed
separately).

Should fix PR19335.

llvm-svn: 205625
2014-04-04 14:49:21 +00:00
Tim Northover
d1d0ccfca1 ARM64: use regalloc-friendly COPY_TO_REGCLASS for bitcasts
The previous patterns directly inserted FMOV or INS instructions into
the DAG for scalar_to_vector & bitconvert patterns. This is horribly
inefficient and can generated lots more GPR <-> FPR register traffic
than necessary.

It's much better to emit instructions the register allocator
understands so it can coalesce the copies when appropriate.

It led to at least one ISelLowering hack to avoid the problems, which
was incorrect for v1i64 (FPR64 has no dsub). It can now be removed
entirely.

This should also fix PR19331.

llvm-svn: 205616
2014-04-04 09:03:09 +00:00
Tim Northover
95abd1f95a ARM64: add 128-bit MLA operations to the custom selection code.
Without this change, the llvm_unreachable kicked in. The code pattern
being spotted is rather non-canonical for 128-bit MLAs, but it can
happen and there's no point in generating sub-optimal code for it just
because it looks odd.

Should fix PR19332.

llvm-svn: 205615
2014-04-04 09:03:02 +00:00
Lang Hames
9b407b8dd3 [ARM64] Teach the ARM64DeadRegisterDefinition pass to respect implicit-defs.
When rematerializing through truncates, the coalescer may produce instructions
with dead defs, but live implicit-defs of subregs:
E.g.
  %X1<def,dead> = MOVi64imm 2, %W1<imp-def>; %X1:GPR64, %W1:GPR32

These instructions are live, and their definitions should not be rewritten.

Fixes <rdar://problem/16492408>

llvm-svn: 205565
2014-04-03 20:51:08 +00:00