1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00
Commit Graph

930 Commits

Author SHA1 Message Date
Ahmed Bougacha
a2af74a580 [AArch64] Don't blindly lower f16/f128 FCCMPs.
Instead, extend f16 (like we do when lowering a standalone SETCC),
and let f128 be legalized to the RT calls.

Fixes PR26803.

llvm-svn: 263301
2016-03-11 22:02:58 +00:00
Quentin Colombet
8cb0c9cc03 [IRTranslator] Translate unconditional branches.
llvm-svn: 263265
2016-03-11 17:28:03 +00:00
Tim Northover
594e5f0364 AArch64: only try to use scaled fcvt ops on legal vector types.
Before we ended up calling getSimpleVectorType on a <3 x float>, which
asserted.

llvm-svn: 263169
2016-03-10 23:02:21 +00:00
Balaram Makam
635be27aa9 Fix testicase to turn buildbot green. NFC.
llvm-svn: 263154
2016-03-10 19:07:50 +00:00
Balaram Makam
a2655214ad [AArch64] Optimize compare and branch sequence when the compare's constant operand is power of 2
Summary:
Peephole optimization that generates a single TBZ/TBNZ instruction
for test and branch sequences like in the example below. This handles
the cases that miss folding of AND into TBZ/TBNZ during ISelLowering of BR_CC

Examples:
   and  w8, w8, #0x400
   cbnz w8, L1
 to
   tbnz w8, #10, L1

Reviewers: MatzeB, jmolloy, mcrosier, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17942

llvm-svn: 263136
2016-03-10 17:54:55 +00:00
Roman Levenstein
26a5894f0f Add support for a preserve_most calling convention to the AArch64 backend.
This change adds a support for a preserve_most calling convention to the AArch64 backend, similar to how it was done for X86-64.

There is also a subsequent patch on top of this one to add a tail-calls support for this calling convention.

Differential Revision: http://reviews.llvm.org/D18016

llvm-svn: 263092
2016-03-10 04:35:09 +00:00
Chad Rosier
5c9777923f [AArch64] Disable the MI scheduler to turn bots green after r262942.
llvm-svn: 262944
2016-03-08 17:33:34 +00:00
Quentin Colombet
dc021403fd [AArch64][GlobalISel] Add a test case for the IRTranslator.
llvm-svn: 262898
2016-03-08 01:48:08 +00:00
Sanjay Patel
0d79c5acf8 [AArch64] fold 'isPositive' vector integer operations (PR26819)
This is one of the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26819

Shift and negate is what InstCombine prefers to produce (and I tried to make it do more of that
in http://reviews.llvm.org/rL262424 ), so we should recognize that pattern as something that might
come from autovectorization even if it's unlikely to be produced from C NEON intrinsics.

The patch is based on the x86 equivalent:
http://reviews.llvm.org/rL262036

Differential Revision: http://reviews.llvm.org/D17834

llvm-svn: 262623
2016-03-03 15:56:08 +00:00
Renato Golin
c777279486 Making rem_crash.ll target-specific
This test failed in some ARM bots after a divmod change because it was
running on a native llc, instead of targeted one. This makes sure the test
is target-specific (as intended), and also copies to ARM and AArch64
directories. If it is also supposed to work on other architectures, I'll
leave as an exercise to the respective maintainers.

llvm-svn: 262620
2016-03-03 14:01:10 +00:00
Sanjay Patel
5151ce438a [AArch64] add tests to demonstrate existing codegen for PR26819
llvm-svn: 262540
2016-03-02 23:22:03 +00:00
Geoff Berry
8e0ed2340c [AArch64] Enable non-leaf frame pointer elimination.
Summary:
This change enables frame pointer elimination in non-leaf functions.
The -fomit-frame-pointer option still needs to be used when compiling
via clang (or an equivalent method of not setting the
'no-frame-pointer-elim*' function attributes if generating llvm IR via
some other method) to take advantage of this optimization.

This change should be NFC when compiling via clang without
-fomit-frame-pointer.

Reviewers: t.p.northover

Subscribers: aemerson, rengolin, tberghammer, qcolombet, llvm-commits, danalbert, mcrosier, srhines

Differential Revision: http://reviews.llvm.org/D17730

llvm-svn: 262495
2016-03-02 17:58:31 +00:00
Geoff Berry
c28f755d4b Revert "[AArch64] Fix isLegalAddImmediate() to return true for valid negative values."
Revert r262248 in an attempt to fix the clang-native-aarch64-full
    bot and to investigate a performance regression in
    SingleSource/Benchmarks/CoyoteBench/huffbench

llvm-svn: 262388
2016-03-01 20:28:52 +00:00
Geoff Berry
0df55f3489 [AArch64] Fix isLegalAddImmediate() to return true for valid negative values.
Reviewers: t.p.northover, jmolloy

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D17463

llvm-svn: 262248
2016-02-29 19:53:22 +00:00
Cong Hou
1cb5dabaaa Fix a bug in isVectorReductionOp() in SelectionDAGBuilder.cpp that may cause assertion failure on AArch64.
llvm-svn: 262091
2016-02-26 23:25:30 +00:00
Paul Robinson
4103ca4c54 Reapply r262054 with triple fix.
llvm-svn: 262069
2016-02-26 21:18:34 +00:00
Paul Robinson
837d0b5fec Revert r262054 on one file that fails sometimes.
llvm-svn: 262060
2016-02-26 20:41:07 +00:00
Paul Robinson
119632fa81 Fix tests that used CHECK-NEXT-NOT and CHECK-DAG-NOT.
FileCheck actually doesn't support combo suffixes.

Differential Revision: http://reviews.llvm.org/D17588

llvm-svn: 262054
2016-02-26 19:40:34 +00:00
Junmo Park
e684ff35db [CodeGenPrepare] Remove load-based heuristic
Summary:
Both the hardware and LLVM have changed since 2012.
Now, load-based heuristic don't show big differences any more on OoO cores.

There is no notable regressons and improvements on spec2000/2006. (Cortex-A57, Core i5).

Reviewers: spatel, zansari
    
Differential Revision: http://reviews.llvm.org/D16836

llvm-svn: 261809
2016-02-25 00:23:27 +00:00
Geoff Berry
5377924081 [AArch64] Generate csinv instruction more often
Reviewers: t.p.northover, jmolloy

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17546

llvm-svn: 261675
2016-02-23 19:34:13 +00:00
Geoff Berry
a37df81842 [AArch64] Fix fastcc -tailcallopt epilog code generation.
Summary:
Fix a bug in epilog generation where the incoming stack arguments were
not being popped for fastcc functions when -tailcallopt was passed.

Reviewers: t.p.northover, mcrosier, jmolloy, rengolin

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16894

llvm-svn: 261650
2016-02-23 16:54:36 +00:00
Geoff Berry
986cabdd71 [AArch64][ShrinkWrap] Fix bug in prolog clobbering live reg when shrink wrapping.
Summary: See bug https://llvm.org/bugs/show_bug.cgi?id=26642

Reviewers: qcolombet, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17350

llvm-svn: 261349
2016-02-19 18:27:32 +00:00
Matthias Braun
2bbf522cfc LegalizeDAG: Fix ExpandFCOPYSIGN assuming the same type on both inputs
llvm-svn: 261306
2016-02-19 04:44:19 +00:00
Lawrence Hu
405ada2ef1 Bug fix: use dyn_cast_or_null instead of dyn_cast
Differential Revision: http://reviews.llvm.org/D17154

llvm-svn: 261299
2016-02-19 02:17:07 +00:00
Justin Lebar
975bf7a977 When printing MIR, output to errs() rather than outs().
Summary:
Without this, this command

  $ llvm-run llc -stop-after machine-cp -o - <( echo '' )

outputs an error, because we close stdout twice -- once when closing the
file opened for "-o", and again when closing outs().

Also clarify in the outs() definition that you can't ever call it if you
want to open your own raw_fd_ostream on stdout.

Reviewers: jroelofs, tstellarAMD

Subscribers: jholewinski, qcolombet, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D17422

llvm-svn: 261286
2016-02-19 00:18:46 +00:00
Tim Northover
2fedd64241 AArch64: always clear kill flags up to last eliminated copy
After r261154, we were only clearing flags if the known-zero register was
originally live-in to the basic block, but we have to do it even if not when
more than one COPY has been eliminated, otherwise the user of the first COPY
may still have <kill> marked.

E.g.

BB#N:
    %X0 = COPY %XZR
    STRXui %X0<kill>, <fi#0>
    %X0 = COPY %XZR
    STRXui %X0<kill>, <fi#1>

We can eliminate both copies, X0 is not live-in, but we must clear the kill on
the first store.

Unfortunately, I've been unable to come up with a non-fragile test for this.
I've only seen it in the wild with regalloc-created spills, and attempts to
reproduce that in a reasonable way run afoul of COPY coalescing. Even volatile
asm clobbers were moved around. Should fix the aarch64 bot though.

llvm-svn: 261175
2016-02-17 23:07:04 +00:00
Tim Northover
b44c9536bf AArch64: improve redundant copy elimination.
Mostly, this fixes the bug that if the CBZ guaranteed Xn but Wn was used, we
didn't sort out the use-def chain properly.

I've also made it check more than just the last instruction for a compatible
CBZ (so it can cope without fallthroughs). I'd have liked to do that
separately, but it's helps writing the test.

Finally, I removed some custom loops in favour of MachineInstr helpers and
refactored the control flow to flatten it and avoid possibly quadratic
iterations in blocks with many copies. NFC for these, just a general tidy-up.

llvm-svn: 261154
2016-02-17 21:16:53 +00:00
Jun Bum Lim
bf77014eda [AArch64] Add pass to remove redundant copy after RA
Summary:
This change will add a pass to remove unnecessary zero copies in target blocks
of cbz/cbnz instructions. E.g., the copy instruction in the code below can be
removed because the cbz jumps to BB1 when x0 is zero :
  BB0:
    cbz x0, .BB1
  BB1:
    mov x0, xzr

Jun

Reviewers: gberry, jmolloy, HaoLiu, MatzeB, mcrosier

Subscribers: mcrosier, mssimpso, haicheng, bmakam, llvm-commits, aemerson, rengolin

Differential Revision: http://reviews.llvm.org/D16203

llvm-svn: 261004
2016-02-16 20:02:39 +00:00
Geoff Berry
2d034feb0d [AArch64] Reduce number of callee-save save/restores.
Summary:
Before this change, callee-save registers would be rounded up to even
pairs of GPRs and FPRs.  This change eliminates these extra padding
load/stores, though it does keep the stack allocation the same size
unless both the GPR and FPR sets have an odd size, in which case one
full pair stack slot (16 bytes) is saved.

This optimization cannot currently be done for MachO targets since they
rely on a fast-path .debug_frame equivalent that can only encode
callee-save registers as pairs.

Reviewers: t.p.northover, rengolin, mcrosier, jmolloy

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17000

llvm-svn: 260689
2016-02-12 16:31:41 +00:00
Chad Rosier
81da1b9bcf [AArch64] Add support for Qualcomm Kryo CPU.
Machine model description by Dave Estes <cestes@codeaurora.org>.

llvm-svn: 260686
2016-02-12 15:51:51 +00:00
Jun Bum Lim
2e0485683a [AArch64] Merge two adjacent str WZR into str XZR
Summary:
This change merges adjacent 32 bit zero stores into a 64 bit zero store.
e.g.,
  str wzr, [x0]
  str wzr, [x0, #4]
becomes
  str xzr, [x0]

Therefore, four adjacent 32 bit zero stores will be a single stp.
e.g.,
  str wzr, [x0]
  str wzr, [x0, #4]
  str wzr, [x0, #8]
  str wzr, [x0, #12]
becomes
  stp xzr, xzr, [x0]

Reviewers: mcrosier, jmolloy, gberry, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16933

llvm-svn: 260682
2016-02-12 15:25:39 +00:00
Chad Rosier
0febca6df8 [AArch64] Improve load/store optimizer to handle LDUR + LDR.
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

This is a reapplication of r259812, which had an incorrect assert.  The
test_stur_str_no_assert() test is a reduced version of the issue hit in
the AArch64 self-host.

PR24465

llvm-svn: 260523
2016-02-11 14:25:08 +00:00
Geoff Berry
16f8ead77d [AArch64] AArch64LoadStoreOptimizer: fix bug in pre-inc check iterator
Summary:
Fix case where a pre-inc/dec load/store would not be formed if the
add/sub that forms the inc/dec part of the operation was the first
instruction in the block being examined.

Reviewers: mcrosier, jmolloy, t.p.northover, junbuml

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16785

llvm-svn: 260275
2016-02-09 20:47:21 +00:00
Tim Northover
1dabed8e29 AArch64: match correct order in subtraction pattern.
The accumulator in multiply-and-subtract instructions is actually subtracted
*from* so these patterns were computing the wrong value.

llvm-svn: 260131
2016-02-08 19:33:18 +00:00
Matt Arsenault
34d57039a9 SelectionDAG: Lower some range metadata to AssertZext
If a range has a lower bound of 0, add an AssertZext from the
nearest floor power of two.

This allows operations with some workitem intrinsics with known
maximum ranges to use fast 24-bit multiplies.

llvm-svn: 260109
2016-02-08 16:28:19 +00:00
Renato Golin
c3580d7a36 Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3)."
This reverts commit r259812 as it broke AArch64 self-hosting.

llvm-svn: 259881
2016-02-05 12:14:30 +00:00
Chad Rosier
b8b5852fe4 [AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3).
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

This is a reapplication of r246769 and r259790.  The tramp3d failure was caused
by an incorrect refactoring in the patch.  Specifically, we weren't always
properly clearing the SExtIdx flag.

llvm-svn: 259812
2016-02-04 18:59:49 +00:00
Silviu Baranga
22ab3adc5c [AArch64] Multiply extended 32-bit ints with `[U|S]MADDL'
During instruction selection, the AArch64 backend can recognise the
following pattern and generate an [U|S]MADDL instruction, i.e. a
multiply of two 32-bit operands with a 64-bit result:

(mul (sext i32), (sext i32))
However, when one of the operands is constant, the sign extension
gets folded into the constant in SelectionDAG::getNode(). This means
that the instruction selection sees this:

(mul (sext i32), i64)
...which doesn't match the pattern. Sign-extension and 64-bit
multiply instructions are generated, which are slower than one 32-bit
multiply.

Add a pattern to match this and generate the correct instruction, for
both signed and unsigned multiplies.

Patch by Chris Diamand!

llvm-svn: 259800
2016-02-04 16:47:09 +00:00
Chad Rosier
fcca55983b Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."
This reverts commit r259790. tramp3d-v4 is still having problems.

llvm-svn: 259795
2016-02-04 16:01:40 +00:00
Chad Rosier
52d5d7b161 [AArch64] Improve load/store optimizer to handle LDUR + LDR.
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

This is a reapplication of r246769, which was reverted in r246782 due to a
test-suite failure.  I'm unable to reproduce the issue at this time.

llvm-svn: 259790
2016-02-04 14:42:55 +00:00
Jonas Paulsson
99afcac2ad [ScheduleDAGInstrs::buildSchedGraph()] Handling of memory dependecies rewritten.
Recommited, after some fixing with test cases.

Updated test cases:
test/CodeGen/AArch64/arm64-misched-memdep-bug.ll
test/CodeGen/AArch64/tailcall_misched_graph.ll

Temporarily disabled test cases:
test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll
test/CodeGen/PowerPC/ppc64-fastcc.ll (partially updated)
test/CodeGen/PowerPC/vsx-fma-m.ll
test/CodeGen/PowerPC/vsx-fma-sp.ll

http://reviews.llvm.org/D8705
Reviewers: Hal Finkel, Andy Trick.

llvm-svn: 259673
2016-02-03 17:52:29 +00:00
Balaram Makam
a423972bd5 AArch64: Implement missed conditional compare sequences.
Summary:
This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.

Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB

Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry

Differential Revision: http://reviews.llvm.org/D16291

llvm-svn: 259387
2016-02-01 19:13:07 +00:00
Ahmed Bougacha
94c2d7d066 [AArch64] Fix i64 nontemporal high-half extraction.
Since we only have pair - not single - nontemporal store instructions,
we have to extract the high part into a separate register to be able
to use them.

When the initial nontemporal codegen support was added, I wrote the
extract using the nonsensical UBFX [0,32[.
Use the correct LSR form instead.

llvm-svn: 259134
2016-01-29 01:08:41 +00:00
Junmo Park
f494fe59f5 [DAGCombiner] Don't add volatile or indexed stores to ChainedStores
Summary:
findBetterNeighborChains does not handle volatile or indexed stores.
However, it did not check when adding stores to ChainedStores.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D16463

llvm-svn: 259024
2016-01-28 06:23:33 +00:00
Dan Gohman
a72e83c26e [MC] Use .p2align instead of .align
For historic reasons, the behavior of .align differs between targets.
Fortunately, there are alternatives, .p2align and .balign, which make the
interpretation of the parameter explicit, and which behave consistently across
targets.

This patch teaches MC to use .p2align instead of .align, so that people reading
code for multiple architectures don't have to remember which way each platform
does its .align directive.

Differential Revision: http://reviews.llvm.org/D16549

llvm-svn: 258750
2016-01-26 00:03:25 +00:00
Matthias Braun
0892910f16 AArch64ISel: Fix ccmp code selection matching deep expressions.
Some of the conditions necessary to produce ccmp sequences were only
checked in recursive calls to emitConjunctionDisjunctionTree() after
some of the earlier expressions were already built. Move all checks over
to isConjunctionDisjunctionTree() so they are all checked before we
start emitting instructions.

Also rename some variable to better reflect their usage.

llvm-svn: 258605
2016-01-23 04:05:22 +00:00
Ahmed Bougacha
8af301da92 [AArch64] Cleanup ccmp test check labels. NFC.
llvm-svn: 258541
2016-01-22 20:02:26 +00:00
Ahmed Bougacha
1c71a2aac6 [AArch64] Lower 2-CC FCCMPs (one/ueq) using AND'ed CCs.
The current behavior is incorrect, as the two CCs returned by
changeFPCCToAArch64CC, intended to be OR'ed, are instead used
in an AND ccmp chain.

Consider:
define i32 @t(float %a, float %b, float %c, float %d, i32 %e, i32 %f) {
  %cc1 = fcmp one float %a, %b
  %cc2 = fcmp olt float %c, %d
  %and = and i1 %cc1, %cc2
  %r = select i1 %and, i32 %e, i32 %f
  ret i32 %r
}

Assuming (%a < %b) and (%c < %d); we used to do:
  fcmp  s0, s1            # nzcv <- 1000
  orr   w8, wzr, #0x1     # w8 <- 1
  csel  w9, w8, wzr, mi   # w9 <- 1
  csel  w8, w8, w9, gt    # w8 <- 1
  fcmp  s2, s3            # nzcv <- 1000
  cset   w9, mi           # w9 <- 1
  tst    w8, w9           # (w8 & w9) == 1, so: nzcv <- 0000
  csel  w0, w0, w1, ne    # w0 <- w0

We now do:
  fcmp  s2, s3            # nzcv <- 1000
  fccmp s0, s1, #0, mi    #  mi, so: nzcv <- 1000
  fccmp s0, s1, #8, le    # !le, so: nzcv <- 1000
  csel  w0, w0, w1, pl    # !pl, so: w0 <- w1

In other words, we transformed:
  (c < d) &&  ((a < b) || (a > b))
into:
  (c < d) &&   (a u>= b) && (a u<= b)
whereas, per De Morgan's, we wanted:
  (c < d) && !((a u>= b) && (a u<= b))

Note that this problem doesn't occur in the test-suite.

changeFPCCToAArch64CC produces disjunct CCs; here, one -> mi/gt.
We can't represent that in the fccmp chain; it can't express
arbitrary OR sequences, as one comment explains:
  In general we can create code for arbitrary "... (and (and A B) C)"
  sequences.  We can also implement some "or" expressions, because
  "(or A B)" is equivalent to "not (and (not A) (not B))" and we can
  implement some  negation operations. [...] However there is no way
  to negate the result of a partial sequence.

Instead, introduce changeFPCCToANDAArch64CC, which produces the
conjunct cond codes:
- (a one b)
    == ((a olt b) || (a ogt b))
    == ((a ord b) && (a une b))
- (a ueq b)
    == ((a uno b) || (a oeq b))
    == ((a ule b) && (a uge b))

Note that, at first, one might think that, when PushNegate is true,
we should use the disjunct CCs, in effect doing:
  (a || b)
  = !(!a && !(b))
  = !(!a && !(b1 || b2))  <- changeFPCCToAArch64CC(b, b1, b2)
  = !(!a && !b1 && !b2)

However, we can take advantage of the fact that the CC is already
negated, which lets us avoid special-casing PushNegate and doing
the simpler to reason about:

  (a || b)
  = !(!a && (!b))
  = !(!a && (b1 && b2))   <- changeFPCCToANDAArch64CC(!b, b1, b2)
  = !(!a && b1 && b2)

This makes both emitConditionalCompare cases behave identically,
and produces correct ccmp sequences for the 2-CC fcmps.

llvm-svn: 258533
2016-01-22 19:43:54 +00:00
Pirama Arumuga Nainar
2e5b2b3d41 Do not lower VSETCC if operand is an f16 vector
Summary:
SETCC with f16 vectors has OperationAction set to Expand but still gets
lowered to FCM* intrinsics based on its result type.  This patch skips
lowering of VSETCC if the operand is an f16 vector.

v4 and v8 tests included.

Reviewers: ab, jmolloy

Subscribers: srhines, llvm-commits

Differential Revision: http://reviews.llvm.org/D15361

llvm-svn: 258471
2016-01-22 01:16:57 +00:00
Manman Ren
177654f861 CXX_FAST_TLS calling convention: fix issue on AArch64.
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.

I will commit fixes to other platforms as well.

PR26136

llvm-svn: 257929
2016-01-15 20:13:28 +00:00