1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 20:43:44 +02:00
Commit Graph

424 Commits

Author SHA1 Message Date
Quentin Colombet
ca131ef450 [AArch64] Run a peephole pass right after AdvSIMD pass.
The AdvSIMD pass may produce copies that are not coalescer-friendly. The
peephole optimizer knows how to fix that as demonstrated in the test case.

<rdar://problem/12702965>

llvm-svn: 216200
2014-08-21 18:10:07 +00:00
Juergen Ributzka
83c6e645d7 [FastISel][AArch64] Remove redundant test.
These tests and many more are already covered by fast-isel-addressing-modes.ll.

llvm-svn: 216186
2014-08-21 16:40:05 +00:00
Jiangning Liu
c14e7de948 Revert r216066, "Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type".
llvm-svn: 216147
2014-08-21 01:59:30 +00:00
Juergen Ributzka
7c8f6aa104 [FastISel][AArch64] Don't fold the sign-/zero-extend from i1 into the compare.
This fixes a bug I introduced in a previous commit (r216033). Sign-/Zero-
extension from i1 cannot be folded into the ADDS/SUBS instructions. Instead both
operands have to be sign-/zero-extended with separate instructions.

Related to <rdar://problem/17913111>.

llvm-svn: 216073
2014-08-20 16:34:15 +00:00
Jiangning Liu
c3dd378a9e Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type
legalization stage. With those two optimizations, fewer signed/zero extension
instructions can be inserted, and then we can expose more opportunities to
Machine CSE pass in back-end.

llvm-svn: 216066
2014-08-20 12:05:15 +00:00
Yi Kong
57329b1cc2 ARM: Fix codegen for rbit intrinsic
LLVM generates illegal `rbit r0, #352` instruction for rbit intrinsic.
According to ARM ARM, rbit only takes register as argument, not immediate.
The correct instruction should be rbit <Rd>, <Rm>.

The bug was originally introduced in r211057.

Differential Revision: http://reviews.llvm.org/D4980

llvm-svn: 216064
2014-08-20 10:40:20 +00:00
Juergen Ributzka
ce5953230a [FastISel][AArch64] Use the proper FMOV instruction to materialize a +0.0.
Use FMOVWSr/FMOVXDr instead of FMOVSr/FMOVDr, which have the proper register
class to be used with the zero register. This makes the MachineInstruction
verifier happy again.

This is related to <rdar://problem/18027157>.

llvm-svn: 216040
2014-08-20 01:10:36 +00:00
Juergen Ributzka
21b19be38f [FastISel][AArch64] Factor out ADDS/SUBS instruction emission and add support for extensions and shift folding.
Factor out the ADDS/SUBS instruction emission code into helper functions and
make the helper functions more clever to support most of the different ADDS/SUBS
instructions the architecture support. This includes better immedediate support,
shift folding, and sign-/zero-extend folding.

This fixes <rdar://problem/17913111>.

llvm-svn: 216033
2014-08-19 22:29:55 +00:00
Juergen Ributzka
233cb7bf1a [FastISel][AArch64] Extend floating-point materialization test.
This adds the missing test that I promised for r215753 to test the
materialization of the floating-point value +0.0.

Related to <rdar://problem/18027157>.

llvm-svn: 216019
2014-08-19 20:35:07 +00:00
Juergen Ributzka
f39a032c8b Reapply [FastISel][AArch64] Add support for more addressing modes (r215597).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.

For Example:
  lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
  ldr x0, [x0, x1]

  sxtw x1, w1
  lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
  ldr x0, [x0, x1]

llvm-svn: 216013
2014-08-19 19:44:17 +00:00
Juergen Ributzka
1cb2d0a61e Reapply [FastISel][AArch64] Make use of the zero register when possible (r215591).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
This change materializes now the value "0" from the zero register.
The zero register can be folded by several instruction, so no
materialization is need at all.

Fixes <rdar://problem/17924413>.

llvm-svn: 216009
2014-08-19 19:44:02 +00:00
Juergen Ributzka
9261bd7fe9 [FastISel][AArch64] Fix a few BuildMI callsites where the result register was added as an operand register.
This fixes a few BuildMI callsites where the result register was added by
using addReg, which is per default a use and therefore an operand register.

Also use the zero register as result register when emitting a compare
instruction (SUBS with unused result register).

llvm-svn: 215997
2014-08-19 17:41:53 +00:00
Oliver Stannard
0f36700d69 Teach the AArch64 backend to handle f16
This allows the AArch64 backend to handle fadd, fsub, fmul and fdiv
operations on f16 (half-precision) types by promoting to f32.

llvm-svn: 215891
2014-08-18 14:22:39 +00:00
Oliver Stannard
159a549ea3 [ARM,AArch64] Do not tail-call to an externally-defined function with weak linkage
Externally-defined functions with weak linkage should not be
tail-called on ARM or AArch64, as the AAELF spec requires normal calls
to undefined weak functions to be replaced with a NOP or jump to the
next instruction. The behaviour of branch instructions in this
situation (as used for tail calls) is implementation-defined, so we
cannot rely on the linker replacing the tail call with a return.

llvm-svn: 215890
2014-08-18 12:42:15 +00:00
Amara Emerson
03cfd262eb [AArch64] Narrow arguments passed in wrong position on the stack in
big-endian mode.

Patch by Asiri Rathnayake.

Differential Revision: http://reviews.llvm.org/D4922

llvm-svn: 215716
2014-08-15 14:29:57 +00:00
Juergen Ributzka
a981de1e50 Revert several FastISel commits to track down a buildbot error.
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."

llvm-svn: 215673
2014-08-14 19:56:28 +00:00
Juergen Ributzka
49e924e64e Revert "[FastISel][AArch64] Add support for more addressing modes."
This reverts commits r215597, because it might have broken the build bots.

llvm-svn: 215659
2014-08-14 17:10:54 +00:00
Akira Hatanaka
87c53cc314 [AArch64, fast-isel] Fall back to SelectionDAG to select tail calls.
Certain functions such as objc_autoreleaseReturnValue have to be called as
tail-calls even at -O0. Since normal fast-isel doesn't emit calls as tail calls,
we have to fall back to SelectionDAG to select calls that are marked as tail.

<rdar://problem/17991614>

llvm-svn: 215600
2014-08-13 23:23:58 +00:00
Juergen Ributzka
1428168ab7 [FastISel][AArch64] Add support for more addressing modes.
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.

For Example:
  lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
  ldr x0, [x0, x1]

  sxtw x1, w1
  lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
  ldr x0, [x0, x1]

llvm-svn: 215597
2014-08-13 22:53:29 +00:00
Juergen Ributzka
410902c24a [FastISel][AArch64] Make use of the zero register when possible.
This change materializes now the value "0" from the zero register.
The zero register can be folded by several instruction, so no
materialization is need at all.

Fixes <rdar://problem/17924413>.

llvm-svn: 215591
2014-08-13 22:13:14 +00:00
Gerolf Hoflehner
59d9e5a101 [MachineCombiner] Fix for ICE bug 20598
The combiner ignored DBG nodes when checking
the uses of a virtual register.

It combined a sequence like
   %vreg1 = madd %vreg2, %vreg3,...
   DBG_VALUE (%vreg1 ...)
   %vreg4 = add %vreg1,...
to
  %vreg4 = madd %vreg2, %vreg3

leaving behind a dangling DBG_VALUE with
a definition. This triggered an assertion
in the MachineTraceMetrics.cpp module.

llvm-svn: 215431
2014-08-12 07:54:12 +00:00
Quentin Colombet
03e2418837 [AArch64] Fix registerAllocator assigns same register for base and wback in
pre/post-index load and store.

Patch by Steven Wu <stevenwu@apple.com>

llvm-svn: 215390
2014-08-11 21:39:53 +00:00
Jiangning Liu
b923ade46b In Machine CSE pass, the source register of a COPY machine instruction can
be propagated to all its users, and this propagation could increase the 
probability of finding common subexpressions. If the COPY has only one user,
the COPY itself can be removed.

llvm-svn: 215344
2014-08-11 05:17:19 +00:00
Jiangning Liu
264179daf4 [AArch64] Fix a type conversion bug for anlyzing compare.
The bug can cause spec2006/483.xalancbmk failure.

Patched by David Xu.

llvm-svn: 215206
2014-08-08 14:19:29 +00:00
James Molloy
c2be58dbb0 [AArch64] Add an FP load balancing pass for Cortex-A57
For best-case performance on Cortex-A57, we should try to use a balanced mix of odd and even D-registers when performing a critical sequence of independent, non-quadword FP/ASIMD floating-point multiply or multiply-accumulate operations.

This pass attempts to detect situations where the register allocation may adversely affect this load balancing and to change the registers used so as to better utilize the CPU.

Ideally we'd just take each multiply or multiply-accumulate in turn and allocate it alternating even or odd registers. However, multiply-accumulates are most efficiently performed in the same functional unit as their accumulation operand. Therefore this pass tries to find maximal sequences ("Chains") of multiply-accumulates linked via their accumulation operand, and assign them all the same "color" (oddness/evenness).

This optimization affects S-register and D-register floating point multiplies and FMADD/FMAs, as well as vector (floating point only) muls and FMADD/FMA. Q register instructions (and 128-bit vector instructions) are not affected.

llvm-svn: 215199
2014-08-08 12:33:21 +00:00
Tim Northover
43f2f4770b AArch64: stop trying to take control of all UnknownArch triples.
This short-circuited our error reporting for incorrectly specified
target triples (you'd get AArch64 code instead).

Should fix PR20567.

llvm-svn: 215191
2014-08-08 08:27:44 +00:00
Akira Hatanaka
6d09dc6814 [stack protector] Look through bitcasts to get global variable
__stack_chk_guard.

Handle the case where the pointer operand of the load instruction that loads the
stack guard is not a global variable but instead a bitcast.

%StackGuard = load i8** bitcast (i64** @__stack_chk_guard to i8**)
call void @llvm.stackprotector(i8* %StackGuard, i8** %StackGuardSlot)

Original test case provided by Ana Pazos.

This fixes PR20558.

llvm-svn: 215167
2014-08-07 23:08:24 +00:00
Gerolf Hoflehner
35c59e408d MachineCombiner Pass for selecting faster instruction sequence on AArch64
Re-commit of r214832,r21469 with a work-around that
avoids the previous problem with gcc build compilers

The work-around is to use SmallVector instead of ArrayRef
of basic blocks in preservesResourceLen()/MachineCombiner.cpp

llvm-svn: 215151
2014-08-07 21:40:58 +00:00
James Molloy
7518c61a09 [AArch64] Add a testcase for r214957.
llvm-svn: 214965
2014-08-06 13:31:32 +00:00
Yi Kong
a05145f812 AArch64: Add support for instruction prefetch intrinsic
Instruction prefetch is not implemented for AArch64, it is incorrectly
translated into data prefetch instruction.

Differential Revision: http://reviews.llvm.org/D4777

llvm-svn: 214860
2014-08-05 12:46:47 +00:00
Juergen Ributzka
ec5a9526be [FastISel][AArch64] Implement the FastLowerArguments hook.
This implements basic argument lowering for AArch64 in FastISel. It only
handles a small subset of the C calling convention. It supports simple
arguments that can be passed in GPR and FPR registers.

This should cover most of the trivial cases without falling back to
SelectionDAG.

This fixes <rdar://problem/17890986>.

llvm-svn: 214846
2014-08-05 05:43:48 +00:00
Kevin Qin
2c9a6f5e83 Revert "r214832 - MachineCombiner Pass for selecting faster instruction"
It broke compiling of most Benchmark and internal test, as clang got
clashed by segmentation fault or assertion.

llvm-svn: 214845
2014-08-05 05:43:47 +00:00
Juergen Ributzka
20218b6fa0 [FastISel][AArch64] Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended.
llvm-svn: 214844
2014-08-05 05:43:44 +00:00
Gerolf Hoflehner
40e09fb7dd MachineCombiner Pass for selecting faster instruction
sequence on AArch64

Re-commit of r214669 without changes to test cases
LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and
LLVM:: CodeGen/AArch64/dp-3source.ll
This resolves the reported compfails of the original commit.

llvm-svn: 214832
2014-08-05 01:16:13 +00:00
Juergen Ributzka
f39fd7e490 [FastISel][AArch64] Fix shift lowering for i8 and i16 value types.
This fix changes the parameters #r and #s that are passed to the UBFM/SBFM
instruction to get the zero/sign-extension for free.

The original problem was that the shift left would use the 32-bit shift even for
i8/i16 value types, which could leave the upper bits set with "garbage" values.

The arithmetic shift right on the other side would use the wrong MSB as sign-bit
to determine what bits to shift into the value.

This fixes <rdar://problem/17907720>.

llvm-svn: 214788
2014-08-04 21:49:51 +00:00
Chad Rosier
d843d13525 [AArch64] Extend the number of scalar instructions supported in the AdvSIMD
scalar integer instruction pass.

This is a patch I had lying around from a few months ago.  The pass is
currently disabled by default, so nothing to interesting.

llvm-svn: 214779
2014-08-04 21:20:25 +00:00
Kevin Qin
348a8dd760 Revert "r214669 - MachineCombiner Pass for selecting faster instruction"
This commit broke "make check" for several hours, so get it reverted.

llvm-svn: 214697
2014-08-04 05:10:33 +00:00
Gerolf Hoflehner
265dd68643 MachineCombiner Pass for selecting faster instruction
sequence -  AArch64 target support

 This patch turns off madd/msub generation in the DAGCombiner and generates
 them in the MachineCombiner instead. It replaces the original code sequence
 with the combined sequence when it is beneficial to do so.

 When there is no machine model support it always generates the madd/msub
 instruction. This is true also when the objective is to optimize for code
 size: when the combined sequence is shorter is always chosen and does not
 get evaluated.

 When there is a machine model the combined instruction sequence
 is evaluated for critical path and resource length using machine
 trace metrics and the original code sequence is replaced when it is
 determined to be faster.

 rdar://16319955

llvm-svn: 214669
2014-08-03 22:03:40 +00:00
James Molloy
b4d821ca5b Update test to use a more modern AArch64 triple, as requested by Renato.
llvm-svn: 214637
2014-08-02 17:15:11 +00:00
James Molloy
240087c61a [AArch64] Teach DAGCombiner that converting two consecutive loads into a vector load is not a good transform when paired loads are available.
The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers!

llvm-svn: 214634
2014-08-02 14:51:24 +00:00
Juergen Ributzka
bb4322fe8b [FastISel][AArch64] Fold offset into the memory operation.
Fold simple offsets into the memory operation:
  add x0, x0, #8
  ldr x0, [x0]
-->
  ldr x0, [x0, #8]

Fixes <rdar://problem/17887945>.

llvm-svn: 214545
2014-08-01 19:40:16 +00:00
Juergen Ributzka
5fad1ddebc [FastISel][AArch64] Add branch weights.
Add branch weights to branch instructions, so that the following passes can
optimize based on it (i.e. basic block ordering).

Fixes <rdar://problem/17887137>.

llvm-svn: 214537
2014-08-01 18:39:24 +00:00
Chad Rosier
f6bd4ee872 [AArch64] Fix test from r214518 in an attempt to appease buildbots.
llvm-svn: 214521
2014-08-01 15:30:41 +00:00
Chad Rosier
69caabf908 [AArch64] Generate tbz/tbnz when comparing against zero.
The tbz/tbnz checks the sign bit to convert

op w1, w1, w10
cmp w1, #0
b.lt .LBB0_0

to

op w1, w1, w10
tbnz w1, #31, .LBB0_0

Differential Revision: http://reviews.llvm.org/D4440

llvm-svn: 214518
2014-08-01 14:48:56 +00:00
Juergen Ributzka
1686869897 [FastISel][AArch64] Fix the immediate versions of the {s|u}{add|sub}.with.overflow intrinsics.
ADDS and SUBS cannot encode negative immediates or immediates larger than 12bit.
This fix checks if the immediate version can be used under this constraints and
if we can convert ADDS to SUBS or vice versa to support negative immediates.

Also update the test cases to test the immediate versions.

llvm-svn: 214470
2014-08-01 01:25:55 +00:00
Juergen Ributzka
ed5bbc5130 [FastISel][AArch64] Add basic bitcast support for conversion between float and int.
Fixes <rdar://problem/17867078>.

llvm-svn: 214389
2014-07-31 06:25:37 +00:00
Juergen Ributzka
35e07f4cd0 [FastISel][AArch64] Add sqrt intrinsic support.
Fixes <rdar://problem/17867067>.

llvm-svn: 214388
2014-07-31 06:25:33 +00:00
Juergen Ributzka
4e074494e8 [FastISel][AArch64] Update and enable patchpoint and stackmap intrinsic tests for FastISel.
This commit updates the existing SelectionDAG tests for the stackmap and patchpoint
intrinsics and enables FastISel testing. It also splits up the tests into separate
files, due to different codegen between SelectionDAG and FastISel.

llvm-svn: 214382
2014-07-31 04:10:43 +00:00
Juergen Ributzka
622f41919a [FastISel][AArch64] Add MachO large code model support for function calls.
Currently the large code model for MachO uses the GOT to make function calls.
Emit the required adrp and ldr instructions to load the address from the GOT.

Related to <rdar://problem/17733076>.

llvm-svn: 214381
2014-07-31 04:10:40 +00:00
Juergen Ributzka
d1fc9d2924 [FastISel][AArch64] Add select folding support for the XALU intrinsics.
This improves the code generation for the XALU intrinsics when the
condition is feeding a select instruction.

This also updates and enables the XALU unit tests for FastISel.

This fixes <rdar://problem/17831117>.

llvm-svn: 214350
2014-07-30 22:04:37 +00:00