mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00
Commit Graph

523 Commits

Author SHA1 Message Date
Yi Kong
a05145f812 AArch64: Add support for instruction prefetch intrinsic
Instruction prefetch is not implemented for AArch64; it is incorrectly
translated into a data prefetch instruction.

Differential Revision: http://reviews.llvm.org/D4777

llvm-svn: 214860
2014-08-05 12:46:47 +00:00
James Molloy
ea323a2876 Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost.
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.

llvm-svn: 214859
2014-08-05 12:30:34 +00:00
Juergen Ributzka
8e89773258 [FastIsel][AArch64] Fix previous commit r214844 (Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended.)
The original code would fail for unsupported value types like i1, i8, and i16.
This fix changes the code to create a sub-register copy only for i64 value
types; all other types (i1/i8/i16/i32) just use the source register without
any modification.

getRegClassFor() is now guarded by the i64 value type check, which guarantees
that we always request a register for a valid value type.

llvm-svn: 214848
2014-08-05 07:31:30 +00:00
Juergen Ributzka
ec5a9526be [FastISel][AArch64] Implement the FastLowerArguments hook.
This implements basic argument lowering for AArch64 in FastISel. It only
handles a small subset of the C calling convention. It supports simple
arguments that can be passed in GPR and FPR registers.

This should cover most of the trivial cases without falling back to
SelectionDAG.

This fixes <rdar://problem/17890986>.

llvm-svn: 214846
2014-08-05 05:43:48 +00:00
Kevin Qin
2c9a6f5e83 Revert "r214832 - MachineCombiner Pass for selecting faster instruction"
It broke compilation of most benchmarks and internal tests, as clang crashed
with segmentation faults or assertion failures.

llvm-svn: 214845
2014-08-05 05:43:47 +00:00
Juergen Ributzka
20218b6fa0 [FastISel][AArch64] Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended.
llvm-svn: 214844
2014-08-05 05:43:44 +00:00
Eric Christopher
67c04e77e5 Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier, and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine-based caching lookups from
the MachineFunction.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.

llvm-svn: 214838
2014-08-05 02:39:49 +00:00
Gerolf Hoflehner
40e09fb7dd MachineCombiner Pass for selecting faster instruction
sequence on AArch64

Re-commit of r214669 without changes to the test cases
LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and
LLVM::CodeGen/AArch64/dp-3source.ll.
This resolves the reported compile failures of the original commit.

llvm-svn: 214832
2014-08-05 01:16:13 +00:00
Juergen Ributzka
f39fd7e490 [FastISel][AArch64] Fix shift lowering for i8 and i16 value types.
This fix changes the parameters #r and #s that are passed to the UBFM/SBFM
instruction to get the zero/sign-extension for free.

The original problem was that the shift left would use the 32-bit shift even for
i8/i16 value types, which could leave the upper bits set with "garbage" values.

The arithmetic shift right, on the other hand, would use the wrong MSB as the
sign bit to determine which bits to shift into the value.

This fixes <rdar://problem/17907720>.

llvm-svn: 214788
2014-08-04 21:49:51 +00:00
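
As an illustration of the narrow-type shifts the fix above is about (not taken
from the commit; the function names are made up), a minimal C++ sketch:

  // A left shift of an i8 must not leave garbage in the upper bits of the
  // 32-bit register, and an i8 arithmetic shift right must treat bit 7,
  // not bit 31, as the sign bit.
  unsigned char shl_u8(unsigned char x) { return (unsigned char)(x << 3); }
  signed char   asr_s8(signed char x)   { return (signed char)(x >> 2); }
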
Eric Christopher
99307e99a2 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

llvm-svn: 214781
2014-08-04 21:25:23 +00:00
Chad Rosier
d843d13525 [AArch64] Extend the number of scalar instructions supported in the AdvSIMD
scalar integer instruction pass.

This is a patch I had lying around from a few months ago. The pass is
currently disabled by default, so nothing too interesting.

llvm-svn: 214779
2014-08-04 21:20:25 +00:00
Kevin Qin
348a8dd760 Revert "r214669 - MachineCombiner Pass for selecting faster instruction"
This commit broke "make check" for several hours, so revert it.

llvm-svn: 214697
2014-08-04 05:10:33 +00:00
Gerolf Hoflehner
265dd68643 MachineCombiner Pass for selecting faster instruction
sequence -  AArch64 target support

 This patch turns off madd/msub generation in the DAGCombiner and generates
 them in the MachineCombiner instead. It replaces the original code sequence
 with the combined sequence when it is beneficial to do so.

 When there is no machine model support, it always generates the madd/msub
 instruction. This also holds when the objective is to optimize for code
 size: the combined sequence is shorter, so it is always chosen and does not
 get evaluated.

 When there is a machine model the combined instruction sequence
 is evaluated for critical path and resource length using machine
 trace metrics and the original code sequence is replaced when it is
 determined to be faster.

 rdar://16319955

llvm-svn: 214669
2014-08-03 22:03:40 +00:00
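
For reference, the source-level pattern the MachineCombiner decides about is an
ordinary multiply-add; a minimal C++ example (illustrative only):

  // "a * b + c" can be selected as a single madd, or as a mul followed by
  // an add; with a machine model the MachineCombiner compares the two
  // sequences using critical-path and resource-length metrics.
  int muladd(int a, int b, int c) { return a * b + c; }
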
Juergen Ributzka
bb4322fe8b [FastISel][AArch64] Fold offset into the memory operation.
Fold simple offsets into the memory operation:
  add x0, x0, #8
  ldr x0, [x0]
-->
  ldr x0, [x0, #8]

Fixes <rdar://problem/17887945>.

llvm-svn: 214545
2014-08-01 19:40:16 +00:00
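
A hedged C++ sketch of code that produces the add+ldr pattern folded above
(the struct and function are invented for illustration):

  struct Pair { long long first, second; };

  // Loading p->second is conceptually "add x0, x0, #8; ldr x0, [x0]";
  // with the offset folded it becomes a single "ldr x0, [x0, #8]".
  long long load_second(const Pair *p) { return p->second; }
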
Juergen Ributzka
5fad1ddebc [FastISel][AArch64] Add branch weights.
Add branch weights to branch instructions, so that subsequent passes can
optimize based on them (e.g. basic block ordering).

Fixes <rdar://problem/17887137>.

llvm-svn: 214537
2014-08-01 18:39:24 +00:00
Renato Golin
5b6b134ddb Add missing breaks to AArch64InstrInfo::isGPRCopy
llvm-svn: 214528
2014-08-01 17:27:31 +00:00
Chad Rosier
69caabf908 [AArch64] Generate tbz/tbnz when comparing against zero.
The tbz/tbnz checks the sign bit to convert

op w1, w1, w10
cmp w1, #0
b.lt .LBB0_0

to

op w1, w1, w10
tbnz w1, #31, .LBB0_0

Differential Revision: http://reviews.llvm.org/D4440

llvm-svn: 214518
2014-08-01 14:48:56 +00:00
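
An illustrative C++ snippet for the pattern above; on_negative() is a
hypothetical external function used only to keep the branch around:

  extern void on_negative();

  // "sum < 0" only tests the sign bit of sum, so the "cmp w8, #0; b.lt"
  // sequence can become a single "tbnz w8, #31, ...".
  void check(int a, int b) {
    int sum = a + b;
    if (sum < 0)
      on_negative();
  }
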
Juergen Ributzka
1686869897 [FastISel][AArch64] Fix the immediate versions of the {s|u}{add|sub}.with.overflow intrinsics.
ADDS and SUBS cannot encode negative immediates or immediates larger than 12 bits.
This fix checks whether the immediate version can be used under these constraints
and whether we can convert ADDS to SUBS or vice versa to support negative immediates.

Also update the test cases to test the immediate versions.

llvm-svn: 214470
2014-08-01 01:25:55 +00:00
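
A sketch of the case being fixed, using the Clang/GCC builtin that lowers to
llvm.sadd.with.overflow; the constant -16 is just an arbitrary negative
immediate for illustration:

  // Overflow-checked add of a negative immediate: ADDS cannot encode #-16,
  // so the backend uses the equivalent SUBS with #16.
  bool add_overflows(int x, int *out) {
    return __builtin_sadd_overflow(x, -16, out);
  }
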
Louis Gerbarg
8048e52537 Make sure no loads resulting from load->switch DAGCombine are marked invariant
Currently, when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load, the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reorder normal loads.

This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.

llvm-svn: 214449
2014-07-31 21:45:05 +00:00
Aaron Ballman
e2e6979c9d Fixing a -Woverloaded-virtual warning by exposing the hidden virtual function as well. No functional changes intended.
llvm-svn: 214400
2014-07-31 12:58:50 +00:00
Juergen Ributzka
ed5bbc5130 [FastISel][AArch64] Add basic bitcast support for conversion between float and int.
Fixes <rdar://problem/17867078>.

llvm-svn: 214389
2014-07-31 06:25:37 +00:00
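
For illustration, a C++ function whose 4-byte memcpy Clang folds into an
IR-level bitcast between float and i32 (the function name is made up):

  #include <cstdint>
  #include <cstring>

  // The memcpy becomes a bitcast, which FastISel can now select as an
  // FPR-to-GPR move instead of falling back to SelectionDAG.
  std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
  }
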
Juergen Ributzka
35e07f4cd0 [FastISel][AArch64] Add sqrt intrinsic support.
Fixes <rdar://problem/17867067>.

llvm-svn: 214388
2014-07-31 06:25:33 +00:00
Juergen Ributzka
622f41919a [FastISel][AArch64] Add MachO large code model support for function calls.
Currently the large code model for MachO uses the GOT to make function calls.
Emit the required adrp and ldr instructions to load the address from the GOT.

Related to <rdar://problem/17733076>.

llvm-svn: 214381
2014-07-31 04:10:40 +00:00
Juergen Ributzka
f6e1a34d78 [FastISel][AArch64 and X86] Don't emit stores for UNDEF arguments during function call lowering.
UNDEF arguments are not meant to be touched - especially for the webkit_js
calling convention. This fix reproduces the already existing behavior of
SelectionDAG in FastISel.

llvm-svn: 214366
2014-07-31 00:11:11 +00:00
Juergen Ributzka
d1fc9d2924 [FastISel][AArch64] Add select folding support for the XALU intrinsics.
This improves the code generation for the XALU intrinsics when the
condition is feeding a select instruction.

This also updates and enables the XALU unit tests for FastISel.

This fixes <rdar://problem/17831117>.

llvm-svn: 214350
2014-07-30 22:04:37 +00:00
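
A rough C++ example of an XALU intrinsic feeding a select (identifiers are
illustrative; __builtin_smul_overflow lowers to llvm.smul.with.overflow):

  #include <climits>

  // The overflow flag feeds the ternary below; with select folding the
  // backend can generate a csel directly from the overflow condition
  // instead of materializing a boolean first.
  int saturating_mul(int a, int b) {
    int prod;
    bool overflow = __builtin_smul_overflow(a, b, &prod);
    return overflow ? INT_MAX : prod;
  }
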
Juergen Ributzka
6162229025 [FastISel][AArch64] Add branch folding support for the XALU intrinsics.
This improves the code generation for the XALU intrinsics when the
condition is feeding a branch instruction.

This is related to <rdar://problem/17831117>.

llvm-svn: 214349
2014-07-30 22:04:34 +00:00
Juergen Ributzka
b93a2a4457 [FastISel][AArch64] Add {s|u}{add|sub|mul}.with.overflow intrinsic support.
This commit adds support for the {s|u}{add|sub|mul}.with.overflow intrinsics.
The unit tests for FastISel will be enabled in a later commit, once there is
also branch and select folding support.

This is related to <rdar://problem/17831117>.

llvm-svn: 214348
2014-07-30 22:04:31 +00:00
Juergen Ributzka
f12a2ac4ea [FastISel][AArch64] Create helper functions to create the various multiplies on AArch64.
llvm-svn: 214346
2014-07-30 22:04:25 +00:00
Juergen Ributzka
9ceeaa8bbc [FastISel][AArch64] Add support for shift-immediate.
Currently the shift-immediate versions are not supported by tblgen. Hopefully
this code can be removed later, once the required support has been added to
tblgen.

llvm-svn: 214345
2014-07-30 22:04:22 +00:00
Jiangning Liu
86fc448354 Implement AArch64 TTI interface isAsCheapAsAMove.
llvm-svn: 214159
2014-07-29 02:09:26 +00:00
Matt Arsenault
76c7b7a591 Add alignment value to allowsUnalignedMemoryAccess
Rename to allowsMisalignedMemoryAccess.

On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.

llvm-svn: 214055
2014-07-27 17:46:40 +00:00
Tim Northover
5290b47db1 AArch64: fix conversion of 'J' inline asm constraints.
'J' represents a negative number suitable for an add/sub alias
instruction, but while preparing it to become an int64_t we were
mangling the sign extension. So "i32 -1" became 0xffffffffLL, for
example.

Should fix one half of PR20456.

llvm-svn: 214052
2014-07-27 07:10:29 +00:00
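
A hedged inline-asm sketch of the 'J' constraint (a negative add/sub
immediate). The template below is an assumption for illustration, not taken
from the commit; the point is that the constant -16 must reach the backend as
a negative int64_t rather than a zero-extended 32-bit value:

  // 'J' accepts a negative immediate suitable for an add/sub alias, so the
  // add below is really a subtract by 16 (assuming the assembler accepts
  // the negative-immediate alias form).
  long sub16(long x) {
    long r;
    asm("add %0, %1, %2" : "=r"(r) : "r"(x), "J"(-16L));
    return r;
  }
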
Akira Hatanaka
e9a7fadd46 [stack protector] Fix a potential security bug in stack protector where the
address of the stack guard was being spilled to the stack.

Previously the address of the stack guard would get spilled to the stack if it
was impossible to keep it in a register. This patch introduces a new
target-independent node and pseudo instruction which gets expanded post-RA into
a sequence of instructions that load the stack guard value. The register
allocator can now just rematerialize the value when it can't keep it in a
register.

<rdar://problem/12475629>

llvm-svn: 213967
2014-07-25 19:31:34 +00:00
Juergen Ributzka
d43a6ec447 [FastISel][AArch64] Add support for frameaddress intrinsic.
This commit implements the frameaddress intrinsic for the AArch64 architecture
in FastISel.

There were two test cases that pretty much tested the same thing, so I combined
them into a single test case.

Fixes <rdar://problem/17811834>

llvm-svn: 213959
2014-07-25 17:47:14 +00:00
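
The frameaddress intrinsic is what Clang emits for __builtin_frame_address; a
trivial example of code that can now stay in FastISel:

  // llvm.frameaddress with depth 0 reads the frame pointer (x29) on AArch64.
  void *current_frame() { return __builtin_frame_address(0); }
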
Benjamin Kramer
85476f43e5 Run sort_includes.py on the AArch64 backend.
No functionality change.

llvm-svn: 213938
2014-07-25 11:42:14 +00:00
Tim Northover
5be46ae3c3 AArch64: refactor ReconstructShuffle function
Quite a bit of cruft had accumulated as we realised the various different cases
it had to handle and squeezed them in where possible. This refactoring mostly
flattens the logic and special-cases. The result is slightly longer, but I
think clearer.

Should be no functionality change.

llvm-svn: 213867
2014-07-24 15:39:55 +00:00
NAKAMURA Takumi
b99eee32fb Prune redundant libdeps.
llvm-svn: 213857
2014-07-24 11:45:27 +00:00
NAKAMURA Takumi
dbff653a5a Update library dependencies.
llvm-svn: 213832
2014-07-24 02:10:42 +00:00
Kevin Qin
a272df5db2 [AArch64] Fix a bug generating an incorrect instruction when building a small vector.
This bug was introduced by r211144. The element type of the operand may be
smaller than the element type of the result, but the previous commit could only
handle the contrary condition. This commit handles this scenario and generates
optimized code like ZIP1.

llvm-svn: 213830
2014-07-24 02:05:42 +00:00
Jiangning Liu
a94c806f8b [AArch64] Disable some optimization cases for type conversion from sint to fp, because those optimization cases are micro-architecture dependent and only make sense for Cyclone. A new predicate, Cyclone, is introduced in the .td file.
llvm-svn: 213827
2014-07-24 01:29:59 +00:00
Rafael Espindola
d69ab9179e Finish inverting the MC -> Object dependency.
There were still some disassembler bits in lib/MC, but their use of Object
was only visible in the includes they used, not in the symbols.

llvm-svn: 213808
2014-07-23 22:26:07 +00:00
Jim Grosbach
f6143714d5 [X86,AArch64] Extend vcmp w/ unary op combine to work w/ more constants.
The transform to constant fold unary operations with an AND across a
vector comparison applies when the constant is not a splat of a scalar
as well.

llvm-svn: 213800
2014-07-23 20:41:43 +00:00
Jim Grosbach
5e39096957 X86: restrict combine to when type sizes are safe.
The folding of unary operations through a vector compare and mask operation
is only safe if the unary operation result is of the same size as its input.
For example, it's not safe for [su]itofp from v4i32 to v4f64.

llvm-svn: 213799
2014-07-23 20:41:38 +00:00
Juergen Ributzka
2a176cc855 [FastISel][AArch64] Fix return type in FastLowerCall.
I used the wrong method to obtain the return type inside FinishCall. This fix
simply uses the return type from FastLowerCall, which we already determined to
be a valid type.

Reduced test case from Chad. Thanks.

llvm-svn: 213788
2014-07-23 20:03:13 +00:00
Chad Rosier
2cb7fd5dae [AArch64] Lower sdiv x, pow2 using add + select + shift.
The target-independent DAGCombiner will generate:

  asr  w1, X, #31          // w1 = splat sign bit
  add  X, X, w1, lsr #28   // X = X + 0 or pow2-1
  asr  w0, X, #4           // w0 = X/pow2

However, the add + shift sequence is expensive, so generate instead:

  add  w0, X, #15          // w0 = X + pow2-1
  cmp  X, wzr              // X - 0
  csel X, w0, X, lt        // X = (X < 0) ? X + pow2-1 : X
  asr  w0, X, #4           // w0 = X/pow2

llvm-svn: 213758
2014-07-23 14:57:52 +00:00
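
In source terms the transform applies to a signed division by a power of two;
a minimal C++ example (pow2 = 16, matching the sequences above):

  // The rounding adjustment for negative dividends is now done with
  // add + cmp + csel instead of the asr + add-with-shift sequence.
  int div16(int x) { return x / 16; }
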
Tim Northover
c357579164 AArch64: remove "arm64_be" support in favour of "aarch64_be".
There really is no arm64_be: it was a useful fiction to test big-endian support
while both backends existed in parallel, but now the only platform that uses
the name (iOS) doesn't have a big-endian variant, let alone one called
"arm64_be".

llvm-svn: 213748
2014-07-23 12:58:11 +00:00
Tim Northover
3e8fd62854 AArch64: remove arm64 triple enumerator.
Having both Triple::arm64 and Triple::aarch64 is extremely confusing, and
invites bugs where only one is checked. In reality, the only legitimate
difference between the two (arm64 usually means iOS) is also present in the OS
part of the triple and that's what should be checked.

We still parse the "arm64" triple, just canonicalise it to Triple::aarch64, so
there aren't any LLVM-side test changes.

llvm-svn: 213743
2014-07-23 12:32:47 +00:00
Juergen Ributzka
c2d9ee45f3 [FastIsel][AArch64] Add support for the FastLowerCall and FastLowerIntrinsicCall target-hooks.
This commit modifies the existing call lowering functions to be used as the
FastLowerCall and FastLowerIntrinsicCall target-hooks instead.

This enables patchpoint intrinsic lowering for AArch64.

This fixes <rdar://problem/17733076>

llvm-svn: 213704
2014-07-22 23:14:58 +00:00
Tim Northover
86e216695b CodeGen: emit IR-level f16 conversion intrinsics as fptrunc/fpext
This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc,
and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation
path is now uniform, regardless of the input IR:

  fptrunc -> FP_TO_FP16 (if f16 illegal) -> libcall
  fpext -> FP16_TO_FP (if f16 illegal) -> libcall

Each target should be able to select the version that best matches its
operations and not be required to duplicate patterns for both fptrunc
and FP_TO_FP16 (for example).

As a result we can remove some redundant AArch64 patterns.

llvm-svn: 213507
2014-07-21 09:13:56 +00:00
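
A small, hedged example of source that reaches these conversions; __fp16 is
Clang's storage-only half-precision type, and depending on the Clang version
the conversions below are emitted as the convert.to/from.fp16 intrinsics or
directly as fptrunc/fpext, which this commit makes equivalent at the DAG level:

  // float -> half is an fptrunc (FP_TO_FP16 once f16 is illegal);
  // half -> float is an fpext (FP16_TO_FP), possibly via a libcall.
  void store_half(float f, __fp16 *p) { *p = f; }
  float load_half(const __fp16 *p) { return *p; }
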
David Peixotto
569d73691e MC: support different sized constants in constant pools
On AArch64 the pseudo instruction ldr <reg>, =... supports both
32-bit and 64-bit constants. Add support for 64-bit constants for
the pools to support the pseudo instruction fully.

Changes the AArch64 ldr-pseudo tests to use 32-bit registers and
adds tests with 64-bit registers.

Patch by Janne Grunau!

Differential Revision: http://reviews.llvm.org/D4279

llvm-svn: 213387
2014-07-18 16:05:14 +00:00
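
An illustrative use of the extended pseudo instruction via inline asm (the
64-bit constant is arbitrary, and the snippet assumes the assembler handles
the literal-pool placement for a small function):

  // "ldr %0, =<imm>" places the 64-bit constant in a literal pool and
  // loads it PC-relatively; previously only 32-bit constants were supported.
  unsigned long long big_constant() {
    unsigned long long v;
    asm("ldr %0, =0x1122334455667788" : "=r"(v));
    return v;
  }
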