1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00
Commit Graph

691 Commits

Author SHA1 Message Date
Sylvestre Ledru
d22d211a98 Fix typos:
* libaries => libraries
* avaiable => available

llvm-svn: 215366
2014-08-11 18:04:46 +00:00
Joerg Sonnenberger
df2012d0aa If available, pass down the Fixup object to EvaluateAsRelocatable.
At least on PowerPC, the interpretation of certain modifiers depends on
the context they appear in.

llvm-svn: 215310
2014-08-10 11:35:12 +00:00
Aaron Ballman
8207a5fba4 Resolving some type truncation warnings in MSVC (enum to bool in this case). No functional changes intended.
llvm-svn: 215293
2014-08-09 19:53:34 +00:00
Tim Northover
d67d6cf4fb AArch64: avoid deleting the current iterator in a loop.
std::map invalidates the iterator to any element that gets deleted, which means
we can't increment it correctly afterwards. This was causing Darwin test
failures.

llvm-svn: 215233
2014-08-08 17:31:52 +00:00
Juergen Ributzka
657fb37f68 [FastISel][AArch64] Attach MachineMemOperands to load and store instructions.
llvm-svn: 215231
2014-08-08 17:24:10 +00:00
NAKAMURA Takumi
52e4e2fd27 AArch64A57FPLoadBalancing.cpp: Define ColorNames in !NDEBUG.
llvm-svn: 215226
2014-08-08 17:00:59 +00:00
Jiangning Liu
264179daf4 [AArch64] Fix a type conversion bug for anlyzing compare.
The bug can cause spec2006/483.xalancbmk failure.

Patched by David Xu.

llvm-svn: 215206
2014-08-08 14:19:29 +00:00
James Molloy
c2be58dbb0 [AArch64] Add an FP load balancing pass for Cortex-A57
For best-case performance on Cortex-A57, we should try to use a balanced mix of odd and even D-registers when performing a critical sequence of independent, non-quadword FP/ASIMD floating-point multiply or multiply-accumulate operations.

This pass attempts to detect situations where the register allocation may adversely affect this load balancing and to change the registers used so as to better utilize the CPU.

Ideally we'd just take each multiply or multiply-accumulate in turn and allocate it alternating even or odd registers. However, multiply-accumulates are most efficiently performed in the same functional unit as their accumulation operand. Therefore this pass tries to find maximal sequences ("Chains") of multiply-accumulates linked via their accumulation operand, and assign them all the same "color" (oddness/evenness).

This optimization affects S-register and D-register floating point multiplies and FMADD/FMAs, as well as vector (floating point only) muls and FMADD/FMA. Q register instructions (and 128-bit vector instructions) are not affected.

llvm-svn: 215199
2014-08-08 12:33:21 +00:00
Tim Northover
43f2f4770b AArch64: stop trying to take control of all UnknownArch triples.
This short-circuited our error reporting for incorrectly specified
target triples (you'd get AArch64 code instead).

Should fix PR20567.

llvm-svn: 215191
2014-08-08 08:27:44 +00:00
NAKAMURA Takumi
76338bbbe8 AArch64InstrInfo.cpp: Fix \param(s). [-Wdocumentation]
llvm-svn: 215180
2014-08-08 02:04:18 +00:00
Eric Christopher
378bc328f0 Temporarily Revert "Nuke the old JIT." as it's not quite ready to
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.

Approved by Jim Grosbach, Lang Hames, Rafael Espindola.

This reverts commits r215111, 215115, 215116, 215117, 215136.

llvm-svn: 215154
2014-08-07 22:02:54 +00:00
Gerolf Hoflehner
35c59e408d MachineCombiner Pass for selecting faster instruction sequence on AArch64
Re-commit of r214832,r21469 with a work-around that
avoids the previous problem with gcc build compilers

The work-around is to use SmallVector instead of ArrayRef
of basic blocks in preservesResourceLen()/MachineCombiner.cpp

llvm-svn: 215151
2014-08-07 21:40:58 +00:00
Rafael Espindola
e9ebbe5559 Nuke the old JIT.
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.

Thanks to Lang Hames for making MCJIT a good replacement!

llvm-svn: 215111
2014-08-07 14:21:18 +00:00
Eric Christopher
4a1cdb2ba7 Remove the target machine from CCState. Previously it was only used
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.

llvm-svn: 214988
2014-08-06 18:45:26 +00:00
Chad Rosier
0214f513b9 [AArch64] Add a few isTarget* API to AArch64 Subtarget.
llvm-svn: 214977
2014-08-06 16:56:58 +00:00
Chad Rosier
8291c24b09 [AArch64] Fix OS ABI flag for aarch64-linux-gnu target.
For triple aarch64-linux-gnu we were incorrectly setting IRIX.
For triple aarch64 we are correctly setting SYSV.

Patch by Ana Pazos <apazos@codeaurora.org>.

llvm-svn: 214974
2014-08-06 16:05:02 +00:00
James Molloy
7518c61a09 [AArch64] Add a testcase for r214957.
llvm-svn: 214965
2014-08-06 13:31:32 +00:00
James Molloy
2bbe86fab0 [AArch64] Conditional selects are expensive on out-of-order cores.
Specifically Cortex-A57. This probably applies to Cyclone too but I haven't enabled it for that as I can't test it.

This gives ~4% improvement on SPEC 174.vpr, and ~1% in 471.omnetpp.

llvm-svn: 214957
2014-08-06 10:42:18 +00:00
Yi Kong
a05145f812 AArch64: Add support for instruction prefetch intrinsic
Instruction prefetch is not implemented for AArch64, it is incorrectly
translated into data prefetch instruction.

Differential Revision: http://reviews.llvm.org/D4777

llvm-svn: 214860
2014-08-05 12:46:47 +00:00
James Molloy
ea323a2876 Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost.
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.

llvm-svn: 214859
2014-08-05 12:30:34 +00:00
Juergen Ributzka
8e89773258 [FastIsel][AArch64] Fix previous commit r214844 (Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended.)
The original code would fail for unsupported value types like i1, i8, and i16.
This fix changes the code to only create a sub-register copy for i64 value types
and all other types (i1/i8/i16/i32) just use the source register without any
modifications.

getRegClassFor() is now guarded by the i64 value type check, that guarantees
that we always request a register for a valid value type.

llvm-svn: 214848
2014-08-05 07:31:30 +00:00
Juergen Ributzka
ec5a9526be [FastISel][AArch64] Implement the FastLowerArguments hook.
This implements basic argument lowering for AArch64 in FastISel. It only
handles a small subset of the C calling convention. It supports simple
arguments that can be passed in GPR and FPR registers.

This should cover most of the trivial cases without falling back to
SelectionDAG.

This fixes <rdar://problem/17890986>.

llvm-svn: 214846
2014-08-05 05:43:48 +00:00
Kevin Qin
2c9a6f5e83 Revert "r214832 - MachineCombiner Pass for selecting faster instruction"
It broke compiling of most Benchmark and internal test, as clang got
clashed by segmentation fault or assertion.

llvm-svn: 214845
2014-08-05 05:43:47 +00:00
Juergen Ributzka
20218b6fa0 [FastISel][AArch64] Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended.
llvm-svn: 214844
2014-08-05 05:43:44 +00:00
Eric Christopher
67c04e77e5 Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.

llvm-svn: 214838
2014-08-05 02:39:49 +00:00
Gerolf Hoflehner
40e09fb7dd MachineCombiner Pass for selecting faster instruction
sequence on AArch64

Re-commit of r214669 without changes to test cases
LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and
LLVM:: CodeGen/AArch64/dp-3source.ll
This resolves the reported compfails of the original commit.

llvm-svn: 214832
2014-08-05 01:16:13 +00:00
Juergen Ributzka
f39fd7e490 [FastISel][AArch64] Fix shift lowering for i8 and i16 value types.
This fix changes the parameters #r and #s that are passed to the UBFM/SBFM
instruction to get the zero/sign-extension for free.

The original problem was that the shift left would use the 32-bit shift even for
i8/i16 value types, which could leave the upper bits set with "garbage" values.

The arithmetic shift right on the other side would use the wrong MSB as sign-bit
to determine what bits to shift into the value.

This fixes <rdar://problem/17907720>.

llvm-svn: 214788
2014-08-04 21:49:51 +00:00
Eric Christopher
99307e99a2 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

llvm-svn: 214781
2014-08-04 21:25:23 +00:00
Chad Rosier
d843d13525 [AArch64] Extend the number of scalar instructions supported in the AdvSIMD
scalar integer instruction pass.

This is a patch I had lying around from a few months ago.  The pass is
currently disabled by default, so nothing to interesting.

llvm-svn: 214779
2014-08-04 21:20:25 +00:00
Kevin Qin
348a8dd760 Revert "r214669 - MachineCombiner Pass for selecting faster instruction"
This commit broke "make check" for several hours, so get it reverted.

llvm-svn: 214697
2014-08-04 05:10:33 +00:00
Gerolf Hoflehner
265dd68643 MachineCombiner Pass for selecting faster instruction
sequence -  AArch64 target support

 This patch turns off madd/msub generation in the DAGCombiner and generates
 them in the MachineCombiner instead. It replaces the original code sequence
 with the combined sequence when it is beneficial to do so.

 When there is no machine model support it always generates the madd/msub
 instruction. This is true also when the objective is to optimize for code
 size: when the combined sequence is shorter is always chosen and does not
 get evaluated.

 When there is a machine model the combined instruction sequence
 is evaluated for critical path and resource length using machine
 trace metrics and the original code sequence is replaced when it is
 determined to be faster.

 rdar://16319955

llvm-svn: 214669
2014-08-03 22:03:40 +00:00
Juergen Ributzka
bb4322fe8b [FastISel][AArch64] Fold offset into the memory operation.
Fold simple offsets into the memory operation:
  add x0, x0, #8
  ldr x0, [x0]
-->
  ldr x0, [x0, #8]

Fixes <rdar://problem/17887945>.

llvm-svn: 214545
2014-08-01 19:40:16 +00:00
Juergen Ributzka
5fad1ddebc [FastISel][AArch64] Add branch weights.
Add branch weights to branch instructions, so that the following passes can
optimize based on it (i.e. basic block ordering).

Fixes <rdar://problem/17887137>.

llvm-svn: 214537
2014-08-01 18:39:24 +00:00
Renato Golin
5b6b134ddb Add missing breaks to AArch64InstrInfo::isGPRCopy
llvm-svn: 214528
2014-08-01 17:27:31 +00:00
Chad Rosier
69caabf908 [AArch64] Generate tbz/tbnz when comparing against zero.
The tbz/tbnz checks the sign bit to convert

op w1, w1, w10
cmp w1, #0
b.lt .LBB0_0

to

op w1, w1, w10
tbnz w1, #31, .LBB0_0

Differential Revision: http://reviews.llvm.org/D4440

llvm-svn: 214518
2014-08-01 14:48:56 +00:00
Juergen Ributzka
1686869897 [FastISel][AArch64] Fix the immediate versions of the {s|u}{add|sub}.with.overflow intrinsics.
ADDS and SUBS cannot encode negative immediates or immediates larger than 12bit.
This fix checks if the immediate version can be used under this constraints and
if we can convert ADDS to SUBS or vice versa to support negative immediates.

Also update the test cases to test the immediate versions.

llvm-svn: 214470
2014-08-01 01:25:55 +00:00
Louis Gerbarg
8048e52537 Make sure no loads resulting from load->switch DAGCombine are marked invariant
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.

This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.

llvm-svn: 214449
2014-07-31 21:45:05 +00:00
Aaron Ballman
e2e6979c9d Fixing an -Woverloaded-virtual warnings by exposing the hidden virtual function as well. No functional changes intended.
llvm-svn: 214400
2014-07-31 12:58:50 +00:00
Juergen Ributzka
ed5bbc5130 [FastISel][AArch64] Add basic bitcast support for conversion between float and int.
Fixes <rdar://problem/17867078>.

llvm-svn: 214389
2014-07-31 06:25:37 +00:00
Juergen Ributzka
35e07f4cd0 [FastISel][AArch64] Add sqrt intrinsic support.
Fixes <rdar://problem/17867067>.

llvm-svn: 214388
2014-07-31 06:25:33 +00:00
Juergen Ributzka
622f41919a [FastISel][AArch64] Add MachO large code model support for function calls.
Currently the large code model for MachO uses the GOT to make function calls.
Emit the required adrp and ldr instructions to load the address from the GOT.

Related to <rdar://problem/17733076>.

llvm-svn: 214381
2014-07-31 04:10:40 +00:00
Juergen Ributzka
f6e1a34d78 [FastISel][AArch64 and X86] Don't emit stores for UNDEF arguments during function call lowering.
UNDEF arguments are not ment to be touched - especially for the webkit_js
calling convention. This fix reproduces the already existing behavior of
SelectionDAG in FastISel.

llvm-svn: 214366
2014-07-31 00:11:11 +00:00
Juergen Ributzka
d1fc9d2924 [FastISel][AArch64] Add select folding support for the XALU intrinsics.
This improves the code generation for the XALU intrinsics when the
condition is feeding a select instruction.

This also updates and enables the XALU unit tests for FastISel.

This fixes <rdar://problem/17831117>.

llvm-svn: 214350
2014-07-30 22:04:37 +00:00
Juergen Ributzka
6162229025 [FastISel][AArch64] Add branch folding support for the XALU intrinsics.
This improves the code generation for the XALU intrinsics when the
condition is feeding a branch instruction.

This is related to <rdar://problem/17831117>.

llvm-svn: 214349
2014-07-30 22:04:34 +00:00
Juergen Ributzka
b93a2a4457 [FastISel][AArch64] Add {s|u}{add|sub|mul}.with.overflow intrinsic support.
This commit adds support for the {s|u}{add|sub|mul}.with.overflow intrinsics.
The unit tests for FastISel will be enabled in a later commit, once there is
also branch and select folding support.

This is related to <rdar://problem/17831117>.

llvm-svn: 214348
2014-07-30 22:04:31 +00:00
Juergen Ributzka
f12a2ac4ea [FastISel][AArch64] Create helper functions to create the various multiplies on AArch64.
llvm-svn: 214346
2014-07-30 22:04:25 +00:00
Juergen Ributzka
9ceeaa8bbc [FastISel][AArch64] Add support for shift-immediate.
Currently the shift-immediate versions are not supported by tblgen and
hopefully this can be later removed, once the required support has been
added to tblgen.

llvm-svn: 214345
2014-07-30 22:04:22 +00:00
Jiangning Liu
86fc448354 Implement AArch64 TTI interface isAsCheapAsAMove.
llvm-svn: 214159
2014-07-29 02:09:26 +00:00
Matt Arsenault
76c7b7a591 Add alignment value to allowsUnalignedMemoryAccess
Rename to allowsMisalignedMemoryAccess.

On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.

llvm-svn: 214055
2014-07-27 17:46:40 +00:00
Tim Northover
5290b47db1 AArch64: fix conversion of 'J' inline asm constraints.
'J' represents a negative number suitable for an add/sub alias
instruction, but while preparing it to become an int64_t we were
mangling the sign extension. So "i32 -1" became 0xffffffffLL, for
example.

Should fix one half of PR20456.

llvm-svn: 214052
2014-07-27 07:10:29 +00:00
Akira Hatanaka
e9a7fadd46 [stack protector] Fix a potential security bug in stack protector where the
address of the stack guard was being spilled to the stack.

Previously the address of the stack guard would get spilled to the stack if it
was impossible to keep it in a register. This patch introduces a new target
independent node and pseudo instruction which gets expanded post-RA to a
sequence of instructions that load the stack guard value. Register allocator
can now just remat the value when it can't keep it in a register. 

<rdar://problem/12475629>

llvm-svn: 213967
2014-07-25 19:31:34 +00:00
Juergen Ributzka
d43a6ec447 [FastISel][AArch64] Add support for frameaddress intrinsic.
This commit implements the frameaddress intrinsic for the AArch64 architecture
in FastISel.

There were two test cases that pretty much tested the same, so I combined them
to a single test case.

Fixes <rdar://problem/17811834>

llvm-svn: 213959
2014-07-25 17:47:14 +00:00
Benjamin Kramer
85476f43e5 Run sort_includes.py on the AArch64 backend.
No functionality change.

llvm-svn: 213938
2014-07-25 11:42:14 +00:00
Tim Northover
5be46ae3c3 AArch64: refactor ReconstructShuffle function
Quite a bit of cruft had accumulated as we realised the various different cases
it had to handle and squeezed them in where possible. This refactoring mostly
flattens the logic and special-cases. The result is slightly longer, but I
think clearer.

Should be no functionality change.

llvm-svn: 213867
2014-07-24 15:39:55 +00:00
NAKAMURA Takumi
b99eee32fb Prune redundant libdeps.
llvm-svn: 213857
2014-07-24 11:45:27 +00:00
NAKAMURA Takumi
dbff653a5a Update library dependencies.
llvm-svn: 213832
2014-07-24 02:10:42 +00:00
Kevin Qin
a272df5db2 [AArch64] Fix a bug generating incorrect instruction when building small vector.
This bug is introduced by r211144. The element of operand may be
smaller than the element of result, but previous commit can
only handle the contrary condition. This commit is to handle this
scenario and generate optimized codes like ZIP1.

llvm-svn: 213830
2014-07-24 02:05:42 +00:00
Jiangning Liu
a94c806f8b [AArch64] Disable some optimization cases for type conversion from sint to fp, because those optimization cases are micro-architecture dependent and only make sense for Cyclone. A new predicate Cyclone is introduced in .td file.
llvm-svn: 213827
2014-07-24 01:29:59 +00:00
Rafael Espindola
d69ab9179e Finish inverting the MC -> Object dependency.
There were still some disassembler bits in lib/MC, but their use of Object
was only visible in the includes they used, not in the symbols.

llvm-svn: 213808
2014-07-23 22:26:07 +00:00
Jim Grosbach
f6143714d5 [X86,AArch64] Extend vcmp w/ unary op combine to work w/ more constants.
The transform to constant fold unary operations with an AND across a
vector comparison applies when the constant is not a splat of a scalar
as well.

llvm-svn: 213800
2014-07-23 20:41:43 +00:00
Jim Grosbach
5e39096957 X86: restrict combine to when type sizes are safe.
The folding of unary operations through a vector compare and mask operation
is only safe if the unary operation result is of the same size as its input.
For example, it's not safe for [su]itofp from v4i32 to v4f64.

llvm-svn: 213799
2014-07-23 20:41:38 +00:00
Juergen Ributzka
2a176cc855 [FastISel][AArch64] Fix return type in FastLowerCall.
I used the wrong method to obtain the return type inside FinishCall. This fix
simply uses the return type from FastLowerCall, which we already determined to
be a valid type.

Reduced test case from Chad. Thanks.

llvm-svn: 213788
2014-07-23 20:03:13 +00:00
Chad Rosier
2cb7fd5dae [AArch64] Lower sdiv x, pow2 using add + select + shift.
The target-independent DAGcombiner will generate:
asr w1, X, #31 w1 = splat sign bit.
add X, X, w1, lsr #28 X = X + 0 or pow2-1
asr w0, X, asr #4 w0 = X/pow2

However, the add + shifts is expensive, so generate:
add w0, X, 15 w0 = X + pow2-1
cmp X, wzr X - 0
csel X, w0, X, lt X = (X < 0) ? X + pow2-1 : X;
asr w0, X, asr 4 w0 = X/pow2

llvm-svn: 213758
2014-07-23 14:57:52 +00:00
Tim Northover
c357579164 AArch64: remove "arm64_be" support in favour of "aarch64_be".
There really is no arm64_be: it was a useful fiction to test big-endian support
while both backends existed in parallel, but now the only platform that uses
the name (iOS) doesn't have a big-endian variant, let alone one called
"arm64_be".

llvm-svn: 213748
2014-07-23 12:58:11 +00:00
Tim Northover
3e8fd62854 AArch64: remove arm64 triple enumerator.
Having both Triple::arm64 and Triple::aarch64 is extremely confusing, and
invites bugs where only one is checked. In reality, the only legitimate
difference between the two (arm64 usually means iOS) is also present in the OS
part of the triple and that's what should be checked.

We still parse the "arm64" triple, just canonicalise it to Triple::aarch64, so
there aren't any LLVM-side test changes.

llvm-svn: 213743
2014-07-23 12:32:47 +00:00
Juergen Ributzka
c2d9ee45f3 [FastIsel][AArch64] Add support for the FastLowerCall and FastLowerIntrinsicCall target-hooks.
This commit modifies the existing call lowering functions to be used as the
FastLowerCall and FastLowerIntrinsicCall target-hooks instead.

This enables patchpoint intrinsic lowering for AArch64.

This fixes <rdar://problem/17733076>

llvm-svn: 213704
2014-07-22 23:14:58 +00:00
Tim Northover
86e216695b CodeGen: emit IR-level f16 conversion intrinsics as fptrunc/fpext
This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc,
and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation
path is now uniform, regardless of the input IR:

  fptrunc -> FP_TO_FP16 (if f16 illegal) -> libcall
  fpext -> FP16_TO_FP (if f16 illegal) -> libcall

Each target should be able to select the version that best matches its
operations and not be required to duplicate patterns for both fptrunc
and FP_TO_FP16 (for example).

As a result we can remove some redundant AArch64 patterns.

llvm-svn: 213507
2014-07-21 09:13:56 +00:00
David Peixotto
569d73691e MC: support different sized constants in constant pools
On AArch64 the pseudo instruction ldr <reg>, =... supports both
32-bit and 64-bit constants. Add support for 64 bit constants for
the pools to support the pseudo instruction fully.

Changes the AArch64 ldr-pseudo tests to use 32-bit registers and
adds tests with 64-bit registers.

Patch by Janne Grunau!

Differential Revision: http://reviews.llvm.org/D4279

llvm-svn: 213387
2014-07-18 16:05:14 +00:00
Tim Northover
be8b73df94 AArch64: implement efficient f16 bitcasts
Because i16 is illegal, there's no native DAG method to
represent a bitcast to or from an f16 type. This meant LLVM was
inserting a stack store/load pair which is really not ideal.

llvm-svn: 213378
2014-07-18 13:07:05 +00:00
Tim Northover
1b266803fb AArch64: support f16 extend/trunc operations.
llvm-svn: 213375
2014-07-18 13:01:31 +00:00
Jim Grosbach
2f665f1cd7 AArch64: Constant fold converting vector setcc results to float.
Since the result of a SETCC for AArch64 is 0 or -1 in each lane, we can
move unary operations, in this case [su]int_to_fp through the mask
operation and constant fold the operation away. Generally speaking:
  UNARYOP(AND(VECTOR_CMP(x,y), constant))
      --> AND(VECTOR_CMP(x,y), constant2)
where constant2 is UNARYOP(constant).

This implements the transform where UNARYOP is [su]int_to_fp.

For example, consider the simple function:
define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
  %cmp = fcmp oeq <4 x float> %val, %test
  %ext = zext <4 x i1> %cmp to <4 x i32>
  %result = sitofp <4 x i32> %ext to <4 x float>
  ret <4 x float> %result
}

Before this change, the code is generated as:
  fcmeq.4s  v0, v0, v1
  movi.4s v1, #0x1        // Integer splat value.
  and.16b v0, v0, v1      // Mask lanes based on the comparison.
  scvtf.4s  v0, v0        // Convert each lane to f32.
  ret

After, the code is improved to:
  fcmeq.4s  v0, v0, v1
  fmov.4s v1, #1.00000000 // f32 splat value.
  and.16b v0, v0, v1      // Mask lanes based on the comparison.
  ret

The svvtf.4s has been constant folded away and the floating point 1.0f
vector lanes are materialized directly via fmov.4s.

Rather than do the folding manually in the target code, teach getNode()
in the generic SelectionDAG to handle folding constant operands of
vector [su]int_to_fp nodes. It is reasonable (as noted in a FIXME) to do
additional constant folding there as well, but I don't have test cases
for those operations, so leaving them for another time when it becomes
appropriate.

rdar://17693791

llvm-svn: 213341
2014-07-18 00:40:52 +00:00
Arnaud A. de Grandmaison
32cb6202b9 [AArch64] Cleanup AsmParser: no need to use dyn_cast + assert. cast does it for us.
llvm-svn: 213296
2014-07-17 19:08:14 +00:00
Tim Northover
eae1f1c8cc CodeGen: extend f16 conversions to permit types > float.
This makes the two intrinsics @llvm.convert.from.f16 and
@llvm.convert.to.f16 accept types other than simple "float". This is
only strictly needed for the truncate operation, since otherwise
double rounding occurs and there's no way to represent the strict IEEE
conversion. However, for symmetry we allow larger types in the extend
too.

During legalization, we can expand an "fp16_to_double" operation into
two extends for convenience, but abort when the truncate isn't legal. A new
libcall is probably needed here.

Even after this commit, various target tweaks are needed to actually use the
extended intrinsics. I've put these into separate commits for clarity, so there
are no actual tests of f64 conversion here.

llvm-svn: 213248
2014-07-17 10:51:23 +00:00
Yi Kong
9b1652c5d0 Port memory barriers intrinsics to AArch64
Memory barrier __builtin_arm_[dmb, dsb, isb] intrinsics are required to
implement their corresponding ACLE and MSVC intrinsics.

This patch ports ARM dmb, dsb, isb intrinsic to AArch64.

Differential Revision: http://reviews.llvm.org/D4520

llvm-svn: 213247
2014-07-17 10:50:20 +00:00
Tim Northover
93f17f8aee AArch64: fall back to generic code for out of range extract/insert.
rdar://problem/17624784

llvm-svn: 213059
2014-07-15 10:00:26 +00:00
Tim Northover
4ac35c9d7b AArch64: remove unnecessary pseudo-instruction.
Sufficiently twisted use of TableGen lets us write patterns directly for f16
(as an i16 promoted to i32) -> f32 conversion.

llvm-svn: 212933
2014-07-14 11:16:02 +00:00
Saleem Abdulrasool
00c379d824 AArch64: add support for llvm.aarch64.hint intrinsic
This adds a llvm.aarch64.hint intrinsic to mirror the llvm.arm.hint in order to
support the various hint intrinsic functions in the ACLE.

Add an optional pattern field that permits the subclass to specify the pattern
that matches the selection.  The intrinsic pattern is set as mayLoad, mayStore,
so overload the value for the definition of the hint instruction.

llvm-svn: 212883
2014-07-12 21:20:49 +00:00
Oliver Stannard
6090374181 ARM: Allow __fp16 as a function arg or return type for AArch64
ACLE 2.0 allows __fp16 to be used as a function argument or return
type. This enables this for AArch64.

llvm-svn: 212812
2014-07-11 13:33:46 +00:00
Arnaud A. de Grandmaison
e705812564 [AArch64] Add logical alias instructions to MC AsmParser
This patch teaches the AsmParser to accept some logical+immediate
instructions and convert them as shown:

  bic  Rd, Rn, #imm  ->  and Rd, Rn, #~imm
  bics Rd, Rn, #imm  ->  ands Rd, Rn, #~imm
  orn  Rd, Rn, #imm  ->  orr Rd, Rn, #~imm
  eon  Rd, Rn, #imm  ->  eor Rd, Rn, #~imm

Those instructions are an alternate syntax available to assembly coders,
and are needed in order to support code already compiling with some other
assemblers. For example, the bic construct is used by the linux kernel.

llvm-svn: 212722
2014-07-10 15:12:26 +00:00
Tim Northover
3a26804901 AArch64: correctly fast-isel i8 & i16 multiplies
We were asking for a register for type i8 or i16 which caused an assert.

rdar://problem/17620015

llvm-svn: 212718
2014-07-10 14:18:46 +00:00
Jim Grosbach
10098b8e10 AArch64: Better codegen for storing to __fp16.
Storing will generally be immediately preceded by rounding from an f32
or f64, so make sure to match those patterns directly to convert into the
FPR16 register class directly rather than going through the integer GPRs.

This also eliminates an extra step in the convert-from-f64 path
which was first converting to f32 and then to f16 from there.

rdar://17594379

llvm-svn: 212638
2014-07-09 18:55:52 +00:00
Louis Gerbarg
53a9c08b97 Make AArch64FastISel::EmitIntExt explicitly check its source and destination types
This is a follow up to r212492. There should be no functional difference, but
this patch makes it clear that SrcVT must be an i1/i8/16/i32 and DestVT must be
an i8/i16/i32/i64.

rdar://17516686

llvm-svn: 212633
2014-07-09 17:54:32 +00:00
Jim Grosbach
4c963f72c5 AArch64: Better codegen for loading from __fp16.
Loading will generally extend to an f32 or an 64, so make sure
to match those patterns directly to load into the FPR16 register
class directly rather than going through the integer GPRs.

This also eliminates an extra step in the convert-to-f64 path
which was first converting to f32 and then to f64 from there.

rdar://17594379

llvm-svn: 212573
2014-07-08 23:28:48 +00:00
Arnaud A. de Grandmaison
5d079c21a0 Truncate the immediate in logical operation to the register width
And continue to produce an error if the 32 most significant bits are not all ones or zeros.

llvm-svn: 212520
2014-07-08 09:53:04 +00:00
Louis Gerbarg
30a2dcd984 Allow AArch64FastISel to degrade graceully in the presence of an MVT::i128
Currently AArch64FastISel crashes if it tries to extend an integer into an
MVT::i128. This can happen by creating 128 bit integers like so:

  typedef unsigned int uint128_t __attribute__((mode(TI)));
  typedef int sint128_t __attribute__((mode(TI)));

This patch makes EmitIntExt check for their presence and then falls back to
SelectionDAG.

Tests included.

rdar://17516686

llvm-svn: 212492
2014-07-07 21:37:51 +00:00
Kevin Qin
c72c5a1639 [AArch64] Normalize all constants to build a vector.
The value of constant operands will be truncated to fit element width.

llvm-svn: 212428
2014-07-07 02:45:40 +00:00
Saleem Abdulrasool
9cdbf65f90 AArch64: whitespace cleanup
llvm-svn: 212420
2014-07-06 22:13:26 +00:00
Chandler Carruth
fc0fe5064b [codegen,aarch64] Add a target hook to the code generator to control
vector type legalization strategies in a more fine grained manner, and
change the legalization of several v1iN types and v1f32 to be widening
rather than scalarization on AArch64.

This fixes an assertion failure caused by scalarizing nodes like "v1i32
trunc v1i64". As v1i64 is legal it will fail to scalarize v1i32.

This also provides a foundation for other targets to have more granular
control over how vector types are legalized.

Patch by Hao Liu, reviewed by Tim Northover. I'm committing it to allow
some work to start taking place on top of this patch as it adds some
really important hooks to the backend that I'd like to immediately start
using. =]

http://reviews.llvm.org/D4322

llvm-svn: 212242
2014-07-03 00:23:43 +00:00
Duncan P. N. Exon Smith
3ca30c6b7f AArch64: Re-enable AArch64AddressTypePromotion
This reverts commits r212189 and r212190.

While this pass was accidentally disabled (until r212073), r205437
slipped in a use of `auto` that should have been `auto&`.

This fixes PR20188.

llvm-svn: 212201
2014-07-02 18:17:40 +00:00
Duncan P. N. Exon Smith
53a9506052 AArch64: Remove unnecessary parens
llvm-svn: 212199
2014-07-02 18:14:03 +00:00
Duncan P. N. Exon Smith
08df999c69 AArch64: Merge isa with dyn_cast
llvm-svn: 212194
2014-07-02 17:26:39 +00:00
Duncan P. N. Exon Smith
3a2ea7e9a6 AArch64: Temporarily disable AArch64AddressTypePromotion
Temporarily disable AArch64AddressTypePromotion, which was effectively
re-enabled in r212073 and r212075, while I look into PR20188.

llvm-svn: 212189
2014-07-02 17:03:16 +00:00
Saleem Abdulrasool
ac95fd68f3 aarch64: support target-specific .req assembler directive
Based on the support for .req on ARM. The aarch64 variant has to keep track if
the alias register was a vector register (v0-31) or a general purpose or
VFP/Advanced SIMD ([bhsdq]0-31) register.

Patch by Janne Grunau!

llvm-svn: 212161
2014-07-02 04:50:23 +00:00
Juergen Ributzka
241f08ad4e [DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI.
The argument list vector is never used after it has been passed to the
CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and
pretty much as cheap as keeping a pointer to it.

llvm-svn: 212135
2014-07-01 22:01:54 +00:00
Tim Northover
2d5af48fc1 AArch64: fix comment typo
llvm-svn: 212120
2014-07-01 19:47:09 +00:00
Duncan P. N. Exon Smith
c07a03b9ce AArch64: Follow-up to r212073
In r212073 I missed a call of `use_begin()` that assumed the wrong
semantics.  It's not clear to me at all what this code does without the
fix, so I'm not sure how to write a testcase.

llvm-svn: 212075
2014-07-01 00:05:37 +00:00
Duncan P. N. Exon Smith
8dd9ef0369 AArch64: Actually do address type promotion
AArch64AddressTypePromotion was doing nothing because it was using the
old semantics of `Use` and `uses()`, when it really wanted to get at the
`users()`.

llvm-svn: 212073
2014-06-30 23:42:14 +00:00
Chad Rosier
e739f85e32 [AArch64] Unsized types don't specify an alignment.
PR20109

llvm-svn: 212045
2014-06-30 15:03:00 +00:00
Chad Rosier
2954f936f3 [AArch64] Convert mul x, -(pow2 +/- 1) to shift + add/sub.
The combine for mul x, pow2 +/- 1 is unchanged. Test cases for
both combines as well as mul x, pow2 have been added as well.

llvm-svn: 212044
2014-06-30 14:51:14 +00:00
Eric Christopher
04713c650c Remove extraneous includes from the target machines.
llvm-svn: 211800
2014-06-26 19:30:05 +00:00
Rafael Espindola
196881cf29 Move expression visitation logic up to MCStreamer.
Remove the duplicate from MCRecordStreamer. No functionality change.

llvm-svn: 211714
2014-06-25 15:45:33 +00:00
Rafael Espindola
0e71694bfa Simplify the visitation of target expressions. No functionality change.
llvm-svn: 211707
2014-06-25 15:29:54 +00:00
Weiming Zhao
23e9a680c2 Resubmit commit r211533
"Fix PR20056: Implement pseudo LDR <reg>, =<literal/label> for AArch64"
Missed files are added in this commit.

llvm-svn: 211605
2014-06-24 16:21:38 +00:00
Rafael Espindola
8d9ccc8516 This reverts commit r211533 and r211539.
Revert "Fix PR20056: Implement pseudo LDR <reg>, =<literal/label> for AArch64"
 Revert "Fix cmake build."

It was missing a file.

llvm-svn: 211540
2014-06-23 21:20:58 +00:00
Juergen Ributzka
240b1bdaa5 Fix cmake build.
llvm-svn: 211539
2014-06-23 21:15:55 +00:00
Weiming Zhao
12e13fe224 Fix PR20056: Implement pseudo LDR <reg>, =<literal/label> for AArch64
This patch is based on the changes from ARM target [1,2]

Based on ARM doc [3], if the literal value can be loaded with a valid MOV,
it can emit that instruction. This is implemented in this patch.

[1] Fix PR18345: ldr= pseudo instruction produces incorrect code when using in inline assembly
Author: David Peixotto <dpeixott@codeaurora.org>
commit b92cca222898d87bbc764fa22e805adb04ef7f13 (r200777)
[2] Implement the ldr-pseudo opcode for ARM assembly
Author: David Peixotto <dpeixott@codeaurora.org>
commit 0fa193b08627927ccaa0804a34d80480894614b8 (r197708)
[3] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802a/CJAHAIBC.html

Differential Revision: http://reviews.llvm.org/D4163

llvm-svn: 211533
2014-06-23 20:44:16 +00:00
Craig Topper
c20830d1c1 Convert some assert(0) to llvm_unreachable or fold an 'if' condition into the assert.
llvm-svn: 211254
2014-06-19 06:10:58 +00:00
Kevin Qin
8c13822db4 [AArch64] Fix a pattern match failure caused by creating improper CONCAT_VECTOR.
ReconstructShuffle() may wrongly creat a CONCAT_VECTOR trying to
concat 2 of v2i32 into v4i16. This commit is to fix this issue and
try to generate UZP1 instead of lots of MOV and INS.
Patch is initalized by Kevin Qin, and refactored by Tim Northover.

llvm-svn: 211144
2014-06-18 05:54:42 +00:00
Craig Topper
701fca8e8d Replace some assert(0)'s with llvm_unreachable.
llvm-svn: 211141
2014-06-18 05:05:13 +00:00
Tim Northover
300d65b2ea AArch64: estimate inline asm length during branch relaxation
To make sure branches are in range, we need to do a better job of estimating
the length of an inline assembly block than "it's probably 1 instruction, who'd
write asm with more than that?".

Fortunately there's already a (highly suspect, see how many ways you can think
of to break it!) callback for this purpose, which is used by the other targets.

rdar://problem/17277590

llvm-svn: 211095
2014-06-17 11:31:42 +00:00
Jim Grosbach
6677259350 AArch64: Add backend intrinsic for rbit.
Define an intrinsic for the frontend to use and pattern match it to
the RBIT instruction.

rdar://9283021

llvm-svn: 211058
2014-06-16 21:55:35 +00:00
Tilmann Scheller
c25b867f23 [AArch64] Remove dead code.
Both function declarations lack a callee and an implementation.

llvm-svn: 211029
2014-06-16 15:15:41 +00:00
James Molloy
d8293dd333 [AArch64] Fix a fencepost error in lowering for llvm.aarch64.neon.uqshl.
Patch by Jiangning Liu!

llvm-svn: 211014
2014-06-16 10:39:21 +00:00
Tim Northover
9eac1de1e4 AArch64: improve handling & modelling of FP_TO_XINT nodes.
There's probably no acatual change in behaviour here, just updating
the LowerFP_TO_INT function to be more similar to the reverse
implementation and updating costs to current CodeGen.

llvm-svn: 210985
2014-06-15 09:27:15 +00:00
Tim Northover
0f6e617e90 AArch64: improve vector [su]itofp handling.
This somehow got missed in the AArch64 merge, so should fix a
performance regression since 3.4.

llvm-svn: 210984
2014-06-15 09:27:06 +00:00
Chad Rosier
07ce4c0d5f [AArch64] Basic Sched Model for Cortex-A57.
Patch by Dave Estes<cestes@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D4008

llvm-svn: 210705
2014-06-11 21:06:56 +00:00
Eric Christopher
553f176803 Move to a private function to initialize the subtarget dependencies
so that we can use initializer lists for the AArch64 Subtarget.

llvm-svn: 210616
2014-06-11 00:46:34 +00:00
Eric Christopher
3dcf4d029a Move AArch64TargetLowering to AArch64Subtarget.
This currently necessitates a TargetMachine for the TargetLowering
constructor and TLOF.

llvm-svn: 210605
2014-06-10 23:26:45 +00:00
Eric Christopher
c915f116de Move AArch64InstrInfo to AArch64Subtarget.
llvm-svn: 210599
2014-06-10 22:57:25 +00:00
Eric Christopher
4a380b82ec Remove a method that was just replacing direct access to a member.
llvm-svn: 210598
2014-06-10 22:57:21 +00:00
Eric Christopher
c85f7b41b5 Move AArch64SelectionDAGInfo down to the subtarget.
llvm-svn: 210557
2014-06-10 18:21:53 +00:00
Eric Christopher
b49d64f413 Remove the cached little endian variable. We can get it easily off
of the DataLayout.

llvm-svn: 210555
2014-06-10 18:11:20 +00:00
Eric Christopher
653ef1ea20 Have AArch64SelectionDAGInfo take a DataLayout parameter rather
than a TargetMachine.

llvm-svn: 210554
2014-06-10 18:06:28 +00:00
Eric Christopher
f8abeb0328 Remove caching of the subtarget for AArch64SelectionDAGInfo.
llvm-svn: 210553
2014-06-10 18:06:25 +00:00
Eric Christopher
3447f35f1b Move DataLayout onto the AArch64 subtarget.
llvm-svn: 210552
2014-06-10 18:06:23 +00:00
Eric Christopher
dcaea5b602 Move AArch64FrameLowering into the subtarget.
llvm-svn: 210549
2014-06-10 17:44:12 +00:00
Eric Christopher
9130d84166 Remove the uses of AArch64TargetMachine and AArch64Subtarget from
AArch64FrameLowering.

llvm-svn: 210548
2014-06-10 17:33:39 +00:00
Chad Rosier
0f6d185fcf [AArch64] Emit .ident compiler version attribute.
Patch by Ana Pazos<apazos@codeaurora.org>!

llvm-svn: 210535
2014-06-10 14:32:08 +00:00
Artyom Skrobov
e445b07705 Condition codes AL and NV are invalid in the aliases that use
inverted condition codes (CINC, CINV, CNEG, CSET, and CSETM).

Matching aliases based on "immediate classes", when disassembling,
wasn't previously supported, hence adding MCOperandPredicate
into class Operand, and implementing the support for it
in AsmWriterEmitter.

The parsing for those aliases was already custom, so just adding
the missing condition into AArch64AsmParser::parseCondCode.

llvm-svn: 210528
2014-06-10 13:11:35 +00:00
Tim Northover
8d5e97704b AArch64: disallow x30 & x29 as the destination for indirect tail calls
As Ana Pazos pointed out, these have to be restored to their incoming values
before a function returns; i.e. before the tail call. So they can't be used
correctly as the destination register.

llvm-svn: 210525
2014-06-10 10:50:24 +00:00
Tim Northover
bfac8dd607 AArch64: teach FastISel how to handle offset FrameIndices
Previously we were abandonning the attempt, leading to some combination of
extra work (when selection of a load/store fails completely) and inferior code
(when this leads to a real memcpy call instead of inlining).

rdar://problem/17187463

llvm-svn: 210520
2014-06-10 09:52:44 +00:00
Tim Northover
666d07f003 AArch64: make FastISel memcpy emission more robust.
We were hitting an assert if FastISel couldn't create the load or store we
requested. Currently this happens for large frame-local addresses, though
CodeGen could be improved there.

rdar://problem/17187463

llvm-svn: 210519
2014-06-10 09:52:40 +00:00
Artyom Skrobov
915d6e58c2 [AArch64] Missing aliases for CMP/CMN [W]SP with no shift
llvm-svn: 210464
2014-06-09 11:10:14 +00:00
Chad Rosier
22a15b47d4 [AArch64] Fix the ordering of the accumulate operand in SchedRW list.
Patch by Dave Estes <cestes@codeaurora.org>
http://reviews.llvm.org/D4037

llvm-svn: 210446
2014-06-09 01:54:00 +00:00
Chad Rosier
010594577d [AArch64] When combining constant mul of power of 2 plus/minus 1, prefer shift
plus add.  The shift can be folded into the add.  This only effects codegen
when the constant is 3.

llvm-svn: 210445
2014-06-09 01:25:51 +00:00
Craig Topper
b00824c629 [C++11] Use 'nullptr'.
llvm-svn: 210442
2014-06-08 22:29:17 +00:00
David Blaikie
f670b953e7 AsmMatchers: Use unique_ptr to manage ownership of MCParsedAsmOperand
I saw at least a memory leak or two from inspection (on probably
untested error paths) and r206991, which was the original inspiration
for this change.

I ran this idea by Jim Grosbach a few weeks ago & he was OK with it.
Since it's a basically mechanical patch that seemed sufficient - usual
post-commit review, revert, etc, as needed.

llvm-svn: 210427
2014-06-08 16:18:35 +00:00
Alp Toker
c46322b804 Remove outdated CMake MSVC workaround
llvm-svn: 210421
2014-06-08 07:37:17 +00:00
Eric Christopher
db8e2ecde5 Have TargetSelectionDAGInfo take a DataLayout initializer rather than
a TargetMachine since the only thing it wants is DataLayout.

llvm-svn: 210366
2014-06-06 19:04:48 +00:00
Tilmann Scheller
acc3c4f243 [AArch64] clang-format the load/store optimizer.
No change in functionality.

llvm-svn: 210182
2014-06-04 12:40:35 +00:00
Tilmann Scheller
4ed82f8466 [AArch64] Fix some LLVM Coding Standards violations in the load/store optimizer.
Variable names should start with an upper case letter.

No change in functionality.

llvm-svn: 210181
2014-06-04 12:36:28 +00:00
Tilmann Scheller
a373112959 [AArch64] Fix typo in load/store optimizer.
llvm-svn: 210114
2014-06-03 16:33:13 +00:00
Tim Northover
d56609ce6c AArch64: mark small types (i1, i8, i16) as promoted
This means the output of LowerFormalArguments returns a lowered
SDValue with the correct type (expected in SelectionDAGBuilder).
Without this, an assertion under a DEBUG macro triggers when those
types are passed on the stack.

llvm-svn: 210102
2014-06-03 13:54:53 +00:00
Jiangning Liu
531302fb19 [AArch64] Correctly deal with VPR stack parameter passing.
llvm-svn: 210067
2014-06-03 03:25:09 +00:00
Alp Toker
e8634eb077 Fix typos
llvm-svn: 209982
2014-05-31 21:26:28 +00:00
Eric Christopher
1aad72164e Have the TLOF creation take a Triple rather than needing a subtarget.
llvm-svn: 209937
2014-05-31 00:07:32 +00:00
Tim Northover
3bb84c9bcc ARM & AArch64: make use of common cmpxchg idioms after expansion
The C and C++ semantics for compare_exchange require it to return a bool
indicating success. This gets mapped to LLVM IR which follows each cmpxchg with
an icmp of the value loaded against the desired value.

When lowered to ldxr/stxr loops, this extra comparison is redundant: its
results are implicit in the control-flow of the function.

This commit makes two changes: it replaces that icmp with appropriate PHI
nodes, and then makes sure earlyCSE is called after expansion to actually make
use of the opportunities revealed.

I've also added -{arm,aarch64}-enable-atomic-tidy options, so that
existing fragile tests aren't perturbed too much by the change. Many
of them either rely on undef/unreachable too pervasively to be
restored to something well-defined (particularly while making sure
they test the same obscure assert from many years ago), or depend on a
particular CFG shape, which is disrupted by SimplifyCFG.

rdar://problem/16227836

llvm-svn: 209883
2014-05-30 10:09:59 +00:00
Artyom Skrobov
f8a8cd09c7 Restore getInvertedCondCode() from the phased-out backend, fixing disassembly for NV
llvm-svn: 209803
2014-05-29 11:34:50 +00:00
Artyom Skrobov
ec5776d81a Add missing check when MatchInstructionImpl() reports failure
llvm-svn: 209802
2014-05-29 11:26:15 +00:00
Hao Liu
0e99724daa Fix an assertion failure caused by v1i64 in DAGCombiner Shrink.
llvm-svn: 209798
2014-05-29 09:19:07 +00:00
Rafael Espindola
acdb307db3 [pr19844] Add thread local mode to aliases.
This matches gcc's behavior. It also seems natural given that aliases
contain other properties that govern how it is accessed (linkage,
visibility, dll storage).

Clang still has to be updated to expose this feature to C.

llvm-svn: 209759
2014-05-28 18:15:43 +00:00
Tim Northover
eeb6250a8a AArch64: implement copies to/from NZCV as a last ditch effort.
A test in test/Generic creates a DAG where the NZCV output of an ADCS is used
by multiple nodes. This makes LLVM want to save a copy of NZCV for later, which
it couldn't do before.

This should be the last fix required for the aarch64 buildbot.

llvm-svn: 209651
2014-05-27 12:16:02 +00:00
Tim Northover
94dde835f0 AArch64: support 'c' and 'n' inline asm modifiers.
These are tested by test/CodeGen/Generic, so we should probably know
how to deal with them. Fortunately generic code does it if asked.

llvm-svn: 209646
2014-05-27 07:37:21 +00:00
Tim Northover
10cffb6eef AArch64: force i1 to be zero-extended at an ABI boundary.
This commit is debatable. There are two possible approaches, neither
of which is really satisfactory:

1. Use "@foo(i1 zeroext)" to mean an extension to 32-bits on Darwin,
   and 8 bits otherwise.
2. Redefine "@foo(i1)" to mean that the i1 is extended by the caller
   to 8 bits. This goes against the spirit of "zeroext" I think, but
   it's a bit of a vague construct anyway (by definition you're going
   to extend to the amount required by the ABI, that's why it's the
   ABI!).

This implements option 2. The DAG machinery really isn't setup for the
first (there's a fairly strong assumption that "zeroext" goes to at
least the smallest register size), and even if it was the resulting
DAG looks like it would be inferior in many cases.

Theoretically we could add AssertZext nodes in the consumers of
ABI-passed values too now, but this actually seems to make the code
worse in practice by making truncation proceed in two steps. The code
produced is equally valid if we continue to assume only the low bit is
defined.

Should fix PR19850

llvm-svn: 209637
2014-05-26 17:22:07 +00:00
Tim Northover
fc1b1e8952 AArch64: simplify calling conventions slightly.
We can eliminate the custom C++ code in favour of some TableGen to
check the same things. Functionality should be identical, except for a
buffer overrun that was present in the C++ code and meant webkit
failed if any small argument needed to be passed on the stack.

llvm-svn: 209636
2014-05-26 17:21:53 +00:00
Tim Northover
b8c72c5d80 AArch64: disable FastISel for large code model.
The code emitted is what would be expected for the small model, so it
shouldn't be used when objects can be the full 64-bits away.

This fixes MCJIT tests on Linux.

llvm-svn: 209585
2014-05-24 19:45:41 +00:00
Tim Northover
ca0f4dc4f0 AArch64/ARM64: move ARM64 into AArch64's place
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.

"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple, those from AArch64 use an aarch64
triple. Both should be equivalent though.

This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.

llvm-svn: 209577
2014-05-24 12:50:23 +00:00
Tim Northover
d7f173214f AArch64/ARM64: remove AArch64 from tree prior to renaming ARM64.
I'm doing this in two phases for a better "git blame" record. This
commit removes the previous AArch64 backend and redirects all
functionality to ARM64. It also deduplicates test-lines and removes
orphaned AArch64 tests.

The next step will be "git mv ARM64 AArch64" and rewire most of the
tests.

Hopefully LLVM is still functional, though it would be even better if
no-one ever had to care because the rename happens straight
afterwards.

llvm-svn: 209576
2014-05-24 12:42:26 +00:00
Benjamin Kramer
600e24a1cb SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not.
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.

llvm-svn: 209123
2014-05-19 13:12:38 +00:00
Saleem Abdulrasool
501d3b6235 Target: remove old constructors for CallLoweringInfo
This is mostly a mechanical change changing all the call sites to the newer
chained-function construction pattern.  This removes the horrible 15-parameter
constructor for the CallLoweringInfo in favour of setting properties of the call
via chained functions.  No functional change beyond the removal of the old
constructors are intended.

llvm-svn: 209082
2014-05-17 21:50:17 +00:00
Rafael Espindola
6d40091c3c Revert "Implement global merge optimization for global variables."
This reverts commit r208934.

The patch depends on aliases to GEPs with non zero offsets. That is not
supported and fairly broken.

The good news is that GlobalAlias is being redesigned and will have support
for offsets, so this patch should be a nice match for it.

llvm-svn: 208978
2014-05-16 13:02:18 +00:00
Tim Northover
7dfb58559c AArch64: disable printing of add/sub alias
This alias appears not to have an appropriate PrintMethod. Normally, I'd look
into it, but since AArch64 is disappearing soon it's probably not worth it.

This will be tested when the TableGen "should I print this Alias" heuristic is
fixed (very soon).

llvm-svn: 208967
2014-05-16 09:41:43 +00:00
Tim Northover
4736478963 AArch64: disable printing of MOV -> MOVZ aliases
Actually, MOV sometimes is canonical, but for now this is a better
approximation than what's there.

This will be tested when the TableGen "should I print this Alias" heuristic is
fixed (very soon).

llvm-svn: 208962
2014-05-16 09:41:21 +00:00
Jiangning Liu
5366cb42f6 Implement global merge optimization for global variables.
This commit implements two command line switches -global-merge-on-external
and -global-merge-aligned, and both of them are false by default, so this
optimization is disabled by default for all targets.

For ARM64, some back-end behaviors need to be tuned to get this optimization
further enabled.

llvm-svn: 208934
2014-05-15 23:45:42 +00:00
Tim Northover
4ba95d4483 TableGen/ARM64: print aliases even if they have syntax variants.
To get at least one use of the change (and some actual tests) in with its
commit, I've enabled the AArch64 & ARM64 NEON mov aliases.

llvm-svn: 208867
2014-05-15 11:16:32 +00:00
Tim Northover
3c2cc7a397 TableGen: use PrintMethods to print more aliases
llvm-svn: 208607
2014-05-12 18:04:06 +00:00
Hal Finkel
5b038e4cbc Pass the value type to TLI::getRegisterByName
We must validate the value type in TLI::getRegisterByName, because if we
don't and the wrong type was used with the IR intrinsic, then we'll assert
(because we won't be able to find a valid register class with which to
construct the requested copy operation). For PPC64, additionally, the type
information is necessary to decide between the 64-bit register and the 32-bit
subregister.

No functionality change.

llvm-svn: 208508
2014-05-11 19:29:07 +00:00
Hal Finkel
34d719e885 Add 'override' to getRegisterByName in *ISelLowering.h
No functionality change intended.

llvm-svn: 208507
2014-05-11 19:28:55 +00:00
Renato Golin
8a9a382ab2 Implememting named register intrinsics
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).

So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.

llvm-svn: 208104
2014-05-06 16:51:25 +00:00
Benjamin Kramer
3e3e43e656 AArch64: Mark vector long multiplication as expand.
There are no patterns for this. This was already fixed for ARM64 but I forgot
to apply it to AArch64 too.

llvm-svn: 207515
2014-04-29 09:37:54 +00:00
Craig Topper
b22729defa [C++11] Add 'override' keywords and remove 'virtual'. Additionally add 'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. AArch64 edition
llvm-svn: 207510
2014-04-29 07:58:34 +00:00
Craig Topper
b663bffa27 [C++] Use 'nullptr'.
llvm-svn: 207394
2014-04-28 04:05:08 +00:00
Craig Topper
1efda44640 Convert SelectionDAG::SelectNodeTo to use ArrayRef.
llvm-svn: 207377
2014-04-27 19:21:11 +00:00
Craig Topper
536995c0a7 Convert SelectionDAG::getMergeValues to use ArrayRef.
llvm-svn: 207374
2014-04-27 19:20:57 +00:00
Craig Topper
e0741a0fcb Convert getMemIntrinsicNode to take ArrayRef of SDValue instead of pointer and size.
llvm-svn: 207329
2014-04-26 19:29:41 +00:00
Craig Topper
1b1f54bcca Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>.
llvm-svn: 207327
2014-04-26 18:35:24 +00:00
Craig Topper
6d411cb95a [C++] Use 'nullptr'. Target edition.
llvm-svn: 207197
2014-04-25 05:30:21 +00:00
Reid Kleckner
e7e2ccb9e9 Add 'musttail' marker to call instructions
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur.  It also comes with convervative IR
verification rules that ensure that tail call optimization is possible.

Reviewers: nicholas

Differential Revision: http://llvm-reviews.chandlerc.com/D3240

llvm-svn: 207143
2014-04-24 20:14:34 +00:00
Tim Northover
44ef313763 AArch64: print NEON lists with a space.
This matches ARM64 behaviour, which I think is clearer. It also puts all the
churn from that difference into one easily ignored commit.

llvm-svn: 207116
2014-04-24 14:06:20 +00:00
Evgeniy Stepanov
c242bd4b23 Create MCTargetOptions.
For now it contains a single flag, SanitizeAddress, which enables
AddressSanitizer instrumentation of inline assembly.

Patch by Yuri Gorshenin.

llvm-svn: 206971
2014-04-23 11:16:03 +00:00
Kevin Enderby
223e66dc63 Fix the assembler to print a better relocatable expression error
diagnostic that includes location information.

Currently if one has this assembly:

	.quad (0x1234 + (4 * SOME_VALUE))

where SOME_VALUE is undefined ones gets the less than
useful error message with no location information:

% clang -c x.s
clang -cc1as: fatal error: error in backend: expected relocatable expression

With this fix one now gets a more useful error message
with location information:

% clang -c x.s 
x.s:5:8: error: expected relocatable expression
 .quad (0x1234 + (4 * SOME_VALUE))
       ^

To do this I plumbed the SMLoc through the MCObjectStreamer
EmitValue() and EmitValueImpl() interfaces so it could be used
when creating the MCFixup.

rdar://12391022

llvm-svn: 206906
2014-04-22 17:27:29 +00:00
Jiangning Liu
fbd9fe9a73 [AArch64] Enable global merge pass.
llvm-svn: 206861
2014-04-22 03:33:26 +00:00
Chandler Carruth
ae889a5f85 [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Target/...
edition.

llvm-svn: 206842
2014-04-22 02:41:26 +00:00
Chandler Carruth
1bfa34f57d [cleanup] Fix two headers where we included a standard library header
after including the generated code from tablegen.

llvm-svn: 206841
2014-04-22 02:28:45 +00:00
Chandler Carruth
72185824a4 [cleanup] Lift using directives, DEBUG_TYPE definitions, and even some
system headers above the includes of generated '.inc' files that
actually contain code. In a few targets this was already done pretty
consistently, but it wasn't done *really* consistently anywhere. It is
strictly cleaner IMO and necessary in a bunch of places where the
DEBUG_TYPE is referenced from the generated code. Consistency with the
necessary places trumps. Hopefully the build bots are OK with the
movement of intrin.h...

llvm-svn: 206838
2014-04-22 02:03:14 +00:00
Chandler Carruth
15c7b91ac2 [Modules] Make Support/Debug.h modular. This requires it to not change
behavior based on other files defining DEBUG_TYPE, which means it cannot
define DEBUG_TYPE at all. This is actually better IMO as it forces folks
to define relevant DEBUG_TYPEs for their files. However, it requires all
files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't
already. I've updated all such files in LLVM and will do the same for
other upstream projects.

This still leaves one important change in how LLVM uses the DEBUG_TYPE
macro going forward: we need to only define the macro *after* header
files have been #include-ed. Previously, this wasn't possible because
Debug.h required the macro to be pre-defined. This commit removes that.
By defining DEBUG_TYPE after the includes two things are fixed:

- Header files that need to provide a DEBUG_TYPE for some inline code
  can do so by defining the macro before their inline code and undef-ing
  it afterward so the macro does not escape.

- We no longer have rampant ODR violations due to including headers with
  different DEBUG_TYPE definitions. This may be mostly an academic
  violation today, but with modules these types of violations are easy
  to check for and potentially very relevant.

Where necessary to suppor headers with DEBUG_TYPE, I have moved the
definitions below the includes in this commit. I plan to move the rest
of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big
enough.

The comments in Debug.h, which were hilariously out of date already,
have been updated to reflect the recommended practice going forward.

llvm-svn: 206822
2014-04-21 22:55:11 +00:00
Jiangning Liu
57e94eee58 This commit allows vectorized loops to be unrolled by a factor of 2 for AArch64.
A new test case is also added for ARM64.

Patched by Z.Zheng

llvm-svn: 206563
2014-04-18 07:57:54 +00:00
Jiangning Liu
6aa9a901c7 This is one of the optimizations ported from ARM64 to AArch64 to address the performance gap between these two back ends. The test case newly added for AArch64 already exists in ARM64.
Patched by Z.Zheng

llvm-svn: 206559
2014-04-18 05:58:09 +00:00
Jiangning Liu
fcc0f2379a This commit enables unaligned memory accesses of vector types on AArch64 back end. This should boost vectorized code performance.
Patched by Z. Zheng

llvm-svn: 206557
2014-04-18 03:58:38 +00:00
Chad Rosier
f414b6adf9 [AArch64] Implement the getCSRFirstUseCost API, mirroring that in ARM64.
llvm-svn: 206473
2014-04-17 16:19:54 +00:00
Craig Topper
69e0e91431 Convert SelectionDAG::getVTList to use ArrayRef
llvm-svn: 206357
2014-04-16 06:10:51 +00:00
Nick Lewycky
82ad9fc7c8 Break PseudoSourceValue out of the Value hierarchy. It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* and Value* is strongly encouraged to use a MachinePointerInfo instead.
llvm-svn: 206255
2014-04-15 07:22:52 +00:00
Lang Hames
91cdab6916 [MC] Require an MCContext when constructing an MCDisassembler.
This patch re-introduces the MCContext member that was removed from
MCDisassembler in r206063, and requires that an MCContext be passed in at
MCDisassembler construction time. (Previously the MCContext member had been
initialized in an ad-hoc fashion after construction). The MCCContext member
can be used by MCDisassembler sub-classes to construct constant or
target-specific MCExprs.

This patch updates disassemblers for in-tree targets, and provides the
MCRegisterInfo instance that some disassemblers were using through the
MCContext (previously those backends were constructing their own
MCRegisterInfo instances).

llvm-svn: 206241
2014-04-15 04:40:56 +00:00
Chad Rosier
f94fdcad79 [AArch64] Implement the isLegalAddressingMode and getScalingFactorCost APIs.
llvm-svn: 206089
2014-04-12 00:14:23 +00:00
NAKAMURA Takumi
d13a996dc0 LLVMBuild.txt: Add missing dependencies.
llvm-svn: 205962
2014-04-10 11:16:47 +00:00
Chad Rosier
0a639ba5a6 [AArch64] Implement the isZExtFree APIs.
llvm-svn: 205926
2014-04-09 20:51:21 +00:00
Chad Rosier
b6bf098390 [AArch64] Implement the isTruncateFree API.
In AArch64 i64 to i32 truncate operation is a subregister access.

This allows more opportunities for LSR optmization to eliminate
variables of different types (i32 and i64).

llvm-svn: 205925
2014-04-09 20:43:40 +00:00
Alp Toker
111bd28e59 Fix some doc and comment typos
llvm-svn: 205899
2014-04-09 14:47:27 +00:00
Tim Northover
421793ce9a ARM64: handle v1i1 types arising from setcc properly.
There were several overlapping problems here, and this solution is
closely inspired by the one adopted in AArch64 in r201381.

Firstly, scalarisation of v1i1 setcc operations simply fails if the
input types are legal. This is fixed in LegalizeVectorTypes.cpp this
time, and allows AArch64 code to be simplified slightly.

Second, vselect with such a setcc feeding into it ends up in
ScalarizeVectorOperand, where it's not handled. I experimented with an
implementation, but found that whatever DAG came out was rather
horrific. I think Hao's DAG combine approach is a good one for
quality, though there are edge cases it won't catch (to be fixed
separately).

Should fix PR19335.

llvm-svn: 205625
2014-04-04 14:49:21 +00:00
Craig Topper
694437e2ef Make consistent use of MCPhysReg instead of uint16_t throughout the tree.
llvm-svn: 205610
2014-04-04 05:16:06 +00:00