1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00
Commit Graph

722 Commits

Author SHA1 Message Date
Oliver Stannard
4116b199b2 [AArch64] Fix fast-isel of cbz of i1, i8, i16
This fixes a miscompilation in the AArch64 fast-isel which was
triggered when a branch is based on an icmp with condition eq or ne,
and type i1, i8 or i16. The cbz instruction compares the whole 32-bit
register, so values with the bottom 1, 8 or 16 bits clear would cause
the wrong branch to be taken.

llvm-svn: 220553
2014-10-24 09:54:41 +00:00
Chad Rosier
19ef65a831 [AArch64] Add support for the .inst directive.
This has been implement using the MCTargetStreamer interface as is done in the
ARM, Mips and PPC backends.

Phabricator: http://reviews.llvm.org/D5891
PR20964

llvm-svn: 220422
2014-10-22 20:35:57 +00:00
Arnaud A. de Grandmaison
e8d641238b [AArch64] Cleanup A57PBQPConstraints
And add a long awaited testcase.

llvm-svn: 220381
2014-10-22 12:40:20 +00:00
Arnaud A. de Grandmaison
73624b6ac4 [PBQP] Teach PassConfig to tell if the default register allocator is used.
This enables targets to adapt their pass pipeline to the register
allocator in use. For example, with the AArch64 backend, using PBQP
with the cortex-a57, the FPLoadBalancing pass is no longer necessary.

llvm-svn: 220321
2014-10-21 20:47:22 +00:00
James Molloy
b06648acfb [AArch64] Fix a silent codegen fault in BUILD_VECTOR lowering.
We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same.

While we're at it, avoid writing to std::vector::end()...

Bug found with random testing and a lot of coffee.

llvm-svn: 220051
2014-10-17 17:06:31 +00:00
Juergen Ributzka
3743a4c0d2 [AArch64] Fix miscompile of sdiv-by-power-of-2.
When the constant divisor was larger than 32bits, then the optimized code
generated for the AArch64 backend would emit the wrong code, because the shift
was defined as a shift of a 32bit constant '(1<<Lg2(divisor))' and we would
loose the upper 32bits.

This fixes rdar://problem/18678801.

llvm-svn: 219934
2014-10-16 16:41:15 +00:00
Juergen Ributzka
d372bea426 Reapply "[FastISel][AArch64] Add custom lowering for GEPs."
This is mostly a copy of the existing FastISel GEP code, but we have to
duplicate it for AArch64, because otherwise we would bail out even for simple
cases. This is because the standard fastEmit functions don't cover MUL at all
and ADD is lowered very inefficientily.

The original commit had a bug in the add emit logic, which has been fixed.

llvm-svn: 219831
2014-10-15 18:58:07 +00:00
Juergen Ributzka
e743ba566b [FastISel][AArch64] Factor out add with immediate emission into a helper function. NFC.
Simplify add with immediate emission by factoring it out into a helper function.

llvm-svn: 219830
2014-10-15 18:58:02 +00:00
Rafael Espindola
1dba93c519 Simplify handling of --noexecstack by using getNonexecutableStackSection.
llvm-svn: 219799
2014-10-15 16:12:52 +00:00
Juergen Ributzka
a0f67257e6 Revert "[FastISel][AArch64] Add custom lowering for GEPs."
This breaks our internal build bots. Reverting it to get the bots green again.

llvm-svn: 219776
2014-10-15 04:55:48 +00:00
Eric Christopher
6ae3b9e3df Remove unused variable.
llvm-svn: 219750
2014-10-15 00:09:07 +00:00
Gerolf Hoflehner
93b11ca136 [AArch64] Wrong CC access in CSINC-conditional branch sequence
This is a follow up to commit r219742. It removes the CCInMI variable
and accesses the CC in CSCINC directly. In the case of a conditional
branch accessing the CC with CCInMI was wrong.

llvm-svn: 219748
2014-10-14 23:55:00 +00:00
Gerolf Hoflehner
fbd25ba142 [AAarch64] Optimize CSINC-branch sequence
Peephole optimization that generates a single conditional branch
for csinc-branch sequences like in the examples below. This is
possible when the csinc sets or clears a register based on a condition
code and the branch checks that register. Also the condition
code may not be modified between the csinc and the original branch.

Examples:

1. Convert csinc w9, wzr, wzr, <CC>;tbnz w9, #0, 0x44
   to b.<invCC>

2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44
   to b.<CC>


rdar://problem/18506500

llvm-svn: 219742
2014-10-14 23:07:53 +00:00
Juergen Ributzka
d3bf2383f1 [FastISel][AArch64] Add custom lowering for GEPs.
This is mostly a copy of the existing FastISel GEP code, but on AArch64 we bail
out even for simple cases, because the standard fastEmit functions don't cover
MUL and ADD is lowered inefficientily.

llvm-svn: 219726
2014-10-14 21:41:23 +00:00
Juergen Ributzka
3aad57b5cd [FastISel][AArch64] Fix sign-/zero-extend folding when SelectionDAG is involved.
Sign-/zero-extend folding depended on the load and the integer extend to be
both selected by FastISel. This cannot always be garantueed and SelectionDAG
might interfer. This commit adds additonal checks to load and integer extend
lowering to catch this.

Related to rdar://problem/18495928.

llvm-svn: 219716
2014-10-14 20:36:02 +00:00
Bradley Smith
bb04f108aa [AArch64] Fix crash with empty/pseudo-only blocks in A53 erratum (835769) workaround
llvm-svn: 219684
2014-10-14 14:02:41 +00:00
Hao Liu
6c7f917b92 [AArch64]Select wide immediate offset into [Base+XReg] addressing mode
e.g Currently we'll generate following instructions if the immediate is too wide:
    MOV  X0, WideImmediate
    ADD  X1, BaseReg, X0
    LDR  X2, [X1, 0]

    Using [Base+XReg] addressing mode can save one ADD as following:
    MOV  X0, WideImmediate
    LDR  X2, [BaseReg, X0]

    Differential Revision: http://reviews.llvm.org/D5477

llvm-svn: 219665
2014-10-14 06:50:36 +00:00
Bradley Smith
aa602e5e4d [AArch64] Add workaround for Cortex-A53 erratum (835769)
Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is
possible for a 64-bit multiply-accumulate instruction in AArch64 state to
generate an incorrect result.  The details are quite complex and hard to
determine statically, since branches in the code may exist in some
 circumstances, but all cases end with a memory (load, store, or prefetch)
instruction followed immediately by the multiply-accumulate operation.

The safest work-around for this issue is to make the compiler avoid emitting
multiply-accumulate instructions immediately after memory instructions and the
simplest way to do this is to insert a NOP.

This patch implements such work-around in the backend, enabled via the option
-aarch64-fix-cortex-a53-835769.

The work-around code generation is not enabled by default.

llvm-svn: 219603
2014-10-13 10:12:35 +00:00
Lang Hames
e0ce993042 [PBQP] Add missing headers from r219421.
llvm-svn: 219425
2014-10-09 18:36:59 +00:00
Lang Hames
1ac2927c37 [PBQP] Replace PBQPBuilder with composable constraints (PBQPRAConstraint).
This patch removes the PBQPBuilder class and its subclasses and replaces them
with a composable constraints class: PBQPRAConstraint. This allows constraints
that are only required for optimisation (e.g. coalescing, soft pairing) to be
mixed and matched.

This patch also introduces support for target writers to supply custom
constraints for their targets by overriding a TargetSubtargetInfo method:

std::unique_ptr<PBQPRAConstraints> getCustomPBQPConstraints() const;

This patch should have no effect on allocations.

llvm-svn: 219421
2014-10-09 18:20:51 +00:00
Kevin Qin
33566b5879 [AArch64] Enable partial & runtime unrolling on cortex-a57.
llvm-svn: 219401
2014-10-09 10:13:27 +00:00
Chad Rosier
75e17097bb [AArch64] Generate vector signed/unsigned mul and mla/mls long.
Phabricator Revision: http://reviews.llvm.org/D5589
Patch by Balaram Makam <bmakam@codeaurora.org>!!

llvm-svn: 219276
2014-10-08 02:31:24 +00:00
Juergen Ributzka
06c32d25cd [FastISel][AArch64] Teach the address computation code to also fold sign-/zero-extends.
The code already folds sign-/zero-extends, but only if they are arguments to
mul and shift instructions. This extends the code to also fold them when they
are direct inputs.

llvm-svn: 219187
2014-10-07 03:40:06 +00:00
Juergen Ributzka
6619d4d4b3 [FastISel][AArch64] Teach the address computation to also fold sub instructions.
Tiny enhancement to the address computation code to also fold sub instructions
if the rhs is constant and can be folded into the offset.

llvm-svn: 219186
2014-10-07 03:40:03 +00:00
Juergen Ributzka
d396480465 [FastISel][AArch64] Fix "Fold sign-/zero-extends into the load instruction."
This commit fixes an issue with sign-/zero-extending loads that was discovered
by Richard Barton.

We use now the correct load instructions for sign-extending loads to 64bit. Also
updated and added more unit tests.

llvm-svn: 219185
2014-10-07 03:39:59 +00:00
Eric Christopher
faca264c55 Add subtarget caches to aarch64, arm, ppc, and x86.
These will make it easier to test further changes to the
code generation and optimization pipelines as those are
moved to subtargets initialized with target feature and
target cpu.

llvm-svn: 219106
2014-10-06 06:45:36 +00:00
Benjamin Kramer
860521c88b Make AAMDNodes ctor and operator bool (!!!) explicit, mop up bugs and weirdness exposed by it.
llvm-svn: 219068
2014-10-04 22:44:29 +00:00
Jingyue Wu
4a186967a9 Add fake use to suppress defined-but-unused warnings
llvm-svn: 219045
2014-10-04 03:50:10 +00:00
Benjamin Kramer
4c9fb3d669 Eliminate some deep std::vector copies. NFC.
llvm-svn: 218999
2014-10-03 18:33:16 +00:00
Eric Christopher
47c04156aa constify TargetMachine parameter.
llvm-svn: 218934
2014-10-03 00:42:41 +00:00
Juergen Ributzka
2eab1ef77e [Stackmaps] Make ithe frame-pointer required for stackmaps.
Do not eliminate the frame pointer if there is a stackmap or patchpoint in the
function. All stackmap references should be FP relative.

This fixes PR21107.

llvm-svn: 218920
2014-10-02 22:21:49 +00:00
Adrian Prantl
2b1df58ebe Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
2014-10-01 18:55:02 +00:00
Adrian Prantl
0959156fa3 Revert r218778 while investigating buldbot breakage.
"Move the complex address expression out of DIVariable and into an extra"

llvm-svn: 218782
2014-10-01 18:10:54 +00:00
Adrian Prantl
229943585f Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

llvm-svn: 218778
2014-10-01 17:55:39 +00:00
Tom Coxon
50ff005894 [AArch64] Allow access to all system registers with MRS/MSR instructions.
The A64 instruction set includes a generic register syntax for accessing
implementation-defined system registers. The syntax for these registers is:
    S<op0>_<op1>_<CRn>_<CRm>_<op2>

The encoding space permitted for implementation-defined system registers
is:
    op0 op1  CRn   CRm   op2
    11  xxx  1x11  xxxx  xxx

The full encoding space can now be accessed:
    op0 op1  CRn   CRm   op2
    xx  xxx  xxxx  xxxx  xxx

This is useful to anyone needing to write assembly code supporting new
system registers before the assembler has learned the official names for
them.

llvm-svn: 218753
2014-10-01 10:13:59 +00:00
Asiri Rathnayake
0e12aa2ad9 Add missing natual vector cast.
Summary: The natual vector cast node (similar to bitcast) AArch64ISD::NVCAST
was introduced in r217159 and r217138. This patch adds a missing cast from
v2f32 to v1i64 which is causing some compilation failures. Also added test
cases to cover various modimm types and BUILD_VECTORs with i64 elements.

llvm-svn: 218751
2014-10-01 09:59:45 +00:00
Juergen Ributzka
4b2325a925 Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ.
Note: This version fixed an issue with the TBZ/TBNZ instructions that were
generated in FastISel. The issue was that the 64bit version of TBZ (TBZX)
automagically sets the upper bit of the immediate field that is used to specify
the bit we want to test. To test for any of the lower 32bits we have to first
extract the subregister and use the 32bit version of the TBZ instruction (TBZW).

Original commit message:
Teach selectBranch to fold bit test and branch into a single instruction (TBZ or
TBNZ).

llvm-svn: 218693
2014-09-30 19:59:35 +00:00
Tom Coxon
856ab42e33 [AArch64] Remove unnecessary whitespace. (Test commit)
llvm-svn: 218680
2014-09-30 16:23:16 +00:00
Juergen Ributzka
040a60a3d3 [FastISel][AArch64] Fold sign-/zero-extends into the load instruction.
The sign-/zero-extension of the loaded value can be performed by the memory
instruction for free. If the result of the load has only one use and the use is
a sign-/zero-extend, then we emit the proper load instruction. The extend is
only a register copy and will be optimized away later on.

Other instructions that consume the sign-/zero-extended value are also made
aware of this fact, so they don't fold the extend too.

This fixes rdar://problem/18495928.

llvm-svn: 218653
2014-09-30 00:49:58 +00:00
Juergen Ributzka
d6d5162a97 [FastISel][AArch64] Factor out scale factor calculation. NFC.
Factor out the code that determines the implicit scale factor of memory
operations for a given value type.

llvm-svn: 218652
2014-09-30 00:49:54 +00:00
Dave Estes
a9a3195105 [AArch64] Refines the Cortex-A57 Machine Model
Primarily refines all of the instructions with accurate latency
and micro-op information. Refinements largely focus on the NEON
instructions.

Additionally, a few advanced features are modeled, including
forwarding for MAC instructions and hazards for floating point SQRT
and DIV.

Lastly, the issue-width is reduced to three so that the scheduler
will better accommodate the narrower decode and dispatch width.

llvm-svn: 218627
2014-09-29 21:27:36 +00:00
Chad Rosier
df7883744f [AArch64] Improve cost model to handle sdiv by a pow-of-two.
This patch improves the target-specific cost model to better handle signed
division by a power of two. The immediate result is that this enables the SLP
vectorizer to do a better job.

http://reviews.llvm.org/D5469
PR20714

llvm-svn: 218607
2014-09-29 13:59:31 +00:00
Jim Grosbach
41b2a508dd AArch64: allow constant expressions for shifted reg literals
e.g., add w1, w2, w3, lsl #(2 - 1)

This sort of thing comes up in pre-processed assembly playing macro games.
Still validate that it's an assembly time constant. The early exit error check
was just a bit overzealous and disallowed a left paren.

rdar://18430542

llvm-svn: 218336
2014-09-23 22:16:02 +00:00
Oliver Stannard
d5ddd9f5ee Fix segfault in AArch64 backend with -g and -mbig-endian
Fix a null pointer dereference when trying to swap the endianness of
fixups in the .eh_frame section in the AArch64 backend.

llvm-svn: 218311
2014-09-23 15:38:11 +00:00
Juergen Ributzka
96a3a7534a [FastISel][AArch64] Also allow folding of sign-/zero-extend and shift-left for booleans (i1).
Shift-left immediate with sign-/zero-extensions also works for boolean values.
Update the assert and the test cases to reflect that fact.

This should fix a bug found by Chad.

llvm-svn: 218275
2014-09-22 21:08:53 +00:00
Juergen Ributzka
c86ece1ad5 [FastIsel][AArch64] Fix a think-o in address computation.
When looking through sign/zero-extensions the code would always assume there is
such an extension instruction and use the wrong operand for the address.

There was also a minor issue in the handling of 'AND' instructions. I
accidentially used a 'cast' instead of a 'dyn_cast'.

llvm-svn: 218161
2014-09-19 22:23:46 +00:00
Aaron Ballman
c9d2119dc2 Reverting NFC changes from r218050. Instead, the warning was disabled for GCC in r218059, so these changes are no longer required.
llvm-svn: 218062
2014-09-18 17:34:23 +00:00
Aaron Ballman
2e4b3f3dca Fixing a bunch of -Woverloaded-virtual warnings due to hiding getSubtargetImpl from the base class. NFC.
llvm-svn: 218050
2014-09-18 13:27:14 +00:00
Juergen Ributzka
675ee57091 Revert "[FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ."
Reverting it until I have time to investigate a regression.

llvm-svn: 218035
2014-09-18 08:07:40 +00:00
Juergen Ributzka
1686c27b86 Fix previous commit: [FastISel][AArch64] Simplify XALU multiplies.
When folding the intrinsic flag into the branch or select we also have to
consider the fact if the intrinsic got simplified, because it changes the
flag we have to check for.

llvm-svn: 218034
2014-09-18 07:26:26 +00:00