mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 04:52:54 +02:00
Commit Graph

2043 Commits

Author SHA1 Message Date
Robin Morisset
81ba11a5c9 Fix swift-atomics testcase
This testcase was not testing what it was meant to: because there were only two checks for
dmb {{ish}} in the second function, it could have missed a bug where one of the three
required dmb {{ish}} barriers became dmb {{ishst}}. While fixing it, I also added
CHECK-LABELs to make it a bit less brittle.

llvm-svn: 218341
2014-09-23 23:18:01 +00:00
Quentin Colombet
7f70cb4039 [ARM] Do not perform a tail call when the caller returns several values.
The fix is slightly different than the x86 one (see r216117) because the number of values
attached to a return can vary even for a single returned value (e.g., an f64 yields
two returned values).

<rdar://problem/18352998>

llvm-svn: 218076
2014-09-18 21:17:50 +00:00
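For context on the f64 remark in the commit above: under the soft-float AAPCS, one source-level return value can correspond to several lowered return values. A minimal, purely illustrative example:

```
// Illustrative only: under the soft-float AAPCS, this single double return
// value comes back split across two core registers (r0/r1), which is why one
// source-level return value can count as two lowered return values.
extern "C" double make_pi() { return 3.141592653589793; }
```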
Robin Morisset
53ad29f76a Restore "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors"
Summary:
This patch was originally in D5304 (I could not find a way to reopen that revision).
It was accepted, committed, and broke the build bots because the overloading of
the ArrayRef constructor for braced initializer lists is not supported by all
toolchains. I then reverted it, and now propose this fixed version, which uses a plain
C array in makeDMB instead (that array is then converted implicitly to an
ArrayRef, but that conversion is not behind an ifdef). Could someone confirm whether
initializer lists for plain C arrays are supported by every toolchain used
to build LLVM? Otherwise I can just initialize the array the old way:
args[0] = ...; .. ; args[5] = ...;

Below is the description of the original patch:
```
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy if on a Cortex-M core with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (a special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.

Thanks to luqmana for having noticed this bug.
```

Test Plan: Added more cases to atomic-load-store.ll + make check-all

Reviewers: jfb, t.p.northover, luqmana

Subscribers: llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D5386

llvm-svn: 218066
2014-09-18 18:56:04 +00:00
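As an illustration of the fallback logic described in the commit above, here is a rough, hypothetical sketch of how the barrier choice could be expressed; the predicate names are made up and this is not the actual LLVM implementation:

```
// Hypothetical sketch of choosing a barrier encoding based on what the core
// supports, mirroring the fallback paths listed in the commit message above.
enum class Barrier { DMB_ISH, DMB_SY, MCR_CP15, None };

Barrier chooseBarrier(bool hasDataBarrier, bool isMClass, bool isARMv6) {
  if (hasDataBarrier)
    return isMClass ? Barrier::DMB_SY    // Cortex-M with dmb: dmb sy
                    : Barrier::DMB_ISH;  // ARMv7/ARMv8: dmb ish
  if (isARMv6)
    return Barrier::MCR_CP15;            // mcr p15, #0, r0, c7, c10, #5
  return Barrier::None;                  // nothing suitable available
}
```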
Robin Morisset
25e83f310a Revert "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors"
It is breaking the build on the buildbots but works fine on my machine. I am reverting it
while trying to understand what happens (it appears to depend on the compiler used
to build; I probably used a C++11 feature that is not perfectly supported by some
of the buildbots).

This reverts commit feb3176c4d006f99af8b40373abd56215a90e7cc.

llvm-svn: 217973
2014-09-17 18:09:13 +00:00
Robin Morisset
0183a24810 [ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors
Summary:
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy if on a Cortex-M core with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (a special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.

Thanks to luqmana for having noticed this bug.

Test Plan: Added more cases to atomic-load-store.ll + make check-all

Reviewers: jfb, t.p.northover, luqmana

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5304

llvm-svn: 217965
2014-09-17 17:41:16 +00:00
Tilmann Scheller
9817f96600 [ARM] Add Thumb-2 code size optimization regression test for LSR (register).
llvm-svn: 217582
2014-09-11 10:45:50 +00:00
Tilmann Scheller
2b9d5dcb97 [ARM] Add Thumb-2 code size optimization regression test for LSR (immediate).
llvm-svn: 217581
2014-09-11 10:42:17 +00:00
Tilmann Scheller
9745855d08 [ARM] Add Thumb-2 code size optimization regression test for LSL (register).
llvm-svn: 217579
2014-09-11 10:33:39 +00:00
Tilmann Scheller
c86b96c835 [ARM] Add Thumb2 code size optimization regression test for LSL (immediate).
llvm-svn: 217576
2014-09-11 10:29:42 +00:00
Tim Northover
c24289aa47 ARM: don't size-reduce STMs using the LR register.
The only Thumb-1 multi-store capable of using LR is the PUSH instruction, which
translates to STMDB, so we shouldn't convert STMIAs.

Patch by Sergey Dmitrouk.

llvm-svn: 217498
2014-09-10 12:53:28 +00:00
Renato Golin
024d56e8f8 ARM: Negative offset support problem
This patch permits the use of a negative offset for non-frame accesses.

Patch by Igor Oblakov.

llvm-svn: 217431
2014-09-09 09:57:59 +00:00
Renato Golin
0158ba63ec Missing test from r216989
llvm-svn: 216990
2014-09-02 22:46:18 +00:00
Renato Golin
36f6ac7e19 Only emit movw on ARMv6T2+
Fix PR18364.

Patch by Dimitry Andric.

llvm-svn: 216989
2014-09-02 22:45:13 +00:00
Tilmann Scheller
0da3307044 [ARM] Add Thumb-2 code size optimization regression test for EOR.
llvm-svn: 216881
2014-09-01 12:59:34 +00:00
Tilmann Scheller
0dabe778bc [ARM] Add Thumb-2 code size optimization regression test for BIC.
llvm-svn: 216880
2014-09-01 12:53:29 +00:00
Tilmann Scheller
af0d1a637a [ARM] Add Thumb-2 code size optimization test for ASR (register).
llvm-svn: 216746
2014-08-29 17:19:00 +00:00
Tilmann Scheller
3d9297824f [ARM] Add Thumb-2 code size optimization test for ASR (immediate).
llvm-svn: 216744
2014-08-29 17:02:28 +00:00
Tilmann Scheller
5b77924fbf [ARM] Make Thumb-2 code size optimization test more strict.
llvm-svn: 216729
2014-08-29 15:13:35 +00:00
Tilmann Scheller
a017f260c0 [ARM] Add a first test for the Thumb-2 code size optimization pass.
While working on a Thumb-2 code size optimization I just realized that we don't have any regression tests for it.

So here's a first test case; I plan to increase the coverage over time.

llvm-svn: 216728
2014-08-29 15:04:40 +00:00
Yi Kong
7df6bd5f10 ARM: Add patterns for dbg
llvm-svn: 216451
2014-08-26 12:47:26 +00:00
Chad Rosier
9a6dcd9a47 [AArch32] Add patterns for VCVT{A,N,P,M}.
Patterns for lowering libm calls to VCVT{A,N,P,M} are also included.
Phabricator Revision: http://reviews.llvm.org/D5033

llvm-svn: 216388
2014-08-25 16:56:33 +00:00
Chad Rosier
54e07eda0d Revert "ARM: improve RTABI 4.2 conformance on Linux"
This reverts commit r215862 due to nightly failures.  Will work on getting a
reduced test case, but I wanted to get our bots green in the meantime.

llvm-svn: 216325
2014-08-23 18:29:43 +00:00
Reid Kleckner
60c1740ad6 ARM / x86_64 varargs: Don't save regparms in prologue without va_start
There's no need to do this if the user doesn't call va_start. In the
future, we're going to have thunks that forward these register
parameters with musttail calls, and they won't need these spills for
handling va_start.

Most of the test suite changes are adding va_start calls to existing
tests to keep things working.

llvm-svn: 216294
2014-08-22 21:59:26 +00:00
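For readers unfamiliar with the mechanism the commit above relies on: it is the call to va_start in the callee that forces the incoming register parameters to be spilled to a save area. A minimal sketch in standard C/C++ (nothing LLVM-specific):

```
#include <cstdarg>

// Only because this function calls va_start does the backend need to spill the
// incoming register parameters; a variadic function that never calls va_start
// (the case the commit above optimizes) can skip those spills entirely.
int sum(int count, ...) {
  va_list ap;
  va_start(ap, count);
  int total = 0;
  for (int i = 0; i < count; ++i)
    total += va_arg(ap, int);
  va_end(ap);
  return total;
}
```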
Quentin Colombet
8524db56a4 [ARM] Move the implementation of the target hooks related to copy-related
instruction from ARMInstrInfo to ARMBaseInstrInfo.
That way, thumb mode can also benefit from the advanced copy optimization.

<rdar://problem/12702965>

llvm-svn: 216274
2014-08-22 18:05:22 +00:00
Jonathan Roelofs
31a7253462 Add a thread-model knob for lowering atomics on baremetal & single threaded systems
http://reviews.llvm.org/D4984

llvm-svn: 216182
2014-08-21 14:35:47 +00:00
Oliver Stannard
7994364d2f [ARM] Enable DP copy, load and store instructions for FPv4-SP
The FPv4-SP floating-point unit is generally referred to as
single-precision only, but it does have double-precision registers and
load, store and GPR<->DPR move instructions which operate on them.
This patch enables the use of these registers, the main advantage of
which is that we now comply with the AAPCS-VFP calling convention.
This partially reverts r209650, which added some AAPCS-VFP support,
but did not handle return values or alignment of double arguments in
registers.

This patch also adds tests for Thumb2 code generation for
floating-point instructions and intrinsics, which previously only
existed for ARM.

llvm-svn: 216172
2014-08-21 12:50:31 +00:00
Quentin Colombet
be2a400039 [PeepholeOptimizer] Take advantage of the isInsertSubreg property in the
advanced copy optimization.

This is the final step patch toward transforming:
udiv    r0, r0, r2
udiv    r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov    r0, r1, d16
bx      lr

into:
udiv    r0, r0, r2
udiv    r1, r1, r3
bx      lr

Indeed, thanks to this patch, this optimization is able to look through
vmov.32 d16[0], r0
vmov.32 d16[1], r1

and is able to rewrite the following sequence:
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov    r0, r1, d16

into simple generic GPR copies that the coalescer managed to remove.

<rdar://problem/12702965>

llvm-svn: 216144
2014-08-21 00:19:16 +00:00
Jonathan Roelofs
63874dab8e Lower thumbv4t & thumbv5 lo->lo copies through a push-pop sequence
On pre-v6 hardware, 'MOV lo, lo' gives undefined results, so such copies need to
be avoided. This patch trades simplicity for implementation time at the expense
of performance... As they say: correctness first, then performance.

See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075998.html for a few
ideas on how to make this better.

llvm-svn: 216138
2014-08-20 23:38:50 +00:00
Quentin Colombet
5404c5510b [PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of
the isRegSequence property.

This is a follow-up of r215394 and r215404, which respectively introduces the
isRegSequence property and uses it for ARM.

Thanks to the property introduced by the previous commits, this patch is able
to optimize the following sequence:
vmov	d0, r2, r3
vmov	d1, r0, r1
vmov	r0, s0
vmov	r1, s2
udiv	r0, r1, r0
vmov	r1, s1
vmov	r2, s3
udiv	r1, r2, r1
vmov.32	d16[0], r0
vmov.32	d16[1], r1
vmov	r0, r1, d16
bx	lr

into:
udiv	r0, r0, r2
udiv	r1, r1, r3
vmov.32	d16[0], r0
vmov.32	d16[1], r1
vmov	r0, r1, d16
bx	lr

This patch refactors how the copy optimizations are done in the peephole
optimizer. Prior to this patch, we had one copy-related optimization that
replaced a copy or bitcast with a generic copy that is more suitable in terms of
register file.

With this patch, the peephole optimizer features two copy-related optimizations:
1. One for rewriting generic copies to generic copies:
PeepholeOptimizer::optimizeCoalescableCopy.
2. One for replacing non-generic copies with generic copies:
PeepholeOptimizer::optimizeUncoalescableCopy.

The goals of these two optimizations are slightly different: one rewrites the
operand of the instruction (#1); the other kills off the non-generic instruction
and replaces it with a (sequence of) generic instruction(s).

Both optimizations rely on the ValueTracker introduced in r212100.

The ValueTracker has been refactored to use the information from the
TargetInstrInfo for non-generic instructions. As part of the refactoring, we
switched the tracking from the index of the definition to the actual register
(virtual or physical). This change provides better consistency with
register-related APIs and eases the use of the TargetInstrInfo.

Moreover, this patch introduces a new helper class CopyRewriter used to ease the
rewriting of generic copies (i.e., #1).

Finally, this patch adds a dead code elimination pass right after the peephole
optimizer to get rid of dead code that may appear after rewriting.

This is related to <rdar://problem/12702965>.

Review: http://reviews.llvm.org/D4874
llvm-svn: 216088
2014-08-20 17:41:48 +00:00
Yi Kong
57329b1cc2 ARM: Fix codegen for rbit intrinsic
LLVM generates an illegal `rbit r0, #352` instruction for the rbit intrinsic.
According to the ARM ARM, rbit only takes a register as its argument, not an immediate.
The correct form is rbit <Rd>, <Rm>.

The bug was originally introduced in r211057.

Differential Revision: http://reviews.llvm.org/D4980

llvm-svn: 216064
2014-08-20 10:40:20 +00:00
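For context, a scalar model of what rbit computes (it reverses the bit order of a 32-bit value), which is why the hardware only accepts a register operand:

```
#include <cstdint>

// Scalar model of the ARM rbit instruction: reverse the bit order of a 32-bit
// value. The hardware instruction only accepts a register source operand,
// which is the constraint the commit above fixes in instruction selection.
uint32_t rbit(uint32_t x) {
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (x & 1u);
    x >>= 1;
  }
  return r;
}
```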
Juergen Ributzka
15f8549d05 Reapply [FastISel] Let the target decide first if it wants to materialize a constant (215588).
Note: This was originally reverted to track down a buildbot error. This commit
exposed a latent bug that was fixed in r215753. Therefore it is reapplied
without any modifications.

I ran it through SPEC2k and SPEC2k6 for AArch64 and it didn't introduce any new
regressions.

Original commit message:
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.

On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Also
some very funny floating-point materialization could be observed too.

On AArch64 it would materialize the constant 0 in a register even though the
architecture has an actual "zero" register.

On ARM it would generate unnecessary mov instructions or not use mvn.

This change simply changes the order and always asks the target first whether it
would like to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.

Related to <rdar://problem/17420988>.

llvm-svn: 216006
2014-08-19 19:05:24 +00:00
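A simplified, hypothetical sketch of the reordering described in the commit above; the function names are illustrative stand-ins, not the real FastISel API:

```
#include <cstdint>

// Illustrative stand-ins for the two materialization paths; returning 0 means
// "no register produced". These are not the real FastISel entry points.
unsigned targetMaterializeConstant(int64_t) { return 0; }
unsigned genericMaterializeConstant(int64_t) { return 1; }

// After this change the target hook is consulted first; the generic,
// target-independent path only runs if the target declines.
unsigned materializeConstant(int64_t value) {
  if (unsigned reg = targetMaterializeConstant(value))
    return reg;
  return genericMaterializeConstant(value);
}
```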
Oliver Stannard
159a549ea3 [ARM,AArch64] Do not tail-call to an externally-defined function with weak linkage
Externally-defined functions with weak linkage should not be
tail-called on ARM or AArch64, as the AAELF spec requires normal calls
to undefined weak functions to be replaced with a NOP or jump to the
next instruction. The behaviour of branch instructions in this
situation (as used for tail calls) is implementation-defined, so we
cannot rely on the linker replacing the tail call with a return.

llvm-svn: 215890
2014-08-18 12:42:15 +00:00
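A small example of the situation the commit above describes, written with the GCC/Clang weak attribute; the function name is hypothetical:

```
// maybe_present is an externally-defined weak function: it may resolve to
// nothing at link time. A normal bl to it can be rewritten by the linker to a
// NOP, but a tail-call branch cannot, so the compiler must not tail-call here.
extern "C" __attribute__((weak)) void maybe_present();

void caller() {
  if (&maybe_present)   // a weak reference may be null
    maybe_present();    // must not be lowered to a tail call on ARM/AArch64
}
```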
Saleem Abdulrasool
dd9b2d6749 ARM: improve RTABI 4.2 conformance on Linux
The set of functions defined in the RTABI was separated for no real reason.
This brings us closer to proper utilisation of the functions defined by the
RTABI.  It also lays the groundwork for correctly emitting function calls to AEABI
functions on all AEABI-conforming platforms.

The previously existing lie about the behaviour of __ldivmod and __uldivmod is
propagated, as it is beyond the scope of this change.

The changes to the test are due to the fact that we now use the divmod functions
which return both the quotient and remainder and thus we no longer need to
invoke two functions on Linux (making it closer to EABI's behaviour).

llvm-svn: 215862
2014-08-17 22:51:02 +00:00
Chad Rosier
e4076eb345 [AArch32] Add support for FP rounding operations for ARMv8/AArch32.
Phabricator Revision: http://reviews.llvm.org/D4935

llvm-svn: 215772
2014-08-15 21:38:16 +00:00
Juergen Ributzka
f61c4566db [FastISel][ARM] Fix unit test from r215682.
Thanks Jim for finding this.

llvm-svn: 215733
2014-08-15 17:23:20 +00:00
Juergen Ributzka
50fa69ed58 [FastISel][ARM] Fall-back to constant pool loads when materializing an i32 constant.
FastEmit_i won't always succeed in materializing an i32 constant and will simply fail.
This would trigger a fall-back to SelectionDAG, which is really not necessary.

This fix first falls back to a constant pool load to materialize the constant
before giving up for good.

This fixes <rdar://problem/18022633>.

llvm-svn: 215682
2014-08-14 23:29:49 +00:00
Juergen Ributzka
a981de1e50 Revert several FastISel commits to track down a buildbot error.
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."

llvm-svn: 215673
2014-08-14 19:56:28 +00:00
Sanjay Patel
e7588bf533 optimize vector fneg of bitcasted integer value
This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops.

This patch is very similar to a fabs patch committed at r214892.

Differential Revision: http://reviews.llvm.org/D4852

llvm-svn: 215646
2014-08-14 15:15:28 +00:00
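A scalar model of the identity behind the commit above: negating a float whose bits come from an integer is just an XOR of the sign bit on the integer side, so when the integer is a compile-time constant the whole operation folds away. The vector optimization applies the same idea lane-wise.

```
#include <cstdint>
#include <cstring>

// Scalar model: fneg(bitcast<float>(bits)) == bitcast<float>(bits ^ 0x80000000).
float fneg_of_bits(uint32_t bits) {
  bits ^= 0x80000000u;             // flip the IEEE-754 sign bit
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}
```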
Juergen Ributzka
3e89dc8eab [FastISel] Let the target decide first if it wants to materialize a constant.
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.

On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Also
some very funny floating-point materialization could be observed too.

On AArch64 it would materialize the constant 0 in a register even though the
architecture has an actual "zero" register.

On ARM it would generate unnecessary mov instructions or not use mvn.

This change simply changes the order and always asks the target first whether it
would like to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.

Related to <rdar://problem/17420988>.

llvm-svn: 215588
2014-08-13 22:08:02 +00:00
Juergen Ributzka
fdca7fc3da [FastISel][ARM] Use MOVT/MOVW if the subtarget requests it.
This change is also in preparation for a future change to make sure that
the constant materialization uses MOVT/MOVW when available and not a load
from the constant pool.

llvm-svn: 215584
2014-08-13 21:42:19 +00:00
Saleem Abdulrasool
8388ecd074 ARM: try harder to detect non-IT eligible instructions
For many Thumb-1 register register instructions, setting the CPSR is not
permitted inside an IT block.  We would not correctly flag those instructions.
The previous change to identify this scenario was insufficient as it did not
actually catch all the instances.  The current list is formed by manual
inspection of the ARMv6M ARM.

The change to the Thumb2 IT block test is due to the fact that the new more
stringent checking of the MIs results in the If Conversion pass being prevented
from executing (since not all the instructions in the BB are predicable).  This
results in code gen changes.

Thanks to Tim Northover for pointing out that the previous patch was
insufficient, and for hinting that the v6M ARM would be much easier to use
than the v7 or v8 one!

llvm-svn: 215382
2014-08-11 20:13:25 +00:00
Sanjay Patel
94042be148 Correct a missing RUN line in the ARM codegen test for fneg ops. We should also explicitly specify +/-neonfp.
The bug was introduced at r99570 when use of "-arm-use-neon-fp" was removed.

Differential Revision: http://reviews.llvm.org/D4846

llvm-svn: 215377
2014-08-11 19:04:28 +00:00
Oliver Stannard
378f39a337 ARM: __gnu_h2f_ieee and __gnu_f2h_ieee always use the soft-float calling convention
By default, LLVM uses the "C" calling convention for all runtime
library functions. The half-precision FP conversion functions use the
soft-float calling convention and are needed for some targets which
use the hard-float convention by default, so their calling convention
must be set explicitly.

llvm-svn: 215348
2014-08-11 09:12:32 +00:00
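For context, a small target-dependent example: assuming an ARM target without native half-precision conversion instructions, widening an __fp16 lowers to a call to __gnu_h2f_ieee, and that call must use the soft-float convention regardless of the target's default.

```
// Assumes an ARM target without native half-precision support; there the
// widening below lowers to a call to __gnu_h2f_ieee, which uses the soft-float
// calling convention even if the target defaults to hard-float.
float widen(__fp16 x) {
  return x;
}
```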
Saleem Abdulrasool
1f058e747e ARM: correct isPredicable for MULS in THUMB mode
The ARM ARM states that CPSR may not be updated by a MUL in thumb mode.  Due to
an ordering of Thumb 2 Size Reduction and If Conversion, we would end up
generating a THUMB MULS inside an IT block.

The If Conversion pass uses the TTI isPredicable method to ensure that it can
transform a Basic Block.  However, because we only check for IT handling on
Thumb2 functions, we may miss some cases.  Even then, it only validates that the
CPSR is not *live*, rather than that it is not accessed.  This corrects the handling
for that particular case since the same restriction does not hold on the vast
majority of the instructions.

This does prevent the IfConversion optimization from kicking in in certain
cases, but generating correct code is more valuable.  Addresses PR20555.

llvm-svn: 215328
2014-08-10 22:20:37 +00:00
Adrian Prantl
ab8759a1d5 Make these regexes stricter by disallowing any additional characters in the output.
Thanks to dblaikie for pointing this out!

llvm-svn: 215166
2014-08-07 23:04:07 +00:00
Adrian Prantl
4f83792a23 Reflow this comment.
llvm-svn: 215160
2014-08-07 22:44:24 +00:00
Akira Hatanaka
8f4cbac196 [Branch probability] Recompute branch weights of tail-merged basic blocks.
BranchFolderPass was not correctly setting the basic block branch weights when
tail-merging created or merged blocks. This patch recomputes the weights of
tail-merged blocks using the following formula:

branch_weight(merged block to successor j) =
sum(block_frequency(bb) * branch_probability(bb -> j))

bb is a block that is in the set of merged blocks.

<rdar://problem/16256423>

llvm-svn: 215135
2014-08-07 19:30:13 +00:00
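A toy, self-contained version of the formula in the commit above, just to show how the merged edge weight is accumulated; the representation is made up for illustration:

```
#include <utility>
#include <vector>

// Each entry is (block_frequency(bb), branch_probability(bb -> j)) for a block
// bb in the set of tail-merged blocks; the merged edge weight is the
// frequency-weighted sum of those probabilities, as in the commit above.
double mergedEdgeWeight(const std::vector<std::pair<double, double>> &blocks) {
  double weight = 0.0;
  for (const auto &b : blocks)
    weight += b.first * b.second;
  return weight;
}
```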
Tim Northover
0028a9a97d ARM: do not generate BLX instructions on Cortex-M CPUs.
Particularly on MachO, we were generating "blx _dest" instructions for M-class
CPUs, where that instruction doesn't actually exist. They happen to get fixed up by the linker
into valid "bl _dest" instructions (which is why such a massive issue has
remained largely undetected), but we shouldn't rely on that.

llvm-svn: 214959
2014-08-06 11:13:14 +00:00
Tim Northover
7abd4db81b ARM-MachO: materialize callee address correctly on v4t.
llvm-svn: 214958
2014-08-06 11:13:06 +00:00
David Blaikie
37a242b70f DebugInfo: Assert that any CU for which debug_loc lists are emitted has at least one range.
This was coming up in weird debug info that had variables (and hence
debug_locs) but was in GMLT mode (because it was missing the 13th field
of the compile_unit metadata), so no ranges were constructed. We should
always have at least one range for any CU with a debug_loc in it -
because the range should cover the debug_loc.

The assertion just ensures that the "!= 1" range case inside the
subsequent loop doesn't get entered for the case where there are no
ranges at all, which should never reach here in the first place.

llvm-svn: 214939
2014-08-06 00:21:25 +00:00