1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 21:42:54 +02:00
Commit Graph

1978 Commits

Author SHA1 Message Date
NAKAMURA Takumi
47c4f52209 llvm/test/CodeGen/ARM/inlineasm-global.ll: Add explicit triple to appease targeting *-win32.
llvm-svn: 213933
2014-07-25 09:55:01 +00:00
NAKAMURA Takumi
0577d1aff3 llvm/test/CodeGen/ARM/inlineasm-global.ll: Avoid specifing source file on llc.
It sometimes confuses FileCheck. Consider the case that path contains 'stmib'. :)

llvm-svn: 213932
2014-07-25 09:54:49 +00:00
Akira Hatanaka
d624eb1a76 [ARM] In thumb mode, emit directive ".code 16" before file level inline
assembly instructions.

This is necessary to ensure ARM assembler switches to Thumb mode before it
starts assembling the file level inline assembly instructions at the beginning
of a .s file.

<rdar://problem/17757232>

llvm-svn: 213924
2014-07-25 05:12:49 +00:00
Chandler Carruth
8182e043d9 [SDAG] Introduce a combined set to the DAG combiner which tracks nodes
which have successfully round-tripped through the combine phase, and use
this to ensure all operands to DAG nodes are visited by the combiner,
even if they are only added during the combine phase.

This is critical to have the combiner reach nodes that are *introduced*
during combining. Previously these would sometimes be visited and
sometimes not be visited based on whether they happened to end up on the
worklist or not. Now we always run them through the combiner.

This fixes quite a few bad codegen test cases lurking in the suite while
also being more principled. Among these, the TLS codegeneration is
particularly exciting for programs that have this in the critical path
like TSan-instrumented binaries (although I think they engineer to use
a different TLS that is faster anyways).

I've tried to check for compile-time regressions here by running llc
over a merged (but not LTO-ed) clang bitcode file and observed at most
a 3% slowdown in llc. Given that this is essentially a worst case (none
of opt or clang are running at this phase) I think this is tolerable.
The actual LTO case should be even less costly, and the cost in normal
compilation should be negligible.

With this combining logic, it is possible to re-legalize as we combine
which is necessary to implement PSHUFB formation on x86 as
a post-legalize DAG combine (my ultimate goal).

Differential Revision: http://reviews.llvm.org/D4638

llvm-svn: 213898
2014-07-24 22:15:28 +00:00
Tim Northover
869fa46eae ARM: spot SBFX-compatbile code expressed with sign_extend_inreg
We were assuming all SBFX-like operations would have the shl/asr form, but
often when the field being extracted is an i8 or i16, we end up with a
SIGN_EXTEND_INREG acting on a shift instead. Simple enough to check for though.

llvm-svn: 213754
2014-07-23 13:59:12 +00:00
Tilmann Scheller
92abecadc2 [ARM] Add regression test for the earlyclobber constraint of ARM STRB.
The constraint was added in r213369.

llvm-svn: 213730
2014-07-23 08:39:50 +00:00
Tilmann Scheller
e0c6a75a6a [ARM] Add earlyclobber constraint to pre/post-indexed ARM STRH instructions.
The post-indexed instructions were missing the constraint, causing unpredictable STRH instructions to be emitted.

The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined, however it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically and then the constraint is definitely needed.

llvm-svn: 213729
2014-07-23 08:12:51 +00:00
Chandler Carruth
908e62868c [SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate
insertions.

The old behavior could cause arbitrarily bad memory usage in the DAG
combiner if there was heavy traffic of adding nodes already on the
worklist to it. This commit switches the DAG combine worklist to work
the same way as the instcombine worklist where we null-out removed
entries and only add new entries to the worklist. My measurements of
codegen time shows slight improvement. The memory utilization is
unsurprisingly dominated by other factors (the IR and DAG itself
I suspect).

This change results in subtle, frustrating churn in the particular order
in which DAG combines are applied which causes a number of minor
regressions where we fail to match a pattern previously matched by
accident. AFAICT, all of these should be using AddToWorklist to directly
or should be written in a less brittle way. None of the changes seem
drastically bad, and a few of the changes seem distinctly better.

A major change required to make this work is to significantly harden the
way in which the DAG combiner handle nodes which become dead
(zero-uses). Previously, we relied on the ability to "priority-bump"
them on the combine worklist to achieve recursive deletion of these
nodes and ensure that the frontier of remaining live nodes all were
added to the worklist. Instead, I've introduced a routine to just
implement that precise logic with no indirection. It is a significantly
simpler operation than that of the combiner worklist proper. I suspect
this will also fix some other problems with the combiner.

I think the x86 changes are really minor and uninteresting, but the
avx512 change at least is hiding a "regression" (despite the test case
being just noise, not testing some performance invariant) that might be
looked into. Not sure if any of the others impact specific "important"
code paths, but they didn't look terribly interesting to me, or the
changes were really minor. The consensus in review is to fix any
regressions that show up after the fact here.

Thanks to the other reviewers for checking the output on other
architectures. There is a specific regression on ARM that Tim already
has a fix prepped to commit.

Differential Revision: http://reviews.llvm.org/D4616

llvm-svn: 213727
2014-07-23 07:08:53 +00:00
Logan Chien
b6d535b47e Replace the result usages while legalizing cmpxchg.
We should update the usages to all of the results;
otherwise, we might get assertion failure or SEGV during
the type legalization of ATOMIC_CMP_SWAP_WITH_SUCCESS
with two or more illegal types.

For example, in the following sequence, both i8 and i1
might be illegal in some target, e.g. armv5, mipsel, mips64el,

    %0 = cmpxchg i8* %ptr, i8 %desire, i8 %new monotonic monotonic
    %1 = extractvalue { i8, i1 } %0, 1

Since both i8 and i1 should be legalized, the corresponding
ATOMIC_CMP_SWAP_WITH_SUCCESS dag will be checked/replaced/updated
twice.

If we don't update the usage to *ALL* of the results in the
first round, the DAG for extractvalue might be processed earlier.
The GetPromotedInteger() will result in assertion failure,
because its operand (i.e. the success bit of cmpxchg) is not
promoted beforehand.

llvm-svn: 213569
2014-07-21 17:33:44 +00:00
Saleem Abdulrasool
f8a4e76c32 ARM: correct WoA __builtin_alloca handling on O0
When performing a dynamic stack adjustment without optimisations, we would mark
SP as def and R4 as kill.  This occurred as part of the expansion of a
WIN__CHKSTK SDNode which indicated the proper handling of SP and R4.  The result
would be that we would double define SP as part of an operation, which is
obviously incorrect.

Furthermore, the VTList for the chain had an incorrect parameter type of i32
instead of Other.

Correct these to permit proper lowering of __builtin_alloca at -O0.

llvm-svn: 213442
2014-07-19 01:29:51 +00:00
Tim Northover
25c770b7c4 ARM: support legalisation of "fptrunc ... to half" operations.
llvm-svn: 213373
2014-07-18 13:01:19 +00:00
Tim Northover
e4c93c0798 CodeGen: soften f16 type by default instead of marking legal.
Actual support for softening f16 operations is still limited, and can be added
when it's needed.  But Soften is much closer to being a useful thing to try
than keeping it Legal when no registers can actually hold such values.

Longer term, we probably want something between Soften and Promote semantics
for most targets, it'll be more efficient to promote the 4 basic operations to
f32 than libcall them.

llvm-svn: 213372
2014-07-18 12:41:46 +00:00
Tilmann Scheller
d04d5fde2e [ARM] Add earlyclobber constraint to pre/post-indexed ARM STR instructions.
The post-indexed instructions were missing the constraint, causing unpredictable STR instructions to be emitted.

The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined, however it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically and then the constraint is definitely needed.

This fixes PR20323.

llvm-svn: 213369
2014-07-18 12:05:49 +00:00
Tim Northover
48ae22e14a ARM: support direct f16 <-> f64 conversions
ARMv8 has instructions to handle it, otherwise a libcall is needed.

llvm-svn: 213254
2014-07-17 11:27:04 +00:00
Tim Northover
eae1f1c8cc CodeGen: extend f16 conversions to permit types > float.
This makes the two intrinsics @llvm.convert.from.f16 and
@llvm.convert.to.f16 accept types other than simple "float". This is
only strictly needed for the truncate operation, since otherwise
double rounding occurs and there's no way to represent the strict IEEE
conversion. However, for symmetry we allow larger types in the extend
too.

During legalization, we can expand an "fp16_to_double" operation into
two extends for convenience, but abort when the truncate isn't legal. A new
libcall is probably needed here.

Even after this commit, various target tweaks are needed to actually use the
extended intrinsics. I've put these into separate commits for clarity, so there
are no actual tests of f64 conversion here.

llvm-svn: 213248
2014-07-17 10:51:23 +00:00
Juergen Ributzka
63d2af65d4 [FastISel] Local values shouldn't be alive across an inline asm call with side effects.
This fixes an issue where a local value is defined before and used after an
inline asm call with side effects.

This fix simply flushes the local value map, which updates the insertion point
for the inline asm call to be above any previously defined local values.

This fixes <rdar://problem/17694203>

llvm-svn: 213203
2014-07-16 22:20:51 +00:00
Chris Bieneman
d1b660f0a6 [RegisterCoalescer] Add new subtarget hook allowing targets to opt-out of coalescing.
The coalescer is very aggressive at propagating constraints on the register classes, and the register allocator doesn’t know how to split sub-registers later to recover. This patch provides an escape valve for targets that encounter this problem to limit coalescing.

This patch also implements such for ARM to lower register pressure when using lots of large register classes. This works around PR18825.

llvm-svn: 213078
2014-07-15 17:18:41 +00:00
Bill Wendling
1a86df990e Unify the lowering of arguments during SjLj prepare.
The 'select true, %arg, undef' instruction can be used for both aggregate and
non-aggregate arguments.

llvm-svn: 212967
2014-07-14 18:21:11 +00:00
Bill Wendling
93c9860cb7 Support lowering of empty aggregates.
This crash was pretty common while compiling Rust for iOS (armv7). Reason -
SjLj preparation step was lowering aggregate arguments as ExtractValue +
InsertValue. ExtractValue has assertion which checks that there is some data in
value, which is not true in case of empty (no fields) structures. Rust uses
them quite extensively so this patch uses a 'select true, %val, undef'
instruction to lower the argument.

Patch by Valerii Hiora.

llvm-svn: 212922
2014-07-14 06:22:36 +00:00
Saleem Abdulrasool
b5f61cbe40 ARM: properly lower dllimport'ed global values
This completes the handling for DLL import storage symbols when lowering
instructions.  A DLL import storage symbol must have an additional load
performed prior to use.  This is applicable to variables and functions.

This is particularly important for non-function symbols as it is possible to
handle function references by emitting a thunk which performs the translation
from the unprefixed __imp_ symbol to the proper symbol (although, this is a
non-optimal lowering).  For a variable symbol, no such thunk can be
accommodated.

llvm-svn: 212431
2014-07-07 05:18:35 +00:00
Yi Kong
b99d414631 [ARM] Implement ISB memory barrier intrinsic
Adds support for __builtin_arm_isb. Also corrects DSB and ISB instructions
modelling by adding has-side-effects property.

llvm-svn: 212276
2014-07-03 16:00:41 +00:00
Adrian Prantl
2e31f00493 Debug info: split out complex DIVariable address expressions into a
separate MDNode so they can be uniqued via folding set magic. To conserve
space, DIVariable nodes are still variable-length, with the last two
fields being optional.

No functional change.
http://reviews.llvm.org/D3526

llvm-svn: 212050
2014-06-30 17:17:35 +00:00
David Blaikie
ad1b24c1a0 Fix up scoping in a few tests (and delete one that validates unnecessary behavior).
Most of this is just tests that were silently succeeding in spite of
schema changes I made over a year ago. Cleaning them up as they lead to
failures in a change I'm working on/will come soon.

test/DebugInfo/2010-01-19-DbgScope.ll was removed as it tested miscoping
where a DebugLoc described a location not in the current function. The
test case doesn't describe why this is a valid situation and should be
supported, so I'm removing it and shortly going to commit changes that
make this firmly unsupported/assert-fail.

llvm-svn: 211628
2014-06-24 20:10:27 +00:00
Christian Pirker
42c866343b ARMEB: Vector extend operations
Reviewed at http://reviews.llvm.org/D4043

llvm-svn: 211520
2014-06-23 18:05:53 +00:00
Rafael Espindola
f4a9d636ea Move test so that it is skipped if the ARM target is not enabled.
llvm-svn: 211366
2014-06-20 15:30:38 +00:00
Oliver Stannard
8087ec46e0 Emit the ARM build attributes ABI_PCS_wchar_t and ABI_enum_size.
Emit the ARM build attributes ABI_PCS_wchar_t and ABI_enum_size based on
module flags metadata.

llvm-svn: 211349
2014-06-20 10:08:11 +00:00
Tim Northover
7f8ca02cf4 ARM: implement correct atomic operations on v7M
ARM v7M has ldrex/strex but not ldrexd/strexd. This means 32-bit
operations should work as normal, but 64-bit ones are almost certainly
doomed.

Patch by Phoebe Buckheister.

llvm-svn: 211042
2014-06-16 18:49:36 +00:00
Christian Pirker
219e80de72 ARMEB: Fix trunc store for vector types
Reviewed at http://reviews.llvm.org/D4135

llvm-svn: 211010
2014-06-16 09:17:30 +00:00
Tim Northover
ad404d23ea Atomics: make use of the "cmpxchg weak" instruction.
This also simplifies the IR we create slightly: instead of working out
where success & failure should go manually, it turns out we can just
always jump to a success/failure block created for the purpose. Later
phases will sort out the mess without much difficulty.

llvm-svn: 210917
2014-06-13 16:45:52 +00:00
Tim Northover
b9ec29d7c5 IR: add "cmpxchg weak" variant to support permitted failure.
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.

As a result, cmpxchg instructions must return a flag indicating
success in addition to their original iN value loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The
second flag is 1 when the store succeeded.

At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.

By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.

Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.

Summary for out of tree users:
------------------------------

+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.

llvm-svn: 210903
2014-06-13 14:24:07 +00:00
Saleem Abdulrasool
e89f958d6e CodeGen: enable mov.w/mov.t pairs with minsize for WoA
Windows on ARM uses COFF/PE which is intrinsically position independent.  For
the case of 32-bit immediates, use a pair-wise relocation as otherwise we may
exceed the range of operators.  This fixes a code generation crash when using
-Oz when targeting Windows on ARM.

llvm-svn: 210814
2014-06-12 20:06:33 +00:00
Jiangning Liu
4291d4247a Global merge for global symbols.
This commit is to improve global merge pass and support global symbol merge.
The global symbol merge is not enabled by default. For aarch64, we need some
more back-end fix to make it really benifit ADRP CSE.

llvm-svn: 210640
2014-06-11 06:44:53 +00:00
Alp Toker
03b6e12fae Reduce verbiage of lit.local.cfg files
We can just split targets_to_build in one place and make it immutable.

llvm-svn: 210496
2014-06-09 22:42:55 +00:00
Saleem Abdulrasool
cf709958ac ARM: add VLA extension for WoA Itanium ABI
The armv7-windows-itanium environment is nearly identical to the MSVC ABI. It
has a few divergences, mostly revolving around the use of the Itanium ABI for
C++. VLA support is one of the extensions that are amongst the set of the
extensions.

This adds support for proper VLA emission for this environment. This is
somewhat similar to the handling for __chkstk emission on X86 and the large
stack frame emission for ARM. The invocation style for chkstk is still
controlled via the -mcmodel flag to clang.

Make an explicit note that this is an extension.

llvm-svn: 210489
2014-06-09 20:18:42 +00:00
Saleem Abdulrasool
a55e9f811e test: add test case for SVN r210406
Add missing test case for constructor section selection.  Thanks David Blaikie!

llvm-svn: 210409
2014-06-08 01:27:32 +00:00
Saleem Abdulrasool
6fbe07452a ARM: correct assertion for long-calls on WoA
COFF/PE, so the relocation model is never static.  Loosen the assertion
accordingly.  The relocation can still be emitted properly, as it will be
converted to an IMAGE_REL_ARM_ADDR32 which will be resolved by the loader
taking the base relocation into account.  This is necessary to permit the
emission of long calls which can be controlled via the -mlong-calls option in
the driver.

llvm-svn: 210399
2014-06-07 20:29:27 +00:00
Tom Roeder
2e8a372526 Adding explicit triples to the ARM jumptable tests
llvm-svn: 210288
2014-06-05 21:40:13 +00:00
Tom Roeder
740d86dc79 Add a new attribute called 'jumptable' that creates jump-instruction tables for functions marked with this attribute.
It includes a pass that rewrites all indirect calls to jumptable functions to pass through these tables.

This also adds backend support for generating the jump-instruction tables on ARM and X86.
Note that since the jumptable attribute creates a second function pointer for a
function, any function marked with jumptable must also be marked with unnamed_addr.

llvm-svn: 210280
2014-06-05 19:29:43 +00:00
Rafael Espindola
87cd774844 Allow alias to point to an arbitrary ConstantExpr.
This  patch changes GlobalAlias to point to an arbitrary ConstantExpr and it is
up to MC (or the system assembler) to decide if that expression is valid or not.

This reduces our ability to diagnose invalid uses and how early we can spot
them, but it also lets us do things like

@test5 = alias inttoptr(i32 sub (i32 ptrtoint (i32* @test2 to i32),
                                 i32 ptrtoint (i32* @bar to i32)) to i32*)

An important implication of this patch is that the notion of aliased global
doesn't exist any more. The alias has to encode the information needed to
access it in its metadata (linkage, visibility, type, etc).

Another consequence to notice is that getSection has to return a "const char *".
It could return a NullTerminatedStringRef if there was such a thing, but when
that was proposed the decision was to just uses "const char*" for that.

llvm-svn: 210062
2014-06-03 02:41:57 +00:00
Christian Pirker
6d4cce97f1 ARMEB: Fix function return type f64
Reviewed at http://reviews.llvm.org/D3968

llvm-svn: 209990
2014-06-01 09:30:52 +00:00
Tim Northover
89515a61ad SelectionDAG: skip barriers for unordered atomic operations
Unordered is strictly weaker than monotonic, so if the latter doesn't have any
barriers then the former certainly shouldn't.

rdar://problem/16548260

llvm-svn: 209901
2014-05-30 14:41:51 +00:00
Tim Northover
6ee9050b92 ARM: use AAPCS-style prologues for embedded MachO.
Darwin prologues save their GPRs in two stages: a narrow push of r0-r7 & lr,
followed by a wide push of the remaining registers if there are any. AAPCS uses
a single push.w instruction.

It turns out that, on average, enough registers get pushed that code is smaller
in the AAPCS prologue, which is a nice property for M-class programmers. They
also have other options available for back-traces, so can hopefully deal with
the fact that FP & LR aren't adjacent in memory.

rdar://problem/15909583

llvm-svn: 209895
2014-05-30 13:23:06 +00:00
Tim Northover
3bb84c9bcc ARM & AArch64: make use of common cmpxchg idioms after expansion
The C and C++ semantics for compare_exchange require it to return a bool
indicating success. This gets mapped to LLVM IR which follows each cmpxchg with
an icmp of the value loaded against the desired value.

When lowered to ldxr/stxr loops, this extra comparison is redundant: its
results are implicit in the control-flow of the function.

This commit makes two changes: it replaces that icmp with appropriate PHI
nodes, and then makes sure earlyCSE is called after expansion to actually make
use of the opportunities revealed.

I've also added -{arm,aarch64}-enable-atomic-tidy options, so that
existing fragile tests aren't perturbed too much by the change. Many
of them either rely on undef/unreachable too pervasively to be
restored to something well-defined (particularly while making sure
they test the same obscure assert from many years ago), or depend on a
particular CFG shape, which is disrupted by SimplifyCFG.

rdar://problem/16227836

llvm-svn: 209883
2014-05-30 10:09:59 +00:00
Tim Northover
6f54bcc408 AArch64 & ARM: remove undefined behaviour from some tests.
llvm-svn: 209880
2014-05-30 08:59:55 +00:00
Amara Emerson
77fc34e95e [ARM] Emit correct build attributes for the relocation models.
Patch by Asiri Rathnayake.

llvm-svn: 209656
2014-05-27 13:30:21 +00:00
Tim Northover
2172cefdfd ARM: teach AAPCS-VFP to deal with Cortex-M4.
Cortex-M4 only has single-precision floating point support, so any LLVM
"double" type will have been split into 2 i32s by now. Fortunately, the
consecutive-register framework turns out to be precisely what's needed to
reconstruct the double and follow AAPCS-VFP correctly!

rdar://problem/17012966

llvm-svn: 209650
2014-05-27 10:43:38 +00:00
Tim Northover
6877f3a322 Segmented stacks: omit __morestack call when there's no frame.
Patch by Florian Zeitz

llvm-svn: 209436
2014-05-22 13:03:43 +00:00
Saleem Abdulrasool
fa4b3f6e65 ARM: introduce llvm.arm.undefined intrinsic
This intrinsic permits the emission of platform specific undefined sequences.
ARM has reserved the 0xde opcode which takes a single integer parameter (ignored
by the CPU).  This permits the operating system to implement custom behaviour on
this trap.  The llvm.arm.undefined intrinsic is meant to provide a means for
generating the target specific behaviour from the frontend.  This is
particularly useful for Windows on ARM which has made use of a series of these
special opcodes.

llvm-svn: 209390
2014-05-22 04:46:46 +00:00
Saleem Abdulrasool
7f0235499b ARM: correct bundle generation for MOV32T relocations
Although the previous code would construct a bundle and add the correct elements
to it, it would not finalise the bundle.  This resulted in the InternalRead
markers not being added to the MachineOperands nor, more importantly, the
externally visible defs to the bundle itself.  So, although the bundle was not
exposing the def, the generated code would be correct because there was no
optimisations being performed.  When optimisations were enabled, the post
register allocator would kick in, and the hazard recognizer would reorder
operations around the load which would define the value being operated upon.

Rather than manually constructing the bundle, simply construct and finalise the
bundle via the finaliseBundle call after both MIs have been emitted.  This
improves the code generation with optimisations where IMAGE_REL_ARM_MOV32T
relocations are emitted.

The changes to the other tests are the result of the bundle generation
preventing the scheduler from hoisting the moves across the loads.  The net
effect of the generated code is equivalent, but, is much more identical to what
is actually being lowered.

llvm-svn: 209267
2014-05-21 01:25:24 +00:00
Benjamin Kramer
600e24a1cb SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not.
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.

llvm-svn: 209123
2014-05-19 13:12:38 +00:00