1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 13:33:37 +02:00
Commit Graph

2097 Commits

Author SHA1 Message Date
Bradley Smith
5d5a40a0f8 [ARM] Prevent PerformVCVTCombine from combining a vmul/vcvt with 8 lanes
This would result in a crash since the vcvt used does not support v8i32 types.

llvm-svn: 224332
2014-12-16 10:59:27 +00:00
Duncan P. N. Exon Smith
9c5542c040 IR: Make metadata typeless in assembly
Now that `Metadata` is typeless, reflect that in the assembly.  These
are the matching assembly changes for the metadata/value split in
r223802.

  - Only use the `metadata` type when referencing metadata from a call
    intrinsic -- i.e., only when it's used as a `Value`.

  - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
    when referencing it from call intrinsics.

So, assembly like this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata !{i32 %v}, metadata !0)
      call void @llvm.foo(metadata !{i32 7}, metadata !0)
      call void @llvm.foo(metadata !1, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{metadata !3}, metadata !0)
      ret void, !bar !2
    }
    !0 = metadata !{metadata !2}
    !1 = metadata !{i32* @global}
    !2 = metadata !{metadata !3}
    !3 = metadata !{}

turns into this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata i32 %v, metadata !0)
      call void @llvm.foo(metadata i32 7, metadata !0)
      call void @llvm.foo(metadata i32* @global, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{!3}, metadata !0)
      ret void, !bar !2
    }
    !0 = !{!2}
    !1 = !{i32* @global}
    !2 = !{!3}
    !3 = !{}

I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines).  I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.

This is part of PR21532.

llvm-svn: 224257
2014-12-15 19:07:53 +00:00
Ahmed Bougacha
88111b0889 Reapply "[ARM] Combine base-updating/post-incrementing vector load/stores."
r223862 tried to also combine base-updating load/stores.
r224198 reverted it, as "it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown."
Reapply, with a fix to ignore non-normal load/stores.
Truncstores are handled elsewhere (you can actually write a pattern for
those, whereas for postinc loads you can't, since they return two values),
but it should be possible to also combine extloads base updates, by checking
that the memory (rather than result) type is of the same size as the addend.

Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

Differential Revision: http://reviews.llvm.org/D6585

llvm-svn: 224203
2014-12-13 23:22:12 +00:00
Renato Golin
3418b50014 Revert "[ARM] Combine base-updating/post-incrementing vector load/stores."
This reverts commit r223862, as it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown. We'll investigate the issue and re-apply
when safe.

llvm-svn: 224198
2014-12-13 20:23:18 +00:00
Charlie Turner
8bc4c033ad Emit Tag_ABI_FP_16bit_format build attribute.
The __fp16 type is unconditionally exposed. Since -mfp16-format is not yet
supported, there is not a user switch to change this behaviour. This build
attribute should capture the default behaviour of the compiler, which is to
expose the IEEE 754 version of __fp16.

When -mfp16-format is emitted, that will be the way to control the value of
this build attribute.

Change-Id: I8a46641ff0fd2ef8ad0af5f482a6d1af2ac3f6b0
llvm-svn: 224115
2014-12-12 11:59:18 +00:00
Tim Northover
2e78e5c83f ARM: correctly expand LDR-lit based globals.
Quite a major error here: the expansions for the Pseudos with and without
folded load were mixed up. Fortunately it only affects ARM-mode, when not using
movw/movt, on Darwin. I'm guessing no-one actually uses that combination.

llvm-svn: 223986
2014-12-10 23:40:50 +00:00
Ahmed Bougacha
9f7458d44b [ARM] Combine base-updating/post-incrementing vector load/stores.
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

Differential Revision: http://reviews.llvm.org/D6585

llvm-svn: 223862
2014-12-10 00:07:37 +00:00
Ahmed Bougacha
c032d370a9 [ARM] Make testcase more explicit. NFC.
llvm-svn: 223841
2014-12-09 22:08:57 +00:00
Ahmed Bougacha
80726eea3d [ARM] Also support v2f64 vld1/vst1.
It was missing from the VLD1/VST1 handling logic, even though the
corresponding instructions exist (same form as v2i64).

In preparation for a future patch.

llvm-svn: 223832
2014-12-09 21:25:00 +00:00
Charlie Turner
1fb529a753 Add missing FP build attribute tests.
The test file test/CodeGen/ARM/build-attributes.ll was missing several
floating-point build attribute tests. The intention of this commit is that for
each CPU / architecture currently tested, there are now tests that make sure
the following attributes are sufficiently checked,

  * Tag_ABI_FP_rounding
  * Tag_ABI_FP_denormal
  * Tag_ABI_FP_exceptions
  * Tag_ABI_FP_user_exceptions
  * Tag_ABI_FP_number_model

Also in this commit, the -unsafe-fp-math flag has been augmented with the full
suite of flags Clang sends to LLVM when you pass -ffast-math to Clang. That is,
`-unsafe-fp-math' has been changed to `-enable-unsafe-fp-math -disable-fp-elim
-enable-no-infs-fp-math -enable-no-nans-fp-math -fp-contract=fast'

Change-Id: I35d766076bcbbf09021021c0a534bf8bf9a32dfc
llvm-svn: 223454
2014-12-05 08:22:47 +00:00
Jonathan Roelofs
348155e124 Fix thumbv4t indirect calls
So there are a couple of issues with indirect calls on thumbv4t. First, the most
'obvious' instruction, 'blx' isn't available until v5t. And secondly, the
next-most-obvious sequence: 'mov lr, pc; bx rN' doesn't DTRT in thumb code
because the saved off pc has its thumb bit cleared, so when the callee returns
we end up in ARM mode.... yuck.

The solution is to 'bl' to a nearby landing pad with a 'bx rN' in it.

We could cut down on code size by sharing the landing pads between call sites
that are close enough, but for the moment let's do correctness first and look at
performance later.


Patch by: Iain Sandoe

http://reviews.llvm.org/D6519

llvm-svn: 223380
2014-12-04 19:34:50 +00:00
Charlie Turner
ab73ef8264 Emit ABI_FP_rounding attribute.
LLVM understands a -enable-sign-dependent-rounding-fp-math codegen option. When
the user has specified this option, the Tag_ABI_FP_rounding attribute should be
emitted with value 1. This option currently does not appear to disable
transformations and optimizations that assume default floating point rounding
behavior, AFAICT, but the intention should be recorded in the build attributes,
regardless of what the compiler actually does with the intention.

Change-Id: If838578df3dc652b6f2796b8d152545674bcb30e
llvm-svn: 223218
2014-12-03 08:12:26 +00:00
Charlie Turner
7f4e539ba5 Add tests for default value of Tag_ABI_FP_rounding.
Change-Id: I051866d073fc6ce87ce3e693a3762da6d81f4393
llvm-svn: 223217
2014-12-03 07:59:50 +00:00
Charlie Turner
e286c5ea3a Emit Tag_ABI_FP_denormal correctly in fast-math mode.
The default ARM floating-point mode does not support IEEE 754 mode exactly. Of
relevance to this patch is that input denormals are flushed to zero. The way in
which they're flushed to zero depends on the architecture,

  * For VFPv2, it is implementation defined as to whether the sign of zero is
    preserved.
  * For VFPv3 and above, the sign of zero is always preserved when a denormal
    is flushed to zero.

When FP support has been disabled, the strategy taken by this patch is to
assume the software support will mirror the behaviour of the hardware support
for the target *if it existed*. That is, for architectures which can only have
VFPv2, it is assumed the software will flush to positive zero. For later
architectures it is assumed the software will flush to zero preserving sign.

Change-Id: Icc5928633ba222a4ba3ca8c0df44a440445865fd
llvm-svn: 223110
2014-12-02 08:22:29 +00:00
Reid Kleckner
1591491217 Parse 'ghccc' in .ll files as the GHC convention (cc 10)
Previously we just used "cc 10" in the .ll files, but that isn't very
human readable.

llvm-svn: 223076
2014-12-01 21:04:44 +00:00
Tim Northover
6096fde320 ARM: lower tail calls correctly when using GHC calling convention.
Patch by Ben Gamari.

llvm-svn: 223055
2014-12-01 17:46:39 +00:00
Charlie Turner
e4a6c7fd89 Stop uppercasing build attribute data.
The string data for string-valued build attributes were being unconditionally
uppercased. There is no mention in the ARM ABI addenda about case conventions,
so it's technically implementation defined as to whether the data are
capitialised in some way or not. However, there are good reasons not to
captialise the data.

  * It's less work.
  * Some vendors may legitimately have case-sensitive checks for these
    attributes which would fail on LLVM generated object files.
  * There could be locale issues with uppercasing.

The original reasons for uppercasing appear to have stemmed from an
old codesourcery toolchain behaviour, see

http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/87133

This patch makes the object file emitted no longer captialise string
data, it encodes as seen in the assembly source.

Change-Id: Ibe20dd6e60d2773d57ff72a78470839033aa5538
llvm-svn: 222882
2014-11-27 12:13:56 +00:00
Renato Golin
b92ad16856 Fix ARM triple parsing
The triple parser should only accept existing architecture names
when the triple starts with armv, armebv, thumbv or thumbebv.

Patch by Gabor Ballabas.

llvm-svn: 222129
2014-11-17 14:08:57 +00:00
Oliver Stannard
93a823bc7a [Thumb1] Re-write emitThumbRegPlusImmediate
This was motivated by a bug which caused code like this to be
miscompiled:
  declare void @take_ptr(i8*)
  define void @test() {
    %addr1.32 = alloca i8
    %addr2.32 = alloca i32, i32 1028
    call void @take_ptr(i8* %addr1)
    ret void
  }

This was emitting the following assembly to get the value of %addr1:
  add r0, sp, #1020
  add r0, r0, #8
However, "add r0, r0, #8" is not a valid Thumb1 instruction, and this
could not be assembled. The generated object file contained this,
resulting in r0 holding SP+8 rather tha SP+1028:
  add r0, sp, #1020
  add r0, sp, #8

This function looked like it could have caused miscompilations for
other combinations of registers and offsets (though I don't think it is
currently called with these), and the heuristic it used did not match
the emitted code in all cases.

llvm-svn: 222125
2014-11-17 11:18:10 +00:00
Oliver Stannard
2efad103c3 Fix optimisations of SELECT_CC which assumed result is boolean
Some optimisations in DAGCombiner cause miscompilations for targets that use
TargetLowering::UndefinedBooleanContent, because they assume that the results
of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and
XORed. These optimisations are only valid for targets that use
ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.

This is a follow-up to D6210/r221693.

llvm-svn: 222123
2014-11-17 10:49:31 +00:00
Tim Northover
f511e3a633 ARM: refactor .cfi_def_cfa_offset emission.
We use to track quite a few "adjusted" offsets through the FrameLowering code
to account for changes in the prologue instructions as we went and allow the
emission of correct CFA annotations. However, we were missing a couple of cases
and the code was almost impenetrable.

It's easier to just add any stack-adjusting instruction to a list and emit them
together.

llvm-svn: 222057
2014-11-14 22:45:33 +00:00
Tim Northover
5eff3fb39c ARM: correctly calculate the offset of FP in its push.
When we folded the DPR alignment gap into a push, we weren't noting the extra
distance from the beginning of the push to the FP, and so FP ended up pointing
at an incorrect offset.

The .cfi_def_cfa_offset directives are still wrong in this case, but I think
that can be improved by refactoring.

llvm-svn: 222056
2014-11-14 22:45:31 +00:00
Tim Northover
3327a92a74 ARM: simplify test.
The test's DWARF stubs were there just to trigger the emission of .cfi
directives. Fortunately, the NetBSD ABI already demands proper DWARF unwind
info, so it's easier to just use that triple.

llvm-svn: 222055
2014-11-14 22:45:23 +00:00
Tim Northover
a52b6a893f ARM: avoid duplicating branches during constant islands.
We were using a naive heuristic to determine whether a basic block already had
an unconditional branch at the end. This mostly corresponded to reality
(assuming branches got optimised) because there's not much point in a branch to
the next block, but could go wrong.

llvm-svn: 221904
2014-11-13 17:58:51 +00:00
Tim Northover
5807daa475 ARM: add @llvm.arm.space intrinsic for testing ConstantIslands.
Creating tests for the ConstantIslands pass is very difficult, since it depends
on precise layout details. Having the ability to precisely inject a number of
bytes into the stream helps greatly.

llvm-svn: 221903
2014-11-13 17:58:48 +00:00
Tom Roeder
f8bc1a9968 Add Forward Control-Flow Integrity.
This commit adds a new pass that can inject checks before indirect calls to
make sure that these calls target known locations. It supports three types of
checks and, at compile time, it can take the name of a custom function to call
when an indirect call check fails. The default failure function ignores the
error and continues.

This pass incidentally moves the function JumpInstrTables::transformType from
private to public and makes it static (with a new argument that specifies the
table type to use); this is so that the CFI code can transform function types
at call sites to determine which jump-instruction table to use for the check at
that site.

Also, this removes support for jumptables in ARM, pending further performance
analysis and discussion.

Review: http://reviews.llvm.org/D4167
llvm-svn: 221708
2014-11-11 21:08:02 +00:00
Oliver Stannard
553b791f8f LLVM incorrectly folds xor into select
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
(set_cc !cc x y), which is only correct when the xor has type i1.
Instead, we should check that the constant operand to the xor is all
ones.

llvm-svn: 221693
2014-11-11 17:36:01 +00:00
Lang Hames
fd4cd4028c [RegAlloc] Remove reference to the trivial spiller in test case.
This test case was never actually testing the trivial spiller: the -spiller
option has not been hooked up for a while now.

llvm-svn: 221475
2014-11-06 19:24:18 +00:00
Rafael Espindola
233f1b11dd Use FileCheck in a few tests.
llvm-svn: 221459
2014-11-06 15:05:51 +00:00
Tim Northover
b0edeb50ba ARM: try to add extra CS-register whenever stack alignment >= 8.
We currently try to push an even number of registers to preserve 8-byte
alignment during a function's prologue, but only when the stack alignment is
prcisely 8. Many of the reasons for doing it apply also when that alignment > 8
(the extra store is often free, and can save another stack adjustment, though
less frequently for 16-byte stack alignment).

llvm-svn: 221321
2014-11-05 00:27:20 +00:00
Tim Northover
ea6751aca3 ARM/Dwarf: correctly align stack before callee-saved VPRs
We were making an attempt to do this by adding an extra callee-saved GPR (so
that there was an even number in the list), but when that failed we went ahead
and pushed anyway.

This had a couple of potential issues:
  + The .cfi directives we emit misplaced dN because they were based on
    PrologEpilogInserter's calculation.
  + Unaligned stores can be less efficient.
  + Unaligned stores can actually fault (likely only an issue in niche cases,
    but possible).

This adds a final explicit stack adjustment if all other options fail, so that
the actual locations of the registers match up with where they should be.

llvm-svn: 221320
2014-11-05 00:27:13 +00:00
Charlie Turner
05675977c5 Remove the cortex-a9-mp CPU.
This CPU definition is redundant. The Cortex-A9 is defined as
supporting multiprocessing extensions. Remove its definition and
update appropriate tests.

LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only
difference between the two CPU definitions in ARM.td is that
cortex-a9-mp contains the feature FeatureMP for multiprocessing
extensions.

This is redundant since the Cortex-A9 is defined as having
multiprocessing extensions in the TRMs. armcc also defines the
Cortex-A9 as having multiprocessing extensions by default.

Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7
llvm-svn: 221166
2014-11-03 17:38:00 +00:00
Quentin Colombet
06167df4ad [CodeGenPrepare] Move extractelement close to store if they can be combined.
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote any scalar operations to vector operations in the
way to make that possible.


** Context **

Some targets use different register files for both vector and scalar operations.
This means that transitioning from one domain to another may incur copy from one
register file to another. These copies are not coalescable and may be expensive.
For example, according to the scheduling model, on cortex-A8 a vector to GPR
move is 20 cycles.


** Motivating Example **

Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
 %in1 = load <2 x i32>* %addr1, align 8
 %extract = extractelement <2 x i32> %in1, i32 1
 %out = or i32 %extract, 1
 store i32 %out, i32* %dest, align 4
 ret void
}

As it is, this IR generates the following assembly on armv7:
  vldr  d16, [r0]            @vector load  
  vmov.32 r0, d16[1]  @ cross-register-file copy: 20 cycles
  orr r0, r0, #1           @ scalar bitwise or
  str r0, [r1]               @ scalar store
  bx  lr

Whereas we could generate much faster code:
  vldr  d16, [r0]               @ vector load
  vorr.i32  d16, #0x1     @ vector bitwise or
  vst1.32 {d16[1]}, [r1:32] @ vector extract + store
  bx  lr

Half of the computation made in the vector is useless, but this allows to get
rid of the expensive cross-register-file copy.


** Proposed Solution **

To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends by a store and the
target is able to combine an extract with a store.

Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so, extracted as some point[1]. Moreover,
this is customary that targets feature stores that perform a vector extract (see
AArch64 and X86 for instance).

The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of details and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically in TargetTransformInfo, everything that is legal has a cost of 1,
whereas, even if a vector type is legal, usually a vector operation is slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counter balanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.

For now, the optimization is just enabled for ARM prior than v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.

[1] We can imagine targets that can combine an extractelement with  other
instructions than just stores. If we want to go into that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.

Differential Revision: http://reviews.llvm.org/D5921

<rdar://problem/14170854>

llvm-svn: 220978
2014-10-31 17:52:53 +00:00
Tim Northover
d587e3a983 ARM: test default values for TAG_CPU_unaligned_access attribute.
It should be on for every target that supports unaligned accesses (e.g. not
v6m).

Patch by Charlie Turner.

llvm-svn: 220912
2014-10-30 17:05:44 +00:00
Oliver Stannard
9a595b2769 [ARM] Select VMAXNM and VMINNM regardless of operand order
Currently, the ARM backend will select the VMAXNM and VMINNM for these C
expressions:
  (a < b) ? a : b
  (a > b) ? a : b
but not these expressions:
  (a > b) ? b : a
  (a < b) ? b : a

This patch allows all of these expressions to be matched.

llvm-svn: 220671
2014-10-27 09:23:02 +00:00
Renato Golin
cd30ecdc65 Do not emit intermediate register for zero FP immediate
This updates check for double precision zero floating point constant to allow
use of instruction with immediate value rather than temporary register.
Currently "a == 0.0", where "a" is of "double" type generates:

vmov.i32        d16, #0x0
vcmpe.f64       d0, d16

With this change it becomes:

vcmpe.f64        d0, #0

Patch by Sergey Dmitrouk.

llvm-svn: 220486
2014-10-23 15:31:50 +00:00
Akira Hatanaka
f1f0b1a1fd [ARM, stack protector] If supported, use armv7 instructions.
This commit enables using movt/movw to load the stack guard address:

movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8))
movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8))
ldr r0, [pc, r0]

Previously a pc-relative load was emitted:

ldr r0, LCPI0_0
ldr r0, [pc, r0]

rdar://problem/18740489

llvm-svn: 220470
2014-10-23 04:17:05 +00:00
Tim Northover
7a41a526ce ARM: rework Thumb1 frame index rewriting
The previous code had a few problems, motivating the choices here.

1. It could create instructions clobbering CPSR, but the incoming MachineInstr
   didn't reflect this. A potential source of corruption. This is why the patch
   has a new PseudoInst for before lowering.
2. Similarly, there was some code to handle the incoming instruction not being
   ARMCC::AL, but this would have caused massive problems if it was actually
   invoked when a complex offset needing more than one instruction was requested.
3. It wasn't designed to handle unaligned pointers (or offsets). These should
   probably be minimised anyway, but the code needs to deal with them properly
   regardless.
4. It had some rather dubious ad-hoc code to avoid calling
   emitThumbRegPlusImmediate, a function which should be designed to do precisely
   this job.

We seem to cover the common cases correctly now, and hopefully can enhance
emitThumbRegPlusImmediate to handle any extra optimisations we need to add in
future.

llvm-svn: 220236
2014-10-20 21:28:41 +00:00
Oliver Stannard
23d7518ec9 [ARM] Do not select SMULW[BT] or SMLAW[BT]
The current instruction selection patterns for SMULW[BT] and SMLAW[BT]
are incorrect. These instructions multiply a 32-bit and a 16-bit value
(both signed) and return the top 32 bits of the 48-bit result. This
preserves the 16 bits of overflow, whereas the patterns they currently
match truncate the result to 16 bits then sign extend.

To select these instructions, we would need to match an ISD::SMUL_LOHI,
a sign extend, two shifts and an or. There is no way to match SMUL_LOHI
in an instruction pattern as it defines multiple values, so this would
have to be done in C++. I have raised
http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct
selection of these instructions.

This fixes http://llvm.org/bugs/show_bug.cgi?id=19396

llvm-svn: 220196
2014-10-20 11:30:35 +00:00
Tim Northover
a7b041da1f ARM: remove ARM/Thumb distinction for preferred alignment.
Thumb1 has legitimate reasons for preferring 32-bit alignment of types
i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be
a multiple of 4. However, this is a trade-off betweem code size and RAM usage;
the DataLayout string is not the best place to represent it even if desired.

So this patch removes the extra Thumb requirements, hopefully making ARM and
Thumb completely compatible in this respect.

llvm-svn: 219734
2014-10-14 22:12:17 +00:00
Tim Northover
c5e92d5ca6 ARM: allow misaligned local variables in Thumb1 mode.
There's no hard requirement on LLVM to align local variable to 32-bits, so the
Thumb1 frame handling needs to be able to deal with variables that are only
naturally aligned without falling over.

llvm-svn: 219733
2014-10-14 22:12:14 +00:00
Tim Northover
519637b1b7 ARM: set preferred aggregate alignment to 32 universally.
Before, ARM and Thumb mode code had different preferred alignments, which could
lead to some rather unexpected results. There's justification for reducing it
from the default 64-bits (wasted space), but I don't think there is for going
below 32-bits.

There's no actual ABI change here, just to reassure people.

llvm-svn: 219719
2014-10-14 20:57:26 +00:00
Renato Golin
c17a0bcd0e Adds support for the Cortex-A17 to the ARM backend
Patch by Matthew Wahab.

llvm-svn: 219606
2014-10-13 10:22:19 +00:00
Renato Golin
23c290356f Emit unaligned access build attribute for ARM
Patch by Charlie Turner.

llvm-svn: 219301
2014-10-08 12:26:22 +00:00
Duncan P. N. Exon Smith
c1be4794ba Revert "Revert "DI: Fold constant arguments into a single MDString""
This reverts commit r218918, effectively reapplying r218914 after fixing
an Ocaml bindings test and an Asan crash.  The root cause of the latter
was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a
PR to investigate who requires the loose check (and why).

Original commit message follows.

--

This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 219010
2014-10-03 20:01:09 +00:00
Renato Golin
aea9ec6761 Revert 202433 - Provide a target override for the latest regalloc heuristic
That commit was introduced in order to help investigate a problem in ARM
codegen breaking from commit 202304 (Add a limit to the heuristic that register
allocates instructions in local order). Recent analisys indicated that the
problem no longer exists, so I'm reverting this change.

See PR18996.

llvm-svn: 218981
2014-10-03 12:20:53 +00:00
Duncan P. N. Exon Smith
fb6bcc4eb2 Revert "DI: Fold constant arguments into a single MDString"
This reverts commit r218914 while I investigate some bots.

llvm-svn: 218918
2014-10-02 22:15:31 +00:00
Duncan P. N. Exon Smith
58b6077a79 DI: Fold constant arguments into a single MDString
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 218914
2014-10-02 21:56:57 +00:00
Tim Northover
64a066f76a ARM: allow copying of CPSR when all else fails.
As with x86 and AArch64, certain situations can arise where we need to spill
CPSR in the middle of a calculation. These should be avoided where possible
(MRS/MSR is rather expensive), which ARM is actually better at than the other
two since it tries to Glue defs to uses, but as a last ditch effort, copying is
better than crashing.

rdar://problem/18011155

llvm-svn: 218789
2014-10-01 19:21:03 +00:00
Adrian Prantl
2b1df58ebe Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
2014-10-01 18:55:02 +00:00