1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 20:12:56 +02:00
Commit Graph

7116 Commits

Author SHA1 Message Date
Saleem Abdulrasool
a36ea7e3cb ARM IAS: fix after r198172
The DPR and SPR register lists are also register lists.  Furthermore, the
registers need not be checked individually since the register type can be
checked via the list kind.  Use that to simplify the logic and fix the incorrect
assertion.

llvm-svn: 198174
2013-12-29 18:53:16 +00:00
Saleem Abdulrasool
6f98f56dff ARM: provide VFP aliases for pre-V6 mnemonics
In order to provide compatibility with the GNU assembler, provide aliases for
pre-UAL mnemonics for floating point operations.

llvm-svn: 198172
2013-12-29 17:58:35 +00:00
Saleem Abdulrasool
8e75b67e2b ARM: fix a few typos in comments
llvm-svn: 198171
2013-12-29 17:58:31 +00:00
Saleem Abdulrasool
26268ead5b ARM: fix typo in VFP instruction definition
The vstm family of VFP instructions belong to the VFP store itinerary class, not
the VFP load itinerary class.

llvm-svn: 198170
2013-12-29 17:58:27 +00:00
Bill Wendling
29d960cf77 Store the global variable that's created so that it's reclaimed afterwards.
This plugs a memory leak in ARM's FastISel by storing the GV in Module so that
it's reclaimed.
PR17978

llvm-svn: 198160
2013-12-29 08:00:04 +00:00
Saleem Abdulrasool
aa836ead3f ARM IAS: handle errors more appropriately
Directive parsers must return false if the target assembler is interested in
handling the directive.  The Error member function returns true always.  Using
the 'return Error()' pattern would incorrectly indicate to the general parser
that the target was not interested in the directive, when in reality it simply
encountered a badly formed directive or some other error.  This corrects the
behaviour to ensure that the parser behaves appropriately.

llvm-svn: 198132
2013-12-28 22:47:53 +00:00
Andrew Trick
ed2d925c84 New machine model for cortex-a9. Schedule for resources and latency.
Schedule more conservatively to account for stalls on floating point
resources and latency. Use the AGU resource to model latency stalls
since it's shared between FP and LD/ST instructions. This might not be
completely accurate but should work well in practice.

llvm-svn: 198125
2013-12-28 21:57:05 +00:00
Andrew Trick
f37b4dad96 The Cortex-A9 machine model is incomplete. Mark it as such.
Many vector operations never had itineraries. Since the new machine
model was a mapping from existing itinerary classes, we don't have a
model for these. We still want to migrate A9 even though no one has
invested in a complete model, so mark it incomplete to avoid the
scheduler asserting.

llvm-svn: 198123
2013-12-28 21:57:00 +00:00
Saleem Abdulrasool
4b51b53336 ARMAsmParser: fix typo in comment
llvm-svn: 198095
2013-12-28 03:07:12 +00:00
Joerg Sonnenberger
e549217adc Recognize armv7a and friends as aliases for armv7-a etc. for the purpose
of architecture naming.

llvm-svn: 198043
2013-12-26 11:50:28 +00:00
Saleem Abdulrasool
d63e2f47bf ARM IAS: support .even directive
The .even directive aligns content to an evan-numbered address.  This is an ARM
specific directive applicable to any section.

llvm-svn: 198031
2013-12-26 01:52:28 +00:00
Adrian Prantl
e5c282662c Debug info: On ARM ensure that the data sections come before the
(optional) DWARF sections, so compiling with -g does not result in
different code being generated.

rdar://problem/15623193

llvm-svn: 197922
2013-12-23 22:24:47 +00:00
Saleem Abdulrasool
b26620d3fc ARM: bkpt has an implicit immediate constant 0
The bkpt mnemonic has an implicit immediate constant of 0 unless otherwise
specified.  Add an instruction alias for the unvalued breakpoint mnemonic to
treat it as a 0.  This improves compatibility with GNU AS.

Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org>
llvm-svn: 197913
2013-12-23 17:23:58 +00:00
Lang Hames
9a60e0ec8e ARM AnalyzeBranch should ignore DEBUG_VALUES while analyzing terminators.
Found by inspection by Julien Lerouge. Thanks Julian!

llvm-svn: 197833
2013-12-20 20:27:51 +00:00
Saleem Abdulrasool
e75f7af412 ARM IAS: add support for the .pool directive
The .pool directive is an alias for the .ltorg directive used to create a
literal pool.  Simply treat .pool as if .ltorg was passed.

llvm-svn: 197787
2013-12-20 07:21:16 +00:00
David Peixotto
ae78d16e68 Ensure deterministic when printing ARM assembler constant pools
We dump any non-empty assembler constant pools after a successful
parse of an assembly file that uses the ldr pseudo opcode. These
per-section constant pools should be output in a deterministic order
to ensure that we always generate the same output when printing the
output with an AsmStreamer.

This patch changes the map data struture used to associate a section
with its constant pool to a MapVector to ensure deterministic
output. Because this map type does not support deletion, we now
check that the constant pool is not empty before dumping its entries
and clear the entries after emitting them with the streamer.

llvm-svn: 197735
2013-12-19 22:41:56 +00:00
David Peixotto
16536db0ae Implement the .ltorg directive for ARM assembly
This directive will write out the assembler-maintained constant
pool for the current section. These constant pools are created to
support the ldr-pseudo instruction (e.g. ldr r0, =val).

The directive can be used by the programmer to place the constant
pool in a location that can be reached by a pc-relative offset in
the ldr instruction.

llvm-svn: 197711
2013-12-19 18:26:07 +00:00
David Peixotto
a66e68bb52 Implement the ldr-pseudo opcode for ARM assembly
The ldr-pseudo opcode is a convenience for loading 32-bit constants.
It is converted into a pc-relative load from a constant pool. For
example,

  ldr r0, =0x10001
  ldr r1, =bar

will generate this output in the final assembly

  ldr r0, .Ltmp0
  ldr r1, .Ltmp1
  ...
  .Ltmp0: .long 0x10001
  .Ltmp1: .long bar

Sketch of the LDR pseudo implementation:
  Keep a map from Section => ConstantPool

  When parsing ldr r0, =val
    parse val as an MCExpr
    get ConstantPool for current Section
    Label = CreateTempSymbol()
    remember val in ConstantPool at next free slot
    add operand to ldr that is MCSymbolRef of Label

  On finishParse() callback
    Write out all non-empty constant pools
    for each Entry in ConstantPool
      Emit Entry.Label
      Emit Entry.Value

Possible improvements to be added in a later patch:
  1. Does not convert load of small constants to mov
     (e.g. ldr r0, =0x1 => mov r0, 0x1)
  2. Does reuse constant pool entries for same constant

The implementation was tested for ARM, Thumb1, and Thumb2 targets on
linux and darwin.

llvm-svn: 197708
2013-12-19 18:12:36 +00:00
Saleem Abdulrasool
a48f1a5b54 ARM IAS: support .inst directive
This adds support for the .inst directive.  This is an ARM specific directive to
indicate an instruction encoded as a constant expression.  The major difference
between .word, .short, or .byte and .inst is that the latter will be
disassembled as an instruction since it does not get flagged as data.

llvm-svn: 197657
2013-12-19 05:17:58 +00:00
Rafael Espindola
859cb122ba Synchronize the NaCl DataLayout strings with the ones in clang.
Patch by Derek Schuff.

llvm-svn: 197640
2013-12-19 00:44:37 +00:00
Weiming Zhao
628bf03d65 [aarch32] fix bug 18268: Incorrect condition of vsel
Given vsel_cc, op1, op2, since vsel has no LE/LT, to generate vsel for
such selection, it needs to inverse cc and swap op1 and op2. To inverse
cc, both L/G and E bits should be flipped.

llvm-svn: 197615
2013-12-18 22:25:17 +00:00
Rafael Espindola
6dc5fe883a Correctly handle the degenerated triple "thumb".
Fixes a crash in llc where some parts think the target is thumb and others think
it is ARM.

llvm-svn: 197607
2013-12-18 21:29:44 +00:00
Logan Chien
89e5c10f8c [arm] Rename Tag_VFP_arch to Tag_FP_arch.
According to "Addenda to ABI for ARM architecture", Tag_FP_arch is the
new name for the equivalent Tag_VFP_arch.  This commit renames
Tag_VFP_arch to Tag_FP_arch.

llvm-svn: 197587
2013-12-18 17:23:15 +00:00
Tim Northover
c9f20609f9 ARM: update comment to match reality
llvm-svn: 197570
2013-12-18 14:18:36 +00:00
Tim Northover
fe4e45a5b0 ARM: set default float ABI based on triple.
Clang sets the float-abi target option manually, but no longer
annotates each function with its ABI. This can lead to confusing
mistmatch between "clang -emit-llvm | llc" and normal clang
invocations.

Besides which, gnueabihf actually *is* hard-float. Defaulting to soft
was just perverse.

llvm-svn: 197554
2013-12-18 09:27:33 +00:00
Rafael Espindola
2b8c8dce53 On APCS, only try to align aggregates to 32 bits instead of 64.
This matches clang's behavior and since it is only a preference, it is not
an ABI issue.

llvm-svn: 197526
2013-12-17 21:36:54 +00:00
Rafael Espindola
0e6563fe28 Handle i64 first for clarity. No functionality change.
llvm-svn: 197524
2013-12-17 21:28:36 +00:00
Rafael Espindola
525165cec2 One last cleanup of LLVM's DataLayout strings.
Produce them in the same order on every target. The order is that of
getStringRepresentation: e|E-i*-f*-v*-a*-s*-n*-S*.

llvm-svn: 197411
2013-12-16 19:31:14 +00:00
Joerg Sonnenberger
dea3c5aab3 Recognize EABIHF as environment and use it for RTAPI + VFP.
llvm-svn: 197405
2013-12-16 18:51:28 +00:00
Rafael Espindola
559bceac20 The preferred alignment defaults to the abi alignment. Omit if it is the same.
llvm-svn: 197400
2013-12-16 18:01:51 +00:00
Evgeniy Stepanov
5a1332c25e Fix Android regression in r197332.
llvm-svn: 197366
2013-12-16 07:02:51 +00:00
Joerg Sonnenberger
64c33d9c40 Replace string matching with a switch on Triple::getEnvironment.
llvm-svn: 197332
2013-12-15 00:12:52 +00:00
Kevin Enderby
e6ab982310 Fixed a bug in getARMFixupKindMachOInfo() where three ARM fixup kinds
were falling into the cases for 24-bit branch kinds which are not 24-bit
branches.  The routine is to return false for fixups are expected to always
be resolvable at assembly time. Which these three fixups are as they have
limited displacement and are for local references within a function.

rdar://15586725

llvm-svn: 197282
2013-12-13 22:46:54 +00:00
Joerg Sonnenberger
2826f5b278 Enabling thumb2 mode used to force support for armv6t2. Replace this
with a temporary assertion and adjust the various test cases.

llvm-svn: 197224
2013-12-13 11:16:00 +00:00
Rafael Espindola
b5a53246ce Simplify the datalayout string of ARM and AArch64.
No functionality change.

Reviewed by Tim Northover.

llvm-svn: 197172
2013-12-12 17:43:37 +00:00
Logan Chien
e6b3d37d36 [arm] Implement ARM .arch directive.
llvm-svn: 197052
2013-12-11 17:16:25 +00:00
Tim Northover
d66403a4e1 ARM: constrain register-class in fast-isel
The tests were no longer using fast-isel at all (MachO needs an "ios" rather
than "darwin" triple at the moment and Linux needs ARM mode). Once that was
corrected, the verifier complained about a t2ADDri created for the alloca.

llvm-svn: 197046
2013-12-11 16:04:57 +00:00
NAKAMURA Takumi
54fa39136d Prune redundant dependencies in LLVMBuild.txt.
llvm-svn: 196988
2013-12-11 00:30:57 +00:00
Tim Northover
54d769f4eb Make Triple's isOSBinFormatXXX functions partition triple-space.
Most users would be surprised if "isCOFF" and "isMachO" were simultaneously
true, unless they'd put the compiler in a box with a gun attached to a photon
detector.

This makes sure precisely one of the three formats is true for any triple and
simplifies some target logic based on that.

llvm-svn: 196934
2013-12-10 16:57:43 +00:00
NAKAMURA Takumi
3fda77b6c7 Add proper dependencies to LLVMBuild.txt in llvm/lib.
I'll prune redundant deps in LLVMBuild.txt, later.

llvm-svn: 196881
2013-12-10 05:39:34 +00:00
Rafael Espindola
3a54357741 Add comments documenting the ARM datalayout string.
llvm-svn: 196850
2013-12-10 00:37:37 +00:00
Rafael Espindola
b9446f75e7 Simplify further.
Thanks to Jim Grosbach for noticing it.

llvm-svn: 196846
2013-12-10 00:15:35 +00:00
Rafael Espindola
eb7e18aa45 Refactor the construction of the DataLayout string on ARM.
llvm-svn: 196843
2013-12-09 23:56:41 +00:00
Tim Northover
2ef97554d8 ARM: fix folding of stack-adjustment (yet again).
When trying to eliminate an "sub sp, sp, #N" instruction by folding
it into an existing push/pop using dummy registers, we need to account
for the fact that this might affect precisely how "fp" gets set in the
prologue.

We were attempting this, but assuming that *whenever* we performed a
fold it would make a difference. This is false, for example, in:
    push {r4, r7, lr}
    add fp, sp, #4
    vpush {d8}
    sub sp, sp, #8

we can fold the "sub" into the "vpush", forming "vpush {d7, d8}".
However, in that case the "add fp" instruction mustn't change, which
we were getting wrong before.

Should fix PR18160.

llvm-svn: 196725
2013-12-08 15:56:50 +00:00
Ana Pazos
c66716b626 Added support for mcpu krait
- krait processor currently modeled with the same features as A9.
- Krait processor additionally has VFP4 (fused multiply add/sub)
and hardware division features enabled.
- krait has currently the same Schedule model as A9
- krait cpu flag is not recognized by the GNU assembler yet,
it is replaced with march=armv7-a to avoid a lower march
from being used.

llvm-svn: 196619
2013-12-06 22:48:17 +00:00
Weiming Zhao
f90fe366af Bug 18149: [AArch32] VSel instructions has no ARMCC field
The current peephole optimizing for compare inst assumes an instr that
uses CPSR has an MO for ARM Cond code.However, for VSEL instructions
(vseqeq, vselgt, vselgt, vselvs), there is no such operand nor do
they support the modification of Cond Code.

llvm-svn: 196588
2013-12-06 17:56:48 +00:00
Andrew Trick
192311ab9a MI-Sched: handle latency of in-order operations with the new machine model.
The per-operand machine model allows the target to define "unbuffered"
processor resources. This change is a quick, cheap way to model stalls
caused by the latency of operations that use such resources. This only
applies when the processor's micro-op buffer size is non-zero
(Out-of-Order). We can't precisely model in-order stalls during
out-of-order execution, but this is an easy and effective
heuristic. It benefits cortex-a9 scheduling when using the new
machine model, which is not yet on by default.

MI-Sched for armv7 was evaluated on Swift (and only not enabled because
of a performance bug related to predication). However, we never
evaluated Cortex-A9 performance on MI-Sched in its current form. This
change adds MI-Sched functionality to reach performance goals on
A9. The only remaining change is to allow MI-Sched to run as a PostRA
pass.

I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7:
-mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false

For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results:
(min run time over 2 runs, filtering tiny changes)

Speedups:
| Benchmarks/BenchmarkGame/recursive         |  52.39% |
| Benchmarks/VersaBench/beamformer           |  20.80% |
| Benchmarks/Misc/pi                         |  19.97% |
| Benchmarks/Misc/mandel-2                   |  19.95% |
| SPEC/CFP2000/188.ammp                      |  18.72% |
| Benchmarks/McCat/08-main/main              |  18.58% |
| Benchmarks/Misc-C++/Large/sphereflake      |  18.46% |
| Benchmarks/Olden/power                     |  17.11% |
| Benchmarks/Misc-C++/mandel-text            |  16.47% |
| Benchmarks/Misc/oourafft                   |  15.94% |
| Benchmarks/Misc/flops-7                    |  14.99% |
| Benchmarks/FreeBench/distray               |  14.26% |
| SPEC/CFP2006/470.lbm                       |  14.00% |
| mediabench/mpeg2/mpeg2dec/mpeg2decode      |  12.28% |
| Benchmarks/SmallPT/smallpt                 |  10.36% |
| Benchmarks/Misc-C++/Large/ray              |   8.97% |
| Benchmarks/Misc/fp-convert                 |   8.75% |
| Benchmarks/Olden/perimeter                 |   7.10% |
| Benchmarks/Bullet/bullet                   |   7.03% |
| Benchmarks/Misc/mandel                     |   6.75% |
| Benchmarks/Olden/voronoi                   |   6.26% |
| Benchmarks/Misc/flops-8                    |   5.77% |
| Benchmarks/Misc/matmul_f64_4x4             |   5.19% |
| Benchmarks/MiBench/security-rijndael       |   5.15% |
| Benchmarks/Misc/flops-6                    |   5.10% |
| Benchmarks/Olden/tsp                       |   4.46% |
| Benchmarks/MiBench/consumer-lame           |   4.28% |
| Benchmarks/Misc/flops-5                    |   4.27% |
| Benchmarks/mafft/pairlocalalign            |   4.19% |
| Benchmarks/Misc/himenobmtxpa               |   4.07% |
| Benchmarks/Misc/lowercase                  |   4.06% |
| SPEC/CFP2006/433.milc                      |   3.99% |
| Benchmarks/tramp3d-v4                      |   3.79% |
| Benchmarks/FreeBench/pifft                 |   3.66% |
| Benchmarks/Ptrdist/ks                      |   3.21% |
| Benchmarks/Adobe-C++/loop_unroll           |   3.12% |
| SPEC/CINT2000/175.vpr                      |   3.12% |
| Benchmarks/nbench                          |   2.98% |
| SPEC/CFP2000/183.equake                    |   2.91% |
| Benchmarks/Misc/perlin                     |   2.85% |
| Benchmarks/Misc/flops-1                    |   2.82% |
| Benchmarks/Misc-C++-EH/spirit              |   2.80% |
| Benchmarks/Misc/flops-2                    |   2.77% |
| Benchmarks/NPB-serial/is                   |   2.42% |
| Benchmarks/ASC_Sequoia/CrystalMk           |   2.33% |
| Benchmarks/BenchmarkGame/n-body            |   2.28% |
| Benchmarks/SciMark2-C/scimark2             |   2.27% |
| Benchmarks/Olden/bh                        |   2.03% |
| skidmarks10/skidmarks                      |   1.81% |
| Benchmarks/Misc/flops                      |   1.72% |

Slowdowns:
| Benchmarks/llubenchmark/llu                | -14.14% |
| Benchmarks/Polybench/stencils/seidel-2d    |  -5.67% |
| Benchmarks/Adobe-C++/functionobjects       |  -5.25% |
| Benchmarks/Misc-C++/oopack_v1p8            |  -5.00% |
| Benchmarks/Shootout/hash                   |  -2.35% |
| Benchmarks/Prolangs-C++/ocean              |  -2.01% |
| Benchmarks/Polybench/medley/floyd-warshall |  -1.98% |
| Polybench/linear-algebra/kernels/3mm       |  -1.95% |
| Benchmarks/McCat/09-vor/vor                |  -1.68% |

llvm-svn: 196516
2013-12-05 17:55:58 +00:00
Andrew Trick
9518dde658 Fix the A9 machine model. VTRN writes two registers.
llvm-svn: 196514
2013-12-05 17:55:49 +00:00
Tim Northover
d04bb11dd7 ARM: fix yet another stack-folding bug
We were trying to fold the stack adjustment into the wrong instruction in the
situation where the entire basic-block was epilogue code. Really, it can only
ever be valid to do the folding precisely where the "add sp, ..." would be
placed so there's no need for a separate iterator to track that.

Should fix PR18136.

llvm-svn: 196493
2013-12-05 11:02:02 +00:00
Alp Toker
e845f8af67 Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471
2013-12-05 05:44:44 +00:00