1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 21:42:54 +02:00
Commit Graph

1896 Commits

Author SHA1 Message Date
Arnold Schwaighofer
ac9a5042d8 Swift: Only build vldm/vstm with q register aligned register lists
Unaligned vldm/vstm need more uops and therefore are slower in general on swift.

radar://14522102

llvm-svn: 189961
2013-09-04 17:41:16 +00:00
Silviu Baranga
f1ba2ead74 Fix scheduling for vldm/vstm instructions that load/store more than 32 bytes on Cortex-A9. This also makes the existing code more compact.
llvm-svn: 189958
2013-09-04 17:05:18 +00:00
Jim Grosbach
4f219b5c7e Revert "Revert "ARM: Improve pattern for isel mul of vector by scalar.""
This reverts commit r189648.

Fixes for the previously failing clang-side arm_neon_intrinsics test
cases will be checked in separately.

llvm-svn: 189841
2013-09-03 20:08:17 +00:00
Tilmann Scheller
e8598ae406 ARM: Default to the Swift CPU when targeting armv7s/thumbv7s.
Test cases adjusted accordingly.

This fixes rdar://14871821.

llvm-svn: 189766
2013-09-02 17:09:01 +00:00
Tilmann Scheller
9205705478 Revert 189756 for now, it doesn't match what rdar://14871821 really wants.
What we really want is to enable Swift by default for *v7s triples (and there already seems to be some logic which attempts to do that). In that case the iOS version doesn't matter. 

llvm-svn: 189763
2013-09-02 15:48:17 +00:00
Tilmann Scheller
9e0d3ff678 ARM: Default to Swift when compiling for iOS 6 or later.
Test cases adjusted accordingly.

This fixes rdar://14871821.

llvm-svn: 189756
2013-09-02 12:01:58 +00:00
Michael Gottesman
113e9285a1 Revert "ARM: Improve pattern for isel mul of vector by scalar."
This reverts commit r189619.

The commit was breaking the arm_neon_intrinsic test.

llvm-svn: 189648
2013-08-30 05:36:14 +00:00
Jim Grosbach
7089633cb9 ARM: Improve pattern for isel mul of vector by scalar.
In addition to recognizing when the multiply's second argument is
coming from an explicit VDUPLANE, also look for a plain scalar
f32 reference and reference it via the corresponding vector
lane.

rdar://14870054

llvm-svn: 189619
2013-08-29 22:41:46 +00:00
Tim Northover
490c4c1bda ARM: remove unused v(add|sub)hn and vqdml[as]l intrinsics.
Clang is now generating cleaner IR, so this removes the old variants which
should be completely unused.

llvm-svn: 189481
2013-08-28 14:33:33 +00:00
Tim Northover
e4e6bb8e0e ARM: add patterns for vqdmlal with separate vqdmull and vqadds
The vqdmlal and vqdmlls instructions are really just a fused pair consisting of
a vqdmull.sN and a vqadd.sN. This adds patterns to LLVM so that we can switch
Clang's CodeGen over to generating these instead of the special vqdmlal
intrinsics.

llvm-svn: 189480
2013-08-28 12:15:16 +00:00
Tim Northover
24c6842d69 ARM: add natural patterns for vaddhl and vsubhl.
These instructions aren't particularly complicated and it's well worth having
patterns for some reasonably useful LLVM IR that will match them. Soon we
should be able to switch Clang over to producing this natural version.

llvm-svn: 189335
2013-08-27 10:31:36 +00:00
Manman Ren
20ae2817fa Debug Info: add an identifier field to DICompositeType.
DICompositeType will have an identifier field at position 14. For now, the
field is set to null in DIBuilder.
For DICompositeTypes where the template argument field (the 13th field)
was optional, modify DIBuilder to make sure the template argument field is set.
Now DICompositeType has 15 fields.

Update DIBuilder to use NULL instead of "i32 0" for null value of a MDNode.
Update verifier to check that DICompositeType has 15 fields and the last
field is null or a MDString.

Update testing cases to include an extra field for DICompositeType.
The identifier field will be used by type uniquing so a front end can
genearte a DICompositeType with a unique identifer.

llvm-svn: 189282
2013-08-26 22:39:55 +00:00
Jim Grosbach
b5dae7e207 ARM: Enable machine verifier for a few more tests.
Now that fast-isel is in better shape, we can enable the machine
verifier for these tests, too.

rdar://12594152

llvm-svn: 189275
2013-08-26 20:22:08 +00:00
Jim Grosbach
f88c4f8dbc ARM: Constrain regclass for TSTri instruction.
Get the register class right for the TST instruction. This keeps the
machine verifier happy, enabling us to turn it on for another test.

rdar://12594152

llvm-svn: 189274
2013-08-26 20:22:05 +00:00
Jim Grosbach
0cc0fac41e ARM: FastISel verifier error cleanup.
Constant pool and global value reference instructions need more
restricted register classes than plain GPR.

rdar://12594152

llvm-svn: 189270
2013-08-26 20:07:29 +00:00
Joey Gouly
25375f9ffb [ARM] Fix another ARM FastISel -verify-machineinstrs issue.
llvm-svn: 189109
2013-08-23 15:20:56 +00:00
Joey Gouly
9ebd1c7d68 [ARMv8] Add CodeGen for VMAXNM/VMINNM.
llvm-svn: 189103
2013-08-23 12:01:13 +00:00
Michael Gottesman
cb2cf901dc [stack protector] Work around an issue with the BMOVPCB_CALL instruction on ARM by disabling does not return on __stack_chk_fail.
This is to fix the bots while I look to see if there is something I can do here.

rdar://14811848

llvm-svn: 189076
2013-08-22 23:45:24 +00:00
Bill Wendling
fa2b06a7e7 Update to remove the no-frame-pointer-elim-non-leaf flag if it was set to 'false'.
llvm-svn: 189068
2013-08-22 21:28:54 +00:00
Bill Wendling
77e4bfaf0d Fix some tests. The 'false' version just omits the attribute altogether.
llvm-svn: 189065
2013-08-22 21:20:14 +00:00
Manman Ren
b66c695f15 [Debug Info Tests] Update testing cases.
A single metadata will not span multiple lines. This also helps me with
my script to automatic update the testing cases.
A debug info testing case should have a llvm.dbg.cu.
Do not use hard-coded id for debug nodes.

llvm-svn: 189033
2013-08-22 17:11:18 +00:00
Joey Gouly
355a09f268 [ARMv8] Add CodeGen support for VSEL.
This uses the ARMcmov pattern that Tim cleaned up in r188995.

Thanks to Simon Tatham for his floating point help!

llvm-svn: 189024
2013-08-22 15:29:11 +00:00
Joey Gouly
67edac2b5e [ARM] Constrain some register classes in EmitAtomicBinary64 so that
we pass these tests with -verify-machineinstrs.

llvm-svn: 189006
2013-08-22 12:19:24 +00:00
Logan Chien
2891b0c61c Fix ARM FastISel PIC function call.
The function call to external function should come with PLT relocation
type if the PIC relocation model is used.

llvm-svn: 189002
2013-08-22 12:08:04 +00:00
Tim Northover
eb7a86ed88 ARM: use TableGen patterns to select CMOV operations.
Back in the mists of time (2008), it seems TableGen couldn't handle the
patterns necessary to match ARM's CMOV node that we convert select operations
to, so we wrote a lot of fairly hairy C++ to do it for us.

TableGen can deal with it now: there were a few minor differences to CodeGen
(see tests), but nothing obviously worse that I could see, so we should
probably address anything that *does* come up in a localised manner.

llvm-svn: 188995
2013-08-22 09:57:11 +00:00
Tim Northover
7e34f41ef8 ARM: respect tied 64-bit inlineasm operands when printing
The code for 'Q' and 'R' operand modifiers needs to look through tied
operands to discover the register class.

llvm-svn: 188990
2013-08-22 06:51:04 +00:00
Manman Ren
1b7973fc19 TBAA: remove !tbaa from testing cases when they are not needed.
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.

llvm-svn: 188944
2013-08-21 22:20:53 +00:00
Jim Grosbach
343f1fbc39 ARM: Fix fast-isel copy/paste-o.
Update testcase to be more careful about checking register
values. While regexes are general goodness for these sorts of
testcases, in this example, the registers are constrained by
the calling convention, so we can and should check their
explicit values.

rdar://14779513

llvm-svn: 188819
2013-08-20 19:12:42 +00:00
Tim Northover
cec1079024 ARM: implement some simple f64 materializations.
Previously we used a const-pool load for virtually all 64-bit floating values.
Actually, we can get quite a few common values (including 0.0, 1.0) via "vmov"
instructions of one stripe or another.

llvm-svn: 188773
2013-08-20 08:57:11 +00:00
Tim Northover
057a4d7c26 ARM: make sure we keep inline asm operands tied.
When patching inlineasm nodes to use GPRPair for 64-bit values, we
were dropping the information that two operands were tied, which
effectively broke the live-interval of vregs affected.

llvm-svn: 188643
2013-08-18 18:06:03 +00:00
Jim Grosbach
58bafdb9f1 ARM: Properly constrain comparison fastisel register classes.
Ongoing 'make the verifier happy' improvements to ARM fast-isel.

rdar://12594152

llvm-svn: 188595
2013-08-16 23:37:40 +00:00
Jim Grosbach
4829862a24 ARM: Fast-isel register class constrain for extends.
Properly constrain the operand register class for instructions used
in [sz]ext expansion. Update more tests to use the verifier now that
we're getting the register classes correct.

rdar://12594152

llvm-svn: 188594
2013-08-16 23:37:36 +00:00
Jim Grosbach
de05043e78 ARM: Fix more fast-isel verifier failures.
Teach the generic instruction selection helper functions to constrain
the register classes of their input operands. For non-physical register
references, the generic code needs to be careful not to mess that up
when replacing references to result registers. As the comment indicates
for MachineRegisterInfo::replaceRegWith(), it's important to call
constrainRegClass() first.

rdar://12594152

llvm-svn: 188593
2013-08-16 23:37:31 +00:00
Jim Grosbach
7f992f45da ARM: Clean up fast-isel machine verifier errors.
Lots of machine verifier errors result from using a plain GPR regclass
for incoming argument copies. A more restrictive rGPR class is more
appropriate since it more accurately represents what's happening, plus
it lines up better with isel later on so the verifier is happier.
Reduces the number of ARM fast-isel tests not running with the verifier
enabled by over half.

rdar://12594152

llvm-svn: 188592
2013-08-16 23:37:23 +00:00
Benjamin Kramer
700f0ccf14 When initializing the PIC global base register on ARM/ELF add pc to fix the address.
This unbreaks PIC with fast isel on ELF targets (PR16717). The output matches
what GCC and SDag do for PIC but may not cover all of the many flavors of PIC
that exist.

llvm-svn: 188551
2013-08-16 12:52:08 +00:00
Daniel Dunbar
a496d61c01 [tests] Cleanup initialization of test suffixes.
- Instead of setting the suffixes in a bunch of places, just set one master
   list in the top-level config. We now only modify the suffix list in a few
   suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).

 - Aside from removing the need for a bunch of lit.local.cfg files, this enables
   4 tests that were inadvertently being skipped (one in
   Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
   CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
   XFAILED).

 - This commit also fixes a bunch of config files to use config.root instead of
   older copy-pasted code.

llvm-svn: 188513
2013-08-16 00:37:11 +00:00
Renato Golin
5068d4579d Let t2LDRBi8 and t2LDRBi12 have same Base Pointer
When determining if two different loads are from the same base address,
this patch allows one load to use a t2LDRi8 address mode and another to
use a t2LDRi12 address mode. The current implementation is very
conservative and this allows the case of differing Thumb2 byte loads to
be considered. Allowing these differing modes instead of forcing the exact
same opcode is useful for situations where one opcodes loads from a base
address+1 and a second opcode loads for a base address-1.

Patch by Daniel Stewart.

llvm-svn: 188385
2013-08-14 16:35:29 +00:00
Jim Grosbach
a6bd8c2220 DAG: Combine (and (setne X, 0), (setne X, -1)) -> (setuge (add X, 1), 2)
A common idiom is to use zero and all-ones as sentinal values and to
check for both in a single conditional ("x != 0 && x != (unsigned)-1").
That generates code, for i32, like:
  testl %edi, %edi
  setne %al
  cmpl  $-1, %edi
  setne %cl
  andb  %al, %cl

With this transform, we generate the simpler:
  incl  %edi
  cmpl  $1, %edi
  seta  %al

Similar improvements for other integer sizes and on other platforms. In
general, combining the two setcc instructions into one is better.

rdar://14689217

llvm-svn: 188315
2013-08-13 21:30:58 +00:00
Tim Northover
67f2cc8ebf Fix FileCheck --check-prefix lines.
Various tests had sprung up over the years which had --check-prefix=ABC on the
RUN line, but "CHECK-ABC:" later on. This happened to work before, but was
strictly incorrect. FileCheck is getting stricter soon though.

Patch by Ron Ofir.

llvm-svn: 188173
2013-08-12 12:43:26 +00:00
Eric Christopher
e6639b8535 Make sure that if we're going to attempt to add a type to a DIE that
the type exists.

Fix up cases where we weren't checking for optional types and add
an assert to addType to make sure we catch this in the future.

Fix up a testcase that was using the tag for DW_TAG_array_type
when it meant DW_TAG_enumeration_type.

llvm-svn: 187963
2013-08-08 07:40:37 +00:00
Manman Ren
50def296e2 Debug Info Finder|Verifier: handle DbgLoc attached to instructions.
Also remove checking of llvm.dbg.sp since it is not used in generating dwarf.

Current state of Finder:
DebugInfoFinder tries to list all debug info MDNodes used in a module. To
list debug info MDNodes used by an instruction, DebugInfoFinder provides
processDeclare, processValue and processLocation to handle DbgDeclareInst,
DbgValueInst and DbgLoc attached to instructions. processModule will go
through all DICompileUnits in llvm.dbg.cu and list debug info MDNodes
used by the CUs.

TODO:
1> Finder has a list of CUs, SPs, Types, Scopes and global variables. We
need to add a list of variables that are used by DbgDeclareInst and
DbgValueInst.
2> MDString fields should be null or isa<MDString> and MDNode fields should be
null or isa<MDNode>. We currently use empty string or int 0 to represent null.
3> Go though Verify functions and make sure that they check field types.
4> Clean up existing testing cases to remove llvm.dbg.sp and make sure each
testing case has a llvm.dbg.cu.

Re-apply r187609 with fix to pass ocaml binding. vmcore.ml generates a debug
location with scope being metadata !{}, in verifier we treat this as a null
scope.

llvm-svn: 187812
2013-08-06 19:38:43 +00:00
Tim Northover
d79219981f ARM: implement allowTruncateForTailCall
Now that it's in place, it seems silly not to let ARM make use of the extra
tail call opportunities.

llvm-svn: 187795
2013-08-06 13:58:03 +00:00
Eric Christopher
973b3bf7ae Temporarily revert "Debug Info Finder|Verifier: handle DbgLoc attached to
instructions." in an attempt to bring back some bots.

This reverts commit r187609.

llvm-svn: 187638
2013-08-02 00:49:44 +00:00
Bill Wendling
e7b7059f1d Use function attributes to indicate that we don't want to realign the stack.
Function attributes are the future! So just query whether we want to realign the
stack directly from the function instead of through a random target options
structure.

llvm-svn: 187618
2013-08-01 21:42:05 +00:00
Manman Ren
dd35d4fb94 Debug Info Finder|Verifier: handle DbgLoc attached to instructions.
Also remove checking of llvm.dbg.sp since it is not used in generating dwarf.

Current state of Finder:
DebugInfoFinder tries to list all debug info MDNodes used in a module. To
list debug info MDNodes used by an instruction, DebugInfoFinder provides
processDeclare, processValue and processLocation to handle DbgDeclareInst,
DbgValueInst and DbgLoc attached to instructions. processModule will go
through all DICompileUnits in llvm.dbg.cu and list debug info MDNodes
used by the CUs.

TODO:
1> Finder has a list of CUs, SPs, Types, Scopes and global variables. We
need to add a list of variables that are used by DbgDeclareInst and
DbgValueInst.
2> MDString fields should be null or isa<MDString> and MDNode fields should be
null or isa<MDNode>. We currently use empty string or int 0 to represent null.
3> Go though Verify functions and make sure that they check field types.
4> Clean up existing testing cases to remove llvm.dbg.sp and make sure each
testing case has a llvm.dbg.cu.

llvm-svn: 187609
2013-08-01 20:52:39 +00:00
Andrew Trick
c8b891d6ff This test may have been sensitive to the ARM ABI...
llvm-svn: 187442
2013-07-30 20:34:59 +00:00
Andrew Trick
bf377e8816 MI Sched fix: assert "Disconnected LRG within the scheduling region."
llvm-svn: 187435
2013-07-30 19:59:08 +00:00
Saleem Abdulrasool
f01cc77809 [ARM] check bitwidth in PerformORCombine
When simplifying a (or (and B A) (and C ~A)) to a (VBSL A B C) ensure that the
bitwidth of the second operands to both ands match before comparing the negation
of the values.

Split the check of the value of the second operands to the ands.  Move the cast
and variable declaration slightly higher to make it slightly easier to follow.

Bug-Id: 16700
Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org>
llvm-svn: 187404
2013-07-30 04:43:08 +00:00
Quentin Colombet
94c7d4af34 [DAGCombiner] insert_vector_elt: Avoid building a vector twice.
This patch prevents the following combine when the input vector is used more
than once.
insert_vector_elt (build_vector elt0, ..., eltN), NewEltIdx, idx
=>
build_vector elt0, ..., NewEltIdx, ..., eltN 

The reasons are:
- Building a vector may be expensive, so try to reuse the existing part of a
  vector instead of creating a new one (think big vectors).
- elt0 to eltN now have two users instead of one. This may prevent some other
  optimizations.

llvm-svn: 187396
2013-07-30 00:24:09 +00:00
Manman Ren
46cc5a281a Debug Info: enable verifier for testing cases.
llvm-svn: 187375
2013-07-29 20:18:19 +00:00
Manman Ren
7a31996783 Debug Info: update testing cases to pass verifier.
llvm-svn: 187362
2013-07-29 18:12:58 +00:00
Silviu Baranga
5aac9ffdd0 Allow generation of vmla.f32 instructions when targeting Cortex-A15. The patch also adds the VFP4 feature to Cortex-A15 and fixes the DontUseFusedMAC predicate so that we can still generate vmla.f32 instructions on non-darwin targets with VFP4.
llvm-svn: 187349
2013-07-29 09:25:50 +00:00
Manman Ren
d32ce58b2f Debug Info Verifier: verify SPs in llvm.dbg.sp.
Also always add DIType, DISubprogram and DIGlobalVariable to the list
in DebugInfoFinder without checking them, so we can verify them later
on.

llvm-svn: 187285
2013-07-27 01:26:08 +00:00
Manman Ren
2bd542e9c6 Debug Info Verifier: enable verification of DICompileUnit.
We used to call Verify before adding DICompileUnit to the list, and now we
remove the check and always add DICompileUnit to the list in DebugInfoFinder,
so we can verify them later on.

llvm-svn: 187237
2013-07-26 20:04:30 +00:00
Manman Ren
c20620404a Debug Info: improve the verifier to check field types.
Make sure the context field of DIType is MDNode.
Fix testing cases to make them pass the verifier.

llvm-svn: 187150
2013-07-25 19:33:30 +00:00
Andrew Trick
7b0a985247 Evict local live ranges if they can be reassigned.
The previous change to local live range allocation also suppressed
eviction of local ranges. In rare cases, this could result in more
expensive register choices. This commit actually revives a feature
that I added long ago: check if live ranges can be reassigned before
eviction. But now it only happens in rare cases of evicting a local
live range because another local live range wants a cheaper register.

The benefit is improved code size for some benchmarks on x86 and armv7.

I measured no significant compile time increase and performance
changes are noise.

llvm-svn: 187140
2013-07-25 18:35:19 +00:00
Andrew Trick
b401fd4c9e Allocate local registers in order for optimal coloring.
Also avoid locals evicting locals just because they want a cheaper register.

Problem: MI Sched knows exactly how many registers we have and assumes
they can be colored. In cases where we have large blocks, usually from
unrolled loops, greedy coloring fails. This is a source of
"regressions" from the MI Scheduler on x86. I noticed this issue on
x86 where we have long chains of two-address defs in the same live
range. It's easy to see this in matrix multiplication benchmarks like
IRSmk and even the unit test misched-matmul.ll.

A fundamental difference between the LLVM register allocator and
conventional graph coloring is that in our model a live range can't
discover its neighbors, it can only verify its neighbors. That's why
we initially went for greedy coloring and added eviction to deal with
the hard cases. However, for singly defined and two-address live
ranges, we can optimally color without visiting neighbors simply by
processing the live ranges in instruction order.

Other beneficial side effects:

It is much easier to understand and debug regalloc for large blocks
when the live ranges are allocated in order. Yes, global allocation is
still very confusing, but it's nice to be able to comprehend what
happened locally.

Heuristics could be added to bias register assignment based on
instruction locality (think late register pairing, banks...).

Intuituvely this will make some test cases that are on the threshold
of register pressure more stable.

llvm-svn: 187139
2013-07-25 18:35:14 +00:00
Rafael Espindola
837f0a4606 Current batch of -disable-debug-info-verifier.
llvm-svn: 187130
2013-07-25 17:16:05 +00:00
Manman Ren
c48140740e Debug Info: improve the verifier to check field types.
Make sure the context and type fields are MDNodes. We will generate
verification errors if those fields are non-empty strings.
Fix testing cases to make them pass the verifier.

llvm-svn: 187106
2013-07-25 06:43:01 +00:00
Manman Ren
425f3c838c Update testing cases to pass debug info verifier.
llvm-svn: 187083
2013-07-24 22:23:00 +00:00
Quentin Colombet
83197eea3a Fix a bug in IfConverter with nested predicates.
Prior to this patch, IfConverter may widen the cases where a sequence of
instructions were executed because of the way it uses nested predicates. This
result in incorrect execution.

For instance, Let A be a basic block that flows conditionally into B and B be a
predicated block.
B can be predicated with A.BrToBPredicate into A iff B.Predicate is less
"permissive" than A.BrToBPredicate, i.e., iff A.BrToBPredicate subsumes
B.Predicate.

The IfConverter was checking the opposite: B.Predicate subsumes
A.BrToBPredicate.

<rdar://problem/14379453>

llvm-svn: 187071
2013-07-24 20:20:37 +00:00
Manman Ren
2abe20d1ee Debug Info: improve the Finder.
Improve the Finder to handle context of a DIVariable used by DbgValueInst.
Fix testing cases to make them pass the verifier.

llvm-svn: 187052
2013-07-24 17:10:09 +00:00
Manman Ren
9438f9191a Update testing cases to make them pass debug info verification.
llvm-svn: 187016
2013-07-24 01:26:37 +00:00
Manman Ren
35c74ac066 Debug Info: improve the Finder.
Improve the Finder to handle context of a DIVariable.
If Scope is a DICompileUnit, add it to the list of CUs.

llvm-svn: 187003
2013-07-23 23:10:00 +00:00
Quentin Colombet
995760ffc1 [ARM][ISel] Improve the lowering of vector loads.
When vectors are built from a single value, the ARM lowering issues a
scalar_to_vector node.
This node is then always morphed into a move from the general purpose unit to
the vector unit.
When the value comes from a load, this can be simplified into a vector load to
the right lane.

This patch changes the lowering of insert_vector_elt to expose a vector
friendly pattern in this situation.

This is a step toward fixing <rdar://problem/14170854>.

llvm-svn: 186999
2013-07-23 22:34:47 +00:00
Manman Ren
3a6f508b6b Debug Info Finder: use processDeclare and processValue to list debug info
MDNodes used by DbgDeclareInst and DbgValueInst.

Another 16 testing cases failed and they are disabled with
-disable-debug-info-verifier.
A total of 34 cases are disabled with -disable-debug-info-verifier and will be
corrected.

llvm-svn: 186902
2013-07-23 00:22:51 +00:00
Mihai Popa
cb970f1789 This adds range checking for "ldr Rn, [pc, #imm]" Thumb
instructions. With this patch:

1. ldr.n is recognized as mnemonic for the short encoding
2. ldr.w is recognized as menmonic for the long encoding
3. ldr will map to either short or long encodings depending on the size of the offset

llvm-svn: 186831
2013-07-22 15:49:36 +00:00
Lang Hames
2d3b19969c Refactor AnalyzeBranch on ARM. The previous version did not always analyze
indirect branches correctly. Under some circumstances, this led to the deletion
of basic blocks that were the destination of indirect branches. In that case it
left indirect branches to nowhere in the code.

This patch replaces, and is more general than either of the previous fixes for
indirect-branch-analysis issues, r181161 and r186461.

For other branches (not indirect) this refactor should have *almost* identical
behavior to the previous version. There are some corner cases where this
refactor is able to analyze blocks that the previous version could not (e.g.
this necessitated the update to thumb2-ifcvt2.ll). 

<rdar://problem/14464830>

llvm-svn: 186735
2013-07-19 23:52:47 +00:00
Manman Ren
4cd42b4157 Try to appease the bots.
llvm-svn: 186653
2013-07-19 04:56:51 +00:00
Stephen Lin
798e242090 Update to more CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change.
All changes were made by the following bash script:

  find test/CodeGen -name "*.ll" | \
  while read NAME; do
    echo "$NAME"
    grep -q "^; *RUN: *llc.*debug" $NAME && continue
    grep -q "^; *RUN:.*llvm-objdump" $NAME && continue
    grep -q "^; *RUN: *opt.*" $NAME && continue
    TEMP=`mktemp -t temp`
    cp $NAME $TEMP
    sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
    while read FUNC; do
      sed -i '' "s/;\([A-Za-z0-9_-]*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC[:]* *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
    done
    sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
    sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
    sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
    sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
    mv $TEMP $NAME
  done

This script catches a superset of the cases caught by the script associated with commit r186280. It initially found some false positives due to unusual constructs in a minority of tests; all such cases were disambiguated first in commit r186621.

llvm-svn: 186624
2013-07-18 22:47:09 +00:00
Stephen Lin
52ebde139c Disambiguate function names in some CodeGen tests. (Some tests were using function names that also were names of instructions and/or doing other unusual things that were making the test not amenable to otherwise scriptable pattern matching.) No functionality change.
llvm-svn: 186621
2013-07-18 22:29:15 +00:00
Stephen Lin
f569d23c3a Update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change.
llvm-svn: 186594
2013-07-18 18:35:22 +00:00
Joey Gouly
ee38b99990 Forgot 'svn add' again, sorry!
Tests for r186574.

llvm-svn: 186580
2013-07-18 13:17:26 +00:00
Joey Gouly
200e661b16 Add the tests that I forgot to 'svn add' with my previous commit (r186504).
llvm-svn: 186506
2013-07-17 14:03:49 +00:00
Manman Ren
c67f77c5d6 Cleanup testing case by using a shorter name for types.
llvm-svn: 186436
2013-07-16 18:26:48 +00:00
Tim Northover
69d676cd12 ARM: implement ldrex, strex and clrex intrinsics
Intrinsics already existed for the 64-bit variants, so these support operations
of size at most 32-bits.

llvm-svn: 186392
2013-07-16 09:46:55 +00:00
Renato Golin
5b7294a39c ARM EABI divmod support
This patch enables calls to __aeabi_idivmod when in EABI mode,
by using the remainder value returned on registers (R1),
enabled by the ARM triple "none-eabi". Note that Darwin and
GNUEABI triples will continue lowering on GNU style, that is,
using the stack for the remainder.

Still need to add SREM/UREM support fix for 64-bit lowering.

llvm-svn: 186390
2013-07-16 09:32:17 +00:00
Manman Ren
19fc512a36 PEI: Support for non-zero SPAdj at beginning of a basic block.
We can have a FrameSetup in one basic block and the matching FrameDestroy
in a different basic block when we have struct byval. In that case, SPAdj
is not zero at beginning of the basic block.

Modify PEI to correctly set SPAdj at beginning of each basic block using
DFS traversal. We used to assume SPAdj is 0 at beginning of each basic block.

PEI had an assert SPAdjCount || SPAdj == 0.
If we have a Destroy <n> followed by a Setup <m>, PEI will assert failure.
We can add an extra condition to make sure the pairs are matched:
  The pairs start with a FrameSetup.
But since we are doing a much better job in the verifier, this patch removes
the check in PEI.

PR16393

llvm-svn: 186364
2013-07-15 23:47:29 +00:00
Stephen Lin
7e501cf4c3 Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally.
This update was done with the following bash script:

  find test/CodeGen -name "*.ll" | \
  while read NAME; do
    echo "$NAME"
    if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
      TEMP=`mktemp -t temp`
      cp $NAME $TEMP
      sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
      while read FUNC; do
        sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
      done
      sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
      sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
      sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
      sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
      mv $TEMP $NAME
    fi
  done

llvm-svn: 186280
2013-07-14 06:24:09 +00:00
Stephen Lin
3ae734a60c Convert CodeGen/*/*.ll tests to use the new CHECK-LABEL for easier debugging. No functionality change and all tests pass after conversion.
This was done with the following sed invocation to catch label lines demarking function boundaries:
    sed -i '' "s/^;\( *\)\([A-Z0-9_]*\):\( *\)test\([A-Za-z0-9_-]*\):\( *\)$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen/*/*.ll
which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct.

llvm-svn: 186258
2013-07-13 20:38:47 +00:00
JF Bastien
5885dc293c Fix ARM paired GPR COPY lowering
ARM paired GPR COPY was being lowered to two MOVr without CC. This
patch puts the CC back.

My test is a reduction of the case where I encountered the issue,
64-bit atomics use paired GPRs.

The issue only occurs with selectionDAG, FastISel doesn't encounter it
so I didn't bother calling it.

llvm-svn: 186226
2013-07-12 23:33:03 +00:00
Stephen Lin
c6bb3a6cda Start using CHECK-LABEL in some tests.
llvm-svn: 186163
2013-07-12 14:54:12 +00:00
Joey Gouly
b4f59412fd Add a comment to this change, requested by Eric Christopher.
llvm-svn: 185853
2013-07-08 19:52:51 +00:00
Jim Grosbach
b4234c1d88 ARM: Improve codegen for generic vselect.
Fall back to by-element insert rather than building it up on the stack.

rdar://14351991

llvm-svn: 185846
2013-07-08 18:18:52 +00:00
Tim Northover
696f647891 Stop putting operations after a tail call.
This prevents the emission of DAG-generated vreg definitions after a
tail call be dropping them entirely (on the grounds that nothing could
use them anyway, and they interfere with O0 CodeGen).

llvm-svn: 185754
2013-07-06 12:58:45 +00:00
Arnold Schwaighofer
97cea9b991 ARM: Add a pack pattern for matching arithmetic shift right
llvm-svn: 185714
2013-07-05 18:57:49 +00:00
Arnold Schwaighofer
d5fc888196 ARM: Fix incorrect pack pattern
A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and packs them
in the bottom half of "x". An arithmetic and logic shift are only equivalent in
this context if the shift amount is 16. We would be shifting in ones into the
bottom 16bits instead of zeros if "y" is negative.

radar://14338767

llvm-svn: 185712
2013-07-05 18:28:39 +00:00
Joey Gouly
76f34b0ffb PR16490: fix a crash in ARMDAGToDAGISel::SelectInlineAsm.
In the SelectionDAG immediate operands to inline asm are constructed as
two separate operands. The first is a constant of value InlineAsm::Kind_Imm
and the second is a constant with the value of the immediate.

In ARMDAGToDAGISel::SelectInlineAsm, if we reach an operand of Kind_Imm we
should skip over the next operand too.

llvm-svn: 185688
2013-07-05 10:19:40 +00:00
Quentin Colombet
49190aa8d1 [ARM] Improve the instruction selection of vector loads.
In the ARM back-end, build_vector nodes are lowered to a target specific
build_vector that uses floating point type. 
This works well, unless the inserted bitcasts survive until instruction
selection. In that case, they incur moves between integer unit and floating
point unit that may result in inefficient code.

In other words, this conversion may introduce artificial dependencies when the
code leading to the build vector cannot be completed with a floating point type.

In particular, this happens when loads are not aligned.

Before this patch, in that case, the compiler generates general purpose loads
and creates the floating point vector from them, instead of directly using the
vector unit.

The patch uses a vector friendly sequence of code when the inserted bitcasts to
floating point survived DAGCombine.

This is done by a target specific DAGCombine that changes the target specific
build_vector into a sequence of insert_vector_elt that get rid of the bitcasts.

<rdar://problem/14170854>

llvm-svn: 185587
2013-07-03 21:42:57 +00:00
Rafael Espindola
006ee945bb Prefix failing commands with not to make clear they are expected to fail.
llvm-svn: 185554
2013-07-03 16:41:29 +00:00
Tim Northover
f4b07c69d1 ARM: relax the atomic release barrier to "dmb ishst" on Swift
Swift cores implement store barriers that are stronger than the ARM
specification but weaker than general barriers. They are, in fact, just about
enough to provide the ordering needed for atomic operations with release
semantics.

This patch makes use of that quirk.

llvm-svn: 185527
2013-07-03 09:20:36 +00:00
Tim Northover
51fd747de9 Revert r185339 (ARM: relax the atomic release barrier to "dmb ishst")
Turns out I'd misread the architecture reference manual and thought
that was a load/store-store barrier, when it's not.

Thanks for pointing it out Eli!

llvm-svn: 185356
2013-07-01 18:37:33 +00:00
Tim Northover
25286e5b71 ARM: relax the atomic release barrier to "dmb ishst"
I believe the full "dmb ish" barrier is not required to guarantee release
semantics for atomic operations. The weaker "dmb ishst" prevents previous
operations being reordered with a store executed afterwards, which is enough.

A key point to note (fortunately already correct) is that this barrier alone is
*insufficient* for sequential consistency, no matter how liberally placed.

llvm-svn: 185339
2013-07-01 14:48:48 +00:00
Lang Hames
e8b20c4f7b Add missing case to switch statement - DAGTypeLegalizer::ExpandIntegerResult
should expand ATOMIC_CMP_SWAP nodes the same way that it does for ATOMIC_SWAP.

Since ATOMIC_LOADs on some targets (e.g. older ARM variants) get legalized to
ATOMIC_CMP_SWAPs, the missing case had been causing i64 atomic loads to crash
during isel.

<rdar://problem/14074644>

llvm-svn: 185186
2013-06-28 18:36:42 +00:00
Weiming Zhao
b97c1a69a2 Bug 13662: Enable GPRPair for all i64 operands of inline asm on ARM
This patch assigns paired GPRs  for inline asm with
64-bit data on ARM. It's enabled for both ARM and Thumb to support modifiers
like %H, %Q, %R.

llvm-svn: 185169
2013-06-28 17:26:02 +00:00
Tim Northover
e13264c995 ARM: ensure fixed-point conversions have sane types
We were generating intrinsics for NEON fixed-point conversions that didn't
exist (e.g. float -> i16). There are two cases to consider:
  + iN is smaller than float. In this case we can do the conversion but need an
    extend or truncate as well.
  + iN is larger than float. In this case using the NEON conversion would be
    incorrect so we don't perform any combining.

llvm-svn: 185158
2013-06-28 15:29:25 +00:00
Manman Ren
5bedd08922 Debug Info: clean up usage of Verify.
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify. For cases where we know the type of a DI metadata, use
assert.

Also update testing cases to make them conform to the format of DI classes.

llvm-svn: 185135
2013-06-28 05:43:10 +00:00
Joey Gouly
42f1898415 Add a Subtarget feature 'v8fp' to the ARM backend.
llvm-svn: 185073
2013-06-27 11:49:26 +00:00
Joey Gouly
f4bea6681b Add a subtarget feature 'v8' to the ARM backend.
This allows for targeting the ARMv8 AArch32 variant.

llvm-svn: 184967
2013-06-26 16:58:26 +00:00
Joey Gouly
a82562fd5b Remove the 'generic' CPU from the ARM eabi attributes printer.
Make v4 the default ARM architecture attribute, to match CodeGen.

llvm-svn: 184962
2013-06-26 16:39:06 +00:00
David Blaikie
b1e1db3db6 DebugInfo: Don't lose unreferenced non-trivial by-value parameters
A FastISel optimization was causing us to emit no information for such
parameters & when they go missing we end up emitting a different
function type. By avoiding that shortcut we not only get types correct
(very important) but also location information (handy) - even if it's
only live at the start of a function & may be clobbered later.

Reviewed/discussion by Evan Cheng & Dan Gohman.

llvm-svn: 184604
2013-06-21 22:56:30 +00:00
Quentin Colombet
bdb6bfd975 ARM: Remove a (false) dependency on the memoryoperand's value as we do not use
it at the moment.
This allows to form more paired loads even when stack coloring pass destroys the
memoryoperand's value.

<rdar://problem/13978317>

llvm-svn: 184492
2013-06-20 22:51:44 +00:00
Quentin Colombet
337e6e6d1f During SelectionDAG building explicitly set a node to constant zero when the
value is zero.
This allows optmizations to kick in more easily.
Fix some test cases so that they remain meaningful (i.e., not completely dead
coded) when optimizations apply.

<rdar://problem/14096009> superfluous multiply by high part of zero-extended
value.

llvm-svn: 184222
2013-06-18 20:14:39 +00:00
Benjamin Kramer
2934370512 Switch spill weights from a basic loop depth estimation to BlockFrequencyInfo.
The main advantages here are way better heuristics, taking into account not
just loop depth but also __builtin_expect and other static heuristics and will
eventually learn how to use profile info. Most of the work in this patch is
pushing the MachineBlockFrequencyInfo analysis into the right places.

This is good for a 5% speedup on zlib's deflate (x86_64), there were some very
unfortunate spilling decisions in its hottest loop in longest_match(). Other
benchmarks I tried were mostly neutral.

This changes register allocation in subtle ways, update the tests for it.
2012-02-20-MachineCPBug.ll was deleted as it's very fragile and the instruction
it looked for was gone already (but the FileCheck pattern picked up unrelated
stuff).

llvm-svn: 184105
2013-06-17 19:00:36 +00:00
David Blaikie
f3d2951503 Debug Info: Simplify Frame Index handling in DBG_VALUE Machine Instructions
Rather than using the full power of target-specific addressing modes in
DBG_VALUEs with Frame Indicies, simply use Frame Index + Offset. This
reduces the complexity of debug info handling down to two
representations of values (reg+offset and frame index+offset) rather
than three or four.

Ideally we could ensure that frame indicies had been eliminated by the
time we reached an assembly or dwarf generation, but I haven't spent the
time to figure out where the FIs are leaking through into that & whether
there's a good place to convert them. Some FI+offset=>reg+offset
conversion is done (see PrologEpilogInserter, for example) which is
necessary for some SelectionDAG assumptions about registers, I believe,
but it might be possible to make this a more thorough conversion &
ensure there are no remaining FIs no matter how instruction selection
is performed.

llvm-svn: 184066
2013-06-16 20:34:15 +00:00
David Blaikie
ef0e4640d7 DebugInfo: follow up to 184045 to constrain the tests further to ensure they don't contain +0 offsets
llvm-svn: 184046
2013-06-15 16:02:44 +00:00
David Blaikie
7a29358013 DebugInfo: print DBG_VALUE MachineInstrs with [] for deref and drop the offset when it's zero
llvm-svn: 184045
2013-06-15 15:52:58 +00:00
Derek Schuff
4c429cad03 Make PrologEpilogInserter save/restore all callee saved registers
in functions which call __builtin_unwind_init()

__builtin_unwind_init() is an undocumented gcc intrinsic which has this effect,
and is used in libgcc_eh.

Goes part of the way toward fixing PR8541.

llvm-svn: 183984
2013-06-14 16:15:29 +00:00
JF Bastien
cb60eaba94 Enable FastISel on ARM for Linux and NaCl, not MCJIT
This is a resubmit of r182877, which was reverted because it broken
MCJIT tests on ARM. The patch leaves MCJIT on ARM as it was before: only
enabled for iOS. I've CC'ed people from the original review and revert.

FastISel was only enabled for iOS ARM and Thumb2, this patch enables it
for ARM (not Thumb2) on Linux and NaCl, but not MCJIT.

Thumb2 support needs a bit more work, mainly around register class
restrictions.

The patch punts to SelectionDAG when doing TLS relocation on non-Darwin
targets. I will fix this and other FastISel-to-SelectionDAG failures in
a separate patch.

The patch also forces FastISel to retain frame pointers: iOS always
keeps them for backtracking (so emitted code won't change because of
this), but Linux was getting much worse code that was incorrect when
using big frames (such as test-suite's lencod). I'll also fix this in a
later patch, it will probably require a peephole so that FastISel
doesn't rematerialize frame pointers back-to-back.

The test changes are straightforward, similar to:
  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130513/174279.html
They also add a vararg test that got dropped in that change.

I ran all of lnt test-suite on A15 hardware with --optimize-option=-O0
and all the tests pass. All the tests also pass on x86 make check-all. I
also re-ran the check-all tests that failed on ARM, and they all seem to
pass.

llvm-svn: 183966
2013-06-14 02:49:43 +00:00
JF Bastien
be1c9b906e Add test for ARM FastISel load/store register classes
r183624 fixed an issue that was tested indirectly. Test it directly with this new test.

llvm-svn: 183634
2013-06-10 00:35:57 +00:00
Logan Chien
61f1a9f5d2 Refine the ARM EHABI test cases.
Since we have ARM unwind directive parser and assembler, we
can check the correctness in two stages:

1. From LLVM assembly (.ll) to ARM assembly (.s)
2. From ARM assembly (.s) to ELF object file (.o)

We already have several "*.s to *.o" test cases.  This CL adds
some "*.ll to *.s" test cases and removes the redundant "*.ll to *.o"
test cases.

New test cases to check "*.ll to *.s" code generator:

- ehabi.ll: Check the correctness of the generated unwind directives.
- section-name.ll: Check the section name of functions.

Removed test cases:

- ehabi-mc-cantunwind.ll
  (Covered by ehabi-cantunwind.ll, and eh-directive-cantunwind.s)
- ehabi-mc-compact-pr0.ll
  (Covered by ehabi.ll, eh-compact-pr0.s, eh-directive-save.s, and
   eh-directive-setfp.s)
- ehabi-mc-compact-pr1.ll
  (Covered by ehabi.ll, eh-compact-pr1.s, eh-directive-save.s, and
   eh-directive-setfp.s)
- ehabi-mc.ll
  (Covered by ehabi.ll, and eh-directive-integrated-test.s)
- ehabi-mc-section-group.ll
  (Covered by section-name.ll, and eh-directive-section-comdat.s)
- ehabi-mc-section.ll
  (Covered by section-name.ll, and eh-directive-section.s)
- ehabi-mc-sh_link.ll
  (Covered by eh-directive-text-section.s, and eh-directive-section.s)

llvm-svn: 183628
2013-06-09 12:36:57 +00:00
Logan Chien
fa2266ded0 Fix ARM unwind opcode assembler in several cases.
Changes to ARM unwind opcode assembler:

* Fix multiple .save or .vsave directives.  Besides, the
  order is preserved now.

* For the directives which will generate multiple opcodes,
  such as ".save {r0-r11}", the order of the unwind opcode
  is fixed now, i.e. the registers with less encoding value
  are popped first.

* Fix the $sp offset calculation.  Now, we can use the
  .setfp, .pad, .save, and .vsave directives at any order.

Changes to test cases:

* Add test cases to check the order of multiple opcodes
  for the .save directive.

* Fix the incorrect $sp offset in the test case.  The
  stack pointer offset specified in the test case was
  incorrect.  (Changed test cases: ehabi-mc-section.ll and
  ehabi-mc.ll)

* The opcode to restore $sp are slightly reordered.  The
  behavior are not changed, and the new output is same
  as the output of GNU as.  (Changed test cases:
  eh-directive-pad.s and eh-directive-setfp.s)

llvm-svn: 183627
2013-06-09 12:22:30 +00:00
Quentin Colombet
288faf08d2 Reapply r183552. This time, use a standard type for the option to avoid template
instantiation issue with non-standard type.

Add a backend option to warn on a given stack size limit.
Option: -mllvm -warn-stack-size=<limit>
Output (if limit is exceeded):
warning: Stack size limit exceeded (<actual size>) in <functionName>.

The longer term plan is to hook that to a clang warning.
PR:4072
<rdar://problem/13987214>.

llvm-svn: 183595
2013-06-08 00:07:54 +00:00
Quentin Colombet
8f78b800ab Revert commits related to stack warning.
llvm-svn: 183579
2013-06-07 22:14:50 +00:00
Quentin Colombet
4a1b82f036 Explicit triple in warn stack size test cases to not depend on OS.
llvm-svn: 183574
2013-06-07 21:09:42 +00:00
Quentin Colombet
474d9dcceb Add a backend option to warn on a given stack size limit.
Option: -mllvm -warn-stack-size=<limit>
Output (if limit is exceeded):
warning: Stack size limit exceeded (<actual size>) in <functionName>.

The longer term plan is to hook that to a clang warning.
PR:4072
<rdar://problem/13987214>

llvm-svn: 183552
2013-06-07 20:18:12 +00:00
JF Bastien
c950ffbe08 ARM FastISel integer sext/zext improvements
My recent ARM FastISel patch exposed this bug:
  http://llvm.org/bugs/show_bug.cgi?id=16178
The root cause is that it can't select integer sext/zext pre-ARMv6 and
asserts out.

The current integer sext/zext code doesn't handle other cases gracefully
either, so this patch makes it handle all sext and zext from i1/i8/i16
to i8/i16/i32, with and without ARMv6, both in Thumb and ARM mode. This
should fix the bug as well as make FastISel faster because it bails to
SelectionDAG less often. See fastisel-ext.patch for this.

fastisel-ext-tests.patch changes current tests to always use reg-imm AND
for 8-bit zext instead of UXTB. This simplifies code since it is
supported on ARMv4t and later, and at least on A15 both should perform
exactly the same (both have exec 1 uop 1, type I).

2013-05-31-char-shift-crash.ll is a bitcode version of the above bug
16178 repro.

fast-isel-ext.ll tests all sext/zext combinations that ARM FastISel
should now handle.

Note that my ARM FastISel enabling patch was reverted due to a separate
failure when dealing with MCJIT, I'll fix this second failure and then
turn FastISel on again for non-iOS ARM targets.

I've tested "make check-all" on my x86 box, and "lnt test-suite" on A15
hardware.

llvm-svn: 183551
2013-06-07 20:10:37 +00:00
Quentin Colombet
a8970e620f Teach AsmPrinter how to print odd constants.
Fix an assertion when the compiler encounters big constants whose bit width is
not a multiple of 64-bits.
Although clang would never generate something like this, the backend should be
able to handle any legal IR.

<rdar://problem/13363576>

llvm-svn: 183544
2013-06-07 18:36:03 +00:00
Evan Cheng
1c010771f4 Cortex-R5 can issue Thumb2 integer division instructions.
llvm-svn: 183275
2013-06-04 22:52:09 +00:00
David Majnemer
d0dc0d58f6 ARM: Fix crash in ARM backend inside of ARMConstantIslandPass
The ARM backend did not expect LDRBi12 to hold a constant pool operand.
Allow for LLVM to deal with the instruction similar to how it deals with
LDRi12.

This fixes PR16215.

llvm-svn: 183238
2013-06-04 17:46:15 +00:00
Rafael Espindola
d23482285b Revert r182937 and r182877.
r182877 broke MCJIT tests on ARM and r182937 was working around another failure
by r182877.

This should make the ARM bots green.

llvm-svn: 182960
2013-05-30 20:37:52 +00:00
Rafael Espindola
5b34d5a3c7 Change how we iterate over relocations on ELF.
For COFF and MachO, sections semantically have relocations that apply to them.
That is not the case on ELF.

In relocatable objects (.o), a section with relocations in ELF has offsets to
another section where the relocations should be applied.

In dynamic objects and executables, relocations don't have an offset, they have
a virtual address. The section sh_info may or may not point to another section,
but that is not actually used for resolving the relocations.

This patch exposes that in the ObjectFile API. It has the following advantages:

* Most (all?) clients can handle this more efficiently. They will normally walk
all relocations, so doing an effort to iterate in a particular order doesn't
save time.

* llvm-readobj now prints relocations in the same way the native readelf does.

* probably most important, relocations that don't point to any section are now
visible. This is the case of relocations in the rela.dyn section. See the
updated relocation-executable.test for example.

llvm-svn: 182908
2013-05-30 03:05:14 +00:00
Andrew Trick
aec414c298 Order CALLSEQ_START and CALLSEQ_END nodes.
Fixes PR16146: gdb.base__call-ar-st.exp fails after
pre-RA-sched=source fixes.

Patch by Xiaoyi Guo!

This also fixes an unsupported dbg.value test case. Codegen was
previously incorrect but the test was passing by luck.

llvm-svn: 182885
2013-05-29 22:03:55 +00:00
JF Bastien
235d8c9117 Enable FastISel on ARM for Linux and NaCl
FastISel was only enabled for iOS ARM and Thumb2, this patch enables it
for ARM (not Thumb2) on Linux and NaCl.

Thumb2 support needs a bit more work, mainly around register class
restrictions.

The patch punts to SelectionDAG when doing TLS relocation on non-Darwin
targets. I will fix this and other FastISel-to-SelectionDAG failures in
a separate patch.

The patch also forces FastISel to retain frame pointers: iOS always
keeps them for backtracking (so emitted code won't change because of
this), but Linux was getting much worse code that was incorrect when
using big frames (such as test-suite's lencod). I'll also fix this in a
later patch, it will probably require a peephole so that FastISel
doesn't rematerialize frame pointers back-to-back.

The test changes are straightforward, similar to:
  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130513/174279.html
They also add a vararg test that got dropped in that change.

I ran all of test-suite on A15 hardware with --optimize-option=-O0 and
all the tests pass.

llvm-svn: 182877
2013-05-29 20:38:10 +00:00
Andrew Trick
53ca6c9254 Track IR ordering of SelectionDAG nodes 4/4.
Unit test cases for -pre-RA-sched=source.

llvm-svn: 182706
2013-05-25 03:26:51 +00:00
Andrew Trick
34c31df32a Track IR ordering of SelectionDAG nodes 3/4.
Remove the old IR ordering mechanism and switch to new one.  Fix unit
test failures.

llvm-svn: 182704
2013-05-25 03:08:10 +00:00
Tim Northover
9b6ef4d68e ARM: implement @llvm.readcyclecounter intrinsic
This implements the @llvm.readcyclecounter intrinsic as the specific
MRC instruction specified in the ARM manuals for CPUs with the Power
Management extensions.

Older CPUs had slightly different methods which may also have to be
implemented eventually, but this should cover all v7 cases.

rdar://problem/13939186

llvm-svn: 182603
2013-05-23 19:11:20 +00:00
Jakob Stoklund Olesen
b4fe10613b Fix PR16110: Handle DBG_VALUE in ConnectedVNInfoEqClasses::Distribute().
Now that the LiveDebugVariables pass is running *after* register
coalescing, the ConnectedVNInfoEqClasses class needs to deal with
DBG_VALUE instructions.

This only comes up when rematerialization during coalescing causes the
remaining live range of a virtual register to separate into two
connected components.

llvm-svn: 182592
2013-05-23 17:02:23 +00:00
Stepan Dyatkovskiy
adb91e7ed9 PR15868 fix.
Introduction:
In case when stack alignment is 8 and GPRs parameter part size is not N*8:
we add padding to GPRs part, so part's last byte must be recovered at
address K*8-1.
We need to do it, since remained (stack) part of parameter starts from
address K*8, and we need to "attach" "GPRs head" without gaps to it:

Stack:
|---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8 bytes...
[ [padding] [GPRs head] ] [ ------ Tail passed via stack  ------ ...

FIX:
Note, once we added padding we need to correct *all* Arg offsets that are going
after padded one. That's why we need this fix: Arg offsets were never corrected
before this patch. See new test-cases included in patch.

We also don't need to insert padding for byval parameters that are stored in GPRs
only. We need pad only last byval parameter and only in case it outsides GPRs
and stack alignment = 8.
Though, stack area, allocated for recovered byval params, must satisfy
"Size mod 8 = 0" restriction.

This patch reduces stack usage for some cases:
We can reduce ArgRegsSaveArea since inner N*4 bytes sized byval params my be
"packed" with alignment 4 in some cases.

llvm-svn: 182237
2013-05-20 08:01:34 +00:00
JF Bastien
cbcaf8db77 Support unaligned load/store on more ARM targets
This patch matches GCC behavior: the code used to only allow unaligned
load/store on ARM for v6+ Darwin, it will now allow unaligned load/store
for v6+ Darwin as well as for v7+ on Linux and NaCl.

The distinction is made because v6 doesn't guarantee support (but LLVM
assumes that Apple controls hardware+kernel and therefore have
conformant v6 CPUs), whereas v7 does provide this guarantee (and
Linux/NaCl behave sanely).

The patch keeps the -arm-strict-align command line option, and adds
-arm-no-strict-align. They behave similarly to GCC's -mstrict-align and
-mnostrict-align.

I originally encountered this discrepancy in FastIsel tests which expect
unaligned load/store generation. Overall this should slightly improve
performance in most cases because of reduced I$ pressure.

llvm-svn: 182175
2013-05-17 23:49:01 +00:00
Arnold Schwaighofer
6df028f783 ARM ISel: Don't create illegal types during LowerMUL
The transformation happening here is that we want to turn a
"mul(ext(X), ext(X))" into a "vmull(X, X)", stripping off the extension. We have
to make sure that X still has a valid vector type - possibly recreate an
extension to a smaller type. In case of a extload of a memory type smaller than
64 bit we used create a ext(load()). The problem with doing this - instead of
recreating an extload - is that an illegal type is exposed.

This patch fixes this by creating extloads instead of ext(load()) sequences.

Fixes PR15970.

radar://13871383

llvm-svn: 181842
2013-05-14 22:33:24 +00:00
Derek Schuff
812ee5bc7c Fix ARM FastISel tests, as a first step to enabling ARM FastISel
ARM FastISel is currently only enabled for iOS non-Thumb1, and I'm working on
enabling it for other targets. As a first step I've fixed some of the tests.
Changes to ARM FastISel tests:
- Different triples don't generate the same relocations (especially
  movw/movt versus constant pool loads). Use a regex to allow either.
- Mangling is different. Use a regex to allow either.
- The reserved registers are sometimes different, so registers get
  allocated in a different order. Capture the names only where this
  occurs.
- Add -verify-machineinstrs to some tests where it works. It doesn't
  work everywhere it should yet.
- Add -fast-isel-abort to many tests that didn't have it before.
- Split out the VarArg test from fast-isel-call.ll into its own
  test. This simplifies test setup because of --check-prefix.

Patch by JF Bastien

llvm-svn: 181801
2013-05-14 16:26:38 +00:00
Lang Hames
9019f5804f Correctly preserve the input chain for potential tailcall nodes whose
return values are bitcasts.

The chain had previously been being clobbered with the entry node to
the dag, which sometimes caused other code in the function to be
erroneously deleted when tailcall optimization kicked in.

<rdar://problem/13827621>

llvm-svn: 181696
2013-05-13 10:21:19 +00:00
Hao Liu
4a62b32731 Fix PR15950 A bug in DAG Combiner about undef mask
llvm-svn: 181682
2013-05-13 02:07:05 +00:00
Evan Cheng
2fcdb53946 Test case for r181160 and r181161. rdar://13782395
llvm-svn: 181162
2013-05-05 18:07:15 +00:00
Stepan Dyatkovskiy
c06cd03f6e For ARM backend, fixed "byval" attribute support.
Now even the small structures could be passed within byval (small enough
to be stored in GPRs).
In regression tests next function prototypes are checked:

PR15293:
  %artz = type { i32 }
  define void @foo(%artz* byval %s)
  define void @foo2(%artz* byval %s, i32 %p, %artz* byval %s2)
foo: "s" stored in R0
foo2: "s" stored in R0, "s2" stored in R2.

Next AAPCS rules are checked:
5.5 Parameters Passing, C.4 and C.5,
"ParamSize" is parameter size in 32bit words:
-- NSAA != 0, NCRN < R4 and NCRN+ParamSize > R4.
   Parameter should be sent to the stack; NCRN := R4.
-- NSAA != 0, and NCRN < R4, NCRN+ParamSize < R4.
   Parameter stored in GPRs; NCRN += ParamSize.

llvm-svn: 181148
2013-05-05 07:48:36 +00:00
Nadav Rotem
d62966b79d Optimize away nop CONCAT_VECTOR nodes.
Optimize CONCAT_VECTOR nodes that merge EXTRACT_SUBVECTOR values that extract from the same vector.

rdar://13402653
PR15866

llvm-svn: 180871
2013-05-01 19:18:51 +00:00
Stephen Lin
84b2d4dbd4 Only pass 'returned' to target-specific lowering code when the value of entire register is guaranteed to be preserved.
llvm-svn: 180825
2013-04-30 22:49:28 +00:00
Adrian Prantl
7482401c1d Temporarily revert "Change the informal convention of DBG_VALUE so that we can express a"
because it breaks some buildbots.

This reverts commit 180816.

llvm-svn: 180819
2013-04-30 22:35:14 +00:00
Adrian Prantl
baf0a98faa Change the informal convention of DBG_VALUE so that we can express a
register-indirect address with an offset of 0.
It used to be that a DBG_VALUE is a register-indirect value if the offset
(operand 1) is nonzero. The new convention is that a DBG_VALUE is
register-indirect if the first operand is a register and the second
operand is an immediate. For plain registers use the combination reg, reg.

rdar://problem/13658587

llvm-svn: 180816
2013-04-30 22:16:46 +00:00
Manman Ren
0b37dd0efc TBAA: remove !tbaa from testing cases if not used.
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.

llvm-svn: 180796
2013-04-30 17:52:57 +00:00
Manman Ren
180923f053 TBAA: remove !tbaa from testing cases if not used.
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.

llvm-svn: 180745
2013-04-29 22:58:55 +00:00
Benjamin Kramer
40a2d53c85 ARM/NEON: Pattern match vector integer abs to vabs.
llvm-svn: 180604
2013-04-26 15:00:57 +00:00
Silviu Baranga
c656dd9bdd Fix constant folding for one lane vector types. Constant folding one lane vector types not returns a vector instead of a scalar.
llvm-svn: 180254
2013-04-25 09:32:33 +00:00
Andrew Trick
73014520d6 MI Sched: eliminate local vreg copies.
For now, we just reschedule instructions that use the copied vregs and
let regalloc elliminate it. I would really like to eliminate the
copies on-the-fly during scheduling, but we need a complete
implementation of repairIntervalsInRange() first.

The general strategy is for the register coalescer to eliminate as
many global copies as possible and shrink live ranges to be
extended-basic-block local. The coalescer should not have to worry
about resolving local copies (e.g. it shouldn't attemp to reorder
instructions). The scheduler is a much better place to deal with local
interference. The coalescer side of this equation needs work.

llvm-svn: 180193
2013-04-24 15:54:43 +00:00
Stephen Lin
44a24c9593 Add more tests for r179925 to verify correct handling of signext/zeroext; strengthen condition check to require actual MVT::i32 virtual register types, just in case (no actual functionality change)
llvm-svn: 180138
2013-04-23 19:42:25 +00:00
Stephen Lin
4e394628e7 Extra paranoid test for r179925 (verify that tail calls are not generated to 'this'-returning constructors of objects with different 'this' pointers than the caller)
llvm-svn: 180032
2013-04-22 17:23:49 +00:00
Stepan Dyatkovskiy
8adaf54376 Fix for 5.5 Parameter Passing --> Stage C:
-- C.4 and C.5 statements, when NSAA is not equal to SP.
 -- C.1.cp statement for VA functions. Note: There are no VFP CPRCs in a
    variadic procedure.

Before this patch "NSAA != 0" means "don't use GPRs anymore ". But there are
some exceptions in AAPCS.
1. For non VA function: allocate all VFP regs for CPRC. When all VFPs are allocated
   CPRCs would be sent to stack, while non CPRCs may be still allocated in GRPs.
2. Check that for VA functions all params uses GPRs and then stack.
   No exceptions, no CPRCs here.

llvm-svn: 180011
2013-04-22 13:06:52 +00:00
Arnaud A. de Grandmaison
087fe129d8 Cleanup: test source files do not need to be executable
llvm-svn: 180003
2013-04-22 08:02:43 +00:00
David Blaikie
9bfe15c313 Revert "Revert "PR14606: debug info imported_module support""
This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll

I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even
though the debug info was clearly invalid on all of them, but this ought to fix
it.

llvm-svn: 179996
2013-04-22 06:12:31 +00:00
Jim Grosbach
3104dcf2ca Legalize vector truncates by parts rather than just splitting.
Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result time, which is then widened and a
scalarization step is introduced which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.

Add a legalization hook for the operands of a TRUNCATE node, which will
be encountered after the result type has been legalized, but if the
operand type is still illegal. If simple splitting of both types
ends up with the result type of each half still being legal, just
do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
we can get more clever with power-two vectors. Specifically,
split the input type, but also widen the result element size, then
concatenate the halves and truncate again.  For example on ARM,
To perform a "%res = v8i8 trunc v8i32 %in" we transform to:
  %inlo = v4i32 extract_subvector %in, 0
  %inhi = v4i32 extract_subvector %in, 4
  %lo16 = v4i16 trunc v4i32 %inlo
  %hi16 = v4i16 trunc v4i32 %inhi
  %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
  %res = v8i8 trunc v8i16 %in16

This allows instruction selection to generate three VMOVN instructions
instead of a sequences of moves, stores and loads.

Update the ARMTargetTransformInfo to take this improved legalization
into account.

Consider the simplified IR:

define <16 x i8> @test1(<16 x i32>* %ap) {
  %a = load <16 x i32>* %ap
  %tmp = trunc <16 x i32> %a to <16 x i8>
  ret <16 x i8> %tmp
}

define <8 x i8> @test2(<8 x i32>* %ap) {
  %a = load <8 x i32>* %ap
  %tmp = trunc <8 x i32> %a to <8 x i8>
  ret <8 x i8> %tmp
}

Previously, we would generate the truly hideous:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #20
	bic	sp, sp, #7
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d24, d25}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	vld1.64	{d18, d19}, [r2:128]
	add	r1, r0, #16
	vmovn.i32	d22, q8
	vld1.64	{d16, d17}, [r1:128]
	vmovn.i32	d20, q9
	vmovn.i32	d18, q12
	vmov.u16	r0, d22[3]
	strb	r0, [sp, #15]
	vmov.u16	r0, d22[2]
	strb	r0, [sp, #14]
	vmov.u16	r0, d22[1]
	strb	r0, [sp, #13]
	vmov.u16	r0, d22[0]
	vmovn.i32	d16, q8
	strb	r0, [sp, #12]
	vmov.u16	r0, d20[3]
	strb	r0, [sp, #11]
	vmov.u16	r0, d20[2]
	strb	r0, [sp, #10]
	vmov.u16	r0, d20[1]
	strb	r0, [sp, #9]
	vmov.u16	r0, d20[0]
	strb	r0, [sp, #8]
	vmov.u16	r0, d18[3]
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	vldmia	sp, {d16, d17}
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	mov	sp, r7
	pop	{r7}
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #12
	bic	sp, sp, #7
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d20, d21}, [r0:128]
	vmovn.i32	d18, q8
	vmov.u16	r0, d18[3]
	vmovn.i32	d16, q10
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	ldm	sp, {r0, r1}
	mov	sp, r7
	pop	{r7}
	bx	lr

Now, however, we generate the much more straightforward:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d20, d21}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	add	r1, r0, #16
	vld1.64	{d18, d19}, [r2:128]
	vld1.64	{d22, d23}, [r1:128]
	vmovn.i32	d17, q8
	vmovn.i32	d16, q9
	vmovn.i32	d18, q10
	vmovn.i32	d19, q11
	vmovn.i16	d17, q8
	vmovn.i16	d16, q9
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d18, d19}, [r0:128]
	vmovn.i32	d16, q8
	vmovn.i32	d17, q9
	vmovn.i16	d16, q8
	vmov	r0, r1, d16
	bx	lr

llvm-svn: 179989
2013-04-21 23:47:41 +00:00
Jim Grosbach
2582e2e539 ARM: Split out cost model vcvt testcases.
They had a separate RUN line already, so may as well be in a separate file.

llvm-svn: 179988
2013-04-21 23:47:37 +00:00
Tim Northover
593f76e08e ARM: fix part of test which actually needed an asserts build
This should fix a buildbot failure that occurred after r179977.

llvm-svn: 179978
2013-04-21 12:20:19 +00:00
Tim Northover
943f2a9234 ARM: Use ldrd/strd to spill 64-bit pairs when available.
This allows common sp-offsets to be part of the instruction and is
probably faster on modern CPUs too.

llvm-svn: 179977
2013-04-21 11:57:07 +00:00
Tim Northover
de5285eb6f ARM: don't add FrameIndex offset for LDMIA (has no immediate)
Previously, when spilling 64-bit paired registers, an LDMIA with both
a FrameIndex and an offset was produced. This kind of instruction
shouldn't exist, and the extra operand was being confused with the
predicate, causing aborts later on.

This removes the invalid 0-offset from the instruction being
produced.

llvm-svn: 179956
2013-04-20 19:31:00 +00:00
Stephen Lin
9d99ba2071 Add CodeGen support for functions that always return arguments via a new parameter attribute 'returned', which is taken advantage of in target-independent tail call opportunity detection and in ARM call lowering (when placed on an integral first parameter).
llvm-svn: 179925
2013-04-20 05:14:40 +00:00
Eric Christopher
88bdd26cc9 Revert "PR14606: debug info imported_module support"
This reverts commit r179836 as it seems to have caused test failures.

llvm-svn: 179840
2013-04-19 07:47:16 +00:00
David Blaikie
46f35f8e56 PR14606: debug info imported_module support
Adding another CU-wide list, in this case of imported_modules (since they
should be relatively rare, it seemed better to add a list where each element
had a "context" value, rather than add a (usually empty) list to every scope).
This takes care of DW_TAG_imported_module, but to fully address PR14606 we'll
need to expand this to cover DW_TAG_imported_declaration too.

llvm-svn: 179836
2013-04-19 06:57:04 +00:00
Hao Liu
ca09ec237c Fix for PR14824, An ARM Load/Store Optimization bug
llvm-svn: 179751
2013-04-18 09:11:08 +00:00
Logan Chien
6f13ff357d Implement ARM unwind opcode assembler.
llvm-svn: 179591
2013-04-16 12:02:21 +00:00
Nico Rieck
1162bb7a1d Replace coff-/elf-dump with llvm-readobj
llvm-svn: 179361
2013-04-12 04:06:46 +00:00
Benjamin Kramer
f15ba24b8d Add missing colons to check lines.
llvm-svn: 179277
2013-04-11 12:41:41 +00:00
Benjamin Kramer
4413e71a39 FileCheckize a bunch of tests.
llvm-svn: 179276
2013-04-11 12:32:23 +00:00
Benjamin Kramer
788f55c7d4 DAGCombiner: Fold a shuffle on CONCAT_VECTORS into a new CONCAT_VECTORS if possible.
This pattern occurs in SROA output due to the way vector arguments are lowered
on ARM.

The testcase from PR15525 now compiles into this, which is better than the code
we got with the old scalarrepl:
_Store:
	ldr.w	r9, [sp]
	vmov	d17, r3, r9
	vmov	d16, r1, r2
	vst1.8	{d16, d17}, [r0]
	bx	lr

Differential Revision: http://llvm-reviews.chandlerc.com/D647

llvm-svn: 179106
2013-04-09 17:41:43 +00:00
Renato Golin
9d05117f2b Reverting 178851 as it broke buildbots
llvm-svn: 178883
2013-04-05 16:39:53 +00:00
Stepan Dyatkovskiy
98f7dac944 Fix for PR14824: "Optimization arm_ldst_opt inserts newly generated instruction vldmia at incorrect position".
Patch introduces memory operands tracking in ARMLoadStoreOpt::LoadStoreMultipleOpti. For each register it keeps the order of load operations as it was before optimization pass.
It is kind of deep improvement of fix proposed by Hao: http://llvm.org/bugs/show_bug.cgi?id=14824#c4
But it also tracks conflicts between different register classes (e.g. D2 and S5).
For more details see:
Bug description: http://llvm.org/bugs/show_bug.cgi?id=14824
LLVM Commits discussion: 
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130311/167936.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130318/168688.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130325/169376.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130401/170238.html

llvm-svn: 178851
2013-04-05 05:52:14 +00:00
Jakob Stoklund Olesen
a53fa8d450 Avoid high-latency false CPSR dependencies even for tMOVSi.
The Thumb2SizeReduction pass avoids false CPSR dependencies, except it
still aggressively creates tMOVi8 instructions because they are so
common.

Avoid creating false CPSR dependencies even for tMOVi8 instructions when
the the CPSR flags are known to have high latency. This allows integer
computation to overlap floating point computations.

Also process blocks in a reverse post-order and propagate high-latency
flags to successors.

<rdar://problem/13468102>

llvm-svn: 178773
2013-04-04 18:25:36 +00:00
Stepan Dyatkovskiy
0562afa331 New-password-test commit.
llvm-svn: 178765
2013-04-04 16:11:18 +00:00
Arnold Schwaighofer
c9508f817a DAGCombiner: Merge store/loads when we have extload/truncstores
This is helps on architectures where i8,i16 are not legal but we have byte, and
short loads/stores. Allowing us to merge copies like the one below on ARM.

copy(char *a, char *b, int n) {
 do {
   int t0 = a[0];
   int t1 = a[1];
   b[0] = t0;
   b[1] = t1;

radar://13536387

llvm-svn: 178546
2013-04-02 15:58:51 +00:00
Benjamin Kramer
279e5cfa9a Remove the old CodePlacementOpt pass.
It was superseded by MachineBlockPlacement and disabled by default since LLVM 3.1.

llvm-svn: 178349
2013-03-29 17:14:24 +00:00
David Blaikie
377434ec76 Revert "Adding DIImportedModules to DIScopes."
This reverts commit 342d92c7a0adeabc9ab00f3f0d88d739fe7da4c7.

Turns out we're going with a different schema design to represent
DW_TAG_imported_modules so we won't need this extra field.

llvm-svn: 178215
2013-03-28 02:44:59 +00:00
Silviu Baranga
bd61af84a7 Enabling the generation of dependency breakers for partial updates on Cortex-A15. Also fixing a small bug in getting the update clearence for VLD1LNd32.
llvm-svn: 178134
2013-03-27 12:38:44 +00:00
David Blaikie
75da1f2b54 Adding DIImportedModules to DIScopes.
This is just the basic groundwork for supporting DW_TAG_imported_module but I
wanted to commit this before pushing support further into Clang or LLVM so that
this rather churny change is isolated from the rest of the work. The major
churn here is obviously adding another field (within the common DIScope prefix)
to all DIScopes (files, classes, namespaces, lexical scopes, etc). This should
be the last big churny change needed for DW_TAG_imported_module/using directive
support/PR14606.

llvm-svn: 178099
2013-03-27 00:07:26 +00:00
David Blaikie
620d0ae359 Reorder the DIFile field in DILexicalBlock to become a prefix common with other DIScopes
llvm-svn: 177703
2013-03-22 05:47:44 +00:00
David Blaikie
648a81f32c Move the DIFile in DISubprogram to the beginning to be a common prefix along with other DIScopes
llvm-svn: 177674
2013-03-21 22:29:36 +00:00
Renato Golin
1fca3efc0b Fix Darwin NEON FP and increase coverage
llvm-svn: 177664
2013-03-21 21:30:49 +00:00
David Blaikie
67c9dc82dc Remove unused field in DISubprogram
llvm-svn: 177661
2013-03-21 20:28:52 +00:00
Renato Golin
0854fd9bef Avoid NEON SP-FP unless unsafe-math or Darwin
NEON is not IEEE 754 compliant, so we should avoid lowering single-precision
floating point operations with NEON unless unsafe-math is turned on. The
equivalent VFP instructions are IEEE 754 compliant, but in some cores they're
much slower, so some archs/OSs might still request it to be on by default,
such as Swift and Darwin.

llvm-svn: 177651
2013-03-21 18:47:47 +00:00
David Blaikie
b2d35852ea Debug info: refactor the first field of DICompileUnit to be a raw file/directory pair
This removes the DICompileUnit special case from DIScope.

llvm-svn: 177610
2013-03-20 23:58:12 +00:00
Nadav Rotem
3a9f2d7de8 When computing the demanded bits of Load SDNodes, make sure that we are looking at the loaded-value operand and not the ptr result (in case of pre-inc loads).
rdar://13348420

llvm-svn: 177596
2013-03-20 22:53:44 +00:00
David Blaikie
78d3bdea74 Debug Info: Swap the 2nd and 3rd parameters to DICompileUnit to match the common DIScope prefix
llvm-svn: 177595
2013-03-20 22:52:54 +00:00
David Blaikie
30abbc718f Remove unused field in DICompileUnit
llvm-svn: 177590
2013-03-20 22:34:33 +00:00
David Blaikie
b9f490e28c Refactor the DIFile (2nd) parameter to DITypes to be an MDNode reference to a raw directory/file pair
This makes DIType's first non-tag parameter the same as DIFile's, allowing them
to both share the common implementation of getFilename/getDirectory in DIScope.

llvm-svn: 177467
2013-03-20 00:26:26 +00:00
David Blaikie
dd2f7e5b88 Move the DIFile operand to DITypes from the 4th operand to the 2nd.
This is another step along the way to making all DIScopes have a common prefix
which can be added to in a general manner to support using directives
(DW_TAG_imported_module).

llvm-svn: 177462
2013-03-19 23:25:22 +00:00
Renato Golin
6d0295565e Improve long vector sext/zext lowering on ARM
The ARM backend currently has poor codegen for long sext/zext
operations, such as v8i8 -> v8i32. This patch addresses this
by performing a custom expansion in ARMISelLowering. It also
adds/changes the cost of such lowering in ARMTTI.

This partially addresses PR14867.

Patch by Pete Couperus

llvm-svn: 177380
2013-03-19 08:15:38 +00:00
Quentin Colombet
bb36556d97 Extend global merge pass to optionally consider global constant variables.
Also add some checks to not merge globals used within landing pad instructions or marked as "used".

llvm-svn: 177331
2013-03-18 22:30:07 +00:00
David Blaikie
928fd30ba7 Remove unnecessary leading comment characters in lit-only file
llvm-svn: 177327
2013-03-18 22:08:16 +00:00
David Blaikie
ae14af22c5 Include '.test' suffix in target specific lit configs that need it
Apparently my final cleanup to use a relevant suffix for these tests before
committing r176831 caused them to stop running since lit wasn't configured to
run tests with that suffix in those directories (why don't we just have a
global suffix list?). So, add the suffix to the relevant directories & fix the
test that has bitrotted over the last week due to my debug info schema changes.

llvm-svn: 177315
2013-03-18 20:31:44 +00:00
David Blaikie
3193e0599a Split out filename & directory from DIFile to start generalizing over DIScopes
This is the first step to making all DIScopes have a common metadata prefix (so
that things (using directives, for example) that can appear in any scope can be
added to that common prefix). DIFile is itself a DIScope so the common prefix
of all DIScopes cannot be a DIFile - instead it's the raw filename/directory
name pair.

llvm-svn: 177239
2013-03-17 21:13:55 +00:00
Arnold Schwaighofer
c83f5b493e ARM cost model: Fix costs for some vector selects
I was too pessimistic in r177105. Vector selects that fit into a legal register
type lower just fine. I was mislead by the code fragment that I was using. The
stores/loads that I saw in those cases came from lowering the conditional off
an address.

Changing the code fragment to:

%T0_3 = type <8 x i18>
%T1_3 = type <8 x i1>

define void @func_blend3(%T0_3* %loadaddr, %T0_3* %loadaddr2,
                         %T1_3* %blend, %T0_3* %storeaddr) {
  %v0 = load %T0_3* %loadaddr
  %v1 = load %T0_3* %loadaddr2
==> FROM:
  ;%c = load %T1_3* %blend
==> TO:
  %c = icmp slt %T0_3 %v0, %v1
==> USE:
  %r = select %T1_3 %c, %T0_3 %v0, %T0_3 %v1

  store %T0_3 %r, %T0_3* %storeaddr
  ret void
}

revealed this mistake.

radar://13403975

llvm-svn: 177170
2013-03-15 18:31:01 +00:00
Silviu Baranga
ff316abe9d Adding an A15 specific optimization pass for interactions between S/D/Q registers. The pass handles all the required transformations pre-regalloc.
llvm-svn: 177169
2013-03-15 18:28:25 +00:00
Benjamin Kramer
2294ee0960 ARM: Fix an old refacto.
Fixes PR15520.

llvm-svn: 177167
2013-03-15 17:27:39 +00:00
Arnold Schwaighofer
63a59d3be8 ARM cost model: Increase cost of some vector selects we do terrible on
By terrible I mean we store/load from the stack.

This matters on PAQp8 in _Z5trainPsS_ii (which is inlined into Mixer::update)
where we decide to vectorize a loop with a VF of 8 resulting in a 25%
degradation on a cortex-a8.

LV: Found an estimated cost of 2 for VF 8 For instruction:   icmp slt i32
LV: Found an estimated cost of 2 for VF 8 For instruction:   select i1, i32, i32

The bug that tracks the CodeGen part is PR14868.

radar://13403975

llvm-svn: 177105
2013-03-14 19:17:02 +00:00
David Blaikie
127d79d573 Remove the unused 4th operand for DIFile debug info metadata
llvm-svn: 176983
2013-03-13 22:05:21 +00:00
Arnold Schwaighofer
3294ca42bf ARM cost model: Add test case to make sure we would notice a change in CodeGen
In r176898 I updated the cost model to reflect the fact that sext/zext/cast on
v8i32 <-> v8i8 and v16i32 <-> v16i8 are expensive.

This test case is so that we make sure to update the cost model once we fix
CodeGen.

llvm-svn: 176955
2013-03-13 16:25:55 +00:00
David Blaikie
3c701e7671 Refactor filename/directory in DICompileUnit into a DIFile
This is the next step towards making the metadata for DIScopes have a common
prefix rather than having to delegate based on their tag type.

llvm-svn: 176913
2013-03-13 00:01:35 +00:00
David Blaikie
98d9ccffb8 Remove unused "isMain" field from DICompileUnit
llvm-svn: 176910
2013-03-12 22:43:04 +00:00
David Blaikie
c37a0a822a Update debug info test cases with empty SplitDebugFilename field.
This could be 'null' or the empty string, DIDescriptor::getStringField
coalesces the two cases anyway so it's just a matter of legible/efficient
representation.

The change in behavior of the DICompileUnit::get* functions could be
subsumed by the full verification check - but ideally that should just be an
assertion if we could front-load the actual debug info metadata failure paths.

llvm-svn: 176907
2013-03-12 22:25:36 +00:00
Jan Wen Voung
74d9647d18 Revert the test moves from 176733. Use "REQUIRES: asserts" instead.
llvm-svn: 176873
2013-03-12 16:27:52 +00:00
David Blaikie
00170a5a62 Upgrading debug info test cases to be (more) compatible with the current debug info format.
These cases were found by further work to remove support for debug info
versioning. Common cleanups (other than changing the version info in the tag
field) included adding the last parameter to compile_units (recently added for
fission support) and other cases of trailing fields in lexical blocks, compile
units, and subprograms.

llvm-svn: 176834
2013-03-11 22:37:40 +00:00
David Blaikie
b8d3b70835 Remove duplicate test contents.
llvm-svn: 176831
2013-03-11 22:10:14 +00:00
Lang Hames
57a19e2cc0 Remove date from test case file name. The PR number provides a unique ID already.
llvm-svn: 176796
2013-03-11 03:49:23 +00:00
Lang Hames
aed76c2308 Don't glue users to extract_subreg when selecting the llvm.arm.ldrexd
intrinsic - it can cause impossible-to-schedule subgraphs to be
introduced.

PR15053.

llvm-svn: 176777
2013-03-09 22:56:09 +00:00
Benjamin Kramer
202c1b8357 Test case hygiene.
llvm-svn: 176772
2013-03-09 18:25:40 +00:00
Jan Wen Voung
2346df4d41 Disable statistics on Release builds and move tests that depend on -stats.
Summary:
Statistics are still available in Release+Asserts (any +Asserts builds),
and stats can also be turned on with LLVM_ENABLE_STATS.

Move some of the FastISel stats that were moved under DEBUG()
back out of DEBUG(), since stats are disabled across the board now.

Many tests depend on grepping "-stats" output.  Move those into
a orig_dir/Stats/. so that they can be marked as unsupported
when building without statistics.

Differential Revision: http://llvm-reviews.chandlerc.com/D486

llvm-svn: 176733
2013-03-08 22:56:31 +00:00
Bill Wendling
8c7ceb2a0e Revert r176154 in favor of a better approach.
Code generation makes some basic assumptions about the IR it's been given. In
particular, if there is only one 'invoke' in the function, then that invoke
won't be going away. However, with the advent of the `llvm.donothing' intrinsic,
those invokes may go away. If all of them go away, the landing pad no longer has
any users. This confuses the back-end, which asserts.

This happens with SjLj exceptions, because that's the model that modifies the IR
based on there being invokes, etc. in the function.

Remove any invokes of `llvm.donothing' during SjLj EH preparation. This will
give us a CFG that the back-end won't be confused about. If all of the invokes
in a function are removed, then the SjLj EH prepare pass won't insert the bogus
code the relies upon the invokes being there.
<rdar://problem/13228754&13316637>

llvm-svn: 176677
2013-03-08 02:21:08 +00:00
David Blaikie
41f29ff448 Upgrade tests to the latest debug info format.
Mostly this is just changing the named metadata (llvm.dbg.sp, llvm.dbg.gv,
llvm.dbg.<func>.lv, etc -> llvm.dbg.cu), adding a few fields to older records
(DIVariable: flags/inlined-at, DICompileUnit: sp/gv/types,
DISubprogram: local variables list)

The tests to update were discovered by a change I'm working on to remove debug
info version support - so any tests using old debug info versions I haven't
updated probably are bad tests or just not actually designed to test debug
info.

llvm-svn: 176671
2013-03-08 00:23:31 +00:00
Chad Rosier
bd6edf2054 [fast-isel] Add support for the expect intrinsic.
rdar://13370942

llvm-svn: 176649
2013-03-07 20:42:17 +00:00
Arnold Schwaighofer
c633bf302e ARM NEON: Fix v2f32 float intrinsics
Mark them as expand, they are not legal as our backend does not match them.

llvm-svn: 176410
2013-03-02 19:38:33 +00:00
Chad Rosier
25ffc43c38 Generate an error message instead of asserting or segfaulting when we can't
handle indirect register inputs.
rdar://13322011

llvm-svn: 176367
2013-03-01 19:12:05 +00:00
Chad Rosier
313ffa4bc0 Add support for using non-pic code for arm and thumb1 when emitting the sjlj
dispatch code.  As far as I can tell the thumb2 code is behaving as expected.
I was able to compile and run the associated test case for both arm and thumb1.
rdar://13066352

llvm-svn: 176363
2013-03-01 18:30:38 +00:00
Jim Grosbach
4d945565f7 ARM: FMA is legal only if VFP4 is available.
rdar://13306723

llvm-svn: 176212
2013-02-27 21:31:12 +00:00
Manman Ren
894d0f9fc3 SelectionDAG: If llvm.donothing has a landingpad, we should clear
CurrentCallSite to avoid an assertion failure:
assert(MMI.getCurrentCallSite() == 0 && "Overlapping call sites!");

rdar://problem/13228754

llvm-svn: 176154
2013-02-27 02:11:57 +00:00
Kristof Beyls
a686678676 Make ARMAsmPrinter generate the correct alignment specifier syntax in instructions.
The Printer will now print instructions with the correct alignment specifier syntax, like
    vld1.8  {d16}, [r0:64]

llvm-svn: 175884
2013-02-22 10:01:33 +00:00
Arnold Schwaighofer
170d2a8c25 DAGCombiner: Fold pointless truncate, bitcast, buildvector series
(2xi32) (truncate ((2xi64) bitcast (buildvector i32 a, i32 x, i32 b, i32 y)))
can be folded into a (2xi32) (buildvector i32 a, i32 b).

Such a DAG would cause uneccessary vdup instructions followed by vmovn
instructions.

We generate this code on ARM NEON for a setcc olt, 2xf64, 2xf64. For example, in
the vectorized version of the code below.

double A[N];
double B[N];

void test_double_compare_to_double() {
  int i;
  for(i=0;i<N;i++)
    A[i] = (double)(A[i] < B[i]);
}

radar://13191881

Fixes bug 15283.

llvm-svn: 175670
2013-02-20 21:33:32 +00:00
Arnold Schwaighofer
3a1cb40149 ARM NEON: Merge a f32 bitcast of a v2i32 extractelt
A vectorized sitfp on doubles will get scalarized to a sequence of an
extract_element of <2 x i32>, a bitcast to f32 and a sitofp.
Due to the the extract_element, and the bitcast we will uneccessarily generate
moves between scalar and vector registers.

The patch fixes this by using a COPY_TO_REGCLASS and a EXTRACT_SUBREG to extract
the element from the vector instead.

radar://13191881

llvm-svn: 175520
2013-02-19 15:27:05 +00:00
Chad Rosier
5babcb4a4b Comment out the rdar number.
llvm-svn: 175460
2013-02-18 21:59:15 +00:00
Chad Rosier
81ced58e28 [fast-isel] Remove an invalid assert.
If the memcpy has an odd length with an alignment of 2, this would incorrectly
assert on the last 1 byte copy.
rdar://13202135

llvm-svn: 175459
2013-02-18 21:46:28 +00:00
Weiming Zhao
c1d92fe42d Re-apply r175088 for bug fix 13622: Add paired register support for
inline asm with 64-bit data on ARM

Update test case to use -mtriple=arm-linux-gnueabi

llvm-svn: 175186
2013-02-14 18:10:21 +00:00
Kristof Beyls
d33917748d Make ARMAsmParser accept the correct alignment specifier syntax in instructions.
The parser will now accept instructions with alignment specifiers written like
    vld1.8  {d16}, [r0:64]
, while also still accepting the incorrect syntax
    vld1.8  {d16}, [r0, :64]

llvm-svn: 175164
2013-02-14 14:46:12 +00:00
Weiming Zhao
1159c1f3f0 temporarily revert the patch due to some conflicts
llvm-svn: 175107
2013-02-13 23:24:40 +00:00
Weiming Zhao
e51d6cf7ae Bug fix 13622: Add paired register support for inline asm with 64-bit data on ARM
llvm-svn: 175088
2013-02-13 21:43:02 +00:00
David Peixotto
0a3102166e PR14992 - Tablegen incorrectly converts ARM tLDMIA_UPD pseudo to tLDMIA
Fixed bug in tablegen conversion when source pseudo instruction has
a different number of arguments than the destination instruction.

llvm-svn: 175066
2013-02-13 19:21:47 +00:00
Arnold Schwaighofer
1ecca5fd68 ARM NEON: Handle v16i8 and v8i16 reverse shuffles
Lower reverse shuffles to a vrev64 and a vext instruction instead of the default
legalization of storing and loading to the stack. This is important because we
generate reverse shuffles in the loop vectorizer when we reverse store to an
array.

  uint8_t Arr[N];
  for (i = 0; i < N; ++i)
    Arr[N - i - 1] = ...

radar://13171760

llvm-svn: 174929
2013-02-12 01:58:32 +00:00
Bob Wilson
d9dfcce74f Revert 172027 and 174336. Remove diagnostics about over-aligned stack objects.
Aside from the question of whether we report a warning or an error when we
can't satisfy a requested stack object alignment, the current implementation
of this is not good.  We're not providing any source location in the diagnostics
and the current warning is not connected to any warning group so you can't
control it.  We could improve the source location somewhat, but we can do a
much better job if this check is implemented in the front-end, so let's do that
instead.  <rdar://problem/13127907>

llvm-svn: 174741
2013-02-08 20:35:15 +00:00
Manman Ren
6edff4edb0 Attempt to recover gdb bot after r174445.
Failure: undefined symbol 'Lline_table_start0'.
Root-cause: we use a symbol subtraction to calculate at_stmt_list, but
the line table entries are not dumped in the assembly.
Fix: use zero instead of a symbol subtraction for Compile Unit 0.

llvm-svn: 174479
2013-02-06 00:59:41 +00:00
Manman Ren
b9bd895a06 Dwarf: support for LTO where a single object file can have multiple line tables
We generate one line table for each compilation unit in the object file.
Reviewed by Eric and Kevin.

rdar://problem/13067005

llvm-svn: 174445
2013-02-05 21:52:47 +00:00
Chad Rosier
c645395ab5 [SjLj Prepare] When demoting an invoke instructions to the stack, if the normal
edge is critical, then split it so we can insert the store.
rdar://13126179

llvm-svn: 174418
2013-02-05 18:23:10 +00:00
Logan Chien
95ad6bcb45 Link .ARM.exidx with corresponding text section.
The sh_link in the ELF section header of .ARM.exidx should
be filled with the section index of the corresponding text
section.

llvm-svn: 174372
2013-02-05 14:18:59 +00:00
Manman Ren
5380cead1a [Stack Alignment] emit warning instead of a hard error
Per discussion in rdar://13127907, we should emit a hard error only if
people write code where the requested alignment is larger than achievable
and assumes the low bits are zeros. A warning should be good enough when
we are not sure if the source code assumes the low bits are zeros.

rdar://13127907

llvm-svn: 174336
2013-02-04 23:45:08 +00:00
Eli Bendersky
54b69d95ac Add a special ARM trap encoding for NaCl.
More details in this thread: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130128/163783.html

Patch by JF Bastien

llvm-svn: 173943
2013-01-30 16:30:19 +00:00
Logan Chien
8155fc64e5 Add missing header and test cases for r173939.
llvm-svn: 173941
2013-01-30 15:48:50 +00:00
Tim Northover
842599c977 Fix 64-bit atomic operations in Thumb mode.
The ARM and Thumb variants of LDREXD and STREXD have different constraints and
take different operands. Previously the code expanding atomic operations didn't
take this into account and asserted in Thumb mode.

llvm-svn: 173780
2013-01-29 09:06:13 +00:00
Silviu Baranga
e74bc9b8dc Fixed the condition codes for the atomic64 min/umin code generation on ARM. If the sutraction of the higher 32 bit parts gives a 0 result, we need to do the store operation.
llvm-svn: 173437
2013-01-25 10:39:49 +00:00
Jakob Stoklund Olesen
b38d4fe021 Remove some register allocation order dependencies.
llvm-svn: 172874
2013-01-19 00:03:32 +00:00
Tim Northover
978c012c2a Simplify writing floating types to assembly.
This removes previous special cases for each floating-point type in favour of a
shared codepath.

llvm-svn: 172189
2013-01-11 10:36:13 +00:00
Manman Ren
99a9fe9415 Stack Alignment: throw error if we can't satisfy the minimal alignment
requirement when creating stack objects in MachineFrameInfo.

Add CreateStackObjectWithMinAlign to throw error when the minimal alignment
can't be achieved and to clamp the alignment when the preferred alignment
can't be achieved. Same is true for CreateVariableSizedObject.
Will not emit error in CreateSpillStackObject or CreateStackObject.

As long as callers of CreateStackObject do not assume the object will be
aligned at the requested alignment, we should not have miscompile since
later optimizations which look at the object's alignment will have the correct
information.

rdar://12713765

llvm-svn: 172027
2013-01-10 01:10:10 +00:00
Andrew Trick
c15e94c204 MIsched: add an ILP window property to machine model.
This was an experimental option, but needs to be defined
per-target. e.g. PPC A2 needs to aggressively hide latency.

I converted some in-order scheduling tests to A2. Hal is working on
more test cases.

llvm-svn: 171946
2013-01-09 03:36:49 +00:00
Tim Northover
337241fa6f Specify complete triple for fp128 tests.
This avoids FileCheck failing over different comment characters in
assembly (notably powerpc64 on Linux vs Darwin) and should fix David's
build-bot.

llvm-svn: 171886
2013-01-08 19:36:33 +00:00
Tim Northover
dde4cda878 Allow the asm printer to print fp128 values properly.
llvm-svn: 171866
2013-01-08 16:56:23 +00:00
Silviu Baranga
0aae14e2d4 Make the MergeGlobals pass correctly handle the address space qualifiers of the global variables. We partition the set of globals by their address space, and apply the same the trasnformation as before to merge them.
llvm-svn: 171730
2013-01-07 12:31:25 +00:00
Bob Wilson
3cae2545eb Revert "Adding support for llvm.arm.neon.vaddl[su].* and"
This reverts r170694.  The operations can be represented in IR without
adding any new intrinsics.

llvm-svn: 170765
2012-12-20 21:09:38 +00:00
Renato Golin
1fbd598908 Adding support for llvm.arm.neon.vaddl[su].* and
llvm.arm.neon.vsub[su].* intrinsics.

Patch by Pete Couperus <pjcoup@gmail.com>

llvm-svn: 170694
2012-12-20 13:52:11 +00:00
Evan Cheng
82e8360cf3 LLVM sdisel normalize bit extraction of the form:
((x & 0xff00) >> 8) << 2
to
 (x >> 6) & 0x3fc

This is general goodness since it folds a left shift into the mask. However,
the trailing zeros in the mask prevents the ARM backend from using the bit
extraction instructions. And worse since the mask materialization may require
an addition instruction. This comes up fairly frequently when the result of 
the bit twiddling is used as memory address. e.g.

 = ptr[(x & 0xFF0000) >> 16]

We want to generate:
  ubfx   r3, r1, #16, #8
  ldr.w  r3, [r0, r3, lsl #2]

vs.
  mov.w  r9, #1020
  and.w  r2, r9, r1, lsr #14
  ldr    r2, [r0, r2]

Add a late ARM specific isel optimization to
ARMDAGToDAGISel::PreprocessISelDAG(). It folds the left shift to the
'base + offset' address computation; change the mask to one which doesn't have
trailing zeros and enable the use of ubfx.

Note the optimization has to be done late since it's target specific and we
don't want to change the DAG normalization. It's also fairly restrictive
as shifter operands are not always free. It's only done for lsh 1 / 2. It's
known to be free on some cpus and they are most common for address
computation.

This is a slight win for blowfish, rijndael, etc.

rdar://12870177

llvm-svn: 170581
2012-12-19 20:16:09 +00:00
Quentin Colombet
104aea3aa3 Disable ARM partial flag dependency optimization at -Oz
To not over constrain the scheduler for ARM in thumb mode, some optimizations  for code size reduction, specific to ARM thumb, are blocked when they add a dependency (like write after read dependency).

Disables this check when code size is the priority, i.e., code is compiled with -Oz.

llvm-svn: 170462
2012-12-18 22:47:16 +00:00
Andrew Trick
57f2ff72d8 MISched: add dependence to ExitSU to model live-out latency.
llvm-svn: 170454
2012-12-18 20:53:01 +00:00
Evan Cheng
86dd733bc8 Some enhancements for memcpy / memset inline expansion.
1. Teach it to use overlapping unaligned load / store to copy / set the trailing
   bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies.
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
   x86 and ARM.
3. When memcpy from a constant string, do *not* replace the load with a constant
   if it's not possible to materialize an integer immediate with a single
   instruction (required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned load / stores more aggressively if target hooks indicates they
   are "fast".
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.
   Also increase the threshold to something reasonable (8 for memset, 4 pairs
   for memcpy).

This significantly improves Dhrystone, up to 50% on ARM iOS devices.

rdar://12760078

llvm-svn: 169791
2012-12-10 23:21:26 +00:00
Tim Northover
2dbbd221f7 Added Mapping Symbols for ARM ELF
Before this patch, when you objdump an LLVM-compiled file, objdump tried to
decode data-in-code sections as if they were code.  This patch adds the missing
Mapping Symbols, as defined by "ELF for the ARM Architecture" (ARM IHI 0044D).

Patch based on work by Greg Fitzgerald.

llvm-svn: 169609
2012-12-07 16:50:23 +00:00
Dmitri Gribenko
d81733f67d Fix typos in CHECK lines.
Patch by Alexander Zinenko.

llvm-svn: 169547
2012-12-06 21:24:47 +00:00
Evan Cheng
f3d075e766 Properly fix the tes.
llvm-svn: 169464
2012-12-06 02:29:29 +00:00