1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-30 15:32:52 +01:00
Commit Graph

7524 Commits

Author SHA1 Message Date
Benjamin Kramer
e9efb3252f X86: Make shuffle -> shift conversion more aggressive about undefs.
Shuffles that only move an element into position 0 of the vector are common in
the output of the loop vectorizer and often generate suboptimal code when SSSE3
is not available. Lower them to vector shifts if possible.

We still prefer palignr over psrldq because it has higher throughput on
sandybridge.

llvm-svn: 182102
2013-05-17 14:48:34 +00:00
Benjamin Kramer
d1f1dddf9f FileCheckize test.
llvm-svn: 182101
2013-05-17 14:48:25 +00:00
Venkatraman Govindaraju
62af2fad30 [Sparc] Prevent instructions that defines or uses %o7 to be in call's delay slot.
llvm-svn: 182063
2013-05-16 23:53:29 +00:00
Akira Hatanaka
3848727973 [mips] Improve instruction selection for pattern (store (fp_to_sint $src), $ptr).
Previously, three instructions were needed:

trunc.w.s $f0, $f2
mfc1 $4, $f0
sw $4, 0($2)

Now we need only two:

trunc.w.s $f0, $f2
swc1 $f0, 0($2)

llvm-svn: 182053
2013-05-16 21:17:15 +00:00
Rafael Espindola
1a64a52101 More test coverage for addFrameMove.
llvm-svn: 182051
2013-05-16 20:50:56 +00:00
Hal Finkel
4f0a332c93 Fix cpu on test CodeGen/PowerPC/ctrloop-fp64.ll
We need ppc instead of generic to override native features on ppc machines.

llvm-svn: 182049
2013-05-16 20:28:05 +00:00
Rafael Espindola
afa5fae8eb More addFrameMove test coverage.
llvm-svn: 182046
2013-05-16 20:00:45 +00:00
Hal Finkel
80fddc9af7 Create an new preheader in PPCCTRLoops to avoid counter register clobbers
Some IR-level instructions (such as FP <-> i64 conversions) are not chained
w.r.t. the mtctr intrinsic and yet may become function calls that clobber the
counter register. At the selection-DAG level, these might be reordered with the
mtctr intrinsic causing miscompiles. To avoid this situation, if an existing
preheader has instructions that might use the counter register, create a new
preheader for the mtctr intrinsic. This extra block will be remerged with the
old preheader at the MI level, but will prevent unwanted reordering at the
selection-DAG level.

llvm-svn: 182045
2013-05-16 19:58:38 +00:00
Akira Hatanaka
ba455f200e [mips] Test case for r182042. Add comment.
llvm-svn: 182044
2013-05-16 19:57:23 +00:00
Rafael Espindola
a2af7d8def More test coverage for addFrameMove.
llvm-svn: 182041
2013-05-16 19:44:40 +00:00
Benjamin Kramer
d971be891d DAGCombine: Also shrink eq compares where the constant is exactly as large as the smaller type.
if ((x & 255) == 255)

before: movzbl  %al, %eax
        cmpl  $255, %eax

after:  cmpb  $-1, %al
llvm-svn: 182038
2013-05-16 18:47:58 +00:00
Ulrich Weigand
7b22c7a38a [PowerPC] Use true offset value in "memrix" machine operands
This is the second part of the change to always return "true"
offset values from getPreIndexedAddressParts, tackling the
case of "memrix" type operands.

This is about instructions like LD/STD that only have a 14-bit
field to encode immediate offsets, which are implicitly extended
by two zero bits by the machine, so that in effect we can access
16-bit offsets as long as they are a multiple of 4.

The PowerPC back end currently handles such instructions by
carrying the 14-bit value (as it will get encoded into the
actual machine instructions) in the machine operand fields
for such instructions.  This means that those values are
in fact not the true offset, but rather the offset divided
by 4 (and then truncated to an unsigned 14-bit value).

Like in the case fixed in r182012, this makes common code
operations on such offset values not work as expected.
Furthermore, there doesn't really appear to be any strong
reason why we should encode machine operands this way.

This patch therefore changes the encoding of "memrix" type
machine operands to simply contain the "true" offset value
as a signed immediate value, while enforcing the rules that
it must fit in a 16-bit signed value and must also be a
multiple of 4.

This change must be made simultaneously in all places that
access machine operands of this type.  However, just about
all those changes make the code simpler; in many cases we
can now just share the same code for memri and memrix
operands.

llvm-svn: 182032
2013-05-16 17:58:02 +00:00
Hal Finkel
7daa616e35 PPC32 cannot form counter loops around i64 FP conversions
On PPC32, i64 FP conversions are implemented using runtime calls (which clobber
the counter register). These must be excluded.

llvm-svn: 182023
2013-05-16 16:52:41 +00:00
Rafael Espindola
eb3034b91f Add a triple to the test to try to fix the windows bots.
llvm-svn: 182022
2013-05-16 16:48:46 +00:00
Rafael Espindola
d2b872cd1b More addFrameMove test coverage.
llvm-svn: 182021
2013-05-16 16:34:38 +00:00
Bill Schmidt
edb2d94d45 Use new CHECK-DAG support to stabilize CodeGen/PowerPC/recipest.ll
While testing some experimental code to add vector-scalar registers to
PowerPC, I noticed that a couple of independent instructions were
flipped by the scheduler.  The new CHECK-DAG support is perfect for
avoiding this problem.

llvm-svn: 182020
2013-05-16 16:15:18 +00:00
Rafael Espindola
379ba8a86e Add more addFrameMove test coverage.
llvm-svn: 182019
2013-05-16 16:09:54 +00:00
Rafael Espindola
7dd7b264b7 Add more test coverage for addFrameMove.
llvm-svn: 182017
2013-05-16 15:18:50 +00:00
Ulrich Weigand
08228b8354 [PowerPC] Report true displacement value from getPreIndexedAddressParts
DAGCombiner::CombineToPreIndexedLoadStore calls a target routine to
decompose a memory address into a base/offset pair.  It expects the
offset (if constant) to be the true displacement value in order to
perform optional additional optimizations; in particular, to convert
other uses of the original pointer into uses of the new base pointer
after pre-increment.

The PowerPC implementation of getPreIndexedAddressParts, however,
simply calls SelectAddressRegImm, which returns a TargetConstant.
This value is appropriate for encoding into the instruction, but
it is not always usable as true displacement value:

- Its type is always MVT::i32, even on 64-bit, where addresses
  ought to be i64 ... this causes the optimization to simply
  always fail on 64-bit due to this line in DAGCombiner:

      // FIXME: In some cases, we can be smarter about this.
      if (Op1.getValueType() != Offset.getValueType()) {

- Its value is truncated to an unsigned 16-bit value if negative.
  This causes the above opimization to generate wrong code.

This patch fixes both problems by simply returning the true
displacement value (in its original type).  This doesn't
affect any other user of the displacement.

llvm-svn: 182012
2013-05-16 14:53:05 +00:00
Rafael Espindola
16c10c628f Add more addFrameMove test coverage.
llvm-svn: 182011
2013-05-16 14:51:26 +00:00
Rafael Espindola
0daf590030 Extend test to check the .cfi instructions.
I am about to refactor the calls to addFrameMove and some of the ppc
ones were not being tested.

llvm-svn: 182009
2013-05-16 14:30:09 +00:00
Benjamin Kramer
c2c24d044d Relax CHECK-NEXTs a bit to cope with atom's return nop padding.
llvm-svn: 181999
2013-05-16 11:46:50 +00:00
Rafael Espindola
554870d776 Extend test for better coverage.
Without this change nothing was covering this addFrameMove:

// For 64-bit SVR4 when we have spilled CRs, the spill location
// is SP+8, not a frame-relative slot.
if (Subtarget.isSVR4ABI()
    && Subtarget.isPPC64()
    && (PPC::CR2 <= Reg && Reg <= PPC::CR4)) {
  MachineLocation CSDst(PPC::X1, 8);
  MachineLocation CSSrc(PPC::CR2);
  MMI.addFrameMove(Label, CSDst, CSSrc);
  continue;
}

llvm-svn: 181976
2013-05-16 03:48:50 +00:00
Reed Kotler
fb71c30979 Patch number 2 for mips16/32 floating point interoperability stubs.
This creates stubs that help Mips32 functions call Mips16 
functions which have floating point parameters that are normally passed
in floating point registers.
 

llvm-svn: 181972
2013-05-16 02:17:42 +00:00
David Majnemer
71b69a2eb9 Set an explicit triple for this test.
This allows the test to correctly check symbol names.

llvm-svn: 181939
2013-05-15 22:23:21 +00:00
David Majnemer
8ce4c34d1d X86: Remove redundant test instructions
Increase the number of instructions LLVM recognizes as setting the ZF
flag. This allows us to remove test instructions that redundantly
recalculate the flag.

llvm-svn: 181937
2013-05-15 22:03:08 +00:00
Hal Finkel
91bd48d046 Implement PPC counter loops as a late IR-level pass
The old PPCCTRLoops pass, like the Hexagon pass version from which it was
derived, could only handle some simple loops in canonical form. We cannot
directly adapt the new Hexagon hardware loops pass, however, because the
Hexagon pass contains a fundamental assumption that non-constant-trip-count
loops will contain a guard, and this is not always true (the result being that
incorrect negative counts can be generated). With this commit, we replace the
pass with a late IR-level pass which makes use of SE to calculate the
backedge-taken counts and safely generate the loop-count expressions (including
any necessary max() parts). This IR level pass inserts custom intrinsics that
are lowered into the desired decrement-and-branch instructions.

The most fragile part of this new implementation is that interfering uses of
the counter register must be detected on the IR level (and, on PPC, this also
includes any indirect branches in addition to function calls). Also, to make
all of this work, we need a variant of the mtctr instruction that is marked
as having side effects. Without this, machine-code level CSE, DCE, etc.
illegally transform the resulting code. Hopefully, this can be improved
in the future.

This new pass is smaller than the original (and much smaller than the new
Hexagon hardware loops pass), and can handle many additional cases correctly.
In addition, the preheader-creation code has been copied from LoopSimplify, and
after we decide on where it belongs, this code will be refactored so that it
can be explicitly shared (making this implementation even smaller).

The new test-case files ctrloop-{le,lt,ne}.ll have been adapted from tests for
the new Hexagon pass. There are a few classes of loops that this pass does not
transform (noted by FIXMEs in the files), but these deficiencies can be
addressed within the SE infrastructure (thus helping many other passes as well).

llvm-svn: 181927
2013-05-15 21:37:41 +00:00
Derek Schuff
962654973c Fix miscompile due to StackColoring incorrectly merging stack slots (PR15707)
IR optimisation passes can result in a basic block that contains:

  llvm.lifetime.start(%buf)
  ...
  llvm.lifetime.end(%buf)
  ...
  llvm.lifetime.start(%buf)

Before this change, calculateLiveIntervals() was ignoring the second
lifetime.start() and was regarding %buf as being dead from the
lifetime.end() through to the end of the basic block.  This can cause
StackColoring to incorrectly merge %buf with another stack slot.

Fix by removing the incorrect Starts[pos].isValid() and
Finishes[pos].isValid() checks.

Just doing:
      Starts[pos] = Indexes->getMBBStartIdx(MBB);
      Finishes[pos] = Indexes->getMBBEndIdx(MBB);
unconditionally would be enough to fix the bug, but it causes some
test failures due to stack slots not being merged when they were
before.  So, in order to keep the existing tests passing, treat LiveIn
and LiveOut separately rather than approximating the live ranges by
merging LiveIn and LiveOut.

This fixes PR15707.
Patch by Mark Seaborn.

llvm-svn: 181922
2013-05-15 21:15:09 +00:00
Richard Sandiford
1ccc224047 [SystemZ] Make use of SUBTRACT HALFWORD
Thanks to Ulrich Weigand for noticing that this instruction was missing.

llvm-svn: 181893
2013-05-15 15:05:29 +00:00
Arnold Schwaighofer
6df028f783 ARM ISel: Don't create illegal types during LowerMUL
The transformation happening here is that we want to turn a
"mul(ext(X), ext(X))" into a "vmull(X, X)", stripping off the extension. We have
to make sure that X still has a valid vector type - possibly recreate an
extension to a smaller type. In case of a extload of a memory type smaller than
64 bit we used create a ext(load()). The problem with doing this - instead of
recreating an extload - is that an illegal type is exposed.

This patch fixes this by creating extloads instead of ext(load()) sequences.

Fixes PR15970.

radar://13871383

llvm-svn: 181842
2013-05-14 22:33:24 +00:00
Jyotsna Verma
a1b968ec45 Hexagon: Pass to replace tranfer/copy instructions into combine instruction
where possible.

llvm-svn: 181817
2013-05-14 18:54:06 +00:00
Eric Christopher
ad3c788829 Reapply "Subtract isn't commutative, fix this for MMX psub." with
a somewhat randomly chosen cpu that will minimize cpu specific
differences on bots.

llvm-svn: 181814
2013-05-14 18:33:40 +00:00
Eric Christopher
9db4fb70b3 Temporarily revert "Subtract isn't commutative, fix this for MMX psub."
It's causing failures on the atom bot.

llvm-svn: 181812
2013-05-14 18:20:42 +00:00
Eric Christopher
e3d05e6f9d Subtract isn't commutative, fix this for MMX psub.
Patch by Andrea DiBiagio.

llvm-svn: 181809
2013-05-14 17:52:05 +00:00
Jakob Stoklund Olesen
a90977f7ae Recognize sparc64 as an alias for sparcv9 triples.
Patch by Brad Smith!

llvm-svn: 181808
2013-05-14 17:47:27 +00:00
Jyotsna Verma
980fae33f3 Hexagon: Add patterns to generate 'combine' instructions.
llvm-svn: 181805
2013-05-14 17:16:38 +00:00
Jyotsna Verma
86beda7e47 Hexagon: ArePredicatesComplement should not restrict itself to TFRs.
llvm-svn: 181803
2013-05-14 16:36:34 +00:00
Derek Schuff
812ee5bc7c Fix ARM FastISel tests, as a first step to enabling ARM FastISel
ARM FastISel is currently only enabled for iOS non-Thumb1, and I'm working on
enabling it for other targets. As a first step I've fixed some of the tests.
Changes to ARM FastISel tests:
- Different triples don't generate the same relocations (especially
  movw/movt versus constant pool loads). Use a regex to allow either.
- Mangling is different. Use a regex to allow either.
- The reserved registers are sometimes different, so registers get
  allocated in a different order. Capture the names only where this
  occurs.
- Add -verify-machineinstrs to some tests where it works. It doesn't
  work everywhere it should yet.
- Add -fast-isel-abort to many tests that didn't have it before.
- Split out the VarArg test from fast-isel-call.ll into its own
  test. This simplifies test setup because of --check-prefix.

Patch by JF Bastien

llvm-svn: 181801
2013-05-14 16:26:38 +00:00
Bill Schmidt
8c1e12a2ff PPC32: Fix stack collision between FP and CR save areas.
The changes to CR spill handling missed a case for 32-bit PowerPC.
The code in PPCFrameLowering::processFunctionBeforeFrameFinalized()
checks whether CR spill has occurred using a flag in the function
info.  This flag is only set by storeRegToStackSlot and
loadRegFromStackSlot.  spillCalleeSavedRegisters does not call
storeRegToStackSlot, but instead produces MI directly.  Thus we don't
see the CR is spilled when assigning frame offsets, and the CR spill
ends up colliding with some other location (generally the FP slot).

This patch sets the flag in spillCalleeSavedRegisters for PPC32 so
that the CR spill is properly detected and gets its own slot in the
stack frame.

llvm-svn: 181800
2013-05-14 16:08:32 +00:00
Jyotsna Verma
a892e02220 Hexagon: Test case to check if branch probabilities are properly reflected in
the jump instructions in the form of taken/not-taken hint.

llvm-svn: 181799
2013-05-14 15:50:49 +00:00
Michel Danzer
451c4b7507 R600/SI: Add lit test coverage for the remaining patterns added recently
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 181775
2013-05-14 09:53:30 +00:00
Reed Kotler
cade566d36 This is the first of three patches which creates stubs used for
Mips16/32 floating point interoperability.

When Mips16 code calls external functions that would normally have some
of its parameters or return values passed in floating point registers,
it needs (Mips32) helper functions to do this because while in Mips16 mode
there is no ability to access the floating point registers.

In Pic mode, this is done with a set of predefined functions in libc.
This case is already handled in llvm for Mips16.

In static relocation mode, for efficiency reasons, the compiler generates
stubs that the linker will use if it turns out that the external function
is a Mips32 function. (If it's Mips16, then it does not need the helper
stubs).

These stubs are identically named and the linker knows about these tricks
and will not create multiple copies and will delete them if they are not
needed.

llvm-svn: 181753
2013-05-14 02:00:24 +00:00
Akira Hatanaka
cc5a3793e3 StackColoring: don't clear an instruction's mem operand if the underlying
object is a PseudoSourceValue and PseudoSourceValue::isConstant returns true (i.e.,
points to memory that has a constant value).

llvm-svn: 181751
2013-05-14 01:42:44 +00:00
Bill Schmidt
c7fd4630b3 PPC64: Constant initializers with dynamic relocations go in .data.rel.ro.
This fixes warning messages observed in the oggenc application test in
projects/test-suite.  Special handling is needed for the 64-bit
PowerPC SVR4 ABI when a constant is initialized with a pointer to a
function in a shared library.  Because a function address is
implemented as the address of a function descriptor, the use of copy
relocations can lead to problems with initialization.  GNU ld
therefore replaces copy relocations with dynamic relocations to be
resolved by the dynamic linker.  This means the constant cannot reside
in the read-only data section, but instead belongs in .data.rel.ro,
which is designed for constants containing dynamic relocations.

The implementation creates a class PPC64LinuxTargetObjectFile
inheriting from TargetLoweringObjectFileELF, which behaves like its
parent except to place constants of this sort into .data.rel.ro.

The test case is reduced from the oggenc application.

llvm-svn: 181723
2013-05-13 19:34:37 +00:00
Akira Hatanaka
aaa3035d45 [mips] Add option -mno-ldc1-sdc1.
This option is used when the user wants to avoid emitting double precision FP
loads and stores. Double precision FP loads and stores are expanded to single
precision instructions after register allocation.

llvm-svn: 181718
2013-05-13 18:23:35 +00:00
Lang Hames
9019f5804f Correctly preserve the input chain for potential tailcall nodes whose
return values are bitcasts.

The chain had previously been being clobbered with the entry node to
the dag, which sometimes caused other code in the function to be
erroneously deleted when tailcall optimization kicked in.

<rdar://problem/13827621>

llvm-svn: 181696
2013-05-13 10:21:19 +00:00
Hao Liu
4a62b32731 Fix PR15950 A bug in DAG Combiner about undef mask
llvm-svn: 181682
2013-05-13 02:07:05 +00:00
Reed Kotler
1f27950895 Add -mtriple=mipsel-linux-gnu to the test so that the compiler does
not think it can support small data sections.

llvm-svn: 181654
2013-05-11 01:02:20 +00:00
Reed Kotler
88fbecdc6f Checkin in of first of several patches to finish implementation of
mips16/mips32 floating point interoperability. 

This patch fixes returns from mips16 functions so that if the function
was in fact called by a mips32 hard float routine, then values
that would have been returned in floating point registers are so returned.

Mips16 mode has no floating point instructions so there is no way to
load values into floating point registers.

This is needed when returning float, double, single complex, double complex
in the Mips ABI.

Helper functions in libc for mips16 are available to do this.

For efficiency purposes, these helper functions have a different calling
convention from normal Mips calls.

Registers v0,v1,a0,a1 are used to pass parameters instead of
a0,a1,a2,a3.

This is because v0,v1,a0,a1 are the natural registers used to return
floating point values in soft float. These values can then be moved
to the appropriate floating point registers with no extra cost.

The only register that is modified is ra in this call.

The helper functions make sure that the return values are in the floating
point registers that they would be in if soft float was not in effect
(which it is for mips16, though the soft float is implemented using a mips32
library that uses hard float).
 

llvm-svn: 181641
2013-05-10 22:25:39 +00:00
Jyotsna Verma
2dfc0b2d13 Hexagon: Fix switch cases in HexagonVLIWPacketizer.cpp.
llvm-svn: 181624
2013-05-10 20:27:34 +00:00
Benjamin Kramer
9205d91a11 DAGCombiner: Generate a correct constant for vector types when folding (xor (and)) into (and (not)).
PR15948.

llvm-svn: 181597
2013-05-10 14:09:52 +00:00
Tom Stellard
7edf38bf1f R600: Remove AMDILPeeopholeOptimizer and replace optimizations with tablegen patterns
The BFE optimization was the only one we were actually using, and it was
emitting an intrinsic that we don't support.

https://bugs.freedesktop.org/show_bug.cgi?id=64201

Reviewed-by: Christian König <christian.koenig@amd.com>

NOTE: This is a candidate for the 3.3 branch.
llvm-svn: 181580
2013-05-10 02:09:45 +00:00
Tom Stellard
ed363c57b2 R600: Expand SUB for v2i32/v4i32
Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>

NOTE: This is a candidate for the 3.3 branch.
llvm-svn: 181579
2013-05-10 02:09:39 +00:00
Tom Stellard
3ca3d250c6 R600: Expand MUL for v4i32/v2i32
Fixes piglit test for OpenCL builtin mul24, and allows mad24 to run.

Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>

NOTE: This is a candidate for the 3.3 branch.
llvm-svn: 181578
2013-05-10 02:09:34 +00:00
Tom Stellard
56fef8261c R600: Expand SRA for v4i32/v2i32
v2: Add v4i32 test

Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>

NOTE: This is a candidate for the 3.3 branch.
llvm-svn: 181577
2013-05-10 02:09:29 +00:00
Tom Stellard
5d4a5a0d37 R600: Expand vselect for v4i32 and v2i32
v2: Add vselect v4i32 test

Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>

NOTE: This is a candidate for the 3.3 branch.
llvm-svn: 181576
2013-05-10 02:09:24 +00:00
Owen Anderson
e34b034b7d Teach SelectionDAG to constant fold all-constant FMA nodes the same way that it constant folds FADD, FMUL, etc.
llvm-svn: 181555
2013-05-09 22:27:13 +00:00
Bill Wendling
59bb428cc9 Generate a compact unwind encoding in the face of a stack alignment push.
We generate a `push' of a random register (%rax) if the stack needs to be
aligned by the size of that register. However, this could mess up compact unwind
generation. In particular, we want to still generate compact unwind in the
presence of this monstrosity.

Check if the push of of the %rax/%eax register. If it is and it's marked with
the `FrameSetup' flag, then we can generate a compact unwind encoding for the
function only if the push is the last FrameSetup instruction.

llvm-svn: 181540
2013-05-09 20:10:38 +00:00
Jyotsna Verma
15d448ba0c Hexagon: Use relation map for getMatchingCondBranchOpcode() and
getInvertedPredicatedOpcode() functions instead of switch cases.

llvm-svn: 181530
2013-05-09 18:25:44 +00:00
Richard Osborne
7504cb9f47 [XCore] Fix handling of functions where only the LR is spilled.
Previously we only checked if the LR required saving if the frame size was
non zero. However because the caller reserves 1 word for the callee to use
that doesn't count towards our frame size it is possible for the LR to need
saving and for the frame size to be 0.

We didn't hit when the LR needed saving because of a function calls because
the 1 word of stack we must allocate for our callee means the frame size
is always non zero in this case. However we can hit this case if the LR is
clobbered in inline asm.

llvm-svn: 181520
2013-05-09 16:43:42 +00:00
Akira Hatanaka
a1d814e7b8 [mips] Add instruction selection pattern for (seteq $LHS, 0).
llvm-svn: 181459
2013-05-08 19:38:04 +00:00
Bill Schmidt
7f1a2b5212 Fix handling of anonymous aggregate parameters for powerpc*-apple-darwin8.
This fixes bug 15821 similarly to the powerpc64-linux fix for bug 14779.

Patch by David Fang.

llvm-svn: 181449
2013-05-08 17:22:33 +00:00
Michel Danzer
e95e9672c2 R600/SI: Add lit tests for llvm.SI.imageload and llvm.SI.resinfo intrinsics
Adapted from the llvm.SI.sample test.

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 181425
2013-05-08 13:07:29 +00:00
Hal Finkel
1636de8210 PPCInstrInfo::optimizeCompareInstr should not optimize FP compares
The floating-point record forms on PPC don't set the condition register bits
based on a comparison with zero (like the integer record forms do), but rather
based on the exception status bits.

llvm-svn: 181423
2013-05-08 12:16:14 +00:00
Nick Lewycky
a88ff03516 Fix a bug in codegenprep where it was losing track of values OptimizeMemoryInst
by switching to a ValueMap. Patch by Andrea DiBiagio!

llvm-svn: 181397
2013-05-08 09:00:10 +00:00
David Majnemer
fdba035711 DAGCombiner: Simplify inverted bit tests
Fold (xor (and x, y), y) -> (and (not x), y)

This removes an opportunity for a constant to appear twice.

llvm-svn: 181395
2013-05-08 06:44:42 +00:00
Jyotsna Verma
37863260ff Hexagon: Fix Small Data support to handle -G 0 correctly.
llvm-svn: 181344
2013-05-07 19:53:00 +00:00
Jyotsna Verma
5307666fe8 Reverting r181331.
Missing file, HexagonSplitConst32AndConst64.cpp, from lib/Target/Hexagon/CMakeLists.txt.

llvm-svn: 181334
2013-05-07 17:12:35 +00:00
Jyotsna Verma
af0c734e1b Hexagon: Fix Small Data support to handle -G 0 correctly.
llvm-svn: 181331
2013-05-07 16:42:15 +00:00
Bill Wendling
0c1e625af5 Reduce attributes.
llvm-svn: 181245
2013-05-06 20:57:23 +00:00
Tom Stellard
fb8e73f3af R600: Emit config values in register / value pairs
Reviewed-by: Vincent Lejeune <vljn@ovi.com>
Tested-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 181228
2013-05-06 17:50:51 +00:00
Tom Stellard
6c3f6e1b02 R600: Stop emitting the instruction type byte before each instruction
Reviewed-by: Vincent Lejeune <vljn@ovi.com>
Tested-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 181225
2013-05-06 17:50:44 +00:00
Tom Stellard
ebe049fd75 R600: Emit ISA for CALL_FS_* instructions
Reviewed-by: Vincent Lejeune <vljn@ovi.com>
Tested-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 181223
2013-05-06 17:50:26 +00:00
Ulrich Weigand
1431b3c2f5 [SystemZ] Add CodeGen test cases
This adds all CodeGen tests for the SystemZ target.

This version of the patch incorporates feedback from a review by
Sean Silva.  Thanks to all reviewers!

Patch by Richard Sandiford.

llvm-svn: 181204
2013-05-06 16:17:29 +00:00
Michael Kuperstein
74685eff73 Fix slightly too aggressive conact_vector optimization.
(Would sometimes optimize away conacts used to extend a vector with undef values)

llvm-svn: 181186
2013-05-06 08:06:13 +00:00
Bill Wendling
454b468436 Add a testcase that checks that we generate functions with frame
pointers or not depending upon the function attributes.

llvm-svn: 181180
2013-05-06 05:45:57 +00:00
Evan Cheng
2fcdb53946 Test case for r181160 and r181161. rdar://13782395
llvm-svn: 181162
2013-05-05 18:07:15 +00:00
Stepan Dyatkovskiy
c06cd03f6e For ARM backend, fixed "byval" attribute support.
Now even the small structures could be passed within byval (small enough
to be stored in GPRs).
In regression tests next function prototypes are checked:

PR15293:
  %artz = type { i32 }
  define void @foo(%artz* byval %s)
  define void @foo2(%artz* byval %s, i32 %p, %artz* byval %s2)
foo: "s" stored in R0
foo2: "s" stored in R0, "s2" stored in R2.

Next AAPCS rules are checked:
5.5 Parameters Passing, C.4 and C.5,
"ParamSize" is parameter size in 32bit words:
-- NSAA != 0, NCRN < R4 and NCRN+ParamSize > R4.
   Parameter should be sent to the stack; NCRN := R4.
-- NSAA != 0, and NCRN < R4, NCRN+ParamSize < R4.
   Parameter stored in GPRs; NCRN += ParamSize.

llvm-svn: 181148
2013-05-05 07:48:36 +00:00
David Majnemer
d5ba4da281 Remove a recently redundant transform from X86ISelLowering.
X86ISelLowering has support to treat:
(icmp ne (and (xor %flags, -1), (shl 1, flag)), 0)

as if it were actually:
(icmp eq (and %flags, (shl 1, flag)), 0)

However, r179386 has code at the InstCombine level to handle this.

llvm-svn: 181145
2013-05-05 02:00:10 +00:00
Tim Northover
d4f2cac7b6 AArch64: support literal pool access in large memory model.
llvm-svn: 181120
2013-05-04 16:54:07 +00:00
Tim Northover
4ef2500d01 AArch64: support large code model for jump-tables
llvm-svn: 181119
2013-05-04 16:54:00 +00:00
Tim Northover
ece66eacb2 AArch64: implement support for blockaddress in large code model
llvm-svn: 181118
2013-05-04 16:53:53 +00:00
Tim Northover
87645e02c0 AArch64: implement large code model access to global variables.
The MOVZ/MOVK instruction sequence may not be the most efficient (a
literal-pool load could be better) but adding that would require
reinstating the ConstantIslands pass.

For now the sequence is correct, and that's enough. Beware, as of
commit GNU ld does not appear to support the relocations needed for
this. Its primary purpose (for now) will be to support JITed code,
since in that case there is no guarantee of where your code will end
up in memory relative to external symbols it references.

llvm-svn: 181117
2013-05-04 16:53:46 +00:00
Reed Kotler
b89d9a0181 Remove some uneeded pseudos in the presence of the naked function attribute.
llvm-svn: 181072
2013-05-03 23:17:24 +00:00
Akira Hatanaka
5f295bccfc [mips] Split the DSP control register and define one register for each field of
its fields.

This removes false dependencies between DSP instructions which access different
fields of the the control register. Implicit register operands are added to
instructions RDDSP and WRDSP after instruction selection, depending on the
value of the mask operand.

llvm-svn: 181041
2013-05-03 18:37:49 +00:00
Tom Stellard
2165728987 R600: Expand vector or, shl, srl, and xor nodes
llvm-svn: 181035
2013-05-03 17:21:31 +00:00
Tom Stellard
f2fd0109a0 R600: Add pattern for SHA-256 Ma function
This can be optimized using the BFI_INT instruction.

llvm-svn: 181033
2013-05-03 17:21:20 +00:00
Akira Hatanaka
ab6ee99fe0 [mips] Handle reading, writing or copying of ccond field of DSP control
register.

- Define pseudo instructions which store or load ccond field of the DSP
  control register.
- Emit the pseudos in MipsSEInstrInfo::storeRegToStack and loadRegFromStack.
- Expand the pseudos before callee-scan save.
- Emit instructions RDDSP or WRDSP to copy between ccond field and GPRs. 

llvm-svn: 180969
2013-05-02 23:07:05 +00:00
Vincent Lejeune
5d7b2a4aea R600: Signed literals are 64bits wide
llvm-svn: 180960
2013-05-02 21:53:03 +00:00
Vincent Lejeune
3ff31b75b3 R600: If previous bundle is dot4, PV valid chan is always X
llvm-svn: 180959
2013-05-02 21:52:55 +00:00
Vincent Lejeune
97fb65a788 R600: Add a test to check that use_kill is emitted
llvm-svn: 180958
2013-05-02 21:52:46 +00:00
Vincent Lejeune
62da1453e1 R600: Prettier asmPrint of Alu
llvm-svn: 180956
2013-05-02 21:52:30 +00:00
Pranav Bhandarkar
520d26e773 Hexagon - Add peephole optimizations for zero extends.
* lib/Target/Hexagon/HexagonInstrInfo.td: Add patterns to combine a
	sequence of a pair of i32->i64 extensions followed by a "bitwise or"
	into COMBINE_rr.
	* lib/Target/Hexagon/HexagonPeephole.cpp: Copy propagate Rx in the
	instruction Rp = COMBINE_Ir_V4(0, Rx) to the uses of Rp:subreg_loreg.
	* test/CodeGen/Hexagon/union-1.ll: New test.
	* test/CodeGen/Hexagon/combine_ir.ll: Fix test.

llvm-svn: 180946
2013-05-02 20:22:51 +00:00
Manman Ren
0e2c14381d TBAA: remove !tbaa from testing cases if not used.
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.

llvm-svn: 180935
2013-05-02 18:11:35 +00:00
Michael Liao
ca62375d96 Rewrite X86 codegen regression test with FileCheck
llvm-svn: 180910
2013-05-02 06:20:42 +00:00
Michael Liao
ec28235c2a Avoid generating tempfile(s) never used
As DejaGNU is deprecated, it seems pipe-jam issue doesn't exist any more.

llvm-svn: 180892
2013-05-01 22:46:50 +00:00
Bill Wendling
218b457a2f Revert r180737. The companion patch was reverted, and this is not relevant right now.
llvm-svn: 180889
2013-05-01 22:32:08 +00:00
Nadav Rotem
d62966b79d Optimize away nop CONCAT_VECTOR nodes.
Optimize CONCAT_VECTOR nodes that merge EXTRACT_SUBVECTOR values that extract from the same vector.

rdar://13402653
PR15866

llvm-svn: 180871
2013-05-01 19:18:51 +00:00
Rafael Espindola
056abc9e26 Put VMOVPQIto64rr in the VRPDI class.
Patch by Joshua Magee.

llvm-svn: 180842
2013-05-01 13:00:16 +00:00
Michael Liao
cff6c527b8 Forget remove the tempfile argument
llvm-svn: 180838
2013-05-01 05:45:57 +00:00
Michael Liao
349e772e4f More rewrites of x86 codegen regression tests with FileCheck
llvm-svn: 180837
2013-05-01 05:34:30 +00:00
Akira Hatanaka
f5c940dea8 [mips] Fix handling of instructions which copy to/from accumulator registers.
Expand copy instructions between two accumulator registers before callee-saved
scan is done. Handle copies between integer GPR and hi/lo registers in
MipsSEInstrInfo::copyPhysReg. Delete pseudo-copy instructions that are not
needed.

llvm-svn: 180827
2013-04-30 23:22:09 +00:00
Stephen Lin
84b2d4dbd4 Only pass 'returned' to target-specific lowering code when the value of entire register is guaranteed to be preserved.
llvm-svn: 180825
2013-04-30 22:49:28 +00:00
Akira Hatanaka
0bca7f3584 [mips] Instruction selection patterns for DSP-ASE vector select and compare
instructions.

llvm-svn: 180820
2013-04-30 22:37:26 +00:00
Adrian Prantl
7482401c1d Temporarily revert "Change the informal convention of DBG_VALUE so that we can express a"
because it breaks some buildbots.

This reverts commit 180816.

llvm-svn: 180819
2013-04-30 22:35:14 +00:00
Adrian Prantl
baf0a98faa Change the informal convention of DBG_VALUE so that we can express a
register-indirect address with an offset of 0.
It used to be that a DBG_VALUE is a register-indirect value if the offset
(operand 1) is nonzero. The new convention is that a DBG_VALUE is
register-indirect if the first operand is a register and the second
operand is an immediate. For plain registers use the combination reg, reg.

rdar://problem/13658587

llvm-svn: 180816
2013-04-30 22:16:46 +00:00
Hal Finkel
2ac40ae0d3 LocalStackSlotAllocation improvements
First, taking advantage of the fact that the virtual base registers are allocated in order of the local frame offsets, remove the quadratic register-searching behavior. Because of the ordering, we only need to check the last virtual base register created.

Second, store the frame index in the FrameRef structure, and get the frame index and the local offset from this structure at the top of the loop iteration. This allows us to de-nest the loops in insertFrameReferenceRegisters (and I think makes the code cleaner). I also moved the needsFrameBaseReg check into the first loop over instructions so that we don't bother pushing FrameRefs for instructions that don't want a virtual base register anyway.

Lastly, and this is the only functionality change, avoid the creation of single-use virtual base registers. These are currently not useful because, in general, they end up replacing what would be one r+r instruction with an add and a r+i instruction. Committing this removes the XFAIL in CodeGen/PowerPC/2007-09-07-LoadStoreIdxForms.ll

Jim has okayed this off-list.

llvm-svn: 180799
2013-04-30 20:04:37 +00:00
Manman Ren
0b37dd0efc TBAA: remove !tbaa from testing cases if not used.
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.

llvm-svn: 180796
2013-04-30 17:52:57 +00:00
Vincent Lejeune
25352bd54f R600: fix loop-address.ll test
Texture cache is now used when shader type is not specified

llvm-svn: 180785
2013-04-30 12:47:56 +00:00
Michael Liao
c67e0fc9ea Rewrite X86 codegen regression test with FileCheck
llvm-svn: 180776
2013-04-30 07:51:08 +00:00
Vincent Lejeune
29f24e0ce8 R600: use native for alu
llvm-svn: 180761
2013-04-30 00:14:38 +00:00
Vincent Lejeune
e641cd06c9 R600: Add FetchInst bit to instruction defs to denote vertex/tex instructions
v2[Vincent Lejeune]: Split FetchInst into usesTextureCache/usesVertexCache

llvm-svn: 180755
2013-04-30 00:13:39 +00:00
Michael Liao
3db1a24464 Rewrite test in FileCheck instead of grep in X86 codegen
llvm-svn: 180754
2013-04-30 00:13:38 +00:00
Manman Ren
180923f053 TBAA: remove !tbaa from testing cases if not used.
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.

llvm-svn: 180745
2013-04-29 22:58:55 +00:00
Bill Wendling
e24a89874b Duplicate a testcase.
llvm-svn: 180744
2013-04-29 22:42:47 +00:00
Michael Liao
4e03c7690d Rewrite some tests with FileCHeck in X86 codegen
- Revise previous patches of the same purpose by fixing
  *) grep <PA> | not grep <PB> semantically is not the same as
     CHECK: <PA>{{^<PB>.*$}} as the former will check all occurrences of <PA>
     while the later only check the first match. As the result, CHECK needs
     putting in all place where <PA> occurs.
  *) grep <PA> | count <N> needs a final CHECK-NOT of the same pattern.
     (As 'CHECK-<N>' is proposed for discussion, converting 'grep | count <N>'
      where N > 1 is postponed.)

llvm-svn: 180742
2013-04-29 22:41:29 +00:00
Tom Stellard
33e7a52e1c R600: Use correct CF_END instruction on Northern Island GPUs
llvm-svn: 180735
2013-04-29 22:23:58 +00:00
Tom Stellard
a22d2b47f3 R600: Fix encoding of CF_END_{EG, R600} instructions
The EOP bit was not being encoded.

llvm-svn: 180734
2013-04-29 22:23:54 +00:00
Rafael Espindola
ef355ff0c1 Make all darwin ppc stubs local.
This fixes pr15763.
Patch by David Fang.

llvm-svn: 180657
2013-04-27 00:43:16 +00:00
Benjamin Kramer
583d3f2591 Make CHECK lines a bit less strict so they also match code generated for win64.
Hopefully brings the windows buildbots back to life.

llvm-svn: 180630
2013-04-26 21:04:21 +00:00
Tom Stellard
de2ad0a8f1 R600: Initialize AMDGPUMachineFunction::ShaderType to ShaderType::COMPUTE
We need to intialize this to something and since clang does not set
the shader type attribute and clang is used only for compute shaders,
initializing it to COMPUTE seems like the best choice.

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 180620
2013-04-26 18:32:24 +00:00
Benjamin Kramer
40a2d53c85 ARM/NEON: Pattern match vector integer abs to vabs.
llvm-svn: 180604
2013-04-26 15:00:57 +00:00
Benjamin Kramer
11723aa321 X86: Now that we have a canonical form for vector integer abs, match it into pabs.
llvm-svn: 180600
2013-04-26 12:05:21 +00:00
Benjamin Kramer
7ce75fb032 DAGCombiner: Canonicalize vector integer abs in the same way we do it for scalars.
This already helps SSE2 x86 a lot because it lacks an efficient way to
represent a vector select. The long term goal is to enable the backend to match
a canonicalized pattern into a single instruction (e.g. vabs or pabs).

llvm-svn: 180597
2013-04-26 09:19:19 +00:00
Preston Gurd
0547d81fdb This patch adds the X86FixupLEAs pass, which will reduce instruction
latency for certain models of the Intel Atom family, by converting
instructions into their equivalent LEA instructions, when it is both
useful and possible to do so.

llvm-svn: 180573
2013-04-25 20:29:37 +00:00
Chad Rosier
3030f76a0d [inline asm] Add a test case for r180226. The specific issue is that the inline
assembly is requesting a 64-bit register, which is invalid for i386.
rdar://13731657

llvm-svn: 180445
2013-04-25 17:10:21 +00:00
Silviu Baranga
c656dd9bdd Fix constant folding for one lane vector types. Constant folding one lane vector types not returns a vector instead of a scalar.
llvm-svn: 180254
2013-04-25 09:32:33 +00:00
Tom Stellard
48d161332e R600: Use SHT_PROGBITS for the .AMDGPU.config section
The libelf implementation that is distributed here:
http://www.mr511.de/software/english.html
will not parse sections that are marked SHT_NULL.

llvm-svn: 180230
2013-04-24 23:56:14 +00:00
Andrew Trick
73014520d6 MI Sched: eliminate local vreg copies.
For now, we just reschedule instructions that use the copied vregs and
let regalloc elliminate it. I would really like to eliminate the
copies on-the-fly during scheduling, but we need a complete
implementation of repairIntervalsInRange() first.

The general strategy is for the register coalescer to eliminate as
many global copies as possible and shrink live ranges to be
extended-basic-block local. The coalescer should not have to worry
about resolving local copies (e.g. it shouldn't attemp to reorder
instructions). The scheduler is a much better place to deal with local
interference. The coalescer side of this equation needs work.

llvm-svn: 180193
2013-04-24 15:54:43 +00:00
Jyotsna Verma
4a5d195942 Hexagon: Use multiclass for combine and STri[bhwd]_shl_V4 instructions.
llvm-svn: 180145
2013-04-23 21:17:40 +00:00
Stephen Lin
44a24c9593 Add more tests for r179925 to verify correct handling of signext/zeroext; strengthen condition check to require actual MVT::i32 virtual register types, just in case (no actual functionality change)
llvm-svn: 180138
2013-04-23 19:42:25 +00:00
Jyotsna Verma
624f2e0434 Hexagon: Remove assembler mapped instruction definitions.
llvm-svn: 180133
2013-04-23 19:15:55 +00:00
Vincent Lejeune
3666f07489 R600: Use .AMDGPU.config section to emit stacksize
llvm-svn: 180124
2013-04-23 17:34:12 +00:00
Vincent Lejeune
e5ba5f1b14 R600: Add CF_END
llvm-svn: 180123
2013-04-23 17:34:00 +00:00
Jyotsna Verma
20903a7aba Hexagon: Remove duplicate instructions to handle global/immediate values
for absolute/absolute-set addressing modes.

llvm-svn: 180120
2013-04-23 17:11:46 +00:00
Rafael Espindola
f7c86d97a1 Move test from grep to FileCheck.
llvm-svn: 180092
2013-04-23 12:03:27 +00:00
Akira Hatanaka
913bf6194a [mips] In performDSPShiftCombine, check that all elements in the vector are
shifted by the same amount and the shift amount is smaller than the element
size.

llvm-svn: 180039
2013-04-22 19:58:23 +00:00
Stephen Lin
4e394628e7 Extra paranoid test for r179925 (verify that tail calls are not generated to 'this'-returning constructors of objects with different 'this' pointers than the caller)
llvm-svn: 180032
2013-04-22 17:23:49 +00:00
Stepan Dyatkovskiy
8adaf54376 Fix for 5.5 Parameter Passing --> Stage C:
-- C.4 and C.5 statements, when NSAA is not equal to SP.
 -- C.1.cp statement for VA functions. Note: There are no VFP CPRCs in a
    variadic procedure.

Before this patch "NSAA != 0" means "don't use GPRs anymore ". But there are
some exceptions in AAPCS.
1. For non VA function: allocate all VFP regs for CPRC. When all VFPs are allocated
   CPRCs would be sent to stack, while non CPRCs may be still allocated in GRPs.
2. Check that for VA functions all params uses GPRs and then stack.
   No exceptions, no CPRCs here.

llvm-svn: 180011
2013-04-22 13:06:52 +00:00
Arnaud A. de Grandmaison
087fe129d8 Cleanup: test source files do not need to be executable
llvm-svn: 180003
2013-04-22 08:02:43 +00:00
David Blaikie
9bfe15c313 Revert "Revert "PR14606: debug info imported_module support""
This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll

I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even
though the debug info was clearly invalid on all of them, but this ought to fix
it.

llvm-svn: 179996
2013-04-22 06:12:31 +00:00
Jim Grosbach
3104dcf2ca Legalize vector truncates by parts rather than just splitting.
Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result time, which is then widened and a
scalarization step is introduced which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.

Add a legalization hook for the operands of a TRUNCATE node, which will
be encountered after the result type has been legalized, but if the
operand type is still illegal. If simple splitting of both types
ends up with the result type of each half still being legal, just
do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
we can get more clever with power-two vectors. Specifically,
split the input type, but also widen the result element size, then
concatenate the halves and truncate again.  For example on ARM,
To perform a "%res = v8i8 trunc v8i32 %in" we transform to:
  %inlo = v4i32 extract_subvector %in, 0
  %inhi = v4i32 extract_subvector %in, 4
  %lo16 = v4i16 trunc v4i32 %inlo
  %hi16 = v4i16 trunc v4i32 %inhi
  %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
  %res = v8i8 trunc v8i16 %in16

This allows instruction selection to generate three VMOVN instructions
instead of a sequences of moves, stores and loads.

Update the ARMTargetTransformInfo to take this improved legalization
into account.

Consider the simplified IR:

define <16 x i8> @test1(<16 x i32>* %ap) {
  %a = load <16 x i32>* %ap
  %tmp = trunc <16 x i32> %a to <16 x i8>
  ret <16 x i8> %tmp
}

define <8 x i8> @test2(<8 x i32>* %ap) {
  %a = load <8 x i32>* %ap
  %tmp = trunc <8 x i32> %a to <8 x i8>
  ret <8 x i8> %tmp
}

Previously, we would generate the truly hideous:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #20
	bic	sp, sp, #7
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d24, d25}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	vld1.64	{d18, d19}, [r2:128]
	add	r1, r0, #16
	vmovn.i32	d22, q8
	vld1.64	{d16, d17}, [r1:128]
	vmovn.i32	d20, q9
	vmovn.i32	d18, q12
	vmov.u16	r0, d22[3]
	strb	r0, [sp, #15]
	vmov.u16	r0, d22[2]
	strb	r0, [sp, #14]
	vmov.u16	r0, d22[1]
	strb	r0, [sp, #13]
	vmov.u16	r0, d22[0]
	vmovn.i32	d16, q8
	strb	r0, [sp, #12]
	vmov.u16	r0, d20[3]
	strb	r0, [sp, #11]
	vmov.u16	r0, d20[2]
	strb	r0, [sp, #10]
	vmov.u16	r0, d20[1]
	strb	r0, [sp, #9]
	vmov.u16	r0, d20[0]
	strb	r0, [sp, #8]
	vmov.u16	r0, d18[3]
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	vldmia	sp, {d16, d17}
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	mov	sp, r7
	pop	{r7}
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #12
	bic	sp, sp, #7
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d20, d21}, [r0:128]
	vmovn.i32	d18, q8
	vmov.u16	r0, d18[3]
	vmovn.i32	d16, q10
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	ldm	sp, {r0, r1}
	mov	sp, r7
	pop	{r7}
	bx	lr

Now, however, we generate the much more straightforward:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d20, d21}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	add	r1, r0, #16
	vld1.64	{d18, d19}, [r2:128]
	vld1.64	{d22, d23}, [r1:128]
	vmovn.i32	d17, q8
	vmovn.i32	d16, q9
	vmovn.i32	d18, q10
	vmovn.i32	d19, q11
	vmovn.i16	d17, q8
	vmovn.i16	d16, q9
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d18, d19}, [r0:128]
	vmovn.i32	d16, q8
	vmovn.i32	d17, q9
	vmovn.i16	d16, q8
	vmov	r0, r1, d16
	bx	lr

llvm-svn: 179989
2013-04-21 23:47:41 +00:00
Jim Grosbach
2582e2e539 ARM: Split out cost model vcvt testcases.
They had a separate RUN line already, so may as well be in a separate file.

llvm-svn: 179988
2013-04-21 23:47:37 +00:00
Jakob Stoklund Olesen
c9f30e9065 Passing arguments to varags functions under the SPARC v9 ABI.
Arguments after the fixed arguments never use the floating point
registers.

llvm-svn: 179987
2013-04-21 21:36:49 +00:00
Jakob Stoklund Olesen
d8a2b84611 Fix the SETHIimm pattern for 64-bit code.
Don't ignore the high 32 bits of the immediate.

llvm-svn: 179985
2013-04-21 21:18:03 +00:00
Tim Northover
593f76e08e ARM: fix part of test which actually needed an asserts build
This should fix a buildbot failure that occurred after r179977.

llvm-svn: 179978
2013-04-21 12:20:19 +00:00
Tim Northover
943f2a9234 ARM: Use ldrd/strd to spill 64-bit pairs when available.
This allows common sp-offsets to be part of the instruction and is
probably faster on modern CPUs too.

llvm-svn: 179977
2013-04-21 11:57:07 +00:00
Bill Wendling
61eb6957c5 Remove tbaa metadata.
llvm-svn: 179970
2013-04-21 01:38:25 +00:00
Jakob Stoklund Olesen
cbfbef04da Compile varargs functions for SPARCv9.
With a little help from the frontend, it looks like the standard va_*
intrinsics can do the job.

Also clean up an old bitcast hack in LowerVAARG that dealt with
unaligned double loads. Load SDNodes can specify an alignment now.

Still missing: Calling varargs functions with float arguments.

llvm-svn: 179961
2013-04-20 22:49:16 +00:00
Tim Northover
de5285eb6f ARM: don't add FrameIndex offset for LDMIA (has no immediate)
Previously, when spilling 64-bit paired registers, an LDMIA with both
a FrameIndex and an offset was produced. This kind of instruction
shouldn't exist, and the extra operand was being confused with the
predicate, causing aborts later on.

This removes the invalid 0-offset from the instruction being
produced.

llvm-svn: 179956
2013-04-20 19:31:00 +00:00