mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-30 15:32:52 +01:00
Commit Graph

7677 Commits

Author SHA1 Message Date
Nico Rieck
cd7bc94022 Revert "Reuse %rax after calling __chkstk on win64"
This reverts commit 01f8d579f7672872324208ac5bc4ac311e81b22e.

llvm-svn: 185781
2013-07-08 01:30:57 +00:00
Nico Rieck
15089c31ec Reuse %rax after calling __chkstk on win64
llvm-svn: 185778
2013-07-07 16:48:39 +00:00
Nico Rieck
f5c31a8456 Proper va_arg/va_copy lowering on win64
llvm-svn: 185763
2013-07-06 18:08:19 +00:00
Benjamin Kramer
5d9e616519 DAGCombiner: Don't drop extension behavior when shrinking a load when unsafe.
ReduceLoadWidth unconditionally drops extensions from loads. Limit it to the
case when all of the bits the extension would otherwise produce are dropped by
the shrink. It would be possible to shrink the load in more cases by merging
the extensions, but this isn't trivial, and it's a very rare case. I left a
TODO for that case.

Fixes PR16551.
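
To illustrate the legality condition in plain C++ (a standalone sketch, not
the DAGCombiner code itself; the function names are invented):

  #include <cassert>
  #include <cstdint>

  // A 32-bit value produced by zero-extending a 16-bit load.  If the user
  // keeps only the low 16 bits, every bit the extension produced (16..31)
  // is dropped, so a plain 16-bit load is equivalent and the load can be
  // shrunk.
  uint16_t shrunken_use(uint16_t loaded) {
    uint32_t extended = loaded;              // models zextload i16 -> i32
    return static_cast<uint16_t>(extended);  // use drops bits 16..31
  }

  // A sign-extending load whose 32-bit result is used in full: the
  // extension bits are observed, so dropping the extension would be wrong.
  uint32_t full_use(int16_t loaded) {
    return static_cast<uint32_t>(static_cast<int32_t>(loaded));
  }

  int main() {
    assert(shrunken_use(0xBEEF) == 0xBEEF);  // safe to shrink
    assert(full_use(-1) == 0xFFFFFFFFu);     // must keep the sext behavior
    return 0;
  }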

llvm-svn: 185755
2013-07-06 14:05:09 +00:00
Tim Northover
696f647891 Stop putting operations after a tail call.
This prevents the emission of DAG-generated vreg definitions after a
tail call by dropping them entirely (on the grounds that nothing could
use them anyway, and they interfere with O0 CodeGen).

llvm-svn: 185754
2013-07-06 12:58:45 +00:00
Arnold Schwaighofer
97cea9b991 ARM: Add a pack pattern for matching arithmetic shift right
llvm-svn: 185714
2013-07-05 18:57:49 +00:00
Arnold Schwaighofer
d5fc888196 ARM: Fix incorrect pack pattern
A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and packs them
in the bottom half of "x". An arithmetic and logic shift are only equivalent in
this context if the shift amount is 16. We would be shifting in ones into the
bottom 16bits instead of zeros if "y" is negative.
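
The difference is easy to reproduce in plain C++ (an illustration of the
arithmetic only, not backend code):

  #include <cstdint>
  #include <cstdio>

  int main() {
    int32_t y = -0x12345678;  // sign bit set, so asr shifts in ones
    for (unsigned n = 16; n <= 31; ++n) {
      uint32_t lsr = static_cast<uint32_t>(y) >> n;  // logical shift
      uint32_t asr = static_cast<uint32_t>(y >> n);  // arithmetic shift
                                                     // (guaranteed in C++20)
      std::printf("n=%u low16 %s\n", n,
                  (lsr & 0xFFFF) == (asr & 0xFFFF) ? "equal" : "differ");
    }
    // Only n == 16 prints "equal": for larger amounts the sign bits reach
    // the bottom half, which is exactly the case the fixed pattern avoids.
    return 0;
  }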

radar://14338767

llvm-svn: 185712
2013-07-05 18:28:39 +00:00
Richard Sandiford
c0fe83c1b6 [SystemZ] Remove no-op MVCs
The stack coloring pass has code to delete stores and loads that become
trivially dead after coloring.  Extend it to cope with single instructions
that copy from one frame index to another.

The testcase happens to show an example of this kicking in at the moment,
but it did occur in real code too.

llvm-svn: 185705
2013-07-05 14:38:48 +00:00
Richard Sandiford
8e414e2aef Fix double renaming bug in stack coloring pass
The stack coloring pass renumbered frame indexes with a loop of the form:

  for each frame index FI
    for each instruction I that uses FI
      for each use of FI in I
        rename FI to FI'

This caused problems if an instruction used two frame indexes F0 and F1
and if F0 was renamed to F1 and F1 to F2.  The first time we visited the
instruction we changed F0 to F1, then we changed both F1s to F2.

In other words, the problem was that SSRefs recorded which instructions
used an FI, but not which MachineOperands and MachineMemOperands within
that instruction used it.

This is easily fixed for MachineOperands by walking the instructions
once and processing each operand in turn.  There's already a loop to
do that for dead store elimination, so it seemed more efficient to
fuse the two at the block level.

MachineMemOperands are more tricky because they can be shared between
instructions.  The patch handles them by making SSRefs an array of
MachineMemOperands rather than an array of MachineInstrs.  We might end
up processing the same MachineMemOperand twice, but that's OK because
we always know from the SSRefs index what the original frame index was.
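
A minimal sketch of the fixed shape (toy types invented for this example;
the real pass works on MachineOperands and MachineMemOperands):

  #include <map>
  #include <vector>

  struct Operand { bool IsFI; int Index; };
  struct Instr { std::vector<Operand> Ops; };

  // One walk over the instructions, each operand rewritten exactly once,
  // so a renaming chain like F0->F1, F1->F2 cannot be applied twice.
  void renameFrameIndexes(std::vector<Instr> &Block,
                          const std::map<int, int> &FIMap) {
    for (Instr &I : Block)
      for (Operand &Op : I.Ops)
        if (Op.IsFI)
          Op.Index = FIMap.at(Op.Index);
  }

  int main() {
    std::map<int, int> FIMap = {{0, 1}, {1, 2}};
    std::vector<Instr> Block = {Instr{{{true, 0}, {true, 1}}}};
    renameFrameIndexes(Block, FIMap);
    // Operands are now F1 and F2; the old per-frame-index loop would have
    // visited the instruction twice and produced F2 and F2.
    return 0;
  }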

llvm-svn: 185703
2013-07-05 14:24:47 +00:00
Richard Sandiford
d84ed5f34a [SystemZ] Enable the use of MVC for frame-to-frame spills
...now that the problem that prompted the restriction has been fixed.

The original spill-02.py was a compromise because at the time I couldn't
find an example that actually failed without the two scavenging slots.
The version included here did.

llvm-svn: 185701
2013-07-05 14:02:01 +00:00
Richard Sandiford
acd92ea1e1 [SystemZ] Allocate a second register scavenging slot
This is another prerequisite for frame-to-frame MVC copies.
I'll commit the patch that makes use of the slot separately.

The downside of trying to test many corner cases with each of the
available addressing modes is that a fair few tests need to account
for the new frame layout.  I do still think it's useful to have all
these tests though, since it's something that wouldn't get much coverage
otherwise.

llvm-svn: 185698
2013-07-05 13:11:52 +00:00
Joey Gouly
76f34b0ffb PR16490: fix a crash in ARMDAGToDAGISel::SelectInlineAsm.
In the SelectionDAG, immediate operands to inline asm are constructed as
two separate operands. The first is a constant of value InlineAsm::Kind_Imm
and the second is a constant with the value of the immediate.

In ARMDAGToDAGISel::SelectInlineAsm, if we reach an operand of Kind_Imm we
should skip over the next operand too.
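
The shape of the fix, as a hedged sketch (the enum and types are invented
here; the real code walks the inline asm operand list in SelectInlineAsm):

  #include <cstddef>
  #include <vector>

  enum Kind { Kind_Reg, Kind_Imm };
  struct AsmOp { Kind K; int Value; };

  void walkOperands(const std::vector<AsmOp> &Ops) {
    for (std::size_t i = 0; i < Ops.size(); ++i) {
      if (Ops[i].K == Kind_Imm) {
        ++i;  // the following operand holds the immediate's value; consume
              // it here instead of treating it as an independent operand
        continue;
      }
      // ... handle other operand kinds ...
    }
  }

  int main() {
    walkOperands({{Kind_Imm, 42}, {Kind_Reg, 0}});
    return 0;
  }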

llvm-svn: 185688
2013-07-05 10:19:40 +00:00
Quentin Colombet
49190aa8d1 [ARM] Improve the instruction selection of vector loads.
In the ARM back-end, build_vector nodes are lowered to a target-specific
build_vector that uses a floating point type.
This works well, unless the inserted bitcasts survive until instruction
selection. In that case, they incur moves between the integer unit and the
floating point unit that may result in inefficient code.

In other words, this conversion may introduce artificial dependencies when the
code leading to the build vector cannot be completed with a floating point type.

In particular, this happens when loads are not aligned.

Before this patch, in that case, the compiler generates general purpose loads
and creates the floating point vector from them, instead of directly using the
vector unit.

The patch uses a vector-friendly sequence of code when the inserted bitcasts to
floating point survive DAGCombine.

This is done by a target-specific DAGCombine that changes the target-specific
build_vector into a sequence of insert_vector_elt nodes that gets rid of the
bitcasts.

<rdar://problem/14170854>

llvm-svn: 185587
2013-07-03 21:42:57 +00:00
Ulrich Weigand
a5490843a1 [PowerPC] Use mtocrf when available
Just as with mfocrf, it is also preferable to use mtocrf instead of
mtcrf when only a single CR register is to be written.

Current code however always emits mtcrf.  This probably does not matter
when using an external assembler, since the GNU assembler will in fact
automatically replace mtcrf with mtocrf when possible.  It does create
inefficient code with the integrated assembler, however.

To fix this, this patch adds MTOCRF/MTOCRF8 instruction patterns and
uses those instead of MTCRF/MTCRF8 everywhere.  Just as done in the
MFOCRF patch committed as 185556, these patterns will be converted
back to MTCRF if MTOCRF is not available on the machine.

As a side effect, this allows us to modify the MTCRF pattern to accept
the full range of mask operands for the benefit of the asm parser.

llvm-svn: 185561
2013-07-03 17:59:07 +00:00
Rafael Espindola
006ee945bb Prefix failing commands with not to make clear they are expected to fail.
llvm-svn: 185554
2013-07-03 16:41:29 +00:00
Rafael Espindola
4af20b9fde Remove another old test.
It was only passing because 'grep andpd' was not finding any andpd, but
we don't fail if part of a pipe fails.

llvm-svn: 185552
2013-07-03 16:35:26 +00:00
Rafael Espindola
6cccbc7b17 Remove test for the old EH system. It doesn't parse anymore.
llvm-svn: 185551
2013-07-03 16:30:01 +00:00
Richard Sandiford
c7495a0fca [SystemZ] Fold more spills
Add a mapping from register-based <INSN>R instructions to the corresponding
memory-based <INSN>.  Use it to cut down on the number of spill loads.

Some instructions extend their operands from smaller fields, so this
required a new TSFlags field to say how big the unextended operand is.

This optimisation doesn't trigger for C(G)R and CL(G)R because in practice
we always combine those instructions with a branch.  Adding a test for every
other case probably seems excessive, but it did catch a missed optimisation
for DSGF (fixed in r185435).
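
The mapping itself can be pictured as a simple opcode table (a sketch with
made-up opcode names; the real table is driven by TSFlags on the SystemZ
instruction definitions):

  #include <map>

  enum Opcode { AR, A, SR, S, INVALID };  // register / memory form pairs

  // Return the memory-based form of a register-based instruction, or
  // INVALID if the instruction has no foldable memory form.
  Opcode memoryFormOf(Opcode RegOp) {
    static const std::map<Opcode, Opcode> RegToMem = {
      {AR, A},  // add register -> add from memory
      {SR, S},  // sub register -> sub from memory
    };
    auto It = RegToMem.find(RegOp);
    return It == RegToMem.end() ? INVALID : It->second;
  }

  int main() {
    // A spill reload feeding AR can then be folded into a single A,
    // removing the separate load.
    return memoryFormOf(AR) == A ? 0 : 1;
  }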

llvm-svn: 185529
2013-07-03 10:10:02 +00:00
Tim Northover
f4b07c69d1 ARM: relax the atomic release barrier to "dmb ishst" on Swift
Swift cores implement store barriers that are stronger than the ARM
specification but weaker than general barriers. They are, in fact, just about
enough to provide the ordering needed for atomic operations with release
semantics.

This patch makes use of that quirk.

llvm-svn: 185527
2013-07-03 09:20:36 +00:00
Richard Osborne
207824e7f8 [XCore] Add ISel pattern for LDWCP
Patch by Robert Lytton.

llvm-svn: 185518
2013-07-03 07:48:50 +00:00
Ulrich Weigand
042ff673b7 [PowerPC] Remove VK_PPC_TLSGD and VK_PPC_TLSLD
The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD
correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD.
This causes some confusion with the asm parser, since VK_PPC_TLSGD
is output as @tlsgd, which is then read back in as VK_TLSGD.

To avoid this confusion, this patch removes the PowerPC-specific
modifiers and uses the generic modifiers throughout.  (The only
drawback is that the generic modifiers are printed in upper case
while the usual convention on PowerPC is to use lower-case modifiers.
But this is just a cosmetic issue.)

llvm-svn: 185476
2013-07-02 21:29:06 +00:00
Richard Sandiford
750b064fa2 [SystemZ] Use DSGFR over DSGR in more cases
Fixes some cases where we were using full 64-bit division for (sdiv i32, i32)
and (sdiv i64, i32).

The "32" in "SDIVREM32" just refers to the second operand.  The first operand
of all *DIVREM*s is a GR128.

llvm-svn: 185435
2013-07-02 15:40:22 +00:00
Richard Sandiford
33deb195f9 [SystemZ] Use MVC to spill loads and stores
Try to use MVC when spilling the destination of a simple load or the source
of a simple store.  As explained in the comment, this doesn't yet handle
the case where the load or store location is also a frame index, since
that could lead to two simultaneous scavenger spills, something the
backend can't handle yet.  spill-02.py tests that this restriction kicks in,
but unfortunately I've not yet found a case that would fail without it.
The volatile trick I used for other scavenger tests doesn't work here
because we can't use MVC for volatile accesses anyway.

I'm planning on relaxing the restriction later, hopefully with a test
that does trigger the problem...

Tests @f8 and @f9 also showed that L(G)RL and ST(G)RL were wrongly
classified as SimpleBDX{Load,Store}.  It wouldn't be easy to test for
that bug separately, which is why I didn't split out the fix as a
separate patch.

llvm-svn: 185434
2013-07-02 15:28:56 +00:00
Richard Osborne
ad449c14dd [XCore] Fix instruction selection for zext, mkmsk instructions.
r182680 replaced CountLeadingZeros_32 with a template function
countLeadingZeros that relies on using the correct argument type to give
the right result. The type passed in the XCore backend after this
revision was incorrect in a couple of places.

Patch by Robert Lytton.
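
The pitfall is easy to reproduce (a standalone reimplementation of the same
template shape, not the LLVM helper itself):

  #include <cassert>
  #include <cstdint>
  #include <limits>

  // Like the LLVM helper, the result depends on the argument's *type*:
  // leading zeros are counted within that type's bit width.
  template <typename T> unsigned countLeadingZeros(T Val) {
    unsigned Width = std::numeric_limits<T>::digits;
    if (!Val)
      return Width;
    unsigned Zeros = 0;
    for (T Mask = T(1) << (Width - 1); !(Val & Mask); Mask >>= 1)
      ++Zeros;
    return Zeros;
  }

  int main() {
    uint32_t x = 1;
    assert(countLeadingZeros(x) == 31);                         // intended
    assert(countLeadingZeros(static_cast<uint64_t>(x)) == 63);  // silently
                                                                // different
    return 0;
  }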

llvm-svn: 185430
2013-07-02 14:46:34 +00:00
Tim Northover
d0b90ac5a7 DAGCombiner: fix use-counting issue when forming zextload
DAGCombiner was counting all uses of a load node when considering whether it's
worth combining into a zextload. Really, it wants to ignore the chain and just
count real uses.
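
In miniature (invented structures; the real check inspects SDNode use lists):

  #include <vector>

  // A load's result list includes a chain edge, which only orders memory
  // operations; it should not count against the use heuristic.
  struct NodeUse { bool IsChainUse; };

  unsigned countRealUses(const std::vector<NodeUse> &Uses) {
    unsigned N = 0;
    for (const NodeUse &U : Uses)
      if (!U.IsChainUse)
        ++N;
    return N;
  }

  int main() {
    // One value use plus one chain use: previously counted as two uses,
    // which wrongly blocked the zextload combine.
    return countRealUses({{false}, {true}}) == 1 ? 0 : 1;
  }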

rdar://problem/13896307

llvm-svn: 185419
2013-07-02 09:58:53 +00:00
Hal Finkel
4eba1c5685 Cleanup PPC Altivec registers in CSR lists and improve VRSAVE handling
There are a couple of (small) related changes here:

1. The printed name of the VRSAVE register has been changed from VRsave to
vrsave in order to match the name accepted by GNU binutils.

2. Support for parsing vrsave has been added to the asm parser (it seems that
there was no test case specifically covering this code, so I've added one).

3. The list of Altivec registers, which was common to all calling conventions,
has been separated out. This allows us to define the base CSR lists, and then
lists for each ABI with Altivec included. This allows SjLj, for example, to
work correctly on non-Altivec targets without using unnatural definitions of
the NoRegs CSR list.

4. VRSAVE is now always reserved on non-Darwin targets and all Altivec
registers are reserved when Altivec is disabled.

With these changes, it is now possible to compile a function containing
__builtin_unwind_init() on Linux/PPC64 with debugging information. This did not
work previously because GNU binutils assumes that all .cfi_offset offsets will
be 8-byte aligned on PPC64 (and errors out if you provide a non-8-byte-aligned
offset). This is not true for the vrsave register. However, because this
register is used only on Darwin, GCC does not bother printing a .cfi_offset
entry for it (even though there is a slot in the stack frame for it as
specified by the ABI). This change allows us to do the same: we will also not
print .cfi_offset directives for vrsave.

llvm-svn: 185409
2013-07-02 03:39:34 +00:00
Bill Schmidt
4e099704b5 Index: test/CodeGen/PowerPC/reloc-align.ll
===================================================================
--- test/CodeGen/PowerPC/reloc-align.ll	(revision 0)
+++ test/CodeGen/PowerPC/reloc-align.ll	(revision 0)
@@ -0,0 +1,34 @@
+; RUN: llc -mcpu=pwr7 -O1 < %s | FileCheck %s
+
+; This test verifies that the peephole optimization of address accesses
+; does not produce a load or store with a relocation that can't be
+; satisfied for a given instruction encoding.  Reduced from a test supplied
+; by Hal Finkel.
+
+target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
+target triple = "powerpc64-unknown-linux-gnu"
+
+%struct.S1 = type { [8 x i8] }
+
+@main.l_1554 = internal global { i8, i8, i8, i8, i8, i8, i8, i8 } { i8 -1, i8 -6, i8 57, i8 62, i8 -48, i8 0, i8 58, i8 80 }, align 1
+
+; Function Attrs: nounwind readonly
+define signext i32 @main() #0 {
+entry:
+  %call = tail call fastcc signext i32 @func_90(%struct.S1* byval bitcast ({ i8, i8, i8, i8, i8, i8, i8, i8 }* @main.l_1554 to %struct.S1*))
+; CHECK-NOT: ld {{[0-9]+}}, main.l_1554@toc@l
+  ret i32 %call
+}
+
+; Function Attrs: nounwind readonly
+define internal fastcc signext i32 @func_90(%struct.S1* byval nocapture %p_91) #0 {
+entry:
+  %0 = bitcast %struct.S1* %p_91 to i64*
+  %bf.load = load i64* %0, align 1
+  %bf.shl = shl i64 %bf.load, 26
+  %bf.ashr = ashr i64 %bf.shl, 54
+  %bf.cast = trunc i64 %bf.ashr to i32
+  ret i32 %bf.cast
+}
+
+attributes #0 = { nounwind readonly "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
Index: lib/Target/PowerPC/PPCAsmPrinter.cpp
===================================================================
--- lib/Target/PowerPC/PPCAsmPrinter.cpp	(revision 185327)
+++ lib/Target/PowerPC/PPCAsmPrinter.cpp	(working copy)
@@ -679,7 +679,26 @@ void PPCAsmPrinter::EmitInstruction(const MachineI
       OutStreamer.EmitRawText(StringRef("\tmsync"));
       return;
     }
+    break;
+  case PPC::LD:
+  case PPC::STD:
+  case PPC::LWA: {
+    // Verify alignment is legal, so we don't create relocations
+    // that can't be supported.
+    // FIXME:  This test is currently disabled for Darwin.  The test
+    // suite shows a handful of test cases that fail this check for
+    // Darwin.  Those need to be investigated before this sanity test
+    // can be enabled for those subtargets.
+    if (!Subtarget.isDarwin()) {
+      unsigned OpNum = (MI->getOpcode() == PPC::STD) ? 2 : 1;
+      const MachineOperand &MO = MI->getOperand(OpNum);
+      if (MO.isGlobal() && MO.getGlobal()->getAlignment() < 4)
+        llvm_unreachable("Global must be word-aligned for LD, STD, LWA!");
+    }
+    // Now process the instruction normally.
+    break;
   }
+  }
 
   LowerPPCMachineInstrToMCInst(MI, TmpInst, *this);
   OutStreamer.EmitInstruction(TmpInst);
Index: lib/Target/PowerPC/PPCISelDAGToDAG.cpp
===================================================================
--- lib/Target/PowerPC/PPCISelDAGToDAG.cpp	(revision 185327)
+++ lib/Target/PowerPC/PPCISelDAGToDAG.cpp	(working copy)
@@ -1530,6 +1530,14 @@ void PPCDAGToDAGISel::PostprocessISelDAG() {
       if (GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(ImmOpnd)) {
         SDLoc dl(GA);
         const GlobalValue *GV = GA->getGlobal();
+        // We can't perform this optimization for data whose alignment
+        // is insufficient for the instruction encoding.
+        if (GV->getAlignment() < 4 &&
+            (StorageOpcode == PPC::LD || StorageOpcode == PPC::STD ||
+             StorageOpcode == PPC::LWA)) {
+          DEBUG(dbgs() << "Rejected this candidate for alignment.\n\n");
+          continue;
+        }
         ImmOpnd = CurDAG->getTargetGlobalAddress(GV, dl, MVT::i64, 0, Flags);
       } else if (ConstantPoolSDNode *CP =
                  dyn_cast<ConstantPoolSDNode>(ImmOpnd)) {

llvm-svn: 185380
2013-07-01 20:52:27 +00:00
Akira Hatanaka
a704ad94d7 [mips] Fix test case to check that mips64 instructions are generated.
llvm-svn: 185371
2013-07-01 20:18:58 +00:00
Anton Korobeynikov
a6a4ed92ee Really fix the test. Sorry for the breakage...
llvm-svn: 185369
2013-07-01 19:51:36 +00:00
Anton Korobeynikov
8e5dfb5ca4 Fix the test which relies on uncommitted change
llvm-svn: 185368
2013-07-01 19:50:31 +00:00
Anton Korobeynikov
f6e50fd093 Add jump tables handling for MSP430.
Patch by Job Noorman!

llvm-svn: 185364
2013-07-01 19:44:44 +00:00
Cameron Zwarich
7af5158621 Fix PR16508.
When phis get lowered, destination copies are inserted using an iterator that is
determined once for all phis in the block, which BuildMI interprets as a request
to insert an instruction directly before the iterator. In the case of a cyclic
phi, source copies may also be inserted directly before this iterator, which can
cause source copies to be inserted before destination copies. The fix is to keep
an iterator to the last phi and then advance it while lowering each phi in order
to insert destination copies directly after the phis.
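
A toy model of the ordering fix (a std::list of strings standing in for a
machine basic block; illustration only):

  #include <iostream>
  #include <iterator>
  #include <list>
  #include <string>

  int main() {
    std::list<std::string> Block = {"PHI a", "PHI b", "body"};

    // Keep one insertion point just past the last phi.  Destination
    // copies are placed there first, so a source copy for a cyclic phi
    // (inserted later at the same point) can never precede them.
    auto AfterPhis = std::next(Block.begin(), 2);
    Block.insert(AfterPhis, "COPY dst_a");
    Block.insert(AfterPhis, "COPY dst_b");
    Block.insert(AfterPhis, "COPY src_b");  // cyclic-phi source copy

    for (const std::string &I : Block)
      std::cout << I << "\n";
    // PHI a, PHI b, COPY dst_a, COPY dst_b, COPY src_b, body
    return 0;
  }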

llvm-svn: 185363
2013-07-01 19:42:46 +00:00
Hal Finkel
0b854b8c04 Don't form PPC CTR loops for over-sized exit counts
Although you can't generate this from C on PPC64, if you have a loop using a
64-bit counter on PPC32 then you can't form a CTR-based loop for it. This had
been causing the PPCCTRLoops pass to assert.

Thanks to Joerg Sonnenberger for providing a test case!

llvm-svn: 185361
2013-07-01 19:34:59 +00:00
Tim Northover
c1348880dc AArch64: correct CodeGen of MOVZ/MOVK combinations.
According to the AArch64 ELF specification (4.6.8), it's the
assembler's responsibility to make sure the shift amount is correct in
relocated MOVZ/MOVK instructions.

This wasn't being obeyed by either the MCJIT CodeGen or RuntimeDyldELF
(which happened to work out well for JIT tests). This commit should
make us compliant in this area.

llvm-svn: 185360
2013-07-01 19:23:10 +00:00
Tim Northover
51fd747de9 Revert r185339 (ARM: relax the atomic release barrier to "dmb ishst")
Turns out I'd misread the architecture reference manual and thought
that was a load/store-store barrier, when it's not.

Thanks for pointing it out, Eli!

llvm-svn: 185356
2013-07-01 18:37:33 +00:00
Tim Northover
25286e5b71 ARM: relax the atomic release barrier to "dmb ishst"
I believe the full "dmb ish" barrier is not required to guarantee release
semantics for atomic operations. The weaker "dmb ishst" prevents previous
operations being reordered with a store executed afterwards, which is enough.

A key point to note (fortunately already correct) is that this barrier alone is
*insufficient* for sequential consistency, no matter how liberally placed.

llvm-svn: 185339
2013-07-01 14:48:48 +00:00
Justin Holewinski
d88c5d6e19 [NVPTX] Add support for module-scope inline asm
Since we were explicitly not calling AsmPrinter::doInitialization,
any module-scope inline asm was not being printed.

llvm-svn: 185336
2013-07-01 13:00:14 +00:00
Justin Holewinski
89a1f98197 [NVPTX] 64-bit ADDC/ADDE are not legal
llvm-svn: 185333
2013-07-01 12:59:04 +00:00
Justin Holewinski
6284a5cea6 [NVPTX] Fix vector loads from parameters that span multiple loads, and fix some typos
llvm-svn: 185332
2013-07-01 12:59:01 +00:00
Justin Holewinski
77ba2f5ed9 [NVPTX] Handle signext/zeroext attributes properly
Fix a case where we were incorrectly sign-extending a value when we should have been zero-extending the value.
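
The underlying difference, in two asserts of plain C++ (illustration only):

  #include <cassert>
  #include <cstdint>

  int main() {
    int8_t v = -1;  // bit pattern 0xFF
    // signext: replicate the sign bit into the upper 24 bits.
    assert(static_cast<uint32_t>(static_cast<int32_t>(v)) == 0xFFFFFFFFu);
    // zeroext: the upper 24 bits must be zero.
    assert(static_cast<uint32_t>(static_cast<uint8_t>(v)) == 0x000000FFu);
    // Using the sign-extending form for a zeroext value hands the callee
    // the wrong 32-bit quantity, which is the bug fixed here.
    return 0;
  }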

Also change some SIGN_EXTEND to ANY_EXTEND because we really don't care and
may have more opportunity to fold subexpressions.

llvm-svn: 185331
2013-07-01 12:58:58 +00:00
Justin Holewinski
46fe052e1f [NVPTX] Add support for native SIGN_EXTEND_INREG where available
llvm-svn: 185330
2013-07-01 12:58:56 +00:00
Justin Holewinski
46d3cad4d8 [NVPTX] Add isel patterns for [reg+offset] form of ldg/ldu.
llvm-svn: 185329
2013-07-01 12:58:52 +00:00
Justin Holewinski
c254f0f839 [NVPTX] Make sure we zero out high-order 24 bits for 8-bit load into 32-bit value
llvm-svn: 185328
2013-07-01 12:58:48 +00:00
Vincent Lejeune
4cef82fa31 R600: Support schedule and packetization of trans-only inst
llvm-svn: 185268
2013-06-29 19:32:43 +00:00
Hal Finkel
055ca2ecc9 PPC: Ignore spill/restore requests for VRSAVE (except on Darwin)
This fixes PR16418, which reports that a function calling
__builtin_unwind_init() asserts. The cause is that this generates a
spill/restore for VRSAVE, and we support that only on Darwin (because VRSAVE is
only really used on Darwin).

The test case checks only that we don't crash. We can add correctness checks
once someone verifies what behavior the function is supposed to have.

llvm-svn: 185235
2013-06-28 22:29:56 +00:00
Hal Finkel
49c072c532 Fix CodeGen/PowerPC/stack-protector.ll on OpenBSD
On OpenBSD, the stack-smash protection transform uses "__guard_local"
and "__stack_smash_handler" instead of "__stack_chk_guard" and
"__stack_chk_fail".  However, CodeGen/PowerPC/stack-protector.ll
doesn't specify a target OS, so on OpenBSD it fails.

Add -mtriple=ppc32-unknown-linux to make the test host-OS agnostic. While
there, convert to FileCheck.

Patch by Matthew Dempsky.

llvm-svn: 185206
2013-06-28 20:18:14 +00:00
Hal Finkel
7f9144ae20 Fix a PPC rlwimi instruction-selection bug
Under certain (evidently rare) circumstances, this code used to convert OR(a,
AND(x, y)) into OR(a, x). This was incorrect.
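
A one-line counterexample in plain arithmetic:

  #include <cassert>

  int main() {
    unsigned a = 0, x = 1, y = 0;
    assert((a | (x & y)) == 0u);  // the original expression
    assert((a | x) == 1u);        // what the bad selection computed
    // The two differ whenever x has a bit set that y masks off and that
    // a does not already cover.
    return 0;
  }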

While there, I've added a comment to the code immediately above.

llvm-svn: 185201
2013-06-28 20:00:07 +00:00
Lang Hames
e8b20c4f7b Add missing case to switch statement - DAGTypeLegalizer::ExpandIntegerResult
should expand ATOMIC_CMP_SWAP nodes the same way that it does for ATOMIC_SWAP.

Since ATOMIC_LOADs on some targets (e.g. older ARM variants) get legalized to
ATOMIC_CMP_SWAPs, the missing case had been causing i64 atomic loads to crash
during isel.
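
The shape of the fix (a schematic switch with an invented enum; the real one
dispatches on ISD opcodes in DAGTypeLegalizer::ExpandIntegerResult):

  enum NodeKind { ATOMIC_SWAP, ATOMIC_CMP_SWAP, OTHER };

  void expandIntegerResult(NodeKind Kind) {
    switch (Kind) {
    case ATOMIC_SWAP:
    case ATOMIC_CMP_SWAP:  // the newly added case: expanded the same way
                           // as ATOMIC_SWAP, so i64 atomic loads that were
                           // legalized to ATOMIC_CMP_SWAP no longer fall
                           // through and crash during isel
      // ... expand to a cmp-swap libcall / pseudo ...
      break;
    default:
      break;
    }
  }

  int main() {
    expandIntegerResult(ATOMIC_CMP_SWAP);
    return 0;
  }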

<rdar://problem/14074644>

llvm-svn: 185186
2013-06-28 18:36:42 +00:00
Justin Holewinski
434a514175 [NVPTX] Add (1.0 / sqrt(x)) => rsqrt(x) generation when allowable by FP flags
llvm-svn: 185178
2013-06-28 17:58:13 +00:00
Justin Holewinski
f17855a9dc [NVPTX] Calling conventions fix
Fix ABI handling for functions returning bool: use st.param.b32 to return
the value and ld.param.b32 in the caller to load it.

llvm-svn: 185177
2013-06-28 17:58:10 +00:00