1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-27 14:02:50 +01:00
Commit Graph

4132 Commits

Author SHA1 Message Date
Hal Finkel
519c3f0279 [PowerPC] Darwin byval arguments are not immutable
On PPC/Darwin, byval arguments occur at fixed stack offsets in the callee's
frame, but are not immutable -- the pointer value is directly available to the
higher-level code as the address of the argument, and the value of the byval
argument can be modified at the IR level.

This is necessary, but not sufficient, to fix PR20280. When PR20280 is fixed in
a follow-up commit, its test case will cover this change.

llvm-svn: 215793
2014-08-16 00:16:29 +00:00
Rafael Espindola
80b0c0c71c Remove HasLEB128.
We already require CFI, so it should be safe to require .leb128 and .uleb128.

llvm-svn: 215712
2014-08-15 14:01:07 +00:00
Benjamin Kramer
9d3803050f PPC: Clean up pointer casting, no functionality change.
Silences GCC's -Wcast-qual.

llvm-svn: 215703
2014-08-15 11:05:45 +00:00
Bill Schmidt
c5f23638d5 [PPC64] Add missing dependency on X2 to LDinto_toc.
The LDinto_toc pattern has been part of 64-bit PowerPC for a long
time, and represents loading from a memory location into the TOC
register (X2).  However, this pattern doesn't explicitly record that
it modifies that register.  This patch adds the missing dependency.

It was very surprising to me that this has never shown up as a problem
in the past, and that we only saw this problem recently in a single
scenario when building a self-hosted clang.  It turns out that in most
cases we have another dependency present that keeps the LDinto_toc
instruction tied in place.  LDinto_toc is used for TOC restore
following a call site, so this is a typical sequence:

   BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
   LDinto_toc 24, %X1
   ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>

Because the LDinto_toc is inserted prior to the ADJCALLSTACKUP, there
is a natural anti-dependency between the two that keeps it in place.

Therefore we don't usually see a problem.  However, in one particular
case, one call is followed immediately by another call, and the second
call requires a parameter that is a TOC-relative address.  This is the
code sequence:

  BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
  LDinto_toc 24, %X1
  ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>
  ADJCALLSTACKDOWN 96, %R1<imp-def>, %R1<imp-use>
  %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
  %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

Note that the back-to-back stack adjustments are the same size!  The
back end is smart enough to recognize this and optimize them away:

  BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
  LDinto_toc 24, %X1
  %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
  %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

Now there is nothing to prevent the ADDIStocHA instruction from moving
ahead of the LDinto_toc instruction, and because of the longest-path
heuristic, this is what happens.

With the accompanying patch, %X2 is represented as an implicit def:

  BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
  LDinto_toc 24, %X1, %X2<imp-def,dead>
  ADJCALLSTACKUP 96, 0, %R1<imp-def,dead>, %R1<imp-use>
  ADJCALLSTACKDOWN 96, %R1<imp-def,dead>, %R1<imp-use>
  %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
  %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

So now when the two stack adjustments are removed, ADDIStocHA is
prevented from being moved above LDinto_toc.

I have not yet created a test case for this, because the original
failure occurs on a relatively large function that needs reduction.
However, this is a fairly serious bug, despite its infrequency, and I
wanted to get this patch onto the list as soon as possible so that it
can be considered for a 3.5 backport.  I'll work on whittling down a
test case.

Have we missed the boat for 3.5 at this point?

Thanks,
Bill

llvm-svn: 215685
2014-08-15 01:25:26 +00:00
Benjamin Kramer
da144ed5a2 Canonicalize header guards into a common format.
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)

Changes made by clang-tidy with minor tweaks.

llvm-svn: 215558
2014-08-13 16:26:38 +00:00
Hal Finkel
97fb1d4d91 [PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsic
This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store
intrinsics. As with the construction of the MachineMemOperands for the
intrinsic calls used for unaligned load/store lowering, the only slight
complication is that we need to represent a larger memory range than the
loaded/stored value-type size (because the address is rounded down to an
aligned address, and we need to conservatively represent the entire possible
range of the actual access). This required adding an extra size field to
TargetLowering::IntrinsicInfo, and this was done in a way that required no
modifications to other targets (the size defaults to the store size of the
provided memory data type).

This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed).

llvm-svn: 215512
2014-08-13 01:15:40 +00:00
Joerg Sonnenberger
4a2da5b0e2 @l and friends adjust their value depending the context used in.
For ori, they are unsigned, for addi, signed. Create a new target
expression type to handle this and evaluate Fixups accordingly.

llvm-svn: 215315
2014-08-10 12:41:50 +00:00
Joerg Sonnenberger
df2012d0aa If available, pass down the Fixup object to EvaluateAsRelocatable.
At least on PowerPC, the interpretation of certain modifiers depends on
the context they appear in.

llvm-svn: 215310
2014-08-10 11:35:12 +00:00
Joerg Sonnenberger
f744f70919 Allow the third argument for the subi family to be an expression.
llvm-svn: 215286
2014-08-09 17:10:26 +00:00
Joerg Sonnenberger
a632bc8df4 Use the full form of dccci and iccci from the early PPC 405 documents,
since the operands are actually used on those cores. Provide aliases for
the only documented case in the newer Power ISA speec.

llvm-svn: 215282
2014-08-09 13:58:31 +00:00
Eric Christopher
3461eab467 Initialize PPC DataLayout based on the Triple only.
llvm-svn: 215281
2014-08-09 04:53:17 +00:00
Eric Christopher
846e6d954c Remove extraneous 64-bit argument to the PPC TargetMachine constructor
and update initialization.

llvm-svn: 215280
2014-08-09 04:38:56 +00:00
Joerg Sonnenberger
71ad99879f Allow large immediates for branch instructions in 32bit mode.
llvm-svn: 215240
2014-08-08 20:57:58 +00:00
Joerg Sonnenberger
8e42fa93a6 Provide an implementation of getNoopForMachoTarget for PPC, otherwise
empty functions will assert in the MC object writer.

llvm-svn: 215238
2014-08-08 19:13:23 +00:00
Joerg Sonnenberger
4df95fac67 Add low-level option for avoiding float stores from va_start until
soft-float is properly supported.

llvm-svn: 215221
2014-08-08 16:46:10 +00:00
Joerg Sonnenberger
ca56042175 Add support for SPE load/store from memory.
llvm-svn: 215220
2014-08-08 16:43:49 +00:00
Eric Christopher
378bc328f0 Temporarily Revert "Nuke the old JIT." as it's not quite ready to
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.

Approved by Jim Grosbach, Lang Hames, Rafael Espindola.

This reverts commits r215111, 215115, 215116, 215117, 215136.

llvm-svn: 215154
2014-08-07 22:02:54 +00:00
Joerg Sonnenberger
857466a57b Add the majority of the remaining SPE instructions.
llvm-svn: 215131
2014-08-07 18:52:39 +00:00
Rafael Espindola
e9ebbe5559 Nuke the old JIT.
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.

Thanks to Lang Hames for making MCJIT a good replacement!

llvm-svn: 215111
2014-08-07 14:21:18 +00:00
Joerg Sonnenberger
c6357b6edc Add mfasr and mtasr
llvm-svn: 215110
2014-08-07 13:35:34 +00:00
Joerg Sonnenberger
57c4e076f0 Add mfrtcu and mfrtcl instructions
llvm-svn: 215109
2014-08-07 13:16:58 +00:00
Joerg Sonnenberger
91c0c415d7 Support mttbl and mttbu mnemonic
llvm-svn: 215108
2014-08-07 13:06:23 +00:00
Joerg Sonnenberger
22c67059fa Add RFID instruction.
llvm-svn: 215105
2014-08-07 12:39:59 +00:00
Joerg Sonnenberger
2b751a0bc6 Fix Itineray class of rfi
llvm-svn: 215104
2014-08-07 12:35:16 +00:00
Joerg Sonnenberger
9980cac0b2 Spell e500 feature in lower case.
llvm-svn: 215103
2014-08-07 12:31:28 +00:00
Joerg Sonnenberger
d344d15c74 Add first bunch of SPE instructions. As they overlap with Altivec, mark
them as parser-only until the disassembler is extended to handle
predicates properly.

llvm-svn: 215102
2014-08-07 12:18:21 +00:00
Eric Christopher
4a1cdb2ba7 Remove the target machine from CCState. Previously it was only used
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.

llvm-svn: 214988
2014-08-06 18:45:26 +00:00
Rafael Espindola
c981388b03 Remove a virtual function from TargetMachine. NFC.
llvm-svn: 214929
2014-08-05 22:10:21 +00:00
Bill Schmidt
8159f9047b [PowerPC] Swap arguments and adjust shift count for vsldoi on little endian
Commits r213915 and r214718 fix recognition of shuffle masks for vmrg*
and vpku*um instructions for a little-endian target, by swapping the
input arguments.  The vsldoi instruction requires similar treatment,
and also needs its shift count adjusted for little endian.

Reviewed by Ulrich Weigand.

This is a bug fix candidate for release 3.5 (and hopefully the last of
those for PowerPC).

llvm-svn: 214923
2014-08-05 20:47:25 +00:00
Joerg Sonnenberger
6eb6423316 Add accessors for the PPC 403 bank registers.
llvm-svn: 214875
2014-08-05 15:45:15 +00:00
Joerg Sonnenberger
cc8f67da81 Accessors for SSR2 and SSR3 on PPC 403.
llvm-svn: 214867
2014-08-05 14:53:05 +00:00
Joerg Sonnenberger
bc6e9b7c12 Add dci/ici instructions for PPC 476 and friends.
llvm-svn: 214864
2014-08-05 14:40:32 +00:00
Joerg Sonnenberger
f726eed55e Add mftblo and mftbhi for PPC 4xx.
llvm-svn: 214863
2014-08-05 14:18:16 +00:00
Joerg Sonnenberger
d97414d887 Add lswi / stswi for assembler use with a warning to not add patterns
for them.

llvm-svn: 214862
2014-08-05 13:34:01 +00:00
Eric Christopher
67c04e77e5 Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.

llvm-svn: 214838
2014-08-05 02:39:49 +00:00
Joerg Sonnenberger
c84fe6e931 Add TCR register access
llvm-svn: 214826
2014-08-04 23:53:42 +00:00
Joerg Sonnenberger
cd34d01e29 Add PPC 603's tlbld and tlbli instructions.
llvm-svn: 214825
2014-08-04 23:49:45 +00:00
Bill Schmidt
5e2cd3791c [PPC64LE] Fix wrong IR for vec_sld and vec_vsldoi
My original LE implementation of the vsldoi instruction, with its
altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect
shufflevector operations in the LLVM IR.  Correct code is generated
because the back end handles the incorrect shufflevector in a
consistent manner.

This patch and a companion patch for Clang correct this problem by
removing the fixup from altivec.h and the corresponding fixup from the
PowerPC back end.  Several test cases are also modified to reflect the
now-correct LLVM IR.

llvm-svn: 214800
2014-08-04 23:21:01 +00:00
Joerg Sonnenberger
b94b28812d Add simplified aliases for access to DCCR, ICCR, DEAR and ESR
llvm-svn: 214797
2014-08-04 22:56:42 +00:00
Joerg Sonnenberger
0f4e0ad62e tlbre / tlbwe / tlbsx / tlbsx. variants for the PPC 4xx CPUs.
llvm-svn: 214784
2014-08-04 21:28:22 +00:00
Eric Christopher
99307e99a2 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

llvm-svn: 214781
2014-08-04 21:25:23 +00:00
Joerg Sonnenberger
c3da10cbbc Recognize mftbl as alias for mftb, for symmetry with mttb.
llvm-svn: 214769
2014-08-04 20:28:34 +00:00
Joerg Sonnenberger
3be1146503 Rename PPCLinuxMCAsmInfo to PPCELFMCAsmInfo to better reflect the
systems it represents.

llvm-svn: 214755
2014-08-04 18:46:13 +00:00
Joerg Sonnenberger
660877af60 Allow .lcomm with alignment on ELF targets.
llvm-svn: 214754
2014-08-04 18:45:10 +00:00
Joerg Sonnenberger
51a8467117 Refactor SPRG instructions.
llvm-svn: 214733
2014-08-04 17:26:15 +00:00
Joerg Sonnenberger
f6049406d6 Add support for m[ft][di]bat[ul] instructions.
llvm-svn: 214731
2014-08-04 17:07:41 +00:00
Joerg Sonnenberger
b8c7e54901 Add features for PPC 4xx and e500/e500mc instructions.
Move the test cases for them into separate files.

llvm-svn: 214724
2014-08-04 15:47:38 +00:00
Ulrich Weigand
5df23aacbe [PowerPC] Swap arguments to vpkuhum/vpkuwum on little-endian
In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl*
by swapping the input arguments.  As it turns out, the exact same fix
is also required for the vpkuhum/vpkuwum patterns.

This fixes another regression in llvmpipe when vector support is
enabled.

Reviewed by Bill Schmidt.

llvm-svn: 214718
2014-08-04 13:53:40 +00:00
Ulrich Weigand
bf94969247 [PowerPC] MULHU/MULHS are not legal for vector types
I ran into some test failures where common code changed vector division
by constant into a multiply-high operation (MULHU).  But these are not
implemented by the back-end, so we failed to recognize the insn.

Fixed by marking MULHU/MULHS as Expand for vector types.

llvm-svn: 214716
2014-08-04 13:27:12 +00:00
Ulrich Weigand
32b8ceb243 [PowerPC] Fix and improve vector comparisons
This patch refactors code generation of vector comparisons.

This fixes a wrong code-gen bug for ISD::SETGE for floating-point types,
and improves generated code for vector comparisons in general.

Specifically, the patch moves all logic deciding how to implement vector
comparisons into getVCmpInst, which gets two extra boolean outputs
indicating to its caller whether its needs to swap the input operands
and/or negate the result of the comparison.  Apart from implementing
these two modifications as directed by getVCmpInst, there is no need
to ever implement vector comparisons in any other manner; in particular,
there is never a need to perform two separate comparisons (e.g. one for
equal and one for greater-than, as code used to do before this patch).

Reviewed by Bill Schmidt.

llvm-svn: 214714
2014-08-04 13:13:57 +00:00
Joerg Sonnenberger
1253396c1a tlbia support
llvm-svn: 214640
2014-08-02 20:16:29 +00:00
Joerg Sonnenberger
e208527e31 mfdcr / mtdcr support
llvm-svn: 214639
2014-08-02 20:00:26 +00:00
Joerg Sonnenberger
ca3cb82714 Don't use additional arguments for dss and friends to satisfy DSS_Form,
when let can do the same thing. Keep the 64bit variants as codegen-only.
While they have a different register class, the encoding is the same for
32bit and 64bit mode. Having both present would otherwise confuse the
disassembler.

llvm-svn: 214636
2014-08-02 15:09:41 +00:00
Eric Christopher
dfc6457da2 Add a non-const subtarget returning function to the target machine
so that we can use it to get the old-style JIT out of the subtarget.

This code should be removed when the old-style JIT is removed
(imminently).

llvm-svn: 214560
2014-08-01 21:18:01 +00:00
Ulrich Weigand
9dae728acd [PowerPC] PR20280 - Slots for byval parameters are not immutable
Found by inspection while looking at PR20280: code would mark slots
in the parameter save area where a byval parameter is passed as
"immutable".  This is not correct since code is allowed to modify
byval parameters in place in the parameter save area.

llvm-svn: 214517
2014-08-01 14:35:58 +00:00
Hal Finkel
f46410e6eb [PowerPC] Generate unaligned vector loads using intrinsics instead of regular loads
Altivec vector loads on PowerPC have an interesting property: They always load
from an aligned address (by rounding down the address actually provided if
necessary). In order to generate an actual unaligned load, you can generate two
load instructions, one with the original address, one offset by one vector
length, and use a special permutation to extract the bytes desired.

When this was originally implemented, I generated these two loads using regular
ISD::LOAD nodes, now marked as aligned. Unfortunately, there is a problem with
this:

The alignment of a load does not contribute to its identity, and SDNodes
are uniqued. So, imagine that we have some unaligned load, L1, that is not
aligned. The routine will create two loads, L1(aligned) and (L1+16)(aligned).
Further imagine that there had already existed a load (L1+16)(unaligned) with
the same chain operand as the load L1. When (L1+16)(aligned) is created as part
of the lowering of L1, this load *is* also the (L1+16)(unaligned) node, just
now marked as aligned (because the new alignment overwrites the old). But the
original users of (L1+16)(unaligned) now get the data intended for the
permutation yielding the data for L1, and (L1+16)(unaligned) no longer exists
to get its own permutation-based expansion. This was PR19991.

A second potential problem has to do with the MMOs on these loads, which can be
used by AA during instruction scheduling to break chain-based dependencies. If
the new "aligned" loads get the MMO from the original unaligned load, this does
not represent the fact that it will load data from below the original address.
Normally, this would not matter, but this load might be combined with another
load pair for a previous vector, and then the dependency on the otherwise-
ignored lower bytes can matter.

To fix both problems, instead of generating the necessary loads using regular
ISD::LOAD instructions, ppc_altivec_lvx intrinsics are used instead. These are
provided with MMOs with a conservative address range.

Unfortunately, I no longer have a failing test case (since PR19991 was
reported, other changes in CodeGen have forced this bug back into hiding it
again). Nevertheless, this should fix the underlying problem.

llvm-svn: 214481
2014-08-01 05:20:41 +00:00
Hal Finkel
3be61a8b81 [PowerPC] Recognize consecutive memory accesses from intrinsics
When generating unaligned vector loads, we need to search for other loads or
stores nearby offset by one vector width. If we find one, then we know that we
can safely generate another aligned load at that address. Otherwise, we must
generate the next load using an offset of the vector width minus one byte (so
we don't read off the end of the allocation if the base unaligned address
happened to be aligned at runtime). We had previously done this using only
other vector loads and stores, but did not consider the PowerPC-specific vector
load/store intrinsics. Now we'll also consider vector intrinsics. By itself,
this change is a feature enhancement, but is a necessary step toward fixing the
underlying problem behind PR19991.

llvm-svn: 214469
2014-08-01 01:02:01 +00:00
Louis Gerbarg
8048e52537 Make sure no loads resulting from load->switch DAGCombine are marked invariant
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.

This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.

llvm-svn: 214449
2014-07-31 21:45:05 +00:00
Joerg Sonnenberger
a5a8f9d13c Add mtpid/mfpid for BookE.
llvm-svn: 214363
2014-07-30 23:59:11 +00:00
Joerg Sonnenberger
3845a9a9b8 Refactor TLBIVAX and add tlbsx.
llvm-svn: 214354
2014-07-30 22:51:15 +00:00
Joerg Sonnenberger
acae7c8f64 Add rfdi and rfmci from the e500/e500mc ISA.
llvm-svn: 214339
2014-07-30 21:09:03 +00:00
Joerg Sonnenberger
e81540b210 Add BookE's tlbre, tlbwe and tlbivax instructions.
llvm-svn: 214332
2014-07-30 20:44:04 +00:00
Joerg Sonnenberger
270c5a66fb Add BookE's wrtee and wrteei instructions.
llvm-svn: 214297
2014-07-30 10:32:51 +00:00
Joerg Sonnenberger
577206124d SPRG 0 to 3 are valid outside BookE, so move them to the normal test
file. Add support for accessing SPRG 4 to 7 on BookE.

llvm-svn: 214295
2014-07-30 09:24:37 +00:00
Joerg Sonnenberger
db7b2c7644 Add rfci instruction.
llvm-svn: 214256
2014-07-29 23:45:20 +00:00
Joerg Sonnenberger
abd973f309 mbar without argument is equivalent to mbar 0.
llvm-svn: 214250
2014-07-29 23:31:27 +00:00
Joerg Sonnenberger
afb8bfcb47 Recognize BookE's mbar instruction.
llvm-svn: 214244
2014-07-29 23:16:31 +00:00
Joerg Sonnenberger
5dd0828a9c Fix typo in alias: DSIR -> DSISR
llvm-svn: 214238
2014-07-29 22:42:44 +00:00
Joerg Sonnenberger
d52d4c80b5 Support move to/from segment register.
llvm-svn: 214234
2014-07-29 22:21:57 +00:00
Joerg Sonnenberger
f3709585f9 Add a number of aliases for SPR access.
llvm-svn: 214196
2014-07-29 18:55:43 +00:00
Joerg Sonnenberger
c3bec0cdc8 Add rfi instruction. Based on feedback by Ulrich Weigand.
llvm-svn: 214181
2014-07-29 15:49:09 +00:00
Matt Arsenault
55d94a2290 Fix typos / grammar.
llvm-svn: 214147
2014-07-29 00:02:40 +00:00
Ulrich Weigand
37cf88e787 [PowerPC] Support ELFv1/ELFv2 ABI selection via features
While LLVM now supports both ELFv1 and ELFv2 ABIs, their use is currently
hard-coded via the target triple: powerpc64-linux is always ELFv1, while
powerpc64le-linux is always ELFv2.

These are of course the most common scenarios, but in principle it is
possible to support the ELFv2 ABI on big-endian or the ELFv1 ABI on
little-endian systems (and GCC does support that), and there are some
special use cases for that (e.g. certain Linux kernel versions could
only be built using ELFv1 on LE).

This patch implements the LLVM side of supporting this.  As precedent
on other platforms suggests, ABI options are passed to the back-end as
features.  Thus, this patch implements two features "elfv1" and "elfv2"
that select the desired ABI if present.  (If not, the LLVM uses the
same default rules as now.)

llvm-svn: 214072
2014-07-28 13:09:28 +00:00
Matt Arsenault
76c7b7a591 Add alignment value to allowsUnalignedMemoryAccess
Rename to allowsMisalignedMemoryAccess.

On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.

llvm-svn: 214055
2014-07-27 17:46:40 +00:00
Hal Finkel
ba36f6399d [PowerPC] Support TLS on PPC32/ELF
Patch by Justin Hibbits!

llvm-svn: 213960
2014-07-25 17:47:22 +00:00
Bill Schmidt
1861624c2e [PATCH][PPC64LE] Correct little-endian usage of vmrgh* and vmrgl*.
Because the PowerPC vmrgh* and vmrgl* instructions have a built-in
big-endian bias, it is necessary to swap their inputs in little-endian
mode when using them to implement a vector shuffle.  This was
previously missed in the vector LE implementation.

There was already logic to distinguish between unary and "normal"
vmrg* vector shuffles, so this patch extends that logic to use a third
option:  "swapped" vmrg* vector shuffles that are used for little
endian in place of the "normal" ones.

I've updated the vec-shuffle-le.ll test to check for the expected
register ordering on the generated instructions.

This bug was discovered when testing the LE and ELFv2 patches for
safety if they were backported to 3.4.  A different vectorization
decision was made in 3.4 than on mainline trunk, and that exposed the
problem.  I've verified this fix takes care of that issue.

llvm-svn: 213915
2014-07-25 01:55:55 +00:00
Joerg Sonnenberger
97f682d3e2 Don't use 128bit functions on PPC32.
llvm-svn: 213899
2014-07-24 22:20:10 +00:00
NAKAMURA Takumi
509350ce96 Prune dependency to MC from each target disassembler.
llvm-svn: 213856
2014-07-24 11:45:11 +00:00
NAKAMURA Takumi
dbff653a5a Update library dependencies.
llvm-svn: 213832
2014-07-24 02:10:42 +00:00
Ulrich Weigand
eb914f2256 [PowerPC] ELFv2 aggregate passing support
This patch adds infrastructure support for passing array types
directly.  These can be used by the front-end to pass aggregate
types (coerced to an appropriate array type).  The details of the
array type being used inform the back-end about ABI-relevant
properties.  Specifically, the array element type encodes:
- whether the parameter should be passed in FPRs, VRs, or just
  GPRs/stack slots  (for float / vector / integer element types,
  respectively)
- what the alignment requirements of the parameter are when passed in
  GPRs/stack slots  (8 for float / 16 for vector / the element type
  size for integer element types) -- this corresponds to the
  "byval align" field

Using the infrastructure provided by this patch, a companion patch
to clang will enable two features:
- In the ELFv2 ABI, pass (and return) "homogeneous" floating-point
  or vector aggregates in FPRs and VRs (this is similar to the ARM
  homogeneous aggregate ABI)
- As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates
  that fit fully in registers without using the "byval" mechanism

The patch uses the functionArgumentNeedsConsecutiveRegisters callback
to encode that special treatment is required for all directly-passed
array types.  The isInConsecutiveRegs / isInConsecutiveRegsLast bits set
as a results are then used to implement the required size and alignment
rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc.

As a related change, the ABI routines have to be modified to support
passing floating-point types in GPRs.  This is necessary because with
homogeneous aggregates of 4-byte float type we can now run out of FPRs
*before* we run out of the 64-byte argument save area that is shadowed
by GPRs.  Any extra floating-point arguments that no longer fit in FPRs
must now be passed in GPRs until we run out of those too.

Note that there was already code to pass floating-point arguments in
GPRs used with vararg parameters, which was done by writing the argument
out to the argument save area first and then reloading into GPRs.  The
patch re-implements this, however, in favor of code packing float arguments
directly via extension/truncation, BITCAST, and BUILD_PAIR operations.

This is required to support the ELFv2 ABI, since we cannot unconditionally
write to the argument save area (which the caller might not have allocated).
The change does, however, affect ELFv1 varags routines too; but even here
the overall effect should be advantageous: Instead of loading the argument
into the FPR, then storing the argument to the stack slot, and finally
reloading the argument from the stack slot into a GPR, the new code now
just loads the argument into the FPR, and subsequently loads the argument
into the GPR (via BITCAST).  That BITCAST might imply a save/reload from
a stack temporary (in which case we're no worse than before); but it
might be implemented more efficiently in some cases.

The final part of the patch enables up to 8 FPRs and VRs for argument
return in PPCCallingConv.td; this is required to support returning
ELFv2 homogeneous aggregates.  (Note that this doesn't affect other ABIs
since LLVM wil only look for which register to use if the parameter is
marked as "direct" return anyway.)

Reviewed by Hal Finkel.

llvm-svn: 213493
2014-07-21 00:13:26 +00:00
Ulrich Weigand
9fcc5caf2d [PowerPC] ELFv2 explicit CFI for CR fields
This is a minor improvement in the ELFv2 ABI.   In ELFv1, DWARF CFI
would represent a saved CR word (holding CR fields CR2, CR3, and CR4)
using just a single CFI record refering to CR2.   In ELFv2 instead,
each of the CR fields is represented by its own CFI record.  The
advantage is that the compiler can now chose to save just a single
(or two) CR fields instead of all of them, if those are the only ones
that actually need saving.  That can lead to more efficient code using
mf(o)crf instead of the (slow) mfcr instruction.

Note that this patch does not (yet) implement this more efficient
code generation, but it does implement the part that is required to
be ABI compliant: creating multiple CFI records if multiple CR fields
are saved.

Reviewed by Hal Finkel.

llvm-svn: 213492
2014-07-21 00:03:18 +00:00
Ulrich Weigand
fb90fdfb31 [PowerPC] ELFv2 stack space reduction
The ELFv2 ABI reduces the amount of stack required to implement an
ABI-compliant function call in two ways:
* the "linkage area" is reduced from 48 bytes to 32 bytes by
  eliminating two unused doublewords
* the 64-byte "parameter save area" is now optional and need not be
  present in certain cases (it remains mandatory in functions with
  variable arguments, and functions that have any parameter that is
  passed on the stack)

The following patch implements this required changes:
- reducing the linkage area, and associated relocation of the TOC save
  slot, in getLinkageSize / getTOCSaveOffset (this requires updating all
  callers of these routines to pass in the isELFv2ABI flag).
- (partially) handling the case where the parameter save are is optional

This latter part requires some extra explanation:  Currently, we still
always allocate the parameter save area when *calling* a function.
That is certainly always compliant with the ABI, but may cause code to
allocate stack unnecessarily.  This can be addressed by a follow-on
optimization patch.

On the *callee* side, in LowerFormalArguments, we *must* track
correctly whether the ABI guarantees that the caller has allocated
the parameter save area for our use, and the patch does so. However,
there is one complication: the code that handles incoming "byval"
arguments will currently *always* write to the parameter save area,
because it has to force incoming register arguments to the stack since
it must return an *address* to implement the byval semantics.

To fix this, the patch changes the LowerFormalArguments code to write
arguments to a freshly allocated stack slot on the function's own stack
frame instead of the argument save area in those cases where that area
is not present.

Reviewed by Hal Finkel.

llvm-svn: 213490
2014-07-20 23:43:15 +00:00
Ulrich Weigand
41e116ee77 [PowerPC] ELFv2 function call changes
This patch builds upon the two preceding MC changes to implement the
basic ELFv2 function call convention.  In the ELFv1 ABI, a "function
descriptor" was associated with every function, pointing to both the
entry address and the related TOC base (and a static chain pointer
for nested functions).  Function pointers would actually refer to that
descriptor, and the indirect call sequence needed to load up both entry
address and TOC base.

In the ELFv2 ABI, there are no more function descriptors, and function
pointers simply refer to the (global) entry point of the function code.
Indirect function calls simply branch to that address, after loading it
up into r12 (as required by the ABI rules for a global entry point).
Direct function calls continue to just do a "bl" to the target symbol;
this will be resolved by the linker to the local entry point of the
target function if it is local, and to a PLT stub if it is global.
That PLT stub would then load the (global) entry point address of the
final target into r12 and branch to it.  Note that when performing a
local function call, r2 must be set up to point to the current TOC
base: if the target ends up local, the ABI requires that its local
entry point is called with r2 set up; if the target ends up global,
the PLT stub requires that r2 is set up.

This patch implements all LLVM changes to implement that scheme:
- No longer create a function descriptor when emitting a function
  definition (in EmitFunctionEntryLabel)
- Emit two entry points *if* the function needs the TOC base (r2)
  anywhere (this is done EmitFunctionBodyStart; note that this cannot
  be done in EmitFunctionBodyStart because the global entry point
  prologue code must be *part* of the function as covered by debug info).
- In order to make use tracking of r2 (as needed above) work correctly,
  mark direct function calls as implicitly using r2.
- Implement the ELFv2 indirect function call sequence (no function
  descriptors; load target address into r12).
- When creating an ELFv2 object file, emit the .abiversion 2 directive
  to tell the linker to create the appropriate version of PLT stubs.  

Reviewed by Hal Finkel.

llvm-svn: 213489
2014-07-20 23:31:44 +00:00
Ulrich Weigand
28c7be6585 [MC] Pass MCSymbolData to needsRelocateWithSymbol
As discussed in a previous checking to support the .localentry
directive on PowerPC, we need to inspect the actual target symbol
in needsRelocateWithSymbol to make the appropriate decision based
on that symbol's st_other bits.

Currently, needsRelocateWithSymbol does not get the target symbol.
However, it is directly available to its sole caller.  This patch
therefore simply extends the needsRelocateWithSymbol by a new
parameter "const MCSymbolData &SD", passes in the target symbol,
and updates all derived implementations.

In particular, in the PowerPC implementation, this patch removes
the FIXME added by the previous checkin.

llvm-svn: 213487
2014-07-20 23:15:06 +00:00
Ulrich Weigand
1376d19372 [PowerPC] ELFv2 MC support for .localentry directive
A second binutils feature needed to support ELFv2 is the .localentry
directive.  In the ELFv2 ABI, functions may have two entry points:
one for calling the routine locally via "bl", and one for calling the
function via function pointer (either at the source level, or implicitly
via a PLT stub for global calls).  The two entry points share a single
ELF symbol, where the ELF symbol address identifies the global entry
point address, while the local entry point is found by adding a delta
offset to the symbol address.  That offset is encoded into three
platform-specific bits of the ELF symbol st_other field.

The .localentry directive instructs the assembler to set those fields
to encode a particular offset.  This is typically used by a function
prologue sequence like this:

func:
        addis r2, r12, (.TOC.-func)@ha
        addi r2, r2, (.TOC.-func)@l
        .localentry func, .-func

Note that according to the ABI, when calling the global entry point,
r12 must be set to point the global entry point address itself; while
when calling the local entry point, r2 must be set to point to the TOC
base.  The two instructions between the global and local entry point in
the above example translate the first requirement into the second.

This patch implements support in the PowerPC MC streamers to emit the
.localentry directive (both into assembler and ELF object output), as
well as support in the assembler parser to parse that directive.

In addition, there is another change required in MC fixup/relocation
handling to properly deal with relocations targeting function symbols
with two entry points: When the target function is known local, the MC
layer would immediately handle the fixup by inserting the target
address -- this is wrong, since the call may need to go to the local
entry point instead.  The GNU assembler handles this case by *not*
directly resolving fixups targeting functions with two entry points,
but always emits the relocation and relies on the linker to handle
this case correctly.  This patch changes LLVM MC to do the same (this
is done via the processFixupValue routine).

Similarly, there are cases where the assembler would normally emit a
relocation, but "simplify" it to a relocation targeting a *section*
instead of the actual symbol.  For the same reason as above, this
may be wrong when the target symbol has two entry points.  The GNU
assembler again handles this case by not performing this simplification
in that case, but leaving the relocation targeting the full symbol,
which is then resolved by the linker.  This patch changes LLVM MC
to do the same (via the needsRelocateWithSymbol routine).
NOTE: The method used in this patch is overly pessimistic, since the
needsRelocateWithSymbol routine currently does not have access to the
actual target symbol, and thus must always assume that it might have
two entry points.  This will be improved upon by a follow-on patch
that modifies common code to pass the target symbol when calling
needsRelocateWithSymbol.

Reviewed by Hal Finkel.

llvm-svn: 213485
2014-07-20 23:06:03 +00:00
Ulrich Weigand
22549677c8 [PowerPC] ELFv2 MC support for .abiversion directive
ELFv2 binaries are marked by a bit in the ELF header e_flags field.
A new assembler directive .abiversion can be used to set that flag.
This patch implements support in the PowerPC MC streamers to emit the
.abiversion directive (both into assembler and ELF binary output),
as well as support in the assembler parser to parse the .abiversion
directive.

Reviewed by Hal Finkel.

llvm-svn: 213484
2014-07-20 22:56:57 +00:00
Ulrich Weigand
d1393de80e [PowerPC] Refactor byval handling in LowerFormalArguments_64SVR4
When handling an incoming byval argument, we need to possibly write
incoming registers to the stack in order to create an on-stack image
of the parameter, so we can return its address to common code.

This currently uses CreateFixedObject to access the parts of the
parameter save area where the argument is (or needs to be) stored.
However, sometimes we need to access multiple parts of that area,
e.g. to write multiple registers.  The code currently uses a new
CreateFixedObject call for each of these accesses, resulting in
a patchwork of overlapping (fixed) stack objects.

This doesn't really matter in the case of fixed objects, since
any access to those turns into a fixed stackpointer + offset
address anyway.  However, with the upcoming ELFv2 patches, we
may actually need to place an incoming argument into our *own*
stack frame instead of the caller's.  This means we need to use
CreateStackObject instead, and we cannot have multiple overlapping
instances of those.

To make the rest of the argument handling code work equally in
both situations, this patch refactors it to always use just a
single call to CreateFixedObject, and access parts of that object
as required using address arithmetic.  This way, we can in a future
patch substitute CreateStackObject without further changes.

No change to generated code intended.

llvm-svn: 213483
2014-07-20 22:36:52 +00:00
Ulrich Weigand
c274d8aae7 [PowerPC] Fix FrameIndex handling in SelectAddressRegImm
The PPCTargetLowering::SelectAddressRegImm routine needs to handle
FrameIndex nodes in a special manner, by tranlating them into a
TargetFrameIndex node.  This was done in most cases, but seems to
have been neglected in one path: when the input tree has an OR of
the FrameIndex with an immediate.  This can happen if the FrameIndex
can be proven to be sufficiently aligned that an OR of that immediate
is equivalent to an ADD.

The missing handling of FrameIndex in that case caused the SelectionDAG
instruction selection to miss opportunities to merge the OR back into
the FrameIndex node, leading to superfluous addi/ori instructions in
the final assembler output.

llvm-svn: 213482
2014-07-20 22:26:40 +00:00
Hal Finkel
006e1d44a6 [PowerPC] 32-bit ELF PIC support
This adds initial support for PPC32 ELF PIC (Position Independent Code; the
-fPIC variety), thus rectifying a long-standing deficiency in the PowerPC
backend.

Patch by Justin Hibbits!

llvm-svn: 213427
2014-07-18 23:29:49 +00:00
Sanjay Patel
2f0f025b2b Move Post RA Scheduling flag bit into SchedMachineModel
Refactoring; no functional changes intended

    Removed PostRAScheduler bits from subtargets (X86, ARM).
    Added PostRAScheduler bit to MCSchedModel class.
    This bit is set by a CPU's scheduling model (if it exists).
    Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
    Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
    Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
    Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.
    Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86: 
       a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget. 
       b. MIPS overrides the CPU's postRA settings by enabling postRA for everything. 
       c. PPC overrides the CPU's postRA settings by enabling postRA for everything. 
       d. X86 is the only target that actually has postRA specified via sched model info.

Differential Revision: http://reviews.llvm.org/D4217

llvm-svn: 213101
2014-07-15 22:39:58 +00:00
NAKAMURA Takumi
365309bb8b Prune Redundant libdeps in CMake's target_link_libraries and LLVMBuild.txt.
I checked this with Release+Asserts on x86_64-mingw32. Please restore partially if this were overkill.

llvm-svn: 213064
2014-07-15 11:37:03 +00:00
Ulrich Weigand
1aae06e395 [PowerPC] Fix invalid displacement created by LocalStackAlloc
This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that
could result in the LocalStackAlloc pass creating an MI instruction
out-of-range displacement:
        %vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32)
        %G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31
(In final assembler output the top bits are stripped off, resulting
in a negative offset loading from below the stack pointer.)

Common code expects the isFrameOffsetLegal routine to verify whether
adding a given offset to the offset already present in the instruction
results in a valid displacement.  However, on PowerPC the routine
did not take the already present instruction offset into account.

This commit fixes isFrameOffsetLegal to add the instruction offset,
and updates a local caller (needsFrameBaseReg) to no longer add the
instruction offset itself before calling isFrameOffsetLegal.

Reviewed by Hal Finkel.

llvm-svn: 212832
2014-07-11 17:19:31 +00:00
Ulrich Weigand
1078e34b76 [PowerPC] Implement atomic NAND operations as actual NAND
This changes the implementation of atomic NAND operations
from "a & ~b" (compatible with GCC < 4.4) to actual "~(a & b)"
(compatible with GCC >= 4.4).

This is in line with the common-code and ARM back-end change
implemented in r212433.

llvm-svn: 212547
2014-07-08 16:16:02 +00:00
Ulrich Weigand
5e8a0b585b [PowerPC] Fix no-assert build
r212476 caused a compile failure (unused variable) in a non-assertion
build ...

llvm-svn: 212477
2014-07-07 19:39:44 +00:00
Ulrich Weigand
8f2aff0b0c [PowerPC] Fix "byval align" arguments
Arguments passed as "byval align" should get the specified alignment
in the parameter save area.  There was some code in PPCISelLowering.cpp
that attempted to implement this, but this didn't work correctly:
while code did update the ArgOffset value, it neglected to update
the PtrOff value (which was already computed from the old ArgOffset),
and it also neglected to update GPR_idx -- fields skipped due to
alignment in the save area must likewise be skipped in GPRs.

This patch fixes and simplifies this logic by:
- handling argument offset alignment right at the beginning
  of argument processing, using a new helper routine
  CalculateStackSlotAlignment (this avoids having to update
  PtrOff and other derived values later on)
- not tracking GPR_idx separately, but always computing the
  correct GPR_idx for each argument *from* its ArgOffset
- removing some redundant computation in LowerFormalArguments:
  MinReservedArea must equal ArgOffset after argument processing,
  so there's no use in computing it twice.

[This doesn't change the behavior of the current clang front-end,
since that never creates "byval align" arguments at the moment.
This will change with a follow-on patch, however.]

llvm-svn: 212476
2014-07-07 19:26:41 +00:00
Juergen Ributzka
241f08ad4e [DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI.
The argument list vector is never used after it has been passed to the
CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and
pretty much as cheap as keeping a pointer to it.

llvm-svn: 212135
2014-07-01 22:01:54 +00:00
Craig Topper
4c15d35f50 Add ops() method to SDNode that returns an ArrayRef<SDUse>. Use it to simplify some code.
llvm-svn: 211993
2014-06-29 00:40:57 +00:00
Ulrich Weigand
85c764c15a [PowerPC] Constrain base register in PPCRegisterInfo::resolveFrameIndex
I've run into a bug where current LLVM at -O0 (with fast-isel)
generated invalid code like:

        ld 0, 20936(1)                  # 8-byte Folded Reload
        stw 12, 10348(0)
        stw 12, 10344(0)

The underlying vreg had been introduced as base register by the
Local Stack Slot Allocation pass.  That register was constrained
to G8RC by PPCRegisterInfo::materializeFrameBaseRegister to match
the ADDI instruction used to set it, but it was *not* constrained
to G8RC_NOX0 to fit the *use* of the register in an address.

That should have happened in PPCRegisterInfo::resolveFrameIndex.
This patch adds an appropriate constrainRegClass call.

Reviewed by Hal Finkel.

llvm-svn: 211897
2014-06-27 13:04:12 +00:00
Eric Christopher
04713c650c Remove extraneous includes from the target machines.
llvm-svn: 211800
2014-06-26 19:30:05 +00:00
Will Schmidt
fbbc998175 add ppc64/pwr8 as target
includes handling DIR_PWR8 where appropriate
The P7Model Itinerary is currently tied in for use under the P8Model, and will be updated later.

llvm-svn: 211779
2014-06-26 13:36:19 +00:00
Rafael Espindola
196881cf29 Move expression visitation logic up to MCStreamer.
Remove the duplicate from MCRecordStreamer. No functionality change.

llvm-svn: 211714
2014-06-25 15:45:33 +00:00
Rafael Espindola
0e71694bfa Simplify the visitation of target expressions. No functionality change.
llvm-svn: 211707
2014-06-25 15:29:54 +00:00
Bill Schmidt
bfe90f8c83 [PPC64] Fix PR20071 (fctiduz generated for targets lacking that instruction)
PR20071 identifies a problem in PowerPC's fast-isel implementation for
floating-point conversion to integer.  The fctiduz instruction was added in
Power ISA 2.06 (i.e., Power7 and later).  However, this instruction is being
generated regardless of which 64-bit PowerPC target is selected.

The intent is for fast-isel to punt to DAG selection when this instruction is
not available.  This patch implements that change.  For testing purposes, the
existing fast-isel-conversion.ll test adds a RUN line for -mcpu=970 and tests
for the expected code generation.  Additionally, the existing test
fast-isel-conversion-p5.ll was found to be incorrectly expecting the
unavailable instruction to be generated.  I've removed these test variants
since we have adequate coverage in fast-isel-conversion.ll.

llvm-svn: 211627
2014-06-24 20:05:18 +00:00
Ulrich Weigand
b36d130d19 [PowerPC] Refactor getMinCallFrameSize / getMinCallArgumentsSize
As of r211495, the only remaining users of getMinCallFrameSize are in
core ABI code (LowerFormalParameter / LowerCall).  This is actually a
good thing, since the details of the parameter save area are ABI specific.

With the new ELFv2 ABI in particular, the rules defining the size of the
save area will become significantly more complex, so it wouldn't make
sense to implement those outside ABI code that has all required
information.

In preparation, this patch eliminates the getMinCallFrameSize (and
associated getMinCallArgumentsSize) routines, and inlines them into all
callers.  Note that since nearly all call arguments are constant, this
allows simplifying the inlined copies to a single line everywhere.

No change in generate code expected.

llvm-svn: 211497
2014-06-23 14:15:53 +00:00
Ulrich Weigand
ac0412b387 [PowerPC] Allow stack frames without parameter save area
The PPCFrameLowering::determineFrameLayout routine currently ensures
that every function that allocates a stack frame provides space for the
parameter save area (via PPCFrameLowering::getMinCallFrameSize).

This is actually not necessary.  There may be functions that never call
another routine but still allocate a frame; those do not require the
parameter save area.  In the future, with the ELFv2 ABI, even some
routines that do call other functions do not need to allocate the
parameter save area.

While it is not a bug to allocate the parameter area when it is not
needed, it is better to avoid it to save stack space.

Note that when any particular function call requires the parameter save
area, this space will already have been included by ABI code in the size
the CALLSEQ_START insn is annotated with, and therefore included in the
size returned by MFI->getMaxCallFrameSize().

This means that determineFrameLayout simply does not need to care about
the parameter save area.  (It still needs to ensure that every frame
provides the linkage area.)  This is implemented by this patch.

Note that this exposed a bug in the new fast-isel code where the parameter
area was *not* included in the CALLSEQ_START size; this is also fixed.

A couple of test cases needed to be adapted for the new (smaller) stack
frame size those tests now see.

llvm-svn: 211495
2014-06-23 13:47:52 +00:00
Ulrich Weigand
cd08720d8e [PowerPC] Fix IsDarwin arg in PPCFrameLowering:: calls
As remarked in the commit message to r211493, in several places
throughout the 64-bit SVR4 ABI code there are calls to
PPCFrameLowering::getLinkageSize and getMinCallFrameSize
using an incorrect IsDarwin argument of "true".

(Some of those were made explicit by the above refactoring patch, others
have been there all along.)

This patch fixes those places to pass "false" for IsDarwin.

No change in generated code expected.

llvm-svn: 211494
2014-06-23 13:21:43 +00:00
Ulrich Weigand
186fcfa6d3 [PowerPC] Refactor setMinReservedArea and CalculateParameterAndLinkageAreaSize
The PPCISelLowering.cpp routines PPCTargetLowering::setMinReservedArea and
CalculateParameterAndLinkageAreaSize are currently used as subroutines
from both 64-bit SVR4 and Darwin ABI code.

However, the two ABIs are already quite different w.r.t. AltiVec
conventions, and they will become more different when the ELFv2 ABI is
supported.  Also, in general it seems better to disentangle ABI support
routines for different ABIs to avoid accidentally affecting one ABI when
intending to change only the other.

(Actually, the current code strictly speaking already contains a bug:
these routines call PPCFrameLowering::getMinCallFrameSize and
PPCFrameLowering::getLinkageSize with the IsDarwin parameter set to
"true" even on 64-bit SVR4.  This bug currently has no adverse effect
since those routines always return the same for 64-bit SVR4 and 64-bit
Darwin, but it still seems wrong ...  I'll fix this in a follow-up
commit shortly.)

To remove this code sharing, I'm simply inlining both routines into all
call sites (there are just two each, one for 64-bit SVR4 and one for
Darwin), and simplifying due to constant parameters where possible.

A small piece of code that *does* make sense to share is refactored into
the new routine EnsureStackAlignment, now also called from 32-bit SVR4
ABI code.

No change in generated code is expected.

llvm-svn: 211493
2014-06-23 13:08:27 +00:00
Ulrich Weigand
e1fa36144d [PowerPC] Fix on-stack AltiVec arguments with 64-bit SVR4
Current 64-bit SVR4 code seems to have some remnants of Darwin code
in AltiVec argument handing.  This had the effect that AltiVec arguments
(or subsequent arguments) were not correctly placed in the parameter area
in some cases.

The correct behaviour with the 64-bit SVR4 ABI is:
- All AltiVec arguments take up space in the parameter area, just like
  any other arguments, whether vararg or not.
- They are always 16-byte aligned, skipping a parameter area doubleword
  (and the associated GPR, if any), if necessary.

This patch implements the correct behaviour and adds a test case.
(Verified against GCC behaviour via the ABI compat test suite.)

llvm-svn: 211492
2014-06-23 12:36:34 +00:00
Ulrich Weigand
3f880bd06e [PowerPC] Fix small argument stack slot offset for LE
When small arguments (structures < 8 bytes or "float") are passed in a
stack slot in the ppc64 SVR4 ABI, they must reside in the least
significant part of that slot.  On BE, this means that an offset needs
to be added to the stack address of the parameter, but on LE, the least
significant part of the slot has the same address as the slot itself.

This changes the PowerPC back-end ABI code to only add the small
argument stack slot offset for BE.  It also adds test cases to verify
the correct behavior on both BE and LE.

llvm-svn: 211368
2014-06-20 16:34:05 +00:00
Ulrich Weigand
c393e04673 [PowerPC] Remove unnecessary load of r12 in indirect call
When looking at the 64-bit SVR4 indirect call sequence, I noticed
an unnecessary load of r12.  And indeed the code says:

  // R12 must contain the address of an indirect callee. 

But this is not correct; in the 64-bit SVR4 (ELFv1) ABI, there is
no need to load r12 at this point.  It seems this code and comment
is a remnant of code originally shared with the Darwin ABI ...

This patch simply removes the unnecessary load.

llvm-svn: 211203
2014-06-18 18:33:36 +00:00
Ulrich Weigand
e256aa48b9 [PowerPC] Simplify and improve loading into TOC register
During an indirect function call sequence on the 64-bit SVR4 ABI,
generate code must load and then restore the TOC register.

This does not use a regular LOAD instruction since the TOC
register r2 is marked as reserved.  Instead, the are two
special instruction patterns:

 let RST = 2, DS = 2 in
 def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg),
                     "ld 2, 8($reg)", IIC_LdStLD,
                     [(PPCload_toc i64:$reg)]>, isPPC64;
 
 let RST = 2, DS = 10, RA = 1 in
 def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins),
                     "ld 2, 40(1)", IIC_LdStLD,
                     [(PPCtoc_restore)]>, isPPC64;

Note that these not only restrict the destination of the
load to r2, but they also restrict the *source* of the
load to particular address combinations.  The latter is
a problem when we want to support the ELFv2 ABI, since
there the TOC save slot is no longer at 40(1).

This patch replaces those two instructions with a single
instruction pattern that only hard-codes r2 as destination,
but supports generic addresses as source.  This will allow
supporting the ELFv2 ABI, and also helps generate more
efficient code for calls to absolute addresses (allowing
simplification of the ppc64-calls.ll test case).

llvm-svn: 211193
2014-06-18 17:52:49 +00:00
Ulrich Weigand
1458f67d3e [PowerPC] Do not use BLA with the 64-bit SVR4 ABI
The PowerPC back-end uses BLA to implement calls to functions at
known-constant addresses, which is apparently used for certain
system routines on Darwin.

However, with the 64-bit SVR4 ABI, this is actually incorrect.
An immediate function pointer value on this platform is not
directly usable as a target address for BLA:
- in the ELFv1 ABI, the function pointer value refers to the
  *function descriptor*, not the code address
- in the ELFv2 ABI, the function pointer value refers to the
  global entry point, but BL(A) would only be correct when
  calling the *local* entry point

This bug didn't show up since using immediate function pointer
values is not usually done in the 64-bit SVR4 ABI in the first
place.  However, I ran into this issue with a certain use case
of LLVM as JIT, where immediate function pointer values were
uses to implement callbacks from JITted code to helpers in
statically compiled code.

Fixed by simply not using BLA with the 64-bit SVR4 ABI.

llvm-svn: 211174
2014-06-18 16:14:04 +00:00
Ulrich Weigand
cfcf14bd4c [PowerPC] Fix emitting instruction pairs on LE
My patch r204634 to emit instructions in little-endian format failed to
handle those special cases where we emit a pair of instructions from a
single LLVM MC instructions (like the bl; nop pairs used to implement
the call sequence).

In those cases, we still need to emit the "first" instruction (the one
in the more significant word) first, on both big and little endian,
and not swap them.

llvm-svn: 211171
2014-06-18 15:37:07 +00:00
Bill Schmidt
b0bab996e0 [PPC64] Fix PR19893 - improve code generation for local function addresses
Rafael opened http://llvm.org/bugs/show_bug.cgi?id=19893 to track non-optimal
code generation for forming a function address that is local to the compile
unit.  The existing code was treating both local and non-local functions
identically.

This patch fixes the problem by properly identifying local functions and
generating the proper addis/addi code.  I also noticed that Rafael's earlier
changes to correct the surrounding code in PPCISelLowering.cpp were also
needed for fast instruction selection in PPCFastISel.cpp, so this patch
fixes that code as well.

The existing test/CodeGen/PowerPC/func-addr.ll is modified to test the new
code generation.  I've added a -O0 run line to test the fast-isel code as
well.

Tested on powerpc64[le]-unknown-linux-gnu with no regressions.

llvm-svn: 211056
2014-06-16 21:36:02 +00:00
Eric Christopher
2ea2870f36 The hazard recognizer only needs a subtarget, not a target machine
so make it take one. Fix up all users accordingly.

llvm-svn: 210948
2014-06-13 22:38:52 +00:00
Eric Christopher
72e46a5bfe Fix typo.
llvm-svn: 210947
2014-06-13 22:38:48 +00:00
Eric Christopher
286dd39af0 Move the PPCSelectionDAGInfo off the TargetMachine and onto the
subtarget.

llvm-svn: 210854
2014-06-12 23:02:32 +00:00
Eric Christopher
0382d62f87 Make PPCSelectionDAGInfo take a DataLayout instead of a TargetMachine
since that's all it needs.

llvm-svn: 210853
2014-06-12 22:56:48 +00:00
Eric Christopher
42e57db35c Move PPCTargetLowering off of the TargetMachine and onto the subtarget.
llvm-svn: 210852
2014-06-12 22:50:10 +00:00
Eric Christopher
576f8a1a6b Remove an extraneous this-> to access the subtarget.
llvm-svn: 210849
2014-06-12 22:38:20 +00:00
Eric Christopher
d1b1190bd1 Rename PPCSubTarget to Subtarget in PPCTargetLowering for consistency.
Also remove an extra local subtarget in the initialization functions.

llvm-svn: 210848
2014-06-12 22:38:18 +00:00
Eric Christopher
429be5d609 Move PPCJITInfo off of the TargetMachine and onto the subtarget.
Needed to migrate a few functions around to avoid circular header
dependencies.

llvm-svn: 210845
2014-06-12 22:28:06 +00:00
Eric Christopher
40ba57e7f5 Remove the use of TargetMachine from PPCJITInfo and replace with
the subtarget. Also remove unnecessary argument to the constructor
at the same time, we already have access via the subtarget.

llvm-svn: 210844
2014-06-12 22:19:51 +00:00
Eric Christopher
95b7901fd6 Move PPCInstrInfo off of the target machine and onto the subtarget.
llvm-svn: 210839
2014-06-12 22:05:46 +00:00
Eric Christopher
d4532ed073 Remove TargetMachine from PPCInstrInfo and all dependencies and
replace with the current subtarget.

llvm-svn: 210836
2014-06-12 21:48:52 +00:00
Eric Christopher
fe75c4c997 Move DataLayout from the PPCTargetMachine to the subtarget.
llvm-svn: 210824
2014-06-12 21:08:06 +00:00
Eric Christopher
0c6467adf3 Move PPCFrameLowering into PPCSubtarget from PPCTargetMachine. Use
the initializeSubtargetDependencies code to obtain an initialized
subtarget and migrate a couple of subtarget using functions to the
.cpp file to avoid circular includes.

llvm-svn: 210822
2014-06-12 20:54:11 +00:00
Eric Christopher
a30bc53a6b Remove duplicate copy of InstrItineraryData from the TargetMachine,
it's already on the subtarget.

llvm-svn: 210619
2014-06-11 00:53:17 +00:00
Bill Schmidt
6e11183ad7 [PPC64LE] Recognize shufflevector patterns for little endian
Various masks on shufflevector instructions are recognizable as
specific PowerPC instructions (vector pack, vector merge, etc.).
There is existing code in PPCISelLowering.cpp to recognize the correct
patterns for big endian code.  The masks for these instructions are
different for little endian code due to the big-endian numbering
employed by these instructions.  This patch adds the recognition code
for little endian.

I've added a new test case test/CodeGen/PowerPC/vec_shuffle_le.ll for
this.  The existing recognizer test (vec_shuffle.ll) is unnecessarily
verbose and difficult to read, so I felt it was better to add a new
test rather than modify the old one.

llvm-svn: 210536
2014-06-10 14:35:01 +00:00
Bill Schmidt
41cd7375c8 [PPC64LE] Generate correct code for unaligned little-endian vector loads
The code in PPCTargetLowering::PerformDAGCombine() that handles
unaligned Altivec vector loads generates a lvsl followed by a vperm.
As we've seen in numerous other places, the vperm instruction has a
big-endian bias, and this is fixed for little endian by complementing
the permute control vector and swapping the input operands.  In this
case the lvsl is providing the permute control vector.  Rather than
generating an lvsl and a complement operation, it is sufficient to
generate an lvsr instruction instead.  Thus for LE code generation we
will generate an lvsr rather than an lvsl, and swap the other input
arguments on the vperm.

The existing test/CodeGen/PowerPC/vec_misalign.ll is updated to test
the code generation for PPC64 and PPC64LE, in addition to the existing
PPC32/G5 testing.

llvm-svn: 210493
2014-06-09 22:00:52 +00:00
Bill Schmidt
3ff0a8eb8b [PPC64LE] Generate correct little-endian code for v16i8 multiply
The existing code in PPCTargetLowering::LowerMUL() for multiplying two
v16i8 values assumes that vector elements are numbered in big-endian
order.  For little-endian targets, the vector element numbering is
reversed, but the vmuleub, vmuloub, and vperm instructions still
assume big-endian numbering.  To account for this, we must adjust the
permute control vector and reverse the order of the input registers on
the vperm instruction.

The existing test/CodeGen/PowerPC/vec_mul.ll is updated to be executed
on powerpc64 and powerpc64le targets as well as the original powerpc
(32-bit) target.

llvm-svn: 210474
2014-06-09 16:06:29 +00:00
David Blaikie
f670b953e7 AsmMatchers: Use unique_ptr to manage ownership of MCParsedAsmOperand
I saw at least a memory leak or two from inspection (on probably
untested error paths) and r206991, which was the original inspiration
for this change.

I ran this idea by Jim Grosbach a few weeks ago & he was OK with it.
Since it's a basically mechanical patch that seemed sufficient - usual
post-commit review, revert, etc, as needed.

llvm-svn: 210427
2014-06-08 16:18:35 +00:00
Eric Christopher
db8e2ecde5 Have TargetSelectionDAGInfo take a DataLayout initializer rather than
a TargetMachine since the only thing it wants is DataLayout.

llvm-svn: 210366
2014-06-06 19:04:48 +00:00
Bill Schmidt
647be1ef2c [PPC64LE] Fix lowering of BUILD_VECTOR and SHUFFLE_VECTOR for little endian
This patch fixes a couple of lowering issues for little endian
PowerPC.  The code for lowering BUILD_VECTOR contains a number of
optimizations that are only valid for big endian.  For now, we disable
those optimizations for correctness.  In the future, we will add
analogous optimizations that are correct for little endian.

When lowering a SHUFFLE_VECTOR to a VPERM operation, we again need to
make the now-familiar transformation of swapping the input operands
and complementing the permute control vector.  Correctness of this
transformation is tested by the accompanying test case.

llvm-svn: 210336
2014-06-06 14:06:26 +00:00
Bill Schmidt
9380b3b7bf [PPC64LE] Temporarily disable VSX support in little-endian mode
This is a preliminary patch for the PowerPC64LE support.  In stage 1
of the vector support, we will support the VMX (Altivec) instruction
set, but will not yet support the VSX instructions.  This is merely a
staging issue to provide functional vector support as soon as
possible.

llvm-svn: 210271
2014-06-05 16:21:13 +00:00
Eric Christopher
44a7e98ce5 Omit else branch after return.
llvm-svn: 210034
2014-06-02 17:29:07 +00:00
Eric Christopher
1aad72164e Have the TLOF creation take a Triple rather than needing a subtarget.
llvm-svn: 209937
2014-05-31 00:07:32 +00:00
Eric Christopher
0cc6977494 isSVR4ABI() returned !isDarwin() so just move that to the else
block and remove the unreachable code.

llvm-svn: 209927
2014-05-30 22:47:53 +00:00
Eric Christopher
f0478ea2df Rename CreateTLOF->createTLOF to match the rest of the file and the
rest of the targets with a similar function name.

llvm-svn: 209926
2014-05-30 22:47:48 +00:00
Rafael Espindola
34f4870951 [PPC] Use alias symbols in address computation.
This seems to match what gcc does for ppc and what every other llvm
backend does.

This is a fixed version of r209638. The difference is to avoid any change
in behavior for functions. The logic for using constant pools for function
addresseses is spread over a few places and we have to keep them in sync.

llvm-svn: 209821
2014-05-29 15:41:38 +00:00
Rafael Espindola
acdb307db3 [pr19844] Add thread local mode to aliases.
This matches gcc's behavior. It also seems natural given that aliases
contain other properties that govern how it is accessed (linkage,
visibility, dll storage).

Clang still has to be updated to expose this feature to C.

llvm-svn: 209759
2014-05-28 18:15:43 +00:00
Hal Finkel
47a225fb6c Revert "[PPC] Use alias symbols in address computation."
This reverts commit r209638 because it broke self-hosting on ppc64/Linux. (the
Clang-compiled TableGen would segfault because it jumped to an invalid address
from within _ZNK4llvm17ManagedStaticBase21RegisterManagedStaticEPFPvvEPFvS1_E
(which is within the command-line parameter registration process)).

llvm-svn: 209745
2014-05-28 15:25:06 +00:00
Bill Schmidt
b806d02b5b [PATCH] Correct type used for VADD_SPLAT optimization on PowerPC
In PPCISelLowering.cpp: PPCTargetLowering::LowerBUILD_VECTOR(), there
is an optimization for certain patterns to generate one or two vector
splats followed by a vector add or subtract.  This operation is
represented by a VADD_SPLAT in the selection DAG.  Prior to this
patch, it was possible for the VADD_SPLAT to be assigned the wrong
data type, causing incorrect code generation.  This patch corrects the
problem.

Specifically, the code previously assigned the value type of the
BUILD_VECTOR node to the newly generated VADD_SPLAT node.  This is
correct much of the time, but not always.  The problem is that the
call to isConstantSplat() may return a SplatBitSize that is not the
same as the number of bits in the original element vector type.  The
correct type to assign is a vector type with the same element bit size
as SplatBitSize.

The included test case shows an example of this, where the
BUILD_VECTOR node has a type of v16i8.  The vector to be built is {0,
16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16}.  isConstantSplat
detects that we can generate a splat of 16 for type v8i16, which is
the type we must assign to the VADD_SPLAT node.  If we do not, we
generate a vspltisb of 8 and a vaddubm, which generates the incorrect
result {16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16}.  The correct code generation is a vspltish of 8 and a vadduhm.

This patch also corrected code generation for
CodeGen/PowerPC/2008-07-10-SplatMiscompile.ll, which had been marked
as an XFAIL, so we can remove the XFAIL from the test case.

llvm-svn: 209662
2014-05-27 15:57:51 +00:00
Rafael Espindola
94cd9a1ed6 [PPC] Use alias symbols in address computation.
This seems to match what gcc does for ppc and what every other llvm
backend does.

llvm-svn: 209638
2014-05-26 19:08:19 +00:00
Eric Christopher
1b3b092405 Fix typo.
llvm-svn: 209377
2014-05-22 01:21:44 +00:00
Eric Christopher
7898bbfc19 Avoid using subtarget features when initializing the pass pipeline
on PPC.

llvm-svn: 209376
2014-05-22 01:21:35 +00:00
Eric Christopher
4c56d10ea0 Reset the subtarget for DAGToDAG on every iteration of runOnMachineFunction.
This required updating the generated functions and TD file accordingly
to be pointers rather than const references.

llvm-svn: 209375
2014-05-22 01:07:24 +00:00
Eric Christopher
7880d61aac Make early if conversion dependent upon the subtarget and add
a subtarget hook to enable. Unconditionally add to the pass pipeline
for targets that might want to use it. No functional change.

llvm-svn: 209340
2014-05-21 23:40:26 +00:00
Adam Nemet
37337f0359 [PowerPC] PR19796: Also match ISD::TargetConstant in isIntS16Immediate
The SplitIndexingFromLoad changes exposed a latent isel bug in the PowerPC64
backend.  We matched an immediate offset with STWX8 even though it only
supports register offset.

The culprit is the complex-pattern predicate, SelectAddrIdx, which decides
that if the offset is not ISD::Constant it must be a register.

Many thanks to Bill Schmidt for testing this.

llvm-svn: 209219
2014-05-20 17:20:34 +00:00
Benjamin Kramer
600e24a1cb SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not.
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.

llvm-svn: 209123
2014-05-19 13:12:38 +00:00