mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 13:02:52 +02:00
783 Commits

Author SHA1 Message Date
David Blaikie
30abbc718f Remove unused field in DICompileUnit
llvm-svn: 177590
2013-03-20 22:34:33 +00:00
Hal Finkel
6c0ef5bcb5 Add a comment to the CodeGen/PowerPC/asym-regclass-copy.ll test
llvm-svn: 177434
2013-03-19 20:22:32 +00:00
Ulrich Weigand
d5787350ad Rewrite pre-increment store patterns to use standard memory operands.
Currently, pre-increment store patterns are written to use two separate
operands to represent address base and displacement:

  stwu $rS, $ptroff($ptrreg)

This causes problems when implementing the assembler parser, so this
commit changes the patterns to use standard (complex) memory operands
like in all other memory access instruction patterns:

  stwu $rS, $dst

To still match those instructions against the appropriate pre_store
SelectionDAG nodes, the patch uses the new feature that allows a Pat
to match multiple DAG operands against a single (complex) instruction
operand.

Approved by Hal Finkel.

llvm-svn: 177429
2013-03-19 19:52:04 +00:00
Hal Finkel
08d0f0125c Prepare to make r0 an allocatable register on PPC
Currently the PPC r0 register is unconditionally reserved. There are two reasons
for this:

 1. r0 is treated specially (as the constant 0) by certain instructions, and so
    cannot be used with those instructions as a regular register.

 2. r0 is used as a temporary register in the CR-register spilling process
    (where, under some circumstances, we require two GPRs).

This change addresses the first reason by introducing a restricted register
class (without r0) for use by those instructions that treat r0 specially. These
register classes have a new pseudo-register, ZERO, which represents the r0-as-0
use. This has the side benefit of making the existing target code simpler (and
easier to understand), and will make it clear to the register allocator that
uses of r0 as 0 don't conflict with real uses of the r0 register.

Once the CR spilling code is improved, we'll be able to allocate r0.

Adding these extra register classes, for some reason unclear to me, causes
requests to the target to copy 32-bit registers to 64-bit registers. The
resulting code seems correct (and causes no test-suite failures), and the new
test case covers this new kind of asymmetric copy.

As r0 is still reserved, no functionality change intended.

llvm-svn: 177423
2013-03-19 18:51:05 +00:00
Hal Finkel
b4208059c6 Cleanup PPC64 unaligned i64 load/store
Remove an accidentally-added instruction definition and add a comment in the
test case. This is in response to a post-commit review by Bill Schmidt.

No functionality change intended.

llvm-svn: 177404
2013-03-19 15:23:39 +00:00
Hal Finkel
5fd6394c16 Don't reserve R31 on PPC64 unless the frame pointer is needed
llvm-svn: 177379
2013-03-19 08:09:38 +00:00
Hal Finkel
b4a799cf7e Fix a sign-extension bug in PPCCTRLoops
Don't sign extend the immediate value from the OR instruction in
an LIS/OR pair.

llvm-svn: 177361
2013-03-18 23:58:28 +00:00
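To illustrate the sign-extension fix in b4a799cf7e, here is a minimal Python sketch of how a 32-bit constant is rebuilt from an LIS/OR pair. It assumes the usual PowerPC semantics, in which lis sign-extends its 16-bit immediate before shifting and ori zero-extends its immediate. The function names are hypothetical and are not taken from PPCCTRLoops.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def lis_ori_value(hi_imm, lo_imm):
    """32-bit value built by 'lis rD, hi_imm' then 'ori rD, rD, lo_imm'."""
    high = (sign_extend16(hi_imm) << 16) & 0xFFFFFFFF
    low = lo_imm & 0xFFFF  # the OR immediate is zero-extended
    return high | low


def lis_ori_value_sign_extended(hi_imm, lo_imm):
    """Same pair, but wrongly sign-extending the OR immediate."""
    high = (sign_extend16(hi_imm) << 16) & 0xFFFFFFFF
    low = sign_extend16(lo_imm) & 0xFFFFFFFF
    return high | low
```

With a low half whose top bit is set, such as 0x8000, the two functions disagree: the sign-extended variant smears ones over the upper half written by lis.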
Hal Finkel
42f72e7756 Fix PPC unaligned 64-bit loads and stores
PPC64 supports unaligned loads and stores of 64-bit values, but
in order to use the r+i forms, the offset must be a multiple of 4.
Unfortunately, this cannot always be determined by examining the
immediate itself because it might be available only via a TOC entry.

In order to get around this issue, we additionally predicate the
selection of the r+i form on the alignment of the load or store
(forcing it to be at least 4 in order to select the r+i form).

llvm-svn: 177338
2013-03-18 23:00:58 +00:00
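The selection rule described in 42f72e7756 can be sketched as a small Python predicate. This is an illustration only: the function name is hypothetical, an offset of None stands for "only known via a TOC entry", and the signed 16-bit range check is an added assumption about the r+i displacement field rather than something stated in the commit message.

```python
def can_use_r_plus_i(offset, alignment):
    """Decide whether a 64-bit load/store may use the r+i form.

    offset:    known byte offset, or None if it cannot be inspected
    alignment: alignment in bytes recorded on the load or store
    """
    # The commit additionally requires an alignment of at least 4.
    if alignment < 4:
        return False
    # Offset hidden behind a TOC entry: rely on the alignment alone.
    if offset is None:
        return True
    # Known offset: must be a multiple of 4 and fit the displacement field.
    return offset % 4 == 0 and -0x8000 <= offset <= 0x7FFF
```

An access that fails this check would fall back to the r+r form, which has no such restriction on the offset.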
Bill Schmidt
532eac0ca2 Change test cases to handle unaligned references.
Hal Finkel recently added code to allow unaligned memory references
for PowerPC.  Two tests were temporarily modified with
-disable-ppc-unaligned to keep them from failing.  This patch adjusts
the expected code generation for the unaligned references.

llvm-svn: 177328
2013-03-18 22:12:04 +00:00
David Blaikie
928fd30ba7 Remove unnecessary leading comment characters in lit-only file
llvm-svn: 177327
2013-03-18 22:08:16 +00:00
David Blaikie
ae14af22c5 Include '.test' suffix in target specific lit configs that need it
Apparently my final cleanup to use a relevant suffix for these tests before
committing r176831 caused them to stop running since lit wasn't configured to
run tests with that suffix in those directories (why don't we just have a
global suffix list?). So, add the suffix to the relevant directories & fix the
test that has bitrotted over the last week due to my debug info schema changes.

llvm-svn: 177315
2013-03-18 20:31:44 +00:00
Hal Finkel
ad2997da12 Fix large count and negative constant count handling in PPCCTRLoops
This commit fixes an assert that would occur on loops with large constant counts
(like looping for ((uint32_t) -1) iterations on PPC64). The existing code did
not handle counts that it computed to be negative (asserting instead), but
these can be created with valid inputs.

This bug was discovered by bugpoint while I was attempting to isolate a
completely different problem.

Also, in writing test cases for the negative-count problem, I discovered that
the ori/lis handling was broken (there was a typo which caused the logic that
was supposed to detect these pairs and extract the iteration count to always
fail). This has now also been corrected (and is covered by one of the new test
cases).

llvm-svn: 177295
2013-03-18 17:40:44 +00:00
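As a rough illustration of why a count such as ((uint32_t) -1) can look negative (ad2997da12), the Python sketch below reinterprets the same bit pattern at two widths. It only shows the arithmetic; it does not describe the actual code path in PPCCTRLoops, and the helper name is hypothetical.

```python
def as_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


trip_count = 0xFFFFFFFF  # ((uint32_t) -1) iterations

# Read as a signed 32-bit quantity the count is -1,
# but as a 64-bit quantity it is a large positive number.
count_as_i32 = as_signed(trip_count, 32)
count_as_i64 = as_signed(trip_count, 64)
```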
Hal Finkel
2ab64cdbb2 Cleanup initial-value constants in PPCCTRLoops
Because the initial-value constants had not been added to the list
of instructions considered for DCE, the resulting code had redundant
constant-materialization instructions.

llvm-svn: 177294
2013-03-18 17:40:27 +00:00
Hal Finkel
54a73d3443 Improve PPC VR (Altivec) register spilling
This change cleans up two issues with Altivec register spilling:

  1. The spilling code was inefficient (using two instructions, an add and a
     load, when just one would do)

  2. The code assumed that r0 would always be available (true for now, but this
     will change)

The new code handles VR spilling just like GPR spills but forced into r+r mode.
As a result, when any VR spills are present, we must now always allocate the
register-scavenger spill slot.

llvm-svn: 177231
2013-03-17 04:43:44 +00:00
Hal Finkel
e729872345 Remove FIXMEs in PPC test cases related to unaligned loads/stores
As pointed out by Bill in response to r177160, these two FIXMEs
can also be removed.

llvm-svn: 177229
2013-03-16 23:02:31 +00:00
Hal Finkel
a5a86f0a8e Enable unaligned memory access on PPC for scalar types
Unaligned access is supported on PPC for non-vector types, and is generally
more efficient than manually expanding the loads and stores.

A few of the existing test cases were using expanded unaligned loads and stores
to test other features (like load/store with update), and for these test cases,
unaligned access remains disabled.

llvm-svn: 177160
2013-03-15 15:27:13 +00:00
Hal Finkel
2ecb85412e Protect PPC Altivec patterns with a predicate
In preparation for the addition of other SIMD ISA extensions (such as QPX) we
need to make sure that all Altivec patterns are properly predicated on having
Altivec support.

No functionality change intended (one test case needed to be updated b/c it
assumed that Altivec intrinsics would be supported without enabling Altivec
support).

llvm-svn: 177152
2013-03-15 13:21:21 +00:00
Hal Finkel
503a3723d1 Allocate the RS spill slot for any PPC function with spills and a large stack frame
For spills into a large stack frame, the FI-elimination code uses the register
scavenger to obtain a free GPR for use with an r+r-addressed load or store.
When there are no available GPRs, the scavenger gets one by using its spill
slot. Previously, we were not always allocating that spill slot and the RS
would assert when the spill slot was needed.

I don't currently have a small test that triggered the assert, but I've
created a small regression test that verifies that the spill slot is now
added when the stack frame is sufficiently large.

llvm-svn: 177140
2013-03-15 05:06:04 +00:00
Hal Finkel
37a5522734 Not all PPC functions with a frame pointer need a RS spill slot
We used to add a spill slot for the register scavenger whenever the function
has a frame pointer. This is unnecessarily conservative: We may need the spill
slot for dynamic stack allocations, and functions with dynamic stack
allocations always have a FP, but we might also have a FP for other reasons
(such as the user explicitly disabling frame-pointer elimination), and we don't
necessarily need a spill slot for those functions.

The structsinregs test needed adjustment because it disables FP elimination.

llvm-svn: 177106
2013-03-14 19:34:32 +00:00
David Blaikie
3c701e7671 Refactor filename/directory in DICompileUnit into a DIFile
This is the next step towards making the metadata for DIScopes have a common
prefix rather than having to delegate based on their tag type.

llvm-svn: 176913
2013-03-13 00:01:35 +00:00
David Blaikie
98d9ccffb8 Remove unused "isMain" field from DICompileUnit
llvm-svn: 176910
2013-03-12 22:43:04 +00:00
David Blaikie
c37a0a822a Update debug info test cases with empty SplitDebugFilename field.
This could be 'null' or the empty string, DIDescriptor::getStringField
coalesces the two cases anyway so it's just a matter of legible/efficient
representation.

The change in behavior of the DICompileUnit::get* functions could be
subsumed by the full verification check - but ideally that should just be an
assertion if we could front-load the actual debug info metadata failure paths.

llvm-svn: 176907
2013-03-12 22:25:36 +00:00
Jan Wen Voung
74d9647d18 Revert the test moves from 176733. Use "REQUIRES: asserts" instead.
llvm-svn: 176873
2013-03-12 16:27:52 +00:00
Hal Finkel
3edf100dda Don't reserve R2 on Darwin/PPC
Now that only the register-scavenger version of the CR spilling code remains,
we no longer need the Darwin R2 hack. Darwin can use R0 as a spare register in
any case where the System V ABI uses it (R0 is special architecturally, and so
is reserved under all common ABIs).

A few test cases needed to be updated to reflect the register-allocation changes.

llvm-svn: 176868
2013-03-12 15:18:14 +00:00
David Blaikie
b8d3b70835 Remove duplicate test contents.
llvm-svn: 176831
2013-03-11 22:10:14 +00:00
Benjamin Kramer
202c1b8357 Test case hygiene.
llvm-svn: 176772
2013-03-09 18:25:40 +00:00
Jan Wen Voung
2346df4d41 Disable statistics on Release builds and move tests that depend on -stats.
Summary:
Statistics are still available in Release+Asserts (any +Asserts builds),
and stats can also be turned on with LLVM_ENABLE_STATS.

Move some of the FastISel stats that were moved under DEBUG()
back out of DEBUG(), since stats are disabled across the board now.

Many tests depend on grepping "-stats" output.  Move those into
an orig_dir/Stats/ directory so that they can be marked as unsupported
when building without statistics.

Differential Revision: http://llvm-reviews.chandlerc.com/D486

llvm-svn: 176733
2013-03-08 22:56:31 +00:00
Bill Schmidt
5440b8eaca Fix PR15332 (patch by Florian Zeitz).
There's no need to generate a stack frame for PPC32 SVR4 when there are
no local variables assigned to the stack, i.e., when no red zone is needed.
(PPC64 supports a red zone, but PPC32 does not.)

llvm-svn: 176124
2013-02-26 21:28:57 +00:00
Bill Schmidt
76befd83d4 Fix PR15359.
The PowerPC TLS relocation types were not previously added to the
necessary list in MCELFStreamer::fixSymbolsInTLSFixups().  Now they are!

llvm-svn: 176094
2013-02-26 16:41:03 +00:00
Bill Schmidt
a7e4a58051 Fix missing relocation for TLS addressing peephole optimization.
Report and fix due to Kai Nacke.  Testcase update by me.

llvm-svn: 176029
2013-02-25 16:44:35 +00:00
Bill Schmidt
049ba390f5 Large code model support for PowerPC.
Large code model is identical to medium code model except that the
addis/addi sequence for "local" accesses is never used.  All accesses
use the addis/ld sequence.

The coding changes are straightforward; most of the patch is taken up
with creating variants of the medium model tests for large model.

llvm-svn: 175767
2013-02-21 17:12:27 +00:00
Bill Schmidt
0e7935e723 PPCDAGToDAGISel::PostprocessISelDAG()
This patch implements the PPCDAGToDAGISel::PostprocessISelDAG virtual
method to perform post-selection peephole optimizations on the DAG
representation.

One optimization is implemented here:  folds to clean up complex
addressing expressions for thread-local storage and medium code
model.  It will also be useful for large code model sequences when
those are added later.  I originally thought about doing this on the
MI representation prior to register assignment, but it's difficult to
do effective global dead code elimination at that point.  DCE is
trivial on the DAG representation.

A typical example of a candidate code sequence in assembly:

   addis 3, 2, globalvar@toc@ha
   addi  3, 3, globalvar@toc@l
   lwz   5, 0(3)

When the final instruction is a load or store with an immediate offset
of zero, the offset from the add-immediate can replace the zero,
provided the relocation information is carried along:

   addis 3, 2, globalvar@toc@ha
   lwz   5, globalvar@toc@l(3)

Since the addi can in general have multiple uses, we must delete the
instruction only when its last use is removed.

llvm-svn: 175697
2013-02-21 00:38:25 +00:00
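The fold described in 0e7935e723 can be mimicked on a toy value graph. The Python sketch below is a simplified model, not the PostprocessISelDAG implementation: the Node type, the value names t1/t2/t3, and the set of memory opcodes are all hypothetical, and loads and stores are reduced to a base value plus an offset.

```python
from dataclasses import dataclass, replace

MEM_OPS = {"lwz", "ld", "stw", "std"}  # toy set of memory opcodes


@dataclass
class Node:
    op: str         # opcode
    name: str       # value defined by this node
    base: str       # value used as the base/source operand
    offset: object  # integer or relocation string


def fold_addi_offsets(nodes):
    """Fold 'addi' offsets into zero-offset memory ops, then drop dead addis."""
    defs = {n.name: n for n in nodes}
    folded = []
    for n in nodes:
        d = defs.get(n.base)
        if n.op in MEM_OPS and n.offset == 0 and d is not None and d.op == "addi":
            # Use the addi's base directly and carry its relocation along.
            n = replace(n, base=d.base, offset=d.offset)
        folded.append(n)
    # An addi is deleted only once nothing uses its result any more.
    still_used = {n.base for n in folded}
    return [n for n in folded if n.op != "addi" or n.name in still_used]


example = [
    Node("addis", "t1", "toc", "globalvar@toc@ha"),
    Node("addi", "t2", "t1", "globalvar@toc@l"),
    Node("lwz", "t3", "t2", 0),
]
result = fold_addi_offsets(example)
```

If a second memory operation used t2 with a non-zero offset, the addi would keep a use and survive the final filter.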
Bill Schmidt
9e8b42e2f9 Stabilize vec_constants.ll
llvm-svn: 175683
2013-02-20 22:43:03 +00:00
Bill Schmidt
bcb4fa48fa Additional fixes for bug 15155.
This handles the cases where the 6-bit splat element is odd, converting
to a three-instruction sequence to add or subtract two splats.  With this
fix, the XFAIL in test/CodeGen/PowerPC/vec_constants.ll is removed.

llvm-svn: 175663
2013-02-20 20:41:42 +00:00
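For the odd case handled in bcb4fa48fa, one possible way to split an odd 6-bit value into two 5-bit splats is sketched below in Python. The specific split (pairing the value with a splat of -16) is an assumption chosen for illustration; the commit message only says that two splats are added or subtracted. The function name is hypothetical.

```python
def odd_splat_plan(value):
    """Return (a, op, b) with a and b in [-16, 15] such that a op b == value.

    Only odd values that fit in 6 signed bits but not in 5 are handled.
    """
    if 17 <= value <= 31 and value % 2 == 1:
        return (value - 16, "sub", -16)  # (value - 16) - (-16) == value
    if -31 <= value <= -17 and value % 2 == 1:
        return (value + 16, "add", -16)  # (value + 16) + (-16) == value
    raise ValueError("not an odd 6-bit value outside the 5-bit range")
```

Each plan corresponds to three operations: two splats of 5-bit immediates followed by one vector add or subtract.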
Bill Schmidt
358367c60f Fix bug 14779 for passing anonymous aggregates [patch by Kai Nacke].
The PPC backend doesn't handle these correctly.  This patch uses logic
similar to that in the X86 and ARM backends to track these arguments
properly.

llvm-svn: 175635
2013-02-20 17:31:41 +00:00
Bill Schmidt
93b2fc9f50 Fix PR15155: lost vadd/vsplat optimization.
During lowering of a BUILD_VECTOR, we look for opportunities to use a
vector splat.  When the splatted value fits in 5 signed bits, a single
splat does the job.  When it doesn't fit in 5 bits but does fit in 6,
and is an even value, we can splat on half the value and add the result
to itself.

This last optimization hasn't been working recently because of improved
constant folding.  To circumvent this, create a pseudo VADD_SPLAT that
can be expanded during instruction selection.

llvm-svn: 175632
2013-02-20 15:50:31 +00:00
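The case analysis in 93b2fc9f50 can be written down as a short Python sketch. The function names and the returned tags are hypothetical; only the 5-bit and even 6-bit cases from the commit message are modelled, and everything else returns None.

```python
def fits_simm5(value):
    """True if value fits in 5 signed bits."""
    return -16 <= value <= 15


def splat_strategy(value):
    """Pick a lowering for a splatted constant, following the commit text."""
    if fits_simm5(value):
        return ("splat", value)  # a single splat does the job
    if -32 <= value <= 31 and value % 2 == 0:
        # Fits in 6 signed bits and is even: splat half, add it to itself.
        return ("splat_add_self", value // 2)
    return None  # not covered by this optimization
```

For example, a splat of 16 becomes a splat of 8 added to itself, which is the pattern that improved constant folding had been undoing.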
Hal Finkel
624f5d5d67 DAGCombiner: Constant folding around pre-increment loads/stores
Previously, even when a pre-increment load or store was generated,
we often needed to keep a copy of the original base register for use
with other offsets. If all of these offsets are constants (including
the offset which was combined into the addressing mode), then this is
clearly unnecessary. This change adjusts these other offsets to use the
new incremented address.

llvm-svn: 174746
2013-02-08 21:35:47 +00:00
Benjamin Kramer
d07b68b101 Disable a couple more vector splat optimizations on PPC.
I didn't see those because the test case used "not grep". FileCheck the test and
XFAIL it, preserving the old optimization, so this can be fixed eventually.

llvm-svn: 174330
2013-02-04 15:52:32 +00:00
Benjamin Kramer
aa2475fd87 SelectionDAG: Teach FoldConstantArithmetic how to deal with vectors.
This required disabling a PowerPC optimization that did the following:
input:
x = BUILD_VECTOR <i32 16, i32 16, i32 16, i32 16>
lowered to:
tmp = BUILD_VECTOR <i32 8, i32 8, i32 8, i32 8>
x = ADD tmp, tmp

The add now gets folded immediately and we're back at the BUILD_VECTOR we
started from. I don't see a way to fix this currently so I left it disabled
for now.

Fix some trivially foldable X86 tests too.

llvm-svn: 174325
2013-02-04 15:19:18 +00:00
David Blaikie
7c3ec60da7 Remove the (apparently) unnecessary debug info metadata indirection.
The main lists of debug info metadata attached to the compile_unit had an extra
layer of metadata nodes they went through for no apparent reason. This patch
removes that (& still passes just as much of the GDB 7.5 test suite). If anyone
can show evidence as to why these extra metadata nodes are there I'm open to
reverting this patch & documenting why they're there.

llvm-svn: 174266
2013-02-02 05:56:24 +00:00
Bill Schmidt
d3beefd1a4 LLVM enablement for some older PowerPC CPUs
llvm-svn: 174230
2013-02-01 22:59:51 +00:00
Hal Finkel
32085870d7 PPC QPX requires a 32-byte aligned stack
On systems which support the QPX vector instructions, the stack must be
32-byte aligned.

llvm-svn: 173993
2013-01-30 23:43:27 +00:00
Hal Finkel
7969f87a01 Add definitions for the PPC a2q core marked as having QPX available
This is the first commit of a large series which will add support for the
QPX vector instruction set to the PowerPC backend. This instruction set is
used on the IBM Blue Gene/Q supercomputers.

llvm-svn: 173973
2013-01-30 21:17:42 +00:00
Bill Schmidt
28daa2bbf4 This patch addresses bug 15031.
The common code in the post-RA scheduler to break anti-dependencies on the
critical path contained a flaw.  In the reported case, an anti-dependency
between the overlapping registers %X4 and %R4 exists:

	%X29<def> = OR8 %X4, %X4
	%R4<def>, %X3<def,dead,tied3> = LBZU 1, %X3<kill,tied1>

The unpatched code breaks the dependency by replacing %R4 and its uses
with %R3, the first register on the available list.  However, %R3 and
%X3 overlap, so this creates two overlapping definitions on the same
instruction.

The fix is straightforward, preventing selection of a register that
overlaps any other defined register on the same instruction.

The test case is reduced from the bug report, and verifies that we no
longer produce "lbzu 3, 1(3)" when breaking this anti-dependency.

llvm-svn: 173706
2013-01-28 18:36:58 +00:00
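The fix in 28daa2bbf4 amounts to an extra filter when choosing a replacement register. A minimal Python sketch, assuming registers are written as strings such as "R3" and "X3" and that a 32-bit register overlaps the 64-bit register with the same number; the helper names are hypothetical and not taken from the scheduler.

```python
def overlaps(a, b):
    """%Rn and %Xn are the 32-bit and 64-bit views of the same GPR."""
    return a[1:] == b[1:]


def pick_replacement(available, other_defs):
    """Return the first available register that overlaps no other register
    defined by the same instruction, or None if there is no such register."""
    for candidate in available:
        if not any(overlaps(candidate, d) for d in other_defs):
            return candidate
    return None
```

In the reported case the instruction also defines %X3, so %R3 is skipped even though it is first on the available list.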
Bill Schmidt
5aab290f08 Restore reverted test case, this time with REQUIRES: asserts
llvm-svn: 172747
2013-01-17 19:46:51 +00:00
Bill Schmidt
2bbdbadcdc Remove bad test case
llvm-svn: 172746
2013-01-17 19:39:36 +00:00
Bill Schmidt
27aac7a2f3 This patch fixes PR13626 by providing i128 support in the return
calling convention.  128-bit integers are now properly returned
in GPR3 and GPR4 on PowerPC.

llvm-svn: 172745
2013-01-17 19:34:57 +00:00
Bill Schmidt
a2682ebf0e This patch fixes the PPC calling convention to handle returns of
_Complex float and _Complex long double, by simply increasing the
number of floating point registers available for return values.

The test case verifies that the correct registers are loaded.

llvm-svn: 172733
2013-01-17 17:45:19 +00:00
Bill Schmidt
ae8a966ad7 This patch addresses an incorrect transformation in the DAG combiner.
The included test case is derived from one of the GCC compatibility tests.
The problem arises after the selection DAG has been converted to type-legalized
form.  The combiner first sees a 64-bit load that can be converted into a
pre-increment form.  The original load feeds into a SRL that isolates the
upper 32 bits of the loaded doubleword.  This looks like an opportunity for
DAGCombiner::ReduceLoadWidth() to replace the 64-bit load with a 32-bit load.

However, this transformation is not valid, as the replacement load is not
a pre-increment load.  The pre-increment load produces an extra result,
which feeds a subsequent add instruction.  The replacement load only has
one result value, and this value is propagated to all uses of the pre-
increment load, including the add.  Because the add is looking for the
second result value as its operand, it ends up attempting to add a constant
to a token chain, resulting in a crash.

So the patch simply disables this transformation for any load with more than
two result values.

llvm-svn: 172480
2013-01-14 22:04:38 +00:00
Benjamin Kramer
17f2252b33 When lowering an inreg sext first shift left, then right arithmetically.
Shifting right two times will only yield zero. Should fix
SingleSource/UnitTests/SignlessTypes/factor.

llvm-svn: 172322
2013-01-12 19:06:44 +00:00
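The shift order corrected in 17f2252b33 can be checked with a small Python model of fixed-width shifts. This is a sketch of the arithmetic only; the function names and the default 32-bit width are assumptions, not code from the lowering.

```python
def sext_inreg(x, from_bits, width=32):
    """Sign-extend the low from_bits of x within a width-bit register:
    shift left first, then shift right arithmetically."""
    mask = (1 << width) - 1
    shift = width - from_bits
    shifted = (x << shift) & mask
    if shifted >> (width - 1):  # reinterpret as signed for the arithmetic shift
        shifted -= 1 << width
    return (shifted >> shift) & mask


def two_right_shifts(x, from_bits, width=32):
    """The broken variant: both shifts go right, which discards every bit
    whenever from_bits is at most half the register width."""
    mask = (1 << width) - 1
    shift = width - from_bits
    return ((x & mask) >> shift) >> shift
```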
Nadav Rotem
7a3f564b06 PPC: Implement efficient lowering of sign_extend_inreg.
llvm-svn: 172269
2013-01-11 22:57:48 +00:00
Tim Northover
978c012c2a Simplify writing floating types to assembly.
This removes previous special cases for each floating-point type in favour of a
shared codepath.

llvm-svn: 172189
2013-01-11 10:36:13 +00:00
Andrew Trick
c15e94c204 MIsched: add an ILP window property to machine model.
This was an experimental option, but needs to be defined
per-target. e.g. PPC A2 needs to aggressively hide latency.

I converted some in-order scheduling tests to A2. Hal is working on
more test cases.

llvm-svn: 171946
2013-01-09 03:36:49 +00:00
Tim Northover
337241fa6f Specify complete triple for fp128 tests.
This avoids FileCheck failing over different comment characters in
assembly (notably powerpc64 on Linux vs Darwin) and should fix David's
build-bot.

llvm-svn: 171886
2013-01-08 19:36:33 +00:00
Tim Northover
dde4cda878 Allow the asm printer to print fp128 values properly.
llvm-svn: 171866
2013-01-08 16:56:23 +00:00
Bill Schmidt
250d836387 This patch addresses bug 14678 by fixing two problems in medium code model
code generation.  Variables addressed through a GlobalAlias were not being
handled, and variables with available_externally linkage were treated
incorrectly.  The patch contains two new tests to verify the correct code
generation for these cases.

llvm-svn: 171778
2013-01-07 19:29:18 +00:00
Hal Finkel
3bc1b07a1b Support ppcf128 in SelectionDAG::getConstantFP
Fixes pr14751.

Patch by Kai; Thanks!

llvm-svn: 171261
2012-12-30 19:03:32 +00:00
Hal Finkel
62efc81644 Loosen scheduling restrictions on the PPC dcbt intrinsic
As with the prefetch intrinsic to which it maps, simply have dcbt
marked as reading from and writing to its arguments instead of having
unmodeled side effects. While this might cause unwanted code motion
(because aliasing checks don't really capture cache-line sharing),
it is more important that prefetches in unrolled loops don't block
the scheduler from rearranging the unrolled loop body.

llvm-svn: 171073
2012-12-25 18:51:18 +00:00
Hal Finkel
6b98f1baa2 Expand PPC64 atomic load and store
Use of store or load with the atomic specifier on 64-bit types would
cause instruction-selection failures. As with the 32-bit case, these
can use the default expansion in terms of cmp-and-swap.

llvm-svn: 171072
2012-12-25 17:22:53 +00:00
Rafael Espindola
70826f068e Simplify the testcase a bit.
I checked that it would still crash llc before the corresponding fix.

llvm-svn: 170709
2012-12-20 17:47:27 +00:00
Benjamin Kramer
fac7a73ad8 PowerPC: Expand VSELECT nodes.
There's probably a better expansion for those nodes than the default for
altivec, but this is better than crashing. VSELECTs occur in loop vectorizer
output.

llvm-svn: 170551
2012-12-19 15:49:14 +00:00
Hal Finkel
e689252aae Check multiple register classes for inline asm tied registers
A register can be associated with several distinct register classes.
For example, on PPC, the floating point registers are each associated with
both F4RC (which holds f32) and F8RC (which holds f64). As a result, this code
would fail when provided with a floating point register and an f64 operand
because it would happen to find the register in the F4RC class first and
return that. From the F4RC class, SDAG would extract f32 as the register
type and then assert because of the invalid implied conversion between
the f64 value and the f32 register.

Instead, search all register classes. If a register class containing the
requested register has the requested type, then return that register
class. Otherwise, as before, return the first register class found that
contains the requested register.

llvm-svn: 170436
2012-12-18 17:50:58 +00:00
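The lookup described in e689252aae can be sketched in Python as follows. The data layout (a list of name, value-type set, and register set) and the function name are hypothetical stand-ins for the target's register-class tables.

```python
def find_reg_class(reg, wanted_type, reg_classes):
    """Return the class containing reg that supports wanted_type, falling
    back to the first class containing reg, or None if no class has it."""
    first_match = None
    for name, value_types, registers in reg_classes:
        if reg not in registers:
            continue
        if wanted_type in value_types:
            return name  # a class with the requested type wins
        if first_match is None:
            first_match = name  # remember the old behaviour as a fallback
    return first_match


# Toy version of the PPC floating-point classes named in the commit.
PPC_FP_CLASSES = [
    ("F4RC", {"f32"}, {"F1", "F2"}),
    ("F8RC", {"f64"}, {"F1", "F2"}),
]
```

With this search an f64 operand tied to F1 resolves to F8RC instead of stopping at F4RC, which is the mismatch that led to the assertion.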
Bill Schmidt
ac9760fe81 This patch removes some nondeterminism from direct object file output
for TLS dynamic models on 64-bit PowerPC ELF.  The default sort routine
for relocations only sorts on the r_offset field; but with TLS, there
can be two relocations with the same r_offset.  For PowerPC, this patch
sorts secondarily on descending r_type, which matches the behavior
expected by the linker.

llvm-svn: 170237
2012-12-14 20:28:38 +00:00
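The ordering rule from ac9760fe81 is a two-level sort key. A minimal Python sketch, assuming each relocation is a (r_offset, r_type) pair; the type numbers in the example are made up for illustration and are not real PowerPC relocation codes.

```python
def sort_relocations(relocs):
    """Sort by ascending r_offset, breaking ties by descending r_type."""
    return sorted(relocs, key=lambda r: (r[0], -r[1]))


# Two relocations share offset 8, as can happen with TLS sequences.
example = [(8, 10), (4, 1), (8, 107)]
ordered = sort_relocations(example)
```

Because the key is fully determined by the two fields, the result no longer depends on the order in which the relocations were recorded.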
Bill Schmidt
c895316923 This patch improves the 64-bit PowerPC InitialExec TLS support by providing
for a wider range of GOT entries that can hold thread-relative offsets.
This matches the behavior of GCC, which was not documented in the PPC64 TLS
ABI.  The ABI will be updated with the new code sequence.

Former sequence:

  ld 9,x@got@tprel(2)
  add 9,9,x@tls

New sequence:

  addis 9,2,x@got@tprel@ha
  ld 9,x@got@tprel@l(9)
  add 9,9,x@tls

Note that a linker optimization exists to transform the new sequence into
the shorter sequence when appropriate, by replacing the addis with a nop
and modifying the base register and relocation type of the ld.

llvm-svn: 170209
2012-12-14 17:02:38 +00:00
Bill Schmidt
29d2ca4de4 The ordering of two relocations on the same instruction is apparently not
predictable when compiled on at least one non-PowerPC host.  Source of
nondeterminism not apparent.  Restrict the test to build on PowerPC hosts
for now while looking into the issue further.

llvm-svn: 170016
2012-12-12 20:29:20 +00:00
Bill Schmidt
7a93daad1a This patch implements local-dynamic TLS model support for the 64-bit
PowerPC target.  This is the last of the four models, so we now have 
full TLS support.

This is mostly a straightforward extension of the general dynamic model.
I had to use an additional Chain operand to tie ADDIS_DTPREL_HA to the
register copy following ADDI_TLSLD_L; otherwise everything above the
ADDIS_DTPREL_HA appeared dead and was removed.

As before, there are new test cases to test the assembly generation, and
the relocations output during integrated assembly.  The expected code
gen sequence can be read in test/CodeGen/PowerPC/tls-ld.ll.

There are a couple of things I think can be done more efficiently in the
overall TLS code, so there will likely be a clean-up patch forthcoming;
but for now I want to be sure the functionality is in place.

Bill

llvm-svn: 170003
2012-12-12 19:29:35 +00:00
Bill Schmidt
45b56f7632 This patch implements the general dynamic TLS model for 64-bit PowerPC.
Given a thread-local symbol x with global-dynamic access, the generated
code to obtain x's address is:

     Instruction                            Relocation            Symbol
  addis ra,r2,x@got@tlsgd@ha           R_PPC64_GOT_TLSGD16_HA       x
  addi  r3,ra,x@got@tlsgd@l            R_PPC64_GOT_TLSGD16_L        x
  bl __tls_get_addr(x@tlsgd)           R_PPC64_TLSGD                x
                                       R_PPC64_REL24           __tls_get_addr
  nop
  <use address in r3>

The implementation borrows from the medium code model work for introducing
special forms of ADDIS and ADDI into the DAG representation.  This is made
slightly more complicated by having to introduce a call to the external
function __tls_get_addr.  Using the full call machinery is overkill and,
more importantly, makes it difficult to add a special relocation.  So I've
introduced another opcode GET_TLS_ADDR to represent the function call, and
surrounded it with register copies to set up the parameter and return value.

Most of the code is pretty straightforward.  I ran into one peculiarity
when I introduced a new PPC opcode BL8_NOP_ELF_TLSGD, which is just like
BL8_NOP_ELF except that it takes another parameter to represent the symbol
("x" above) that requires a relocation on the call.  Something in the 
TblGen machinery causes BL8_NOP_ELF and BL8_NOP_ELF_TLSGD to be treated
identically during the emit phase, so this second operand was never
visited to generate relocations.  This is the reason for the slightly
messy workaround in PPCMCCodeEmitter.cpp:getDirectBrEncoding().

Two new tests are included to demonstrate correct external assembly and
correct generation of relocations using the integrated assembler.

Comments welcome!

Thanks,
Bill

llvm-svn: 169910
2012-12-11 20:30:11 +00:00
Hal Finkel
3b65689ab9 Use GetUnderlyingObjects in misched
misched used GetUnderlyingObject in order to break false load/store
dependencies, and the -enable-aa-sched-mi feature similarly relied on
GetUnderlyingObject in order to ensure it is safe to use the aliasing analysis.
Unfortunately, GetUnderlyingObject does not recurse through phi nodes, and so
(especially due to LSR) all of these mechanisms failed for
induction-variable-dependent loads and stores inside loops.

This change replaces uses of GetUnderlyingObject with GetUnderlyingObjects
(which will recurse through phi and select instructions) in misched.

Andy reviewed, tested and simplified this patch; Thanks!

llvm-svn: 169744
2012-12-10 18:49:16 +00:00
Bill Schmidt
9d8cdcda41 This patch introduces initial-exec model support for thread-local storage
on 64-bit PowerPC ELF.

The patch includes code to handle external assembly and MC output with the
integrated assembler.  It intentionally does not support the "old" JIT.

For the initial-exec TLS model, the ABI requires the following to calculate
the address of external thread-local variable x:

 Code sequence            Relocation                  Symbol
  ld 9,x@got@tprel(2)      R_PPC64_GOT_TPREL16_DS      x
  add 9,9,x@tls            R_PPC64_TLS                 x

The register 9 is arbitrary here.  The linker will replace x@got@tprel
with the offset relative to the thread pointer to the generated GOT
entry for symbol x.  It will replace x@tls with the thread-pointer
register (13).

The two test cases verify correct assembly output and relocation output
as just described.

PowerPC-specific selection node variants are added for the two
instructions above:  LD_GOT_TPREL and ADD_TLS.  These are inserted
when an initial-exec global variable is encountered by
PPCTargetLowering::LowerGlobalTLSAddress(), and later lowered to
machine instructions LDgotTPREL and ADD8TLS.  LDgotTPREL is a pseudo
that uses the same LDrs support added for medium code model's LDtocL,
with a different relocation type.

The rest of the processing is straightforward.

llvm-svn: 169281
2012-12-04 16:18:08 +00:00
Chad Rosier
05b569a7a5 test/CodeGen/PowerPC/vec_mul.ll: Add a triple. Thanks, Hal.
llvm-svn: 169026
2012-11-30 19:15:10 +00:00
Chad Rosier
bee049d9ed test/CodeGen/PowerPC/vec_mul.ll: Fix register operands.
llvm-svn: 169020
2012-11-30 18:29:01 +00:00
NAKAMURA Takumi
a95fd58fdb test/CodeGen/PowerPC: Add explicit -march=ppc32.
FIXME: Please add another RUN line if you would like to check also on ppc64.
llvm-svn: 168999
2012-11-30 13:28:31 +00:00
Adhemerval Zanella
72208bbf33 This patch fixes the Altivec addend construction for the fused multiply-add
instruction (vmaddfp) to conform to IEEE, ensuring that a zero result keeps
the correct sign when the product is -0.0.

The -0.0 vector addend to vmaddfp is generated by creating a vector
with all bits set and then shifting each element left by 31 bits,
resulting in a vector of 0x80000000 (or -0.0 as float).

The 'buildvec_canonicalize.ll' test was adjusted to reflect this change, and
'vec_mul.ll' was extended with a float vector multiplication test.

llvm-svn: 168998
2012-11-30 13:05:44 +00:00
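The bit trick and the reason for it in 72208bbf33 can be verified with a few lines of Python. This sketch models a single 32-bit element with plain integers and Python floats (assuming the default round-to-nearest mode); it is not the vector code itself, and the helper name is hypothetical.

```python
import math
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# All bits set, then shifted left by 31 within a 32-bit element.
element = (0xFFFFFFFF << 31) & 0xFFFFFFFF
neg_zero = bits_to_float(element)

# A multiply expressed as multiply-add needs an addend that keeps the sign
# of a zero product: (-0.0) + (+0.0) is +0.0, but (-0.0) + (-0.0) is -0.0.
product = -0.0
with_pos_zero_addend = product + 0.0
with_neg_zero_addend = product + neg_zero
```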
Bill Schmidt
9f4da44752 This patch makes medium code model the default for 64-bit PowerPC ELF.
When the CodeGenInfo is to be created for the PPC64 target machine,
a default code-model selection is converted to CodeModel::Medium
provided we are not targeting the Darwin OS.  Defaults for Darwin
are unaffected.

llvm-svn: 168747
2012-11-27 23:36:26 +00:00
Bill Schmidt
0975882ed4 This patch implements medium code model support for 64-bit PowerPC.
The default for 64-bit PowerPC is small code model, in which TOC entries
must be addressable using a 16-bit offset from the TOC pointer.  Additionally,
only TOC entries are addressed via the TOC pointer.

With medium code model, TOC entries and data sections can all be addressed
via the TOC pointer using a 32-bit offset.  Cooperation with the linker
allows 16-bit offsets to be used when these are sufficient, reducing the
number of extra instructions that need to be executed.  Medium code model
also does not generate explicit TOC entries in ".section toc" for variables
that are wholly internal to the compilation unit.

Consider a load of an external 4-byte integer.  With small code model, the
compiler generates:

	ld 3, .LC1@toc(2)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc ei[TC],ei

With medium model, it instead generates:

	addis 3, 2, .LC1@toc@ha
	ld 3, .LC1@toc@l(3)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc ei[TC],ei

Here .LC1@toc@ha is a relocation requesting the upper 16 bits of the
32-bit offset of ei's TOC entry from the TOC base pointer, adjusted to
compensate for the sign extension of the lower half.  Similarly,
.LC1@toc@l is a relocation requesting the lower 16 bits.  Note that if
the linker determines that ei's TOC entry is within a 16-bit offset of
the TOC base pointer, it will replace the "addis" with a "nop", and
replace the "ld" with the identical "ld" instruction from the small
code model example.
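
As a rough illustration of how the two relocations split an offset, here is
a sketch in Python.  The function names are hypothetical; the arithmetic
follows the usual definition of the @ha ("high adjusted") and @l operators,
in which the 16-bit immediates are sign-extended by the instructions that
consume them.

```python
def toc_ha(offset):
    """Upper 16 bits, adjusted for the sign of the lower half."""
    return ((offset + 0x8000) >> 16) & 0xFFFF


def toc_lo(offset):
    """Lower 16 bits of the offset."""
    return offset & 0xFFFF


def sign_extend16(value):
    """Sign-extend a 16-bit immediate field."""
    return value - 0x10000 if value & 0x8000 else value


def rebuild(offset):
    """Recombine the halves the way 'addis' followed by 'ld' would."""
    high = sign_extend16(toc_ha(offset)) << 16
    return high + sign_extend16(toc_lo(offset))


def fits_in_16_bits(offset):
    """True when a single 16-bit displacement from the TOC pointer suffices."""
    return -0x8000 <= offset <= 0x7FFF
```

When fits_in_16_bits() holds, toc_ha() is zero, so the "addis" contributes
nothing; that is the case in which the linker can turn it into a "nop".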

Consider next a load of a function-scope static integer.  For small code
model, the compiler generates:

	ld 3, .LC1@toc(2)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc test_fn_static.si[TC],test_fn_static.si
	.type	test_fn_static.si,@object
	.local	test_fn_static.si
	.comm	test_fn_static.si,4,4

For medium code model, the compiler generates:

	addis 3, 2, test_fn_static.si@toc@ha
	addi 3, 3, test_fn_static.si@toc@l
	lwz 4, 0(3)

	.type	test_fn_static.si,@object
	.local	test_fn_static.si
	.comm	test_fn_static.si,4,4

Again, the linker may replace the "addis" with a "nop", calculating only
a 16-bit offset when this is sufficient.

Note that it would be more efficient for the compiler to generate:

	addis 3, 2, test_fn_static.si@toc@ha
	lwz 4, test_fn_static.si@toc@l(3)

The current patch does not perform this optimization yet.  This will be
addressed as a peephole optimization in a later patch.

For the moment, the default code model for 64-bit PowerPC will remain the
small code model.  We plan to eventually change the default to medium code
model, which matches current upstream GCC behavior.  Note that the different
code models are ABI-compatible, so code compiled with different models will
be linked and execute correctly.

I've tested the regression suite and the application/benchmark test suite in
two ways:  Once with the patch as submitted here, and once with additional
logic to force medium code model as the default.  The tests all compile
cleanly, with one exception.  The mandel-2 application test fails due to an
unrelated ABI incompatibility in passing complex numbers.  It just so happens
that small code model was incredibly lucky, in that temporary values in
floating-point registers held the expected values needed by the external
library routine that was called incorrectly.  My current thought is to correct
the ABI problems with _Complex before making medium code model the default,
to avoid introducing this "regression."

Here are a few comments on how the patch works, since the selection code
can be difficult to follow:

The existing logic for small code model defines three pseudo-instructions:
LDtoc for most uses, LDtocJTI for jump table addresses, and LDtocCPT for
constant pool addresses.  These are expanded by SelectCodeCommon().  The
pseudo-instruction approach doesn't work for medium code model, because
we need to generate two instructions when we match the same pattern.
Instead, new logic in PPCDAGToDAGISel::Select() intercepts the TOC_ENTRY
node for medium code model, and generates an ADDIStocHA followed by either
a LDtocL or an ADDItocL.  These new node types correspond naturally to
the sequences described above.

The addis/ld sequence is generated for the following cases:
 * Jump table addresses
 * Function addresses
 * External global variables
 * Tentative definitions of global variables (common linkage)

The addis/addi sequence is generated for the following cases:
 * Constant pool entries
 * File-scope static global variables
 * Function-scope static variables
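
The two lists above amount to a simple lookup.  The following Python sketch
restates it; the string labels for the symbol kinds are hypothetical and are
not names used in the LLVM sources, while the node names are the ones
introduced by this patch.

```python
# Hypothetical labels for the kinds of addresses listed above.
ADDIS_LD_KINDS = {
    "jump_table",
    "function",
    "external_global",
    "common_global",
}
ADDIS_ADDI_KINDS = {
    "constant_pool",
    "file_static",
    "function_static",
}


def medium_model_sequence(kind):
    """Return the pair of nodes selected for a TOC_ENTRY of the given kind."""
    if kind in ADDIS_LD_KINDS:
        return ("ADDIStocHA", "LDtocL")    # addis/ld
    if kind in ADDIS_ADDI_KINDS:
        return ("ADDIStocHA", "ADDItocL")  # addis/addi
    raise ValueError("unknown symbol kind: %s" % kind)
```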

Expanding to the two-instruction sequences at select time exposes the
instructions to subsequent optimization, particularly scheduling.

The rest of the processing occurs at assembly time, in
PPCAsmPrinter::EmitInstruction.  Each of the instructions is converted to
a "real" PowerPC instruction.  When a TOC entry needs to be created, this
is done here in the same manner as for the existing LDtoc, LDtocJTI, and
LDtocCPT pseudo-instructions (I factored out a new routine to handle this).

I had originally thought that if a TOC entry was needed for LDtocL or
ADDItocL, it would already have been generated for the previous ADDIStocHA.
However, at higher optimization levels, the ADDIStocHA may appear in a 
different block, which may be assembled textually following the block
containing the LDtocL or ADDItocL.  So it is necessary to include the
possibility of creating a new TOC entry for those two instructions.

Note that for LDtocL, we generate a new form of LD called LDrs.  This
allows specifying the @toc@l relocation for the offset field of the LD
instruction (i.e., the offset is replaced by a SymbolLo relocation).
When the peephole optimization described above is added, we will need
to do similar things for all immediate-form load and store operations.

The seven "mcm-n.ll" test cases are kept separate because otherwise the
intermingling of various TOC entries and so forth makes the tests fragile
and hard to understand.

The above assumes use of an external assembler.  For use of the
integrated assembler, new relocations are added and used by
PPCELFObjectWriter.  Testing is done with "mcm-obj.ll", which tests for
proper generation of the various relocations for the same sequences
tested with the external assembler.

llvm-svn: 168708
2012-11-27 17:35:46 +00:00
Eli Bendersky
d85e96be00 Rewrite test to not use a FileCheck variable and redefine it on the same line.
In preparation for the FileCheck functionality change which will allow using
a variable later on the same line.

No functionality change.

llvm-svn: 168588
2012-11-26 14:09:46 +00:00
Benjamin Kramer
42c6896fe3 PPC: MCize most of the darwin PIC emission.
The last remaining bit is "bcl 20, 31, AnonSymbol", which I couldn't find the
instruction definition for. Only whitespace changes in assembly output.

llvm-svn: 168541
2012-11-24 13:18:25 +00:00
Andrew Trick
ab75b8798c Use a full triple for a PPC test case for asm syntax.
llvm-svn: 168283
2012-11-18 06:21:03 +00:00
Andrew Trick
d4358df73b Silence the buildbots for this test while I figure out the triple
llvm-svn: 168249
2012-11-17 03:39:26 +00:00
Andrew Trick
52f84ce773 Broaden isSchedulingBoundary to check aliases of SP.
On PPC the stack pointer is X1, but ADJCALLSTACK writes R1.

Fixes PR14315: Register regmask dependency problem with misched.

llvm-svn: 168248
2012-11-17 03:35:11 +00:00
Adhemerval Zanella
c159b16933 PowerPC: Lowering floor intrinsic for Altivec
This patch lowers the llvm.floor, llvm.ceil, llvm.trunc, and
llvm.nearbyint intrinsics to Altivec instructions for vectors of four
single-precision floats.

llvm-svn: 168086
2012-11-15 20:56:03 +00:00
Bill Schmidt
f294eb980a This patch is in preparation for adding medium code model support to the
PPC64 target.  The five tests modified herein test code generation that is
sensitive to the code model selected.  So I've added -code-model=small to
the RUN commands for each.

Since small code model is the default, this has no effect for now; but this
prepares us for eventually changing the default to medium code model for PPC64.

Test changes verified with small and medium code model as default on
powerpc64-unknown-linux-gnu.  All tests continue to pass.

llvm-svn: 167999
2012-11-14 23:23:27 +00:00
Ulrich Weigand
9c5e333c90 Do not consider a machine instruction that uses and defines the same
physical register as candidate for common subexpression elimination
in MachineCSE.

This fixes a bug on PowerPC in MultiSource/Applications/oggenc/oggenc
caused by MachineCSE invalidly merging two separate DYNALLOC insns.

llvm-svn: 167855
2012-11-13 18:40:58 +00:00
Jakob Stoklund Olesen
887571e652 Fix assertions in updateRegMaskSlots().
The RegMaskSlots contains 'r' slots while NewIdx and OldIdx are 'B'
slots. This broke the checks in the assertions.

This fixes PR14302.

llvm-svn: 167625
2012-11-09 19:18:49 +00:00
Ulrich Weigand
5e496676d0 On PowerPC64, integer return values (as well as arguments) are supposed
to be extended to a full register.   This is modeled in the IR by marking
the return value (or argument) with a signext or zeroext attribute.
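
For reference, this is what the two attributes mean for a 32-bit value held
in a 64-bit register, as a Python sketch.  The function names are
illustrative only.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def zeroext_i32(value):
    """zeroext: the upper 32 bits of the register are zero."""
    return value & MASK32


def signext_i32(value):
    """signext: the upper 32 bits of the register copy bit 31."""
    value &= MASK32
    if value & 0x80000000:
        value |= MASK64 & ~MASK32
    return value
```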

However, while these attributes are respected for function arguments,
they are currently ignored for function return values by the PowerPC
back-end.  This patch updates PPCCallingConv.td to ask for the promotion
to i64, and fixes LowerReturn and LowerCallResult to implement it.

The new test case verifies that both arguments and return values are
properly extended when passing them; and also that the optimizers
understand incoming argument and return values are in fact guaranteed
by the ABI to be extended.

The patch caused a spurious breakage in CodeGen/PowerPC/coalesce-ext.ll,
since the test case used a "ret" instruction to create a use of an i32
value at the end of the function (to set up data flow as required for
what the test is intended to test).  Since there's now an implicit
promotion to i64, that data flow no longer works as expected.  To fix
this, this patch now adds an extra "add" to ensure we have an appropriate
use of the i32 value.

llvm-svn: 167396
2012-11-05 19:39:45 +00:00
Hal Finkel
a82b79fc22 Add support for the PowerPC-specific inline asm Z constraint and y modifier.
The Z constraint specifies an r+r memory address, and the y modifier expands
to the "r, r" in the asm string. For this initial implementation, the base
register is forced to r0 (which has the special meaning of 0 for r+r addressing
on PowerPC) and the full address is taken in the second register. In the
future, this should be improved.

llvm-svn: 167388
2012-11-05 18:18:42 +00:00
Adhemerval Zanella
382ede5fd4 [PATCH] PowerPC: Expand load extend vector operations
This patch expands the SEXTLOAD, ZEXTLOAD, and EXTLOAD operations for
vector types when altivec is enabled.

llvm-svn: 167386
2012-11-05 17:15:56 +00:00
Bill Schmidt
f4c899f8e7 This patch addresses an ABI compatibility issue with empty aggregate
parameters.  Examples of these are:

  struct { } a;
  union { } b[256];
  int a[0];

An empty aggregate has an address, although dereferencing that address is
pointless.  When passed as a parameter, an empty aggregate does not consume
a protocol register, nor does it consume a doubleword in the parameter save
area.  Passing an empty aggregate by reference passes an address just as
for any other aggregate.  Returning an empty aggregate uses GPR3 as a hidden
address of the return value location, just as for any other aggregate.

The patch modifies PPCTargetLowering::LowerFormalArguments_64SVR4 and
PPCTargetLowering::LowerCall_64SVR4 to properly skip empty aggregate
parameters passed by value.  The handling of return values and by-reference
parameters was already correct.

Built on powerpc64-unknown-linux-gnu and tested with no new regressions.
A test case is included to test proper handling of empty aggregate
parameters on both sides of the function call protocol.

llvm-svn: 167090
2012-10-31 01:15:05 +00:00
Adhemerval Zanella
74fd05ff3f PowerPC: Expand FSQRT for vector types
This patch expands FSQRT for floating point vector types when altivec is
used.

llvm-svn: 167034
2012-10-30 18:29:42 +00:00
Adhemerval Zanella
ac3ba40bc2 PowerPC: More support for Altivec compare operations
This patch adds more support for vector type comparisons using altivec.
It adds correct support for v16i8, v8i16, v4i32, and v4f32 vector
types for comparison operators ==, !=, >, >=, <, and <=.

llvm-svn: 167015
2012-10-30 13:50:19 +00:00
Bill Schmidt
77a8fd274b This patch solves a problem with passing varargs parameters under the PPC64
ELF ABI.

A varargs parameter consisting of a single-precision floating-point value,
or of a single-element aggregate containing a single-precision floating-point
value, must be passed in the low-order (rightmost) four bytes of the
doubleword stack slot reserved for that parameter.  If there are GPR protocol
registers remaining, the parameter must also be mirrored in the low-order
four bytes of the reserved GPR.

Prior to this patch, such parameters were being passed in the high-order
four bytes of the stack slot and the mirrored GPR.
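
The difference between the two layouts can be pictured with a short Python
sketch of a big-endian doubleword slot.  The helper names are made up, and
the unused half of the slot is modeled as zero bytes purely for
illustration.

```python
import struct


def slot_low_order(value):
    """Single-precision value in the low-order (rightmost) four bytes."""
    return b"\x00" * 4 + struct.pack(">f", value)


def slot_high_order(value):
    """Layout used before the patch: value in the high-order four bytes."""
    return struct.pack(">f", value) + b"\x00" * 4


correct = slot_low_order(1.0)
before_patch = slot_high_order(1.0)
```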

The patch adds a new test case to verify the correct code generation.

llvm-svn: 166968
2012-10-29 21:18:16 +00:00
Ulrich Weigand
445bd73056 In various places throughout the code generator, there were special
checks to avoid performing compile-time arithmetic on PPCDoubleDouble.

Now that APFloat supports arithmetic on PPCDoubleDouble, those checks
are no longer needed, and we can treat the type like any other.

llvm-svn: 166958
2012-10-29 18:35:49 +00:00
Ulrich Weigand
2daab9e4b4 Allow i32/i64 for 'f' constraint on PowerPC.
This fixes PR12757.

llvm-svn: 166943
2012-10-29 17:49:34 +00:00
Bill Schmidt
2682b73f6d This patch adds alignment information for long double to the 64-bit PowerPC
ELF subtarget.

The existing logic is used as a fallback to avoid any changes to the Darwin
ABI.  PPC64 ELF now has two possible data layout strings: one for FreeBSD,
which requires 8-byte alignment, and a default string that requires
16-byte alignment.

I've added a test for PPC64 Linux to verify the 16-byte alignment.  If
somebody wants to add a separate test for FreeBSD, that would be great.

Note that there is a companion patch to update the alignment information
in Clang, which I am committing now as well.

llvm-svn: 166928
2012-10-29 14:59:36 +00:00
Bill Schmidt
71c462aff2 This patch addresses a PPC64 ELF issue with passing parameters consisting of
structs having size 3, 5, 6, or 7.  Such a struct must be passed and received
as right-justified within its register or memory slot.  The problem is only
present for structs that are passed in registers.

Previously, as part of a patch handling all structs of size less than 8, I
added logic to rotate the incoming register so that the struct was left-
justified prior to storing the whole register.  This was incorrect because
the address of the parameter had already been adjusted earlier to point to
the right-adjusted value in the storage slot.  Essentially I had accidentally
accounted for the right-adjustment twice.

In this patch, I removed the incorrect logic and reorganized the code to make
the flow clearer.

The removal of the rotates changes the expected code generation, so test case
structsinregs.ll has been modified to reflect this.  I also added a new test
case, jaggedstructs.ll, to demonstrate that structs of these sizes can now
be properly received and passed.

I've built and tested the code on powerpc64-unknown-linux-gnu with no new
regressions.  I also ran the GCC compatibility test suite and verified that
earlier problems with these structs are now resolved, with no new regressions.

llvm-svn: 166680
2012-10-25 13:38:09 +00:00
Ulrich Weigand
2248bca601 This patch fixes failures in the SingleSource/Regression/C/uint64_to_float
test case on PowerPC caused by rounding errors when converting from a 64-bit
integer to a single-precision floating point value.  The cause is
double rounding: on PowerPC we have to convert to an
intermediate double-precision value first, which then gets rounded to the
final single-precision result.

The patch fixes the problem by preparing the 64-bit integer so that the
first conversion step to double-precision will always be exact, and the
final rounding step will result in the correctly-rounded single-precision
result.  The generated code sequence is equivalent to what GCC would generate.

When -enable-unsafe-fp-math is in effect, that extra effort is omitted
and we accept possible rounding errors (just like GCC does as well).
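
A concrete input makes the double-rounding problem visible.  The Python
sketch below is not the code sequence the patch generates; prepare() is one
assumed way of making the first conversion exact, by folding the low 11
bits of a large value into a sticky bit, and all function names are
illustrative.

```python
import struct


def double_to_single(x):
    """Round a double to single precision and return it as an integer."""
    return int(struct.unpack(">f", struct.pack(">f", x))[0])


def via_double(n):
    """Convert integer -> double -> single, rounding at both steps."""
    return double_to_single(float(n))


def correctly_rounded_single(n):
    """Reference: round n directly to 24 significant bits, ties to even."""
    shift = n.bit_length() - 24
    if shift <= 0:
        return n
    quotient = n >> shift
    remainder = n & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and quotient & 1):
        quotient += 1
    return quotient << shift


def prepare(n):
    """Assumed sticky-bit step: make n exactly representable as a double."""
    if n < (1 << 53) or n & 0x7FF == 0:
        return n
    return (n | 0x800) & ~0x7FF


# Just above the midpoint between two adjacent single-precision values.
n = 2**63 + 2**39 + 1
naive = via_double(n)
exact = correctly_rounded_single(n)
fixed = via_double(prepare(n))
```

Here the first rounding drops the trailing 1 and lands exactly on the
midpoint, so the second rounding goes to the even neighbour 2**63 instead
of up to 2**63 + 2**40.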

llvm-svn: 166178
2012-10-18 13:16:11 +00:00
Bill Schmidt
ad04de0c32 This patch addresses PR13949.
For the PowerPC 64-bit ELF Linux ABI, aggregates of size less than 8
bytes are to be passed in the low-order bits ("right-adjusted") of the
doubleword register or memory slot assigned to them.  A previous patch
addressed this for aggregates passed in registers.  However, small
aggregates passed in the overflow portion of the parameter save area are
still being passed left-adjusted.
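
As a picture of the two placements, here is a Python sketch of an 8-byte
big-endian slot.  The helper names are hypothetical and the padding is
shown as zero bytes only for illustration.

```python
def right_adjust(data):
    """Place a 1- to 7-byte aggregate in the low-order bytes of its slot."""
    if not 1 <= len(data) <= 7:
        raise ValueError("expected an aggregate of 1 to 7 bytes")
    return data.rjust(8, b"\x00")


def left_adjust(data):
    """Placement used before the fix: aggregate in the high-order bytes."""
    if not 1 <= len(data) <= 7:
        raise ValueError("expected an aggregate of 1 to 7 bytes")
    return data.ljust(8, b"\x00")


def offset_in_slot(size):
    """Byte offset of a right-adjusted aggregate within its doubleword."""
    return 8 - size
```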

The fix is made in PPCTargetLowering::LowerCall_Darwin_Or_64SVR4 on the
caller side, and in PPCTargetLowering::LowerFormalArguments_64SVR4 on
the callee side.  The main fix on the callee side simply extends
existing logic for 1- and 2-byte objects to 1- through 7-byte objects,
and corrects a constant left over from 32-bit code.  There is also a
fix to a bogus calculation of the offset to the following argument in
the parameter save area.

On the caller side, again a constant left over from 32-bit code is
fixed.  Additionally, some code for 1, 2, and 4-byte objects is
duplicated to handle the 3, 5, 6, and 7-byte objects for SVR4 only.  The
LowerCall_Darwin_Or_64SVR4 logic is getting fairly convoluted trying to
handle both ABIs, and I propose to separate this into two functions in a
future patch, at which time the duplication can be removed.

The patch adds a new test (structsinmem.ll) to demonstrate correct
passing of structures of all seven sizes.  Eight dummy parameters are
used to force these structures to be in the overflow portion of the
parameter save area.

As a side effect, this corrects the case when aggregates passed in
registers are saved into the first eight doublewords of the parameter
save area:  Previously they were stored left-justified, and now are
properly stored right-justified.  This requires changing the expected
output of existing test case structsinregs.ll.

llvm-svn: 166022
2012-10-16 13:30:53 +00:00
NAKAMURA Takumi
1a69f3cfb5 llvm/test/CodeGen/PowerPC/2012-10-12-bitcast.ll: Try to fix failure on non-ppc hosts by adding -mattr=+altivec.
llvm-svn: 165803
2012-10-12 16:01:08 +00:00
Ulrich Weigand
dd9a6100a0 Fix big-endian codegen bug in DAGTypeLegalizer::ExpandRes_BITCAST
On PowerPC, a bitcast of <16 x i8> to i128 may run through a code
path in ExpandRes_BITCAST that attempts to do an intermediate
bitcast to a <4 x i32> vector, and then construct the Hi and Lo parts
of the resulting i128 by pairing up two of those i32 vector elements
each.  The code already recognizes that on a big-endian system, the
first two vector elements form the Hi part, and the final two vector
elements form the Lo part (vice-versa from the little-endian situation).

However, we also need to take endianness into account when forming each
of those separate pairs:  on a big-endian system, vector element 0 is
the *high* part of the pair making up the Hi part of the result, and
vector element 1 is the low part of the pair.  The code currently always
uses vector element 0 as the low part and vector element 1 as the high
part, as is appropriate for little-endian platforms only.

This patch fixes this by swapping the vector elements as they are
paired up as appropriate.
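
The pairing issue can be modeled in a few lines of Python.  This is a sketch
of the arithmetic only, not of the legalizer code; the function names are
made up.

```python
def split_words(value128):
    """Elements 0..3 of the <4 x i32> view of a big-endian 128-bit value."""
    raw = value128.to_bytes(16, "big")
    return [int.from_bytes(raw[i:i + 4], "big") for i in range(0, 16, 4)]


def pair_big_endian(elems):
    """Element 0 is the high part of Hi, element 2 the high part of Lo."""
    hi = (elems[0] << 32) | elems[1]
    lo = (elems[2] << 32) | elems[3]
    return (hi << 64) | lo


def pair_little_endian_style(elems):
    """Right halves chosen, but each pair built with element 0 as low part."""
    hi = (elems[1] << 32) | elems[0]
    lo = (elems[3] << 32) | elems[2]
    return (hi << 64) | lo


value = 0x00112233445566778899AABBCCDDEEFF
elems = split_words(value)
```

Only pair_big_endian() reproduces the original value; the other pairing
swaps the two words inside each 64-bit half.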

llvm-svn: 165802
2012-10-12 15:42:58 +00:00
Bill Schmidt
3b8ee801af This patch addresses PR13947.
For function calls on the 64-bit PowerPC SVR4 target, each parameter
is mapped to as many doublewords in the parameter save area as
necessary to hold the parameter.  The first 13 non-varargs
floating-point values are passed in registers; any additional
floating-point parameters are passed in the parameter save area.  A
single-precision floating-point parameter (32 bits) must be mapped to
the second (rightmost, low-order) word of its assigned doubleword
slot.

Currently LLVM violates this ABI requirement by mapping such a
parameter to the first (leftmost, high-order) word of its assigned
doubleword slot.  This is internally self-consistent but will not
interoperate correctly with libraries compiled with an ABI-compliant
compiler.

This patch corrects the problem by adjusting the parameter addressing
on both sides of the calling convention.

llvm-svn: 165714
2012-10-11 15:38:20 +00:00
Bill Schmidt
6cac0197ea Add -mattr=+altivec and remove XFAIL.
llvm-svn: 165666
2012-10-10 22:25:11 +00:00
Bill Schmidt
227cfed3b5 XFAIL for all targets pending investigation
llvm-svn: 165664
2012-10-10 21:52:10 +00:00
Bill Schmidt
57e3d38632 When generating spill and reload code for vector registers on PowerPC,
the compiler makes use of GPR0.  However, there are two flavors of
GPR0 defined by the target:  the 32-bit GPR0 (R0) and the 64-bit GPR0
(X0).  The spill/reload code makes use of R0 regardless of whether we
are generating 32- or 64-bit code.

This patch corrects the problem in the obvious manner, using X0 and
ADDI8 for 64-bit and R0 and ADDI for 32-bit.

llvm-svn: 165658
2012-10-10 21:25:01 +00:00
Bill Schmidt
5f0844eeb4 The PowerPC VRSAVE register has been somewhat of an odd beast since
the Altivec extensions were introduced.  Its use is optional, and
allows the compiler to communicate to the operating system which
vector registers should be saved and restored during a context switch.
In practice, this information is ignored by the various operating
systems using the SVR4 ABI; the kernel saves and restores the entire
register state.  Setting the VRSAVE register is no longer performed by
the AIX XL compilers, the IBM i compilers, or by GCC on Power Linux
systems.  It seems best to avoid this logic within LLVM as well.

This patch avoids generating code to update and restore VRSAVE for the
PowerPC SVR4 ABIs (32- and 64-bit).  The code remains in place for the
Darwin ABI.

llvm-svn: 165656
2012-10-10 20:54:15 +00:00
Adhemerval Zanella
91fa3a3479 PR12716: PPC crashes on vector compare
Vector compares using the altivec 'vcmpxxx' instructions take a vector
register as their third argument instead of a CR one, unlike integer and
floating-point compares. This leads to a failure in code generation, where
'SelectSETCC' expects a DAG with a CR register and gets a vector register
instead.

This patch changes the behavior by just returning a DAG with the
vector compare instruction based on the type. The patch also adds a testcase
for all vector types llvm defines.

It also includes a fix for the printing of signed 5-bit predicates, where
signed values were not handled correctly as signed (char is unsigned by
default on PowerPC). Without the fix, the 'vspltisw' (vector splat)
instruction is generated with SIM out of range.

llvm-svn: 165419
2012-10-08 18:59:53 +00:00
Adhemerval Zanella
490909f7d1 Add floating-point to and from integer conversion
This patch adds altivec support for v4i32 to v4f32 and for v4f32 to
v4i32 vector rounding conversions.

llvm-svn: 165409
2012-10-08 17:27:24 +00:00
Rafael Espindola
2ebde0e0fb Convert to unix line endings.
llvm-svn: 165308
2012-10-05 13:32:38 +00:00
Roman Divacky
87f9c41b1c Specify MachinePointerInfo as referring to the argument value and offset of the
store when handling byval arguments, thus preventing the post-RA scheduler
from reordering the store with a load.

llvm-svn: 164553
2012-09-24 20:47:19 +00:00
Roman Divacky
3f44c24bfd Specify cpu to get the correct instruction ordering. Remove XFAIL.
llvm-svn: 164306
2012-09-20 14:59:42 +00:00
Jordan Rose
8051e46a49 Really XFAIL test/CodeGen/PowerPC/structsinregs.ll.
XFAIL needs a trailing colon. Hopefully this will get the buildbots
happy again while Bill works on getting it passing.

llvm-svn: 164237
2012-09-19 17:03:11 +00:00
Bill Schmidt
10c15bfd85 XFAIL test/CodeGen/PowerPC/structsinregs.ll
llvm-svn: 164233
2012-09-19 16:18:23 +00:00
Bill Schmidt
4e7e64ff70 Small structs for PPC64 SVR4 must be passed right-justified in registers.
lib/Target/PowerPC/PPCISelLowering.{h,cpp}
 Rename LowerFormalArguments_Darwin to LowerFormalArguments_Darwin_Or_64SVR4.
 Rename LowerFormalArguments_SVR4 to LowerFormalArguments_32SVR4.
 Receive small structs right-justified in LowerFormalArguments_Darwin_Or_64SVR4.
 Rename LowerCall_Darwin to LowerCall_Darwin_Or_64SVR4.
 Rename LowerCall_SVR4 to LowerCall_32SVR4.
 Pass small structs right-justified in LowerCall_Darwin_Or_64SVR4.

test/CodeGen/PowerPC/structsinregs.ll
 New test.

llvm-svn: 164228
2012-09-19 15:42:13 +00:00
Roman Divacky
1cc7e2c795 Add test for r164155 and remove two tests superseded by ppc64-calls.ll.
llvm-svn: 164162
2012-09-18 19:51:44 +00:00
Roman Divacky
bb7740900c Avoid symbol name clash when filling TOC.
Patch by Adhemerval Zanella.

llvm-svn: 164141
2012-09-18 17:10:37 +00:00
Roman Divacky
377f342a56 On PPC64 emit the environment pointer. Patch by Adhemerval Zanella.
llvm-svn: 164139
2012-09-18 16:55:29 +00:00
Roman Divacky
953cd43dfa Optimize local func calls to not emit nop for TOC restoration.
Patch by Adhemerval Zanella.

llvm-svn: 164138
2012-09-18 16:47:58 +00:00
Roman Divacky
3d302860e6 This patch corrects logic in PPCFrameLowering for save and restore of
nonvolatile condition register fields across calls under the SVR4 ABIs.

 * With the 64-bit ABI, the save location is at a fixed offset of 8 from
the stack pointer.  The frame pointer cannot be used to access this
portion of the stack frame since the distance from the frame pointer may
change with alloca calls.

 * With the 32-bit ABI, the save location is just below the general
register save area, and is accessed via the frame pointer like the rest
of the save areas.  This is an optional slot, so it must only be created
if any of CR2, CR3, and CR4 were modified.

 * For both ABIs, save/restore logic is generated only if one of the
nonvolatile CR fields was modified.

I also took this opportunity to clean up an extra FIXME in
PPCFrameLowering.h.  Save area offsets for 32-bit GPRs are meaningless
for the 64-bit ABI, so I removed them for correctness and efficiency.


Fixes PR13708 and partially also PR13623. It lets us enable exception handling
on PPC64.

Patch by William J. Schmidt!

llvm-svn: 163713
2012-09-12 14:47:47 +00:00
Jakob Stoklund Olesen
c89c722370 Allow overlaps between virtreg and physreg live ranges.
The RegisterCoalescer understands overlapping live ranges where one
register is defined as a copy of the other. With this change, register
allocators using LiveRegMatrix can do the same, at least for copies
between physical and virtual registers.

When a physreg is defined by a copy from a virtreg, allow those live
ranges to overlap:

  %CL<def> = COPY %vreg11:sub_8bit; GR32_ABCD:%vreg11
  %vreg13<def,tied1> = SAR32rCL %vreg13<tied0>, %CL<imp-use,kill>

We can assign %vreg11 to %ECX, overlapping the live range of %CL.

llvm-svn: 163336
2012-09-06 18:15:23 +00:00
Jakob Stoklund Olesen
ef5dcf47b8 Move tie checks into MachineVerifier::visitMachineOperand.
llvm-svn: 163152
2012-09-04 18:38:28 +00:00
Hal Finkel
b356af14b1 Reserve space for the mandatory traceback fields on PPC64.
We need to reserve space for the mandatory traceback fields,
though leaving them as zero is appropriate for now.

Although the ABI calls for these fields to be filled in fully, no
compiler on Linux currently does this, and GDB does not read these
fields.  GDB uses the first word of zeroes during exception handling to
find the end of the function and the size field, allowing it to compute
the beginning of the function.  DWARF information is used for everything
else.  We need the extra 8 bytes of pad so the size field is found in
the right place.

As a comparison, GCC fills in a few of the fields -- language, number
of saved registers -- but ignores the rest.  IBM's proprietary OSes do
make use of the full traceback table facility.

Patch by Bill Schmidt.

llvm-svn: 162854
2012-08-29 20:22:24 +00:00
Roman Divacky
7c3f29735a Emit a word of zeroes after the last instruction as the start of the mandatory
traceback table on PowerPC64. This helps gdb handle exceptions. The other
mandatory fields are ignored by gdb and harder to implement, so just add
a FIXME there.

Patch by Bill Schmidt. PR13641.

llvm-svn: 162778
2012-08-28 19:06:55 +00:00
Hal Finkel
0673920af6 Add PPC Freescale e500mc and e5500 subtargets.
Add subtargets for Freescale e500mc (32-bit) and e5500 (64-bit) to
the PowerPC backend.

Patch by Tobias von Koch.

llvm-svn: 162764
2012-08-28 16:12:39 +00:00
Hal Finkel
367c494415 Allow remat of LI on PPC.
Allow load-immediates to be rematerialised in the register coalescer for
PPC. This makes test/CodeGen/PowerPC/big-endian-formal-args.ll fail,
because it relies on a register move getting emitted. The immediate load is
equivalent, so change this test case.

Patch by Tobias von Koch.

llvm-svn: 162727
2012-08-28 02:10:33 +00:00
Hal Finkel
d28587407f Eliminate redundant CR moves on PPC32.
The 32-bit ABI requires CR bit 6 to be set if the call has fp arguments and
unset if it doesn't. The solution up to now was to insert a MachineNode to
set/unset the CR bit, which produces a CR vreg. This vreg was then copied
into CR bit 6. When the register allocator saw a bunch of these in the same
function, it allocated the set/unset CR bit in some random CR register (1
extra instruction) and then emitted CR moves before every vararg function
call, rather than just setting and unsetting CR bit 6 directly before every
vararg function call. This patch instead inserts a PPCcrset/PPCcrunset
instruction which are then matched by a dedicated instruction pattern.

Patch by Tobias von Koch.

llvm-svn: 162725
2012-08-28 02:10:27 +00:00
Hal Finkel
caa4701e37 Optimize zext on PPC64.
The zeroextend IR instruction is lowered to an 'and' node with an immediate
mask operand, which in turn gets legalised to a sequence of ori's & ands.
This can be done more efficiently using the rldicl instruction.
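
For readers unfamiliar with rldicl (rotate left doubleword immediate, then
clear left), here is a Python model of its effect.  The commit does not
state the operands it uses; a rotate of 0 with the top 32 bits cleared is
an assumption about the i32 to i64 case, and the function names are
illustrative.

```python
MASK64 = (1 << 64) - 1


def rotl64(value, amount):
    """Rotate a 64-bit value left by the given number of bits."""
    amount %= 64
    value &= MASK64
    return ((value << amount) | (value >> (64 - amount))) & MASK64


def rldicl(value, sh, mb):
    """Rotate left by sh, then clear the mb high-order bits."""
    return rotl64(value, sh) & (MASK64 >> mb)


def zext_i32_to_i64(value):
    """Assumed form of the zero extension: no rotate, clear the top 32 bits."""
    return rldicl(value, 0, 32)
```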

Patch by Tobias von Koch.

llvm-svn: 162724
2012-08-28 02:10:15 +00:00
Roman Divacky
eab620e38c Lower constant pools and jump tables via TOC on PPC64/SVR4.
In collaboration with Adhemerval Zanella.

llvm-svn: 162562
2012-08-24 16:26:02 +00:00
Nadav Rotem
eb22b069bb During the CodeGenPrepare we often lower intrinsics (such as objsize)
and allow some optimizations to turn conditional branches into unconditional ones.
This commit adds a simple control-flow optimization which merges two consecutive
basic blocks which are connected by a single edge. This allows the codegen to
operate on larger basic blocks.

rdar://11973998

llvm-svn: 161852
2012-08-14 05:19:07 +00:00
Hal Finkel
15265edebe MFTB on PPC64 should really be encoded using MFSPR.
The MFTB instruction itself is being phased out, and its functionality
is provided by MFSPR. According to the ISA docs, using MFSPR works on all known
chips except for the 601 (which did not have a timebase register anyway)
and the POWER3.

Thanks to Adhemerval Zanella for pointing this out!

llvm-svn: 161346
2012-08-06 21:21:44 +00:00
Hal Finkel
aadd19de06 Add readcyclecounter lowering on PPC64.
On PPC64, this can be done with a simple TableGen pattern.
To enable this, I've added the (otherwise missing) readcyclecounter
SDNode definition to TargetSelectionDAG.td.

llvm-svn: 161302
2012-08-04 14:10:46 +00:00
Bob Wilson
9f6e25017a Refactor and check "onlyReadsMemory" before optimizing builtins.
This patch is mostly just refactoring a bunch of copy-and-pasted code, but
it also adds a check that the call instructions are readnone or readonly.
That check was already present for sin, cos, sqrt, log2, and exp2 calls, but
it was missing for the rest of the builtins being handled in this code.

llvm-svn: 161282
2012-08-03 23:29:17 +00:00
Chandler Carruth
5d3a0ce4e5 Fix the remaining TCL-style quotes found in the testsuite. This is
another mechanical change accomplished through the power of terrible Perl
scripts.

I have manually switched some "s to 's to make escaping simpler.

While I started this to fix tests that aren't run in all configurations,
the massive number of tests is due to a really frustrating fragility of
our testing infrastructure: things like 'grep -v', 'not grep', and
'expected failures' can mask broken tests all too easily.

Essentially, I'm deeply disturbed that I can change the testsuite so
radically without causing any change in results for most platforms. =/

llvm-svn: 159547
2012-07-02 19:09:46 +00:00
Chandler Carruth
d200829a4f Convert the uses of '|&' to use '2>&1 |' instead, which works on old
versions of Bash. In addition, I can back out the change to the lit
built-in shell test runner to support this.
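A minimal sketch of this kind of rewrite is shown below. It is not the script used for the commit; convert_pipe is a hypothetical helper, and the regular expression deliberately ignores corner cases such as '|&' appearing inside quoted strings.

```python
import re

# Matches the '|&' operator together with any surrounding whitespace.
PIPE_BOTH = re.compile(r"\s*\|&\s*")

def convert_pipe(line):
    """Replace '|&' with the portable '2>&1 |' spelling."""
    return PIPE_BOTH.sub(" 2>&1 | ", line)
```

For example, convert_pipe("; RUN: not llvm-as < %s |& grep error") yields a RUN line that redirects stderr explicitly before the pipe.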

This should fix the majority of fallout on Darwin, but I suspect there
will be a few straggling issues.

llvm-svn: 159544
2012-07-02 18:37:59 +00:00
Chandler Carruth
8a358b3669 Convert all tests using TCL-style quoting to use shell-style quoting.
This was done through the aid of a terrible Perl creation. I will not
paste any of the horrors here. Suffice to say, it required multiple
staged rounds of replacements, state carried between, and a few
nested-construct-parsing hacks that I'm not proud of. It happens, by
luck, to be able to deal with all the TCL-quoting patterns in evidence
in the LLVM test suite.

If anyone is maintaining large out-of-tree test trees, feel free to poke
me and I'll send you the steps I used to convert things, as well as
answer any painful questions etc. IRC works best for this type of thing
I find.

Once converted, switch the LLVM lit config to use ShTests the same as
Clang. In addition to being able to delete large amounts of Python code
from 'lit', this will also simplify the entire test suite and some of
lit's architecture.

Finally, the test suite runs 33% faster on Linux now. ;]
For my 16-hardware-thread (2x 4-core xeon e5520): 36s -> 24s

llvm-svn: 159525
2012-07-02 12:47:22 +00:00
Hal Finkel
ebe9ea8bd7 Add support for the PPC isel instruction.
The isel (integer select) instruction is supported on the 440 and A2
embedded cores and on the POWER7.

llvm-svn: 159045
2012-06-22 23:10:08 +00:00
Lang Hames
7d298105e5 Rename fp-op fusion option (yet again) for compatibility with GCC option.
llvm-svn: 159042
2012-06-22 22:31:00 +00:00
Lang Hames
68cf87e3ef Rename -allow-excess-fp-precision flag to -fuse-fp-ops, and switch from a
boolean flag to an enum: { Fast, Standard, Strict } (default = Standard).

This option controls the creation by optimizations of fused FP ops that store
intermediate results in higher precision than IEEE allows (E.g. FMAs). The
behavior of this option is intended to match the behaviour specified by a
soon-to-be-introduced frontend flag: '-ffuse-fp-ops'.

Fast mode - allows formation of fused FP ops whenever they're profitable.

Standard mode - allow fusion only for 'blessed' FP ops. At present the only
blessed op is the fmuladd intrinsic. In the future more blessed ops may be
added.

Strict mode - allow fusion only if/when it can be proven that the excess
precision won't affect the result.

Note: This option only controls formation of fused ops by the optimizers.  Fused
operations that are explicitly requested (e.g. FMA via the llvm.fma.* intrinsic)
will always be honored, regardless of the value of this option.
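To make "excess precision" concrete, the sketch below compares a multiply followed by an add (two roundings) with a fused multiply-add (one rounding). The fused result is emulated with exact rational arithmetic, which is an assumption of this sketch and not how a backend computes it; the function names and input values are chosen purely for illustration.

```python
from fractions import Fraction

def separate_multiply_add(a, b, c):
    """a*b is rounded to double, then the sum is rounded again."""
    return a * b + c

def fused_multiply_add(a, b, c):
    """Emulated FMA: compute a*b+c exactly, then round once to double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

unfused = separate_multiply_add(a, b, c)   # the product rounds to 1.0 first
fused = fused_multiply_add(a, b, c)        # keeps the -2**-54 term
```

Here the exact product is 1 - 2**-54, which rounds to 1.0 as a double, so the unfused form returns 0.0 while the fused form returns -2**-54. This is the kind of observable difference the Strict mode is meant to rule out.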

Internally TargetOptions::AllowExcessFPPrecision has been replaced by
TargetOptions::AllowFPOpFusion.

llvm-svn: 158956
2012-06-22 01:09:09 +00:00
Hal Finkel
bc9be7c0e5 Treat TargetGlobalAddress as a constant for the purpose of matching pre-inc stores on PPC.
Thanks to Tobias von Koch for pointing out this problem.

llvm-svn: 158932
2012-06-21 20:10:48 +00:00
Hal Finkel
a94da28a6d Add support for generating reg+reg (indexed) pre-inc loads on PPC.
llvm-svn: 158823
2012-06-20 15:43:03 +00:00
Lang Hames
f0b9601a6d Add DAG-combines for aggressive FMA formation.
This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or
FSUB + FMUL. The combines are performed when:
(a) Either
      AllowExcessFPPrecision option (-enable-excess-fp-precision for llc)
        OR
      UnsafeFPMath option (-enable-unsafe-fp-math)
    are set, and
(b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of
    the FADD/FSUB, and
(c) The FMUL only has one user (the FADD/FSUB).

If your target has fast FMA instructions you can make use of these combines by
overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for
types supported by your FMA instruction, and adding patterns to match ISD::FMA
to your FMA instructions.
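The three conditions above can be summarised as a single predicate. This is a sketch in Python with hypothetical parameter names; the real checks live in the DAG combiner and operate on SelectionDAG nodes, not on plain booleans.

```python
def should_form_fma(allow_excess_fp_precision, unsafe_fp_math,
                    fma_faster_than_mul_and_add, fmul_use_count):
    """Return True when an FADD/FSUB + FMUL pair may be combined into an FMA.

    (a) one of the two options is set,
    (b) the target reports FMA as faster for the type, and
    (c) the FMUL has exactly one user (the FADD/FSUB).
    """
    option_enabled = allow_excess_fp_precision or unsafe_fp_math
    return (option_enabled
            and fma_faster_than_mul_and_add
            and fmul_use_count == 1)
```

Condition (c) matters because an FMUL with other users still has to be computed separately, so fusing it would add work rather than remove it.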

llvm-svn: 158757
2012-06-19 22:51:23 +00:00
Jakob Stoklund Olesen
0a9edb38d3 Add a triple.
The test was failing on Linux because of asm syntax differences.

llvm-svn: 158748
2012-06-19 21:46:25 +00:00
Jakob Stoklund Olesen
66e7517610 Implement PPCInstrInfo::isCoalescableExtInstr().
The PPC::EXTSW instruction preserves the low 32 bits of its input, just
like some of the x86 instructions. Use it to reduce register pressure
when the low 32 bits have multiple uses.
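The property being relied on can be checked with a small model of the instruction. The extsw function below is an illustrative Python model operating on integers, assumed for this sketch; it is not code from the patch.

```python
def extsw(value):
    """Model of EXTSW: sign-extend the low 32 bits of a 64-bit value."""
    low = value & 0xFFFFFFFF
    if low & 0x80000000:
        # Replicate the sign bit into the upper 32 bits.
        low |= 0xFFFFFFFF00000000
    return low

def low32(value):
    """Extract the low 32 bits of a value."""
    return value & 0xFFFFFFFF
```

Because low32(extsw(x)) equals low32(x) for every input, a later use of the low 32 bits can read the extended register instead of keeping the original value live.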

This requires a small change to PeepholeOptimizer since EXTSW takes a
64-bit input register.

This is related to PR5997.

llvm-svn: 158743
2012-06-19 21:14:34 +00:00
Hal Finkel
42b797225a Add support for generating reg+reg preinc stores on PPC.
PPC will now generate STWUX and friends.

llvm-svn: 158698
2012-06-19 02:34:32 +00:00
Hal Finkel
40483bafbf Cleanup trip-count finding for PPC CTR loops (and some bug fixes).
This cleans up the method used to find trip counts in order to form CTR loops on PPC.
This refactoring allows the pass to find loops which have a constant trip count but also
happen to end with a comparison to zero. This also adds explicit FIXMEs to mark two different
classes of loops that are currently ignored.

In addition, we now search through all potential induction operations instead of just the first.
Also, we check the predicate code on the conditional branch and abort the transformation if the
code is not EQ or NE, and we then make sure that the branch to be transformed matches the
condition register defined by the comparison (multiple possible comparisons will be considered).

llvm-svn: 158607
2012-06-16 20:34:07 +00:00
Hal Finkel
b6ac451381 Enable ILP scheduling for all nodes by default on PPC.
Over the entire test-suite, this has an insignificantly negative average
performance impact, but reduces some of the worst slowdowns from the
anti-dep. change (r158294).

Largest speedups:
SingleSource/Benchmarks/Stanford/Quicksort - 28%
SingleSource/Benchmarks/Stanford/Towers - 24%
SingleSource/Benchmarks/Shootout-C++/matrix - 23%
MultiSource/Benchmarks/SciMark2-C/scimark2 - 19%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15%
(matrix and automotive-bitcount were both in the top-5 slowdown list from the
anti-dep. change)

Largest slowdowns:
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26%
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21%
SingleSource/Benchmarks/CoyoteBench/lpbench - 20%
MultiSource/Applications/d/make_dparser - 16%

llvm-svn: 158296
2012-06-10 19:32:29 +00:00
Hal Finkel
a9b329fcf1 Improve ext/trunc patterns on PPC64.
The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that
would leave self-moves in the final assembly. Replacing those patterns with ones
based on the SUBREG builtins yields better-looking code.

Thanks to Jakob and Owen for their suggestions in this matter.

llvm-svn: 158283
2012-06-09 22:10:19 +00:00
Hal Finkel
d2d71dd821 Enable tail merging on PPC.
Tail merging had been disabled on PPC because it would disturb bundling decisions
made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions
are made during post-RA scheduling, and tail merging is generally beneficial (the
average test-suite speedup is insignificantly positive).

Largest test-suite speedups:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23%
SingleSource/Benchmarks/Shootout-C++/ary - 21%
SingleSource/Benchmarks/Stanford/Queens - 17%

Largest slowdowns:
MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22%
MultiSource/Applications/JM/ldecod/ldecod - 14%
MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9%

This is improved by using full (instead of just critical) anti-dependency breaking,
but doing so still causes miscompiles and so cannot yet be enabled by default.

llvm-svn: 158259
2012-06-09 03:14:50 +00:00
Jakob Stoklund Olesen
ce0f9aef12 Don't run RAFast in the optimizing regalloc pipeline.
The fast register allocator is not supposed to work in the optimizing
pipeline. It doesn't make sense to compute live intervals, run full copy
coalescing, and then run RAFast.

Fast register allocation in the optimizing pipeline is better done by
RABasic.

llvm-svn: 158242
2012-06-08 23:15:12 +00:00
Hal Finkel
1424f01791 Enable PPC CTR loop formation by default.
Thanks to Jakob's help, this now causes no new test suite failures!

Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
SingleSource/Benchmarks/Misc/pi - 108%
SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
SingleSource/Benchmarks/Shootout/ary3 - 32%
SingleSource/Benchmarks/Shootout-C++/matrix - 30%

The largest slowdowns are:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
MultiSource/Applications/d/make_dparser - -14%
SingleSource/Benchmarks/Shootout-C++/ary - -13%

In light of these slowdowns, additional profiling work is obviously needed!

llvm-svn: 158223
2012-06-08 19:19:53 +00:00
Hal Finkel
d05ff520b8 Disable the PPC CTR-Loops pass by default.
The pass itself works well, but something in the Machine* infrastructure
does not understand terminators which define registers. Without the ability
to use the block-placement pass, etc. this causes performance regressions (and
so is turned off by default). Turning off the analysis turns off the problems
with the Machine* infrastructure.

llvm-svn: 158206
2012-06-08 15:38:25 +00:00
Hal Finkel
a6629c556e Fix a bug in the new PPC CTR-Loops pass.
The code which tests for an induction operation cannot assume that any
ADDI instruction will have a register operand because the operand could
also be a frame index; for example:
    %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16

llvm-svn: 158205
2012-06-08 15:38:23 +00:00
Hal Finkel
bb4e499e94 Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form CTR-based loop branching code.
This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, invalid preheader DebugLoc is not used.

llvm-svn: 158204
2012-06-08 15:38:21 +00:00
Roman Divacky
0daa2c0556 Implement local-exec TLS on PowerPC.
llvm-svn: 157935
2012-06-04 17:36:38 +00:00
Hal Finkel
c1fe73fae2 Enable generating PPC pre-increment (r+imm) instructions by default.
It seems that this no longer causes test suite failures on PPC64 (after r157159),
and often gives a performance benefit, so it can be enabled by default.

llvm-svn: 157911
2012-06-04 02:21:00 +00:00
Hal Finkel
9fad4cf803 Add a missing PPC 64-bit stwu pattern.
This seems to fix the remaining compile-time failures on PPC64 when
compiling with -enable-ppc-preinc.

llvm-svn: 157159
2012-05-20 17:11:24 +00:00
Jakob Stoklund Olesen
b3487aa334 Remove -join-physregs from the test suite.
This option has been disabled for a while, and it is going away so I can
clean up the coalescer code.

The tests that required physreg joining to be enabled were almost all of
the form "tiny function with interference between arguments and return
value". Such functions are usually inlined in the real world.

The problem exposed by phys_subreg_coalesce-3.ll is real, but fairly
rare.

llvm-svn: 157027
2012-05-17 23:44:19 +00:00
Hal Finkel
457fbe481c Remove dead SD nodes after the combining pass. Fixes PR12201.
llvm-svn: 154786
2012-04-16 03:33:22 +00:00
Hal Finkel
1c045f6845 Enable prefetch generation on PPC64.
llvm-svn: 153851
2012-04-01 20:08:17 +00:00
Hal Finkel
fd26145bc6 Add instruction itinerary for the PPC64 A2 core.
This adds a full itinerary for IBM's PPC64 A2 embedded core. These
cores form the basis for the CPUs in the new IBM BG/Q supercomputer.

llvm-svn: 153842
2012-04-01 19:22:40 +00:00
Eli Bendersky
3ef88c1833 Continue cleanup of LIT, getting rid of the remaining artifacts from dejagnu
* Removed test/lib/llvm.exp - it is no longer needed 
* Deleted the dg.exp reading code from test/lit.cfg. There are no dg.exp files
  left in the test suite so this code is no longer required. test/lit.cfg is
  now much shorter and clearer 
* Removed a lot of duplicate code in lit.local.cfg files that need access to
  the root configuration, by adding a "root" attribute to the TestingConfig
  object. This attribute is dynamically computed to provide the same
  information as was previously provided by the custom getRoot functions. 
* Documented the config.root attribute in docs/CommandGuide/lit.pod

llvm-svn: 153408
2012-03-25 09:02:19 +00:00
Hal Finkel
30d4df9f6d Fix small-integer VAARG on SVR4 ABI PPC64.
The PPC64 SVR4 ABI requires integer stack arguments, and thus the var. args., that
are smaller than 64 bits be zero extended to 64 bits.

llvm-svn: 153373
2012-03-24 03:53:55 +00:00
Roman Divacky
588712f080 Test the section specification.
llvm-svn: 151552
2012-02-27 20:42:19 +00:00
Roman Divacky
200acf8e6e Reapply r151278 with fixes.
MCize function entry label emission on PowerPC64 properly.

llvm-svn: 151547
2012-02-27 20:20:47 +00:00
Hal Finkel
3aea686faa Revert r151278, breaks static linking.
Reverting this because it breaks static linking on ppc64. Specifically, it may be linkonce_odr functions that are the problem.
With this patch, if you link statically, calls to some functions end up calling their descriptor addresses instead
of calling to their entry points. This causes the execution to fail with SIGILL (b/c the descriptor address just
has some pointers, not code).

llvm-svn: 151433
2012-02-25 03:40:11 +00:00
Hal Finkel
784c4bf068 X11/X2 loads around indirect calls on ppc64 should not be deleted.
llvm-svn: 151374
2012-02-24 17:54:01 +00:00
Hal Finkel
8c2c90c035 Don't crash when a glue node contains an internal CopyToReg
This is necessary to support the existing ppc lowering code for indirect calls.
Fixes PR12071.

llvm-svn: 151373
2012-02-24 17:53:59 +00:00
Roman Divacky
35c45da372 MCize function entry label emission on PowerPC64 properly.
llvm-svn: 151278
2012-02-23 20:28:39 +00:00
Hal Finkel
cfc8c850f6 Allow the use of an alternate symbol for calculating a function's size.
The standard function epilog includes a .size directive, but ppc64 uses
an alternate local symbol to tag the actual start of each function.

Until recently, binutils accepted the .size directive as:
 .size	test1, .Ltmp0-test1
however, using this directive with recent binutils will result in the error:
 .size expression for XXX does not evaluate to a constant
so we must use the label which actually tags the start of the function.

llvm-svn: 151200
2012-02-22 21:11:47 +00:00
Jakob Stoklund Olesen
4404c980b2 Remove a bad PowerPC test.
This test case was way too strict, matching the entire assembly output.
Every non-trivial change to the ppc backend  or -O0 pipeline required
the test to be updated.

It should be replaced with a test of the specific vaarg feature.

llvm-svn: 151105
2012-02-21 23:49:18 +00:00
Eli Bendersky
4afdeeb682 Replace all instances of dg.exp file with lit.local.cfg, since all tests are run with LIT now and not Dejagnu. dg.exp is no longer needed.
Patch reviewed by Daniel Dunbar. It will be followed by additional cleanup patches.

llvm-svn: 150664
2012-02-16 06:28:33 +00:00
Hal Finkel
0c67e8f4d9 AggressiveAntiDepBreaker needs to skip debug values because a debug value does not have a corresponding SUnit
llvm-svn: 148260
2012-01-16 22:53:41 +00:00
Hal Finkel
4a09216dfb Cleanup stack/frame register define/kill states. This fixes two bugs:
1. The ST*UX instructions that store and update the stack pointer did not set define/kill on R1. This became a problem when I activated post-RA scheduling (and had incorrectly adjusted the Frames-large test).

2. eliminateFrameIndex did not kill its scavenged temporary register, and this could cause the scavenger to exhaust all available registers (and its emergency spill slot) when there were a lot of CR values to spill. The 2010-02-12-saveCR test has been adjusted to check for this.

llvm-svn: 147359
2011-12-30 00:34:00 +00:00
Hal Finkel
e8220d9927 Add a test case to make sure that the nop really does follow the bl on ppc64 elf
llvm-svn: 146666
2011-12-15 17:59:23 +00:00
Chandler Carruth
2bedf185c9 Manually upgrade the test suite to specify the flag to cttz and ctlz.
I followed three heuristics for deciding whether to set 'true' or
'false':

- Everything target independent got 'true' as that is the expected
  common output of the GCC builtins.
- If the target arch only has one way of implementing this operation,
  set the flag in the way that exercises the most of codegen. For most
  architectures this is also the likely path from a GCC builtin, with
  'true' being set. It will (eventually) require lowering away that
  difference, and then lowering to the architecture's operation.
- Otherwise, set the flag differently depending on which target
  operation should be tested.

Let me know if anyone has any issue with this pattern or would like
specific tests of another form. This should allow the x86 codegen to
just iteratively improve as I teach the backend how to differentiate
between the two forms, and everything else should remain exactly the
same.

llvm-svn: 146370
2011-12-12 11:59:10 +00:00
Hal Finkel
d591c94df7 Make CR spill and restore use a reserved register. These operations cannot use the register scavenger because the scavenger can only scavenge one register and frame-index elimination may have already grabbed it.
llvm-svn: 146318
2011-12-10 04:50:53 +00:00
Eli Friedman
8f3db3867c Fix a couple of logic bugs in TargetLowering::SimplifyDemandedBits. PR11514.
llvm-svn: 146219
2011-12-09 01:16:26 +00:00
Hal Finkel
a76ada827b delaying restore-cr changed assigned registers in some tests
llvm-svn: 145963
2011-12-06 20:55:46 +00:00
Hal Finkel
7d78f1a8a4 add a test case that uses RESTORE_CR
llvm-svn: 145962
2011-12-06 20:55:41 +00:00
Hal Finkel
c8d6ce5e09 Add test case - this input used to crash because of duplicate generation of SPILL_CRs
llvm-svn: 145820
2011-12-05 17:55:22 +00:00
Hal Finkel
8b1e460cd9 enable PPC register scavenging by default (update tests and remove some FIXMEs)
llvm-svn: 145819
2011-12-05 17:55:17 +00:00
Hal Finkel
68e102ed41 remove wasted space for extra bit copies of CR2 subregs
llvm-svn: 145817
2011-12-05 17:55:06 +00:00
Hal Finkel
a5b78f0e58 specify cpu for test to fix failure on some darwin systems with a g4+ cpu
llvm-svn: 145699
2011-12-02 19:38:17 +00:00
Hal Finkel
2984a1dfcb adjust the instruction ordering in some PPC tests: changes due to postRA haz. rec.
llvm-svn: 145678
2011-12-02 04:58:12 +00:00
Chris Lattner
9d1e8420ff Upgrade syntax of tests using volatile instructions to use 'load volatile' instead of 'volatile load', which is archaic.
llvm-svn: 145171
2011-11-27 06:54:59 +00:00
Hal Finkel
77cfe064a7 add basic PPC register-pressure feedback; adjust the vaarg test to match the new register-allocation pattern
llvm-svn: 145065
2011-11-22 16:21:04 +00:00
NAKAMURA Takumi
78a0f170d6 test/CodeGen/PowerPC/2008-10-17-AsmMatchingOperands.ll: [PR11218] Mark "REQUIRES: asserts" for now.
llvm-svn: 143247
2011-10-28 23:11:03 +00:00
Dan Gohman
6e1bd851dc Change the default scheduler from Latency to ILP, since Latency
is going away.

llvm-svn: 142810
2011-10-24 17:45:02 +00:00
Hal Finkel
d65adcde2d use FileCheck and not grep in new tests
llvm-svn: 142189
2011-10-17 16:01:41 +00:00
Hal Finkel
8be5b30fa8 Test case for CanLowerReturn fix (r141981)
llvm-svn: 142172
2011-10-17 04:03:59 +00:00
Hal Finkel
b128cda81b Add PPC 440 scheduler and some associated tests (new files)
llvm-svn: 142171
2011-10-17 04:03:55 +00:00
Eli Friedman
6aaaadc188 Convert more tests over to the new atomic instructions.
I did not convert Atomics-32.ll and Atomics-64.ll by hand; the diff is autoupgrade output.

The wmb test is gone because there isn't any way to express wmb with the new atomic instructions; if someone really needs a non-asm way to write a wmb on Alpha, a platform-specific intrinsic could be added.

llvm-svn: 140566
2011-09-26 21:30:17 +00:00
Devang Patel
75c70b2315 Remove ancient debug info constructs from test cases, they are not relevant to test case's main objective.
llvm-svn: 139675
2011-09-14 00:29:50 +00:00
Duncan Sands
6939ae53ac Split the init.trampoline intrinsic, which currently combines GCC's
init.trampoline and adjust.trampoline intrinsics, into two intrinsics
like in GCC.  While having one combined intrinsic is tempting, it is
not natural because typically the trampoline initialization needs to
be done in one function, and the result of adjust trampoline is needed
in a different (nested) function.  To get around this llvm-gcc hacks the
nested function lowering code to insert an additional parent variable
holding the adjust.trampoline result that can be accessed from the child
function.  Dragonegg doesn't have the luxury of tweaking GCC code, so it
stored the result of adjust.trampoline in the memory GCC set aside for
the trampoline itself (this is always available in the child function),
and set up some new memory (using an alloca) to hold the trampoline.
Unfortunately this breaks Go which allocates trampoline memory on the
heap and wants to use it even after the parent has exited (!).  Rather
than doing even more hacks to get Go working, it seemed best to just use
two intrinsics like in GCC.  Patch mostly by Sanjoy Das.

llvm-svn: 139140
2011-09-06 13:37:06 +00:00
Bill Wendling
722de8a9aa Update more tests to the new EH scheme.
llvm-svn: 138894
2011-08-31 21:04:11 +00:00
Roman Divacky
7ac1bc57f7 Set CR1EQ only when lowering vararg floating arguments (not any vararg
arguments as before), unset CR1EQ otherwise.

llvm-svn: 138802
2011-08-30 17:04:16 +00:00
Evan Cheng
380dc98371 Add MCObjectFileInfo and sink the MCSections initialization code from
TargetLoweringObjectFileImpl down to MCObjectFileInfo.

TargetAsmInfo is down to one last method. It's *almost* gone!

llvm-svn: 135569
2011-07-20 05:58:47 +00:00
Eli Friedman
887bb0b25a FileCheck-ize a couple tests.
llvm-svn: 135427
2011-07-18 21:23:42 +00:00
NAKAMURA Takumi
183ec41f4a test/CodeGen/PowerPC/vector.ll: Tweak redirection >%t >%t to >%t >>%t. See also r134814 (test/CodeGen/X86/vector.ll).
llvm-svn: 134900
2011-07-11 16:21:52 +00:00
Roman Divacky
736e37d9b9 Implement ISD::VAARG lowering on PPC32.
llvm-svn: 134005
2011-06-28 15:30:42 +00:00
Roman Divacky
79578394f5 Don't apply on PPC64 the 32bit ADDIC optimizations as there's no overflow
with 32bit values.

llvm-svn: 133439
2011-06-20 15:28:39 +00:00
Chris Lattner
ad5400fa72 rip out a ton of intrinsic modernization logic from AutoUpgrade.cpp, which is
for pre-2.9 bitcode files.  We keep x86 unaligned loads, movnt, crc32, and the
target indep prefetch change.

As usual, updating the testsuite is a PITA.

llvm-svn: 133337
2011-06-18 06:05:24 +00:00
Roman Divacky
6778c94b24 Fix a few places where 32bit instructions/registerset were used on PPC64.
llvm-svn: 133260
2011-06-17 15:21:10 +00:00
Chris Lattner
9e7c036d09 remove parser support for the obsolete "multiple return values" syntax, which
was replaced with return of a "first class aggregate".

llvm-svn: 133245
2011-06-17 06:49:41 +00:00
Chris Lattner
4eb6f76fa6 Remove support for using "foo" as symbols instead of %"foo". This is ancient
syntax and has been long obsolete.  As usual, updating the tests is the nasty
part of this.

llvm-svn: 133242
2011-06-17 06:36:20 +00:00
Chris Lattner
9ec82f54d4 manually upgrade a bunch of tests to modern syntax, and remove some that
are either unreduced or only test old syntax.

llvm-svn: 133228
2011-06-17 03:14:27 +00:00
Roman Divacky
3624922127 Fix wrong usages of CTR/MTCTR where CTR8/MTCTR8 was meant.
- Check for MTCTR8 in addition to MTCTR when looking up a hazard.

- When lowering an indirect call use CTR8 when targeting 64bit.

- Introduce BCTR8 that uses CTR8 and use it on 64bit when expanding ISD::BRIND.

The last change fixes PR8487. With those changes, we are able to compile a
running "ls" and "sh" on FreeBSD/PowerPC64.

llvm-svn: 132552
2011-06-03 15:47:49 +00:00
Jakob Stoklund Olesen
b85b71f4de FileCheckize and break dependence on coalescing order.
llvm-svn: 130856
2011-05-04 19:02:01 +00:00
Jakob Stoklund Olesen
8a075ce7ea Explicitly request -join-physregs for some tests that depend on it.
llvm-svn: 130855
2011-05-04 19:01:59 +00:00
Rafael Espindola
339ecf7100 Add 130690 back.
llvm-svn: 130693
2011-05-02 15:58:16 +00:00
Rafael Espindola
b5ce7c77ac Revert while I debug the tests that use march but not mtriple.
llvm-svn: 130691
2011-05-02 15:42:31 +00:00
Rafael Espindola
80af9a69e8 Move ppc OS X to cfi too. I am building it on an old ppc mini, but it will take some time.
llvm-svn: 130690
2011-05-02 15:00:52 +00:00
Rafael Espindola
d49e7769a7 Add r130623 back now that ELF has been fixed to work with -fno-dwarf2-cfi-asm.
llvm-svn: 130658
2011-05-01 15:44:13 +00:00
Rafael Espindola
886aa563be Revert the previous patch while I figure out how to make llvm-gcc
less aggressive about disabling cfi on linux :-(

llvm-svn: 130626
2011-04-30 23:03:44 +00:00
Rafael Espindola
9455887b10 Enable CFI on OS X.
Currently the output should be almost identical to the one produced by CodeGen
to make the transition easier.

The only two differences I know of are:

* Some files get an extra advance loc of size 0. This will be fixed when
relaxations are enabled.
* The optimization of declaring an EH symbol as an external variable is not
implemented. This is a subset of adding the nounwind attribute, so if we really
want this at -O0 we should probably do it at the IL level.

llvm-svn: 130623
2011-04-30 22:29:54 +00:00
Jakob Stoklund Olesen
a0e0f8d74b These tests no longer require linear scan because reserved register coalescing is now universal.
llvm-svn: 128936
2011-04-05 21:40:41 +00:00
Jakob Stoklund Olesen
3d3cee403f Disable the PowerPC/Atomics-64 test.
The code inserted by PPCTargetLowering::EmitInstrWithCustomInserter for ppc64 is
wrong, and I don't know how to fix it. It seems to be using the correct register
classes for pointers, but it inserts all 32-bit instructions.

llvm-svn: 128835
2011-04-04 17:57:26 +00:00
Jakob Stoklund Olesen
57a62da2db Fix PowerPC tests to be register allocator independent.
llvm-svn: 128827
2011-04-04 17:07:03 +00:00
Benjamin Kramer
8313cf1cf4 Fix mistyped CHECK lines.
llvm-svn: 127366
2011-03-09 22:07:31 +00:00
Joerg Sonnenberger
5f2f5fa638 Be nice to Xcore and the XMOS assembler and avoid quoting section names
that contain only letters, digits and the characters "_" and ".".
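The rule described here can be sketched as a small predicate. The function names are hypothetical and the sketch only models the character test stated in the message; it does not reproduce the actual asm printer code or any escaping of the quoted name.

```python
import string

# Characters that may appear in an unquoted section name.
SAFE_CHARS = set(string.ascii_letters + string.digits + "_.")

def needs_quoting(name):
    """True if the name contains any character outside letters, digits, '_' and '.'."""
    return any(ch not in SAFE_CHARS for ch in name)

def format_section_name(name):
    """Quote the section name only when it is required."""
    return f'"{name}"' if needs_quoting(name) else name
```

Common names such as .text or .data.rel.ro pass through unquoted, which keeps the output acceptable to assemblers that do not understand quoted section names.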

llvm-svn: 127028
2011-03-04 20:03:14 +00:00
Joerg Sonnenberger
bb93506f95 Bug#9033: For the ELF assembler output, always quote the section name.
llvm-svn: 126963
2011-03-03 22:31:08 +00:00
Anton Korobeynikov
f49c9c02d6 Restore the behavior of frame lowering before my refactoring.
It turns out that ppc backend has really weird interdependencies
over different hooks and all stuff is fragile wrt small changes.
This should fix PR8749

llvm-svn: 122155
2010-12-18 19:53:14 +00:00
Devang Patel
6fe7fe8dd4 If dbg_declare() or dbg_value() is not lowered by isel then emit DEBUG message instead of creating DBG_VALUE for undefined value in reg0.
llvm-svn: 121059
2010-12-06 22:39:26 +00:00
Chris Lattner
3e135493e9 remove a pointless testcase.
llvm-svn: 119119
2010-11-15 05:07:03 +00:00
Chris Lattner
7743db8f20 remove some extraneous quotes to make the new instprinter match.
llvm-svn: 119104
2010-11-15 02:43:46 +00:00
Chris Lattner
8cb5c07514 add some nounwind's.
llvm-svn: 119086
2010-11-14 22:22:14 +00:00
John Thompson
4255425219 Inline asm mult-alt constraint tests.
llvm-svn: 118107
2010-11-02 23:01:44 +00:00
Jakob Stoklund Olesen
a0a5015a35 PowerPC varargs functions store live-in registers on the stack. Make sure we use
virtual registers for those stores since RegAllocFast requires that each live
physreg only be used once.

This fixes PR8357.

llvm-svn: 116222
2010-10-11 20:43:09 +00:00
Chris Lattner
307552613d force a triple, varargs isn't supported with the SVR4 ABI the buildbot tells me.
llvm-svn: 116170
2010-10-10 18:59:01 +00:00
Chris Lattner
2b428a0ab8 fix the expansion of va_arg instruction on PPC to know the arg
alignment for PPC32/64, avoiding some masking operations.

llvm-gcc expands vaarg inline instead of using the instruction
so it has never hit this.

llvm-svn: 116168
2010-10-10 18:34:00 +00:00
Chris Lattner
0ebcc18dec the latest assembler that runs on powerpc 10.4 machines doesn't
support aligned comm.  Detect when compiling for 10.4 and don't
emit an alignment for comm.  This will hopefully fix PR8198.

llvm-svn: 114817
2010-09-27 06:44:54 +00:00
Eli Friedman
40cb7d9994 PR7781: Fix incorrect shifting in PPCTargetLowering::LowerBUILD_VECTOR.
llvm-svn: 109998
2010-08-02 00:18:19 +00:00
Bill Wendling
85d6ed81b7 Consider this function:
void foo() { __builtin_unreachable(); }

It will output the following on Darwin X86:

_func1:
Leh_func_begin0:
        pushq %rbp
Ltmp0:
        movq %rsp, %rbp
Ltmp1:
Leh_func_end0:

This prolog adds a new Call Frame Information (CFI) row to the FDE with an
address that is not within the address range of the code it describes -- part is
equal to the end of the function -- and therefore results in an invalid EH
frame. If we emit a nop in this situation, then the CFI row is now within the
address range.

llvm-svn: 108568
2010-07-16 22:51:10 +00:00
Bill Wendling
756b0a4d45 Revert. This isn't the correct way to go.
llvm-svn: 108478
2010-07-15 23:42:21 +00:00
Bill Wendling
991234752d Handle code gen for the unreachable instruction if it's the only instruction in
the function. We'll just turn it into a "trap" instruction instead.

The problem with not handling this is that it might generate a prologue without
the equivalent epilogue to go with it:

$ cat t.ll
define void @foo() {
entry:
  unreachable
}
$ llc -o - t.ll -relocation-model=pic -disable-fp-elim -unwind-tables
        .section        __TEXT,__text,regular,pure_instructions
        .globl  _foo
        .align  4, 0x90
_foo:                                   ## @foo
Leh_func_begin0:
## BB#0:                                ## %entry
        pushq   %rbp
Ltmp0:
        movq    %rsp, %rbp
Ltmp1:
Leh_func_end0:
...

The unwind tables then have bad data in them causing all sorts of problems.

Fixes <rdar://problem/8096481>.

llvm-svn: 108473
2010-07-15 23:32:40 +00:00
Eric Christopher
e873e9978c Fix up -fstack-protector on linux to use the segment
registers.  Split out testcases per architecture and os
now.

Patch from Nelson Elhage.

llvm-svn: 107640
2010-07-06 05:18:56 +00:00
Bill Wendling
90b6422f2f Implement the "linker_private_weak" linkage type. This will be used for
Objective-C metadata types which should be marked as "weak", but which the
linker will remove upon final linkage. However, this linkage isn't specific to
Objective-C.

For example, the "objc_msgSend_fixup_alloc" symbol is defined like this:

      .globl l_objc_msgSend_fixup_alloc
      .weak_definition l_objc_msgSend_fixup_alloc
      .section __DATA, __objc_msgrefs, coalesced
      .align 3
l_objc_msgSend_fixup_alloc:
       .quad   _objc_msgSend_fixup
       .quad   L_OBJC_METH_VAR_NAME_1

This is different from the "linker_private" linkage type, because it can't have
the metadata defined with ".weak_definition".

Currently only supported on Darwin platforms.

llvm-svn: 107433
2010-07-01 21:55:59 +00:00
Dan Gohman
d79ac4a097 Eliminate the other half of the BRCOND optimization, and update
as many tests as possible.

llvm-svn: 106749
2010-06-24 15:24:03 +00:00
Dan Gohman
3285057a9d Eliminate the first half of the optimization which eliminates BRCOND
when the condition is constant. This optimization shouldn't be
necessary, because codegen shouldn't be able to find dead control
paths that the IR-level optimizer can't find. And it's undesirable,
because it encourages bugpoint to leave "br i1 false" branches
in its output. And it wasn't updating the CFG.

I updated all the tests I could, but some tests are too reduced
and I wasn't able to meaningfully preserve them.

llvm-svn: 106748
2010-06-24 15:04:11 +00:00
Jakob Stoklund Olesen
7fe0620525 Remove the local register allocator.
Please use the fast allocator instead.

llvm-svn: 106051
2010-06-15 21:58:33 +00:00
Evan Cheng
849bca1ab6 Fix some latency computation bugs: if the use is not a machine opcode do not just return zero.
llvm-svn: 105061
2010-05-28 23:26:21 +00:00
Jakob Stoklund Olesen
40545bf117 Only use clairvoyance when defining a register, and then only if it has one use.
This makes allocation independent of the ordering of use-def chains.

llvm-svn: 103935
2010-05-17 04:50:57 +00:00
Jakob Stoklund Olesen
d99818256c Take allocation hints from copy instructions to/from physregs.
This causes way more identity copies to be generated, ripe for coalescing.

llvm-svn: 103686
2010-05-13 00:19:43 +00:00
Jakob Stoklund Olesen
6976c543cd Enable a bunch more -regalloc=fast tests
llvm-svn: 103531
2010-05-12 00:11:24 +00:00
Dale Johannesen
b10ca6bf4c Implement builtin_return_address(x) and builtin_frame_address(x)
on PPC for x!=0.  7624113.

llvm-svn: 102972
2010-05-03 22:59:34 +00:00
Duncan Sands
153ad3b903 Remove the -enable-sjlj-eh option, which doesn't do anything.
Remove the -enable-eh option which is only used by the JIT,
and replace it with -jit-enable-eh.

llvm-svn: 102865
2010-05-02 15:36:26 +00:00
Chris Lattner
9292bad5f5 on darwin empty functions need to codegen into something of non-zero length,
otherwise labels get incorrectly merged.  We handled this by emitting a 
".byte 0", but this isn't correct on thumb/arm targets where the text segment
needs to be a multiple of 2/4 bytes.  Handle this by emitting a noop.  This
is more gross than it should be because arm/ppc are not fully mc'ized yet.

This fixes rdar://7908505

llvm-svn: 102400
2010-04-26 23:37:21 +00:00
Chris Lattner
b66b0c36cd Bill's change in r95336 broke empty aggregates embedded
in other types.  fix this by only bumping zero-byte globals
up to a single byte if the *entire global* is zero size,
fixing PR6340.

This also fixes empty arrays etc to be handled correctly,
and only does this on subsection-via-symbols targets (aka
darwin) which is the only place where this matters.

llvm-svn: 101879
2010-04-20 06:20:21 +00:00
Dan Gohman
5736cd1e47 Start function numbering at 0.
llvm-svn: 101638
2010-04-17 16:29:15 +00:00
Chris Lattner
23334439e9 add newlines at the end of files.
llvm-svn: 100705
2010-04-07 22:53:17 +00:00
Dale Johannesen
4cdb545401 Split big test into multiple directories to cater to
those who don't build all targets.

llvm-svn: 100688
2010-04-07 20:43:35 +00:00
Evan Cheng
921fc2c77b After trivial coalescing, the MI being visited may have become a copy. Avoid adding it to CSE hash table since copies aren't being considered for CSE and they may be deleted.
rdar://7819990

llvm-svn: 100170
2010-04-02 02:21:24 +00:00
Chris Lattner
1a838000ec add some nounwinds
llvm-svn: 99752
2010-03-28 07:58:37 +00:00
Chris Lattner
de8b42ce67 this takes an insane amount of time to run, disable it for now (PR6727)
llvm-svn: 99751
2010-03-28 07:58:09 +00:00
Duncan Sands
217cec1786 Turn calls to copysignl into an FCOPYSIGN node. Handle FCOPYSIGN nodes
with ppc_f128 type by having the type legalizer turn these back into a
call to copysignl.

llvm-svn: 98514
2010-03-14 21:08:40 +00:00
Chris Lattner
2bdb0765f8 fix AsmPrinter::GetBlockAddressSymbol to always return a unique
label instead of trying to form one based on the BB name (which
causes collisions if the name is empty).  This fixes PR6608

llvm-svn: 98495
2010-03-14 17:53:23 +00:00
Chris Lattner
9331acc6d7 get MMI out of the label uniquing business, just go to MCContext
to get unique assembler temporary labels.

llvm-svn: 98489
2010-03-14 08:36:50 +00:00
Evan Cheng
668ceddeec Enable machine cse pass.
llvm-svn: 98132
2010-03-10 03:07:41 +00:00
Dale Johannesen
4b8f3692f4 The address of an indirect call must be in R12 on Darwin.
Make it so.  (This patch is in LowerCall_Darwin, which seems
to be used by SVR4 code as well; since that doesn't belong here,
I haven't worried about this case.)

llvm-svn: 98077
2010-03-09 20:15:42 +00:00
Chris Lattner
74db1864da add some random nounwinds.
llvm-svn: 97411
2010-02-28 20:36:49 +00:00
Jakob Stoklund Olesen
755ba2ee84 Use the right floating point load/store instructions in PPCInstrInfo::foldMemoryOperandImpl().
The PowerPC floating point registers can represent both f32 and f64 via the
two register classes F4RC and F8RC. F8RC is considered a subclass of F4RC to
allow cross-class coalescing. This coalescing only affects whether registers
are spilled as f32 or f64.

Spill slots must be accessed with load/store instructions corresponding to the
class of the spilled register. PPCInstrInfo::foldMemoryOperandImpl was looking
at the instruction opcode which is wrong.

X86 has similar floating point register classes, but doesn't try to fold
memory operands, so there is no problem there.

llvm-svn: 97262
2010-02-26 21:09:24 +00:00
Chris Lattner
52a02205d8 Change the scheduler from adding nodes in allnodes order
to adding them in a deterministic order (bottom up from
the root) based on the structure of the graph itself.

This updates tests for some random changes, interesting
bits: CodeGen/Blackfin/promote-logic.ll no longer crashes.
I have no idea why, but that's good right?

CodeGen/X86/2009-07-16-LoadFoldingBug.ll also fails, but
now compiles to have one fewer constant pool entry, making
the expected load that was being folded disappear.  Since it
is an unreduced mass of gnast, I just removed it.

This fixes PR6370

llvm-svn: 97023
2010-02-24 06:11:37 +00:00
Dan Gohman
c44dee5fbd When emitting an instruction which depends on both a post-incremented
induction variable value and a loop-variant value, don't force the
insert position to be at the post-increment position, because it may
not be dominated by the loop-variant value. This fixes a
use-before-def problem noticed on PPC.

llvm-svn: 96774
2010-02-22 03:59:54 +00:00
Chris Lattner
37f20c29c8 add some no-unwinds, other minor cleanups.
llvm-svn: 96756
2010-02-21 20:33:20 +00:00
Chris Lattner
fa1fdcf146 add a triple so that this doesn't fail due to linux/ppc register printing
syntax.

llvm-svn: 96748
2010-02-21 19:27:38 +00:00
Chris Lattner
654f38165b filecheckize and add nounwinds.
llvm-svn: 96745
2010-02-21 18:53:28 +00:00
Dale Johannesen
d147b9a4d4 Make g5 target explicit; scheduling affects register choice.
llvm-svn: 96413
2010-02-16 23:25:23 +00:00
Dale Johannesen
60d48aef7b Adjust register numbers in tests to compensate for the
new lack of R2.

llvm-svn: 96407
2010-02-16 22:31:31 +00:00
Dale Johannesen
ea96b2974f When save/restoring CR at prolog/epilog, in a large
stack frame, the prolog/epilog code was using the same
register for the copy of CR and the address of the save slot.  Oops.
This is fixed here for Darwin, sort of, by reserving R2 for this case.
A better way would be to do the store before the decrement of SP,
which is safe on Darwin due to the red zone.

SVR4 probably has the same problem, but I don't know how to fix it;
there is no red zone and R2 is already used for something else.
I'm going to leave it to someone interested in that target.

Better still would be to rewrite the CR-saving code completely;
spilling each CR subregister individually is horrible code.

llvm-svn: 96015
2010-02-12 21:35:34 +00:00
Rafael Espindola
b0bb1ddfe3 Fix alignment on ppc linux. This fixes the build of crtend.o
llvm-svn: 95477
2010-02-06 03:32:21 +00:00
Bill Wendling
c3f4101cc6 Make test more focused, eliminating extraneous bits.
llvm-svn: 95384
2010-02-05 11:21:05 +00:00
Bill Wendling
9761f067f8 An empty global constant (one of size 0) may have a section immediately
following it. However, the EmitGlobalConstant method wasn't emitting a body for
the constant. The assembler doesn't like that. Before, we were generating this:

  .zerofill __DATA, __common, __cmd, 1, 3

This fix puts us back to that semantic.

llvm-svn: 95336
2010-02-05 00:17:02 +00:00
Dale Johannesen
1e9d147461 Reapply 95050 with a tweak to check the register class.
llvm-svn: 95183
2010-02-03 01:40:33 +00:00
Dale Johannesen
08ab638bdc Test revert 95050; there's a good chance it's causing
buildbot failure.

llvm-svn: 95103
2010-02-02 18:52:56 +00:00
Dale Johannesen
a20fc3d1a9 Make local RA smarter about reusing input register of a copy
as output.  Needed for (functional) correctness in inline asm,
and should be generally beneficial.  7361612.

llvm-svn: 95050
2010-02-02 02:08:02 +00:00
Chris Lattner
95118672e3 Give AsmPrinter the most common expected implementation of
runOnMachineFunction, and switch PPC to use EmitFunctionBody.
The two ppc asmprinters now don't have to define
runOnMachineFunction.

llvm-svn: 94722
2010-01-28 01:28:58 +00:00
Daniel Dunbar
c1df55e99c Attempt to unbreak test on Linux. Chris, please check.
llvm-svn: 94399
2010-01-25 00:54:13 +00:00
Chris Lattner
e5e7b41090 stop testing for invalid output.
llvm-svn: 94288
2010-01-23 05:45:28 +00:00
Chris Lattner
75db03497a testcase for r94095
llvm-svn: 94096
2010-01-21 20:01:04 +00:00
Chris Lattner
377bd87849 Now that we have everything nicely factored (e.g. asmprinter is not
doing global variable classification anymore) and hookized, sink almost
all targets' global variable emission code into AsmPrinter and out
of each target.

Some notes:

1. PIC16 does completely custom and crazy stuff, so it is not changed.
2. XCore has some custom handling for extra directives.  I'll look at it next.
3. This switches linux/ppc to use .globl instead of .global.  If .globl is
   actually wrong, let me know and I'll fix it.
4. This makes linux/ppc get a lot of random cases right which were obviously
   wrong before, it is probably now a bit healthier.
5. Blackfin will probably start getting .comm and other things that it didn't
   before.  If this is undesirable, it should explicitly opt out of these
   things by clearing the relevant fields of MCAsmInfo.

This leads to a nice diffstat:
 14 files changed, 127 insertions(+), 830 deletions(-)

llvm-svn: 93858
2010-01-19 05:38:33 +00:00
Chris Lattner
1b6c061cd0 remove uses of deprecated functions, this generates slightly
different BlockAddress labels, but nothing semantically important.

Add a FIXME that BlockAddress codegen is broken if the LLVM BB has 
an empty name (e.g. strip was run).

llvm-svn: 93303
2010-01-13 07:30:49 +00:00
Dan Gohman
5fa04f2707 Delete useless trailing semicolons.
llvm-svn: 92740
2010-01-05 17:55:26 +00:00
Dale Johannesen
365ae431a7 Do better with physical reg operands (typically, from inline asm)
in local register allocator.  If a reg-reg copy has a phys reg
input and a virt reg output, and this is the last use of the phys
reg, assign the phys reg to the virt reg.  If a reg-reg copy has
a phys reg output and we need to reload its spilled input, reload
it directly into the phys reg rather than passing it through another reg.

Following 76208, there is sometimes no dependency between the def of
a phys reg and its use; this creates a window where that phys reg
can be used for spilling (this is true in linear scan also).  This
is bad and needs to be fixed a better way, although 76208 works too
well in practice to be reverted.  However, there should normally be
no spilling within inline asm blocks.  The patch here goes a long way
towards making this actually be true.

llvm-svn: 91485
2009-12-16 00:29:41 +00:00
Evan Cheng
bdedf32e51 ProcessImplicitDefs should watch out for invalidated iterator and extra implicit operands on copies.
llvm-svn: 89880
2009-11-25 21:13:39 +00:00
Dale Johannesen
5809ff0e58 Do not store R31 into the caller's link area on PPC.
This violates the ABI (that area is "reserved"), and
while it is safe if all code is generated with current
compilers, there is some very old code around that uses
that slot for something else, and breaks if it is stored
into.  Adjust testcases looking for current behavior.
I've verified that the stack frame size is right in all
testcases, whether it changed or not.  7311323.

llvm-svn: 89811
2009-11-24 22:59:02 +00:00
Edward O'Callaghan
5ae4559914 Fix for bad FileCheck converts in revision 89584.
llvm-svn: 89586
2009-11-22 12:50:05 +00:00
Edward O'Callaghan
949850890f Convert a few tests to FileCheck for PR5307.
llvm-svn: 89584
2009-11-22 11:45:44 +00:00
Dale Johannesen
907ff5a620 When generating a vector the really slow way, via loads
and stores, handle the case where the element size is not
a valid target type correctly (PPC).

llvm-svn: 89521
2009-11-21 00:53:23 +00:00
Dale Johannesen
45f80d39f6 Remove an incorrect overaggressive optimization
(PPC specific).

llvm-svn: 89496
2009-11-20 22:16:40 +00:00
Evan Cheng
ea46259f53 Check if subreg index is zero.
llvm-svn: 88899
2009-11-16 06:31:49 +00:00
Evan Cheng
2fa416debd For some targets, a copy can use a register multiple times, e.g. ppc.
llvm-svn: 88895
2009-11-16 05:52:06 +00:00
Dale Johannesen
f57a58c4fe Adjust isConstantSplat to allow for big-endian targets.
PPC is such a target; make it work.

llvm-svn: 87060
2009-11-13 01:45:18 +00:00
Bill Wendling
a6d7a411d3 Fix test to work on every platform.
llvm-svn: 86786
2009-11-11 01:44:22 +00:00
Bill Wendling
ff705446e1 Test this on Darwin only.
llvm-svn: 86752
2009-11-10 23:18:33 +00:00
Dale Johannesen
20e1cd09ba Emit correct code when making a ConstantPool entry for a vector
constant whose component type is not a legal type for the target.
(If the target ConstantPool cannot handle this type either, it has
an opportunity to merge elements.  In practice any target with
8-bit bytes must support i8 *as data*).  7320806 (partial).

llvm-svn: 86751
2009-11-10 23:16:41 +00:00
Bill Wendling
1176227990 Modify how the prologue encoded the "move" information for the FDE. GCC
generates a sequence similar to this:

__Z4funci:
LFB2:
        mflr r0
LCFI0:
        stmw r30,-8(r1)
LCFI1:
        stw r0,8(r1)
LCFI2:
        stwu r1,-80(r1)
LCFI3:
        mr r30,r1
LCFI4:

where LCFI3 and LCFI4 are used by the FDE to indicate what the FP, LR, and other
things are. We generated something more like this:

Leh_func_begin1:
        mflr r0
        stw r31, 20(r1)
        stw r0, 8(r1)
Llabel1:
        stwu r1, -80(r1)
Llabel2:
        mr r31, r1

Note that we are missing the "mr" instruction. This patch makes it more like the
GCC output.

llvm-svn: 86729
2009-11-10 22:14:04 +00:00
Dan Gohman
229f9edf7a Update these tests for the new label names.
llvm-svn: 86192
2009-11-05 23:31:40 +00:00
Bob Wilson
641ce17702 Add -mtriple to llc commands, attempting to fix buildbot failures.
llvm-svn: 86086
2009-11-05 00:51:31 +00:00
Bob Wilson
25738f9e79 Add PowerPC codegen for indirect branches.
llvm-svn: 86050
2009-11-04 21:31:18 +00:00
Dan Gohman
f6c6858329 Add nounwind to this test.
llvm-svn: 82708
2009-09-24 20:20:08 +00:00
Dale Johannesen
7d68f8de7f Model the carry bit on ppc32. Without this we could
move a SUBFC (etc.) below the SUBFE (etc.) that consumed
the carry bit.  Add missing ADDIC8, noticed along the way.

llvm-svn: 82266
2009-09-18 20:15:22 +00:00
Dan Gohman
f2c290dfa6 Convert more tests to avoid llvm-as.
llvm-svn: 81545
2009-09-11 18:36:27 +00:00