1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-27 14:02:50 +01:00
Commit Graph

4513 Commits

Author SHA1 Message Date
Eric Christopher
bce38d60f8 Now that the optimization level is adjusting the feature string
before we hit the subtarget, remove the constructor parameter.

llvm-svn: 218817
2014-10-01 21:05:35 +00:00
Eric Christopher
4ce55b7a5c Rework the PPC TargetMachine so that the non-function specific
overrides happen at TargetMachine creation and not on every
subtarget creation.

llvm-svn: 218805
2014-10-01 20:38:26 +00:00
Sanjay Patel
c095454ca2 Split the estimate() interface into separate functions for each type. NFC.
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use it's 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...

llvm-svn: 218698
2014-09-30 20:28:48 +00:00
Sanjay Patel
98a98574c5 Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2).
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

And:

z = y / x

into:

z = y * rcpe(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.

Differential Revision: http://reviews.llvm.org/D5484

llvm-svn: 218553
2014-09-26 23:01:47 +00:00
Robin Morisset
64053dff5f [Power] Use AtomicExpandPass for fence insertion, and use lwsync where appropriate
Summary:
This patch makes use of AtomicExpandPass in Power for inserting fences around
atomic as part of an effort to remove fence insertion from SelectionDAGBuilder.
As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic
lwsync) instead of sync 0 (heavyweight sync) in many cases.

I also added a test, as there was no test for the barriers emitted by the Power
backend for atomic loads and stores.

Test Plan: new test + make check-all

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5180

llvm-svn: 218331
2014-09-23 20:46:49 +00:00
Lang Hames
719ff0ff4f [MCJIT] Remove PPCRelocations.h - it's no longer used.
This was overlooked in r218320, which removed the relocation headers for other
targets. Thanks to Ulrich Weigand for catching it.

llvm-svn: 218327
2014-09-23 19:17:48 +00:00
Sanjay Patel
1235f854b3 Refactor reciprocal square root estimate into target-independent function; NFC.
This is purely a plumbing patch. No functional changes intended.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner.

Next steps:

    The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added.
    Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner.
    X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added.

Differential Revision: http://reviews.llvm.org/D5425

llvm-svn: 218219
2014-09-21 15:19:15 +00:00
Hal Finkel
0c0c256ad7 Optionally enable more-aggressive FMA formation in DAGCombine
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction, and furthermore, forming the FMA leaving the FMUL can still allow
for higher overall throughput and decreased critical-path length.

Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.

Patch by Olivier Sallenave, thanks!

llvm-svn: 218120
2014-09-19 11:42:56 +00:00
Aaron Ballman
c9d2119dc2 Reverting NFC changes from r218050. Instead, the warning was disabled for GCC in r218059, so these changes are no longer required.
llvm-svn: 218062
2014-09-18 17:34:23 +00:00
Aaron Ballman
2e4b3f3dca Fixing a bunch of -Woverloaded-virtual warnings due to hiding getSubtargetImpl from the base class. NFC.
llvm-svn: 218050
2014-09-18 13:27:14 +00:00
Eric Christopher
c75fbbac7c Add a new pass FunctionTargetTransformInfo. This pass serves as a
shim between the TargetTransformInfo immutable pass and the Subtarget
via the TargetMachine and Function. Migrate a single call from
BasicTargetTransformInfo as an example and provide shims where TargetMachine
begins taking a Function to determine the subtarget.

No functional change.

llvm-svn: 218004
2014-09-18 00:34:14 +00:00
Samuel Antao
ec112df870 Fix FastISel bug in boolean returns for PowerPC.
For PPC targets, FastISel does not take the sign extension information into account when selecting return instructions whose operands are constants. A consequence of this is that the return of boolean values is not correct. This patch fixes the problem by evaluating the sign extension information also for constants, forwarding this information to PPCMaterializeInt which takes this information to drive the sign extension during the materialization. 

llvm-svn: 217993
2014-09-17 23:25:06 +00:00
Samuel Antao
7871b3496a Remove unnecessary blank space (test commit)
llvm-svn: 217991
2014-09-17 22:47:28 +00:00
Bill Schmidt
b47e7e1e1d Address comments on r217622
llvm-svn: 217680
2014-09-12 14:26:36 +00:00
Bill Schmidt
020f302f01 [PATCH, PowerPC] Accept 'U' and 'X' constraints in inline asm
Inline asm may specify 'U' and 'X' constraints to print a 'u' for an
update-form memory reference, or an 'x' for an indexed-form memory
reference.  However, these are really only useful in GCC internal code
generation.  In inline asm the operand of the memory constraint is
typically just a register containing the address, so 'U' and 'X' make
no sense.

This patch quietly accepts 'U' and 'X' in inline asm patterns, but
otherwise does nothing.  If we ever unexpectedly see a non-register,
we'll assert and sort it out afterwards.

I've added a new test for these constraints; the test case should be
used for other asm-constraints changes down the road.

llvm-svn: 217622
2014-09-11 20:10:03 +00:00
Sanjay Patel
8030ed3639 Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable.
"Unroll" is not the appropriate name for this variable. Clang already uses 
the term "interleave" in pragmas and metadata for this.

Differential Revision: http://reviews.llvm.org/D5066

llvm-svn: 217528
2014-09-10 17:58:16 +00:00
Craig Topper
743e490f28 Use cast to MVT instead of EVT on a couple calls to getSizeInBits.
llvm-svn: 217473
2014-09-10 04:51:36 +00:00
Juergen Ributzka
76dd2e3da7 [FastISel][tblgen] Rename tblgen generated FastISel functions. NFC.
This is the final round of renaming. This changes tblgen to emit lower-case
function names for FastEmitInst_* and FastEmit_*, and updates all its uses
in the source code.

Reviewed by Eric

llvm-svn: 217075
2014-09-03 20:56:59 +00:00
Juergen Ributzka
fa7bc008ce [FastISel] Rename public visible FastISel functions. NFC.
This commit renames the following public FastISel functions:
LowerArguments -> lowerArguments
SelectInstruction -> selectInstruction
TargetSelectInstruction -> fastSelectInstruction
FastLowerArguments -> fastLowerArguments
FastLowerCall -> fastLowerCall
FastLowerIntrinsicCall -> fastLowerIntrinsicCall
FastEmitZExtFromI1 -> fastEmitZExtFromI1
FastEmitBranch -> fastEmitBranch
UpdateValueMap -> updateValueMap
TargetMaterializeConstant -> fastMaterializeConstant
TargetMaterializeAlloca -> fastMaterializeAlloca
TargetMaterializeFloatZero -> fastMaterializeFloatZero
LowerCallTo -> lowerCallTo

Reviewed by Eric

llvm-svn: 217074
2014-09-03 20:56:52 +00:00
Eric Christopher
e1f21228eb Remove resetSubtargetFeatures as it is unused.
llvm-svn: 217071
2014-09-03 20:36:31 +00:00
Benjamin Kramer
e991977346 Add override to overriden virtual methods, remove virtual keywords.
No functionality change. Changes made by clang-tidy + some manual cleanup.

llvm-svn: 217028
2014-09-03 11:41:21 +00:00
Eric Christopher
2f6f860aaa Reinstate "Nuke the old JIT."
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.

This reinstates commits r215111, 215115, 215116, 215117, 215136.

llvm-svn: 216982
2014-09-02 22:28:02 +00:00
Alexey Samsonov
2724968a53 Fix signed integer overflow in PPCInstPrinter.
This bug was reported by UBSan.

llvm-svn: 216917
2014-09-02 17:38:34 +00:00
Hal Finkel
46b8c3f54e [PowerPC] Guard against illegal selection of add for TargetConstant operands
r208640 was reverted because it caused a self-hosting failure on ppc64. The
underlying cause was the formation of ISD::ADD nodes with ISD::TargetConstant
operands. Because we have no patterns for 'add' taking 'timm' nodes, these are
selected as r+r add instructions (which is a miscompile). Guard against this
kind of behavior in the future by making the backend crash should this occur
(instead of silently generating invalid output).

llvm-svn: 216897
2014-09-02 06:23:54 +00:00
Craig Topper
57c93cf3ef Remove 'virtual' keyword from methods markedwith 'override' keyword.
llvm-svn: 216823
2014-08-30 16:48:34 +00:00
Justin Hibbits
fd1faff02b Test commit. Fix whitespace from a previous patch of mine.
llvm-svn: 216650
2014-08-28 04:40:55 +00:00
Karthik Bhat
d94045aa5a Allow vectorization of division by uniform power of 2.
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)

llvm-svn: 216371
2014-08-25 04:56:54 +00:00
Hal Finkel
7014e3d50f [PowerPC] Add support for dcbtst and icbt (prefetch)
Adds code generation support for dcbtst (data cache prefetch for write) and
icbt (instruction cache prefetch for read - Book E cores only).

We still end up with a 'cannot select' error for the non-supported prefetch
intrinsic forms. This will be fixed in a later commit.

Fixes PR20692.

llvm-svn: 216339
2014-08-23 23:21:04 +00:00
Sanjay Patel
7498546daf name change: isPow2DivCheap -> isPow2SDivCheap
isPow2DivCheap

That name doesn't specify signed or unsigned.

Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong:

   srl/add/sra

is not the general sequence for signed integer division by power-of-2. We need one more 'sra':

   sra/srl/add/sra

That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case.

This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods.

No functional change intended.

Differential Revision: http://reviews.llvm.org/D5010

llvm-svn: 216237
2014-08-21 22:31:48 +00:00
Tim Northover
9127b613b1 TableGen: allow use of uint64_t for available features mask.
ARM in particular is getting dangerously close to exceeding 32 bits worth of
possible subtarget features. When this happens, various parts of MC start to
fail inexplicably as masks get truncated to "unsigned".

Mostly just refactoring at present, and there's probably no way to test.

llvm-svn: 215887
2014-08-18 11:49:42 +00:00
Hal Finkel
5f7466abdb [PowerPC] Mark fixed-offset byvals as pointed-to by IR values
A byval object, even if allocated at a fixed offset (prescribed by the ABI) is
pointed to by IR values. Most fixed-offset stack objects are not pointed-to by
IR values, so the default is to assume this is not possible. However, we need
to override the default in this case (instruction scheduling can cause
miscompiles otherwise).

Fixes PR20280.

llvm-svn: 215795
2014-08-16 00:17:05 +00:00
Hal Finkel
519c3f0279 [PowerPC] Darwin byval arguments are not immutable
On PPC/Darwin, byval arguments occur at fixed stack offsets in the callee's
frame, but are not immutable -- the pointer value is directly available to the
higher-level code as the address of the argument, and the value of the byval
argument can be modified at the IR level.

This is necessary, but not sufficient, to fix PR20280. When PR20280 is fixed in
a follow-up commit, its test case will cover this change.

llvm-svn: 215793
2014-08-16 00:16:29 +00:00
Rafael Espindola
80b0c0c71c Remove HasLEB128.
We already require CFI, so it should be safe to require .leb128 and .uleb128.

llvm-svn: 215712
2014-08-15 14:01:07 +00:00
Benjamin Kramer
9d3803050f PPC: Clean up pointer casting, no functionality change.
Silences GCC's -Wcast-qual.

llvm-svn: 215703
2014-08-15 11:05:45 +00:00
Bill Schmidt
c5f23638d5 [PPC64] Add missing dependency on X2 to LDinto_toc.
The LDinto_toc pattern has been part of 64-bit PowerPC for a long
time, and represents loading from a memory location into the TOC
register (X2).  However, this pattern doesn't explicitly record that
it modifies that register.  This patch adds the missing dependency.

It was very surprising to me that this has never shown up as a problem
in the past, and that we only saw this problem recently in a single
scenario when building a self-hosted clang.  It turns out that in most
cases we have another dependency present that keeps the LDinto_toc
instruction tied in place.  LDinto_toc is used for TOC restore
following a call site, so this is a typical sequence:

   BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
   LDinto_toc 24, %X1
   ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>

Because the LDinto_toc is inserted prior to the ADJCALLSTACKUP, there
is a natural anti-dependency between the two that keeps it in place.

Therefore we don't usually see a problem.  However, in one particular
case, one call is followed immediately by another call, and the second
call requires a parameter that is a TOC-relative address.  This is the
code sequence:

  BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
  LDinto_toc 24, %X1
  ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>
  ADJCALLSTACKDOWN 96, %R1<imp-def>, %R1<imp-use>
  %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
  %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

Note that the back-to-back stack adjustments are the same size!  The
back end is smart enough to recognize this and optimize them away:

  BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
  LDinto_toc 24, %X1
  %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
  %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

Now there is nothing to prevent the ADDIStocHA instruction from moving
ahead of the LDinto_toc instruction, and because of the longest-path
heuristic, this is what happens.

With the accompanying patch, %X2 is represented as an implicit def:

  BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
  LDinto_toc 24, %X1, %X2<imp-def,dead>
  ADJCALLSTACKUP 96, 0, %R1<imp-def,dead>, %R1<imp-use>
  ADJCALLSTACKDOWN 96, %R1<imp-def,dead>, %R1<imp-use>
  %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
  %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

So now when the two stack adjustments are removed, ADDIStocHA is
prevented from being moved above LDinto_toc.

I have not yet created a test case for this, because the original
failure occurs on a relatively large function that needs reduction.
However, this is a fairly serious bug, despite its infrequency, and I
wanted to get this patch onto the list as soon as possible so that it
can be considered for a 3.5 backport.  I'll work on whittling down a
test case.

Have we missed the boat for 3.5 at this point?

Thanks,
Bill

llvm-svn: 215685
2014-08-15 01:25:26 +00:00
Benjamin Kramer
da144ed5a2 Canonicalize header guards into a common format.
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)

Changes made by clang-tidy with minor tweaks.

llvm-svn: 215558
2014-08-13 16:26:38 +00:00
Hal Finkel
97fb1d4d91 [PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsic
This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store
intrinsics. As with the construction of the MachineMemOperands for the
intrinsic calls used for unaligned load/store lowering, the only slight
complication is that we need to represent a larger memory range than the
loaded/stored value-type size (because the address is rounded down to an
aligned address, and we need to conservatively represent the entire possible
range of the actual access). This required adding an extra size field to
TargetLowering::IntrinsicInfo, and this was done in a way that required no
modifications to other targets (the size defaults to the store size of the
provided memory data type).

This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed).

llvm-svn: 215512
2014-08-13 01:15:40 +00:00
Joerg Sonnenberger
4a2da5b0e2 @l and friends adjust their value depending the context used in.
For ori, they are unsigned, for addi, signed. Create a new target
expression type to handle this and evaluate Fixups accordingly.

llvm-svn: 215315
2014-08-10 12:41:50 +00:00
Joerg Sonnenberger
df2012d0aa If available, pass down the Fixup object to EvaluateAsRelocatable.
At least on PowerPC, the interpretation of certain modifiers depends on
the context they appear in.

llvm-svn: 215310
2014-08-10 11:35:12 +00:00
Joerg Sonnenberger
f744f70919 Allow the third argument for the subi family to be an expression.
llvm-svn: 215286
2014-08-09 17:10:26 +00:00
Joerg Sonnenberger
a632bc8df4 Use the full form of dccci and iccci from the early PPC 405 documents,
since the operands are actually used on those cores. Provide aliases for
the only documented case in the newer Power ISA speec.

llvm-svn: 215282
2014-08-09 13:58:31 +00:00
Eric Christopher
3461eab467 Initialize PPC DataLayout based on the Triple only.
llvm-svn: 215281
2014-08-09 04:53:17 +00:00
Eric Christopher
846e6d954c Remove extraneous 64-bit argument to the PPC TargetMachine constructor
and update initialization.

llvm-svn: 215280
2014-08-09 04:38:56 +00:00
Joerg Sonnenberger
71ad99879f Allow large immediates for branch instructions in 32bit mode.
llvm-svn: 215240
2014-08-08 20:57:58 +00:00
Joerg Sonnenberger
8e42fa93a6 Provide an implementation of getNoopForMachoTarget for PPC, otherwise
empty functions will assert in the MC object writer.

llvm-svn: 215238
2014-08-08 19:13:23 +00:00
Joerg Sonnenberger
4df95fac67 Add low-level option for avoiding float stores from va_start until
soft-float is properly supported.

llvm-svn: 215221
2014-08-08 16:46:10 +00:00
Joerg Sonnenberger
ca56042175 Add support for SPE load/store from memory.
llvm-svn: 215220
2014-08-08 16:43:49 +00:00
Eric Christopher
378bc328f0 Temporarily Revert "Nuke the old JIT." as it's not quite ready to
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.

Approved by Jim Grosbach, Lang Hames, Rafael Espindola.

This reverts commits r215111, 215115, 215116, 215117, 215136.

llvm-svn: 215154
2014-08-07 22:02:54 +00:00
Joerg Sonnenberger
857466a57b Add the majority of the remaining SPE instructions.
llvm-svn: 215131
2014-08-07 18:52:39 +00:00
Rafael Espindola
e9ebbe5559 Nuke the old JIT.
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.

Thanks to Lang Hames for making MCJIT a good replacement!

llvm-svn: 215111
2014-08-07 14:21:18 +00:00
Joerg Sonnenberger
c6357b6edc Add mfasr and mtasr
llvm-svn: 215110
2014-08-07 13:35:34 +00:00
Joerg Sonnenberger
57c4e076f0 Add mfrtcu and mfrtcl instructions
llvm-svn: 215109
2014-08-07 13:16:58 +00:00
Joerg Sonnenberger
91c0c415d7 Support mttbl and mttbu mnemonic
llvm-svn: 215108
2014-08-07 13:06:23 +00:00
Joerg Sonnenberger
22c67059fa Add RFID instruction.
llvm-svn: 215105
2014-08-07 12:39:59 +00:00
Joerg Sonnenberger
2b751a0bc6 Fix Itineray class of rfi
llvm-svn: 215104
2014-08-07 12:35:16 +00:00
Joerg Sonnenberger
9980cac0b2 Spell e500 feature in lower case.
llvm-svn: 215103
2014-08-07 12:31:28 +00:00
Joerg Sonnenberger
d344d15c74 Add first bunch of SPE instructions. As they overlap with Altivec, mark
them as parser-only until the disassembler is extended to handle
predicates properly.

llvm-svn: 215102
2014-08-07 12:18:21 +00:00
Eric Christopher
4a1cdb2ba7 Remove the target machine from CCState. Previously it was only used
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.

llvm-svn: 214988
2014-08-06 18:45:26 +00:00
Rafael Espindola
c981388b03 Remove a virtual function from TargetMachine. NFC.
llvm-svn: 214929
2014-08-05 22:10:21 +00:00
Bill Schmidt
8159f9047b [PowerPC] Swap arguments and adjust shift count for vsldoi on little endian
Commits r213915 and r214718 fix recognition of shuffle masks for vmrg*
and vpku*um instructions for a little-endian target, by swapping the
input arguments.  The vsldoi instruction requires similar treatment,
and also needs its shift count adjusted for little endian.

Reviewed by Ulrich Weigand.

This is a bug fix candidate for release 3.5 (and hopefully the last of
those for PowerPC).

llvm-svn: 214923
2014-08-05 20:47:25 +00:00
Joerg Sonnenberger
6eb6423316 Add accessors for the PPC 403 bank registers.
llvm-svn: 214875
2014-08-05 15:45:15 +00:00
Joerg Sonnenberger
cc8f67da81 Accessors for SSR2 and SSR3 on PPC 403.
llvm-svn: 214867
2014-08-05 14:53:05 +00:00
Joerg Sonnenberger
bc6e9b7c12 Add dci/ici instructions for PPC 476 and friends.
llvm-svn: 214864
2014-08-05 14:40:32 +00:00
Joerg Sonnenberger
f726eed55e Add mftblo and mftbhi for PPC 4xx.
llvm-svn: 214863
2014-08-05 14:18:16 +00:00
Joerg Sonnenberger
d97414d887 Add lswi / stswi for assembler use with a warning to not add patterns
for them.

llvm-svn: 214862
2014-08-05 13:34:01 +00:00
Eric Christopher
67c04e77e5 Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.

llvm-svn: 214838
2014-08-05 02:39:49 +00:00
Joerg Sonnenberger
c84fe6e931 Add TCR register access
llvm-svn: 214826
2014-08-04 23:53:42 +00:00
Joerg Sonnenberger
cd34d01e29 Add PPC 603's tlbld and tlbli instructions.
llvm-svn: 214825
2014-08-04 23:49:45 +00:00
Bill Schmidt
5e2cd3791c [PPC64LE] Fix wrong IR for vec_sld and vec_vsldoi
My original LE implementation of the vsldoi instruction, with its
altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect
shufflevector operations in the LLVM IR.  Correct code is generated
because the back end handles the incorrect shufflevector in a
consistent manner.

This patch and a companion patch for Clang correct this problem by
removing the fixup from altivec.h and the corresponding fixup from the
PowerPC back end.  Several test cases are also modified to reflect the
now-correct LLVM IR.

llvm-svn: 214800
2014-08-04 23:21:01 +00:00
Joerg Sonnenberger
b94b28812d Add simplified aliases for access to DCCR, ICCR, DEAR and ESR
llvm-svn: 214797
2014-08-04 22:56:42 +00:00
Joerg Sonnenberger
0f4e0ad62e tlbre / tlbwe / tlbsx / tlbsx. variants for the PPC 4xx CPUs.
llvm-svn: 214784
2014-08-04 21:28:22 +00:00
Eric Christopher
99307e99a2 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

llvm-svn: 214781
2014-08-04 21:25:23 +00:00
Joerg Sonnenberger
c3da10cbbc Recognize mftbl as alias for mftb, for symmetry with mttb.
llvm-svn: 214769
2014-08-04 20:28:34 +00:00
Joerg Sonnenberger
3be1146503 Rename PPCLinuxMCAsmInfo to PPCELFMCAsmInfo to better reflect the
systems it represents.

llvm-svn: 214755
2014-08-04 18:46:13 +00:00
Joerg Sonnenberger
660877af60 Allow .lcomm with alignment on ELF targets.
llvm-svn: 214754
2014-08-04 18:45:10 +00:00
Joerg Sonnenberger
51a8467117 Refactor SPRG instructions.
llvm-svn: 214733
2014-08-04 17:26:15 +00:00
Joerg Sonnenberger
f6049406d6 Add support for m[ft][di]bat[ul] instructions.
llvm-svn: 214731
2014-08-04 17:07:41 +00:00
Joerg Sonnenberger
b8c7e54901 Add features for PPC 4xx and e500/e500mc instructions.
Move the test cases for them into separate files.

llvm-svn: 214724
2014-08-04 15:47:38 +00:00
Ulrich Weigand
5df23aacbe [PowerPC] Swap arguments to vpkuhum/vpkuwum on little-endian
In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl*
by swapping the input arguments.  As it turns out, the exact same fix
is also required for the vpkuhum/vpkuwum patterns.

This fixes another regression in llvmpipe when vector support is
enabled.

Reviewed by Bill Schmidt.

llvm-svn: 214718
2014-08-04 13:53:40 +00:00
Ulrich Weigand
bf94969247 [PowerPC] MULHU/MULHS are not legal for vector types
I ran into some test failures where common code changed vector division
by constant into a multiply-high operation (MULHU).  But these are not
implemented by the back-end, so we failed to recognize the insn.

Fixed by marking MULHU/MULHS as Expand for vector types.

llvm-svn: 214716
2014-08-04 13:27:12 +00:00
Ulrich Weigand
32b8ceb243 [PowerPC] Fix and improve vector comparisons
This patch refactors code generation of vector comparisons.

This fixes a wrong code-gen bug for ISD::SETGE for floating-point types,
and improves generated code for vector comparisons in general.

Specifically, the patch moves all logic deciding how to implement vector
comparisons into getVCmpInst, which gets two extra boolean outputs
indicating to its caller whether its needs to swap the input operands
and/or negate the result of the comparison.  Apart from implementing
these two modifications as directed by getVCmpInst, there is no need
to ever implement vector comparisons in any other manner; in particular,
there is never a need to perform two separate comparisons (e.g. one for
equal and one for greater-than, as code used to do before this patch).

Reviewed by Bill Schmidt.

llvm-svn: 214714
2014-08-04 13:13:57 +00:00
Joerg Sonnenberger
1253396c1a tlbia support
llvm-svn: 214640
2014-08-02 20:16:29 +00:00
Joerg Sonnenberger
e208527e31 mfdcr / mtdcr support
llvm-svn: 214639
2014-08-02 20:00:26 +00:00
Joerg Sonnenberger
ca3cb82714 Don't use additional arguments for dss and friends to satisfy DSS_Form,
when let can do the same thing. Keep the 64bit variants as codegen-only.
While they have a different register class, the encoding is the same for
32bit and 64bit mode. Having both present would otherwise confuse the
disassembler.

llvm-svn: 214636
2014-08-02 15:09:41 +00:00
Eric Christopher
dfc6457da2 Add a non-const subtarget returning function to the target machine
so that we can use it to get the old-style JIT out of the subtarget.

This code should be removed when the old-style JIT is removed
(imminently).

llvm-svn: 214560
2014-08-01 21:18:01 +00:00
Ulrich Weigand
9dae728acd [PowerPC] PR20280 - Slots for byval parameters are not immutable
Found by inspection while looking at PR20280: code would mark slots
in the parameter save area where a byval parameter is passed as
"immutable".  This is not correct since code is allowed to modify
byval parameters in place in the parameter save area.

llvm-svn: 214517
2014-08-01 14:35:58 +00:00
Hal Finkel
f46410e6eb [PowerPC] Generate unaligned vector loads using intrinsics instead of regular loads
Altivec vector loads on PowerPC have an interesting property: They always load
from an aligned address (by rounding down the address actually provided if
necessary). In order to generate an actual unaligned load, you can generate two
load instructions, one with the original address, one offset by one vector
length, and use a special permutation to extract the bytes desired.

When this was originally implemented, I generated these two loads using regular
ISD::LOAD nodes, now marked as aligned. Unfortunately, there is a problem with
this:

The alignment of a load does not contribute to its identity, and SDNodes
are uniqued. So, imagine that we have some unaligned load, L1, that is not
aligned. The routine will create two loads, L1(aligned) and (L1+16)(aligned).
Further imagine that there had already existed a load (L1+16)(unaligned) with
the same chain operand as the load L1. When (L1+16)(aligned) is created as part
of the lowering of L1, this load *is* also the (L1+16)(unaligned) node, just
now marked as aligned (because the new alignment overwrites the old). But the
original users of (L1+16)(unaligned) now get the data intended for the
permutation yielding the data for L1, and (L1+16)(unaligned) no longer exists
to get its own permutation-based expansion. This was PR19991.

A second potential problem has to do with the MMOs on these loads, which can be
used by AA during instruction scheduling to break chain-based dependencies. If
the new "aligned" loads get the MMO from the original unaligned load, this does
not represent the fact that it will load data from below the original address.
Normally, this would not matter, but this load might be combined with another
load pair for a previous vector, and then the dependency on the otherwise-
ignored lower bytes can matter.

To fix both problems, instead of generating the necessary loads using regular
ISD::LOAD instructions, ppc_altivec_lvx intrinsics are used instead. These are
provided with MMOs with a conservative address range.

Unfortunately, I no longer have a failing test case (since PR19991 was
reported, other changes in CodeGen have forced this bug back into hiding it
again). Nevertheless, this should fix the underlying problem.

llvm-svn: 214481
2014-08-01 05:20:41 +00:00
Hal Finkel
3be61a8b81 [PowerPC] Recognize consecutive memory accesses from intrinsics
When generating unaligned vector loads, we need to search for other loads or
stores nearby offset by one vector width. If we find one, then we know that we
can safely generate another aligned load at that address. Otherwise, we must
generate the next load using an offset of the vector width minus one byte (so
we don't read off the end of the allocation if the base unaligned address
happened to be aligned at runtime). We had previously done this using only
other vector loads and stores, but did not consider the PowerPC-specific vector
load/store intrinsics. Now we'll also consider vector intrinsics. By itself,
this change is a feature enhancement, but is a necessary step toward fixing the
underlying problem behind PR19991.

llvm-svn: 214469
2014-08-01 01:02:01 +00:00
Louis Gerbarg
8048e52537 Make sure no loads resulting from load->switch DAGCombine are marked invariant
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.

This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.

llvm-svn: 214449
2014-07-31 21:45:05 +00:00
Joerg Sonnenberger
a5a8f9d13c Add mtpid/mfpid for BookE.
llvm-svn: 214363
2014-07-30 23:59:11 +00:00
Joerg Sonnenberger
3845a9a9b8 Refactor TLBIVAX and add tlbsx.
llvm-svn: 214354
2014-07-30 22:51:15 +00:00
Joerg Sonnenberger
acae7c8f64 Add rfdi and rfmci from the e500/e500mc ISA.
llvm-svn: 214339
2014-07-30 21:09:03 +00:00
Joerg Sonnenberger
e81540b210 Add BookE's tlbre, tlbwe and tlbivax instructions.
llvm-svn: 214332
2014-07-30 20:44:04 +00:00
Joerg Sonnenberger
270c5a66fb Add BookE's wrtee and wrteei instructions.
llvm-svn: 214297
2014-07-30 10:32:51 +00:00
Joerg Sonnenberger
577206124d SPRG 0 to 3 are valid outside BookE, so move them to the normal test
file. Add support for accessing SPRG 4 to 7 on BookE.

llvm-svn: 214295
2014-07-30 09:24:37 +00:00
Joerg Sonnenberger
db7b2c7644 Add rfci instruction.
llvm-svn: 214256
2014-07-29 23:45:20 +00:00
Joerg Sonnenberger
abd973f309 mbar without argument is equivalent to mbar 0.
llvm-svn: 214250
2014-07-29 23:31:27 +00:00
Joerg Sonnenberger
afb8bfcb47 Recognize BookE's mbar instruction.
llvm-svn: 214244
2014-07-29 23:16:31 +00:00
Joerg Sonnenberger
5dd0828a9c Fix typo in alias: DSIR -> DSISR
llvm-svn: 214238
2014-07-29 22:42:44 +00:00
Joerg Sonnenberger
d52d4c80b5 Support move to/from segment register.
llvm-svn: 214234
2014-07-29 22:21:57 +00:00
Joerg Sonnenberger
f3709585f9 Add a number of aliases for SPR access.
llvm-svn: 214196
2014-07-29 18:55:43 +00:00
Joerg Sonnenberger
c3bec0cdc8 Add rfi instruction. Based on feedback by Ulrich Weigand.
llvm-svn: 214181
2014-07-29 15:49:09 +00:00
Matt Arsenault
55d94a2290 Fix typos / grammar.
llvm-svn: 214147
2014-07-29 00:02:40 +00:00
Ulrich Weigand
37cf88e787 [PowerPC] Support ELFv1/ELFv2 ABI selection via features
While LLVM now supports both ELFv1 and ELFv2 ABIs, their use is currently
hard-coded via the target triple: powerpc64-linux is always ELFv1, while
powerpc64le-linux is always ELFv2.

These are of course the most common scenarios, but in principle it is
possible to support the ELFv2 ABI on big-endian or the ELFv1 ABI on
little-endian systems (and GCC does support that), and there are some
special use cases for that (e.g. certain Linux kernel versions could
only be built using ELFv1 on LE).

This patch implements the LLVM side of supporting this.  As precedent
on other platforms suggests, ABI options are passed to the back-end as
features.  Thus, this patch implements two features "elfv1" and "elfv2"
that select the desired ABI if present.  (If not, the LLVM uses the
same default rules as now.)

llvm-svn: 214072
2014-07-28 13:09:28 +00:00
Matt Arsenault
76c7b7a591 Add alignment value to allowsUnalignedMemoryAccess
Rename to allowsMisalignedMemoryAccess.

On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.

llvm-svn: 214055
2014-07-27 17:46:40 +00:00
Hal Finkel
ba36f6399d [PowerPC] Support TLS on PPC32/ELF
Patch by Justin Hibbits!

llvm-svn: 213960
2014-07-25 17:47:22 +00:00
Bill Schmidt
1861624c2e [PATCH][PPC64LE] Correct little-endian usage of vmrgh* and vmrgl*.
Because the PowerPC vmrgh* and vmrgl* instructions have a built-in
big-endian bias, it is necessary to swap their inputs in little-endian
mode when using them to implement a vector shuffle.  This was
previously missed in the vector LE implementation.

There was already logic to distinguish between unary and "normal"
vmrg* vector shuffles, so this patch extends that logic to use a third
option:  "swapped" vmrg* vector shuffles that are used for little
endian in place of the "normal" ones.

I've updated the vec-shuffle-le.ll test to check for the expected
register ordering on the generated instructions.

This bug was discovered when testing the LE and ELFv2 patches for
safety if they were backported to 3.4.  A different vectorization
decision was made in 3.4 than on mainline trunk, and that exposed the
problem.  I've verified this fix takes care of that issue.

llvm-svn: 213915
2014-07-25 01:55:55 +00:00
Joerg Sonnenberger
97f682d3e2 Don't use 128bit functions on PPC32.
llvm-svn: 213899
2014-07-24 22:20:10 +00:00
NAKAMURA Takumi
509350ce96 Prune dependency to MC from each target disassembler.
llvm-svn: 213856
2014-07-24 11:45:11 +00:00
NAKAMURA Takumi
dbff653a5a Update library dependencies.
llvm-svn: 213832
2014-07-24 02:10:42 +00:00
Ulrich Weigand
eb914f2256 [PowerPC] ELFv2 aggregate passing support
This patch adds infrastructure support for passing array types
directly.  These can be used by the front-end to pass aggregate
types (coerced to an appropriate array type).  The details of the
array type being used inform the back-end about ABI-relevant
properties.  Specifically, the array element type encodes:
- whether the parameter should be passed in FPRs, VRs, or just
  GPRs/stack slots  (for float / vector / integer element types,
  respectively)
- what the alignment requirements of the parameter are when passed in
  GPRs/stack slots  (8 for float / 16 for vector / the element type
  size for integer element types) -- this corresponds to the
  "byval align" field

Using the infrastructure provided by this patch, a companion patch
to clang will enable two features:
- In the ELFv2 ABI, pass (and return) "homogeneous" floating-point
  or vector aggregates in FPRs and VRs (this is similar to the ARM
  homogeneous aggregate ABI)
- As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates
  that fit fully in registers without using the "byval" mechanism

The patch uses the functionArgumentNeedsConsecutiveRegisters callback
to encode that special treatment is required for all directly-passed
array types.  The isInConsecutiveRegs / isInConsecutiveRegsLast bits set
as a results are then used to implement the required size and alignment
rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc.

As a related change, the ABI routines have to be modified to support
passing floating-point types in GPRs.  This is necessary because with
homogeneous aggregates of 4-byte float type we can now run out of FPRs
*before* we run out of the 64-byte argument save area that is shadowed
by GPRs.  Any extra floating-point arguments that no longer fit in FPRs
must now be passed in GPRs until we run out of those too.

Note that there was already code to pass floating-point arguments in
GPRs used with vararg parameters, which was done by writing the argument
out to the argument save area first and then reloading into GPRs.  The
patch re-implements this, however, in favor of code packing float arguments
directly via extension/truncation, BITCAST, and BUILD_PAIR operations.

This is required to support the ELFv2 ABI, since we cannot unconditionally
write to the argument save area (which the caller might not have allocated).
The change does, however, affect ELFv1 varags routines too; but even here
the overall effect should be advantageous: Instead of loading the argument
into the FPR, then storing the argument to the stack slot, and finally
reloading the argument from the stack slot into a GPR, the new code now
just loads the argument into the FPR, and subsequently loads the argument
into the GPR (via BITCAST).  That BITCAST might imply a save/reload from
a stack temporary (in which case we're no worse than before); but it
might be implemented more efficiently in some cases.

The final part of the patch enables up to 8 FPRs and VRs for argument
return in PPCCallingConv.td; this is required to support returning
ELFv2 homogeneous aggregates.  (Note that this doesn't affect other ABIs
since LLVM wil only look for which register to use if the parameter is
marked as "direct" return anyway.)

Reviewed by Hal Finkel.

llvm-svn: 213493
2014-07-21 00:13:26 +00:00
Ulrich Weigand
9fcc5caf2d [PowerPC] ELFv2 explicit CFI for CR fields
This is a minor improvement in the ELFv2 ABI.   In ELFv1, DWARF CFI
would represent a saved CR word (holding CR fields CR2, CR3, and CR4)
using just a single CFI record refering to CR2.   In ELFv2 instead,
each of the CR fields is represented by its own CFI record.  The
advantage is that the compiler can now chose to save just a single
(or two) CR fields instead of all of them, if those are the only ones
that actually need saving.  That can lead to more efficient code using
mf(o)crf instead of the (slow) mfcr instruction.

Note that this patch does not (yet) implement this more efficient
code generation, but it does implement the part that is required to
be ABI compliant: creating multiple CFI records if multiple CR fields
are saved.

Reviewed by Hal Finkel.

llvm-svn: 213492
2014-07-21 00:03:18 +00:00
Ulrich Weigand
fb90fdfb31 [PowerPC] ELFv2 stack space reduction
The ELFv2 ABI reduces the amount of stack required to implement an
ABI-compliant function call in two ways:
* the "linkage area" is reduced from 48 bytes to 32 bytes by
  eliminating two unused doublewords
* the 64-byte "parameter save area" is now optional and need not be
  present in certain cases (it remains mandatory in functions with
  variable arguments, and functions that have any parameter that is
  passed on the stack)

The following patch implements this required changes:
- reducing the linkage area, and associated relocation of the TOC save
  slot, in getLinkageSize / getTOCSaveOffset (this requires updating all
  callers of these routines to pass in the isELFv2ABI flag).
- (partially) handling the case where the parameter save are is optional

This latter part requires some extra explanation:  Currently, we still
always allocate the parameter save area when *calling* a function.
That is certainly always compliant with the ABI, but may cause code to
allocate stack unnecessarily.  This can be addressed by a follow-on
optimization patch.

On the *callee* side, in LowerFormalArguments, we *must* track
correctly whether the ABI guarantees that the caller has allocated
the parameter save area for our use, and the patch does so. However,
there is one complication: the code that handles incoming "byval"
arguments will currently *always* write to the parameter save area,
because it has to force incoming register arguments to the stack since
it must return an *address* to implement the byval semantics.

To fix this, the patch changes the LowerFormalArguments code to write
arguments to a freshly allocated stack slot on the function's own stack
frame instead of the argument save area in those cases where that area
is not present.

Reviewed by Hal Finkel.

llvm-svn: 213490
2014-07-20 23:43:15 +00:00
Ulrich Weigand
41e116ee77 [PowerPC] ELFv2 function call changes
This patch builds upon the two preceding MC changes to implement the
basic ELFv2 function call convention.  In the ELFv1 ABI, a "function
descriptor" was associated with every function, pointing to both the
entry address and the related TOC base (and a static chain pointer
for nested functions).  Function pointers would actually refer to that
descriptor, and the indirect call sequence needed to load up both entry
address and TOC base.

In the ELFv2 ABI, there are no more function descriptors, and function
pointers simply refer to the (global) entry point of the function code.
Indirect function calls simply branch to that address, after loading it
up into r12 (as required by the ABI rules for a global entry point).
Direct function calls continue to just do a "bl" to the target symbol;
this will be resolved by the linker to the local entry point of the
target function if it is local, and to a PLT stub if it is global.
That PLT stub would then load the (global) entry point address of the
final target into r12 and branch to it.  Note that when performing a
local function call, r2 must be set up to point to the current TOC
base: if the target ends up local, the ABI requires that its local
entry point is called with r2 set up; if the target ends up global,
the PLT stub requires that r2 is set up.

This patch implements all LLVM changes to implement that scheme:
- No longer create a function descriptor when emitting a function
  definition (in EmitFunctionEntryLabel)
- Emit two entry points *if* the function needs the TOC base (r2)
  anywhere (this is done EmitFunctionBodyStart; note that this cannot
  be done in EmitFunctionBodyStart because the global entry point
  prologue code must be *part* of the function as covered by debug info).
- In order to make use tracking of r2 (as needed above) work correctly,
  mark direct function calls as implicitly using r2.
- Implement the ELFv2 indirect function call sequence (no function
  descriptors; load target address into r12).
- When creating an ELFv2 object file, emit the .abiversion 2 directive
  to tell the linker to create the appropriate version of PLT stubs.  

Reviewed by Hal Finkel.

llvm-svn: 213489
2014-07-20 23:31:44 +00:00
Ulrich Weigand
28c7be6585 [MC] Pass MCSymbolData to needsRelocateWithSymbol
As discussed in a previous checking to support the .localentry
directive on PowerPC, we need to inspect the actual target symbol
in needsRelocateWithSymbol to make the appropriate decision based
on that symbol's st_other bits.

Currently, needsRelocateWithSymbol does not get the target symbol.
However, it is directly available to its sole caller.  This patch
therefore simply extends the needsRelocateWithSymbol by a new
parameter "const MCSymbolData &SD", passes in the target symbol,
and updates all derived implementations.

In particular, in the PowerPC implementation, this patch removes
the FIXME added by the previous checkin.

llvm-svn: 213487
2014-07-20 23:15:06 +00:00
Ulrich Weigand
1376d19372 [PowerPC] ELFv2 MC support for .localentry directive
A second binutils feature needed to support ELFv2 is the .localentry
directive.  In the ELFv2 ABI, functions may have two entry points:
one for calling the routine locally via "bl", and one for calling the
function via function pointer (either at the source level, or implicitly
via a PLT stub for global calls).  The two entry points share a single
ELF symbol, where the ELF symbol address identifies the global entry
point address, while the local entry point is found by adding a delta
offset to the symbol address.  That offset is encoded into three
platform-specific bits of the ELF symbol st_other field.

The .localentry directive instructs the assembler to set those fields
to encode a particular offset.  This is typically used by a function
prologue sequence like this:

func:
        addis r2, r12, (.TOC.-func)@ha
        addi r2, r2, (.TOC.-func)@l
        .localentry func, .-func

Note that according to the ABI, when calling the global entry point,
r12 must be set to point the global entry point address itself; while
when calling the local entry point, r2 must be set to point to the TOC
base.  The two instructions between the global and local entry point in
the above example translate the first requirement into the second.

This patch implements support in the PowerPC MC streamers to emit the
.localentry directive (both into assembler and ELF object output), as
well as support in the assembler parser to parse that directive.

In addition, there is another change required in MC fixup/relocation
handling to properly deal with relocations targeting function symbols
with two entry points: When the target function is known local, the MC
layer would immediately handle the fixup by inserting the target
address -- this is wrong, since the call may need to go to the local
entry point instead.  The GNU assembler handles this case by *not*
directly resolving fixups targeting functions with two entry points,
but always emits the relocation and relies on the linker to handle
this case correctly.  This patch changes LLVM MC to do the same (this
is done via the processFixupValue routine).

Similarly, there are cases where the assembler would normally emit a
relocation, but "simplify" it to a relocation targeting a *section*
instead of the actual symbol.  For the same reason as above, this
may be wrong when the target symbol has two entry points.  The GNU
assembler again handles this case by not performing this simplification
in that case, but leaving the relocation targeting the full symbol,
which is then resolved by the linker.  This patch changes LLVM MC
to do the same (via the needsRelocateWithSymbol routine).
NOTE: The method used in this patch is overly pessimistic, since the
needsRelocateWithSymbol routine currently does not have access to the
actual target symbol, and thus must always assume that it might have
two entry points.  This will be improved upon by a follow-on patch
that modifies common code to pass the target symbol when calling
needsRelocateWithSymbol.

Reviewed by Hal Finkel.

llvm-svn: 213485
2014-07-20 23:06:03 +00:00
Ulrich Weigand
22549677c8 [PowerPC] ELFv2 MC support for .abiversion directive
ELFv2 binaries are marked by a bit in the ELF header e_flags field.
A new assembler directive .abiversion can be used to set that flag.
This patch implements support in the PowerPC MC streamers to emit the
.abiversion directive (both into assembler and ELF binary output),
as well as support in the assembler parser to parse the .abiversion
directive.

Reviewed by Hal Finkel.

llvm-svn: 213484
2014-07-20 22:56:57 +00:00
Ulrich Weigand
d1393de80e [PowerPC] Refactor byval handling in LowerFormalArguments_64SVR4
When handling an incoming byval argument, we need to possibly write
incoming registers to the stack in order to create an on-stack image
of the parameter, so we can return its address to common code.

This currently uses CreateFixedObject to access the parts of the
parameter save area where the argument is (or needs to be) stored.
However, sometimes we need to access multiple parts of that area,
e.g. to write multiple registers.  The code currently uses a new
CreateFixedObject call for each of these accesses, resulting in
a patchwork of overlapping (fixed) stack objects.

This doesn't really matter in the case of fixed objects, since
any access to those turns into a fixed stackpointer + offset
address anyway.  However, with the upcoming ELFv2 patches, we
may actually need to place an incoming argument into our *own*
stack frame instead of the caller's.  This means we need to use
CreateStackObject instead, and we cannot have multiple overlapping
instances of those.

To make the rest of the argument handling code work equally in
both situations, this patch refactors it to always use just a
single call to CreateFixedObject, and access parts of that object
as required using address arithmetic.  This way, we can in a future
patch substitute CreateStackObject without further changes.

No change to generated code intended.

llvm-svn: 213483
2014-07-20 22:36:52 +00:00
Ulrich Weigand
c274d8aae7 [PowerPC] Fix FrameIndex handling in SelectAddressRegImm
The PPCTargetLowering::SelectAddressRegImm routine needs to handle
FrameIndex nodes in a special manner, by tranlating them into a
TargetFrameIndex node.  This was done in most cases, but seems to
have been neglected in one path: when the input tree has an OR of
the FrameIndex with an immediate.  This can happen if the FrameIndex
can be proven to be sufficiently aligned that an OR of that immediate
is equivalent to an ADD.

The missing handling of FrameIndex in that case caused the SelectionDAG
instruction selection to miss opportunities to merge the OR back into
the FrameIndex node, leading to superfluous addi/ori instructions in
the final assembler output.

llvm-svn: 213482
2014-07-20 22:26:40 +00:00
Hal Finkel
006e1d44a6 [PowerPC] 32-bit ELF PIC support
This adds initial support for PPC32 ELF PIC (Position Independent Code; the
-fPIC variety), thus rectifying a long-standing deficiency in the PowerPC
backend.

Patch by Justin Hibbits!

llvm-svn: 213427
2014-07-18 23:29:49 +00:00
Sanjay Patel
2f0f025b2b Move Post RA Scheduling flag bit into SchedMachineModel
Refactoring; no functional changes intended

    Removed PostRAScheduler bits from subtargets (X86, ARM).
    Added PostRAScheduler bit to MCSchedModel class.
    This bit is set by a CPU's scheduling model (if it exists).
    Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
    Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
    Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
    Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.
    Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86: 
       a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget. 
       b. MIPS overrides the CPU's postRA settings by enabling postRA for everything. 
       c. PPC overrides the CPU's postRA settings by enabling postRA for everything. 
       d. X86 is the only target that actually has postRA specified via sched model info.

Differential Revision: http://reviews.llvm.org/D4217

llvm-svn: 213101
2014-07-15 22:39:58 +00:00
NAKAMURA Takumi
365309bb8b Prune Redundant libdeps in CMake's target_link_libraries and LLVMBuild.txt.
I checked this with Release+Asserts on x86_64-mingw32. Please restore partially if this were overkill.

llvm-svn: 213064
2014-07-15 11:37:03 +00:00
Ulrich Weigand
1aae06e395 [PowerPC] Fix invalid displacement created by LocalStackAlloc
This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that
could result in the LocalStackAlloc pass creating an MI instruction
out-of-range displacement:
        %vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32)
        %G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31
(In final assembler output the top bits are stripped off, resulting
in a negative offset loading from below the stack pointer.)

Common code expects the isFrameOffsetLegal routine to verify whether
adding a given offset to the offset already present in the instruction
results in a valid displacement.  However, on PowerPC the routine
did not take the already present instruction offset into account.

This commit fixes isFrameOffsetLegal to add the instruction offset,
and updates a local caller (needsFrameBaseReg) to no longer add the
instruction offset itself before calling isFrameOffsetLegal.

Reviewed by Hal Finkel.

llvm-svn: 212832
2014-07-11 17:19:31 +00:00
Ulrich Weigand
1078e34b76 [PowerPC] Implement atomic NAND operations as actual NAND
This changes the implementation of atomic NAND operations
from "a & ~b" (compatible with GCC < 4.4) to actual "~(a & b)"
(compatible with GCC >= 4.4).

This is in line with the common-code and ARM back-end change
implemented in r212433.

llvm-svn: 212547
2014-07-08 16:16:02 +00:00
Ulrich Weigand
5e8a0b585b [PowerPC] Fix no-assert build
r212476 caused a compile failure (unused variable) in a non-assertion
build ...

llvm-svn: 212477
2014-07-07 19:39:44 +00:00
Ulrich Weigand
8f2aff0b0c [PowerPC] Fix "byval align" arguments
Arguments passed as "byval align" should get the specified alignment
in the parameter save area.  There was some code in PPCISelLowering.cpp
that attempted to implement this, but this didn't work correctly:
while code did update the ArgOffset value, it neglected to update
the PtrOff value (which was already computed from the old ArgOffset),
and it also neglected to update GPR_idx -- fields skipped due to
alignment in the save area must likewise be skipped in GPRs.

This patch fixes and simplifies this logic by:
- handling argument offset alignment right at the beginning
  of argument processing, using a new helper routine
  CalculateStackSlotAlignment (this avoids having to update
  PtrOff and other derived values later on)
- not tracking GPR_idx separately, but always computing the
  correct GPR_idx for each argument *from* its ArgOffset
- removing some redundant computation in LowerFormalArguments:
  MinReservedArea must equal ArgOffset after argument processing,
  so there's no use in computing it twice.

[This doesn't change the behavior of the current clang front-end,
since that never creates "byval align" arguments at the moment.
This will change with a follow-on patch, however.]

llvm-svn: 212476
2014-07-07 19:26:41 +00:00
Juergen Ributzka
241f08ad4e [DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI.
The argument list vector is never used after it has been passed to the
CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and
pretty much as cheap as keeping a pointer to it.

llvm-svn: 212135
2014-07-01 22:01:54 +00:00
Craig Topper
4c15d35f50 Add ops() method to SDNode that returns an ArrayRef<SDUse>. Use it to simplify some code.
llvm-svn: 211993
2014-06-29 00:40:57 +00:00
Ulrich Weigand
85c764c15a [PowerPC] Constrain base register in PPCRegisterInfo::resolveFrameIndex
I've run into a bug where current LLVM at -O0 (with fast-isel)
generated invalid code like:

        ld 0, 20936(1)                  # 8-byte Folded Reload
        stw 12, 10348(0)
        stw 12, 10344(0)

The underlying vreg had been introduced as base register by the
Local Stack Slot Allocation pass.  That register was constrained
to G8RC by PPCRegisterInfo::materializeFrameBaseRegister to match
the ADDI instruction used to set it, but it was *not* constrained
to G8RC_NOX0 to fit the *use* of the register in an address.

That should have happened in PPCRegisterInfo::resolveFrameIndex.
This patch adds an appropriate constrainRegClass call.

Reviewed by Hal Finkel.

llvm-svn: 211897
2014-06-27 13:04:12 +00:00
Eric Christopher
04713c650c Remove extraneous includes from the target machines.
llvm-svn: 211800
2014-06-26 19:30:05 +00:00
Will Schmidt
fbbc998175 add ppc64/pwr8 as target
includes handling DIR_PWR8 where appropriate
The P7Model Itinerary is currently tied in for use under the P8Model, and will be updated later.

llvm-svn: 211779
2014-06-26 13:36:19 +00:00
Rafael Espindola
196881cf29 Move expression visitation logic up to MCStreamer.
Remove the duplicate from MCRecordStreamer. No functionality change.

llvm-svn: 211714
2014-06-25 15:45:33 +00:00
Rafael Espindola
0e71694bfa Simplify the visitation of target expressions. No functionality change.
llvm-svn: 211707
2014-06-25 15:29:54 +00:00
Bill Schmidt
bfe90f8c83 [PPC64] Fix PR20071 (fctiduz generated for targets lacking that instruction)
PR20071 identifies a problem in PowerPC's fast-isel implementation for
floating-point conversion to integer.  The fctiduz instruction was added in
Power ISA 2.06 (i.e., Power7 and later).  However, this instruction is being
generated regardless of which 64-bit PowerPC target is selected.

The intent is for fast-isel to punt to DAG selection when this instruction is
not available.  This patch implements that change.  For testing purposes, the
existing fast-isel-conversion.ll test adds a RUN line for -mcpu=970 and tests
for the expected code generation.  Additionally, the existing test
fast-isel-conversion-p5.ll was found to be incorrectly expecting the
unavailable instruction to be generated.  I've removed these test variants
since we have adequate coverage in fast-isel-conversion.ll.

llvm-svn: 211627
2014-06-24 20:05:18 +00:00
Ulrich Weigand
b36d130d19 [PowerPC] Refactor getMinCallFrameSize / getMinCallArgumentsSize
As of r211495, the only remaining users of getMinCallFrameSize are in
core ABI code (LowerFormalParameter / LowerCall).  This is actually a
good thing, since the details of the parameter save area are ABI specific.

With the new ELFv2 ABI in particular, the rules defining the size of the
save area will become significantly more complex, so it wouldn't make
sense to implement those outside ABI code that has all required
information.

In preparation, this patch eliminates the getMinCallFrameSize (and
associated getMinCallArgumentsSize) routines, and inlines them into all
callers.  Note that since nearly all call arguments are constant, this
allows simplifying the inlined copies to a single line everywhere.

No change in generate code expected.

llvm-svn: 211497
2014-06-23 14:15:53 +00:00
Ulrich Weigand
ac0412b387 [PowerPC] Allow stack frames without parameter save area
The PPCFrameLowering::determineFrameLayout routine currently ensures
that every function that allocates a stack frame provides space for the
parameter save area (via PPCFrameLowering::getMinCallFrameSize).

This is actually not necessary.  There may be functions that never call
another routine but still allocate a frame; those do not require the
parameter save area.  In the future, with the ELFv2 ABI, even some
routines that do call other functions do not need to allocate the
parameter save area.

While it is not a bug to allocate the parameter area when it is not
needed, it is better to avoid it to save stack space.

Note that when any particular function call requires the parameter save
area, this space will already have been included by ABI code in the size
the CALLSEQ_START insn is annotated with, and therefore included in the
size returned by MFI->getMaxCallFrameSize().

This means that determineFrameLayout simply does not need to care about
the parameter save area.  (It still needs to ensure that every frame
provides the linkage area.)  This is implemented by this patch.

Note that this exposed a bug in the new fast-isel code where the parameter
area was *not* included in the CALLSEQ_START size; this is also fixed.

A couple of test cases needed to be adapted for the new (smaller) stack
frame size those tests now see.

llvm-svn: 211495
2014-06-23 13:47:52 +00:00
Ulrich Weigand
cd08720d8e [PowerPC] Fix IsDarwin arg in PPCFrameLowering:: calls
As remarked in the commit message to r211493, in several places
throughout the 64-bit SVR4 ABI code there are calls to
PPCFrameLowering::getLinkageSize and getMinCallFrameSize
using an incorrect IsDarwin argument of "true".

(Some of those were made explicit by the above refactoring patch, others
have been there all along.)

This patch fixes those places to pass "false" for IsDarwin.

No change in generated code expected.

llvm-svn: 211494
2014-06-23 13:21:43 +00:00
Ulrich Weigand
186fcfa6d3 [PowerPC] Refactor setMinReservedArea and CalculateParameterAndLinkageAreaSize
The PPCISelLowering.cpp routines PPCTargetLowering::setMinReservedArea and
CalculateParameterAndLinkageAreaSize are currently used as subroutines
from both 64-bit SVR4 and Darwin ABI code.

However, the two ABIs are already quite different w.r.t. AltiVec
conventions, and they will become more different when the ELFv2 ABI is
supported.  Also, in general it seems better to disentangle ABI support
routines for different ABIs to avoid accidentally affecting one ABI when
intending to change only the other.

(Actually, the current code strictly speaking already contains a bug:
these routines call PPCFrameLowering::getMinCallFrameSize and
PPCFrameLowering::getLinkageSize with the IsDarwin parameter set to
"true" even on 64-bit SVR4.  This bug currently has no adverse effect
since those routines always return the same for 64-bit SVR4 and 64-bit
Darwin, but it still seems wrong ...  I'll fix this in a follow-up
commit shortly.)

To remove this code sharing, I'm simply inlining both routines into all
call sites (there are just two each, one for 64-bit SVR4 and one for
Darwin), and simplifying due to constant parameters where possible.

A small piece of code that *does* make sense to share is refactored into
the new routine EnsureStackAlignment, now also called from 32-bit SVR4
ABI code.

No change in generated code is expected.

llvm-svn: 211493
2014-06-23 13:08:27 +00:00
Ulrich Weigand
e1fa36144d [PowerPC] Fix on-stack AltiVec arguments with 64-bit SVR4
Current 64-bit SVR4 code seems to have some remnants of Darwin code
in AltiVec argument handing.  This had the effect that AltiVec arguments
(or subsequent arguments) were not correctly placed in the parameter area
in some cases.

The correct behaviour with the 64-bit SVR4 ABI is:
- All AltiVec arguments take up space in the parameter area, just like
  any other arguments, whether vararg or not.
- They are always 16-byte aligned, skipping a parameter area doubleword
  (and the associated GPR, if any), if necessary.

This patch implements the correct behaviour and adds a test case.
(Verified against GCC behaviour via the ABI compat test suite.)

llvm-svn: 211492
2014-06-23 12:36:34 +00:00
Ulrich Weigand
3f880bd06e [PowerPC] Fix small argument stack slot offset for LE
When small arguments (structures < 8 bytes or "float") are passed in a
stack slot in the ppc64 SVR4 ABI, they must reside in the least
significant part of that slot.  On BE, this means that an offset needs
to be added to the stack address of the parameter, but on LE, the least
significant part of the slot has the same address as the slot itself.

This changes the PowerPC back-end ABI code to only add the small
argument stack slot offset for BE.  It also adds test cases to verify
the correct behavior on both BE and LE.

llvm-svn: 211368
2014-06-20 16:34:05 +00:00
Ulrich Weigand
c393e04673 [PowerPC] Remove unnecessary load of r12 in indirect call
When looking at the 64-bit SVR4 indirect call sequence, I noticed
an unnecessary load of r12.  And indeed the code says:

  // R12 must contain the address of an indirect callee. 

But this is not correct; in the 64-bit SVR4 (ELFv1) ABI, there is
no need to load r12 at this point.  It seems this code and comment
is a remnant of code originally shared with the Darwin ABI ...

This patch simply removes the unnecessary load.

llvm-svn: 211203
2014-06-18 18:33:36 +00:00
Ulrich Weigand
e256aa48b9 [PowerPC] Simplify and improve loading into TOC register
During an indirect function call sequence on the 64-bit SVR4 ABI,
generate code must load and then restore the TOC register.

This does not use a regular LOAD instruction since the TOC
register r2 is marked as reserved.  Instead, the are two
special instruction patterns:

 let RST = 2, DS = 2 in
 def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg),
                     "ld 2, 8($reg)", IIC_LdStLD,
                     [(PPCload_toc i64:$reg)]>, isPPC64;
 
 let RST = 2, DS = 10, RA = 1 in
 def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins),
                     "ld 2, 40(1)", IIC_LdStLD,
                     [(PPCtoc_restore)]>, isPPC64;

Note that these not only restrict the destination of the
load to r2, but they also restrict the *source* of the
load to particular address combinations.  The latter is
a problem when we want to support the ELFv2 ABI, since
there the TOC save slot is no longer at 40(1).

This patch replaces those two instructions with a single
instruction pattern that only hard-codes r2 as destination,
but supports generic addresses as source.  This will allow
supporting the ELFv2 ABI, and also helps generate more
efficient code for calls to absolute addresses (allowing
simplification of the ppc64-calls.ll test case).

llvm-svn: 211193
2014-06-18 17:52:49 +00:00
Ulrich Weigand
1458f67d3e [PowerPC] Do not use BLA with the 64-bit SVR4 ABI
The PowerPC back-end uses BLA to implement calls to functions at
known-constant addresses, which is apparently used for certain
system routines on Darwin.

However, with the 64-bit SVR4 ABI, this is actually incorrect.
An immediate function pointer value on this platform is not
directly usable as a target address for BLA:
- in the ELFv1 ABI, the function pointer value refers to the
  *function descriptor*, not the code address
- in the ELFv2 ABI, the function pointer value refers to the
  global entry point, but BL(A) would only be correct when
  calling the *local* entry point

This bug didn't show up since using immediate function pointer
values is not usually done in the 64-bit SVR4 ABI in the first
place.  However, I ran into this issue with a certain use case
of LLVM as JIT, where immediate function pointer values were
uses to implement callbacks from JITted code to helpers in
statically compiled code.

Fixed by simply not using BLA with the 64-bit SVR4 ABI.

llvm-svn: 211174
2014-06-18 16:14:04 +00:00
Ulrich Weigand
cfcf14bd4c [PowerPC] Fix emitting instruction pairs on LE
My patch r204634 to emit instructions in little-endian format failed to
handle those special cases where we emit a pair of instructions from a
single LLVM MC instructions (like the bl; nop pairs used to implement
the call sequence).

In those cases, we still need to emit the "first" instruction (the one
in the more significant word) first, on both big and little endian,
and not swap them.

llvm-svn: 211171
2014-06-18 15:37:07 +00:00
Bill Schmidt
b0bab996e0 [PPC64] Fix PR19893 - improve code generation for local function addresses
Rafael opened http://llvm.org/bugs/show_bug.cgi?id=19893 to track non-optimal
code generation for forming a function address that is local to the compile
unit.  The existing code was treating both local and non-local functions
identically.

This patch fixes the problem by properly identifying local functions and
generating the proper addis/addi code.  I also noticed that Rafael's earlier
changes to correct the surrounding code in PPCISelLowering.cpp were also
needed for fast instruction selection in PPCFastISel.cpp, so this patch
fixes that code as well.

The existing test/CodeGen/PowerPC/func-addr.ll is modified to test the new
code generation.  I've added a -O0 run line to test the fast-isel code as
well.

Tested on powerpc64[le]-unknown-linux-gnu with no regressions.

llvm-svn: 211056
2014-06-16 21:36:02 +00:00
Eric Christopher
2ea2870f36 The hazard recognizer only needs a subtarget, not a target machine
so make it take one. Fix up all users accordingly.

llvm-svn: 210948
2014-06-13 22:38:52 +00:00
Eric Christopher
72e46a5bfe Fix typo.
llvm-svn: 210947
2014-06-13 22:38:48 +00:00
Eric Christopher
286dd39af0 Move the PPCSelectionDAGInfo off the TargetMachine and onto the
subtarget.

llvm-svn: 210854
2014-06-12 23:02:32 +00:00
Eric Christopher
0382d62f87 Make PPCSelectionDAGInfo take a DataLayout instead of a TargetMachine
since that's all it needs.

llvm-svn: 210853
2014-06-12 22:56:48 +00:00
Eric Christopher
42e57db35c Move PPCTargetLowering off of the TargetMachine and onto the subtarget.
llvm-svn: 210852
2014-06-12 22:50:10 +00:00
Eric Christopher
576f8a1a6b Remove an extraneous this-> to access the subtarget.
llvm-svn: 210849
2014-06-12 22:38:20 +00:00
Eric Christopher
d1b1190bd1 Rename PPCSubTarget to Subtarget in PPCTargetLowering for consistency.
Also remove an extra local subtarget in the initialization functions.

llvm-svn: 210848
2014-06-12 22:38:18 +00:00
Eric Christopher
429be5d609 Move PPCJITInfo off of the TargetMachine and onto the subtarget.
Needed to migrate a few functions around to avoid circular header
dependencies.

llvm-svn: 210845
2014-06-12 22:28:06 +00:00
Eric Christopher
40ba57e7f5 Remove the use of TargetMachine from PPCJITInfo and replace with
the subtarget. Also remove unnecessary argument to the constructor
at the same time, we already have access via the subtarget.

llvm-svn: 210844
2014-06-12 22:19:51 +00:00
Eric Christopher
95b7901fd6 Move PPCInstrInfo off of the target machine and onto the subtarget.
llvm-svn: 210839
2014-06-12 22:05:46 +00:00
Eric Christopher
d4532ed073 Remove TargetMachine from PPCInstrInfo and all dependencies and
replace with the current subtarget.

llvm-svn: 210836
2014-06-12 21:48:52 +00:00
Eric Christopher
fe75c4c997 Move DataLayout from the PPCTargetMachine to the subtarget.
llvm-svn: 210824
2014-06-12 21:08:06 +00:00
Eric Christopher
0c6467adf3 Move PPCFrameLowering into PPCSubtarget from PPCTargetMachine. Use
the initializeSubtargetDependencies code to obtain an initialized
subtarget and migrate a couple of subtarget using functions to the
.cpp file to avoid circular includes.

llvm-svn: 210822
2014-06-12 20:54:11 +00:00
Eric Christopher
a30bc53a6b Remove duplicate copy of InstrItineraryData from the TargetMachine,
it's already on the subtarget.

llvm-svn: 210619
2014-06-11 00:53:17 +00:00
Bill Schmidt
6e11183ad7 [PPC64LE] Recognize shufflevector patterns for little endian
Various masks on shufflevector instructions are recognizable as
specific PowerPC instructions (vector pack, vector merge, etc.).
There is existing code in PPCISelLowering.cpp to recognize the correct
patterns for big endian code.  The masks for these instructions are
different for little endian code due to the big-endian numbering
employed by these instructions.  This patch adds the recognition code
for little endian.

I've added a new test case test/CodeGen/PowerPC/vec_shuffle_le.ll for
this.  The existing recognizer test (vec_shuffle.ll) is unnecessarily
verbose and difficult to read, so I felt it was better to add a new
test rather than modify the old one.

llvm-svn: 210536
2014-06-10 14:35:01 +00:00
Bill Schmidt
41cd7375c8 [PPC64LE] Generate correct code for unaligned little-endian vector loads
The code in PPCTargetLowering::PerformDAGCombine() that handles
unaligned Altivec vector loads generates a lvsl followed by a vperm.
As we've seen in numerous other places, the vperm instruction has a
big-endian bias, and this is fixed for little endian by complementing
the permute control vector and swapping the input operands.  In this
case the lvsl is providing the permute control vector.  Rather than
generating an lvsl and a complement operation, it is sufficient to
generate an lvsr instruction instead.  Thus for LE code generation we
will generate an lvsr rather than an lvsl, and swap the other input
arguments on the vperm.

The existing test/CodeGen/PowerPC/vec_misalign.ll is updated to test
the code generation for PPC64 and PPC64LE, in addition to the existing
PPC32/G5 testing.

llvm-svn: 210493
2014-06-09 22:00:52 +00:00
Bill Schmidt
3ff0a8eb8b [PPC64LE] Generate correct little-endian code for v16i8 multiply
The existing code in PPCTargetLowering::LowerMUL() for multiplying two
v16i8 values assumes that vector elements are numbered in big-endian
order.  For little-endian targets, the vector element numbering is
reversed, but the vmuleub, vmuloub, and vperm instructions still
assume big-endian numbering.  To account for this, we must adjust the
permute control vector and reverse the order of the input registers on
the vperm instruction.

The existing test/CodeGen/PowerPC/vec_mul.ll is updated to be executed
on powerpc64 and powerpc64le targets as well as the original powerpc
(32-bit) target.

llvm-svn: 210474
2014-06-09 16:06:29 +00:00
David Blaikie
f670b953e7 AsmMatchers: Use unique_ptr to manage ownership of MCParsedAsmOperand
I saw at least a memory leak or two from inspection (on probably
untested error paths) and r206991, which was the original inspiration
for this change.

I ran this idea by Jim Grosbach a few weeks ago & he was OK with it.
Since it's a basically mechanical patch that seemed sufficient - usual
post-commit review, revert, etc, as needed.

llvm-svn: 210427
2014-06-08 16:18:35 +00:00
Eric Christopher
db8e2ecde5 Have TargetSelectionDAGInfo take a DataLayout initializer rather than
a TargetMachine since the only thing it wants is DataLayout.

llvm-svn: 210366
2014-06-06 19:04:48 +00:00
Bill Schmidt
647be1ef2c [PPC64LE] Fix lowering of BUILD_VECTOR and SHUFFLE_VECTOR for little endian
This patch fixes a couple of lowering issues for little endian
PowerPC.  The code for lowering BUILD_VECTOR contains a number of
optimizations that are only valid for big endian.  For now, we disable
those optimizations for correctness.  In the future, we will add
analogous optimizations that are correct for little endian.

When lowering a SHUFFLE_VECTOR to a VPERM operation, we again need to
make the now-familiar transformation of swapping the input operands
and complementing the permute control vector.  Correctness of this
transformation is tested by the accompanying test case.

llvm-svn: 210336
2014-06-06 14:06:26 +00:00
Bill Schmidt
9380b3b7bf [PPC64LE] Temporarily disable VSX support in little-endian mode
This is a preliminary patch for the PowerPC64LE support.  In stage 1
of the vector support, we will support the VMX (Altivec) instruction
set, but will not yet support the VSX instructions.  This is merely a
staging issue to provide functional vector support as soon as
possible.

llvm-svn: 210271
2014-06-05 16:21:13 +00:00
Eric Christopher
44a7e98ce5 Omit else branch after return.
llvm-svn: 210034
2014-06-02 17:29:07 +00:00
Eric Christopher
1aad72164e Have the TLOF creation take a Triple rather than needing a subtarget.
llvm-svn: 209937
2014-05-31 00:07:32 +00:00
Eric Christopher
0cc6977494 isSVR4ABI() returned !isDarwin() so just move that to the else
block and remove the unreachable code.

llvm-svn: 209927
2014-05-30 22:47:53 +00:00
Eric Christopher
f0478ea2df Rename CreateTLOF->createTLOF to match the rest of the file and the
rest of the targets with a similar function name.

llvm-svn: 209926
2014-05-30 22:47:48 +00:00
Rafael Espindola
34f4870951 [PPC] Use alias symbols in address computation.
This seems to match what gcc does for ppc and what every other llvm
backend does.

This is a fixed version of r209638. The difference is to avoid any change
in behavior for functions. The logic for using constant pools for function
addresseses is spread over a few places and we have to keep them in sync.

llvm-svn: 209821
2014-05-29 15:41:38 +00:00
Rafael Espindola
acdb307db3 [pr19844] Add thread local mode to aliases.
This matches gcc's behavior. It also seems natural given that aliases
contain other properties that govern how it is accessed (linkage,
visibility, dll storage).

Clang still has to be updated to expose this feature to C.

llvm-svn: 209759
2014-05-28 18:15:43 +00:00
Hal Finkel
47a225fb6c Revert "[PPC] Use alias symbols in address computation."
This reverts commit r209638 because it broke self-hosting on ppc64/Linux. (the
Clang-compiled TableGen would segfault because it jumped to an invalid address
from within _ZNK4llvm17ManagedStaticBase21RegisterManagedStaticEPFPvvEPFvS1_E
(which is within the command-line parameter registration process)).

llvm-svn: 209745
2014-05-28 15:25:06 +00:00
Bill Schmidt
b806d02b5b [PATCH] Correct type used for VADD_SPLAT optimization on PowerPC
In PPCISelLowering.cpp: PPCTargetLowering::LowerBUILD_VECTOR(), there
is an optimization for certain patterns to generate one or two vector
splats followed by a vector add or subtract.  This operation is
represented by a VADD_SPLAT in the selection DAG.  Prior to this
patch, it was possible for the VADD_SPLAT to be assigned the wrong
data type, causing incorrect code generation.  This patch corrects the
problem.

Specifically, the code previously assigned the value type of the
BUILD_VECTOR node to the newly generated VADD_SPLAT node.  This is
correct much of the time, but not always.  The problem is that the
call to isConstantSplat() may return a SplatBitSize that is not the
same as the number of bits in the original element vector type.  The
correct type to assign is a vector type with the same element bit size
as SplatBitSize.

The included test case shows an example of this, where the
BUILD_VECTOR node has a type of v16i8.  The vector to be built is {0,
16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16}.  isConstantSplat
detects that we can generate a splat of 16 for type v8i16, which is
the type we must assign to the VADD_SPLAT node.  If we do not, we
generate a vspltisb of 8 and a vaddubm, which generates the incorrect
result {16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16}.  The correct code generation is a vspltish of 8 and a vadduhm.

This patch also corrected code generation for
CodeGen/PowerPC/2008-07-10-SplatMiscompile.ll, which had been marked
as an XFAIL, so we can remove the XFAIL from the test case.

llvm-svn: 209662
2014-05-27 15:57:51 +00:00
Rafael Espindola
94cd9a1ed6 [PPC] Use alias symbols in address computation.
This seems to match what gcc does for ppc and what every other llvm
backend does.

llvm-svn: 209638
2014-05-26 19:08:19 +00:00
Eric Christopher
1b3b092405 Fix typo.
llvm-svn: 209377
2014-05-22 01:21:44 +00:00
Eric Christopher
7898bbfc19 Avoid using subtarget features when initializing the pass pipeline
on PPC.

llvm-svn: 209376
2014-05-22 01:21:35 +00:00
Eric Christopher
4c56d10ea0 Reset the subtarget for DAGToDAG on every iteration of runOnMachineFunction.
This required updating the generated functions and TD file accordingly
to be pointers rather than const references.

llvm-svn: 209375
2014-05-22 01:07:24 +00:00
Eric Christopher
7880d61aac Make early if conversion dependent upon the subtarget and add
a subtarget hook to enable. Unconditionally add to the pass pipeline
for targets that might want to use it. No functional change.

llvm-svn: 209340
2014-05-21 23:40:26 +00:00
Adam Nemet
37337f0359 [PowerPC] PR19796: Also match ISD::TargetConstant in isIntS16Immediate
The SplitIndexingFromLoad changes exposed a latent isel bug in the PowerPC64
backend.  We matched an immediate offset with STWX8 even though it only
supports register offset.

The culprit is the complex-pattern predicate, SelectAddrIdx, which decides
that if the offset is not ISD::Constant it must be a register.

Many thanks to Bill Schmidt for testing this.

llvm-svn: 209219
2014-05-20 17:20:34 +00:00
Benjamin Kramer
600e24a1cb SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not.
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.

llvm-svn: 209123
2014-05-19 13:12:38 +00:00
Saleem Abdulrasool
501d3b6235 Target: remove old constructors for CallLoweringInfo
This is mostly a mechanical change changing all the call sites to the newer
chained-function construction pattern.  This removes the horrible 15-parameter
constructor for the CallLoweringInfo in favour of setting properties of the call
via chained functions.  No functional change beyond the removal of the old
constructors are intended.

llvm-svn: 209082
2014-05-17 21:50:17 +00:00
Pete Cooper
fa13048706 Use a sized enum for MachineOperandType. No functionality change
llvm-svn: 209048
2014-05-16 23:28:17 +00:00
Rafael Espindola
e809bea68e Delete getAliasedGlobal.
llvm-svn: 209040
2014-05-16 22:37:03 +00:00
Jay Foad
e0eac700cb Rename ComputeMaskedBits to computeKnownBits. "Masked" has been
inappropriate since it lost its Mask parameter in r154011.

llvm-svn: 208811
2014-05-14 21:14:37 +00:00
Eric Christopher
935299458d Fix typo in function name.
llvm-svn: 208743
2014-05-14 00:31:15 +00:00
Eric Christopher
1091ab4275 Save the optimization level the subtarget was created with in a
member variable and sink the initialization of crbits into the
subtarget feature reset code.

No functional change, but this refactor will be used in a future
commit.

llvm-svn: 208726
2014-05-13 20:49:08 +00:00
Hal Finkel
f6f53bcd51 [PowerPC] Add global named register support
Support for the intrinsics that read from and write to global named registers
is added for r1, r2 and r13 (depending on the subtarget).

llvm-svn: 208509
2014-05-11 19:29:11 +00:00
Hal Finkel
ef707e1c3e [PowerPC] On PPC32, 128-bit shifts might be runtime calls
The counter-loops formation pass needs to know what operations might be
function calls (because they can't appear in counter-based loops). On PPC32,
128-bit shifts might be runtime calls (even though you can't use __int128 on
PPC32, it seems that SROA might form them).

Fixes PR19709.

llvm-svn: 208501
2014-05-11 16:23:29 +00:00
Rafael Espindola
765e5e78cf Remove the UseCFI option from createAsmStreamer.
We were already always passing true, this just removes the option.

llvm-svn: 208205
2014-05-07 13:00:43 +00:00
Rafael Espindola
bb77d317cd Fix pr19645.
The fix itself is fairly simple: move getAccessVariant to MCValue so that we
replace the old weak expression evaluation with the far more general
EvaluateAsRelocatable.

This then requires that EvaluateAsRelocatable stop when it finds a non
trivial reference kind. And that in turn requires the ELF writer to look
harder for weak references.

Last but not least, this found a case where we were being bug by bug
compatible with gas and accepting an invalid input. I reported pr19647
to track it.

llvm-svn: 207920
2014-05-03 19:57:04 +00:00
Craig Topper
79b097d66a Use makeArrayRef insted of calling ArrayRef<T> constructor directly. I introduced most of these recently.
llvm-svn: 207616
2014-04-30 07:17:30 +00:00
Craig Topper
5f5f448d02 De-virtualize or remove some methods that have no overrides nor override anything. In some cases remove all together if there are no callers either.
llvm-svn: 207610
2014-04-30 05:53:27 +00:00
Craig Topper
dcce1d897e [C++11] Add 'override' keywords and remove 'virtual'. Additionally add 'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. PowerPC edition
llvm-svn: 207504
2014-04-29 07:57:37 +00:00
Eric Christopher
4af26f41d6 None of these targets actually define their own CFI_INSTRUCTION
opcode so there's no reason to use the target namespace for it
rather than TargetOpcode.

llvm-svn: 207475
2014-04-29 00:16:46 +00:00
Eric Christopher
17d4cf2e96 80-column, tab characters, comment fixups.
llvm-svn: 207473
2014-04-29 00:16:40 +00:00
Craig Topper
9683cb114b Convert more SelectionDAG functions to use ArrayRef.
llvm-svn: 207397
2014-04-28 05:57:50 +00:00
Craig Topper
b663bffa27 [C++] Use 'nullptr'.
llvm-svn: 207394
2014-04-28 04:05:08 +00:00
Craig Topper
1efda44640 Convert SelectionDAG::SelectNodeTo to use ArrayRef.
llvm-svn: 207377
2014-04-27 19:21:11 +00:00
Craig Topper
536995c0a7 Convert SelectionDAG::getMergeValues to use ArrayRef.
llvm-svn: 207374
2014-04-27 19:20:57 +00:00
Craig Topper
e0741a0fcb Convert getMemIntrinsicNode to take ArrayRef of SDValue instead of pointer and size.
llvm-svn: 207329
2014-04-26 19:29:41 +00:00
Craig Topper
1b1f54bcca Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>.
llvm-svn: 207327
2014-04-26 18:35:24 +00:00
Craig Topper
6d411cb95a [C++] Use 'nullptr'. Target edition.
llvm-svn: 207197
2014-04-25 05:30:21 +00:00
Reid Kleckner
e7e2ccb9e9 Add 'musttail' marker to call instructions
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur.  It also comes with convervative IR
verification rules that ensure that tail call optimization is possible.

Reviewers: nicholas

Differential Revision: http://llvm-reviews.chandlerc.com/D3240

llvm-svn: 207143
2014-04-24 20:14:34 +00:00
David Blaikie
52ea9cc073 Spread some const around for non-mutating uses of MCSymbolData.
I discovered this const-hole while attempting to coalesnce the Symbol
and SymbolMap data structures. There's some pending issues with that,
but I figured this change was easy to flush early.

llvm-svn: 207124
2014-04-24 16:59:40 +00:00
Evgeniy Stepanov
c242bd4b23 Create MCTargetOptions.
For now it contains a single flag, SanitizeAddress, which enables
AddressSanitizer instrumentation of inline assembly.

Patch by Yuri Gorshenin.

llvm-svn: 206971
2014-04-23 11:16:03 +00:00
Chandler Carruth
ae889a5f85 [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Target/...
edition.

llvm-svn: 206842
2014-04-22 02:41:26 +00:00
Chandler Carruth
72185824a4 [cleanup] Lift using directives, DEBUG_TYPE definitions, and even some
system headers above the includes of generated '.inc' files that
actually contain code. In a few targets this was already done pretty
consistently, but it wasn't done *really* consistently anywhere. It is
strictly cleaner IMO and necessary in a bunch of places where the
DEBUG_TYPE is referenced from the generated code. Consistency with the
necessary places trumps. Hopefully the build bots are OK with the
movement of intrin.h...

llvm-svn: 206838
2014-04-22 02:03:14 +00:00
Chandler Carruth
15c7b91ac2 [Modules] Make Support/Debug.h modular. This requires it to not change
behavior based on other files defining DEBUG_TYPE, which means it cannot
define DEBUG_TYPE at all. This is actually better IMO as it forces folks
to define relevant DEBUG_TYPEs for their files. However, it requires all
files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't
already. I've updated all such files in LLVM and will do the same for
other upstream projects.

This still leaves one important change in how LLVM uses the DEBUG_TYPE
macro going forward: we need to only define the macro *after* header
files have been #include-ed. Previously, this wasn't possible because
Debug.h required the macro to be pre-defined. This commit removes that.
By defining DEBUG_TYPE after the includes two things are fixed:

- Header files that need to provide a DEBUG_TYPE for some inline code
  can do so by defining the macro before their inline code and undef-ing
  it afterward so the macro does not escape.

- We no longer have rampant ODR violations due to including headers with
  different DEBUG_TYPE definitions. This may be mostly an academic
  violation today, but with modules these types of violations are easy
  to check for and potentially very relevant.

Where necessary to suppor headers with DEBUG_TYPE, I have moved the
definitions below the includes in this commit. I plan to move the rest
of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big
enough.

The comments in Debug.h, which were hilariously out of date already,
have been updated to reflect the recommended practice going forward.

llvm-svn: 206822
2014-04-21 22:55:11 +00:00
Nick Lewycky
82ad9fc7c8 Break PseudoSourceValue out of the Value hierarchy. It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* and Value* is strongly encouraged to use a MachinePointerInfo instead.
llvm-svn: 206255
2014-04-15 07:22:52 +00:00
Lang Hames
91cdab6916 [MC] Require an MCContext when constructing an MCDisassembler.
This patch re-introduces the MCContext member that was removed from
MCDisassembler in r206063, and requires that an MCContext be passed in at
MCDisassembler construction time. (Previously the MCContext member had been
initialized in an ad-hoc fashion after construction). The MCCContext member
can be used by MCDisassembler sub-classes to construct constant or
target-specific MCExprs.

This patch updates disassemblers for in-tree targets, and provides the
MCRegisterInfo instance that some disassemblers were using through the
MCContext (previously those backends were constructing their own
MCRegisterInfo instances).

llvm-svn: 206241
2014-04-15 04:40:56 +00:00
Hal Finkel
c4a623f8d4 [PowerPC] [Constant Hoisting] Enable constant hoisting on PPC
Implements the various TTI functions to enable constant hoisting on PPC. The
only significant test-suite change is this:

MultiSource/Benchmarks/VersaBench/bmm/bmm - 20% speedup
(which essentially reverses the slowdown from r206120).

llvm-svn: 206141
2014-04-13 23:02:40 +00:00
Hal Finkel
4da0e32e2a [PowerPC] Fix rlwimi isel when mask is not constant
We had been using the known-zero values of the operand of the or to construct
the mask for an rlwimi; this is not quite correct, but fine when the mask is
constant. When the mask is constant, then the known zeros of the operand must
be a superset of the zeros in the mask. However, when the mask is not a
constant, then there might be bits in the operand that are not known to be zero
that, at runtime, might be zero in the mask. Therefore, we check that any bits
not known to be zero *are* known to be one in the mask. Otherwise, we can't
fold the mask with the or and shift.

This was revealed as a miscompile of
MultiSource/Benchmarks/BitBench/drop3/drop3 when I started experimenting with
constant hoisting.

llvm-svn: 206136
2014-04-13 17:10:58 +00:00
Hal Finkel
a1849e7ac8 [PowerPC] Implement some additional TLI callbacks
Add implementations of:
  bool isLegalICmpImmediate(int64_t Imm) const
  bool isLegalAddImmediate(int64_t Imm) const
  bool isTruncateFree(Type *Ty1, Type *Ty2) const
  bool isTruncateFree(EVT VT1, EVT VT2) const
  bool shouldConvertConstantLoadToIntImm(const APInt &Imm, Type *Ty) const

Unfortunately, this regresses counter-register-based loop formation because
some of the loops now end up in forms were SE cannot compute loop counts.
However, nevertheless, the test-suite results favor committing:

SingleSource/Benchmarks/BenchmarkGame/puzzle: 26% speedup
MultiSource/Benchmarks/FreeBench/analyzer/analyzer: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: 20% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv: 19% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv: 15% speedup
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: 2% speedup

MultiSource/Benchmarks/VersaBench/bmm/bmm: 26% slowdown

llvm-svn: 206120
2014-04-12 21:52:38 +00:00
NAKAMURA Takumi
b6baf3b8a1 LLVMBuild.txt: Reformat.
llvm-svn: 205961
2014-04-10 11:16:17 +00:00
Hal Finkel
0e97afbdb4 [PowerPC] Don't return false from PPC::isVSLDOIShuffleMask
PPC::isVSLDOIShuffleMask should return -1, not false, when the shuffle
predicate should be false.

Noticed by inspection; no test case (yet).

llvm-svn: 205787
2014-04-08 19:00:27 +00:00
Hal Finkel
216842b276 [PowerPC] Remove unused TM member variable to unbreak build
Fix "error: private field 'TM' is not used [-Werror,-Wunused-private-field]"

llvm-svn: 205660
2014-04-05 00:16:28 +00:00
Hal Finkel
ade2d32df0 [PowerPC] Adjust load/store costs in PPCTTI
This provides more realistic costs for the insert/extractelement instructions
(which are load/store pairs), accounts for the cheap unaligned Altivec load
sequence, and for unaligned VSX load/stores.

Bad news:
MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation)
SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized)
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown

Good news:
SingleSource/Benchmarks/Shootout/ary3 - 54% speedup
SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup
MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup
MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup

Unfortunately, estimating the costs of the stack-based scalarization sequences
is hard, and adjusting these costs is like a game of whac-a-mole :( I'll
revisit this again after we have better codegen for vector extloads and
truncstores and unaligned load/stores.

llvm-svn: 205658
2014-04-04 23:51:18 +00:00
Hal Finkel
e6580e744f [PowerPC] PPCTTI Cleanup
Remove the declaration of an unimplemented function.

llvm-svn: 205657
2014-04-04 23:51:11 +00:00
Hal Finkel
e63f5074c7 [PowerPC] Add a full condition code register to make the "cc" clobber work
gcc inline asm supports specifying "cc" as a clobber of all condition
registers. Add just enough modeling of the full register to make this work.
Fixed PR19326.

llvm-svn: 205630
2014-04-04 15:15:57 +00:00
Craig Topper
694437e2ef Make consistent use of MCPhysReg instead of uint16_t throughout the tree.
llvm-svn: 205610
2014-04-04 05:16:06 +00:00
Hal Finkel
7f4525afba [PowerPC] Make PPCTTI::getMemoryOpCost call BasicTTI::getMemoryOpCost
PPCTTI::getMemoryOpCost will now make use of BasicTTI::getMemoryOpCost to
calculate the base cost of the memory access, and then adjust on top of that.
There is no functionality change from this modification, but it will become
important so that PPCTTI can take advantage of scalarization information for which
BasicTTI::getMemoryOpCost will account in the near future.

llvm-svn: 205476
2014-04-02 22:43:49 +00:00
Jim Grosbach
f349849199 Simplify resolveFrameIndex() signature.
Just pass a MachineInstr reference rather than an MBB iterator.
Creating a MachineInstr& is the first thing every implementation did
anyway.

llvm-svn: 205453
2014-04-02 19:28:18 +00:00
Hal Finkel
3d67e62e4c [PowerPC] Add some missing VSX bitcast patterns
llvm-svn: 205352
2014-04-01 19:24:27 +00:00
Hal Finkel
0296ed914f [PowerPC] Don't ever expand BUILD_VECTOR of v2i64 with shuffles
If we have two unique values for a v2i64 build vector, this will always result
in two vector loads if we expand using shuffles. Only one is necessary.

llvm-svn: 205231
2014-03-31 17:48:16 +00:00
Hal Finkel
09ae220a41 [PowerPC] Correct P7 dispatch unit allocation for vector instructions
llvm-svn: 205222
2014-03-31 17:02:10 +00:00
Hal Finkel
d7201e5971 [PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG
sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node
(and similarly for v2i16 and v2i8). Even though there are no sign-extension (or
algebraic shifts) for v2i64 types, we can handle v2i32 sign extensions by
converting two and from v2i64. The small trick necessary here is to shift the
i32 elements into the right lanes before the i32 -> f64 step. This is because
of the big Endian nature of the system, we need the i32 portion in the high
word of the i64 elements.

For v2i16 and v2i8 we can do the same, but we first use the default Altivec
shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and
then apply the above procedure.

llvm-svn: 205146
2014-03-30 13:22:59 +00:00
Hal Finkel
92fc087786 [PowerPC] Handle v2i64 comparisons
v2i64 is a legal type under VSX, however we don't have native vector
comparisons. We can handle eq/ne by casting it to an Altivec type, but
everything else must be expanded.

llvm-svn: 205106
2014-03-29 16:04:40 +00:00
Hal Finkel
bac4772857 [PowerPC] VSX instruction latency corrections
The vector divide and sqrt instructions have high latencies, and the scalar
comparisons are like all of the others. On the P7, permutations take an extra
cycle over purely-simple vector ops.

llvm-svn: 205096
2014-03-29 13:20:31 +00:00
Rafael Espindola
0b8859a3e5 Completely rewrite ELFObjectWriter::RecordRelocation.
I started trying to fix a small issue, but this code has seen a small fix too
many.

The old code was fairly convoluted. Some of the issues it had:

* It failed to check if a symbol difference was in the some section when
  converting a relocation to pcrel.
* It failed to check if the relocation was already pcrel.
* The pcrel value computation was wrong in some cases (relocation-pc.s)
* It was missing quiet a few cases where it should not convert symbol
  relocations to section relocations, leaving the backends to patch it up.
* It would not propagate the fact that it had changed a relocation to pcrel,
  requiring a quiet nasty work around in ARM.
* It was missing comments.

llvm-svn: 205076
2014-03-29 06:26:49 +00:00
Hal Finkel
99fd50482e [PowerPC] Add subregister classes for f64 VSX values
We had stored both f64 values and v2f64, etc. values in the VSX registers. This
worked, but was suboptimal because we would always spill 16-byte values even
through we almost always had scalar 8-byte values. This resulted in an
increase in stack-size use, extra memory bandwidth, etc. To fix this, I've
added 64-bit subregisters of the Altivec registers, and combined those with the
existing scalar floating-point registers to form a class of VSX scalar
floating-point registers. The ABI code has also been enhanced to use this
register class and some other necessary improvements have been made.

llvm-svn: 205075
2014-03-29 05:29:01 +00:00
Hal Finkel
f15b90e07a [PowerPC] Fix VSX permutation isel
Not only did I invert the indices when I wrote the code, but I also did the
same thing when I wrote the regression test. Oops.

llvm-svn: 205046
2014-03-28 20:24:55 +00:00
Hal Finkel
c1ab8c2486 [PowerPC] v2[fi]64 need to be explicitly passed in VSX registers
v2[fi]64 values need to be explicitly passed in VSX registers. This is because
the code in TRI that finds the minimal register class given a register and a
value type will assert if given an Altivec register and a non-Altivec type.

llvm-svn: 205041
2014-03-28 19:58:11 +00:00
Hal Finkel
786d7d887a [PowerPC] Use a small cleanup pass to remove VSX self copies
As explained in r204976, because of how the allocation of VSX registers
interacts with the call-lowering code, we sometimes end up generating self VSX
copies. Specifically, things like this:
  %VSL2<def> = COPY %F2, %VSL2<imp-use,kill>
(where %F2 is really a sub-register of %VSL2, and so this copy is a nop)

This adds a small cleanup pass to remove these prior to post-RA scheduling.

llvm-svn: 204980
2014-03-27 23:12:31 +00:00
Hal Finkel
ee610c3b4d [PowerPC] Don't remove self VSX copies in PPCInstrInfo::copyPhysReg
Because of how the allocation of VSX registers interacts with the call-lowering
code, we sometimes end up generating self VSX copies. Specifically, things like
this:
  %VSL2<def> = COPY %F2, %VSL2<imp-use,kill>
(where %F2 is really a sub-register of %VSL2, and so this copy is a nop)

The problem is that ExpandPostRAPseudos always assumes that *some* instruction
has been inserted, and adds implicit defs to it. This is a problem if no copy
was inserted because it can cause subtle problems during post-RA scheduling.
These self copies will have to be removed some other way.

llvm-svn: 204976
2014-03-27 22:46:28 +00:00
Hal Finkel
f19bcef675 [PowerPC] Fix v2f64 vector extract and related patterns
First, v2f64 vector extract had not been declared legal (and so the existing
patterns were not being used). Second, the patterns for that, and for
scalar_to_vector, should really be a regclass copy, not a subregister
operation, because the VSX registers directly hold both the vector and scalar data.

llvm-svn: 204971
2014-03-27 22:22:48 +00:00
Hal Finkel
fa7f1597ca [PowerPC] Expand v2i64 shifts
These operations need to be expanded during legalization so that isel does not
crash. In theory, we might be able to custom lower some of these. That,
however, would need to be follow-up work.

llvm-svn: 204963
2014-03-27 21:26:33 +00:00
Rafael Espindola
d78485af3e Remove another unused argument.
llvm-svn: 204961
2014-03-27 20:49:35 +00:00
Rafael Espindola
4e5d391691 Remove unused argument.
llvm-svn: 204956
2014-03-27 20:41:17 +00:00
Rafael Espindola
5c8926deed Prevent alias from pointing to weak aliases.
This adds back r204781.

Original message:

Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204934
2014-03-27 15:26:56 +00:00
Hal Finkel
ca154788e6 [PowerPC] Generate VSX permutations for v2[fi]64 vectors
llvm-svn: 204873
2014-03-26 22:58:37 +00:00
Hal Finkel
800564a97b [PowerPC] VSX loads and stores support unaligned access
I've not yet updated PPCTTI because I'm not sure what the actual relative cost
is compared to the aligned uses.

llvm-svn: 204848
2014-03-26 19:39:09 +00:00
Hal Finkel
ab7214ddc6 [PowerPC] Use v2f64 <-> v2i64 VSX conversion instructions
llvm-svn: 204843
2014-03-26 19:13:54 +00:00
Hal Finkel
49e186f6c1 [PowerPC] Remove some dead VSX v4f32 store patterns
These patterns are dead (because v4f32 stores are currently promoted to v4i32
and stored using Altivec instructions), and also are likely not correct
(because they'd store the vector elements in the opposite order from that
assumed by the rest of the Altivec code).

llvm-svn: 204839
2014-03-26 18:26:36 +00:00
Hal Finkel
11338e1f96 [PowerPC] Use VSX vector load/stores for v2[fi]64
These instructions have access to the complete VSX register file. In addition,
they "swap" the order of the elements so that element 0 (the scalar part) comes
first in memory and element 1 follows at a higher address.

llvm-svn: 204838
2014-03-26 18:26:30 +00:00
Hal Finkel
0bf7496bb8 [PowerPC] Add v2i64 as a legal VSX type
v2i64 needs to be a legal VSX type because it is the SetCC result type from
v2f64 comparisons. We need to expand all non-arithmetic v2i64 operations.

This fixes the lowering for v2f64 VSELECT.

llvm-svn: 204828
2014-03-26 16:12:58 +00:00
Hal Finkel
00925a52e5 [PowerPC] Lower VSELECT using xxsel when VSX is available
With VSX there is a real vector select instruction, and so we should use it.
Note that VSELECT will still scalarize for v2f64 because the corresponding
SetCC result type (v2i64) is not currently a legal type.

llvm-svn: 204801
2014-03-26 12:49:28 +00:00
Rafael Espindola
63a8ff6883 Revert "Prevent alias from pointing to weak aliases."
This reverts commit r204781.

I will follow up to with msan folks to see what is what they
were trying to do with aliases to weak aliases.

llvm-svn: 204784
2014-03-26 06:14:40 +00:00
Hal Finkel
7a700cc27d [PowerPC] Generate logical vector VSX instructions
These instructions are essentially the same as their Altivec counterparts, but
have access to the larger VSX register file.

llvm-svn: 204782
2014-03-26 04:55:40 +00:00
Rafael Espindola
c9179b8b50 Prevent alias from pointing to weak aliases.
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204781
2014-03-26 04:48:47 +00:00
Hal Finkel
066a5cfe42 [PowerPC] Select between VSX A-type and M-type FMA instructions just before RA
The VSX instruction set has two types of FMA instructions: A-type (where the
addend is taken from the output register) and M-type (where one of the product
operands is taken from the output register). This adds a small pass that runs
just after MI scheduling (and, thus, just before register allocation) that
mutates A-type instructions (that are created during isel) into M-type
instructions when:

 1. This will eliminate an otherwise-necessary copy of the addend

 2. One of the product operands is killed by the instruction

The "right" moment to make this decision is in between scheduling and register
allocation, because only there do we know whether or not one of the product
operands is killed by any particular instruction. Unfortunately, this also
makes the implementation somewhat complicated, because the MIs are not in SSA
form and we need to preserve the LiveIntervals analysis.

As a simple example, if we have:

%vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
%vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
                        %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
  ...
  %vreg9<def,tied1> = XSMADDADP %vreg9<tied0>, %vreg17, %vreg19,
                        %RM<imp-use>; VSLRC:%vreg9,%vreg17,%vreg19
  ...

We can eliminate the copy by changing from the A-type to the
M-type instruction. This means:

  %vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
                        %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16

is replaced by:

  %vreg16<def,tied1> = XSMADDMDP %vreg16<tied0>, %vreg18, %vreg9,
                        %RM<imp-use>; VSLRC:%vreg16,%vreg18,%vreg9

and we remove: %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9

llvm-svn: 204768
2014-03-25 23:29:21 +00:00
Hal Finkel
b10ebe6ad3 [PowerPC] Correct commutable indices for VSX FMA instructions
Although the first two operands are the ones that can be swapped, the tied
input operand is listed before them, so we need to adjust for that.

I have a test case for this, but it goes along with an upcoming commit (so it
will come soon).

llvm-svn: 204748
2014-03-25 19:26:43 +00:00
Hal Finkel
41960891a5 [PowerPC] Add a TableGen relation for A-type and M-type VSX FMA instructions
TableGen will create a lookup table for the A-type FMA instructions providing
their corresponding M-form opcodes. This will be used by upcoming commits.

llvm-svn: 204746
2014-03-25 18:55:11 +00:00
Ulrich Weigand
f2e33e8135 [PowerPC] Generate little-endian object files
As a first step towards real little-endian code generation, this patch
changes the PowerPC MC layer to actually generate little-endian object
files.  This involves passing the little-endian flag through the various
layers, including down to createELFObjectWriter so we actually get basic
little-endian ELF objects, emitting instructions in little-endian order,
and handling fixups and relocations as appropriate for little-endian.

The bulk of the patch is to update most test cases in test/MC/PowerPC
to verify both big- and little-endian encodings.  (The only test cases
*not* updated are those that create actual big-endian ABI code, like
the TLS tests.)

Note that while the object files are now little-endian, the generated
code itself is not yet updated, in particular, it still does not adhere
to the ELFv2 ABI.

llvm-svn: 204634
2014-03-24 18:16:09 +00:00
Will Schmidt
8fe019f872 [PPC64LE] ELFv2 ABI updates for the .opd section
[PPC64LE] ELFv2 ABI updates for the .opd section
The PPC64 Little Endian (PPC64LE) target supports the ELFv2 ABI, and as
such, does not have a ".opd" section.  This is keyed off a _CALL_ELF=2
macro check.

The CALL_ELF check is not clearly documented at this time.  The basis
for usage in this patch is from the gcc thread here:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01144.html

> Adding comment from Uli:
Looks good to me.  I think the old-style JIT doesn't really work
anyway for 64-bit, but at least with this patch LLVM will compile
and link again on a ppc64le host ...

llvm-svn: 204614
2014-03-24 16:04:15 +00:00
Hal Finkel
feee356b52 [PowerPC] Mark many instructions as commutative
I'm under the impression that we used to infer the isCommutable flag from the
instruction-associated pattern. Regardless, we don't seem to do this (at least
by default) any more. I've gone through all of our instruction definitions, and
marked as commutative all of those that should be trivial to commute (by
exchanging the first two operands). There has been special code for the RL*
instructions, and that's not changed.

Before this change, we had the following commutative instructions:

 RLDIMI
 RLDIMIo
 RLWIMI
 RLWIMI8
 RLWIMI8o
 RLWIMIo
 XSADDDP
 XSMULDP
 XVADDDP
 XVADDSP
 XVMULDP
 XVMULSP

After:

 ADD4
 ADD4o
 ADD8
 ADD8o
 ADDC
 ADDC8
 ADDC8o
 ADDCo
 ADDE
 ADDE8
 ADDE8o
 ADDEo
 AND
 AND8
 AND8o
 ANDo
 CRAND
 CREQV
 CRNAND
 CRNOR
 CROR
 CRXOR
 EQV
 EQV8
 EQV8o
 EQVo
 FADD
 FADDS
 FADDSo
 FADDo
 FMADD
 FMADDS
 FMADDSo
 FMADDo
 FMSUB
 FMSUBS
 FMSUBSo
 FMSUBo
 FMUL
 FMULS
 FMULSo
 FMULo
 FNMADD
 FNMADDS
 FNMADDSo
 FNMADDo
 FNMSUB
 FNMSUBS
 FNMSUBSo
 FNMSUBo
 MULHD
 MULHDU
 MULHDUo
 MULHDo
 MULHW
 MULHWU
 MULHWUo
 MULHWo
 MULLD
 MULLDo
 MULLW
 MULLWo
 NAND
 NAND8
 NAND8o
 NANDo
 NOR
 NOR8
 NOR8o
 NORo
 OR
 OR8
 OR8o
 ORo
 RLDIMI
 RLDIMIo
 RLWIMI
 RLWIMI8
 RLWIMI8o
 RLWIMIo
 VADDCUW
 VADDFP
 VADDSBS
 VADDSHS
 VADDSWS
 VADDUBM
 VADDUBS
 VADDUHM
 VADDUHS
 VADDUWM
 VADDUWS
 VAND
 VAVGSB
 VAVGSH
 VAVGSW
 VAVGUB
 VAVGUH
 VAVGUW
 VMADDFP
 VMAXFP
 VMAXSB
 VMAXSH
 VMAXSW
 VMAXUB
 VMAXUH
 VMAXUW
 VMHADDSHS
 VMHRADDSHS
 VMINFP
 VMINSB
 VMINSH
 VMINSW
 VMINUB
 VMINUH
 VMINUW
 VMLADDUHM
 VMULESB
 VMULESH
 VMULEUB
 VMULEUH
 VMULOSB
 VMULOSH
 VMULOUB
 VMULOUH
 VNMSUBFP
 VOR
 VXOR
 XOR
 XOR8
 XOR8o
 XORo
 XSADDDP
 XSMADDADP
 XSMAXDP
 XSMINDP
 XSMSUBADP
 XSMULDP
 XSNMADDADP
 XSNMSUBADP
 XVADDDP
 XVADDSP
 XVMADDADP
 XVMADDASP
 XVMAXDP
 XVMAXSP
 XVMINDP
 XVMINSP
 XVMSUBADP
 XVMSUBASP
 XVMULDP
 XVMULSP
 XVNMADDADP
 XVNMADDASP
 XVNMSUBADP
 XVNMSUBASP
 XXLAND
 XXLNOR
 XXLOR
 XXLXOR

This is a by-inspection change, and I'm not sure how to write a reliable test
case. I would like advice on this, however.

llvm-svn: 204609
2014-03-24 15:07:28 +00:00
Hal Finkel
829bfc7c99 [PowerPC] Don't schedule VSX copy legalization unless VSX is enabled
There is no need to schedule this extra pass if it will have nothing to do.

llvm-svn: 204594
2014-03-24 09:51:41 +00:00
Hal Finkel
d4fe687c4a [PowerPC] Update comment re: VSX copy-instruction selection
I've done some experimentation with this, and it looks like using the
lower-latency (but lower throughput) copy instruction is essentially always the
right thing to do.

My assumption is that, in order to be relatively sure that the higher-latency
copy will increase throughput, we'd want to have it unlikely to be in-flight
with its use. On the P7, the global completion table (GCT) can hold a maximum
of 120 instructions, shared among all active threads (up to 4), giving 30
instructions per thread.  So specifically, I'd require at least that many
instructions between the copy and the use before the high-latency variant is
used.

Trying this, however, over the entire test suite resulted in zero cases where
the high-latency form would be preferable. This may be a consequence of the
fact that the scheduler views copies as free, and so they tend to end up close
to their uses. For this experiment I created a function:

  unsigned chooseVSXCopy(MachineBasicBlock &MBB,
                         MachineBasicBlock::iterator I,
                         unsigned DestReg, unsigned SrcReg,
                         unsigned StartDist = 1,
                         unsigned Depth = 3) const;

with an implementation like:

  if (!Depth)
    return PPC::XXLOR;

  const unsigned MaxDist = 30;
  unsigned Dist = StartDist;
  for (auto J = I, JE = MBB.end(); J != JE && Dist <= MaxDist; ++J) {
    if (J->isTransient() && !J->isCopy())
      continue;

    if (J->isCall() || J->isReturn() || J->readsRegister(DestReg, TRI))
      return PPC::XXLOR;

    ++Dist;
  }

  // We've exceeded the required distance for the high-latency form, use it.
  if (Dist > MaxDist)
    return PPC::XVCPSGNDP;

  // If this is only an exit block, use the low-latency form.
  if (MBB.succ_empty())
    return PPC::XXLOR;

  // We've reached the end of the block, check the successor blocks (up to some
  // depth), and use the high-latency form if that is okay with all successors.
  for (auto J = MBB.succ_begin(), JE = MBB.succ_end(); J != JE; ++J) {
    if (chooseVSXCopy(**J, (*J)->begin(), DestReg, SrcReg,
                      Dist, --Depth) == PPC::XXLOR)
      return PPC::XXLOR;
  }

  // All of our successor blocks seem okay with the high-latency variant, so
  // we'll use it.
  return PPC::XVCPSGNDP;

and then changed the copy opcode selection from:
    Opc = PPC::XXLOR;
to:
    Opc = chooseVSXCopy(MBB, std::next(I), DestReg, SrcReg);

In conclusion, I'm removing the FIXME from the comment, because I believe that
there is, at least absent other examples, nothing to fix.

llvm-svn: 204591
2014-03-24 09:36:36 +00:00
Nuno Lopes
79d18a66ec remove a bunch of unused private methods
found with a smarter version of -Wunused-member-function that I'm playwing with.
Appologies in advance if I removed someone's WIP code.

 include/llvm/CodeGen/MachineSSAUpdater.h            |    1 
 include/llvm/IR/DebugInfo.h                         |    3 
 lib/CodeGen/MachineSSAUpdater.cpp                   |   10 --
 lib/CodeGen/PostRASchedulerList.cpp                 |    1 
 lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp    |   10 --
 lib/IR/DebugInfo.cpp                                |   12 --
 lib/MC/MCAsmStreamer.cpp                            |    2 
 lib/Support/YAMLParser.cpp                          |   39 ---------
 lib/TableGen/TGParser.cpp                           |   16 ---
 lib/TableGen/TGParser.h                             |    1 
 lib/Target/AArch64/AArch64TargetTransformInfo.cpp   |    9 --
 lib/Target/ARM/ARMCodeEmitter.cpp                   |   12 --
 lib/Target/ARM/ARMFastISel.cpp                      |   84 --------------------
 lib/Target/Mips/MipsCodeEmitter.cpp                 |   11 --
 lib/Target/Mips/MipsConstantIslandPass.cpp          |   12 --
 lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp              |   21 -----
 lib/Target/NVPTX/NVPTXISelDAGToDAG.h                |    2 
 lib/Target/PowerPC/PPCFastISel.cpp                  |    1 
 lib/Transforms/Instrumentation/AddressSanitizer.cpp |    2 
 lib/Transforms/Instrumentation/BoundsChecking.cpp   |    2 
 lib/Transforms/Instrumentation/MemorySanitizer.cpp  |    1 
 lib/Transforms/Scalar/LoopIdiomRecognize.cpp        |    8 -
 lib/Transforms/Scalar/SCCP.cpp                      |    1 
 utils/TableGen/CodeEmitterGen.cpp                   |    2 
 24 files changed, 2 insertions(+), 261 deletions(-)

llvm-svn: 204560
2014-03-23 17:09:26 +00:00
Hal Finkel
deec4f1f76 [PowerPC] Make use of VSX f64 <-> i64 conversion instructions
When VSX is available, these instructions should be used in preference to the
older variants that only have access to the scalar floating-point registers.

llvm-svn: 204559
2014-03-23 05:35:00 +00:00
Hal Finkel
47d76a6461 [PowerPC] Fix the VSX v2f64 return register
v2f64 values, like other 128-bit values, are returned under VSX in register
vs34 (Altivec register v2).

llvm-svn: 204543
2014-03-22 18:24:43 +00:00
Alexey Samsonov
f3333428a5 Add llvm_unreachable after fully-covered switches to appease GCC
llvm-svn: 204318
2014-03-20 07:30:40 +00:00
Rafael Espindola
9eed3f671f Look through variables when computing relocations.
Given

bar = foo + 4
	.long bar

MC would eat the 4. GNU as includes it in the relocation. The rule seems to be
that a variable that defines a symbol is used in the relocation and one that
does not define a symbol is evaluated and the result included in the relocation.

Fixing this unfortunately required some other changes:

* Since the variable is now evaluated, it would prevent the ELF writer from
  noticing the weakref marker the elf streamer uses. This patch then replaces
  that with a VariantKind in MCSymbolRefExpr.

* Using VariantKind then requires us to look past other VariantKind to see

	.weakref	bar,foo
	call	bar@PLT

  doing this also fixes

	zed = foo +2
	call zed@PLT

  so that is a good thing.

* Looking past VariantKind means that the relocation selection has to use
  the fixup instead of the target.

This is a reboot of the previous fixes for MC. I will watch the sanitizer
buildbot and wait for a build before adding back the previous fixes.

llvm-svn: 204294
2014-03-20 02:12:01 +00:00
Bill Schmidt
ec1edc24b0 Fix PR19144: Incorrect offset generated for int-to-fp conversion at -O0.
When converting a signed 32-bit integer to double-precision floating point on
hardware without a lfiwax instruction, we have to instead use a lfd followed
by fcfid.  We were erroneously offsetting the address by 4 bytes in
preparation for either a lfiwax or lfiwzx when generating the lfd.  This fixes
that silly error.

This was not caught in the test suite since the conversion tests were run with
-mcpu=pwr7, which implies availability of lfiwax.  I've added another test
case for older hardware that checks the code we expect in the absence of
lfiwax and other flavors of fcfid.  There are fewer tests in this test case
because we punt to DAG selection in more cases on older hardware.  (We must
generate complex fiddly sequences in those cases, and there is marginal
benefit in duplicating that logic in fast-isel.)

llvm-svn: 204155
2014-03-18 14:32:50 +00:00
Craig Topper
f7e1d669fe [C++11] Mark the target fast isel classes as 'final' so that the compiler can de-virtualize some of the internal calls.
llvm-svn: 204123
2014-03-18 07:27:13 +00:00
Ulrich Weigand
59b05e81f9 [ppc64] Avoid copy relocs in named rodata sections
Commit r181723 introduced code to avoid placing initialized variables
needing relocations into the .rodata section, which avoid copy relocs
that do not work as expected on ppc64 function references.

The same treatment is also needed for *named* .rodata.XXX sections.
This patch changes PPC64LinuxTargetObjectFile::SelectSectionForGlobal
to modify "Kind" *before* calling the default SelectSectionForGlobal
routine, instead of first calling the default routine and then just
checking for the (main) .rodata section afterwards.

llvm-svn: 203921
2014-03-14 12:45:22 +00:00
Owen Anderson
e541764c5f Phase 2 of the great MachineRegisterInfo cleanup. This time, we're changing
operator* on the by-operand iterators to return a MachineOperand& rather than
a MachineInstr&.  At this point they almost behave like normal iterators!

Again, this requires making some existing loops more verbose, but should pave
the way for the big range-based for-loop cleanups in the future.

llvm-svn: 203865
2014-03-13 23:12:04 +00:00
Hal Finkel
8b6358ead9 [PowerPC] Initial support for the VSX instruction set
VSX is an ISA extension supported on the POWER7 and later cores that enhances
floating-point vector and scalar capabilities. Among other things, this adds
<2 x double> support and generally helps to reduce register pressure.

The interesting part of this ISA feature is the register configuration: there
are 64 new 128-bit vector registers, the 32 of which are super-registers of the
existing 32 scalar floating-point registers, and the second 32 of which overlap
with the 32 Altivec vector registers. This makes things like vector insertion
and extraction tricky: this can be free but only if we force a restriction to
the right register subclass when needed. A new "minipass" PPCVSXCopy takes care
of this (although it could do a more-optimal job of it; see the comment about
unnecessary copies below).

Please note that, currently, VSX is not enabled by default when targeting
anything because it is not yet ready for that.  The assembler and disassembler
are fully implemented and tested. However:

 - CodeGen support causes miscompiles; test-suite runtime failures:
      MultiSource/Benchmarks/FreeBench/distray/distray
      MultiSource/Benchmarks/McCat/08-main/main
      MultiSource/Benchmarks/Olden/voronoi/voronoi
      MultiSource/Benchmarks/mafft/pairlocalalign
      MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
      SingleSource/Benchmarks/CoyoteBench/almabench
      SingleSource/Benchmarks/Misc/matmul_f64_4x4

 - The lowering currently falls back to using Altivec instructions far more
   than it should. Worse, there are some things that are scalarized through the
   stack that shouldn't be.

 - A lot of unnecessary copies make it past the optimizers, and this needs to
   be fixed.

 - Many more regression tests are needed.

Normally, I'd fix these things prior to committing, but there are some
students and other contributors who would like to work this, and so it makes
sense to move this development process upstream where it can be subject to the
regular code-review procedures.

llvm-svn: 203768
2014-03-13 07:58:58 +00:00
Hal Finkel
68c9b6839e [TableGen] Optionally forbid overlap between named and positional operands
There are currently two schemes for mapping instruction operands to
instruction-format variables for generating the instruction encoders and
decoders for the assembler and disassembler respectively: a) to map by name and
b) to map by position.

In the long run, we'd like to remove the position-based scheme and use only
name-based mapping. Unfortunately, the name-based scheme currently cannot deal
with complex operands (those with suboperands), and so we currently must use
the position-based scheme for those. On the other hand, the position-based
scheme cannot deal with (register) variables that are split into multiple
ranges. An upcoming commit to the PowerPC backend (adding VSX support) will
require this capability. While we could teach the position-based scheme to
handle that, since we'd like to move away from the position-based mapping
generally, it seems silly to teach it new tricks now. What makes more sense is
to allow for partial transitioning: use the name-based mapping when possible,
and only use the position-based scheme when necessary.

Now the problem is that mixing the two sensibly was not possible: the
position-based mapping would map based on position, but would not skip those
variables that were mapped by name. Instead, the two sets of assignments would
overlap. However, I cannot currently change the current behavior, because there
are some backends that rely on it [I think mistakenly, but I'll send a message
to llvmdev about that]. So I've added a new TableGen bit variable:
noNamedPositionallyEncodedOperands, that can be used to cause the
position-based mapping to skip variables mapped by name.

llvm-svn: 203767
2014-03-13 07:57:54 +00:00
Roman Divacky
023131bd23 Allow exclamation and tilde to be parsed as a part of the ppc asm operand.
llvm-svn: 203699
2014-03-12 19:25:57 +00:00
Rafael Espindola
9eaa756fe4 Try harder to evaluate expressions when printing assembly.
When printing assembly we don't have a Layout object, but we can still
try to fold some constants.

Testcase by Ulrich Weigand.

llvm-svn: 203677
2014-03-12 16:55:59 +00:00
Will Schmidt
40cf50fd75 Update the datalayout string for ppc64LE.
Update the datalayout string for ppc64LE.

llvm-svn: 203664
2014-03-12 14:59:17 +00:00
Patrik Hagglund
f6f25d32ac Replace '#include ValueTypes.h' with forward declarations.
In some cases the include is pushed "downstream" (or removed if
unused).

llvm-svn: 203644
2014-03-12 08:00:24 +00:00
Chandler Carruth
8f25783c45 [TTI] There is actually no realistic way to pop TTI implementations off
the stack of the analysis group because they are all immutable passes.
This is made clear by Craig's recent work to use override
systematically -- we weren't overriding anything for 'finalizePass'
because there is no such thing.

This is kind of a lame restriction on the API -- we can no longer push
and pop things, we just set up the stack and run. However, I'm not
invested in building some better solution on top of the existing
(terrifying) immutable pass and legacy pass manager.

llvm-svn: 203437
2014-03-10 02:45:14 +00:00
Rafael Espindola
72416905a2 Don't avoid cfi instructions on the bg/p.
The integrated assembler now works for ppc. Since this was the last use of the
bg/p predicate and Hal says that it is now dead, drop the predicate too.

llvm-svn: 203269
2014-03-07 19:04:12 +00:00
David Majnemer
b702f79917 MC: Remove superfluous section attribute flag definitions
Summary:
llvm/MC/MCSectionMachO.h and llvm/Support/MachO.h both had the same
definitions for the section flags.  Instead, grab the definitions out of
support.

No functionality change.

Reviewers: grosbach, Bigcheese, rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2998

llvm-svn: 203211
2014-03-07 07:36:05 +00:00
Rafael Espindola
cb9ca86245 Replace PROLOG_LABEL with a new CFI_INSTRUCTION.
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.

The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.

The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.

I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.

The net result is that we don't create temporary labels that are never used.

llvm-svn: 203204
2014-03-07 06:08:31 +00:00
Hal Finkel
c86bafb5ea The PPC global base register cannot be r0
The global base register cannot be r0 because it might end up as the first
argument to addi or addis. Fixes PR18316.

I don't have a small stable test case.

llvm-svn: 203054
2014-03-06 01:28:23 +00:00
Chandler Carruth
0873afae39 [Layering] Move DebugInfo.h into the IR library where its implementation
already lives.

llvm-svn: 203046
2014-03-06 00:46:21 +00:00
Hal Finkel
c04da1f4c7 Fixup PPC Darwin i1 argument handling
Like on other targets, we need to zero_extend/truncate i1 args before copying
them to GPRs.

llvm-svn: 203045
2014-03-06 00:45:19 +00:00
Hal Finkel
0373847793 When using CR bit registers on PPC32, handle the i1 vaarg case
When copying an i1 value into a GPR for a vaarg call, we need to explicitly
zero-extend the i1 value (otherwise an invalid CRBIT -> GPR copy will be
generated).

llvm-svn: 203041
2014-03-06 00:23:33 +00:00
Hal Finkel
18344a3ff6 With PPC CR bit registers, handle int_to_fp on older cores
On cores without fpcvt support, we cannot promote int_to_fp i1 operations,
because there is nothing to promote them to. The most straightforward
implementation of this uses a select to choose between the two possible
resulting floating-point values (and that's what is done here).

llvm-svn: 203015
2014-03-05 22:14:00 +00:00
Joerg Sonnenberger
b500454592 Enable integrated assembler on OpenBSD/PPC32 by default, too.
From Brad Smith.

llvm-svn: 202967
2014-03-05 11:37:04 +00:00
Will Schmidt
701f152950 [PowerPC] support powerpc64le as syntax-checking target (pass2)
Register the Asm Printer for the ppc64le target.

This fills in a spot that was missed in an earlier change (r187179).

llvm-svn: 202861
2014-03-04 16:51:52 +00:00
Chandler Carruth
649f6270aa [Modules] Move ValueHandle into the IR library where Value itself lives.
Move the test for this class into the IR unittests as well.

This uncovers that ValueMap too is in the IR library. Ironically, the
unittest for ValueMap is useless in the Support library (honestly, so
was the ValueHandle test) and so it already lives in the IR unittests.
Mmmm, tasty layering.

llvm-svn: 202821
2014-03-04 11:17:44 +00:00
Chandler Carruth
0bf5689f06 [Modules] Move GetElementPtrTypeIterator into the IR library. As its
name might indicate, it is an iterator over the types in an instruction
in the IR.... You see where this is going.

Another step of modularizing the support library.

llvm-svn: 202815
2014-03-04 10:40:04 +00:00
Hal Finkel
64680c3ba1 Add a PPC inline asm constraint type for single CR bits
Now that the PowerPC backend can track individual CR bits as first-class
registers, we should also have a way of allocating them for inline asm
statements. Because these registers are only one bit, if an output variable is
implicitly cast to a larger integer size, we'll get an any_extend to that
larger type (this is part of the existing target-independent logic). As a
result, regardless of the size of the output type, only the first bit is
meaningful.

The constraint identifier "wc" has been chosen for this purpose. Although gcc
does not currently support allocating individual CR bits, this identifier
choice has been coordinated with the gcc PowerPC team, and will be marked as
reserved for this purpose in the gcc constraints.md file.

llvm-svn: 202657
2014-03-02 18:23:39 +00:00
Benjamin Kramer
e4eb1b495f [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.
Remove the old functions.

llvm-svn: 202636
2014-03-02 12:27:27 +00:00
Craig Topper
b0056a4ca7 Switch all uses of LLVM_OVERRIDE to just use 'override' directly.
llvm-svn: 202621
2014-03-02 09:09:27 +00:00
Craig Topper
c8a0b9e381 Switch all uses of LLVM_FINAL to just use 'final', and remove the macro.
llvm-svn: 202618
2014-03-02 08:08:51 +00:00
Hal Finkel
4937443651 Remove extra truncs/exts around i32 bit operations on PPC64
This generalizes the code to eliminate extra truncs/exts around i1 bit
operations to also do the same on PPC64 for i32 bit operations. This eliminates
a fairly prevalent code wart:

int foo(int a) {
  return a == 5 ? 7 : 8;
}

On PPC64, because of the extension implied by the ABI, this would generate:

	cmplwi 0, 3, 5
	li 12, 8
	li 4, 7
	isel 3, 4, 12, 2
	rldicl 3, 3, 0, 32
	blr

where the 'rldicl 3, 3, 0, 32', the extension, is completely unnecessary. At
least for the single-BB case (which is all that the DAG combine mechanism can
handle), this unnecessary extension is no longer generated.

llvm-svn: 202600
2014-03-01 21:36:57 +00:00
Hal Finkel
1970087008 Swap PPC isel operands to allow for 0-folding
The PPC isel instruction can fold 0 into the first operand (thus eliminating
the need to materialize a zero-containing register when the 'true' result of
the isel is 0). When the isel is fed by a bit register operation that we can
invert, do so as part of the bit-register-operation peephole routine.

llvm-svn: 202469
2014-02-28 06:11:16 +00:00
Hal Finkel
3bd3f4e287 Trying to unbreak the darwin11 builder
The CR bit tracking code broke PPC/Darwin; trying to get it working again...

(the darwin11 builder, which defaults to the darwin ABI when running PPC tests,
asserted when running test/CodeGen/PowerPC/inverted-bool-compares.ll)

llvm-svn: 202459
2014-02-28 01:17:25 +00:00
Hal Finkel
94f3724df6 Try to unbreak the C++11 build
Cannot use negative numbers in case statements without running afoul of -Wc++11-narrowing.

llvm-svn: 202455
2014-02-28 00:45:27 +00:00
Hal Finkel
883c64377d Add CR-bit tracking to the PowerPC backend for i1 values
This change enables tracking i1 values in the PowerPC backend using the
condition register bits. These bits can be treated on PowerPC as separate
registers; individual bit operations (and, or, xor, etc.) are supported.
Tracking booleans in CR bits has several advantages:

 - Reduction in register pressure (because we no longer need GPRs to store
   boolean values).

 - Logical operations on booleans can be handled more efficiently; we used to
   have to move all results from comparisons into GPRs, perform promoted
   logical operations in GPRs, and then move the result back into condition
   register bits to be used by conditional branches. This can be very
   inefficient, because the throughput of these CR <-> GPR moves have high
   latency and low throughput (especially when other associated instructions
   are accounted for).

 - On the POWER7 and similar cores, we can increase total throughput by using
   the CR bits. CR bit operations have a dedicated functional unit.

Most of this is more-or-less mechanical: Adjustments were needed in the
calling-convention code, support was added for spilling/restoring individual
condition-register bits, and conditional branch instruction definitions taking
specific CR bits were added (plus patterns and code for generating bit-level
operations).

This is enabled by default when running at -O2 and higher. For -O0 and -O1,
where the ability to debug is more important, this feature is disabled by
default. Individual CR bits do not have assigned DWARF register numbers,
and storing values in CR bits makes them invisible to the debugger.

It is critical, however, that we don't move i1 values that have been promoted
to larger values (such as those passed as function arguments) into bit
registers only to quickly turn around and move the values back into GPRs (such
as happens when values are returned by functions). A pair of target-specific
DAG combines are added to remove the trunc/extends in:
  trunc(binary-ops(binary-ops(zext(x), zext(y)), ...)
and:
  zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)
In short, we only want to use CR bits where some of the i1 values come from
comparisons or are used by conditional branches or selects. To put it another
way, if we can do the entire i1 computation in GPRs, then we probably should
(on the POWER7, the GPR-operation throughput is higher, and for all cores, the
CR <-> GPR moves are expensive).

POWER7 test-suite performance results (from 10 runs in each configuration):

SingleSource/Benchmarks/Misc/mandel-2: 35% speedup
MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup
SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup
SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup
SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup

SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown
MultiSource/Applications/lemon/lemon: 8% slowdown

llvm-svn: 202451
2014-02-28 00:27:01 +00:00
Aaron Ballman
9e5315239d Silencing an MSVC signed comparison warning.
llvm-svn: 202295
2014-02-26 20:22:20 +00:00
Hal Finkel
08c64addef Account for 128-bit integer operations in PPCCTRLoops
We need to abort the formation of counter-register-based loops where there are
128-bit integer operations that might become function calls.

llvm-svn: 202192
2014-02-25 20:51:50 +00:00
Rafael Espindola
32da4bdd4b Make DataLayout a plain object, not a pass.
Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM
don't don't handle passes to also use DataLayout.

llvm-svn: 202168
2014-02-25 17:30:31 +00:00
Rafael Espindola
6c834371d9 Make some DataLayout pointers const.
No functionality change. Just reduces the noise of an upcoming patch.

llvm-svn: 202087
2014-02-24 23:12:18 +00:00
Rafael Espindola
1f7e9d4bed Rename a few more DataLayout variables.
llvm-svn: 201833
2014-02-21 01:53:35 +00:00
Rafael Espindola
60133f3afe move getNameWithPrefix and getSymbol to TargetMachine.
TargetLoweringBase is implemented in CodeGen, so before this patch we had
a dependency fom Target to CodeGen. This would show up as a link failure of
llvm-stress when building with -DBUILD_SHARED_LIBS=ON.

This fixes pr18900.

llvm-svn: 201711
2014-02-19 20:30:41 +00:00
Rafael Espindola
aea6192f20 Add back r201608, r201622, r201624 and r201625
r201608 made llvm corretly handle private globals with MachO. r201622 fixed
a bug in it and r201624 and r201625 were changes for using private linkage,
assuming that llvm would do the right thing.

They all got reverted because r201608 introduced a crash in LTO. This patch
includes a fix for that. The issue was that TargetLoweringObjectFile now has
to be initialized before we can mangle names of private globals. This is
trivially true during the normal codegen pipeline (the asm printer does it),
but LTO has to do it manually.

llvm-svn: 201700
2014-02-19 17:23:20 +00:00
Daniel Jasper
bf4e7d8ac3 Revert r201622 and r201608.
This causes the LLVMgold plugin to segfault. More information on the
replies to r201608.

llvm-svn: 201669
2014-02-19 12:26:01 +00:00
Rafael Espindola
d39a573c72 Fix PR18743.
The IR
@foo = private constant i32 42

is valid, but before this patch we would produce an invalid MachO from it. It
was invalid because it would use an L label in a section where the liker needs
the labels in order to atomize it.

One way of fixing it would be to just reject this IR in the backend, but that
would not be very front end friendly.

What this patch does is use an 'l' prefix in sections that we know the linker
requires symbols for atomizing them. This allows frontends to just use
private and not worry about which sections they go to or how the linker handles
them.

One small issue with this strategy is that now a symbol name depends on the
section, which is not available before codegen. This is not a problem in
practice. The reason is that it only happens with private linkage, which will
be ignored by the non codegen users (llvm-nm and llvm-ar).

llvm-svn: 201608
2014-02-18 22:24:57 +00:00
Rafael Espindola
c898de3245 Rename a DebugLoc variable to DbgLoc and a DataLayout to DL.
This is quiet a bit less confusing now that TargetData was renamed DataLayout.

llvm-svn: 201606
2014-02-18 22:05:46 +00:00
Daniel Sanders
7a3a160940 Re-commit: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.

The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.

All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.

Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
  (fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
  (should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
  (should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
  to be enabled regardless of default setting or -no-integrated-as.
  (should fix SystemZ buildbots)

Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2686

llvm-svn: 201333
2014-02-13 14:44:26 +00:00
Daniel Sanders
656c4d360b Revert r201237+r201238: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
It introduced multiple test failures in the buildbots.

llvm-svn: 201241
2014-02-12 15:39:20 +00:00
Daniel Sanders
e647d6441b Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.

The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.

All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.

Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2686

llvm-svn: 201237
2014-02-12 14:44:54 +00:00
Rafael Espindola
8d47aa1e4e Pass the Mangler by reference.
It is never null and it is not used in casts, so there is no reason to use a
pointer. This matches how we pass TM.

llvm-svn: 201025
2014-02-08 14:53:28 +00:00
Rafael Espindola
1d50d1310d Add LLVM_OVERRIDE to a few declarations.
llvm-svn: 201022
2014-02-08 06:07:27 +00:00
Rafael Espindola
c27ddf11ba Just returning false is the default.
llvm-svn: 200890
2014-02-06 00:03:15 +00:00
Matt Arsenault
7b69102edb Add address space argument to allowsUnalignedMemoryAccess.
On R600, some address spaces have more strict alignment
requirements than others.

llvm-svn: 200887
2014-02-05 23:15:53 +00:00
Rafael Espindola
98165a6a91 Remove support for not using .loc directives.
Clang itself was not using this. The only way to access it was via llc.

llvm-svn: 200862
2014-02-05 18:00:21 +00:00
Hal Finkel
c649b3ff57 Replace PPC instruction-size code with MCInstrDesc getSize
As part of the cleanup done to enable the disassembler, the PPC instructions
now have a valid Size description field. This can now be used to replace some
custom logic in a few places to compute instruction sizes.

Patch by David Wiberg!

llvm-svn: 200623
2014-02-02 06:12:27 +00:00
David Woodhouse
6c8fefd999 Delete MCSubtargetInfo data members from target MCCodeEmitter classes
The subtarget info is explicitly passed to the EncodeInstruction
method and we should use that subtarget info to influence any
encoding decisions.

llvm-svn: 200350
2014-01-28 23:13:25 +00:00
David Woodhouse
a79a37b435 Propagate MCSubtargetInfo through TableGen's getBinaryCodeForInstr()
llvm-svn: 200349
2014-01-28 23:13:18 +00:00
David Woodhouse
4a4c611e36 Explictly pass MCSubtargetInfo to MCCodeEmitter::EncodeInstruction()
llvm-svn: 200348
2014-01-28 23:13:07 +00:00
David Woodhouse
5d0b529d58 Change MCStreamer EmitInstruction interface to take subtarget info
llvm-svn: 200345
2014-01-28 23:12:42 +00:00
Iain Sandoe
f16c0c72d6 Provide a stub Target Streamer implementation for PPC MachO
At present, this handles .tc (error) and needs to be expanded to deal properly with .machine

llvm-svn: 200309
2014-01-28 11:03:17 +00:00
Hal Finkel
5c72f63cb7 Handle spilling the PPC GPRC_NOR0 register class
GPRC_NOR0 is not a subclass of GPRC (because it also contains the ZERO pseudo
register). As a result, we also need to check for it in the spilling code.

llvm-svn: 200288
2014-01-28 05:32:58 +00:00
Eric Christopher
2b6e161fce Revert r199871 and replace it with a simple check in the debug info
code to see if we're emitting a function into a non-default
text section. This is still a less-than-ideal solution, but more
contained than r199871 to determine whether or not we're emitting
code into an array of comdat sections.

llvm-svn: 200269
2014-01-28 00:49:26 +00:00
Rafael Espindola
bfdd58b802 Pass a MCSubtargetInfo down to the TargetStreamer creation.
With this the target streamers will be able to know the target features that
are in use.

llvm-svn: 200135
2014-01-26 06:38:58 +00:00
Rafael Espindola
806f778fa0 Construct the MCStreamer before constructing the MCTargetStreamer.
This has a few advantages:
* Only targets that use a MCTargetStreamer have to worry about it.
* There is never a MCTargetStreamer without a MCStreamer, so we can use a
  reference.
* A MCTargetStreamer can talk to the MCStreamer in its constructor.

llvm-svn: 200129
2014-01-26 06:06:37 +00:00
Rafael Espindola
bb8a9e36c5 Remove an easy use of EmitRawText from PPC.
This makes lib/Target/PowerPC EmitRawText free.

llvm-svn: 200065
2014-01-25 02:35:56 +00:00
Juergen Ributzka
e4d29eb495 Add final and owerride keywords to TargetTransformInfo's subclasses.
llvm-svn: 200021
2014-01-24 18:22:59 +00:00
Alp Toker
1c4b33e8e5 Fix known typos
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.

llvm-svn: 200018
2014-01-24 17:20:08 +00:00
Eric Christopher
4de8f33ce6 Add a variable to track whether or not we've used a unique section,
e.g. linkonce, to TargetMachine and set it when we've done so
for ELF targets currently. This involved making TargetMachine
non-const in a TLOF use and propagating that change around - I'm
open to other ideas.

This will be used in a future commit to handle emitting debug
information with ranges.

llvm-svn: 199871
2014-01-23 06:47:25 +00:00
Rafael Espindola
012bf3f3ac Fix pr18515.
My understanding (from reading just the llvm code) is that
* most ppc cpus have a "sync n" instruction and an msync alias that is "sync 0".
* "book e" cpus instead have a msync instruction and not the more
general "sync n"

This patch reflects that in the .td files, allowing a single codepath for
asm ond obj streamer and incidentelly fixes a crash when EmitRawText was
called on a obj streamer.

llvm-svn: 199832
2014-01-22 20:20:52 +00:00
Hal Finkel
ca5b9beeb1 Fix pointer info on PPC byval stores
For PPC64 SVR (and Darwin), the stores that take byval aggregate parameters
from registers into the stack frame had MachinePointerInfo objects with
incorrect offsets. These offsets are relative to the object itself, not to the
stack frame base.

This fixes self hosting on PPC64 when compiling with -enable-aa-sched-mi.

llvm-svn: 199763
2014-01-21 20:15:58 +00:00
Rafael Espindola
ebbf4c6cd5 Make getTargetStreamer return a possibly null pointer.
This will allow it to be called from target independent parts of the main
streamer that don't know if there is a registered target streamer or not. This
in turn will allow targets to perform extra actions at specified points in the
interface: add extra flags for some labels, extra work during finalization, etc.

llvm-svn: 199174
2014-01-14 01:21:46 +00:00
Chandler Carruth
98adff6224 [PM] Split DominatorTree into a concrete analysis result object which
can be used by both the new pass manager and the old.

This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.

The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.

Also in numerous places things became cleaner because I switched from
re-running the pass (!!! mid way through some other passes run!!!) to
directly recomputing the domtree.

llvm-svn: 199104
2014-01-13 13:07:17 +00:00
Chandler Carruth
ee051af6e2 [cleanup] Move the Dominators.h and Verifier.h headers into the IR
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.

Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.

But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.

llvm-svn: 199082
2014-01-13 09:26:24 +00:00
Saleem Abdulrasool
b90512a41c correct target directive handling error handling
The target specific parser should return `false' if the target AsmParser handles
the directive, and `true' if the generic parser should handle the directive.
Many of the target specific directive handlers would `return Error' which does
not follow these semantics.  This change simply changes the target specific
routines to conform to the semantis of the ParseDirective correctly.

Conformance to the semantics improves diagnostics emitted for the invalid
directives.  X86 is taken as a sample to ensure that multiple diagnostics are
not presented for a single error.

llvm-svn: 199068
2014-01-13 01:15:39 +00:00
Chandler Carruth
53468087f3 Put the functionality for printing a value to a raw_ostream as an
operand into the Value interface just like the core print method is.
That gives a more conistent organization to the IR printing interfaces
-- they are all attached to the IR objects themselves. Also, update all
the users.

This removes the 'Writer.h' header which contained only a single function
declaration.

llvm-svn: 198836
2014-01-09 02:29:41 +00:00
Rafael Espindola
4dc5af8bc2 Move the llvm mangler to lib/IR.
This makes it available to tools that don't link with target (like llvm-ar).

llvm-svn: 198708
2014-01-07 21:19:40 +00:00
Chandler Carruth
7aa902a488 Move the LLVM IR asm writer header files into the IR directory, as they
are part of the core IR library in order to support dumping and other
basic functionality.

Rename the 'Assembly' include directory to 'AsmParser' to match the
library name and the only functionality left their -- printing has been
in the core IR library for quite some time.

Update all of the #includes to match.

All of this started because I wanted to have the layering in good shape
before I started adding support for printing LLVM IR using the new pass
infrastructure, and commandline support for the new pass infrastructure.

llvm-svn: 198688
2014-01-07 12:34:26 +00:00
Chandler Carruth
87f14b4eec Re-sort all of the includes with ./utils/sort_includes.py so that
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.

Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.

llvm-svn: 198685
2014-01-07 11:48:04 +00:00
Bill Wendling
e1a9065ca0 Remove unnecessary #includes.
llvm-svn: 198585
2014-01-06 06:00:00 +00:00
Bill Wendling
c3b5643da4 Refactor function that checks that __builtin_returnaddress's argument is constant.
This moves the check up into the parent class so that all targets can use it
without having to copy (and keep in sync) the same error message.

llvm-svn: 198579
2014-01-06 00:43:20 +00:00
Bill Wendling
be9af41475 Emit an error message if the value passed to __builtin_returnaddress isn't a constant
__builtin_returnaddress requires that the value passed into is be a constant.
However, at -O0 even a constant expression may not be converted to a constant.
Emit an error message intead of crashing.

llvm-svn: 198531
2014-01-05 01:47:20 +00:00
Rafael Espindola
eae6386a1e Make the llvm mangler depend only on DataLayout.
Before this patch any program that wanted to know the final symbol name of a
GlobalValue had to link with Target.

This patch implements a compromise solution where the mangler uses DataLayout.
This way, any tool that already links with Target (llc, clang) gets the exact
behavior as before and new IR files can be mangled without linking with Target.

With this patch the mangler is constructed with just a DataLayout and DataLayout
is extended to include the information the Mangler needs.

llvm-svn: 198438
2014-01-03 19:21:54 +00:00
Hal Finkel
d0f8236add [PPC] Fix comment to match function name
llvm-svn: 198362
2014-01-02 22:09:39 +00:00
Hal Finkel
0e77f939b4 [PPC] Fix the scheduling of CR logicals on the P7
CR logicals (crand, crxor, etc.) on the P7 need to be in the first slot of each
dispatch group. The old itinerary entry was just wrong (but has not mattered
because we don't generate these instructions).

This will matter when, in an upcoming commit, we start generating these
instructions.

llvm-svn: 198359
2014-01-02 21:38:26 +00:00
Hal Finkel
dc47cd8c25 [PPC] Use the correct immediate operands on 64-bit instructions
Several of the 64-bit fixed-point instructions with immediate operands were
using the 32-bit (i32) operand nodes instead of the corresponding 64-bit (i64)
operand definitions (u16imm instead of u16imm64, for example).

This error has had no effect so far, but would have caused type-checking
violations with an upcoming change.

llvm-svn: 198356
2014-01-02 21:26:59 +00:00
Roman Divacky
f81810bdfd Use r2 when encoding tls on ppc32. Fixes PR18305.
llvm-svn: 197878
2013-12-22 10:45:37 +00:00
Roman Divacky
7114adfec5 Add some comments.
llvm-svn: 197875
2013-12-22 09:48:38 +00:00
Roman Divacky
513296cd04 Implement initial-exec TLS for PPC32.
llvm-svn: 197824
2013-12-20 18:08:54 +00:00
Rafael Espindola
6cf8a98d5f Long doubles are required to be aligned to 128 bits and svr4 32 bits.
Clang was already getting this right.

llvm-svn: 197694
2013-12-19 16:23:59 +00:00
Hal Finkel
ce61543897 Add a disassembler to the PowerPC backend
The tests for the disassembler were adapted from the encoder tests, and for the
most part, the output from the disassembler matches that encoder-test inputs.
There are some places where more-informative mnemonics could be produced
(notably for the branch instructions), and those cases are noted in the tests
with FIXMEs.

Future work includes:

 - Generating more-informative mnemonics when possible (this may also be done
   in the printer).

 - Remove the dependence on positional "numbered" operand-to-variable mapping
   (for both encoding and decoding).

 - Internally using 64-bit instruction variants in 64-bit mode (if this turns
   out to matter).

llvm-svn: 197693
2013-12-19 16:13:01 +00:00
Rafael Espindola
9b3064d2fb Fix f64 and f128 for ppc-darwin.
This patch adds -f64:32:64 to 32 bit ppc darwin since a f64 inside a
structure are only 32 bit aligned.

The patch also drop -f128:64:128 from all ppc darwin, since f128 is
128 bit aligned.

llvm-svn: 197574
2013-12-18 15:06:25 +00:00
Rafael Espindola
e1792e72e1 One ppc32-darwin, a i64 inside a structure can have 32 bit alignment.
Thanks for Iain Sandoe for testing this with the original gcc.

Clang was already getting this right.

llvm-svn: 197572
2013-12-18 14:35:37 +00:00
Hal Finkel
0cc169d05b Eliminate PPC instruction decoding ambiguities
The instruction definitions in the PPC backend have a number of variants
defined for the same instruction to represent differences between 64-bit and
32-bit semantics. In order to generate a disassembler for the PPC backend, we
need to mark all but one of these as CodeGen only.

No functionality change intended; this is prep work for PPC disassembly
support.

llvm-svn: 197535
2013-12-17 23:05:18 +00:00
Rafael Espindola
4a6f55789a Fix the pointer size for the PS3 datalayout.
This will be tested from clang.

llvm-svn: 197501
2013-12-17 15:29:48 +00:00
Andrew Trick
a3aa2ba174 Allow MachineCSE to coalesce trivial subregister copies the same way that it coalesces normal copies.
Without this, MachineCSE is powerless to handle redundant operations with truncated source operands.

This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled:

     %vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1
     %vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2
     %vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def>

Test case: cse-add-with-overflow.ll.

This exposed an existing bug in
PPCInstrInfo::commuteInstruction. Thanks to Rafael for the test case:
PowerPC/crash.ll.

llvm-svn: 197465
2013-12-17 04:50:45 +00:00
Andrew Trick
935a379f21 whitespace
llvm-svn: 197464
2013-12-17 04:50:40 +00:00
Rafael Espindola
559bceac20 The preferred alignment defaults to the abi alignment. Omit if it is the same.
llvm-svn: 197400
2013-12-16 18:01:51 +00:00
Rafael Espindola
65c80ee4a2 On DataLayout, omit the default of p:64:64:64.
llvm-svn: 197397
2013-12-16 17:15:29 +00:00
Hal Finkel
951040af31 Set has_asmparser in PowerPC/LLVMBuild.txt
PowerPC now has an asm parser (and has for many months now); indicate this in
PowerPC/LLVMBuild.txt.

llvm-svn: 197393
2013-12-16 15:48:09 +00:00
Iain Sandoe
d78fe2e004 [Powerpc darwin] AsmParser Base implementation.
This is a base implementation of the powerpc-apple-darwin asm parser dialect.

* Enables infrastructure (essentially isDarwin()) and fixes up the parsing of asm directives to separate out ELF and MachO/Darwin additions.
* Enables parsing of {r,f,v}XX as register identifiers.
* Enables parsing of lo16() hi16() and ha16() as modifiers.

The changes to the test case are from David Fang (fangism).

llvm-svn: 197324
2013-12-14 13:34:02 +00:00
Rafael Espindola
2bd13393e0 Assume defaults to produce smaller datalayout strings.
llvm-svn: 197249
2013-12-13 17:56:11 +00:00
Iain Sandoe
fc27a861a1 test commit.
Amend a comment.

llvm-svn: 197237
2013-12-13 15:46:48 +00:00
Gabor Greif
c84043be50 typo in comment
llvm-svn: 197136
2013-12-12 08:00:34 +00:00
Hal Finkel
bb547872c5 Remove unused multiclass from PPCInstrInfo.td
llvm-svn: 197100
2013-12-12 00:23:29 +00:00
Hal Finkel
e8c7fef754 Improve instruction scheduling for the PPC POWER7
Aside from a few minor latency corrections, the major change here is a new
hazard recognizer which focuses on better dispatch-group formation on the
POWER7. As with the PPC970's hazard recognizer, the most important thing it
does is avoid load-after-store hazards within the same dispatch group. It uses
the POWER7's special dispatch-group-terminating nop instruction (instead of
inserting multiple regular nop instructions). This new hazard recognizer makes
use of the scheduling dependency graph itself, built using AA information, to
robustly detect the possibility of load-after-store hazards.

significant test-suite performance changes (the error bars are 99.5% confidence
intervals based on 5 test-suite runs both with and without the change --
speedups are negative):

speedups:

MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
	-0.55171% +/- 0.333168%

MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl
	-17.5576% +/- 14.598%

MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
	-29.5708% +/- 7.09058%

MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt
	-34.9471% +/- 11.4391%

SingleSource/Benchmarks/BenchmarkGame/puzzle
	-25.1347% +/- 11.0104%

SingleSource/Benchmarks/Misc/flops-8
	-17.7297% +/- 9.79061%

SingleSource/Benchmarks/Shootout-C++/ary3
	-35.5018% +/- 23.9458%

SingleSource/Regression/C/uint64_to_float
	-56.3165% +/- 25.4234%

SingleSource/UnitTests/Vectorizer/gcc-loops
	-18.5309% +/- 6.8496%

regressions:

MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000
	18.351% +/- 12.156%

SingleSource/Benchmarks/Shootout-C++/methcall
	27.3086% +/- 14.4733%

llvm-svn: 197099
2013-12-12 00:19:11 +00:00
Hal Finkel
4a76fae817 Fix the PPC subsumes-predicate check
For one predicate to subsume another, they must both check the same condition
register. Failure to check this prerequisite was causing miscompiles.

Fixes PR18003.

llvm-svn: 197089
2013-12-11 23:12:25 +00:00
NAKAMURA Takumi
54fa39136d Prune redundant dependencies in LLVMBuild.txt.
llvm-svn: 196988
2013-12-11 00:30:57 +00:00
Rafael Espindola
db206ff5b6 Move PPC's getDataLayoutString out of line and document it better.
llvm-svn: 196987
2013-12-11 00:09:06 +00:00
David Fang
7d43532c99 on darwin<10, fallback to .weak_definition (PPC,X86)
.weak_def_can_be_hidden was not yet supported by the system assembler

llvm-svn: 196970
2013-12-10 21:37:41 +00:00
NAKAMURA Takumi
3fda77b6c7 Add proper dependencies to LLVMBuild.txt in llvm/lib.
I'll prune redundant deps in LLVMBuild.txt, later.

llvm-svn: 196881
2013-12-10 05:39:34 +00:00
Rafael Espindola
2b4db8d379 Remove the isImplicitlyPrivate argument of getNameWithPrefix.
getSymbolWithGlobalValueBase use is to create a name of a new symbol based
on the name of an existing GV. Assert that and then remove the last call
to pass true to isImplicitlyPrivate.

This gives the mangler API a 1:1 mapping from GV to names, which is what we
need to drop the mangler dependency on the target (and use an extended
datalayout instead).

llvm-svn: 196472
2013-12-05 05:53:12 +00:00
Alp Toker
e845f8af67 Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471
2013-12-05 05:44:44 +00:00
Hal Finkel
2fbbf9299e Remove PPCScoreboardHazardRecognizer
PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer
which did only one thing: filtered out nodes in EmitInstruction for which
DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo
instructions. As far as I can tell, this is no longer true, and so we can use
ScoreboardHazardRecognizer directly.

llvm-svn: 196171
2013-12-02 23:52:46 +00:00
Rafael Espindola
5f33387c40 Refactor the setting of PrivateGlobalPrefix.
No functionality change.

llvm-svn: 196170
2013-12-02 23:39:26 +00:00
Rafael Espindola
c36d63a948 Move getSymbolWithGlobalValueBase to TargetLoweringObjectFile.
This allows it to be used in TargetLoweringObjectFileImpl.cpp.

llvm-svn: 196117
2013-12-02 16:25:47 +00:00
Rafael Espindola
cee11d6f43 Remove dead code.
MO_JumpTableIndex and MO_ExternalSymbol don't show up on inline asm.

Keeping parts of the old asm printer just to print inline asm to a string that
we then parse back looks like a hack.

llvm-svn: 196111
2013-12-02 15:36:37 +00:00
Rafael Espindola
427ca8d886 Change the default of AsmWriterClassName and isMCAsmWriter.
llvm-svn: 196065
2013-12-02 04:55:42 +00:00
Rafael Espindola
3192965fa3 Refactor for clarity and efficiency.
The PPC GetSymbolFromOperand already prefixed stubs of MO_ExternalSymbol, so
this should be a nop.

llvm-svn: 196059
2013-12-02 03:26:43 +00:00
Hal Finkel
725757ccc8 Add a scheduling model (with itinerary) for the PPC POWER7
This adds a scheduling model for the POWER7 (P7) core, and enables the
machine-instruction scheduler when targeting the P7. Scheduling for the P7,
like earlier ooo PPC cores, requires considering both dispatch group hazards,
and functional unit resources and latencies. These are both modeled in a
combined itinerary. Dispatch group formation is still handled by the post-RA
scheduler (which still needs to be updated for the P7, but nevertheless does a
pretty good job).

One interesting aspect of this change is that I've also enabled to use of AA
duing CodeGen for the P7 (just as it is for the embedded cores). The benchmark
results seem to support this decision (see below), and while this is normally
useful for in-order cores, and not for ooo cores like the P7, I think that the
dispatch slot hazards are enough like in-order resources to make the AA useful.

Test suite significant performance differences (where negative is a speedup,
and positive is a regression) vs. the current situation:

MultiSource/Benchmarks/BitBench/drop3/drop3
  with AA: N/A
  without AA: -28.7614% +/- 19.8356%
(significantly against AA)

MultiSource/Benchmarks/FreeBench/neural/neural
  with AA: -17.7406% +/- 11.2712%
  without AA: N/A
(significantly in favor of AA)

MultiSource/Benchmarks/SciMark2-C/scimark2
  with AA: -11.2079% +/- 1.80543%
  without AA: -11.3263% +/- 2.79651%

MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt
  with AA: -41.8649% +/- 17.0053%
  without AA: -34.5256% +/- 23.7072%

MultiSource/Benchmarks/mafft/pairlocalalign
  with AA: 25.3016% +/- 17.8614%
  without AA: 38.6629% +/- 14.9391%
(significantly in favor of AA)

MultiSource/Benchmarks/sim/sim
  with AA: N/A
  without AA: 13.4844% +/- 7.18195%
(significantly in favor of AA)

SingleSource/Benchmarks/BenchmarkGame/Large/fasta
  with AA: 15.0664% +/- 6.70216%
  without AA: 12.7747% +/- 8.43043%

SingleSource/Benchmarks/BenchmarkGame/puzzle
  with AA: 82.2713% +/- 26.3567%
  without AA: 75.7525% +/- 41.1842%

SingleSource/Benchmarks/Misc/flops-2
  with AA: -37.1621% +/- 20.7964%
  without AA: -35.2342% +/- 20.2999%
(significantly in favor of AA)

These are 99.5% confidence intervals from 5 runs per configuration. Regarding
the choice to turn on AA during CodeGen, of these results, four seem
significantly in favor of using AA, and one seems significantly against. I'm
not making this decision based on these numbers alone, but these results
seem consistent with results I have from other tests, and so I think that, on
balance, using AA is a win.

llvm-svn: 195981
2013-11-30 20:55:12 +00:00
Hal Finkel
fa2b249f38 Split some PPC itinerary classes
In preparation for adding scheduling definitions for the POWER7, split some PPC
itinerary classes so that the P7's latencies and hazards can be better
described. For the most part, this means differentiating indexed from non-index
pre-increment loads and stores. Also, differentiate single from
double-precision sqrt.

No functionality change intended (except for a more-specific latency for
single-precision sqrt on the A2).

llvm-svn: 195980
2013-11-30 20:41:13 +00:00
Hal Finkel
2d7cc00415 Adjust PPC A2 input operand latencies
On the PPC A2, instructions are only issued after their input operands are
ready. Model this by specifying that input operands are read at dispatch (0
cycles after issue). This changes all input operand latencies from 1 to 0.

Significant test-suite performance changes (these are 99.5% confidence
intervals on 6 runs for both before and after):

speedups:
MultiSource/Benchmarks/sim/sim
	-1.21915% +/- 0.175063%
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt
	-1.23946% +/- 1.05133%
SingleSource/Benchmarks/Misc/flops-2
	-1.24237% +/- 0.681362%
MultiSource/Applications/JM/lencod/lencod
	-1.33992% +/- 0.757498%
MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt
	-1.51802% +/- 1.21468%
MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt
	-2.18818% +/- 1.28605%
MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt
	-2.21977% +/- 1.19499%
SingleSource/Benchmarks/BenchmarkGame/spectral-norm
	-2.29822% +/- 0.671871%
MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl
	-2.40975% +/- 0.355931%
SingleSource/Benchmarks/Misc/fp-convert
	-2.41899% +/- 1.04751%
MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl
	-2.50349% +/- 0.126765%
SingleSource/Benchmarks/Misc/flops-3
	-3.00214% +/- 0.700795%
MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt
	-3.56995% +/- 3.2929%
MultiSource/Applications/sgefa/sgefa
	-4.24908% +/- 2.00413%
MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk
	-18.1294% +/- 3.96489%

regressions:
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
	1.03249% +/- 0.178547%
MultiSource/Applications/hexxagon/hexxagon
	1.16597% +/- 0.285235%
MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt
	1.39576% +/- 1.07855%
SingleSource/Benchmarks/Misc-C++/stepanov_v1p2
	1.71539% +/- 0.173182%
MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1
	1.90013% +/- 0.866472%
MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl
	2.39854% +/- 1.05914%
MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl
	2.4402% +/- 0.817904%
MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl
	5.87997% +/- 3.3172%
MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc
	9.02643% +/- 5.79591%
MultiSource/Benchmarks/VersaBench/bmm/bmm
	10.3517% +/- 1.227%

Obviously, there are data points on both sides of this; but I think, overall,
this supports making the change.

llvm-svn: 195951
2013-11-29 07:04:59 +00:00
Hal Finkel
69f21285ed Create a PPC440 SchedMachineModel
Some of the older PPC processor definitions don't have associated
SchedMachineModels; correct this for the PPC440.

llvm-svn: 195949
2013-11-29 06:32:17 +00:00
Hal Finkel
c5a38fd3e6 Fixup PPC440 load/store operand latencies
The operand latencies for loads and stores in the PPC440 itinerary were wrong
(the store operands are all inputs, and the "with update" (pre-increment)
instructions need a latency for the additional output).

llvm-svn: 195948
2013-11-29 06:19:43 +00:00
Hal Finkel
a9d93b1740 Adjust PPC440 operand latencies
The operand latencies for the PPC440 should be specified relative to dispatch,
not relative to the initial fetch-and-decode stages. Because most instructions
(ignoring bypass) wait in dispatch until their operands are ready, this is
modeled as reading input operands "at dispatch" (0 cycles after issue), and so
every input and output operand has 4 cycles subtracted from it.

This could alter scheduling slightly, but I don't expect a large effect.

llvm-svn: 195947
2013-11-29 05:59:00 +00:00
Hal Finkel
f086fc01ab Don't model the fetch and decode units for the PPC440
Modeling the fetch and decode units in the PPC440 itinerary does not add
anything to the hazard detection capability (and so modeling them just wastes
compile time).

No functionality change intended.

llvm-svn: 195946
2013-11-29 05:58:38 +00:00
NAKAMURA Takumi
9b851e876f [CMake] Let add_public_tablegen_target() provide intrinsics_gen, too.
I think, in principle, intrinsics_gen may be added explicitly.
That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen.
Almost all source files depend on both CommonTaleGen and intrinsics_gen.

Explicit add_dependencies() have been pruned under lib/Target.

llvm-svn: 195929
2013-11-28 17:04:31 +00:00
NAKAMURA Takumi
99f544b37e [CMake] Let add_public_tablegen_target responsible to provide dependency to CommonTableGen.
add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS.
LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope.

llvm-svn: 195927
2013-11-28 17:04:04 +00:00
NAKAMURA Takumi
5dbd3bcf3d [CMake] Prune include_directories() in llvm/lib/Target. add_llvm_target() sets them.
llvm-svn: 195921
2013-11-28 14:53:30 +00:00
Rafael Espindola
05d05c4e8a Use the mangler consistently instead of using getGlobalPrefix directly.
llvm-svn: 195911
2013-11-28 08:59:52 +00:00
Hal Finkel
e49bc01fba Don't share functional units among the PPC itineraries
Instead of sharing functional unit names between the various PPC itineraries,
give each core its own unit names prefixed with the core name.  This follows
the convention used by other backends (such as ARM), and removes a non-obvious
ordering dependency between the various PPCSchedule*.td files.

No functionality change intended.

llvm-svn: 195908
2013-11-28 06:05:59 +00:00
Hal Finkel
40fc5609c6 Add IIC_ prefix to PPC instruction-class names
This adds the IIC_ prefix to the instruction itinerary class names, giving the
PPC backend a naming convention for itinerary classes that is more consistent
with that used by the X86 and ARM backends.

Instruction scheduling in the PPC backend needs a bunch of cleanup and
improvement (especially for the ooo cores). This is just a preliminary step.

No functionality change intended.

llvm-svn: 195890
2013-11-27 23:26:09 +00:00
Rafael Espindola
916039a184 Don't set GlobalPrefix to the default value.
llvm-svn: 195884
2013-11-27 21:57:54 +00:00
Hal Finkel
4dc7aae3a3 Fix comment in PPCA2Model
llvm-svn: 195807
2013-11-27 03:12:56 +00:00
Hal Finkel
fb82ed6bb5 PPC popcnt[dw] do not have record forms
The instruction definitions incorrectly specified that popcntd and popcntw have
record forms; they do not. This mistake was causing invalid code generation.

llvm-svn: 195272
2013-11-20 20:54:55 +00:00
Hal Finkel
d1fc028d62 PPC: Optimize rldicl generation for masked shifts
Masking operations (where only some number of the low bits are being kept) are
selected to rldicl(x, 0, mb). If x is a logical right shift (which would become
rldicl(y, 64-n, n)), we might be able to fold the two instructions together:

  rldicl(rldicl(x, 64-n, n), 0, mb) -> rldicl(x, 64-n, mb) for n <= mb

The right shift is really a left rotate followed by a mask, and if the explicit
mask is a more-restrictive sub-mask of the mask implied by the shift, only one
rldicl is needed.

llvm-svn: 195185
2013-11-20 01:10:15 +00:00
Juergen Ributzka
5357a6d64b [weak vtables] Remove a bunch of weak vtables
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file. The memory leaks in this version have been fixed. Thanks
Alexey for pointing them out.

Differential Revision: http://llvm-reviews.chandlerc.com/D2068

Reviewed by Andy

llvm-svn: 195064
2013-11-19 00:57:56 +00:00
Alexey Samsonov
3bfef6bdb6 Revert r194865 and r194874.
This change is incorrect. If you delete virtual destructor of both a base class
and a subclass, then the following code:
  Base *foo = new Child();
  delete foo;
will not cause the destructor for members of Child class. As a result, I observe
plently of memory leaks. Notable examples I investigated are:
ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl.

llvm-svn: 194997
2013-11-18 09:31:53 +00:00
Juergen Ributzka
ee3af15269 [weak vtables] Remove a bunch of weak vtables
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file.

Differential Revision: http://llvm-reviews.chandlerc.com/D2068

Reviewed by Andy

llvm-svn: 194865
2013-11-15 22:34:48 +00:00
Bob Wilson
d433cf7463 Avoid illegal integer promotion in fastisel
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow.  Results are different when:

    sext(a) + sext(b) != sext(a + b)

Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.

<rdar://problem/15292280>

Patch by Duncan Exon Smith!

llvm-svn: 194840
2013-11-15 19:09:27 +00:00
Hal Finkel
2d9d341e70 Add PPC option for full register names in asm
On non-Darwin PPC systems, we currently strip off the register name prefix
prior to instruction printing. So instead of something like this:

  mr r3, r4

we print this:

  mr 3, 4

The first form is the default on Darwin, and is understood by binutils, but not
yet understood by our integrated assembler. Once our integrated-as understands
full register names as well, this temporary option will be replaced by tying
this functionality to the verbose-asm option. The numeric-only form is
compatible with legacy assemblers and tools, and is also gcc's default on most
PPC systems. On the other hand, it is harder to read, and there are some
analysis tools that expect full register names.

llvm-svn: 194384
2013-11-11 14:58:40 +00:00
Rui Ueyama
cac5d9f1f9 Use StringRef::startswith_lower. No functionality change.
llvm-svn: 193796
2013-10-31 19:59:55 +00:00
Rafael Espindola
68ddc56344 Add a helper getSymbol to AsmPrinter.
llvm-svn: 193627
2013-10-29 17:07:16 +00:00
Eric Christopher
25d167bd58 Add support for the VSX target attribute. No functional change
as we don't actually use it to emit any code yet.

llvm-svn: 192837
2013-10-16 20:38:58 +00:00
Rafael Espindola
90d8b36e1e Add a MCAsmInfoELF class and factor some code into it.
We had a MCAsmInfoCOFF, but no common class for all the ELF MCAsmInfos before.

llvm-svn: 192760
2013-10-16 01:34:32 +00:00
Rafael Espindola
6267c79fdb Add a MCTargetStreamer interface.
This patch fixes an old FIXME by creating a MCTargetStreamer interface
and moving the target specific functions for ARM, Mips and PPC to it.

The ARM streamer is still declared in a common place because it is
used from lib/CodeGen/ARMException.cpp, but the Mips and PPC are
completely hidden in the corresponding Target directories.

I will send an email to llvmdev with instructions on how to use this.

llvm-svn: 192181
2013-10-08 13:08:17 +00:00
Rafael Espindola
e60c3625e3 Remove getEHExceptionRegister and getEHHandlerRegister.
They haven't been used for a long time. Patch by MathOnNapkins.

llvm-svn: 192099
2013-10-07 13:39:22 +00:00
Bill Schmidt
b5aca928c2 [PowerPC] Fix PR17354: Generate nop after local calls for PIC code.
When generating code for shared libraries, even local calls may be
intercepted, so we need a nop after the call for the linker to fix up the
TOC.  Test case adapted from the one provided in PR17354.

llvm-svn: 191440
2013-09-26 17:09:28 +00:00
David Majnemer
be4c5e7ce1 PPC: Allow partial fills in writeNopData()
When asked to pad an irregular number of bytes, we should fill with
zeros.  This is consistent with the behavior specified in the AIX
Assembler Language Reference as well as other LLVM and binutils
assemblers.

N.B. There is a small deviation from binutils' PPC assembler:
when handling pads which are greater than 4 bytes but not mod 4,
binutils will not emit any NOP sequences at all and only use zeros.
This may or may not be a bug but there is no excellent rationale as to
why that behavior is important to emulate.  If that behavior is needed,
we can change writeNopData() to behave in the same way.

This fixes PR17352.

llvm-svn: 191426
2013-09-26 09:18:48 +00:00
David Majnemer
9ed79f2d96 PPC: Do not introduce ISD nodes for fctid and fctiw
llvm-svn: 191421
2013-09-26 05:22:11 +00:00
David Majnemer
2652c503c6 PPC: Add support for fctid and fctiw
Encodings were checked against the Power ISA documents and double
checked against binutils.

This fixes PR17350.

llvm-svn: 191419
2013-09-26 04:11:24 +00:00
David Majnemer
1ffe32b198 MC: Add support for treating $ as a reference to the PC
The binutils assembler supports a mode called DOLLAR_DOT which treats
the dollar sign token as a reference to the current program counter if
the dollar sign doesn't precede a constant or identifier.

This commit adds a new MCAsmInfo flag stating whether or not a given
target supports this interpretation of the dollar sign token; by
default, this flag is not enabled.

Further, enable this flag for PPC. The system assembler for AIX and
binutils both support using the dollar sign in this manner.

This fixes PR17353.

llvm-svn: 191368
2013-09-25 10:47:21 +00:00
David Majnemer
30b6b79b54 MC: Remove vestigial PCSymbol field from AsmInfo
llvm-svn: 191362
2013-09-25 09:36:11 +00:00
Tim Northover
c9a7e47164 ISelDAG: spot chain cycles involving MachineNodes
Previously, the DAGISel function WalkChainUsers was spotting that it
had entered already-selected territory by whether a node was a
MachineNode (amongst other things). Since it's fairly common practice
to insert MachineNodes during ISelLowering, this was not the correct
check.

Looking around, it seems that other nodes get their NodeId set to -1
upon selection, so this makes sure the same thing happens to all
MachineNodes and uses that characteristic to determine whether we
should stop looking for a loop during selection.

This should fix PR15840.

llvm-svn: 191165
2013-09-22 08:21:56 +00:00
Hal Finkel
7e8ba290f6 Correct the pre-increment load latencies in the PPC A2 itinerary
Pre-increment loads are microcoded on the A2, and the address increment occurs
only after the load completes. As a result, the latency of the GPR address
update is an additional 2 cycles on top of the load latency.

llvm-svn: 191156
2013-09-22 00:08:14 +00:00
Bill Schmidt
ae6c256723 [PowerPC] Add a FIXME.
Documenting a design choice to generate only medium model sequences for TLS
addresses at this time.  Small and large code models could be supported if
necessary.

llvm-svn: 190883
2013-09-17 20:22:05 +00:00
Bill Schmidt
f26512486a [PowerPC] Fix problems with large code model (PR17169).
Large code model on PPC64 requires creating and referencing TOC entries when
using the addis/ld form of addressing.  This was not being done in all cases.
The changes in this patch to PPCAsmPrinter::EmitInstruction() fix this.  Two
test cases are also modified to reflect this requirement.

Fast-isel was not creating correct code for loading floating-point constants
using large code model.  This also requires the addis/ld form of addressing.
Previously we were using the addis/lfd shortcut which is only applicable to
medium code model.  One test case is modified to reflect this requirement.

llvm-svn: 190882
2013-09-17 20:03:25 +00:00
Bill Schmidt
b9dc991f55 [PowerPC] Fix PR17155 - Ignore COPY_TO_REGCLASS during emit.
Fast-isel generates a COPY_TO_REGCLASS for widening f32 to f64, which
is a nop on PPC64.  This is needed to keep the register class system
happy, but on the fast-isel path it is not removed before emit as it
is for DAG select.  Ignore this op when emitting instructions.

llvm-svn: 190795
2013-09-16 17:25:12 +00:00
Hal Finkel
5bb449bca0 PPC: Don't restrict lvsl generation to after type legalization
This is a re-commit of r190764, with an extra check to make sure that we're not
performing the transformation on illegal types (a small test case has been
added for this as well).

Original commit message:

The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190771
2013-09-15 22:09:58 +00:00
Hal Finkel
c45bfe85cc Revert r190764: PPC: Don't restrict lvsl generation to after type legalization
This is causing test-suite failures.

Original commit message:

The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190765
2013-09-15 15:41:11 +00:00
Hal Finkel
ae7feec56e PPC: Don't restrict lvsl generation to after type legalization
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190764
2013-09-15 15:20:54 +00:00
Hal Finkel
ac7ef3f74f Add missing break statement in PPCISelLowering
As it turns out, not a problem in practice, but it should be there.

llvm-svn: 190720
2013-09-13 20:09:02 +00:00
Chandler Carruth
e62d2f2be7 Remove an unused variable, fixing -Werror build with latest Clang.
llvm-svn: 190640
2013-09-12 23:30:48 +00:00
Hal Finkel
4b3cfb4727 Fix PPC ABI for ByVal structs with vector members
When a structure is passed by value, and that structure contains a vector
member, according to the PPC ABI, the structure will receive enhanced alignment
(so that the vector within the structure will always be aligned).

This should resolve PR16641.

llvm-svn: 190636
2013-09-12 23:20:06 +00:00
Hal Finkel
47bfa9a072 Make the PPC fast-math sqrt expansion safe at 0
In fast-math mode sqrt(x) is calculated using the fast expansion of the
reciprocal of the reciprocal sqrt expansion. The reciprocal and reciprocal
sqrt expansions use the associated estimate instructions along with some Newton
iterations. Unfortunately, as a result, sqrt(0) was being calculated as NaN,
which is not correct. Now we explicitly return a result of zero if the input is
zero.

llvm-svn: 190624
2013-09-12 19:04:12 +00:00
Roman Divacky
582df28090 Implement asm support for a few PowerPC bookIII that are needed for assembling
FreeBSD kernel.

llvm-svn: 190618
2013-09-12 17:50:54 +00:00
Hal Finkel
5ddc53b3ef Mark PPC MFTB and DST (and friends) as deprecated
Use the new instruction deprecation feature to mark mftb (now replaced with
mfspr) and dst (along with the other Altivec cache control instructions) as
deprecated when targeting cores supporting at least ISA v2.03.

llvm-svn: 190605
2013-09-12 14:40:06 +00:00
Joey Gouly
fccb3bcae3 Add an instruction deprecation feature to TableGen.
The 'Deprecated' class allows you to specify a SubtargetFeature that the
instruction is deprecated on.

The 'ComplexDeprecationPredicate' class allows you to define a custom
predicate that is called to check for deprecation.
For example:
  ComplexDeprecationPredicate<"MCR">

would mean you would have to define the following function:
  bool getMCRDeprecationInfo(MCInst &MI, MCSubtargetInfo &STI,
                             std::string &Info)

Which returns 'false' for not deprecated, and 'true' for deprecated
and store the warning message in 'Info'.

The MCTargetAsmParser constructor was chaned to take an extra argument of
the MCInstrInfo class, so out-of-tree targets will need to be changed.

llvm-svn: 190598
2013-09-12 10:28:05 +00:00
Hal Finkel
6164109851 PPC: Enable aggressive anti-dependency breaking
Aggressive anti-dependency breaking is enabled by default for all PPC cores.
This provides a general speedup on the P7 and other platforms (among other
factors, the instruction group formation for the non-embedded PPC cores is done
during post-RA scheduling). In order to do this safely, the incompatibility
between uses of the MFOCRF instruction and anti-dependency breaking are
resolved by marking MFOCRF with hasExtraSrcRegAllocReq. As noted in the removed
FIXME, the problem was that MFOCRF's output is sensitive to the identify of the
source register, and always paired with a shift to undo this effect. Because
anti-dependency breaking is unaware of this hidden dependency of the shift
amount on the source register of the MFOCRF instruction, changing that register
must be inhibited.

Two test cases were adjusted: The SjLj test was made more insensitive to
register choices and scheduling; the saveCR test disabled anti-dependency
breaking because part of what it is testing is proper register reuse.

llvm-svn: 190587
2013-09-12 05:24:49 +00:00
Hal Finkel
2f2965c149 Greatly simplify the PPC A2 scheduling itinerary
As Andy pointed out to me a long time ago, there are no structural hazards in
the later pipeline stages of the A2, and so modeling them is useless. Also,
modeling the top pre-dispatch stages is deceiving because, when multiple
hardware threads are active, those resources are shared among the threads. The
bypass definitions were mostly wrong, and so those have been removed. The
resulting itinerary is much simpler, and more accurate.

llvm-svn: 190562
2013-09-11 23:25:21 +00:00
Hal Finkel
bad4087f4a Enable MI scheduling (and CodeGen AA) by default for embedded PPC cores
For embedded PPC cores (especially the A2 core), using the MI scheduler with AA
is far superior to the other scheduling options.

llvm-svn: 190558
2013-09-11 23:05:25 +00:00
Hal Finkel
87afacb632 Implement TTI getUnrollingPreferences for PowerPC
The PowerPC A2 core greatly benefits from aggressive concatenation unrolling;
use the new getUnrollingPreferences to enable this by default when targeting
the PPC A2 core.

llvm-svn: 190549
2013-09-11 21:20:40 +00:00
Bill Wendling
2c532e9c9b Generate compact unwind encoding from CFI directives.
We used to generate the compact unwind encoding from the machine
instructions. However, this had the problem that if the user used `-save-temps'
or compiled their hand-written `.s' file (with CFI directives), we wouldn't
generate the compact unwind encoding.

Move the algorithm that generates the compact unwind encoding into the
MCAsmBackend. This way we can generate the encoding whether the code is from a
`.ll' or `.s' file.

<rdar://problem/13623355>

llvm-svn: 190290
2013-09-09 02:37:14 +00:00
Charles Davis
5191e0b0d0 Move everything depending on Object/MachOFormat.h over to Support/MachO.h.
llvm-svn: 189728
2013-09-01 04:28:48 +00:00
Bill Schmidt
fbcebc7d75 [PowerPC] Fast-isel cleanup patch.
Here are a few miscellaneous things to tidy up the PPC64 fast-isel
implementation.  I corrected a couple of commentary lapses, and added
documentation of future opportunities.  I also implemented
TargetMaterializeAlloca, which I somehow forgot when I split up the
original huge patch.

Finally, I decided to delete SelectCmp.  I hadn't previously hooked it
in to TargetSelectInstruction(), and when I did I realized it wasn't
serving any useful purpose.  This is only useful for compares that
don't feed a branch in the same block, and to handle that we would
have to have logic to interpret i1 as a condition register.  This
could probably be done, but would require Unseemly Hackery, and
honestly does not seem worth the hassle.

This ends the current patch series.

llvm-svn: 189715
2013-08-31 02:33:40 +00:00
Bill Schmidt
98963cd850 [PowerPC] Add integer truncation support to fast-isel.
This is the last substantive patch I'm planning for fast-isel in the
near future, adding fast selection of integer truncates.  There are
certainly more things that can be improved (many of which are called
out in FIXMEs), but for now we are catching most of the important
cases.

I'll document some of the remaining work in a cleanup patch shortly.

llvm-svn: 189706
2013-08-30 23:31:33 +00:00
Bill Schmidt
62a0b5c55b Correct partially defined variable
llvm-svn: 189705
2013-08-30 23:25:30 +00:00
Bill Schmidt
65bf01a470 [PowerPC] Call support for fast-isel.
This patch adds fast-isel support for calls (but not intrinsic calls
or varargs calls).  It also removes a badly-formed assert.  There are
some new tests just for calls, and also for folding loads into
arguments on calls to avoid extra extends.

llvm-svn: 189701
2013-08-30 22:18:55 +00:00
Bill Schmidt
886231ba0f [PowerPC] Add handling for conversions to fast-isel.
Yet another chunk of fast-isel code.  This one handles various
conversions involving floating-point.  (It also includes some
miscellaneous handling throughout the back end for LWA_32 and LWAX_32
that should have been part of the load-store patch.)

llvm-svn: 189677
2013-08-30 15:18:11 +00:00
Bill Schmidt
07bcdd6b9c [PowerPC] Handle selection of compare instructions in fast-isel.
Mostly trivial patch adding support for compares.  The meat of the
work was added with the branch support.

llvm-svn: 189639
2013-08-30 03:16:48 +00:00
Bill Schmidt
3f2df6119d Remove bogus debug statement. Sheesh.
llvm-svn: 189638
2013-08-30 03:07:11 +00:00
Bill Schmidt
b3f46e50b4 [PowerPC] Add loads, stores, and related things to fast-isel.
This is the next big chunk of fast-isel code.  The primary purpose is
to implement selection of loads and stores, but there is a lot of
drag-along to support this.  The common code to analyze addresses for
both loads and stores is substantial.  It's also necessary to add the
materialization code for global values.

Related to load-store processing is the code to fold loads into
integer extends, since otherwise we generate lots of redundant
instructions.  We also need to add some overrides to some FastEmit
routines to ensure we don't assign GPR 0 to a virtual register when
this would change the meaning of an instruction.

I added handling selection of a few binary arithmetic instructions, to
enable committing some test cases I wrote a while back.

Finally, ap couple of miscellaneous changes:
 * I cleaned up some poor style from a previous patch in
   PPCISelLowering.cpp, pointed out by David Blaikie.
 * I enlarged the Addr.Offset field to avoid sign problems with 32-bit
   offsets. 

llvm-svn: 189636
2013-08-30 02:29:45 +00:00
Alexey Samsonov
edf369f1c4 Fix use of uninitialized value added in r189400 (found by MemorySanitizer)
llvm-svn: 189456
2013-08-28 08:30:47 +00:00
Joerg Sonnenberger
b2cc802497 Given target assembler parsers a chance to handle variant expressions
first. Use this to turn the PPC modifiers into PPC specific expressions,
allowing them to work on constants.

llvm-svn: 189400
2013-08-27 20:23:19 +00:00
Charles Davis
6e439dabdb Revert "Fix the build broken by r189315." and "Move everything depending on Object/MachOFormat.h over to Support/MachO.h."
This reverts commits r189319 and r189315. r189315 broke some tests on what I
believe are big-endian platforms.

llvm-svn: 189321
2013-08-27 05:38:30 +00:00
Charles Davis
cecfbfaf57 Move everything depending on Object/MachOFormat.h over to Support/MachO.h.
llvm-svn: 189315
2013-08-27 05:00:43 +00:00
Bill Schmidt
6b370e28fb Dummy code to silence warning from 4189266
llvm-svn: 189272
2013-08-26 20:11:46 +00:00
Bill Schmidt
84204dd94d [PowerPC] More fast-isel chunks (returns and integer extends)
Incremental improvement to fast-isel for PPC64.  This allows us to
select on ret, sext, and zext.  Filling in sext/zext improves some of
the existing logic in handling compare-immediates that needed extends.

A simplified return convention for fast-isel is also added to the
PPC64 calling conventions.  All call/return processing for DAG
selection is handled with custom code, so there isn't an existing CC
to rely on here.  The include of PPCGenCallingConv.inc causes compiler
warnings due to the 32-bit calling conventions that are not used, so
the dummy function "usePPC32CCs()" is added here to silence those.

Test cases for the return and extend logic are added.

llvm-svn: 189266
2013-08-26 19:42:51 +00:00
Bill Schmidt
c010bfbf75 [PowerPC] Add fast-isel branch and compare selection.
First chunk of actual fast-isel selection code.  This handles direct
and indirect branches, as well as feeding compares for direct
branches.  PPCFastISel::PPCEmitIntExt() is just roughed in and will be
expanded in a future patch.  This also corrects a problem with
selection for constant pool entries in JIT mode or with small code
model.

llvm-svn: 189202
2013-08-25 22:33:42 +00:00
Bill Schmidt
178a1ca418 [PowerPC] More refactoring prior to real PPC emitPrologue/Epilogue changes.
(Patch committed on behalf of Mark Minich, whose log entry follows.)

This is a continuation of the refactorings performed in svn rev 188573
(see that rev's comments for more detail).

This is my stage 2 refactoring: I combined the emitPrologue() &
emitEpilogue() PPC32 & PPC64 code into a single flow, simplifying a
lot of the code since in essence the PPC32 & PPC64 code generation
logic is the same, only the instruction forms are different (in most
cases). This simplification is necessary because my functional changes
(yet to come) add significant complexity, and without the
simplification of my stage 2 refactoring, the overall complexity of
both emitPrologue() & emitEpilogue() would have become almost
intractable for most mortal programmers (like me).

This submission was intended to be a pure refactoring (no functional
changes whatsoever). However, in the process of combining the PPC32 &
PPC64 flows, I spotted a difference that I believe is a bug (see svn
rev 186478 line 863, or svn rev 188573 line 888): This line appears to
be restoring the BP with the original FP content, not the original BP
content. When I merged the 32-bit and 64-bit code, I used the
corresponding code from the 64-bit flow, which I believe uses the
correct offset (BPOffset) for this operation.

llvm-svn: 188741
2013-08-20 03:12:23 +00:00
Hal Finkel
8f395a803a Add a llvm.copysign intrinsic
This adds a llvm.copysign intrinsic; We already have Libfunc recognition for
copysign (which is turned into the FCOPYSIGN SDAG node). In order to
autovectorize calls to copysign in the loop vectorizer, we need a corresponding
intrinsic as well.

In addition to the expected changes to the language reference, the loop
vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into
an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a
few lists in LegalizeVector{Ops,Types} so that vector copysigns can be
expanded.

In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN
be Expand for vector types. This seems correct for all in-tree targets, and I
think is the right thing to do because, previously, there was no way to generate
vector-values FCOPYSIGN nodes (and most targets don't specify an action for
vector-typed FCOPYSIGN).

llvm-svn: 188728
2013-08-19 23:35:46 +00:00
Hal Finkel
4bb40e7c8d Don't form PPC CTR-based loops around a copysignl call
copysign/copysignf never become function calls (because the SDAG expansion code
does not lower to the corresponding function call, but rather directly
implements the associated logic), but copysignl almost always is lowered into a
call to the requested libm functon (and, thus, might clobber CTR).

llvm-svn: 188727
2013-08-19 23:35:24 +00:00
Hal Finkel
9591220c33 Add the PPC fcpsgn instruction
Modern PPC cores support a floating-point copysign instruction, and we can use
this to lower the FCOPYSIGN node (which is created from calls to the libm
copysign function). A couple of extra patterns are necessary because the
operand types of FCOPYSIGN need not agree.

llvm-svn: 188653
2013-08-19 05:01:02 +00:00
Bill Schmidt
1bd9d284d6 [PowerPC] Preparatory refactoring for making prologue and epilogue
safe on PPC32 SVR4 ABI

[Patch and following text by Mark Minich; committing on his behalf.]

There are FIXME's in PowerPC/PPCFrameLowering.cpp, method
PPCFrameLowering::emitPrologue() related to "negative offsets of R1"
on PPC32 SVR4. They're true, but the real issue is that on PPC32 SVR4
(and any ABI without a Red Zone), no spills may be made until after
the stackframe is claimed, which also includes the LR spill which is
at a positive offset. The same problem exists in emitEpilogue(),
though there's no FIXME for it. I intend to fix this issue, making
LLVM-compiled code finally safe for use on SVR4/EABI/e500 32-bit
platforms (including in particular, OS-free embedded systems & kernel
code, where interrupts may share the same stack as user code).

In preparation for making these changes, to make the diffs for the
functional changes less cluttered, I am providing the non-functional
refactorings in two stages:

Stage 1 does some minor fluffy refactorings to pull multiple method
calls up into a single bool, creating named bools for repeated uses of
obscure logic, moving some code up earlier because either stage 2 or
my final version will require it earlier, and rewording/adding some
comments. My stage 1 changes can be characterized as primarily fluffy
cleanup, the purpose of which may be unclear until the stage 2 or
final changes are made.

My stage 2 refactorings combine the separate PPC32 & PPC64 logic,
which is currently performed by largely duplicate code, into a single
flow, with the differences handled by a group of constants initialized
early in the methods.

This submission is for my stage 1 changes. There should be no
functional changes whatsoever; this is a pure refactoring.

llvm-svn: 188573
2013-08-16 20:05:04 +00:00
Craig Topper
2653227b8f Replace getValueType().getSimpleVT() with getSimpleValueType(). Also remove one weird cast from MVT->EVT just to call getSimpleVT().
llvm-svn: 188441
2013-08-15 02:33:50 +00:00
Hal Finkel
a89d228510 Actually fix PPC64 64-bit GPR inline asm constraint matching
This is a follow-up to r187693, correcting that code to request the correct
register class. The previous version, with the wrong register class, was not
really correcting the constraints, but rather was removing them. Coincidentally,
this fixed the failing test case in r187693, but obviously created other
problems.

llvm-svn: 188407
2013-08-14 20:05:04 +00:00
David Fang
3e8c045f48 cast fix to appease buildbot
llvm-svn: 188014
2013-08-08 21:29:30 +00:00
David Fang
772a101ff0 initial draft of PPCMachObjectWriter.cpp
this records relocation entries in the mach-o object file
for PIC code generation.
tested on powerpc-darwin8, validated against darwin otool -rvV

llvm-svn: 188004
2013-08-08 20:14:40 +00:00
Hal Finkel
e76170ce53 PPC: Map frin to round() not nearbyint() and rint()
Making use of the recently-added ISD::FROUND, which allows for custom lowering
of round(), the PPC backend will now map frin to round(). Previously, we had
been using frin to lower nearbyint() (and rint() via some custom lowering to
handle the extra fenv flags requirements), but only in fast-math mode because
frin does not tie-to-even. Several users had complained about this behavior,
and this new mapping of frin to round is certainly more appropriate (and does
not require fast-math mode).

In effect, this reverts r178362 (and part of r178337, replacing the nearbyint
mapping with the round mapping).

llvm-svn: 187960
2013-08-08 04:31:34 +00:00
Hal Finkel
bdc7aa32c1 Add ISD::FROUND for libm round()
All libm floating-point rounding functions, except for round(), had their own
ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm
adding ISD::FROUND so that round() can be custom lowered as well.

For the most part, this is straightforward. I've added an intrinsic
and a matching ISD node just like those for nearbyint() and friends. The
SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed
fround).

This will be used by the PowerPC backend in a follow-up commit.

llvm-svn: 187926
2013-08-07 22:49:12 +00:00
Hal Finkel
71d37e18da Add PPC64 mulli pattern
The PPC backend had been missing a pattern to generate mulli for 64-bit
multiples. We had been generating it only for 32-bit multiplies. Unfortunately,
generating li + mulld unnecessarily increases register pressure.

llvm-svn: 187807
2013-08-06 17:03:03 +00:00
NAKAMURA Takumi
0eb9242c56 Target/*/CMakeLists.txt: Add the dependency to CommonTableGen explicitly for each corresponding CodeGen.
Without explicit dependencies, both per-file action and in-CommonTableGen action could run in parallel.
It races to emit *.inc files simultaneously.

llvm-svn: 187780
2013-08-06 06:38:37 +00:00
Benjamin Kramer
a913e72728 PPCAsmParser: Stop leaking names.
Store them in a place that gets cleaned up properly.

llvm-svn: 187700
2013-08-03 22:43:29 +00:00
Hal Finkel
f91cfcdaed Fix PPC64 64-bit GPR inline asm constraint matching
Internally, the PowerPC backend names the 32-bit GPRs R[0-9]+, and names the
64-bit parent GPRs X[0-9]+. When matching inline assembly constraints with
explicit register names, on PPC64 when an i64 MVT has been requested, we need
to follow gcc's convention of using r[0-9]+ to refer to the 64-bit (parent)
registers.

At some point, we'll probably want to arrange things so that the generic code
in TargetLowering uses the AsmName fields declared in *RegisterInfo.td in order
to match these inline asm register constraints. If we do that, this change can
be reverted.

llvm-svn: 187693
2013-08-03 12:25:10 +00:00
Bill Wendling
e7b7059f1d Use function attributes to indicate that we don't want to realign the stack.
Function attributes are the future! So just query whether we want to realign the
stack directly from the function instead of through a random target options
structure.

llvm-svn: 187618
2013-08-01 21:42:05 +00:00
Bill Schmidt
f02898d71f [PowerPC] Skeletal FastISel support for 64-bit PowerPC ELF.
This is the first of many upcoming patches for PowerPC fast
instruction selection support.  This patch implements the minimum
necessary for a functional (but extremely limited) FastISel pass.  It
allows the table-generated portions of the selector to be created and
used, but in most cases selection will fall back to the DAG selector.
None of the block terminator instructions are implemented yet, and
most interesting instructions require some special handling.
Therefore there aren't any new test cases with this patch.  There will
be quite a few tests coming with future patches.

This patch adds the make/CMake support for the new code (including
tablegen -gen-fast-isel) and creates the FastISel object for PPC64 ELF
only.  It instantiates the necessary virtual functions
(TargetSelectInstruction, TargetMaterializeConstant,
TargetMaterializeAlloca, tryToFoldLoadIntoMI, and FastLowerArguments),
but of these, only TargetMaterializeConstant contains any useful
implementation.  This is present since the table-generated code
requires the ability to materialize integer constants for some
instructions.

This patch has been tested by building and running the
projects/test-suite code with -O0.  All tests passed with the
exception of a couple of long-running tests that time out using -O0
code generation.

llvm-svn: 187399
2013-07-30 00:50:39 +00:00
Bill Schmidt
f400d5844e [PowerPC] Add comment explaining preprocessor directive.
llvm-svn: 187320
2013-07-28 03:23:32 +00:00
Bill Schmidt
795f27eb7f Revert 187318
llvm-svn: 187319
2013-07-28 02:13:24 +00:00
Bill Schmidt
0c824086d1 [PowerPC] Remove unnecessary preprocessor checking.
The tests !defined(__ppc__) && !defined(__powerpc__) are not needed
or helpful when verifying that code is being compiled for a 64-bit
target.  The simpler test provided by this revision is sufficient to
tell if the target is 64-bit.

llvm-svn: 187318
2013-07-28 02:08:13 +00:00
Rafael Espindola
326babb234 Revert "[PowerPC] Improve consistency in use of __ppc__, __powerpc__, etc."
This reverts commit r187248. It broke many bots.

llvm-svn: 187254
2013-07-26 22:13:57 +00:00
Bill Schmidt
149eda1b1e [PowerPC] Improve consistency in use of __ppc__, __powerpc__, etc.
Both GCC and LLVM will implicitly define __ppc__ and __powerpc__ for
all PowerPC targets, whether 32- or 64-bit.  They will both implicitly
define __ppc64__ and __powerpc64__ for 64-bit PowerPC targets, and not
for 32-bit targets.  We cannot be sure that all other possible
compilers used to compile Clang/LLVM define both __ppc__ and
__powerpc__, for example, so it is best to check for both when relying
on either inside the Clang/LLVM code base.

This patch makes sure we always check for both variants.  In addition,
it fixes one unnecessary check in lib/Target/PowerPC/PPCJITInfo.cpp.
(At least one of __ppc__ and __powerpc__ should always be defined when
compiling for a PowerPC target, no matter which compiler is used, so
testing for them is unnecessary.)

There are some places in the compiler that check for other variants,
like __POWERPC__ and _POWER, and I have left those in place.  There is
no need to add them elsewhere.  This seems to be in Apple-specific
code, and I won't take a chance on breaking it.

There is no intended change in behavior; thus, no test cases are
added.

llvm-svn: 187248
2013-07-26 21:39:15 +00:00
Bill Schmidt
88e45cc177 [PowerPC] Support powerpc64le as a syntax-checking target.
This patch provides basic support for powerpc64le as an LLVM target.
However, use of this target will not actually generate little-endian
code.  Instead, use of the target will cause the correct little-endian
built-in defines to be generated, so that code that tests for
__LITTLE_ENDIAN__, for example, will be correctly parsed for
syntax-only testing.  Code generation will otherwise be the same as
powerpc64 (big-endian), for now.

The patch leaves open the possibility of creating a little-endian
PowerPC64 back end, but there is no immediate intent to create such a
thing.

The LLVM portions of this patch simply add ppc64le coverage everywhere
that ppc64 coverage currently exists.  There is nothing of any import
worth testing until such time as little-endian code generation is
implemented.  In the corresponding Clang patch, there is a new test
case variant to ensure that correct built-in defines for little-endian
code are generated.

llvm-svn: 187179
2013-07-26 01:35:43 +00:00
Roman Divacky
32a21acb65 PPC32 va_list is an actual structure so va_copy needs to copy the whole
structure not just a pointer. This implements that and thus fixes va_copy
on PPC32. Fixes #15286. Both bug and patch by Florian Zeitz!

llvm-svn: 187158
2013-07-25 21:36:47 +00:00
David Fang
e8ec195aa6 allow tests to run on powerpc-darwin8 again, checking for __ppc__
llvm-svn: 187027
2013-07-24 07:52:16 +00:00
Craig Topper
9bf1d910c8 Split generated asm mnemonic matching table into a separate table for each asm variant.
This removes the need to store the asm variant in each row of the single table that existed before. Shaves ~16K off the size of X86AsmParser.o.

llvm-svn: 187026
2013-07-24 07:33:14 +00:00
Hal Finkel
fdd124178e PPC: Support dynamic allocas with large alignment
Support for dynamic stack alignments in the PPC backend has been unfinished, in
part because it depends on dynamic stack realignment (which I only just
recently implemented fully). Now we can also support dynamic allocas with
higher than the default target stack alignment (16 bytes).

In order to round-up the requested size to the maximum requested alignment, we
need an additional register to hold the rounded-up size. We're already using one
scavenged register to hold the previous stack-pointer value (which needs to be
stored with the signal-safe stdux update), and so when we have dynamic allocas
and a large alignment, we allocate two emergency spill slots for the scavenger.

llvm-svn: 186562
2013-07-18 04:28:21 +00:00
Hal Finkel
79a33a00d6 PPC: Add base-pointer support to builtin setjmp/longjmp
First, this changes the base-pointer implementation to remove an unnecessary
complication (and one that is incompatible with how builtin SjLj is
implemented): instead of using r31 as the base pointer when it is not needed as
a frame pointer, now the base pointer will always be r30 when needed.

Second, we introduce another pseudo register, BP, which is used just like the FP
pseudo register to refer to the base register before we know for certain what
register it will be.

Third, we now save BP into the jmp_buf, and restore r30 from that slot in
longjmp.  If the function that called setjmp did not use a base pointer, then
r30 will be overwritten by the setjmp-calling-function's restore code. FP
restoration (which is restored into r31) works the same way.

llvm-svn: 186545
2013-07-17 23:50:51 +00:00
Hal Finkel
149f358122 PPC: Add CTR-register clobber to builtin setjmp
Because the builtin longjmp implementation uses a CTR-based indirect jump, when
the control flow arrives at the builtin setjmp call, the CTR register has
necessarily been clobbered. Correspondingly, this adds CTR to the list of
implicit definitions of the builtin setjmp pseudo instruction.

We don't need to add CTR to the implicit definitions of builtin longjmp
because, even though it does clobber the CTR register, the control flow cannot
return to inside the loop unless there is also a builtin setjmp call.

llvm-svn: 186488
2013-07-17 05:35:44 +00:00
Hal Finkel
e625744d86 PPC: Implement base pointer and stack realignment
This builds on some frame-lowering code that has existed since 2005 (r24224)
but was disabled in 2008 (r48188) because it needed base pointer support to
function correctly. This implementation follows the strategy suggested by Dale
Johannesen in r48188 where the following comment was added:

  This does not currently work, because the delta between old and new stack
  pointers is added to offsets that reference incoming parameters after the
  prolog is generated, and the code that does that doesn't handle a variable
  delta.  You don't want to do that anyway; a better approach is to reserve
  another register that retains to the incoming stack pointer, and reference
  parameters relative to that.

And now we do exactly that. If we don't need a frame pointer, then we use r31
as a base pointer. If we do need a frame pointer, then we use r30 as a base
pointer. The base pointer retains the value of the stack pointer before it was
decremented in the prologue. We then use the base pointer to resolve all
negative frame indicies. The basic scheme follows that for base pointers in the
X86 backend.

We use a base pointer when we need to dynamically realign the incoming stack
pointer. This currently applies only to static objects (dynamic allocas with
large alignments, and base-pointer support in SjLj lowering will come in future
commits).

llvm-svn: 186478
2013-07-17 00:45:52 +00:00
NAKAMURA Takumi
c21b0543b6 PPCJITInfo.cpp: Tweak r186252 with s/__ppc/__powerpc/ to work on powerpc-linux Fedora 12.
g++ (GCC) 4.4.4 20100630 (Red Hat 4.4.4-10)

llvm-svn: 186396
2013-07-16 09:59:51 +00:00
Hal Finkel
4fda98b03b PPC: Refactoring to support subtarget feature changing
This change mirrors the changes that were made to the X86 and ARM targets to
support subtarget feature changing. As indicated in r182899, the mechanism is
still undergoing revision, and so as with the X86 and ARM targets, there is no
test case yet (there is no effective functionality change).

llvm-svn: 186357
2013-07-15 22:29:40 +00:00
Hal Finkel
608dbe4a4d Fix register subclass handling in PPCInstrInfo::insertSelect
PPCInstrInfo::insertSelect and PPCInstrInfo::canInsertSelect were computing the
common subclass of the true and false inputs, and then selecting either the
32-bit or the 64-bit isel variant based on the result of calling
PPC::GPRCRegClass.hasSubClassEq(RC) and PPC::G8RCRegClass.hasSubClassEq(RC)
(where RC is the common subclass). Unfortunately, this is not quite right: if
we have something like this:

  %vreg8<def> = SELECT_CC_I8 %vreg4<kill>, %vreg7<kill>, %vreg6<kill>, 76;
    G8RC_and_G8RC_NOX0:%vreg8 CRRC:%vreg4 G8RC_NOX0:%vreg7,%vreg6

then the common subclass of G8RC_and_G8RC_NOX0 and G8RC_NOX0 is G8RC_NOX0, and
G8RC_NOX0 is not a subclass of G8RC (because it also contains the ZERO8
pseudo-register). As a result, we also need to check the common subclass
against GPRC_NOR0 and G8RC_NOX0 explicitly.

This had not been a problem for clients of insertSelect that called
canInsertSelect first (because it had a compensating mistake), but insertSelect
is also used by the PPC pseudo-instruction expander, and this error was causing
a problem in that context.

This problem was found by csmith.

llvm-svn: 186343
2013-07-15 20:22:58 +00:00
Craig Topper
4e9457fd7d Use llvm::array_lengthof to replace sizeof(array)/sizeof(array[0]).
llvm-svn: 186301
2013-07-15 04:27:47 +00:00
Craig Topper
58fa7a9b4a Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size.
llvm-svn: 186274
2013-07-14 04:42:23 +00:00
Joerg Sonnenberger
cf53a1cf64 Reduce large list of macros to the primary platform macros. Distingiush
between ELF (Linux, FreeBSD, NetBSD) and OSX as platform for the
assembler dialect.

llvm-svn: 186252
2013-07-13 17:59:55 +00:00
Hal Finkel
f153e34eee PPC: Add some missing V_SET0 patterns
We had patterns to match v4i32 immAllZerosV -> V_SET0, but not patterns for
v8i16 (which occurs in the test case) or v16i8. The same was true for
V_SETALLONES (so I added the associated patterns for those as well).

Another bug found by llvm-stress.

llvm-svn: 186108
2013-07-11 17:43:32 +00:00
Hal Finkel
adac2cbb4a PPCDAGToDAGISel::isRunOfOnes should return false on zero
This fixes a bug (found by csmith) at -O0 where we attempt to create a RLWIMI
with an out-of-range operand. Most uses of the isRunOfOnes function are guarded
by a condition that the value is not zero. This was not true in two places, and
in both places a zero input would result in an out-of-rage MB value (= 32).

To fix this, isRunOfOnes returns false on a zero input (and I've remove one
now-redundant guard).

llvm-svn: 186101
2013-07-11 16:31:51 +00:00
Hal Finkel
b6802c0a3e PPC: Add a better comment about the i64 FI fixup
In discussing this change with Bill Schmidt, it was decided that the original
comment about negative FIs was incorrect. We'll still exclude them for now, but
now with a more-accurate explanation.

llvm-svn: 186005
2013-07-10 15:29:01 +00:00
Bill Schmidt
2499045a19 [PowerPC] Better fix for PR16556.
A more complete example of the bug in PR16556 was recently provided,
showing that the previous fix was not sufficient.  The previous fix is
reverted herein.

The real problem is that ReplaceNodeResults() uses LowerFP_TO_INT as
custom lowering for FP_TO_SINT during type legalization, without
checking whether the input type is handled by that routine.
LowerFP_TO_INT requires the input to be f32 or f64, so we fail when
the input is ppcf128.

I'm leaving the test case from the initial fix (r185821) in place, and
adding the new test as another crash-only check.

llvm-svn: 185959
2013-07-09 18:50:20 +00:00
Stephen Lin
30b326010c AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all
in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:

1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.

2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.

3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.

The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.

llvm-svn: 185956
2013-07-09 18:16:56 +00:00
Ulrich Weigand
b664c03f18 [PowerPC] Revert r185476 and fix up TLS variant kinds
In the commit message to r185476 I wrote:

>The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD
>correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD.
>This causes some confusion with the asm parser, since VK_PPC_TLSGD
>is output as @tlsgd, which is then read back in as VK_TLSGD.
>
>To avoid this confusion, this patch removes the PowerPC-specific
>modifiers and uses the generic modifiers throughout.  (The only
>drawback is that the generic modifiers are printed in upper case
>while the usual convention on PowerPC is to use lower-case modifiers.
>But this is just a cosmetic issue.)

This was unfortunately incorrect, there is is fact another,
serious drawback to using the default VK_TLSLD/VK_TLSGD
variant kinds: using these causes ELFObjectWriter::RelocNeedsGOT
to return true, which in turn causes the ELFObjectWriter to emit
an undefined reference to _GLOBAL_OFFSET_TABLE_.

This is a problem on powerpc64, because it uses the TOC instead
of the GOT, and the linker does not provide _GLOBAL_OFFSET_TABLE_,
so the symbol remains undefined.  This means shared libraries
using TLS built with the integrated assembler are currently
broken.

While the whole RelocNeedsGOT / _GLOBAL_OFFSET_TABLE_ situation
probably ought to be properly fixed at some point, for now I'm
simply reverting the r185476 commit.  Now this in turn exposes
the breakage of handling @tlsgd/@tlsld in the asm parser that
this check-in was originally intended to fix.

To avoid this regression, I'm also adding a different fix for
this problem: while common code now parses @tlsgd as VK_TLSGD,
a special hack in the asm parser translates this code to the
platform-specific VK_PPC_TLSGD that the back-end now expects.
While this is not really pretty, it's self-contained and
shouldn't hurt anything else for now.  One the underlying
problem is fixed, this hack can be reverted again.

llvm-svn: 185945
2013-07-09 16:41:09 +00:00
Ulrich Weigand
bf2db94897 [PowerPC] Support ".machine any"
The PowerPC assembler is supposed to provide a directive .machine
that allows switching the supported CPU instruction set on the fly.
Since we do not yet check CPU feature sets at all and always accept
any available instruction, this is not really useful at this point.

However, it makes sense to accept (and ignore) ".machine any" to
avoid spuriously rejecting existing assembler files that use this.

llvm-svn: 185924
2013-07-09 10:00:34 +00:00
Ulrich Weigand
1926a433be [PowerPC] Support .llong and fix .word
This adds support for the .llong PowerPC-specifc assembler directive.
In doing so, I notices that .word is currently incorrect: it is
supposed to define a 2-byte data element, not a 4-byte one.

llvm-svn: 185911
2013-07-09 07:59:25 +00:00
Hal Finkel
f972e31b6c PPC: Allocate RS spill slot for unaligned i64 load/store
This fixes another bug found by llvm-stress!

If we happen to be doing an i64 load or store into a stack slot that has less
than a 4-byte alignment, then the frame-index elimination may need to use an
indexed load or store instruction (because the offset may not be a multiple of
4, a requirement of the STD/LD instructions). The extra register needed to hold
the offset comes from the register scavenger, and it is possible that the
scavenger will need to use an emergency spill slot. As a result, we need to
make sure that a spill slot is allocated when doing an i64 load/store into a
less-than-4-byte-aligned stack slot.

Because test cases for things like this tend to be fairly fragile, I've
concatenated a few small bugpoint-reduced test cases together to form the
regression test.

llvm-svn: 185907
2013-07-09 06:34:51 +00:00
Ulrich Weigand
cb20efc341 [PowerPC] Always use "assembler dialect" 1
A setting in MCAsmInfo defines the "assembler dialect" to use.  This is used
by common code to choose between alternatives in a multi-alternative GNU
inline asm statement like the following:

  __asm__ ("{sfe|subfe} %0,%1,%2" : "=r" (out) : "r" (in1), "r" (in2));

The meaning of these dialects is platform specific, and GCC defines those
for PowerPC to use dialect 0 for old-style (POWER) mnemonics and 1 for
new-style (PowerPC) mnemonics, like in the example above.

To be compatible with inline asm used with GCC, LLVM ought to do the same.
Specifically, this means we should always use assembler dialect 1 since
old-style mnemonics really aren't supported on any current platform.

However, the current LLVM back-end uses:
  AssemblerDialect = 1;           // New-Style mnemonics.
in PPCMCAsmInfoDarwin, and
  AssemblerDialect = 0;           // Old-Style mnemonics.
in PPCLinuxMCAsmInfo.

The Linux setting really isn't correct, we should be using new-style
mnemonics everywhere.  This is changed by this commit.

Unfortunately, the setting of this variable is overloaded in the back-end
to decide whether or not we are on a Darwin target.  This is done in
PPCInstPrinter (the "SyntaxVariant" is initialized from the MCAsmInfo
AssemblerDialect setting), and also in PPCMCExpr.  Setting AssemblerDialect
to 1 for both Darwin and Linux no longer allows us to make this distinction.

Instead, this patch uses the MCSubtargetInfo passed to createPPCMCInstPrinter
to distinguish Darwin targets, and ignores the SyntaxVariant parameter.
As to PPCMCExpr, this patch adds an explicit isDarwin argument that needs
to be passed in by the caller when creating a target MCExpr.  (To do so
this patch implicitly also reverts commit 184441.)

llvm-svn: 185858
2013-07-08 20:20:51 +00:00
Hal Finkel
c4d29e61ee PPC: Mark vector CC action for SETO and SETONE as Expand
Another bug found by llvm-stress! This fixes hitting
  llvm_unreachable("Invalid integer vector compare condition");
at the end of getVCmpInst in PPCISelDAGToDAG.

llvm-svn: 185855
2013-07-08 20:00:03 +00:00
Hal Finkel
059614de7f PPC: Mark vector FREM as Expand by default
Another bug found by llvm-stress! This fixes crashing with:
  LLVM ERROR: Cannot select: v4f32 = frem ...

llvm-svn: 185840
2013-07-08 17:30:25 +00:00
Ulrich Weigand
2a068338ad [PowerPC] Support time base instructions
This adds support for the old-style time base instructions;
while new programs are supposed to use mfspr, the mftb instructions
are still supported and in use by existing assembler files.

llvm-svn: 185829
2013-07-08 15:20:38 +00:00
Ulrich Weigand
86dbe652aa [PowerPC] Support basic compare mnemonics
This adds support for the basic mnemoics (with the L operand) for the
fixed-point compare instructions.  These are defined as aliases for the
already existing CMPW/CMPD patterns, depending on the value of L.

This requires use of InstAlias patterns with immediate literal operands.
To make this work, we need two further changes:

 - define a RegisterPrefix, because otherwise literals 0 and 1 would
   be parsed as literal register names

 - provide a PPCAsmParser::validateTargetOperandClass routine to
   recognize immediate literals (like ARM does)

llvm-svn: 185826
2013-07-08 14:49:37 +00:00
Bill Schmidt
58913550ff [PowerPC] Fix PR16556 (handle undef ppcf128 in LowerFP_TO_INT).
PPCTargetLowering::LowerFP_TO_INT() expects its source operand to be
either an f32 or f64, but this is not checked.  A long double
(ppcf128) operand will normally be custom-lowered to a conversion to
f64 in this context.  However, this isn't the case for an UNDEF node.

This patch recognizes a ppcf128 as a legal source operand for
FP_TO_INT only if it's an undef, in which case it creates an undef of
the target type.

At some point we might want to do a wholesale custom lowering of
ISD::UNDEF when the type is ppcf128, but it's not really clear that's
a great idea, and probably more work than it's worth for a situation
that only arises in the case of a programming error.  At this point I
think simple is best.

The test case comes from PR16556, and is a crash-test only.

llvm-svn: 185821
2013-07-08 14:22:45 +00:00
Ulrich Weigand
cbd8a1faa9 [PowerPC] Add some special @got@tprel fixup cases
When a target@got@tprel or target@got@tprel@l symbol variant is used in
a fixup_ppc_half16 (*not* fixup_ppc_half16ds) context, we currently fail,
since the corresponding R_PPC64_GOT_TPREL16 / R_PPC64_GOT_TPREL16_LO
relocation types do not exist.

However, since such symbol variants resolve to GOT offsets which are
always 4-aligned, we can simply instead use the _DS variants of the
relocation types, which *do* exist.

The same applies for the @got@dtprel variants.

llvm-svn: 185700
2013-07-05 13:49:46 +00:00
Ulrich Weigand
4fe8cda44b [PowerPC] Support @tls in the asm parser
This adds support for the last missing construct to parse TLS-related
assembler code:
   add 3, 4, symbol@tls

The ADD8TLS currently hard-codes the @tls into the assembler string.
This cannot be handled by the asm parser, since @tls is parsed as
a symbol variant.  This patch changes ADD8TLS to have the @tls suffix
printed as symbol variant on output too, which allows us to remove
the isCodeGenOnly marker from ADD8TLS.  This in turn means that we
can add a AsmOperand to accept @tls marked symbols on input.

As a side effect, this means that the fixup_ppc_tlsreg fixup type
is no longer necessary and can be merged into fixup_ppc_nofixup.

llvm-svn: 185692
2013-07-05 12:22:36 +00:00