1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-25 05:52:53 +02:00
Commit Graph

4753 Commits

Author SHA1 Message Date
Chuang-Yu Cheng
83dddda4a1 [PPC64] Use mfocrf in prologue when we only need to save 1 nonvolatile CR field
In the ELFv2 ABI, we are not required to save all CR fields. If only one
nonvolatile CR field is clobbered, use mfocrf instead of mfcr to
selectively save the field, because mfocrf has short latency compares to
mfcr.

Thanks Nemanja's invaluable hint!
Reviewers: nemanjai tjablin hfinkel kbarton

http://reviews.llvm.org/D17749

llvm-svn: 266038
2016-04-12 03:04:44 +00:00
Chuang-Yu Cheng
6e4b4f696f CXX_FAST_TLS calling convention: performance improvement for PPC64
This is the same change on PPC64 as r255821 on AArch64. I have even borrowed
his commit message.

The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.

We add CSRsViaCopy, it will be explicitly handled during lowering.

1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given machine function and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng

http://reviews.llvm.org/D17533

llvm-svn: 265781
2016-04-08 12:04:32 +00:00
Ehsan Amiri
36d2ad6539 [PPC] Enable transformations in PPCPassConfig::addIRPasses at O2
http://reviews.llvm.org/D18562

A large number of testcases has been modified so they pass after this test.
One testcase is deleted, because I realized even after undoing the original
change that was committed with this testcase, the testcase still passes. So
I removed it. The change to one other testcase (test/CodeGen/PowerPC/pr25802.ll)
is an arbitrary change to keep it passing. Given the original intention of the
testcase, and the fact that fixing it will require some time to change the testcase,
we concluded that this quick change will be enough.

llvm-svn: 265683
2016-04-07 15:30:55 +00:00
JF Bastien
f4f5b32f44 NFC: make AtomicOrdering an enum class
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.

The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).

This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.

As a follow-up I'll add:
  bool operator<(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>(AtomicOrdering, AtomicOrdering) = delete;
  bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.

Reviewers: jyknight, reames

Subscribers: jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D18775

llvm-svn: 265602
2016-04-06 21:19:33 +00:00
Ehsan Amiri
ee69c81e47 [PPC] Use VSX/FP Facility integer load when an integer load's only users are conversion to FP
http://reviews.llvm.org/D18405

When the integer value loaded is never used directly as integer we should use VSX 
or Floating Point Facility integer loads and avoid extra direct move

llvm-svn: 265593
2016-04-06 20:12:29 +00:00
Chuang-Yu Cheng
a5a77bb95c [ppc64] Temporary disable sibling call optimization on ppc64 due to breaking test case
r265506 breaks print-stack-trace.cc test case of compiler-rt in bootstrap
test.

http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/1708

llvm-svn: 265528
2016-04-06 10:48:36 +00:00
Matthias Braun
8623ebcaee RegisterScavenger: Take a reference as enterBasicBlock() argument.
Make it obvious that the argument cannot be nullptr.
Remove an unnecessary nullptr check in initRegState.

llvm-svn: 265511
2016-04-06 02:47:09 +00:00
Chuang-Yu Cheng
de5217e247 [ppc64] Enable sibling call optimization on ppc64 ELFv1/ELFv2 abi
This patch enable sibling call optimization on ppc64 ELFv1/ELFv2 abi, and
add a couple of test cases. This patch also passed llvm/clang bootstrap
test, and spec2006 build/run/result validation.

Original issue: https://llvm.org/bugs/show_bug.cgi?id=25617

Great thanks to Tom's (tjablin) help, he contributed a lot to this patch.
Thanks Hal and Kit's invaluable opinions!

Reviewers: hfinkel kbarton

http://reviews.llvm.org/D16315

llvm-svn: 265506
2016-04-06 02:04:38 +00:00
Chuang-Yu Cheng
45f8677a56 [Power9] Implement add-pc, multiply-add, modulo, extend-sign-shift, random number, set bool, and dfp test significance
This patch implement the following instructions:
- addpcis subpcis
- maddhd maddhdu maddld
- modsw moduw modsd modud
- darn
- extswsli extswsli.
- setb
- dtstsfi dtstsfiq

Total 15 instructions

Reviewers: nemanjai hfinkel tjablin amehsan kbarton

http://reviews.llvm.org/D17885

llvm-svn: 265505
2016-04-06 01:47:02 +00:00
Chuang-Yu Cheng
dff7434b0f [Power9] Implement copy-paste, msgsync, slb, and stop instructions
This patch implements the following BookII and Book III instructions:
- copy copy_first cp_abort paste paste. paste_last
- msgsync
- slbieg slbsync
- stop

Total 10 instructions

Reviewers: nemanjai hfinkel tjablin amehsan kbarton
llvm-svn: 265504
2016-04-06 01:46:45 +00:00
Derek Schuff
e26993d076 Add MachineFunctionProperty checks for AllVRegsAllocated for target passes
Summary:
This adds the same checks that were added in r264593 to all
target-specific passes that run after register allocation.

Reviewers: qcolombet

Subscribers: jyknight, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18525

llvm-svn: 265313
2016-04-04 17:09:25 +00:00
Chuang-Yu Cheng
c754367d2e [PPC64] Bug fix: when enabling sibling-call-opt and shrink-wrapping, the tail call branch instruction might disappear
Bug Pattern:
    # BB#0:                                 # %entry
	    cmpldi	 3, 0
	    beq-	 0, .LBB0_2
    # BB#1:                                 # %exit
	    lwz 4, 0(3)
	    #TC_RETURNd8 LVComputationKind 0
    .LBB0_2:                                # %cond.false
	    mflr 0
	    std 0, 16(1)
	    stdu 1, -96(1)
    .Ltmp0:
	    .cfi_def_cfa_offset 96
    .Ltmp1:
	    .cfi_offset lr, 16
	    bl __assert_fail
	    nop

The branch instruction for tail call return is not generated, because the
shrink-wrapping pass choosing a new Restore Point: %cond.false, so %exit
block is not sent to emitEpilogue, that's why the branch is not generated.

Thanks Kit's opinions!
Reviewers: nemanjai hfinkel tjablin kbarton

http://reviews.llvm.org/D17606

llvm-svn: 265112
2016-04-01 06:44:32 +00:00
Hal Finkel
f7a5afdf0a [PowerPC] Add a late MI-level pass for QPX load/splat simplification
Chapter 3 of the QPX manual states that, "Scalar floating-point load
instructions, defined in the Power ISA, cause a replication of the source data
across all elements of the target register." Thus, if we have a load followed
by a QPX splat (from the first lane), the splat is redundant. This adds a late
MI-level pass to remove the redundant splats in some of these cases
(specifically when both occur in the same basic block).

This optimization is scheduled just prior to post-RA scheduling. It can't happen
before anything that might replace the load with some already-computed quantity
(i.e. store-to-load forwarding).

llvm-svn: 265047
2016-03-31 20:39:41 +00:00
Hans Wennborg
07d10a6e30 Change eliminateCallFramePseudoInstr() to return an iterator
This will become necessary in a subsequent change to make this method
merge adjacent stack adjustments, i.e. it might erase the previous
and/or next instruction.

It also greatly simplifies the calls to this function from Prolog-
EpilogInserter. Previously, that had a bunch of logic to resume iteration
after the call; now it just continues with the returned iterator.

Note that this changes the behaviour of PEI a little. Previously,
it attempted to re-visit the new instruction created by
eliminateCallFramePseudoInstr(). That code was added in r36625,
but I can't see any reason for it: the new instructions will obviously
not be pseudo instructions, they will not have FrameIndex operands,
and we have already accounted for the stack adjustment.

Differential Revision: http://reviews.llvm.org/D18627

llvm-svn: 265036
2016-03-31 18:33:38 +00:00
Ehsan Amiri
6ce7458c83 [PPC] basic support for Power 9 direct move instructions
http://reviews.llvm.org/D18097

Initial support does not include any patterns to generate this instructions

llvm-svn: 265031
2016-03-31 17:47:17 +00:00
Ulrich Weigand
20f5073bab [PowerPC] Correctly compute 64-bit offsets in fast isel
PPCSimplifyAddress contains this code:

  IntegerType *OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(*Context)
                                            : Type::getInt64Ty(*Context));

to determine the type to be used for an index register, if one needs
to be created.  However, the "VT" here is the type of the data being
loaded or stored, *not* the type of an address.  This means that if
a data element of type i32 is accessed using an index that does not
not fit into 32 bits, a wrong address is computed here.

Note that PPCFastISel is only ever used on 64-bit currently, so the type
of an address is actually *always* MVT::i64.  Other parts of the code,
even in this same PPCSimplifyAddress routine, already rely on that fact.
Thus, this patch changes the code to simply unconditionally use
Type::getInt64Ty(*Context) as OffsetTy.

llvm-svn: 265023
2016-03-31 15:37:06 +00:00
Nemanja Ivanovic
bbd7e26280 [PowerPC] Basic support for P9 atomic loads and stores
This patch corresponds to review:
http://reviews.llvm.org/D18032

This patch provides asm implementation for the following instructions:
lwat, ldat, stwat, stdat, ldmx, mcrxrx

llvm-svn: 265022
2016-03-31 15:26:37 +00:00
Ulrich Weigand
ef758401da [PowerPC] Remove incorrect use of COPY_TO_REGCLASS in fast isel
The fast isel pass currently emits a COPY_TO_REGCLASS node to convert
from a F4RC to a F8RC register class during conversion of a
floating-point number to integer. There is actually no support in the
common code instruction printers to emit COPY_TO_REGCLASS nodes, so the
PowerPC back-end has special code there to simply ignore
COPY_TO_REGCLASS.

This is correct *if and only if* the source and destination registers of
COPY_TO_REGCLASS are the same (except for the different register class).
But nothing guarantees this to be the case, and if the register
allocator does end up allocating source and destination to different
registers after all, the back-end simply generates incorrect code. I've
included a test case that shows such incorrect code generation.

However, it seems that COPY_TO_REGCLASS is actually not intended to be
used at the MI layer at all. It is used during SelectionDAG, but always
lowered to a plain COPY before emitting MI. Other back-end's fast isel
passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong
for the PowerPC back-end to emit it here.

This patch changes the PowerPC back-end to directly emit COPY instead of
COPY_TO_REGCLASS and removes the special handling in the instruction
printers.

Differential Revision: http://reviews.llvm.org/D18605

llvm-svn: 265020
2016-03-31 14:44:50 +00:00
Hal Finkel
fb163aaea6 [PowerPC] Load two floats directly instead of using one 64-bit integer load
When dealing with complex<float>, and similar structures with two
single-precision floating-point numbers, especially when such things are being
passed around by value, we'll sometimes end up loading both float values by
extracting them from one 64-bit integer load. It looks like this:

  t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64
      t16: i64 = srl t13, Constant:i32<32>
    t17: i32 = truncate t16
  t18: f32 = bitcast t17
    t19: i32 = truncate t13
  t20: f32 = bitcast t19

The problem, especially before the P8 where those bitcasts aren't legal (and
get expanded via the stack), is that it would have been better to use two
floating-point loads directly. Here we add a target-specific DAGCombine to do
just that. In short, we turn:

	ld 3, 0(5)
	stw 3, -8(1)
	rldicl 3, 3, 32, 32
	stw 3, -4(1)
	lfs 3, -4(1)
	lfs 0, -8(1)

into:

        lfs 3, 4(5)
        lfs 0, 0(5)

llvm-svn: 264988
2016-03-31 02:56:05 +00:00
Nirav Dave
7a0e387b04 Remove HasFnAttribute guards to getFnAttribute calls
These checks are redundant and can be removed

Reviewers: hans

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D18564

llvm-svn: 264872
2016-03-30 15:41:12 +00:00
Adam Nemet
858430e4c1 [PPC] Remove -ppc-loop-prefetch-distance in favor of -prefetch-distance
After the previous change, this can now be overridden centrally in the
pass.

llvm-svn: 264807
2016-03-29 23:45:56 +00:00
Hal Finkel
82fdd00490 [PowerPC] Refactor popcnt[dw] target features
Instead of using two feature bits, one to indicate the availability of the
popcnt[dw] instructions, and another to indicate whether or not they're fast,
use a single enum. This allows more consistent control via target attribute
strings, and via Clang's command line.

llvm-svn: 264690
2016-03-29 01:36:01 +00:00
Hal Finkel
87e110c9e5 [PowerPC] Clarify a comment in PPCTTI about vector loads
This should say that we could do unaligned vector loads on the P7 using VSX
instructions, not that we should.

llvm-svn: 264683
2016-03-28 22:39:35 +00:00
Hal Finkel
9df9f25590 [PowerPC] On the A2, popcnt[dw] are very slow
The A2 cores support the popcntw/popcntd instructions, but they're microcoded,
and slower than our default software emulation. Specifically, popcnt[dw] take
approximately 74 cycles, whereas our software emulation takes only 24-28
cycles.

I've added a new target feature to indicate a slow popcnt[dw], instead of just
removing the existing target feature from the a2/a2q processor models, because:
  1. This allows us to return more accurate information via the TTI interface
     (I recognize that this currently makes no practical difference)
  2. Is hopefully easier to understand (it allows the core's features to match
     its manual while still having the desired effect).

llvm-svn: 264600
2016-03-28 17:52:08 +00:00
Chuang-Yu Cheng
946c8b7db0 [Power9] Implement new altivec instructions: bcd* series
This patch implements the following altivec instructions:

- Decimal Convert From/to National/Zoned/Signed-QWord:
    bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq.

- Decimal Copy-Sign/Set-Sign:
    bcdcpsgn. bcdsetsgn.

- Decimal Shift/Unsigned-Shift/Shift-and-Round:
    bcds. bcdus. bcdsr.

- Decimal (Unsigned) Truncate:
    bcdtrunc. bcdutrunc.

Total 13 instructions

Thanks Amehsan's advice! Thanks Kit's great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D17838

llvm-svn: 264568
2016-03-28 09:04:23 +00:00
Chuang-Yu Cheng
86334cecad [Power9] Implement new vsx instructions: insert, extract, test data class, min/max, reverse, permute, splat
This change implements the following vsx instructions:

- Scalar Insert/Extract
    xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp

- Vector Insert/Extract
    xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp
    xxextractuw xxinsertw

- Scalar/Vector Test Data Class
    xststdcdp xststdcsp xststdcqp
    xvtstdcdp xvtstdcsp

- Maximum/Minimum
    xsmaxcdp xsmaxjdp
    xsmincdp xsminjdp

- Vector Byte-Reverse/Permute/Splat
    xxbrd xxbrh xxbrq xxbrw
    xxperm xxpermr
    xxspltib

30 instructions

Thanks Nemanja for invaluable discussion! Thanks Kit's great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D16842

llvm-svn: 264567
2016-03-28 08:34:28 +00:00
Chuang-Yu Cheng
f63e6b8f62 [Power9] Implement new vsx instructions: quad-precision move, fp-arithmetic
This change implements the following vsx instructions:

- quad-precision move
    xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp

- quad-precision fp-arithmetic
    xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o)
    xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o)

22 instructions

Thanks Nemanja and Kit for careful review and invaluable discussion!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D16110

llvm-svn: 264565
2016-03-28 07:38:01 +00:00
Hal Finkel
10aaab9a43 [PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based loop legality
Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc
function calls (fmax[f], etc.) generally map to function calls after lowering.
For some vector types with QPX at least, however, we can legally lower these,
and we don't need to prohibit CTR-based loops on their account.

It turned out, however, that the logic that checked the opcodes associated with
intrinsics was broken (it would set the Opcode variable, but that variable was
later checked only if set for some otherwise-external function call.

This fixes the latter problem and adds the FMAX/MINNUM mappings.

llvm-svn: 264532
2016-03-27 05:40:56 +00:00
David Majnemer
e94081b48d [PowerPC] Disable the CTR optimization in the presence of {min,max}num
The minnum and maxnum intrinsics get lowered to libcalls which
invalidates the CTR optimization.

This fixes PR27083.

llvm-svn: 264508
2016-03-26 09:42:31 +00:00
Chuang-Yu Cheng
e4ba18d52f [Power9] Implement new altivec instructions: permute, count zero, extend sign, negate, parity, shift/rotate, mul10
This change implements the following vector operations:
- vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw
- vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d
- vnegd vnegw
- vprtybd vprtybq vprtybw
- vbpermd vpermr
- vrlwnm vrlwmi vrldnm vrldmi vslv vsrv
- vmul10cuq vmul10uq vmul10ecuq vmul10euq

28 instructions

Thanks Nemanja, Kit for invaluable hints and discussion!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

Phabricator: http://reviews.llvm.org/D15887
llvm-svn: 264504
2016-03-26 05:46:11 +00:00
Eric Christopher
6ddb84b162 Finish the incomplete 'd' inline asm constraint support for PPC by
making sure we give it a register and mark it as a register constraint.

llvm-svn: 264340
2016-03-24 21:04:52 +00:00
Nemanja Ivanovic
7d010d19df [PowerPC] Disable direct moves for extractelement and bitcast in 32-bit mode
This patch corresponds to review:
http://reviews.llvm.org/D17711

It disables direct moves on these operations in 32-bit mode since the patterns
assume 64-bit registers. The final patch is slightly different from the
Phabricator review as the bitcast operations needed to be disabled in 32-bit
mode as well. This fixes PR26617.

llvm-svn: 264282
2016-03-24 13:40:33 +00:00
Kyle Butt
0c8ea793c0 Codegen: [PPC] Word Rotates are Zero Extending.
Add Word rotates to the list of instructions that are zero extending.
This allows them to be used in dot form to compare with zero.

llvm-svn: 264183
2016-03-23 19:51:22 +00:00
Ehsan Amiri
bce337dbba adding another optimization opportunity to readme file
llvm-svn: 263775
2016-03-18 04:02:25 +00:00
Tim Shen
695f1e65cb [PPC, FastISel] Fix ordered/unordered fcmp
For fcmp, major concern about the following 6 cases is NaN result. The
comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered),
only one of which will be set. The result is generated by fcmpu
instruction. However, bc instruction only inspects one of the first 3
bits, so when un is set, bc instruction may jump to to an undesired
place.

More specifically, if we expect an unordered comparison and un is set, we
expect to always go to true branch; in such case UEQ, UGT and ULT still
give false, which are undesired; but UNE, UGE, ULE happen to give true,
since they are tested by inspecting !eq, !lt, !gt, respectively.

Similarly, for ordered comparison, when un is set, we always expect the
result to be false. In such case OGT, OLT and OEQ is good, since they are
actually testing GT, LT, and EQ respectively, which are false. OGE, OLE
and ONE are tested through !lt, !gt and !eq, and these are true.

llvm-svn: 263753
2016-03-17 22:27:58 +00:00
Petar Jovanovic
f95962007a [PowerPC] Disable CTR loops optimization for soft float operations
This patch prevents CTR loops optimization when using soft float operations
inside loop body. Soft float operations use function calls, but function
calls are not allowed inside CTR optimized loops.

Patch by Aleksandar Beserminji.

Differential Revision: http://reviews.llvm.org/D17600

llvm-svn: 263727
2016-03-17 17:11:33 +00:00
James Y Knight
4b163a5aa3 Tweak some atomics functions in preparation for larger changes; NFC.
- Rename getATOMIC to getSYNC, as llvm will soon be able to emit both
  '__sync' libcalls and '__atomic' libcalls, and this function is for
  the '__sync' ones.

- getInsertFencesForAtomic() has been replaced with
  shouldInsertFencesForAtomic(Instruction), so that the decision can be
  made per-instruction. This functionality will be used soon.

- emitLeadingFence/emitTrailingFence are no longer called if
  shouldInsertFencesForAtomic returns false, and thus don't need to
  check the condition themselves.

llvm-svn: 263665
2016-03-16 22:12:04 +00:00
Sanjay Patel
f7ad46820f [DAG] use !isUndef() ; NFCI
llvm-svn: 263453
2016-03-14 18:09:43 +00:00
Sanjay Patel
f22bc14a47 [DAG] use isUndef() ; NFCI
llvm-svn: 263448
2016-03-14 17:28:46 +00:00
Nemanja Ivanovic
eaa9a4f81a Fix for PR 26378
This patch corresponds to review:
http://reviews.llvm.org/D17712

We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This
caused duplicate definition asserts when the pass is reused on the module
(i.e. with -compile-twice or in JIT contexts).

llvm-svn: 263338
2016-03-12 10:23:07 +00:00
Kit Barton
0f693412b6 [PPC] backend changes to generate xvabs[s,d]p and xvnabs[s,d]p instructions
This has to be committed before the FE changes

Phabricator: http://reviews.llvm.org/D17837
llvm-svn: 263035
2016-03-09 17:48:01 +00:00
Kit Barton
8b3a860234 [Power9] Implement new vsx instructions: load, store instructions for vector and scalar
We follow the comments mentioned in http://reviews.llvm.org/D16842#344378 to
implement this new patch.

This patch implements the following vsx instructions:

Vector load/store:
lxv lxvx lxvb16x lxvl lxvll lxvh8x lxvwsx
stxv stxvb16x stxvh8x stxvl stxvll stxvx
Scalar load/store:
lxsd lxssp lxsibzx lxsihzx
stxsd stxssp stxsibx stxsihx
21 instructions

Phabricator: http://reviews.llvm.org/D16919
llvm-svn: 262906
2016-03-08 03:49:13 +00:00
Richard Smith
2ea1e68515 A couple more UB fixes for C++14 sized deallocation.
llvm-svn: 262891
2016-03-08 00:59:44 +00:00
Tim Shen
464ed3d905 [PPCVSXFMAMutate] Temporarily disable this pass
llvm-svn: 262573
2016-03-03 01:27:35 +00:00
Kit Barton
4d7130da2e [Power9] Implement new vector compare, extract, insert instructions
This change implements the following vector operations:

  - Vector Compare Not Equal
    - vcmpneb(.) vcmpneh(.) vcmpnew(.)
    - vcmpnezb(.) vcmpnezh(.) vcmpnezw(.)
  - Vector Extract Unsigned
    - vextractub vextractuh vextractuw vextractd
    - vextublx vextubrx vextuhlx vextuhrx vextuwlx vextuwrx
  - Vector Insert
    - vinsertb vinserth vinsertw vinsertd

26 instructions.

Phabricator: http://reviews.llvm.org/D15916
llvm-svn: 262392
2016-03-01 20:51:57 +00:00
Kit Barton
977c191475 New file to track implementation status of new POWER9 instructions
llvm-svn: 262386
2016-03-01 20:19:43 +00:00
Matthias Braun
6958800987 TableGen: Check scheduling models for completeness
TableGen checks at compiletime that for scheduling models with
"CompleteModel = 1" one of the following holds:

- Is marked with the hasNoSchedulingInfo flag
- The instruction is a subclass of Sched
- There are InstRW definitions in the scheduling model

Typical steps necessary to complete a model:

- Ensure all pseudo instructions that are expanded before machine
  scheduling (usually everything handled with EmitYYY() functions in
  XXXTargetLowering).
- If a CPU does not support some instructions mark the corresponding
  resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }".
- Add missing scheduling information.

Differential Revision: http://reviews.llvm.org/D17747

llvm-svn: 262384
2016-03-01 20:03:21 +00:00
Nemanja Ivanovic
f9bc1afbcf Fix for PR26180
Corresponds to Phabricator review:
http://reviews.llvm.org/D16592

This fix includes both an update to how we handle the "generic" CPU on LE
systems as well as Anton's fix for the Fast Isel issue.

llvm-svn: 262233
2016-02-29 16:42:27 +00:00
Duncan P. N. Exon Smith
779625a4a6 CodeGen: Change MachineInstr to use MachineInstr&, NFC
Change MachineInstr API to prefer MachineInstr& over MachineInstr*
whenever the parameter is expected to be non-null.  Slowly inching
toward being able to fix PR26753.

llvm-svn: 262149
2016-02-27 20:01:33 +00:00
Duncan P. N. Exon Smith
13c519204e CodeGen: Take MachineInstr& in SlotIndexes and LiveIntervals, NFC
Take MachineInstr by reference instead of by pointer in SlotIndexes and
the SlotIndex wrappers in LiveIntervals.  The MachineInstrs here are
never null, so this cleans up the API a bit.  It also incidentally
removes a few implicit conversions from MachineInstrBundleIterator to
MachineInstr* (see PR26753).

At a couple of call sites it was convenient to convert to a range-based
for loop over MachineBasicBlock::instr_begin/instr_end, so I added
MachineBasicBlock::instrs.

llvm-svn: 262115
2016-02-27 06:40:41 +00:00
Kit Barton
f070e5e85c [PPC] Legalize FNEG on PPC when possible
Currently we always expand ISD::FNEG. For v4f32 and v2f64 vector types VSX has
native support for this opcode

Phabricator: http://reviews.llvm.org/D17647
llvm-svn: 262079
2016-02-26 21:59:44 +00:00
Kit Barton
e51a13ad8a Power9] Implement new vsx instructions: compare and conversion
This change implements the following vsx instructions:

Quad/Double-Precision Compare:
xscmpoqp xscmpuqp
xscmpexpdp xscmpexpqp
xscmpeqdp xscmpgedp xscmpgtdp xscmpnedp
xvcmpnedp(.) xvcmpnesp(.)
Quad-Precision Floating-Point Conversion
xscvqpdp(o) xscvdpqp
xscvqpsdz xscvqpswz xscvqpudz xscvqpuwz xscvsdqp xscvudqp
xscvdphp xscvhpdp xvcvhpsp xvcvsphp
xsrqpi xsrqpix xsrqpxp
28 instructions

Phabricator: http://reviews.llvm.org/D16709
llvm-svn: 262068
2016-02-26 21:11:55 +00:00
Aaron Ballman
96f0a702b2 Silencing a signed vs unsigned mismatch.
llvm-svn: 261640
2016-02-23 15:02:43 +00:00
Duncan P. N. Exon Smith
53cb4596f6 CodeGen: TII: Take MachineInstr& in predicate API, NFC
Change TargetInstrInfo API to take `MachineInstr&` instead of
`MachineInstr*` in the functions related to predicated instructions
(I'll try to come back later and get some of the rest).  All of these
functions require non-null parameters already, so references are more
clear.  As a bonus, this happens to factor away a host of implicit
iterator => pointer conversions.

No functionality change intended.

llvm-svn: 261605
2016-02-23 02:46:52 +00:00
Nemanja Ivanovic
662ab414aa Fix for PR26690 take 2
This is what was meant to be in the initial commit to fix this bug. The
parens were missing. This commit also adds a test case for the bug and
has undergone full testing on PPC and X86.

llvm-svn: 261546
2016-02-22 18:04:00 +00:00
Nemanja Ivanovic
752e2bfba7 Revert bad fix for PR26690.
llvm-svn: 261527
2016-02-22 15:06:32 +00:00
Nemanja Ivanovic
b136ac47cd Fix for PR26690
I mistook BitVector::empty() to mean BitVector::count() == 0 and it does
not. Corrected the issue with the fix for PR26500.

llvm-svn: 261525
2016-02-22 14:47:49 +00:00
Benjamin Kramer
e5027dce2c Fix some abuse of auto flagged by clang's -Wrange-loop-analysis.
llvm-svn: 261524
2016-02-22 13:11:58 +00:00
Nemanja Ivanovic
6ba5cb86b8 Fix the build bot break caused by rL261441.
The patch has a necessary call to a function inside an assert. Which is fine
when you have asserts turned on. Not so much when they're off. Sorry about
the regression.

llvm-svn: 261447
2016-02-20 20:45:37 +00:00
Nemanja Ivanovic
57bd7dee35 Fix for PR 26500
This patch corresponds to review:
http://reviews.llvm.org/D17294

It ensures that whatever block we are emitting the prologue/epilogue into, we
have the necessary scratch registers. It takes away the hard-coded register
numbers for use as scratch registers as registers that are guaranteed to be
available in the function prologue/epilogue are not guaranteed to be available
within the function body. Since we shrink-wrap, the prologue/epilogue may end
up in the function body.

llvm-svn: 261441
2016-02-20 18:16:25 +00:00
Richard Trieu
5a759985de Remove uses of builtin comma operator.
Cleanup for upcoming Clang warning -Wcomma.  No functionality change intended.

llvm-svn: 261270
2016-02-18 22:09:30 +00:00
Adam Nemet
27b8897111 [PPCLoopDataPrefetch] Move pass to Transforms/Scalar/LoopDataPrefetch. NFC
This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).

Obviously the pass still only used from PPC at this point.  Subsequent
patches will start driving this from ARM64 as well.

Due to the previous patch most lines should show up as moved lines.

llvm-svn: 261265
2016-02-18 21:38:19 +00:00
Adam Nemet
f9d4c08808 [PPCLoopDataPrefetch] Remove PPC from some of the names. NFC
This is done only to make the next patch that move the pass out PPC to
Transforms easier to read.  After this most line should show up as moved
lines in that patch.

This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).

llvm-svn: 261264
2016-02-18 21:37:12 +00:00
Ahmed Bougacha
7c87497de6 [CodeGen] Document and use getConstant's splat-building feature. NFC.
Differential Revision: http://reviews.llvm.org/D17229

llvm-svn: 260901
2016-02-15 18:07:29 +00:00
Colin LeMahieu
db6a39c7b6 [MC] Merge VK_PPC_TPREL in to generic VK_TPREL.
Differential Revision: http://reviews.llvm.org/D17038

llvm-svn: 260401
2016-02-10 18:32:01 +00:00
Nemanja Ivanovic
3a405a94b2 Fix for PR 26193
This is a simple fix for a PowerPC intrinsic that was incorrectly defined
(the return type was incorrect).

llvm-svn: 259886
2016-02-05 14:50:29 +00:00
Nemanja Ivanovic
b7bc445a9f Fix for PR 26356
Using the load immediate only when the immediate (whether signed or unsigned)
can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and
0 to 65535 for unsigned. This patch also ensures that we sign-extend under the
right conditions.

llvm-svn: 259840
2016-02-04 23:14:42 +00:00
Nemanja Ivanovic
3fb0b09e1f Fix for PR 26381
Simple fix - Constant values were not being sign extended in FastIsel.

llvm-svn: 259645
2016-02-03 12:53:38 +00:00
Kyle Butt
77f7a2b7a8 Codegen: [PPC] Fix PPCVSXFMAMutate to handle duplicates.
The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms
on PPC.

    %vreg6<def> = COPY %vreg96
    %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7
    ;v6 = v6 + v5 * v7

is replaced by

    %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96
    ;v5 = v5 * v7 + v96

This was broken in the case where the target register was also used as a
multiplicand. Fix this case by checking for it and replacing both uses
with the copied register.

    %vreg6<def> = COPY %vreg96
    %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6
    ;v6 = v6 + v5 * v6

is replaced by

    %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96
    ;v5 = v5 * v96 + v96

llvm-svn: 259617
2016-02-03 01:41:09 +00:00
Eric Christopher
aa54c26bb0 Refactor common code for PPC fast isel load immediate selection.
llvm-svn: 259178
2016-01-29 07:20:30 +00:00
Eric Christopher
14a36607b8 Since LI/LIS sign extend the constant passed into the instruction we should
check that the sign extended constant fits into 16-bits if we want a
zero extended value, otherwise go ahead and put it together piecemeal.

Fixes PR26356.

llvm-svn: 259177
2016-01-29 07:20:01 +00:00
Eric Christopher
2ac8efe601 Fix up conditional formatting.
llvm-svn: 259176
2016-01-29 07:19:49 +00:00
Adam Nemet
b70814180d [TTI] Add getPrefetchDistance from PPCLoopDataPrefetch, NFC
This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).

As it was discussed in the above thread, getPrefetchDistance is
currently using instruction count which may change in the future.

llvm-svn: 258995
2016-01-27 22:21:25 +00:00
Benjamin Kramer
1d1115c0c4 Rename TargetSelectionDAGInfo into SelectionDAGTargetInfo and move it to CodeGen/
It's a SelectionDAG thing, not a Target thing.

llvm-svn: 258939
2016-01-27 16:32:26 +00:00
Benjamin Kramer
e3023baf50 Move MCTargetAsmParser.h to llvm/MC/MCParser where it belongs.
llvm-svn: 258917
2016-01-27 10:01:28 +00:00
Chris Bieneman
1b8d4f74aa Remove autoconf support
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html

"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi

Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark

Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D16471

llvm-svn: 258861
2016-01-26 21:29:08 +00:00
Benjamin Kramer
75511fc092 Reflect the MC/MCDisassembler split on the include/ level.
No functional change, just moving code around.

llvm-svn: 258818
2016-01-26 16:44:37 +00:00
Adam Nemet
2628dc52d2 [TTI] Add getCacheLineSize
Summary:
And use it in PPCLoopDataPrefetch.cpp.

@hfinkel, please let me know if your preference would be to preserve the
ppc-loop-prefetch-cache-line option in order to be able to override the
value of TTI::getCacheLineSize for PPC.

Reviewers: hfinkel

Subscribers: hulx2000, mcrosier, mssimpso, hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D16306

llvm-svn: 258419
2016-01-21 18:28:36 +00:00
Manuel Jacob
e6438acb66 GlobalValue: use getValueType() instead of getType()->getPointerElementType().
Reviewers: mjacob

Subscribers: jholewinski, arsenm, dsanders, dblaikie

Patch by Eduard Burtescu.

Differential Revision: http://reviews.llvm.org/D16260

llvm-svn: 257999
2016-01-16 20:30:46 +00:00
Kyle Butt
a60f1430f3 Codegen: [PPC] Silence false-positive initialization warning. NFC
Some compilers don't do exhaustive switch checking. For those compilers,
add an initialization to prevent un-initialized variable warnings from
firing. For compilers with exhaustive switch checking, we still get a
guarantee that the switch is exhaustive, and hence the initializations
are redundant, and a non-functional change.

llvm-svn: 257923
2016-01-15 19:20:06 +00:00
JF Bastien
56616e9461 WebAssembly: fix build break introduced by ELFObjectWriter churn
llvm-svn: 257709
2016-01-13 23:36:00 +00:00
Rafael Espindola
7f81c885ac Convert a few assert failures into proper errors.
Fixes PR25944.

llvm-svn: 257697
2016-01-13 22:56:57 +00:00
Ulrich Weigand
257810adf2 [PowerPC] Fix large code model with the ELFv2 ABI
The global entry point prologue currently assumes that the TOC
associated with a function is less than 2GB away from the function
entry point.  This is always true when using the medium or small
code model, but may not be the case when using the large code model.

This patch adds a new variant of the ELFv2 global entry point prologue
that lifts the 2GB restriction when building with -mcmodel=large.
This works by emitting a quadword containing the distance from the
function entry point to its associated TOC immediately before the
entry point, and then using a prologue like:

ld r2,-8(r12)
add r2,r2,r12

Since creation of the entry point prologue is now split across two
separate routines (PPCLinuxAsmPrinter::EmitFunctionEntryLabel emits
the data word, PPCLinuxAsmPrinter::EmitFunctionBodyStart the prolog
code), I've switched to using named labels instead of just temporaries
to indicate the locations of the global and local entry points and the
new TOC offset data word.

These names are provided by new routines in PPCFunctionInfo modeled
after the existing PPCFunctionInfo::getPICOffsetSymbol.

Note that a corresponding change was committed to GCC here:
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00355.html

Reviewers: hfinkel

Differential Revision: http://reviews.llvm.org/D15500

llvm-svn: 257597
2016-01-13 13:12:23 +00:00
Kyle Butt
f1fe33f755 Codegen: [PPC] Handle weighted comparisons when inserting selects.
Only non-weighted predicates were handled in PPCInstrInfo::insertSelect. Handle
the weighted predicates as well.

This latent bug was triggered by r255398, because it added use of the
branch-weighted predicates.

While here, switch over an enum instead of an int to get the compiler to enforce
totality in the future.

llvm-svn: 257518
2016-01-12 21:00:43 +00:00
Nemanja Ivanovic
ec3aeea01e Prevent renaming of CR fields in AADB when a CR restore is present
This patch corresponds to review:
http://reviews.llvm.org/D15930

Moves to and from CR fields depend on shifts/masks that depend on the
target/source CR field. Thus, post-ra anti-dep breaking must not later
change that CR register assignment.

llvm-svn: 257168
2016-01-08 13:09:54 +00:00
Kyle Butt
70e01b07f9 Add call sequence start and end for __tls_get_addr
This is a fix for bug http://llvm.org/bugs/show_bug.cgi?id=25839.

For a PIC TLS variable access in a function, prologue (mflr followed by std and
stdu) gets scheduled after a tls_get_addr call. tls_get_addr messed up LR but
no one saves/restores it.

Also added a test for save/restore clobbered registers during calling __tls_get_addr.

Patch by Tim Shen

llvm-svn: 257137
2016-01-08 02:06:19 +00:00
Alexander Kornienko
c882727d66 Refactor: Simplify boolean conditional return statements in lib/Target/PowerPC
Summary: Use clang-tidy to simplify boolean conditional return statements

Reviewers: uweigand, rafael, wschmidt

Subscribers: craig.topper, llvm-commits

Patch by Richard Thomson!

Differential Revision: http://reviews.llvm.org/D9984

llvm-svn: 256493
2015-12-28 13:38:42 +00:00
Craig Topper
8ac345d153 Remove extra forward declarations and scrub includes for all in tree InstPrinters. NFC
llvm-svn: 256427
2015-12-25 22:10:01 +00:00
Cong Hou
50c405416c [BPI] Replace weights by probabilities in BPI.
This patch removes all weight-related interfaces from BPI and replace
them by probability versions. With this patch, we won't use edge weight
anymore in either IR or MC passes. Edge probabilitiy is a better
representation in terms of CFG update and validation.


Differential revision: http://reviews.llvm.org/D15519 

llvm-svn: 256263
2015-12-22 18:56:14 +00:00
Rafael Espindola
4648b3e87c Simplify. NFC.
llvm-svn: 255894
2015-12-17 14:19:52 +00:00
Ahmed Bougacha
34e91bd86a [CodeGen] Make MachineInstrBuilder::copyImplicitOps const. NFC.
This matches the other MIB methods, none of which modify the builder.
Without this, we can't chain copyImplicitOps.
Also reformat the few users, in PPCEarlyReturn.

llvm-svn: 255828
2015-12-16 22:15:30 +00:00
Justin Bogner
621a2ef540 LPM: Stop threading Pass * through all of the loop utility APIs. NFC
A large number of loop utility functions take a `Pass *` and reach
into it to find out which analyses to preserve. There are a number of
problems with this:

- The APIs have access to pretty well any Pass state they want, so
  it's hard to tell what they may or may not do.

- Other APIs have copied these and pass around a `Pass *` even though
  they don't even use it. Some of these just hand a nullptr to the API
  since the callers don't even have a pass available.

- Passes in the new pass manager don't work like the current ones, so
  the APIs can't be used as is there.

Instead, we should explicitly thread the analysis results that we
actually care about through these APIs. This is both simpler and more
reusable.

llvm-svn: 255669
2015-12-15 19:40:57 +00:00
Nemanja Ivanovic
3c08538d6e Bitcasts between FP and INT values using direct moves
This patch corresponds to review:
http://reviews.llvm.org/D15286

This patch was meant to land in revision 255246, but I accidentally uploaded
the patch that corresponds to http://reviews.llvm.org/D15372 in that revision
accidentally.

Thereby, this patch is the actual Bitcasts using direct moves patch, whereas
http://reviews.llvm.org/rL255246 actually corresponds to
http://reviews.llvm.org/D15372.

llvm-svn: 255649
2015-12-15 14:50:34 +00:00
Nemanja Ivanovic
d1b01f2c52 Define a feature for __float128 support in the PPC back end
This patch corresponds to review:
http://reviews.llvm.org/D15117

In preparation for supporting IEEE Quad precision floating point,
this patch simply defines a feature to specify the target supports this.
For now, nothing is done with the target feature, we just don't want
warnings from the Clang FE when a user specifies -mfloat128.
Calling convention and other related work will add to this patch in
the near future.

llvm-svn: 255642
2015-12-15 12:19:34 +00:00
Petar Jovanovic
6b5f6d05bd [Power PC] llvm soft float support for ppc32
This is the second in a set of patches for soft float support for ppc32,
it enables soft float operations.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D13700

llvm-svn: 255516
2015-12-14 17:57:33 +00:00
Chad Rosier
7d9d3a541b [PPC] Early exit loop. NFC.
llvm-svn: 255497
2015-12-14 14:44:06 +00:00
Cong Hou
b76a7a8dcb Normalize MBB's successors' probabilities in several locations.
This patch adds some missing calls to MBB::normalizeSuccProbs() in several
locations where it should be called. Those places are found by checking if the
sum of successors' probabilities is approximate one in MachineBlockPlacement
pass with some instrumented code (not in this patch).


Differential revision: http://reviews.llvm.org/D15259

llvm-svn: 255455
2015-12-13 09:26:17 +00:00
Hal Finkel
a39d5a985d [PowerPC] OutStreamer cleanup in PPCAsmPrinter
We don't need to pass OutStreamer as a parameter to LowerSTACKMAP and
LowerPATCHPOINT. It is a member variable of PPCAsmPrinter, and thus, is already
available. NFC.

llvm-svn: 255418
2015-12-12 01:47:08 +00:00
Hal Finkel
28f5cfb976 [PowerPC] Add Branch Hints for Highly-Biased Branches
This branch adds hints for highly biased branches on the PPC architecture. Even
in absence of profiling information, LLVM will mark code reaching unreachable
terminators and other exceptional control flow constructs as highly unlikely to
be reached.

Patch by Tom Jablin!

llvm-svn: 255398
2015-12-12 00:32:00 +00:00
Matt Arsenault
fad94dae85 Start replacing vector_extract/vector_insert with extractelt/insertelt
These are redundant pairs of nodes defined for
INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
insertelement/extractelement are slightly closer to the corresponding
C++ node name, and has stricter type checking so prefer it.

Update targets to only use these nodes where it is trivial to do so.
AArch64, ARM, and Mips all have various type errors on simple replacement,
so they will need work to fix.

Example from AArch64:

def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
          (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;

Which is trying to do sext_inreg i8, i8.

llvm-svn: 255359
2015-12-11 19:20:16 +00:00