1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 20:12:56 +02:00
Commit Graph

8015 Commits

Author SHA1 Message Date
Rafael Espindola
b9807cdcf1 Replace more uses of sse41 with sse4.1.
llc using the host cpu features and *waning* on unknown features is probably
not a good thing :-(

llvm-svn: 189144
2013-08-23 20:39:19 +00:00
Rafael Espindola
e8f25b8c77 Update a test that I missed in the previous commit.
llvm-svn: 189143
2013-08-23 20:27:02 +00:00
Rafael Espindola
112bdc8929 Rename features to match what gcc and clang use.
There is no advantage in being different and using the same names simplifies
clang a bit.

llvm-svn: 189141
2013-08-23 20:21:34 +00:00
Joey Gouly
25375f9ffb [ARM] Fix another ARM FastISel -verify-machineinstrs issue.
llvm-svn: 189109
2013-08-23 15:20:56 +00:00
Joey Gouly
9ebd1c7d68 [ARMv8] Add CodeGen for VMAXNM/VMINNM.
llvm-svn: 189103
2013-08-23 12:01:13 +00:00
Richard Sandiford
9867b44c59 [SystemZ] Add basic prefetch support
Just the instructions and intrinsics for now.

llvm-svn: 189100
2013-08-23 11:36:42 +00:00
Richard Sandiford
152d2f09a8 [SystemZ] Try reversing comparisons whose first operand is in memory
This allows us to make more use of the many compare reg,mem instructions.

llvm-svn: 189099
2013-08-23 11:27:19 +00:00
Richard Sandiford
de9eba2208 [SystemZ] Prefer LHI;ST... over LAY;MV...
If we had a store of an integer to memory, and the integer and store size
were suitable for a form of MV..., we used MV... no matter what.  We could
then have sequences like:

    lay %r2, 0(%r3,%r4)
    mvi 0(%r2), 4

In these cases it seems better to force the constant into a register
and use a normal store:

    lhi %r2, 4
    stc %r2, 0(%r3, %r4)

since %r2 is more likely to be hoisted and is easier to rematerialize.

llvm-svn: 189098
2013-08-23 11:18:53 +00:00
Richard Sandiford
b195d89bde Turn MipsOptimizeMathLibCalls into a target-independent scalar transform
...so that it can be used for z too.  Most of the code is the same.
The only real change is to use TargetTransformInfo to test when a sqrt
instruction is available.

The pass is opt-in because at the moment it only handles sqrt.

llvm-svn: 189097
2013-08-23 10:27:02 +00:00
Tim Northover
7c24b95efe ARM: make sure ARM-mode pseudo-inst requires IsARM
I'd forgotten that "Requires" blocks override rather than add to the
constraints, so my pseudo-instruction was being selected in Thumb mode leading
to nonsense instructions.

rdar://problem/14817358

llvm-svn: 189096
2013-08-23 10:16:39 +00:00
Michael Gottesman
cb2cf901dc [stack protector] Work around an issue with the BMOVPCB_CALL instruction on ARM by disabling does not return on __stack_chk_fail.
This is to fix the bots while I look to see if there is something I can do here.

rdar://14811848

llvm-svn: 189076
2013-08-22 23:45:24 +00:00
Bill Wendling
fa2b06a7e7 Update to remove the no-frame-pointer-elim-non-leaf flag if it was set to 'false'.
llvm-svn: 189068
2013-08-22 21:28:54 +00:00
Bill Wendling
77e4bfaf0d Fix some tests. The 'false' version just omits the attribute altogether.
llvm-svn: 189065
2013-08-22 21:20:14 +00:00
Tom Stellard
d70d216860 R600/SI: Fix another case of illegal VGPR to SGPR copy
This fixes a crash in Unigine Tropics.

https://bugs.freedesktop.org/show_bug.cgi?id=68389

llvm-svn: 189057
2013-08-22 20:21:02 +00:00
Manman Ren
b66c695f15 [Debug Info Tests] Update testing cases.
A single metadata will not span multiple lines. This also helps me with
my script to automatic update the testing cases.
A debug info testing case should have a llvm.dbg.cu.
Do not use hard-coded id for debug nodes.

llvm-svn: 189033
2013-08-22 17:11:18 +00:00
Joey Gouly
355a09f268 [ARMv8] Add CodeGen support for VSEL.
This uses the ARMcmov pattern that Tim cleaned up in r188995.

Thanks to Simon Tatham for his floating point help!

llvm-svn: 189024
2013-08-22 15:29:11 +00:00
Joey Gouly
67edac2b5e [ARM] Constrain some register classes in EmitAtomicBinary64 so that
we pass these tests with -verify-machineinstrs.

llvm-svn: 189006
2013-08-22 12:19:24 +00:00
Logan Chien
2891b0c61c Fix ARM FastISel PIC function call.
The function call to external function should come with PLT relocation
type if the PIC relocation model is used.

llvm-svn: 189002
2013-08-22 12:08:04 +00:00
Tim Northover
eb7a86ed88 ARM: use TableGen patterns to select CMOV operations.
Back in the mists of time (2008), it seems TableGen couldn't handle the
patterns necessary to match ARM's CMOV node that we convert select operations
to, so we wrote a lot of fairly hairy C++ to do it for us.

TableGen can deal with it now: there were a few minor differences to CodeGen
(see tests), but nothing obviously worse that I could see, so we should
probably address anything that *does* come up in a localised manner.

llvm-svn: 188995
2013-08-22 09:57:11 +00:00
Tim Northover
7e34f41ef8 ARM: respect tied 64-bit inlineasm operands when printing
The code for 'Q' and 'R' operand modifiers needs to look through tied
operands to discover the register class.

llvm-svn: 188990
2013-08-22 06:51:04 +00:00
Michael Gottesman
e5ccfcac27 [stackprotector] When finding the split point to splice off the end of a parentmbb into a successmbb, include any DBG_VALUE MI.
Fix for PR16954.

llvm-svn: 188987
2013-08-22 05:40:50 +00:00
Jim Grosbach
d6ff6507fa ARM: R9 is not safe to use for tcGPR.
Indirect tail-calls shouldn't use R9 for the branch destination, as
it's not reliably a call-clobbered register.

rdar://14793425

llvm-svn: 188967
2013-08-22 00:14:24 +00:00
Tom Stellard
721e3acccd SelectionDAG: Make sure stores are always added to the LegalizedNodes list
When truncated vector stores were being custom lowered in
VectorLegalizer::LegalizeOp(), the old (illegal) and new (legal) node pair
was not being added to LegalizedNodes list.  Instead of the legalized
result being passed to VectorLegalizer::TranslateLegalizeResult(),
the result was being passed back into VectorLegalizer::LegalizeOp(),
which ended up adding a (new, new) pair to the list instead.

This was causing an assertion failure when a custom lowered truncated
vector store was the last instruction a basic block and the VectorLegalizer
was unable to find it in the LegalizedNodes list when updating the
DAG root.

llvm-svn: 188953
2013-08-21 22:42:58 +00:00
Manman Ren
1b7973fc19 TBAA: remove !tbaa from testing cases when they are not needed.
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.

llvm-svn: 188944
2013-08-21 22:20:53 +00:00
Juergen Ributzka
d12ce0859f Teach BaseIndexOffset::match to identify base pointers in loops.
The small utility function that pattern matches Base + Index +
Offset patterns for loads and stores fails to recognize the base
pointer for loads/stores from/into an array at offset 0 inside a
loop. As a result DAGCombiner::MergeConsecutiveStores was not able
to merge all stores.

This commit fixes the issue by adding an additional pattern match
and also a test case.

Reviewer: Nadav
llvm-svn: 188936
2013-08-21 21:53:38 +00:00
Hao Liu
7962606ca8 A minor change for an obvous problem caused by r188451:
def imm0_63 : Operand<i32>, ImmLeaf<i32, [{ return Imm >= 0 && Imm < 63;}]>{
As it seems Imm <63 should be Imm <= 63. ImmLeaf is used in pattern match, but there is already a function check the shift amount range, so just remove ImmLeaf. Also add a test to check 63.

llvm-svn: 188911
2013-08-21 17:47:53 +00:00
Joey Gouly
ec3b9aa53e Add -mcpu to two X86 tests.
These tests are failing on Haswell CPUs due to different instruction selection.

llvm-svn: 188908
2013-08-21 17:14:31 +00:00
Elena Demikhovsky
44bbb2b413 AVX-512: Added SHIFT instructions.
llvm-svn: 188899
2013-08-21 09:36:02 +00:00
Richard Sandiford
1dc05c13d2 [SystemZ] Define remainig *MUL_LOHI patterns
The initial port used MLG(R) for i64 UMUL_LOHI but left the other three
combinations as not-legal-or-custom.  Although 32x32->{32,32}
multiplications exist, they're not as quick as doing a normal 64-bit
multiplication, so it didn't seem like i32 SMUL_LOHI and UMUL_LOHI
would be useful.  There's also no direct instruction for i64 SMUL_LOHI,
so it needs to be implemented in terms of UMUL_LOHI.

However, not defining these patterns means that we don't convert
division by a constant into multiplication, so this patch fills
in the other cases.  The new i64 SMUL_LOHI sequence is simpler
than the one that we used previously for 64x64->128 multiplication,
so int-mul-08.ll now tests the full sequence.

llvm-svn: 188898
2013-08-21 09:34:56 +00:00
Richard Sandiford
e6e07910e3 [SystemZ] Use FI[EDX]BRA for codegen
llvm-svn: 188895
2013-08-21 09:04:20 +00:00
Akira Hatanaka
a80bdd3ab0 [mips] Add support for mfhc1 and mthc1.
llvm-svn: 188848
2013-08-20 23:47:25 +00:00
Reed Kotler
3e323b240e Add an option which permits the user to specify using a bitmask, that various
functions be compiled as mips32, without having to add attributes. This
is useful in certain situations where you don't want to have to edit the
function attributes in the source. For now it's only an option used for
the compiler developers when debugging the mips16 port.

llvm-svn: 188826
2013-08-20 20:53:09 +00:00
Jim Grosbach
343f1fbc39 ARM: Fix fast-isel copy/paste-o.
Update testcase to be more careful about checking register
values. While regexes are general goodness for these sorts of
testcases, in this example, the registers are constrained by
the calling convention, so we can and should check their
explicit values.

rdar://14779513

llvm-svn: 188819
2013-08-20 19:12:42 +00:00
Elena Demikhovsky
f09dad5d90 AVX-512: Added more patterns for VMOVSS, VMOVSD, VMOVD, VMOVQ
llvm-svn: 188786
2013-08-20 11:00:29 +00:00
Daniel Sanders
30561c36b8 [mips][msa] Removed fcge, fcgt, fsge, fsgt
These instructions were present in a draft spec but were removed before
publication.

llvm-svn: 188782
2013-08-20 09:41:47 +00:00
Richard Sandiford
add1a68f21 [SystemZ] Use SRST to optimize memchr
SystemZTargetLowering::emitStringWrapper() previously loaded the character
into R0 before the loop and made R0 live on entry.  I'd forgotten that
allocatable registers weren't allowed to be live across blocks at this stage,
and it confused LiveVariables enough to cause a miscompilation of f3 in
memchr-02.ll.

This patch instead loads R0 in the loop and leaves LICM to hoist it
after RA.  This is actually what I'd tried originally, but I went for
the manual optimisation after noticing that R0 often wasn't being hoisted.
This bug forced me to go back and look at why, now fixed as r188774.

We should also try to optimize null checks so that they test the CC result
of the SRST directly.  The select between null and the SRST GPR result could
then usually be deleted as dead.

llvm-svn: 188779
2013-08-20 09:38:48 +00:00
Daniel Sanders
91c40d80de [mips][msa] Added insve
llvm-svn: 188777
2013-08-20 09:22:54 +00:00
Richard Sandiford
6a0b1638b4 Fix test typo and add usual "br %r14" test
llvm-svn: 188775
2013-08-20 09:14:46 +00:00
Richard Sandiford
fcd54a3b89 Fix overly pessimistic shortcut in post-RA MachineLICM
Post-RA LICM keeps three sets of registers: PhysRegDefs, PhysRegClobbers
and TermRegs.  When it sees a definition of R it adds all aliases of R
to the corresponding set, so that when it needs to test for membership
it only needs to test a single register, rather than worrying about
aliases there too.  E.g. the final candidate loop just has:

    unsigned Def = Candidates[i].Def;
    if (!PhysRegClobbers.test(Def) && ...) {

to test whether register Def is multiply defined.

However, there was also a shortcut in ProcessMI to make sure we didn't
add candidates if we already knew that they would fail the final test.
This shortcut was more pessimistic than the final one because it
checked whether _any alias_ of the defined register was multiply defined.
This is too conservative for targets that define register pairs.
E.g. on z, R0 and R1 are sometimes used as a pair, so there is a
128-bit register that aliases both R0 and R1.  If a loop used
R0 and R1 independently, and the definition of R0 came first,
we would be able to hoist the R0 assignment (because that used
the final test quoted above) but not the R1 assignment (because
that meant we had two definitions of the paired R0/R1 register
and would fail the shortcut in ProcessMI).

This patch just uses the same check for the ProcessMI shortcut as
we use in the final candidate loop.

llvm-svn: 188774
2013-08-20 09:11:13 +00:00
Tim Northover
cec1079024 ARM: implement some simple f64 materializations.
Previously we used a const-pool load for virtually all 64-bit floating values.
Actually, we can get quite a few common values (including 0.0, 1.0) via "vmov"
instructions of one stripe or another.

llvm-svn: 188773
2013-08-20 08:57:11 +00:00
Daniel Sanders
15341e9a12 [mips][msa] Added and.v, bmnz.v, bmz.v, bsel.v, nor.v, or.v, xor.v
llvm-svn: 188767
2013-08-20 08:38:21 +00:00
Hal Finkel
4bb40e7c8d Don't form PPC CTR-based loops around a copysignl call
copysign/copysignf never become function calls (because the SDAG expansion code
does not lower to the corresponding function call, but rather directly
implements the associated logic), but copysignl almost always is lowered into a
call to the requested libm functon (and, thus, might clobber CTR).

llvm-svn: 188727
2013-08-19 23:35:24 +00:00
Paul Redmond
404ef5af36 Improve the widening of integral binary vector operations
- split WidenVecRes_Binary into WidenVecRes_Binary and WidenVecRes_BinaryCanTrap
  - WidenVecRes_BinaryCanTrap preserves the original behaviour for operations
    that can trap
  - WidenVecRes_Binary simply widens the operation and improves codegen for
    3-element vectors by allowing widening and promotion on x86 (matches the
    behaviour of unary and ternary operation widening)
- use WidenVecRes_Binary for operations on integers.

Reviewed by: nrotem

llvm-svn: 188699
2013-08-19 20:01:35 +00:00
Elena Demikhovsky
f1afd2e4db AVX-512: added arithmetic and logical operations.
ADD, SUB, MUL integer and FP types. OR, AND, XOR.
Added embeded broadcast form for these instructions.

llvm-svn: 188673
2013-08-19 13:26:14 +00:00
Richard Sandiford
5e32ef0acd [SystemZ] Add negative integer absolute (load negative)
For now this matches the equivalent of (neg (abs ...)), which did hit a few
times in projects/test-suite.  We should probably also match cases where
absolute-like selects are used with reversed arguments.

llvm-svn: 188671
2013-08-19 12:56:58 +00:00
Richard Sandiford
aee0958460 [SystemZ] Add integer absolute (load positive)
llvm-svn: 188670
2013-08-19 12:48:54 +00:00
Richard Sandiford
841d24aa5a [SystemZ] Add support for sibling calls
This first cut is pretty conservative.  The final argument register (R6)
is call-saved, so we would need to make sure that the R6 argument to a
sibling call is the same as the R6 argument to the calling function,
which seems worth keeping as a separate patch.

Saying that integer truncations are free means that we no longer
use the extending instructions LGF and LLGF for spills in int-conv-09.ll
and int-conv-10.ll.  Instead we treat the registers as 64 bits wide and
truncate them to 32-bits where necessary.  I think it's unlikely we'd
use LGF and LLGF for spills in other situations for the same reason,
so I'm removing the tests rather than replacing them.  The associated
code is generic and applies to many more instructions than just
LGF and LLGF, so there is no corresponding code removal.

llvm-svn: 188669
2013-08-19 12:42:31 +00:00
Hal Finkel
23714b2e52 Add ExpandFloatOp_FCOPYSIGN to handle ppcf128-related expansions
We had previously been asserting when faced with a FCOPYSIGN f64, ppcf128 node
because there was no way to expand the FCOPYSIGN node. Because ppcf128 is the
sum of two doubles, and the first double must have the larger magnitude, we
can take the sign from the first double. As a result, in addition to fixing the
crash, this is also an optimization.

llvm-svn: 188655
2013-08-19 06:55:37 +00:00
Hal Finkel
9591220c33 Add the PPC fcpsgn instruction
Modern PPC cores support a floating-point copysign instruction, and we can use
this to lower the FCOPYSIGN node (which is created from calls to the libm
copysign function). A couple of extra patterns are necessary because the
operand types of FCOPYSIGN need not agree.

llvm-svn: 188653
2013-08-19 05:01:02 +00:00
Tim Northover
057a4d7c26 ARM: make sure we keep inline asm operands tied.
When patching inlineasm nodes to use GPRPair for 64-bit values, we
were dropping the information that two operands were tied, which
effectively broke the live-interval of vregs affected.

llvm-svn: 188643
2013-08-18 18:06:03 +00:00