1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 14:33:02 +02:00
Commit Graph

12729 Commits

Author SHA1 Message Date
Michael Kuperstein
c9165a5a41 [X86] ABI change for x86-32: pass 3 vector arguments in-register instead of 4, except on Darwin.
This changes the ABI used on 32-bit x86 for passing vector arguments. 
Historically, clang passes the first 4 vector arguments in-register, and additional vector arguments on the stack, regardless of platform. That is different from the behavior of gcc, icc, and msvc, all of which pass only the first 3 arguments in-register.
The 3-register convention is documented, unofficially, in Agner's calling convention guide, and, officially, in the recently released version 1.0 of the i386 psABI.

Darwin is kept as is because the OS X ABI Function Call Guide explicitly documents the current (4-register) behavior.

This fixes PR21510

Differential revision: http://reviews.llvm.org/D9644

llvm-svn: 237682
2015-05-19 11:06:56 +00:00
Reid Kleckner
6d90c1f8f3 Re-land r237175: [X86] Always return the sret parameter in eax/rax ...
This reverts commit r237210.

Also fix X86/complex-fca.ll to match the code that we used to generate
on win32 and now generate everwhere to conform to SysV.

llvm-svn: 237639
2015-05-18 23:35:09 +00:00
Tim Northover
42f554c12a ARM: allow jump tables to be placed as constant islands.
Previously, they were forced to immediately follow the actual branch
instruction. This was usually OK (the LEAs actually accessing them got emitted
nearby, and weren't usually separated much afterwards). Unfortunately, a
sufficiently nasty phi elimination dumps many instructions right before the
basic block terminator, and this can increase the range too much.

This patch frees them up to be placed as usual by the constant islands pass,
and consequently has to slightly modify the form of TBB/TBH tables to refer to
a PC-relative label at the final jump. The other jump table formats were
already position-independent.

rdar://20813304

llvm-svn: 237590
2015-05-18 17:10:40 +00:00
Oliver Stannard
d562deac4b Revert r237579, as it broke windows buildbots
llvm-svn: 237583
2015-05-18 16:39:16 +00:00
James Y Knight
494ba9c43a Add support for the Sparc implementation-defined "ASR" registers.
(Note that register "Y" is essentially just ASR0).

Also added some test cases for divide and multiply, which had none before.

Differential Revision: http://reviews.llvm.org/D8670

llvm-svn: 237580
2015-05-18 16:29:48 +00:00
Oliver Stannard
8fe6b2aa15 [LLVM - ARM/AArch64] Add ACLE special register intrinsics
This patch implements LLVM support for the ACLE special register intrinsics in
section 10.1, __arm_{w,r}sr{,p,64}.

This patch is intended to lower the read/write_register instrinsics, used to
implement the special register intrinsics in the clang patch for special
register intrinsics (see http://reviews.llvm.org/D9697), to ARM specific
instructions MRC,MCR,MSR etc. to allow reading an writing of coprocessor
registers in AArch32 and AArch64. This is done by inspecting the register
string passed to the intrinsic and then lowering to the appropriate
instruction.

Patch by Luke Cheeseman.

Differential Revision: http://reviews.llvm.org/D9699

llvm-svn: 237579
2015-05-18 16:23:33 +00:00
Elena Demikhovsky
0b01514dcb AVX-512: Added intrinsics for ADDSS/D, MULSS/D, SUBSS/D, DIVSS/D
instructions. These intrinsics are comming with rounding mode.
Added intrinsics for MAXSS/D, MINSS/D - with and without  sae.

By Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 237560
2015-05-18 07:24:19 +00:00
Elena Demikhovsky
a8c3a107bb AVX-512: Added patterns for scalar-to-vector broadcast
llvm-svn: 237558
2015-05-18 07:06:23 +00:00
Hal Finkel
d380588a20 [PowerPC] Add extra r2 read deps on @toc@l relocations
If some commits are happy, and some commits are sad, this is a sad commit. It
is sad because it restricts instruction scheduling to work around a binutils
linker bug, and moreover, one that may never be fixed. On 2012-05-21, GCC was
updated not to produce code triggering this bug, and now we'll do the same...

When resolving an address using the ELF ABI TOC pointer, two relocations are
generally required: one for the high part and one for the low part. Only
the high part generally explicitly depends on r2 (the TOC pointer). And, so,
we might produce code like this:

.Ltmp526:
        addis 3, 2, .LC12@toc@ha
.Ltmp1628:
        std 2, 40(1)
        ld 5, 0(27)
        ld 2, 8(27)
        ld 11, 16(27)
        ld 3, .LC12@toc@l(3)
        rldicl 4, 4, 0, 32
        mtctr 5
        bctrl
        ld 2, 40(1)

And there is nothing wrong with this code, as such, but there is a linker bug
in binutils (https://sourceware.org/bugzilla/show_bug.cgi?id=18414) that will
misoptimize this code sequence to this:
        nop
        std     r2,40(r1)
        ld      r5,0(r27)
        ld      r2,8(r27)
        ld      r11,16(r27)
        ld      r3,-32472(r2)
        clrldi  r4,r4,32
        mtctr   r5
        bctrl
        ld      r2,40(r1)
because the linker does not know (and does not check) that the value in r2
changed in between the instruction using the .LC12@toc@ha (TOC-relative)
relocation and the instruction using the .LC12@toc@l(3) relocation.
Because it finds these instructions using the relocations (and not by
scanning the instructions), it has been asserted that there is no good way
to detect the change of r2 in between. As a result, this bug may never be
fixed (i.e. it may become part of the definition of the ABI). GCC was
updated to add extra dependencies on r2 to instructions using the @toc@l
relocations to avoid this problem, and we'll do the same here.

This is done as a separate pass because:
 1. These extra r2 dependencies are not really properties of the
    instructions, but rather due to a linker bug, and maybe one day we'll be
    able to get rid of them when targeting linkers without this bug (and,
    thus, keeping the logic centralized here will make that
    straightforward).
 2. There are ISel-level peephole optimizations that propagate the @toc@l
    relocations to some user instructions, and so the exta dependencies do
    not apply only to a fixed set of instructions (without undesirable
    definition replication).

The test case was reduced with the help of bugpoint, with minimal cleaning. I'm
looking forward to our upcoming MI serialization support, and with that, much
better tests can be created.

llvm-svn: 237556
2015-05-18 06:25:59 +00:00
Elena Demikhovsky
e802d572b1 AVX-512: fixed extended load to 512-bit register
llvm-svn: 237537
2015-05-17 08:08:06 +00:00
Elena Demikhovsky
1e28397b5a AVX-512: fixed a bug in mask operations - (i1 1) pattern
Filling k-reg with all-ones value was wrong,
(i1 1) should switch on only one bit in mask register

llvm-svn: 237536
2015-05-17 07:28:51 +00:00
Bill Schmidt
d115a77d30 [PPC64] Add vector pack/unpack support from ISA 2.07
This patch adds support for the following new instructions in the
Power ISA 2.07:

  vpksdss
  vpksdus
  vpkudus
  vpkudum
  vupkhsw
  vupklsw

These instructions are available through the vec_packs, vec_packsu,
vec_unpackh, and vec_unpackl built-in interfaces.  These are
lane-sensitive instructions, so the built-ins have different
implementations for big- and little-endian, and the instructions must
be marked as killing the vector swap optimization for now.

The first three instructions perform saturating pack operations.  The
fourth performs a modulo pack operation, which means it can be
represented with a vector shuffle, and conversely the appropriate
vector shuffles may cause this instruction to be generated.  The other
instructions are only generated via built-in support for now.

Appropriate tests have been added.

There is a companion patch to clang for the rest of this support.

llvm-svn: 237499
2015-05-16 01:02:12 +00:00
James Molloy
a6f56a768b Mark SMIN/SMAX/UMIN/UMAX nodes as legal and add patterns for them.
The new [SU]{MIN,MAX} SDNodes can be lowered directly to instructions for
most NEON datatypes - the big exclusion being v2i64.

llvm-svn: 237455
2015-05-15 16:15:57 +00:00
Brendon Cahoon
7b7b1f4d51 [Hexagon] Generate hardware loop for a vectorized loop
The induction variable in the vectorized loop wasn't
recognized properly, so a hardware loop wasn't generated.

Differential Revision: http://reviews.llvm.org/D9722

llvm-svn: 237388
2015-05-14 20:36:19 +00:00
Brendon Cahoon
92c444f2aa [Hexagon] Remove dead constant assignment in hardware loop pass
After converting a loop to a hardware loop, the pass should remove
any unnecessary instructions from the old compare-and-branch
code. This patch removes a dead constant assignment that was
used in the compare instruction.

Differential Revision: http://reviews.llvm.org/D9720

llvm-svn: 237373
2015-05-14 17:31:40 +00:00
Brendon Cahoon
c76b3878ca [Hexagon] Check for underflow/wrap in hardware loop pass
If the loop trip count may underflow or wrap, the compiler should
not generate a hardware loop since the trip count will be
incorrect.

llvm-svn: 237365
2015-05-14 14:15:08 +00:00
Vasileios Kalintiris
fdfa4b2c5d [mips] Do not place users of $ra in the delay slot of call instructions.
Summary:
When we are trying to fill the delay slot of a call instruction, we must avoid
filler instructions that use the $ra register. This fixes the test
MultiSource/Applications/JM/lencod when we enable the forward delay slot filler.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9670

llvm-svn: 237362
2015-05-14 13:17:56 +00:00
Artyom Skrobov
b1d8d6d276 Re-apply r237247 - [AArch64] Codegen VMAX/VMIN for safe math cases
No longer breaks SPEC2000/2006

llvm-svn: 237361
2015-05-14 12:59:46 +00:00
Elena Demikhovsky
b6e5772812 AVX-512: Added i1 type handling for calling conventions.
i1 type is a legal type on AVX-512 and can be passed as parameter or return value.
i1 is promoted to i8 on return and to i32 for call arguments (i8 is also promoted to i32 here).
The result code is similar to the previous X86 targets, where i1 is allways promoted to i8.

llvm-svn: 237350
2015-05-14 09:04:45 +00:00
Ahmed Bougacha
e34f179706 [CodeGen] Use standard -not gnueabi- naming for f16 libcalls on Darwin.
Other targets probably should as well.  Since r237161, compiler-rt has
both, but I don't see why anything other than gnueabi would use a
gnueabi naming scheme.

llvm-svn: 237324
2015-05-14 01:00:51 +00:00
Brendon Cahoon
72f21b863a [Hexagon] Generate loop1 instruction for nested loops
loop1 is for the outer loop and loop0 is for the inner loop.

Differential Revision: http://reviews.llvm.org/D9680

llvm-svn: 237266
2015-05-13 17:56:03 +00:00
Brendon Cahoon
1b5bcf570f [Hexagon] Generate hardware loop when loop has a critical edge
The hardware loop pass should try to generate a hardware loop
instruction when the original loop has a critical edge.

Differential Revision: http://reviews.llvm.org/D9678

llvm-svn: 237258
2015-05-13 14:54:24 +00:00
Silviu Baranga
452d76e681 Revert r237247 - [AArch64] Codegen VMAX/VMIN.. as it is causing failures in SPEC2000/2006
llvm-svn: 237256
2015-05-13 14:03:18 +00:00
Artyom Skrobov
b27364c475 [AArch64] Codegen VMAX/VMIN for safe math cases
llvm-svn: 237247
2015-05-13 12:01:09 +00:00
Sanjoy Das
6d67db8c09 [Statepoints] Support for "patchable" statepoints.
Summary:
This change adds two new parameters to the statepoint intrinsic, `i64 id`
and `i32 num_patch_bytes`.  `id` gets propagated to the ID field
in the generated StackMap section.  If the `num_patch_bytes` is
non-zero then the statepoint is lowered to `num_patch_bytes` bytes of
nops instead of a call (the spill and reload code remains unchanged).
A non-zero `num_patch_bytes` is useful in situations where a language
runtime requires complete control over how a call is lowered.

This change brings statepoints one step closer to patchpoints.  With
some additional work (that is not part of this patch) it should be
possible to get rid of `TargetOpcode::STATEPOINT` altogether.

PlaceSafepoints generates `statepoint` wrappers with `id` set to
`0xABCDEF00` (the old default value for the ID reported in the stackmap)
and `num_patch_bytes` set to `0`.  This can be made more sophisticated
later.

Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9546

llvm-svn: 237214
2015-05-12 23:52:24 +00:00
Saleem Abdulrasool
fa770f8269 CodeGen: ignore DEBUG_VALUE nodes in KILL tagging
DEBUG_VALUE nodes do not take part in code generation.  Ignore them when
performing KILL updates.  Addresses PR23486.

llvm-svn: 237211
2015-05-12 23:36:18 +00:00
Chandler Carruth
a9a05bbdfe Revert r237175: [X86] Always return the sret parameter in eax/rax ...
This commit broke an x86 test and the bots have been broken for well
over an hour now so I'm just reverting.

llvm-svn: 237210
2015-05-12 23:34:27 +00:00
Reid Kleckner
170b30ec78 [X86] Always return the sret parameter in eax/rax, even on 32-bit
Summary:
This rule was always in the old SysV i386 ABI docs and the new ones that
H.J. Lu has put together, but we never noticed:

  EAX   scratch register; also used to return integer and pointer values
        from functions; also stores the address of a returned struct or union

Fixes PR23491.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9715

llvm-svn: 237175
2015-05-12 20:56:32 +00:00
Sundeep Kushwaha
033e3e3b61 [PATCH] [HEXAGON] Add a test program to verify calling convention
for large struct return by value.

Differential Revision: http://reviews.llvm.org/D9709

llvm-svn: 237170
2015-05-12 20:13:10 +00:00
Pat Gavlin
f7cb0841d2 [Statepoints] Split the calling convention and statepoint flags operand to STATEPOINT into two separate operands.
Differential Revision: http://reviews.llvm.org/D9623

llvm-svn: 237166
2015-05-12 19:50:19 +00:00
Petar Jovanovic
bd0b851f88 [Mips] Return false for isFPCloseToIncomingSP()
On Mips, frame pointer points to the same side of the frame as the stack
pointer. This function is used to decide where to put register scavenging
spill slot. So far, it was put on the wrong side of the frame, and thus it
was too far away from $fp when frame was larger than 2^15 bytes.

Patch by Vladimir Radosavljevic.

http://reviews.llvm.org/D8895

llvm-svn: 237153
2015-05-12 17:14:05 +00:00
Tom Stellard
db0893e43a R600/SI: add pass to mark CF live ranges as non-spillable
Spilling can insert instructions almost anywhere, and this can mess
up control flow lowering in a multitude of ways, due to instruction
reordering. Let's sort this out the easy way: never spill registers
involved with control flow, i.e. saved EXEC masks.

Unfortunately, this does not work at all with optimizations disabled,
as the register allocator ignores spill weights. This should be
addressed in a future commit.

The test was reduced from the "stacks" shader of [1]. Some issues
trigger the machine verifier while another one is checked manually.

[1] http://madebyevan.com/webgl-path-tracing/

v2: only insert pass with optimizations enabled, merge test runs.

Patch by: Grigori Goronzy

llvm-svn: 237152
2015-05-12 17:13:02 +00:00
Sunil Srivastava
e4b73132fd Changed renaming of local symbols by inserting a dot vefore the numeric suffix.
One code change and several test changes to match that
details in http://reviews.llvm.org/D9481

llvm-svn: 237150
2015-05-12 16:47:30 +00:00
Tom Stellard
d2e3c8c77e R600/SI: Remove M0Reg register class
It is no longer used.

llvm-svn: 237142
2015-05-12 15:00:52 +00:00
Tom Stellard
368195b84d R600/SI: Remove explicit m0 operand from DS instructions
Instead add m0 as an implicit operand.  This helps avoid spills
of the m0 register in some cases.

llvm-svn: 237141
2015-05-12 15:00:49 +00:00
Tom Stellard
b85064af8f R600/SI: Make sendmsg test more strict
We want to make sure that the m0 copies are being cse'd.

llvm-svn: 237134
2015-05-12 14:18:16 +00:00
Elena Demikhovsky
bd877a1b65 AVX-512, X86: Added lowering for shift operations for SKX.
The other changes in the LowerShift() are not functional,
just to make the code more convenient.
So, the functional changes for SKX only.

llvm-svn: 237129
2015-05-12 13:25:46 +00:00
John Brawn
c74bcc2521 [ARM] Use AEABI aligned function variants
AEABI defines aligned variants of memcpy etc. that can be faster than
the default version due to not having to do alignment checks. When
emitting target code for these functions make use of these aligned
variants if possible. Also convert memset to memclr if possible.

Differential Revision: http://reviews.llvm.org/D8060

llvm-svn: 237127
2015-05-12 13:13:38 +00:00
Igor Laevsky
d8e7227125 Reverse ordering of base and derived pointer during safepoint lowering.
According to the documentation in StackMap section for the safepoint we should have:
"The first Location in each pair describes the base pointer for the object. The second is the derived pointer actually being relocated."
But before this change we emitted them in reverse order - derived pointer first, base pointer second.

llvm-svn: 237126
2015-05-12 13:12:14 +00:00
Vasileios Kalintiris
2ff1217d69 [mips][FastISel] Handle calls with non legal types i8 and i16.
Summary: Allow calls with non legal integer types based on i8 and i16 to be processed by mips fast-isel.

Based on a patch by Reed Kotler.

Test Plan:
"Make check" test forthcoming.
Test-suite passes at O0/O2 and with mips32 r1/r2

Reviewers: rkotler, dsanders

Subscribers: llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D6770

llvm-svn: 237121
2015-05-12 12:29:17 +00:00
Vasileios Kalintiris
adab1a88f7 [mips][FastISel] Simplify callabi.ll by using multiple check prefixes.
Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9635

llvm-svn: 237119
2015-05-12 12:17:11 +00:00
Vasileios Kalintiris
23d90511ce [mips][FastISel] Allow computation of addresses from constant expressions.
Summary:
Try to compute addresses when the offset from a memory location is a constant
expression.

Based on a patch by Reed Kotler.

Test Plan:
Passes test-suite for -O0/O2 and mips 32 r1/r2

Reviewers: rkotler, dsanders

Subscribers: llvm-commits, aemerson, rfuhler

Differential Revision: http://reviews.llvm.org/D6767

llvm-svn: 237117
2015-05-12 12:08:31 +00:00
Elena Demikhovsky
a975085c80 AVX-512: select operation for i1 vectors
like: select i1 %cond, <16 x i1> %a, <16 x i1> %b.
I added pseudo-CMOV patterns to resolve the "select".
Added tests for KNL and SKX.

llvm-svn: 237106
2015-05-12 09:36:52 +00:00
Michael Kuperstein
59d6eb954a [X86] DAGCombine should not assume arbitrary vector types are simple
The X86-specific DAGCombine for stores should not assume vector types are always simple.
This fixes PR23476.

Differential Revision: http://reviews.llvm.org/D9659

llvm-svn: 237097
2015-05-12 07:33:07 +00:00
Eric Christopher
2ba04d1116 Migrate existing backends that care about software floating point
to use the information in the module rather than TargetOptions.

We've had and clang has used the use-soft-float attribute for some
time now so have the backends set a subtarget feature based on
a particular function now that subtargets are created based on
functions and function attributes.

For the one middle end soft float check go ahead and create
an overloadable TargetLowering::useSoftFloat function that
just checks the TargetSubtargetInfo in all cases.

Also remove the command line option that hard codes whether or
not soft-float is set by using the attribute for all of the
target specific test cases - for the generic just go ahead and
add the attribute in the one case that showed up.

llvm-svn: 237079
2015-05-12 01:26:05 +00:00
Andrew Kaylor
c29dc428f8 [WinEH] Handle nested landing pads that return directly to the parent function.
Differential Revision: http://reviews.llvm.org/D9684

llvm-svn: 237063
2015-05-11 23:06:02 +00:00
Andrew Kaylor
295660694e [WinEH] Update exception numbering to give handlers their own base state.
Differential Revision: http://reviews.llvm.org/D9512

llvm-svn: 237014
2015-05-11 19:41:19 +00:00
Pirama Arumuga Nainar
e8329e15ec [X86] Updates to X86 backend for f16 promotion
Summary:
r235215 adds support for f16 to be considered as a load/store type and
promote f16 operations to f32.

This patch has miscellaneous fixes for the X86 backend so all f16
operations are handled:
1. Set loadextaction for f16 vectors to expand.
2. Handle FP_EXTEND in a switch statement when handling v2f32
3. Do not fold (FP_TO_SINT (load f16)) into FP_TO_INT*_IN_MEM or
(store (SINT_TO_FP )) to a FILD.

Tests included.

Reviewers: ab, srhines, delena

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9092

llvm-svn: 237004
2015-05-11 17:14:39 +00:00
Elena Demikhovsky
8c817abca9 AVX-512: Changed CC parameter in "cmp" intrinsic
from i8 to i32 according to the Intel Spec

by Igor Breger (igor.breger@intel.com)

llvm-svn: 236979
2015-05-11 09:03:14 +00:00
Elena Demikhovsky
f25b492812 AVX-512: Added SKX instructions and intrinsics:
{add/sub/mul/div/} x {ps/pd} x {128/256} 2. max/min with sae

By Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236971
2015-05-11 06:05:05 +00:00