1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 13:11:39 +01:00

29567 Commits

Author SHA1 Message Date
Joerg Sonnenberger
d344d15c74 Add first bunch of SPE instructions. As they overlap with Altivec, mark
them as parser-only until the disassembler is extended to handle
predicates properly.

llvm-svn: 215102
2014-08-07 12:18:21 +00:00
Alexander Kornienko
bd7efdfe72 Insert parens to avoid a warning:
suggest parentheses around arithmetic in operand of '^' [-Wparentheses]

llvm-svn: 215101
2014-08-07 12:09:34 +00:00
Daniel Sanders
64c798332f [mips] Add assembler support for .set msa/nomsa directive.
Summary:
These directives are used to toggle whether the assembler accepts MSA-specific instructions or not.

Patch by Matheus Almeida and Toma Tabacu.

Reviewers: dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D4783

llvm-svn: 215099
2014-08-07 12:03:36 +00:00
Pavel Chupin
3b0d0d5928 Fix lld-x86_64-win7 Build #11969
llvm-svn: 215097
2014-08-07 11:09:59 +00:00
Chandler Carruth
74dc5ae1d0 [x86] Fix another miscompile found through fuzz testing the new vector
shuffle lowering.

This is closely related to the previous one. Here we failed to use the
source offset when swapping in the other case -- where we end up
swapping the *final* shuffle. The cause of this bug is a bit different:
I simply wasn't thinking about the fact that this mask is actually
a slice of a wide mask and thus has numbers that need SourceOffset
applied. Simple fix. Would be even more simple with an algorithm-y thing
to use here, but correctness first. =]

llvm-svn: 215095
2014-08-07 10:37:35 +00:00
Chandler Carruth
c19c208775 [x86] Fix another miscompile in the new vector shuffle lowering found
via the fuzz tester.

Here I missed an offset when round-tripping a value through a shuffle
mask. I got it right 2 lines below. See a problem? I do. ;] I'll
probably be adding a little "swap" algorithm which accepts a range and
two values and swaps those values where they occur in the range. Don't
really have a name for it, let me know if you do.

llvm-svn: 215094
2014-08-07 10:14:27 +00:00
Chandler Carruth
4d35998980 [x86] Fix another miscompile in the new vector shuffle lowering found
through the new fuzzer.

This one is great: bad operator precedence led the modulus to happen at
the wrong point. All the asserts didn't fire because there were usually
the right values past the end of the 4 element region we were looking
at. Probably could have gotten a crash here with ASan + fuzzing, but the
correctness tests pinpointed this really nicely.

llvm-svn: 215092
2014-08-07 09:45:02 +00:00
Pavel Chupin
7f4a227354 [x32] Use ebp/esp as frame and stack pointer
Summary:
Since pointers are 32-bit on x32 we can use ebp and esp as frame and stack
pointer. Some operations like PUSH/POP and CFI_INSTRUCTION still
require 64-bit register, so using 64-bit MachineFramePtr where required.

X86_64 NaCl uses 64-bit frame/stack pointers, however it's been found that
both isTarget64BitLP64 and isTarget64BitILP32 are true for NaCl. Addressing
this issue here as well by making isTarget64BitLP64 false.

Also mark hasReservedSpillSlot unreachable on X86. See inlined comments.

Test Plan: Add one new simple test and upgrade 2 existing with x32 target case.

Reviewers: nadav, dschuff

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D4617

llvm-svn: 215091
2014-08-07 09:41:19 +00:00
Chandler Carruth
291bc5d9a4 [x86] Fix a miscompile in the new shuffle lowering found through the new
fuzz testing.

The function which tested for adjacency did what it said on the tin, but
when I called it, I wanted it to do something more thorough: I wanted to
know if the *pairs* of shuffle elements were adjacent and started at
0 mod 2. In one place I had the decency to try to test for this, but in
the other it was completely skipped, miscompiling this test case. Fix
this by making the helper actually do what I wanted it to do everywhere
I called it (and removing the now redundant code in one place).

I *really* dislike the name "canWidenShuffleElements" for this
predicate. If anyone can come up with a better name, please let me know.
The other name I thought about was "canWidenShuffleMask" but is it
really widening the mask to reduce the number of lanes shuffled? I don't
know. Naming things is hard.

llvm-svn: 215089
2014-08-07 08:11:31 +00:00
Pete Cooper
4afa5aa1cc Fix a whole bunch of binary literals which were the wrong size. All were being silently zero extended to the correct width.
The commit after this changes { } and 0bxx literals to be of type bits<n> and not int.  This means we need to write exactly the right number of bits, and not rely on the values being silently zero extended for us.

llvm-svn: 215082
2014-08-07 05:46:54 +00:00
Saleem Abdulrasool
37f9f1e4f7 MC: split Win64EHUnwindEmitter into a shared streamer
This changes Win64EHEmitter into a utility WinEH UnwindEmitter that can be
shared across multiple architectures and a target specific bit which is
overridden (Win64::UnwindEmitter).  This enables sharing the section selection
code across X86 and the intended use in ARM for emitting unwind information for
Windows on ARM.

llvm-svn: 215050
2014-08-07 02:59:41 +00:00
Quentin Colombet
e540e9c357 [X86][SchedModel] Fixed missing/wrong scheduling model found by code inspection.
Source: Agner Fog's Instruction tables.

Related to <rdar://problem/15607571>

llvm-svn: 215045
2014-08-07 00:20:44 +00:00
Reid Kleckner
f0567dde14 MC X86: Accept ".att_syntax prefix" and diagnose noprefix
Fixes PR18916.  I don't think we need to implement support for either
hybrid syntax.  Nobody should write Intel assembly with '%' prefixes on
their registers or AT&T assembly without them.

llvm-svn: 215031
2014-08-06 23:21:13 +00:00
Sanjay Patel
8cd2aae34c fix typo
llvm-svn: 214995
2014-08-06 21:08:38 +00:00
Eric Christopher
4a1cdb2ba7 Remove the target machine from CCState. Previously it was only used
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.

llvm-svn: 214988
2014-08-06 18:45:26 +00:00
Chad Rosier
0214f513b9 [AArch64] Add a few isTarget* API to AArch64 Subtarget.
llvm-svn: 214977
2014-08-06 16:56:58 +00:00
Chad Rosier
8291c24b09 [AArch64] Fix OS ABI flag for aarch64-linux-gnu target.
For triple aarch64-linux-gnu we were incorrectly setting IRIX.
For triple aarch64 we are correctly setting SYSV.

Patch by Ana Pazos <apazos@codeaurora.org>.

llvm-svn: 214974
2014-08-06 16:05:02 +00:00
Robert Khasanov
970483b673 [AVX512] Added load/store instructions to Register2Memory opcode tables.
Added lowering tests for load/store.

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>

llvm-svn: 214972
2014-08-06 15:40:34 +00:00
James Molloy
7518c61a09 [AArch64] Add a testcase for r214957.
llvm-svn: 214965
2014-08-06 13:31:32 +00:00
Tim Northover
0028a9a97d ARM: do not generate BLX instructions on Cortex-M CPUs.
Particularly on MachO, we were generating "blx _dest" instructions on M-class
CPUs, which don't actually exist. They happen to get fixed up by the linker
into valid "bl _dest" instructions (which is why such a massive issue has
remained largely undetected), but we shouldn't rely on that.

llvm-svn: 214959
2014-08-06 11:13:14 +00:00
Tim Northover
7abd4db81b ARM-MachO: materialize callee address correctly on v4t.
llvm-svn: 214958
2014-08-06 11:13:06 +00:00
James Molloy
2bbe86fab0 [AArch64] Conditional selects are expensive on out-of-order cores.
Specifically Cortex-A57. This probably applies to Cyclone too but I haven't enabled it for that as I can't test it.

This gives ~4% improvement on SPEC 174.vpr, and ~1% in 471.omnetpp.

llvm-svn: 214957
2014-08-06 10:42:18 +00:00
Chandler Carruth
2a63640957 [x86] Fix two independent miscompiles in the process of getting the same
test case to actually generate correct code.

The primary miscompile fixed here is that we weren't correctly handling
in-place elements in one half of a single-input v8i16 shuffle when
moving a dword of elements from that half to the other half. Some times,
we would clobber the in-place elements in forming the dword to move
across halves.

The fix to this involves forcibly marking the in-place inputs even when
there is no need to gather them into a dword, and to much more carefully
re-arrange the elements when grouping them into a dword to move across
halves. With these two changes we would generate correct shuffles for
the test case, but found another miscompile. There are also some random
perturbations of the generated shuffle pattern in SSE2. It looks like
a wash; more instructions in some cases fewer in others.

The second miscompile would corrupt the results into nonsense. This is
a buggy pattern in one of the added DAG combines. Mapping elements
through a PSHUFD when pairing redundant half-shuffles is *much* harder
than this code makes it out to be -- it requires reasoning about *all*
of where the input is used in the PSHUFD, not just one part of where it
is used. Plus, we can't combine a half shuffle *into* a PSHUFD but the
code didn't guard against it. I think this was just a bad idea and I've
just removed that aspect of the combine. No tests regress as
a consequence so seems OK.

llvm-svn: 214954
2014-08-06 10:16:36 +00:00
Chandler Carruth
8d598f2f29 [x86] Switch to a formulation of a for loop that is much more obviously
not corrupting the mask by mutating it more times than intended. No
functionality changed (the results were non-overlapping so the old
version "worked" but was non-obvious).

llvm-svn: 214953
2014-08-06 10:16:33 +00:00
Adam Nemet
aea49d4d5f [X86] Fixes commit r214890 to match the posted patch
This was another fallout from my local rebase where something went wrong :(

llvm-svn: 214951
2014-08-06 07:13:12 +00:00
Matt Arsenault
7d4ad478b1 Correct comment
llvm-svn: 214945
2014-08-06 00:44:25 +00:00
Matt Arsenault
3005d01057 R600: Increase nearby load scheduling threshold.
This partially fixes weird looking load scheduling
in memcpy test. The load clustering doesn't seem
particularly smart, but this method seems to be partially
deprecated so it might not be worth trying to fix.

llvm-svn: 214943
2014-08-06 00:29:49 +00:00
Matt Arsenault
68712599d9 R600/SI: Implement areLoadsFromSameBasePtr
This currently has a noticable effect on the kernel argument loads.
LDS and global loads are more problematic, I think because of how copies
are currently inserted to ensure that the address is a VGPR.

llvm-svn: 214942
2014-08-06 00:29:43 +00:00
Quentin Colombet
8e3912c486 [X86][SchedModel] Fixed some wrong scheduling model found by code inspection.
Source: Agner Fog's Instruction tables.

Related to <rdar://problem/15607571>

llvm-svn: 214940
2014-08-06 00:22:39 +00:00
Matt Arsenault
1ecd6214f0 R600/SI: Add definitions for ds_read2st64_ / ds_write2st64_
llvm-svn: 214936
2014-08-05 23:53:20 +00:00
JF Bastien
314c089ba3 Fix typos in comments and doc
Committing http://reviews.llvm.org/D4798 for Robin Morisset (morisset@google.com)

llvm-svn: 214934
2014-08-05 23:27:34 +00:00
Rafael Espindola
c981388b03 Remove a virtual function from TargetMachine. NFC.
llvm-svn: 214929
2014-08-05 22:10:21 +00:00
Jonathan Roelofs
7bcdffb32c Re-apply r214881: Fix return sequence on armv4 thumb
This reverts r214893, re-applying r214881 with the test case relaxed a bit to
satiate the build bots.

POP on armv4t cannot be used to change thumb state (unilke later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:

  POP {r3}
  ADD sp, #offset
  BX r3

This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:

  MOV ip, r3
  POP {r3}
  ADD sp, #offset
  MOV lr, r3
  MOV r3, ip
  BX lr

http://reviews.llvm.org/D4748

llvm-svn: 214928
2014-08-05 21:32:21 +00:00
Bill Schmidt
8159f9047b [PowerPC] Swap arguments and adjust shift count for vsldoi on little endian
Commits r213915 and r214718 fix recognition of shuffle masks for vmrg*
and vpku*um instructions for a little-endian target, by swapping the
input arguments.  The vsldoi instruction requires similar treatment,
and also needs its shift count adjusted for little endian.

Reviewed by Ulrich Weigand.

This is a bug fix candidate for release 3.5 (and hopefully the last of
those for PowerPC).

llvm-svn: 214923
2014-08-05 20:47:25 +00:00
Chandler Carruth
045f351a7f [x86] Fix a crasher due to shuffles which cancel each other out and add
a test case.

We also miscompile this test case which is showing a serious flaw in the
single-input v8i16 shuffle code. I've left the specific instruction
checks FIXME-ed out until I can address the bug in the single-input
code, but I wanted to separate out a significant functionality change to
produce correct code from a very simple and targeted crasher fix.

The miscompile problem stems from keeping track of inputs by value
rather than by index. As a consequence of doing this, we can't reliably
update those inputs because they might swap and we can't detect this
without copying the mask.

The blend code now uses indices for the input lists and this seems
strictly better. It also should make it easier to sort things and do
other cleanups. I think the time has come to simplify The Great Lambda
here.

llvm-svn: 214914
2014-08-05 18:45:49 +00:00
NAKAMURA Takumi
b1c451ece2 X86CodeEmitter.cpp: Add SEH_Epilogue to ignored list for legacy JIT, corresponding to r214775.
llvm-svn: 214905
2014-08-05 18:04:15 +00:00
Adam Nemet
83d4fdecfc [X86] Improve comments for r214888
A rebase somehow ate my comments. This restores them.

llvm-svn: 214903
2014-08-05 17:58:49 +00:00
Matt Arsenault
4273af4e86 R600/SI: Use register class instead of list of registers
I'm not sure if this has any consequence or not.

llvm-svn: 214902
2014-08-05 17:52:40 +00:00
Matt Arsenault
f63af80339 R600/SI: Add exec_lo and exec_hi subregisters.
This allows accessing an SReg subregister with a normal subregister
index, instead of getting a machine verifier error.

Also be sure to include all of these subregisters in SReg_32.
This fixes inferring SGPR instead of SReg when finding a
super register class.

llvm-svn: 214901
2014-08-05 17:52:37 +00:00
Jonathan Roelofs
5721e98d43 Revert r214881 because it broke lots of build-bots
llvm-svn: 214893
2014-08-05 17:36:05 +00:00
Adam Nemet
d956b84b6b [AVX512] Add masking variant and intrinsics for valignd/q
This is similar to what I did with the two-source permutation recently.  (It's
almost too similar so that we should consider generating the masking variants
with some tablegen help.)

Both encoding and intrinsic tests are added as well.  For the latter, this is
what the IR that the intrinsic test on the clang side generates.

Part of <rdar://problem/17688758>

llvm-svn: 214890
2014-08-05 17:23:04 +00:00
Adam Nemet
3d838a0ae1 [X86] Increase X86_MAX_OPERANDS from 5 to 6
This controls the number of operands in the disassembler's x86OperandSets
table.  The entries describe how the operand is encoded and its type.

Not to surprisingly 5 operands is insufficient for AVX512.  Consider
VALIGNDrrik in the next patch.  These are its operand specifiers:

  { /* 328 */
    { ENCODING_DUP, TYPE_DUP1 },
    { ENCODING_REG, TYPE_XMM512 },
    { ENCODING_WRITEMASK, TYPE_VK8 },
    { ENCODING_VVVV, TYPE_XMM512 },
    { ENCODING_RM_CD64, TYPE_XMM512 },
    { ENCODING_IB, TYPE_IMM8 },
  },

llvm-svn: 214889
2014-08-05 17:23:01 +00:00
Adam Nemet
6ce89969d7 [X86] Add lowering to VALIGN
This was currently part of lowering to PALIGNR with some special-casing to
make interlane shifting work.  Since AVX512F has interlane alignr (valignd/q)
and AVX512BW has vpalignr we need to support both of these *at the same time*,
e.g. for SKX.

This patch breaks out the common code and then add support to check both of
these lowering options from LowerVECTOR_SHUFFLE.

I also added some FIXMEs where I think the AVX512BW and AVX512VL additions
should probably go.

llvm-svn: 214888
2014-08-05 17:22:59 +00:00
Adam Nemet
3cbf71a23b [X86] Separate DAG node for valign and palignr
They have different semantics (valign is interlane while palingr is intralane)
and palingr is still needed even in the AVX512 context.  According to the
latest spec AVX512BW provides these.

llvm-svn: 214887
2014-08-05 17:22:55 +00:00
Adam Nemet
1a6d2ace74 [AVX512] alignr: Use suffix rather than name argument to multiclass
Again no functional change.  This prepares for the suffix to be used with the
intrinsic matching.

llvm-svn: 214886
2014-08-05 17:22:52 +00:00
Adam Nemet
b6384f0bad [AVX512] Pull everything alignr-related into the multiclass
The packed integer pattern becomes the DAG pattern for rri and the packed
float, another Pat<> inside the multiclass.

No functional change.

llvm-svn: 214885
2014-08-05 17:22:50 +00:00
Adam Nemet
04f70d285b Wrap long lines
llvm-svn: 214884
2014-08-05 17:22:47 +00:00
Jonathan Roelofs
32bc0b3143 Fix return sequence on armv4 thumb
POP on armv4t cannot be used to change thumb state (unilke later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:

  POP {r3}
  ADD sp, #offset
  BX r3

This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:

  MOV ip, r3
  POP {r3}
  ADD sp, #offset
  MOV lr, r3
  MOV r3, ip
  BX lr

http://reviews.llvm.org/D4748

llvm-svn: 214881
2014-08-05 17:13:17 +00:00
Joerg Sonnenberger
6eb6423316 Add accessors for the PPC 403 bank registers.
llvm-svn: 214875
2014-08-05 15:45:15 +00:00
Keith Walker
1855cb5c25 Specify that the thumb setend and blx <immed> instructions are not valid on an m-class target
llvm-svn: 214871
2014-08-05 15:11:59 +00:00