1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00
Commit Graph

30985 Commits

Author SHA1 Message Date
Bill Schmidt
015b0526cc [PowerPC 3/4] Little-endian adjustments for VSX vector shuffle
When performing instruction selection for ISD::VECTOR_SHUFFLE, there
is special code for handling v2f64 and v2i64 using VSX instructions.
This code must be adjusted for little-endian.  Because the two inputs
are treated as a double-wide register, we must swap their order for
little endian.  To get the appropriate mask elements to use with the
big-endian biased XXPERMDI instruction, we must reverse their order
and invert the bits.

A new test is added to test the 16 possible values of the shuffle
mask.  It is initially disabled for reasons specified in the test.  It
is re-enabled by patch 4/4.

llvm-svn: 223791
2014-12-09 16:52:29 +00:00
Bill Schmidt
8456365c7d [PowerPC 2/4] Little-endian adjustments for VSX insert/extract operations
For little endian, we need to make some straightforward adjustments in
the code expansions for scalar_to_vector and vector_extract of v2f64.
First, scalar_to_vector must place the scalar into vector element
zero.  However, our implementation of SUBREG_TO_REG will place it into
big-element vector element zero (high-order bits), and for little
endian we need it in the low-order bits.  The LE implementation splats
the high-order doubleword into the low-order doubleword.

Second, the meaning of (vector_extract x, 0) and (vector_extract x, 1)
must be reversed for similar reasons.

A new test is added that tests code generation for insertelement and
extractelement for both element 0 and element 1.  It is disabled in
this patch but enabled in patch 4/4, for reasons stated in the test.

llvm-svn: 223788
2014-12-09 16:43:32 +00:00
Robert Khasanov
8a2292f8b6 [AVX512] Added VPBROADCAST{BWDQ} (Load with Broadcast Integer Data from General Purpose Register) encodings for AVX512-BW/VL subsets
Added encoding tests.
        

llvm-svn: 223787
2014-12-09 16:38:41 +00:00
Bill Schmidt
cb25653e71 [PowerPC 1/4] Little-endian adjustments for VSX loads/stores
This patch addresses the inherent big-endian bias in the lxvd2x,
lxvw4x, stxvd2x, and stxvw4x instructions.  These instructions load
vector elements into registers left-to-right (with the first element
loaded into the high-order bits of the register), regardless of the
endian setting of the processor.  However, these are the only
vector memory instructions that permit unaligned storage accesses, so
we want to use them for little-endian.

To make this work, a lxvd2x or lxvw4x is replaced with an lxvd2x
followed by an xxswapd, which swaps the doublewords.  This works for
lxvw4x as well as lxvd2x, because for lxvw4x on an LE system the
vector elements are in LE order (right-to-left) within each
doubleword.  (Thus after lxvw2x of a <4 x float> the elements will
appear as 1, 0, 3, 2.  Following the swap, they will appear as 3, 2,
0, 1, as desired.)   For stores, an stxvd2x or stxvw4x is replaced
with an stxvd2x preceded by an xxswapd.

Introduction of extra swap instructions provides correctness, but
obviously is not ideal from a performance perspective.  Future patches
will address this with optimizations to remove most of the introduced
swaps, which have proven effective in other implementations.

The introduction of the swaps is performed during lowering of LOAD,
STORE, INTRINSIC_W_CHAIN, and INTRINSIC_VOID operations.  The latter
are used to translate intrinsics that specify the VSX loads and stores
directly into equivalent sequences for little endian.  Thus code that
uses vec_vsx_ld and vec_vsx_st does not have to be modified to be
ported from BE to LE.

We introduce new PPCISD opcodes for LXVD2X, STXVD2X, and XXSWAPD for
use during this lowering step.  In PPCInstrVSX.td, we add new SDType
and SDNode definitions for these (PPClxvd2x, PPCstxvd2x, PPCxxswapd).
These are recognized during instruction selection and mapped to the
correct instructions.

Several tests that were written to use -mcpu=pwr7 or pwr8 are modified
to disable VSX on LE variants because code generation changes with
this and subsequent patches in this set.  I chose to include all of
these in the first patch than try to rigorously sort out which tests
were broken by one or another of the patches.  Sorry about that.

The new test vsx-ldst-builtin-le.ll, and the changes to vsx-ldst.ll,
are disabled until LE support is enabled because of breakages that
occur as noted in those tests.  They are re-enabled in patch 4/4.

llvm-svn: 223783
2014-12-09 16:35:51 +00:00
Chandler Carruth
bc148c7e5c [x86] Fix the test to actually test things for the CPU names, add the
missing barcelona CPU which that test uncovered, and remove the 32-bit
x86 CPUs which I really wasn't prepared to audit and test thoroughly.

If anyone wants to clean up the 32-bit only x86 CPUs, go for it.

Also, if anyone else wants to try to de-duplicate the AMD CPUs, that'd
be cool, but from the looks of it wouldn't save as much as it did for
the Intel CPUs.

llvm-svn: 223774
2014-12-09 14:25:55 +00:00
Aaron Ballman
df401918b6 Removing an unused variable to silence a -Wunused-but-set-variable warning. NFC.
llvm-svn: 223773
2014-12-09 13:20:11 +00:00
Asiri Rathnayake
b61fafa4cd Fix modified immediate bug reported by MC Hammer.
Instructions of the form [ADD Rd, pc, #imm] are manually aliased
in processInstruction() to use ADR. To accomodate this, mod_imm handling
had to be tweaked a bit. Turns out it was the manual aliasing that must
be tweaked to accommodate mod_imms instead. More information about the
parsed instruction is available at the point where processInstruction()
is invoked, which makes it easier to detect a mod_imm at that point rather
than trying to detect a potential alias when a mod_imm is being prepped.
Added a test case and fixed some white spaces as well.

llvm-svn: 223772
2014-12-09 13:14:58 +00:00
Chandler Carruth
e916866a89 [x86] Bring some sanity to the x86 CPU processor definitions.
Notably, this adds simple micro-architecture names for the Intel CPU
variants, and defines the old 'core'-based names as aliases. GCC has
started to simplify their documented interface to use these names as
well, so it seems like we can start to converge on a consistent pattern.

I'd appreciate Intel double checking the entries that aren't yet
documented widely, especially Atom (Bonnell and Silvermont), Knights
Landing, and Skylake. But this change shouldn't break any existing
users.

Also, ran clang-format to re-format this code and it actually worked
(modulo a tiny bug) so hopefully we can start to stop thinking about
formatting this stuff.

llvm-svn: 223769
2014-12-09 10:58:36 +00:00
Elena Demikhovsky
b30aead98b AVX-512: Added some comments to ERI scalar intrinsics.
No functional change.

llvm-svn: 223761
2014-12-09 07:06:32 +00:00
Mohit K. Bhakkad
3f295dffb4 test commit (spelling correction)
llvm-svn: 223758
2014-12-09 06:31:07 +00:00
Michael Kuperstein
224c7d7edb [X86] Convert esp-relative movs of function arguments into pushes, step 1
This handles the simplest case for mov -> push conversion:
1. x86-32 calling convention, everything is passed through the stack.
2. There is no reserved call frame.
3. Only registers or immediates are pushed, no attempt to combine a mem-reg-mem sequence into a single PUSHmm.

Differential Revision: http://reviews.llvm.org/D6503

llvm-svn: 223757
2014-12-09 06:10:44 +00:00
Bill Schmidt
e2ee779bbb Restore r223709 as it was meant to be, and enable FeatureP8Vector for P8
llvm-svn: 223751
2014-12-09 03:02:48 +00:00
NAKAMURA Takumi
12ce662a42 Revert r223709, "[PowerPC]Activate FeatureVSX for the Power target", to unbreak bots.
CodeGen/PowerPC/vsx-p8.ll was failing.

  '+power8-vector' is not a recognized feature for this target (ignoring feature)
  llvm/test/CodeGen/PowerPC/vsx-p8.ll:33:14: error: expected string not found in input
  ; CHECK-REG: lxvw4x 34, 0, 3
               ^
  <stdin>:50:2: note: scanning from here
   .align 3
   ^
  <stdin>:61:2: note: possible intended match here
   lvx 3, 0, 3
   ^

llvm-svn: 223729
2014-12-09 01:03:27 +00:00
Tom Stellard
42e269acdb R600/SI: Set MayStore = 0 on MUBUF loads
llvm-svn: 223722
2014-12-09 00:03:54 +00:00
Tom Stellard
6c038de6d6 R600/SI: Move setting of the lds bit to the base MUBUF class
llvm-svn: 223721
2014-12-09 00:03:51 +00:00
Colin LeMahieu
0febbead81 [Hexagon] Removing old def versions and replacing usages with versions that have encodings.
llvm-svn: 223720
2014-12-08 23:55:43 +00:00
Colin LeMahieu
f45c4d630c [Hexagon] Adding any8, all8, and/or/xor/andn/orn/not predicate register forms, mask, and vitpack instructions and patterns.
llvm-svn: 223710
2014-12-08 23:07:59 +00:00
Bill Seurer
132393f6f8 [PowerPC]Activate FeatureVSX for the Power target
This change activates FeatureVSX for Power 7 and Power 8 in PPC.td.

http://reviews.llvm.org/D6570

llvm-svn: 223709
2014-12-08 23:07:12 +00:00
Hal Finkel
494145ce57 [PowerPC] Don't use a non-allocatable register to implement the 'cc' alias
GCC accepts 'cc' as an alias for 'cr0', and we need to do the same when
processing inline asm constraints. This had previously been implemented using a
non-allocatable register, named 'cc', that was listed as an alias of 'cr0', but
the infrastructure does not seem to support this properly (neither the register
allocator nor the scheduler properly accounts for the alias). Instead, we can
just process this as a naming alias inside of the inline asm
constraint-processing code, so we'll do that instead.

There are two regression tests, one where the post-RA scheduler did the wrong
thing with the non-allocatable alias, and one where the register allocator did
the wrong thing. Fixes PR21742.

llvm-svn: 223708
2014-12-08 22:54:22 +00:00
Colin LeMahieu
2cb1961db7 [Hexagon] Adding xtype doubleword add, sub, and, or, xor and patterns.
llvm-svn: 223702
2014-12-08 22:19:14 +00:00
Colin LeMahieu
d625c9eddf [Hexagon] Adding xtype doubleword comparisons. Removing unused multiclass.
llvm-svn: 223701
2014-12-08 21:56:47 +00:00
Colin LeMahieu
e3767ba748 [Hexagon] Adding xtype parity, min, minu, max, maxu instructions.
llvm-svn: 223693
2014-12-08 21:19:18 +00:00
Colin LeMahieu
5ae76f5dc5 [Hexagon] Adding xtype halfword add/sub ll/hl/lh/hh/sat/<<16 instructions.
llvm-svn: 223692
2014-12-08 20:33:01 +00:00
Matt Arsenault
99b77bb723 R600/SI: Move continue after checking s_mov_b32.
There's nothing else to bother trying to shrink these.

llvm-svn: 223686
2014-12-08 19:55:43 +00:00
Colin LeMahieu
663e36e9b8 [Hexagon] Adding add/sub with saturation. Removing unused def. Cleaning up shift patterns.
llvm-svn: 223680
2014-12-08 18:33:49 +00:00
Bruno Cardoso Lopes
915b66faf0 [CompactUnwind] Fix register encoding logic
Fix a compact unwind encoding logic bug which would try to encode
more callee saved registers than it should, leading to early bail out
in the encoding logic and abusive use of DWARF frame mode unnecessarily.

Also remove no-compact-unwind.ll which was testing the wrong thing
based on this bug and move it to valid 'compact unwind' tests. Added
other few more tests too.

llvm-svn: 223676
2014-12-08 18:18:32 +00:00
Tim Northover
4032b61c4b AArch64: treat HFAs containing "half" types as blocks too.
llvm-svn: 223669
2014-12-08 17:54:58 +00:00
Andrea Di Biagio
bdef9a5ae6 [X86] Improved tablegen patters for matching TZCNT/LZCNT.
Teach ISel how to match a TZCNT/LZCNT from a conditional move if the
condition code is X86_COND_NE.
Existing tablegen patterns only allowed to match TZCNT/LZCNT from a
X86cond with condition code equal to X86_COND_E. To avoid introducing
extra rules, I added an 'ImmLeaf' definition that checks if the
condition code is COND_E or COND_NE.

llvm-svn: 223668
2014-12-08 17:47:18 +00:00
Colin LeMahieu
ca3bb1aa32 [Hexagon] Adding combine reg, reg with predicated forms.
llvm-svn: 223667
2014-12-08 17:33:06 +00:00
Colin LeMahieu
d90979b168 [Hexagon] Adding packhl instruction.
llvm-svn: 223664
2014-12-08 17:01:18 +00:00
Daniel Sanders
9ef2117af8 [mips] Add Mips-specific CCIf's for accessing the MipsCCState. NFC.
Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6213

llvm-svn: 223662
2014-12-08 15:40:09 +00:00
Andrea Di Biagio
db2b7a7592 [X86] Improved lowering of packed v8i16 vector shifts by non-constant count.
Before this patch, the backend sub-optimally expanded the non-constant shift
count of a v8i16 shift into a sequence of two 'movd' plus 'movzwl'.

With this patch the backend checks if the target features sse4.1. If so, then
it lets the shuffle legalizer deal with the expansion of the shift amount.

Example:
;;
define <8 x i16> @test(<8 x i16> %A, <8 x i16> %B) {
  %shamt = shufflevector <8 x i16> %B, <8 x i16> undef, <8 x i32> zeroinitializer
  %shl = shl <8 x i16> %A, %shamt
  ret <8 x i16> %shl
}
;;

Before (with -mattr=+avx):
  vmovd  %xmm1, %eax
  movzwl  %ax, %eax
  vmovd  %eax, %xmm1
  vpsllw  %xmm1, %xmm0, %xmm0
  retq

Now:
  vpxor  %xmm2, %xmm2, %xmm2
  vpblendw  $1, %xmm1, %xmm2, %xmm1
  vpsllw  %xmm1, %xmm0, %xmm0
  retq

llvm-svn: 223660
2014-12-08 14:36:51 +00:00
Elena Demikhovsky
3492489267 X86 intrinsics moved form X86ISelLowering.cpp to X86IntrinsicsInfo.h
X86ISelLowering.cpp has a long switch for intrinsics. I moved a part of
this long switch to the new intrinsics table in X86IntrinsicsInfo.h.
No functional changes, just code and compile time optimization.

llvm-svn: 223641
2014-12-08 09:03:08 +00:00
Marek Olsak
156b10f50e R600/SI: Disable VMEM and SMEM clauses by breaking them with S_NOP
This is only a workaround.

llvm-svn: 223615
2014-12-07 17:17:43 +00:00
Marek Olsak
06992b08aa R600/SI: Set 20-bit immediate byte offset for SMRD on VI
llvm-svn: 223614
2014-12-07 17:17:38 +00:00
Marek Olsak
15875571f6 R600/SI: Update instruction conversions for VI
There are 3 changes:
- Convert 32-bit S_LSHL/LSHR/ASHR to their V_*REV variants for VI
- Lower RSQ_CLAMP for VI
- Don't generate MIN/MAX_LEGACY on VI

llvm-svn: 223604
2014-12-07 12:19:03 +00:00
Marek Olsak
73de9fa64e R600/SI: Add VI instructions
llvm-svn: 223603
2014-12-07 12:18:57 +00:00
Marek Olsak
ab8bc1e699 R600/SI: Add SCC Defs/Uses to SOP1 and SOP2 opcodes
llvm-svn: 223602
2014-12-07 12:18:45 +00:00
Benjamin Kramer
a1a4ea6fcb Make the DenseMap bucket type configurable and use a smaller bucket for DenseSet.
DenseSet used to be implemented as DenseMap<Key, char>, which usually doubled
the memory footprint of the map. Now we use a compressed set so the second
element uses no memory at all. This required some surgery on DenseMap as
all accesses to the bucket now have to go through methods; this should
have no impact on the behavior of DenseMap though. The new default bucket
type for DenseMap is a slightly extended std::pair as we expose it through
DenseMap's iterator and don't want to break any existing users.

llvm-svn: 223588
2014-12-06 19:22:44 +00:00
Tom Stellard
ef2e57c33a R600/SI: Restore PrivateGlobalPrefix to the default ELF value of ".L"
This was changed in r223323.

llvm-svn: 223579
2014-12-06 05:34:34 +00:00
Ahmed Bougacha
b46577058a [X86] Refactor PMOV[SZ]Xrm to add missing AVX2 patterns.
Most patterns will go away once the extload legalization changes land.

Differential Revision: http://reviews.llvm.org/D6125

llvm-svn: 223567
2014-12-06 01:31:07 +00:00
Tim Northover
2195966715 AArch64: use explicit MVT::i64 when creating EXTRACT_SUBVECTOR nodes.
All our patterns use MVT::i64, but the ISelLowering nodes were inconsistent in
their choice.

No functional change.

llvm-svn: 223551
2014-12-06 00:33:37 +00:00
Ahmed Bougacha
a614254259 [X86] Cleanup FCOPYSIGN lowering. NFC intended.
llvm-svn: 223542
2014-12-05 23:11:36 +00:00
Colin LeMahieu
8ca972a974 [Hexagon] Relocating logical instructions and templates later in the td file.
llvm-svn: 223523
2014-12-05 21:51:12 +00:00
Colin LeMahieu
07e42127a4 [Hexagon] Adding sub/and/or reg, imm forms
llvm-svn: 223522
2014-12-05 21:38:29 +00:00
Sanjay Patel
88b824a8d3 Optimize merging of scalar loads for 32-byte vectors [X86, AVX]
Fix the poor codegen seen in PR21710 ( http://llvm.org/bugs/show_bug.cgi?id=21710 ).
Before we crack 32-byte build vectors into smaller chunks (and then subsequently
glue them back together), we should look for the easy case where we can just load
all elements in a single op.

An example of the codegen change is:

From:

vmovss  16(%rdi), %xmm1
vmovups (%rdi), %xmm0
vinsertps       $16, 20(%rdi), %xmm1, %xmm1
vinsertps       $32, 24(%rdi), %xmm1, %xmm1
vinsertps       $48, 28(%rdi), %xmm1, %xmm1
vinsertf128     $1, %xmm1, %ymm0, %ymm0
retq

To:

vmovups (%rdi), %ymm0
retq

Differential Revision: http://reviews.llvm.org/D6536

llvm-svn: 223518
2014-12-05 21:28:14 +00:00
Colin LeMahieu
4f286f5f23 [Hexagon] Updating mux_ir/ri/ii/rr with encoding bits
llvm-svn: 223515
2014-12-05 21:09:27 +00:00
Jan Wen Voung
b856ac92dc Use 32-bit ebp for NaCl64 in a limited case: llvm.frameaddress.
Summary:
Follow up to [x32] "Use ebp/esp as frame and stack pointer":
http://reviews.llvm.org/D4617

In that earlier patch, NaCl64 was made to always use rbp.
That's needed for most cases because rbp should hold a full
64-bit address within the NaCl sandbox so that load/stores
off of rbp don't require sandbox adjustment (zeroing the top
32-bits, then filling those by adding r15).

However, llvm.frameaddress returns a pointer and pointers
are 32-bit for NaCl64. In this case, use ebp instead, which
will make the register copy type check. A similar mechanism
may be needed for llvm.eh.return, but is not added in this change.

Test Plan: test/CodeGen/X86/frameaddr.ll

Reviewers: dschuff, nadav

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D6514

llvm-svn: 223510
2014-12-05 20:55:53 +00:00
Bill Seurer
b4d665d454 [PowerPC]Add VSX loads/stores to fastisel for PPC target
This patch adds VSX floating point loads and stores to fastisel.

Along with the change to tablegen (D6220), VSX instructions are now fully supported in fastisel.

http://reviews.llvm.org/D6274

llvm-svn: 223507
2014-12-05 20:15:56 +00:00
Colin LeMahieu
ea4694aa6c [Hexagon] Adding tfrih/l instructions.
llvm-svn: 223506
2014-12-05 20:07:19 +00:00