Commit Graph

11301 Commits

Sanjay Patel
019be40c8c Enable FeatureFastUAMem for btver2
Allow unaligned 16-byte memop codegen for btver2. No functional changes for any other subtargets.
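As a rough sketch (not from the commit; function and value names are hypothetical), a small memcpy with no alignment information, like the one below, can now be lowered on btver2 as a single unaligned 16-byte load/store pair (e.g. movups):

  declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1)

  define void @copy16(i8* %dst, i8* %src) {
    ; 16 bytes, align 1: previously split into smaller accesses on btver2
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 16, i32 1, i1 false)
    ret void
  }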

Replace the existing supposed small memcpy test with an actual test of a small memcpy. 
The previous test wasn't using FileCheck either.

This patch should allow us to close PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6360

llvm-svn: 222925
2014-11-28 18:40:18 +00:00
Tim Northover
e8b34aaff0 AArch64: treat [N x Ty] as a block during procedure calls.
The AAPCS treats small structs and homogeneous floating (or vector) aggregates
specially, and guarantees they either get passed as a contiguous block of
registers, or prevent any future use of those registers and get passed on the
stack.

This concept can fit quite neatly into LLVM's own type system, mapping an HFA
to [N x float] and so on, and small structs to [N x i64]. Doing so allows
front-ends to emit AAPCS compliant code without having to duplicate the
register counting logic.
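For example (an illustrative sketch, not taken from the commit), a front-end could pass an HFA of four floats and a small two-word struct as:

  declare void @takes_hfa([4 x float])
  declare void @takes_small_struct([2 x i64])

Each [N x Ty] argument is then guaranteed to land either entirely in a contiguous block of registers or entirely on the stack.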

llvm-svn: 222903
2014-11-27 21:02:42 +00:00
Charlie Turner
e4a6c7fd89 Stop uppercasing build attribute data.
The string data for string-valued build attributes were being unconditionally
uppercased. There is no mention in the ARM ABI addenda about case conventions,
so it's technically implementation defined as to whether the data are
capitalised in some way or not. However, there are good reasons not to
capitalise the data.

  * It's less work.
  * Some vendors may legitimately have case-sensitive checks for these
    attributes which would fail on LLVM generated object files.
  * There could be locale issues with uppercasing.

The original reasons for uppercasing appear to have stemmed from an
old CodeSourcery toolchain behaviour, see

http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/87133

With this patch the emitted object file no longer capitalises string
data; the data are encoded as seen in the assembly source.

Change-Id: Ibe20dd6e60d2773d57ff72a78470839033aa5538
llvm-svn: 222882
2014-11-27 12:13:56 +00:00
Will Newton
da953f13b9 Update AArch64 ELF relocations to ABI 1.0
This mostly entails adding relocations, however there are a couple of
changes to existing relocations:

1. R_AARCH64_NONE is defined to be zero rather than 256

R_AARCH64_NONE has been defined to be zero for a long time elsewhere
e.g. binutils and glibc since the submission of the AArch64 port in
2012 so this is required for compatibility.

2. R_AARCH64_TLSDESC_ADR_PAGE renamed to R_AARCH64_TLSDESC_ADR_PAGE21

I don't think there is any way for relocation names to leak out of LLVM
so this should not break anything.

Tested with check-all with no regressions.

llvm-svn: 222821
2014-11-26 10:49:18 +00:00
Elena Demikhovsky
868b76ae69 AVX-512: Scalar ERI intrinsics
including SAE mode and memory operand.
Added the AVX512_maskable_scalar template, which should cover all scalar instructions in the future.

The main difference between AVX512_maskable_scalar<> and AVX512_maskable<> is using X86select instead of vselect. I need it because I can't create a vselect node for an MVT::i1 mask for a scalar instruction.

http://reviews.llvm.org/D6378

llvm-svn: 222820
2014-11-26 10:46:49 +00:00
Simon Pilgrim
0e1c44b939 [X86][SSE] Improvements to byte shift shuffle matching
Since the (v)pslldq / (v)psrldq instructions resolve to a single input argument, it is useful to match them much earlier than we currently do - this prevents more complicated shuffles (notably insertion into a zero vector) from matching before them.
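As a sketch of the pattern being matched (illustrative IR, not from the patch), a shuffle that shifts bytes down and fills the tail with zeros corresponds directly to a single psrldq:

  ; equivalent to psrldq $4 on %a: shift right by 4 bytes, zero-fill the top
  %s = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer,
         <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11,
                     i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19>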

Differential Revision: http://reviews.llvm.org/D6409

llvm-svn: 222796
2014-11-25 22:34:59 +00:00
Cameron McInally
c32dadfa69 [AVX512] Add 512b integer shift by variable intrinsics and patterns.
llvm-svn: 222786
2014-11-25 20:41:51 +00:00
Zoran Jovanovic
c3664f6f8a [mips][micromips] Use call instructions with short delay slots
Differential Revision: http://reviews.llvm.org/D6338

llvm-svn: 222752
2014-11-25 10:50:00 +00:00
Juergen Ributzka
c90ddb75a2 [FastISel][AArch64] Fix and extend the tbz/tbnz pattern matching.
The pattern matching failed to recognize all instances of "-1", because when
comparing against "-1" we didn't use an APInt of the same bitwidth.

This commit fixes this and also adds inverse versions of the condition to catch
more cases.

llvm-svn: 222722
2014-11-25 04:16:15 +00:00
Hal Finkel
515f6e50f5 [PowerPC] Implement combineRepeatedFPDivisors
This does not matter on newer cores (where we can use reciprocal estimates in
fast-math mode anyway), but for older cores this allows us to generate better
fast-math code where we have multiple FDIVs with a common divisor.

llvm-svn: 222710
2014-11-24 23:45:21 +00:00
Andrea Di Biagio
3646b17160 [X86] Improved target specific combine on VSELECT dag nodes.
This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to
convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1.
On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd.

Also, removed a target specific combine that performed a premature lowering of
VSELECT nodes to target specific MOVSS/MOVSD nodes.

llvm-svn: 222647
2014-11-24 12:23:15 +00:00
Michael Kuperstein
b12b19a24a [X86] Fixes bug in build_vector v4x32 lowering
r222375 made some improvements to build_vector lowering of v4i32 and v4f32 into an insertps, but it missed a case where:

1. A single extracted element is used twice.
2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask.

This caused a crash, since the source value for the insertps ends up uninitialized.
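A sketch of the problematic shape (illustrative IR, names hypothetical): a single extracted element inserted into two lanes, with the remaining lanes zero:

  %e = extractelement <4 x float> %x, i32 1
  %v0 = insertelement <4 x float> undef, float %e, i32 0
  %v1 = insertelement <4 x float> %v0, float %e, i32 2
  %v2 = insertelement <4 x float> %v1, float 0.0, i32 1
  %v3 = insertelement <4 x float> %v2, float 0.0, i32 3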

Differential Revision: http://reviews.llvm.org/D6377

llvm-svn: 222635
2014-11-23 13:09:06 +00:00
Elena Demikhovsky
36a2243ab7 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
declare <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
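A usage sketch following the shapes declared above (the mask computation is hypothetical):

  %mask = icmp sgt <16 x i32> %trigger, zeroinitializer
  %val = call <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4, <16 x i1> %mask)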

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 222632
2014-11-23 08:07:43 +00:00
Matt Arsenault
1b03538afe R600: Fix extloads of i1 on R600/Evergreen
llvm-svn: 222631
2014-11-23 02:57:54 +00:00
Matt Arsenault
5edd328299 R600/SI: Add additional tests for i1 loads
llvm-svn: 222629
2014-11-23 02:57:50 +00:00
Matt Arsenault
1d9ec67967 R600/SI: Fix broken check lines and modernize prefixes
Use -LABEL and remove -CHECK

llvm-svn: 222628
2014-11-23 02:57:49 +00:00
Matt Arsenault
a30225b261 R600/SI: Fix missing -verify-machineinstrs on a test
llvm-svn: 222627
2014-11-23 02:57:47 +00:00
Chandler Carruth
4c0a2a8001 [x86] Add some tests for a common unpack pattern of vector shuffle that
has a remarkably unique and efficient lowering.

While we get this some of the time already, we miss a few cases and
there wasn't a principled reason we got it. We should at least test
this. v8 already has tests for this pattern.

llvm-svn: 222607
2014-11-22 05:44:43 +00:00
Tom Stellard
3929e69978 R600/SI: Add a failing test case for offset order in ds_read2 instructions
llvm-svn: 222585
2014-11-21 22:31:47 +00:00
Tom Stellard
a112fe4e40 R600/SI: Emit s_mov_b32 m0, -1 before every DS instruction
This s_mov_b32 will write to a virtual register from the M0Reg
class, and all the DS instructions now take an extra explicit M0Reg
argument.

This change is necessary to prevent issues with the scheduler
mixing together instructions that expect different values in the m0
registers.

llvm-svn: 222583
2014-11-21 22:31:44 +00:00
Tom Stellard
484f10138e R600/SI: Add SIFoldOperands pass
This pass attempts to fold the source operands of mov and copy
instructions into their uses.

llvm-svn: 222581
2014-11-21 22:06:37 +00:00
Jozef Kolek
52fa965cf8 [mips][microMIPS] This patch implements functionality in the MIPS delay
slot filler such that if the filler would have to put a NOP instruction into the
delay slot of a microMIPS BEQ or BNE instruction that uses the register $0,
then instead of emitting a NOP the instruction is replaced by the corresponding
microMIPS compact branch instruction, i.e. BEQZC or BNEZC.

Differential Revision: http://reviews.llvm.org/D3566

llvm-svn: 222580
2014-11-21 22:04:35 +00:00
Tom Stellard
267a21d6d7 R600/SI: Use hex notation for constant in test
llvm-svn: 222578
2014-11-21 22:00:13 +00:00
Sanjay Patel
776e5485fb Add a feature flag for slow 32-byte unaligned memory accesses [x86].
This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for 
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.

Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6355

llvm-svn: 222544
2014-11-21 17:40:04 +00:00
Chandler Carruth
5e598c0342 [x86] Restructure the checking patterns for v16 and v32 avx2 vector
shuffle lowering to allow much better blend matching.

Specifically, with the new structure the code seems clearer to me and we
correctly can hit the cases where merging two 128-bit lanes is a clear
win and can be shuffled cheaply afterward.

llvm-svn: 222539
2014-11-21 14:53:03 +00:00
Chandler Carruth
7491f1f32f [x86] Make the previous logic significantly less conservative and get
a bunch more improvements.

Non-lane-crossing is fine, the key is that lane merging only makes sense
for single-input shuffles. Not sure why I got so turned around here. The
code all works, I was just using the wrong model for it.

This only updates v4 and v8 lowering. The v16 and v32 lowering requires
restructuring the entire check sequence.

llvm-svn: 222537
2014-11-21 14:33:24 +00:00
Andrea Di Biagio
0a8cf1ad5a [DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero.
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.

This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.
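For instance (a sketch, not from the commit), a build_vector of the form <a[0], 0.0, a[2], 0.0> can be expressed at the IR level as a shuffle with a zero vector, provided the mask is legal for the target:

  ; lanes 0 and 2 come from %a; lanes 1 and 3 select a zero element
  %r = shufflevector <4 x float> %a, <4 x float> zeroinitializer,
         <4 x i32> <i32 0, i32 4, i32 2, i32 4>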

llvm-svn: 222536
2014-11-21 14:32:06 +00:00
Chandler Carruth
8387bec088 [x86] Teach the x86 vector shuffle lowering to detect mergable 128-bit
lanes.

By special casing these we can often either reduce the total number of
shuffles significantly or reduce the number of (high latency on Haswell)
AVX2 shuffles that potentially cross 128-bit lanes. Even when these
don't actually cross lanes, they have much higher latency to support
that. Doing two of them and a blend is worse than doing a single insert
across the 128-bit lanes to blend and then doing a single interleaved
shuffle.

While this seems like a narrow case, it kept cropping up on me and the
difference is *huge* as you can see in many of the test cases. I first
hit this trying to perfectly fix the interleaving shuffle patterns used
by Halide for AVX2.

llvm-svn: 222533
2014-11-21 13:56:05 +00:00
Chandler Carruth
5646862a2e [x86] Remove more windows line endings that slipped into this file...
llvm-svn: 222528
2014-11-21 12:33:46 +00:00
Chandler Carruth
2db7c4cf32 [x86] Add a bunch of test cases to 256-bit shuffles that exercise
merging 128-bit subvectors and also shuffling all the elements of those
subvectors. Currently we generate pretty bad code for many of these, but
I'm testing a patch that should dramatically improve this in addition to
making the shuffle lowering robust to other changes.

llvm-svn: 222525
2014-11-21 12:17:50 +00:00
Alexey Volkov
235268b4ed [X86] For Silvermont CPU use 16-bit division instead of 64-bit for small positive numbers
Differential Revision: http://reviews.llvm.org/D5938

llvm-svn: 222521
2014-11-21 11:19:34 +00:00
Hao Liu
9cb82be410 DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor into FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)

A hook is added to allow the target to control whether it needs to do such a combine.
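In IR terms the combine looks roughly like this (a sketch; the fast-math flag is required):

  ; before
  %x = fdiv fast float %a, %D
  %y = fdiv fast float %b, %D
  ; after
  %recip = fdiv fast float 1.0, %D
  %x2 = fmul fast float %a, %recip
  %y2 = fmul fast float %b, %recip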

Reviewed in http://reviews.llvm.org/D6334

llvm-svn: 222510
2014-11-21 06:39:58 +00:00
Hal Finkel
ac26448a5c [PPC] Use SeparateConstOffsetFromGEP
This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in
the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on
SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM,
there is a store moved out of the inner loop) and a potential speedup on
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it
makes some code look cleaner, and synchronizing the backends in this regard
seems like a generally good thing.

llvm-svn: 222504
2014-11-21 04:35:51 +00:00
Quentin Colombet
8fea50c066 [X86] Do not custom lower UINT_TO_FP when the target type does not
match the custom lowering.

<rdar://problem/19026326>

llvm-svn: 222489
2014-11-21 00:47:19 +00:00
Saleem Abdulrasool
0de13e90eb X86: use the correct alloca symbol for Windows Itanium
Windows itanium targets the MSVCRT, and the stack probe symbol is provided by
MSVCRT.  This corrects the emission of stack probes on i686-windows-itanium.

llvm-svn: 222439
2014-11-20 18:01:26 +00:00
Andrea Di Biagio
b770dd344e [X86] Improved lowering of v4x32 build_vector dag nodes.
This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes
that are known to have at least two non-zero elements.

With this patch, a build_vector that performs a blend with zero is 
converted into a shuffle. This is done to let the shuffle legalizer expand
the dag node in an optimal way. For example, if we know that a build_vector
performs a blend with zero, we can try to lower it as a movq/blend instead of
always selecting an insertps.

This patch also improves the logic that lowers a build_vector into an insertps
with zero masking. See, for example, the extra test cases added to the sse41.ll test.
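A sketch of the blend-with-zero case (illustrative IR, not from the patch): the build_vector <%a, %b, 0, 0> fills only the low 64 bits and zeros the rest, so it can lower as a movq instead of a chain of insertps:

  %v0 = insertelement <4 x i32> undef, i32 %a, i32 0
  %v1 = insertelement <4 x i32> %v0, i32 %b, i32 1
  %v2 = insertelement <4 x i32> %v1, i32 0, i32 2
  %v3 = insertelement <4 x i32> %v2, i32 0, i32 3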

Differential Revision: http://reviews.llvm.org/D6311

llvm-svn: 222375
2014-11-19 19:34:29 +00:00
Tom Stellard
271c0a936e R600/SI: Make SIInstrInfo::isOperandLegal() more strict
A register operand that has a common sub-class with its instruction's
defined register class is not always legal.  For example,
SReg_32 and M0Reg both have a common sub-class, but we can't
use an SReg_32 in instructions that expect a M0Reg.

This prevents the llvm.SI.sendmsg.ll test from failing when the fold
operand pass is added.

llvm-svn: 222368
2014-11-19 16:58:49 +00:00
Jozef Kolek
d19675f448 [mips][microMIPS] Implement CodeGen support for 16-bit instruction ADDIUR2.
Differential Revision: http://reviews.llvm.org/D5800

llvm-svn: 222352
2014-11-19 13:23:58 +00:00
Jozef Kolek
9fbf00198c [mips][microMIPS] Implement CodeGen support for ADDIUS5 instruction.
Differential Revision: http://reviews.llvm.org/D5799

llvm-svn: 222351
2014-11-19 13:11:09 +00:00
Jozef Kolek
e466cd5b54 [mips][microMIPS] Implement SDBBP and RDHWR instructions.
Differential Revision: http://reviews.llvm.org/D5240

llvm-svn: 222347
2014-11-19 11:25:50 +00:00
Simon Pilgrim
e5f972f1c1 [X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2
This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.

I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.

Differential Revision: http://reviews.llvm.org/D5699

llvm-svn: 222340
2014-11-19 10:06:49 +00:00
Hao Liu
00d285aca3 [AArch64] Enable SeparateConstOffsetFromGEP, EarlyCSE and LICM passes on AArch64 backend.
SeparateConstOffsetFromGEP can give more optimization opportunities related to GEPs, which benefits EarlyCSE
and LICM. By enabling these passes we can have better address calculations and generate better addressing
modes. Some SPEC 2006 benchmarks (astar, gobmk, namd) show obvious improvements on Cortex-A57.
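Roughly, the pass splits a constant offset out of a GEP index so it can fold into the addressing mode (a sketch, using the typed-pointer GEP syntax of this era):

  ; before: the constant 4 is buried in the index computation
  %idx = add i64 %i, 4
  %p = getelementptr inbounds float* %base, i64 %idx
  ; after: the constant becomes its own GEP
  %p0 = getelementptr inbounds float* %base, i64 %i
  %p = getelementptr inbounds float* %p0, i64 4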

Reviewed in http://reviews.llvm.org/D5864.

llvm-svn: 222331
2014-11-19 06:39:53 +00:00
Weiming Zhao
c7ce2ee93f [Aarch64] Custom lowering of CTPOP to SIMD should check for NEON availability
llvm-svn: 222292
2014-11-19 00:29:14 +00:00
Matt Arsenault
73f4bd8758 R600/SI: Implement areMemAccessesTriviallyDisjoint
This partially makes up for not having address spaces
used for alias analysis in some simple cases.

This is not yet enabled by default so shouldn't change anything yet.

llvm-svn: 222286
2014-11-19 00:01:31 +00:00
Simon Pilgrim
daabed160f [X86][AVX] 256-bit vector stack unaligned load/stores identification
Under many circumstances the stack is not 32-byte aligned, resulting in the use of the vmovups/vmovupd/vmovdqu instructions when inserting ymm reloads/spills.

This minor patch adds these instructions to the isFrameLoadOpcode/isFrameStoreOpcode helpers so that they can be correctly identified and not be treated as folded reloads/spills.

This has also been noticed by http://llvm.org/bugs/show_bug.cgi?id=18846 where it was causing redundant spills - I've added a reduced test case at test/CodeGen/X86/pr18846.ll

Differential Revision: http://reviews.llvm.org/D6252

llvm-svn: 222281
2014-11-18 23:38:19 +00:00
Chad Rosier
ccf41a5c21 [FastISel][AArch64] Also allow folding of sign-/zero-extend and arithmetic
shift-right for booleans (i1).

Arithmetic shift-right immediate with sign-/zero-extensions also works for
boolean values.  Update the assert and the test cases to reflect that fact.

llvm-svn: 222272
2014-11-18 22:41:49 +00:00
Chad Rosier
7153154a79 [FastISel][AArch64] Also allow folding of sign-/zero-extend and logical
shift-right for booleans (i1).

Logical shift-right immediate with sign-/zero-extensions also works for boolean
values.  Update the assert and the test cases to reflect that fact.

llvm-svn: 222270
2014-11-18 22:38:42 +00:00
Juergen Ributzka
b3791ee3a7 [FastISel][AArch64] Follow-up fix for "Fix shift-immediate emission for "zero" shifts."
Shifts also perform sign-/zero-extends to larger types, which requires us to emit
an integer extend instead of a simple COPY.

Related to PR21594.

llvm-svn: 222257
2014-11-18 21:20:17 +00:00
Matt Arsenault
84d2214a94 R600/SI: Move SIFixSGPRCopies to inst selector passes
This should expose more of the actually used VALU
instructions to the machine optimization passes.

This also should help getting i1 handling into a better state.
For reasons that are not entirely understood, this fixes the split-scalar-i64-add.ll
test, where a 64-bit add would only partially be moved to the VALU,
resulting in use of an undefined VCC.

llvm-svn: 222256
2014-11-18 21:06:58 +00:00
Tom Stellard
962ccd7f85 R600/SI: Make sure resource descriptors are always stored in SGPRs
llvm-svn: 222253
2014-11-18 20:39:39 +00:00