1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00
Commit Graph

30778 Commits

Author SHA1 Message Date
Sanjay Patel
776e5485fb Add a feature flag for slow 32-byte unaligned memory accesses [x86].
This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for 
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.

Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6355

llvm-svn: 222544
2014-11-21 17:40:04 +00:00
Chandler Carruth
5e598c0342 [x86] Restructure the checking patterns for v16 and v32 avx2 vector
shuffle lowering to allow much better blend matching.

Specifically, with the new structure the code seems clearer to me and we
correctly can hit the cases where merging two 128-bit lanes is a clear
win and can be shuffled cheaply afterward.

llvm-svn: 222539
2014-11-21 14:53:03 +00:00
Chandler Carruth
7491f1f32f [x86] Make the previous logic significantly less conservative and get
a bunch more improvements.

Non-lane-crossing is fine, the key is that lane merging only makes sense
for single-input shuffles. Not sure why I got so turned around here. The
code all works, I was just using the wrong model for it.

This only updates v4 and v8 lowering. The v16 and v32 lowering requires
restructuring the entire check sequence.

llvm-svn: 222537
2014-11-21 14:33:24 +00:00
Chandler Carruth
8387bec088 [x86] Teach the x86 vector shuffle lowering to detect mergable 128-bit
lanes.

By special casing these we can often either reduce the total number of
shuffles significantly or reduce the number of (high latency on Haswell)
AVX2 shuffles that potentially cross 128-bit lanes. Even when these
don't actually cross lanes, they have much higher latency to support
that. Doing two of them and a blend is worse than doing a single insert
across the 128-bit lanes to blend and then doing a single interleaved
shuffle.

While this seems like a narrow case, it kept cropping up on me and the
difference is *huge* as you can see in many of the test cases. I first
hit this trying to perfectly fix the interleaving shuffle patterns used
by Halide for AVX2.

llvm-svn: 222533
2014-11-21 13:56:05 +00:00
Alexey Volkov
235268b4ed [X86] For Silvermont CPU use 16-bit division instead of 64-bit for small positive numbers
Differential Revision: http://reviews.llvm.org/D5938

llvm-svn: 222521
2014-11-21 11:19:34 +00:00
NAKAMURA Takumi
3cd455da60 Add LLVMScalarOpts to LLVMPowerPCCodeGen.
llvm-svn: 222516
2014-11-21 09:14:45 +00:00
Hao Liu
9cb82be410 DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)

A hook is added to allow the target to control whether it needs to do such combine.

Reviewed in http://reviews.llvm.org/D6334

llvm-svn: 222510
2014-11-21 06:39:58 +00:00
Craig Topper
45dffff5e4 Remove a bunch of unnecessary typecasts to 'const TargetRegisterClass *'
llvm-svn: 222509
2014-11-21 05:58:21 +00:00
Hal Finkel
ac26448a5c [PPC] Use SeparateConstOffsetFromGEP
This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in
the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on
SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM,
there is a store moved out of the inner loop) and a potential speedup on
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it
makes some code look cleaner, and synchronizing the backends in this regard
seems like a generally good thing.

llvm-svn: 222504
2014-11-21 04:35:51 +00:00
Quentin Colombet
8fea50c066 [X86] Do not custom lower UINT_TO_FP when the target type does not
match the custom lowering.

<rdar://problem/19026326>

llvm-svn: 222489
2014-11-21 00:47:19 +00:00
Reid Kleckner
dbf3d8a5a4 Fix more instances of -Wsentinel on Windows with s/NULL/nullptr/
Follow up to r221940, where I must not have caught em all. NFC

llvm-svn: 222481
2014-11-20 23:51:47 +00:00
Reid Kleckner
6a21619ebc Add out of line virtual destructors to all LLVMTargetMachine subclasses
These recently all grew a unique_ptr<TargetLoweringObjectFile> member in
r221878.  When anyone calls a virtual method of a class, clang-cl
requires all virtual methods to be semantically valid. This includes the
implicit virtual destructor, which triggers instantiation of the
unique_ptr destructor, which fails because the type being deleted is
incomplete.

This is just part of the ongoing saga of PR20337, which is affecting
Blink as well. Because the MSVC ABI doesn't have key functions, we end
up referencing the vtable and implicit destructor on any virtual call
through a class. We don't actually end up emitting the dtor, so it'd be
good if we could avoid this unneeded type completion work.

llvm-svn: 222480
2014-11-20 23:37:18 +00:00
Mehdi Amini
fe9410a47b Update Makefile following directory removal in r222466
llvm-svn: 222475
2014-11-20 22:48:24 +00:00
Colin LeMahieu
ff9ce82394 [Hexagon] [NFC] Merging InstPrinter directory in to MCTargetDesc since they have a circular dependency.
llvm-svn: 222458
2014-11-20 21:56:35 +00:00
Saleem Abdulrasool
0de13e90eb X86: use the correct alloca symbol for Windows Itanium
Windows itanium targets the MSVCRT, and the stack probe symbol is provided by
MSVCRT.  This corrects the emission of stack probes on i686-windows-itanium.

llvm-svn: 222439
2014-11-20 18:01:26 +00:00
Jyoti Allur
0aaf89456e [ELF] Prevent ARM ELF object writer from generating deprecated relocation code R_ARM_PLT32
llvm-svn: 222414
2014-11-20 05:58:11 +00:00
Craig Topper
0f032e3130 Fix a typo in a comment.
llvm-svn: 222412
2014-11-20 05:22:37 +00:00
Colin LeMahieu
384b462d47 [Hexagon] Adding A2_xor instruction with IR selection pattern and test.
llvm-svn: 222399
2014-11-19 23:22:23 +00:00
Colin LeMahieu
35e8a8aa73 [Hexagon] Adding A2_or instruction with IR selection pattern and test.
llvm-svn: 222396
2014-11-19 22:58:04 +00:00
Andrea Di Biagio
b770dd344e [X86] Improved lowering of v4x32 build_vector dag nodes.
This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes
that are known to have at least two non-zero elements.

With this patch, a build_vector that performs a blend with zero is 
converted into a shuffle. This is done to let the shuffle legalizer expand
the dag node in a optimal way. For example, if we know that a build_vector
performs a blend with zero, we can try to lower it as a movq/blend instead of
always selecting an insertps.

This patch also improves the logic that lowers a build_vector into a insertps
with zero masking. See for example the extra test cases added to test sse41.ll.

Differential Revision: http://reviews.llvm.org/D6311

llvm-svn: 222375
2014-11-19 19:34:29 +00:00
Tom Stellard
271c0a936e R600/SI: Make SIInstrInfo::isOperandLegal() more strict
A register operand that has a common sub-class with its instruction's
defined register class is not always legal.  For example,
SReg_32 and M0Reg both have a common sub-class, but we can't
use an SReg_32 in instructions that expect a M0Reg.

This prevents the llvm.SI.sendmsg.ll test from failing when the fold
operand pass is added.

llvm-svn: 222368
2014-11-19 16:58:49 +00:00
Zoran Jovanovic
ebf19d975c [mips][micromips] Implement SWM32 and LWM32 instructions
Differential Revision: http://reviews.llvm.org/D5519

llvm-svn: 222367
2014-11-19 16:44:02 +00:00
Jozef Kolek
2b6a42be6d [mips][microMIPS] Fix opcodes of MFHC1 and MTHC1 instructions.
Differential Revision: http://reviews.llvm.org/D6169

llvm-svn: 222355
2014-11-19 13:37:51 +00:00
Jozef Kolek
d19675f448 [mips][microMIPS] Implement CodeGen support for 16-bit instruction ADDIUR2.
Differential Revision: http://reviews.llvm.org/D5800

llvm-svn: 222352
2014-11-19 13:23:58 +00:00
Jozef Kolek
9fbf00198c [mips][microMIPS] Implement CodeGen support for ADDIUS5 instruction.
Differential Revision: http://reviews.llvm.org/D5799

llvm-svn: 222351
2014-11-19 13:11:09 +00:00
Jozef Kolek
0de52b5b97 [mips][microMIPS] Implement LWXS instruction.
Differential Revision: http://reviews.llvm.org/D5407

llvm-svn: 222348
2014-11-19 11:39:12 +00:00
Jozef Kolek
e466cd5b54 [mips][microMIPS] Implement SDBBP and RDHWR instructions.
Differential Revision: http://reviews.llvm.org/D5240

llvm-svn: 222347
2014-11-19 11:25:50 +00:00
Simon Pilgrim
e5f972f1c1 [X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2
This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.

I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.

Differential Revision: http://reviews.llvm.org/D5699

llvm-svn: 222340
2014-11-19 10:06:49 +00:00
David Blaikie
60e6c80905 Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
Hao Liu
f7e0bd2878 [AArch64] Disable useAA for Cortex-A57.
Using AA during CodeGen is very useful for in-order cores. It is less useful for ooo cores. Also I find
enabling useAA for Cortex-A57 may generate worse code for some test cases. If useAA in codegen is improved 
and benefical for ooo cores, we can enable it again.

llvm-svn: 222333
2014-11-19 06:48:56 +00:00
Hao Liu
00d285aca3 [AArch64] Enable SeparateConstOffsetFromGEP, EarlyCSE and LICM passes on AArch64 backend.
SeparateConstOffsetFromGEP can gives more optimizaiton opportunities related to GEPs, which benefits EarlyCSE
and LICM. By enabling these passes we can have better address calculations and generate a better addressing
mode. Some SPEC 2006 benchmarks (astar, gobmk, namd) have obvious improvements on Cortex-A57.

Reviewed in http://reviews.llvm.org/D5864.

llvm-svn: 222331
2014-11-19 06:39:53 +00:00
David Blaikie
7499cbae4c Remove StringMap::GetOrCreateValue in favor of StringMap::insert
Having two ways to do this doesn't seem terribly helpful and
consistently using the insert version (which we already has) seems like
it'll make the code easier to understand to anyone working with standard
data structures. (I also updated many references to the Entry's
key and value to use first() and second instead of getKey{Data,Length,}
and get/setValue - for similar consistency)

Also removes the GetOrCreateValue functions so there's less surface area
to StringMap to fix/improve/change/accommodate move semantics, etc.

llvm-svn: 222319
2014-11-19 05:49:42 +00:00
Weiming Zhao
c7ce2ee93f [Aarch64] Customer lowering of CTPOP to SIMD should check for NEON availability
llvm-svn: 222292
2014-11-19 00:29:14 +00:00
Matt Arsenault
73f4bd8758 R600/SI: Implement areMemAccessesTriviallyDisjoint
This partially makes up for not having address spaces
used for alias analysis in some simple cases.

This is not yet enabled by default so shouldn't change anything yet.

llvm-svn: 222286
2014-11-19 00:01:31 +00:00
Matt Arsenault
1843dcc2b6 R600/SI: Set hasSideEffects = 0 on load and store instructions.
Assuming unmodeled side effects interferes with some scheduling
opportunities.

Don't put it in the base class of DS instructions since there
are a few weird effecting, non load/store instructions there.

llvm-svn: 222285
2014-11-18 23:57:33 +00:00
Simon Pilgrim
daabed160f [X86][AVX] 256-bit vector stack unaligned load/stores identification
Under many circumstances the stack is not 32-byte aligned, resulting in the use of the vmovups/vmovupd/vmovdqu instructions when inserting ymm reloads/spills.

This minor patch adds these instructions to the isFrameLoadOpcode/isFrameStoreOpcode helpers so that they can be correctly identified and not be treated as folded reloads/spills.

This has also been noticed by http://llvm.org/bugs/show_bug.cgi?id=18846 where it was causing redundant spills - I've added a reduced test case at test/CodeGen/X86/pr18846.ll

Differential Revision: http://reviews.llvm.org/D6252

llvm-svn: 222281
2014-11-18 23:38:19 +00:00
Colin LeMahieu
f815ca1b0b [Hexagon] Adding A2_and instruction.
llvm-svn: 222274
2014-11-18 22:45:47 +00:00
Chad Rosier
ccf41a5c21 [FastISel][AArch64] Also allow folding of sign-/zero-extend and arithmetic
shift-right for booleans (i1).

Arithmetic shift-right immediate with sign-/zero-extensions also works for
boolean values.  Update the assert and the test cases to reflect that fact.

llvm-svn: 222272
2014-11-18 22:41:49 +00:00
Chad Rosier
7153154a79 [FastISel][AArch64] Also allow folding of sign-/zero-extend and logical
shift-right for booleans (i1).

Logical shift-right immediate with sign-/zero-extensions also works for boolean
values.  Update the assert and the test cases to reflect that fact.

llvm-svn: 222270
2014-11-18 22:38:42 +00:00
Colin LeMahieu
1d74e8f01b [Hexagon] Adding A2_sub instruction
Renaming test files.

llvm-svn: 222263
2014-11-18 21:51:51 +00:00
Juergen Ributzka
b3791ee3a7 [FastISel][AArch64] Follow-up fix for "Fix shift-immediate emission for "zero" shifts."
Shifts also perform sign-/zero-extends to larger types, which requires us to emit
an integer extend instead of a simple COPY.

Related to PR21594.

llvm-svn: 222257
2014-11-18 21:20:17 +00:00
Matt Arsenault
84d2214a94 R600/SI: Move SIFixSGPRCopies to inst selector passes
This should expose more of the actually used VALU
instructions to the machine optimization passes.

This also should help getting i1 handling into a better state.
For not entirly understood reasons, this fixes the split-scalar-i64-add.ll
test where a 64-bit add would only partially be moved to the VALU
resulting in use of undefined VCC.

llvm-svn: 222256
2014-11-18 21:06:58 +00:00
Juergen Ributzka
fe3cb34d8e [AArch64] Don't optimize all compare instructions.
"optimizeCompareInstr" converts compares (cmp/cmn) into plain sub/add
instructions when the flags are not used anymore. This conversion is valid for
most instructions, but not all. Some instructions that don't set the flags
(e.g. sub with immediate) can set the SP, whereas the flag setting version uses
the same encoding for the "zero" register.

Update the code to also check for the return register before performing the
optimization to make sure that a cmp doesn't suddenly turn into a sub that sets
the stack pointer.

I don't have a test case for this, because it isn't easy to trigger.

llvm-svn: 222255
2014-11-18 21:02:40 +00:00
Tom Stellard
962ccd7f85 R600/SI: Make sure resource descriptors are always stored in SGPRs
llvm-svn: 222253
2014-11-18 20:39:39 +00:00
Colin LeMahieu
f7ca7f6c70 [Hexagon] Converting from ADD_rr to A2_add which has encoding bits.
Adding test to show correct instruction selection and encoding.

llvm-svn: 222249
2014-11-18 20:28:11 +00:00
Juergen Ributzka
a2005be2b4 [FastISel][AArch64] Fix shift-immediate emission for "zero" shifts.
This change emits a COPY for a shift-immediate with a "zero" shift value.
This fixes PR21594 where we emitted a shift instruction with an incorrect
immediate operand.

llvm-svn: 222247
2014-11-18 19:58:59 +00:00
Jozef Kolek
d5cccacaae Test commit to verify that commit access works.
llvm-svn: 222244
2014-11-18 19:20:34 +00:00
Reid Kleckner
0ae02a3892 Revert "ADT: correctly report isMSVCEnvironment for windows itanium"
This reverts commit r222180.

llvm-svn: 222188
2014-11-17 22:55:59 +00:00
Saleem Abdulrasool
c8bc5eb9ab ADT: correctly report isMSVCEnvironment for windows itanium
The itanium environment on Windows uses MSVC and is a MSVC environment.  Report
this correctly.

llvm-svn: 222180
2014-11-17 22:13:26 +00:00
Matt Arsenault
0f208ea195 R600/SI: Don't copy flags when extracting subreg
This was resulting in use of a register after a kill.
For some reason this showed up as a problem in many tests
when moving the SIFixSGPRCopies pass closer to instruction
selection.

llvm-svn: 222175
2014-11-17 21:11:37 +00:00