1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 19:12:56 +02:00
Commit Graph

36148 Commits

Author SHA1 Message Date
Quentin Colombet
0ce1d525f1 [AArch64] Plug the beginning of the GlobalISel pipeline.
llvm-svn: 260569
2016-02-11 19:35:06 +00:00
Tom Stellard
60df0370a2 [AMDGPU] Fix for "v_div_scale_f64 reg, vcc, ..." parsing
Summary:
Added support for "VOP3Only" attribute in VOP3bInst encoding.
Set VOP3Only=1 for V_DIV_SCALE_F64/32 insns.
Added support for multi-dest instructions in AMDGPUAs::cvt*().
Added lit test for "V_DIV_SCALE_F64|F32 vreg,vcc|sreg,vreg,vreg,vreg".

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, SamWot, nhaustov, vpykhtin

Differential Revision: http://reviews.llvm.org/D16995

Patch By: Artem Tamazov

llvm-svn: 260560
2016-02-11 18:25:26 +00:00
Artem Belevich
672363ee16 [NVPTX] emit .file directives for files referenced by subprograms.
.. so .loc directives referring to those files work correctly.

Differential Revision: http://reviews.llvm.org/D17086

llvm-svn: 260557
2016-02-11 18:21:47 +00:00
Hans Wennborg
3625a4f59b Revert r260507: "[X86] Enable the LEA optimization pass by default."
This caused PR26575.

llvm-svn: 260538
2016-02-11 16:44:06 +00:00
Jun Bum Lim
a57736cc50 [AArch64] Refactoring findMatchingStore() in aarch64-ldst-opt; NFC
Summary: This change makes findMatchingStore() follow the same coding style introduced in r260275.

Reviewers: gberry, junbuml

Subscribers: aemerson, rengolin, haicheng, bmakam, mssimpso

Differential Revision: http://reviews.llvm.org/D17083

llvm-svn: 260534
2016-02-11 16:18:24 +00:00
Chad Rosier
0febca6df8 [AArch64] Improve load/store optimizer to handle LDUR + LDR.
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

This is a reapplication of r259812, which had an incorrect assert.  The
test_stur_str_no_assert() test is a reduced version of the issue hit in
the AArch64 self-host.

PR24465

llvm-svn: 260523
2016-02-11 14:25:08 +00:00
Andrey Turetskiy
50853c0d36 [X86] Enable the LEA optimization pass by default.
Differential Revision: http://reviews.llvm.org/D16877

llvm-svn: 260507
2016-02-11 10:51:26 +00:00
Simon Atanasyan
b344cf2221 [MC][ELF] Handle MIPS specific .sdata and .sbss directives
MIPS specific .sdata and .sbss directives create corresponding sections
with proper initialized ELF flags including ELF::SHF_MIPS_GPREL.

Differential Revision: http://reviews.llvm.org/D17001

llvm-svn: 260498
2016-02-11 06:45:54 +00:00
Matt Arsenault
36dc1c179e AMDGPU: Fix constant bus use check with subregisters
If the two operands to an instruction were both
subregisters of the same super register, it would incorrectly
think this counted as the same constant bus use.

This fixes the verifier error in fmin_legacy.ll which
was missing -verify-machineinstrs.

llvm-svn: 260495
2016-02-11 06:15:39 +00:00
Matt Arsenault
db87fa7ab4 AMDGPU: Fix passes depending on dominator tree for no reason
llvm-svn: 260494
2016-02-11 06:15:34 +00:00
Matt Arsenault
612f0d286e AMDGPU: Fix not handling new workitem intrinsics in DivergenceAnalysis
llvm-svn: 260491
2016-02-11 05:32:51 +00:00
Matt Arsenault
cdd789021d AMDGPU: Split R600 and SI store lowering
These were only sharing some somewhat incorrect
logic for when to scalarize or split vectors.

llvm-svn: 260490
2016-02-11 05:32:46 +00:00
Tom Stellard
c44c82f004 [AMDGPU] Assembler: Fix VOP3 only instructions
Separate methods to convert parsed instructions to MCInst:

  - VOP3 only instructions (always create modifiers as operands in MCInst)
  - VOP2 instrunctions with modifiers (create modifiers as operands
    in MCInst when e64 encoding is forced or modifiers are parsed)
  - VOP2 instructions without modifiers (do not create modifiers
    as operands in MCInst)
  - Add VOP3Only flag. Pass HasMods flag to VOP3Common.
  - Simplify code that deals with modifiers (-1 is now same as
    0). This is no longer needed.
  - Add few tests (more will be added separately).
    Update error message now correct.

Patch By: Nikolay Haustov

Differential Revision: http://reviews.llvm.org/D16778

llvm-svn: 260483
2016-02-11 03:28:15 +00:00
Derek Schuff
41d067a08e [WebAssembly] Re-triage list of compilation failures for torture tests
llvm-svn: 260438
2016-02-10 21:43:16 +00:00
Reid Kleckner
ea1c57fa88 [codeview] Describe int local variables using .cv_def_range
Summary:
Refactor common value, scope, and label tracking logic out of DwarfDebug
into a common base class called DebugHandlerBase.

Update an old LLVM IR test case to avoid an assertion in LexicalScopes.

Reviewers: dblaikie, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16931

llvm-svn: 260432
2016-02-10 20:55:49 +00:00
Derek Schuff
5c2592b245 [WebAssembly] Address comments left over from r260421
llvm-svn: 260429
2016-02-10 20:14:15 +00:00
Nicolai Haehnle
e72d316342 AMDGPU: Release the scavenged offset register during VGPR spill
Summary:
This fixes a crash where subsequent spills would be unable to scavenge
a register. In particular, it fixes a crash in piglit's
spec@glsl-1.50@execution@geometry@max-input-components (the test still
has a shader that fails to compile because of too many SGPR spills, but
at least it doesn't crash any more).

This is a candidate for the release branch.

Reviewers: arsenm, tstellarAMD

Subscribers: qcolombet, arsenm

Differential Revision: http://reviews.llvm.org/D16558

llvm-svn: 260427
2016-02-10 20:13:58 +00:00
Sanjay Patel
b9786603cc [x86] refactor masked load/store combine logic ; NFCI
llvm-svn: 260426
2016-02-10 20:02:45 +00:00
Derek Schuff
08898bef47 [WebAssembly] Switch varags calling convention to use a register
Instead of passing varargs directly on the user stack, allocate a buffer in
the caller's stack frame and pass a pointer to it. This simplifies the C
ABI (e.g. non-C callers of C functions do not need to use C's user stack if
they have their own mechanism) and allows further optimizations in the future
(e.g. fewer functions may need to use the stack).

Differential Revision: http://reviews.llvm.org/D17048

llvm-svn: 260421
2016-02-10 19:51:04 +00:00
Chad Rosier
b149add9e6 [AArch64] Refactor is logic into a helper function. NFC.
llvm-svn: 260419
2016-02-10 19:45:48 +00:00
Chad Rosier
b12fd0bc31 [AArch64] Update comment to match reality. NFC.
llvm-svn: 260406
2016-02-10 18:49:28 +00:00
Colin LeMahieu
db6a39c7b6 [MC] Merge VK_PPC_TPREL in to generic VK_TPREL.
Differential Revision: http://reviews.llvm.org/D17038

llvm-svn: 260401
2016-02-10 18:32:01 +00:00
Matt Arsenault
409ffa77e6 AMDGPU: Fix indentation and variable names
llvm-svn: 260399
2016-02-10 18:21:45 +00:00
Matt Arsenault
2c8154f002 AMDGPU: Split R600 and SI load lowering
These weren't actually sharing anything in the common
LowerLOAD.

llvm-svn: 260398
2016-02-10 18:21:39 +00:00
James Y Knight
529a53c338 [SPARC] Repair floating-point condition encodings in assembly parser.
The encodings for floating point conditions A(lways) and N(ever) were
incorrectly specified for the assembly parser, per Sparc manual v8 page
121. This change corrects that mistake.

Also, strangely, all of the branch instructions already had MC test
cases, except for the broken ones. Added the tests.

Patch by Chris Dewhurst

Differential Revision: http://reviews.llvm.org/D17074

llvm-svn: 260390
2016-02-10 17:47:20 +00:00
Chad Rosier
4c1db07ad4 [AArch64] This bit of logic is specific to pairing. NFC.
llvm-svn: 260383
2016-02-10 15:52:46 +00:00
Andrey Turetskiy
b4ff8194a0 [X86] Fix stack alignment for MCU target, by Anton Nadolskiy.
This patch fixes stack alignments for MCU (should be aligned to 4 bytes).

Differential Revision: http://reviews.llvm.org/D15646

llvm-svn: 260375
2016-02-10 11:57:06 +00:00
Dylan McKay
35d0ab1842 [AVR] Add instruction definitions
Summary: Add the AVR instruction tablegen definitions.

Reviewers: stoklund, hfinkel, dsanders, arsenm, vkalintiris

Subscribers: dylanmckay, agnat, rjordans, llvm-commits

Differential Revision: http://reviews.llvm.org/D15703

llvm-svn: 260363
2016-02-10 08:55:23 +00:00
JF Bastien
de7879a249 X86: Remove useless semicolon
llvm-svn: 260359
2016-02-10 04:04:12 +00:00
Sanjay Patel
f3f4f7a88a [x86] convert masked load of exactly one element to scalar load
This is the load counterpart to the store optimization that was added in:
http://reviews.llvm.org/rL260145

llvm-svn: 260325
2016-02-09 23:44:35 +00:00
Ahmed Bougacha
f23bc5a4de [CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI.
llvm-svn: 260316
2016-02-09 22:54:12 +00:00
Ahmed Bougacha
3589294b5a [X86] Don't reuse an unrelated variable, create a new one. NFC.
Using Op makes it look like we're doing something with it.
We're really not.

llvm-svn: 260315
2016-02-09 22:54:05 +00:00
Ahmed Bougacha
68b447ec28 [X86] Remove unnecessary assignment. NFC.
llvm-svn: 260314
2016-02-09 22:53:58 +00:00
Simon Atanasyan
b8beb47094 [mips] Extend MipsAsmParser class to handle %got(sym + const) expressions
Now the parser supports `%got(sym)` expressions only but `%got(sym + const)`
variant is also valid and accepted by GAS.

Differential Revision: http://reviews.llvm.org/D16885

llvm-svn: 260305
2016-02-09 22:31:49 +00:00
Sanjay Patel
4c9e344480 [SelectionDAG] make getMemBasePlusOffset() accessible; NFCI
I reinvented this functionality in http://reviews.llvm.org/D16828 because it was
hidden away as a static function. The changes in x86 are not based on a complete
audit. I suspect there are other possible uses there, and there are almost certainly
more potential users in other targets.

llvm-svn: 260295
2016-02-09 21:42:04 +00:00
Chad Rosier
44ba21a425 [AArch64] This check is specific to merging instructions. NFC.
llvm-svn: 260283
2016-02-09 21:20:12 +00:00
Geoff Berry
16f8ead77d [AArch64] AArch64LoadStoreOptimizer: fix bug in pre-inc check iterator
Summary:
Fix case where a pre-inc/dec load/store would not be formed if the
add/sub that forms the inc/dec part of the operation was the first
instruction in the block being examined.

Reviewers: mcrosier, jmolloy, t.p.northover, junbuml

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16785

llvm-svn: 260275
2016-02-09 20:47:21 +00:00
Chad Rosier
c7429053bb [AArch64] Bail even earlier if the instructions modifieds the base register. NFC.
llvm-svn: 260274
2016-02-09 20:44:41 +00:00
Chad Rosier
e20f1c74ed [AArch64] Simplify. NFC.
llvm-svn: 260273
2016-02-09 20:27:45 +00:00
Chad Rosier
cb79de089b [AArch64] Add an assert to ensure we don't scale an offset that can't be scaled.
llvm-svn: 260272
2016-02-09 20:18:07 +00:00
Chad Rosier
dd552205e1 [AArch64] Add a FIXME about invalid KILL markers after the ld/st opt pass.
llvm-svn: 260264
2016-02-09 19:42:19 +00:00
Chad Rosier
8d7cb088ae [AArch64] Remove redundant calls and clang format. NFC.
llvm-svn: 260260
2016-02-09 19:33:42 +00:00
Colin LeMahieu
7d2b0f70e8 [Hexagon] Fixing relocation generation and adding tests.
llvm-svn: 260259
2016-02-09 19:18:02 +00:00
Chad Rosier
b34ba75bdc [AArch64] Hoist now common logic. NFC.
llvm-svn: 260257
2016-02-09 19:17:18 +00:00
Chad Rosier
236a622860 [AArch64] Rename variable to make it clear we're merging here, not pairing.
llvm-svn: 260256
2016-02-09 19:09:22 +00:00
Chad Rosier
4c9c20482f [AArch64] Separage the codegen logic for widening vs. pairing. NFC.
llvm-svn: 260249
2016-02-09 19:02:12 +00:00
Chad Rosier
5c27cebb99 [AArch64] Cleanup to simplify logic when widening vs. pairing loads/stores. NFC.
The logic to pair instructions and merge narrow instructions has become cloogy
and error prone.  This patch beings to unravel these two similar, but distinct
optimizations.

llvm-svn: 260242
2016-02-09 18:10:20 +00:00
Sanjay Patel
9ad4cf304b [x86] make getOneTrueElt() a helper function ; NFC
As mentioned in http://reviews.llvm.org/D16828 , the related masked load transform
will need this logic, so I'm moving it out to make that patch smaller.

llvm-svn: 260240
2016-02-09 17:39:58 +00:00
Chad Rosier
04b8aec790 [AArch64] Rename variable to improve readability. NFC.
llvm-svn: 260228
2016-02-09 15:59:57 +00:00
Chad Rosier
271160c32d [AArch64] Remove stale comment.
llvm-svn: 260226
2016-02-09 15:51:33 +00:00
Simon Pilgrim
9569b015d4 [X86][AVX2] Fix SIGN_EXTEND vector handling on AVX2 targets.
On AVX2 target we are poorly legalizing SIGN_EXTEND ops for which the input's legalized type doesn't have the same number of elements as the destination, resulting in an ANY_EXTEND followed by a SIGN_EXTEND_INREG.

This patch uses the existing SIGN_EXTEND -> SIGN_EXTEND_VECTOR_INREG combine to extend the input to the size of the result and using SIGN_EXTEND_VECTOR_INREG instead.

Differential Revision: http://reviews.llvm.org/D16994

llvm-svn: 260210
2016-02-09 08:19:19 +00:00
Simon Pilgrim
4d4989655e [X86][SSE1] Add MOVLHPS/MOVHLPS lowering and memory folding support
As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents.

This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines.

Differential Revision: http://reviews.llvm.org/D16956

llvm-svn: 260168
2016-02-08 23:03:46 +00:00
Dan Gohman
32a15ba966 [WebAssembly] Update the br_if instructions' operand orders to match the spec.
llvm-svn: 260152
2016-02-08 21:50:13 +00:00
Sanjay Patel
69b9326059 [x86] convert masked store of one element to scalar store
Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set'
transform in InstCombine, but this should be a win for any AVX machine.

Code comments note that this transform could be extended for other targets / cases.

Differential Revision: http://reviews.llvm.org/D16828

llvm-svn: 260145
2016-02-08 21:05:08 +00:00
Tom Stellard
bb35f34026 AMDGPU/SI: Implement a work-around for smrd corrupting vccz bit
Summary:
We will hit this once we have enabled uniform branches.  The
smrd-vccz-bug.ll test will be added with the uniform branch commit.

Reviewers: mareko, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16725

llvm-svn: 260137
2016-02-08 19:49:20 +00:00
Hans Wennborg
36d0cb3fd9 [X86] Don't zero/sign-extend i1, i8, or i16 return values to 32 bits (PR22532)
This matches GCC and MSVC's behaviour, and saves on code size.

We were already not extending i1 return values on x86_64 after r127766. This
takes that patch further by applying it to x86 target as well, and also for i8
and i16.

The ABI docs have been unclear about the required behaviour here. The new i386
psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return
vales do not need to be extended beyond 8 bits. The x86_64 ABI doc is being
updated to say the same [2].

Differential Revision: http://reviews.llvm.org/D16907

 [1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf
 [2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ

llvm-svn: 260133
2016-02-08 19:34:30 +00:00
Tim Northover
1dabed8e29 AArch64: match correct order in subtraction pattern.
The accumulator in multiply-and-subtract instructions is actually subtracted
*from* so these patterns were computing the wrong value.

llvm-svn: 260131
2016-02-08 19:33:18 +00:00
Matt Arsenault
eed0ad4e3e AMDGPU: Remove bfi and bfm intrinsics
Nothing is using them.

llvm-svn: 260123
2016-02-08 19:06:01 +00:00
Michael Zuckerman
c705af63bb [AVX512][PROLQ][PROLD] Change imm8 to int
Differential Revision: http://reviews.llvm.org/D16983

llvm-svn: 260101
2016-02-08 15:13:32 +00:00
Dan Gohman
33dc7c1040 [WebAssembly] Add another optimization idea to README.txt.
llvm-svn: 260070
2016-02-08 03:42:36 +00:00
Craig Topper
bc3298f1ee [X86] Change FeatureIFMA string to 'avx512ifma'. Matches gcc and fixes PR26461.
llvm-svn: 260069
2016-02-08 01:23:15 +00:00
Simon Pilgrim
9ecb59bace [X86][SSE] Resolve target shuffle inputs to sentinels to permit more combines
The combineX86ShufflesRecursively only supports unary shuffles, but was missing the opportunity to combine binary shuffles with a zero / undef second input.

This patch resolves target shuffle inputs, converting the shuffle mask elements to SM_SentinelUndef/SM_SentinelZero where possible. It then resolves the updated mask to check if we have created a faux unary shuffle.

Additionally, we now attempt to recursively call combineX86ShufflesRecursively for all input operands (we used to just recurse for unary integer shuffles and unary unpacks) - it safely returns early if its not a target shuffle.

Differential Revision: http://reviews.llvm.org/D16683

llvm-svn: 260063
2016-02-07 22:51:06 +00:00
Simon Pilgrim
8d73f8b49c [X86][SSE] Added support for MOVHPD/MOVLPD + MOVHPS/MOVLPS shuffle decoding.
llvm-svn: 260034
2016-02-07 15:39:22 +00:00
Asaf Badouh
5bbbfafa66 [X86][AVX512] add intrinsics of Scalar FP to integer conversion with rounding mode
Differential Revision: http://reviews.llvm.org/D16629

llvm-svn: 260033
2016-02-07 14:59:13 +00:00
Simon Pilgrim
bab30569fb [X86][SSE] Pulled out repeated target shuffle decodes into helper functions. NFCI.
Pulled out the code used by PSHUFB/VPERMV/VPERMV3 shuffle mask decoding into common helper functions.

The helper functions handle masks coming from BROADCAST/BUILD_VECTOR and ConstantPool nodes respectively.

llvm-svn: 260032
2016-02-07 14:33:03 +00:00
Igor Breger
60ac21f165 AVX512: VPBROADCASTB/W/D/Q from GPR intrinsics implementation.
Differential Revision: http://reviews.llvm.org/D16813

llvm-svn: 260024
2016-02-07 08:30:50 +00:00
Simon Pilgrim
b7e95cd192 [X86][AVX512] Added support for VPMOVZX shuffle decoding.
llvm-svn: 260007
2016-02-06 19:51:21 +00:00
Simon Pilgrim
a80ad70644 [X86][SSE] Moved shuffle decode CASE macros earlier. NFC.
To allow the helper functions to make use of them.

llvm-svn: 259997
2016-02-06 17:02:15 +00:00
Simon Pilgrim
520933a81a [X86][SSE] Refactored PMOVZX shuffle decoding to use scalar input types
First step towards being able to decode AVX512 PMOVZX instructions without a massive bloat in the shuffle decode switch statement.

This should also make it easier to decode X86ISD::VZEXT target shuffles in the future.

llvm-svn: 259995
2016-02-06 16:33:42 +00:00
Simon Pilgrim
f1a97ef96e [X86][SSE] Don't replace an existing 32-bit load with its duplicate
If we are already loading a single 32-bit float/integer then just reuse it.

Fix for regression in D16729

llvm-svn: 259991
2016-02-06 15:37:09 +00:00
Simon Pilgrim
354f04bdd5 Comment fix
llvm-svn: 259990
2016-02-06 14:21:49 +00:00
Evandro Menezes
0e4ef392d8 [AArch64] Add the scheduling model for Exynos-M1
Summary:
Add the core scheduling model for the Samsung Exynos-M1 (ARMv8-A).


Reviewers: jmolloy, rengolin, christof, MinSeongKIM, t.p.northover

Subscribers: aemerson, rengolin, MatzeB

Differential Revision: http://reviews.llvm.org/D16644

llvm-svn: 259958
2016-02-06 00:01:41 +00:00
Jun Bum Lim
62bd130ca7 [AArch64] Refactoring aarch64-ldst-opt. NCF.
Remove narrow load / store instructions from getMatchingPairOpcode(),
and add getMatchingWideOpcode().

llvm-svn: 259914
2016-02-05 20:02:03 +00:00
Matt Arsenault
8009cfb6b5 AMDGPU: Account for LDS alignment
The current situation isn't great, because the amount of padding
requires is determined by the inverse order of the first encountered
use. We should eventually somehow sort these to minimize wasted space.

Another problem is the alignment of kernel arguments isn't
respected. The group_segment_alignment is always emitted as
the default 16, and typed arguments with higher alignments
or an explicitly set alignment are also ignored.

llvm-svn: 259912
2016-02-05 19:47:29 +00:00
Matt Arsenault
3c264fc8c0 AMDGPU: Preserve alignments on new created globals
Also switch to internal linkage, and include the name of the function in
the name.

llvm-svn: 259911
2016-02-05 19:47:23 +00:00
Tom Stellard
350a5c65b3 AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfo
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16862

llvm-svn: 259900
2016-02-05 18:44:57 +00:00
Tom Stellard
9f8aefe66d AMDGPU: Fix ordering of CPU and FS parameters in TargetMachine constructors
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16863

llvm-svn: 259897
2016-02-05 18:29:17 +00:00
Tom Stellard
4948571e05 AMDGPU/SI: Correctly initialize SIInsertWaits pass
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16724

llvm-svn: 259894
2016-02-05 17:42:38 +00:00
Dan Gohman
f44d278023 [WebAssembly] Update the select instructions' operand orders to match the spec.
llvm-svn: 259893
2016-02-05 17:14:59 +00:00
Nemanja Ivanovic
3a405a94b2 Fix for PR 26193
This is a simple fix for a PowerPC intrinsic that was incorrectly defined
(the return type was incorrect).

llvm-svn: 259886
2016-02-05 14:50:29 +00:00
Benjamin Kramer
9b60b05c94 Move classes defined in a cpp file into an anonymous namespace.
No functionality change intended.

llvm-svn: 259883
2016-02-05 13:50:53 +00:00
Renato Golin
c3580d7a36 Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3)."
This reverts commit r259812 as it broke AArch64 self-hosting.

llvm-svn: 259881
2016-02-05 12:14:30 +00:00
Nemanja Ivanovic
b7bc445a9f Fix for PR 26356
Using the load immediate only when the immediate (whether signed or unsigned)
can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and
0 to 65535 for unsigned. This patch also ensures that we sign-extend under the
right conditions.

llvm-svn: 259840
2016-02-04 23:14:42 +00:00
Chad Rosier
379876e08f [AArch64] Bound the number of instructions we scan when searching for updates.
This only impacts the creation of pre-/post-index instructions.  The bound was
set high enough such that it did not change code generation for SPEC200X.

llvm-svn: 259828
2016-02-04 21:26:02 +00:00
Simon Pilgrim
cc76b8656c [X86][SSE] Select domain for 32/64-bit partial loads for EltsFromConsecutiveLoads
Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type.

This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain.....

llvm-svn: 259816
2016-02-04 19:27:51 +00:00
Chad Rosier
b8b5852fe4 [AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3).
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

This is a reapplication of r246769 and r259790.  The tramp3d failure was caused
by an incorrect refactoring in the patch.  Specifically, we weren't always
properly clearing the SExtIdx flag.

llvm-svn: 259812
2016-02-04 18:59:49 +00:00
Silviu Baranga
22ab3adc5c [AArch64] Multiply extended 32-bit ints with `[U|S]MADDL'
During instruction selection, the AArch64 backend can recognise the
following pattern and generate an [U|S]MADDL instruction, i.e. a
multiply of two 32-bit operands with a 64-bit result:

(mul (sext i32), (sext i32))
However, when one of the operands is constant, the sign extension
gets folded into the constant in SelectionDAG::getNode(). This means
that the instruction selection sees this:

(mul (sext i32), i64)
...which doesn't match the pattern. Sign-extension and 64-bit
multiply instructions are generated, which are slower than one 32-bit
multiply.

Add a pattern to match this and generate the correct instruction, for
both signed and unsigned multiplies.

Patch by Chris Diamand!

llvm-svn: 259800
2016-02-04 16:47:09 +00:00
Simon Pilgrim
da26d272a9 [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads
This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load.

Differential Revision: http://reviews.llvm.org/D16729

llvm-svn: 259796
2016-02-04 16:12:56 +00:00
Chad Rosier
fcca55983b Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."
This reverts commit r259790. tramp3d-v4 is still having problems.

llvm-svn: 259795
2016-02-04 16:01:40 +00:00
Elena Demikhovsky
86a7e2549e AVX-512: Fixed a bug in FMA instruction selection on KNL
The FMA instruction was selected from AVX2 set instead of AVX-512

Differential Revision: http://reviews.llvm.org/D16884

llvm-svn: 259792
2016-02-04 15:11:11 +00:00
Chad Rosier
52d5d7b161 [AArch64] Improve load/store optimizer to handle LDUR + LDR.
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

This is a reapplication of r246769, which was reverted in r246782 due to a
test-suite failure.  I'm unable to reproduce the issue at this time.

llvm-svn: 259790
2016-02-04 14:42:55 +00:00
Michael Zuckerman
d8de4a9888 [AVX512] add vfmadd132ss and vfmadd132sd Intrinsic
Differential Revision: http://reviews.llvm.org/D16589

llvm-svn: 259789
2016-02-04 14:41:08 +00:00
Simon Pilgrim
36bd348c5b [X86] Moved SEXT -> SIGN_EXTEND_VECTOR_INREG combine into helper. NFC.
llvm-svn: 259771
2016-02-04 09:27:19 +00:00
Andrey Turetskiy
93bc15df7c [X86] Use hash table in LEA optimization pass.
Use hash table (key is a memory operand) to store found LEA instructions to reduce compile time.

Differential Revision: http://reviews.llvm.org/D16404

llvm-svn: 259770
2016-02-04 08:57:03 +00:00
Jingyue Wu
bb54579422 [NVPTX] Disable performance optimizations when OptLevel==None
Reviewers: jholewinski, tra, eliben

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D16874

llvm-svn: 259749
2016-02-04 04:15:36 +00:00
Sanjay Patel
58b4f8e215 clean up; NFC
llvm-svn: 259720
2016-02-03 22:37:37 +00:00
Saleem Abdulrasool
d7405cba41 ARM: support TLS for WoA
Add support for TLS access for Windows on ARM.  This generates a similar access
to MSVC for ARM.

The changes to the tablegen data is needed to support loading an external symbol
global that is not for a call.  The adjustments to the DAG to DAG transforms are
needed to preserve the 32-bit move.

llvm-svn: 259676
2016-02-03 18:21:59 +00:00
Renato Golin
662cbc93f4 [ARM] Move GNUEABI divmod to __aeabi_divmod*
The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores
which happens to be a lot faster than __divsi3/__modsi3 when the core
has hardware divide instructions. Do the same here.

Fixes PR26450.

llvm-svn: 259657
2016-02-03 16:10:54 +00:00
Daniel Sanders
36e4bed845 [mips] Remove redundant inclusions of MipsAnalyzeImmediate.h
llvm-svn: 259655
2016-02-03 15:54:12 +00:00
Nemanja Ivanovic
3fb0b09e1f Fix for PR 26381
Simple fix - Constant values were not being sign extended in FastIsel.

llvm-svn: 259645
2016-02-03 12:53:38 +00:00