mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00
Commit Graph

36079 Commits

Author SHA1 Message Date
Simon Pilgrim
f1a97ef96e [X86][SSE] Don't replace an existing 32-bit load with its duplicate
If we are already loading a single 32-bit float/integer then just reuse it.

Fix for regression in D16729

llvm-svn: 259991
2016-02-06 15:37:09 +00:00
Simon Pilgrim
354f04bdd5 Comment fix
llvm-svn: 259990
2016-02-06 14:21:49 +00:00
Evandro Menezes
0e4ef392d8 [AArch64] Add the scheduling model for Exynos-M1
Summary:
Add the core scheduling model for the Samsung Exynos-M1 (ARMv8-A).


Reviewers: jmolloy, rengolin, christof, MinSeongKIM, t.p.northover

Subscribers: aemerson, rengolin, MatzeB

Differential Revision: http://reviews.llvm.org/D16644

llvm-svn: 259958
2016-02-06 00:01:41 +00:00
Jun Bum Lim
62bd130ca7 [AArch64] Refactoring aarch64-ldst-opt. NFC.
Remove narrow load / store instructions from getMatchingPairOpcode(),
and add getMatchingWideOpcode().

llvm-svn: 259914
2016-02-05 20:02:03 +00:00
Matt Arsenault
8009cfb6b5 AMDGPU: Account for LDS alignment
The current situation isn't great, because the amount of padding
required is determined by the inverse order of the first encountered
use. We should eventually somehow sort these to minimize wasted space.

Another problem is the alignment of kernel arguments isn't
respected. The group_segment_alignment is always emitted as
the default 16, and typed arguments with higher alignments
or an explicitly set alignment are also ignored.
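
To illustrate why the order of the variables affects the padding, here is a minimal sketch. It is not the AMDGPU backend code; the `(size, alignment)` pairs and the helper names `align_to` and `layout_size` are assumptions made up for this example.

```python
def align_to(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout_size(variables):
    """Total bytes used when (size, alignment) pairs are placed in order."""
    offset = 0
    for size, alignment in variables:
        offset = align_to(offset, alignment) + size
    return offset


# Hypothetical LDS variables as (size, alignment) pairs.
use_order = [(1, 1), (16, 16), (1, 1), (8, 8)]
by_alignment = sorted(use_order, key=lambda v: v[1], reverse=True)

print(layout_size(use_order))     # 48
print(layout_size(by_alignment))  # 26
```

Sorting by decreasing alignment is only one possible heuristic; the message above does not say which ordering would eventually be used.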

llvm-svn: 259912
2016-02-05 19:47:29 +00:00
Matt Arsenault
3c264fc8c0 AMDGPU: Preserve alignments on newly created globals
Also switch to internal linkage, and include the name of the function in
the name.

llvm-svn: 259911
2016-02-05 19:47:23 +00:00
Tom Stellard
350a5c65b3 AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfo
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16862

llvm-svn: 259900
2016-02-05 18:44:57 +00:00
Tom Stellard
9f8aefe66d AMDGPU: Fix ordering of CPU and FS parameters in TargetMachine constructors
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16863

llvm-svn: 259897
2016-02-05 18:29:17 +00:00
Tom Stellard
4948571e05 AMDGPU/SI: Correctly initialize SIInsertWaits pass
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16724

llvm-svn: 259894
2016-02-05 17:42:38 +00:00
Dan Gohman
f44d278023 [WebAssembly] Update the select instructions' operand orders to match the spec.
llvm-svn: 259893
2016-02-05 17:14:59 +00:00
Nemanja Ivanovic
3a405a94b2 Fix for PR 26193
This is a simple fix for a PowerPC intrinsic that was incorrectly defined
(the return type was incorrect).

llvm-svn: 259886
2016-02-05 14:50:29 +00:00
Benjamin Kramer
9b60b05c94 Move classes defined in a cpp file into an anonymous namespace.
No functionality change intended.

llvm-svn: 259883
2016-02-05 13:50:53 +00:00
Renato Golin
c3580d7a36 Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3)."
This reverts commit r259812 as it broke AArch64 self-hosting.

llvm-svn: 259881
2016-02-05 12:14:30 +00:00
Nemanja Ivanovic
b7bc445a9f Fix for PR 26356
Using the load immediate only when the immediate (whether signed or unsigned)
can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and
0 to 65535 for unsigned. This patch also ensures that we sign-extend under the
right conditions.

llvm-svn: 259840
2016-02-04 23:14:42 +00:00
Chad Rosier
379876e08f [AArch64] Bound the number of instructions we scan when searching for updates.
This only impacts the creation of pre-/post-index instructions.  The bound was
set high enough such that it did not change code generation for SPEC200X.

llvm-svn: 259828
2016-02-04 21:26:02 +00:00
Simon Pilgrim
cc76b8656c [X86][SSE] Select domain for 32/64-bit partial loads for EltsFromConsecutiveLoads
Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type.

This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain.

llvm-svn: 259816
2016-02-04 19:27:51 +00:00
Chad Rosier
b8b5852fe4 [AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3).
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

This is a reapplication of r246769 and r259790.  The tramp3d failure was caused
by an incorrect refactoring in the patch.  Specifically, we weren't always
properly clearing the SExtIdx flag.

llvm-svn: 259812
2016-02-04 18:59:49 +00:00
Silviu Baranga
22ab3adc5c [AArch64] Multiply extended 32-bit ints with `[U|S]MADDL'
During instruction selection, the AArch64 backend can recognise the
following pattern and generate an [U|S]MADDL instruction, i.e. a
multiply of two 32-bit operands with a 64-bit result:

(mul (sext i32), (sext i32))
However, when one of the operands is constant, the sign extension
gets folded into the constant in SelectionDAG::getNode(). This means
that the instruction selection sees this:

(mul (sext i32), i64)
...which doesn't match the pattern. Sign-extension and 64-bit
multiply instructions are generated, which are slower than one 32-bit
multiply.

Add a pattern to match this and generate the correct instruction, for
both signed and unsigned multiplies.
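
The arithmetic behind the new pattern can be sketched as follows. This is a plain integer model, not the TableGen pattern itself; the helpers `sext32`, `is_sext_of_i32`, `smaddl` and `sext_then_mul64` are hypothetical names for this example. The point is that an i64 constant within the signed 32-bit range is itself the sign extension of an i32, so the widening multiply gives the same 64-bit result.

```python
MASK64 = (1 << 64) - 1


def sext32(value):
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & (1 << 31) else value


def is_sext_of_i32(constant):
    """True if the i64 constant could have come from sign-extending an i32."""
    return -(1 << 31) <= constant < (1 << 31)


def smaddl(wn, wm, xa=0):
    """Model of a signed widening multiply-add: sext(wn) * sext(wm) + xa."""
    return (sext32(wn) * sext32(wm) + xa) & MASK64


def sext_then_mul64(w, constant):
    """Model of the generic path: sign-extend, then do a 64-bit multiply."""
    return (sext32(w) * constant) & MASK64
```

For any constant accepted by `is_sext_of_i32`, `sext_then_mul64(w, c)` and `smaddl(w, c & 0xFFFFFFFF)` agree, which is the equivalence the added pattern relies on.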

Patch by Chris Diamand!

llvm-svn: 259800
2016-02-04 16:47:09 +00:00
Simon Pilgrim
da26d272a9 [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads
This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load.

Differential Revision: http://reviews.llvm.org/D16729

llvm-svn: 259796
2016-02-04 16:12:56 +00:00
Chad Rosier
fcca55983b Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."
This reverts commit r259790. tramp3d-v4 is still having problems.

llvm-svn: 259795
2016-02-04 16:01:40 +00:00
Elena Demikhovsky
86a7e2549e AVX-512: Fixed a bug in FMA instruction selection on KNL
The FMA instruction was selected from AVX2 set instead of AVX-512

Differential Revision: http://reviews.llvm.org/D16884

llvm-svn: 259792
2016-02-04 15:11:11 +00:00
Chad Rosier
52d5d7b161 [AArch64] Improve load/store optimizer to handle LDUR + LDR.
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

This is a reapplication of r246769, which was reverted in r246782 due to a
test-suite failure.  I'm unable to reproduce the issue at this time.

llvm-svn: 259790
2016-02-04 14:42:55 +00:00
Michael Zuckerman
d8de4a9888 [AVX512] add vfmadd132ss and vfmadd132sd Intrinsic
Differential Revision: http://reviews.llvm.org/D16589

llvm-svn: 259789
2016-02-04 14:41:08 +00:00
Simon Pilgrim
36bd348c5b [X86] Moved SEXT -> SIGN_EXTEND_VECTOR_INREG combine into helper. NFC.
llvm-svn: 259771
2016-02-04 09:27:19 +00:00
Andrey Turetskiy
93bc15df7c [X86] Use hash table in LEA optimization pass.
Use hash table (key is a memory operand) to store found LEA instructions to reduce compile time.

Differential Revision: http://reviews.llvm.org/D16404

llvm-svn: 259770
2016-02-04 08:57:03 +00:00
Jingyue Wu
bb54579422 [NVPTX] Disable performance optimizations when OptLevel==None
Reviewers: jholewinski, tra, eliben

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D16874

llvm-svn: 259749
2016-02-04 04:15:36 +00:00
Sanjay Patel
58b4f8e215 clean up; NFC
llvm-svn: 259720
2016-02-03 22:37:37 +00:00
Saleem Abdulrasool
d7405cba41 ARM: support TLS for WoA
Add support for TLS access for Windows on ARM.  This generates a similar access
to MSVC for ARM.

The changes to the tablegen data are needed to support loading an external symbol
global that is not for a call.  The adjustments to the DAG to DAG transforms are
needed to preserve the 32-bit move.

llvm-svn: 259676
2016-02-03 18:21:59 +00:00
Renato Golin
662cbc93f4 [ARM] Move GNUEABI divmod to __aeabi_divmod*
The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores
which happens to be a lot faster than __divsi3/__modsi3 when the core
has hardware divide instructions. Do the same here.

Fixes PR26450.

llvm-svn: 259657
2016-02-03 16:10:54 +00:00
Daniel Sanders
36e4bed845 [mips] Remove redundant inclusions of MipsAnalyzeImmediate.h
llvm-svn: 259655
2016-02-03 15:54:12 +00:00
Nemanja Ivanovic
3fb0b09e1f Fix for PR 26381
Simple fix - Constant values were not being sign extended in FastIsel.

llvm-svn: 259645
2016-02-03 12:53:38 +00:00
Simon Atanasyan
37a4fee5f0 [mips] Add SHF_MIPS_GPREL flag to the MIPS .sbss and .sdata sections
MIPS ABI states that .sbss and .sdata sections must have SHF_MIPS_GPREL
flag. See Figure 4–7 on page 69 in the following document:
ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf.

Differential Revision: http://reviews.llvm.org/D15740

llvm-svn: 259641
2016-02-03 11:50:22 +00:00
Simon Pilgrim
8aa2db1f2d [X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to EltsFromConsecutiveLoads
Follow up to D16217 and D16729

This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed). I can't find any use/reason for this, so I have removed the pattern and replaced it so that only the 1st i64 element is loaded and the upper bits are all zeroed. This matches the description for X86ISD::VZEXT_LOAD.

Differential Revision: http://reviews.llvm.org/D16768

llvm-svn: 259635
2016-02-03 09:41:59 +00:00
Kyle Butt
77f7a2b7a8 Codegen: [PPC] Fix PPCVSXFMAMutate to handle duplicates.
The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms
on PPC.

    %vreg6<def> = COPY %vreg96
    %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7
    ;v6 = v6 + v5 * v7

is replaced by

    %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96
    ;v5 = v5 * v7 + v96

This was broken in the case where the target register was also used as a
multiplicand. Fix this case by checking for it and replacing both uses
with the copied register.

    %vreg6<def> = COPY %vreg96
    %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6
    ;v6 = v6 + v5 * v6

is replaced by

    %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96
    ;v5 = v5 * v96 + v96
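
The two rewrites above can be checked with a small model. This is only a sketch: the operand semantics are taken from the comments in the listings above (A-form `T = T + A * B`, M-form `T = T * A + B`), integers stand in for floating-point values to avoid rounding, and the wrapper names `before`, `after`, `before_dup` and `after_dup` are made up for this example.

```python
def xsmaddasp(t, a, b):
    """A-form FMA as annotated above: T = T + A * B."""
    return t + a * b


def xsmaddmsp(t, a, b):
    """M-form FMA as annotated above: T = T * A + B."""
    return t * a + b


def before(v5, v7, v96):
    v6 = v96                      # %vreg6 = COPY %vreg96
    return xsmaddasp(v6, v5, v7)  # v6 = v6 + v5 * v7


def after(v5, v7, v96):
    return xsmaddmsp(v5, v7, v96)  # v5 = v5 * v7 + v96


def before_dup(v5, v96):
    v6 = v96                      # %vreg6 = COPY %vreg96
    return xsmaddasp(v6, v5, v6)  # v6 = v6 + v5 * v6


def after_dup(v5, v96):
    return xsmaddmsp(v5, v96, v96)  # v5 = v5 * v96 + v96


print(before(3, 4, 5), after(3, 4, 5))    # 17 17
print(before_dup(3, 5), after_dup(3, 5))  # 20 20
```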

llvm-svn: 259617
2016-02-03 01:41:09 +00:00
Yunzhong Gao
d9694f67fb Revert r259576: Disable the vzeroupper insertion pass on PS4.
Will re-implement based on review feedback.

llvm-svn: 259615
2016-02-03 01:25:12 +00:00
Yunzhong Gao
3180165799 Disable the vzeroupper insertion pass on PS4.
See comments in test/CodeGen/X86/avx-vzeroupper.ll for more explanation.

Original patch by: Sean Silva

llvm-svn: 259576
2016-02-02 21:39:23 +00:00
Matt Arsenault
b7a70ed17f AMDGPU: Do not promote allocas with non-inbounds GEPs
If we can't assume the pointer value is within the bounds
of the object, it seems risky to try to replace the pointer
calculations.

llvm-svn: 259573
2016-02-02 21:16:12 +00:00
Matt Arsenault
1eab1a7019 AMDGPU: Handle promoting memmove
Also add missing tests for the others.

llvm-svn: 259558
2016-02-02 20:28:10 +00:00
Quentin Colombet
d668e1ecc8 [X86] Fix the merging of SP updates in prologue/epilogue insertions.
When the merging was involving LEAs, we were taking the wrong immediate
from the list of operands.

rdar://problem/24446069

llvm-svn: 259553
2016-02-02 20:11:17 +00:00
Matt Arsenault
48d83980e8 AMDGPU: Skip promote alloca with no optimizations
llvm-svn: 259551
2016-02-02 19:32:42 +00:00
Matt Arsenault
dc5fdc3a8f AMDGPU: Minor cleanups for AMDGPUPromoteAlloca
Mostly convert to use range loops.

llvm-svn: 259550
2016-02-02 19:32:35 +00:00
Matt Arsenault
de561cf1f6 AMDGPU: Report AMDGPUPromoteAlloca changed the function
llvm-svn: 259547
2016-02-02 19:18:57 +00:00
Matt Arsenault
201441fa82 AMDGPU: Whitelist handled intrinsics
We shouldn't crash on unhandled intrinsics.
Also simplify failure handling in loop.

llvm-svn: 259546
2016-02-02 19:18:53 +00:00
Matt Arsenault
aef62a4730 AMDGPU: Use inbounds when calculating workitem offset
When promoting allocas to LDS, we know we are indexing
into a specific area just created, and the calculation
will also never overflow.

Also emit some of the muls as nsw nuw, because instcombine
infers this already from the range metadata. I think
putting this on the other adds and muls might be OK too,
but I'm not 100% sure.

llvm-svn: 259545
2016-02-02 19:18:48 +00:00
Eugene Zelenko
0ebce618ad Fix Clang-tidy readability-redundant-control-flow warnings; other minor fixes.
Differential revision: http://reviews.llvm.org/D16793

llvm-svn: 259539
2016-02-02 18:20:45 +00:00
Derek Schuff
c9579c25d0 [MC] Enable eip-relative addressing on x86-64 for X32 ABI
Summary:
Enables eip-based addressing, e.g.,

lea    constant(%eip), %rax
lea    constant(%eip), %eax

in MC (used for the x32 ABI). EIP-based addressing is also valid on x86_64,
so it is left enabled for that architecture as well.

Patch by João Porto

Differential Revision: http://reviews.llvm.org/D16581

llvm-svn: 259528
2016-02-02 17:20:04 +00:00
Chad Rosier
7178a271fa [AArch64] Add a FIXME comment.
llvm-svn: 259515
2016-02-02 15:22:55 +00:00
Chad Rosier
8104a75622 [AArch64] Allocate the modified and used regs only once per function.
llvm-svn: 259510
2016-02-02 15:02:30 +00:00
JF Bastien
1a84a12635 WebAssembly: update expected GCC torture test failures
The 3 programs used __attribute__((mode(?))) on enum, which clang r259497 fixed.

llvm-svn: 259508
2016-02-02 14:27:34 +00:00
Oliver Stannard
a96193f77c Refactor backend diagnostics for unsupported features
Re-commit of r258951 after fixing layering violation.

The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.

There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.

llvm-svn: 259498
2016-02-02 13:52:43 +00:00
Simon Pilgrim
7647026685 [X86][AVX512] Add support for AVX512 VMOVQ (load) shuffle decoding
llvm-svn: 259496
2016-02-02 13:32:56 +00:00
JF Bastien
5372afc7ec WebAssembly: add option to disable register coloring
Having this hidden option makes it easier to debug other issues.

llvm-svn: 259482
2016-02-02 09:30:01 +00:00
Sjoerd Meijer
e59db362af Removed FeatureVFPOnlySP from the Cortex-R7 processor model
description and changed the regression test accordingly.
The default configuration of a Cortex-R7 is to implement the
VFPv3-D16 architecture and the feature line as it was is too
restrictive.

llvm-svn: 259480
2016-02-02 09:28:20 +00:00
Sanjoy Das
6b9e756a71 [X86] Fix a bug in getMemOpBaseRegImmOfs
Fix a crash in `getMemOpBaseRegImmOfs` that happens if the base of
`MemOp` is a frame index memory operand.  The fix is to have
`getMemOpBaseRegImmOfs` bail out in such cases.  We can possibly be more
clever here, if needed.

llvm-svn: 259456
2016-02-02 02:32:43 +00:00
Ahmed Bougacha
7e17e58964 [X86][FastISel] Don't force Nearest-Even rounding for VCVTPS2PH, use MXCSR.
FastISel counterpart to r259448.

llvm-svn: 259449
2016-02-02 01:44:03 +00:00
Ahmed Bougacha
d732a878e7 [X86] Don't force Nearest-Even rounding for VCVTPS2PH, use MXCSR.
Officially, we don't acknowledge non-default configurations of MXCSR,
as getting there would require usage of the FENV_ACCESS pragma (at
least insofar as rounding mode is concerned).

We don't support the pragma, so we can assume that the default
rounding mode - round to nearest, ties to even - is always used.

However, it's inconsistent with the rest of the instruction set,
where MXCSR is always effective (unless otherwise specified).
Also, it's an unnecessary obstacle to the few brave souls that use
fenv.h with LLVM.

Avoid the hard-coded rounding mode for fp_to_f16; use MXCSR instead.

llvm-svn: 259448
2016-02-02 01:32:50 +00:00
Sanjay Patel
796fb93f27 fix typos; NFC
llvm-svn: 259438
2016-02-01 23:53:35 +00:00
Simon Pilgrim
50130a8ffb [X86][AVX512] Add support for AVX512 VMOVD (load) shuffle decoding
llvm-svn: 259430
2016-02-01 23:04:05 +00:00
Simon Pilgrim
176f061ffa [X86][AVX512] Add support for AVX512 VMOVSD/VMOVSS shuffle decoding
llvm-svn: 259427
2016-02-01 22:26:28 +00:00
Simon Pilgrim
f6407af598 [X86][AVX512] Add support for AVX512 VINSERTPS shuffle decoding
llvm-svn: 259420
2016-02-01 22:05:50 +00:00
Matthias Braun
1ad84f2d28 SmallSet/SmallPtrSet: Refuse huge Small numbers
These sets do linear searching in small mode, so it is not a good idea to
use huge numbers as the small value here. Save people from themselves by
adding a static_assert.

Differential Revision: http://reviews.llvm.org/D16706

llvm-svn: 259419
2016-02-01 22:05:16 +00:00
Chad Rosier
2d547c1969 Move comments a bit closer to associated code. NFC.
llvm-svn: 259411
2016-02-01 21:38:31 +00:00
Chad Rosier
ff5772941f Remove extra semicolon. NFC.
llvm-svn: 259402
2016-02-01 20:54:36 +00:00
Balaram Makam
a423972bd5 AArch64: Implement missed conditional compare sequences.
Summary:
This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionally handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.

Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB

Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry

Differential Revision: http://reviews.llvm.org/D16291

llvm-svn: 259387
2016-02-01 19:13:07 +00:00
Geoff Berry
80bf76b8b7 [AArch64] Simplify prolog/epilog callee save/restore. NFC.
Summary:
Factor out common code for callee-save register pair calculation.  This
is intended to simplify follow-on changes that reduce the number of
registers saved/restored.

Depends on D16732

Reviewers: mcrosier, jmolloy, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16734

llvm-svn: 259384
2016-02-01 19:07:06 +00:00
Ulrich Weigand
c791b42454 [SystemZ] Fix wrong-code generation for certain always-false conditions
We've found another bug in the code generation logic for a
certain class of always-false conditions, those of the form
   if ((a & 1) < 0)

These only reach the back end when compiling without optimization.

The bug was introduced by the choice of using TEST UNDER MASK
to implement a check for
   if ((a & MASK) < VAL)
as
   if ((a & MASK) == 0)

where VAL is less than the lowest bit of MASK.  This is correct
in all cases except for VAL == 0, in which case the original
condition is always false, but the replacement isn't.

Fixed by excluding that particular case.
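
The failing case can be reproduced with a brute-force check. This is a sketch using non-negative values only, not the SystemZ lowering code; the helper names `original`, `replacement` and `rewrite_is_safe` are hypothetical.

```python
def original(a, mask, val):
    """The condition as written in the source: (a & MASK) < VAL."""
    return (a & mask) < val


def replacement(a, mask, val):
    """The TEST UNDER MASK style rewrite: (a & MASK) == 0."""
    return (a & mask) == 0


def rewrite_is_safe(mask, val, bits=8):
    """True if both conditions agree for every non-negative a below 2**bits."""
    return all(
        original(a, mask, val) == replacement(a, mask, val)
        for a in range(1 << bits)
    )


print(rewrite_is_safe(0b0110, 1))  # True
print(rewrite_is_safe(0b0110, 0))  # False
```

With `val == 0` the original condition is never true, while the replacement is true whenever `a & mask` is zero, so the two disagree.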

llvm-svn: 259381
2016-02-01 18:31:19 +00:00
Colin LeMahieu
698983cb93 [NFC] Referencing manual for reason why subregbit is checked
llvm-svn: 259380
2016-02-01 18:15:39 +00:00
Geoff Berry
eacbf522af [AArch64] Simplify callee-save register save/restore. NFC.
Summary:
Simplify callee-save register save/restore code generation by
remembering the size of the callee-save area when it is computed so we
don't have to scan the prologue/epilogue instructions again later to
reconstruct it.

This is intended to simplify follow-on changes that reduce the number of
registers saved/restored.

Reviewers: mcrosier, jmolloy, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16732

llvm-svn: 259365
2016-02-01 16:29:19 +00:00
Asaf Badouh
7d5bdf84bb [X86][AVX512VBMI] add encoding and intrinsics for Multishift
Differential Revision: http://reviews.llvm.org/D16399

llvm-svn: 259363
2016-02-01 15:48:21 +00:00
Daniel Sanders
878dadf925 [mips] Range check uimm16 and fix several bugs this revealed.
Summary:
The bugs were:
* teq and similar take 4-bit unsigned immediates on microMIPS.
* teqi and similar have side-effects like teq does.
* shll_s.w and shra_r.w take 5-bit unsigned immediates.
* The various DSP ext* instructions take a 5-bit immediate.
* repl.qh takes an 8-bit unsigned immediate.
* repl.ph takes a 10-bit unsigned immediate.
* rddsp/wrdsp take a 10-bit unsigned immediate.
* teqi and similar take signed 16-bit immediates (10-bit for microMIPS).
* Out-of-range immediate macros for or/xor take a simm32/simm64 depending
  on architecture. I'll fix the simm64 case properly when I reach simm32.
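
The range checks listed above boil down to two predicates. The sketch below models them in the spirit of LLVM's `isUInt<N>` and `isInt<N>` helpers; the Python function names are assumptions for this example and this is not the MIPS assembler code.

```python
def is_uint(bits, value):
    """True if value fits in an unsigned immediate of the given width."""
    return 0 <= value < (1 << bits)


def is_int(bits, value):
    """True if value fits in a signed two's-complement immediate."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


print(is_uint(4, 15), is_uint(4, 16))        # True False
print(is_int(16, -32768), is_int(16, 32768))  # True False
```

A 4-bit unsigned field therefore accepts 0 to 15, and a signed 16-bit field accepts -32768 to 32767.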

lui is a bit more lenient than GAS and accepts signed immediates in addition
to unsigned. This is because MipsMCExpr can produce signed values when
constant folding and it currently lacks a way of knowing it should fold to
an unsigned value.

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15446

llvm-svn: 259360
2016-02-01 15:13:31 +00:00
JF Bastien
56abfb8d21 WebAssembly NFC: simplify control flow
This should now be easier to read.

llvm-svn: 259349
2016-02-01 10:46:16 +00:00
Igor Breger
15632eed43 AVX512: fix mask handling for gather/scatter/prefetch intrinsics.
Differential Revision: http://reviews.llvm.org/D16755

llvm-svn: 259346
2016-02-01 09:57:15 +00:00
Simon Pilgrim
f62b33f8d4 [X86][SSE] Find source of the inserted element of INSERTPS
Minor patch to trace back through target shuffles to the source of the inserted element in a (V)INSERTPS shuffle.

Differential Revision: http://reviews.llvm.org/D16652

llvm-svn: 259343
2016-02-01 08:59:30 +00:00
Igor Breger
fa62fb9857 AVX512 : Fix SETCCE lowering for KNL 32 bit.
Differential Revision: http://reviews.llvm.org/D16752

llvm-svn: 259342
2016-02-01 07:56:09 +00:00
David Majnemer
56b2a51bb8 [X86] Cleanup the WinEHState pass
Remove unnecessary includes and class state.

No functional change intended.

llvm-svn: 259340
2016-02-01 04:28:59 +00:00
Craig Topper
0df6bdba52 Replace usages of llvm::utostr_32 with just llvm::utostr. While this is less efficient, it's unclear the few places that were using the _32 version were doing so for efficiency.
llvm-svn: 259330
2016-01-31 20:00:24 +00:00
JF Bastien
adbc41abb9 WebAssembly: more failures are gone
llvm-svn: 259321
2016-01-31 08:19:40 +00:00
JF Bastien
3b85804577 WebAssembly: update expected failures
r259305 fixed a few assertions around FrameIndex, and I forgot to update these failures despite having run the torture tests.

llvm-svn: 259320
2016-01-31 08:05:05 +00:00
Derek Schuff
2f77371cea [WebAssembly] Fix uses of FrameIndex as store values
Previously the code assumed all uses of FI on loads and stores were as
addresses. This checks whether the use is the address or a value and
handles the latter case as it does for non-memory instructions.

llvm-svn: 259306
2016-01-30 21:43:08 +00:00
JF Bastien
d89bb7340c WebAssembly: don't optimize frameindex store
The previous code was incorrect (can't getReg a frameindex). We could instead optimize it to reduce tree height, but I'm not sure that's worthwhile yet because we then try to eliminate the frameindex.

This patch also fixes frame index elimination for operations which may load or store: it used to assume the base was operand 2 and immediate offset operand 1. That's not true for stores, where they're 4 and 3.

llvm-svn: 259305
2016-01-30 14:11:26 +00:00
JF Bastien
2125e2f3c7 WebAssembly NFC: fix build warning
WebAssemblyFrameLowering.cpp:158:44: warning: enumeral and non-enumeral type in conditional expression [enabled by default]

llvm-svn: 259303
2016-01-30 11:19:26 +00:00
Matt Arsenault
2699008644 AMDGPU: Fix emitting invalid workitem intrinsics for HSA
The AMDGPUPromoteAlloca pass was emitting the read.local.size
calls, which with HSA was incorrectly selected to reading from
the offset mesa uses off of the kernarg pointer.

Error on intrinsics which aren't supported by HSA, and start
emitting the correct IR to read the workgroup size
out of the dispatch pointer.

Also initialize the pass so it can be tested with opt, and
start moving towards not depending on the subtarget as an
argument.

Start emitting errors for the intrinsics not handled with HSA.

llvm-svn: 259297
2016-01-30 05:19:45 +00:00
Matt Arsenault
bcaaea3448 AMDGPU: Stop checking intrinsics not used by HSA for dispatch-ptr
Only the dispatch.ptr intrinsic is supposed to be used now to get
the workgroup size, and the read.local.size intrinsics do not
work correctly.

llvm-svn: 259296
2016-01-30 05:10:59 +00:00
Dan Gohman
3f3cb842c4 [WebAssembly] Refine block placement to insert blocks between trees.
Refine the test for whether an instruction is in an expression tree so that
it detects when one tree ends and another begins, so we can place a block
at that point, rather than continuing to find the first instruction not in
a tree at all.

llvm-svn: 259294
2016-01-30 05:01:06 +00:00
Matt Arsenault
2f96fe904d AMDGPU: Add new amdgcn workitem intrinsics
These use the correct prefix and follow the HSA naming convention
rather than the config register option names.

llvm-svn: 259293
2016-01-30 04:25:19 +00:00
Matthias Braun
882ae69776 Avoid overly large SmallPtrSet/SmallSet
These sets perform linear searching in small mode so it is never a good
idea to use SmallSize/N bigger than 32.

llvm-svn: 259283
2016-01-30 01:24:31 +00:00
Justin Lebar
e107deb822 [CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.
Summary: Previously we'd just silently skip these.

Reviewers: tra, jholewinski

Subscribers: llvm-commits, jhen, echristo

Differential Revision: http://reviews.llvm.org/D16739

llvm-svn: 259279
2016-01-30 01:07:38 +00:00
Yaron Keren
d008c8f557 Annotate dump() methods with LLVM_DUMP_METHOD, addressing Richard Smith r259192 post commit comment.
clang part in r259232, this is the LLVM part of the patch.

llvm-svn: 259240
2016-01-29 20:50:44 +00:00
Tim Northover
81271b4305 ARM: don't mangle DAG constant if it has more than one use
The basic optimisation was to convert (mul $LHS, $complex_constant) into
roughly "(shl (mul $LHS, $simple_constant), $simple_amt)" when it was expected
to be cheaper. The original logic checks that the mul only has one use (since
we're mangling $complex_constant), but when used in even more complex
addressing modes there may be an outer addition that can pick up the wrong
value too.

I *think* the ARM addressing-mode problem is actually unreachable at the
moment, but that depends on complex assessments of the profitability of
pre-increment addressing modes so I've put a real check in there instead of an
assertion.
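
The identity behind the basic optimisation can be sketched as below. This only illustrates that a multiply by a constant equals a multiply by a smaller constant followed by a shift; the way `split_constant` picks the split (factoring out trailing zero bits) is an assumption for this example and not necessarily the heuristic the ARM backend uses.

```python
MASK32 = (1 << 32) - 1


def split_constant(constant):
    """Split a positive constant into (odd_part, shift) with constant == odd_part << shift."""
    shift = (constant & -constant).bit_length() - 1
    return constant >> shift, shift


def mul_direct(lhs, constant):
    """(mul $LHS, $complex_constant) in 32-bit arithmetic."""
    return (lhs * constant) & MASK32


def mul_then_shift(lhs, constant):
    """(shl (mul $LHS, $simple_constant), $simple_amt) in 32-bit arithmetic."""
    simple_constant, simple_amt = split_constant(constant)
    return ((lhs * simple_constant) << simple_amt) & MASK32


print(split_constant(40))  # (5, 3)
```

The rewrite changes the constant operand itself, which is why the message above is concerned with any other user of that constant.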

llvm-svn: 259228
2016-01-29 19:18:46 +00:00
Derek Schuff
f866cd42ae [WebAssembly] Update test expectations
llvm-svn: 259223
2016-01-29 18:54:38 +00:00
Derek Schuff
385a651d43 [WebAssembly] Support frame pointer
Add support for frame pointer use in prolog/epilog.
Supports dynamic allocas but not yet over-aligned locals.
Target-independent CG generates SP updates, but we still need to write
back the SP value to memory when necessary.

llvm-svn: 259220
2016-01-29 18:37:49 +00:00
Zoran Jovanovic
6be01a5642 [mips] Absolute value macro expansion
Author: obucina
Reviewers: dsanders
Differential Revision: http://reviews.llvm.org/D16323

llvm-svn: 259202
2016-01-29 16:18:34 +00:00
Alexandros Lamprineas
1eca4c99e9 [ARM] Emit trap instruction using .inst directive
The trap instruction is emitted as data-in-text rather
than an instruction. This patch uses the .inst directive
for emitting trap.

Differential Revision: http://reviews.llvm.org/D16684

llvm-svn: 259182
2016-01-29 10:23:32 +00:00
Matt Arsenault
b119805900 AMDGPU: Remove 24-bit intrinsics
The known bit matching code seems to work reasonably well,
so these shouldn't really be needed.

llvm-svn: 259180
2016-01-29 10:05:16 +00:00
Eric Christopher
aa54c26bb0 Refactor common code for PPC fast isel load immediate selection.
llvm-svn: 259178
2016-01-29 07:20:30 +00:00
Eric Christopher
14a36607b8 Since LI/LIS sign extend the constant passed into the instruction we should
check that the sign extended constant fits into 16-bits if we want a
zero extended value, otherwise go ahead and put it together piecemeal.

Fixes PR26356.
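
The check described above can be modelled as follows. This is a sketch of the reasoning, not the PPC FastISel code; `sign_extend_16`, `li_result` and `single_li_suffices` are hypothetical names, and a 64-bit register is assumed.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def li_result(immediate, width=64):
    """Register contents after a load immediate that sign-extends 16 bits."""
    return sign_extend_16(immediate) & ((1 << width) - 1)


def single_li_suffices(wanted, width=64):
    """True if one sign-extending load immediate yields the wanted bit pattern."""
    return li_result(wanted, width) == wanted


print(single_li_suffices(0x7FFF))  # True
print(single_li_suffices(0x8000))  # False
print(hex(li_result(0x8000)))      # 0xffffffffffff8000
```

A zero-extended 0x8000 therefore cannot come from a single sign-extending load immediate and has to be put together piecemeal.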

llvm-svn: 259177
2016-01-29 07:20:01 +00:00
Eric Christopher
2ac8efe601 Fix up conditional formatting.
llvm-svn: 259176
2016-01-29 07:19:49 +00:00
David Majnemer
d6f63d5999 [WinEH] Don't perform state stores in cleanups
Our cleanups do not support true lexical nesting of funclets which
obviates the need to perform state stores.

This fixes PR26361.

llvm-svn: 259161
2016-01-29 05:33:15 +00:00
Ahmed Bougacha
94c2d7d066 [AArch64] Fix i64 nontemporal high-half extraction.
Since we only have pair - not single - nontemporal store instructions,
we have to extract the high part into a separate register to be able
to use them.

When the initial nontemporal codegen support was added, I wrote the
extract using the nonsensical UBFX [0,32[.
Use the correct LSR form instead.
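
The difference between the two forms is easy to see in a bit-level model. This is a sketch, not the AArch64 instruction definitions; `ubfx` and `lsr` here are simplified Python stand-ins for an unsigned bitfield extract and a logical shift right on a 64-bit value.

```python
MASK64 = (1 << 64) - 1


def ubfx(value, lsb, width):
    """Extract width bits starting at bit lsb, zero-extended."""
    return ((value & MASK64) >> lsb) & ((1 << width) - 1)


def lsr(value, amount):
    """Logical shift right of a 64-bit value."""
    return (value & MASK64) >> amount


x = 0x1122334455667788
print(hex(ubfx(x, 0, 32)))  # 0x55667788
print(hex(lsr(x, 32)))      # 0x11223344
```

Extracting bits [0,32[ yields the low half again, whereas shifting right by 32 yields the high half that the second register of the pair needs.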

llvm-svn: 259134
2016-01-29 01:08:41 +00:00
Matt Arsenault
750cfb2e8d AMDGPU: Match fmed3 patterns with legacy fmin/fmax
llvm-svn: 259090
2016-01-28 20:53:48 +00:00