1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00
Commit Graph

35947 Commits

Author SHA1 Message Date
Manuel Jacob
e6438acb66 GlobalValue: use getValueType() instead of getType()->getPointerElementType().
Reviewers: mjacob

Subscribers: jholewinski, arsenm, dsanders, dblaikie

Patch by Eduard Burtescu.

Differential Revision: http://reviews.llvm.org/D16260

llvm-svn: 257999
2016-01-16 20:30:46 +00:00
Manman Ren
efc840cd69 CXX_FAST_TLS calling convention: fix issue on x86-64.
%RBP can't be handled explicitly. We generate the following code:
    pushq %rbp
    movq  %rsp, %rbp
    ...
    movq  %rbx, (%rbp)  ## 8-byte Spill
where %rbp will be overwritten by the spilled value.

The fix is to let PEI handle %RBP.
PR26136

llvm-svn: 257997
2016-01-16 16:39:46 +00:00
NAKAMURA Takumi
3f93ca1561 [Cygwin] Use -femulated-tls by default since r257718 introduced the new pass.
FIXME: Add more targets to use emutls into clang/test/Driver/emulated-tls.cpp.
FIXME: Add cygwin tests into llvm/test/CodeGen/X86. Working in progress.
llvm-svn: 257984
2016-01-16 03:44:52 +00:00
Dan Gohman
efe666f55a [WebAssembly] Add some more README.txt entries.
llvm-svn: 257969
2016-01-16 00:20:03 +00:00
Kevin B. Smith
266f53a305 [X86]: Make param names in header and body match for isCalleePop.
Differential Revision: http://reviews.llvm.org/D16246

llvm-svn: 257965
2016-01-16 00:08:36 +00:00
Dan Gohman
62a2872518 [WebAssembly] Don't create a needless .note.GNU-stack section
WebAssembly's stack will never be executable by default, so it isn't
necessary to declare .note.GNU-stack sections to request a non-executable
stack.

Differential Revision: http://reviews.llvm.org/D15969

llvm-svn: 257962
2016-01-15 23:59:13 +00:00
Artem Belevich
b31d5d18b5 [NVPTX] Do not emit .hidden or .protected directives as they are not allowed by PTX.
llvm-svn: 257961
2016-01-15 23:57:53 +00:00
Manman Ren
edece54223 CXX_FAST_TLS calling convention: fix issue on ARM.
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.

PR26136

llvm-svn: 257930
2016-01-15 20:24:11 +00:00
Manman Ren
177654f861 CXX_FAST_TLS calling convention: fix issue on AArch64.
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.

I will commit fixes to other platforms as well.

PR26136

llvm-svn: 257929
2016-01-15 20:13:28 +00:00
Manman Ren
5feeddb3a5 CXX_FAST_TLS calling convention: fix issue on X86-64.
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.

I will commit fixes to other platforms as well.

PR26136

llvm-svn: 257925
2016-01-15 19:35:42 +00:00
Kyle Butt
a60f1430f3 Codegen: [PPC] Silence false-positive initialization warning. NFC
Some compilers don't do exhaustive switch checking. For those compilers,
add an initialization to prevent un-initialized variable warnings from
firing. For compilers with exhaustive switch checking, we still get a
guarantee that the switch is exhaustive, and hence the initializations
are redundant, and a non-functional change.

llvm-svn: 257923
2016-01-15 19:20:06 +00:00
Reid Kleckner
75210daffb Revert "[ARM] Add ARMv8-M security extension instructions to ARMv8-M Baseline/Mainline"
This reverts commit r257883.

Somehow this didn't make it into r257916.

llvm-svn: 257919
2016-01-15 18:55:12 +00:00
Reid Kleckner
3937bf8e5a # This is a combination of 2 commits.
# The first commit's message is:

Revert "[ARM] Add DSP build attribute and extension targeting"

This reverts commit b11cc50c0b4a7c8cdb628abc50b7dc226ff583dc.

# This is the 2nd commit message:

Revert "[ARM] Add new system registers to ARMv8-M Baseline/Mainline"

This reverts commit 837d08454e3e5beb8581951ac26b22fa07df3cd5.

llvm-svn: 257916
2016-01-15 18:31:29 +00:00
Krzysztof Parzyszek
c2a984edbb [Hexagon] Generate CONST64 when optimizing for size in copy-to-combine
llvm-svn: 257891
2016-01-15 14:08:31 +00:00
Krzysztof Parzyszek
ccc8f74189 [Hexagon] Handle DBG_VALUE instructions in copy-to-combine
llvm-svn: 257890
2016-01-15 13:55:57 +00:00
Bradley Smith
8c53f2fbdf [ARM] Add DSP build attribute and extension targeting
llvm-svn: 257885
2016-01-15 10:28:25 +00:00
Bradley Smith
84c89d1ac3 [ARM] Add new system registers to ARMv8-M Baseline/Mainline
llvm-svn: 257884
2016-01-15 10:28:03 +00:00
Bradley Smith
c60b26bd62 [ARM] Add ARMv8-M security extension instructions to ARMv8-M Baseline/Mainline
llvm-svn: 257883
2016-01-15 10:27:14 +00:00
Bradley Smith
eaffa9a647 [ARM] Add ARMv8-A semaphore/atomic instructions to ARMv8-M Baseline/Mainline
llvm-svn: 257882
2016-01-15 10:26:51 +00:00
Bradley Smith
3ee98732a3 [ARM] Add B.W and CBZ instructions to ARMv8-M Baseline
llvm-svn: 257881
2016-01-15 10:26:17 +00:00
Bradley Smith
f335bc33b1 [ARM] Add SDIV/UDIV instructions to ARMv8-M Baseline
llvm-svn: 257880
2016-01-15 10:25:35 +00:00
Bradley Smith
8eefe89898 [ARM] Add MOVW/MOVT instructions to ARMv8-M Baseline/Mainline
llvm-svn: 257879
2016-01-15 10:25:14 +00:00
Bradley Smith
faae1a370a [ARM] Add ARMv8-M Baseline/Mainline LLVM targeting
llvm-svn: 257878
2016-01-15 10:24:39 +00:00
Bradley Smith
68c51277a2 [ARM] Split out ARMv8-A semaphores and atomics and ARMv7 clrex as separate features
llvm-svn: 257877
2016-01-15 10:23:46 +00:00
Jonas Paulsson
be22f38105 [SystemZ] Fix bad instruction name
SLGBR -> SLBGR

Reviewed by Ulrich Weigand

llvm-svn: 257874
2016-01-15 07:12:09 +00:00
Pete Cooper
28d2ff3ccd Delete MCRelocationInfo::createExprForRelocation.
This method has no callers.

Also remove X86ELFRelocationInfo.cpp and X86MachORelocationInfo.cpp
which only existed to provide an implementation of that method.

Ok'd by Rafael and Jim.

llvm-svn: 257859
2016-01-15 02:24:12 +00:00
Weiming Zhao
9a18c0735c Fix AArch64ConditionOptimizer
Summary:
This pass may modify the Cmp operands. However, the flag reg may be used by both the branch and CSEL.
Modifying CMP will have side effect on CSEL.

Reviewers: t.p.northover

Subscribers: llvm-commits, aemerson, rengolin

Differential Revision: http://reviews.llvm.org/D16147

llvm-svn: 257844
2016-01-15 00:06:58 +00:00
Krzysztof Parzyszek
bccfe64a1e [Hexagon] Use S2_lsr_i_r instead of S2_extractu to obtain upper halfword
llvm-svn: 257815
2016-01-14 21:59:22 +00:00
Krzysztof Parzyszek
d9129a3d14 [Hexagon] Handle HVX registers in bit simplification
llvm-svn: 257811
2016-01-14 21:45:43 +00:00
Rui Ueyama
dca64dbccc Update to use new name alignTo().
llvm-svn: 257804
2016-01-14 21:06:47 +00:00
Rafael Espindola
8e0f92e5a1 Handle offsets larger than 32 bits.
David Majnemer noticed that it was not obvious what the behavior would
be if B.Offset - A.Offset could not fit in an int.

llvm-svn: 257803
2016-01-14 21:03:06 +00:00
Rafael Espindola
5ed565b517 Assert that a cmp function defines a total order.
Thanks to David Blaikie for noticing it.

llvm-svn: 257796
2016-01-14 20:28:25 +00:00
Krzysztof Parzyszek
3c4fc74899 [Hexagon] Expand pseudo instruction Insert4
llvm-svn: 257771
2016-01-14 15:37:16 +00:00
Krzysztof Parzyszek
5b1b8914af [Hexagon] Handle branches with non-mbb operands
llvm-svn: 257768
2016-01-14 15:05:27 +00:00
Benjamin Kramer
a3d2bfba9e [ARM] Use the efficient version of BitVector::set and a static_assert.
No functional change intended.

llvm-svn: 257766
2016-01-14 14:33:04 +00:00
Igor Breger
bd173e545a AVX512: VMOVDQA32/64 (load) intrinsic implementation.
Differential Revision: http://reviews.llvm.org/D16142

llvm-svn: 257749
2016-01-14 07:56:04 +00:00
Ahmed Bougacha
28ea980393 [AArch64] Don't assume extractelt constant index when matching shuffle.
llvm-svn: 257735
2016-01-14 02:12:30 +00:00
JF Bastien
9f9b1386e0 WebAssembly: mark a few new failures
A recent change introduced this assertion failure in some corner cases.

Repro:
mkdir /s/wasm/torture-out ; time /s/wasm/waterfall/src/compile_torture_tests.py --c /s/llvm/out/bin/clang --cxx /s/llvm/out/bin/clang++ --testsuite /s/gcc/gcc/testsuite --fails /s/llvm/llvm/lib/Target/WebAssembly/known_gcc_test_failures.txt --out /s/wasm/torture-out

Or look on the wasm integration bot:
https://build.chromium.org/p/client.wasm.llvm/console

llvm-svn: 257733
2016-01-14 01:49:22 +00:00
David Majnemer
1371bbfd83 [X86] Don't alter HasOpaqueSPAdjustment after we've relied on it
We rely on HasOpaqueSPAdjustment not changing after we've calculated
things based on it.  Things like whether or not we can use 'rep;movs' to
copy bytes around, that sort of thing.  If it changes, invariants in the
backend will quietly break.  This situation arose when we had a call to
memcpy *and* a COPY of the FLAGS register where we would attempt to
reference local variables using %esi, a register that was clobbered by
the 'rep;movs'.

This fixes PR26124.

llvm-svn: 257730
2016-01-14 01:20:03 +00:00
JF Bastien
56616e9461 WebAssembly: fix build break introduced by ELFObjectWriter churn
llvm-svn: 257709
2016-01-13 23:36:00 +00:00
Rafael Espindola
7f81c885ac Convert a few assert failures into proper errors.
Fixes PR25944.

llvm-svn: 257697
2016-01-13 22:56:57 +00:00
Krzysztof Parzyszek
7f3bfe7d2f [Hexagon] Fix the options controlling jump table generation
llvm-svn: 257679
2016-01-13 21:43:13 +00:00
Changpeng Fang
a20f9886e2 AMDGPU/SI: Update ISA version for FIJI
llvm-svn: 257666
2016-01-13 20:39:25 +00:00
Dan Gohman
46ee3ce034 [WebAssembly] Add an assertion to catch unexpected MCFixupKindInfo flags.
llvm-svn: 257657
2016-01-13 19:31:57 +00:00
Dan Gohman
0579006fe8 [WebAssembly] MCFixupKindInfo's TargetSize is in bits rather than bytes.
llvm-svn: 257655
2016-01-13 19:29:37 +00:00
Hans Wennborg
f6d8990fbf Fix struct/class mismatch for MachineSchedContext
llvm-svn: 257648
2016-01-13 18:59:45 +00:00
Marek Olsak
fabfcc8989 AMDGPU/SI: Fix a GPU hang with POS_W_FLOAT enabled
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16037

llvm-svn: 257625
2016-01-13 17:23:20 +00:00
Marek Olsak
49708629a1 AMDGPU/SI: Remove ending s_endpgm from non-void functions
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16035

llvm-svn: 257623
2016-01-13 17:23:12 +00:00
Marek Olsak
aa3a6d7b3d AMDGPU/SI: Add s_waitcnt at the end of non-void functions
Summary:
v2: Make ReturnsVoid private, so that I can another 8 lines of code and
    look more productive.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16034

llvm-svn: 257622
2016-01-13 17:23:09 +00:00
Marek Olsak
2b92f6022f AMDGPU/SI: Add support for non-void functions
Summary:
Return values can be stored in SGPRs (i32) and VGPRs (f32).

This will be used by functions which expect some bytecode or other binary to
be appended at the end. It allows defining in which registers the return
values will be stored.

v2: don't do this for compute shaders

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16033

llvm-svn: 257621
2016-01-13 17:23:04 +00:00
Derek Schuff
ebdcd2d854 [WebAssemly] Invalidate liveness in CFG stackifier
WebAssemblyCFGStackify does not track liveness for EXPR_STACK, causing
verifier failure if liveness has not already been invalidated.

llvm-svn: 257620
2016-01-13 17:10:28 +00:00
Nicolai Haehnle
a2b45f7350 AMDGPU/SI: Add SI Machine Scheduler
Summary:
It is off by default, but can be used
with --misched=si

Patch by: Axel Davy

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D11885

llvm-svn: 257609
2016-01-13 16:10:10 +00:00
Michael Zuckerman
6df5def12a Fixing warning by adding the X86ISD::VROTRI case.
Differential Revision: http://reviews.llvm.org/D16052 

llvm-svn: 257607
2016-01-13 15:48:42 +00:00
Krzysztof Parzyszek
d9323342be [Hexagon] Do not insert non-phis before phis in bit simplification
llvm-svn: 257606
2016-01-13 15:48:18 +00:00
Michael Zuckerman
dfb762ecb8 [AVX512] Adding PMOVSXBD/W/Q , PMOVZSDQ and PMOVZSWD/Q Intrinsics .
Differential Revision: http://reviews.llvm.org/D16111 

llvm-svn: 257604
2016-01-13 14:59:19 +00:00
Michael Zuckerman
7d1a571f3e [AVX512] Adding PMOVZXBD/W/Q , PMOVZXDQ and PMOVZXWD/Q Intrinsics
Differential Revision:http://reviews.llvm.org/D16071

llvm-svn: 257601
2016-01-13 14:25:21 +00:00
Ulrich Weigand
257810adf2 [PowerPC] Fix large code model with the ELFv2 ABI
The global entry point prologue currently assumes that the TOC
associated with a function is less than 2GB away from the function
entry point.  This is always true when using the medium or small
code model, but may not be the case when using the large code model.

This patch adds a new variant of the ELFv2 global entry point prologue
that lifts the 2GB restriction when building with -mcmodel=large.
This works by emitting a quadword containing the distance from the
function entry point to its associated TOC immediately before the
entry point, and then using a prologue like:

ld r2,-8(r12)
add r2,r2,r12

Since creation of the entry point prologue is now split across two
separate routines (PPCLinuxAsmPrinter::EmitFunctionEntryLabel emits
the data word, PPCLinuxAsmPrinter::EmitFunctionBodyStart the prolog
code), I've switched to using named labels instead of just temporaries
to indicate the locations of the global and local entry points and the
new TOC offset data word.

These names are provided by new routines in PPCFunctionInfo modeled
after the existing PPCFunctionInfo::getPICOffsetSymbol.

Note that a corresponding change was committed to GCC here:
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00355.html

Reviewers: hfinkel

Differential Revision: http://reviews.llvm.org/D15500

llvm-svn: 257597
2016-01-13 13:12:23 +00:00
Michael Zuckerman
3e4a1e477a [AVX512] adding PRORQ , PRORD , PRORLVQ and PRORLVD Intrinsics
Differential Revision: http://reviews.llvm.org/D16052

llvm-svn: 257594
2016-01-13 12:39:33 +00:00
Marek Olsak
26f0487127 AMDGPU/SI: Allow more shader inputs
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16032

llvm-svn: 257593
2016-01-13 11:46:48 +00:00
Marek Olsak
a288ad7bd5 AMDGPU/SI: Allow any number of PS inputs
Summary:
With the ability to concatenate shader binaries, the limit of 15 no longer
applies.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16031

llvm-svn: 257592
2016-01-13 11:46:10 +00:00
Marek Olsak
9d874b8ed0 AMDGPU/SI: Add new target attribute InitialPSInputAddr
Summary:
This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM.
The register assigns VGPR locations to PS inputs, while the ENA register
determines whether or not they are loaded.

Mesa needs to set some inputs as not-movable, so that a pixel shader prolog
binary appended at the beginning can assume where some inputs are.

v2: Make PSInputAddr private, because there is never enough silly getters
    and setters for people to read.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16030

llvm-svn: 257591
2016-01-13 11:45:36 +00:00
Marek Olsak
9b704a84bb AMDGPU/SI: Fix a bug in SIFoldOperands
Summary: ret.ll will contain a test for this

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16029

llvm-svn: 257590
2016-01-13 11:44:29 +00:00
Andrey Turetskiy
45056f0d6a LEA code size optimization pass (Part 2): Remove redundant LEA instructions.
Make x86 OptimizeLEAs pass remove LEA instruction if there is another LEA
(in the same basic block) which calculates address differing only be a
displacement. Works only for -Oz.

Differential Revision: http://reviews.llvm.org/D13295

llvm-svn: 257589
2016-01-13 11:30:44 +00:00
James Y Knight
42de23f7fd [SPARC] Revamp AnalyzeBranch and add ReverseBranchCondition.
AnalyzeBranch on X86 (and, previously, SPARC, which implementation was
copied from X86) tries to modify the branches based on block
layout (e.g. checking isLayoutSuccessor), when AllowModify is true.

The rest of the architectures leave that up to the caller, which can
call InsertBranch, RemoveBranch, and ReverseBranchCondition as
appropriate. That appears to be the preferred way to do it nowadays.

This commit makes SPARC like the rest: replaces AnalyzeBranch with an
implementation cribbed from AArch64, and adds a ReverseBranchCondition
implementation.

Additionally, a test-case has been added (also cribbed from AArch64)
demonstrating that redundant branch sequences no longer get emitted.

E.g., it used to emit code like this:
         bne .LBB1_2
         nop
         ba .LBB1_1
         nop
 .LBB1_2:

And now emits:
        cmp %i0, 42
        be .LBB1_1
        nop

llvm-svn: 257572
2016-01-13 04:44:14 +00:00
Ana Pazos
c67605986f Guard fabs to bfc convert with V6T2 flag
Summary:
BFC instructions are available in ARMv6T2 and above.


Reviewers: t.p.northover

Subscribers: aemerson

Differential Revision: http://reviews.llvm.org/D16076

llvm-svn: 257546
2016-01-13 00:03:35 +00:00
Quentin Colombet
9549547f2b [ARM] Mark VMOV with immediate: isAsCheapAsMove.
VMOVs are not strictly speaking cheap, but they are as expensive as a vector
copy (VORR), so we should prefer rematerialization over splitting when it
applies.

rdar://problem/23754176

llvm-svn: 257545
2016-01-13 00:02:40 +00:00
Derek Schuff
278b1bc727 [WebAssembly] Fix disassembler shared-libs build
llvm-svn: 257536
2016-01-12 23:03:40 +00:00
Dan Gohman
f4c61d26b6 [WebAsssembly] Register the MC register info.
llvm-svn: 257525
2016-01-12 21:27:55 +00:00
Michael Zuckerman
412db37229 [AVX512] adding PROLQ and PROLD Intrinsics
Differential Revision: http://reviews.llvm.org/D16048

llvm-svn: 257523
2016-01-12 21:19:17 +00:00
Kyle Butt
f1fe33f755 Codegen: [PPC] Handle weighted comparisons when inserting selects.
Only non-weighted predicates were handled in PPCInstrInfo::insertSelect. Handle
the weighted predicates as well.

This latent bug was triggered by r255398, because it added use of the
branch-weighted predicates.

While here, switch over an enum instead of an int to get the compiler to enforce
totality in the future.

llvm-svn: 257518
2016-01-12 21:00:43 +00:00
Dan Gohman
44c4718de7 [WebAssembly] Add a EM_WEBASSEMBLY value, and several bits of code that use it.
A request has been made to the official registry, but an official value is
not yet available. This patch uses a temporary value in order to support
development. When an official value is recieved, the value of EM_WEBASSEMBLY
will be updated.

llvm-svn: 257517
2016-01-12 20:56:01 +00:00
Dan Gohman
37a958f796 [WebAssembly] Introduce a WebAssemblyTargetStreamer class.
Refactor .param, .result, .local, and .endfunc, as directives, using the
proper MCTargetStreamer mechanism, rather than fake instructions.

llvm-svn: 257511
2016-01-12 20:30:51 +00:00
Krzysztof Parzyszek
9eca362658 Replace inherited constructor with an explicit one
Some bots failed when the inherited constructor was used.

llvm-svn: 257508
2016-01-12 19:27:59 +00:00
Dan Gohman
e122783b00 [WebAssembly] Make CFG stackification independent of basic-block labels.
This patch changes the way labels are referenced. Instead of referencing the
basic-block label name (eg. .LBB0_0), instructions now just have an immediate
which indicates the depth in the control-flow stack to find a label to jump to.
This makes them much closer to what we expect to have in the binary encoding,
and avoids the problem of basic-block label names not being explicit in the
binary encoding.

Also, it terminates blocks and loops with end_block and end_loop instructions,
rather than basic-block label names, for similar reasons.

This will also fix problems where two constructs appear to have the same label,
because we no longer explicitly use labels, so consumers that need labels will
presumably create their own labels, and presumably they won't reuse labels
when they do.

This patch does make the code a little more awkward to read; as a partial
mitigation, this patch also introduces comments showing where the labels are,
and comments on each branch showing where it's branching to.

llvm-svn: 257505
2016-01-12 19:14:46 +00:00
Krzysztof Parzyszek
ce7b184512 [Hexagon] Implement RDF-based post-RA optimizations
- Handle simple cases of register copies (what current RDF CP allows).
- Hexagon-specific dead code elimination: handles dead address updates
  in post-increment instructions.

llvm-svn: 257504
2016-01-12 19:09:01 +00:00
Krzysztof Parzyszek
8ef90fe4f8 RDF: Copy propagation
This is a very limited implementation of DFG-based copy propagation.
It only handles actual COPY instructions (does not handle other equivalents
such as add-immediate with a 0 operand).
The major limitation is that it does not update the DFG: that will be the
change required to make it more robust (hopefully coming up soon).

llvm-svn: 257490
2016-01-12 17:23:48 +00:00
Tom Stellard
57fb300ca9 AMDGPU: Emit note directive for HSA even if there are no functions
Reviewers: arsenm, echristo

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16010

llvm-svn: 257488
2016-01-12 17:18:17 +00:00
Krzysztof Parzyszek
3a4051fd2e RDF: Dead code elimination
Utility class to perform DFG-based dead code elimination.

llvm-svn: 257485
2016-01-12 17:01:16 +00:00
Krzysztof Parzyszek
0ef8e6d23c Fix compiler warnings from r257477
llvm-svn: 257483
2016-01-12 16:51:55 +00:00
Krzysztof Parzyszek
e67e1b7be9 RDF: Implement register liveness analysis
Compute block live-ins and operand kill flags from the DFG.

llvm-svn: 257480
2016-01-12 15:56:33 +00:00
Daniel Sanders
4187b1a144 [mips] Correct operand order in DSP's mthi/mtlo
Summary: The result register is the second operand as per the other mt* instructions.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D15993

llvm-svn: 257478
2016-01-12 15:15:14 +00:00
Krzysztof Parzyszek
c0119537d0 Register Data Flow: data flow graph
Target independent, SSA-based data flow framework for representing
data flow between physical registers.

This commit implements the creation of the actual data flow graph.

llvm-svn: 257477
2016-01-12 15:09:49 +00:00
Benjamin Kramer
8a32d80124 [Hexagon] Make helper function static. NFC.
llvm-svn: 257476
2016-01-12 14:58:49 +00:00
Keno Fischer
b466194d5d [ARM] Fix several state persistence bugs
Summary:
This fixes three bugs, in all of which state is not or incorrecly reset between
objects (i.e. when reusing the same pass manager to create multiple object
files):
1) AttributeSection needs to be reset to nullptr, because otherwise the backend
   will try to emit into the old object file's attribute section causing a
   segmentation fault.
2) MappingSymbolCounter needs to be reset, otherwise the second object file
   will start where the first one left off.
3) The MCStreamer base class resets the Streamer's e_flags settings. Since
   EF_ARM_EABI_VER5 is set on streamer creation, we need to set it again
   after the MCStreamer was rest.

Also rename Reset (uppser case) to EHReset to avoid confusion with
reset (lower case).

Reviewers: rengolin
Differential Revision: http://reviews.llvm.org/D15950

llvm-svn: 257473
2016-01-12 13:38:15 +00:00
Andrey Turetskiy
3f9a1fbdbe Test commit access - tiny comment and code style fix.
llvm-svn: 257472
2016-01-12 13:34:11 +00:00
Robert Lougher
86b723434f The isel pattern that selects the memory-register form of VCVTPH2PS
(64 to 128-bit) matches against the pattern fragment 'vzmovl_v2i64'
(a zero-extended 64-bit load).

However, a change in r248784 teaches the instruction combiner that only
the lower 64 bits of the input to a 128-bit vcvtph2ps are used.  This means
the instruction combiner will ordinarily optimize away the upper 64-bit
insertelement instruction in the zero-extension and so we no longer select
the memory-register form.  To fix this a new pattern has been added.

Differential Revision: http://reviews.llvm.org/D16067

llvm-svn: 257470
2016-01-12 11:48:25 +00:00
Igor Breger
46e273fe48 AVX512: VPMOVAPS/PD and VPMOVUPS/PD (load) intrinsic implementation.
Differential Revision: http://reviews.llvm.org/D16042

llvm-svn: 257463
2016-01-12 10:02:32 +00:00
Dan Gohman
2d32547aa5 [WebAssembly] Implement a prototype instruction encoder and disassembler.
This is using an extremely simple temporary made-up binary format, not the
official binary format (which isn't defined yet).

llvm-svn: 257440
2016-01-12 03:32:29 +00:00
Dan Gohman
ba5f0b5ef9 [WebAssembly] Register the MC subtarget info.
llvm-svn: 257439
2016-01-12 03:30:06 +00:00
Dan Gohman
07131ec8af [WebAssembly] Define OperandTypes for decoding immediate values.
llvm-svn: 257438
2016-01-12 03:09:16 +00:00
Dan Gohman
77b2aec88f [WebAssembly] Use TSFlags instead of keeping a list of special-case opcodes.
llvm-svn: 257433
2016-01-12 01:45:12 +00:00
Manman Ren
7584f0dfac CXX_FAST_TLS calling convention: performance improvement for x86-64.
This is the same change on x86-64 as r255821 on AArch64.
rdar://9001553

llvm-svn: 257428
2016-01-12 01:08:46 +00:00
Manman Ren
6cbd4fbe85 CXX_FAST_TLS calling convention: performance improvement for ARM.
This is the same change on ARM as r255821 on AArch64.
rdar://9001553

llvm-svn: 257424
2016-01-12 00:47:18 +00:00
Manman Ren
edd11c4d38 CXX_FAST_TLS calling convention: Add support for ARM on Darwin.
rdar://9001553

llvm-svn: 257417
2016-01-11 23:50:43 +00:00
Dan Gohman
28312257c8 [WebAssembly] Define WebAssembly-specific relocation codes.
Currently WebAssembly has two kinds of relocations; data addresses and
function addresses. This adds ELF relocations for them, as well as an
MC symbol kind to indicate which type of relocation is needed.

llvm-svn: 257416
2016-01-11 23:38:05 +00:00
Dan Gohman
a3c4472bce [WebAssembly] Reorganize address offset folding.
Always expect tglobaladdr and texternalsym to be wrapped in
WebAssemblywrapper nodes. Also, split out a regPlusGA from regPlusImm so
that it can special-case global addresses, as they can be folded in more
cases.

Unfortunately this doesn't enable any new optimizations yet due to
SelectionDAG limitations. I'll be submitting changes to the SelectionDAG
infrastructure, along with tests, in a separate patch.

llvm-svn: 257394
2016-01-11 22:05:44 +00:00
Matt Arsenault
e2033b4ea1 AMDGPU: Implement {{s|u}}int_to_fp i64 -> f32
The old lowering for uint_to_fp failed opencl conformance.
It might be OK for fast math mode, but I'm not sure.

llvm-svn: 257393
2016-01-11 22:01:48 +00:00
Matt Arsenault
f903e8d6a9 AMDGPU: Fix crash with dispatch.ptr intrinsic with non-HSA target
It might be better to let this be a select failure instead.

llvm-svn: 257386
2016-01-11 21:18:33 +00:00
Matt Arsenault
c485a8dc2b AMDGPU: Fix ctlz combine for sub 32-bit types
llvm-svn: 257353
2016-01-11 17:02:06 +00:00
Matt Arsenault
b88ff2e112 AMDGPU: Pattern match ffbh pattern to instruction.
The hardware instruction's output on 0 is -1 rather than 32.
Eliminate a test and select to -1. This removes an extra instruction
from the compatability function with HSAIL's firstbit instruction.

llvm-svn: 257352
2016-01-11 17:02:00 +00:00
Matt Arsenault
9caa943926 AMDGPU: Custom lower i64 ctlz
llvm-svn: 257348
2016-01-11 16:50:29 +00:00
Matt Arsenault
cb3a3bebdf Mips: Remove lowerSELECT_CC
This is the same as the default expansion.

llvm-svn: 257346
2016-01-11 16:44:48 +00:00
Matt Arsenault
c04b46aa04 LegalizeDAG: Expand ctlz with ctlz_zero_undef if legal
llvm-svn: 257345
2016-01-11 16:37:46 +00:00
Matt Arsenault
c2a350a74a AMDGPU: Remove dead target dag combine
llvm-svn: 257344
2016-01-11 16:37:40 +00:00
Daniel Sanders
f78b9ea9f0 [mips] Never select JAL for calls to an absolute immediate address.
Summary:
It actually takes an offset into the current PC-region.

This fixes the 'expr' command in lldb.

Reviewers: vkalintiris, jaydeep, bhushan

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D16054

llvm-svn: 257339
2016-01-11 15:57:46 +00:00
Krzysztof Parzyszek
e8a7ebcd1a [Hexagon] Add check for nullptr in getFixupNoBits
llvm-svn: 257338
2016-01-11 15:51:53 +00:00
Krzysztof Parzyszek
c902c4f38a [Hexagon] Add implicit uses of GP to GP-relative loads and stores
llvm-svn: 257337
2016-01-11 15:49:58 +00:00
Krzysztof Parzyszek
438a5799b1 [Hexagon] Mark D14 and GP as reserved registers
llvm-svn: 257336
2016-01-11 15:47:41 +00:00
Alexey Bataev
abddd2d70e [X86] Reduce complexity of the LEA optimization pass, by Andrey Turetsky.
In the OptimizeLEA pass keep instructions' positions in the basic block saved and use them for calculation of the distance between two instructions instead of std::distance. This reduces complexity of the pass from O(n^3) to O(n^2) and thus the compile time.
Differential Revision: http://reviews.llvm.org/D15692

llvm-svn: 257328
2016-01-11 11:52:29 +00:00
Craig Topper
d62ff1659a [AVX-512] Remove another extra space from the Intel syntax asm strings.
llvm-svn: 257304
2016-01-11 01:03:40 +00:00
Craig Topper
d9fb66564c [AVX-512] Remove more superfluous spaces from asm strings.
llvm-svn: 257301
2016-01-11 00:44:58 +00:00
Craig Topper
e6ae356ee7 [AVX-512] Remove unused Round and Itinerary from the maskable_cmp multiclasses. They weren't used and there were extra spaces in the asm string to prepare for the concatenations of the round string that wasn't ever used.
llvm-svn: 257300
2016-01-11 00:44:56 +00:00
Craig Topper
59a087ed05 [AVX-512] Make spacing between comma and {sae} operand consistent in asm strings.
llvm-svn: 257299
2016-01-11 00:44:52 +00:00
Craig Topper
d1e44ba69a [X86] Remove extra spaces from MPX instruction asm strings.
llvm-svn: 257298
2016-01-11 00:44:46 +00:00
Elena Demikhovsky
6b2fa94db0 Optimized instruction sequence for sitofp operation on X86-32
Optimized sitofp i64 %x to double. The current sequence

movl %ecx, 8(%esp) 
movl %edx, 12(%esp) 
fildll 8(%esp)

is replaced with:

movd %ecx, %xmm0 
movd %edx, %xmm1 
punpckldq %xmm1, %xmm0 
movq %xmm0, 8(%esp)

Differential Revision: http://reviews.llvm.org/D15946

llvm-svn: 257285
2016-01-10 09:41:22 +00:00
Michael Zuckerman
142f0a5d1e [AVX512] add PRORVQ and PRORVD Intrinsic
Differential Revision:http://reviews.llvm.org/D15955

llvm-svn: 257283
2016-01-10 09:16:41 +00:00
Simon Pilgrim
fb82a5dfcc [X86][AVX] Match broadcast loads through a bitcast
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through any bitcast to check for a load node to allow broadcasts to occur.

This is a re-commit of r257055 after r257264 fixed 32-bit broadcast loads of i64 scalars.

llvm-svn: 257266
2016-01-09 20:59:39 +00:00
Simon Pilgrim
098bdc5b08 [X86][AVX] Add support for i64 broadcast loads on 32-bit targets
Added 32-bit AVX1/AVX2 broadcast tests.

llvm-svn: 257264
2016-01-09 19:59:27 +00:00
Tobias Edler von Koch
18c138a981 [Hexagon] Replace a static member variable in HexagonCVIResource (NFC)
This creates one instance of TUL per HexagonShuffler, which avoids thread-safety
issues with future changes.

llvm-svn: 257215
2016-01-08 22:07:25 +00:00
Weiming Zhao
f34d1f0dc5 RBIT Instruction only available for ARMv6t2 and above.
Summary:
r255334 matches bit-reverse pattern in InstCombine and generates calls to Instrinsic::bitreverse.

RBIT instruction is only available for ARMv6t2 and above. This patch has the intrinsic expanded during legalization for ARMv4 and ARMv5.

Patch by Z. Zheng <zhaoshiz@codeaurora.org>

Reviewers: apazos, jmolloy, weimingz

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D15932

llvm-svn: 257188
2016-01-08 18:43:41 +00:00
Weiming Zhao
d25e3c1837 Disable shrink-wrap for Thumb1
Summary: In ARMConstantIslandPass, which runs after Shrink Wrap pass, long jumps will be fixed up as BL (tBfar) which depends on spilling LR in epilogue.  However, shrink-wrap may remove the LR, which causes issues when the function returns.

Reviewers: qcolombet, rengolin

Subscribers: aemerson, rengolin

Differential Revision: http://reviews.llvm.org/D15984

llvm-svn: 257187
2016-01-08 18:37:43 +00:00
Tom Stellard
1d434ce3dd AMDGPU/SI: Emit global variable sizes when targeting HSA
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15952

llvm-svn: 257173
2016-01-08 14:50:28 +00:00
Tom Stellard
802bf97e04 AMDGPU: Emit functions sizes
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15951

llvm-svn: 257172
2016-01-08 14:50:23 +00:00
Nemanja Ivanovic
ec3aeea01e Prevent renaming of CR fields in AADB when a CR restore is present
This patch corresponds to review:
http://reviews.llvm.org/D15930

Moves to and from CR fields depend on shifts/masks that depend on the
target/source CR field. Thus, post-ra anti-dep breaking must not later
change that CR register assignment.

llvm-svn: 257168
2016-01-08 13:09:54 +00:00
Dylan McKay
37ab0b1ed1 [AVR] Added AVRSelectionDAGInfo header file
llvm-svn: 257152
2016-01-08 06:32:27 +00:00
Craig Topper
9845f978d1 [AVX-512] Remove superfluous spaces from some asm strings.
llvm-svn: 257150
2016-01-08 06:09:20 +00:00
Craig Topper
dc8e284ff1 [X86] Don't print the aliased version of CVTSD2SI64rm. This appears to be a mistake I made years ago.
llvm-svn: 257149
2016-01-08 06:09:18 +00:00
Craig Topper
e2f6754f73 [X86] Use \t instead of space after mnemonics in a bunch InstAliases for consistency.
llvm-svn: 257148
2016-01-08 06:09:13 +00:00
Kyle Butt
70e01b07f9 Add call sequence start and end for __tls_get_addr
This is a fix for bug http://llvm.org/bugs/show_bug.cgi?id=25839.

For a PIC TLS variable access in a function, prologue (mflr followed by std and
stdu) gets scheduled after a tls_get_addr call. tls_get_addr messed up LR but
no one saves/restores it.

Also added a test for save/restore clobbered registers during calling __tls_get_addr.

Patch by Tim Shen

llvm-svn: 257137
2016-01-08 02:06:19 +00:00
Dan Gohman
650311cc9a [WebAssembly] Minor code cleanups. NFC.
llvm-svn: 257131
2016-01-08 01:18:00 +00:00
Dan Gohman
96a9f5305c [WebAssembly] Minor code cleanups. NFC.
llvm-svn: 257128
2016-01-08 01:06:00 +00:00
Dan Gohman
4b51173afa [WebAssembly] Remove an unused def : Pat.
WebAssemblyISelLowering.cpp does not wrap jump table nodes inside of
WebAssemblywrapper nodes, so this pattern is not currently used.

llvm-svn: 257127
2016-01-08 00:50:33 +00:00
Dan Gohman
9a59be1c08 [WebAssembly] Remove unused arguments, unused functions. NFC.
llvm-svn: 257125
2016-01-08 00:43:54 +00:00
Eric Christopher
e21afdb3b5 Add some testing for thumb1 and thumb2 inline asm immediate constraints
and fix a couple of bugs on inspection.

Also fixes PR26061.

llvm-svn: 257122
2016-01-08 00:34:44 +00:00
JF Bastien
cdce125371 WebAssembly: use .skip instead of .zero directive
.zero is confusing when used with two arguments. Documentation:

  This directive emits SIZE 0-valued bytes.  SIZE must be an absolute
  expression.  This directive is actually an alias for the '.skip'
  directive so in can take an optional second argument of the value to
  store in the bytes instead of zero.  Using '.zero' in this way would be
  confusing however.

Ref: https://sourceware.org/bugzilla/show_bug.cgi?id=18353

Hexagon and Sparc do the same, and it's all the same to WebAssembly so
let's pick the less confusing of the two.

llvm-svn: 257111
2016-01-07 23:18:29 +00:00
JF Bastien
b8ea306e3f WebAssembly: update expected failures, more assert got resolved.
llvm-svn: 257098
2016-01-07 21:00:37 +00:00
JF Bastien
07a5def67e WebAssembly: update expected failures, assert got resolved by r257084.
llvm-svn: 257093
2016-01-07 20:07:21 +00:00
Derek Schuff
03c784208d [WebAssembly] Support combining GEP and FrameIndex offsets in memory operand offset field
Previously we only supported putting the FI into memory operand offset
fields if there was nothing there already. Now combine them.

Differential Revision: http://reviews.llvm.org/D15941

llvm-svn: 257084
2016-01-07 18:55:52 +00:00
Dan Gohman
cadd51d256 [WebAssembly] Use the default private label prefixes.
The MC assembler doesn't like using the empty string as a private label
prefix because then it treats all labels as private. This commit reverts
back to the default prefix, which is .L, which is common in ELF targets
and consistent with the LLVM name mangler.

llvm-svn: 257083
2016-01-07 18:49:53 +00:00
Nicolai Haehnle
8f90c2a416 AMDGPU/SI: Fold operands with sub-registers
Summary:
Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs,
increasing the code size and VGPR pressure. These moves are now folded away.

Note that this lack of operand folding was not a problem for VMEM loads,
because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register
coalescer.

Some tests are updated, note that the fsub.ll test explicitly checks that
the move is elided.

With the IR generated by current Mesa, the changes are obviously relatively
minor:

7063 shaders in 3531 tests
Totals:
SGPRS: 351872 -> 352560 (0.20 %)
VGPRS: 199984 -> 200732 (0.37 %)
Code Size: 9876968 -> 9881112 (0.04 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave
Wait states: 295164 -> 295337 (0.06 %)

Totals from affected shaders:
SGPRS: 65784 -> 66472 (1.05 %)
VGPRS: 38064 -> 38812 (1.97 %)
Code Size: 1993828 -> 1997972 (0.21 %) bytes
LDS: 42 -> 42 (0.00 %) blocks
Scratch: 795648 -> 783360 (-1.54 %) bytes per wave
Wait states: 54026 -> 54199 (0.32 %)

Reviewers: tstellarAMD, arsenm, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15875

llvm-svn: 257074
2016-01-07 17:10:29 +00:00
Nicolai Haehnle
b68efce7c9 AMDGPU/SI: xnack_mask is always reserved on VI
Summary:
Somehow, I first interpreted the docs as saying space for xnack_mask is only
reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and
went back to actually test what is happening, and it turns out that xnack_mask
is always reserved at least on Tonga and Carrizo, in the sense that flat_scr
is always fixed below the SGPRs that are used to implement xnack_mask, whether
or not they are actually used.

I confirmed this by writing a shader using inline assembly to tease out the
aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where
we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so
xnack_mask is s[76:77] and vcc is s[78:79]).

This patch changes both the calculation of the total number of SGPRs and the
various register reservations to account for this.

It ought to be possible to use the gap left by xnack_mask when the feature
isn't used, but this patch doesn't try to do that. (Note that the same applies
to vcc.)

Note that previously, even before my earlier change in r256794, the SGPRs that
alias to xnack_mask could end up being used as well when flat_scr was unused
and the total number of SGPRs happened to fall on the right alignment
(e.g. highest regular SGPR being used s29 and VCC used would lead to number
of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there
were some conflict due to such aliasing, we should have noticed that already.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15898

llvm-svn: 257073
2016-01-07 17:10:20 +00:00
Michael Zuckerman
3926170a80 [AVX512] add PSLLW and PSLLV Intrinsic
Differential Revision: http://reviews.llvm.org/D15889

llvm-svn: 257070
2016-01-07 16:02:51 +00:00
Nico Weber
83e5259a0e Revert r257055, it caused PR26064.
llvm-svn: 257066
2016-01-07 15:01:46 +00:00
Michael Zuckerman
7df5f38dfa [AVX512] add PSRAV Intrinsic
Differential Revision: http://reviews.llvm.org/D15856

llvm-svn: 257063
2016-01-07 14:42:20 +00:00
Amjad Aboud
a54507b1df Added support for macro emission in dwarf (supporting DWARF version 4).
Differential Revision: http://reviews.llvm.org/D15495

llvm-svn: 257060
2016-01-07 14:28:20 +00:00
Michael Zuckerman
889be550c7 [AVX512] add PSHUFHW and PSHUFLW Intrinsic
Differential Revision: http://reviews.llvm.org/D15925

llvm-svn: 257056
2016-01-07 12:35:43 +00:00
Simon Pilgrim
dea273b77c [X86][AVX] Match broadcast loads through a bitcast
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through bitcasts to check for a load node to allow broadcasts to occur.

Follow up to D15310

llvm-svn: 257055
2016-01-07 11:34:27 +00:00
Dylan McKay
bca58a659a Added AVRTargetObjectFile class and AVR.h
llvm-svn: 257049
2016-01-07 10:53:15 +00:00
Simon Pilgrim
4d6eae695d [X86][SSE} Add INSERTPS as a target shuffle
Follow up to D15378, added INSERTPS to the list of decodable target shuffles and enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles with SentinelZero and tested this with INSERTPS.

llvm-svn: 257046
2016-01-07 10:24:19 +00:00
Michael Zuckerman
d509278da4 [AVX512] add PSHUFD Intrinsic
Differential Revision: http://reviews.llvm.org/D15934

llvm-svn: 257044
2016-01-07 09:24:12 +00:00
Tim Northover
815876c683 ARM: support TLS accesses on Darwin platforms
Darwin TLS accesses most closely resemble ELF's general-dynamic situation,
since they have to be able to handle all possible situations. The descriptors
and so on are obviously slightly different though.

llvm-svn: 257039
2016-01-07 09:03:03 +00:00
Jonas Paulsson
716757f08e [SystemZ] Add hasSideEffects flag on Serialize instruction.
Serialize will perform a hardware serialization operation, and is
acting as a memory barrier. Therefore it must have the hasSideEffects
flag set so it will be treated as a global memory object.

Reviewed by Ulrich Weigand

llvm-svn: 257036
2016-01-07 07:20:55 +00:00
Craig Topper
3ffb7b7727 [X86] Remove superfluous mayLoad flag. The pattern already implies it.
llvm-svn: 257035
2016-01-07 06:42:10 +00:00
Craig Topper
ee83b6a43c [X86] Had hasSideEffects=0 to VBROADCASTI128.
llvm-svn: 257034
2016-01-07 06:37:55 +00:00
Craig Topper
6c0a01cc26 [X86] Add OpSize32 to MOVSX32_NOREX instructions to match their other versions.
llvm-svn: 257033
2016-01-07 06:37:52 +00:00
Craig Topper
1e8c72699f [X86] Add hasSideEffects=0 and mayLoad=1 to MOVZX64* instructions. While there remove a superfluous _Q from the instruction names.
llvm-svn: 257032
2016-01-07 05:57:39 +00:00
Craig Topper
aa4d1466b3 [X86] STOSQ without a rep prefix doesn't read or write RCX.
llvm-svn: 257030
2016-01-07 05:18:49 +00:00
Haicheng Wu
0feec017ff [AArch64 MachineCombine] Enhance/Add support for general reassociation to reduce the critical path
Allow fadd/fmul to be reassociated in aarch64.

llvm-svn: 257024
2016-01-07 04:01:02 +00:00
Dan Gohman
8422d6f477 [WebAssembly] Add -m:e to the target triple.
This enables ELF-style name mangling, which primarily means using ".L" for
private symbols.

llvm-svn: 257020
2016-01-07 03:19:23 +00:00
Simon Pilgrim
c377a8a5e0 [X86] Determine if target shuffle can contain zero elements
getTargetShuffleMask may return shuffle masks with SM_SentinelZero (-2) values (currently just for PSHUFB but VPERM2X128 as well with this patch). Although some calling functions can make use of this (mainly for shuffle combining), others can not and their inclusion makes shuffle mask comparisons more difficult.

This patch adds a flag to getTargetShuffleMask to indicate if the calling function can't handle SM_SentinelZero; getTargetShuffleMask will then return false if it occurs to make handling much easier.

I've tidied up some uses of getTargetShuffleMask to better indicate what is going on - more could be done but at present I don't have test cases to demonstrate it.

Some upcoming patches will make use of this to both support more uses where SM_SentinelZero is not permitted (e.g. combineShuffleToAddSub), and also will allow us to add INSERTPS support to getTargetShuffleMask as part of better zero handling discussed in D14261.

Differential Revision: http://reviews.llvm.org/D15378

llvm-svn: 256992
2016-01-06 23:24:40 +00:00
Nicolai Haehnle
befc78df0d AMDGPU/SI: Fix crash when inline assembly is used in a graphics shader
Summary:
This is admittedly something that you could only run into by manually
playing around with shader assembly because the SITypeWriter pass is
skipped for compute.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15902

llvm-svn: 256980
2016-01-06 22:01:04 +00:00
Quentin Colombet
1b2176f1f4 [X86] Correctly model TLS calls w.r.t. frame requirements.
TLS calls need the stack frame to be properly set up and this
implies that such calls need ADJUSTSTACK_xxx markers.

Fixes PR25820.

llvm-svn: 256959
2016-01-06 19:09:26 +00:00
Sanjay Patel
df9b659110 refactor divrem8 lowering; NFCI
The code duplication contributed to PR25754:
https://llvm.org/bugs/show_bug.cgi?id=25754

llvm-svn: 256957
2016-01-06 18:47:09 +00:00
Dan Gohman
f58a82f98e [WebAssembly] Don't use range-based loop for a list that's being modified
The first instruction in a block is what the rend() iterator points to, so
if it moves, we need to re-evaluate rend() so that we continue to iterate
through the rest of the instructions.

llvm-svn: 256953
2016-01-06 18:29:35 +00:00
JF Bastien
61a8beaafa WebAssembly: add missing expected failures exposed by r256890
llvm-svn: 256948
2016-01-06 17:08:56 +00:00
JF Bastien
15f0f3953b WebAssembly: add new expected failures exposed by r256890
llvm-svn: 256945
2016-01-06 16:15:51 +00:00
Krzysztof Parzyszek
787f659237 [Hexagon] Add system instructions for cache manipulation
llvm-svn: 256936
2016-01-06 14:22:22 +00:00
Artyom Skrobov
402764f41e PR25754: avoid generating UDIVREM8_ZEXT_HREG nodes with i64 result
Reviewers: spatel, srking

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15331

llvm-svn: 256924
2016-01-06 09:41:10 +00:00
Simon Pilgrim
e7308ddcf1 [X86][SSE] There is no zmm addsubpd/addsubps instruction.
Replace the assert in combineShuffleToAddSub with an early out.

llvm-svn: 256922
2016-01-06 09:08:49 +00:00
Simon Pilgrim
3c38fe0257 [X86][SSE] An empty target shuffle mask is always a failure.
As discussed on D15378, move the mask.empty() tests to after the switch statement and consider any shuffle decode where the extracted target shuffle mask is empty as a failure.

llvm-svn: 256921
2016-01-06 08:59:32 +00:00
Craig Topper
f0f0ca85b5 [X86] Use PS instead of TB for instructions that have PD/XS/XD variations. Use OpSize32 on an instruction that has an OpSize16 variant.
llvm-svn: 256918
2016-01-06 06:18:41 +00:00
Craig Topper
b5a33aa69f [X86] Fix an incorrect usage of In32BitMode that should have been Not64BitMode.
llvm-svn: 256917
2016-01-06 06:18:37 +00:00
Philip Reames
f7e04fbc3d Extract helper function to merge MemoryOperand lists [NFC]
In the discussion on http://reviews.llvm.org/D15730, Andy pointed out we had a utility function for merging MMO lists. Since it turned we actually had two copies and there's another review in progress (http://reviews.llvm.org/D15230) which needs the same, extract it into a utility function and clean up the interfaces to make it easier to use with a MachineInstBuilder.

I introduced a pair here to track size and allocation together. I think we should probably move in the direction of the MachineOperandsRef helper class, but I'm leaving that for further work. I want to get the poison state introduced before I make major changes to the interface.

Differential Revision: http://reviews.llvm.org/D15757

llvm-svn: 256909
2016-01-06 04:39:03 +00:00
Junmo Park
fff17a4cfa Delete trailing whitespace; NFC
llvm-svn: 256908
2016-01-06 03:53:36 +00:00
Junmo Park
18c511ac11 Delete trailing whitespace; NFC
llvm-svn: 256906
2016-01-06 03:41:30 +00:00
Nicolai Haehnle
e9014468b6 AMDGPU/SI: Do not move scratch resource register on Tonga & Iceland
Due to the SGPR init bug, every program claims to use the same number
of SGPRs anyway, so there's no point in trying to shift those registers
down from their initial spot of reservation.

Add a test that uses VGPR spilling and blocks most SGPRs from being used for
the scratch resource register. Previously, this would run into an assertion.

Differential Revision: http://reviews.llvm.org/D15724

llvm-svn: 256870
2016-01-05 20:42:49 +00:00
David Majnemer
62491667b4 [X86] Determine if we have an OpaqueSPAdjustment earlier
We queried hasFP before we hit ExpandISelPseudos.  ExpandISelPseudos
manipulated state that hasFP relied on, potentially changing the result
after it has been queried elsewhere.

While I am not aware of any particular bug due to this state of affairs,
it seems best to avoid it entirely by changing the state during DAG
construction.

llvm-svn: 256849
2016-01-05 17:46:36 +00:00
Michael Zuckerman
77c5bba68e [AVX512] add PSLLD and PSLLQ Intrinsic
Differential Revision: http://reviews.llvm.org/D15885

llvm-svn: 256840
2016-01-05 15:17:39 +00:00
MinSeong Kim
fea8e6c4f8 [AArch64] Add support for Samsung Exynos-M1
Adds core tuning support for new Samsung Exynos-M1 core (ARMv8-A).

Differential Revision: http://reviews.llvm.org/D15663

llvm-svn: 256828
2016-01-05 12:51:59 +00:00
Junmo Park
0ba263d82c Remove extra whitespace. NFC.
llvm-svn: 256820
2016-01-05 09:36:47 +00:00
Simon Pilgrim
f31defede7 [X86][SSE] Merge PerformBLENDICombine into PerformShuffleCombine
PBLEND/BLENDPD/BLENDPS are no different to the other target shuffles and this will make future improvements to the target shuffle combines more straightforward.

llvm-svn: 256819
2016-01-05 09:12:17 +00:00
Craig Topper
39cda94562 [X86] Make MOV32ri64 a post-RA pseudo instead of a CodeGenOnly instruction. It was only needed for rematerialization.
llvm-svn: 256818
2016-01-05 07:44:14 +00:00
Craig Topper
323796d03c [X86] Add OpSize32 to OR32mrLocked instruction to match the normal OR32mr instruction.
llvm-svn: 256817
2016-01-05 07:44:11 +00:00
Craig Topper
a899645fbc [AVX512] Add hasSideEffects=0 to kunpck instructions since they lack a pattern in their instructions.
llvm-svn: 256816
2016-01-05 07:44:08 +00:00
Matt Arsenault
baa38e5685 AMDGPU: Remove redundant let mayLoad = 1
This is already set on the SMRD format class.

llvm-svn: 256813
2016-01-05 04:50:28 +00:00
Tom Stellard
9b871727b4 AMDGPU/SI: Select non-uniform constant addrspace loads to flat instructions for HSA
Summary: This fixes a regression caused by r256282.

Reviewers: arsenm, cfang

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15736

llvm-svn: 256810
2016-01-05 03:40:16 +00:00
David Majnemer
c60cb66b34 Revert "[X86] Use push-pop for materializing small constants under 'minsize'"
The red zone consists of 128 bytes beyond the stack pointer so that the
allocation of objects in leaf functions doesn't require decrementing
rsp.  In r255656, we introduced an optimization that would cheaply
materialize certain constants via push/pop.  Push decrements the stack
pointer and stores it's result at what is now the top of the stack.
However, this means that using push/pop would encroach on the red zone.
PR26023 gives an example where this corrupts an object in the red zone.

llvm-svn: 256808
2016-01-05 02:32:06 +00:00
Tom Stellard
a0e8b3ae68 AMDGPU/SI: Consolidate FLAT patterns
Summary:
We had to sets of identical FLAT patterns one inside the
HasFlatAddressSpace predicate and one inside the useFlatForGloabl
predicate.  This patch merges these sets into a single pattern
under the isCIVI predicate.

The reason we can remove the predicates is that when MUBUF instructions
are legal, the instruction selector will prefer selecting those over
FLAT instructions because MUBUF patterns have a higher complexity score.
So, in this case having patterns for FLAT instructions will have no effect.

This change also simplifies the process for forcing global address space
loads to use FLAT instructions, since we no only have to disable the
MUBUF patterns instead of having to disable the MUBUF patterns and
enable the FLAT patterns.

Reviewers: arsenm, cfang

Subscribers: llvm-commits
llvm-svn: 256807
2016-01-05 02:26:37 +00:00
Matthias Braun
c040998a61 MachineInstrBundle: Fix reversed isSuperRegisterEq() call
Unfortunately this fix had the effect of exposing the
-verify-machineinstrs FIXME of X86InstrInfo.cpp in two testcases for
which I disabled it for now.
Two testcases also have additional pushq/popq where the corrected code
cannot prove that %rax is dead any longer. Looking at the examples, this
could potentially be fixed by improving computeRegisterLiveness() to check
the live-in lists of the successors blocks when reaching the end of a
block.

This fixes http://llvm.org/PR25951.

llvm-svn: 256799
2016-01-05 00:45:35 +00:00
Nicolai Haehnle
7d4706e2a1 AMDGPU: add +xnack feature
Summary:
Enabling this feature will account for the two SGPRs used by the hardware
to store the XNACK_MASK physically.

The hardware only requires this reservation when the XNACK feature is
explicitly enabled. At some point, HSA will probably want to do that, but
it does increase SGPR register pressure, so leave it disabled by default
for now (but do add a small test).

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15869

llvm-svn: 256794
2016-01-04 23:35:53 +00:00
Simon Pilgrim
4d1a4f7a0d [X86][SSE] Ensure BLENDPD/BLENDPS/PBLEND inputs are both of the correct input type
llvm-svn: 256782
2016-01-04 21:41:11 +00:00
Tom Stellard
d9aef7bbef AMDGPU/SI: Move VI SMEM pattern back into VIInstructions.td
Summary: This was accidently moved to CIInstructions.td in r256282

Reviewers: cfang, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15763

llvm-svn: 256775
2016-01-04 20:23:10 +00:00
Geoff Berry
584b69e94e [AArch64] Optimize some simple TBZ/TBNZ cases.
Summary:
Add some AArch64 dag combines to optimize some simple TBZ/TBNZ cases:

 (tbz (and x, m), b) -> (tbz x, b)
 (tbz (shl x, c), b) -> (tbz x, b-c)
 (tbz (shr x, c), b) -> (tbz x, b+c)
 (tbz (xor x, -1), b) -> (tbnz x, b)

Reviewers: jmolloy, mcrosier, t.p.northover

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D15702

llvm-svn: 256765
2016-01-04 18:55:47 +00:00
Nicolai Haehnle
5efc944d33 AMDGPU: Avoid assertions after SGPR spilling failed
Summary:
The comment explains it: emitError does not necessarily exit the compilation
process, and then using NoRegister leads to assertions later on.
This generates incorrect code, of course, but the user should know to not use
the result when an error has been emitted.

It would be nice to have a test-case for this inside the LLVM repository,
but llc exits on error. shader-db tests trigger the underlying issue at least
on Tonga.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15826

llvm-svn: 256757
2016-01-04 15:50:01 +00:00
Michael Zuckerman
4e0fc50eed [AVX512] add PSRAD and PSRAQ Intrinsic
Differential Revision: http://reviews.llvm.org/D15851

llvm-svn: 256754
2016-01-04 13:45:45 +00:00
Michael Zuckerman
92e457ef4e [AVX512] add PSRAW Intrinsic
Differential Revision: http://reviews.llvm.org/D15850

llvm-svn: 256751
2016-01-04 12:50:36 +00:00
Michael Zuckerman
52d7de4a89 [AVX512] add PSRLV Intrinsic
Differential Revision: http://reviews.llvm.org/D15838

llvm-svn: 256747
2016-01-04 11:39:06 +00:00
David Majnemer
5ccb736e05 [X86] Make hasFP constant time
We need a frame pointer if there is a push/pop sequence after the
prologue in order to unwind the stack.  Scanning the instructions to
figure out if this happened made hasFP not constant-time which is a
violation of expectations.  Let's compute this up-front and reuse that
computation when we need it.

llvm-svn: 256730
2016-01-04 04:49:41 +00:00
Craig Topper
3c04b20219 Use std::is_sorted and std::none_of instead of manual loops. NFC
llvm-svn: 256719
2016-01-03 19:43:40 +00:00
Dimitry Andric
0614f2a55e Fix several accidental DOS line endings in source files
Summary:
There are a number of files in the tree which have been accidentally checked in with DOS line endings.  Convert these to native line endings.

There are also a few files which have DOS line endings on purpose, and I have set the svn:eol-style property to 'CRLF' on those.

Reviewers: joerg, aaron.ballman

Subscribers: aaron.ballman, sanjoy, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15848

llvm-svn: 256707
2016-01-03 17:22:03 +00:00
David Majnemer
93803262f4 [X86] Add intrinsics for reading and writing to the flags register
LLVM's targets need to know if stack pointer adjustments occur after the
prologue.  This is needed to correctly determine if the red-zone is
appropriate to use or if a frame pointer is required.

Normally, LLVM can figure this out very precisely by reasoning about the
contents of the MachineFunction.  There is an interesting corner case:
inline assembly.

The vast majority of inline assembly which will perform a push or pop is
done so to pair up with pushf or popf as appropriate.  Unfortunately,
this inline assembly doesn't mark the stack pointer as clobbered
because, well, it isn't.  The stack pointer is decremented and then
immediately incremented.  Because of this, LLVM was changed in r256456
to conservatively assume that inline assembly contain a sequence of
stack operations.  This is unfortunate because the vast majority of
inline assembly will not end up manipulating the stack pointer in any
way at all.

Instead, let's provide a more principled solution: an intrinsic.
FWIW, other compilers (MSVC and GCC among them) also provide this
functionality as an intrinsic.

llvm-svn: 256685
2016-01-01 06:50:01 +00:00
Craig Topper
4ffb442ed6 [X86] Remove a return after llvm_unreachable.
llvm-svn: 256681
2015-12-31 22:40:48 +00:00
Craig Topper
cf29b2e15a [X86] Move shuffle decoding for constant pool into the X86CodeGen library to remove a layering violation in the Util library.
llvm-svn: 256680
2015-12-31 22:40:45 +00:00
Michael Zuckerman
861e8172f1 [AVX512] add PSRLQ and PSRLD Intrinsic
Differential Revision: http://reviews.llvm.org/D15770

llvm-svn: 256673
2015-12-31 15:22:04 +00:00
Michael Kuperstein
ebbd053e6a [X86] Avoid folding scalar loads into unary sse intrinsics
Not folding these cases tends to avoid partial register updates:
sqrtss (%eax), %xmm0
Has a partial update of %xmm0, while
movss (%eax), %xmm0
sqrtss %xmm0, %xmm0
Has a clobber of the high lanes immediately before the partial update,
avoiding a potential stall.

Given this, we only want to fold when optimizing for size.
This is consistent with the patterns we already have for some of
the fp/int converts, and in X86InstrInfo::foldMemoryOperandImpl()

Differential Revision: http://reviews.llvm.org/D15741

llvm-svn: 256671
2015-12-31 09:45:16 +00:00
Asaf Badouh
f9720f53b4 [X86][PKU] Add {RD,WR}PKRU intrinsics
Differential Revision: http://reviews.llvm.org/D15808

llvm-svn: 256670
2015-12-31 08:31:13 +00:00
Craig Topper
efc9b3c5bb [TableGen] Modify the AsmMatcherEmitter to only apply the table growth from r252440 to the Hexagon target.
This restores the previous behavior of not including the mnemonic in the classes table for every target that starts instruction lines with the mnemonic. Not only did the table size increase by 1 entry, but the class enum increased in size which caused every class in the array to increase in size. It also grew the size of the function that parsers tokens into classes by a substantial amount.

This adds a new HasMnemonicFirst flag to all AsmParsers. It's set to 1 by default and Hexagon target overrides it to 0.

For the X86 target alone this recovers 324KB of size on the llvm-mc executable.

I believe the current state is still a bad design choice for the Hexagon target as it causes most of the parsing to do a linear search through the entire match table to comparing operands against every instruction until it finds one that works. At least for the other targets we do a binary search based on mnemonic over which to do the linear scan.

llvm-svn: 256669
2015-12-31 08:18:23 +00:00
Sanjay Patel
32b3efed19 use range-based for-loops; NFCI
llvm-svn: 256573
2015-12-29 19:14:23 +00:00
Michael Zuckerman
d97aa00156 [AVX512] add PSRLW Intrinsic
Differential Revision: http://reviews.llvm.org/D15751

llvm-svn: 256558
2015-12-29 13:04:35 +00:00
Craig Topper
886aa9ef46 [TableGen] Remove MnemonicContainsDot from AsmParser. It isn't used. NFC
llvm-svn: 256542
2015-12-29 07:03:30 +00:00
Craig Topper
26319780b7 [X86] Remove declaration of ATTAsmParser. Its equivalent to the DefaultAsmParser. NFC
llvm-svn: 256541
2015-12-29 07:03:27 +00:00
Artyom Skrobov
4f10d9b0f0 [Thumb] Fix assembler error 'cannot honor width suffix pop {lr}'
Summary:
* avoid generating POP {LR} in Thumb1 epilogues
* combine MOV LR, Rx + BX LR -> BX Rx in a peephole optimization pass
* combine POP {LR} + B + BX LR -> POP {PC} on v5T+

Test cases by Ana Pazos

Differential Revision: http://reviews.llvm.org/D15707

llvm-svn: 256523
2015-12-28 21:40:45 +00:00
Sanjay Patel
61b36ff574 [x86] lower calls to fmin and llvm.minnum.* using minss/minsd/minps/minpd (PR24475)
This is a follow-on to:
http://reviews.llvm.org/rL255700
http://reviews.llvm.org/rL256454
http://reviews.llvm.org/rL256510

llvm-svn: 256522
2015-12-28 21:16:55 +00:00
Elena Demikhovsky
3ed0b3c7f1 Implemented cost model for masked gather and scatter operations
The cost is calculated for all X86 targets. When gather/scatter instruction
is not supported we calculate the cost of scalar sequence.

Differential revision: http://reviews.llvm.org/D15677

llvm-svn: 256519
2015-12-28 20:10:59 +00:00
Sanjay Patel
7785be794b [x86] lower calls to fmax and llvm.maxnum.* using maxps/maxpd (PR24475)
This is a follow-on to:
http://reviews.llvm.org/rL255700
http://reviews.llvm.org/rL256454

llvm-svn: 256510
2015-12-28 19:20:19 +00:00
Sanjay Patel
9ffd44cf90 tidy up; NFC
llvm-svn: 256506
2015-12-28 18:18:22 +00:00
Roman Divacky
fa44452ed8 Support clrex instruction on ARMv6k. Patch by Andrew Turner.
llvm-svn: 256505
2015-12-28 17:47:23 +00:00
Michael Kuperstein
ba0a393451 [X86] Better support for the MCU psABI (LLVM part)
This adds support for the MCU psABI in a way different from r251223 and r251224,
basically reverting most of these two patches. The problem with the approach
taken in r251223/4 is that it only handled libcalls that originated from the backend.
However, the mid-end also inserts quite a few libcalls and assumes these use the
platform's default calling convention.

The previous patch tried to insert inregs when necessary both in the FE and,
somewhat hackily, in the CG. Instead, we now define a new default calling convention
for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does.

Differential Revision: http://reviews.llvm.org/D15054

llvm-svn: 256494
2015-12-28 14:39:21 +00:00
Alexander Kornienko
c882727d66 Refactor: Simplify boolean conditional return statements in lib/Target/PowerPC
Summary: Use clang-tidy to simplify boolean conditional return statements

Reviewers: uweigand, rafael, wschmidt

Subscribers: craig.topper, llvm-commits

Patch by Richard Thomson!

Differential Revision: http://reviews.llvm.org/D9984

llvm-svn: 256493
2015-12-28 13:38:42 +00:00
Asaf Badouh
3e8d6828a0 [X86][AVX512] Lower broadcast sub vector to vector inrtrinsics
lower broadcast<type>x<vector> to shuffles.
 there are two cases:
1.src is 128 bits and dest is 512 bits: in this case we will lower it to shuffle with imm = 0.
2.src is 256 bit and dest is 512 bits: in this case we will lower it to shuffle with imm = 01000100b (0x44) that way we will broadcast the 256bit source: ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3] then it will mask it with the passthru value (in case it's mask op).



Differential Revision: http://reviews.llvm.org/D15790

llvm-svn: 256490
2015-12-28 08:26:26 +00:00
Asaf Badouh
6fcb80c7ac [X86][AVX512] add fp scalar broadcast intrinsics
Differential Revision: http://reviews.llvm.org/D15790

llvm-svn: 256489
2015-12-28 08:09:25 +00:00
Craig Topper
228bc66bfc [AVX512] Remove VEX_LIG from vmovd/vmovq instructions. From what I can tell from the Intel docs these instructions require the L-bit to be 0.
llvm-svn: 256486
2015-12-28 06:32:47 +00:00
Craig Topper
a315cd0b1d [AVX512] Fix some places that used FR64 instead of FR64X.
llvm-svn: 256484
2015-12-28 06:11:45 +00:00
Craig Topper
cf3121d888 [AVX512] Bring vmovq instructions names into alignment with the AVX and SSE names. Add a missing encoding to disassembler and assembler.
I believe this also fixes a case where a 64-bit memory form that is documented as being unsupported in 32-bit mode was able to be selected there.

llvm-svn: 256483
2015-12-28 06:11:42 +00:00
Craig Topper
78232095c9 [X86] Move address for store target from outs to ins on a couple instructions.
llvm-svn: 256482
2015-12-28 06:11:39 +00:00
Craig Topper
bfdc4a3764 [X86] Add proper Uses/Defs/mayLoad flags for AAA/AAD/AAM/AAS/DAA/DAS/XLAT instructions.
llvm-svn: 256481
2015-12-28 06:11:37 +00:00
Craig Topper
e4e0592ca3 [AVX512] Remove separate instruction and patterns for lowering ctlz_zero_undef. Change the operation for CTLZ_ZERO_UNDEF to Expand so SelectionDAG will convert them to CTLZ before lowering.
llvm-svn: 256477
2015-12-27 21:33:50 +00:00
Craig Topper
ce5014e9fe [AVX512] Remove alternate data type versions of VALIGND, VALIGNQ, VMOVSHDUP and VMOVSLDUP. They don't have any tests and I don't think they can be selected. If they are truly needed they should be implemented with patterns against the normal instructions and not separate instructions.
llvm-svn: 256475
2015-12-27 19:45:21 +00:00
Igor Breger
a848a96908 AVX512: Change VPMOVB2M DAG lowering , use CVT2MASK node instead TRUNCATE.
Fix TRUNCATE lowering vector to vector i1, use LSB and not MSB.
Implement VPMOVB/W/D/Q2M intrinsic.

Differential Revision: http://reviews.llvm.org/D15675

llvm-svn: 256470
2015-12-27 13:56:16 +00:00
Asaf Badouh
f94cbd0492 [X86][AVX512] change broadcast to use maskable pattern
Differential Revision: http://reviews.llvm.org/D15786

llvm-svn: 256469
2015-12-27 12:14:34 +00:00
Craig Topper
c79efd26f5 [AVX-512] Remove alernate integer forms for VPERMILPS and VPERMILPD. There no tests for them and I don't see any way to select them anyway. If they are really needed they should be implemented as patterns and not full fledged instructions.
llvm-svn: 256462
2015-12-27 06:55:08 +00:00
David Majnemer
38d1ffe261 [X86, Win64] Use a frame pointer if pushf is emitted
A frame pointer must be used if stack pointer is modified after the
prologue.  LLVM will emit pushf/popf if we need to save/restore the
FLAGS register, requiring us to have a frame pointer for the function.

There is a small twist: this sequence might exist in user code via
inline-assembly.  For now, conservatively assume that such functions
require a frame pointer.  For real world justification, please see
clang's implementation of __readeflags.

This fixes PR25945.

llvm-svn: 256456
2015-12-27 06:07:26 +00:00
Sanjay Patel
5cb4cfb9d6 [x86] lower calls to llvm.maxnum.v4f32 using maxps
This is a follow-on to:
http://reviews.llvm.org/rL255700

llvm-svn: 256454
2015-12-26 21:44:55 +00:00
Craig Topper
e482a43cb8 [X86] Fix an unused variable warning in released builds.
llvm-svn: 256453
2015-12-26 20:13:33 +00:00
Craig Topper
dcab6c8bcf [X86] Add support for printing shuffle comments for AVX512 PSHUFB instructions.
llvm-svn: 256452
2015-12-26 19:48:43 +00:00
Craig Topper
89319531b9 [X86] Fold some variable declarations and initializations into if statements. NFC
llvm-svn: 256451
2015-12-26 19:48:37 +00:00
Craig Topper
de07308c81 [X86] Fix shuffle decoding for variable VPERMIL to be tolerant of the Constant type not matching due to folding in the constant pool and to get VPERMILPD correct.
llvm-svn: 256433
2015-12-26 04:50:07 +00:00
Craig Topper
38dac27e3a [X86] Fix copy and paste typo from pasting from another Makefile to restore code.
llvm-svn: 256431
2015-12-25 23:27:57 +00:00
Craig Topper
f89e5ceed0 [X86] Put back the include path to the main X86 sources in the AsmParser library to fix the bots.
llvm-svn: 256430
2015-12-25 22:22:16 +00:00
Craig Topper
7e1d2e52cd [X86] Remove X86CodeGen dependency from the AsmParser library.
llvm-svn: 256429
2015-12-25 22:10:11 +00:00
Craig Topper
8d6e33b512 [X86] Move getX86SubSuperRegisterOrZero to X86MCTargetDesc.cpp so it can be used by AsmParser library without depending on X86CodeGen library.
llvm-svn: 256428
2015-12-25 22:10:08 +00:00
Craig Topper
8ac345d153 Remove extra forward declarations and scrub includes for all in tree InstPrinters. NFC
llvm-svn: 256427
2015-12-25 22:10:01 +00:00
Craig Topper
08678ccbc2 [X86] Move AVX512 STATIC_ROUNDING enum to X86BaseInfo.h to fix a layering violation in AsmParser.
llvm-svn: 256426
2015-12-25 22:09:49 +00:00
Craig Topper
fe5f33f108 [X86] Replace MVT::SimpleValueType in the AsmParser library and getX86SubSuperRegister with just an unsigned representing size.
This a is step towards fixing a layering violation so the X86 AsmParser won't depending on CodeGen types.

llvm-svn: 256425
2015-12-25 22:09:45 +00:00
Craig Topper
f023f89e96 [X86] Don't pass the default value to the High argument of getX86SubSuperRegister. Most place don't care about this argument. NFC
llvm-svn: 256424
2015-12-25 19:44:16 +00:00
Craig Topper
f3eef0e344 [X86] getX86SubSuperRegisterOrZero shouldn't call getX86SubSuperRegister recursively. It should call itself instead. Otherwise it might fire an assertion when it was designed not too.
llvm-svn: 256422
2015-12-25 17:07:32 +00:00
Craig Topper
be47256c20 [X86] Add missing X86II::MRM_C4, MRM_C5, etc. encodings to getMemoryOperandNo. These aren't used by any instructions, but could be someday. NFC
llvm-svn: 256421
2015-12-25 17:07:30 +00:00
Craig Topper
474cc56790 [X86] Use assert instead of if and llvm_unreachable. NFC
llvm-svn: 256420
2015-12-25 17:07:27 +00:00
Craig Topper
e1f71ad36d [X86] Minor identation fixes. NFC
llvm-svn: 256419
2015-12-25 17:07:24 +00:00
Dan Gohman
9c3961aabe [WebAssembly] Fix handling of COPY instructions in WebAssemblyRegStackify.
Move RegStackify after coalescing and teach it to use LiveIntervals instead
of depending on SSA form. This avoids a problem where a register in a COPY
instruction is stackified and then subsequently coalesced with a register
that is not stackified.

This also puts it after the scheduler, which allows us to simplify the
EXPR_STACK constraint, as we no longer have instructions being reordered
after stackification and before coloring.

llvm-svn: 256402
2015-12-25 00:31:02 +00:00