1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 19:12:56 +02:00
Commit Graph

40496 Commits

Author SHA1 Message Date
Ulrich Weigand
564945a3af [SystemZ] Support remaining atomic instructions
Add assembler support for all atomic instructions that weren't already
supported.  Some of those could be used to implement codegen for 128-bit
atomic operations, but this isn't done here yet.

llvm-svn: 288526
2016-12-02 18:24:16 +00:00
Ulrich Weigand
3319b28944 [SystemZ] Support floating-point control register instructions
Add assembler support for instructions manipulating the FPC.

Also add codegen support via the GCC compatibility builtins:
  __builtin_s390_sfpc
  __builtin_s390_efpc

llvm-svn: 288525
2016-12-02 18:21:53 +00:00
Ulrich Weigand
479e55b0a2 [SystemZ] Refactor hasSideEffects setting
Move setting of hasSideEffects out of SystemZInstrFormats.td,
to allow use of the format classes for instructions where this
flag shouldn't be set.  NFC.

llvm-svn: 288524
2016-12-02 18:19:22 +00:00
Matt Arsenault
ffb70acf5d AMDGPU: Implement isCheapAddrSpaceCast
llvm-svn: 288523
2016-12-02 18:12:53 +00:00
Simon Pilgrim
ea697ac313 Tidyup code with indentation and clang-format. NFCI.
llvm-svn: 288505
2016-12-02 15:44:30 +00:00
Daniel Cederman
102ef92e53 [Sparc] Fix parsing of double-precision %f18, %f20, and %f22
Summary: They are currently being parsed as %f14, %f16, and %f18.

Reviewers: venkatra, jyknight

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27342

llvm-svn: 288503
2016-12-02 15:05:26 +00:00
Simon Pilgrim
ddb5b71e46 [X86][SSE] Add support for extracting constant bit data from broadcasted constants
llvm-svn: 288499
2016-12-02 13:16:08 +00:00
Simon Pilgrim
250561fdd2 [X86] Refactored getTargetConstantBitsFromNode to allow for expansion. NFCI.
getTargetConstantBitsFromNode currently only extracts constant pool vector data, but it will need to be generalized to support broadcast and scalar constant pool data as well.

Converted Constant bit extraction and Bitset splitting to helper lambda functions.

llvm-svn: 288496
2016-12-02 11:58:05 +00:00
Craig Topper
2fbd5b4588 [AVX-512] Add EVEX vpshuflw/vpshufhw/vpshufd instructions to load folding tables.
llvm-svn: 288484
2016-12-02 07:57:11 +00:00
Craig Topper
93bce7573e [AVX-512] Add EVEX PSHUFB instructions to load folding tables.
llvm-svn: 288482
2016-12-02 07:06:30 +00:00
Craig Topper
29c496f58b [AVX-512] Add masked VINSERTF/VINSERTI instructions to load folding tables.
llvm-svn: 288481
2016-12-02 06:24:38 +00:00
Peter Collingbourne
1329d17185 IR: Change PointerType to derive from Type rather than SequentialType.
As proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/106640.html

This is for a couple of reasons:

- Values of type PointerType are unlike the other SequentialTypes (arrays
  and vectors) in that they do not hold values of the element type. By moving
  PointerType we can unify certain aspects of how the other SequentialTypes
  are handled.
- PointerType will have no place in the SequentialType hierarchy once
  pointee types are removed, so this is a necessary step towards removing
  pointee types.

Differential Revision: https://reviews.llvm.org/D26595

llvm-svn: 288462
2016-12-02 03:05:41 +00:00
Peter Collingbourne
bc87b9fd38 IR: Change the gep_type_iterator API to avoid always exposing the "current" type.
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.

Differential Revision: https://reviews.llvm.org/D26594

llvm-svn: 288458
2016-12-02 02:24:42 +00:00
Matt Arsenault
f24fcd4ad7 AMDGPU: Use wider scalar spills for SGPR spilling
Since the spill is for the whole wave, these
don't have the swizzling problems that vector stores do
and a single 4-byte allocation is enough to spill a 64 element
register. This should reduce the number of spill instructions and
put all the spills for a register in the same cacheline.

This should save allocated private size, but for now it doesn't.
The extra slots are allocated for each component, but never used
because the frame layout is essentially finalized before frame
indices are replaced. For always using the scalar store path,
this should probably be moved into processFunctionBeforeFrameFinalized.

llvm-svn: 288445
2016-12-02 00:54:45 +00:00
Geoff Berry
993081c749 [AArch64] Fold more spilled/refilled COPYs.
Summary:
Make AArch64InstrInfo::foldMemoryOperandImpl more general by folding all
full COPYs between register classes of the same size that are either
spilled or refilled.

Reviewers: MatzeB, qcolombet

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27271

llvm-svn: 288439
2016-12-01 23:43:55 +00:00
Dan Gohman
078794f61e [MC] Refactor emitELFSize to make usage more consistent. NFC.
Move the cast<MCSymbolELF> inside emitELFSize, so that: 
 - it's done in one place instead of at each call
 - it's more consistent with similar functions like EmitCOFFSafeSEH
 - ambiguity between cast<> and dyn_cast<> is avoided (which also
   eliminates an unnecessary dyn_cast call)

This also makes it easier to experiment with using ".size" directives on
non-ELF targets.

llvm-svn: 288437
2016-12-01 23:39:08 +00:00
Oleg Ranevskyy
ac195a6f2b [ARM] Fix for 64-bit CAS expansion on ARM32 with -O0
Summary:
This patch fixes comparison of 64-bit atomic with its expected value in CMP_SWAP_64 expansion.

Currently, the low words are compared with CMP, while the high words are compared with SBC. SBC expects the carry flag to be set if CMP detects a difference. CMP might leave the carry unset for unequal arguments though if the first one is >= than the second. This might cause the comparison logic to detect false equality.

Example of the broken C++ code:
```
std::atomic<long long> at(2);

long long ll = 1;
std::atomic_compare_exchange_strong(&at, &ll, 3);
```
Even though the atomic `at` and the expected value `ll` are not equal and `atomic_compare_exchange_strong` returns `false`, `at` is changed to 3.

The patch replaces SBC with CMPEQ.

Reviewers: t.p.northover

Subscribers: aemerson, rengolin, llvm-commits, asl

Differential Revision: https://reviews.llvm.org/D27315

llvm-svn: 288433
2016-12-01 22:58:35 +00:00
Tim Northover
bde2af05d4 AArch64: fix 128-bit cmpxchg at -O0 (again, again).
This time the issue is fortunately just a simple mistake rather than a horrible
design spectre. I thought SUBS/SBCS provided sufficient NZCV flags for
comparing two 64-bit values, but they don't.

The fix is slightly clunkier in AArch64 because we can't use conditional
execution to emit a pair of CMPs. Traditionally an "icmp ne i128" would map to
an EOR/EOR/ORR/CBNZ, but that uses more registers so it's easier to go with a
CSET/CINC/CBNZ combination. Slightly less efficient, but this is -O0 anyway.

Thanks to Anton Korobeynikov for pointing out the issue.

llvm-svn: 288418
2016-12-01 21:31:59 +00:00
Benjamin Kramer
75f09ab5fe Fix unused variable warning in Release builds. NFC.
llvm-svn: 288416
2016-12-01 20:49:34 +00:00
David L Kreitzer
f3bc5b6f2f Refactored X86InterleavedAccess into a class. NFCI.
Patch by Farhana Aleen

Differential Revision: https://reviews.llvm.org/D25986

llvm-svn: 288410
2016-12-01 19:56:39 +00:00
Matthias Braun
148c29c710 Move most EH from MachineModuleInfo to MachineFunction
Recommitting r288293 with some extra fixes for GlobalISel code.

Most of the exception handling members in MachineModuleInfo is actually
per function data (talks about the "current function") so it is better
to keep it at the function instead of the module.

This is a necessary step to have machine module passes work properly.

Also:
- Rename TidyLandingPads() to tidyLandingPads()
- Use doxygen member groups instead of "//===- EH ---"... so it is clear
  where a group ends.
- I had to add an ugly const_cast at two places in the AsmPrinter
  because the available MachineFunction pointers are const, but the code
  wants to call tidyLandingPads() in between
  (markFunctionEnd()/endFunction()).

Differential Revision: https://reviews.llvm.org/D27227

llvm-svn: 288405
2016-12-01 19:32:15 +00:00
Simon Pilgrim
a46e15a14c [X86][SSE] Moved shuffle mask widening/narrowing helper functions earlier in the file.
Will be necessary for a future patch.

llvm-svn: 288395
2016-12-01 18:27:19 +00:00
Ulrich Weigand
77892ebea6 [SystemZ] Fix fallout from r288374
Avoid undefined behavior due to too-large shift count.

llvm-svn: 288391
2016-12-01 18:00:50 +00:00
Ulrich Weigand
65015d91d0 [SystemZ] Fix applyFixup for 12-bit fixups
Now that we have fixups that only fill parts of a byte, it turns
out we have to mask off the bits outside the fixup area when
applying them.  Failing to do so caused invalid object code to
be emitted for bprp with a negative 12-bit displacement.

llvm-svn: 288374
2016-12-01 17:10:27 +00:00
Simon Pilgrim
a6380752f1 [X86][SSE] Classify AND bitmasks as variable shuffle masks
They are loading the bitmasks from the constant pool so the cost is similar to loading a shuffle mask.

llvm-svn: 288367
2016-12-01 16:00:14 +00:00
Simon Pilgrim
457eb2e819 [X86][SSE] Add support for combining AND bitmasks to shuffles.
llvm-svn: 288365
2016-12-01 15:41:40 +00:00
Asaf Badouh
4d1b8425dd [LMT] Restrict nop length to one
not all lakemont MCU support long nop.
we can't assume we can generate long nop by default for MCU.

Differential Revision: https://reviews.llvm.org/D26895

llvm-svn: 288363
2016-12-01 15:19:10 +00:00
Daniel Jasper
fd88782f55 Silence GCC's -Wenum-compare after r288335 in the same way it is done
in X86FastISel.cpp.

llvm-svn: 288337
2016-12-01 14:33:50 +00:00
Simon Pilgrim
f64ec6909b [X86][SSE] Add support for combining target shuffles to AND bitmasks.
llvm-svn: 288335
2016-12-01 13:47:02 +00:00
Simon Pilgrim
e04c9ca4d7 [X86][SSE] Add support for combining ISD::AND with shuffles.
Attempts to convert an AND with a vector of 255 or 0 values into a shuffle (blend) mask.

llvm-svn: 288333
2016-12-01 11:52:37 +00:00
Eric Christopher
4e57c02b4a Temporarily Revert "Move most EH from MachineModuleInfo to MachineFunction"
This apprears to have broken the global isel bot:
http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-globalisel_build/5174/console

This reverts commit r288293.

llvm-svn: 288322
2016-12-01 07:50:12 +00:00
Derek Schuff
9a9dc02678 [WebAssembly] Emit .import_global assembler directives
Support a new assembler directive, .import_global, to declare imported
global variables (i.e. those with external linkage and no
initializer). The linker turns these into wasm imports.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D26875

llvm-svn: 288296
2016-12-01 00:11:15 +00:00
Matthias Braun
6a2832f1f2 Move most EH from MachineModuleInfo to MachineFunction
Most of the exception handling members in MachineModuleInfo is actually
per function data (talks about the "current function") so it is better
to keep it at the function instead of the module.

This is a necessary step to have machine module passes work properly.

Also:
- Rename TidyLandingPads() to tidyLandingPads()
- Use doxygen member groups instead of "//===- EH ---"... so it is clear
  where a group ends.
- I had to add an ugly const_cast at two places in the AsmPrinter
  because the available MachineFunction pointers are const, but the code
  wants to call tidyLandingPads() in between
  (markFunctionEnd()/endFunction()).

Differential Revision: https://reviews.llvm.org/D27227

llvm-svn: 288293
2016-11-30 23:49:01 +00:00
Matthias Braun
ced7bf3e1d Move FrameInstructions from MachineModuleInfo to MachineFunction
This is per function data so it is better kept at the function instead
of the module.

This is a necessary step to have machine module passes work properly.

Differential Revision: https://reviews.llvm.org/D27185

llvm-svn: 288291
2016-11-30 23:48:42 +00:00
Paul Robinson
f6c56efd2e [PS4] Tighten up a triple check.
llvm-svn: 288286
2016-11-30 23:14:27 +00:00
Joel Jones
0d9a67f578 [AArch64] Refactor LSE support as feature separate from V8.1a support.
Summary:
This is preparation for ThunderX processors that have Large
System Extension (LSE) atomic instructions, but not the 
other instructions introduced by V8.1a.
This will mimic changes to GCC as described here:
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00388.html

LSE instructions are: LD/ST<op>, CAS*, SWP

Reviewers: t.p.northover, echristo, jmolloy, rengolin

Subscribers: aemerson, mehdi_amini

Differential Revision: https://reviews.llvm.org/D26621

llvm-svn: 288279
2016-11-30 22:25:24 +00:00
Matthias Braun
32bde7b907 Clarify rules for reserved regs, fix aarch64 ones.
No test case necessary as the problematic condition is checked with the
newly introduced assertAllSuperRegsMarked() function.

Differential Revision: https://reviews.llvm.org/D26648

llvm-svn: 288277
2016-11-30 22:17:10 +00:00
Silviu Baranga
eb3c226087 [AArch64] Fix useful bits detection for BFM instructions
Summary:
When computing useful bits for a BFM instruction, we need
to take into consideration the case where both operands
of the BFM are equal and provide data that we need to track.

Not doing this can cause us to miss useful bits.
    
Fixes PR31138 (https://llvm.org/bugs/show_bug.cgi?id=31138)

Reviewers: t.p.northover, jmolloy

Subscribers: evandro, gberry, srhines, pirama, mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D27130

llvm-svn: 288253
2016-11-30 17:04:22 +00:00
Simon Pilgrim
ae03f3ddf8 [X86][SSE] Add support for target shuffle constant folding
Initial support for target shuffle constant folding in cases where all shuffle inputs are constant. We may be able to relax this and merge shuffles with only some constant inputs in the future.

I've added the helper function getTargetConstantBitsFromNode (based off a similar function in X86ShuffleDecodeConstantPool.cpp) that could be reused for other cases requiring constant vector extraction.

Differential Revision: https://reviews.llvm.org/D27220

llvm-svn: 288250
2016-11-30 16:33:46 +00:00
Krzysztof Parzyszek
02d1c3dbab [PowerPC] Preserve machine dominator tree in PPCVSXFMAMutate
It is needed by LiveIntervalAnalysis.

llvm-svn: 288243
2016-11-30 13:31:09 +00:00
Nemanja Ivanovic
124cca126e [PowerPC] Improvements for BUILD_VECTOR Vol. 2
This patch corresponds to review:
https://reviews.llvm.org/D26023

This patch adds support for converting a vector of loads into a single load if
the loads are consecutive (in either direction).

llvm-svn: 288219
2016-11-29 23:57:54 +00:00
Nemanja Ivanovic
920b70e9c4 [PowerPC] Improvements for BUILD_VECTOR Vol. 2
This patch corresponds to review:
https://reviews.llvm.org/D25980

This is the 2nd patch in a series of 4 that improve the lowering and combining
for BUILD_VECTOR nodes on PowerPC. This particular patch combines a build vector
of fp-to-int conversions into an fp-to-int conversion of a build vector of fp
values. For example:
Converts (build_vector (fp_to_[su]i $A), (fp_to_[su]i $B), ...)
Into (fp_to_[su]i (build_vector $A, $B, ...))).
Which is a natural match for much cleaner code.

llvm-svn: 288218
2016-11-29 23:36:03 +00:00
Jacques Pienaar
bb32a0735a [lanai] Manually match 0/-1 with R0/R1.
Summary: Previously 0 and -1 was matched via tablegen rules. But this could cause problems where a physical register was being used where a virtual register was expected (seen in optimizeSelect and TwoAddressInstructionPass). Instead follow AArch64 and match in DAGToDAGISel.

Reviewers: eliben, majnemer

Subscribers: llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D27171

llvm-svn: 288215
2016-11-29 23:01:09 +00:00
Nemanja Ivanovic
62aef0dfee Revert https://reviews.llvm.org/rL287679
This commit caused some miscompiles that did not show up on any of the bots.
Reverting until we can investigate the cause of those failures.

llvm-svn: 288214
2016-11-29 23:00:33 +00:00
Sanjay Patel
6fe7b03bee [AArch64] allow and-not-compare transform to form 'bics'
This target hook was added with D19087:
https://reviews.llvm.org/D19087

Differential Revision: https://reviews.llvm.org/D27221

llvm-svn: 288206
2016-11-29 22:28:58 +00:00
Chad Rosier
1d8d76fe6d [AArch64] Add a basic SchedMachineModel for Falkor.
Differential Revision: https://reviews.llvm.org/D26972

llvm-svn: 288194
2016-11-29 20:00:27 +00:00
Matt Arsenault
30967b5c23 AMDGPU: Disallow exec as SMEM instruction operand
This is not in the list of valid inputs for the encoding.
When spilling, copies from exec can be folded directly
into the spill instruction which results in broken
stores.

This only fixes the operand constraints, more codegen
work is required to avoid emitting the invalid
spills.

This sort of breaks the dbg.value test. Because the
register class of the s_load_dwordx2 changes, there
is a copy to SReg_64, and the copy is the operand
of dbg_value. The copy is later dead, and removed
from the dbg_value.

llvm-svn: 288191
2016-11-29 19:39:53 +00:00
Matt Arsenault
a418419fae AMDGPU: Use SGPR_64 for argument lowerings
llvm-svn: 288190
2016-11-29 19:39:48 +00:00
Matt Arsenault
c97932df14 AMDGPU: Rename flat operands to match mubuf
Use vaddr/vdst for the same purposes.

This also fixes a beg in SIInsertWaits for the
operand check. The stored value operand is currently called
data0 in the single offset case, not data.

llvm-svn: 288188
2016-11-29 19:30:44 +00:00
Matt Arsenault
75513f4636 AMDGPU: Use else if
llvm-svn: 288187
2016-11-29 19:30:41 +00:00
Matt Arsenault
3c5076a42d AMDGPU: Materialize frame index before add
It isn't generally safe to fold the frame index
directly into the operand since it will possibly
not be an inline immediate after it is expanded.

This surprisingly seems to produce better code, since
the FI doesn't prevent folding other immediate operands.

llvm-svn: 288185
2016-11-29 19:20:48 +00:00
Matt Arsenault
a89faeec9c AMDGPU: Refactor immediate folding logic
Change the logic for when to fold immediates to
consider the destination operand rather than the
source of the materializing mov instruction.

No change yet, but this will allow for correctly handling
i16/f16 operands. Since 32-bit moves are used to materialize
constants for these, the same bitvalue will not be in the
register.

llvm-svn: 288184
2016-11-29 19:20:42 +00:00
Geoff Berry
5ed377ecc1 [AArch64] Fold spills of COPY of WZR/XZR
Summary:
In AArch64InstrInfo::foldMemoryOperandImpl, catch more cases where the
COPY being spilled is copying from WZR/XZR, but the source register is
not in the COPY destination register's regclass.

For example, when spilling:

  %vreg0 = COPY %XZR ; %vreg0:GPR64common

without this change, the code in TargetInstrInfo::foldMemoryOperand()
and canFoldCopy() that normally handles cases like this would fail to
optimize since %XZR is not in GPR64common.  So the spill code generated
would be:

  %vreg0 = COPY %XZR
  STR %vreg

instead of the new code generated:

  STR %XZR

Reviewers: qcolombet, MatzeB

Subscribers: mcrosier, aemerson, t.p.northover, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D26976

llvm-svn: 288176
2016-11-29 18:28:32 +00:00
Simon Pilgrim
7d5e5aa9e9 Avoid repeated calls to MVT getSizeInBits and getScalarSizeInBits(). NFCI.
llvm-svn: 288170
2016-11-29 17:57:48 +00:00
Nemanja Ivanovic
325b871295 [PowerPC] Improvements for BUILD_VECTOR Vol. 1
This patch corresponds to review:
https://reviews.llvm.org/D25912

This is the first patch in a series of 4 that improve the lowering and combining
for BUILD_VECTOR nodes on PowerPC.

llvm-svn: 288152
2016-11-29 16:11:34 +00:00
Simon Pilgrim
f1dcbdb14d [X86] Moved getTargetConstantFromNode function so a future patch is more understandable. NFCI.
llvm-svn: 288147
2016-11-29 15:32:58 +00:00
Simon Pilgrim
b228a39a42 [X86][SSE] Add initial support for combining target shuffles to (V)PMOVZX.
We can only handle 128-bit vectors until we support target shuffle inputs of different size to the output.

llvm-svn: 288140
2016-11-29 14:18:51 +00:00
Simon Pilgrim
b9fd0b9690 Avoid repeated calls to MVT::getScalarSizeInBits(). NFCI.
llvm-svn: 288138
2016-11-29 13:43:08 +00:00
Tom Stellard
4f879c1dc0 AMDGPU/SI: Avoid moving PHIs to VALU when phi values are defined in scalar branches
Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23417

llvm-svn: 288095
2016-11-29 00:46:46 +00:00
Matthias Braun
ce011a4aed MachineScheduler: Export function to construct "default" scheduler.
This makes the createGenericSchedLive() function that constructs the
default scheduler available for the public API. This should help when
you want to get a scheduler and the default list of DAG mutations.

This also shrinks the list of default DAG mutations:
{Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer
added by default. Targets can easily add them if they need them. It also
makes it easier for targets to add alternative/custom macrofusion or
clustering mutations while staying with the default
createGenericSchedLive(). It also saves the callback back and forth in
TargetInstrInfo::enableClusterLoads()/enableClusterStores().

Differential Revision: https://reviews.llvm.org/D26986

llvm-svn: 288057
2016-11-28 20:11:54 +00:00
Stanislav Mekhanoshin
92eac85076 [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies
Codegen prepare sinks comparisons close to a user is we have only one register
for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions.
Changed BE to report we have many condition registers. That way IR LICM pass
would hoist an invariant comparison out of a loop and codegen prepare will not
sink it.

With that done a condition is calculated in one block and used in another.
Current behavior is to store workitem's condition in a VGPR using v_cndmask_b32
and then restore it with yet another v_cmp instruction from that v_cndmask's
result. To mitigate the issue a propagation of source SGPR pair in place of v_cmp
is implemented. Additional side effect of this is that we may consume less VGPRs
at a cost of more SGPRs in case if holding of multiple conditions is needed, and
that is a clear win in most cases.

Differential Revision: https://reviews.llvm.org/D26114

llvm-svn: 288053
2016-11-28 18:58:49 +00:00
Simon Pilgrim
e91c88b619 [X86][SSE] Add initial support for combining (V)PMOVZX with shuffles.
llvm-svn: 288049
2016-11-28 17:58:19 +00:00
Sanjay Patel
07d0147189 [x86] fix formatting; NFC
llvm-svn: 288045
2016-11-28 17:39:21 +00:00
Simon Pilgrim
33c460873f [X86][SSE] Added support for combining bit-shifts with shuffles.
Bit-shifts by a whole number of bytes can be represented as a shuffle mask suitable for combining.

Added a 'getFauxShuffleMask' function to allow us to create shuffle masks from other suitable operations.

llvm-svn: 288040
2016-11-28 16:25:01 +00:00
Daniel Cederman
517e67b3a9 Test commit
llvm-svn: 288036
2016-11-28 15:33:03 +00:00
Ulrich Weigand
636aaaae0c [SystemZ] Fix build bot fallout from r288030
Remove unused variable that came in due to a copy-and-paste bug
and caused build bot failures.

llvm-svn: 288033
2016-11-28 14:24:14 +00:00
Ulrich Weigand
0120bda5a7 [SystemZ] Support execution hint instructions
This adds assembler support for the instructions provided by the
execution-hint facility (NIAI and BP(R)P).  This required adding
support for the new relocation types for 12-bit and 24-bit PC-
relative offsets used by the BP(R)P instructions.

llvm-svn: 288031
2016-11-28 14:01:51 +00:00
Ulrich Weigand
64c39ae7f5 [SystemZ] Support load-and-trap instructions
This adds support for the instructions provided with the
load-and-trap facility.

llvm-svn: 288030
2016-11-28 13:59:22 +00:00
Ulrich Weigand
0487d18102 [SystemZ] Add remaining branch instructions
This patch adds assembler support for the remaining branch instructions:
the non-relative branch on count variants, and all variants of branch
on index.

The only one of those that can be readily exploited for code generation
is BRCTH (branch on count using a high 32-bit register as count).  Do
use it, however, it is necessary to also introduce a hew CHIMux pseudo
to allow comparisons of a 32-bit value agains a short immediate to go
into a high register as well (implemented via CHI/CIH).

This causes a bit of codegen changes overall, but those have proven to
be neutral (or even beneficial) in performance measurements.

llvm-svn: 288029
2016-11-28 13:40:08 +00:00
Ulrich Weigand
b483140318 [SystemZ] Improve use of conditional instructions
This patch moves formation of LOC-type instructions from (late)
IfConversion to the early if-conversion pass, and in some cases
additionally creates them directly from select instructions
during DAG instruction selection.

To make early if-conversion work, the patch implements the
canInsertSelect / insertSelect callbacks.  It also implements
the commuteInstructionImpl and FoldImmediate callbacks to
enable generation of the full range of LOC instructions.

Finally, the patch adds support for all instructions of the
load-store-on-condition-2 facility, which allows using LOC
instructions also for high registers.

Due to the use of the GRX32 register class to enable high registers,
we now also have to handle the cases where there are still no single
hardware instructions (conditional move from a low register to a high
register or vice versa).  These are converted back to a branch sequence
after register allocation.  Since the expandRAPseudos callback is not
allowed to create new basic blocks, this requires a simple new pass,
modelled after the ARM/AArch64 ExpandPseudos pass.

Overall, this patch causes significantly more LOC-type instructions
to be used, and results in a measurable performance improvement.

llvm-svn: 288028
2016-11-28 13:34:08 +00:00
Craig Topper
624fc12e44 [X86][FMA4] Remove isCommutable from FMA4 scalar intrinsics. They aren't commutable as operand 0 should pass its upper bits through to the output.
llvm-svn: 288011
2016-11-27 21:37:04 +00:00
Craig Topper
bf372031c2 [X86][FMA] Add missing Predicates qualifier around scalar FMA intrinsic patterns.
llvm-svn: 288010
2016-11-27 21:37:02 +00:00
Craig Topper
9440849602 [X86][FMA4] Add load folding support for FMA4 scalar intrinsic instructions.
llvm-svn: 288009
2016-11-27 21:37:00 +00:00
Craig Topper
ba29d9ef0b [X86] Add SHL by 1 to the load folding tables.
I don't think isel selects these today, favoring adding the register to itself instead. But the load folding tables shouldn't be so concerned with what isel will use and just represent the relationships.

llvm-svn: 288007
2016-11-27 21:36:54 +00:00
Simon Pilgrim
38bff0f1fd [X86][SSE] Add support for combining target shuffles to 128/256-bit PSLL/PSRL bit shifts
llvm-svn: 288006
2016-11-27 21:08:19 +00:00
Craig Topper
1afaa3a66e [AVX-512] Add integer and fp unpck instructions to load folding tables.
llvm-svn: 288004
2016-11-27 19:51:41 +00:00
Simon Pilgrim
ba6a7de0c4 [X86][SSE] Split lowerVectorShuffleAsShift ready for combines. NFCI.
Moved most of matching code into matchVectorShuffleAsShift to share with target shuffle combines (in a future commit).

llvm-svn: 288003
2016-11-27 19:28:39 +00:00
Craig Topper
9da20958ad [X86] Add TB_NO_REVERSE to entries in the load folding table where the instruction's load size is smaller than the register size.
If we were to unfold these, the load size would be increased to the register size. This is not safe to do since the enlarged load can do things like cross a page boundary into a page that doesn't exist.

I probably missed some instructions, but this should be a large portion of them.

llvm-svn: 288001
2016-11-27 18:51:13 +00:00
Craig Topper
b4246740bf [AVX-512] Add masked EVEX vpmovzx/sx instructions to load folding tables.
llvm-svn: 287995
2016-11-27 08:55:31 +00:00
Craig Topper
b2d170fa15 [X86] Remove alignment restrictions from load folding table for some instructions that don't have a restriction.
Most of these are the SSE4.1 PMOVZX/PMOVSX instructions which all read less than 128-bits. The only other was PMOVUPD which by definition is an unaligned load.

llvm-svn: 287991
2016-11-27 01:52:51 +00:00
Craig Topper
6e69c55c53 [X86] Remove hasOneUse check that is redundant with the one in IsProfitableToFold.
llvm-svn: 287987
2016-11-26 18:43:26 +00:00
Craig Topper
f0aaae1a9c [X86] Fix the zero extending load detection in X86DAGToDAGISel::selectScalarSSELoad to pass the load node to IsProfitableToFold and IsLegalToFold.
Previously we were passing the SCALAR_TO_VECTOR node.

llvm-svn: 287986
2016-11-26 18:43:24 +00:00
Craig Topper
1f1f3db883 [X86] Simplify control flow. NFCI
llvm-svn: 287985
2016-11-26 18:43:21 +00:00
Craig Topper
e254e42669 [X86] Add a hasOneUse check to selectScalarSSELoad to keep the same load from being folded multiple times.
Summary: When selectScalarSSELoad is looking for a scalar_to_vector of a scalar load, it makes sure the load is only used by the scalar_to_vector. But it doesn't make sure the scalar_to_vector is only used once. This can cause the same load to be folded multiple times. This can be bad for performance. This also causes the chain output to be duplicated, but not connected to anything so chain dependencies will not be satisfied.

Reviewers: RKSimon, zvi, delena, spatel

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D26790

llvm-svn: 287983
2016-11-26 17:29:25 +00:00
Craig Topper
80de3aea41 [AVX-512] Add unmasked EVEX vpmovzx/sx instructions to load folding tables.
llvm-svn: 287975
2016-11-26 08:21:52 +00:00
Craig Topper
34f7b485a5 [AVX-512] Add masked 128/256-bit integer add/sub instructions to load folding tables.
llvm-svn: 287974
2016-11-26 08:21:48 +00:00
Craig Topper
3c44468f76 [AVX-512] Add masked 512-bit integer add/sub instructions to load folding tables.
llvm-svn: 287972
2016-11-26 07:21:00 +00:00
Craig Topper
e08dbcdead [AVX-512] Teach LowerFormalArguments to use the extended register class when available. Fix the avx512vl stack folding tests to clobber more registers or otherwise they use xmm16 after this change.
llvm-svn: 287971
2016-11-26 07:20:57 +00:00
Craig Topper
214cf755dd [AVX-512] Add VLX versions of VDIVPD/PS and VMULPD/PS to load folding tables.
llvm-svn: 287970
2016-11-26 07:20:53 +00:00
Tom Stellard
f3e7f685e9 AMDGPU/SI: Use float as the operand type for amdgcn.interp intrinsics
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26724

llvm-svn: 287962
2016-11-26 02:26:04 +00:00
Craig Topper
9ca4196263 [X86][XOP] Add a reversed reg/reg form for VPROT instructions.
The W bit distinquishes which operand is the memory operand. But if the mod bits are 3 then the memory operand is a register and there are two possible encodings. We already did this correctly for several other XOP instructions.

llvm-svn: 287961
2016-11-26 02:14:00 +00:00
Craig Topper
4718902485 [X86] Add SSE, AVX, and AVX2 version of MOVDQU to the load/store folding tables for consistency.
Not sure this is truly needed but we had the floating point equivalents, the aligned equivalents, and the EVEX equivalents. So this just makes it complete.

llvm-svn: 287960
2016-11-26 02:13:58 +00:00
Craig Topper
829a407378 [AVX-512] Put the AVX-512 sections of the load folding tables into mostly alphabetical order. This is consistent with the older sections of the table. NFC
llvm-svn: 287956
2016-11-25 23:21:34 +00:00
Marek Olsak
30b976334f AMDGPU/SI: Add back reverted SGPR spilling code, but disable it
suggested as a better solution by Matt

llvm-svn: 287942
2016-11-25 17:37:09 +00:00
Simon Pilgrim
088d6c5f6c Use SDValue helper instead of explicitly going via SDValue::getNode(). NFCI
llvm-svn: 287940
2016-11-25 17:19:53 +00:00
Craig Topper
363e0abfa5 [AVX-512] Add support for changing VSHUFF64x2 to VSHUFF32x4 when its feeding a vselect with 32-bit element size.
Summary:
Shuffle lowering may have widened the element size of a i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect this can prevent us from selecting masked operations.

This patch detects this and changes the element size to match the vselect.

I don't handle changing integer to floating point or vice versa as its not clear if its better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now.

Reviewers: delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27087

llvm-svn: 287939
2016-11-25 16:48:05 +00:00
Craig Topper
3f045379f5 [AVX-512] Add VPERMT2* and VPERMI2* instructions to load folding tables.
llvm-svn: 287937
2016-11-25 16:33:53 +00:00
Marek Olsak
9d8f0b805a Revert "AMDGPU: Implement SGPR spilling with scalar stores"
This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e.

llvm-svn: 287936
2016-11-25 16:03:34 +00:00
Marek Olsak
35ac58863e Revert "AMDGPU: Fix MMO when splitting spill"
This reverts commit 79d4f8b8b1ce430c3d5dac4fc72a9eebaed24fe1.

llvm-svn: 287935
2016-11-25 16:03:27 +00:00
Marek Olsak
663483a60f Revert "AMDGPU: Fix adding extra implicit def of register"
This reverts commit e834ce5976567575621901fb967b8018b9916d71.

llvm-svn: 287934
2016-11-25 16:03:22 +00:00
Marek Olsak
56e27f3dc2 Revert "AMDGPU: Fix not setting kill flag on temp reg when spilling"
This reverts commit 057bbbe4ae170247ba37f08f2e70ef185267d1bb.

llvm-svn: 287933
2016-11-25 16:03:19 +00:00
Marek Olsak
c530d56272 Revert "AMDGPU: Make m0 unallocatable"
This reverts commit 124ad83dae04514f943902446520c859adee0e96.

llvm-svn: 287932
2016-11-25 16:03:15 +00:00
Marek Olsak
2cef424fef Revert "AMDGPU: Remove m0 spilling code"
This reverts commit f18de36554eb22416f8ba58e094e0272523a4301.

llvm-svn: 287931
2016-11-25 16:03:06 +00:00
Marek Olsak
55154098e4 Revert "AMDGPU: Preserve m0 value when spilling"
This reverts commit a5a179ffd94fd4136df461ec76fb30f04afa87ce.

llvm-svn: 287930
2016-11-25 16:03:02 +00:00
Simon Dardis
9f9cfefda4 [mips] Correct jal expansion for local symbols in .local directives.
This patch corrects the behaviour of code such as:

   .local foo
   jal foo
foo:
to use the correct jal expansion when writing ELF files.

Patch by: Daniel Sanders

Reviewers: zoran.jovanovic, seanbruno, vkalintiris

Differential Revision: https://reviews.llvm.org/D24722

llvm-svn: 287918
2016-11-25 11:06:43 +00:00
Craig Topper
af38594037 [X86] Invert an 'if' and early out to fix a weird indentation. NFCI
llvm-svn: 287909
2016-11-25 02:29:24 +00:00
Craig Topper
6b380be65e [X86] Size a SmallVector to the worst case mask size for a 512-bit shuffle. NFCI
llvm-svn: 287908
2016-11-25 02:29:21 +00:00
Simon Pilgrim
37cfdddc1f Fix unused variable warning
llvm-svn: 287889
2016-11-24 15:24:47 +00:00
Benjamin Kramer
e74c692f6d [X86] Don't round trip a unique_ptr through a raw pointer for assignment.
No functional change.

llvm-svn: 287888
2016-11-24 15:17:39 +00:00
Simon Pilgrim
6733d66278 [X86][SSE] Improve UINT_TO_FP v2i32 -> v2f64
Vectorize UINT_TO_FP v2i32 -> v2f64 instead of scalarization (albeit still on the SIMD unit).

The codegen matches that generated by legalization (and is in fact used by AVX for UINT_TO_FP v4i32 -> v4f64), but has to be done in the x86 backend to account for legalization via 4i32.

Differential Revision: https://reviews.llvm.org/D26938

llvm-svn: 287886
2016-11-24 15:12:56 +00:00
Simon Pilgrim
b2804b00f4 [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287882
2016-11-24 14:46:55 +00:00
Simon Pilgrim
a0d57c34d1 [X86][AVX512DQVL] Add awareness of vcvtqq2ps and vcvtuqq2ps implicit zeroing of upper 64-bits of xmm result
llvm-svn: 287878
2016-11-24 14:02:30 +00:00
Simon Pilgrim
90a7669966 [X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP lowering
llvm-svn: 287877
2016-11-24 13:38:59 +00:00
Nikolai Bozhenov
717d0227e3 [x86] Fixing PR28755 by precomputing the address used in CMPXCHG8B
The bug arises during register allocation on i686 for
CMPXCHG8B instruction when base pointer is needed. CMPXCHG8B
needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address,
plus ESI is reserved as the base pointer. With such constraints the only
way register allocator would do its job successfully is when the addressing
mode of the instruction requires only one register. If that is not the case
- we are emitting additional LEA instruction to compute the address.

It fixes PR28755.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25088

llvm-svn: 287875
2016-11-24 13:23:35 +00:00
Nikolai Bozhenov
393e654aee [x86] Minor refactoring of X86TargetLowering::EmitInstrWithCustomInserter
Move the definitions of three variables out of the switch.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25192

llvm-svn: 287874
2016-11-24 13:15:49 +00:00
Nikolai Bozhenov
4ab148a50e [x86] Rewrite getAddressFromInstr helper function
- It does not modify the input instruction
- Second operand of any address is always an Index Register,
  make sure we actually check for that, instead of a check for
  an immediate value

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D24938

llvm-svn: 287873
2016-11-24 13:05:43 +00:00
Simon Pilgrim
b9ed8abdaf [X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI
Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions.

This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types.

Differential Revision: https://reviews.llvm.org/D27072

llvm-svn: 287870
2016-11-24 12:13:46 +00:00
Matt Arsenault
4a06c5b78a AMDGPU: Preserve m0 value when spilling
llvm-svn: 287844
2016-11-24 00:26:50 +00:00
Matt Arsenault
dac54cd124 TRI: Add hook to pass scavenger during frame elimination
The scavenger was not passed if requiresFrameIndexScavenging was
enabled. I need to be able to test for the availability of an
unallocatable register here, so I can't create a virtual register for
it.

It might be better to just always use the scavenger and stop
creating virtual registers.

llvm-svn: 287843
2016-11-24 00:26:47 +00:00
Matt Arsenault
eb4e4ccc03 AMDGPU: Remove m0 spilling code
Since m0 isn't allocatable it should never be spilled anymore.

llvm-svn: 287842
2016-11-24 00:26:44 +00:00
Matt Arsenault
9a257a9a17 AMDGPU: Make m0 unallocatable
m0 may need to be written for spill code, so
we don't want general code uses relying on the
value stored in it.

This introduces a few code quality regressions where copies
from m0 are not coalesced into copies of a copy of m0.

llvm-svn: 287841
2016-11-24 00:26:40 +00:00
Simon Pilgrim
a6ebd63f82 [X86][SSE] Add awareness of (v)cvtpd2dq and vcvtpd2udq implicit zeroing of upper 64-bits of xmm result
We've already added the equivalent for (v)cvttpd2dq (rL284459) and vcvttpd2udq

llvm-svn: 287835
2016-11-23 22:35:06 +00:00
Matt Arsenault
52046c39e3 AMDGPU: Cleanup immediate folding code
Move code down to use, reorder to avoid hard to follow
immediate folding logic.

llvm-svn: 287818
2016-11-23 21:51:07 +00:00
Matt Arsenault
51f193382b AMDGPU: Fix debug printing
The uint8_t was printed as a char which didn't really work.

llvm-svn: 287817
2016-11-23 21:51:05 +00:00
Matt Arsenault
8316e1ce66 AMDGPU: Fix not setting kill flag on temp reg when spilling
llvm-svn: 287808
2016-11-23 21:00:12 +00:00
Matt Arsenault
4fa66c3653 AMDGPU: Fix adding extra implicit def of register
In the scalar case, there's no reason to add an additional
def of the same register.

llvm-svn: 287807
2016-11-23 21:00:10 +00:00
Matt Arsenault
dae776c6dc AMDGPU: Fix MMO when splitting spill
The size and offset were wrong. The size of the object was
being used for the size of the access, when here it is really
being split into 4-byte accesses. The underlying object size
is set in the MachinePointerInfo, which also didn't have the
offset set.

llvm-svn: 287806
2016-11-23 20:52:53 +00:00
Michael Kuperstein
fb1214dfc3 [X86] Allow folding of stack reloads when loading a subreg of the spilled reg
We did not support subregs in InlineSpiller:foldMemoryOperand() because targets
may not deal with them correctly.

This adds a target hook to let the spiller know that a target can handle
subregs, and actually enables it for x86 for the case of stack slot reloads.
This fixes PR30832.

Differential Revision: https://reviews.llvm.org/D26521

llvm-svn: 287792
2016-11-23 18:33:49 +00:00
Nemanja Ivanovic
3f612871bc [PowerPC] Remove InstAlias definitions that cause incorrect assembly
In rL283190, I added some InstAlias definitions to generate extended mnemonics
for some uses of the XXPERMDI instruction. However, when the assembler matches
these extended mnemonics, it matches the new instruction in situations where it
should match the old one.
This patch removes these definitions and accomplishes that by defining these
mnemonics with additional instructions that are isCodeGenOnly.

Fixes PR31127.

llvm-svn: 287765
2016-11-23 15:51:52 +00:00
Simon Pilgrim
5df905e2e1 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287762
2016-11-23 14:01:18 +00:00
Simon Pilgrim
1f99fea9ae [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs
llvm-svn: 287760
2016-11-23 13:42:09 +00:00
Craig Topper
db1139c3c8 [AVX-512] Remove intrinsics for valignd/q and autoupgrade them to native shuffles.
llvm-svn: 287744
2016-11-23 06:54:55 +00:00
Zvi Rackover
624c7ddb4e [X86] Simplify lowerVectorShuffleAsBitMask to handle only integer VT's
Summary: This function is only called with integer VT arguments, so remove code that handles FP vectors.

Reviewers: RKSimon, craig.topper, delena, andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26985

llvm-svn: 287743
2016-11-23 06:45:25 +00:00
Kuba Mracek
c7c751102c [xray] Add XRay support for Mach-O in CodeGen
Currently, XRay only supports emitting the XRay table (xray_instr_map) on ELF binaries. Let's add Mach-O support.

Differential Revision: https://reviews.llvm.org/D26983

llvm-svn: 287734
2016-11-23 02:07:04 +00:00
Matthias Braun
882a081504 TargetSubtargetInfo: Move implementation to lib/CodeGen; NFC
TargetSubtargetInfo is filled with CodeGen specific interfaces nowadays
(getInstrInfo(), getFrameLowering(), getSelectionDAGInfo()) most of the
tuning flags like enablePostRAScheduler(), getAntiDepBreakMode(),
enableRALocalReassignment(), ... also do not seem to be universal enough
to make sense outside of CodeGen.

Differential Revision: https://reviews.llvm.org/D26948

llvm-svn: 287708
2016-11-22 22:09:03 +00:00
Simon Dardis
f137c531a9 [mips] seb, seh instruction aliases
Add the single operand form.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D26961

llvm-svn: 287681
2016-11-22 19:17:23 +00:00
Nemanja Ivanovic
6069e88621 [PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE
This patch corresponds to review:
https://reviews.llvm.org/D26861

It also fixes PR30730.

Committing on behalf of Lei Huang.

llvm-svn: 287679
2016-11-22 19:02:07 +00:00
Simon Pilgrim
c4e32d1f05 [X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles.
This occurs during UINT_TO_FP v2f64 lowering. 

We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle

llvm-svn: 287676
2016-11-22 17:50:06 +00:00
Vasileios Kalintiris
ee20592fd5 [mips] Add support for unaligned load/store macros.
Add missing unaligned store macros (ush/usw) and fix the exisiting
implementation of the unaligned load macros in order to generate
identical expansions with the GNU assembler.

llvm-svn: 287646
2016-11-22 16:43:49 +00:00
Tim Northover
6fa36b94d5 CodeGen: simplify TargetMachine::getSymbol interface. NFC.
No-one actually had a mangler handy when calling this function, and
getSymbol itself went most of the way towards getting its own mangler
(with a local TLOF variable) so forcing all callers to supply one was
just extra complication.

llvm-svn: 287645
2016-11-22 16:17:20 +00:00
Zvi Rackover
cca9ed0f2c [X86] Change lowerBuildVectorToBitOp() to take a BuildVectorSDNode. NFC.
llvm-svn: 287644
2016-11-22 15:33:28 +00:00
Zvi Rackover
011b7bdc42 [X86] Remove dead code from LowerVectorBroadcast
Summary: Splat vectors are canonicalized to BUILD_VECTOR's so the code can be simplified. NFC-ish.

Reviewers: craig.topper, delena, RKSimon, andreadb

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D26678

llvm-svn: 287643
2016-11-22 15:17:52 +00:00
Chad Rosier
fb1ac1426b [AArch64] Set the max interleave factor for Falkor.
llvm-svn: 287642
2016-11-22 14:25:02 +00:00
Chad Rosier
b2099bf97c [AArch64] Maximize 80-column. NFC.
llvm-svn: 287640
2016-11-22 14:12:09 +00:00
Coby Tayree
02307bc032 [AVX512][inline-asm] Fix AVX512 inline assembly instruction resolution when the size qualifier of a memory operand is not specified explicitly.
This commit handles cases where the size qualifier of an indirect memory reference operand in Intel syntax is missing (e.g. "vaddps xmm1, xmm2, [a]").

GCC will deduce the size qualifier for AVX512 vector and broadcast memory operands based on the possible matches:
"vaddps xmm1, xmm2, [a]" matches only “XMMWORD PTR” qualifier.
"vaddps xmm1, xmm2, [a]{1to4}" matches only “DWORD PTR” qualifier.

This is different from the current behavior of LLVM, which deduces the size qualifier based on the size of the memory operand.
For "vaddps xmm1, xmm2, [a]"
"char a;" will imply "BYTE PTR" qualifier
"short a;" will imply "WORD PTR" qualifier.

This commit aligns LLVM to GCC’s behavior.

This is the LLVM part of the review.
The Clang part of the review: https://reviews.llvm.org/D26587

Differential Revision: https://reviews.llvm.org/D26586

llvm-svn: 287630
2016-11-22 09:30:29 +00:00
Craig Topper
0ca9ef1d49 [X86] Remove alternate CodeGenOnly version of (v)movq that declared the load size as i128mem. Change all uses to the use the i64mem version.
I'm sure this caused the load size to misprint in Intel syntax output. We were also inconsistent about which patterns used which instruction between VEX and EVEX.

There are two different reg/reg versions of movq, one from a GPR and one from the lower 64-bits of an XMM register. This changes the loading folding table to use the single i64mem memory form for folding both cases. But we need to use TB_NO_REVERSE to prevent a duplicate entry in the unfolding table.

llvm-svn: 287622
2016-11-22 05:31:43 +00:00
Craig Topper
f6fac334e5 [AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD).
Summary:
The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands are the one that can load from memory so this can't be used to increase memory folding opportunities.

We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too.

Reviewers: igorb, delena, Ayal, Farhana, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25652

llvm-svn: 287621
2016-11-22 04:57:34 +00:00
Craig Topper
4a0cbbbdb0 [AVX-512] Add support for changing the element size of PALIGNR/VALIGND/VALIGNQ shuffles if they feed a vselect with a different type
Summary:
Shuffle lowering widens the element size of a shuffle if elements are contiguous. This is sometimes help because wider element types have more shuffle options. If the shuffle is one of the arguments to a vselect this shuffle widening can introduce a bitcast between the vselect and the shuffle. This will prevent isel from selecting a masked operation. If the shuffle can be written equally efficiently with a different element size to match the vselect type we should change the shuffle type to allow masking.

This patch does this conversion for all VALIGND/VALIGNQ sizes. It also supports turning 128-bit PALIGNR into VALIGND/VALIGNQ. This fixes the case shown in PR31018.

I plan to add support for more operations in future patches.

Reviewers: RKSimon, zvi, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26902

llvm-svn: 287612
2016-11-22 03:51:53 +00:00
Stanislav Mekhanoshin
e6e37f3c8a [AMDGPU] Fix multiple vreg definitions in si-lower-control-flow
Differential Revision: https://reviews.llvm.org/D26939

llvm-svn: 287608
2016-11-22 01:42:34 +00:00
Geoff Berry
bab8f5af79 [AArch64LoadStoreOptimizer] Don't treat write to XZR/WZR as a clobber.
Summary:
When searching for load/store instructions to pair/merge don't treat
writes to WZR/XZR as clobbers since they don't change the value read
from WZR/XZR (which is always 0).

Reviewers: mcrosier, junbuml, jmolloy, t.p.northover

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D26921

llvm-svn: 287592
2016-11-21 22:51:10 +00:00
Simon Dardis
984df218ce [mips] seq macro support
This patch adds the seq macro.

This partially resolves PR/30381.

Thanks to Sean Bruno for reporting the issue!

Reviewers: zoran.jovanovic, vkalintiris, seanbruno

Differential Revision: https://reviews.llvm.org/D24607

llvm-svn: 287573
2016-11-21 20:30:41 +00:00
Coby Tayree
31003d8e56 small fixup which enables the issuing of the aforementioned instruction (w/o operands), on MS/Intel syntax.
Differential Revision: https://reviews.llvm.org/D26913

llvm-svn: 287548
2016-11-21 15:50:56 +00:00
Simon Pilgrim
ccffba1888 [X86][SSE] Allow PACKSS to be used to truncate any type of all/none sign bits input
At the moment we only use truncateVectorCompareWithPACKSS with direct vector comparison results (just one example of a known all/none signbits input).

This change relaxes the direct matching of a SETCC opcode by moving the logic up into SelectionDAG::ComputeNumSignBits and accepting any input with a known splatted signbit.

llvm-svn: 287535
2016-11-21 12:05:49 +00:00
Michael Zuckerman
5b15b8b9d0 Fixing a small typo (A->U).
This seem to fixes PR30992.

-         HasAVX512 ? X86::VMOVAPSZ128rm_NOVLX 
+         HasAVX512 ? X86::VMOVUPSZ128rm_NOVLX 

llvm-svn: 287532
2016-11-21 11:52:11 +00:00
Craig Topper
514d5c696a [AVX-512] Add EVEX form of VMOVZPQILo2PQIZrm to load folding tables to match SSE and AVX.
llvm-svn: 287523
2016-11-21 07:51:31 +00:00
Alexei Starovoitov
5e8323860d [bpf] fix dwarf elf relocs and line numbers
- teach RelocVisitor to recognize bpf relocations
- fix AsmInfo->PointerSize to make sure dwarf is emitted correctly
- add a test for the above

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 287521
2016-11-21 06:21:23 +00:00
Craig Topper
e4cf482936 [X86] Remove duplicate instructions for (v)movq and replace with patterns on other instructions. NFC
llvm-svn: 287519
2016-11-21 04:07:56 +00:00
Dean Michael Berris
aeea4f9adc [XRay][AArch64] Implemented a test for the compile-time sleds emitted, and fixed a bug in the jump instruction
This patch adds a test for the assembly code emitted with XRay
instrumentation. It also fixes a bug where the operand of a jump
instruction must be not the number of bytes to jump over, but rather the
number of 4-byte instructions.

Author: rSerge

Reviewers: dberris, rengolin

Differential Revision: https://reviews.llvm.org/D26805

llvm-svn: 287516
2016-11-21 03:01:43 +00:00
Simon Dardis
29c3915faa [mips] Restrict tail call optimization
The tail call optimization was being used without proper consideration of
ABI requirements for saving and restoring the GP. This patch restricts tail
call optimization to functions within the same translation unit.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D24763

llvm-svn: 287505
2016-11-20 21:23:08 +00:00
Coby Tayree
c8112b9e1c The 'vpmultishiftqb' instruction was implemented falsely, this patch amend it.
More specifically - (MS dialect) broadcasting variants were implemented falsely.

Differential Revision: https://reviews.llvm.org/D26257

llvm-svn: 287501
2016-11-20 17:19:55 +00:00
Coby Tayree
dfe03a8851 Some instructions were missing, other implemented falsely. this patch aims at amending those issues. full list:
vcvtps2pd
vcvtudq2pd
vcvtps2qq
vcvttps2qq
vcvtps2uqq
vcvttps2uqq

variants are:

[Dst]XMM(zero-masked/merge-masked/unmasked)
[Src]Mem64

Differential Revision: https://reviews.llvm.org/D26799

llvm-svn: 287500
2016-11-20 17:09:56 +00:00
Simon Pilgrim
0f773325da [X86][AVX512] Combine unary + zero target shuffles to VPERMV3 with a zero vector where possible
llvm-svn: 287497
2016-11-20 16:11:36 +00:00
Simon Pilgrim
7ee727735c [X86][AVX512] Add support for VBMI VPERMV3 target shuffle combines
llvm-svn: 287496
2016-11-20 15:24:38 +00:00
Simon Pilgrim
11a92eecb6 [X86][AVX512] Add support for VBMI VPERMV target shuffle combines
llvm-svn: 287495
2016-11-20 15:05:45 +00:00
Simon Pilgrim
477296e35a [X86][AVX512VL] Removed duplicate operation action
Basic AVX512F already declared uint_to_fp v4i32 as legal

llvm-svn: 287493
2016-11-20 14:19:29 +00:00
Simon Pilgrim
d4d881b5ab Strip trailing whitespace
llvm-svn: 287492
2016-11-20 14:05:23 +00:00
Simon Pilgrim
415ba9bf65 [X86][AVX512F] Add support for uint_to_fp v2i32 to v2f64 on AVX512F-only targets
Use 512-bit instructions (we already do something similar for uint_to_fp v4i32 to v4f64)

llvm-svn: 287491
2016-11-20 14:03:23 +00:00
Simon Pilgrim
070f334aac Fix comment typos. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287486
2016-11-20 13:10:51 +00:00
Oren Ben Simhon
bce4fd5213 [X86] RegCall - Handling long double arguments
The change is part of RegCall calling convention support for LLVM.
Long double (f80) requires special treatment as the first f80 parameter is saved in FP0 (floating point stack).
This review present the change and the corresponding tests.

Differential Revision: https://reviews.llvm.org/D26151

llvm-svn: 287485
2016-11-20 11:06:07 +00:00
Coby Tayree
c2ed22c0ae [X86][InlineAsm]Test commit.
Fixing a wrong comment on X86AsmParser.cpp::ParseZ: "true" --> "false"

Differential Revision: https://reviews.llvm.org/D26797

llvm-svn: 287484
2016-11-20 09:31:11 +00:00
Alexei Starovoitov
79a4851ba7 [bpf] add BPF disassembler
add BPF disassembler, so tools like llvm-objdump can be used:
$ llvm-objdump -d -no-show-raw-insn ./sockex1_kern.o

./sockex1_kern.o:	file format ELF64-BPF

Disassembly of section socket1:
bpf_prog1:
       0:	r6 = r1
       8:	r0 = *(u8 *)skb[23]
      10:	*(u32 *)(r10 - 4) = r0
      18:	r1 = *(u32 *)(r6 + 4)
      20:	if r1 != 4 goto 8
      28:	r2 = r10
      30:	r2 += -4

ld_imm64 (the only 16-byte insn) and special ld_abs/ld_ind instructions
had to be treated in a special way. The decoders for the rest of the insns
are automatically generated.

Add tests to cover new functionality.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 287477
2016-11-20 02:25:00 +00:00
Benjamin Kramer
7cad84ff09 Give some helper classes/functions internal linkage. NFC.
llvm-svn: 287462
2016-11-19 20:44:26 +00:00
Simon Pilgrim
04cf2ff75c [X86][SSE] Improve PSHUFB lowering from either input
Canonicalization may leave the zeroable vector in the first input.

llvm-svn: 287461
2016-11-19 20:41:48 +00:00
Simon Pilgrim
dfba5e88ee [X86][AVX512] Add VPERMV/VPERMV3 v64i8 byte shuffles on avx512vbmi targets
llvm-svn: 287459
2016-11-19 20:12:34 +00:00
Craig Topper
5b8b954f1c [X86] Simplify some code a little by removing a dulicate variable and combinining two if statements. NFCI
llvm-svn: 287443
2016-11-19 17:33:17 +00:00
Daniel Sanders
206a49e5a7 Try again to fix unused variable warning on lld-x86_64-darwin13 after r287439.
The previous attempt didn't work. I assume LLVM_ATTRIBUTE_UNUSED isn't
available on that machine.

llvm-svn: 287442
2016-11-19 14:47:41 +00:00
Daniel Sanders
4f0f4b9233 Try to fix unused variable warning on lld-x86_64-darwin13 after r287439.
Whether the variable is used or not depends on NDEBUG.

llvm-svn: 287440
2016-11-19 13:50:32 +00:00
Daniel Sanders
811dc2eda3 Check that emitted instructions meet their predicates on all targets except ARM, Mips, and X86.
Summary:
* ARM is omitted from this patch because this check appears to expose bugs in this target.
* Mips is omitted from this patch because this check either detects bugs or deliberate
  emission of instructions that don't satisfy their predicates. One deliberate
  use is the SYNC instruction where the version with an operand is correctly
  defined as requiring MIPS32 while the version without an operand is defined
  as an alias of 'SYNC 0' and requires MIPS2.
* X86 is omitted from this patch because it doesn't use the tablegen-erated
  MCCodeEmitter infrastructure.

Patches for ARM and Mips will follow.

Depends on D25617

Reviewers: tstellarAMD, jmolloy

Subscribers: wdng, jmolloy, aemerson, rengolin, arsenm, jyknight, nemanjai, nhaehnle, tstellarAMD, llvm-commits

Differential Revision: https://reviews.llvm.org/D25618

llvm-svn: 287439
2016-11-19 13:05:44 +00:00
Dylan McKay
864fb084b4 [AVR] Remove a bunch of unused variables
llvm-svn: 287416
2016-11-19 01:33:42 +00:00
Dylan McKay
3a88412681 [AVR] Remove a variable which was unused in release mode
In release mode where assertions are not enabled, this caused an 'unused
variable' warning.

llvm-svn: 287414
2016-11-19 01:14:44 +00:00
Konstantin Zhuravlyov
aaff08fa3a [AMDGPU] Change frexp.exp intrinsic to return i16 for f16 input
Differential Revision: https://reviews.llvm.org/D26862

llvm-svn: 287389
2016-11-18 22:31:08 +00:00
Matthias Braun
bcd40e41de Timer: Track name and description.
The previously used "names" are rather descriptions (they use multiple
words and contain spaces), use short programming language identifier
like strings for the "names" which should be used when exporting to
machine parseable formats.

Also removed a unused TimerGroup from Hexxagon.

Differential Revision: https://reviews.llvm.org/D25583

llvm-svn: 287369
2016-11-18 19:43:18 +00:00
Matt Arsenault
0ee6327437 AMDGPU: Fix unused variable warning
llvm-svn: 287362
2016-11-18 18:33:36 +00:00
Ehsan Amiri
64f7597ad0 [PPC] limit line width to 80 characters
NFC. Forgot to fix this in the original commit.

llvm-svn: 287350
2016-11-18 16:24:27 +00:00
Simon Dardis
f29272d743 [mips][msa] Implement f16 support
The MIPS MSA ASE provides instructions to convert to and from half precision
floating point. This patch teaches the MIPS backend to treat f16 as a legal
type and how to promote such values to f32 for the usual set of operations.

As a result of this, the fexup[lr].w intrinsics no longer crash LLVM during
type legalization.

Reviewers: zoran.jovanvoic, vkalintiris

Differential Revision: https://reviews.llvm.org/D26398

llvm-svn: 287349
2016-11-18 16:17:44 +00:00
Tom Stellard
a3b8754644 AMDGPU/SI: Remove zero_extend patterns for i16 ops selected to 32-bit insts
Summary:
The 32-bit instructions don't zero the high 16-bits like the 16-bit
instructions do.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26828

llvm-svn: 287342
2016-11-18 13:53:34 +00:00
Simon Pilgrim
b26853abef Cleanup function with clang-format. NFCI.
llvm-svn: 287340
2016-11-18 12:16:18 +00:00
Nicolai Haehnle
49cca90a3f AMDGPU: Fix legalization of MUBUF instructions in shaders
Summary:
The addr64-based legalization is incorrect for MUBUF instructions with idxen
set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions.  This affects
e.g.  shaders that access buffer textures.

Since we never actually need the addr64-legalization in shaders, this patch
takes the easy route and keys off the calling convention.  If this ever
affects (non-OpenGL) compute, the type of legalization needs to be chosen
based on some TSFlag.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26747

llvm-svn: 287339
2016-11-18 11:55:52 +00:00
Simon Pilgrim
1c5557505e Fix spelling mistakes in MIPS target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287338
2016-11-18 11:53:36 +00:00
Ehsan Amiri
d6a7aef00b [Power9] Add patterns for vnegd, vnegw
Exploit new instructions by adding patterns to .td file.
https://reviews.llvm.org/D26551

llvm-svn: 287334
2016-11-18 11:05:55 +00:00
Simon Pilgrim
a971e6cc43 Fix spelling mistakes in AMDGPU target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287333
2016-11-18 11:04:02 +00:00
Simon Pilgrim
903cc98eef Fix typo in comment. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287331
2016-11-18 10:52:12 +00:00
Ehsan Amiri
852aec9243 [PPC][DAGCombine] Convert SETCC to subtract when the result is zero extended
When we see a SETCC whose only users are zero extend operations, we can replace
it with a subtraction. This results in doing all calculations in GPRs and
avoids CR use.

Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are
ways that this can be extended. For example for signed condition codes. In that
case we will be introducing additional sign extend instructions, so more careful
profitability analysis may be required.

Another direction to extend this is for equal, not equal conditions. Also when
users of SETCC are any_ext or sign_ext, we might be able to do something 
similar.

llvm-svn: 287329
2016-11-18 10:41:44 +00:00
Craig Topper
8578a62eac [AVX-512] Replace masked 16-bit element variable shift intrinsics with new unmasked versions and selects.
The same thing was done to 32-bit and 64-bit element sizes previously.

This will allow us to support these shuffls in InstCombineCalls along with the other variable shift intrinsics.

llvm-svn: 287312
2016-11-18 05:04:44 +00:00
Matt Arsenault
cf561a0fd0 AMDGPU: Move redundant setting of inst properties
llvm-svn: 287311
2016-11-18 04:42:59 +00:00
Matt Arsenault
0fe623be4f AMDGPU: Fix crash on illegal type for inlineasm
There are still crashes on non-MVT types in other
places.

llvm-svn: 287310
2016-11-18 04:42:57 +00:00
Alexei Starovoitov
719b0399a8 convert bpf assembler to look like kernel verifier output
since bpf instruction set was introduced people learned to
read and understand kernel verifier output whereas llvm asm
output stayed obscure and unknown. Convert llvm to emit
assembler text similar to kernel to avoid this discrepancy

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 287300
2016-11-18 02:32:35 +00:00
Craig Topper
d5ba7319d4 [AVX-512] Support FCOPYSIGN for v16f32 and v8f64
Summary:
This extends FCOPYSIGN support to 512-bit vectors.

I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.

Reviewers: delena, zvi, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26791

llvm-svn: 287298
2016-11-18 02:25:34 +00:00
Simon Pilgrim
811103e354 Fix spelling mistakes in Hexagon target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287248
2016-11-17 19:21:20 +00:00
Simon Pilgrim
1d43c52b59 Fix spelling mistakes in X86 target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287247
2016-11-17 19:03:05 +00:00