1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 12:02:58 +02:00
Commit Graph

39734 Commits

Author SHA1 Message Date
Sjoerd Meijer
5a6c252c8d [ARM] Code size optimisation to lower udiv+urem to udiv+mls instead of a
library call to __aeabi_uidivmod. This is an improved implementation of
r280808, see also D24133, that got reverted because isel was stuck in a loop.
That was caused by the optimisation incorrectly triggering on i64 ints, which
shouldn't happen because there is no 64bit hwdiv support; that put isel's type
legalization and this optimisation in a loop. A native ARM compiler and testing
now shows that this is fixed.

Patch mostly by Pablo Barrio.

Differential Revision: https://reviews.llvm.org/D25077

llvm-svn: 283098
2016-10-03 10:12:32 +00:00
Konstantin Zhuravlyov
2231c0c9d2 [AMDGPU] Remove unused variables from SIOptimizeExecMasking
Differential Revision: https://reviews.llvm.org/D25110

llvm-svn: 283087
2016-10-03 04:43:22 +00:00
Hal Finkel
89dfd5d7ed [PowerPC] Account for the ELFv2 function prologue during branch selection
The PPC branch-selection pass, which performs branch relaxation, needs to
account for the padding that might be introduced to satisfy block alignment
requirements. We were assuming that the first block was at offset zero (i.e.
had the alignment of the function itself), but under the ELFv2 ABI, a global
entry function prologue is added to the first block, and it is a
two-instruction sequence (i.e. eight-bytes long). If the function has 16-byte
alignment, the fact that the first block is eight bytes offset from the start
of the function is relevant to calculating where padding will be added in
between later blocks.

Unfortunately, I don't have a small test case.

llvm-svn: 283086
2016-10-03 04:06:44 +00:00
Craig Topper
97b89b8e00 [AVX-512] Remove isCheapAsAMove flag from VMOVAPSZ128rm_NOVLX and friends.
This was accidentally copy and pasted from other Pseudos in the file.

llvm-svn: 283084
2016-10-03 02:22:33 +00:00
Craig Topper
05ed355772 [X86] Mark all sizes of (V)MOVUPD as trivially rematerializable.
I don't know for sure that we truly needs this, but its the only vector load that isn't rematerializable. Making it consistent allows it to not be a special case in the td files.

llvm-svn: 283083
2016-10-03 02:00:29 +00:00
Simon Pilgrim
c8b06fb7ec [X86][AVX2] Add support for combining target shuffles to VPERMD/VPERMPS
llvm-svn: 283080
2016-10-02 21:07:58 +00:00
Simon Pilgrim
143637ccb2 [X86][AVX] Ensure broadcast loads respect dependencies
To allow broadcast loads of a non-zero'th vector element, lowerVectorShuffleAsBroadcast can replace a load with a new load with an adjusted address, but unfortunately we weren't ensuring that the new load respected the same dependencies.

This patch adds a TokenFactor and updates all dependencies of the old load to reference the new load instead.

Bug found during internal testing.

Differential Revision: https://reviews.llvm.org/D25039

llvm-svn: 283070
2016-10-02 15:59:15 +00:00
Craig Topper
d5cac20861 [X86] Don't set i64 ADDC/ADDE/SUBC/SUBE as Custom if the target isn't 64-bit. This way we don't have to catch them and do nothing with them in ReplaceNodeResults.
llvm-svn: 283066
2016-10-02 06:13:43 +00:00
Craig Topper
d5126c36f5 [X86] Fix indentation. NFC
llvm-svn: 283065
2016-10-02 06:13:40 +00:00
Hal Finkel
35dddafa6b [PowerPC] Refactor soft-float support, and enable PPC64 soft float
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.

Fixes PR26970.

llvm-svn: 283060
2016-10-02 02:10:20 +00:00
Simon Pilgrim
c8c1e1ed53 [X86][SSE] Cleaned up shuffle decode assertion messages
llvm-svn: 283050
2016-10-01 20:12:56 +00:00
Simon Pilgrim
e572b8dadd Fix signed/unsigned warning
llvm-svn: 283041
2016-10-01 16:14:57 +00:00
Simon Pilgrim
3c7330116c [X86][SSE] Add support for combining target shuffles to binary BLEND
We already had support for 1-input BLEND with zero - this adds support for 2-input BLEND as well.

llvm-svn: 283040
2016-10-01 16:04:28 +00:00
Simon Pilgrim
fe10dc6252 [X86][SSE] Always combine target shuffles to MOVSD/MOVSS
Now we can commute to BLENDPD/BLENDPS on SSE41+ targets if necessary, so simplify the combine matching where we can.

This required me to add a couple of scalar math movsd/moss fold patterns that hadn't been needed in the past.

llvm-svn: 283038
2016-10-01 15:33:01 +00:00
Simon Pilgrim
74de116143 [X86][SSE] Enable commutation from MOVSD/MOVSS to BLENDPD/BLENDPS on SSE41+ targets
Instead of selecting between MOVSD/MOVSS and BLENDPD/BLENDPS at shuffle lowering by subtarget this will help us select the instruction based on actual commutation requirements.

We could possibly add BLENDPD/BLENDPS -> MOVSD/MOVSS commutation and MOVSD/MOVSS memory folding using a similar approach if it proves useful

I avoided adding AVX512 handling as I'm not sure when we should be making use of VBLENDPD/VBLENDPS on EVEX targets

llvm-svn: 283037
2016-10-01 14:26:11 +00:00
Craig Topper
6d5567d561 [X86] Cleanup patterns for using VMOVDDUP for broadcasts.
-Remove OptForSize. Not all of the backend follows the same rules for creating broadcasts and there is no conflicting pattern.
-Don't stop selecting VEX VMOVDDUP when AVX512 is supported. We need VLX for EVEX VMOVDDUP.
-Only use VMOVDDUP for v2i64 broadcasts if AVX2 is not supported.

llvm-svn: 283020
2016-10-01 07:11:24 +00:00
Mehdi Amini
422da84ace Revert "Use StringRef instead of raw pointer in TargetRegistry API (NFC)"
This reverts commit r283017. Creates an infinite loop somehow.

llvm-svn: 283019
2016-10-01 07:08:23 +00:00
Mehdi Amini
51f7f15d3c Use StringRef instead of raw pointers in MCAsmInfo/MCInstrInfo APIs (NFC)
llvm-svn: 283018
2016-10-01 06:46:33 +00:00
Mehdi Amini
ced7371476 Use StringRef instead of raw pointer in TargetRegistry API (NFC)
llvm-svn: 283017
2016-10-01 06:25:30 +00:00
Craig Topper
dddc4290c8 [AVX-512] Add EVEX versions of VPBROADCASTW patterns with truncated i32 loads.
llvm-svn: 283015
2016-10-01 06:01:23 +00:00
Mehdi Amini
356d607cc1 Use StringRef in Datalayout API (NFC)
llvm-svn: 283013
2016-10-01 05:57:55 +00:00
Mehdi Amini
b9d860c7ab Revert "Use StringRef in Datalayout API (NFC)"
This reverts commit r283009. Bots are broken.

llvm-svn: 283011
2016-10-01 05:12:48 +00:00
Mehdi Amini
72c57fa04f Use StringRef in Datalayout API (NFC)
llvm-svn: 283009
2016-10-01 04:17:59 +00:00
Mehdi Amini
1fef2dd6b7 Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004
2016-10-01 02:56:57 +00:00
Mehdi Amini
1851fff9c0 Revert "AMDGPU: Don't use offen if it is 0"
This reverts commit r282999.
Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038

llvm-svn: 283003
2016-10-01 02:35:24 +00:00
Eric Christopher
d40a93b0ee Remove TargetTriple from AArch64MCInstLower as it's used in few places
and can be pulled from the TargetMachine. NFC.

llvm-svn: 283000
2016-10-01 01:50:25 +00:00
Matt Arsenault
58e2ff3f3c AMDGPU: Don't use offen if it is 0
This removes many re-initializations of a base register to 0.

llvm-svn: 282999
2016-10-01 01:37:15 +00:00
Quentin Colombet
86ffc4ee4a [AArch64][RegisterBankInfo] Use the helper functions for the checks
This makes sure the helper functions work as expected.

NFC.

llvm-svn: 282961
2016-09-30 21:46:21 +00:00
Quentin Colombet
cb5efa1795 [AArch64][RegisterBankInfo] Rename getValueMappingIdx to getValueMapping
We don't return index, we return the actual ValueMapping.

NFC.

llvm-svn: 282960
2016-09-30 21:46:19 +00:00
Quentin Colombet
66021ca5d5 [AArch64][RegisterBankInfo] Compress the ValueMapping table a bit.
We don't need to have singleton ValueMapping on their own, we can just
reuse one of the elements of the 3-ops mapping.
This allows even more code sharing.

NFC.

llvm-svn: 282959
2016-09-30 21:46:17 +00:00
Quentin Colombet
138aafd3ba [AArch64][RegisterBankInfo] Refactor the code to access AArch64::ValMapping
Use a helper function to access ValMapping. This should make the code
easier to understand and maintain.

NFC.

llvm-svn: 282958
2016-09-30 21:46:15 +00:00
Quentin Colombet
d414a349f2 [AArch64][RegisterBankInfo] Rename getRegBankIdx to getRegBankIdxOffset
The function name did not make it clear that the returned value was an
offset to apply to a register bank index.

NFC.

llvm-svn: 282957
2016-09-30 21:46:12 +00:00
Quentin Colombet
efb70bd767 [AArch64][RegisterBankInfo] Use the static opds mapping for alt mappings
Avoid to rely on the dynamically allocated operands mapping for the
alternative mapping.
NFC.

llvm-svn: 282956
2016-09-30 21:45:56 +00:00
Hans Wennborg
a560abffe4 X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)
We can't use Jcc to leave a Win64 function in general, because that
confuses the unwinder. However, for "leaf" functions, that is, functions
where the return address is always on top of the stack and which don't
have unwind info, it's OK.

Differential Revision: https://reviews.llvm.org/D24836

llvm-svn: 282920
2016-09-30 20:07:35 +00:00
Derek Schuff
cc1cf3d580 [WebAssembly] Make register stackification more conservative
Register stackification currently checks VNInfo for changes. Make that
more accurate by testing each intervening instruction for any other defs
to the same virtual register.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D24942

llvm-svn: 282886
2016-09-30 18:02:54 +00:00
Konstantin Zhuravlyov
0d3cafde5b [AMDGPU] Choose VMCNT, EXPCNT, LGKMCNT masks and shifts based on the isa version
Differential Revision: https://reviews.llvm.org/D24973

llvm-svn: 282877
2016-09-30 17:01:40 +00:00
Konstantin Zhuravlyov
bbe6c88c7e [AMDGPU] Ask subtarget if waitcnt instruction is needed before barrier instruction
Differential Revision: https://reviews.llvm.org/D24985

llvm-svn: 282875
2016-09-30 16:50:36 +00:00
Konstantin Zhuravlyov
ef884dccde [AMDGPU] Do not run scalar optimization passes at "-O0"
Differential Revision: https://reviews.llvm.org/D25055

llvm-svn: 282873
2016-09-30 16:39:24 +00:00
Dylan McKay
cdace906da [AVR] Add the ELF object file writer
Summary: This adds the ELF32 writer for AVR.

Reviewers: kparzysz

Subscribers: beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25031

llvm-svn: 282856
2016-09-30 14:09:20 +00:00
Dylan McKay
fc0928fdac [AVR] Add the assembly instruction printer
Summary:
This change adds the AVR assembly instruction printer.

No tests are included in this patch. I have left them downstream so we can
add them once `llc` successfully runs (there's very few components left
to upstream until this).

Reviewers: arsenm, kparzysz

Subscribers: wdng, beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25028

llvm-svn: 282854
2016-09-30 14:01:50 +00:00
Craig Topper
4c30ce46a8 [AVX-512] Store address operand should be an input operand for the special stack spilling pseudos for XMM16-31 and YMM16-31 without VLX.
llvm-svn: 282843
2016-09-30 05:35:47 +00:00
Craig Topper
b9d2d4ca13 [AVX-512] Add the special stack spilling pseudos for XMM16-31 and YMM16-31 without VLX to teh isFrameLoadOpcode and isFrameStoreOpcode.
llvm-svn: 282842
2016-09-30 05:35:45 +00:00
Craig Topper
0213fa7d16 Revert r282835 "[AVX-512] Always use the full 32 register vector classes for addRegisterClass regardless of whether AVX512/VLX is enabled or not."
Turns out this doesn't pass verify-machineinstrs.

llvm-svn: 282841
2016-09-30 05:35:42 +00:00
Craig Topper
ce7f1ca0c2 [X86] Add AVX-512 VTs to findRepresentativeClass as well as v16i16 which was also missing. Change register class to include the extra 16 AVX512 registers.
I'm not completely sure what this method does or why all the 256-bit VTs returned VR128RegClass when the comments on the method definiton say it should return the largest super register class. I just figured AVX-512 should be similar.

llvm-svn: 282836
2016-09-30 04:31:37 +00:00
Craig Topper
829b549027 [AVX-512] Always use the full 32 register vector classes for addRegisterClass regardless of whether AVX512/VLX is enabled or not.
If AVX512 is disabled, the registers should already be marked reserved. Pattern predicates and register classes on instructions should take care of most of the rest. Loads/stores and physical register copies for XMM16-31 and YMM16-31 without VLX have already been taken care of.

I'm a little unclear why this changed the register allocation of the SSE2 run of the sad.ll test, but the registers selected appear to be valid after this change.

llvm-svn: 282835
2016-09-30 04:31:33 +00:00
Matt Arsenault
3a9a1ac61b AMDGPU: Use unsigned compare for eq/ne
For some reason there are both of these available, except
for scalar 64-bit compares which only has u64. I'm not sure
why there are both (I'm guessing it's for the one bit inputs we
don't use), but for consistency always using the
unsigned one.

llvm-svn: 282832
2016-09-30 01:50:20 +00:00
Reid Kleckner
d8564063fd [X86] Don't preserve Win64 SSE CSRs when SSE is disabled
Code that doesn't use floating point and doesn't use SSE (kernel code)
shouldn't save and restore SSE registers.

Fixes PR30503

llvm-svn: 282819
2016-09-30 00:17:49 +00:00
Quentin Colombet
5705443f01 [AArch64][RegisterBankInfo] Use static mapping for 3-operands instrs.
This uses a TableGen'ed like structure for all 3-operands instrs.
The output of the RegBankSelect pass should be identical but the
RegisterBankInfo will do less dynamic allocations.

llvm-svn: 282817
2016-09-30 00:10:00 +00:00
Quentin Colombet
2fc0df719f [AArch64][RegisterBankInfo] Add static value mapping for 3-op instrs.
This is the kind of input TableGen should generate at some point.
NFC.

llvm-svn: 282816
2016-09-30 00:09:58 +00:00
Quentin Colombet
6fe4d7907c [AArch64][RegisterBankInfo] Check the statically created ValueMapping.
Make sure that the ValueMappings contain the value we expect at the
indices we expect.

NFC.

llvm-svn: 282815
2016-09-30 00:09:43 +00:00
Douglas Katzman
d5fbe7bbc1 [X86] Avoid "unused" warnings if no asserts
llvm-svn: 282732
2016-09-29 17:26:12 +00:00
Simon Pilgrim
e14120722d [X86][SSE] Added common helper for shuffle mask constant pool decodes.
The shuffle mask decodes have a large amount of repeated code extracting/splitting mask values from Constant data.

This patch pulls all of this duplicated code into a single helper function to identify undef elements and combine/split constant integer data into the requested shuffle mask elements.

Updated PSHUFB/VPERMIL/VPERMIL2/VPPERM decoders to use it (VPERMV/VPERMV3 could be converted as well in the future).

llvm-svn: 282720
2016-09-29 15:25:48 +00:00
Dylan McKay
c658b91293 Revert "[AVR] Add instruction selection lowering code"
I accidentally comitted it.

llvm-svn: 282712
2016-09-29 12:49:18 +00:00
Dylan McKay
e8c532f66f [AVR] Add instruction selection lowering code
Summary: This adds AVRISelLowering.cpp

Reviewers: kparzysz, arsenm

Subscribers: wdng, beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25034

llvm-svn: 282711
2016-09-29 12:44:38 +00:00
Craig Topper
574fec2e71 [AVX-512] Support spills of XMM16-31 and YMM16-31 when VLX isn't available.
This adds new pseudo instructions that can be selected during register allocation to represent loads and stores of XMM/YMM registers when AVX512F is available, but VLX isn't. They will be converted to VEX encoded moves if the register turns out to be XMM0-15/YMM0-15. Otherwise either an EVEX VEXTRACT(store) or VBROADCAST(load) will be used.

Fixes one of the cases from PR29112.

llvm-svn: 282690
2016-09-29 06:07:09 +00:00
Craig Topper
b2bb9a2bbc [AVX-512] Replicate pattern from AVX to select VMOVDDUP for (v2f64 (X86VBroadcast f64:)). Add AVX512VL to command line of existing AVX2 test that hits this condition.
llvm-svn: 282688
2016-09-29 05:54:43 +00:00
Craig Topper
4a193b69ad [X86] Add EVEX encoded VBROADCASTSS/SD and VPBROADCASTD/Q to execution domain fixing table.
llvm-svn: 282687
2016-09-29 05:54:39 +00:00
Craig Topper
cd7fa6db3b [X86] Remove AddedComplexity adjustments that don't seem to be needed.
llvm-svn: 282686
2016-09-29 05:54:34 +00:00
Craig Topper
9593dd5f5b [X86] Add VBROADCASTF128/VBROADCASTI128 to execution domain fixing tables.
llvm-svn: 282684
2016-09-29 05:54:28 +00:00
Eric Christopher
221e7ee5fe Remove an unnecessary duplicate initialization of TLOF from the Mips
AsmPrinter. This was reinitializing the Mangler after we moved the
Mangler down to TLOF and causing us to have two different unnamed
global values accessed with the same name.

This should fix the problems on the ubsan tests here:
http://lab.llvm.org:8011/builders/clang-cmake-mips/builds/15307

llvm-svn: 282675
2016-09-29 02:03:52 +00:00
Eric Christopher
1279434d8e Update comment about initializing TLOF with a pointer at the previous
line or the other commented out place.

llvm-svn: 282673
2016-09-29 02:03:47 +00:00
Matt Arsenault
c8493c6153 AMDGPU: Partially fix control flow at -O0
Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667
2016-09-29 01:44:16 +00:00
Lei Liu
51c8520dd3 AArch64: Set shift bit of TLSLE HI12 add instruction
Summary: AArch64 LLVM assembler emits add instruction without shift bit to calculate the higher 12-bit address of TLS variables in local exec model.  This generates wrong code sequence to access TLS variables with thread offset larger than 0x1000.

Reviewers: t.p.northover, peter.smith, rovka

Subscribers: salim.nasser, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D24702

llvm-svn: 282661
2016-09-29 01:05:48 +00:00
Quentin Colombet
0a4e3ffa81 [RegisterBankInfo] Uniquely generate OperandsMapping.
This is a step toward statically allocate InstructionMapping. Like the
previous few commits, the goal is to move toward a TableGen'ed like
structure with no dynamic allocation at all.

This should already improve compile time by getting rid of a bunch of
memmove of SmallVectors.

llvm-svn: 282643
2016-09-28 22:20:49 +00:00
Konstantin Zhuravlyov
ebf3beb03f [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125

llvm-svn: 282624
2016-09-28 20:05:39 +00:00
Artem Belevich
ed0bd7024b [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.
These are only available on sm_60+ GPUs.

Differential Revision: https://reviews.llvm.org/D24943

llvm-svn: 282607
2016-09-28 17:25:38 +00:00
Nirav Dave
1f7d22e77d Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Dylan McKay
95751de394 [AVR] Rename the builtin calling convention names
'BUILTIN' is clearer than 'RT' in this context.

llvm-svn: 282602
2016-09-28 16:04:40 +00:00
Marina Yatsina
269811ad82 [x86] Accept 'retn' as an alias to 'ret[lqw]'\'ret' (At&t\Intel)
Implement 'retn' simply by aliasing it to the relevant 'ret' instruction

Commit on behalf of coby

Differential Revision: https://reviews.llvm.org/D24346

llvm-svn: 282601
2016-09-28 15:52:56 +00:00
Nirav Dave
080cb64e9c In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  Whem merging stores, search up the chain through a single load, and
  finds all possible stores by looking down from through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Dylan McKay
aeb4afb742 [AVR] Import the LLVM namespace inside AVRMCTargetDesc.cpp
llvm-svn: 282598
2016-09-28 15:35:26 +00:00
Dylan McKay
f032b425d8 [AVR] Add AVRMCTargetDesc.cpp
Summary:
This adds the AVRMCTargetDesc file in tree. It allows creation of the
core classes used in the backend.

Reviewers: arsenm, kparzysz

Subscribers: wdng, beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25023

llvm-svn: 282597
2016-09-28 15:31:12 +00:00
Dylan McKay
2ba7cbf4bc [AVR] Update the signature of createAVRAsmBackend
It has been recently changed to also take a MCTargetOptions structure.

llvm-svn: 282594
2016-09-28 14:35:07 +00:00
Dylan McKay
b456ea33de [AVR] Enable the assembly parser
We very recently landed the code. This commit enables the parser.

It also adds a missing include to AVRAsmParser.cpp

llvm-svn: 282593
2016-09-28 14:34:42 +00:00
Dylan McKay
aabba06f13 [AVR] Merge most recent changes to AVRInstrInfo.td
This adds two new things:

- Operand types per fixup
- Atomic pseudo operations

llvm-svn: 282588
2016-09-28 13:44:02 +00:00
Dylan McKay
56d1404eab [AVR] Update the data layout
The previous data layout caused issues when dealing with atomics.

Foe example, it is illegal to load a 16-bit value with less than 16-bits
of alignment.

This changes the data layout so that all types are aligned by at least
their own width.

Interestingly, this also _slightly_ decreased register pressure in some
cases.

llvm-svn: 282587
2016-09-28 13:29:10 +00:00
Dylan McKay
6c6ddcf476 [AVR] Add assembly parser
Summary: This patch adds the AVRAsmParser library.

Reviewers: arsenm, kparzysz

Subscribers: wdng, beanz, mgorny, kparzysz, simoncook, jtbandes, llvm-commits

Differential Revision: https://reviews.llvm.org/D20046

llvm-svn: 282584
2016-09-28 13:02:57 +00:00
Guy Blank
4de6e44893 [X86][FastISel] Use a COPY from K register to a GPR instead of a K operation
The KORTEST was introduced due to a bug where a TEST instruction used a K register.
but, turns out that the opposite case of KORTEST using a GPR is now happening

The change removes the KORTEST flow and adds a COPY instruction from the K reg to a GPR.

Differential Revision: https://reviews.llvm.org/D24953

llvm-svn: 282580
2016-09-28 11:22:17 +00:00
Simon Pilgrim
09aa1e4358 Strip trailing whitespace
llvm-svn: 282579
2016-09-28 11:08:00 +00:00
Jonas Paulsson
3c5fa71cd5 [SystemZ] Implementation of getUnrollingPreferences().
This commit enables more unrolling for SystemZ by implementing the
SystemZTargetTransformInfo::getUnrollingPreferences() method.

It has been found that it is better to only unroll moderately, so the
DefaultUnrollRuntimeCount has been moved into UnrollingPreferences in order
to set this to a lower value for SystemZ (4).

Reviewers: Evgeny Stupachenko, Ulrich Weigand.
https://reviews.llvm.org/D24451

llvm-svn: 282570
2016-09-28 09:41:38 +00:00
Quentin Colombet
248c18cc50 [AArch64][RegisterBankInfo] Switch to statically allocated ValueMapping.
Another step toward TableGen'ed like structure for the RegisterBankInfo
of AArch64. By doing this, we also save a bit of compile time for the
exact same output.

llvm-svn: 282550
2016-09-27 22:55:04 +00:00
Quentin Colombet
8c34b748a9 [AArch64][RegisterBankInfo] Fix copy/paste in comments.
NFC.

llvm-svn: 282549
2016-09-27 22:54:57 +00:00
Sanjay Patel
f1f848bbad [x86] add folds for FP logic with vector zeros
The 'or' case shows up in copysign. The copysign code also had 
redundant checking for a scalar zero operand with 'and', so I 
removed that. 

I'm not sure how to test vector 'and', 'andn', and 'xor' yet, 
but it seems better to just include all of the logic ops since
we're fixing 'or' anyway.

llvm-svn: 282546
2016-09-27 22:28:13 +00:00
Geoff Berry
8e556287ce [TargetRegisterInfo, AArch64] Add target hook for isConstantPhysReg().
Summary:
The current implementation of isConstantPhysReg() checks for defs of
physical registers to determine if they are constant.  Some
architectures (e.g. AArch64 XZR/WZR) have registers that are constant
and may be used as destinations to indicate the generated value is
discarded, preventing isConstantPhysReg() from returning true.  This
change adds a TargetRegisterInfo hook that overrides the no defs check
for cases such as this.

Reviewers: MatzeB, qcolombet, t.p.northover, jmolloy

Subscribers: junbuml, aemerson, mcrosier, rengolin

Differential Revision: https://reviews.llvm.org/D24570

llvm-svn: 282543
2016-09-27 22:17:27 +00:00
Sanjay Patel
139fefcb70 [x86] use isNullFPConstant(); NFCI
Also, put the related FP logic functions together to see the similarities. 

llvm-svn: 282522
2016-09-27 18:48:02 +00:00
Krzysztof Parzyszek
6ba0d72f22 [RDF] Add "dead" flag to node attributes
llvm-svn: 282520
2016-09-27 18:24:33 +00:00
Krzysztof Parzyszek
dd91b47def [RDF] Special treatment of exception handling registers
A landing pad can have live-in registers that are defined by the runtime,
not the program (exception pointer register and exception selector
register). Make sure to recognize that case and not link these registers
with any defs in the program.
Each landing pad will have phi nodes added at the beginning to provide
definitions of these registers, but the uses of those phi nodes will not
have any reaching defs.

llvm-svn: 282519
2016-09-27 18:18:44 +00:00
Konstantin Zhuravlyov
57914efac5 [AMDGPU] Enable changing instprinter's behavior based on the per-function
subtarget

This is a prerequisite for coming waitcnt changes

Differential Revision: https://reviews.llvm.org/D24939

llvm-svn: 282489
2016-09-27 14:42:48 +00:00
Simon Dardis
1e548fbd1e [mips] Disable tail calls temporarily
Disable tail calls while the remaining bugs are fixed. Enable only for tests.

Reviewers: vkalintiris

Differential Review: https://reviews.llvm.org/D24912

llvm-svn: 282487
2016-09-27 13:15:54 +00:00
Simon Dardis
1a8d0f68c8 [mips] Add rsqrt, recip for MIPS
Add rsqrt.[ds], recip.[ds] for MIPS. Correct the microMIPS definitions for
architecture support and register usage.

Reviewers: vkalintiris, zoran.jovanoic

Differential Review: https://reviews.llvm.org/D24499

llvm-svn: 282485
2016-09-27 12:25:15 +00:00
Nemanja Ivanovic
bf003f8d13 [Power9] Builtins for ELF v.2 API conformance - back end portion
This patch corresponds to review:
https://reviews.llvm.org/D24396

This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero instructions"
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.

llvm-svn: 282478
2016-09-27 08:42:12 +00:00
Craig Topper
90082c1e67 [X86] Use std::max to calculate alignment instead of assuming RC->getSize() will not return a value greater than 32. I think it theoretically could be 64 for AVX-512.
llvm-svn: 282471
2016-09-27 06:44:25 +00:00
Davide Italiano
836f9d2e4e [CodeGen] Add support for emitting .init_array instead of .ctors on FreeBSD.
PR: 30494
llvm-svn: 282451
2016-09-26 22:53:15 +00:00
Derek Schuff
831613e75c [WebAssembly] Use the frame pointer instead of the stack pointer
When we have dynamic allocas we have a frame pointer, and
when we're lowering frame indexes we should make sure we use it.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D24889

llvm-svn: 282442
2016-09-26 21:18:03 +00:00
Nirav Dave
2a4049ccf6 Add support for Code16GCC
[X86] The .code16gcc directive parses X86 assembly input in 32-bit mode and
outputs in 16-bit mode. Teach parser to switch modes appropriately.

Reviewers: dwmw2, craig.topper

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20109

llvm-svn: 282430
2016-09-26 19:33:36 +00:00
Andrew Kaylor
fce59ef3ff Add optimization bisect support to an optional Mips pass
Differential Revision: https://reviews.llvm.org/D19513

llvm-svn: 282428
2016-09-26 19:05:37 +00:00
Tom Stellard
6e3f633648 AMDGPU/SI: Don't crash on anonymous GlobalValues
Summary:
We need to call AsmPrinter::getNameWithPrefix() in order to handle
anonymous GlobalValues (e.g. @0, @1).

Reviewers: arsenm, b-sumner

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D24865

llvm-svn: 282420
2016-09-26 17:29:25 +00:00
Geoff Berry
0e859047b9 [AArch64] Improve add/sub/cmp isel of uxtw forms.
Don't match the UXTW extended reg forms of ADD/ADDS/SUB/SUBS if the
32-bit to 64-bit zero-extend can be done for free by taking advantage
of the 32-bit defining instruction zeroing the upper 32-bits of the X
register destination.  This enables better instruction selection in a
few cases, such as:

  sub x0, xzr, x8
  instead of:
  mov x8, xzr
  sub x0, x8, w9, uxtw

  madd x0, x1, x1, x8
  instead of:
  mul x9, x1, x1
  add x0, x9, w8, uxtw

  cmp x2, x8
  instead of:
  sub x8, x2, w8, uxtw
  cmp x8, #0

  add x0, x8, x1, lsl #3
  instead of:
  lsl x9, x1, #3
  add x0, x9, w8, uxtw

Reviewers: t.p.northover, jmolloy

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D24747

llvm-svn: 282413
2016-09-26 15:34:47 +00:00
Evandro Menezes
44fc35bb88 Add support to optionally limit the size of jump tables.
Many high-performance processors have a dedicated branch predictor for
indirect branches, commonly used with jump tables.  As sophisticated as such
branch predictors are, they tend to have well defined limits beyond which
their effectiveness is hampered or even nullified.  One such limit is the
number of possible destinations for a given indirect branches that such
branch predictors can handle.

This patch considers a limit that a target may set to the number of
destination addresses in a jump table.

Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar
<aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>.

Differential revision: https://reviews.llvm.org/D21940

llvm-svn: 282412
2016-09-26 15:32:33 +00:00
Dylan McKay
c4ee28fdc5 [AVR] Add AVRMCExpr
Summary: This adds the AVRMCExpr headers and implementation.

Reviewers: arsenm, ruiu, grosbach, kparzysz

Subscribers: wdng, beanz, mgorny, kparzysz, jtbandes, llvm-commits

Differential Revision: https://reviews.llvm.org/D20503

llvm-svn: 282397
2016-09-26 11:35:32 +00:00
Sam Kolton
cf735303bc Revert "[AMDGPU] Disassembler: print label names in branch instructions"
This reverts commit 6c6dbe625263ec9fcf8de0df27263cf147cde550.

llvm-svn: 282396
2016-09-26 11:29:03 +00:00
Sam Kolton
4a0cb73585 [AMDGPU] Disassembler: print label names in branch instructions
Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table.

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D24802

llvm-svn: 282394
2016-09-26 10:05:50 +00:00
James Molloy
462f650cad [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.

llvm-svn: 282387
2016-09-26 07:26:24 +00:00
Zvi Rackover
9ca8e1a54b [X86] Optimization for replacing LEA with MOV at frame index elimination time
Summary:
Replace a LEA instruction of the form 'lea (%esp), %ebx' --> 'mov %esp, %ebx'

MOV is preferable over LEA because usually there are more issue-slots available to execute MOVs than LEAs. Latest processors also support zero-latency MOVs.

Fixes pr29022.

Reviewers: hfinkel, delena, igorb, myatsina, mkuper

Differential Revision: https://reviews.llvm.org/D24705

llvm-svn: 282385
2016-09-26 06:42:07 +00:00
Ayman Musa
a9677386d1 [X86][avx512] Fix bug in masked compress store.
Differential Revision: https://reviews.llvm.org/D23984

llvm-svn: 282381
2016-09-26 06:22:08 +00:00
Craig Topper
f91537d654 [X86] Remove what appears to be leftover MMX code involving (v1i64 scalar_to_vector).
llvm-svn: 282361
2016-09-25 16:34:11 +00:00
Craig Topper
fe64841663 [X86] Remove patterns for scalar_to_vector from FR32/FR64 to 256-bit vectors. Lowering explicitly avoids creating this pattern.
llvm-svn: 282360
2016-09-25 16:34:09 +00:00
Craig Topper
7bc5d46631 [AVX-512] Replace get512BitSuperRegister with calls to TargetRegisterInfo::getMatchingSuperReg.
llvm-svn: 282359
2016-09-25 16:34:06 +00:00
Craig Topper
c35f693a30 [AVX-512] Fix some patterns predicates to properly enforce priority for various versions of CVTDQ2PD instruction.
llvm-svn: 282358
2016-09-25 16:34:02 +00:00
Craig Topper
8e5bd2f3f8 [AVX-512] Add rounding versions of instructions to hasUndefRegUpdate.
llvm-svn: 282357
2016-09-25 16:33:59 +00:00
Craig Topper
9977467fdd [AVX-512] Add the scalar unsigned integer to fp conversion instructions to hasUndefRegUpdate.
llvm-svn: 282356
2016-09-25 16:33:57 +00:00
Craig Topper
1f56afd8eb [AVX-512] Remove duplicate instructions for converting integer to scalar floating point. We can use patterns to point to the other instructions instead.
llvm-svn: 282355
2016-09-25 16:33:53 +00:00
Craig Topper
0ccdd06243 [AVX-512] Don't use two opcodes for INTR_TYPE_SCALAR_MASK_RM. The handling was such that if the second opcode was present the first was ingored, so we can just have one opcode.
llvm-svn: 282344
2016-09-25 01:03:10 +00:00
Craig Topper
bab9b76656 [X86] Teach combineShuffle to avoid creating floating point operations with integer types and integer operations with floating point types. Seems isOperationLegal lies for mismatched types and operations.
Fixes PR30511.

llvm-svn: 282341
2016-09-24 21:42:49 +00:00
Craig Topper
306ed8539b [AVX-512] Split scalar version of X86ISD::SELECT into a separate opcode because isel is not robust with multiple type profiles for the same opcode.
llvm-svn: 282340
2016-09-24 21:42:47 +00:00
Craig Topper
eddbef5520 [AVX-512] Remove the patterns for selecting scalar VCOMI/VUCOMI instructions with SAE as there is no way to create the pattern.
llvm-svn: 282339
2016-09-24 21:42:43 +00:00
Sanjay Patel
da8aa7cf2e [x86] don't try to create a vector integer inst for an SSE1 target (PR30512)
This bug was introduced with:
http://reviews.llvm.org/rL272511

We need to restrict the lowering to v4f32 comparisons because that's all SSE1 can handle.

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=28044

llvm-svn: 282336
2016-09-24 20:24:06 +00:00
Dylan McKay
87118e6a0f [AVR] Update signature of AVRTargetObjectFile::SelectSectionForGlobal
It was changed recently, and was breaking compilation of the backend.

llvm-svn: 282329
2016-09-24 11:38:08 +00:00
Quentin Colombet
3708fb9829 [RegisterBankInfo] Uniquely generate ValueMapping.
This is a step toward statically allocate ValueMapping. Like the
previous few commits, the goal is to move toward a TableGen'ed like
structure with no dynamic allocation at all.

llvm-svn: 282324
2016-09-24 04:53:52 +00:00
Sanjay Patel
07ded31922 [x86] fix FCOPYSIGN lowering to create constants instead of ConstantPool loads
This is similar to:
https://reviews.llvm.org/rL279958

By not prematurely lowering to loads, we should be able to more easily eliminate
the 'or' with zero instructions seen in copysign-constant-magnitude.ll.

We should also be able to extend this code to handle vectors.

llvm-svn: 282312
2016-09-23 23:17:29 +00:00
Valery Pykhtin
48873e2d24 [AMDGPU] Fix for bz30427: wrong MTBUF encoding on VI
Differential revision: https://reviews.llvm.org/D24875

llvm-svn: 282296
2016-09-23 21:21:21 +00:00
James Molloy
1d4f73e415 Revert "[ARM] Promote small global constants to constant pools"
This reverts commit r282241. It caused http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/19882.

llvm-svn: 282249
2016-09-23 13:35:43 +00:00
Nemanja Ivanovic
5779a8862c [Power9] Exploit move and splat instructions for build_vector improvement
This patch corresponds to review:
https://reviews.llvm.org/D21135

This patch exploits the following instructions:
mtvsrws
lxvwsx
mtvsrdd
mfvsrld

In order to improve some build_vector and extractelement patterns.

llvm-svn: 282246
2016-09-23 13:25:31 +00:00
James Molloy
2b4c2966f2 [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.

llvm-svn: 282241
2016-09-23 12:15:58 +00:00
Valery Pykhtin
8bae56ae1d [AMDGPU] Refactor VOP1 and VOP2 instruction TD definitions
Differential revision: https://reviews.llvm.org/D24738

llvm-svn: 282234
2016-09-23 09:08:07 +00:00
Craig Topper
8e59adb433 [AVX-512] Split X86ISD::VFPROUND and X86ISD::VFPEXT into separate opcodes for each type constraint.
This revealed that scalar intrinsics could create nodes with a rounding mode of FROUND_CUR_DIRECTION, but the patterns didn't check for it. It just worked because isel doesn't check operand count and we had a pattern without the rounding mode argument at all.

llvm-svn: 282231
2016-09-23 06:24:43 +00:00
Craig Topper
0358026c76 [AVX-512] Add separate ISD opcodes for each form of CVT instructions. Don't reuse non-X86 ISD opcodes with extra X86 specific arguments.
llvm-svn: 282230
2016-09-23 06:24:39 +00:00
Craig Topper
f8270c27ed [AVX-512] Use different ISD opcodes for some of the scalar intrinsic lowering. Isel is not very robust against using the same ISD opcode with different number of operands so its better to separate.
llvm-svn: 282229
2016-09-23 06:24:35 +00:00
Tom Stellard
a377ff1511 AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_size
Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D24835

llvm-svn: 282223
2016-09-23 01:33:26 +00:00
Quentin Colombet
ad7834b866 [AArch64][RegisterBankInfo] Sanity check TableGen'ed like inputs.
Make sure the entries written to mimic the behavior of TableGen are
sane.

llvm-svn: 282220
2016-09-23 00:59:07 +00:00
Quentin Colombet
a1311e08f9 [AArch64][RegisterBankInfo] Switch to TableGen'ed like PartialMapping.
Statically instanciate the most common PartialMappings. This should
be closer to what the code would look like when TableGen support is
added for GlobalISel. As a side effect, this should improve compile
time.

llvm-svn: 282215
2016-09-23 00:14:36 +00:00
Quentin Colombet
2a450b9923 [RegisterBankInfo] Use array instead of SmallVector for BreakDown.
This is another step toward TableGen'ed like structures. The BreakDown of
the mapping of the value will be statically computed by TableGen, thus
we only have to point to the right entry in the table instead of
dynamically allocate the mapping for each instruction.

We still support the dynamic allocation through a factory of
PartialMapping to ease the bring-up of the targets while the TableGen
backend is not available.

llvm-svn: 282213
2016-09-23 00:14:30 +00:00
Krzysztof Parzyszek
a28e3dd9fd [RDF] Add initial support for lane masks in the DFG
Use lane masks for calculating covering and aliasing of register
references.

llvm-svn: 282194
2016-09-22 21:01:24 +00:00
Krzysztof Parzyszek
047695086f [Hexagon] Remove USR_OVF from CtrRegs register class
USR_OVF is a subregister of USR, which is a member of CtrRegs. Having both
a register and its proper subregister in the same register class has bad
consequences for lane mask calculation: based solely on the lane mask info,
USR_OVF would not appear to be a subregister of USR.

llvm-svn: 282192
2016-09-22 20:59:41 +00:00
Krzysztof Parzyszek
4e87f8e468 [RDF] Print the function name for calls in dumps
llvm-svn: 282191
2016-09-22 20:58:19 +00:00
Krzysztof Parzyszek
38eef5294c [RDF] Use uint32_t for register numbers instead of unsigned
llvm-svn: 282190
2016-09-22 20:56:39 +00:00
Arnold Schwaighofer
115971c664 i386 does not support optimized swifterror handling
rdar://28432565

llvm-svn: 282186
2016-09-22 20:06:25 +00:00
Hans Wennborg
d1cad41742 Win64: Don't emit unwind info for "leaf" functions (PR30337)
According to MSDN (see the PR), functions which don't touch any callee-saved
registers (including %rsp) don't need any unwind info.

This patch makes LLVM not emit unwind info for such functions, to save
binary size.

Differential Revision: https://reviews.llvm.org/D24748

llvm-svn: 282185
2016-09-22 19:50:05 +00:00
Nemanja Ivanovic
07dc6b6937 [PowerPC] Sign extend sub-word values for atomic comparisons
Atomic comparison instructions use the sub-word load instruction on
Power8 and up but the value is not sign extended prior to the signed word
compare instruction. This patch adds that sign extension.

llvm-svn: 282182
2016-09-22 19:06:38 +00:00
Krzysztof Parzyszek
344cd70c0b [PPC] Set SP after loading data from stack frame, if no red zone is present
Follow-up to r280705: Make sure that the SP is only restored after all data
is loaded from the stack frame, if there is no red zone.

This completes the fix for https://llvm.org/bugs/show_bug.cgi?id=26519.

Differential Revision: https://reviews.llvm.org/D24466

llvm-svn: 282174
2016-09-22 17:22:43 +00:00
Tim Northover
3841e1ae81 GlobalISel: handle stack-based parameters on AArch64.
llvm-svn: 282153
2016-09-22 13:49:25 +00:00
Artem Tamazov
0f7a51941f [AMDGPU][mc] Add support for absolute expressions in DPP modifiers.
Also added range checking for DPP attributes.
Assembler tests added as well.

Differential Revision: https://reviews.llvm.org/D24755

llvm-svn: 282145
2016-09-22 11:47:21 +00:00
Nemanja Ivanovic
90c1d15d13 [PowerPC] Remove LE patterns matching generic stores/loads to VSX permuting ops
This patch corresponds to:
https://reviews.llvm.org/D21409

The LXVD2X, LXVW4X, STXVD2X and STXVW4X instructions permute the two doublewords
in the vector register when in little-endian mode. Custom code ensures that the
necessary swaps are inserted for these. This patch simply removes the possibilty
that a load/store node will match one of these instructions in the SDAG as that
would not insert the necessary swaps.

llvm-svn: 282144
2016-09-22 10:32:03 +00:00
Nemanja Ivanovic
a22f68829a [Power9] Add exploitation of non-permuting memory ops
This patch corresponds to review:
https://reviews.llvm.org/D19825

The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instruction have a predicate that
ensures they won't be selected on Power9 and newer CPUs.

llvm-svn: 282143
2016-09-22 09:52:19 +00:00
Craig Topper
daf6088084 [AVX-512] Add support for commuting VPTERNLOG instructions.
VPTERNLOG is a ternary instruction with an immediate specifying the logical operation to perform. For each bit position in the 3 source vectors the bit from each source is concatenated together and the resulting 3-bit value is used to select a bit in the immediate. This bit value is written to the result vector.

We can commute this by swapping operands and modifying the immediate. To modify the immediate we need to swap two pairs of bits. The pairs correspond to the locations in the immediate where the commuted operands bits have opposite values and the uncommuted operand has the same value. Bits 0 and 7 will never be swapped since the relevant bits from all sources are the same value.

This refactors and reuses parts of the FMA3 commuting code which is also a three operand instruction.

llvm-svn: 282132
2016-09-22 03:00:50 +00:00
Quentin Colombet
aefcf654f5 [RegisterBankInfo] Move to statically allocated RegisterBank.
This commit is basically the first step toward what will
RegisterBankInfo look when it gets TableGen'ed.

It introduces a XXXGenRegisterBankInfo.def file that is what TableGen
will issue at some point. Moreover, the RegBanks field in
RegisterBankInfo changed to reflect the static (compile time) aspect of
the information.

llvm-svn: 282131
2016-09-22 02:10:37 +00:00
Artem Tamazov
1aa913cd20 [AMDGPU][mc] Add support for ds_add_[rtn_]f32.
Lit tests added.
Resolves https://github.com/RadeonOpenCompute/hcc/issues/122.

Differential Revision: https://reviews.llvm.org/D24765

llvm-svn: 282086
2016-09-21 16:35:44 +00:00
Nico Weber
e87f9a86c2 Revert r281715, it caused PR30475
llvm-svn: 282076
2016-09-21 15:33:24 +00:00
Tim Northover
588e105a97 GlobalISel: produce correct code for signext/zeroext ABI flags.
We still don't really have an equivalent of "AssertXExt" in DAG, so we don't
exploit the guarantees on the receiving side yet, but this should produce
conservatively correct code on iOS ABIs.

llvm-svn: 282069
2016-09-21 12:57:45 +00:00
Tim Northover
6af161c8b2 GlobalISel: pass Function to lowerFormalArguments directly (NFC).
The only implementation that exists immediately looks it up anyway, and the
information is needed to handle various parameter attributes (stored on the
function itself).

llvm-svn: 282068
2016-09-21 12:57:35 +00:00
Sam Kolton
911939a43c [AMDGPU] Assembler: remove unused AMDGPUMCObjectWriter.
Summary: It is replaced by AMDGPUELFObjectWriter

Reviewers: tstellarAMD, vpykhtin, artem.tamazov

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl

Differential Revision: https://reviews.llvm.org/D24654

llvm-svn: 282065
2016-09-21 10:33:32 +00:00
Simon Dardis
55f9daf6b9 [mips] LLVM PR/30197 - Tail call incorrectly clobbers arguments for mips
The postRA scheduler performs alias analysis to determine if stores and loads
can moved past each other. When a function has more arguments than argument
registers for the calling convention used, excess arguments are spilled onto the
stack. LLVM by default assumes that argument slots are immutable, unless the
function contains a tail call. Without the knowledge of that a function contains
a tail call site, stores and loads to fixed stack slots may be re-ordered
causing the out-going arguments to clobber the incoming arguments before the
incoming arguments are supposed to be dead.

Reviewers: vkalintiris

Differential Review: https://reviews.llvm.org/D24077

llvm-svn: 282063
2016-09-21 09:43:40 +00:00
Diana Picus
5feb684c69 Revert "AArch64: Set shift bit of TLSLE HI12 add instruction"
This reverts commit r282057 because it broke the buildbots - see e.g.
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-42vma/builds/12063

llvm-svn: 282058
2016-09-21 08:24:41 +00:00
Lei Liu
d6e7064c9c AArch64: Set shift bit of TLSLE HI12 add instruction
Summary: AArch64 LLVM assembler emits add instruction without shift bit to calculate the higher 12-bit address of TLS variables in local exec model.  This generates wrong code sequence to access TLS variables with thread offset larger than 0x1000.

Reviewers: t.p.northover, peter.smith, rovka

Subscribers: salim.nasser, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D24702

llvm-svn: 282057
2016-09-21 07:41:41 +00:00
Craig Topper
4e1c2ac20e [AVX-512] Split the 3 different usages of the X86ISD::FSETCC opcode into 3 different opcodes.
It turns out isel is really not robust against having different type profiles for the same opcode. It turns out that if you put an illegal rounding mode(i.e. not CUR_DIRECTION or NO_EXC) on a comiss intrinsic we would generate the FSETCC form with the rounding mode added, but then pattern match to an instruction with ROUND_CUR_DIRECTION.

We can probably get away with just one FSETCCM opcode that always contains the rounding mode and explicitly put ROUND_CUR_DIRECTION in the pattern, but I'll leave that for future work.

With this change the clang tests for the comiss intrinsics that used an incorrect rounding mode of 3 properly fail isel instead of silently doing the wrong thing. Those clang tests will be fixed in a follow up commit and I also plan to add rounding mode checking to clang.

llvm-svn: 282055
2016-09-21 06:37:54 +00:00
Craig Topper
e611d84dea [AVX-512] Don't add an additional rounding mode operand to the avx512 vcvtps2ph intrinsic lowering.
There was no way to control its value so it was always FROUND_CURRENT making it unnecessary. The true rounding mode is encoded in the immediate operand of the instruction.

This also removes the pattern from the rb form of the instructions since there is no way to specify the FROUND_NO_EXC rounding mode it required.

llvm-svn: 282052
2016-09-21 03:58:44 +00:00
Craig Topper
241989dad6 [AVX-512] Simplify handling of INTR_TYPE_1OP_MASK_RM to remove support for the second opcode since its never used. This makes it consistent with INTR_TYPE_2OP_MASK_RM and INTR_TYPE_3OP_MASK_RM.
And even if it was used we were passing the same operands to both so it wouldn't make sense to have two opcodes.

llvm-svn: 282051
2016-09-21 03:58:41 +00:00
Craig Topper
9e4f5327ec [AVX-512] Don't lower avx512 vcvtps2ph/vcvtph2ps nodes to ISD::FP16_TO_FP/ISD::FP_TO_FP16 with an extra x86 specific rounding mode operand. We should use a target specific ISD opcode.
llvm-svn: 282046
2016-09-21 02:05:22 +00:00
Jacques Pienaar
92806722a6 [NVPTX] Check if callsite is defined when computing argument allignment
Summary: In getArgumentAlignment check if the ImmutableCallSite pointer CS is non-null before dereferencing. If CS is 0x0 fall back to the ABI type alignment else compute the alignment as before.

Reviewers: eliben, jpienaar

Subscribers: jlebar, vchuravy, cfe-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D9168

llvm-svn: 282045
2016-09-21 01:57:57 +00:00
Eric Christopher
0310283aeb Remove the default subtarget from the x86 port as it isn't necessary (or
correct) anymore.

llvm-svn: 282031
2016-09-20 22:19:33 +00:00
Eric Christopher
94d50bf6e6 Revert "Remove extra argument used once on
TargetMachine::getNameWithPrefix and inline the result into the singular
caller." and "Remove more guts of TargetMachine::getNameWithPrefix and
migrate one check to the TLOF mach-o version." temporarily until I can
get the whole call migrated out of the TargetMachine as we could hit
places where TLOF isn't valid.

This reverts commits r281981 and r281983.

llvm-svn: 282028
2016-09-20 22:03:28 +00:00
Evandro Menezes
ac70555f2f Revert part of "AArch64: Do not test for CPUs, use SubtargetFeatures"
This reverts part of commit 119e358d9635c8d1f3e7aee67e3ea3b8a62f8db6 by
removing FeatureUseRSqrt et al per request by Eric Christopher
<echristo@gmail.com> (v. http://bit.ly/2cmz6kW).

llvm-svn: 282001
2016-09-20 19:02:09 +00:00
Evandro Menezes
1c47b35a01 Revert "[AArch64] Use the reciprocal estimation machinery"
This reverts commit b7d42b0048f65346e9fa37fb65defeea7ce8c337 per request by
Eric Christopher <echristo@gmail.com> (v. http://bit.ly/2cmz6kW).

llvm-svn: 282000
2016-09-20 19:02:06 +00:00
Evandro Menezes
27cb59936a Revert "[AArch64] Properly validate the reciprocal estimation."
This reverts commit ad8ca1528242e2a4cb363e3779309e70eb7a430e per request by
Eric Christopher <echristo@gmail.com> (v. http://bit.ly/2cmz6kW).

llvm-svn: 281999
2016-09-20 19:02:02 +00:00
Saleem Abdulrasool
f0f8609e1a X86: loosen an overly aggressive MachO assertion
We would assert that the FP setup CFI used esp/rsp always.  This held up in
practice when the code was generated from IR.  However, with the integrated
assembler, it is possible to have the input be user specified assembly.  In such
a case, we cannot assume that the function implementation has a compact unwind
representation.  Loosen the assertion into a check and bail if we cannot
represent the frame pointer in the compact unwinding.

Addresses PR30453!

llvm-svn: 281986
2016-09-20 17:05:04 +00:00
Eric Christopher
68d7d407a3 Remove more guts of TargetMachine::getNameWithPrefix and migrate one check to the TLOF mach-o version.
NFC intended.

llvm-svn: 281983
2016-09-20 16:05:02 +00:00
Eric Christopher
a7d9b53018 Remove a use of subtarget initialization in the X86 backend so we can get rid of the default subtarget.
NFC intended.

llvm-svn: 281982
2016-09-20 16:04:59 +00:00
Eric Christopher
a005027a00 Remove extra argument used once on TargetMachine::getNameWithPrefix and inline the result into the singular caller.
llvm-svn: 281981
2016-09-20 16:04:50 +00:00
Tim Northover
c5866724c9 GlobalISel: split aggregates for PCS lowering
This should match the existing behaviour for passing complicated struct and
array types, in particular HFAs come through like that from Clang.

For C & C++ we still need to somehow support all the weird ABI flags, or at
least those that are present in the IR (signext, byval, ...), and stack-based
parameter passing.

llvm-svn: 281977
2016-09-20 15:20:36 +00:00
Elena Demikhovsky
943a1dadb1 AVX-512: Fixed a bug in lowering saturated operations on KNL.
The generated code is still not optimal.

Differential Revision: https://reviews.llvm.org/D24723

llvm-svn: 281966
2016-09-20 11:02:26 +00:00
Valery Pykhtin
b5ec8d7e35 [AMDGPU] Refactor VOP3 instruction TD definitions
Differential revision: https://reviews.llvm.org/D24664

llvm-svn: 281965
2016-09-20 10:41:16 +00:00
Craig Topper
d9c4e58523 [AVX-512] Teach X86InstrInfo::copyPhysReg to use a 512-bit move if XMM16-XMM31 or YMM16-YMM31 are the source or dest of the copy and VLX is not supported.
This can happen with SUBREG_TO_REG of ZMM16-ZMM31. Fixes PR30430.

llvm-svn: 281959
2016-09-20 06:49:17 +00:00
Craig Topper
6651d4f318 [AVX-512] Use 512-bit vcvtps2ph/vcvtph2ps to implement fp_to_f16/f16_to_fp when F16C and VLX are not supported.
Fixes PR23941.

llvm-svn: 281958
2016-09-20 05:44:47 +00:00
Sanjay Patel
f5fe06aed8 [x86] fix variable names; NFC
llvm-svn: 281953
2016-09-20 00:27:22 +00:00
Sanjay Patel
f2d341bf96 [x86] use getSignBit() to simplify code; NFCI
llvm-svn: 281944
2016-09-19 22:07:27 +00:00
Valery Pykhtin
9c39968ce1 [AMDGPU] Refactor VOPC instruction TD definitions
Differential Revision: https://reviews.llvm.org/D24546

llvm-svn: 281903
2016-09-19 14:39:49 +00:00
Diana Picus
0020aa4a64 [AArch64] Fix encoding for lsl #12 in add/sub immediates
Whenever an add/sub immediate needs a fixup, we set that immediate field to zero,
which is correct, but we also set the shift bits to zero, which is not true for
instructions that use lsl #12. This patch makes sure that if lsl #12 was used,
it will appear in the encoding of the instruction.

Differential Revision: https://reviews.llvm.org/D23930

llvm-svn: 281898
2016-09-19 11:10:18 +00:00
Sam Kolton
dc0750f4ac [AMDGPU] Fix s_branch with -1 offset
Summary:
In case s_branch instruction target is itself backend should emit offset -1 but instead it emit 0.
'''
label:
    s_branch label  // should emit [0xff,0xff,0x82,0xbf]
'''

Tom, Matt: why are we adjusting fixup values in applyFixup() method instead of processFixup()? processFixup() is calling adjustFixupValue() but does nothing with its result.

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl

Differential Revision: https://reviews.llvm.org/D24671

llvm-svn: 281896
2016-09-19 10:20:55 +00:00
Oliver Stannard
07a1bb0354 [Thumb] Set correct initial mapping symbol for big-endian thumb
The initial mapping symbol state is set from the triple, but we only checked
for the little-endian thumb triple, so could end up with an ARM mapping symbol
for big-endian thumb.

Differential Revision: https://reviews.llvm.org/D24553

llvm-svn: 281894
2016-09-19 09:21:45 +00:00
Tim Northover
11b2b57958 ARM: check alignment before transforming ldr -> ldm (or similar).
ldm and stm instructions always require 4-byte alignment on the pointer, but we
weren't checking this before trying to reduce code-size by replacing a
post-indexed load/store with them. Unfortunately, we were also dropping this
incormation in DAG ISel too, but that's easy enough to fix.

llvm-svn: 281893
2016-09-19 09:11:09 +00:00
Craig Topper
c7f779bf05 [X86,AVX-512] Use INSERT_SUBREG instead of SUBREG_TO_REG when the input is not the output of an instruction.
SUBREG_TO_REG is supposed to indicate that the super register has been zeroed, but we can't prove that if we don't know where it came from.

llvm-svn: 281885
2016-09-19 02:53:43 +00:00
Craig Topper
9f5d417d08 [AVX-512] Add support for lowering fp_to_f16 and f16_to_fp when VLX is supported regardless of whether F16C is also supported.
Still need to add support for lowering using AVX512F when neither VLX or F16C is supported.

llvm-svn: 281884
2016-09-19 02:53:37 +00:00
Dean Michael Berris
fca2ef5fb0 [XRay] ARM 32-bit no-Thumb support in LLVM
This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:

https://reviews.llvm.org/D23932 (Clang test)
https://reviews.llvm.org/D23933 (compiler-rt)

Differential Revision: https://reviews.llvm.org/D23931

llvm-svn: 281878
2016-09-19 00:54:35 +00:00
Craig Topper
8173f1a5ca [AVX-512] Don't lower CVTPD2PS intrinsics to ISD::FP_ROUND with an X86 rounding mode encoding in the second operand. This immediate should only be 0 or 1 and indicates if the truncation loses precision.
Also enhance an assert in SelectionDAG::getNode to flag this sort of problem in the future.

llvm-svn: 281868
2016-09-18 21:49:32 +00:00
Craig Topper
040e018f7c [AVX-512] Stop lowering avx512_mask_sqrt intrinsics to ISD:FSQRT with a second operand containing an X86 specific rounding mode encoding that doesn't belong.
llvm-svn: 281867
2016-09-18 21:49:28 +00:00
Craig Topper
c385adca9d [X86] Fix typo in comment. NFC
llvm-svn: 281862
2016-09-18 18:59:38 +00:00
Craig Topper
b166011773 [AVX-512] Add memory load patterns for the legacy SSE scalar fp to integer conversion intrinsics to be consistent across all intruction sets.
llvm-svn: 281861
2016-09-18 18:59:36 +00:00
Craig Topper
fca6198042 [AVX-512] Remove COPY_TO_REGCLASS from a few patterns that already had the correct register class.
llvm-svn: 281860
2016-09-18 18:59:33 +00:00
Simon Pilgrim
b15b82c418 [X86][SSE] Improve recognition of uitofp conversions that can be performed as sitofp
With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations.

This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly support on pre-AVX512 hardware).

While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions.

Differential Revision: https://reviews.llvm.org/D24343

llvm-svn: 281852
2016-09-18 12:45:23 +00:00
Simon Pilgrim
8b1084e1d0 [X86][SSE] Improve target shuffle mask extraction
Add ability to extract vXi64 'vzext_movl' masks on 32-bit targets

llvm-svn: 281834
2016-09-17 18:50:54 +00:00
Ron Lieberman
374b9a54f7 [Hexagon] segv while processing SUnit with nullNodePtr
Added BoundaryNode check to isBestZeroLatency function.

llvm-svn: 281825
2016-09-17 16:21:09 +00:00
Matt Arsenault
17a8bb755d AMDGPU: Fix broken FrameIndex handling
We were trying to avoid using a FrameIndex operand in non-pointer
operands in a convoluted way, and would break because of
using TargetFrameIndex. The TargetFrameIndex should only be used
in the case where it makes sense to fold it as part of the addressing
mode, otherwise it requires materialization like a normal constant.
This wasn't working reliably and failed in the added testcase, hitting
the assert when processing the frame index.

The TargetFrameIndex was coming from trying to produce an AssertZext
limiting the maximum stack size. I'm not sure this was correct to begin
with, because it is apparently possible to have a single workitem
dispatch that requires all 4G of private memory.

llvm-svn: 281824
2016-09-17 16:09:55 +00:00
Matt Arsenault
6d10bb9741 AMDGPU: Rename spill operands to match real instruction
llvm-svn: 281823
2016-09-17 15:52:37 +00:00
Matt Arsenault
8c9bd8e031 AMDGPU: Push bitcasts through build_vector
This reduces the number of copies and reg_sequences
when using fp constant vectors. This significantly
reduces the code size in local-stack-alloc-bug.ll

llvm-svn: 281822
2016-09-17 15:44:16 +00:00
Matt Arsenault
4beb31bd8d AMDGPU: Use i64 scalar compare instructions
VI added eq/ne for i64, so use them.

llvm-svn: 281800
2016-09-17 02:02:19 +00:00
Tom Stellard
4715bc3836 AMDGPU/SI: Fix kernel argument ABI for HSA
Summary: i8, i16, and f16 values are not extended to 32-bit in the HSA kernel ABI.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24621

llvm-svn: 281789
2016-09-16 22:20:24 +00:00
Matt Arsenault
ef8518e8b1 AMDGPU: Allow some control flow intrinsics to be CSEd
These clean up some unnecessary or instructions in
cases with complex loops.

In the original testcase I noticed this, the same
or with exec was repeated 5 or 6 times in a row. With
this only one is emitted or sometimes a copy.

llvm-svn: 281786
2016-09-16 22:11:18 +00:00
Tom Stellard
d9bf037744 AMDGPU: Refactor kernel argument lowering
Summary:
The main challenge in lowering kernel arguments for AMDGPU is determing the
memory type of the argument.  The generic calling convention code assumes
that only legal register types can be stored in memory, but this is not the
case for AMDGPU.

This consolidates all the logic AMDGPU uses for deducing memory types into a single
function.  This will make it much easier to support different ABIs in the future.

Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24614

llvm-svn: 281781
2016-09-16 21:53:00 +00:00
Matt Arsenault
6fa6edbc52 AMDGPU: Use SOPK compare instructions
llvm-svn: 281780
2016-09-16 21:41:16 +00:00
Tom Stellard
ca8fbd7138 AMDGPU/SI: Add support for triples with the mesa3d operating system
Summary:
mesa3d will use the same kernel calling convention as amdhsa, but it will
handle everything else like the default 'unknown' OS type.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22783

llvm-svn: 281779
2016-09-16 21:34:26 +00:00
Nirav Dave
66019b69d5 Defer asm errors to post-statement failure
Recommitting after fixing AsmParser initialization and X86 inline asm
error cleanup.

Allow errors to be deferred and emitted as part of clean up to simplify
and shorten Assembly parser code. This will allow error messages to be
emitted in helper functions and be modified by the caller which has
better context.

As part of this many minor cleanups to the Parser:

* Unify parser cleanup on error
* Add Workaround for incorrect return values in ParseDirective instances
* Tighten checks on error-signifying return values for parser functions
  and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are
  now fixed.

These changes should be backwards compatible with current Target Parsers
so long as the error status are correctly returned in appropriate
functions.

Reviewers: rnk, majnemer

Subscribers: aemerson, jyknight, llvm-commits

Differential Revision: https://reviews.llvm.org/D24047

llvm-svn: 281762
2016-09-16 18:30:20 +00:00
Eric Christopher
f55bd9749f Actually remove the Mangler from the AsmPrinter and clean up the places it was "used" but not used.
llvm-svn: 281749
2016-09-16 17:07:23 +00:00
Eric Christopher
001e8b342a Fix a hidden use of grabbing the Mangler from the AsmPrinter and update
accordingly.

llvm-svn: 281748
2016-09-16 17:07:13 +00:00
Ahmed Bougacha
f8a01e9d2c [AArch64][GlobalISel] Add default regbank mapping for int<>FP.
llvm-svn: 281739
2016-09-16 15:12:46 +00:00
Ahmed Bougacha
d60222bb60 [AArch64][GlobalISel] Add default regbank mapping for G_FCMP.
llvm-svn: 281738
2016-09-16 15:12:43 +00:00
Ahmed Bougacha
67c8014905 [AArch64][GlobalISel] Add default regbank mapping for FP ops.
These should have all their operands - even scalars - go on FPR.

llvm-svn: 281737
2016-09-16 15:12:40 +00:00
Ahmed Bougacha
1fd329a7e8 [AArch64][GlobalISel] Add default regbank mappings for mixed-type ops.
We used to only support instructions with same-type operands.
Instead, use the per-register type information to map each
operand more accurately.

llvm-svn: 281734
2016-09-16 14:44:51 +00:00
Simon Dardis
969f6da95d [mips] Fix previous revert r281726.
llvm-svn: 281729
2016-09-16 14:16:23 +00:00
Keith Walker
ed3a815136 Place the lowered phi instruction(s) before the DEBUG_VALUE entry
When a phi node is finally lowered to a machine instruction it is
important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.

Renamed the existing SkipPHIsAndLabels to SkipPHIsLabelsAndDebug to
more fully describe that it also skips debug entries. Then used the
"new" function SkipPHIsAndLabels when the debug information should not
be skipped when placing the lowered "load" instructions so that it is
placed before the debug entries.

Differential Revision: https://reviews.llvm.org/D23760 

llvm-svn: 281727
2016-09-16 14:07:29 +00:00
Simon Dardis
87e1c985be Revert "[mips] Fix aui/daui/dahi/dati for MIPSR6"
This reverts r281724. Still need dsanders to accept this.

llvm-svn: 281726
2016-09-16 13:56:05 +00:00
Simon Dardis
ed71912ab3 [mips] Fix aui/daui/dahi/dati for MIPSR6
For compatiblity with binutils, define these instructions to take
two registers with a 16bit unsigned immediate. Both of the registers
have to be same for dahi and dati.

Reviewers: vkalintiris, dsanders, zoran.jovanovic
 
Differential Review: https://reviews.llvm.org/D21473

llvm-svn: 281724
2016-09-16 13:50:43 +00:00
Sjoerd Meijer
668b8b519c Reverting r281719, this is causing buildbot failures and timeouts again.
llvm-svn: 281722
2016-09-16 13:16:52 +00:00
Ahmed Bougacha
34f6095c93 [AArch64][GlobalISel] Use the generic DefaultMapping as the default.
This lets generic logic handle the common case, instead of having to
implement applyMappingImpl for each instruction.

llvm-svn: 281720
2016-09-16 12:33:34 +00:00
Sjoerd Meijer
d564334693 This is an attempt to reapply r280808: [ARM] Lower UDIV+UREM to UDIV+MLS
(and the same for SREM)

This was causing buildbot failures earlier (time outs in the LNT suite).
However, we haven't been able to reproduce this and are suspecting this
was caused by another (reverted) patch.

llvm-svn: 281719
2016-09-16 12:10:09 +00:00
Eric Liu
f6a6007dcc Trying to fix Mangler memory leak in TargetLoweringObjectFile.
Summary:
`TargetLoweringObjectFile` can be re-used and thus `TargetLoweringObjectFile::Initialize()`
can be called multiple times causing `Mang` pointer memory leak.

Reviewers: echristo

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24659

llvm-svn: 281718
2016-09-16 11:50:57 +00:00
James Molloy
50b5b0ebc9 [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.

llvm-svn: 281715
2016-09-16 10:17:04 +00:00
Eric Christopher
fc58955e10 Move the Mangler from the AsmPrinter down to TLOF and clean up the
TLOF API accordingly.

llvm-svn: 281708
2016-09-16 07:33:15 +00:00
Eric Christopher
297c8907c7 Remove unused function getMang().
llvm-svn: 281707
2016-09-16 07:32:58 +00:00
Evandro Menezes
3204b2357d [AArch64] Support for FP FMA when -ffp-contract=fast
Currently, the machine combiner can proceed matching when -ffast-math is on.
It should also match when only -ffp-contract=fast is specified as was the
case before when DAGCombiner was doing the job.

Patch by: Abderrazek Zaafrani <a.zaafrani@samsung.com>.

Differential Revision: https://reviews.llvm.org/D24366

llvm-svn: 281649
2016-09-15 19:55:23 +00:00
Evgeniy Stepanov
eaeaf64504 Revert "[ARM] Promote small global constants to constant pools"
This reverts r281604, which adds text relocations to ARM binaries.

llvm-svn: 281645
2016-09-15 19:13:32 +00:00
Simon Dardis
84e72c6334 [mips][ias] Enable IAS by default for N64 on Debian mips64el.
Unfortunately we can't enable it for all N64 because it is not yet possible to
distinguish N32 from N64.

N64 has been confirmed to produce identical (within reason) objects to GAS
during stage 2 of compiler recursion on N64-abit Fedora. Unfortunately,
Fedora's triples do not distinguish N32 from N64 so I can't enable it by
default there. I'm currently repeating this testing for Debian mips64el but
it's very unlikely to produce a different result.

Patch by: Daniel Sanders

Reviewers: sdardis

Differential Review: https://reviews.llvm.org/D22678

llvm-svn: 281607
2016-09-15 13:13:01 +00:00
James Molloy
e7d2986a37 [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

llvm-svn: 281604
2016-09-15 12:30:27 +00:00
Tim Northover
ed8959ddd8 GlobalISel: legalize GEP instructions with small offsets.
llvm-svn: 281602
2016-09-15 11:02:19 +00:00
Tim Northover
dd47cd5b6e GlobalISel: relax type constraints on G_ICMP to allow pointers.
llvm-svn: 281600
2016-09-15 10:40:38 +00:00
Tim Northover
337f4de87e GlobalISel: remove "unsized" LLT
It was only really there as a sentinel when instructions had to have precisely
one type. Now that registers are typed, each register really has to have a type
that is sized.

llvm-svn: 281599
2016-09-15 10:09:59 +00:00
Tim Northover
6a9b1a6161 GlobalISel: cache pointer sizes in LLT
Otherwise everything that needs to work out what size they are has to keep a
DataLayout handy, which is a bit silly and very annoying.

llvm-svn: 281597
2016-09-15 09:20:34 +00:00
Matt Arsenault
106c504026 Finish renaming remaining analyzeBranch functions
llvm-svn: 281535
2016-09-14 20:43:16 +00:00
Evgeniy Stepanov
8dd4aa2167 Revert "[ARM] Promote small global constants to constant pools"
Breaks Android tests by introducing text relocations to ARM binaries.

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/25362/steps/run%20asan%20lit%20tests%20%5Barm%2Fbullhead-userdebug%2FMTC20F%5D/logs/stdio

llvm-svn: 281526
2016-09-14 20:02:30 +00:00
Matt Arsenault
6a290fc995 Revert "AMDGPU: Use SOPK compare instructions"
Accidentally committed

llvm-svn: 281514
2016-09-14 18:04:42 +00:00
Matt Arsenault
2196097c5a AMDGPU: Use SOPK compare instructions
llvm-svn: 281513
2016-09-14 18:03:53 +00:00
Matt Arsenault
186940fec6 Make analyzeBranch family of instruction names consistent
analyzeBranch was renamed to use lowercase first, rename
the related set to match.

llvm-svn: 281506
2016-09-14 17:24:15 +00:00
Matt Arsenault
3b94dad7d3 AArch64: Use TTI branch functions in branch relaxation
The main change is to return the code size from
InsertBranch/RemoveBranch.

Patch mostly by Tim Northover

llvm-svn: 281505
2016-09-14 17:23:48 +00:00
Sanjay Patel
d3ca66ea70 [x86] fix formatting; NFC
llvm-svn: 281504
2016-09-14 17:23:18 +00:00
Simon Pilgrim
5f03f4e898 [X86][SSE] Improve recognition of i64 sitofp conversions that can be performed as i32 (PR29078)
Until AVX512DQ we only support i64/vXi64 sitofp conversion as scalars.

This patch sees if the sign bit extends far enough that we can truncate to a i32 type and then perform sitofp without loss of precision.

Differential Revision: https://reviews.llvm.org/D24345

llvm-svn: 281502
2016-09-14 17:15:26 +00:00
Simon Pilgrim
43b58d8ddc [X86][SSE] Don't use PSHUFD directly - lower with generic shuffle
Remove the last user of the old getTargetShuffleNode helpers

llvm-svn: 281499
2016-09-14 17:04:22 +00:00
Sanjay Patel
d9f29ae5f3 getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits(), round 2 ; NFCI
llvm-svn: 281498
2016-09-14 16:54:10 +00:00
Sanjay Patel
bb652cfa0b getVectorElementType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281495
2016-09-14 16:37:15 +00:00
Sanjay Patel
c514c5d741 getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI
llvm-svn: 281493
2016-09-14 16:05:51 +00:00
Matt Arsenault
4086e0fffa AMDGPU: Support folding FrameIndex operands
This avoids test regressions in a future commit.

llvm-svn: 281491
2016-09-14 15:51:33 +00:00
Sanjay Patel
fe75137cbe getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits() ; NFCI
llvm-svn: 281490
2016-09-14 15:43:44 +00:00
Sanjay Patel
5023a2bb82 getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281489
2016-09-14 15:21:00 +00:00
Matt Arsenault
9249507611 AMDGPU: Improve splitting 64-bit bit ops by constants
This addresses a TODO to handle operations besides and. This
also starts eliminating no-op operations with a constant that
can emerge later.

llvm-svn: 281488
2016-09-14 15:19:03 +00:00
James Molloy
30dc709e61 [ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

llvm-svn: 281484
2016-09-14 14:47:27 +00:00
Simon Pilgrim
55a7d3e6b5 [X86][SSE] Removed unused getTargetShuffleNode function
llvm-svn: 281481
2016-09-14 14:30:00 +00:00
Nemanja Ivanovic
05f2f7ea61 Fix code-gen crash on Power9 for insert_vector_elt with variable index (PR30189)
This patch corresponds to review:
https://reviews.llvm.org/D24021

In the initial implementation of this instruction, I forgot to account for
variable indices. This patch fixes PR30189 and should probably be merged into
3.9.1 (I'll open a bug according to the new instructions).

llvm-svn: 281479
2016-09-14 14:19:09 +00:00
Nemanja Ivanovic
816d3f4f3f Adding missing directive for Power9.
There is currently no codegen for Power9 that depends on the directive
so this is NFC for now but will be important in the future. This was
missed in r268950 so I'm adding it now.

llvm-svn: 281473
2016-09-14 14:09:39 +00:00
Simon Pilgrim
a4ae5a2b5a [X86][SSE] Don't blend vector shifts with MOVSS/MOVSD directly, lower from generic shuffle
Shuffle lowering will correctly lower to MOVSS/MOVSD/PBLEND, improving commutation opportunities

llvm-svn: 281471
2016-09-14 14:08:18 +00:00
James Molloy
69a6bea0ca Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently"
This reverts commit r281323. It caused chromium test failures and a selfhost failure.

llvm-svn: 281451
2016-09-14 09:45:28 +00:00
Tim Northover
aedd29a3b9 GlobalISel: mark pointer stores as legal on AArch64.
llvm-svn: 281448
2016-09-14 08:28:54 +00:00
Sjoerd Meijer
54f48e4444 This reapplies r281304. The issue was that I had missed
to copy the new isAdd field in the tablegen data structure.

llvm-svn: 281447
2016-09-14 08:20:03 +00:00