1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 03:23:01 +02:00
Commit Graph

37388 Commits

Author SHA1 Message Date
Ahmed Bougacha
b6c12fe106 [X86] Use LivePhysRegs in X86FixupBWInsts.
Kill-flags, which computeRegisterLiveness uses, are not reliable.
LivePhysRegs is.

Differential Revision: http://reviews.llvm.org/D19472

llvm-svn: 267495
2016-04-26 00:00:48 +00:00
James Y Knight
439f0092c7 [Sparc] Fix double-float fabs and fneg on little endian CPUs.
The SparcV8 fneg and fabs instructions interestingly come only in a
single-float variant. Since the sign bit is always the topmost bit no
matter what size float it is, you simply operate on the high
subregister, as if it were a single float.

However, the layout of double-floats in the float registers is reversed
on little-endian CPUs, so that the high bits are in the second
subregister, rather than the first.

Thus, this expansion must check the endianness to use the correct
subregister.

llvm-svn: 267489
2016-04-25 22:54:09 +00:00
Andrew Kaylor
b9ffd9d2d0 Fix build warning
llvm-svn: 267487
2016-04-25 22:27:30 +00:00
Andrew Kaylor
ce278dfa7a Add optimization bisect opt-in calls for AMDGPU passes
Differential Revision: http://reviews.llvm.org/D19450

llvm-svn: 267485
2016-04-25 22:23:44 +00:00
Andrew Kaylor
e8dd108465 Add optimization bisect opt-in calls for ARM passes
Differential Revision: http://reviews.llvm.org/D19449

llvm-svn: 267480
2016-04-25 22:01:04 +00:00
Andrew Kaylor
3825400f62 Add optimization bisect opt-in calls for AArch64 passes
Differential Revision: http://reviews.llvm.org/D19394

llvm-svn: 267479
2016-04-25 21:58:52 +00:00
Tim Northover
66f8d5ae59 ARM: put extern __thread stubs in a special section.
The linker needs to know that the symbols are thread-local to do its job
properly.

llvm-svn: 267473
2016-04-25 21:12:04 +00:00
Krzysztof Parzyszek
08a24a9751 [Hexagon] Few fixes for exception handling
llvm-svn: 267469
2016-04-25 21:05:19 +00:00
Quentin Colombet
7f8c56085e Re-apply r267206 with a fix for the encoding problem: when the immediate of
log2(Mask) is smaller than 32, we must use the 32-bit variant because the 64-bit
variant cannot encode it. Therefore, set the subreg part accordingly.

[AArch64] Fix optimizeCondBranch logic.

The opcode for the optimized branch does not depend on the size
of the activate bits in the AND masks, but the AND opcode itself.
Indeed, we need to use a X or W variant based on the AND variant
not based on whether the mask fits into the related variant.
Otherwise, we may end up using the W variant of the optimized branch
for 64-bit register inputs!

This fixes the last make check verifier issues for AArch64: PR27479.

llvm-svn: 267465
2016-04-25 20:54:08 +00:00
Matt Arsenault
edc94ff860 AMDGPU/SI: Optimize adjacent s_nop instructions
Use the operand for how long to wait. This is somewhat
distasteful, since it would be better to just emit s_nop
with the right argument in the first place. This would require
changing TII::insertNoop to emit N operands, which would be easy.
Slightly more problematic is the post-RA scheduler and hazard recognizer
represent nops as a single null node, and would require inventing
another way of representing N nops.

llvm-svn: 267456
2016-04-25 19:53:22 +00:00
Matt Arsenault
b60850cb10 AMDGPU: Implement addrspacecast
llvm-svn: 267452
2016-04-25 19:27:24 +00:00
Matt Arsenault
524b24258c AMDGPU: Add queue ptr intrinsic
llvm-svn: 267451
2016-04-25 19:27:18 +00:00
Matt Arsenault
7b1a862756 AMDGPU: Add DAG to debug dump
Also reorder case to match enum order

llvm-svn: 267449
2016-04-25 19:27:09 +00:00
Krzysztof Parzyszek
7bc31501a1 [Hexagon] Register save/restore functions do not follow regular conventions
Do not mark them as modifying any of the volatile registers by default.

llvm-svn: 267433
2016-04-25 17:49:44 +00:00
Jacques Pienaar
ded20a9ee9 [lanai] Expand findClosestSuitableAluInstr check to consider offset register.
Previously findClosestSuitableAluInstr was only considering the base register when checking the current instruction for suitability. Expand check to consider the offset if the offset is a register.

llvm-svn: 267424
2016-04-25 16:41:21 +00:00
Hrvoje Varga
e25aaa57ef [mips][microMIPS] Revert commit r267137
Commit r267137 was the reason for failing tests in LLVM test suite.

llvm-svn: 267419
2016-04-25 15:40:08 +00:00
Zlatko Buljan
eabfffce01 [mips][microMIPS] Revert commit r266977
Commit r266977 was reason for failing LLVM test suite with error message: fatal error: error in backend: Cannot select: t17: i32 = rotr t2, t11 ...

llvm-svn: 267418
2016-04-25 15:34:57 +00:00
Etienne Bergeron
37376d525a Fix incorrect redundant expression in target AMDGPU.
Summary:
The expression is detected as a redundant expression.
Turn out, this is probably a bug.

```
/home/etienneb/llvm/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:306:26: warning: both side of operator are equivalent [misc-redundant-expression]
  if (isSMRD(*FirstLdSt) && isSMRD(*FirstLdSt)) {
```

Reviewers: rnk, tstellarAMD

Subscribers: arsenm, cfe-commits

Differential Revision: http://reviews.llvm.org/D19460

llvm-svn: 267415
2016-04-25 15:06:33 +00:00
Silviu Baranga
6c665bec7d [ARM] Add support for the X asm constraint
Summary:
This patch adds support for the X asm constraint.

To do this, we lower the constraint to either a "w" or "r" constraint
depending on the operand type (both constraints are supported on ARM).

Fixes PR26493

Reviewers: t.p.northover, echristo, rengolin

Subscribers: joker.eph, jgreenhalgh, aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D19061

llvm-svn: 267411
2016-04-25 14:29:18 +00:00
Artem Tamazov
7fa01faba1 [AMDGPU][llvm-mc] s_getreg/setreg* - Add hwreg(...) syntax.
Added hwreg(reg[,offset,width]) syntax.
Default offset = 0, default width = 32.
Possibility to specify 16-bit immediate kept.
Added out-of-range checks.
Disassembling is always to hwreg(...) format.
Tests updated/added.

Differential Revision: http://reviews.llvm.org/D19329

llvm-svn: 267410
2016-04-25 14:13:51 +00:00
Krzysztof Parzyszek
18108f55f4 [Hexagon] Correctly set "Flags" in ELF header
llvm-svn: 267397
2016-04-25 12:49:47 +00:00
Marcin Koscielnicki
5a59940bb3 [PowerPC] [PR27387] Disallow r0 for ADD8TLS.
ADD8TLS, a variant of add instruction used for initial-exec TLS,
currently accepts r0 as a source register.  While add itself supports
r0 just fine, linker can relax it to a local-exec sequence, converting
it to addi - which doesn't support r0.

Differential Revision: http://reviews.llvm.org/D19193

llvm-svn: 267388
2016-04-25 09:24:34 +00:00
Craig Topper
8e26afd797 [X86] Replace a SmallVector used to pass 2 values to an ArrayRef parameter with a fixed size array. NFC
llvm-svn: 267377
2016-04-25 04:30:29 +00:00
Junmo Park
df2e2c23ce Minor code cleanups. NFC.
llvm-svn: 267375
2016-04-25 01:40:54 +00:00
Saleem Abdulrasool
13a27b5a96 ARM: fix __chkstk Frame Setup on WoA
This corrects the MI annotations for the stack adjustment following the __chkstk
invocation.  We were marking the original SP usage as a Def rather than Kill.
The (new) assigned value is the definition, the original reference is killed.

Adjust the ISelLowering to mark Kills and FrameSetup as well.

This partially resolves PR27480.

llvm-svn: 267361
2016-04-24 20:12:48 +00:00
Nick Lewycky
e703258f62 Fix typo in comment. NFC
llvm-svn: 267354
2016-04-24 17:55:57 +00:00
Simon Pilgrim
713ecd864c [X86][SSE] getTargetShuffleMaskIndices - dropped (unused) UNDEF handling
We aren't currently making use of this in any successful mask decode and its actually incorrect as it inserts the wrong number of SM_SentinelUndef mask elements.

llvm-svn: 267350
2016-04-24 16:49:53 +00:00
Simon Pilgrim
81b1ce90a3 [X86][SSE] Use range loop. NFCI.
llvm-svn: 267349
2016-04-24 16:33:35 +00:00
Craig Topper
043de2291d [Lanai] Use EVT::getEVTString() to print a type as a string instead of an enum encoding value.
llvm-svn: 267348
2016-04-24 16:30:51 +00:00
Simon Pilgrim
9549c9fa45 [X86][XOP] Fixed VPPERM permute op decoding (PR27472).
Fixed issue with VPPERM target shuffle mask decoding that was incorrectly masking off the 3-bit permute op with a 2-bit mask.

llvm-svn: 267346
2016-04-24 15:05:04 +00:00
Simon Pilgrim
1211a431ae [X86][SSE] Improved support for decoding target shuffle masks through bitcasts
Reused the ability to split constants of a type wider than the shuffle mask to work with masks generated from scalar constants transfered to xmm.

This fixes an issue preventing PSHUFB target shuffle masks decoding rematerialized scalar constants and also exposes the XOP VPPERM bug described in PR27472.

llvm-svn: 267343
2016-04-24 14:53:54 +00:00
Marcin Koscielnicki
6b999dbbc0 [SystemZ] [SSP] Add support for LOAD_STACK_GUARD.
This fixes PR22248 on s390x.  The previous attempt at this was D19101,
which was before LOAD_STACK_GUARD existed.  Compared to the previous
version, this always emits a rather ugly block of 4 instructions, involving
a thread pointer load that can't be shared with other potential users.
However, this is necessary for SSP - spilling the guard value (or thread
pointer used to load it) is counter to the goal, since it could be
overwritten along with the frame it protects.

Differential Revision: http://reviews.llvm.org/D19363

llvm-svn: 267340
2016-04-24 13:57:49 +00:00
Craig Topper
87f8c97986 [X86] Merge LowerCTLZ and LowerCTLZ_ZERO_UNDEF into a single function that branches internally for the one difference, allowing the rest of the code to be common. NFC
llvm-svn: 267331
2016-04-24 06:27:39 +00:00
Craig Topper
5e5f4bfbbd [X86] Node need to check if AVX512 is supported when lowering vector CTLZ. The CTLZ operation is only Custom for vectors if AVX512 is enabled so if a vector gets here AVX512 is implied. NFC
llvm-svn: 267330
2016-04-24 06:27:35 +00:00
Gerolf Hoflehner
f601ce3709 [MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098)
The original patch caused crashes because it could derefence a null pointer
for SelectionDAGTargetInfo for targets that do not define it.

Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:

- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math

llvm-svn: 267328
2016-04-24 05:14:01 +00:00
Craig Topper
7cda2fdad5 [X86] Remove isel patterns for selecting tzcnt/lzcnt from cmove/ne+cttz/ctlz. These are folded by DAG combine now.
llvm-svn: 267326
2016-04-24 04:38:34 +00:00
Craig Topper
3b13c64dcf Fix an assertion that can never fire because the condition ANDed with the string is just true or 1.
llvm-svn: 267324
2016-04-24 04:38:29 +00:00
Craig Topper
ff7f9dafdb Fix a couple assertions that can never fire because they just contained the text string which always evaluates to true. Add a ! so they'll evaluate to false.
llvm-svn: 267312
2016-04-24 02:01:25 +00:00
Craig Topper
917b9d7a47 [X86] Fix patterns that turn cmove/cmovne+ctlz/cttz into lzcnt/tzcnt instructions. Only one of the conditions should be valid for each pattern, not both. Update tests accordingly.
llvm-svn: 267311
2016-04-24 02:01:22 +00:00
Davide Italiano
734b1e3ae5 [MC/ELF] Implement support for GOTPCRELX/REX_GOTPCRELX.
The option to control the emission of the new relocations
is -relax-relocations (blatantly copied from GNU as).
It can't be enabled by default because it breaks relatively
recent versions of ld.bfd/ld.gold (late 2015).

llvm-svn: 267307
2016-04-24 01:03:57 +00:00
Davide Italiano
bed5ce5d10 [MC/ELF] Pass Fixup to getRelocType64.
In preparation for other changes.

llvm-svn: 267300
2016-04-23 22:26:31 +00:00
Renato Golin
d3f349208c Revert "[AArch64] Fix optimizeCondBranch logic."
This reverts commit r267206, as it broke self-hosting on AArch64.

llvm-svn: 267294
2016-04-23 19:30:52 +00:00
Craig Topper
aa283acffa [Hexagon] Set ctlz_zero_undef/cttz_zero_undef to Expand so LegalizeDAG will convert them to ctlz/cttz. Remove the now unneccessary isel patterns. NFC
llvm-svn: 267266
2016-04-23 02:49:31 +00:00
Craig Topper
0c0d9161b6 [NVPTX] Set ctlz_zero_undef to Expand so LegalizeDAG will convert it to ctlz. Remove the now unneccessary isel patterns. NFC
llvm-svn: 267265
2016-04-23 02:49:29 +00:00
Craig Topper
a338f25008 [WebAssembly] Set ctlz_zero_undef/cttz_zero_undef to Expand so LegalizeDAG will convert them to ctlz/cttz. Remove the now unneccessary isel patterns. NFC
llvm-svn: 267264
2016-04-23 02:49:25 +00:00
Matt Arsenault
3cbb6d7d74 AMDGPU: sext_inreg (srl x, K), vt -> bfe x, K, vt.Size
llvm-svn: 267244
2016-04-22 22:59:16 +00:00
Matt Arsenault
b274219b5d AMDGPU: Re-visit nodes in performAndCombine
This fixes test regressions when i64 loads/stores are made promote.

llvm-svn: 267240
2016-04-22 22:48:38 +00:00
Sriraman Tallam
bdcd2a52a2 Differential Revision: http://reviews.llvm.org/D19040
llvm-svn: 267229
2016-04-22 21:41:58 +00:00
Peter Collingbourne
df3f55c3ab CodeGen: Use PLT relocations for relative references to unnamed_addr functions.
The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual
functions defined in other DSOs. The unnamed_addr attribute means that the
function's address is not significant, so we're allowed to substitute it
with the address of a PLT entry.

Also includes a bonus feature: addends for COFF image-relative references.

Differential Revision: http://reviews.llvm.org/D17938

llvm-svn: 267211
2016-04-22 20:40:10 +00:00
Quentin Colombet
dfaf1211a5 [AArch64] Fix optimizeCondBranch logic.
The opcode for the optimized branch does not depend on the size
of the activate bits in the AND masks, but the AND opcode itself.
Indeed, we need to use a X or W variant based on the AND variant
not based on whether the mask fits into the related variant.
Otherwise, we may end up using the W variant of the optimized branch
for 64-bit register inputs!

This fixes the last make check verifier issues for AArch64: PR27479.

llvm-svn: 267206
2016-04-22 20:09:58 +00:00
Quentin Colombet
e89cf71564 [AArch64] When creating MRS instruction, make sure the destination register is
declared as a definition.

This fixes the machine verifier error for CodeGen/AArch64/nzcv-save.ll.

llvm-svn: 267185
2016-04-22 18:46:17 +00:00
Quentin Colombet
e4d08c7e23 [AArch64][AdvSIMDScalar] Update the kill flags correctly.
We used to simply set the kill flags to true when transforming a scalar
instruction to a vector one.
SrcScalar1 = copy SrcVector1
... = opScalar SrcScalar1
=>
SrcScalar1 = copy SrcVector1
... = opVector SrcVector1<kill>

This is obviously wrong. The proper update consists in:
1. Propagate the kill status from the copy to the new opVector
2. Reset the kill status on the copy, since the live-range of
   SrcVector1 got extended.

This fixes some of the machine verifier errors for AArch64 with make check.

llvm-svn: 267180
2016-04-22 18:09:14 +00:00
Krzysztof Parzyszek
905ab44058 [Hexagon] Use common Pat classes for selecting code for intrinsics
llvm-svn: 267178
2016-04-22 18:05:55 +00:00
Krzysztof Parzyszek
7f1281b445 [Hexagon] Properly close live range in HexagonBlockRanges
llvm-svn: 267173
2016-04-22 17:27:22 +00:00
Konstantin Zhuravlyov
613d1574ee [AMDGPU] Insert nop pass: take care of outstanding feedback
- Switch few loops to range-based for loops
- Fix nop insertion at the end of BB
- Fix formatting
- Check for endpgm

Differential Revision: http://reviews.llvm.org/D19380

llvm-svn: 267167
2016-04-22 17:04:51 +00:00
Zoran Jovanovic
cc65c7cf04 [mips][microMIPS] Revert commit r266861.
Commit r266861 was the reason for failing tests in LLVM test suite.

llvm-svn: 267166
2016-04-22 16:53:15 +00:00
Krzysztof Parzyszek
e1e99be481 [Hexagon] Teach mux expansion how to deal with undef predicates
llvm-svn: 267165
2016-04-22 16:47:01 +00:00
Krzysztof Parzyszek
dc41008bf4 [Hexagon] Add definitions for trap/pause instructions
Also add tests for other instructions from HexagonSystemInst.td.

llvm-svn: 267162
2016-04-22 16:25:00 +00:00
Nirav Dave
2e439346e9 Emit code16 in assembly in 16-bit mode
Summary:
When generating assembly using -m16 we must explicitly mark it as
16-bit. Emit .code16 at beginning of file. Fixes wrong results when
using -fno-integrated-as.

Reviewers: dwmw2

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19392

llvm-svn: 267152
2016-04-22 13:36:11 +00:00
Simon Dardis
c64ea03044 [mips] Fix select patterns for MIPS64
When targetting MIPS64R6 some of the patterns for select were guarded by a
broken predicate. The predicate was supposed to test if a constant value
could fit in a 16 bit zero-extended field. Instead the value was tested to
fit in a 16 bit sign-extended field. For negative constants of native word
width this resulted in wrong code generation.

Reviewers: vkalintiris, dsanders

Differential Review: http://reviews.llvm.org/D19378

llvm-svn: 267151
2016-04-22 13:19:22 +00:00
Vasileios Kalintiris
fb33c878e4 [mips] Fix a small typo that would leave BLTZC out of getAnalyzableBrOpc().'
llvm-svn: 267149
2016-04-22 13:05:51 +00:00
Hrvoje Varga
0102a446f7 [mips][microMIPS] Implement SLT, SLTI, SLTIU, SLTU microMIPS32r6 instructions
Differential Revision: http://reviews.llvm.org/D19354

llvm-svn: 267137
2016-04-22 11:18:40 +00:00
Zoran Jovanovic
057e3e3388 [mips][microMIPS] Add R_MICROMIPS_PC18_S3 relocation
Differential Revision: http://reviews.llvm.org/D15026

llvm-svn: 267130
2016-04-22 10:15:12 +00:00
Daniel Sanders
ffe901fc26 Revert r267098 - [MachineCombiner] Support for floating-point FMA on ARM64
It introduced buildbot failures on clang-cmake-mips, clang-ppc64le-linux, among others.

llvm-svn: 267127
2016-04-22 09:37:26 +00:00
Ashutosh Nema
4becb7c51e [X86]: Changing cost for “TRUNCATE v16i32 to v16i8” in SSE4.1 mode.
Summary:
rL256194 transforms truncations between vectors of integers into PACKUS/PACKSS
operations during DAG combine. This generates better code for truncate, so cost
of truncate needs to be changed but looks like it got changed only in SSE2 table
Whereas this change is also applicable for SSE4.1, so the cost of truncate needs
to be changed for that as well. Cost of “TRUNCATE v16i32 to v16i8” & “TRUNCATE 
v16i16 to v16i8” should be same in SSE4.1 & SSE2 table. Removing their cost from
SSE4.1, so it will fall back to SSE2.

Reviewers: Simon Pilgrim
llvm-svn: 267123
2016-04-22 08:34:05 +00:00
Chris Dewhurst
028d2e0885 [Sparc] This provides support for itineraries on Sparc.
Specifically, itineraries for LEON processors has been added, along with several LEON processor Subtargets. Although currently all these targets are pretty much identical, support for features that will differ among these processors will be added in the very near future.

The different Instruction Itinerary Classes (IICs) added are sufficient to differentiate between the instruction timings used by LEON and, quite probably, by generic Sparc processors too, but the focus of the exercise has been for LEON processors, as the requirement of my project. If the IICs are not sufficient for other Sparc processor types and you want to add a new itinerary for one of those, it should be relatively trivial to adapt this.

As none of the LEON processors has Quad Floats, or is a Version 9 processor, none of those instructions have itinerary classes defined and revert to the default "NoItinerary" instruction itinerary.

Phabricator Review: http://reviews.llvm.org/D19359

llvm-svn: 267121
2016-04-22 08:17:17 +00:00
Chris Dewhurst
a841bbdbcd The following code would not work before this patch, due to the inability to take the address of a global object:
void func1() {

...
}

int main(int argc, char** argv) {

void (*pFunc)();
pFunc = &func1
pFunc();
...
}

Phabricator review: http://reviews.llvm.org/D19368

llvm-svn: 267120
2016-04-22 08:13:47 +00:00
Zlatko Buljan
b2404307e8 [mips][microMIPS] Implement DVP, EVP and JALRC.HB instructions
Differential Revision: http://reviews.llvm.org/D18687

llvm-svn: 267114
2016-04-22 06:44:34 +00:00
David Majnemer
4c6833cc3f Fix some spelling mistakes
llvm-svn: 267112
2016-04-22 06:37:48 +00:00
Craig Topper
05d7be2051 [SystemZ] Mark CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF as Expand instead of Custom since the custom logic just did what Expand does when CTTZ/CTLZ are Legal. NFC
llvm-svn: 267109
2016-04-22 05:29:58 +00:00
Craig Topper
9b0556c083 [Lanai] Set CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF to Expand instead of Legal so they will be converted to CTLZ/CTTZ by LegalizeDAG. Remove extra instructions that only existed to to contain patterns that match the zero_undef operations. NFC
llvm-svn: 267108
2016-04-22 05:13:01 +00:00
Craig Topper
07012bab8b [Lanai] Remove unused methods declarations. NFC
llvm-svn: 267107
2016-04-22 05:12:57 +00:00
Nicolai Haehnle
f54e57a212 AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic
Summary:
This intrinsic returns true if the current thread belongs to a live pixel
and false if it belongs to a pixel that we are executing only for derivative
computation. It will be used by Mesa to implement gl_HelperInvocation.

Note that for pixels that are killed during the shader, this implementation
also returns true, but it doesn't matter because those pixels are always
disabled in the EXEC mask.

This unearthed a corner case in the instruction verifier, which complained
about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but
correct code, so make the verifier accept it as such.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19191

llvm-svn: 267102
2016-04-22 04:04:08 +00:00
Craig Topper
6ea4b97a42 [AVX512] Teach lowering to use vplzcntd/q to implement 128/256-bit CTTZ_ZERO_UNDEF even without VLX support. We can just extend to 512-bits and extract like we do for CTLZ.
llvm-svn: 267100
2016-04-22 03:22:38 +00:00
Gerolf Hoflehner
d63bafa58b [MachineCombiner] Support for floating-point FMA on ARM64
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:

- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math

llvm-svn: 267098
2016-04-22 02:15:19 +00:00
Dan Gohman
fdbfed8f83 [WebAssembly] Limit alignment hints to natural alignment.
This follows the current binary format rules.

llvm-svn: 267082
2016-04-21 23:59:48 +00:00
Saleem Abdulrasool
4fb09b352f ARM: restrict register class for WIN__DBZCHK
WIN__DBZCHK will insert a CBZ instruction into the stream.  This instruction
reserves 3 bits for the condition register (rn).  As such, we must ensure that
we restrict the register to a low register.  Use the tGPR class instead of GPR
to ensure that this is properly constrained.  In debug builds, we would attempt
to use lr as a condition register which would silently get truncated with no
hint that the register selection was incorrect.

llvm-svn: 267080
2016-04-21 23:53:19 +00:00
Tim Northover
95d001461b MachO: enable .data_region directives everywhere
We'd disabled them on x86 because back in the early days some host tools
couldn't handle the new load commands. This no longer holds: anyone capable of
deploying Clang should be able to deploy its copies of ar/ranlib/etc.

rdar://25254790

llvm-svn: 267075
2016-04-21 23:00:17 +00:00
Krzysztof Parzyszek
59ba2ba8f4 [Hexagon] Properly recognize register alt names
llvm-svn: 267038
2016-04-21 19:49:53 +00:00
Krzysztof Parzyszek
b6d9f909f0 [Hexagon] Expand handling of the small-data/bss section
llvm-svn: 267034
2016-04-21 18:56:45 +00:00
Quentin Colombet
e5c2597507 [RegisterBankInfo] Change the API for the verify methods.
Return bool instead of void so that it is natural to put the calls into
asserts.

llvm-svn: 267033
2016-04-21 18:34:43 +00:00
Matt Arsenault
b59e0e2747 AMDGPU: Fix debug name of pass to better match
I get this wrong every time I try to debug this.

llvm-svn: 267030
2016-04-21 18:21:54 +00:00
Nicolai Haehnle
0127d62008 Split IntrReadArgMem into IntrReadMem and IntrArgMemOnly
Summary:
IntrReadWriteArgMem simply becomes IntrArgMemOnly.

So there are fewer intrinsic properties that express their orthogonality
better, and correspond more closely to the corresponding IR attributes.

Suggested by: Philip Reames

Reviewers: joker.eph, reames, tstellarAMD

Subscribers: jholewinski, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19291

llvm-svn: 267021
2016-04-21 17:48:02 +00:00
Marcin Koscielnicki
452981873a [PowerPC] [SSP] Fix stack guard load for 32-bit.
r266809 incorrectly used LD to load the stack guard, it should be LWZ.

Differential Revision: http://reviews.llvm.org/D19358

llvm-svn: 267017
2016-04-21 17:36:05 +00:00
Zoran Jovanovic
173f170236 [mips][microMIPS] Implement ldpc instruction
Differential Revision: http://reviews.llvm.org/D15009

llvm-svn: 266990
2016-04-21 14:32:12 +00:00
Zoran Jovanovic
2f0e417cac [mips][microMIPS] Add R_MICROMIPS_PC19_S2 relocation
Differential Revision: http://reviews.llvm.org/D14915

llvm-svn: 266988
2016-04-21 14:09:35 +00:00
Zoran Jovanovic
a3f572b81a [mips][microMIPS] Add R_MICROMIPS_PC26_S1 relocation
Differential Revision: http://reviews.llvm.org/D14822

llvm-svn: 266985
2016-04-21 13:43:26 +00:00
Sam Kolton
63ee8a0dc2 [AMDGPU] Assembler: prevent parseDPPCtrlOps from eating invalid tokens
Reviewers: nhaustov, tstellarAMD

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19317

llvm-svn: 266984
2016-04-21 13:14:24 +00:00
Zlatko Buljan
f385a81a9c [mips][microMIPS] Implement TLBP, TLBR, TLBWI and TLBWR instructions
Differential Revision: http://reviews.llvm.org/D18855

llvm-svn: 266980
2016-04-21 11:32:40 +00:00
Zlatko Buljan
d599e5d9a8 [mips][microMIPS] Implement LL, SC, MOVEP, ROTR, ROTRV and SYSCALL instructions and add tests for LWM32 and SWM32
Differential Revision: http://reviews.llvm.org/D19150

llvm-svn: 266977
2016-04-21 11:01:51 +00:00
Evgeny Astigeevich
4405b74d21 [AArch64][CodeGen] Fix of PR27158: incorrect peephole optimization in AArch64InstrInfo::optimizeCompareInstr
AArch64InstrInfo::optimizeCompareInstr has bug PR27158 which causes generation of incorrect code.
A compare instruction is substituted with another instruction which does not
produce the same flags as the original compare instruction.
This patch contains:
1. Fix of the bug.
2. A regression test in MIR.
3. A new test to check that SUBS is replaced by SUB.

Differential Revision: http://reviews.llvm.org/D18838

llvm-svn: 266969
2016-04-21 08:54:08 +00:00
Craig Topper
b64a162a22 [AVX512] Add CTTZ support for v8i64 and v16i32 vectors.
llvm-svn: 266968
2016-04-21 07:30:06 +00:00
Craig Topper
dc636097f0 [AVX512] Add support for lowering CTTZ v64i8 and v32i16 with BWI instructions.
llvm-svn: 266963
2016-04-21 06:39:34 +00:00
Craig Topper
9bbc9aab84 [X86] Remove redundant calls to setOperationAction for EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT from SSE41 block. They were already done in an earlier block. NFC
llvm-svn: 266962
2016-04-21 06:39:32 +00:00
Craig Topper
d01eaac544 [X86] Remove some operations from the default Expand all vector ops loop. Instead let them stay Legal and mark them Expand for specific types where needed. Reduces overall number of calls to setOperationAction. NFC
llvm-svn: 266961
2016-04-21 06:39:29 +00:00
Craig Topper
f0c3929cb7 [X86] Remove old leftover MMX code that sets various 64-bit vector operations to Expand. These vector types aren't legal so these operations would never make it far enough to need to expand. NFC
llvm-svn: 266960
2016-04-21 06:39:26 +00:00
Craig Topper
34b6242c22 [X86] Remove unnecessary setting of CTTZ_ZERO_UNDEF to Custom for vector types where we can't do any better than the Custom lowering of CTTZ. LegalizeVectorOps will expand to CTTZ since its marked Custom.
CTTZ_ZERO_UNDEF can be custom lowered specially if CTLZ is supported. Otherwise CTTZ and CTTZ_ZERO_UNDEF are handled the same way by using CTPOP and bitmath.

llvm-svn: 266952
2016-04-21 04:44:00 +00:00
Craig Topper
7e74d3bc64 [AVX512] Add support for popcount of v8i64 and v16i32 with and without BWI instructions.
Without BWI we have to split the vectors into 256-bit vectors so we can use AVX2 pshufb and then concatenate the results.

llvm-svn: 266950
2016-04-21 03:57:24 +00:00
Krzysztof Parzyszek
d8b3e72047 [Hexagon] Add -mv.. options to override CPU selection
This is for compatibility with scripts that use -mv5, etc. with the
assembler.

llvm-svn: 266918
2016-04-20 21:17:40 +00:00
Davide Italiano
bb4e30e8f3 [MC] Silence warning due to unused variable in !Debug builds.
llvm-svn: 266901
2016-04-20 18:45:31 +00:00
Jacques Pienaar
0cb285e7fd [lanai] Add subword scheduling itineraries.
Differentiate between word and subword memory operations as they take different
amount of cycles to complete. This just adds a basic model of the subword
latency to the scheduler.

llvm-svn: 266898
2016-04-20 18:28:55 +00:00
Davide Italiano
46875b649d [MC] EmitNop: Make an assertion more useful.
Differential Revision:  http://reviews.llvm.org/D19334

llvm-svn: 266895
2016-04-20 17:53:21 +00:00
Krzysztof Parzyszek
c603d6f55f [Hexagon] Fix handling of lcomm directive
Patch by Colin LeMahieu.

llvm-svn: 266882
2016-04-20 15:54:13 +00:00
Krzysztof Parzyszek
17749987f0 [RDF] Consider register as live if any alias is live
This only affects the recomputation of kill flags.

llvm-svn: 266875
2016-04-20 14:33:23 +00:00
Zoran Jovanovic
69b803dc41 [mips][microMIPS] Implement BGEC, BGEUC, BLTC, BLTUC, BEQC and BNEC instructions
Differential Revision: http://reviews.llvm.org/D14206

llvm-svn: 266873
2016-04-20 14:07:46 +00:00
Nikolay Haustov
b49bd2b1da AMDGPU/SI: Assembler: improvements to support trap handlers.
Add ParseAMDGPURegister which can be invoked recursively for parsing lists.
Rename getRegForName to getSpecialRegForName.
Support legacy SP3 register list syntax: [s2,s3,s4,s5] or [flat_scratch_lo,flat_scratch_hi].
Add 64-bit registers TBA, TMA where missing.
Add some tests.

Differential Revision: http://reviews.llvm.org/D19163

llvm-svn: 266865
2016-04-20 09:34:48 +00:00
Asaf Badouh
9cc5def4a3 [X86] enable PIE for functions
Call locally defined function directly for PIE/fPIE

Differential Revision: http://reviews.llvm.org/D19226

llvm-svn: 266863
2016-04-20 08:32:57 +00:00
Hrvoje Varga
c7e838369e [mips][microMIPS]Implement CFC*, CTC* and LDC* instructions
Differential Revision: http://reviews.llvm.org/D18640

llvm-svn: 266861
2016-04-20 06:34:48 +00:00
Craig Topper
f480c8bddb [AVX512] Add popcount support for v32i16 and v64i8.
llvm-svn: 266858
2016-04-20 05:18:55 +00:00
Craig Topper
4fb9a12474 [X86] Mark some floating point operations that are always expanded for vector types as Expand in a floating point only loop instead of looping through all vector types.
llvm-svn: 266850
2016-04-20 01:57:44 +00:00
Craig Topper
76b73c1374 [X86] Don't mark vector loads and shifts Expand in advance. Loads are always marked Legal or Promote for all the legal types later. Shifts are always marked custom. NFC
llvm-svn: 266849
2016-04-20 01:57:42 +00:00
Craig Topper
d0e724fa60 [X86] Merge the two different SSE2 blocks in the X86TargetLowering constructor. Also qualfiy the XOP block with !useSoftFloat to match the other vector blocks.
llvm-svn: 266848
2016-04-20 01:57:40 +00:00
Craig Topper
1bdb84bae2 [X86] Don't set vector FADD,FSUB,FMUL,FDIV,FNEG,FSQRT to Expand early. For every legal FP type we either set them to Legal or Custom anyway. So let them stay defaulted to Legal and only change when they need to be Custom.
llvm-svn: 266847
2016-04-20 01:57:38 +00:00
Marcin Koscielnicki
64bfaf0336 [SystemZ] Add support for llvm.thread.pointer intrinsic.
Differential Revision: http://reviews.llvm.org/D19054

llvm-svn: 266844
2016-04-20 01:03:48 +00:00
NAKAMURA Takumi
3600a30ffa MipsAsmParser::loadImmediate(): Prune an obsolete \param in r266602. [-Wdocumentation]
llvm-svn: 266841
2016-04-20 00:55:38 +00:00
Tim Northover
8ead88dca1 ARM: fix assertion failure on -O0 cmpxchg.
Because lowering of CMP_SWAP_64 occurs during type legalization, there can be
i64 types produced by more than just a BUILD_PAIR or similar. My initial tests
used just incoming function args.

llvm-svn: 266828
2016-04-19 22:25:02 +00:00
Nicolai Haehnle
0c7a341af5 Add IntrWrite[Arg]Mem intrinsic property
Summary:
This property is used to mark an intrinsic that only writes to memory, but
neither reads from memory nor has other side effects.

An example where this is useful is the llvm.amdgcn.buffer.store.format.*
intrinsic, which corresponds to a store instruction that goes through a special
buffer descriptor rather than through a plain pointer.

With this property, the intrinsic should still be handled as having side
effects at the LLVM IR level, but machine scheduling can make smarter
decisions.

Reviewers: tstellarAMD, arsenm, joker.eph, reames

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18291

llvm-svn: 266826
2016-04-19 21:58:33 +00:00
Nicolai Haehnle
e4cebbe0dc AMDGPU: Guard VOPC instructions against incorrect commute
Summary:
The added testcase, which triggered this, was derived from a shader-db case
via bugpoint. A separate question is why scalar branching wasn't used.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19208

llvm-svn: 266825
2016-04-19 21:58:22 +00:00
Nicolai Haehnle
25b765f396 AMDGPU/SI: SGPR accounting in getSIProgramInfo must ignore exec_lo/hi
Summary:
A shader stored the live mask (initial exec mask) in an SGPR which was then
spilled during register allocation. The allocator quite reasonably
optimized turned the spill into

  v_writelane_b32 %vgpr, exec_lo, N
  v_writelane_b32 %vgpr, exec_hi, N+1

at the beginning of the shader, confusing the SGPR accounting.

No test case, because si-sgpr-spill.ll together with an upcoming patch for
WQM handling exhibits the problem.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19199

llvm-svn: 266824
2016-04-19 21:58:17 +00:00
Krzysztof Parzyszek
6a86876079 [Hexagon] Fix operand swapping in HexagonPeephole
Also, disable zero- and size-extend optimizations for now.

llvm-svn: 266821
2016-04-19 21:36:24 +00:00
Marcin Koscielnicki
0919e582e6 [AArch64] [ARM] Make a target-independent llvm.thread.pointer intrinsic.
Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that
just return the thread pointer.  I have a pending patch that does the same
for SystemZ (D19054), and there are many more targets that could benefit
from one.

This patch merges the ARM and AArch64 intrinsics into a single target
independent one that will also be used by subsequent targets.

Differential Revision: http://reviews.llvm.org/D19098

llvm-svn: 266818
2016-04-19 20:51:05 +00:00
Krzysztof Parzyszek
5607676c28 [Hexagon] Fix printing the address operand of S2_storerinewabs
llvm-svn: 266811
2016-04-19 20:20:33 +00:00
Tim Shen
a12f27cb73 [PPC, SSP] Support PowerPC Linux stack protection.
llvm-svn: 266809
2016-04-19 20:14:52 +00:00
Tim Shen
3a75cd4bf9 [SSP, 2/2] Create llvm.stackguard() intrinsic and lower it to LOAD_STACK_GUARD
With this change, ideally IR pass can always generate llvm.stackguard
call to get the stack guard; but for now there are still IR form stack
guard customizations around (see getIRStackGuard()). Future SSP
customization should go through LOAD_STACK_GUARD.

There is a behavior change: stack guard values are not CSEed anymore,
since we should never reuse the value in case that it has been spilled (and
corrupted). See ssp-guard-spill.ll. This also cause the change of stack
size and codegen in X86 and AArch64 test cases.

Ideally we'd like to know if the guard created in llvm.stackprotector() gets
spilled or not. If the value is spilled, discard the value and reload
stack guard; otherwise reuse the value. This can be done by teaching
register allocator to know how to rematerialize LOAD_STACK_GUARD and
force a rematerialization (which seems hard), or check for spilling in
expandPostRAPseudo. It only makes sense when the stack guard is a global
variable, which requires more instructions to load. Anyway, this seems to go out
of the scope of the current patch.

llvm-svn: 266806
2016-04-19 19:40:37 +00:00
Jacques Pienaar
88ae65cfbd [lanai] Add lowering for SETCCE i32.
* Add lowering for SETCCE i32.
* Add test to check lowering of i64 compares uses SETCCE expansion (outside of EQ and NE).
* Fix select.ll test and immediate form selection for RI operations.

llvm-svn: 266802
2016-04-19 19:15:25 +00:00
Sanjoy Das
a5665e943c [X86] Simplify StackMapShadowTracker; NFC
- Elide trivial contructor and desctructor
 - Move implementation out of an unnecessary explicit llvm namespace
   scope

llvm-svn: 266794
2016-04-19 18:48:16 +00:00
Sanjoy Das
3ceeafa248 [X86MCInstLower] Clean up EmitNops; NFC
Instead of having a conditional assert inside EmitNops, refactor so that
the caller can have the assert instead.

llvm-svn: 266793
2016-04-19 18:48:13 +00:00
Krzysztof Parzyszek
a1a15a920f [Hexagon] Implement branch relaxation
Patch by Sirish Pande.

llvm-svn: 266792
2016-04-19 18:30:18 +00:00
David L Kreitzer
99b2b898cb Preliminary changes for fixing PR27241. Generalized/restructured some things
in preparation for enabling the outgoing parameter store-to-push optimization
for 64-bit targets.

Differential Revision: http://reviews.llvm.org/D19222

llvm-svn: 266774
2016-04-19 17:43:44 +00:00
Simon Pilgrim
e1392bc92c [X86][AVX2] Prefer VPERMQ/VPERMPD over VINSERTI128/VINSERTF128 for unary shuffles
Using VPERMQ/VPERMPD allows memory folding of the (repeated) input where VINSERTI128/VINSERTF128 can not.

Differential Revision: http://reviews.llvm.org/D19228

llvm-svn: 266728
2016-04-19 12:26:40 +00:00
Sanjoy Das
7e1b5ea5d3 Disable the PatchableFunction pass for NVPTX & Wasm
PatchableFunction requires AllVRegsAllocated that these targets don't
provide.

llvm-svn: 266720
2016-04-19 06:24:58 +00:00
Sanjoy Das
91fd65c3a6 Introduce a "patchable-function" function attribute
Summary:
The `"patchable-function"` attribute can be used by an LLVM client to
influence LLVM's code generation in ways that makes the generated code
easily patchable at runtime (for instance, to redirect control).
Right now only one patchability scheme is supported,
`"prologue-short-redirect"`, but this can be expanded in the future.

Reviewers: joker.eph, rnk, echristo, dberris

Subscribers: joker.eph, echristo, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19046

llvm-svn: 266715
2016-04-19 05:24:47 +00:00
Jacques Pienaar
6d78964128 [lanai] Set boolean contentss to ZeroOrOneBooleanContent.
llvm-svn: 266701
2016-04-19 00:26:42 +00:00
Tim Northover
5f6de253c5 ARM: use a pseudo-instruction for cmpxchg at -O0.
The fast register-allocator cannot cope with inter-block dependencies without
spilling. This is fine for ldrex/strex loops coming from atomicrmw instructions
where any value produced within a block is dead by the end, but not for
cmpxchg. So we lower a cmpxchg at -O0 via a pseudo-inst that gets expanded
after regalloc.

Fortunately this is at -O0 so we don't have to care about performance. This
simplifies the various axes of expansion considerably: we assume a strong
seq_cst operation and ensure ordering via the always-present DMB instructions
rather than v8 acquire/release instructions.

Should fix the 32-bit part of PR25526.

llvm-svn: 266679
2016-04-18 21:48:55 +00:00
JF Bastien
2809029d17 Lanai: fix debug build
There's currently no raw_ostream &operator<<(SimpleValueType); provided by LLVM. It could be added by refactoring utils/TableGen/CodeGenTarget.cpp:getEnumName, but that's much more work than fixing the build.

llvm-svn: 266627
2016-04-18 16:33:41 +00:00
Konstantin Zhuravlyov
809d827c49 [AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt
Also,
- Skip pass if machine module does not have debug info
- Minor comment changes
- Added test

Differential Revision: http://reviews.llvm.org/D19079

llvm-svn: 266626
2016-04-18 16:28:23 +00:00
Artem Tamazov
cc6f4e6962 [AMDGPU][llvm-mc] s_setreg* - Fix order of operands
Order should match the sp3 syntax, where destination (simm16 denoting the hwreg) is coming first.

Differential Revision: http://reviews.llvm.org/D19161

llvm-svn: 266617
2016-04-18 14:54:26 +00:00
Aaron Ballman
9c5a572171 Silence some "initialized but unused" warnings from MSVC -- the function being called is a static function, so there's no need for an instance variable. NFC.
llvm-svn: 266616
2016-04-18 14:47:19 +00:00
Daniel Sanders
e94fb01a45 [mips][ias] Prevent double-filling of delay slots by generating '.set noreorder' regions.
Summary:
When clang is given -save-temps or -via-file-asm, any inline assembly in
the source is parsed twice. Once by the compiler, and again by the
assembler. We must take care to ensure that this doesn't lead to
double-filling delay slots.

Reviewers: sdardis, vkalintiris

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D19166

llvm-svn: 266608
2016-04-18 12:35:36 +00:00
Eric Liu
983218e96e Include SmallVector.h header in lib/Target/WebAssembly/InstPrinter/WebAssemblyInstPrinter.h
llvm-svn: 266606
2016-04-18 12:21:59 +00:00
Renato Golin
4628935835 [ARM] AArch32 v8 NEON is still not IEEE-754 compliant
llvm-svn: 266603
2016-04-18 12:06:47 +00:00
Daniel Sanders
c3774bdfc9 [mips][ias] Stream macro expansions to output instead of buffering them. NFC.
Summary:
This will allows us to eliminate some magic numbers from the offset operand of
branch instructions in favour of symbols and makes it possible to avoid
double-filling delay slots when clang is given -save-temps.

parseDirectiveCpRestore() is calling isIntegratedAssemblerRequired() for the
moment since correctly pushing the generation of these instructions into the
ELF target streamer is tricky enough to warrant a separate patch.

Reviewers: sdardis, vkalintiris

Subscribers: dsanders, llvm-commits, sdardis

Differential Revision: http://reviews.llvm.org/D19164

llvm-svn: 266602
2016-04-18 12:06:15 +00:00
Mehdi Amini
9ff867f98c [NFC] Header cleanup
Removed some unused headers, replaced some headers with forward class declarations.

Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'

Patch by Eugene Kosov <claprix@yandex.ru>

Differential Revision: http://reviews.llvm.org/D19219

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
2016-04-18 09:17:29 +00:00
Craig Topper
e20db8c870 [X86] Be explicit about calls to setOperationAction for AVX2 and AVX512 rather than just looping over all vector types and conditinally matching them. NFC
llvm-svn: 266577
2016-04-17 22:49:46 +00:00
Craig Topper
8419372e6c Declare MVT::SimpleValueType as an int8_t sized enum. This removes 400 bytes from TargetLoweringBase and probably other places.
This required changing several places to print VT enums as strings instead of raw ints since the proper method to use to print became ambiguous. This is probably an improvement anyway.

This also appears to save ~8K from an x86 self host build of llc.

llvm-svn: 266562
2016-04-17 17:37:33 +00:00
Simon Pilgrim
4faecf2c4a [X86] Added TODO comment for target shuffle mask decoding of bitcasted masks
llvm-svn: 266559
2016-04-17 11:34:18 +00:00
Asaf Badouh
9e7a551497 [X86] Remove unneeded variables
no functional change.
ExtraLoad and WrapperKind are been used only if (OpFlags == X86II::MO_GOTPCREL).

Differential Revision: http://reviews.llvm.org/D18942

llvm-svn: 266557
2016-04-17 08:28:40 +00:00
Craig Topper
bb00d9af19 [AVX512] ISD::MUL v2i64/v4i64 should only be legal if DQI and VLX features are enabled.
llvm-svn: 266554
2016-04-17 07:25:39 +00:00
Craig Topper
ec533074db [X86] Use ternary operator to reduce code slightly. NFC
llvm-svn: 266534
2016-04-16 19:09:32 +00:00
Simon Pilgrim
cd3aca2121 [X86][XOP] Added VPPERM constant mask decoding and target shuffle combining support
Added additional test that peeks through bitcast to v16i8 mask

llvm-svn: 266533
2016-04-16 17:52:07 +00:00
Matt Arsenault
ff544fe603 AMDGPU: Enable LocalStackSlotAllocation pass
This resolves more frame indexes early and folds
the immediate offsets into the scratch mubuf instructions.

This cleans up a lot of the mess that's currently emitted,
such as emitting add 0s and repeatedly initializing the same
register to 0 when spilling.

llvm-svn: 266508
2016-04-16 02:13:37 +00:00
Matt Arsenault
5d84ff0690 AMDGPU: Use s_addk_i32 / s_mulk_i32
llvm-svn: 266506
2016-04-16 01:46:49 +00:00
Vasileios Kalintiris
202510e60b [mips] More range-based for loops. NFC.
There are still a couple more inside the MIPS target. I opted for a single
commit in order to avoid spamming the list.

llvm-svn: 266472
2016-04-15 20:43:17 +00:00
Vasileios Kalintiris
820f0ffce2 [mips] Use range-based for loops and simplify slightly the code. NFC.
llvm-svn: 266471
2016-04-15 20:18:48 +00:00
Ulrich Weigand
41b710c861 [SystemZ] Call tryAddingSymbolicOperand in the disassembler
Use the tryAddingSymbolicOperand callback to attempt to present immediate
values in symbolic form when disassembling.  This is currently only used
for PC-relative immediates (which are most likely to be symbolic in the
SystemZ ISA).  Add new DecodeMethod types to allow distinguishing between
branch and non-branch instructions.

llvm-svn: 266469
2016-04-15 19:55:58 +00:00
Tim Northover
4396c4b8cf ARM: don't try to hoist constant RHS out of a division.
Divisions by a constant can be converted into multiplies which are usually
cheaper, but this isn't possible if the constant gets separated (particularly
in loops). Fix this by telling ConstantHoisting that the immediate in a DIV is
cheap.

I considered making the check generic, but neither AArch64 (strangely) nor x86
showed any benefit on the tests I had.

llvm-svn: 266464
2016-04-15 18:17:18 +00:00
Chad Rosier
8ec20feca4 [AArch64] Add load/store pair instructions to getMemOpBaseRegImmOfsWidth().
This improves AA in the MI schduler when reason about paired instructions.

Phabricator Revision: http://reviews.llvm.org/D17098
PR26358

llvm-svn: 266462
2016-04-15 18:09:10 +00:00
Geoff Berry
56afefb9c8 [AArch64] Add MMOs to callee-save load/store instructions.
Summary:
Without MMOs, the callee-save load/store instructions were treated as
volatile by the MI post-RA scheduler and AArch64LoadStoreOptimizer.

Reviewers: t.p.northover, mcrosier

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17661

llvm-svn: 266439
2016-04-15 15:16:19 +00:00
Nirav Dave
a36a5aeca3 Fix typing on generated LXV2DX/STXV2DX instructions
[PPC] Previously when casting generic loads to LXV2DX/ST instructions we
would leave the original load return type in place allowing for an
assertion failure when we merge two equivalent LXV2DX nodes with
different types.

This fixes PR27350.

Reviewers: nemanjai

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19133

llvm-svn: 266438
2016-04-15 15:01:38 +00:00
Jun Bum Lim
ad7ab4cf46 [MachineScheduler]Add support for store clustering
Perform store clustering just like load clustering. This change add
StoreClusterMutation in machine-scheduler. To control StoreClusterMutation,
added enableClusterStores() in TargetInstrInfo.h. This is enabled only on
AArch64 for now.

This change also add support for unscaled stores which were not handled in
getMemOpBaseRegImmOfs().

llvm-svn: 266437
2016-04-15 14:58:38 +00:00
Nicolai Haehnle
730a184595 AMDGPU/SI: Fix regression with no-return atomics
Summary:
In the added test-case, the atomic instruction feeds into a non-machine
CopyToReg node which hasn't been selected yet, so guard against
non-machine opcodes here.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19043

llvm-svn: 266433
2016-04-15 14:42:36 +00:00
Craig Topper
ab8ae0e21d Use MVT instead of EVT to remove a bunch of unnecessary calls to getSimpleVT.
llvm-svn: 266414
2016-04-15 06:20:21 +00:00
Craig Topper
d62db6aa41 Add a setOperationPromotedToType convenience method that sets an operation to promoted and set the type in one call. Use it so save code in X86.
llvm-svn: 266413
2016-04-15 06:20:18 +00:00
Craig Topper
31a58dd264 [X86] AND, OR, and XOR of vectors are always legal no need to set them legal explicitly.
llvm-svn: 266412
2016-04-15 06:20:14 +00:00
Craig Topper
61886cf59d [X86] Combine an if and else block that had the same set of calls to setOperationAction that only varied in Legal/Custom. Use the ternary operator on that argument instead. NFC
llvm-svn: 266410
2016-04-15 04:57:09 +00:00
Justin Lebar
4b4ad39f81 [NVPTX] Set NVPTXTTI::getInliningThresholdMultiplier to 5.
Summary:
Calls on NVPTX are unusually expensive (for one thing, lots of state
needs to be saved to memory, which is slow), so make the inlininer much
more aggressive.

Reviewers: chandlerc

Subscribers: jholewinski, llvm-commits, tra

Differential Revision: http://reviews.llvm.org/D18561

llvm-svn: 266406
2016-04-15 01:38:50 +00:00
Matt Arsenault
6e839f3775 AMDGPU: Remove custom load/store scalarization
llvm-svn: 266385
2016-04-14 23:31:26 +00:00
Matt Arsenault
18048c7ee0 AMDGPU: Include LDS size in printed comment
llvm-svn: 266382
2016-04-14 22:11:51 +00:00
Mehdi Amini
ea195a382e Remove every uses of getGlobalContext() in LLVM (but the C API)
At the same time, fixes InstructionsTest::CastInst unittest: yes
you can leave the IR in an invalid state and exit when you don't
destroy the context (like the global one), no longer now.

This is the first part of http://reviews.llvm.org/D19094

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266379
2016-04-14 21:59:01 +00:00
Matt Arsenault
2dfb6d03c5 AMDGPU: Run SIFoldOperands after PeepholeOptimizer
PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.

shader-db stats:

Totals:
SGPRS: 34200 -> 34336 (0.40 %)
VGPRS: 22118 -> 21655 (-2.09 %)
Code Size: 632144 -> 633460 (0.21 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 10240 -> 11264 (10.00 %) bytes per wave
Max Waves: 8822 -> 8918 (1.09 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 7704 -> 7840 (1.77 %)
VGPRS: 5169 -> 4706 (-8.96 %)
Code Size: 234444 -> 235760 (0.56 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 0 -> 1024 (0.00 %) bytes per wave
Max Waves: 1188 -> 1284 (8.08 %)
Wait states: 0 -> 0 (0.00 %)

Increases:
SGPRS: 35 (0.01 %)
VGPRS: 1 (0.00 %)
Code Size: 59 (0.02 %)
LDS: 0 (0.00 %)
Scratch: 1 (0.00 %)
Max Waves: 48 (0.02 %)
Wait states: 0 (0.00 %)

Decreases:
SGPRS: 26 (0.01 %)
VGPRS: 54 (0.02 %)
Code Size: 68 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Max Waves: 4 (0.00 %)
Wait states: 0 (0.00 %)

llvm-svn: 266378
2016-04-14 21:58:24 +00:00
Matt Arsenault
61abb9daf9 AMDGPU: Directly emit m0 initialization with s_mov_b32
Currently what comes out of instruction selection is a
register initialized to -1, and then copied to m0.
MachineCSE doesn't consider copies, but we want these
to be CSEed. This isn't much of a problem currently,
because SIFoldOperands is run immediately after.

This avoids regressions when SIFoldOperands is run later
from leaving all copies to m0.

llvm-svn: 266377
2016-04-14 21:58:15 +00:00
Matt Arsenault
e73cb153a7 AMDGPU: Fold bitcasts of scalar constants to vectors
This cleans up some messes since the individual scalar components
can be CSEed.

llvm-svn: 266376
2016-04-14 21:58:07 +00:00
Renato Golin
bf4b959200 [ARM] Adding IEEE-754 SIMD detection to loop vectorizer
Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON.

This patch teaches the loop vectorizer to only allow transformations of loops
that either contain no floating-point operations or have enough allowance
flags supporting lack of precision (ex. -ffast-math, Darwin).

For that, the target description now has a method which tells us if the
vectorizer is allowed to handle FP math without falling into unsafe
representations, plus a check on every FP instruction in the candidate loop
to check for the safety flags.

This commit makes LLVM behave like GCC with respect to ARM NEON support, but
it stops short of fixing the underlying problem: sub-normals. Neither GCC
nor LLVM have a flag for allowing sub-normal operations. Before this patch,
GCC only allows it using unsafe-math flags and LLVM allows it by default with
no way to turn it off (short of not using NEON at all).

As a first step, we push this change to make it safe and in sync with GCC.
The second step is to discuss a new sub-normal's flag on both communitues
and come up with a common solution. The third step is to improve the FastMath
flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.

Fixes PR16275.

llvm-svn: 266363
2016-04-14 20:42:18 +00:00
Tom Stellard
383448325b AMDGPU: Add skeleton GlobalIsel implementation
Summary:
This adds the necessary target code to be able to run the ir translator.
Lowering function arguments and returns is a nop and there is no support
for RegBankSelect.

Reviewers: arsenm, qcolombet

Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits

Differential Revision: http://reviews.llvm.org/D19077

llvm-svn: 266356
2016-04-14 19:09:28 +00:00
Reid Kleckner
2c00bf1830 Sink DI metadata usage out of MachineInstr.h and MachineInstrBuilder.h
MachineInstr.h and MachineInstrBuilder.h are very popular headers,
widely included across all LLVM backends. It turns out that there only a
handful of TUs that actually care about DI operands on MachineInstrs.

After this change, touching DebugInfoMetadata.h and rebuilding llc only
needs 112 actions instead of 542.

llvm-svn: 266351
2016-04-14 18:29:59 +00:00
Jacques Pienaar
7870fedd7e [lanai] Add custom lowering for SRL_PARTS i32.
llvm-svn: 266349
2016-04-14 17:59:22 +00:00
Tom Stellard
63209a899d [GlobalISel] Move GISelAccessor class into public headers
Reviewers: qcolombet

Subscribers: joker.eph, vkalintiris, llvm-commits

Differential Revision: http://reviews.llvm.org/D19120

llvm-svn: 266348
2016-04-14 17:45:38 +00:00
Nicolai Haehnle
25eef7cc0f [StructurizeCFG] Annotate branches that were treated as uniform
Summary:
This fully solves the problem where the StructurizeCFG pass does not
consider the same branches as uniform as the SIAnnotateControlFlow pass.
The patch in D19013 helps with this problem, but is not sufficient
(and, interestingly, causes a "regression" with one of the existing
test cases).

No tests included here, because tests in D19013 already cover this.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19018

llvm-svn: 266346
2016-04-14 17:42:35 +00:00
Nicolai Haehnle
014104e476 AMDGPU: Remove SIFixSGPRLiveRanges pass
Summary:
This pass is unnecessary and overly conservative. It was motivated by
situations like

  def %vreg0:SGPR_32
  ...
if-block:
  ..
  def %vreg1:SGPR_32
  ...
else-block:
  ...
  use %vreg0:SGPR_32
  ...

and similar situations with uses after the non-uniform control flow, where
we are not allowed to assign %vreg0 and %vreg1 to the same physical register,
even though in the original, thread/workitem-based CFG, it looks like the
live ranges of these registers do not overlap.

However, by the time register allocation runs, we have moved to a wave-based
CFG that accurately represents the fact that the wave may run through both
the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already
overlap even without the SIFixSGPRLiveRanges pass.

In addition to proving this change correct, I have tested it with Piglit
and a small number of other tests.

Reviewers: arsenm, tstellarAMD

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19041

llvm-svn: 266345
2016-04-14 17:42:29 +00:00
Nicolai Haehnle
141bf141e2 AMDGPU: change a redundant if () to an assert(). NFC
Summary:
I've been carrying this change around with me for a while, because the if ()
managed to confuse me while following the code. All callers ensure that the
assertion holds.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19042

llvm-svn: 266344
2016-04-14 17:42:18 +00:00
Tom Stellard
c2eabf0b58 [GlobalISel] Coding style and whitespace fixes
Reviewers: qcolombet

Subscribers: joker.eph, llvm-commits, vkalintiris

Differential Revision: http://reviews.llvm.org/D19119

llvm-svn: 266342
2016-04-14 17:23:33 +00:00
Tim Northover
d6721e4e7e AArch64: expand cmpxchg after regalloc at -O0.
FastRegAlloc works only at the basic-block level and spills all live-out
registers. Unfortunately for a stack-based cmpxchg near the spill slots, this
can perpetually clear the exclusive monitor, which means the cmpxchg will never
succeed.

I believe the only way to handle this within LLVM is by expanding the loop
post-regalloc. We don't want this in general because it severely limits the
optimisations that can be done, so we limit this to -O0 compilations.

It's an ugly hack, and about the one good point in the whole mess is that we
can treat all cmpxchg operations in the most naive way possible (seq_cst, no
clrex faff) without affecting correctness.

Should fix PR25526.

llvm-svn: 266339
2016-04-14 17:03:29 +00:00
Jacques Pienaar
77c93881e7 [lanai] Add areMemAccessesTriviallyDisjoint, getMemOpBaseRegImmOfs and getMemOpBaseRegImmOfsWidth.
Summary: Add getMemOpBaseRegImmOfsWidth to enable determining independence during MiSched.

Reviewers: eliben, majnemer

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18903

llvm-svn: 266338
2016-04-14 16:47:42 +00:00
Tom Stellard
c0b7282ebc AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD.

This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions.

Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.

Reviewers: mareko, arsenm, tstellarAMD, nhaehnle

Subscribers: FireBurn, kerberizer, llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D18340

Patch By: Bas Nieuwenhuizen

llvm-svn: 266337
2016-04-14 16:27:07 +00:00
Tom Stellard
b9654f5566 AMDGPU/SI: Use the correct scratch wave offset register for shaders.
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders though,

The register should be the next SGPR after all user and system SGPR's.
We use that Mesa adds arguments for all input and system SGPR's and
take the next available SGPR for the scratch wave offset register.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Reviewers: mareko, arsenm, nhaehnle, tstellarAMD

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18941

Patch By: Bas Nieuwenhuizen

llvm-svn: 266336
2016-04-14 16:27:03 +00:00
Simon Dardis
46f0e9cf63 Summary:
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.

This patch was previous committed as r266055 as seemed to have caused some spurious
test failures. They did not reappear after further local testing.

llvm-svn: 266301
2016-04-14 13:43:17 +00:00
Mehdi Amini
6bf091d940 Do not use getGlobalContext()... ever.
This code was creating a new type in the global context, regardless
of which context the user is sitting in, what can possibly go wrong?

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266275
2016-04-14 04:36:40 +00:00
Matt Arsenault
489a8fbeea AMDGPU: Implement canonicalize
Also add generic DAG node for it.

llvm-svn: 266272
2016-04-14 01:42:16 +00:00
Matthias Braun
ba8f42fa2c TargetLowering: Factor out common code for tail call eligibility checking; NFC
llvm-svn: 266270
2016-04-14 01:10:42 +00:00
Tim Northover
3d26dcef22 ARM: override cost function to re-enable ConstantHoisting (& fix it).
At some point, ARM stopped getting any benefit from ConstantHoisting because
the pass called a different variant of getIntImmCost. Reimplementing the
correct variant revealed some problems, however:

  + ConstantHoisting was modifying switch statements. This is simply invalid,
    the cases must remain integer constants no matter the notional cost.
  + ConstantHoisting was mangling alloca instructions in the entry block. These
    should be handled by FrameLowering, so constants actually have a cost of 0.
    Worse, the resulting bitcasts meant they became dynamic allocas.

rdar://25707382

llvm-svn: 266260
2016-04-13 23:08:27 +00:00
Matthias Braun
1f20820c09 ARM: Use a callee save register for the swiftself parameter.
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.

Differential Revision: http://reviews.llvm.org/D18901

llvm-svn: 266253
2016-04-13 21:43:25 +00:00
Matthias Braun
fc08c62acd X86: Use a callee save register for the swiftself parameter.
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.

Differential Revision: http://reviews.llvm.org/D18902

llvm-svn: 266252
2016-04-13 21:43:21 +00:00
Matthias Braun
40d6ea2b7b AArch64: Use a callee save registers for swiftself parameters
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.

Differential Revision: http://reviews.llvm.org/D19007

llvm-svn: 266251
2016-04-13 21:43:16 +00:00
Tom Stellard
937a1371b7 AMDGPU/SI: Add support for spilling VGPRs without having to scavenge registers
Summary:
When we are spilling SGPRs to scratch memory, we usually don't have
free SGPRs to do the address calculation, so we need to re-use the
ScratchOffset register for the calculation.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18917

llvm-svn: 266244
2016-04-13 20:44:16 +00:00
Nemanja Ivanovic
f6399ae1dc [PowerPC] Basic support for P9 byte comparison and count trailing zero insns
This patch corresponds to review:
http://reviews.llvm.org/D17850

This patch implements the following instructions:
cmprb, cmpeqb, cnttzw, cnttzw., cnttzd, cnttzd.

llvm-svn: 266228
2016-04-13 18:51:18 +00:00
Evandro Menezes
076242a2eb [AArch64] Disable LDP/STP for quads
Disable LDP/STP for quads on Exynos M1 as they are not as efficient as pairs
of regular LDR/STR.

Patch by Abderrazek Zaafrani <a.zaafrani@samsung.com>.

llvm-svn: 266223
2016-04-13 18:31:45 +00:00
Tim Northover
f9f3caeceb AArch64: don't create instructions that write to xzr/wzr twice.
These are unpredictable even on AArch64.

Patch by Yichao Yu.

llvm-svn: 266206
2016-04-13 16:25:39 +00:00
Artem Tamazov
ead8b434de [AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA)git status
Tests added along with implemented feature.
Note that there is a small leftover of unecessary MI sheduling issue
(more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix
the false regression.

TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.

Differential Revision: http://reviews.llvm.org/D17825

llvm-svn: 266205
2016-04-13 16:18:41 +00:00
Zoran Jovanovic
9cd420ea3a [mips] Fix emitAtomicCmpSwapPartword to handle 64 bit pointers correctly
Differential Revision: http://reviews.llvm.org/D18995

llvm-svn: 266204
2016-04-13 16:02:25 +00:00
Vasileios Kalintiris
93f6871925 [mips] Sign-extend i32 values truncated from previously zero-extended i32 values.
Summary:
This is a special case for MIPS64 because the architecture requires
properly 32-bit sign-extended values in the register containers.

Additionaly, we merge consecutive trunc + AssertZExt nodes in order
to avoid unnecessary sign-extensions when the extension comes from a
type smaller than i32.

Reviewers: dsanders

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D18893

llvm-svn: 266203
2016-04-13 15:07:45 +00:00
Zlatko Buljan
b4097ad285 [mips][microMIPS] Add CodeGen support for DIV, MOD, DIVU, MODU, DDIV, DMOD, DDIVU and DMODU instructions
Differential Revision: http://reviews.llvm.org/D17137

This patch was reverted after the revertion of dependant patch http://reviews.llvm.org/D17068.
There was the problem with test-suite failure.
The problem is hopefully solved with dependant patch so this patch is commited again.

llvm-svn: 266179
2016-04-13 08:02:26 +00:00
Hrvoje Varga
acf02ee748 [mips][microMIPS] Fix for "Cannot copy registers" assertion
Differential Revision: http://reviews.llvm.org/D17068

This changes contains fix for failing test-suite. So, this patch should hopefully work now.

llvm-svn: 266171
2016-04-13 06:17:21 +00:00
Tom Stellard
7c8ab79409 AMDGPU/SI: Fix spilling of 96-bit registers
Summary:
It seems like this was broken in r252327.  I thought we had test cases
for this, but it's really hard to tirgger spills of this exact register
size since they aren't used very much.

Reviewers: arsenm, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19021

llvm-svn: 266152
2016-04-12 23:57:30 +00:00
Evandro Menezes
de85c02fba [AArch64] Fuse AES{D,E}/AESMC for Exynos M1. (NFC)
llvm-svn: 266144
2016-04-12 22:42:36 +00:00
David L Kreitzer
96a6a7cd40 Fixed a few typos and formatting problems. NFCI.
llvm-svn: 266135
2016-04-12 21:45:09 +00:00
Justin Bogner
7ca5d7c5c4 X86: Avoid accessing SDValues after they've been RAUW'd
This fixes two use-after-frees in selectLEA64_32Addr. If matchAddress
matches an ADD with an AND as an operand, and that AND hits one of the
"heroic transforms" that folds masks and shifts, we end up with N
pointing to an SDNode that was deleted. Make sure we're done accessing
it before that.

Found by ASan with the recycling allocator changes in llvm.org/PR26808.

llvm-svn: 266130
2016-04-12 21:34:24 +00:00
Nicolai Haehnle
30a743add7 AMDGPU: add llvm.amdgcn.buffer.load/store intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like
llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and
atomic counters at least when robust buffer access behavior is desired.
(These instructions perform no format conversion and do buffer range checking
per component.)

As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format,
it has become trivial to add support for the f32 and v2f32 variants of that
intrinsic, so the patch does so.

Also DAG-ify (and fix) some tests that I noticed intermittent failures in
while developing this patch.

Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects
changes to the BUFFER_STORE_DWORD* instructions. See also
http://reviews.llvm.org/D18291.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18292

llvm-svn: 266126
2016-04-12 21:18:10 +00:00
James Y Knight
a55b68a75e Add __atomic_* lowering to AtomicExpandPass.
(Recommit of r266002, with r266011, r266016, and not accidentally
including an extra unused/uninitialized element in LibcallRoutineNames)

AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.

This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.

Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.

This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.

It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.

At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.

Differential Revision: http://reviews.llvm.org/D18200

llvm-svn: 266115
2016-04-12 20:18:48 +00:00
Tom Stellard
ec17b58f39 AMDGPU/SI: Insert wait states required after v_readfirstlane on SI
Summary:
We will be able to handle this case much better once the hazard recognizer
is finished, but this conservative implementation  fixes a hang with the piglit
test:

spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra

Reviewers: arsenm, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18988

llvm-svn: 266105
2016-04-12 18:40:43 +00:00
Matt Arsenault
0740a1bead AMDGPU: Eliminate half of i64 or if one operand is zero_extend from i32
This helps clean up some of the mess when expanding unaligned 64-bit
loads when changed to be promote to v2i32, and fixes situations
where or x, 0 was emitted after splitting 64-bit ors during moveToVALU.

I think this could be a generic combine but I'm not sure.

llvm-svn: 266104
2016-04-12 18:24:38 +00:00
Sanjay Patel
1177075d7d fix indentation; NFC
llvm-svn: 266097
2016-04-12 18:01:48 +00:00
Nicolai Haehnle
be6e455946 AMDGPU/SI: Fix a mis-compilation of multi-level breaks
Summary:
Under certain circumstances, multi-level breaks (or what is understood by
the control flow passes as such) could be miscompiled in a way that causes
infinite loops, by emitting incorrect control flow intrinsics.

This fixes a hang in
dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18967

llvm-svn: 266088
2016-04-12 16:10:38 +00:00
Petar Jovanovic
29c4e0652e [mips] add assembler support for .set arch=octeon
This patch enables assembler support for .set arch=octeon.
It will fix issues with inline assembler when this directive is used.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D18548

llvm-svn: 266081
2016-04-12 15:28:16 +00:00
Matt Arsenault
007a459aa3 AMDGPU: Implement i64 global atomics
llvm-svn: 266075
2016-04-12 14:05:11 +00:00
Matt Arsenault
f100af354c AMDGPU: Add atomic_inc + atomic_dec intrinsics
These are different than atomicrmw add 1 because they have
an additional input value to clamp the result.

llvm-svn: 266074
2016-04-12 14:05:04 +00:00
Matt Arsenault
e47b7e7758 AMDGPU: Remove trailing whitespace
llvm-svn: 266073
2016-04-12 14:04:54 +00:00
Rafael Espindola
b04bc032f0 This reverts commit r266002, r266011 and r266016.
They broke the msan bot.

Original message:

Add __atomic_* lowering to AtomicExpandPass.

AtomicExpandPass can now lower atomic load, atomic store, atomicrmw,and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.

This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.

Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.

This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.

It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.

At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.

Differential Revision: http://reviews.llvm.org/D18200

llvm-svn: 266062
2016-04-12 12:30:25 +00:00
Simon Dardis
0af53c5560 Revert "[mips] MIPSR6 Compact branch aliases"
This reverts commit r266055.

ps4-buildslave2 is highlighting a failure.

llvm-svn: 266061
2016-04-12 12:22:45 +00:00
Jonas Paulsson
4156ee11bb [SystemZ] Use LDE32 instead of LE, when Offset is small.
On z13, if eliminateFrameIndex() chooses LE (and not LEY), immediately
transform that LE to LDE32 to avoid partial register dependencies.

LEY should be generally preferred for big offsets over an expansion
into LAY + LDE32.

Reviewed by Ulrich Weigand.

llvm-svn: 266060
2016-04-12 12:07:23 +00:00
Simon Dardis
d847d985b9 [mips] MIPSR6 Compact branch aliases
Summary:
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.

Reviewers: dsanders

Differential Revision: http://reviews.llvm.org/D18856

llvm-svn: 266055
2016-04-12 10:41:53 +00:00
Chuang-Yu Cheng
19e3905d51 [PPC64] Mark CR0 Live if PPCInstrInfo::optimizeCompareInstr Creates a Use of CR0
Resolve Bug 27046 (https://llvm.org/bugs/show_bug.cgi?id=27046).
The PPCInstrInfo::optimizeCompareInstr function could create a new use of
CR0, even if CR0 were previously dead. This patch marks CR0 live if a use of
CR0 is created.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng

http://reviews.llvm.org/D18884

llvm-svn: 266040
2016-04-12 03:10:52 +00:00
Chuang-Yu Cheng
83dddda4a1 [PPC64] Use mfocrf in prologue when we only need to save 1 nonvolatile CR field
In the ELFv2 ABI, we are not required to save all CR fields. If only one
nonvolatile CR field is clobbered, use mfocrf instead of mfcr to
selectively save the field, because mfocrf has short latency compares to
mfcr.

Thanks Nemanja's invaluable hint!
Reviewers: nemanjai tjablin hfinkel kbarton

http://reviews.llvm.org/D17749

llvm-svn: 266038
2016-04-12 03:04:44 +00:00
Matthias Braun
c7cce2e862 AArch64: Drive-by cleanup
llvm-svn: 266035
2016-04-12 02:16:13 +00:00
Tim Northover
83aa2384f4 ARM: use r7 as the frame-pointer on all MachO targets.
This is better for a few reasons:
  + It matches the other tooling for iOS.
  + It matches EABI in more cases (i.e. Thumb-mode, and in practice we don't
    use ARM mode).
  + It leads to infinitesimally smaller code (0.2%, yay!).

rdar://25369506

llvm-svn: 266003
2016-04-11 22:27:40 +00:00
James Y Knight
003ee915ba Add __atomic_* lowering to AtomicExpandPass.
AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.

This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.

Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.

This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.

It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.

At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.

Differential Revision: http://reviews.llvm.org/D18200

llvm-svn: 266002
2016-04-11 22:22:33 +00:00
Manman Ren
e6636caf43 Swift Calling Convention: swifterror target support.
Differential Revision: http://reviews.llvm.org/D18716

llvm-svn: 265997
2016-04-11 21:08:06 +00:00
Tom Stellard
2898e45c97 Revert "AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute"
This reverts commit r263720.

Just confirmed that s_waitcnt is required after ds_permute/ds_bpermute.

llvm-svn: 265992
2016-04-11 20:38:40 +00:00
Hans Wennborg
f1f7a0ce15 Fix broken assert, PR24624
llvm-svn: 265989
2016-04-11 20:35:41 +00:00
Sriraman Tallam
0bd1bfc81e Test commit.
llvm-svn: 265976
2016-04-11 18:40:50 +00:00
Petar Jovanovic
a8eb1f20c0 [mips] Make Static a default relocation model for MIPS codegen
This change follows up defaults for GCC and Clang, so LLVM does not differ
from them. While number of the test files are touched with this change, they
all keep the old (expected) behaviour with the explicit option:
"-relocation-model=pic"
The tests that have not been touched are insensitive to relocation model.

Differential Revision: http://reviews.llvm.org/D17995

llvm-svn: 265949
2016-04-11 15:24:23 +00:00
Daniel Sanders
d5e0309560 [mips] Trivial corrections to range checked immediates.
Summary:
SYNC has a 5-bit unsigned immediate.
Move MIPS16-specific pcrel16 operand to Mips16 files.

Reviewers: vkalintiris

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D18755

llvm-svn: 265947
2016-04-11 15:20:40 +00:00
Ulrich Weigand
28d926d1e5 [SystemZ] README: remove an implemented idea, add some new ones
The note about conditional returns can now be removed, as they are
implemented. Let's also add 2 new ones in exchange.

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18962

llvm-svn: 265944
2016-04-11 14:38:47 +00:00
Ulrich Weigand
71be6fad91 [SystemZ] Add SVC instruction
This is going to be useful for inline assembly only.

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18952

llvm-svn: 265943
2016-04-11 14:35:39 +00:00
Oliver Stannard
7248f4417e [ARM] Avoid switching ARM/Thumb mode on .arch/.cpu directive
When we see a .arch or .cpu directive, we should try to avoid switching
ARM/Thumb mode if possible.

If we do have to switch modes, we also need to emit the correct mapping
symbol for the new ISA. We did not do this previously, so could emit
ARM code with Thumb mapping symbols (or vice-versa).

The GAS behaviour is to always stay in the same mode, and to emit an
error on any instructions seen when the current mode is not available on
the current target. We can't represent that situation easily (we assume
that Thumb mode is available if ModeThumb is set), so we differ from the
GAS behaviour when switching to a target that can't support the old
mode. I've added a warning for when this implicit mode-switch occurs.

Differential Revision: http://reviews.llvm.org/D18955

llvm-svn: 265936
2016-04-11 13:06:28 +00:00
Ulrich Weigand
8612094f9a [SystemZ] Support conditional indirect sibling calls via BCR
This adds a conditional variant of CallBR instruction, CallBCR. Also,
it can be fused with integer comparisons, resulting in one of the new
C*BCall instructions.

In addition to CallBRCL limitations, this has another one: it won't
trigger if the function to call isn't already in %r1 - see f22 in the
test for an example (it's also why the loads in tests are volatile).

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18928

llvm-svn: 265933
2016-04-11 12:12:32 +00:00
Ulrich Weigand
8f03ec0d11 [SystemZ] Remove incorrect CC use for C*BReturn instructions
These are fused compare-and-branches, so they obviously don't use CC.

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18927

llvm-svn: 265932
2016-04-11 12:03:30 +00:00
Andrey Turetskiy
79cd7e75f9 [X86] Restrict max long nop length for Lakemont.
Restrict the max length of long nops for Lakemont to 7. Experiments on MCU
benchmarks (Dhrystone, Coremark) show that this is the most optimal length.

Differential Revision: http://reviews.llvm.org/D18897

llvm-svn: 265924
2016-04-11 10:07:36 +00:00
Simon Pilgrim
3b0d269398 [X86][AVX512BW] Add support for v64i8 multiplies
Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets.

I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL.

Differential Revision: http://reviews.llvm.org/D18937

llvm-svn: 265902
2016-04-10 17:02:48 +00:00
Craig Topper
9ff7c33220 [X86] Use for loops over types to reduce code for setting up operation actions.
llvm-svn: 265893
2016-04-10 05:39:32 +00:00
Craig Topper
466e9e81be [X86] Remove unnecessary setOperationAction for SRA v2i64/v4i64 when VLX is suppored. This is already done for SSE2/AVX2 which VLX implies. NFC
llvm-svn: 265892
2016-04-10 05:39:28 +00:00
Davide Italiano
5f21656c16 [MC] support TLSDESC and TLSCALL / GNU2 tls dialect
Differential Revision:  http://reviews.llvm.org/D18885

llvm-svn: 265881
2016-04-09 20:32:33 +00:00
Sanjay Patel
ea5cc7e72d [x86] use BMI 'andn' for logic + compare ops
With BMI, we can use 'andn' to save an instruction when the result is only used in a compare.
This is related to one of the potential sequences to check 'isfinite' in:
https://llvm.org/bugs/show_bug.cgi?id=27164

Differential Revision: http://reviews.llvm.org/D18910

llvm-svn: 265875
2016-04-09 16:02:52 +00:00
Simon Pilgrim
c8f7b4ec60 [X86][XOP] Support for VPPERM 2-input shuffle mask decoding
This patch adds support for decoding XOP VPPERM instruction when it represents a basic shuffle.

The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp.

Differential Revision: http://reviews.llvm.org/D18441

llvm-svn: 265874
2016-04-09 14:51:26 +00:00
Craig Topper
4d29234fe1 [X86] Use for loops over types to reduce code for setting up operation actions. NFC
llvm-svn: 265871
2016-04-09 06:31:02 +00:00
Craig Topper
cf77b72f78 [X86] Remove calls to setOperationAction that set CTLZ_ZERO_UNDEF for some vector types to Expand. Expand is already set for all operations for all vector types earlier so this is redundant. NFC
llvm-svn: 265870
2016-04-09 05:53:48 +00:00
Tim Shen
8cac1d5c28 [SSP] Remove llvm.stackprotectorcheck.
This is a cleanup patch for SSP support in LLVM. There is no functional change.
llvm.stackprotectorcheck is not needed, because SelectionDAG isn't
actually lowering it in SelectBasicBlock; rather, it adds check code in
FinishBasicBlock, ignoring the position where the intrinsic is inserted
(See FindSplitPointForStackProtector()).

llvm-svn: 265851
2016-04-08 21:26:31 +00:00
Hans Wennborg
136b627c46 Rangeify a loop. NFC.
llvm-svn: 265846
2016-04-08 20:46:09 +00:00
Hans Wennborg
c3f3a07456 Remove some redundant variables from X86TargetLowering::LowerDYNAMIC_STACKALLOC
These are already defined, with the same values, a few lines up. NFC.

llvm-svn: 265845
2016-04-08 20:46:00 +00:00
Kevin B. Smith
5defc888fe [X86] Fix PR23155 by turning on X86FixupBWInsts by default.
Differential Revision: http://reviews.llvm.org/D18866

llvm-svn: 265830
2016-04-08 18:58:29 +00:00
Ulrich Weigand
7102a6833f [SystemZ] Support conditional sibling calls via BRCL
This adds a conditional variant of CallJG instruction, CallBRCL.
It can be used for conditional sibling calls. Unfortunately, due
to IfCvt limitations, it only really works well for functions without
arguments.

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18864

llvm-svn: 265814
2016-04-08 17:22:19 +00:00
Sam Parker
e40fc81f76 [ARM] Enable SMLAW[B|T] and SMLUW[B|T] instruction selection
Added ISelDAGToDAG functions to enable selection of the smlawb, smlawt,
smulwb and smulwt instructions for the ARM backend. Also updated the smul
CodeGen test and removed the smulw one.

Differential Revision: http://reviews.llvm.org/D18892

llvm-svn: 265793
2016-04-08 16:02:53 +00:00
Simon Pilgrim
4abee08f8b [X86] Tidied up shuffle decode function doxygen descriptions
As discussed on D18441 - auto brief is used so we don't need /brief, we don't need to include the function name and added some missing descriptions.

llvm-svn: 265785
2016-04-08 14:17:07 +00:00
Chuang-Yu Cheng
6e4b4f696f CXX_FAST_TLS calling convention: performance improvement for PPC64
This is the same change on PPC64 as r255821 on AArch64. I have even borrowed
his commit message.

The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.

We add CSRsViaCopy, it will be explicitly handled during lowering.

1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given machine function and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng

http://reviews.llvm.org/D17533

llvm-svn: 265781
2016-04-08 12:04:32 +00:00
Vasileios Kalintiris
3afd53d9e5 [mips] Use range-based for loops. NFC.
llvm-svn: 265780
2016-04-08 10:33:00 +00:00
Zlatko Buljan
a49b48852d [mips][microMIPS] Add CodeGen support for ADD, ADDIU*, ADDU* and DADD* instructions
Differential Revision: http://reviews.llvm.org/D16454

llvm-svn: 265772
2016-04-08 07:27:26 +00:00
Quentin Colombet
2aebd873f1 [AArch64] Fix a typo in the register class to register bank mapping.
For GPR family we want the GPR register bank, not FPR!

llvm-svn: 265743
2016-04-07 23:10:14 +00:00
Quentin Colombet
fe5d2f01f8 [AArch64] Get rid of some GlobalISel ifdefs.
llvm-svn: 265725
2016-04-07 21:24:40 +00:00
Quentin Colombet
539cebbb04 [AArch64] gcc does not like litteral without quotes even on preprocessor macros.
llvm-svn: 265720
2016-04-07 20:49:15 +00:00
Quentin Colombet
ec63533e26 [AArch64][CallLowering] Do not build the API if GlobalISel is not built.
This gets rid of some ifdefs and dummy implementations that were here
just to fill the blanks.

llvm-svn: 265719
2016-04-07 20:47:51 +00:00
Quentin Colombet
17164e059e [GlobalISel] Add RegBankSelect hooks into the pass pipeline.
Now, RegBankSelect will happen after the IRTranslation and the target
may optionally add additional passes in between.

llvm-svn: 265716
2016-04-07 20:27:33 +00:00
Jan Vesely
e71f70898a AMDGPU/SI: Implement atomic load/store for i32 and i64
Standard load/store instructions with GLC bit set.

Reviewers: tstellardAMD, arsenm

Differential Revision: http://reviews.llvm.org/D18760

llvm-svn: 265709
2016-04-07 19:23:11 +00:00
Tom Stellard
4a205248ea AMDGPU/SI: Add latency for export instructions
Reviewers: arsenm, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18599

llvm-svn: 265708
2016-04-07 18:30:05 +00:00
Quentin Colombet
2313f65666 [RegisterBank] Rename RegisterBank::contains into RegisterBank::covers.
llvm-svn: 265695
2016-04-07 17:09:39 +00:00
Ulrich Weigand
be976ed2cc [SystemZ] Fix build break from r265689
Fix build error seen on some build bots due to:
error: default label in switch which covers all enumeration values

llvm-svn: 265693
2016-04-07 16:33:25 +00:00
Kevin B. Smith
b1d0008007 [X86]: Fix for PR27251.
Differential Revision: http://reviews.llvm.org/D18850

llvm-svn: 265690
2016-04-07 16:15:34 +00:00
Ulrich Weigand
80d5f68422 [SystemZ] Implement conditional returns
Return is now considered a predicable instruction, and is converted
to a newly-added CondReturn (which maps to BCR to %r14) instruction by
the if conversion pass.

Also, fused compare-and-branch transform knows about conditional
returns, emitting the proper fused instructions for them.

This transform triggers on a *lot* of tests, hence the huge diffstat.
The changes are mostly jX to br %r14 -> bXr %r14.

Author: koriakin

Differential Revision: http://reviews.llvm.org/D17339

llvm-svn: 265689
2016-04-07 16:11:44 +00:00
Ehsan Amiri
36d2ad6539 [PPC] Enable transformations in PPCPassConfig::addIRPasses at O2
http://reviews.llvm.org/D18562

A large number of testcases has been modified so they pass after this test.
One testcase is deleted, because I realized even after undoing the original
change that was committed with this testcase, the testcase still passes. So
I removed it. The change to one other testcase (test/CodeGen/PowerPC/pr25802.ll)
is an arbitrary change to keep it passing. Given the original intention of the
testcase, and the fact that fixing it will require some time to change the testcase,
we concluded that this quick change will be enough.

llvm-svn: 265683
2016-04-07 15:30:55 +00:00
Tom Stellard
b255b03243 AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStates
Summary: This makes it possible to insert nops at the end of blocks.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18549

llvm-svn: 265678
2016-04-07 14:47:07 +00:00
Valery Pykhtin
1971061641 [AMDGPU] fix readlane/readfirstlane src vgpr operand type.
For VGPR_32 operand disassembler expects a VGPR register encoded as 0..255 (enum8 src operand).
readfirstlane/readline actually has enum9 operand and this change fixes VGPR_32 to VS_32 (enum9 encoding).

Differential Revision: http://reviews.llvm.org/D18696

llvm-svn: 265670
2016-04-07 13:41:51 +00:00
Benjamin Kramer
5cf908079f Make helper functions static. NFC.
llvm-svn: 265653
2016-04-07 10:10:09 +00:00
Simon Pilgrim
2e3347a5a5 [X86][SSE] Add support for VZEXT constant folding
llvm-svn: 265646
2016-04-07 07:52:45 +00:00
Ahmed Bougacha
d78960d142 [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
Re-apply r265450 which caused PR27245 and was reverted in r265559
because of a wrong generalization: the fetch_and_add->add_and_fetch
combine only works in specific, but pretty common, cases:
  (icmp slt x, 0) -> (icmp sle (add x, 1), 0)
  (icmp sge x, 0) -> (icmp sgt (add x, 1), 0)
  (icmp sle x, 0) -> (icmp slt (sub x, 1), 0)
  (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0)

Original Message:

We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.

Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.

This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).

This uses the LOCK ISD opcodes added by r262244.

Differential Revision: http://reviews.llvm.org/D17633

llvm-svn: 265636
2016-04-07 02:07:10 +00:00
Quentin Colombet
29f0c6f517 [AArch64] Teach RegisterBankInfo about the CC register bank.
We need to cover each register class with a register bank.

llvm-svn: 265629
2016-04-07 00:39:29 +00:00
Quentin Colombet
be98b33b17 [AArch64] Teach RegisterBankInfo about the mapping of register classes
on register banks.

llvm-svn: 265626
2016-04-07 00:14:30 +00:00
Hans Wennborg
3add36fb90 Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
Third time's the charm? The previous attempt (r265345) caused ASan test
failures on X86, as broken CFI caused stack traces to not work.

This version of the patch makes sure not to merge with stack adjustments
that have CFI, and to not add merged instructions' offests to the CFI
about to be generated.

This is already covered by the lit tests; I just got the expectations
wrong previously.

llvm-svn: 265623
2016-04-07 00:05:49 +00:00
JF Bastien
f4f5b32f44 NFC: make AtomicOrdering an enum class
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.

The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).

This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.

As a follow-up I'll add:
  bool operator<(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>(AtomicOrdering, AtomicOrdering) = delete;
  bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.

Reviewers: jyknight, reames

Subscribers: jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D18775

llvm-svn: 265602
2016-04-06 21:19:33 +00:00
Ehsan Amiri
ee69c81e47 [PPC] Use VSX/FP Facility integer load when an integer load's only users are conversion to FP
http://reviews.llvm.org/D18405

When the integer value loaded is never used directly as integer we should use VSX 
or Floating Point Facility integer loads and avoid extra direct move

llvm-svn: 265593
2016-04-06 20:12:29 +00:00
James Y Knight
82e732e95b Put quotes around #error string.
GCC reports "missing terminating ' character", even when it's being
skipped by preprocessing.

llvm-svn: 265590
2016-04-06 19:52:32 +00:00
Nicolai Haehnle
69b2d0adeb AMDGPU: Add a shader calling convention
This makes it possible to distinguish between mesa shaders
and other kernels even in the presence of compute shaders.

Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Differential Revision: http://reviews.llvm.org/D18559

llvm-svn: 265589
2016-04-06 19:40:20 +00:00
Quentin Colombet
f79a85c911 [AArch64] Change the CMake to avoid to build GlobalISel related APIs
when GISel is not built.
The positive side effects are:
- We do not have to define dummy implementation
- We do not have to do weird gymnastic to avoid like issues (like
  missing constructor or vtable for the base classes)

llvm-svn: 265570
2016-04-06 17:38:12 +00:00
Quentin Colombet
c5fc893533 [AArch64] Teach the subtarget how to get to the RegisterBankInfo.
Rework the access to GlobalISel APIs to contain how much of
the APIs we need to access for the final executable to build when
GlobalISel is not built.

This prevents massive usage of ifdefs in various places. Now, all the
GlobalISel ifdefs will be happing only in AArch64TargetMachine.cpp.

llvm-svn: 265567
2016-04-06 17:26:03 +00:00
Hans Wennborg
f357d442d6 Revert r265450 "[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC."
It caused ASan 32-bit tests to hang (PR27245).

llvm-svn: 265559
2016-04-06 16:44:38 +00:00
Hans Wennborg
657e738668 Revert "Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)""
It seems to be causing ASan tests to crash, probably due to
miscompiling the run-time somehow.

llvm-svn: 265551
2016-04-06 16:10:20 +00:00
Quentin Colombet
b6ea381746 [AArch64] Use the default constructor of RegisterBankInfo when GlobalISel is not built.
This will avoid link-time error as the defautl constructor of RegisterBankInfo is
the only one available when GlobalISel is not built.

llvm-svn: 265549
2016-04-06 15:53:13 +00:00
Sam Kolton
518916f9b4 [AMDGPU] AsmParser: disable DPP for unsupported instructions. New dpp tests. Fix v_nop_dpp.
Summary:
1. Disable DPP encoding for instructions that do not support it:
    - VOP1:
        - v_readfirstlane_b32
        - v_clrexcp
        - v_movreld_b32
        - v_movrels_b32
        - v_movrelsd_b32
    - VOP2:
        - v_madmk_f16/32
        - v_madak_f16/32
    - VOPC, VINTRP, VOP3
2. Fix DPP for v_nop
3. New DPP tests for VOP1 and VOP2 instructions

Reviewers: nhaustov, tstellarAMD, vpykhtin

Subscribers: tstellarAMD, arsenm

Differential Revision: http://reviews.llvm.org/D18552

llvm-svn: 265538
2016-04-06 13:29:59 +00:00
Evgeny Astigeevich
f7b1463cab [AArch64][CodeGen] NFC refactor AArch64InstrInfo::optimizeCompareInstr to prepare it for fixing a bug in it
AArch64InstrInfo::optimizeCompareInstr has a bug which causes generation of incorrect code (PR#27158).
The patch refactors the function to simplify reviewing the fix of the bug.

1. Function name ‘modifiesConditionCode’ is changed to ‘areCFlagsAccessedBetweenInstrs’
   to reflect that the function can check modifying accesses, reading accesses or both.
2. Function ‘AArch64InstrInfo::optimizeCompareInstr’
   - Documented the function
   - Cmp_NZCV is DeadNZCVIdx to reflect that it is an operand index of dead NZCV
   - The code for the case of substituting CmpInstr is put into separate
     functions the main of them is ‘substituteCmpInstr’.

Differential Revision: http://reviews.llvm.org/D18609

llvm-svn: 265531
2016-04-06 11:39:00 +00:00
Chuang-Yu Cheng
a5a77bb95c [ppc64] Temporary disable sibling call optimization on ppc64 due to breaking test case
r265506 breaks print-stack-trace.cc test case of compiler-rt in bootstrap
test.

http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/1708

llvm-svn: 265528
2016-04-06 10:48:36 +00:00
Matthias Braun
1dd3b523a9 AArch64: Fix compile error
Fixed to adapt a use of enterBasicBlock() in my last commit (because I
had follow on patches in my repository that change the code).

llvm-svn: 265513
2016-04-06 02:59:44 +00:00
Matthias Braun
8623ebcaee RegisterScavenger: Take a reference as enterBasicBlock() argument.
Make it obvious that the argument cannot be nullptr.
Remove an unnecessary nullptr check in initRegState.

llvm-svn: 265511
2016-04-06 02:47:09 +00:00
Chuang-Yu Cheng
de5217e247 [ppc64] Enable sibling call optimization on ppc64 ELFv1/ELFv2 abi
This patch enable sibling call optimization on ppc64 ELFv1/ELFv2 abi, and
add a couple of test cases. This patch also passed llvm/clang bootstrap
test, and spec2006 build/run/result validation.

Original issue: https://llvm.org/bugs/show_bug.cgi?id=25617

Great thanks to Tom's (tjablin) help, he contributed a lot to this patch.
Thanks Hal and Kit's invaluable opinions!

Reviewers: hfinkel kbarton

http://reviews.llvm.org/D16315

llvm-svn: 265506
2016-04-06 02:04:38 +00:00
Chuang-Yu Cheng
45f8677a56 [Power9] Implement add-pc, multiply-add, modulo, extend-sign-shift, random number, set bool, and dfp test significance
This patch implement the following instructions:
- addpcis subpcis
- maddhd maddhdu maddld
- modsw moduw modsd modud
- darn
- extswsli extswsli.
- setb
- dtstsfi dtstsfiq

Total 15 instructions

Reviewers: nemanjai hfinkel tjablin amehsan kbarton

http://reviews.llvm.org/D17885

llvm-svn: 265505
2016-04-06 01:47:02 +00:00
Chuang-Yu Cheng
dff7434b0f [Power9] Implement copy-paste, msgsync, slb, and stop instructions
This patch implements the following BookII and Book III instructions:
- copy copy_first cp_abort paste paste. paste_last
- msgsync
- slbieg slbsync
- stop

Total 10 instructions

Reviewers: nemanjai hfinkel tjablin amehsan kbarton
llvm-svn: 265504
2016-04-06 01:46:45 +00:00
NAKAMURA Takumi
59391eddda AArch64CodeGen: Make AArch64RegisterBankInfo.cpp optional along LLVM_BUILD_GLOBAL_ISEL.
llvm-svn: 265499
2016-04-06 01:18:08 +00:00
Quentin Colombet
5e6846e56c [AArch64] Initial implementation of the targeting of the register bank information.
llvm-svn: 265489
2016-04-05 23:34:59 +00:00
Manman Ren
6e8ec540e6 Swift Calling Convention: swiftcc for ARM.
Differential Revision: http://reviews.llvm.org/D18769

llvm-svn: 265482
2016-04-05 22:44:44 +00:00
Evgeniy Stepanov
c80f414fe8 Faster stack-protector for Android/AArch64.
Bionic has a defined thread-local location for the stack protector
cookie. Emit a direct load instead of going through __stack_chk_guard.

llvm-svn: 265481
2016-04-05 22:41:50 +00:00
Manman Ren
d8f96bea63 Swift Calling Convention: add swiftcc.
Differential Revision: http://reviews.llvm.org/D17863

llvm-svn: 265480
2016-04-05 22:41:47 +00:00
Duncan P. N. Exon Smith
da537cb806 IR: Introduce ConstantAggregate, NFC
Add a common parent class for ConstantArray, ConstantVector, and
ConstantStruct called ConstantAggregate.  These are the aggregate
subclasses of Constant that take operands.

This is mainly a cleanup, adding common `isa` target and removing
duplicated code.  However, it also simplifies caching which constants
point transitively at `GlobalValue` (a possible future direction).

llvm-svn: 265466
2016-04-05 21:10:45 +00:00
Duncan P. N. Exon Smith
5f366e4aa0 Revert "Fix Clang-tidy modernize-deprecated-headers warnings in remaining files; other minor fixes."
This reverts commit r265454 since it broke the build.  E.g.:

  http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_build/22413/

llvm-svn: 265459
2016-04-05 20:45:04 +00:00
Eugene Zelenko
a612bac11f Fix Clang-tidy modernize-deprecated-headers warnings in remaining files; other minor fixes.
Some Include What You Use suggestions were used too.

Use anonymous namespaces in source files.

Differential revision: http://reviews.llvm.org/D18778

llvm-svn: 265454
2016-04-05 20:19:49 +00:00
Ahmed Bougacha
6921e4df62 [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.

Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.

This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).

This uses the LOCK ISD opcodes added by r262244.

Differential Revision: http://reviews.llvm.org/D17633

llvm-svn: 265450
2016-04-05 20:02:57 +00:00
Ahmed Bougacha
0d71e0a51a [X86] Simplify early-exit check. NFC.
llvm-svn: 265447
2016-04-05 20:02:22 +00:00
Sanjay Patel
7507aa6e81 fix typo; NFC
llvm-svn: 265442
2016-04-05 19:27:39 +00:00
Jacques Pienaar
6e9736dc21 [lanai] LanaiSetflagAluCombiner more conservative
Summary: LanaiSetflagAluCombiner could previously combine instructions across basic building blocks even when not legal. Make the LanaiSetflagAluCombiner more conservative to avoid this.

Reviewers: eliben

Subscribers: joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D18746

llvm-svn: 265411
2016-04-05 16:18:13 +00:00
Sam Parker
fc1ce7d191 [ARM] Cleanup of smul and smla instruction descriptions
Removed the SDNode argument passed to the AI_smul and AI_smla multiclass
definitions as they are always mul.

Differential Revision: http://reviews.llvm.org/D18791

llvm-svn: 265409
2016-04-05 16:01:25 +00:00
Konstantin Zhuravlyov
57ddfb8bca [AMDGPU] Emit linkonce and linkonce_odr symbols
Differential Revision: http://reviews.llvm.org/D18726

llvm-svn: 265408
2016-04-05 16:00:58 +00:00
Simon Dardis
edd0920371 [mips] MIPSR6 Compact jump support
This patch adds support for compact jumps similiar to the previous compact
branch support for MIPSR6. Unlike compact branches, compact jumps do not
have a forbidden slot.

As MipsInstrInfo::getEquivalentCompactForm can determine the correct
expansion for jumps and branches for both microMIPS and MIPSR6, remove the
unnecessary distinction in the delay slot filler.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders
llvm-svn: 265390
2016-04-05 12:50:29 +00:00
Justin Holewinski
234f7b3e41 [NVPTX] Handle ldg created from sign-/zero-extended load
Reviewers: jingyue

Subscribers: jholewinski

Differential Revision: http://reviews.llvm.org/D18053

llvm-svn: 265389
2016-04-05 12:38:01 +00:00
JF Bastien
41b74c35a6 Lanai: fix -Wsign-compare warning
llvm-svn: 265368
2016-04-05 00:20:27 +00:00
JF Bastien
a008c37e9f Lanai: fix -Wpedantic warnings
Extra semicolon.

llvm-svn: 265365
2016-04-04 23:47:30 +00:00
Hans Wennborg
fa531d7979 Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
The original commit miscompiled things on 32-bit Windows, e.g. a Clang
boostrap. It turns out that mergeSPUpdates() was a bit too generous in
what it interpreted as a stack adjustment, causing the following code:

        addl    $12, %esp
        leal    -4(%ebp), %esp

To be "optimized" into simply:

        addl    $8, %esp

This commit tightens up mergeSPUpdates() and includes a new test
(test14 in movtopush.ll) for this situation.

llvm-svn: 265345
2016-04-04 21:02:46 +00:00
Matthias Braun
9984790824 ARM, AArch64, X86: Check preserved registers for tail calls.
We can only perform a tail call to a callee that preserves all the
registers that the caller needs to preserve.

This situation happens with calling conventions like preserver_mostcc or
cxx_fast_tls. It was explicitely handled for fast_tls and failing for
preserve_most. This patch generalizes the check to any calling
convention.

Related to rdar://24207743

Differential Revision: http://reviews.llvm.org/D18680

llvm-svn: 265329
2016-04-04 18:56:13 +00:00
Derek Schuff
e26993d076 Add MachineFunctionProperty checks for AllVRegsAllocated for target passes
Summary:
This adds the same checks that were added in r264593 to all
target-specific passes that run after register allocation.

Reviewers: qcolombet

Subscribers: jyknight, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18525

llvm-svn: 265313
2016-04-04 17:09:25 +00:00
Daniel Sanders
f97e8263b1 [mips] Range check simm32 and fold MIPS16's imm32 into simm32.
Summary:
At this point we should be able to enable IAS by default for O32 without
breaking check-all, or recursion.

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18439

llvm-svn: 265302
2016-04-04 15:32:49 +00:00
Ulrich Weigand
73db4ab07d [SystemZ] Add compare-and-branch instructions to MC
This adds MC support for fused compare + indirect branch instructions,
ie. CRB, CGRB, CLRB, CLGRB, CIB, CGIB, CLIB, CLGIB. They aren't actually
generated yet -- this is preparation for their use for conditional
returns in the next iteration of D17339.

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18742

llvm-svn: 265296
2016-04-04 14:26:43 +00:00
Ulrich Weigand
ef8c006d34 [SystemZ] Support ATOMIC_FENCE
A cross-thread sequentially consistent fence should be lowered into
z/Architecture's BCR serialization instruction, instead of causing a
fatal error in the back-end.

Author: bryanpkc
Differential Revision: http://reviews.llvm.org/D18644

llvm-svn: 265292
2016-04-04 12:45:44 +00:00
Ulrich Weigand
c0457f68d4 [SystemZ] Support llvm.frameaddress/llvm.returnaddress intrinsics
Enable the SystemZ back-end to lower FRAMEADDR and RETURNADDR, which
previously would cause the back-end to crash.  Currently, only a
frame count of zero is supported.

Author: bryanpkc
Differential Revision: http://reviews.llvm.org/D18514

llvm-svn: 265291
2016-04-04 12:44:55 +00:00
Elena Demikhovsky
c8ec20583d AVX-512: Truncating store for i1 vectors
Implemented truncstore for KNL and skylake-avx512.
Covered vectors from v2i1 to v64i1. We save the value in bits (not in bytes) - v32i1 is saved in 4 bytes.

Differential Revision: http://reviews.llvm.org/D18740

llvm-svn: 265283
2016-04-04 07:17:47 +00:00
Simon Pilgrim
892accb72e [X86] Removed duplicate code.
llvm-svn: 265274
2016-04-03 20:40:35 +00:00
Simon Pilgrim
9aa00086fb [X86][SSE] Support for MOVMSK signbit extraction instructions
Add support for lowering with the MOVMSK instruction to extract vector element signbits to a GPR.

This is an early step towards more optimal handling of vector comparison results.

Differential Revision: http://reviews.llvm.org/D18741

llvm-svn: 265266
2016-04-03 18:22:03 +00:00
Simon Pilgrim
6ec9cf7d9d [X86] Tidied up X86ISD instruction nodes. NFCI.
Tidied up comments, stripped trailing whitespace, split apart nodes that aren't related.

No change in ordering although there is definitely some scope for it.

llvm-svn: 265263
2016-04-03 14:14:32 +00:00
Elena Demikhovsky
b1c5cedd43 AVX-512: Load and Extended Load for i1 vectors
Implemented load+{sign|zero}_extend for i1 vectors
Fixed failures in i1 vector load.
Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX.

Differential Revision: http://reviews.llvm.org/D18737

llvm-svn: 265259
2016-04-03 08:41:12 +00:00
Jacques Pienaar
6295e51827 [lanai] Fix for LanaiDelaySlotFiller and LanaiMCInstLower.cpp
Summary:
* Fix to stop delay slot filler from inserting SP modifying instructions in the newly expanded call/return instructions.
* In LowerSymbol the outermost type was not LanaiMCExpr if there was a binary expression
* Remove printExpr in LanaiInstPrinter

Subscribers: joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D18734

llvm-svn: 265251
2016-04-03 00:49:27 +00:00
Zoran Jovanovic
9cae055d65 [mips][microMIPS] Revert commits r264245 and r264248.
Commit r264245 was the reason for failing tests in LLVM test suite.
Commit r264248 depends on the first one.

llvm-svn: 265249
2016-04-02 23:06:13 +00:00
Saleem Abdulrasool
f5d9036758 AArch64: support .cpu directive
Add support for the AArch64 .cpu directive.  This is a slightly involved
directive since the parameter is actually a variable encoded string.  The
general structure is:

  <cpu>[[+-]<feature>]*

We now map some of the supported string names for features for internal
representation of feature flags.  If we encounter one which we do not support,
bail out as we cannot validate the assembly any longer.

Resolves PR27010.

llvm-svn: 265240
2016-04-02 19:29:52 +00:00
Tim Northover
8b6499026f AArch64: avoid clobbering SP for dead MOVimm pseudos.
We were producing ORR, which actually defines a GPR32sp rather than a GPR32.

Should fix PR23209.

llvm-svn: 265198
2016-04-01 23:14:52 +00:00
James Y Knight
9ad0ccef43 Remove useless check for ThreadModel==Single in ARMISelLowering. NFC.
ThreadModel::Single is already handled already by ARMPassConfig adding
LowerAtomicPass to the pass list, which lowers all atomics to non-atomic
ops and deletes fences.

So by the time we get to ISel, there's no atomic fences left, so they
don't need special handling.

llvm-svn: 265178
2016-04-01 19:33:19 +00:00
Tom Stellard
07e93d8663 AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2}
Summary:
Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+.

32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý.

Patch by: Vedran Miletić

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: jvesely, scchan, kanarayan, arsenm

Differential Revision: http://reviews.llvm.org/D17280

llvm-svn: 265170
2016-04-01 18:27:37 +00:00
Sanjay Patel
638944f122 [x86] avoid intermediate splat for non-zero memsets (PR27100)
Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676 -
where we noticed that an intermediate splat was being generated for memsets of
non-zero chars.

That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.

The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.

Note that the SSE1 path is not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.

llvm-svn: 265161
2016-04-01 17:36:45 +00:00
Chad Rosier
0e0548bc28 [AArch64] Fix a typo. NFC.
llvm-svn: 265160
2016-04-01 17:34:38 +00:00
Sanjay Patel
c25235169d [x86] avoid intermediate splat for non-zero memsets (PR27100)
Follow-up to D18566 - where we noticed that an intermediate splat was being
generated for memsets of non-zero chars.

That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.

The tests that were added in the last patch are now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.
In the new tests, the splat via shuffling looks ok to me, but there might be some
room for improvement depending on uarch there.

Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.

Differential Revision: http://reviews.llvm.org/D18676

llvm-svn: 265148
2016-04-01 16:27:14 +00:00
Valery Pykhtin
33bd0c0f43 [AMDGPU] fix MADAK/MADMK instructions operand namings to match encoding fields.
$vsrc1 -> $src1, $k -> $imm

Differential Revision: http://reviews.llvm.org/D18659

llvm-svn: 265141
2016-04-01 13:13:12 +00:00
Andrea Di Biagio
a30a690d23 [x86] Remove redundant call to setTargetDAGCombine for BUILD_VECTOR node type.
Since revision 235394, we no longer perform target specific combines on
build_vector nodes. No functional change intended.

llvm-svn: 265138
2016-04-01 12:25:44 +00:00
Sagar Thakur
fe28f3a1b9 [MIPS][LLVM-MC] Fix JR encoding for MIPSR6 ISA
Summary: The assembler was picking the wrong JR variant because the pre-R6 one was still enabled at R6.

Author: nitesh.jain
Reviewers: vkalintiris, dsanders
Subscribers: dsanders, llvm-commits, mohit.bhakkad, sagar, bhushan, jaydeep
Differential: D18387
llvm-svn: 265134
2016-04-01 11:55:33 +00:00
Andrey Turetskiy
02a9d74413 [X86] Introduce Lakemont CPU.
Add a new Intel MCU CPU Lakemont, which doesn't support X87.

Differential Revision: http://reviews.llvm.org/D18650

llvm-svn: 265128
2016-04-01 10:16:15 +00:00
James Molloy
69a7afba11 Fix for pr24346: arm asm label calculation error in sub
Some ARM instructions encode 32-bit immediates as a 8-bit integer (0-255)
and a 4-bit rotation (0-30, even) in its least significant 12 bits. The
original fixup, FK_Data_4, patches the instruction by the value bit-to-bit,
regardless of the encoding. For example, assuming the label L1 and L2 are
0x0 and 0x104 respectively, the following instruction:

  add r0, r0, #(L2 - L1) ; expects 0x104, i.e., 260

would be assembled to the following, which adds 1 to r0, instead of 260:

  e2800104 add r0, r0, #4, 2 ; equivalently 1

The new fixup kind fixup_arm_mod_imm takes care of the encoding:

  e2800f41 add r0, r0, #260

Patch by Ting-Yuan Huang!

llvm-svn: 265122
2016-04-01 09:40:47 +00:00
Oliver Stannard
e9695b757b [AArch64] Better errors for out-of-range fixups
When a fixup that can be resolved by the assembler is out of range, we should
report an error in the source, rather than crashing.

Differential Revision: http://reviews.llvm.org/D18402

llvm-svn: 265120
2016-04-01 09:14:50 +00:00
Chuang-Yu Cheng
c754367d2e [PPC64] Bug fix: when enabling sibling-call-opt and shrink-wrapping, the tail call branch instruction might disappear
Bug Pattern:
    # BB#0:                                 # %entry
	    cmpldi	 3, 0
	    beq-	 0, .LBB0_2
    # BB#1:                                 # %exit
	    lwz 4, 0(3)
	    #TC_RETURNd8 LVComputationKind 0
    .LBB0_2:                                # %cond.false
	    mflr 0
	    std 0, 16(1)
	    stdu 1, -96(1)
    .Ltmp0:
	    .cfi_def_cfa_offset 96
    .Ltmp1:
	    .cfi_offset lr, 16
	    bl __assert_fail
	    nop

The branch instruction for tail call return is not generated, because the
shrink-wrapping pass choosing a new Restore Point: %cond.false, so %exit
block is not sent to emitEpilogue, that's why the branch is not generated.

Thanks Kit's opinions!
Reviewers: nemanjai hfinkel tjablin kbarton

http://reviews.llvm.org/D17606

llvm-svn: 265112
2016-04-01 06:44:32 +00:00
Michael Kuperstein
57c96fd288 Use range-based for loops. NFC.
llvm-svn: 265105
2016-04-01 03:45:08 +00:00
Matthias Braun
7ad7f48e17 AArch64ISelLowering: Remove unused variables/arguments; NFC
llvm-svn: 265098
2016-04-01 02:49:17 +00:00
Justin Lebar
06adb3808e [NVPTX] Add a truncate DAG node to some calls.
Summary:
Previously, we were running afoul of the assertion

  EVT(CLI.Ins[i].VT) == InVals[i].getValueType() && "LowerCall emitted a value with the wrong type!"

in SelectionDAGBuilder.cpp when running the NVPTX/i8-param.ll test.
This is because our backend (for some reason) treats small return values
as i32, but it wasn't ever truncating the i32 back down to the expected
width in the DAG.

Unclear to me whether this fixes any actual bugs -- in this test, at
least, the generated code is unchanged.

Reviewers: jingyue

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: http://reviews.llvm.org/D17872

llvm-svn: 265091
2016-04-01 01:09:10 +00:00
Justin Lebar
dc761e7edf [NVPTX] Read __CUDA_FTZ from module flags in NVVMReflect.
Summary:
Previously the NVVMReflect pass would read its configuration from
command-line flags or a static configuration given to the pass at
instantiation time.

This doesn't quite work for clang's use-case.  It needs to pass a value
for __CUDA_FTZ down on a per-module basis.  We use a module flag for
this, so the NVVMReflect pass needs to be updated to read said flag.

Reviewers: tra, rnk

Subscribers: cfe-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D18672

llvm-svn: 265090
2016-04-01 01:09:07 +00:00
Justin Lebar
debef79a9f [NVPTX] Annotate some instructions as hasSideEffects = 0.
Summary:
Tablegen tries to infer this from the selection DAG patterns defined for
the instructions, but it can't always.

An instructive example is CLZr64.  CLZr32 is correctly inferred to have
no side-effects, but the selection DAG pattern for CLZr64 is slightly
more complicated, and in particular the ctlz DAG node is not at the root
of the pattern.  Thus tablegen can't infer that CLZr64 has no
side-effects.

Reviewers: jholewinski

Subscribers: jholewinski, tra, llvm-commits

Differential Revision: http://reviews.llvm.org/D17472

llvm-svn: 265089
2016-04-01 01:09:05 +00:00
Hans Wennborg
6d8fafbd5f Follow-up to r265036: I got these iterators mixed up
llvm-svn: 265076
2016-03-31 23:55:16 +00:00
Jun Bum Lim
28eed32332 [AArch64] Allow loads with imp-def to be handled in getMemOpBaseRegImmOfsWidth()
Summary:
This change will allow loads with imp-def to be clustered in machine-scheduler pass.
areMemAccessesTriviallyDisjoint() can also handle loads with imp-def.

Reviewers: mcrosier, jmolloy, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18665

llvm-svn: 265051
2016-03-31 20:53:47 +00:00
Hal Finkel
f7a5afdf0a [PowerPC] Add a late MI-level pass for QPX load/splat simplification
Chapter 3 of the QPX manual states that, "Scalar floating-point load
instructions, defined in the Power ISA, cause a replication of the source data
across all elements of the target register." Thus, if we have a load followed
by a QPX splat (from the first lane), the splat is redundant. This adds a late
MI-level pass to remove the redundant splats in some of these cases
(specifically when both occur in the same basic block).

This optimization is scheduled just prior to post-RA scheduling. It can't happen
before anything that might replace the load with some already-computed quantity
(i.e. store-to-load forwarding).

llvm-svn: 265047
2016-03-31 20:39:41 +00:00
Hans Wennborg
8f51e6b989 Revert r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
I think it might have caused these build breakages:
http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio

llvm-svn: 265046
2016-03-31 20:27:30 +00:00
Benjamin Kramer
a78953e3c5 [ARM] Expand v1i64 and v2i64 ctpop.
The default is legal, which results in 'Cannot select' errors. This is
triggered during selfhost due to a recent cost model change.

llvm-svn: 265040
2016-03-31 19:42:04 +00:00
Hans Wennborg
de026b8f35 [X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)
For code such as:

  void f(int, int);
  void g() {
      f(1, 2);
  }

compiled for 32-bit X86 Linux, Clang would previously generate:

  subl    $12, %esp
  subl    $8, %esp
  pushl   $2
  pushl   $1
  calll   f
  addl    $16, %esp
  addl    $12, %esp
  retl

This patch fixes that by merging adjacent stack adjustments in
eliminateCallFramePseudoInstr().

Differential Revision: http://reviews.llvm.org/D18627

llvm-svn: 265039
2016-03-31 19:26:24 +00:00
Hans Wennborg
07d10a6e30 Change eliminateCallFramePseudoInstr() to return an iterator
This will become necessary in a subsequent change to make this method
merge adjacent stack adjustments, i.e. it might erase the previous
and/or next instruction.

It also greatly simplifies the calls to this function from Prolog-
EpilogInserter. Previously, that had a bunch of logic to resume iteration
after the call; now it just continues with the returned iterator.

Note that this changes the behaviour of PEI a little. Previously,
it attempted to re-visit the new instruction created by
eliminateCallFramePseudoInstr(). That code was added in r36625,
but I can't see any reason for it: the new instructions will obviously
not be pseudo instructions, they will not have FrameIndex operands,
and we have already accounted for the stack adjustment.

Differential Revision: http://reviews.llvm.org/D18627

llvm-svn: 265036
2016-03-31 18:33:38 +00:00