1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00
Commit Graph

37826 Commits

Author SHA1 Message Date
Tom Stellard
b52bf01ef4 AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocations
Summary:
We need to set the fixup type to FK_Data_4 for the
SCRATCH_RSRC_DWORD[01] symbols, since these require absolute
relocations, and fixup_si_rodata is for relative relocations.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21153

llvm-svn: 272417
2016-06-10 19:26:38 +00:00
Michael Kuperstein
57fc4a3484 [X86] Add costs for SSE zext/sext to v4i64 to TTI
The costs are somewhat hand-wavy, but should be much closer to the truth
than what we get from BasicTTI.

Differential Revision: http://reviews.llvm.org/D21156

llvm-svn: 272406
2016-06-10 17:01:05 +00:00
Evandro Menezes
bcba8613a3 [AArch64] Add preferred alignments for Exynos M1
Differential Revision: http://reviews.llvm.org/D21203

llvm-svn: 272400
2016-06-10 16:00:18 +00:00
Krzysztof Parzyszek
e8ab9012c8 [Hexagon] Remove incorrect offset scaling
llvm-svn: 272399
2016-06-10 15:43:18 +00:00
Roman Shirokiy
173c35e16e Test commit
llvm-svn: 272393
2016-06-10 13:12:48 +00:00
Sam Kolton
4f6d8a41f5 [AMDGPU] AsmParser: Support for sext() modifier in SDWA. Some code cleaning in AMDGPUOperand.
Summary:
sext() modifier is supported in SDWA instructions only for integer operands. Spec is unclear should integer operands support abs and neg modifiers with sext - for now they are not supported.
Renamed InputModsWithNoDefault to FloatInputMods. Added SextInputMods for operands that support sext() modifier.
Added AMDGPUOperand::Modifier struct to handle register and immediate modifiers.
Code cleaning in AMDGPUOperand class: organize method in groups (render-, predicate-methods...).

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D20968

llvm-svn: 272384
2016-06-10 09:57:59 +00:00
Roger Ferrer Ibanez
f99b65e91e test commit: remove trailing whitespaces in README.txt
llvm-svn: 272380
2016-06-10 08:19:58 +00:00
Craig Topper
62a6e30eb0 [AVX512] Add shuffle comment printing for masked VPERMPD/VPERMQ.
llvm-svn: 272371
2016-06-10 05:12:40 +00:00
Craig Topper
d0d202e5ad [AVX512] Fix shuffle comment printing to handle the masked versions of some shuffles. Previously we were printing the mask operands as the register names.
llvm-svn: 272367
2016-06-10 04:48:05 +00:00
Matt Arsenault
738cbf4e5d AMDGPU: Fix trailing whitespace
llvm-svn: 272364
2016-06-10 02:18:02 +00:00
Matt Arsenault
f4490b9ad5 AMDGPU: v_cndmask_b32 does not def vcc
Fixes verifier errors after SIShrinkInstructions.

llvm-svn: 272351
2016-06-10 00:18:41 +00:00
Tom Stellard
7240ff2203 AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permute
Summary:
This fixes a bug with ds_*permute instructions where if it was passed a
constant address, then the offset operand would get assigned a register
operand instead of an immediate.

Reviewers: scchan, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19994

llvm-svn: 272349
2016-06-10 00:01:04 +00:00
Tom Stellard
8ca66729b9 AMDGPU/SI: Use common topological sort algorithm in SIScheduleDAGMI
Reviewers: arsenm, axeldavy

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19823

llvm-svn: 272346
2016-06-09 23:48:02 +00:00
Matt Arsenault
b7b3917848 AMDGPU: Fix flat atomics
The flat atomics could already be selected, but only
when using flat instructions for global memory. Add
patterns for flat addresses.

llvm-svn: 272345
2016-06-09 23:42:54 +00:00
Matt Arsenault
bcec847408 AMDGPU: Fix i64 global cmpxchg
This was using extract_subreg sub0 to extract the low register
of the result instead of sub0_sub1, producing an invalid copy.

There doesn't seem to be a way to use the compound subreg indices
in tablegen since those are generated, so manually select it.

llvm-svn: 272344
2016-06-09 23:42:48 +00:00
Eric Christopher
99f6f9fa1d Add aliases for mfvrsave/mtvrsave.
Update a test as we're now going to emit it for easier reading of
generated assembly as well.

llvm-svn: 272339
2016-06-09 23:27:48 +00:00
Matt Arsenault
ceb1d538ec AMDGPU: Run verifer after insert waits pass
llvm-svn: 272338
2016-06-09 23:19:14 +00:00
Matt Arsenault
524364ac4e AMDGPU: Remove incorrect assertion
I'm still not sure under what circumstances the offset here is non-0,
but private memory is not limited to 27-bits.

llvm-svn: 272337
2016-06-09 23:19:08 +00:00
Matt Arsenault
dde12252bd AMDGPU: Properly initialize SIShrinkInstructions
llvm-svn: 272336
2016-06-09 23:18:47 +00:00
Simon Pilgrim
10fdc6dca9 [X86][AVX512] Added avx512 VPSLLDQ/VPSRLDQ instruction comments
llvm-svn: 272319
2016-06-09 22:03:15 +00:00
Simon Pilgrim
4515a471d4 [X86][AVX512] Dropped avx512 VPSLLDQ/VPSRLDQ intrinsics
Auto-upgrade to generic shuffles like sse/avx2 implementations now that we can lower to VPSLLDQ/VPSRLDQ 

llvm-svn: 272308
2016-06-09 21:09:03 +00:00
Simon Pilgrim
c8cd1f56db [X86][AVX512] Fixed issue with v16i32 shuffles lowering to VPALIGNR
llvm-svn: 272307
2016-06-09 20:53:12 +00:00
Simon Pilgrim
01d87ac7ba [X86][AVX512] Added support for lowering 512-bit vector shuffles to bit/byte shifts
512-bit VPSLLDQ/VPSRLDQ can only be used for avx512bw targets so lowerVectorShuffleAsShift had to be adjusted to include the subtarget

llvm-svn: 272300
2016-06-09 20:13:58 +00:00
Justin Lebar
05e13f8eda [NVPTX] Add intrinsics for shfl instructions.
Summary:
Currently clang emits these instructions via inline (volatile) asm in
the CUDA headers.  Switching to intrinsics will let the optimizer reason
across calls to these intrinsics.

Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D21160

llvm-svn: 272298
2016-06-09 20:04:08 +00:00
Wei Ding
2bbde20e67 AMDGPU/SI: Fix 32-bit fdiv lowering
We were using the fast fdiv lowering for all division, implementation of
IEEE754 fdiv is added.

http://reviews.llvm.org/D20557

llvm-svn: 272292
2016-06-09 19:17:15 +00:00
Ulrich Weigand
3a7df35715 [SystemZ] Enable long displacement constraints for inline ASM operands
This enables use of the 'S' constraint for inline ASM operands on
SystemZ, which allows for a memory reference with a signed 20-bit
immediate displacement. This patch includes corresponding documentation
and test case updates.

I've changed the 'T' constraint to match the new behavior for 'S', as
'T' also uses a long displacement (though index constraints are still
not implemented). I also changed 'm' to match the behavior for 'S' as
this will allow for a wider range of displacements for 'm', though
correct me if that's not the right decision.

Author: colpell
Differential Revision: http://reviews.llvm.org/D21097

llvm-svn: 272266
2016-06-09 15:19:16 +00:00
Hrvoje Varga
62d5d2ff36 [mips][microMIPS] Implement BOVC, BNVC, EXT, INS and JALRC instructions
Differential Revision: http://reviews.llvm.org/D11798

llvm-svn: 272259
2016-06-09 12:57:23 +00:00
James Molloy
e31be9c68b [Thumb] A branch is not part of an IT block
ReplaceTailWithBranchTo assumed that if an instruction is predicated, it must be part of an IT block. This is not correct for conditional branches.

No testcase as this was triggered by the reverted patch r272017 - test coverage will occur when that patch is re-reverted and there is no known way to trigger this in the meantime.

llvm-svn: 272258
2016-06-09 11:51:29 +00:00
Igor Breger
cf92d72d91 [AVX512] Remove masked_move/blendm intrinsic from back-end.
This is complement patch to D21060.

Differential Revision: http://reviews.llvm.org/D21174

llvm-svn: 272257
2016-06-09 11:46:55 +00:00
Zlatko Buljan
63e6509982 [mips][microMIPS] Add CodeGen support for SEL.*, SELEQZ, SELNEZ, SELEQZ.*, SELNEZ.* and CMP.condn.fmt instructions
Differential Revision: http://reviews.llvm.org/D20862

llvm-svn: 272256
2016-06-09 11:15:53 +00:00
Sam Kolton
dc2814b1fe [AMDGPU] Disassembler: Support for sdwa instructions
Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21129

llvm-svn: 272255
2016-06-09 11:04:45 +00:00
Craig Topper
ebd5461b3b [AVX512] Fix shuffle decode printing for several instructions with write masks. There are still more bugs here with UNPCK and PALIGN for sure. But these were the easiest ones to fix.
llvm-svn: 272252
2016-06-09 07:49:08 +00:00
James Molloy
9342eb300f [Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated
If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead;

  int i(int a) {
    return a & 0xfffffeec;
  }

Used to produce:
    ldr r1, [CONSTPOOL]
    ands r0, r1
  CONSTPOOL: 0xfffffeec

And now produces:
    movs    r1, #255
    adds    r1, #20  ; Less costly immediate generation
    bics    r0, r1

llvm-svn: 272251
2016-06-09 07:39:08 +00:00
Craig Topper
a336600f12 [X86] Bring consistent naming to the SSE/AVX and AVX512 PALIGNR instructions. Then add shuffle decode printing for the EVEX forms which is made easier by having the naming structure more similar to other instructions.
llvm-svn: 272249
2016-06-09 07:06:38 +00:00
Craig Topper
830df4dc0b [X86] Fix bad comment in assert. NFC
llvm-svn: 272248
2016-06-09 07:06:33 +00:00
Saleem Abdulrasool
548ac13275 AArch64: support the .arch directive in the IAS
Add support to the AArch64 IAS for the `.arch` directive.  This allows the
assembly input to use architectural functionality in part of a file.  This is
used in existing code like BoringSSL.

Resolves PR26016!

llvm-svn: 272241
2016-06-09 02:56:40 +00:00
Benjamin Kramer
d415569b3b Apply most suggestions of clang-tidy's performance-unnecessary-value-param
Avoids unnecessary copies. All changes audited & pass tests with asan.
No functional change intended.

llvm-svn: 272190
2016-06-08 19:09:22 +00:00
Quentin Colombet
c7fca81422 [AArch64][RegisterBankInfo] G_OR are fine on either GPR or FPR.
Teach AArch64RegisterBankInfo that G_OR can be mapped on either GPR or
FPR for 64-bit or 32-bit values.

Add test cases demonstrating how this information is used to coalesce a
computation on a single register bank.

llvm-svn: 272170
2016-06-08 16:53:32 +00:00
Oliver Stannard
e8d6d1e081 [ARM] MSR instructions implicitly set CPSR
The MSR instructions can write to the CPSR, but we did not model this
fact, so we could emit them in the middle of IT blocks, changing the
condition flags for later instructions in the block.

The tests use two calls to llvm.write_register.i32 because it is valid
to use these instructions at the end of an IT block, which if conversion
does do in some cases. With two calls, the first clobbers the flags, so
a branch has to be used to make the second one conditional.

Differential Revision: http://reviews.llvm.org/D21139

llvm-svn: 272154
2016-06-08 15:26:34 +00:00
Vasileios Kalintiris
042223679b [mips] Add a proper file header in MipsFastISel.cpp
llvm-svn: 272138
2016-06-08 13:13:15 +00:00
Krzysztof Parzyszek
f15287627c [Hexagon] Modify HexagonExpandCondsets to handle subregisters
Also, switch to using functions from LiveIntervalAnalysis to update
live intervals, instead of performing the updates manually.

Re-committing r272045.

llvm-svn: 272135
2016-06-08 12:31:16 +00:00
Diana Picus
2047f3e03c [ARM] Remove redundant check. NFC
isSwift is tested earlier and known to be false when we reach this code.

llvm-svn: 272127
2016-06-08 10:29:02 +00:00
Benjamin Kramer
5d5a0e4f68 Avoid copies of std::strings and APInt/APFloats where we only read from it
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.

llvm-svn: 272126
2016-06-08 10:01:20 +00:00
Igor Breger
c1b262a77f [AVX512] Fix cvtusi2sd instruction Opcode, it should be 0x7B instead of 0x2A.
llvm-svn: 272122
2016-06-08 07:48:23 +00:00
Quentin Colombet
29b8e1633e [AArch64][RegisterBankInfo] Use the generic implementation of copyCost.
Long term we may want to give high cost at FPR to/from GPR copies.

llvm-svn: 272086
2016-06-08 01:24:00 +00:00
Quentin Colombet
342277536e [RegisterBankInfo] Add a size argument for the cost of copy.
The cost of a copy may be different based on how many bits we have to
copy around. E.g., a 8-bit copy may be different than a 32-bit copy.

llvm-svn: 272084
2016-06-08 01:11:03 +00:00
Nicolai Haehnle
364da50d63 AMDGPU: Add amdgpu-ps-wqm-outputs function attributes
Summary:
The presence of this attribute indicates that VGPR outputs should be computed
in whole quad mode. This will be used by Mesa for prolog pixel shaders, so
that derivatives can be taken of shader inputs computed by the prolog, fixing
a bug.

The generated code could certainly be improved: if a prolog pixel shader is
used (which isn't common in modern OpenGL - they're used for gl_Color, polygon
stipples, and forcing per-sample interpolation), Mesa will use this attribute
unconditionally, because it has to be conservative. So WQM may be used in the
prolog when it isn't really needed, and furthermore a silly back-and-forth
switch is likely to happen at the boundary between prolog and main shader
parts.

Fixing this is a bit involved: we'd first have to add a mechanism by which
LLVM writes the WQM-related input requirements to the main shader part binary,
and then Mesa specializes the prolog part accordingly. At that point, we may
as well just compile a monolithic shader...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D20839

llvm-svn: 272063
2016-06-07 21:37:17 +00:00
Eric Christopher
d5bd140a4f Revert "Differential Revision: http://reviews.llvm.org/D20557"
Author: Wei Ding <wei.ding2@amd.com>
Date:   Tue Jun 7 19:04:44 2016 +0000

    Differential Revision: http://reviews.llvm.org/D20557

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044
    91177308-0d34-0410-b5e6-96231b3b80d8

as it was breaking the bots.

This reverts commit r272044.

llvm-svn: 272056
2016-06-07 20:27:12 +00:00
Etienne Bergeron
3b57eca787 [stack-protection] Add support for MSVC buffer security check
Summary:
This patch is adding support for the MSVC buffer security check implementation

The buffer security check is turned on with the '/GS' compiler switch.
  * https://msdn.microsoft.com/en-us/library/8dbf701c.aspx
  * To be added to clang here: http://reviews.llvm.org/D20347

Some overview of buffer security check feature and implementation:
  * https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx
  * http://www.ksyash.com/2011/01/buffer-overflow-protection-3/
  * http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html


For the following example:
```
int example(int offset, int index) {
  char buffer[10];
  memset(buffer, 0xCC, index);
  return buffer[index];
}
```

The MSVC compiler is adding these instructions to perform stack integrity check:
```
        push        ebp  
        mov         ebp,esp  
        sub         esp,50h  
  [1]   mov         eax,dword ptr [__security_cookie (01068024h)]  
  [2]   xor         eax,ebp  
  [3]   mov         dword ptr [ebp-4],eax  
        push        ebx  
        push        esi  
        push        edi  
        mov         eax,dword ptr [index]  
        push        eax  
        push        0CCh  
        lea         ecx,[buffer]  
        push        ecx  
        call        _memset (010610B9h)  
        add         esp,0Ch  
        mov         eax,dword ptr [index]  
        movsx       eax,byte ptr buffer[eax]  
        pop         edi  
        pop         esi  
        pop         ebx  
  [4]   mov         ecx,dword ptr [ebp-4]  
  [5]   xor         ecx,ebp  
  [6]   call        @__security_check_cookie@4 (01061276h)  
        mov         esp,ebp  
        pop         ebp  
        ret  
```

The instrumentation above is:
  * [1] is loading the global security canary,
  * [3] is storing the local computed ([2]) canary to the guard slot,
  * [4] is loading the guard slot and ([5]) re-compute the global canary,
  * [6] is validating the resulting canary with the '__security_check_cookie' and performs error handling.

Overview of the current stack-protection implementation:
  * lib/CodeGen/StackProtector.cpp
    * There is a default stack-protection implementation applied on intermediate representation.
    * The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie.
    * An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast).
    * Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and errors handling.
    * Guard manipulation and comparison are added directly to the intermediate representation.

  * lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
  * lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
    * There is an implementation that adds instrumentation during instruction selection (for better handling of sibbling calls).
      * see long comment above 'class StackProtectorDescriptor' declaration.
    * The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr).
      * 'getSDagStackGuard' returns the appropriate stack guard (security cookie)
    * The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'.

  * include/llvm/Target/TargetLowering.h
    * Contains function to retrieve the default Guard 'Value'; should be overriden by each target to select which implementation is used and provide Guard 'Value'.

  * lib/Target/X86/X86ISelLowering.cpp
    * Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm.

Function-based Instrumentation:
  * The MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instructions.
  * To support function-based instrumentation, this patch is
    * adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h),
      * If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue.
    * modifying (SelectionDAGISel.cpp) do avoid producing basic blocks used for inline instrumentation,
    * generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp),
    * if FastISEL (not SelectionDAG), using the fallback which rely on the same function-based implemented over intermediate representation (StackProtector.cpp).

Modifications
  * adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp)
  * adding support function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h)

Results

  * IR generated instrumentation:
```
clang-cl /GS test.cc /Od /c -mllvm -print-isel-input
```

```
*** Final LLVM Code input to ISel ***

; Function Attrs: nounwind sspstrong
define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 {
entry:
  %StackGuardSlot = alloca i8*                                                  <<<-- Allocated guard slot
  %0 = call i8* @llvm.stackguard()                                              <<<-- Loading Stack Guard value
  call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot)                  <<<-- Prologue intrinsic call (store to Guard slot)
  %index.addr = alloca i32, align 4
  %offset.addr = alloca i32, align 4
  %buffer = alloca [10 x i8], align 1
  store i32 %index, i32* %index.addr, align 4
  store i32 %offset, i32* %offset.addr, align 4
  %arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0
  %1 = load i32, i32* %index.addr, align 4
  call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false)
  %2 = load i32, i32* %index.addr, align 4
  %arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2
  %3 = load i8, i8* %arrayidx, align 1
  %conv = sext i8 %3 to i32
  %4 = load volatile i8*, i8** %StackGuardSlot                                  <<<-- Loading Guard slot
  call void @__security_check_cookie(i8* %4)                                    <<<-- Epilogue function-based check
  ret i32 %conv
}
```

  * SelectionDAG generated instrumentation:

```
clang-cl /GS test.cc /O1 /c /FA
```

```
"?example@@YAHHH@Z":                    # @"\01?example@@YAHHH@Z"
# BB#0:                                 # %entry
        pushl   %esi
        subl    $16, %esp
        movl    ___security_cookie, %eax                                        <<<-- Loading Stack Guard value
        movl    28(%esp), %esi
        movl    %eax, 12(%esp)                                                  <<<-- Store to Guard slot
        leal    2(%esp), %eax
        pushl   %esi
        pushl   $204
        pushl   %eax
        calll   _memset
        addl    $12, %esp
        movsbl  2(%esp,%esi), %esi
        movl    12(%esp), %ecx                                                  <<<-- Loading Guard slot
        calll   @__security_check_cookie@4                                      <<<-- Epilogue function-based check
        movl    %esi, %eax
        addl    $16, %esp
        popl    %esi
        retl
```

Reviewers: kcc, pcc, eugenis, rnk

Subscribers: majnemer, llvm-commits, hans, thakis, rnk

Differential Revision: http://reviews.llvm.org/D20346

llvm-svn: 272053
2016-06-07 20:15:35 +00:00
Krzysztof Parzyszek
de2b1f74b1 Revert r272045 since GCC doesn't know how to compile it.
llvm-svn: 272048
2016-06-07 19:25:28 +00:00