1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00
Commit Graph

10900 Commits

Author SHA1 Message Date
Sanjay Patel
54a084ae08 Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385).
This is a first step for generating SSE rcp instructions for reciprocal
calcs when fast-math allows it. This is very similar to the rsqrt optimization
enabled in D5658 ( http://reviews.llvm.org/rL220570 ).

For now, be conservative and only enable this for AMD btver2 where performance
improves significantly both in terms of latency and throughput.

We may never enable this codegen for Intel Core* chips because the divider circuits
are just too fast. On SandyBridge, divss can be as fast as 10 cycles versus the 21
cycle critical path for the rcp + mul + sub + mul + add estimate.

Follow-on patches may allow configuration of the number of Newton-Raphson refinement
steps, add AVX512 support, and enable the optimization for more chips.

More background here: http://llvm.org/bugs/show_bug.cgi?id=21385

Differential Revision: http://reviews.llvm.org/D6175

llvm-svn: 221706
2014-11-11 20:51:00 +00:00
Rafael Espindola
d10e16c7f3 Use a 8 bit immediate when possible.
This fixes pr21529.

llvm-svn: 221700
2014-11-11 19:46:36 +00:00
Dario Domizioli
138d3c8b1e [X86][ELF] Fix PR20243 - leaf frame pointer bug with TLS access
The ISel lowering for global TLS access in PIC mode was creating a pseudo 
instruction that is later expanded to a call, but the code was not 
setting the hasCalls flag in the MachineFrameInfo alongside the adjustsStack 
flag. This caused some functions to be mistakenly recognized as leaf functions,
and this in turn affected the decision to eliminate the frame pointer.

With the fix, hasCalls is properly set and the leaf frame pointer is correctly
preserved.

llvm-svn: 221695
2014-11-11 18:44:49 +00:00
Andrea Di Biagio
8872877147 [X86] Add missing check for 'isINSERTPSMask' in method 'isShuffleMaskLegal'.
This helps the DAGCombiner to identify more opportunities to fold shuffles.

llvm-svn: 221684
2014-11-11 11:20:31 +00:00
Craig Topper
b267ba2986 Use uint64_t as the type for the X86 TSFlag format enum. Allows removal of the VEXShift hack that was used to access the higher bits of TSFlags.
llvm-svn: 221673
2014-11-11 07:32:32 +00:00
Michael Kuperstein
cd941e090d [X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext
This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details.
Recommitting - This time, with a hopefully working test.

Differential Revision: http://reviews.llvm.org/D6128

llvm-svn: 221672
2014-11-11 07:07:40 +00:00
Rafael Espindola
476a83ecd4 MCAsmParserExtension has a copy of the MCAsmParser. Use it.
Base classes were storing a second copy.

llvm-svn: 221667
2014-11-11 05:18:41 +00:00
Quentin Colombet
6564d569c9 [X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if
AVX2 is available.
According to IACA, the new lowering has a throughput of 8 cycles instead of 13
with the previous one.

Althought this lowering kicks in some SPECs benchmarks, the performance
improvement was within the noise.

Correctness testing has been done for the whole range of uint32_t with the
following program:
    uint4 v = (uint4) {0,1,2,3};
    uint32_t i;
    
    //Check correctness over entire range for uint4 -> float4 conversion
    for( i = 0; i < 1U << (32-2); i++ )
    {
        float4 t = test(v);
        float4 c = correct(v);
        
        if( 0xf != _mm_movemask_ps( t == c ))
        {
            printf( "Error @ %vx: %vf vs. %vf\n", v, c, t);
            return -1;
        }
        
        v += 4;
    }
Where "correct" is the old lowering and "test" the new one.

The patch adds a test case for the two custom lowering instruction.
It also modifies the vector cost model, which is why cast.ll and uitofp.ll are
modified.
2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7
instructions instead of 4 (3 more constant loads).

rdar://problem/18153096>

llvm-svn: 221657
2014-11-11 02:23:47 +00:00
Michael Kuperstein
c9f353294c Reverting r221626 due to a too-strict test.
llvm-svn: 221629
2014-11-10 21:07:41 +00:00
Michael Kuperstein
db34b4e22e [X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext
This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits.
See PR20494 for details.

Differential Revision: http://reviews.llvm.org/D6128

llvm-svn: 221626
2014-11-10 20:40:21 +00:00
Rafael Espindola
8cb53479d4 Misc style fixes. NFC.
This fixes a few cases of:

* Wrong variable name style.
* Lines longer than 80 columns.
* Repeated names in comments.
* clang-format of the above.

This make the next patch a lot easier to read.

llvm-svn: 221615
2014-11-10 18:11:10 +00:00
Simon Pilgrim
e2d60b3127 [X86][SSE] Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq)
Fixed an issue with the (v)cvttps2dq and (v)cvttpd2dq instructions being incorrectly put in the 2 source operand folding tables instead of the 1 source operand and added the missing SSE/AVX versions.

Also added missing (v)cvtps2dq and (v)cvtpd2dq instructions to the folding tables.

Differential Revision: http://reviews.llvm.org/D6001

llvm-svn: 221489
2014-11-06 22:15:41 +00:00
Ahmed Bougacha
d69ce819ea [X86] Add VFMADDSUB cases for the 213->231 custom inserter.
Also add tests for vfmadd/vfmsub.

llvm-svn: 221488
2014-11-06 22:04:15 +00:00
Ahmed Bougacha
97c6b57f51 [X86] Add missing FMA3 VFMADDSUB in the emitter.
Also reuse the fma4 intrinsic test to cover fma3 instructions too.

llvm-svn: 221487
2014-11-06 21:58:11 +00:00
Andrea Di Biagio
a697bfdcaf [X86] When commuting SSE immediate blend, make sure that the new blend mask is a valid imm8.
Example:
define <4 x i32> @test(<4 x i32> %a, <4 x i32> %b) {
  %shuffle = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 4, i32 5, i32 6, i32 3>
  ret <4 x i32> %shuffle
}

Before llc (-mattr=+sse4.1), produced the following assembly instruction:
  pblendw $4294967103, %xmm1, %xmm0

After
  pblendw $63, %xmm1, %xmm0

llvm-svn: 221455
2014-11-06 14:36:45 +00:00
David Majnemer
8c2f8688d2 X86, MC: Tidy up some whitespace in GetRelocType
No functionality change intended.

llvm-svn: 221443
2014-11-06 08:10:37 +00:00
Quentin Colombet
d01225e402 [X86] Lower VSELECT into SHRUNKBLEND when we shrink the bits used into the
condition to match a blend.
This prevents optimizations that work on VSELECT to perform invalid
transformations. Indeed, the optimized condition does not match the vector
boolean content that is expected and bad things may happen.

This patch yields the exact same code on the whole test-suite + specs (-O3 and
-O3 -march=core-avx2), it improves one test case (vector-blend.ll) and fixes a
bug reduced in vselect-avx.ll.

<rdar://problem/18819506>

llvm-svn: 221429
2014-11-06 02:25:03 +00:00
Simon Pilgrim
c2b395ce18 [X86][SSE] Vector integer to float conversion memory folding
Added missing memory folding for the (V)CVTDQ2PS instructions - we can safely fold these (but not the (V)CVTDQ2PD versions which have a register/memory size discrepancy in the source operand). I've added a test case demonstrating that stack folding now works.

Differential Revision: http://reviews.llvm.org/D5981

llvm-svn: 221407
2014-11-05 22:28:25 +00:00
Derek Schuff
2c22caff55 [x86 fast-isel] Materialize allocas with the correct-sized lea for ILP32
Summary:
X86FastISel::fastMaterializeAlloca was incorrectly conditioning its
opcode selection on subtarget bitness rather than pointer size.

Differential Revision: http://reviews.llvm.org/D6136

llvm-svn: 221386
2014-11-05 19:27:21 +00:00
Andrea Di Biagio
38de0c92c6 [X86] Teach method 'isVectorClearMaskLegal' how to check for legal blend masks.
This patch improves the folding of vector AND nodes into blend operations for
targets that feature SSE4.1. A vector AND node where one of the operands is
a constant build_vector with elements that are either zero or all-ones can be
converted into a blend.

This allows for example to simplify the following code:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
  %2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
  %3 = or <4 x i32> %1, %2
  ret <4 x i32> %3
}

Before this patch llc (-mcpu=corei7) generated:
        andps  LCPI1_0(%rip), %xmm0, %xmm0
        andps  LCPI1_1(%rip), %xmm1, %xmm1
        orps   %xmm1, %xmm0, %xmm0
        retq

With this patch we generate a single 'vpblendw'.

llvm-svn: 221343
2014-11-05 13:04:14 +00:00
Simon Pilgrim
c7629829ea [X86][SSE] Enable commutation for SSE immediate blend instructions
Patch to allow (v)blendps, (v)blendpd, (v)pblendw and vpblendd instructions to be commuted - swaps the src registers and inverts the blend mask.

This is primarily to improve memory folding (see new tests), but it also improves the quality of shuffles (see modified tests).

Differential Revision: http://reviews.llvm.org/D6015

llvm-svn: 221313
2014-11-04 23:25:08 +00:00
Andrea Di Biagio
1f25a26da7 [X86] Add 'FeatureSlowSHLD' to cpu 'bdver3'. Also explicit set FeatureAVX and FeatureSSE4A for all the bdver* cpus.
This patch adds 'FeatureSlowSHLD' to 'bdver3'.
According to the official AMD optimization guide for amdfam15: "Using
alternative code in place of SHLD achieves lower overall latency and
requires fewer execution resources. The 32-bit and 64-bit forms of
ADD, ADC, SHR, and LEA (except 16-bit form) are DirectPath
instructions, while SHLD is a VectorPath instruction."

This patch also explicitly sets feature AVX and SSE4A for all the bdver*
cpus. This part of the patch is a non-functional change and it is mainly
done for clarity reasons (Both XOP and FMA4 already imply AVX and SSE4A).

llvm-svn: 221296
2014-11-04 21:18:09 +00:00
Ahmed Bougacha
bc21735f09 [X86] Add debug print name for X86ISD::[US]MUL8. NFC-ish.
The opcodes were added in r220516, but I forgot to add the print names.

llvm-svn: 221185
2014-11-03 21:25:18 +00:00
Ahmed Bougacha
f2a8134721 [X86] 8bit divrem: Improve codegen for AH register extraction.
For 8-bit divrems where the remainder is used, we used to generate:
    divb  %sil
    shrw  $8, %ax
    movzbl  %al, %eax

That was to avoid an H-reg access, which is problematic mainly because
it isn't possible in REX-prefixed instructions.

This patch optimizes that to:
    divb  %sil
    movzbl  %ah, %eax

To do that, we explicitly extend AH, and extract the L-subreg in the
resulting register.  The extension is done using the NOREX variants of
MOVZX.  To support signed operations, MOVSX_NOREX is also added.
Further, this introduces a new SDNode type, [us]divrem_ext_hreg, which is
then lowered to a sequence containing a single zext (rather than 2).

Differential Revision: http://reviews.llvm.org/D6064

llvm-svn: 221176
2014-11-03 20:26:35 +00:00
Rafael Espindola
20114e4c02 Remove redundant calls to isMaterializable.
This removes calls to isMaterializable in the following cases:

* It was redundant with a call to isDeclaration now that isDeclaration returns
  the correct answer for materializable functions.
* It was followed by a call to Materialize. Just call Materialize and check EC.

llvm-svn: 221050
2014-11-01 16:46:18 +00:00
Adrian Prantl
401e61e9a6 Revert "Temporarily revert r220777 to sort out build bot breakage."
This reverts commit r221028. Later commits depend on this and
reverting just this one causes even more bots to fail.

llvm-svn: 221041
2014-11-01 03:19:45 +00:00
NAKAMURA Takumi
2419cf2c14 Revert r220779, "[AVX512] Removed special case for cmp instructions in getVectorMaskingNode. Now cmp intrinsics lower as other intrinsics through VSELECT, and then VSELECT tranforms to AND in PerformSELECTCombine."
Since r221028 (reverting r220777), this caused failures.

llvm-svn: 221040
2014-11-01 01:36:14 +00:00
Adrian Prantl
379cee6cd0 Temporarily revert r220777 to sort out build bot breakage.
"[x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros."

llvm-svn: 221028
2014-11-01 00:26:59 +00:00
Reid Kleckner
626248cfdf Work around bugs in MSVC "14" CTP 3's conversion logic
It appears to ignore or find ambiguous MachineInstrBuilder's conversion
operators that allow conversion to MachineInstr* and
MachineBasicBlock::bundle_iterator.

As a workaround, add an explicit way to get the MachineInstr.

llvm-svn: 221017
2014-10-31 23:19:46 +00:00
Robert Khasanov
3e398a3800 [AVX512] Added VBROADCAST{SS/SD} encoding for VL subset.
Refactored through AVX512_maskable
        

llvm-svn: 220908
2014-10-30 14:21:47 +00:00
Robert Khasanov
b26f057244 [AVX512] Implemented AVX512VL FP bnary packed instructions (VADDP*, VSUBP*, VMULP*, VDIVP*, VMAXP*, VMINP*)
Refactored through AVX512_maskable
Added encoding tests for them.

llvm-svn: 220858
2014-10-29 15:43:02 +00:00
Robert Khasanov
4bea47e6eb [AVX512] Fix VSQRT packed instructions internal names.
No functional change

llvm-svn: 220808
2014-10-28 18:22:41 +00:00
Robert Khasanov
2ca56ad410 [AVX512] Extended avx512_sqrt_packed (sqrt instructions) to VL subset.
Refactored through AVX512_maskable

llvm-svn: 220806
2014-10-28 18:15:20 +00:00
Robert Khasanov
d134194df8 [AVX-512] Expanded rsqrt/rcp instructions to VL subset.
Refactored multiclass through AVX512_maskable

llvm-svn: 220783
2014-10-28 16:37:13 +00:00
Robert Khasanov
2b3eb6af20 [AVX512] Removed special case for cmp instructions in getVectorMaskingNode. Now cmp intrinsics lower as other intrinsics through VSELECT, and then VSELECT tranforms to AND in PerformSELECTCombine.
No functional change.

llvm-svn: 220779
2014-10-28 16:17:14 +00:00
Robert Khasanov
fa811056f3 [x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros.
This transformation worked if selector is produced by SETCC, however SETCC is needed only if we consider to swap operands. So I replaced SETCC check for this case.
Added tests for vselect of <X x i1> values.

llvm-svn: 220777
2014-10-28 15:59:40 +00:00
Robert Khasanov
a87de33c61 [AVX512] Bring back vector-shuffle lowering support through broadcasts
Ffter commit at rev219046 512-bit broadcasts lowering become non-optimal. Most of tests on broadcasting and embedded broadcasting were changed and they doesn’t produce efficient code.

Example below is from commit changes (it’s the first test from test/CodeGen/X86/avx512-vbroadcast.ll):

 define   <16 x i32> @_inreg16xi32(i32 %a) {
 ; CHECK-LABEL: _inreg16xi32:
 ; CHECK:       ## BB#0:
-; CHECK-NEXT:    vpbroadcastd %edi, %zmm0
+; CHECK-NEXT:    vmovd %edi, %xmm0
+; CHECK-NEXT:    vpbroadcastd %xmm0, %ymm0
+; CHECK-NEXT:    vinserti64x4 $1, %ymm0, %zmm0, %zmm0
 ; CHECK-NEXT:    retq
 %b = insertelement <16 x i32> undef, i32 %a, i32 0
 %c = shufflevector <16 x i32> %b, <16 x i32> undef, <16 x i32> zeroinitializer
 ret <16 x i32> %c
}

Here, 256-bit broadcast was generated instead of 512-bit one.

In this patch
1) I added vector-shuffle lowering through broadcasts
2) Removed asserts and branches likes because this is incorrect
-  assert(Subtarget->hasDQI() && "We can only lower v8i64 with AVX-512-DQI");
3) Fixed lowering tests

llvm-svn: 220774
2014-10-28 12:28:51 +00:00
Reid Kleckner
af3046bd9e X86: Implement the vectorcall calling convention
This is a Microsoft calling convention that supports both x86 and x86_64
subtargets. It passes vector and floating point arguments in XMM0-XMM5,
and passes them indirectly once they are consumed.

Homogenous vector aggregates of up to four elements can be passed in
sequential vector registers, but this part is not implemented in LLVM
and will be handled in Clang.

On 32-bit x86, it is similar to fastcall in that it uses ecx:edx as
integer register parameters and is callee cleanup. On x86_64, it
delegates to the normal win64 calling convention.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D5943

llvm-svn: 220745
2014-10-28 01:29:26 +00:00
Adam Nemet
f35a0bba1b [AVX512] Add vpermil variable version
This is implemented via a multiclass that derives from the vperm imm
multiclass.

Fixes <rdar://problem/18426089>

llvm-svn: 220737
2014-10-27 23:08:40 +00:00
Adam Nemet
134480598b [AVX512] Clean up avx512_perm_imm to use X86VectorVTInfo
No functionality change.  No change in X86.td.expanded except that we only set
the CD8 attributes for the memory variants.  (This shouldn't be used unless we
have a memory operand.)

llvm-svn: 220736
2014-10-27 23:08:37 +00:00
Adam Nemet
8cf84f2568 [AVX512] Derive vpermil* from avx512_perm_imm
This used to derive from avx512_pshuf_imm which is confusing.

NFC.  Compared X86.td.expanded.

llvm-svn: 220735
2014-10-27 23:08:34 +00:00
Adam Nemet
78db8293e5 [AVX512] Fix copy-and-paste bugs in vpermil
1) i512mem -> f512mem (this is the packed FP input being permuted)
2) element size is 64 bits in EVEX_CD8 for PD.

(A good illustration why X86VectorVTInfo is useful)

llvm-svn: 220734
2014-10-27 23:08:31 +00:00
Pete Cooper
01b2132972 Fix a stackmap bug introduced in r220710.
For a call to not return in to the stackmap shadow, the shadow must end with the call.

To do this, we must insert any required nops *before* the call, and not after it.

llvm-svn: 220728
2014-10-27 22:38:45 +00:00
Pete Cooper
87efb91a50 Stackmap shadows should consider call returns a branch target.
To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets.

A return from a call should count as a branch target as patching over the instructions after the call would lead to incorrect behaviour for threads currently making that call, when they return.

llvm-svn: 220710
2014-10-27 19:40:35 +00:00
Yuri Gorshenin
0701551505 [asan-asm-instrumentation] Added comment describing how asm instrumentation works.
Summary: [asan-asm-instrumentation] Added comment describing how asm instrumentation works.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5970

llvm-svn: 220670
2014-10-27 08:38:54 +00:00
Elena Demikhovsky
467839f2f5 AVX-512: Fixed encoding of VPBROADCASTM and added SKX forms of this instruction
llvm-svn: 220638
2014-10-26 09:52:24 +00:00
Simon Pilgrim
c20b3d1fdd [X86][SSE] Vector integer/float conversion memory folding
Tidied up some entries in the folding tables so that they are under the correct comment section (they were categorised as AVX2 instructions when they're AVX1).

Minor patch agreed with qcolombet.

llvm-svn: 220613
2014-10-25 08:11:20 +00:00
Kevin Enderby
9e33867d25 Fix a Mach-O assembler segfault for a subtraction expression with an undefined symbol.
In a Mach-O object file a relocatable expression of the form
SymbolA - SymbolB + constant is allowed when both symbols are
defined in a section.  But when either symbol is undefined it
is an error.

The code was crashing when it had an undefined symbol in this case.
And should have printed a error message using the location information
in the relocation entry.

rdar://18678402

llvm-svn: 220599
2014-10-24 22:39:40 +00:00
Simon Pilgrim
29b3d266a4 [X86][SSE] Bitcast assertion in XFormVExtractWithShuffleIntoLoad
Minor patch to fix an issue in XFormVExtractWithShuffleIntoLoad where a load is unary shuffled, then bitcast (to a type with the same number of elements) before extracting an element.

An undef was created for the second shuffle operand using the original (post-bitcasted) vector type instead of the pre-bitcasted type like the rest of the shuffle node - this was then causing an assertion on the different types later on inside SelectionDAG::getVectorShuffle.

Differential Revision: http://reviews.llvm.org/D5917

llvm-svn: 220592
2014-10-24 21:04:41 +00:00
Sanjay Patel
4e42d925c1 Allow AVX vrsqrtps generation.
This is a follow-on to r220570 that allows a 256-bit (v8f32)
version of vrsqrtps to be generated.

llvm-svn: 220579
2014-10-24 17:59:18 +00:00