1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 14:33:02 +02:00
Commit Graph

12435 Commits

Author SHA1 Message Date
Craig Topper
0354af2e80 [X86] Remove assertions that check for valid scale values on scatter/gather intrinsics. Nothing upstream prevented illegal values from getting here.
llvm-svn: 251780
2015-11-02 07:24:40 +00:00
Craig Topper
150b1555c7 [X86] Fold 'if' followed by just an llvm_unreachable into an assert.
llvm-svn: 251778
2015-11-02 07:24:34 +00:00
Craig Topper
df102b96ca [X86] Use isa instead of dyn_cast in a bool context. NFC
llvm-svn: 251777
2015-11-02 07:24:32 +00:00
Craig Topper
cbb29b67b1 [X86] Remove some llvm_unreachables after switches that already have an unreachable in their default case.
llvm-svn: 251776
2015-11-02 07:24:30 +00:00
Craig Topper
055cdb4c83 [X86] Remove a 'break' after an llvm_unreachable.
llvm-svn: 251775
2015-11-02 07:24:27 +00:00
Craig Topper
490cb1be76 [X86] Use cast instead of dyn_cast and a null check marked unreachable.
llvm-svn: 251774
2015-11-02 07:24:25 +00:00
Craig Topper
edd05abeaa [X86] Use MVT instead of EVT when the type is known to be simple. NFC
llvm-svn: 251772
2015-11-02 05:24:22 +00:00
Elena Demikhovsky
f42814b247 AVX-512: Optimized SIMD truncate operations for AVX512F set.
Optimized <8 x i32> to <8 x i16>
<4 x i64> to < 4 x i32>
<16 x i16> to <16 x i8>
All these oprtrations use now AVX512F set (KNL). Before this change it was implemented with AVX2 set.


Differential Revision: http://reviews.llvm.org/D14108

llvm-svn: 251764
2015-11-01 11:45:47 +00:00
Craig Topper
9f7fa4107c [X86] Replace getScalarType with getVectorElementType when the type is already known to be a vector. This should result in slightly less code. NFC
llvm-svn: 251751
2015-10-31 21:44:52 +00:00
Craig Topper
bfd831d3f2 [X86] Convert to MVT instead of calling EVT functions since we already know the type is simple. NFC
llvm-svn: 251745
2015-10-31 18:14:17 +00:00
Craig Topper
057ba18026 [X86] Call getScalarSizeInBits() instead of getScalarType().getScalarSizeInBits(). NFC
llvm-svn: 251744
2015-10-31 18:14:15 +00:00
Craig Topper
403a7454ac [X86] Remove two const references to the return value of a constructor and just use normal object creation syntax. NFC
llvm-svn: 251743
2015-10-31 17:28:02 +00:00
Craig Topper
70916f79d8 [X86] Replace EVT with MVT in some more places. NFC
llvm-svn: 251742
2015-10-31 17:27:59 +00:00
Craig Topper
c1776ebb91 [X86] Fix indentation of case statements in switch. NFC
llvm-svn: 251741
2015-10-31 17:27:56 +00:00
Craig Topper
e9bbc5ba7a [X86] Reduce math for index calculation for inserting and extracting subvectors and elements by exploiting the fact that all supported vector types have a power 2 number of elements.
llvm-svn: 251740
2015-10-31 17:27:52 +00:00
Craig Topper
8ef684fd95 [X86] Use is128BitVector/is256BitVector/is512BitVector in place of getSizeInBits == in some places. NFC
llvm-svn: 251687
2015-10-30 04:31:18 +00:00
Craig Topper
da6d448666 [X86] Minor formatting fixes. NFC.
llvm-svn: 251686
2015-10-30 04:31:14 +00:00
Craig Topper
4059955e80 [X86] Use MVT instead of EVT in some places. NFC
Prior to this the compiled code probably had extra checks for extended types that won't ever execute.

llvm-svn: 251682
2015-10-30 03:19:12 +00:00
Simon Pilgrim
95f9ec47f2 [X86][SSE] Shuffle blends with zero
This patch generalizes the zeroing of vector elements with the BLEND instructions. Currently a zero vector will only blend if the shuffled elements are correctly inline, this patch recognises when a vector input is zero (or zeroable) and modifies a local copy of the shuffle mask to support a blend. As a zeroable vector input may not be all zeroes, the zeroable vector is regenerated if necessary.

Differential Revision: http://reviews.llvm.org/D14050

llvm-svn: 251659
2015-10-29 22:11:28 +00:00
Benjamin Kramer
7ae7c7fe44 Remove CRLF line endings.
llvm-svn: 251594
2015-10-29 02:33:05 +00:00
Cong Hou
1c1bef845a [X86] A small fix in X86/X86TargetTransformInfo.cpp: check a value type is simple before calling getSimpleVT().
llvm-svn: 251538
2015-10-28 18:15:46 +00:00
Benjamin Kramer
9b02f9efc7 Put global classes into the appropriate namespace.
Most of the cases belong into an anonymous namespace. No
functionality change intended.

llvm-svn: 251515
2015-10-28 13:54:36 +00:00
Craig Topper
f193656e65 [X86] Make some for loops over MVTs more explicit (and shorter) by just mentioning all the relevant types in an initializer list. NFC
llvm-svn: 251500
2015-10-28 05:48:32 +00:00
Craig Topper
60ec421ca5 Use range-based for loops and use initializer list to remove a small static array. NFC
llvm-svn: 251494
2015-10-28 04:53:27 +00:00
Craig Topper
7fa00ae045 Remove templates from CostTableLookup functions. All instantiations had the same type.
This also lets us remove the versions of the functions that took a statically sized array as we can rely on ArrayRef implicit conversion now.

llvm-svn: 251490
2015-10-28 04:02:12 +00:00
Asaf Badouh
2110df5fda [X86][AVX512] [X86][AVX512] add convert float to half
convert float to half with mask/maskz for the reg to reg version and mask for the reg to mem version (there is no maskz version for reg to mem).

Differential Revision: http://reviews.llvm.org/D14113

llvm-svn: 251409
2015-10-27 15:37:17 +00:00
Michael Kuperstein
6d9717c9b3 [X86] Make elfiamcu an OS, not an environment.
GNU tools require elfiamcu to take up the entire OS field, so, e.g.
i?86-*-linux-elfiamcu is not considered a legal triple.
Make us compatible.

Differential Revision: http://reviews.llvm.org/D14081

llvm-svn: 251390
2015-10-27 07:23:59 +00:00
Craig Topper
c143499136 Convert cost table lookup functions to return a pointer to the entry or nullptr instead of the index.
This avoid mentioning the table name an extra time and allows the lookup to be done directly in the ifs by relying on the bool conversion of the pointer.

While there make use of ArrayRef and std::find_if.

llvm-svn: 251382
2015-10-27 04:14:24 +00:00
Sanjay Patel
7530f47511 [x86] replace integer logic ops with packed SSE FP logic ops
If we have an operand to a bitwise logic op that's already in
an XMM register and the result is going to be sent to an XMM
register, then use an SSE logic op to avoid moves between the
integer and vector register files.

Related commits:
http://reviews.llvm.org/rL248395
http://reviews.llvm.org/rL248399
http://reviews.llvm.org/rL248404
http://reviews.llvm.org/rL248409
http://reviews.llvm.org/rL248415

This should solve PR22428:
https://llvm.org/bugs/show_bug.cgi?id=22428

llvm-svn: 251378
2015-10-27 01:28:07 +00:00
Sanjay Patel
158d058285 reorganize logic; NFCI (retry r251349)
This is a preliminary step before adding another optimization
to PerformBITCASTCombine().

..and I really hope it's NFC this time!

llvm-svn: 251357
2015-10-26 21:54:14 +00:00
Sanjay Patel
96136a0845 revert r251349; it included code for a functional change
llvm-svn: 251350
2015-10-26 21:28:02 +00:00
Sanjay Patel
e1ed7047d1 reorganize logic; NFCI
This is a preliminary step before adding another optimization
to PerformBITCASTCombine().

llvm-svn: 251349
2015-10-26 21:24:09 +00:00
Evgeniy Stepanov
6d86bc6f79 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.

The previous iteration of this change was reverted in r250461. This
version leaves the generic, compiler-rt based implementation in
SafeStack.cpp instead of moving it to TargetLoweringBase in order to
allow testing without a TargetMachine.

llvm-svn: 251324
2015-10-26 18:28:25 +00:00
Igor Breger
0803d66832 AVX512: Enabled VPBROADCASTB lowering for v64i8 vectors.
Differential Revision: http://reviews.llvm.org/D13896

llvm-svn: 251287
2015-10-26 13:01:02 +00:00
Igor Breger
1ab3a1be2f AVX-512: Use correct extract vector length.
Bug https://llvm.org/bugs/show_bug.cgi?id=25318

Differential Revision: http://reviews.llvm.org/D14062

llvm-svn: 251285
2015-10-26 12:26:34 +00:00
Igor Breger
d41ad7cd99 AVX512: Add AVX-512 not materializable instructions.
Otherwise value can be reused , despite its value could be changed - produces incorrect assembler.

https://llvm.org/bugs/show_bug.cgi?id=25270

Differential Revision: http://reviews.llvm.org/D14057

llvm-svn: 251275
2015-10-26 08:37:12 +00:00
David Majnemer
296cfbf069 [MC] Don't crash when .word is given bogus values
We didn't validate that the .word directive was given a sane value,
leading to crashes when we attempt to write out the object file.

Instead, perform some validation and issue a diagnostic pointing at the
start of the diagnostic.

llvm-svn: 251270
2015-10-26 02:45:50 +00:00
Benjamin Kramer
2c2eefe199 Convert assert(false) into llvm_unreachable where it makes sense.
llvm-svn: 251266
2015-10-25 22:28:27 +00:00
Simon Pilgrim
8a425d054a [X86][SSE4A] Fix for EXTRQI shuffle lowering.
Incorrect range test - found during fuzz testing.

llvm-svn: 251245
2015-10-25 17:40:54 +00:00
Elena Demikhovsky
b4a97e3e11 Scalarizer for masked.gather and masked.scatter intrinsics.
When the target does not support these intrinsics they should be converted to a chain of scalar load or store operations.
If the mask is not constant, the scalarizer will build a chain of conditional basic blocks.
I added isLegalMaskedGather() isLegalMaskedScatter() APIs.

Differential Revision: http://reviews.llvm.org/D13722

llvm-svn: 251237
2015-10-25 15:37:55 +00:00
Michael Kuperstein
b8a24f50b1 [X86] Use correct calling convention for MCU psABI libcalls
When using the MCU psABI, compiler-generated library calls should pass
some parameters in-register. However, since inreg marking for x86 is currently
done by the front end, it will not be applied to backend-generated calls.

This is a workaround for PR3997, which describes a similar issue for -mregparm.

Differential Revision: http://reviews.llvm.org/D13977

llvm-svn: 251223
2015-10-25 08:14:05 +00:00
Michael Kuperstein
f5ce8355a3 [X86] Add support for elfiamcu triple
This adds support for the i?86-*-elfiamcu triple, which indicates the IAMCU psABI is used.

Differential Revision: http://reviews.llvm.org/D13977

llvm-svn: 251222
2015-10-25 08:07:37 +00:00
Craig Topper
53b91789eb Remove two unnecessary conversions from MVT to EVT. NFC
llvm-svn: 251219
2015-10-25 03:15:29 +00:00
Simon Pilgrim
667daf3b66 [X86][SSE] Use lowerVectorShuffleWithUNPCK instead of custom matches.
Most 128-bit and 256-bit shuffles were manually matching UNPCK patterns - use lowerVectorShuffleWithUNPCK to be more thorough.

llvm-svn: 251211
2015-10-24 22:45:04 +00:00
Simon Pilgrim
04340b6235 [X86][SSE] lowerVectorShuffleWithUNPCK - use equivalent shuffle mask test.
Use isShuffleEquivalent to match UNPCK shuffles - better support for build vector inputs.

llvm-svn: 251207
2015-10-24 20:48:08 +00:00
Hans Wennborg
c534071ad7 X86ISelLowering: Support tail calls to/from callee pop functions
This enables tail calls with thiscall, stdcall, vectorcall and
fastcall functions.

Differential Revision: http://reviews.llvm.org/D13999

llvm-svn: 251190
2015-10-24 16:47:10 +00:00
Simon Pilgrim
018ed97e7f Fix unused variable warning. NFC.
llvm-svn: 251189
2015-10-24 13:41:45 +00:00
Simon Pilgrim
2da7bcfef6 [X86][XOP] Add support for lowering vector rotations
This patch adds support for lowering to the XOP VPROT / VPROTI vector bit rotation instructions.

This has required changes to the DAGCombiner rotation pattern matching to support vector types - so far I've only changed it to support splat vectors, but generalising this further is feasible in the future.

Differential Revision: http://reviews.llvm.org/D13851

llvm-svn: 251188
2015-10-24 13:17:26 +00:00
Reid Kleckner
32f853dcc1 [X86] Clean up the tail call eligibility logic
Summary:
The logic here isn't straightforward because our support for
TargetOptions::GuaranteedTailCallOpt.

Also fix a bug where we were allowing tail calls to cdecl functions from
fastcall and vectorcall functions. We were special casing thiscall and
stdcall callers rather than checking for any convention that requires
clearing stack arguments before returning.

Reviewers: hans

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14024

llvm-svn: 251137
2015-10-23 19:35:38 +00:00
Joseph Tremoulet
d888298118 [CodeGen] Mark setjmp/catchret MBBs address-taken
Summary:
This ensures that BranchFolding (and similar) won't remove these blocks.

Also allow AsmPrinter::EmitBasicBlockStart to process MBBs which are
address-taken but do not have BBs that are address-taken, since otherwise
its call to getAddrLabelSymbolTableToEmit would fail an assertion on such
blocks.  I audited the other callers of getAddrLabelSymbolTableToEmit
(and getAddrLabelSymbol); they all have BBs known to be address-taken
except for the call through getAddrLabelSymbol from
WinException::create32bitRef; that call is actually now unreachable, so
I've removed it and updated the signature of create32bitRef.

This fixes PR25168.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13774

llvm-svn: 251113
2015-10-23 15:06:05 +00:00
Eric Christopher
09497cc81a Remove the last traces of X86CompilationCallback as it is completely
unused.

llvm-svn: 251035
2015-10-22 17:55:35 +00:00
Asaf Badouh
99f2354837 [X86][AVX512] extend vcvtph2ps to support xmm/ymm and sae versions
Differential Revision: http://reviews.llvm.org/D13945

llvm-svn: 251018
2015-10-22 14:01:16 +00:00
Elena Demikhovsky
03caf2fe8c AVX-512: Fixed a bug in select_cc for i1 type
Fixed faiure:
LLVM ERROR: Cannot select: t33: i1 = select_cc t25, Constant:i32<0>, t45, t42, seteq:ch

added a test

Differential Revision: http://reviews.llvm.org/D13943

llvm-svn: 250996
2015-10-22 07:10:29 +00:00
Elena Demikhovsky
20f57f03d4 Partially reverted changes from r250686
Clang runtime failure was reported.
   Assertion failed: (isExtended() && "Type is not extended!"), function getTypeForEVT
I'll need to add a proper handling for PointerType in masked load/store intrinsics.

llvm-svn: 250995
2015-10-22 06:20:29 +00:00
Sanjay Patel
e2dc1e0494 [x86] move recursive add match for LEA to helper function; NFCI
llvm-svn: 250926
2015-10-21 18:56:06 +00:00
Craig Topper
dcce633156 [X86] Add AMD mwaitx, monitorx, and clzero instructions to the assembly parser and disassembler.
llvm-svn: 250911
2015-10-21 17:26:45 +00:00
Mehdi Amini
a26e394644 Do not use dyn_cast<X> after isa<X> (NFC)
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 250883
2015-10-21 06:11:01 +00:00
Igor Breger
c385abd09d AVX512: Implemented encoding and intrinsics for VPBROADCASTB/W/D/Q instructions.
Differential Revision: http://reviews.llvm.org/D13884

llvm-svn: 250819
2015-10-20 11:56:42 +00:00
Duncan P. N. Exon Smith
4be8cb04fe X86: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250741
2015-10-19 21:48:29 +00:00
Benjamin Kramer
170c107663 Remove CRLF newlines. NFC.
llvm-svn: 250698
2015-10-19 13:05:25 +00:00
Elena Demikhovsky
2e0208e770 Removed parameter "Consecutive" from isLegalMaskedLoad() / isLegalMaskedStore().
Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case.

Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces.

Differential Revision: http://reviews.llvm.org/D13850

llvm-svn: 250686
2015-10-19 07:43:38 +00:00
Asaf Badouh
381b11d5f2 [X86][AVX512DQ] add scalar fpclass
Differential Revision: http://reviews.llvm.org/D13769

llvm-svn: 250650
2015-10-18 11:04:38 +00:00
Igor Breger
c4af5733a0 AVX512: Lowering i8/i16 vector CTLZ using the dword LZCNT vector instruction
Differential Revision: http://reviews.llvm.org/D13632

llvm-svn: 250649
2015-10-18 09:56:39 +00:00
Simon Pilgrim
a6b49e0950 [X86][XOP] Add VPROT instruction opcodes
Added X86ISD opcodes for VPROT vector rotate by variable and by immediate.

llvm-svn: 250620
2015-10-17 19:04:24 +00:00
Craig Topper
23be433ba9 Replace a custom table sort check with std::is_sorted. Change a function to take ArrayRef instead of pointer and length. NFC
llvm-svn: 250615
2015-10-17 16:37:13 +00:00
Simon Pilgrim
179d5ff620 [CostModel] Fixed AVX integer shift costs
Targets with AVX but without AVX2 were incorrectly reporting costs of 256-bit integer shifts.

llvm-svn: 250611
2015-10-17 13:23:38 +00:00
Simon Pilgrim
7150f7047f [X86][FastISel] Teach how to select SSE4A nontemporal stores.
Add FastISel support for SSE4A scalar float / double non-temporal stores

Follow up to D13698

Differential Revision: http://reviews.llvm.org/D13773

llvm-svn: 250610
2015-10-17 13:04:42 +00:00
Reid Kleckner
a26f7de3d1 [WinEH] Fix stack alignment in funclets and ParentFrameOffset calculation
Our previous value of "16 + 8 + MaxCallFrameSize" for ParentFrameOffset
is incorrect when CSRs are involved. We were supposed to have a test
case to catch this, but it wasn't very rigorous.

The main effect here is that calling _CxxThrowException inside a
catchpad doesn't immediately crash on MOVAPS when you have an odd number
of CSRs.

llvm-svn: 250583
2015-10-16 23:43:27 +00:00
Sanjay Patel
16d66c3d22 [x86] promote 'add nsw' to a wider type to allow more combines
The motivation for this patch starts with PR20134:
https://llvm.org/bugs/show_bug.cgi?id=20134

void foo(int *a, int i) {
  a[i] = a[i+1] + a[i+2];
}

It seems better to produce this (14 bytes):

movslq	%esi, %rsi
movl	0x4(%rdi,%rsi,4), %eax
addl	0x8(%rdi,%rsi,4), %eax
movl	%eax, (%rdi,%rsi,4)

Rather than this (22 bytes):

leal	0x1(%rsi), %eax
cltq             
leal	0x2(%rsi), %ecx      
movslq	%ecx, %rcx     
movl	(%rdi,%rcx,4), %ecx
addl	(%rdi,%rax,4), %ecx
movslq	%esi, %rax       
movl	%ecx, (%rdi,%rax,4)

The most basic problem (the first test case in the patch combines constants) should also be fixed in InstCombine, 
but it gets more complicated after that because we need to consider architecture and micro-architecture. For
example, AArch64 may not see any benefit from the more general transform because the ISA solves the sexting in
hardware. Some x86 chips may not want to replace 2 ADD insts with 1 LEA, and there's an attribute for that: 
FeatureSlowLEA. But I suspect that doesn't go far enough or maybe it's not getting used when it should; I'm 
also not sure if FeatureSlowLEA should also mean "slow complex addressing mode".

I see no perf differences on test-suite with this change running on AMD Jaguar, but I see small code size
improvements when building clang and the LLVM tools with the patched compiler.

A more general solution to the sext(add nsw(x, C)) problem that works for multiple targets is available
in CodeGenPrepare, but it may take quite a bit more work to get that to fire on all of the test cases that
this patch takes care of.

Differential Revision: http://reviews.llvm.org/D13757

llvm-svn: 250560
2015-10-16 22:14:12 +00:00
Andrew Kaylor
28de527b97 Fix assertion failure with fp128 to unsigned i64 conversion
Patch by Mitch Bodart

Differential Revision: http://reviews.llvm.org/D13780

llvm-svn: 250550
2015-10-16 20:39:20 +00:00
Craig Topper
b5be430181 [X86] Add fxsr feature flag for fxsave/fxrestore instructions.
llvm-svn: 250497
2015-10-16 06:03:09 +00:00
Evgeniy Stepanov
17157ff131 Revert "[safestack] Fast access to the unsafe stack pointer on AArch64/Android."
Breaks the hexagon buildbot.

llvm-svn: 250461
2015-10-15 21:26:49 +00:00
Evgeniy Stepanov
f2afc9b765 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.

llvm-svn: 250456
2015-10-15 20:50:16 +00:00
JF Bastien
04a5b0845c x86: preserve flags when folding atomic operations
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags. I fixed some of this issue in D13680 but had missed INC/DEC.

This patch adds the missing EFLAGS definition.

llvm-svn: 250438
2015-10-15 18:24:52 +00:00
JF Bastien
836736a5d0 x86 FP atomic codegen: don't drop globals, stack
Summary:
x86 codegen is clever about generating good code for relaxed
floating-point operations, but it was being silly when globals and
immediates were involved, forgetting where the global was and
loading/storing from/to the wrong place. The same applied to hard-coded
address immediates.

Don't let it forget about the displacement.

This fixes https://llvm.org/bugs/show_bug.cgi?id=25171

A very similar bug when doing floating-points atomics to the stack is
also fixed by this patch.

This fixes https://llvm.org/bugs/show_bug.cgi?id=25144

Reviewers: pete

Subscribers: llvm-commits, majnemer, rsmith

Differential Revision: http://reviews.llvm.org/D13749

llvm-svn: 250429
2015-10-15 16:46:29 +00:00
Benjamin Kramer
8f192aeaf6 [X86] Rip out orphaned method declarations and other dead code. NFC.
llvm-svn: 250406
2015-10-15 14:09:59 +00:00
Igor Breger
bd521d55a0 AVX512: Implemented DAG lowering for shuff62x2/shufi62x2 instructions ( shuffle packed values at 128-bit granularity )
Differential Revision: http://reviews.llvm.org/D13648

llvm-svn: 250400
2015-10-15 13:29:07 +00:00
Igor Breger
6e29702ee8 AVX512: Implemented encoding and intrinsics for vpternlogd/q.
Differential Revision: http://reviews.llvm.org/D13768

llvm-svn: 250396
2015-10-15 12:33:24 +00:00
Elena Demikhovsky
50e9acac80 AVX-512: Fixed a bug in shuffle lowering 32-bit mode
AVX-512 bit shuffle fails on 32 bit since we create a vector of 64-bit constants.
I split 8x64-bit const vector to 16x32 on 32-bit mode.

Differential Revision: http://reviews.llvm.org/D13644

llvm-svn: 250390
2015-10-15 11:35:33 +00:00
Craig Topper
ccb1d31da1 Add XSAVE/XSAVEOPT to KNL processor.
llvm-svn: 250362
2015-10-15 03:56:54 +00:00
Andrea Di Biagio
f179f2ff85 [x86][FastISel] Teach how to select nontemporal stores.
This patch teaches x86 fast-isel how to select nontemporal stores.

On x86, we can use MOVNTI for nontemporal stores of doublewords/quadwords.
Instructions (V)MOVNTPS/PD/DQ can be used for SSE2/AVX aligned nontemporal
vector stores.

Before this patch, fast-isel always selected 'movd/movq' instead of 'movnti'
for doubleword/quadword nontemporal stores. In the case of nontemporal stores
of aligned vectors, fast-isel always selected movaps/movapd/movdqa instead of
movntps/movntpd/movntdq.

With this patch, if we use SSE2/AVX intrinsics for nontemporal stores we now
always get the expected (V)MOVNT instructions.
The lack of fast-isel support for nontemporal stores was spotted when analyzing
the -O0 codegen for nontemporal stores.

Differential Revision: http://reviews.llvm.org/D13698

llvm-svn: 250285
2015-10-14 10:03:13 +00:00
Craig Topper
975ee18a02 [X86] Add XSAVE feature flags to their various processors.
llvm-svn: 250268
2015-10-14 05:37:38 +00:00
Sanjay Patel
14776224a0 function names should start with a lower case letter; NFC
llvm-svn: 250174
2015-10-13 16:23:00 +00:00
Sanjay Patel
1d6d311286 don't repeat function/class/variable names in comments; NFC
llvm-svn: 250162
2015-10-13 15:12:27 +00:00
Michael Kuperstein
6430612ef2 Fix line-ending issue. NFC.
llvm-svn: 250151
2015-10-13 06:22:30 +00:00
Craig Topper
60d5a95ab6 [X86] Mark the AAD and AAM aliases as not valid in 64-bit mode.
llvm-svn: 250148
2015-10-13 05:12:07 +00:00
Craig Topper
02d45efab2 [X86] Change all the i8imm operands in XOP instructions to u8imm so the parser will check the size.
llvm-svn: 250147
2015-10-13 05:06:25 +00:00
JF Bastien
36979ec5a3 x86: preserve flags when folding atomic operations
Summary:
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags.

This patch adds the missing EFLAGS definition.

Floating point operations don't set flags, the subsequent fadd
optimization is therefore correct. The same applies for surrounding
load/store optimizations.

Reviewers: rsmith, rtrieu

Subscribers: llvm-commits, reames, morisset

Differential Revision: http://reviews.llvm.org/D13680

llvm-svn: 250135
2015-10-13 00:28:47 +00:00
Reid Kleckner
9480154bf5 Make Win64 localescape offsets FP relative instead of SP relative
We made them SP relative back in March (r233137) because that's the
value the runtime passes to EH functions. With the new cleanuppad IR,
funclets adjust their frame argument from SP to FP, so our offsets
should now be FP-relative.

llvm-svn: 250088
2015-10-12 19:43:34 +00:00
Andrea Di Biagio
64241e8d03 [x86] Fix wrong lowering of vsetcc nodes (PR25080).
Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong
assumption that for non-AVX512 targets, the source type and destination type
of a type-legalized setcc node were always the same type.

This assumption was unfortunately incorrect; the type legalizer is not always
able to promote the return type of a setcc to the same type as the first
operand of a setcc.

In the case of a vsetcc node, the legalizer firstly checks if the first input
operand has a legal type. If so, then it promotes the return type of the vsetcc
to that same type. Otherwise, the return type is promoted to the 'next legal
type', which, for vectors of MVT::i1 is always a 128-bit integer vector type.

Example (-mattr=+avx):

  %0 = trunc <8 x i32> %a to <8 x i23>
  %1 = icmp eq <8 x i23> %0, zeroinitializer

The initial selection dag for the code above is:

v8i1 = setcc t5, t7, seteq:ch
  t5: v8i23 = truncate t2
    t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1
    t7: v8i32 = build_vector of all zeroes.

The type legalizer would firstly check if 't5' has a legal type. If so, then it
would reuse that same type to promote the return type of the setcc node.
Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to
promote the return type of the setcc node. Consequently, the setcc return type
is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the
following dag node:
  v8i16 = setcc t32, t25, seteq:ch

  where t32 and t25 are now values of type v8i32.

Before this patch, function LowerVSETCC would have wrongly expanded the setcc
to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an
instruction. In our case, ISel would have matched a VPCMPEQWrr:
  t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25

However, t36 and t25 are both VR256, while the result type is instead of class
VR128. This inconsistency ended up causing the insertion of COPY instructions
like this:
  %vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3

Which is an invalid full copy (not a sub register copy).
Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy
instruction" in the attempt to expand the malformed pseudo COPY instructions.

This patch fixes the problem adding the missing logic in LowerVSETCC to handle
the corner case of a setcc with 128-bit return type and 256-bit operand type.

This problem was originally reported by Dimitry as PR25080. It has been latent
for a very long time. I have added the minimal reproducible from that bugzilla
as test setcc-lowering.ll.

Differential Revision: http://reviews.llvm.org/D13660

llvm-svn: 250085
2015-10-12 19:22:30 +00:00
Sanjay Patel
918786bb02 combine predicates; NFCI
llvm-svn: 250075
2015-10-12 18:15:08 +00:00
Sanjay Patel
4f799d50c9 fix typos; NFC
llvm-svn: 250059
2015-10-12 16:09:59 +00:00
Sanjay Patel
63e48e87c3 fix capitalization; NFC
llvm-svn: 250049
2015-10-12 15:24:01 +00:00
Amjad Aboud
7b91f508e9 [X86] Add XSAVE intrinsic family
Add intrinsics for the
  XSAVE instructions (XSAVE/XSAVE64/XRSTOR/XRSTOR64)
  XSAVEOPT instructions (XSAVEOPT/XSAVEOPT64)
  XSAVEC instructions (XSAVEC/XSAVEC64)
  XSAVES instructions (XSAVES/XSAVES64/XRSTORS/XRSTORS64)

Differential Revision: http://reviews.llvm.org/D13012

llvm-svn: 250029
2015-10-12 11:47:46 +00:00
Andrea Di Biagio
64ad58bc9d [x86] PR24562: fix incorrect folding of PSHUFB nodes with a mask where all indices have the most significant bit set.
This patch fixes a problem in function 'combineX86ShuffleChain' that causes a
chain of shuffles to be wrongly folded away when the combined shuffle mask has
only one element.

We may end up with a combined shuffle mask of one element as a result of
multiple calls to function 'canWidenShuffleElements()'.
Function canWidenShuffleElements attempts to simplify a shuffle mask by widening
the size of the elements being shuffled.
For every pair of shuffle indices, function canWidenShuffleElements checks if
indices refer to adjacent elements. If all pairs refer to "adjacent" elements
then the shuffle mask is safely widened. As a consequence of widening, we end up
with a new shuffle mask which is half the size of the original shuffle mask.

The byte shuffle (pshufb) from test pr24562.ll has a mask of all SM_SentinelZero
indices. Function canWidenShuffleElements would combine each pair of
SM_SentinelZero indices into a single SM_SentinelZero index. So, in a
logarithmic number of steps (4 in this case), the pshufb mask is simplified to
a mask with only one index which is equal to SM_SentinelZero.

Before this patch, function combineX86ShuffleChain wrongly assumed that a mask
of size one is always equivalent to an identity mask. So, the entire shuffle
chain was just folded away as the combined shuffle mask was treated as a no-op
mask.

With this patch we know check if the only element of a combined shuffle mask is
SM_SentinelZero. In case, we propagate a zero vector.

Differential Revision: http://reviews.llvm.org/D13364

llvm-svn: 250027
2015-10-12 11:25:41 +00:00
Craig Topper
cc35f2e5b6 [X86] Use u8imm for the immediate type for all shift and rotate instructions. This way the assembler will perform range checking. Believe this matches gas behavior.
llvm-svn: 250016
2015-10-12 06:23:10 +00:00
Craig Topper
e7d5502122 [X86] Add support to assembler and MCInst lowering to use the other vmovq %xmmX, %xmmX encoding if it would be a shorter VEX encoding.
llvm-svn: 250014
2015-10-12 04:57:59 +00:00
Craig Topper
e343e59b1b [X86] Cleanup formatting a bit. NFC
llvm-svn: 250013
2015-10-12 04:27:17 +00:00
Craig Topper
18c728de25 [X86] Change the immediate for IN/OUT instructions to u8imm so the assembly parser will check the size.
llvm-svn: 250012
2015-10-12 04:17:55 +00:00
Craig Topper
52829a95cf [X86] Add some instruction aliases to get the assembly parser table to favor arithmetic instructions with 8-bit immediates over the forms that implicitly use the ax/eax/rax.
This allows us to remove the explicit code for working around the existing priority

llvm-svn: 250011
2015-10-12 03:39:57 +00:00
Craig Topper
192161e032 [X86] Fix CMP and TEST with al/ax/eax/rax to not mark EFLAGS as a use or al/ax/eax/rax as a def. Probably doesn't have a functional affect since these aren't used in isel.
llvm-svn: 249994
2015-10-11 19:54:02 +00:00
Craig Topper
7a2aafba5d [X86] Remove special validation for INT immediate operand from AsmParser. Instead mark its operand type as u8imm which will cause it to fail to match. This is more consistent with other instruction behavior.
This also fixes a bug where negative immediates below -128 were not being reported as errors.

llvm-svn: 249989
2015-10-11 18:27:24 +00:00
Craig Topper
a7582c1073 [X86] Simplify immediate range checking code.
llvm-svn: 249979
2015-10-11 16:38:14 +00:00
Simon Pilgrim
ebc177d6ad [X86][XOP] Added support for the lowering of 128-bit vector integer comparisons to XOP PCOM/PCOMU instructions.
The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646).

llvm-svn: 249976
2015-10-11 14:15:17 +00:00
Craig Topper
ebfe76e532 Use range-based for loops. NFC.
llvm-svn: 249941
2015-10-10 05:25:06 +00:00
Craig Topper
57aa5f153c Use emplace_back instead of a constructor call and push_back. NFC
llvm-svn: 249940
2015-10-10 05:25:02 +00:00
David Majnemer
731d894ca2 [WinEH] Remove more dead code
wineh-parent is dead, so is ValueOrMBB.

llvm-svn: 249920
2015-10-10 00:04:29 +00:00
Reid Kleckner
51ced6582b [WinEH] Delete the old landingpad implementation of Windows EH
The new implementation works at least as well as the old implementation
did.

Also delete the associated preparation tests. They don't exercise
interesting corner cases of the new implementation. All the codegen
tests of the EH tables have already been ported.

llvm-svn: 249918
2015-10-09 23:34:53 +00:00
David Majnemer
f5bab94d5d [WinEH] Insert the catchpad return before CSR restoration
x64 catchpads use rax to inform the unwinder where control should go
next.  However, we must initialize rax before the epilogue sequence so
as to not perturb the unwinder.

llvm-svn: 249910
2015-10-09 22:18:45 +00:00
James Y Knight
6a7b902bbf Fix assert in X86 backend.
When running combine on an extract_vector_elt, it wants to look through
a bitcast to check if the argument to the bitcast was itself an
extract_vector_elt with particular operands.

However, it called getOperand() on the argument to the bitcast *before*
checking that the opcode was EXTRACT_VECTOR_ELT, assert-failing if there
were zero operands for the actual opcode.

Fix, and add trivial test.

llvm-svn: 249891
2015-10-09 20:10:14 +00:00
Reid Kleckner
7b4f0dd3be Revert "Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"""
This reverts commit r249794.

Apparently my checkouts are full of unexpected surprises today.

llvm-svn: 249796
2015-10-09 01:13:17 +00:00
Reid Kleckner
a0123b1610 Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64""
This reverts commit r249032.

TODO write commit msg

llvm-svn: 249794
2015-10-09 01:11:37 +00:00
Evgeniy Stepanov
4d73d6235a Add Triple::isAndroid().
This is a simple refactoring that replaces Triple.getEnvironment()
checks for Android with Triple.isAndroid().

llvm-svn: 249750
2015-10-08 21:21:24 +00:00
Eric Christopher
3d07db4e44 Move the MMX subtarget feature out of the SSE set of features and into
its own variable.

This is needed so that we can explicitly turn off MMX without turning
off SSE and also so that we can diagnose feature set incompatibilities
that involve MMX without SSE.

Rationale:

// sse3
__m128d test_mm_addsub_pd(__m128d A, __m128d B) {
  return _mm_addsub_pd(A, B);
}

// mmx
void shift(__m64 a, __m64 b, int c) {
  _mm_slli_pi16(a, c);
  _mm_slli_pi32(a, c);
  _mm_slli_si64(a, c);
  _mm_srli_pi16(a, c);
  _mm_srli_pi32(a, c);
  _mm_srli_si64(a, c);
  _mm_srai_pi16(a, c);
  _mm_srai_pi32(a, c);
}

clang -msse3 -mno-mmx file.c -c

For this code we should be able to explicitly turn off MMX
without affecting the compilation of the SSE3 function and then
diagnose and error on compiling the MMX function.

This matches the existing gcc behavior and follows the spirit of
the SSE/MMX separation in llvm where we can (and do) turn off
MMX code generation except in the presence of intrinsics.

Updated a couple of tests, but primarily tested with a couple of tests
for turning on only mmx and only sse.

This is paired with a patch to clang to take advantage of this behavior.

llvm-svn: 249731
2015-10-08 20:10:06 +00:00
Reid Kleckner
7b27da9328 [WinEH] Relax assertion in the presence of stack realignment
The code is correct as is, but we should test it.

llvm-svn: 249715
2015-10-08 18:41:52 +00:00
Frederic Riss
2e2e9d4ade [X86] Disable X86CallFrameOptimization on Darwin in presence of EH
We emit 1 compact unwind encoding per function, and this can’t represent
the varying stack pointer that will be generated by X86CallFrameOptimization.
Disable the optimization on Darwin.

(It might be possible to split the function into multiple ranges
and emit 1 compact unwind info per range. The compact unwind emission
code isn’t ready for that and this kind of info certainly isn’t
tested/used anywhere. It might be worth exploring this path if we want
to get the space savings at some point though)

llvm-svn: 249694
2015-10-08 15:45:08 +00:00
Igor Breger
63cd1bda1b AVX512: vpextrb/w/d/q and vpinsrb/w/d/q implementation.
This instructions doesn't have intrincis.
Added tests for lowering and encoding.

Differential Revision: http://reviews.llvm.org/D12317

llvm-svn: 249688
2015-10-08 12:55:01 +00:00
Michael Kuperstein
3a6fc10ee4 [X86] Fix wrong treatment of multi-lane blends in BUILD_VECTORtoBlendMask()
This fixes two separate bugs:
1) The mask for the high lane was not set correctly. That fixes PR24532.
2) The transformation should bail out if it believes it involves more than
2 lanes, as it does not currently do anything sensible in this case.

Differential Revision: http://reviews.llvm.org/D13505

llvm-svn: 249669
2015-10-08 08:13:02 +00:00
Reid Kleckner
65e461dfc5 [WinEH] Fix 32-bit funclet epilogues in the presence of dynamic allocas
In particular, passing non-trivially copyable objects by value on win32
uses a dynamic alloca (inalloca). We would clobber ESP in the epilogue
and end up returning to outer space.

llvm-svn: 249637
2015-10-07 23:55:01 +00:00
Kevin B. Smith
598f5e1bea Test commit access. Fixed comment to have correct input parameter name and
period termination.

llvm-svn: 249571
2015-10-07 17:24:25 +00:00
Michael Kuperstein
203878427e [X86] Emit .cfi_escape GNU_ARGS_SIZE when adjusting the stack before calls
When outgoing function arguments are passed using push instructions, and EH
is enabled, we may need to indicate to the stack unwinder that the stack
pointer was adjusted before the call.

This should fix the exception handling issues in PR24792.

Differential Revision: http://reviews.llvm.org/D13132

llvm-svn: 249522
2015-10-07 07:01:31 +00:00
Igor Breger
495e2a8625 AVX512: Change encoding of vpshuflw and vpshufhw instructions. Implement WIG as W0 and not W1, like all other instruction have been implemented.
Add encoding tests.

Differential Revision: http://reviews.llvm.org/D13471

llvm-svn: 249521
2015-10-07 06:31:18 +00:00
Hans Wennborg
7d1f4ff326 Fix Clang-tidy modernize-use-nullptr warnings in source directories and generated files; other minor cleanups.
Patch by Eugene Zelenko!

Differential Revision: http://reviews.llvm.org/D13321

llvm-svn: 249482
2015-10-06 23:24:35 +00:00
Joseph Tremoulet
d1c89447ca [WinEH] Recognize CoreCLR personality function
Summary:
 - Add CoreCLR to if/else ladders and switches as appropriate.
 - Rename isMSVCEHPersonality to isFuncletEHPersonality to better
   reflect what it captures.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, AndyAyers, llvm-commits

Differential Revision: http://reviews.llvm.org/D13449

llvm-svn: 249455
2015-10-06 20:28:16 +00:00
Craig Topper
a4f530a18c [X86] Teach constant hoisting that ANDs with 64-bit immediates in the range 0x80000000-0xffffffff can be handled cheaply and don't need to be hoisted.
Most importantly, this keeps constant hoisting from preventing instruction selections ability to turn an AND with 0xffffffff into a move into a 32-bit subregister.

llvm-svn: 249370
2015-10-06 02:50:24 +00:00
Craig Topper
9392857223 [X86] Remove unnecessary AddComplexity directive. The instruction is already wrapped in the equivalent earlier. NFC
llvm-svn: 249369
2015-10-06 02:50:21 +00:00
David Majnemer
0d41db222e [WinEH] Update CATCHRET's operand to match its successor
The CATCHRET operand did not match the MachineFunction's CFG.  This
mismatch happened because FrameLowering created a new MachineBasicBlock
and updated the CFG but forgot to update the CATCHRET operand.

Let's make sure this doesn't happen again by strengthing the funclet
membership analysis: it can now reason about the membership of all basic
blocks, not just those inside of funclets.

llvm-svn: 249344
2015-10-05 20:09:16 +00:00
Igor Breger
38dd6d8710 AVX512: Implemented encoding and intrinsics for VPERMILPS/PD instructions.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12690

llvm-svn: 249261
2015-10-04 07:20:41 +00:00
Simon Pilgrim
1803b2cb29 [X86] Lower SEXTLOAD using SIGN_EXTEND_VECTOR_INREG. NCI.
The custom lowering in LowerExtendedLoad is doing the equivalent shuffle, so make use of existing lowering code to reduce duplication.

llvm-svn: 249243
2015-10-03 18:55:43 +00:00
Andrea Di Biagio
a1b187e1bf Reapply r249121 : "[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types."
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

  %1 = bitcast <4 x i31> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

  FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Originally reviewed here: http://reviews.llvm.org/D13347

llvm-svn: 249147
2015-10-02 16:08:05 +00:00
Andrea Di Biagio
d92074f9ba Revert: [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
r249121 caused a Clang test failure (avx2-buitins.c).
Revert r249121 while I keep investigating on the reason why that test failed.

llvm-svn: 249124
2015-10-02 13:06:19 +00:00
Andrea Di Biagio
ecfcdc0d7d [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

  %1 = bitcast <4 x i31> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

  FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Differential Revision: http://reviews.llvm.org/D13347

llvm-svn: 249121
2015-10-02 12:45:37 +00:00
Reid Kleckner
ad491293c0 [WinEH] Emit __C_specific_handler tables for the new IR
We emit denormalized tables, where every range of invokes in the same
state gets a complete list of EH action entries. This is significantly
simpler than trying to infer the correct nested scoping structure from
the MI. Fortunately, for SEH, the nesting structure is really just a
size optimization.

With this, some basic __try / __except examples work.

llvm-svn: 249078
2015-10-01 21:38:24 +00:00
David Majnemer
a0b6521d43 [WinEH] Make FuncletLayout more robust against catchret
Catchret transfers control from a catch funclet to an earlier funclet.
However, it is not completely clear which funclet the catchret target is
part of.  Make this clear by stapling the catchret target's funclet
membership onto the CATCHRET SDAG node.

llvm-svn: 249052
2015-10-01 18:44:59 +00:00
NAKAMURA Takumi
08692acedd Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"
It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll

llvm-svn: 249032
2015-10-01 17:00:56 +00:00
Ahmed Bougacha
eb32174104 [X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math.
The custom code produces incorrect results if later reassociated.

Since r221657, on x86, vNi32 uitofp is lowered using an optimized
sequence:

  movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...]
  pand %xmm0, %xmm1
  por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...]
  psrld $16, %xmm0
  por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...]
  addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...]
  addps %xmm1, %xmm0

Since r240361, the machine combiner opportunistically reassociates
2-instruction sequences (with -ffast-math). In the new code sequence,
the ADDPS' are eligible. In isolation, for simple examples (without
reassociable users), this makes no performance difference (the goal
being to enable reassociation of longer chains).

In the trivial example (just one uitofp), the reassociation doesn't
happen, because (I think) it would require the emission of a separate
movaps for a constantpool load (instead of folding it into addps).

However, when we have multiple uitofp sequences, and the constantpool
loads are CSE'd earlier, the machine combiner can do the reassociation.

When the ADDPS' are reassociated, the resulting sequence isn't correct
anymore, as we'd be adding large (2**39) constants with comparatively
smaller values (~2**23). Given that two of the three inputs are powers
of 2 larger than 2**16, and that ulp(2**39) == 2**(39-24) == 2**15,
the reassociated chain will produce 0 for any input in [0, 2**14[.
In my testing, it also produces wrong results for 99.5% of [0, 2**32[.

Avoid this by disabling the new lowering when -ffast-math. It does
mean that we'll get slower code than without it, but at least we
won't get egregiously incorrect code.

One might argue that, considering -ffast-math is all but meaningless,
uitofp producing wrong results isn't a compiler bug. But it really is.

Fixes PR24512.

...though this is really more of a workaround.
Ideally, we'd have some sort of Machine FMF, but that's a problem
that's not worth tackling until we do more with machine IR.

llvm-svn: 248965
2015-10-01 00:11:07 +00:00
Reid Kleckner
9cd26938c1 [WinEH] Emit int3 after noreturn calls on Win64
The Win64 unwinder disassembles forwards from each PC to try to
determine if this PC is in an epilogue. If so, it skips calling the EH
personality function for that frame. Typically, this means you cannot
catch an exception in the same frame that you threw it, because 'throw'
calls a noreturn runtime function.

Previously we avoided this problem with the TrapUnreachable
TargetOption, but that's a much bigger hammer than we need. All we need
is a 1 byte non-epilogue instruction right after the call.  Instead,
what we got was an unconditional branch to a shared block containing the
ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which
added TrapUnreachable, and replaces it with something better.

The new code pattern matches for invoke/call followed by unreachable and
inserts an int3 into the DAG. To be 100% watertight, we would need to
insert SEH_Epilogue instructions into all basic blocks ending in a call
with no terminators or successors, but in practice this is unlikely to
come up.

llvm-svn: 248959
2015-09-30 23:09:23 +00:00
Sanjay Patel
275d6f22b1 [x86] enable machine combiner reassociations for 256-bit vector logical integer insts
llvm-svn: 248955
2015-09-30 22:25:55 +00:00
David Blaikie
d5e11fabef Fix -Wsign-compare warning
llvm-svn: 248942
2015-09-30 20:37:48 +00:00
Simon Pilgrim
d3e938c0f5 [X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions
The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.

Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases.

Differential Revision: http://reviews.llvm.org/D8690

llvm-svn: 248878
2015-09-30 08:17:50 +00:00
Reid Kleckner
cdc900e40d [WinEH] Setup RBP correctly in Win64 funclet prologues
Previously local variable captures just didn't work in 64-bit. Now we
can access local variables more or less correctly.

llvm-svn: 248857
2015-09-29 23:32:01 +00:00
David Majnemer
e6a9872845 [WinEH] Ensure that funclets obey the x64 ABI
The x64 ABI requires that epilogues do not contain code other than stack
adjustments and some limited control flow.  However, we'd insert code to
initialize the return address after stack adjustments.  Instead, insert
EAX/RAX with the current value before we create the stack adjustments in
the epilogue.

llvm-svn: 248839
2015-09-29 22:33:36 +00:00
Maksim Panchenko
cb20c21c8a HHVM calling conventions.
HHVM calling convention, hhvmcc, is used by HHVM JIT for
functions in translated cache. We currently support LLVM back end to
generate code for X86-64 and may support other architectures in the
future.

In HHVM calling convention any GP register could be used to pass and
return values, with the exception of R12 which is reserved for
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.

When we enter translation cache via hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.

One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to standard C calling
convention with an exception of first argument which is passed in RBP
(before we use RDI, RSI, etc.)

Differential Revision: http://reviews.llvm.org/D12681

llvm-svn: 248832
2015-09-29 22:09:16 +00:00
David Majnemer
5446226c74 [WinEH] Teach AsmPrinter about funclets
Summary:
Funclets have been turned into functions by the time they hit the object
file.  Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.

Differential Revision: http://reviews.llvm.org/D13261

llvm-svn: 248824
2015-09-29 20:12:33 +00:00
Jeroen Ketema
808a3386e2 Arguments spilled on the stack before a function call may have
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.

Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example if
we have

%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)

the x86 back-end emits:

movaps  32(%ecx), %xmm2
movaps  (%ecx), %xmm0
movaps  16(%ecx), %xmm1
movaps  48(%ecx), %xmm3
subl    $20, %esp       <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards 
movaps  %xmm3, (%esp)   <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl    $0, 16(%esp)
calll   __bar

To solve this, we need to make sure that the computed value with which
the stack pointer is changed is a multiple af the maximal alignment seen
during its computation. With this change we get proper alignment:

subl    $32, %esp
movaps  %xmm3, (%esp)

Differential Revision: http://reviews.llvm.org/D12337

llvm-svn: 248786
2015-09-29 10:12:57 +00:00
NAKAMURA Takumi
f22bbc09ec [CMake] X86AsmParser: Prune redundant LINK_LIBS.
It is described in LLVMBuild.txt.

llvm-svn: 248771
2015-09-29 01:25:01 +00:00
Sanjay Patel
69ef985eae add a FIXME for a CPU model check that should have an attribute instead
llvm-svn: 248746
2015-09-28 22:00:24 +00:00
Andrew Kaylor
8d27e2d077 Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing.
Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com)

Differential Revision: http://reviews.llvm.org/D11370

llvm-svn: 248735
2015-09-28 20:33:22 +00:00
Yaron Keren
e346f84d3b Silence clang warning: variable ‘Status’ set but not used.
llvm-svn: 248691
2015-09-27 21:31:33 +00:00
Simon Pilgrim
e1d61112d5 [X86][SSE2] Fix zero/any extension shuffles that don't start from the first element
Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H

llvm-svn: 248536
2015-09-24 21:02:17 +00:00
Sanjay Patel
02c70e2e49 [x86] replace integer 'xor' ops with packed SSE FP 'xor' ops when operating on FP scalars
Turn this:

movd %xmm0, %eax
movd %xmm1, %ecx
xorl %eax, %ecx
movd %ecx, %xmm0

into this:

xorps %xmm1, %xmm0

This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

This is an extension of:
http://reviews.llvm.org/rL248395

llvm-svn: 248415
2015-09-23 18:33:42 +00:00
Sanjay Patel
296caa9388 [x86] replace integer 'or' ops with packed SSE FP 'or' ops when operating on FP scalars
Turn this:

movd %xmm0, %eax
movd %xmm1, %ecx
orl %eax, %ecx
movd %ecx, %xmm0

into this:

orps %xmm1, %xmm0

This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

This is an extension of:
http://reviews.llvm.org/rL248395

llvm-svn: 248409
2015-09-23 18:19:07 +00:00
Evgeniy Stepanov
c8b423f66c Android support for SafeStack.
Add two new ways of accessing the unsafe stack pointer:

* At a fixed offset from the thread TLS base. This is very similar to
  StackProtector cookies, but we plan to extend it to other backends
  (ARM in particular) soon. Bionic-side implementation here:
  https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
  neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
  not emutls).

This is a re-commit of a change in r248357 that was reverted in
r248358.

llvm-svn: 248405
2015-09-23 18:07:56 +00:00
Sanjay Patel
6f3e2579be move call to convertIntLogicToFPLogic up; NFCI
The BEXTR comments didn't make sense before, we may want to extend the
FP logic transform to work on vectors, and this way is more beautiful.

llvm-svn: 248404
2015-09-23 18:03:37 +00:00
Sanjay Patel
6833c567fe [x86] move code for converting int logic to FP logic to a helper function; NFCI
This is a follow-on to:
http://reviews.llvm.org/rL248395

so we can add the call to the or/xor combines too.

llvm-svn: 248399
2015-09-23 17:39:41 +00:00
Sanjay Patel
df3037ebeb [x86] replace integer 'and' ops with packed SSE FP 'and' ops when operating on FP scalars
Turn this:
   movd %xmm0, %eax
   movd %xmm1, %ecx
   andl %eax, %ecx
   movd %ecx, %xmm0

into this:
   andps %xmm1, %xmm0


This is related to, but does not solve:
https://llvm.org/bugs/show_bug.cgi?id=22428

Differential Revision: http://reviews.llvm.org/D13065

llvm-svn: 248395
2015-09-23 17:00:06 +00:00
Simon Pilgrim
838c618b88 [X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR
This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead.

LLVM counterpart to D12835

Differential Revision: http://reviews.llvm.org/D13002

llvm-svn: 248368
2015-09-23 08:48:33 +00:00
Evgeniy Stepanov
cc79e4a6c7 Revert "Android support for SafeStack."
test/Transforms/SafeStack/abi.ll breaks when target is not supported;
needs refactoring.

llvm-svn: 248358
2015-09-23 01:23:22 +00:00
Evgeniy Stepanov
db4d1982a6 Android support for SafeStack.
Add two new ways of accessing the unsafe stack pointer:

* At a fixed offset from the thread TLS base. This is very similar to
  StackProtector cookies, but we plan to extend it to other backends
  (ARM in particular) soon. Bionic-side implementation here:
  https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
  neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
  not emutls).

llvm-svn: 248357
2015-09-23 01:03:51 +00:00
Stephen Canon
7777a204e2 Don't raise inexact when lowering ceil, floor, round, trunc.
The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions *did* raise inexact, and the llvm lowerings followed that conventions. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior.  This also lets us fold some logic into TD patterns, which is nice.

Differential Revision: http://reviews.llvm.org/D12969

llvm-svn: 248266
2015-09-22 11:43:17 +00:00
NAKAMURA Takumi
dd4e00ed1e Untabify.
llvm-svn: 248264
2015-09-22 11:15:07 +00:00
NAKAMURA Takumi
28276df470 Reformat blank lines.
llvm-svn: 248263
2015-09-22 11:14:39 +00:00
NAKAMURA Takumi
a6e5b048c7 Reformat comment lines.
llvm-svn: 248262
2015-09-22 11:14:12 +00:00
NAKAMURA Takumi
8c11e8767b Reformat.
llvm-svn: 248261
2015-09-22 11:13:55 +00:00
Simon Pilgrim
f4315577b8 [X86][SSE] Match zero/any extension shuffles that don't start from the first element
This patch generalizes the lowering of shuffles as zero extensions to allow extensions that don't start from the first element. It now recognises extensions starting anywhere in the lower 128-bits or at the start of any higher 128-bit lane.

The motivation was to reduce the number of high cost pshufb calls, but it also improves the SSE2 case as well.

Differential Revision: http://reviews.llvm.org/D12561

llvm-svn: 248250
2015-09-22 08:16:08 +00:00
Chad Rosier
432f88b93e [Machine Combiner] Refactor machine reassociation code to be target-independent.
No functional change intended.
Patch by Haicheng Wu <haicheng@codeaurora.org>!

http://reviews.llvm.org/D12887
PR24522

llvm-svn: 248164
2015-09-21 15:09:11 +00:00
Asaf Badouh
8011b4b495 [X86][AVX512] add masked version for RSQRT14 & RCP14 Scalar FP
Differential Revision: http://reviews.llvm.org/D12524

llvm-svn: 248147
2015-09-21 10:23:53 +00:00
Igor Breger
a833017e0d AVX512: Implemented encoding and intrinsics for vcmpss/sd.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12593

llvm-svn: 248121
2015-09-20 15:15:10 +00:00
Asaf Badouh
4ce11a0a36 [X86][AVX512] extend support in Scalar conversion
add scalar FP to Int conversion with truncation intrinsics
add scalar conversion FP32 from/to FP64 intrinsics
add rounding mode and SAE mode encoding for these intrinsics

Differential Revision: http://reviews.llvm.org/D12665

llvm-svn: 248117
2015-09-20 14:31:19 +00:00
Igor Breger
6c78cd17ac AVX512: vsqrtss/sd encoding and intrinsics implementation.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12102

llvm-svn: 248116
2015-09-20 09:13:41 +00:00
Asaf Badouh
981ab82bef [X86][AVX512DQ] Add fpclass instruction
Differential Revision: http://reviews.llvm.org/D12931

llvm-svn: 248115
2015-09-20 08:46:07 +00:00
Michael Kuperstein
516836f0c5 [X86] Fix sitofp and uitofp instruction matching failures with long double and avx512
The operation action for i32 and i64 cannot be set to legal, as long double 
needs custom lowering.

Patch by: mitch.l.bodart@intel.com
Differential Revision: http://reviews.llvm.org/D12372

llvm-svn: 248114
2015-09-20 08:12:17 +00:00
Igor Breger
5be1c75e87 AVX512: Implemented intrinsics for vshuff32x4, vshuff64x2, vshufi64x2, vshufi32x4
Added tests for intrinsics.

Differential Revision: http://reviews.llvm.org/D12525

llvm-svn: 248113
2015-09-20 07:18:53 +00:00
Igor Breger
43e0d98a01 AVX512: Implement instructions encoding, lowering and intrinsics
vinserti64x4, vinserti64x2, vinserti32x8, vinserti32x4, vinsertf64x4, vinsertf64x2, vinsertf32x8, vinsertf32x4
Added tests for encoding, lowering and intrinsics.

Differential Revision: http://reviews.llvm.org/D11893

llvm-svn: 248111
2015-09-20 06:52:42 +00:00
Simon Pilgrim
353a09ac7f [X86][SSE] Vectorize CTTZ + CTTZ_ZERO_UNDEF
Now that we have fast vector CTPOP implementations we can use this to speed up vector CTTZ using the pattern (cttz(x) = ctpop((x & -x) - 1))

Additionally, for AVX512CD that provides lzcnt instructions we can use the pattern (cttz_undef(x) = (width - 1) - ctlz(x & -x))

Differential Revision: http://reviews.llvm.org/D12663

llvm-svn: 248091
2015-09-19 13:22:57 +00:00
Reid Kleckner
5370cf8a53 [WinEH] Make funclet return instrs pseudo instrs
This makes catchret look more like a branch, and less like a weird use
of BlockAddress. It also lets us get away from
llvm.x86.seh.restoreframe, which relies on the old parentfpoffset label
arithmetic.

llvm-svn: 247936
2015-09-17 20:43:47 +00:00
Elena Demikhovsky
e47e32fe51 AVX-512: shufflevector for i1 vectors <2 x i1> .. <64 x i1>
AVX-512 does not provide an instruction that shuffles mask register. So I do the following way:

mask-2-simd , shuffle simd , simd-2-mask

Differential Revision: http://reviews.llvm.org/D12727

llvm-svn: 247876
2015-09-17 06:53:12 +00:00
Eric Christopher
a096f8ec12 constify the Function parameter to the TTI creation callback and
propagate to all callers/users/etc.

llvm-svn: 247864
2015-09-16 23:38:13 +00:00
Reid Kleckner
59dba79fe0 [WinEH] Rip out the landingpad-based C++ EH state numbering code
It never really worked, and the new code is working better every day.

llvm-svn: 247860
2015-09-16 22:14:46 +00:00
Reid Kleckner
88fa7e5bef [WinEH] Pull Adjectives and CatchObj out of the catchpad arg list
Clang now passes the adjectives as an argument to catchpad.

Getting the CatchObj working is simply a matter of threading another
static alloca through codegen, first as an alloca, then as a frame
index, and finally as a frame offset.

llvm-svn: 247844
2015-09-16 20:16:27 +00:00
Reid Kleckner
7d4d6332db [WinEH] Skip state numbering when no EH pads are present
Otherwise we'd try to emit the thunk that passes the LSDA to
__CxxFrameHandler3. We don't emit the LSDA if there were no landingpads,
so we'd end up with an assembler error when trying to write the COFF
object.

llvm-svn: 247820
2015-09-16 17:19:44 +00:00
Sanjay Patel
a84e4eb51a propagate fast-math-flags on DAG nodes
After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, 
so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: 
if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is
one test case in this patch to prove that point.

This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I 
did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF
( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes.

This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as
FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the
current global settings.

Differential Revision: http://reviews.llvm.org/D12095

llvm-svn: 247815
2015-09-16 16:31:21 +00:00
Michael Kuperstein
55916c8b60 [X86] Do not generate 64-bit pops of 32-bit GPRs.
When trying emit a stack adjustments using pops, frame lowering selects an
arbitrary free GPR. It should always select one from an appropriate class...
This fixes PR24649.

Patch by: amjad.aboud@intel.com
Differential Revision: http://reviews.llvm.org/D12609

llvm-svn: 247785
2015-09-16 11:27:20 +00:00
Michael Kuperstein
919e2954fc [X86] Fix emitEpilogue() to make less assumptions about pops
This is the mirror image of r242395.
When X86FrameLowering::emitEpilogue() looks for where to insert the %esp addition that
deallocates stack space used for local allocations, it assumes that any sequence of pop
instructions from function exit backwards consists purely of restoring callee-save registers.

This may be false, since from some point backward, the pops may be clean-up of stack space
allocated for arguments to a call.

Patch by: amjad.aboud@intel.com
Differential Revision: http://reviews.llvm.org/D12688

llvm-svn: 247784
2015-09-16 11:18:25 +00:00
Daniel Sanders
a6be0437bb Revert r247692: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Eric has replied and has demanded the patch be reverted.

llvm-svn: 247702
2015-09-15 16:17:27 +00:00
Daniel Sanders
2df05b7d7d Re-commit r247683: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).

For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.

This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.

This commit also contains a trivial patch to clang to account for the C++ API
change. Thanks go to Pavel Labath for fixing LLDB for me.

Reviewers: rengolin

Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10969

llvm-svn: 247692
2015-09-15 14:08:28 +00:00
Daniel Sanders
e42ee9384a Revert r247684 - Replace Triple with a new TargetTuple ...
LLDB needs to be updated in the same commit.

llvm-svn: 247686
2015-09-15 13:46:21 +00:00
Daniel Sanders
f247476e1e Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).

For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.

This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.

This commit also contains a trivial patch to clang to account for the C++ API
change.

Reviewers: rengolin

Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10969

llvm-svn: 247683
2015-09-15 13:17:40 +00:00
Daniel Sanders
68efbaf9b2 Fix namespace indentation and missing blank lines before 'public:' in *MCAsmInfo.h. NFC.
This is to reduce noise in a following commit.

Also fixes a couple missing spaces before the reference operator.

llvm-svn: 247679
2015-09-15 12:27:06 +00:00
Simon Pilgrim
0b2a131879 [X86][MMX] Added shuffle decodes for MMX/3DNow! shuffles.
Added shuffle decodes for MMX PUNPCK + PSHUFW shuffles.
Added shuffle decodes for 3DNow! PSWAPD shuffles.

llvm-svn: 247526
2015-09-13 11:28:45 +00:00
Elena Demikhovsky
07a6c5ce5d AVX-512: Fixed a bug in OR/XOR operations for 512-bit FP values on KNL.
KNL does not have VXORPS, VORPS for 512-bit values.
I use integer VPXOR, VPOR that actually do the same.

X86ISD::FXOR/FOR are generated as a result of FSUB combining.

Differential Revision: http://reviews.llvm.org/D12753

llvm-svn: 247523
2015-09-13 08:15:15 +00:00
Sanjay Patel
21adf9726f [x86] enable machine combiner reassociations for 128-bit vector logical integer insts (2nd try)
The changes in:
test/CodeGen/X86/machine-cp.ll
are just due to scheduling differences after some logic instructions were reassociated.

llvm-svn: 247516
2015-09-12 19:47:50 +00:00
Simon Pilgrim
17901f3024 [X86] Renamed lowerVectorShuffleAsUnpack NFCI.
Renamed to lowerVectorShuffleAsPermuteAndUnpack to make it clear that it lowers to more than just a UNPCK instruction.

llvm-svn: 247513
2015-09-12 18:26:47 +00:00
Simon Pilgrim
56e6eab0db [X86] Moved lowerVectorShuffleWithUNPCK earlier to make reuse easier. NFCI.
llvm-svn: 247511
2015-09-12 16:03:06 +00:00
Sanjay Patel
9965d5f476 revert r247506; need to verify changes in existing tests
llvm-svn: 247507
2015-09-12 15:27:31 +00:00
Sanjay Patel
8f3e510d08 [x86] enable machine combiner reassociations for 128-bit vector logical integer insts
llvm-svn: 247506
2015-09-12 14:58:04 +00:00
Akira Hatanaka
a71c14303e Use function attribute "stackrealign" to decide whether stack
realignment should be forced.

With this commit, we can now force stack realignment when doing LTO and
do so on a per-function basis. Also, add a new cl::opt option
"stackrealign" to CommandFlags.h which is used to force stack
realignment via llc's command line.

Out-of-tree projects currently using -force-align-stack to force stack
realignment should make changes to attach the attribute to the functions
in the IR.

Differential Revision: http://reviews.llvm.org/D11814

llvm-svn: 247450
2015-09-11 18:54:38 +00:00
Ahmed Bougacha
3d9f4e715f [CodeGen] Refactor TLI/AtomicExpand interface to make LLSC explicit.
We used to have this magic "hasLoadLinkedStoreConditional()" callback,
which really meant two things:
- expand cmpxchg (to ll/sc).
- expand atomic loads using ll/sc (rather than cmpxchg).

Remove it, and, instead, introduce explicit callbacks:
- bool shouldExpandAtomicCmpXchgInIR(inst)
- AtomicExpansionKind shouldExpandAtomicLoadInIR(inst)

Differential Revision: http://reviews.llvm.org/D12557

llvm-svn: 247429
2015-09-11 17:08:28 +00:00
Ahmed Bougacha
aa999c6622 [CodeGen] Rename AtomicRMWExpansionKind to AtomicExpansionKind.
This lets us generalize its usage to the other atomic instructions.

llvm-svn: 247428
2015-09-11 17:08:17 +00:00
Reid Kleckner
22d50caa1c [WinEH] Push and pop EBP for 32-bit funclets
The Win32 EH runtime caller does not preserve EBP, even though it does
preserve the CSRs (EBX, ESI, EDI) for us. The result was that each
finally funclet call would leave the frame pointer off by 12 bytes.

llvm-svn: 247348
2015-09-10 22:00:02 +00:00