1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-01 08:23:21 +01:00
Commit Graph

3381 Commits

Author SHA1 Message Date
Eli Bendersky
4afdeeb682 Replace all instances of dg.exp file with lit.local.cfg, since all tests are run with LIT now and now Dejagnu. dg.exp is no longer needed.
Patch reviewed by Daniel Dunbar. It will be followed by additional cleanup patches.

llvm-svn: 150664
2012-02-16 06:28:33 +00:00
Bill Wendling
0ea63367ec Add a test for generating Objective-C metadata from module flags.
llvm-svn: 150635
2012-02-15 23:43:37 +00:00
Pete Cooper
21409dd760 Stop custom lowering forr x86 DEC64m from happening if the load in the lowered sequence has more than 1 user
llvm-svn: 150537
2012-02-15 00:33:37 +00:00
Nadav Rotem
5da800572a Fix PR12000. Some vector operations may use scalar operands with types
that are greater than the vector element type. For example BUILD_VECTOR
of type <1 x i1> with a constant i8 operand.
This patch fixes the assertion.

llvm-svn: 150477
2012-02-14 13:06:32 +00:00
Nadav Rotem
2141a8413e Fix a bug in DAGCombine for the optimization of BUILD_VECTOR. We cant generate a shuffle node from two vectors of different types.
llvm-svn: 150383
2012-02-13 12:42:26 +00:00
Craig Topper
1487726cdf Revert accidental commit of a pruned testcase from r150360.
llvm-svn: 150361
2012-02-13 04:33:33 +00:00
Craig Topper
250c8fb194 Update CanXFormVExtractWithShuffleIntoLoad to ensure bitcasts of loads only have one use. Matches DAGCombiner and prevents vector_shuffles from reaching isel.
llvm-svn: 150360
2012-02-13 04:30:38 +00:00
Pete Cooper
b1229a8866 Fixed bug when custom lowering DEC64m on x86.
If the DEC node had more than one user, it was doing this lowering but
leaving the original DEC node around and so decrementing twice.

Fixes PR11964.

llvm-svn: 150356
2012-02-13 00:10:03 +00:00
Nadav Rotem
ea4aecb3e5 This patch addresses the problem of poor code generation for the zext
v8i8 -> v8i32 on AVX machines. The codegen often scalarizes ANY_EXTEND nodes.
The DAGCombiner has two optimizations that can mitigate the problem. First,
if all of the operands of a BUILD_VECTOR node are extracted from an ZEXT/ANYEXT
nodes, then it is possible to create a new simplified BUILD_VECTOR which uses
UNDEFS/ZERO values to eliminate the scalar ZEXT/ANYEXT nodes.
Second, another dag combine optimization lowers BUILD_VECTOR into a shuffle
vector instruction.

In the case of zext v8i8->v8i32 on AVX, a value in an XMM register is to be
shuffled into a wide YMM register.

This patch modifes the second optimization and allows the creation of
shuffle vectors even when the newly generated vector and the original vector
from which we extract the values are of different types.

llvm-svn: 150340
2012-02-12 15:05:31 +00:00
Anton Korobeynikov
5996573d4b Add support for implicit TLS model used with MS VC runtime.
Patch by Kai Nacke!

llvm-svn: 150307
2012-02-11 17:26:53 +00:00
Andrew Trick
c3cc8fa604 RegAlloc superpass: includes phi elimination, coalescing, and scheduling.
Creates a configurable regalloc pipeline.

Ensure specific llc options do what they say and nothing more: -reglloc=... has no effect other than selecting the allocator pass itself. This patch introduces a new umbrella flag, "-optimize-regalloc", to enable/disable the optimizing regalloc "superpass". This allows for example testing coalscing and scheduling under -O0 or vice-versa.

When a CodeGen pass requires the MachineFunction to have a particular property, we need to explicitly define that property so it can be directly queried rather than naming a specific Pass. For example, to check for SSA, use MRI->isSSA, not addRequired<PHIElimination>.

CodeGen transformation passes are never "required" as an analysis

ProcessImplicitDefs does not require LiveVariables.

We have a plan to massively simplify some of the early passes within the regalloc superpass.

llvm-svn: 150226
2012-02-10 04:10:36 +00:00
NAKAMURA Takumi
81f7ad5b9b test/CodeGen/X86/atom-lea-sp.ll: Add explicit -mtriple=i686-linux.
llvm-svn: 150151
2012-02-09 05:12:58 +00:00
Evan Cheng
1be96ff50e Commit Andy Zhang's test for the lea patch.
llvm-svn: 150107
2012-02-08 22:33:17 +00:00
Elena Demikhovsky
87a6e08d3a Fixed a bug in printing "cmp" pseudo ops.
> This IR code
> %res = call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a0, <8 x float> %a1, i8 14)
> fails with assertion:
>
> llc: X86ATTInstPrinter.cpp:62: void llvm::X86ATTInstPrinter::printSSECC(const llvm::MCInst*, unsigned int, llvm::raw_ostream&): Assertion `0 && "Invalid ssecc argument!"' failed.
> 0  llc             0x0000000001355803
> 1  llc             0x0000000001355dc9
> 2  libpthread.so.0 0x00007f79a30575d0
> 3  libc.so.6       0x00007f79a23a1945 gsignal + 53
> 4  libc.so.6       0x00007f79a23a2f21 abort + 385
> 5  libc.so.6       0x00007f79a239a810 __assert_fail + 240
> 6  llc             0x00000000011858d5 llvm::X86ATTInstPrinter::printSSECC(llvm::MCInst const*, unsigned int, llvm::raw_ostream&) + 119

I added the full testing for all possible pseudo-ops of cmp.
I extended X86AsmPrinter.cpp and X86IntelInstPrinter.cpp.

You'l also see lines alignments (unrelated to this fix) in X86IselLowering.cpp from my previous check-in.

llvm-svn: 150068
2012-02-08 08:37:26 +00:00
Craig Topper
a8a69356e1 Add instruction selection for 256-bit VPSHUFD and 128-bit VPERMILPS/VPERMILPD.
llvm-svn: 149968
2012-02-07 06:28:42 +00:00
Benjamin Kramer
8e54f21216 Testing vector code without sse doesn't make much sense.
Should bring arm and ppc testers back to life (they default to -mcpu=generic)

llvm-svn: 149821
2012-02-05 11:19:39 +00:00
Chris Lattner
4881c9ecb0 Add a test for the miscompilation my recent ConstantDataArray patches introduced, to make sure
we don't regress on it in the future.

llvm-svn: 149803
2012-02-05 02:37:36 +00:00
Craig Topper
c289726019 Remove most of the intrinsics for XOP VPCMOV instruction. They all aliased to the same instruction with different types. This would be better accomplished with casts in the not yet created xopintrin.h header file.
llvm-svn: 149795
2012-02-05 00:55:56 +00:00
Nadav Rotem
5c5681cf27 The type-legalizer often scalarizes code. One of the common patterns is extract-and-truncate.
In this patch we optimize this pattern and convert the sequence into extract op of a narrow type.
This allows the BUILD_VECTOR dag optimizations to construct efficient shuffle operations in many cases.

llvm-svn: 149692
2012-02-03 13:18:25 +00:00
Matt Beaumont-Gay
8b5dfe05f5 Unix line endings
llvm-svn: 149615
2012-02-02 19:00:49 +00:00
Elena Demikhovsky
7ca11b6e3f Optimization for SIGN_EXTEND operation on AVX.
Special handling was added for v4i32 -> v4i64 and v8i16 -> v8i32
extensions.

llvm-svn: 149600
2012-02-02 09:10:43 +00:00
Lang Hames
004f627ed6 Set EFLAGS correctly in EmitLoweredSelect on X86.
llvm-svn: 149597
2012-02-02 07:48:37 +00:00
Andrew Trick
d09b64fc25 Instruction scheduling itinerary for Intel Atom.
Adds an instruction itinerary to all x86 instructions, giving each a default latency of 1, using the InstrItinClass IIC_DEFAULT.

Sets specific latencies for Atom for the instructions in files X86InstrCMovSetCC.td, X86InstrArithmetic.td, X86InstrControl.td, and X86InstrShiftRotate.td. The Atom latencies for the remainder of the x86 instructions will be set in subsequent patches.

Adds a test to verify that the scheduler is working.

Also changes the scheduling preference to "Hybrid" for i386 Atom, while leaving x86_64 as ILP.

Patch by Preston Gurd!

llvm-svn: 149558
2012-02-01 23:20:51 +00:00
Mon P Wang
7313ffe333 Avoid creating an extract element to an illegal type after LegalizeTypes has run.
llvm-svn: 149548
2012-02-01 22:15:20 +00:00
NAKAMURA Takumi
0bb21fdfce test/CodeGen/X86/avx-minmax.ll: Relax expressions for Win32 targets. YMM arguments are passed as indirect on Win32 x64.
llvm-svn: 149505
2012-02-01 14:35:29 +00:00
Elena Demikhovsky
455db87d41 Passing AVX 256-bit structures in Win64 was wrong.
Fixed Win64 calling conventions.

llvm-svn: 149494
2012-02-01 10:46:14 +00:00
Elena Demikhovsky
da37eb48d8 Optimization for "truncate" operation on AVX.
Truncating v4i64 -> v4i32 and v8i32 -> v8i16 may be done with set of shuffles.

llvm-svn: 149485
2012-02-01 07:56:44 +00:00
Craig Topper
2b764de6ab Remove pcmpgt/pcmpeq intrinsics as clang is not using them.
llvm-svn: 149367
2012-01-31 06:52:44 +00:00
Bill Wendling
7761976036 Remove all references to the old EH.
There was always the current EH. -- Ministry of Truth

llvm-svn: 149335
2012-01-31 02:09:07 +00:00
Chandler Carruth
865317627e Chris's constant data sequence refactoring actually enabled printing
vectors of all one bits to be printed more cleverly in the AsmPrinter.
Unfortunately, the byte value for all one bits is the same with
-fsigned-char as the error return of '-1'. Force this to be the unsigned
byte value when returning it to avoid this problem, and update the test
case for the shiny new behavior.

Yay for building LLVM and Clang with -funsigned-char.

Chris, please review, and let me know if there is any reason to not
desire this change. It seems good on the surface, and certainly intended
based on the code written.

llvm-svn: 149299
2012-01-30 23:47:44 +00:00
Craig Topper
9a8c6c1633 Fix pattern for memory form of PSHUFD for use with FP vectors to remove bitcast to an integer vector that normal code wouldn't have. Also remove bitcasts from code that turns splat vector loads into a shuffle as it was making the broken pattern necessary.
llvm-svn: 149232
2012-01-30 07:50:31 +00:00
Rafael Espindola
7bddde2b49 Add r149110 back with a fix for when the vector and the int have the same
width.

llvm-svn: 149151
2012-01-27 23:33:07 +00:00
Rafael Espindola
7800e62486 Revert r149110 and add a testcase that was crashing since that revision.
Unfortunately I also had to disable constant-pool-sharing.ll the code it tests has been
updated to use the IL logic.

llvm-svn: 149148
2012-01-27 22:42:48 +00:00
Matt Beaumont-Gay
f1e0eb546a Unix line endings
llvm-svn: 149115
2012-01-27 02:31:29 +00:00
Jakob Stoklund Olesen
c63a45ebe6 Handle call-clobbered ymm registers on Win64.
The Win64 calling convention has xmm6-15 as callee-saved while still
clobbering all ymm registers.

Add a YMM_HI_6_15 pseudo-register that aliases the clobbered part of the
ymm registers, and mark that as call-clobbered.  This allows live xmm
registers across calls.

This hack wouldn't be necessary with RegisterMask operands representing
the call clobbers, but they are not quite operational yet.

llvm-svn: 149088
2012-01-26 22:59:28 +00:00
Victor Umansky
bf35274368 Fix for the following bug in AVX codegen for double-to-int conversions:
.	"fptosi" and "fptoui" IR instructions are defined with round-to-zero rounding mode.
.	Currently for AVX mode for <4xdouble> and <8xdouble>  the "VCVTPD2DQ.128" and "VCVTPD2DQ.256" instructions are selected (for .fp_to_sint. DAG node operation ) by AVX codegen. However they use round-to-nearest-even rounding mode.
.	Consequently, the conversion produces incorrect numbers.
 
The fix is to replace selection of VCVTPD2DQ instructions with VCVTTPD2DQ instructions. The latter use truncate (i.e. round-to-zero) rounding mode. 
As .fp_to_sint. DAG node operation is used only for lowering of  "fptosi" and "fptoui" IR instructions, the fix in X86InstrSSE.td definition file doesn.t have an impact on other LLVM flows.
 
The patch includes changes in the .td file, LIT test for the changes and a fix in a legacy LIT test (which produced asm code conflicting with LLVN IR spec). 

llvm-svn: 149056
2012-01-26 08:51:39 +00:00
Anton Korobeynikov
682b2821ce Properly emit ctors / dtors with priorities into desired sections
and let linker handle the rest.

This finally fixes PR5329

llvm-svn: 148990
2012-01-25 22:24:19 +00:00
Elena Demikhovsky
ee8c87b433 ZERO_EXTEND operation is optimized for AVX.
v8i16 -> v8i32, v4i32 -> v4i64 - used vpunpck* instructions.

llvm-svn: 148803
2012-01-24 13:54:13 +00:00
Craig Topper
e2d3f3060d Add support for selecting 256-bit PALIGNR.
llvm-svn: 148532
2012-01-20 05:53:00 +00:00
Eli Friedman
d13443515e Remove a low-quality test which was failing on Windows; test/CodeGen/X86/sret.ll is a better test for the relevant behavior.
llvm-svn: 148526
2012-01-20 02:06:40 +00:00
Eli Friedman
ffa3b8b5f1 Support MSVC x86-32 sret convention. PR11688. Patch by Joe Groff.
llvm-svn: 148513
2012-01-20 00:05:46 +00:00
Nick Lewycky
9888a01041 Space after punctuation.
llvm-svn: 148451
2012-01-19 01:13:47 +00:00
Nick Lewycky
c1e7e2eaf6 Add a TargetOption for disabling tail calls.
llvm-svn: 148442
2012-01-19 00:34:10 +00:00
Nadav Rotem
afc446fbcb Fix a bug in the type-legalization of vector integers. When we bitcast one vector type to another, we must not bitcast the result if one type is widened while the other is promoted.
llvm-svn: 148383
2012-01-18 08:33:18 +00:00
Nadav Rotem
c23e698b5c Transform: (EXTRACT_VECTOR_ELT( VECTOR_SHUFFLE )) -> EXTRACT_VECTOR_ELT.
llvm-svn: 148337
2012-01-17 21:44:01 +00:00
Nadav Rotem
c460c870b2 Fix 11769.
In CanXFormVExtractWithShuffleIntoLoad we assumed that EXTRACT_VECTOR_ELT can be later handled by the DAGCombiner.
However, in some cases on AVX, the EXTRACT_VECTOR_ELT is legalized to EXTRACT_SUBVECTOR + EXTRACT_VECTOR_ELT, which
currently is not handled by the DAGCombiner. In this patch I added a check that we only extract from the XMM part.

llvm-svn: 148298
2012-01-17 09:13:19 +00:00
Eli Friedman
a343d87eac Make sure the non-SSE lowering for fences correctly clobbers EFLAGS. PR11768.
llvm-svn: 148240
2012-01-16 16:42:21 +00:00
Nadav Rotem
b36c029e3b [AVX] Optimize x86 VSELECT instructions using SimplifyDemandedBits.
We know that the blend instructions only use the MSB, so if the mask is
sign-extended then we can convert it into a SHL instruction. This is a
common pattern because the type-legalizer sign-extends the i1 type which
is used by the LLVM-IR for the condition.

Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL.

llvm-svn: 148225
2012-01-15 19:27:55 +00:00
Chandler Carruth
7ee4fa971b Relax the FileCheck assertion a bit -- all we really care about is that
we're loading from the global array, not how it is spelled in the asm.
This should fix the MSVC bots.

llvm-svn: 148214
2012-01-15 09:38:59 +00:00
Chandler Carruth
e57b0f2a8b FileCheck-ize a test, make it more specific to directly test the shift
removal desired.

llvm-svn: 148213
2012-01-15 09:32:57 +00:00
Chad Rosier
b4f2ccc1e0 Cleanup test case by adding checks for test names.
llvm-svn: 148166
2012-01-14 01:46:51 +00:00
Rafael Espindola
551f422d34 Add a test showing how the Leh_func_endN symbol is used.
llvm-svn: 148161
2012-01-14 00:12:59 +00:00
Craig Topper
71ea42cc29 Add patterns for v16i16 and v32i8 immAllZerosV to select VPXOR to match v4i64 and v8i32.
llvm-svn: 148106
2012-01-13 06:59:47 +00:00
Elena Demikhovsky
beb66de0f9 Fixed a bug in LowerVECTOR_SHUFFLE caused assertion failure
lc: X86ISelLowering.cpp:6480: llvm::SDValue llvm::X86TargetLowering::LowerVECTOR_SHUFFLE(llvm::SDValue, llvm::SelectionDAG&) const: Assertion `V1.getOpcode() != ISD::UNDEF&&  "Op 1 of shuffle should not be undef"' failed.
Added a test.

llvm-svn: 148044
2012-01-12 20:33:10 +00:00
Rafael Espindola
d079866e36 Add error-reporting tests for platforms that don't support segmented stacks.
Patch by Brian Anderson.

llvm-svn: 148042
2012-01-12 20:26:13 +00:00
Rafael Espindola
959adf57db Support segmented stacks on 64-bit FreeBSD.
This patch uses tcb_spare field in the tcb structure to store info.
Patch by Jyun-Yan You.

llvm-svn: 148041
2012-01-12 20:24:30 +00:00
Rafael Espindola
dda46f4081 Support segmented stacks on win32.
Uses the pvArbitrary slot of the TIB, which is reserved for applications. We
only support frames with a static size.

llvm-svn: 148040
2012-01-12 20:22:08 +00:00
Nadav Rotem
618722de09 Fix a bug in the AVX 256-bit shuffle code in cases where the splat element is on the boundary of two 128-bit vectors.
The attached testcase was stuck in an endless loop.

llvm-svn: 148027
2012-01-12 15:31:55 +00:00
Benjamin Kramer
ae4ad5f924 X86: Generalize the x << (y & const) optimization to also catch masks with more set bits set than 31 or 63.
llvm-svn: 148024
2012-01-12 12:41:34 +00:00
Nadav Rotem
18cdee3bee On AVX, we can load v8i32 at a time. The bug happens when two uneven loads are used.
When we load the v12i32 type, the GenWidenVectorLoads method generates two loads: v8i32 and v4i32 
and attempts to use CONCAT_VECTORS to join them. In this fix I concat undef values to widen 
the smaller value. The test "widen_load-2.ll" also exposes this bug on AVX.

llvm-svn: 147964
2012-01-11 20:19:17 +00:00
Bill Wendling
cab765a3d5 Check to make sure that the CFString's back store ends up in the correct section.
llvm-svn: 147961
2012-01-11 19:33:37 +00:00
Rafael Espindola
cd3a456eef Support segmented stacks on mac.
This uses TLS slot 90, which actually belongs to JavaScriptCore. We only support
frames with static size
Patch by Brian Anderson.

llvm-svn: 147960
2012-01-11 19:00:37 +00:00
Rafael Espindola
431029d375 Split segmented stacks tests into tests for static- and dynamic-size frames.
Patch by Brian Anderson.

llvm-svn: 147959
2012-01-11 18:51:03 +00:00
Rafael Espindola
f7e3528d92 Generate the segmented stack prologue for fastcc too.
Patch by Brian Anderson.

llvm-svn: 147958
2012-01-11 18:41:19 +00:00
Chandler Carruth
e3facd7860 Revert r147945 which disabled an addressing mode transformation. I had
hoped this would revive one of the llvm-gcc selfhost build bots, but it
didn't so it doesn't appear that my transform is the culprit.

If anyone else is seeing failures, please let me know!

llvm-svn: 147957
2012-01-11 18:36:12 +00:00
Rafael Espindola
12313ca273 Use unsigned comparison in segmented stack prologue.
This is a comparison of two addresses, and GCC does the comparison unsigned.

Patch by Brian Anderson.

llvm-svn: 147954
2012-01-11 18:23:35 +00:00
Rafael Espindola
dd2f5fe210 Explicitly set the scale to 1 on some segstack prologue instrs.
Patch by Brian Anderson.

llvm-svn: 147952
2012-01-11 18:14:03 +00:00
Jan Sjödin
cb98434ebe Add XOP Intrinsics and tests
llvm-svn: 147949
2012-01-11 15:20:20 +00:00
Nadav Rotem
01bf090299 Fix a bug in the lowering of BUILD_VECTOR for AVX. SCALAR_TO_VECTOR does not zero untouched elements. Use INSERT_VECTOR_ELT instead.
llvm-svn: 147948
2012-01-11 14:07:51 +00:00
Chandler Carruth
e443e5e55d Disable the transformation I added in r147936 to see if it fixes some
strange build bot failures that look like a miscompile into an infloop.
I'll investigate this tomorrow, but I'd both like to know whether my
patch is the culprit, and get the bots back to green.

llvm-svn: 147945
2012-01-11 12:17:47 +00:00
Jakob Stoklund Olesen
9baf09c11a Fix undefined code and reenable test case.
I don't think the compact encoding code is right, but at least is has
defined behavior now.

llvm-svn: 147938
2012-01-11 09:08:04 +00:00
Chandler Carruth
b3371fa250 Teach the X86 instruction selection to do some heroic transforms to
detect a pattern which can be implemented with a small 'shl' embedded in
the addressing mode scale. This happens in real code as follows:

  unsigned x = my_accelerator_table[input >> 11];

Here we have some lookup table that we look into using the high bits of
'input'. Each entity in the table is 4-bytes, which means this
implicitly gets turned into (once lowered out of a GEP):

  *(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));

The shift right followed by a shift left is canonicalized to a smaller
shift right and masking off the low bits. That hides the shift right
which x86 has an addressing mode designed to support. We now detect
masks of this form, and produce the longer shift right followed by the
proper addressing mode. In addition to saving a (rather large)
instruction, this also reduces stalls in Intel chips on benchmarks I've
measured.

In order for all of this to work, one part of the DAG needs to be
canonicalized *still further* than it currently is. This involves
removing pointless 'trunc' nodes between a zextload and a zext. Without
that, we end up generating spurious masks and hiding the pattern.

llvm-svn: 147936
2012-01-11 08:41:08 +00:00
NAKAMURA Takumi
71c80e7fe7 llvm/test/CodeGen/X86/zext-fold.ll: Relax an expression in stack offset.
llvm-svn: 147928
2012-01-11 07:34:22 +00:00
NAKAMURA Takumi
83fb05f0c7 llvm/test/CodeGen/X86/sub-with-overflow.ll: Add explicit -mtriple=i686-linux.
llvm-svn: 147927
2012-01-11 07:34:14 +00:00
Jakob Stoklund Olesen
77f594c37b Disable test that seems to expose an unrelated Linux issue.
llvm-svn: 147921
2012-01-11 03:42:27 +00:00
Jakob Stoklund Olesen
37e4396a06 Detect when a value is undefined on an edge to a landing pad.
Consider this code:

int h() {
  int x;
  try {
    x = f();
    g();
  } catch (...) {
    return x+1;
  }
  return x;
}

The variable x is undefined on the first edge to the landing pad, but it
has the f() return value on the second edge to the landing pad.

SplitAnalysis::getLastSplitPoint() would assume that the return value
from f() was live into the landing pad when f() throws, which is of
course impossible.

Detect these cases, and treat them as if the landing pad wasn't there.
This allows spill code to be inserted after the function call to f().

<rdar://problem/10664933>

llvm-svn: 147912
2012-01-11 02:07:05 +00:00
Chad Rosier
830f1909d8 Add test case for r147881.
llvm-svn: 147891
2012-01-10 23:09:53 +00:00
Joerg Sonnenberger
fccad68f65 Default stack alignment for 32bit x86 should be 4 Bytes, not 8 Bytes.
Add a test that checks the stack alignment of a simple function for
Darwin, Linux and NetBSD for 32bit and 64bit mode.

llvm-svn: 147888
2012-01-10 22:43:53 +00:00
Nadav Rotem
969b8a6903 Fix a bug in the legalization of shuffle vectors. When we emulate shuffles using BUILD_VECTORS we may be using a BV of different type. Make sure to cast it back.
llvm-svn: 147851
2012-01-10 14:28:46 +00:00
Craig Topper
01eba20904 Fix a crash in AVX2 when trying to broadcast a double into a 128-bit vector. There is no vbroadcastsd xmm, but we do need to support 64-bit integers broadcasted into xmm. Also factor the AVX check into the isVectorBroadcast function. This makes more sense since the AVX2 check was already inside.
llvm-svn: 147844
2012-01-10 08:23:59 +00:00
Evan Cheng
7855c5d08f Allow machine-cse to look across MBB boundary when cse'ing instructions that
define physical registers. It's currently very restrictive, only catching
cases where the CE is in an immediate (and only) predecessor. But it catches
a surprising large number of cases.

rdar://10660865

llvm-svn: 147827
2012-01-10 02:02:58 +00:00
Chandler Carruth
9a418a2713 Cleanup and FileCheck-ize a test.
llvm-svn: 147772
2012-01-09 09:44:26 +00:00
Craig Topper
ee2dabebe3 Clean up patterns for MOVNT*. Not sure why there were floating point types on MOVNTPS and MOVNTDQ. And v4i64 was completely missing.
llvm-svn: 147767
2012-01-09 06:52:46 +00:00
Rafael Espindola
7618aa1c64 Don't print an unused label before .cfi_endproc.
llvm-svn: 147763
2012-01-09 00:17:29 +00:00
Craig Topper
1fdcf7071d Don't disable MMX support when AVX is enabled. Fix predicates for MMX instructions that were added along with SSE instructions to check for AVX in addition to SSE level.
llvm-svn: 147762
2012-01-09 00:11:29 +00:00
Victor Umansky
5d24f5f51a Reverted commit #147601 upon Evan's request.
llvm-svn: 147748
2012-01-08 17:20:33 +00:00
Rafael Espindola
19a13321f8 Don't print a label before .cfi_startproc when we don't need to. This makes
the produce assembly when using CFI just a bit more readable.

llvm-svn: 147743
2012-01-07 22:42:19 +00:00
Evan Cheng
8af07ba749 Added a late machine instruction copy propagation pass. This catches
opportunities that only present themselves after late optimizations
such as tail duplication .e.g.
## BB#1:
        movl    %eax, %ecx
        movl    %ecx, %eax
        ret

The register allocator also leaves some of them around (due to false
dep between copies from phi-elimination, etc.)

This required some changes in codegen passes. Post-ra scheduler and the
pseudo-instruction expansion passes have been moved after branch folding
and tail merging. They were before branch folding before because it did
not always update block livein's. That's fixed now. The pass change makes
independently since we want to properly schedule instructions after
branch folding / tail duplication.

rdar://10428165
rdar://10640363

llvm-svn: 147716
2012-01-07 03:02:36 +00:00
Eric Christopher
09a41e8939 Make the 'x' constraint work for AVX registers as well.
Fixes rdar://10614894

llvm-svn: 147704
2012-01-07 01:02:09 +00:00
Chandler Carruth
90f124f420 Prevent a DAGCombine from firing where there are two uses of
a combined-away node and the result of the combine isn't substantially
smaller than the input, it's just canonicalized. This is the first part
of a significant (7%) performance gain for Snappy's hot decompression
loop.

llvm-svn: 147604
2012-01-05 11:05:55 +00:00
Chandler Carruth
75ff869d6a Cleanup and FileCheck-ize a test.
llvm-svn: 147603
2012-01-05 11:05:47 +00:00
Victor Umansky
87d5ada510 Peephole optimization of ptest-conditioned branch in X86 arch. Performs instruction combining of sequences generated by ptestz/ptestc intrinsics to ptest+jcc pair for SSE and AVX.
Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX)

Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov
llvm-svn: 147601
2012-01-05 08:46:19 +00:00
Benjamin Kramer
e5589bccdd FileCheck hygiene.
llvm-svn: 147580
2012-01-05 00:43:34 +00:00
NAKAMURA Takumi
6ebbc05c9d test/CodeGen/X86/jump_sign.ll: Add -mcpu=pentiumpro for non-x86 hosts. It uses "cmov".
llvm-svn: 147521
2012-01-04 03:52:23 +00:00
Evan Cheng
caba5d2fc2 For x86, canonicalize max
(x > y) ? x : y
=>
(x >= y) ? x : y

So for something like
(x - y) > 0 : (x - y) ? 0
It will be
(x - y) >= 0 : (x - y) ? 0

This makes is possible to test sign-bit and eliminate a comparison against
zero. e.g.
subl   %esi, %edi
testl  %edi, %edi
movl   $0, %eax
cmovgl %edi, %eax
=>
xorl   %eax, %eax
subl   %esi, $edi
cmovsl %eax, %edi

rdar://10633221

llvm-svn: 147512
2012-01-04 01:41:39 +00:00
Nadav Rotem
79f1692fe0 Revert 147426 because it caused pr11696.
llvm-svn: 147485
2012-01-03 22:19:42 +00:00
Nadav Rotem
cc49c4d74d Fix incorrect widening of the bitcast sdnode in case the incoming operand is integer-promoted.
llvm-svn: 147484
2012-01-03 22:12:28 +00:00
Chad Rosier
afcaa8f38a Enhance DAGCombine for transforming 128->256 casts into a vmovaps, rather
then a vxorps + vinsertf128 pair if the original vector came from a load.
rdar://10594409

llvm-svn: 147481
2012-01-03 21:05:52 +00:00
Elena Demikhovsky
40f2c3077f Fixed a bug in SelectionDAG.cpp.
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.

The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.

I added a special test that checks vector shuffle on win32.

llvm-svn: 147445
2012-01-03 11:59:04 +00:00
Nadav Rotem
6929a8868b Optimize the sequence blend(sign_extend(x)) to blend(shl(x)) since SSE blend instructions only look at the highest bit.
llvm-svn: 147426
2012-01-02 08:05:46 +00:00
Craig Topper
f7c9bf17dd Allow CRC32 instructions to be selected when AVX is enabled.
llvm-svn: 147411
2012-01-01 19:51:58 +00:00
Craig Topper
d8ae2d9f27 Fix sfence, lfence, mfence, and clflush to be able to be selected when AVX is enabled. Fix monitor and mwait to require SSE3 or AVX, previously they worked even if SSE3 was disabled. Make prefetch instructions not set the execution domain since they don't use XMM registers.
llvm-svn: 147409
2012-01-01 19:40:22 +00:00
Rafael Espindola
1cb17796db Revert 147399. It broke CodeGen/ARM/vext.ll.
llvm-svn: 147400
2012-01-01 17:36:23 +00:00
Elena Demikhovsky
9b74049783 Fixed a bug in SelectionDAG.cpp.
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.

The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.

I added a special test that checks vector shuffle on win32.

llvm-svn: 147399
2012-01-01 16:22:47 +00:00
Craig Topper
0311c45aed Add patterns for integer forms of SHUFPD/VSHUFPD with a memory load.
llvm-svn: 147393
2011-12-31 23:24:49 +00:00
Craig Topper
c01ce759d7 Fix typo in a SHUFPD and VSHUFPD pattern that prevented SHUFPD/VSHUFPD with a load from being selected.
llvm-svn: 147392
2011-12-31 23:15:11 +00:00
Craig Topper
33091db89a Change FMA4 memory forms to use memopv* instead of alignedloadv*. No need to force alignment on these instructions. Add a couple testcases for memory forms.
llvm-svn: 147361
2011-12-30 02:18:36 +00:00
Craig Topper
e066262284 Fix load size for FMA4 SS/SD instructions. They need to use f32 and f64 size, but with the special handling to be compatible with the intrinsic expecting a vector. Similar handling is already used elsewhere.
llvm-svn: 147360
2011-12-30 01:49:53 +00:00
Eli Friedman
db54b4b68f Fix type-checking for load transformation which is not legal on floating-point types. PR11674.
llvm-svn: 147323
2011-12-28 21:24:44 +00:00
Nadav Rotem
d8c4880903 PR11662.
Promotion of the mask operand needs to be done using PromoteTargetBoolean, and not padded with garbage.

llvm-svn: 147309
2011-12-28 13:08:20 +00:00
Elena Demikhovsky
9b4613ff14 Fixed a bug in LowerVECTOR_SHUFFLE and LowerBUILD_VECTOR.
Matching MOVLP mask for AVX (265-bit vectors) was wrong.
The failure was detected by conformance tests.

llvm-svn: 147308
2011-12-28 08:14:01 +00:00
Eli Friedman
064187912e Make sure DAGCombiner doesn't introduce multiple loads from the same memory location. PR10747, part 2.
llvm-svn: 147283
2011-12-26 22:49:32 +00:00
Chandler Carruth
7a5c52fadf Use standard promotion for i8 CTTZ nodes and i8 CTLZ nodes when the
LZCNT instructions are available. Force promotion to i32 to get
a smaller encoding since the fix-ups necessary are just as complex for
either promoted type

We can't do standard promotion for CTLZ when lowering through BSR
because it results in poor code surrounding the 'xor' at the end of this
instruction. Essentially, if we promote the entire CTLZ node to i32, we
end up doing the xor on a 32-bit CTLZ implementation, and then
subtracting appropriately to get back to an i8 value. Instead, our
custom logic just uses the knowledge of the incoming size to compute
a perfect xor. I'd love to know of a way to fix this, but so far I'm
drawing a blank. I suspect the legalizer could be more clever and/or it
could collude with the DAG combiner, but how... ;]

llvm-svn: 147251
2011-12-24 12:12:34 +00:00
Chandler Carruth
82b7a7478b Add systematic testing for cttz as well, and fix the bug I spotted by
inspection earlier.

llvm-svn: 147250
2011-12-24 11:46:10 +00:00
Chandler Carruth
b52ba33d0a Add i8 and i64 testing for ctlz on x86. Also simplify the i16 test.
llvm-svn: 147249
2011-12-24 11:26:59 +00:00
Chandler Carruth
800a803717 Tidy up this rather crufty test. Put the declarations at the top to make
my C-brain happy. Remove the unnecessary bits of pedantic IR fluff like
nounwind. Remove stray uses comments. Name things semantically rather
than tN so that adding a new test in the middle doesn't cause pain, and
so that new tests can be grouped semantically.

This exposes how little systematic testing is going on here. I noticed
this by finding several bugs via inspection and wondering why this test
wasn't catching any of them. =[

llvm-svn: 147248
2011-12-24 11:26:57 +00:00
Chandler Carruth
48f5be6ce0 Expand more when we have a nice 'tzcnt' instruction, to avoid generating
'bsf' instructions here.

This one is actually debatable to my eyes. It's not clear that any chip
implementing 'tzcnt' would have a slow 'bsf' for any reason, and unless
EFLAGS or a zero input matters, 'tzcnt' is just a longer encoding.
Still, this restores the old behavior with 'tzcnt' enabled for now.

llvm-svn: 147246
2011-12-24 11:11:38 +00:00
Chandler Carruth
1846086903 Tidy up some of these tests.
llvm-svn: 147245
2011-12-24 11:11:36 +00:00
Chandler Carruth
9ef50ef1f7 Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the
X86ISelLowering C++ code. Because this is lowered via an xor wrapped
around a bsr, we want the dagcombine which runs after isel lowering to
have a chance to clean things up. In particular, it is very common to
see code which looks like:

  (sizeof(x)*8 - 1) ^ __builtin_clz(x)

Which is trying to compute the most significant bit of 'x'. That's
actually the value computed directly by the 'bsr' instruction, but if we
match it too late, we'll get completely redundant xor instructions.

The more naive code for the above (subtracting rather than using an xor)
still isn't handled correctly due to the dagcombine getting confused.

Also, while here fix an issue spotted by inspection: we should have been
expanding the zero-undef variants to the normal variants when there is
an 'lzcnt' instruction. Do so, and test for this. We don't want to
generate unnecessary 'bsr' instructions.

These two changes fix some regressions in encoding and decoding
benchmarks. However, there is still a *lot* to be improve on in this
type of code.

llvm-svn: 147244
2011-12-24 10:55:54 +00:00
Chandler Carruth
514920d53b Cleanup this test a bit, sorting things and grouping them more clearly.
llvm-svn: 147243
2011-12-24 10:55:42 +00:00
Elena Demikhovsky
b37883fe87 This is the second fix related to VZEXT_MOVL node.
The failure that I see in the current version is:

LLVM ERROR: Cannot select: 0x18b8f70: v4i64 = X86ISD::VZEXT_MOVL 0x18beee0 [ID=14]
  0x18beee0: v4i64 = insert_subvector 0x18b8c70, 0x18b9170, 0x18b9570 [ID=13]
    0x18b8c70: v4i64 = insert_subvector 0x18b9870, 0x18bf4e0, 0x18b9970 [ID=12]
      0x18b9870: v4i64 = undef [ID=4]
      0x18bf4e0: v2i64 = bitcast 0x18bf3e0 [ID=10]
        0x18bf3e0: v4i32 = BUILD_VECTOR 0x18b9770, 0x18b9770, 0x18b9770, 0x18b9770 [ID=8]
          0x18b9770: i32 = TargetConstant<0> [ID=6]
          0x18b9770: i32 = TargetConstant<0> [ID=6]
          0x18b9770: i32 = TargetConstant<0> [ID=6]
          0x18b9770: i32 = TargetConstant<0> [ID=6]
      0x18b9970: i32 = Constant<0> [ID=3]
    0x18b9170: v2i64 = undef [ORD=1] [ID=1]
    0x18b9570: i32 = Constant<2> [ID=5]

llvm-svn: 146975
2011-12-20 13:34:28 +00:00
Chandler Carruth
7564e8371a Begin teaching the X86 target how to efficiently codegen patterns that
use the zero-undefined variants of CTTZ and CTLZ. These are just simple
patterns for now, there is more to be done to make real world code using
these constructs be optimized and codegen'ed properly on X86.

The existing tests are spiffed up to check that we no longer generate
unnecessary cmov instructions, and that we generate the very important
'xor' to transform bsr which counts the index of the most significant
one bit to the number of leading (most significant) zero bits. Also they
now check that when the variant with defined zero result is used, the
cmov is still produced.

llvm-svn: 146974
2011-12-20 11:19:37 +00:00
Lang Hames
e32ef23ba8 Make sure that the lower bits on the VSELECT condition are properly set.
llvm-svn: 146800
2011-12-17 01:08:46 +00:00
Craig Topper
88e2bfef0a Don't try to match 'unpackl/h v, v' for 32xi8 and 16xi16 when only AVX1 is supported. Fix 'unpackh v, v' for 256-bit types to understand 128-bit lanes.
llvm-svn: 146726
2011-12-16 08:06:31 +00:00
Chad Rosier
62ebee9859 Add missing zmovl AVX patterns which were causing crashes.
Patch by Elena Demikhovsky <elena.demikhovsky@intel.com>!

llvm-svn: 146689
2011-12-15 22:11:31 +00:00
Chad Rosier
e74b3b1469 Fix assert in LowerBUILD_VECTOR for v16i16 type on AVX.
Patch by Elena Demikhovsky <elena.demikhovsky@intel.com>!

llvm-svn: 146684
2011-12-15 21:34:44 +00:00
Lang Hames
d5cee672a7 Set specific target cpu for testcase.
llvm-svn: 146678
2011-12-15 20:22:34 +00:00
Lang Hames
0e361e816d Added test case for r146671.
llvm-svn: 146675
2011-12-15 19:56:07 +00:00
Eli Friedman
71c0914b64 Don't try to form FGETSIGN after legalization; it is possible in some cases, but the existing code can't do it correctly. PR11570.
llvm-svn: 146630
2011-12-15 02:07:20 +00:00
Chad Rosier
b93733686c Add support for lowering fneg when AVX is enabled.
rdar://10566486

llvm-svn: 146625
2011-12-15 01:02:25 +00:00
Chandler Carruth
2bedf185c9 Manually upgrade the test suite to specify the flag to cttz and ctlz.
I followed three heuristics for deciding whether to set 'true' or
'false':

- Everything target independent got 'true' as that is the expected
  common output of the GCC builtins.
- If the target arch only has one way of implementing this operation,
  set the flag in the way that exercises the most of codegen. For most
  architectures this is also the likely path from a GCC builtin, with
  'true' being set. It will (eventually) require lowering away that
  difference, and then lowering to the architecture's operation.
- Otherwise, set the flag differently dependending on which target
  operation should be tested.

Let me know if anyone has any issue with this pattern or would like
specific tests of another form. This should allow the x86 codegen to
just iteratively improve as I teach the backend how to differentiate
between the two forms, and everything else should remain exactly the
same.

llvm-svn: 146370
2011-12-12 11:59:10 +00:00
Evan Cheng
77f0fb0296 Update test to something more sensible.
llvm-svn: 146282
2011-12-09 21:54:10 +00:00
Benjamin Kramer
06cd66b1d7 X86: Add patterns for the various rounding ops for SSE4.1 and AVX.
llvm-svn: 146257
2011-12-09 15:44:03 +00:00
Evan Cheng
2c8bac6b4c Forgot setting -march.
llvm-svn: 146244
2011-12-09 06:15:00 +00:00
Evan Cheng
ad8debd736 Add 256-bit variant vmovss and vmovsd patterns. rdar://10538417
llvm-svn: 146196
2011-12-08 22:30:45 +00:00
Evan Cheng
d8a73b8918 Add various missing AVX patterns which was causing crashes. Sadly, the generated
code looks pretty bad compared to SSE.

rdar://10538793

llvm-svn: 146191
2011-12-08 22:05:28 +00:00
Evan Cheng
0e0e920975 Add test for r146163.
llvm-svn: 146167
2011-12-08 19:21:39 +00:00
NAKAMURA Takumi
671c1da473 test/CodeGen/X86/vec_compare-2.ll: Add explicit -mtriple=i686-linux.
llvm-svn: 146152
2011-12-08 15:24:09 +00:00
Nadav Rotem
341b30a457 Fix a bug in the integer-promotion of bitcast operations on vector types.
We must not issue a bitcast operation for integer-promotion of vector types, because the
location of the values in the vector may be different.

llvm-svn: 146150
2011-12-08 13:10:01 +00:00
Eli Friedman
9e8d557cd1 Support vector bitcasts in the AsmPrinter. PR11495.
llvm-svn: 146001
2011-12-07 00:50:54 +00:00
Eli Friedman
5545db0906 Fix an optimization involving EXTRACT_SUBVECTOR in DAGCombine so it behaves correctly. PR11494.
llvm-svn: 145996
2011-12-07 00:11:56 +00:00
Craig Topper
8b05e7d035 Fix a bunch of SSE/AVX patterns to use v2i64/v4i64 loads since all other integer vector loads are promoted to those.
llvm-svn: 145927
2011-12-06 09:04:59 +00:00
Craig Topper
72b41227d8 Merge isSHUFPMask and isCommutedSHUFPMask into single function that can do both. Do the same for the 256-bit version. Use loops to reduce size of isVSHUFPYMask. Fix test cases that were incorrectly passing due to isCommutedSHUFPMask not checking for the vector being 128-bit. This caused some 256-bit shuffles to be incorrectly commuted.
llvm-svn: 145921
2011-12-06 04:59:07 +00:00
NAKAMURA Takumi
c6a187dfdd test/CodeGen/X86/pointer-vector.ll: Add explicit -mtriple=i686-linux.
llvm-svn: 145805
2011-12-05 07:54:57 +00:00
Nadav Rotem
1a91e4381d Add support for vectors of pointers.
llvm-svn: 145801
2011-12-05 06:29:09 +00:00
Sanjoy Das
fe35e107cd Check for stack space more intelligently.
libgcc sets the stack limit field in TCB to 256 bytes above the actual
allocated stack limit.  This means if the function's stack frame needs
less than 256 bytes, we can just compare the stack pointer with the
stack limit.  This should result in lesser calls to __morestack.

llvm-svn: 145766
2011-12-03 09:32:07 +00:00
Sanjoy Das
d1c3d82afe Fix a bug in the x86-32 code generated for segmented stacks.
Currently LLVM pads the call to __morestack with a add and sub of 8
bytes to esp.  This isn't correct since __morestack expects the call
to be followed directly by a ret.

This commit also adjusts the relevant test-case.

llvm-svn: 145765
2011-12-03 09:21:07 +00:00
Craig Topper
d381116357 Add instruction selection support for horizontal add/sub of 256-bit floating point vectors. Also add the test case for 256-bit integer vectors.
llvm-svn: 145680
2011-12-02 07:16:01 +00:00
Eric Christopher
4aa8024569 For 64-bit the rest of the general regs are ok for the q constraint. Make
sure we can emit both the high and low versions of those registers.

Fixes rdar://10392864

llvm-svn: 145579
2011-12-01 08:12:41 +00:00
Eli Friedman
ad916965a6 Pass AVX vectors which are arguments to varargs functions on the stack. <rdar://problem/10463281>.
llvm-svn: 145573
2011-12-01 04:49:21 +00:00
Jan Sjödin
2dfb343ffa Support for encoding all FMA4 instructions and tablegen patterns for all
remaining FMA4 instructions and intrinsics with tests.

llvm-svn: 145525
2011-11-30 22:09:42 +00:00
Nadav Rotem
148d347bb7 Add test arch to make it pass on non x86 targets
llvm-svn: 145498
2011-11-30 17:34:28 +00:00
Nadav Rotem
bef49f31be Add a tripple to the test
llvm-svn: 145489
2011-11-30 11:20:56 +00:00
Nadav Rotem
f8e096f4ee X86: PerformOrCombine introduced a vselect node with a wrong order of operands. This bug was introduced when a dedicated blend sdnode was replaced with the vselect node (in 139479).
llvm-svn: 145488
2011-11-30 10:13:37 +00:00
Evan Cheng
5c1efd630b Add another missing pattern. llvm-gcc likes f64 but clang likes i64 so it was generating poor code for some SSE builtins.
llvm-svn: 145448
2011-11-29 22:48:34 +00:00
Jakob Stoklund Olesen
5d6a4584d9 Make X86::FsFLD0SS / FsFLD0SD real pseudo-instructions.
Like V_SET0, these instructions are expanded by ExpandPostRA to xorps /
vxorps so they can participate in execution domain swizzling.

This also makes the AVX variants redundant.

llvm-svn: 145440
2011-11-29 22:27:25 +00:00
Elena Demikhovsky
735cff1fa8 Fixed vsqrt.ss intrinsic usage - order of input operands was wrong.
Added a test.
Thanks Bruno for reviewing the patch.

llvm-svn: 145403
2011-11-29 15:00:45 +00:00
Craig Topper
637e60afdd Fix shuffle decoding for memory forms for (V)SHUFPS/D.
llvm-svn: 145392
2011-11-29 07:58:09 +00:00
Craig Topper
4550fc2649 Fix issues in shuffle decoding around VPERM* instructions. Fix shuffle decoding for VSHUFPS/D for 256-bit types. Add pattern matching for memory forms of VPERMILPS/VPERMILPD.
llvm-svn: 145390
2011-11-29 07:49:05 +00:00
Craig Topper
aca91b9f14 Fix VINSERTF128/VEXTRACTF128 to be marked as FP instructions. Allow execution dependency fix pass to convert them to their integer equivalents when AVX2 is enabled.
llvm-svn: 145376
2011-11-29 05:37:58 +00:00
Craig Topper
a6c1d25798 Correctly mark VPERM2F128 as being an FP instruction and add execution domain fixing support to convert it to VPERM2I128 for AVX2.
llvm-svn: 145370
2011-11-29 03:57:34 +00:00
Evan Cheng
1435aa5fdc Revert r145273 and fix in SelectionDAG::InferPtrAlignment() instead.
Conservatively returns zero when the GV does not specify an alignment nor is it
initialized. Previously it returns ABI alignment for type of the GV. However, if
the type is a "packed" type, then the under-specified alignments is attached to
the load / store instructions. In that case, the alignment of the type cannot be
trusted.
rdar://10464621

llvm-svn: 145300
2011-11-28 22:37:34 +00:00
Evan Cheng
567aa3dfb3 DAG combine should not increase alignment of loads / stores with alignment less
than ABI alignment. These are loads / stores from / to "packed" data structures.
Their alignments are intentionally under-specified.

rdar://10301431

llvm-svn: 145273
2011-11-28 20:42:56 +00:00
Craig Topper
6f5a0bc4e3 Add X86 instruction selection for VPERM2I128 when AVX2 is enabled. Merge VPERMILPS/VPERMILPD detection since they are pretty similar.
llvm-svn: 145238
2011-11-28 10:14:51 +00:00
Chandler Carruth
bb4c250613 Take two on rotating the block ordering of loops. My previous attempt
was centered around the premise of laying out a loop in a chain, and
then rotating that chain. This is good for preserving contiguous layout,
but bad for actually making sane rotations. In order to keep it safe,
I had to essentially make it impossible to rotate deeply nested loops.
The information needed to correctly reason about a deeply nested loop is
actually available -- *before* we layout the loop. We know the inner
loops are already fused into chains, etc. We lose information the moment
we actually lay out the loop.

The solution was the other alternative for this algorithm I discussed
with Benjamin and some others: rather than rotating the loop
after-the-fact, try to pick a profitable starting block for the loop's
layout, and then use our existing layout logic. I was worried about the
complexity of this "pick" step, but it turns out such complexity is
needed to handle all the important cases I keep teasing out of benchmarks.

This is, I'm afraid, a bit of a work-in-progress. It is still
misbehaving on some likely important cases I'm investigating in Olden.
It also isn't really tested. I'm going to try to craft some interesting
nested-loop test cases, but it's likely to be extremely time consuming
and I don't want to go there until I'm sure I'm testing the correct
behavior. Sadly I can't come up with a way of getting simple, fine
grained test cases for this logic. We need complex loop structures to
even trigger much of it.

llvm-svn: 145183
2011-11-27 13:34:33 +00:00
Chandler Carruth
e6374e6953 Rework a bit of the implementation of loop block rotation to not rely so
heavily on AnalyzeBranch. That routine doesn't behave as we want given
that rotation occurs mid-way through re-ordering the function. Instead
merely check that there are not unanalyzable branching constructs
present, and then reason about the CFG via successor lists. This
actually simplifies my mental model for all of this as well.

The concrete result is that we now will rotate more loop chains. I've
added a test case from Olden highlighting the effect. There is still
a bit more to do here though in order to regain all of the performance
in Olden.

llvm-svn: 145179
2011-11-27 09:22:53 +00:00
Chris Lattner
84bf52737a remove autoupgrade support for old forms of llvm.prefetch and the old
trampoline forms.  Both of these were correct in LLVM 3.0, and we don't
need to support LLVM 2.9 and earlier in mainline.

llvm-svn: 145174
2011-11-27 07:42:04 +00:00
Chris Lattner
9d1e8420ff Upgrade syntax of tests using volatile instructions to use 'load volatile' instead of 'volatile load', which is archaic.
llvm-svn: 145171
2011-11-27 06:54:59 +00:00
Chris Lattner
8067661775 remove some old autoupgrade logic
llvm-svn: 145167
2011-11-27 06:10:54 +00:00
Chandler Carruth
0d073febb6 Introduce a loop block rotation optimization to the new block placement
pass. This is designed to achieve one of the important optimizations
that the old code placement pass did, but more simply.

This is a somewhat rough and *very* conservative version of the
transform. We could get a lot fancier here if there are profitable cases
to do so. In particular, this only looks for a single pattern, it
insists that the loop backedge being rotated away is the last backedge
in the chain, and it doesn't provide any means of doing better in-loop
placement due to the rotation. However, it appears that it will handle
the important loops I am finding in the LLVM test suite.

llvm-svn: 145158
2011-11-27 00:38:03 +00:00
Eli Friedman
448be745f6 Fix APFloat::convert so that it handles narrowing conversions correctly; it
was returning incorrect values in rare cases, and incorrectly marking
exact conversions as inexact in some more common cases. Fixes PR11406, and a
missed optimization in test/CodeGen/X86/fp-stack-O0.ll.

llvm-svn: 145141
2011-11-26 03:38:02 +00:00
Bruno Cardoso Lopes
626d04cc6f This patch contains support for encoding FMA4 instructions and
tablegen patterns for scalar FMA4 operations and intrinsic. Also
add tests for vfmaddsd.

Patch by Jan Sjodin

llvm-svn: 145133
2011-11-25 19:33:42 +00:00
Craig Topper
e761f42368 Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type disinquish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64.
llvm-svn: 145126
2011-11-24 22:57:10 +00:00
Chandler Carruth
f6e96b54f8 Fix a silly use-after-free issue. A much earlier version of this code
need lots of fanciness around retaining a reference to a Chain's slot in
the BlockToChain map, but that's all gone now. We can just go directly
to allocating the new chain (which will update the mapping for us) and
using it.

Somewhat gross mechanically generated test case replicates the issue
Duncan spotted when actually testing this out.

llvm-svn: 145120
2011-11-24 11:23:15 +00:00
Chandler Carruth
1d3f68ffd0 When adding blocks to the list of those which no longer have any CFG
conflicts, we should only be adding the first block of the chain to the
list, lest we try to merge into the middle of that chain. Most of the
places we were doing this we already happened to be looking at the first
block, but there is no reason to assume that, and in some cases it was
clearly wrong.

I've added a couple of tests here. One already worked, but I like having
an explicit test for it. The other is reduced from a test case Duncan
reduced for me and used to crash. Now it is handled correctly.

llvm-svn: 145119
2011-11-24 08:46:04 +00:00
Benjamin Kramer
825892c47f X86: Use btq for bit tests if the immediate can't be encoded in 32 bits.
Before:
	movabsq	$4294967296, %rax       ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00]
	testq	%rax, %rdi              ## encoding: [0x48,0x85,0xf8]
	jne	LBB0_2                  ## encoding: [0x75,A]

After:
	btq	$32, %rdi               ## encoding: [0x48,0x0f,0xba,0xe7,0x20]
	jb	LBB0_2                  ## encoding: [0x72,A]

btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off
saving one register and a giant movabsq.

llvm-svn: 145103
2011-11-23 13:54:17 +00:00
NAKAMURA Takumi
364c7c80ec test/CodeGen/X86/block-placement.ll: Add explicit -mtriple=i686-linux. X86 Win32 CodeGen does not support EH yet.
llvm-svn: 145101
2011-11-23 12:18:22 +00:00
Chandler Carruth
dcf04fc35a Relax an invariant that block placement was trying to assert a bit
further. This invariant just wasn't going to work in the face of
unanalyzable branches; we need to be resillient to the phenomenon of
chains poking into a loop and poking out of a loop. In fact, we already
were, we just needed to not assert on it.

This was found during a bootstrap with block placement turned on.

llvm-svn: 145100
2011-11-23 10:35:36 +00:00
Elena Demikhovsky
681438f0b2 I added several lines in X86 code generator that allow to choose
VSHUFPS/VSHUFPD instructions while lowering VECTOR_SHUFFLE node. I check a commuted VSHUFP mask.

The patch was reviewed by Bruno.

llvm-svn: 145099
2011-11-23 10:23:16 +00:00
Chandler Carruth
c175221b2d Handle the case of a no-return invoke correctly. It actually still has
successors, they just are all landing pad successors. We handle this the
same way as no successors. Comments attached for the next person to wade
through here and another lovely test case courtesy of Benjamin Kramer's
bugpoint reduction.

llvm-svn: 145098
2011-11-23 08:23:54 +00:00
Bob Wilson
8722a53e52 Enable stack protectors for all arrays, not just char arrays. rdar://5875909
Patch by Bill Wendling.

llvm-svn: 145097
2011-11-23 07:13:56 +00:00
Jakob Stoklund Olesen
2b76f5685c Fix PR11422.
This was a bug in keeping track of the available domains when merging
domain values.

The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr
to the integer domain which is only available in AVX2.

Also add an assertion to catch future attempts at emitting AVX2
instructions.

llvm-svn: 145096
2011-11-23 04:03:08 +00:00
Chandler Carruth
a357b0a5ee Fix a crash in block placement due to an inner loop that happened to be
reversed in the function's original ordering, and we happened to
encounter it while handling an outer unnatural CFG structure.

Thanks to the test case reduced from GCC's source by Benjamin Kramer.
This may also fix a crasher in gzip that Duncan reduced for me, but
I haven't yet gotten to testing that one.

llvm-svn: 145094
2011-11-23 03:03:21 +00:00
Chandler Carruth
59f0abf50e Fix a devilish miscompile exposed by block placement. The
updateTerminator code didn't correctly handle EH terminators in one very
specific case. AnalyzeBranch would find no terminator instruction, and
so the fallback in updateTerminator is to assume fallthrough. This is
correct, but the destination of the fallthrough was assumed to be the
first successor.

This is *almost always* true, but in certain cases the loop
transformations will cause the landing pad to be the first successor!
Instead of this brittle logic, actually look through the successors for
a non-landing-pad accessor, and to assert if more than one is found.

This will hopefully fix some (if not all) of the self host miscompiles
with block placement. Thanks to Benjamin Kramer for reporting, Nick
Lewycky for an initial stab at a reduction, and Duncan for endless
advice on EH (which I know nothing about) as well as reviewing the
actual fix.

llvm-svn: 145062
2011-11-22 13:13:16 +00:00
Rafael Espindola
05f88add77 Add triple to the test.
llvm-svn: 145057
2011-11-22 06:36:25 +00:00
Rafael Espindola
b45f6ea527 If a register is both an early clobber and part of a tied use, handle the use
before the clobber so that we copy the value if needed.

Fixes pr11415.

llvm-svn: 145056
2011-11-22 06:27:18 +00:00
Craig Topper
866214a486 Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled.
llvm-svn: 145028
2011-11-21 08:26:50 +00:00
Craig Topper
05db995317 Test case for r145026
llvm-svn: 145027
2011-11-21 06:58:09 +00:00
Craig Topper
62ae335144 Make LowerSIGN_EXTEND_INREG split 256-bit vectors when AVX1 is enabled and use AVX2 shifts when AVX2 is enabled.
llvm-svn: 145022
2011-11-21 01:12:36 +00:00
NAKAMURA Takumi
0738f386b4 test/CodeGen/X86/block-placement.ll: Relax expressions for Win32.
llvm-svn: 145011
2011-11-20 12:49:45 +00:00
Chandler Carruth
aac8e5082a The logic for breaking the CFG in the presence of hot successors didn't
properly account for the *global* probability of the edge being taken.
This manifested as a very large number of unconditional branches to
blocks being merged against the CFG even though they weren't
particularly hot within the CFG.

The fix is to check whether the edge being merged is both locally hot
relative to other successors for the source block, and globally hot
compared to other (unmerged) predecessors of the destination block.

This introduces a new crasher on GCC single-source, but it's currently
behind a flag, and Ben has offered to work on the reduction. =]

llvm-svn: 145010
2011-11-20 11:22:06 +00:00
Chandler Carruth
fa7964c81f Add some comments to the latest test case I added here to document what
is actually being tested. Also add some FileCheck goodness to much more
carefully ensure that the result is the desired result. Before this test
would only have failed through an assert failure if the underlying fix
were reverted.

Also, add some weight metadata and a comment explaining exactly what is
going on to a trick section of the test case. Originally, we were
getting very unlucky and trying to form a block chain that isn't
actually profitable. I'm working on a fix to avoid forming these
unprofitable chains, and that would also have masked any failure from
this test case. The easy solution is to add some metadata that makes it
*really* profitable to form the bad chain here.

llvm-svn: 145006
2011-11-20 09:30:40 +00:00
Craig Topper
e878c775cf Add code for lowering v32i8 shifts by a splat to AVX2 immediate shift instructions. Remove 256-bit splat handling from LowerShift as it was already handled by PerformShiftCombine.
llvm-svn: 145005
2011-11-20 00:12:05 +00:00
Craig Topper
6ed413c495 Use 256-bit vcmpeqd for creating an all ones vector when AVX2 is enabled.
llvm-svn: 145004
2011-11-19 22:34:59 +00:00
Chandler Carruth
f24d3f8fc7 Move the handling of unanalyzable branches out of the loop-driven chain
formation phase and into the initial walk of the basic blocks. We
essentially pre-merge all blocks where unanalyzable fallthrough exists,
as we won't be able to update the terminators effectively after any
reorderings. This is quite a bit more principled as there may be CFGs
where the second half of the unanalyzable pair has some analyzable
predecessor that gets placed first. Then it may get placed next,
implicitly breaking the unanalyzable branch even though we never even
looked at the part that isn't analyzable. I've included a test case that
triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize
some more general ones as I dig into related issues.

Also, to make this new scheme work we have to be able to handle branches
into the middle of a chain, so add this check. We always fallback on the
incoming ordering.

Finally, this starts to really underscore a known limitation of the
current implementation -- we don't consider broken predecessors when
merging successors. This can caused major missed opportunities, and is
something I'm planning on looking at next (modulo more bug reports).

llvm-svn: 144994
2011-11-19 10:26:02 +00:00
Craig Topper
44b06b0096 Test cases for SSSE3/AVX integer horizontal add/sub.
llvm-svn: 144990
2011-11-19 09:03:33 +00:00
Craig Topper
536f9d9434 Extend VPBLENDVB and VPSIGN lowering to work for AVX2.
llvm-svn: 144987
2011-11-19 07:07:26 +00:00
Nadav Rotem
08f8a75c2c Add AVX2 vpbroadcast support
llvm-svn: 144967
2011-11-18 02:49:55 +00:00
Devang Patel
a0973b0c53 DISubrange supports unsigned lower/upper array bounds, so let's not fake it in the end while emitting DWARF. If a FE needs to encode signed lower/upper array bounds then we need to extend DISubrange or ad DISignedSubrange.
llvm-svn: 144937
2011-11-17 23:43:15 +00:00
Eli Friedman
51adc2ea5a Make sure to replace the chain properly when DAGCombining a LOAD+EXTRACT_VECTOR_ELT into a single LOAD. Fixes PR10747/PR11393.
llvm-svn: 144863
2011-11-16 23:50:22 +00:00
Evan Cheng
5bae2333cb Another missing X86ISD::MOVLPD pattern. rdar://10450317
llvm-svn: 144839
2011-11-16 22:24:44 +00:00
Evan Cheng
5242b6aaa1 Disable expensive two-address optimizations at -O0. rdar://10453055
llvm-svn: 144806
2011-11-16 18:44:48 +00:00
Eli Friedman
c2a46f1a90 Fix testcase.
llvm-svn: 144769
2011-11-16 03:03:52 +00:00
Eli Friedman
1f3d774ba4 CONCAT_VECTORS can have more than two operands. PR11389.
llvm-svn: 144768
2011-11-16 02:52:39 +00:00
Nadav Rotem
63be4a26a9 AVX: Add support for vbroadcast from BUILD_VECTOR and refactor some of the vbroadcast code.
llvm-svn: 144720
2011-11-15 22:50:37 +00:00
NAKAMURA Takumi
f99c0f0fcd test/CodeGen/X86/dec-eflags-lower.ll: Relax expression for win32 x64.
llvm-svn: 144714
2011-11-15 22:30:37 +00:00
Pete Cooper
8441c08e0b Added custom lowering for load->dec->store sequence in x86 when the EFLAGS registers is used
by later instructions.

Only done for DEC64m right now.

Fixes <rdar://problem/6172640>

llvm-svn: 144705
2011-11-15 21:57:53 +00:00
Rafael Espindola
95f4e0c409 We currently use a callback to handle an IL pass deleting a BB that still
has a reference to it. Unfortunately, that doesn't work for codegen passes
since we don't get notified of MBB's being deleted (the original BB stays).

Use that fact to our advantage and after printing a function, check if
any of the IL BBs corresponds to a symbol that was not printed. This fixes
pr11202.

llvm-svn: 144674
2011-11-15 19:08:46 +00:00
Jakob Stoklund Olesen
606d4e999b Revert r144611 and r144613.
These tests are actually correct, clang was miscompiling ExeDepsFix::processUses.

Evan fixed the miscompilation in r144628.

llvm-svn: 144630
2011-11-15 07:13:03 +00:00
Chandler Carruth
fdcba17bec Rather than trying to use the loop block sequence *or* the function
block sequence when recovering from unanalyzable control flow
constructs, *always* use the function sequence. I'm not sure why I ever
went down the path of trying to use the loop sequence, it is
fundamentally not the correct sequence to use. We're trying to preserve
the incoming layout in the cases of unreasonable control flow, and that
is only encoded at the function level. We already have a filter to
select *exactly* the sub-set of blocks within the function that we're
trying to form into a chain.

The resulting code layout is also significantly better because of this.
In several places we were ending up with completely unreasonable control
flow constructs due to the ordering chosen by the loop structure for its
internal storage. This change removes a completely wasteful vector of
basic blocks, saving memory allocation in the common case even though it
costs us CPU in the fairly rare case of unnatural loops. Finally, it
fixes the latest crasher reduced out of GCC's single source. Thanks
again to Benjamin Kramer for the reduction, my bugpoint skills failed at
it.

llvm-svn: 144627
2011-11-15 06:26:43 +00:00
Craig Topper
a584521daa Properly qualify AVX2 specific parts of execution dependency table. Also enable converting between 256-bit PS/PD operations when AVX1 is enabled. Fixes PR11370.
llvm-svn: 144622
2011-11-15 05:55:35 +00:00
Jakob Stoklund Olesen
a0ceee932a Really fix test.
llvm-svn: 144613
2011-11-15 03:17:01 +00:00
Jakob Stoklund Olesen
a4ad7080ff Allow for depencendy-breaking instructions before cvt*.
This should unbreak clang-x86_64-darwin10-RA, but I can't actually
reproduce the failure.

llvm-svn: 144611
2011-11-15 02:29:48 +00:00
Jakob Stoklund Olesen
2709f65821 Break false dependencies before partial register updates.
Two new TargetInstrInfo hooks lets the target tell ExecutionDepsFix
about instructions with partial register updates causing false unwanted
dependencies.

The ExecutionDepsFix pass will break the false dependencies if the
updated register was written in the previoius N instructions.

The small loop added to sse-domains.ll runs twice as fast with
dependency-breaking instructions inserted.

llvm-svn: 144602
2011-11-15 01:15:30 +00:00
Evan Cheng
2034ff3b0b Add a missing pattern for X86ISD::MOVLPD. rdar://10436044
llvm-svn: 144566
2011-11-14 20:35:52 +00:00
Evan Cheng
f19d257488 Teach two-address pass to re-schedule two-address instructions (or the kill
instructions of the two-address operands) in order to avoid inserting copies.
This fixes the few regressions introduced when the two-address hack was
disabled (without regressing the improvements).
rdar://10422688

llvm-svn: 144559
2011-11-14 19:48:55 +00:00
Pete Cooper
c9d6834f38 Changed SSE4/AVX <2 x i64> extract and insert ops to be Custom lowered
Constant idx case is still done in tablegen but other cases are then expanded

Fixes <rdar://problem/10435460>

llvm-svn: 144557
2011-11-14 19:38:42 +00:00
Chandler Carruth
462bb16130 Fix an overflow bug in MachineBranchProbabilityInfo. This pass relied on
the sum of the edge weights not overflowing uint32, and crashed when
they did. This is generally safe as BranchProbabilityInfo tries to
provide this guarantee. However, the CFG can get modified during codegen
in a way that grows the *sum* of the edge weights. This doesn't seem
unreasonable (imagine just adding more blocks all with the default
weight of 16), but it is hard to come up with a case that actually
triggers 32-bit overflow. Fortuately, the single-source GCC build is
good at this. The solution isn't very pretty, but its no worse than the
previous code. We're already summing all of the edge weights on each
query, we can sum them, check for an overflow, compute a scale, and sum
them again.

I've included a *greatly* reduced test case out of the GCC source that
triggers it. It's a pretty lame test, as it clearly is just barely
triggering the overflow. I'd like to have something that is much more
definitive, but I don't understand the fundamental pattern that triggers
an explosion in the edge weight sums.

The buggy code is duplicated within this file. I'll colapse them into
a single implementation in a subsequent commit.

llvm-svn: 144526
2011-11-14 08:50:16 +00:00
Chandler Carruth
b7f21af176 Teach machine block placement to cope with unnatural loops. These don't
get loop info structures associated with them, and so we need some way
to make forward progress selecting and placing basic blocks. The
technique used here is pretty brutal -- it just scans the list of blocks
looking for the first unplaced candidate. It keeps placing blocks like
this until the CFG becomes tractable.

The cost is somewhat unfortunate, it requires allocating a vector of all
basic block pointers eagerly. I have some ideas about how to simplify
and optimize this, but I'm trying to get the logic correct first.

Thanks to Benjamin Kramer for the reduced test case out of GCC. Sadly
there are other bugs that GCC is tickling that I'm reducing and working
on now.

llvm-svn: 144516
2011-11-14 00:00:35 +00:00
Chandler Carruth
e67c92282f Rewrite #3 of machine block placement. This is based somewhat on the
second algorithm, but only loosely. It is more heavily based on the last
discussion I had with Andy. It continues to walk from the inner-most
loop outward, but there is a key difference. With this algorithm we
ensure that as we visit each loop, the entire loop is merged into
a single chain. At the end, the entire function is treated as a "loop",
and merged into a single chain. This chain forms the desired sequence of
blocks within the function. Switching to a single algorithm removes my
biggest problem with the previous approaches -- they had different
behavior depending on which system triggered the layout. Now there is
exactly one algorithm and one basis for the decision making.

The other key difference is how the chain is formed. This is based
heavily on the idea Andy mentioned of keeping a worklist of blocks that
are viable layout successors based on the CFG. Having this set allows us
to consistently select the best layout successor for each block. It is
expensive though.

The code here remains very rough. There is a lot that needs to be done
to clean up the code, and to make the runtime cost of this pass much
lower. Very much WIP, but this was a giant chunk of code and I'd rather
folks see it sooner than later. Everything remains behind a flag of
course.

I've added a couple of tests to exercise the issues that this iteration
was motivated by: loop structure preservation. I've also fixed one test
that was exhibiting the broken behavior of the previous version.

llvm-svn: 144495
2011-11-13 11:20:44 +00:00
Jakob Stoklund Olesen
3eaaa93104 Remove the -color-ss-with-regs option.
It was off by default.

The new register allocators don't have the problems that made it
necessary to reallocate registers during stack slot coloring.

llvm-svn: 144481
2011-11-13 00:31:23 +00:00
Jakob Stoklund Olesen
4aa9c6888f Linear scan is going away.
llvm-svn: 144472
2011-11-12 22:39:34 +00:00
Jakob Stoklund Olesen
43b7a3871b Remove obsolete test.
This test was committed with a bugfix to RemoveCopyByCommutingDef, but
that optimization is no longer triggered by this test.

llvm-svn: 144470
2011-11-12 22:39:27 +00:00
Jakob Stoklund Olesen
6a290484cb Remove obsolete test.
This test is for a very specific LocalRewriter bug.  LocalRewriter is
going away.

llvm-svn: 144469
2011-11-12 22:39:24 +00:00
Jakob Stoklund Olesen
005eabf28a Remove obsolete test.
I don't think this test does what is was supposed to do, and
LocalRewriter is going away anyway.

llvm-svn: 144463
2011-11-12 20:37:57 +00:00
Jakob Stoklund Olesen
c11d7a9b4d Eliminate more linear scan tests.
llvm-svn: 144462
2011-11-12 20:35:26 +00:00
Jakob Stoklund Olesen
0fe59856fd Switch a couple -O0 tests to RABasic.
llvm-svn: 144461
2011-11-12 20:11:04 +00:00
Jakob Stoklund Olesen
f8fed2a3a7 Delete old test of a VirtRegRewriter feature.
This test doesn't expose the issue with RAGreedy.

I filed PR11363 to track the missing InlineSpiller feature.

llvm-svn: 144459
2011-11-12 19:53:48 +00:00
Jakob Stoklund Olesen
49118cf9a5 Remove old test that doesn't make sense.
The test is checking that the output doesn't contains any 'mov '
strings. It does contain movl, though.

llvm-svn: 144458
2011-11-12 19:53:45 +00:00
Craig Topper
0458cdf64a Add more AVX2 shift lowering support. Move AVX2 variable shift to use patterns instead of custom lowering code.
llvm-svn: 144457
2011-11-12 09:58:49 +00:00
Craig Topper
50df7c3842 Add lowering for AVX2 shift instructions.
llvm-svn: 144380
2011-11-11 07:39:23 +00:00
NAKAMURA Takumi
ea14fd81c6 test/CodeGen/X86/lsr-loop-exit-cond.ll: Try to appease linux and freebsd bots to specify explicit -mtriple=x86_64-darwin.
I guess it expects -relocation-model=pic.

llvm-svn: 144290
2011-11-10 14:18:59 +00:00
Evan Cheng
4760ff0763 Use a bigger hammer to fix PR11314 by disabling the "forcing two-address
instruction lower optimization" in the pre-RA scheduler.

The optimization, rather the hack, was done before MI use-list was available.
Now we should be able to implement it in a better way, perhaps in the
two-address pass until a MI scheduler is available.

Now that the scheduler has to backtrack to handle call sequences. Adding
artificial scheduling constraints is just not safe. Furthermore, the hack
is not taking all the other scheduling decisions into consideration so it's just
as likely to pessimize code. So I view disabling this optimization goodness
regardless of PR11314.

llvm-svn: 144267
2011-11-10 07:43:16 +00:00
Jakob Stoklund Olesen
bc48cd34b6 Strip old implicit operands after foldMemoryOperand.
The TII.foldMemoryOperand hook preserves implicit operands from the
original instruction.  This is not what we want when those implicit
operands refer to the register being spilled.

Implicit operands referring to other registers are preserved.

This fixes PR11347.

llvm-svn: 144247
2011-11-10 00:17:03 +00:00
Nadav Rotem
ddc6bfa543 AVX2: Add patterns for variable shift operations
llvm-svn: 144212
2011-11-09 21:22:13 +00:00
Duncan Sands
2934a0eaeb Speculatively revert commit 144124 (djg) in the hope that the 32 bit
dragonegg self-host buildbot will recover (it is complaining about object
files differing between different build stages).  Original commit message:

Add a hack to the scheduler to disable pseudo-two-address dependencies in
basic blocks containing calls. This works around a problem in which
these artificial dependencies can get tied up in calling seqeunce
scheduling in a way that makes the graph unschedulable with the current
approach of using artificial physical register dependencies for calling
sequences. This fixes PR11314.

llvm-svn: 144188
2011-11-09 14:20:48 +00:00
Nadav Rotem
e66a72a2c4 Add AVX2 support for vselect of v32i8
llvm-svn: 144187
2011-11-09 13:21:28 +00:00
Craig Topper
432dd8d623 Enable execution dependency fix pass for YMM registers when AVX2 is enabled. Add AVX2 logical operations to list of replaceable instructions.
llvm-svn: 144179
2011-11-09 09:37:21 +00:00
Craig Topper
7ff77dc2b1 Add instruction selection for AVX2 integer comparisons.
llvm-svn: 144176
2011-11-09 08:06:13 +00:00
Craig Topper
d82abb7156 Add AVX2 instruction lowering for add, sub, and mul.
llvm-svn: 144174
2011-11-09 07:28:55 +00:00
Jakob Stoklund Olesen
1239fed1e2 Collapse DomainValues across loop back-edges.
During the initial RPO traversal of the basic blocks, remember the ones
that are incomplete because of back-edges from predecessors that haven't
been visited yet.

After the initial RPO, revisit all those loop headers so the incoming
DomainValues on the back-edges can be properly collapsed.

This will properly fix execution domains on software pipelined code,
like the included test case.

llvm-svn: 144151
2011-11-09 01:06:56 +00:00
Dan Gohman
b6cf7c4e94 Add a hack to the scheduler to disable pseudo-two-address dependencies in
basic blocks containing calls. This works around a problem in which
these artificial dependencies can get tied up in calling seqeunce
scheduling in a way that makes the graph unschedulable with the current
approach of using artificial physical register dependencies for calling
sequences. This fixes PR11314.

llvm-svn: 144124
2011-11-08 21:29:06 +00:00
Pete Cooper
a1c151814a Adding test for machine-licm operating on invariant load instructions
llvm-svn: 144104
2011-11-08 19:06:53 +00:00
NAKAMURA Takumi
e7c7964113 test/CodeGen/X86/vec_shuffle-39.ll: Add explicit -mtriple=x86_64-linux. Passing packed value is not compatible on Win32 x64.
llvm-svn: 144068
2011-11-08 03:46:39 +00:00
NAKAMURA Takumi
7094a0d830 test/CodeGen/X86/vec_shuffle-38.ll: Relax expression for Win32 x64.
llvm-svn: 144067
2011-11-08 03:46:32 +00:00
NAKAMURA Takumi
8bc13fe0b2 test/CodeGen/X86/vec_shuffle.ll: Add explicit -mtriple=i686-linux. We may see some suboptimal frame (%ebp) emission on certain hosts. Possible [PR11031]
llvm-svn: 144066
2011-11-08 03:46:25 +00:00
Eli Friedman
741d364aa9 Add a bunch of calls to RemoveDeadNode in LegalizeDAG, so legalization doesn't get confused by CSE later on. Fixes PR11318.
Re-commit of r144034, with an extra fix so that RemoveDeadNode doesn't blow up.

llvm-svn: 144055
2011-11-08 01:25:24 +00:00
Evan Cheng
4a63100fe3 Add x86 isel logic and patterns to match movlps from clang generated IR for _mm_loadl_pi(). rdar://10134392, rdar://10050222
llvm-svn: 144052
2011-11-08 00:31:58 +00:00
Bill Wendling
788df1dca1 Convert to the new EH model.
llvm-svn: 144049
2011-11-08 00:17:28 +00:00
Bill Wendling
16499170c2 Convert tests to the new EH model.
llvm-svn: 144048
2011-11-08 00:09:27 +00:00