mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 21:42:54 +02:00
Commit Graph

4385 Commits

Author SHA1 Message Date
Andrew Trick
916e01c917 Recommit r129383. PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency.
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handling node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero cycle stage or it is a TokenFactor chain.

Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the node latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.

llvm-svn: 129421
2011-04-13 00:38:32 +00:00
Bill Wendling
0984f4927e Reapply r129401 with patch for clang.
llvm-svn: 129419
2011-04-13 00:36:11 +00:00
Eric Christopher
147cad907a Temporarily revert r129408 to see if it brings the bots back.
llvm-svn: 129417
2011-04-13 00:20:59 +00:00
Eric Christopher
c72bd6024f Fix a bug where we were counting the alias sets as completely used
registers for fast allocation.

Fixes rdar://9207598

llvm-svn: 129408
2011-04-12 23:23:14 +00:00
Bill Wendling
f6446a0961 Revert r129401 for now. Clang is using the old way of doing things.
llvm-svn: 129403
2011-04-12 22:59:27 +00:00
Bill Wendling
f9c9d3e05b Remove the unaligned load intrinsics in favor of using native unaligned loads.
Now that we have a first-class way to represent unaligned loads, the unaligned
load intrinsics are superfluous.

First part of <rdar://problem/8460511>.

llvm-svn: 129401
2011-04-12 22:46:31 +00:00
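To illustrate the change above: with a first-class representation, an unaligned SSE load can be expressed as an ordinary vector load with reduced alignment instead of a call to an unaligned-load intrinsic. A minimal C sketch (the function name is hypothetical):

  #include <xmmintrin.h>

  /* With native unaligned loads, _mm_loadu_ps can be represented as a plain
     128-bit vector load with align 1 rather than a target intrinsic call. */
  __m128 load_unaligned(const float *p) {
    return _mm_loadu_ps(p);
  }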
Andrew Trick
d83e7b6a5d Revert 129383. It causes some targets to hit a scheduler assert.
llvm-svn: 129385
2011-04-12 20:14:07 +00:00
Andrew Trick
1e0821075d PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency.
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make these heuristic adjustments to node latency work,
I also needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.

llvm-svn: 129383
2011-04-12 19:54:36 +00:00
Cameron Zwarich
c05412175e Split a store of a VMOVDRR into two integer stores to avoid mixing NEON and ARM
stores of arguments in the same cache line. This fixes the second half of
<rdar://problem/8674845>.

llvm-svn: 129345
2011-04-12 02:24:17 +00:00
Wesley Peck
6263e05e6d Add scheduling information for the MBlaze backend.
llvm-svn: 129311
2011-04-11 22:31:52 +00:00
Evan Cheng
ea0d287a8a Look past copies when determining whether hoisting would end up inserting more copies. rdar://9266679
llvm-svn: 129297
2011-04-11 21:09:18 +00:00
Chris Lattner
dab8e5119b look for the verboten argument slot access in any order, thanks to Frits
for pointing this out

llvm-svn: 129217
2011-04-09 17:00:34 +00:00
Chris Lattner
f7623daa2b Fix a bug where RecursivelyDeleteTriviallyDeadInstructions could
delete the instruction pointed to by CGP's current instruction
iterator, leading to a crash on the testcase.  This fixes PR9578.

llvm-svn: 129200
2011-04-09 07:05:44 +00:00
Chris Lattner
7cc2bc5cd1 fix two completely broken tests, which were matching due to PR9629.
llvm-svn: 129195
2011-04-09 06:34:38 +00:00
Chris Lattner
9fb9788a47 remove a bunch of CHECK lines that aren't checking what
they thought they were, because alternation was expanding
wrong in {{}}'s.

llvm-svn: 129194
2011-04-09 06:31:06 +00:00
Chris Lattner
badb8ca63c have dag combine zap "store undef", which can be formed during call lowering
with undef arguments.

llvm-svn: 129185
2011-04-09 02:32:02 +00:00
Chris Lattner
de62b962e8 don't test for codegen of 'store undef'
llvm-svn: 129184
2011-04-09 02:31:26 +00:00
Evan Cheng
bc053100af Change -arm-trap-func= into a non-arm specific option. Now Intrinsic::trap is lowered into a call to the specified trap function at sdisel time.
llvm-svn: 129152
2011-04-08 21:37:21 +00:00
Evan Cheng
9049eb2113 Add option to emit @llvm.trap as a function call instead of a trap instruction. rdar://9249183.
llvm-svn: 129107
2011-04-07 20:31:12 +00:00
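For illustration, a minimal C sketch of code that reaches @llvm.trap (the guard and names are hypothetical); with the option above, the trap can be emitted as a call to the specified function instead of a trap instruction:

  /* __builtin_trap() is lowered to @llvm.trap; the new option lets it be
     emitted as a call to a user-specified trap function rather than a
     hardware trap instruction. */
  void check_index(int i, int n) {
    if (i < 0 || i >= n)
      __builtin_trap();
  }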
Andrew Trick
36a1759769 Added a check in the preRA scheduler for potential interference on an
induction variable. The preRA scheduler is unaware of induction vars,
so we look for potential "virtual register cycles" instead.

Fixes <rdar://problem/8946719> Bad scheduling prevents coalescing

llvm-svn: 129100
2011-04-07 19:54:57 +00:00
Akira Hatanaka
24e15bbe94 Fix handling of functions with internal linkage.
llvm-svn: 129099
2011-04-07 19:51:44 +00:00
Tanya Lattner
3deb96fad7 Prevent ARM DAG Combiner from doing an AND or OR combine on an illegal vector type (vectors of size 3). Also included test cases.
llvm-svn: 129074
2011-04-07 15:24:20 +00:00
Evan Cheng
859dff2c87 Change -arm-divmod-libcall to a target neutral option.
llvm-svn: 129045
2011-04-07 00:58:44 +00:00
Owen Anderson
37b60bdf09 Teach the ARM peephole optimizer that RSB, RSC, ADC, and SBC can be used for folded comparisons, just like ADD and SUB.
llvm-svn: 129038
2011-04-06 23:35:59 +00:00
Jakob Stoklund Olesen
a0e0f8d74b These tests no longer require linear scan because reserved register coalescing is now universal.
llvm-svn: 128936
2011-04-05 21:40:41 +00:00
Jakob Stoklund Olesen
a819faa2f7 Run LiveDebugVariables in RegAllocBasic and RegAllocGreedy.
llvm-svn: 128935
2011-04-05 21:40:37 +00:00
Jakob Stoklund Olesen
c6297924dd Fix one more batch of X86 tests to be register allocation dependent.
llvm-svn: 128919
2011-04-05 20:20:30 +00:00
Jakob Stoklund Olesen
613bcf88be When dead code elimination removes all but one use, try to fold the single def into the remaining use.
Rematerialization can leave single-use loads behind that we might as well fold whenever possible.

llvm-svn: 128918
2011-04-05 20:20:26 +00:00
Johnny Chen
8b1acb8d9b Fix test-llvm failures.
llvm-svn: 128906
2011-04-05 18:41:40 +00:00
Stuart Hastings
3ed284e001 ARM doesn't support byval yet. XFAIL this test until it does.
llvm-svn: 128891
2011-04-05 17:16:21 +00:00
Jakob Stoklund Olesen
2bef449b52 Ensure all defs referring to a virtual register are marked dead by addRegisterDead().
There can be multiple defs for a single virtual register when they are defining
sub-registers.

The missing <dead> flag was stopping the inline spiller from eliminating dead
code after rematerialization.

llvm-svn: 128888
2011-04-05 16:53:50 +00:00
Rafael Espindola
7618e7be93 Print visibility info for external variables.
llvm-svn: 128887
2011-04-05 15:51:32 +00:00
Eric Christopher
b126193e19 Fix up testcase for previous commit.
llvm-svn: 128870
2011-04-05 00:56:01 +00:00
Jakob Stoklund Olesen
32cf19caa6 Fix register-dependent X86 tests.
llvm-svn: 128867
2011-04-05 00:32:44 +00:00
Jakob Stoklund Olesen
1454095d5e Allow coalescing with reserved physregs in certain cases:
When a virtual register has a single value that is defined as a copy of a
reserved register, permit that copy to be joined. These virtual registers are
usually copies of the stack pointer:

  %vreg75<def> = COPY %ESP; GR32:%vreg75
  MOV32mr %vreg75, 1, %noreg, 0, %noreg, %vreg74<kill>
  MOV32mi %vreg75, 1, %noreg, 8, %noreg, 0
  MOV32mi %vreg75<kill>, 1, %noreg, 4, %noreg, 0
  CALLpcrel32 ...

Coalescing these virtual registers early decreases register pressure.
Previously, they were coalesced by RALinScan::attemptTrivialCoalescing after
register allocation was completed.

The lower register pressure causes the mcinst-lowering-cmp0.ll test case to fail
because it depends on linear scan spilling a particular register.

I am deleting 2008-08-05-SpillerBug.ll because it is counting the number of
instructions emitted, and its revision history shows the 'correct' count being
edited many times.

llvm-svn: 128845
2011-04-04 21:00:03 +00:00
Jakob Stoklund Olesen
3d3cee403f Disable the PowerPC/Atomics-64 test.
The code inserted by PPCTargetLowering::EmitInstrWithCustomInserter for ppc64 is
wrong, and I don't know how to fix it. It seems to be using the correct register
classes for pointers, but it inserts all 32-bit instructions.

llvm-svn: 128835
2011-04-04 17:57:26 +00:00
Jakob Stoklund Olesen
57a62da2db Fix PowerPC tests to be register allocator independent.
llvm-svn: 128827
2011-04-04 17:07:03 +00:00
Che-Liang Chiou
c4a22b7cd5 ptx: support setp's 4-operand format
llvm-svn: 128767
2011-04-02 08:51:39 +00:00
Cameron Zwarich
9573b6277e Do some peephole optimizations to remove pointless VMOVs from Neon to integer
registers that arise from argument shuffling with the soft float ABI. These
instructions are particularly slow on Cortex A8. This fixes one half of
<rdar://problem/8674845>.

llvm-svn: 128759
2011-04-02 02:40:43 +00:00
Jim Grosbach
039844acc5 LDRD/STRD instructions should print both Rt and Rt2 in the asm string.
llvm-svn: 128736
2011-04-01 20:26:57 +00:00
Akira Hatanaka
c2d74b05ca Add code for analyzing FP branches. Clean up branch Analysis functions.
llvm-svn: 128718
2011-04-01 17:39:08 +00:00
Evan Cheng
830f695385 Add test case.
llvm-svn: 128707
2011-04-01 06:27:25 +00:00
Evan Cheng
985215c699 FileCheck'ify test.
llvm-svn: 128706
2011-04-01 03:36:33 +00:00
Jakob Stoklund Olesen
369e673289 Fix Thumb and Thumb2 tests to be register allocator independent.
llvm-svn: 128690
2011-03-31 23:31:50 +00:00
Jakob Stoklund Olesen
5421130bfc Provide a legal pointer register class when targeting thumb1.
The LocalStackSlotAllocation pass was creating illegal registers.

llvm-svn: 128687
2011-03-31 23:02:15 +00:00
Jakob Stoklund Olesen
26236c8554 Fix SystemZ tests
llvm-svn: 128686
2011-03-31 23:02:12 +00:00
Jakob Stoklund Olesen
33f01d005c Fix ARM tests to be register allocator independent.
llvm-svn: 128680
2011-03-31 22:14:03 +00:00
Evan Cheng
64850406cf Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier
accumulator forwarding:
vadd d3, d0, d1
vmul d3, d3, d2
=>
vmul d3, d0, d2
vmla d3, d1, d2

llvm-svn: 128665
2011-03-31 19:38:48 +00:00
Jakob Stoklund Olesen
36c7c9d42d Fix Mips, Sparc, and XCore tests that were dependent on register allocation.
Add an extra run with -regalloc=basic to keep them honest.

llvm-svn: 128654
2011-03-31 18:42:43 +00:00
Akira Hatanaka
b26c89ee68 Added support for FP conditional move instructions and fixed bugs in handling of FP comparisons.
llvm-svn: 128650
2011-03-31 18:26:17 +00:00
Jakob Stoklund Olesen
a935319339 Don't completely eliminate identity copies that also modify super register liveness.
Turn them into noop KILL instructions instead. This lets the scavenger know when
super-registers are killed and defined.

llvm-svn: 128645
2011-03-31 17:55:25 +00:00
Jakob Stoklund Olesen
84bb8092b6 Mark all uses as <undef> when joining a copy.
This way, shrinkToUses() will ignore the instruction that is about to be
deleted, and we avoid leaving invalid live ranges that SplitKit doesn't like.

Fix a misunderstanding in MachineVerifier about <def,undef> operands. The
<undef> flag is valid on def operands where it has the same meaning as <undef>
on a use operand. It only applies to sub-register defines which also read the
full register.

llvm-svn: 128642
2011-03-31 17:23:25 +00:00
Richard Osborne
5b9df0d075 Add XCore intrinsics for initializing / starting / synchronizing threads.
llvm-svn: 128633
2011-03-31 15:13:13 +00:00
Jakob Stoklund Olesen
e72dfb1c45 Pick a conservative register class when creating a small live range for remat.
The rematerialized instruction may require a more constrained register class
than the register being spilled. In the test case, the spilled register has been
inflated to the DPR register class, but we are rematerializing a load of the
ssub_0 sub-register which only exists for DPR_VFP2 registers.

The register class is reinflated after spilling, so the conservative choice is
only temporary.

llvm-svn: 128610
2011-03-31 03:54:44 +00:00
Evan Cheng
fa37c7d815 Don't try to create zero-sized stack objects.
llvm-svn: 128586
2011-03-30 23:44:13 +00:00
Cameron Zwarich
1b8f91d2c8 Add a ARM-specific SD node for VBSL so that forms with a constant first operand
can be recognized. This fixes <rdar://problem/9183078>.

llvm-svn: 128584
2011-03-30 23:01:21 +00:00
Evan Cheng
ed09135349 Add intrinsics @llvm.arm.neon.vmulls and @llvm.arm.neon.vmullu.* back. Frontends
were lowering them to sext / uxt + mul instructions. Unfortunately the
optimization passes may hoist the extensions out of the loop and separate them.
When that happens, the long multiplication instructions can be broken into
several scalar instructions, causing a significant performance issue.

Note the vmla and vmls intrinsics are not added back. Frontend will codegen them
as intrinsics vmull* + add / sub. Also note the isel optimizations for catching
mul + sext / zext are not changed either.

First part of rdar://8832507, rdar://9203134

llvm-svn: 128502
2011-03-29 23:06:19 +00:00
Cameron Zwarich
95260e5ebb Add Neon SINT_TO_FP and UINT_TO_FP lowering from v4i16 to v4f32. Fixes
<rdar://problem/8875309> and <rdar://problem/9057191>.

llvm-svn: 128492
2011-03-29 21:41:55 +00:00
Rafael Espindola
b103223cdd Reduce test case.
llvm-svn: 128445
2011-03-29 02:18:54 +00:00
Evan Cheng
5bcaef9cc9 Optimize (zext A + zext B) * C to (VMULL A, C) + (VMULL B, C) during
isel lowering to fold the zero-extends and take advantage of no-stall
back-to-back vmul + vmla:
 vmull q0, d4, d6
 vmlal q0, d5, d6
is faster than
 vaddl q0, d4, d5
 vmovl q1, d6
 vmul  q0, q0, q1

This allows us to vmull + vmlal for:
    f = vmull_u8(   vget_high_u8(s), c);
    f = vmlal_u8(f, vget_low_u8(s),  c);

rdar://9197392

llvm-svn: 128444
2011-03-29 01:56:09 +00:00
Bill Wendling
cb8447ad52 In some cases, the "fail BB dominator" may be null after the BB was split (and
becomes reachable when it previously wasn't). Check to make sure that it's not null
before trying to use it.

llvm-svn: 128434
2011-03-28 23:02:18 +00:00
Jakob Stoklund Olesen
446412de55 Collect and coalesce DBG_VALUE instructions before emitting the function.
Correctly terminate the range of register DBG_VALUEs when the register is
clobbered or when the basic block ends.

The code is now ready to deal with variables that are sometimes in a register
and sometimes on the stack. We just need to teach emitDebugLoc to say 'stack
slot'.

llvm-svn: 128327
2011-03-26 02:19:36 +00:00
Eric Christopher
b51c27cd9a Fix the bfi handling for or (and a mask) (and b mask). We need the two
masks to match inversely for the code as is to work. For the example given
we actually want:

bfi r0, r2, #1, #1

not #0, however, given the way the pattern is written it's not possible
at the moment.

Fixes rdar://9177502

llvm-svn: 128320
2011-03-26 01:21:03 +00:00
Jakob Stoklund Olesen
ab0501221b Emit fewer labels for debug info and stop emitting .loc directives for DBG_VALUEs.
The .loc directives don't need labels; that is a leftover from when we created
line number info manually.

Instructions following a DBG_VALUE can share its label since the DBG_VALUE
doesn't produce any code.

llvm-svn: 128284
2011-03-25 17:20:59 +00:00
Devang Patel
c6ed54c434 Move test in x86 specific area.
llvm-svn: 128245
2011-03-24 22:39:09 +00:00
Devang Patel
4909f41ec5 Keep track of directory name and fix regression caused by Rafael's patch r119613.
A better approach would be to move source id handling inside MC.

llvm-svn: 128233
2011-03-24 20:30:50 +00:00
NAKAMURA Takumi
cabdaca3c7 Target/X86: [PR8777][PR8778] Tweak alloca/chkstk for Windows targets.
FIXME: Some cleanups would be needed.
llvm-svn: 128206
2011-03-24 07:07:00 +00:00
Cameron Zwarich
4d1c5fe9ae Do early taildup of ret in CodeGenPrepare for potential tail calls that have a
void return type. This fixes PR9487.

llvm-svn: 128197
2011-03-24 04:52:10 +00:00
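A minimal C sketch of the void-return shape this targets (function names are hypothetical); duplicating the shared return into each predecessor lets both calls become tail calls:

  void log_fast(void);
  void log_slow(void);

  /* Each branch ends in a call followed by a shared 'ret void'; early tail
     duplication of the return allows the calls to be emitted as tail calls. */
  void log_event(int fast) {
    if (fast)
      log_fast();
    else
      log_slow();
  }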
Devang Patel
2cea16e9bb Enable GlobalMerge on darwin.
llvm-svn: 128183
2011-03-23 23:34:19 +00:00
Andrew Trick
80893981d6 Revert r128175.
I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix.

llvm-svn: 128181
2011-03-23 23:11:02 +00:00
Evan Cheng
6e799c3c58 Cmp peephole optimization isn't always safe for signed arithmetic.
int tries = INT_MAX;
while (tries > 0) {
      tries--;
}

The check should be:
        subs    r4, #1
        cmp     r4, #0
        bgt     LBB0_1

The subs can set the overflow V bit when r4 is INT_MAX+1 (which loop
canonicalization apparently does in this case). cmp #0 would have cleared
it while not changing the N and Z bits. Since BGT is dependent on the V
bit, i.e. (N == V) && !Z, it is not safe to eliminate the cmp #0.

rdar://9172742

llvm-svn: 128179
2011-03-23 22:52:04 +00:00
Eli Friedman
76fcfaab12 PR9535: add support for splitting and scalarizing vector ISD::FP_ROUND.
Also cleaning up some duplicated code while I'm here.

llvm-svn: 128176
2011-03-23 22:18:48 +00:00
Andrew Trick
a7b48f34b1 Reapply Eli's r127852 now that the pre-RA scheduler can spill EFLAGS.
(target-specific branchless method for double-width relational comparisons on x86)

llvm-svn: 128175
2011-03-23 22:16:02 +00:00
Jakob Stoklund Olesen
28ebc380f6 Reapply r128045 and r128051 with fixes.
This will extend the ranges of debug info variables in registers until they are
clobbered.

Fix 1: Don't mistake DBG_VALUE instructions referring to incoming arguments on
the stack for DBG_VALUE instructions referring to variables in the frame
pointer. This fixes the gdb test-suite failure.

Fix 2: Don't trace through copies to physical registers setting up call
arguments. These registers are call clobbered, and the source register is more
likely to be a callee-saved register that can be extended through the call
instruction.

llvm-svn: 128114
2011-03-22 22:33:08 +00:00
Andrew Trick
63dc418ea3 Revert r128045 and r128051, debug info enhancements.
Temporarily reverting these to see if we can get llvm-objdump to link. Hopefully this is not the problem.

llvm-svn: 128097
2011-03-22 19:18:42 +00:00
Che-Liang Chiou
7c5fc3a68f ptx: add analyze/insert/remove branch
llvm-svn: 128084
2011-03-22 14:12:00 +00:00
Jakob Stoklund Olesen
fc9e8a04c3 Don't emit 'DBG_VALUE %noreg, ...' to terminate user variable ranges.
These ranges get completely jumbled by the post-ra scheduler, and it is not
really reasonable to expect it to make sense of them.

Instead, teach DwarfDebug to notice when user variables in registers are
clobbered, and terminate the ranges there.

llvm-svn: 128045
2011-03-22 00:21:41 +00:00
Dan Gohman
a83323bca5 Fix fast-isel address mode folding to avoid folding instructions
outside of the current basic block. This fixes PR9500, rdar://9156159.

llvm-svn: 128041
2011-03-22 00:04:35 +00:00
Rafael Espindola
b5c6ae67ac Write the section table and the section data in the same order that
gnu as does. This makes it a lot easier to compare the output of both
as the addresses are now a lot closer.

llvm-svn: 127972
2011-03-20 18:44:20 +00:00
Daniel Dunbar
34c65737c3 Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors
to canonicalize IR", it broke a lot of things.

llvm-svn: 127954
2011-03-19 21:47:14 +00:00
Evan Cheng
c5f50f7322 SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR
to have a single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
  switch(x) {
  case 1: return f1();
  case 2: return f2();
  case 3: return f3();
  case 4: return f4();
  case 5: return f5();
  case 6: return f6();
  }
}

=>
LBB0_2:                                 ## %sw.bb
  callq   _f1
  popq    %rbp
  ret
LBB0_3:                                 ## %sw.bb1
  callq   _f2
  popq    %rbp
  ret
LBB0_4:                                 ## %sw.bb3
  callq   _f3
  popq    %rbp
  ret

This patch teaches codegenprep to duplicate returns when the return value
is a phi and where the phi operands are produced by tail calls followed by
an unconditional branch:

sw.bb7:                                           ; preds = %entry
  %call8 = tail call i32 @f5() nounwind
  br label %return
sw.bb9:                                           ; preds = %entry
  %call10 = tail call i32 @f6() nounwind
  br label %return
return:
  %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
  ret i32 %retval.0

This allows codegen to generate better code like this:

LBB0_2:                                 ## %sw.bb
        jmp     _f1                     ## TAILCALL
LBB0_3:                                 ## %sw.bb1
        jmp     _f2                     ## TAILCALL
LBB0_4:                                 ## %sw.bb3
        jmp     _f3                     ## TAILCALL

rdar://9147433

llvm-svn: 127953
2011-03-19 17:17:39 +00:00
Nadav Rotem
92561196b7 Add support for legalizing UINT_TO_FP of vectors on platforms which do
not have native support for this operation (such as X86).
The legalized code uses two vector INT_TO_FP operations and is faster
than scalarizing.

llvm-svn: 127951
2011-03-19 13:09:10 +00:00
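A minimal C sketch of a loop that, once vectorized, produces the vector UINT_TO_FP node being legalized here (names are hypothetical):

  /* After vectorization the unsigned-to-float conversion becomes a vector
     UINT_TO_FP; on targets without native support (such as X86) it is now
     legalized with two vector INT_TO_FP operations instead of being scalarized. */
  void u32_to_f32(const unsigned *in, float *out, int n) {
    for (int i = 0; i < n; ++i)
      out[i] = (float)in[i];
  }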
Andrew Trick
76870bd5b1 FileCheckize a test.
(one-by-one until valgrind is happy)

llvm-svn: 127925
2011-03-19 00:41:39 +00:00
Evan Cheng
93d04c1c00 Match a few more obvious patterns to revsh. rdar://9147637.
llvm-svn: 127913
2011-03-18 21:52:42 +00:00
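One plausible source pattern, shown as a hedged C sketch (the name is hypothetical; the exact patterns added are in the commit): a halfword byte swap whose result is sign-extended at its use, which is what revsh computes:

  /* Byte-swap the low 16 bits; when the short result is widened to int at the
     caller, the whole computation may be selectable as a single revsh. */
  short swap_halfword(short x) {
    unsigned short u = (unsigned short)x;
    return (short)(unsigned short)((u << 8) | (u >> 8));
  }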
Eli Friedman
8d903449c3 Revert r127852; it's apparently causing an ICE on mingw.
llvm-svn: 127909
2011-03-18 21:12:29 +00:00
Justin Holewinski
d9c382441b PTX: Fix various codegen issues
- Emit mad instead of mad.rn for shader model 1.0
- Emit explicit mov.u32 instructions for reading global variables
  (most PTX instructions cannot take global variable immediates)

llvm-svn: 127895
2011-03-18 19:24:28 +00:00
Che-Liang Chiou
2b173c0443 ptx: fix parameter order that is reversed
llvm-svn: 127874
2011-03-18 11:23:56 +00:00
Che-Liang Chiou
f4a2c17cf5 ptx: add unconditional and conditional branch
llvm-svn: 127873
2011-03-18 11:08:52 +00:00
Eli Friedman
64a2b7e4f2 Add a target-specific branchless method for double-width relational
comparisons on x86.  Essentially, the way this works is that SUB+SBB sets
the relevant flags the same way a double-width CMP would.

This is a substantial improvement over the generic lowering in LLVM. The output
is also shorter than the gcc-generated output; I haven't done any detailed
benchmarking, though.

llvm-svn: 127852
2011-03-18 02:34:11 +00:00
Benjamin Kramer
52ffb6ea96 BuildUDIV: If the divisor is even we can simplify the fixup of the multiplied value by introducing an early shift.
This allows us to compile "unsigned foo(unsigned x) { return x/28; }" into
	shrl	$2, %edi
	imulq	$613566757, %rdi, %rax
	shrq	$32, %rax
	ret

instead of
	movl    %edi, %eax
	imulq   $613566757, %rax, %rcx
	shrq    $32, %rcx
	subl    %ecx, %eax
	shrl    %eax
	addl    %ecx, %eax
	shrl    $4, %eax

on x86_64

llvm-svn: 127829
2011-03-17 20:39:14 +00:00
Richard Osborne
6bad79b514 Add XCore intrinsic for setpsc.
llvm-svn: 127821
2011-03-17 18:42:05 +00:00
NAKAMURA Takumi
0639b29656 test/CodeGen/X86/h-registers-1.ll: Add explicit -mtriple=x86_64-linux. It does not need to be checked on x86_64-win32 (aka Win64).
llvm-svn: 127800
2011-03-17 04:24:40 +00:00
NAKAMURA Takumi
8d08700d77 test/CodeGen/X86/constant-pool-remat-0.ll: FileCheck-ize and add explicit -mtriple=x86_64-linux.
llvm-svn: 127775
2011-03-16 23:01:31 +00:00
Cameron Zwarich
2bb1e45ea3 The x86-64 ABI says that a bool is only guaranteed to be sign-extended to a byte
rather than an int. Thankfully, this only causes LLVM to miss optimizations, not
generate incorrect code.

This just fixes the zext at the return. We still insert an i32 ZextAssert when
reading a function's arguments, but it is followed by a truncate and another i8
ZextAssert so it is not optimized.

llvm-svn: 127766
2011-03-16 22:20:18 +00:00
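A minimal C sketch of the shape in question (hypothetical names); as described above, the widening of the returned bool is where the zero-extension assumption is relaxed:

  #include <stdbool.h>

  bool is_ready(void);

  /* Per the commit, only the low byte of the returned bool is guaranteed by the
     ABI, so this widening needs an explicit extension rather than an assumed
     full-width zero-extension. */
  int ready_as_int(void) {
    return is_ready();
  }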
Cameron Zwarich
005920cae9 Rename a test to be more inclusive.
llvm-svn: 127765
2011-03-16 22:20:12 +00:00
Daniel Dunbar
8757b8c000 Revert r127757, "Patch to fix a dwarf relocation problem on ARM. One-line fix
plus the test where it used to break.", which broke Clang self-host of a
Debug+Asserts compiler, on OS X.

llvm-svn: 127763
2011-03-16 22:16:39 +00:00
Richard Osborne
8b90369d96 Add XCore intrinsics for setclk, setrdy.
llvm-svn: 127761
2011-03-16 21:56:00 +00:00
Renato Golin
bf788a5626 Patch to fix a dwarf relocation problem on ARM. One-line fix plus the test where it used to break.
llvm-svn: 127757
2011-03-16 21:05:52 +00:00
Cameron Zwarich
05eb0080d6 Add a test for i1 zeroext arguments on x86-64. We currently generate code that
conforms to the ABI, but DAGCombine could in theory recognize the sequence of
zext asserts and truncates and generate incorrect code.

llvm-svn: 127754
2011-03-16 20:15:44 +00:00
Richard Osborne
318e25c620 Add checkevent intrinsic to check if any resources owned by the current thread
can event.

llvm-svn: 127741
2011-03-16 18:34:00 +00:00
NAKAMURA Takumi
7fd500c31d test/CodeGen/X86: FileCheck-ize and add actions for x86_64-linux and x86_64-win32.
llvm-svn: 127734
2011-03-16 13:53:07 +00:00
NAKAMURA Takumi
aa13f3550e test/CodeGen/X86: Add a pattern for Win64.
llvm-svn: 127733
2011-03-16 13:52:51 +00:00
NAKAMURA Takumi
e776f16e9d test/CodeGen/X86: FileCheck-ize and add explicit -mtriple=x86_64-linux. They are useless to the Win64 target.
llvm-svn: 127732
2011-03-16 13:52:38 +00:00
NAKAMURA Takumi
bfbbe15937 test/CodeGen/X86/byval*.ll: Win64 does not support byval yet.
llvm-svn: 127731
2011-03-16 13:52:20 +00:00
NAKAMURA Takumi
73a9789e43 test/CodeGen/X86/dyn-stackalloc.ll: FileCheck-ize.
llvm-svn: 127730
2011-03-16 13:52:08 +00:00
Bill Wendling
388dad6d62 Some minor cleanups based on feedback.
llvm-svn: 127694
2011-03-15 20:47:26 +00:00
Evan Cheng
59ba6777c3 Do not form thumb2 ldrd / strd if the offset is not a multiple of 4. rdar://9133587
llvm-svn: 127683
2011-03-15 18:41:52 +00:00
Richard Osborne
af1b66c427 On the XCore the scavenging slot should be closest to the SP.
llvm-svn: 127680
2011-03-15 15:10:11 +00:00
Richard Osborne
70204c1c29 Add XCore intrinsics for getps, setps, setsr and clrsr.
llvm-svn: 127678
2011-03-15 13:45:47 +00:00
Justin Holewinski
8948485aa7 PTX: Set PTX 2.0 as the minimum supported version
- Remove PTX 1.4 code generation
- Change type of intrinsics to .v4.i32 instead of .v4.i16
- Add and/or/xor integer instructions

llvm-svn: 127677
2011-03-15 13:24:15 +00:00
Evan Cheng
29faaebae9 Add a peephole optimization to optimize pairs of bitcasts. e.g.
v2 = bitcast v1
...
v3 = bitcast v2
...
   = v3
=>
v2 = bitcast v1
...
   = v1
if v1 and v3 are in the same register class.

Bitcasts between i32 and fp (and others) are often not nops since they
are in different register classes. These bitcast instructions are often
left because they are in different basic blocks and cannot be
eliminated by dag combine.

rdar://9104514

llvm-svn: 127668
2011-03-15 05:13:13 +00:00
Evan Cheng
bac3e87eaa sext(undef) = 0, because the top bits will all be the same.
zext(undef) = 0, because the top bits will be zero.

llvm-svn: 127649
2011-03-15 02:22:10 +00:00
Bill Wendling
713c4bc3ee Testcase for r127630.
llvm-svn: 127648
2011-03-15 01:49:08 +00:00
Jim Grosbach
3de97c6e32 Clean up ARM tail calls a bit. They're pseudo-instructions for normal branches.
Also more cleanly separate the ARM vs. Thumb functionality. Previously, the
encoding would be incorrect for some Thumb instructions (the indirect calls).

llvm-svn: 127637
2011-03-15 00:30:40 +00:00
Bill Wendling
da1364d669 Generate a VTBL instruction instead of a series of loads and stores when we
can. As Nate pointed out, VTBL isn't super performant, but it *has* to be better
than this:

_shuf:
@ BB#0:       @ %entry
  push        {r4, r7, lr}
  add         r7, sp, #4
  sub         sp, #12
  mov         r4, sp
  bic         r4, r4, #7
  mov         sp, r4
  mov         r2, sp
  vmov        d16, r0, r1
  orr         r0, r2, #6
  orr         r3, r2, #7
  vst1.8      {d16[0]}, [r3]
  vst1.8      {d16[5]}, [r0]
  subs        r4, r7, #4
  orr         r0, r2, #5
  vst1.8      {d16[4]}, [r0]
  orr         r0, r2, #4
  vst1.8      {d16[4]}, [r0]
  orr         r0, r2, #3
  vst1.8      {d16[0]}, [r0]
  orr         r0, r2, #2
  vst1.8      {d16[2]}, [r0]
  orr         r0, r2, #1
  vst1.8      {d16[1]}, [r0]
  vst1.8      {d16[3]}, [r2]
  vldr.64     d16, [sp]
  vmov        r0, r1, d16
  mov         sp, r4
  pop         {r4, r7, pc}

The "illegal" testcase in vext.ll is no longer illegal.
<rdar://problem/9078775>

llvm-svn: 127630
2011-03-14 23:02:38 +00:00
Eric Christopher
8180806e0f Fix this test up a bit.
llvm-svn: 127621
2011-03-14 21:05:21 +00:00
Evan Cheng
cb70b9e80b Minor optimization. sign-ext/anyext of undef is still undef.
llvm-svn: 127598
2011-03-14 18:15:55 +00:00
Justin Holewinski
a2f7c8557c PTX: Emit global arrays with proper sizes
- Emit all arrays as type .b8 and proper sizes in bytes to conform
  to the output of nvcc

llvm-svn: 127584
2011-03-14 15:40:11 +00:00
Justin Holewinski
995d10cfea PTX: Add support for sqrt/sin/cos intrinsics
llvm-svn: 127578
2011-03-14 14:09:33 +00:00
Che-Liang Chiou
6ff0aa8ab3 ptx: add set.p instruction and related changes to predicate execution
llvm-svn: 127577
2011-03-14 11:26:01 +00:00
Eric Christopher
392d8f7d08 Saving files before committing is overrated.
Add a RUN line to this test.

llvm-svn: 127520
2011-03-12 01:36:23 +00:00
Eric Christopher
80a45901e0 Sometimes isPredicable lies to us and tells us we don't need the operands.
Go ahead and add them on when we might want to use them and let
later passes remove them.

Fixes rdar://9118569

llvm-svn: 127518
2011-03-12 01:09:29 +00:00
Jim Grosbach
27eaca3e0d Properly pseudo-ize the ARM LDMIA_RET instruction. This has the nice side-
effect that we get proper instruction printing using the "pop" mnemonic for it.

llvm-svn: 127502
2011-03-11 22:51:41 +00:00
Cameron Zwarich
bf5c9cd119 Roll r127459 back in:
Optimize trivial branches in CodeGenPrepare, which often get created from the
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.

This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.

llvm-svn: 127498
2011-03-11 21:52:04 +00:00
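A minimal C sketch of one common source of such branches (names are hypothetical): a fortify-style size check through __builtin_object_size, which lowers via llvm.objectsize and leaves a trivial branch once the size folds to a constant:

  #include <string.h>

  /* When llvm.objectsize resolves to a constant, the guard below becomes a
     trivial, constant branch of the kind CodeGenPrepare can now optimize away. */
  void copy_name(char *dst, const char *src, size_t n) {
    if (__builtin_object_size(dst, 0) < n)
      __builtin_trap();
    memcpy(dst, src, n);
  }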
Cameron Zwarich
39a49276db Fix the GCC test suite issue exposed by r127477, which was caused by stack
protector insertion not working correctly with unreachable code. Since that
revision was rolled out, this test doesn't actually fail before this fix.

llvm-svn: 127497
2011-03-11 21:51:56 +00:00
Daniel Dunbar
a02706c889 Revert r127459, "Optimize trivial branches in CodeGenPrepare, which often get
created from the", it broke some GCC test suite tests.

llvm-svn: 127477
2011-03-11 19:30:30 +00:00
Cameron Zwarich
9ed726c151 Optimize trivial branches in CodeGenPrepare, which often get created from the
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.

This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.

llvm-svn: 127459
2011-03-11 04:54:27 +00:00
Eric Christopher
46f43c9cce Change the x86 32-bit scheduler to register pressure and fix up the
corresponding testcases back to the previous versions.

Fixes some performance regressions only seen on 32-bit.

llvm-svn: 127441
2011-03-11 01:05:58 +00:00
Evan Cheng
d5d2d4a158 Avoid replacing the value of a directly stored load with the stored value if the load is indexed. rdar://9117613.
llvm-svn: 127440
2011-03-11 00:48:56 +00:00
Jim Grosbach
1986d9ac8f Properly pseudo-ize MOVCCr and MOVCCs.
llvm-svn: 127434
2011-03-10 23:56:09 +00:00
Justin Holewinski
a26d2f782e PTX: Add preliminary support for floating-point divide and multiply-and-add
llvm-svn: 127410
2011-03-10 16:57:18 +00:00
Che-Liang Chiou
fc6c7ba9d5 ptx: add the rest of special registers of ISA version 2.0
llvm-svn: 127397
2011-03-10 04:05:57 +00:00
Stuart Hastings
fd42046d56 Revert 127359; it broke lencod.
llvm-svn: 127382
2011-03-10 00:25:53 +00:00
Daniel Dunbar
11a8b6c4ff Revert "Re-enable test and hope to silence the buildbots", still broken.
llvm-svn: 127369
2011-03-09 22:48:46 +00:00
Benjamin Kramer
8313cf1cf4 Fix mistyped CHECK lines.
llvm-svn: 127366
2011-03-09 22:07:31 +00:00
Stuart Hastings
6c99b422b1 Tweak test to work on Linux.
llvm-svn: 127364
2011-03-09 21:35:10 +00:00
Stuart Hastings
5c555821bb Disable this test temporarily to reduce BuildBot complaints.
llvm-svn: 127363
2011-03-09 21:33:47 +00:00
Stuart Hastings
61f9a3dab2 X86 byval copies no longer always_inline. <rdar://problem/8706628>
llvm-svn: 127359
2011-03-09 21:10:30 +00:00
Bruno Cardoso Lopes
f34376b0e1 Add a testcase for the addc improvements introduced some commits ago. Patch by Akira Hatanaka
llvm-svn: 127358
2011-03-09 21:05:32 +00:00
Bruno Cardoso Lopes
51df3638f8 Re-enable test and hope to silence the buildbots
llvm-svn: 127357
2011-03-09 21:00:16 +00:00
Bruno Cardoso Lopes
64f0169989 try to make o32 cc tests less specific to silence some buildbots. The test isn't enabled yet; this will be done in a subsequent commit. Patch by Akira Hatanaka.
llvm-svn: 127356
2011-03-09 20:59:05 +00:00
Jakob Stoklund Olesen
4d0c9d0af7 Make physreg coalescing independent of the number of uses of the virtual register.
The damage done by physreg coalescing only depends on the number of instructions
the extended physreg live range covers. This fixes PR9438.

The heuristic is still luck-based, and physreg coalescing really should be
disabled completely. We need a register allocator with better hinting support
before that is possible.

Convert a test to FileCheck and force spilling by inserting an extra call. The
previous spilling behavior was dependent on misguided physreg coalescing
decisions.

llvm-svn: 127351
2011-03-09 19:27:06 +00:00
Jakob Stoklund Olesen
10d5d5d25c Delete a test case that is very sensitive to coalescer behavior.
The test is derived from an old miscompilation of
MultiSource/Benchmarks/VersaBench/8b10b which is run regularly, so we are not
losing coverage.

llvm-svn: 127350
2011-03-09 19:27:02 +00:00
Bruno Cardoso Lopes
88bef593d8 Improve varargs handling, with testcases. Patch by Sasa Stankovic
llvm-svn: 127349
2011-03-09 19:22:22 +00:00
Andrew Trick
0b0002dfba This test case should work with list-ilp or list-burr.
llvm-svn: 127348
2011-03-09 19:17:10 +00:00
NAKAMURA Takumi
fe84f8672a Target/X86: Tweak va_arg for Win64 not to miss taking va_start when the number of fixed args > 4.
llvm-svn: 127328
2011-03-09 11:33:15 +00:00
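A minimal C sketch of the case described (the function is hypothetical): five named parameters, so the number of fixed arguments exceeds the four that Win64 passes in registers, which is the situation the va_start handling previously got wrong:

  #include <stdarg.h>

  /* More than four fixed (named) arguments; the commit above fixes va_start for
     this case on Win64. The -1 terminator is just a convention for this sketch. */
  int sum_after(int a, int b, int c, int d, int e, ...) {
    va_list ap;
    int v, total = 0;
    va_start(ap, e);
    while ((v = va_arg(ap, int)) != -1)
      total += v;
    va_end(ap);
    return total;
  }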
Eric Christopher
3ffd0d2f15 Fix testcase.
llvm-svn: 127298
2011-03-09 00:41:41 +00:00
Benjamin Kramer
c3944efb6f Strip cruft.
llvm-svn: 127269
2011-03-08 20:19:10 +00:00
Eric Christopher
1be4747d53 Add a testcase for r127263.
llvm-svn: 127266
2011-03-08 19:49:15 +00:00
Benjamin Kramer
d5782492c8 X86: Fix the (saddo/ssub x, 1) -> incl/decl selection to check the right operand for 1.
Found by inspection.

llvm-svn: 127247
2011-03-08 15:20:20 +00:00