1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-27 22:12:47 +01:00
Commit Graph

12168 Commits

Author SHA1 Message Date
Jakob Stoklund Olesen
d7a523358c Enable LiveDebugVariables by default.
llvm-svn: 123282
2011-01-11 22:45:28 +00:00
Venkatraman Govindaraju
f681d4e782 SPARC backend: correct ICC/FCC uses for ADDX and SELECT_CC
llvm-svn: 123281
2011-01-11 22:38:28 +00:00
Chris Lattner
586e7af07d Fix PR8946, a missing reg/reg form of movdqu.
llvm-svn: 123242
2011-01-11 17:04:55 +00:00
Daniel Dunbar
fbc0b96c34 McARM: Add more hard coded logic to SplitMnemonicAndCC to also split out the
carry setting flag from the mnemonic.

Note that this currently involves me disabling a number of working cases in
arm_instructions.s, this is a hopefully short term evil which will be rapidly
fixed (and greatly surpassed), assuming my current approach flies.

llvm-svn: 123238
2011-01-11 15:59:50 +00:00
Eric Christopher
c6db56a31e Revert the testcase from the previous reverted commit.
llvm-svn: 123227
2011-01-11 09:20:44 +00:00
Chris Lattner
feade29ab8 merge tests into one crash.ll test.
llvm-svn: 123220
2011-01-11 07:50:07 +00:00
Chris Lattner
5731a92f5b remove a bogus assertion: the latch block of a loop is not
neccesarily an uncond branch to the header.  This fixes 
PR8955 (the assertion tripping).

llvm-svn: 123219
2011-01-11 07:47:59 +00:00
Chandler Carruth
250dce460c Teach constant folding to perform conversions from constant floating
point values to their integer representation through the SSE intrinsic
calls. This is the last part of a README.txt entry for which I have real
world examples.

llvm-svn: 123206
2011-01-11 01:07:24 +00:00
Chandler Carruth
acb82e863d FileCheck-ize a test, and move a no-longer calling test case to another
file and make it actually test something...

llvm-svn: 123205
2011-01-11 01:07:20 +00:00
Owen Anderson
4479341626 Fix a random missed optimization by making InstCombine more aggressive when determining which bits are demanded by
a comparison against a constant.

llvm-svn: 123203
2011-01-11 00:36:45 +00:00
Eric Christopher
68263285d5 Even if we don't have 7 bytes of stack space we may need to save and
restore the stack pointer from the frame pointer on thumbv6.

Fixes rdar://8819685

llvm-svn: 123196
2011-01-11 00:16:04 +00:00
Dale Johannesen
cd78621861 Fix PR 8916 (qv for analysis), at least the immediate problem.
There's an inherent tension in DAGCombine between assuming
that things will be put in canonical form, and the Depth
mechanism that disables transformations when recursion gets
too deep.  It would not surprise me if there's a lot of little
bugs like this one waiting to be discovered.  The mechanism
seems fragile and I'd suggest looking at it from a design viewpoint.

llvm-svn: 123191
2011-01-10 21:53:07 +00:00
Daniel Dunbar
0e9ece99bb McARM: Flush out hard coded known non-predicated mnemonic list.
llvm-svn: 123189
2011-01-10 21:01:03 +00:00
Chandler Carruth
772e26df36 Teach instcombine about the rest of the SSE and SSE2 conversion
intrinsics element dependencies. Reviewed by Nick.

llvm-svn: 123161
2011-01-10 07:19:37 +00:00
Chandler Carruth
7f854ac9a9 Fold two related tests into the newly FileCheck-ized test, migrating
them to FileCheck as well.

llvm-svn: 123154
2011-01-10 02:53:58 +00:00
Chandler Carruth
7c332e5abd Clean up and FileCheck-ize a test.
llvm-svn: 123153
2011-01-10 02:53:54 +00:00
Chris Lattner
867dbe1329 fix typo
llvm-svn: 123148
2011-01-10 02:33:34 +00:00
Chris Lattner
b5562212e2 another (more) aggressive attempt to bring llvm-gcc-i386-linux-selfhost
back to life.

llvm-svn: 123146
2011-01-10 00:47:34 +00:00
Chris Lattner
e8e9ec58bf temporarily disable memset formation from memsets in an effort to restore buildbot stability.
llvm-svn: 123144
2011-01-09 23:52:48 +00:00
Chris Lattner
749f1eff13 add a testcase I missed in previous commit.
llvm-svn: 123143
2011-01-09 23:52:31 +00:00
Tobias Grosser
9899845dd3 Instcombine: Fix pattern where the sext did not dominate the icmp using it
llvm-svn: 123121
2011-01-09 16:00:11 +00:00
Chris Lattner
57e9b35653 teach SCEV analysis of PHI nodes that PHI recurences formed
with GEP instructions are always NUW, because PHIs cannot wrap
the end of the address space.

llvm-svn: 123105
2011-01-09 02:28:48 +00:00
Chris Lattner
fa37cac39c reduce indentation. Print <nuw> and <nsw> when dumping SCEV AddRec's
that have the bit set.

llvm-svn: 123104
2011-01-09 02:16:18 +00:00
Chris Lattner
98136397bd Merge memsets followed by neighboring memsets and other stores into
larger memsets.  Among other things, this fixes rdar://8760394 and
allows us to handle "Example 2" from http://blog.regehr.org/archives/320,
compiling it into a single 4096-byte memset:

_mad_synth_mute:                        ## @mad_synth_mute
## BB#0:                                ## %entry
	pushq	%rax
	movl	$4096, %esi             ## imm = 0x1000
	callq	___bzero
	popq	%rax
	ret

llvm-svn: 123089
2011-01-08 21:19:19 +00:00
Chris Lattner
e09439ed9d fix an issue in IsPointerOffset that prevented us from recognizing that
P and P+1 are relative to the same base pointer.

llvm-svn: 123087
2011-01-08 21:07:56 +00:00
Chris Lattner
20bf2d50b8 enhance memcpyopt to merge a store and a subsequent
memset into a single larger memset.

llvm-svn: 123086
2011-01-08 20:54:51 +00:00
Chris Lattner
756416d4c0 merge two tests and filecheckify
llvm-svn: 123082
2011-01-08 20:27:22 +00:00
Chris Lattner
7d3c4712e9 When loop rotation happens, it is *very* common for the duplicated condbr
to be foldable into an uncond branch.  When this happens, we can make a
much simpler CFG for the loop, which is important for nested loop cases
where we want the outer loop to be aggressively optimized.

Handle this case more aggressively.  For example, previously on
phi-duplicate.ll we would get this:


define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  %cmp1 = icmp slt i64 1, 1000
  br i1 %cmp1, label %bb.nph, label %for.end

bb.nph:                                           ; preds = %entry
  br label %for.body

for.body:                                         ; preds = %bb.nph, %for.cond
  %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.02
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.02, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.02, 1
  br label %for.cond

for.cond:                                         ; preds = %for.body
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge

for.cond.for.end_crit_edge:                       ; preds = %for.cond
  br label %for.end

for.end:                                          ; preds = %for.cond.for.end_crit_edge, %entry
  ret void
}

Now we get the much nicer:

define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.01
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.01, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.01, 1
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

With all of these recent changes, we are now able to compile:

void foo(char *X) {
 for (int i = 0; i != 100; ++i) 
   for (int j = 0; j != 100; ++j)
     X[j+i*100] = 0;
}

into a single memset of 10000 bytes.  This series of changes
should also be helpful for other nested loop scenarios as well.

llvm-svn: 123079
2011-01-08 19:59:06 +00:00
Chris Lattner
db05334c7f Three major changes:
1. Rip out LoopRotate's domfrontier updating code.  It isn't
   needed now that LICM doesn't use DF and it is super complex
   and gross.
2. Make DomTree updating code a lot simpler and faster.  The 
   old loop over all the blocks was just to find a block??
3. Change the code that inserts the new preheader to just use
   SplitCriticalEdge instead of doing an overcomplex 
   reimplementation of it.

No behavior change, except for the name of the inserted preheader.

llvm-svn: 123072
2011-01-08 18:52:51 +00:00
Rafael Espindola
9f526bcf4d First step in fixing PR8927:
Add a unnamed_addr bit to global variables and functions. This will be used
to indicate that the address is not significant and therefore the constant
or function can be merged with others.

If an optimization pass can show that an address is not used, it can set this.

Examples of things that can have this set by the FE are globals created to
hold string literals and C++ constructors.

Adding unnamed_addr to a non-const global should have no effect unless
an optimization can transform that global into a constant.

Aliases are not allowed to have unnamed_addr since I couldn't figure
out any use for it.

llvm-svn: 123063
2011-01-08 16:42:36 +00:00
Frits van Bommel
966cc00809 Fix a bug in r123034 (trying to sext/zext non-integers) and clean up a little.
llvm-svn: 123061
2011-01-08 10:51:36 +00:00
Chris Lattner
6729ce1c33 Have loop-rotate simplify instructions (yay instsimplify!) as it clones
them into the loop preheader, eliminating silly instructions like
"icmp i32 0, 100" in fixed tripcount loops.  This also better exposes the 
bigger problem with loop rotate that I'd like to fix: once this has been
folded, the duplicated conditional branch *often* turns into an uncond branch.

Not aggressively handling this is pessimizing later loop optimizations 
somethin' fierce by making "dominates all exit blocks" checks fail.

llvm-svn: 123060
2011-01-08 08:24:46 +00:00
Evan Cheng
1afd04fc59 Recognize inline asm 'rev /bin/bash, ' as a bswap intrinsic call.
llvm-svn: 123048
2011-01-08 01:24:27 +00:00
Evan Cheng
aa16fd02ad Do not model all INLINEASM instructions as having unmodelled side effects.
Instead encode llvm IR level property "HasSideEffects" in an operand (shared
with IsAlignStack). Added MachineInstrs::hasUnmodeledSideEffects() to check
the operand when the instruction is an INLINEASM.

This allows memory instructions to be moved around INLINEASM instructions.

llvm-svn: 123044
2011-01-07 23:50:32 +00:00
Devang Patel
d3ba97949a Speculatively revert r123032.
llvm-svn: 123039
2011-01-07 22:33:41 +00:00
Bob Wilson
c485ff3ced Lower some BUILD_VECTORS using VEXT+shuffle.
Patch by Tim Northover.

llvm-svn: 123035
2011-01-07 21:37:30 +00:00
Tobias Grosser
48469b566a InstCombine: Match min/max hidden by sext/zext
X = sext x; x >s c ? X : C+1 --> X = sext x; X <s C+1 ? C+1 : X
X = sext x; x <s c ? X : C-1 --> X = sext x; X >s C-1 ? C-1 : X
X = zext x; x >u c ? X : C+1 --> X = zext x; X <u C+1 ? C+1 : X
X = zext x; x <u c ? X : C-1 --> X = zext x; X >u C-1 ? C-1 : X
X = sext x; x >u c ? X : C+1 --> X = sext x; X <u C+1 ? C+1 : X
X = sext x; x <u c ? X : C-1 --> X = sext x; X >u C-1 ? C-1 : X

Instead of calculating this with mixed types promote all to the
larger type. This enables scalar evolution to analyze this
expression. PR8866

llvm-svn: 123034
2011-01-07 21:33:14 +00:00
Devang Patel
a52d6c216d Appropriately truncate debug info range in dwarf output.
Enable live debug variables pass.

llvm-svn: 123032
2011-01-07 21:30:41 +00:00
Benjamin Kramer
62b5a4d14c Revert 122959, it needs more thought. Add it back to README.txt with additional notes.
llvm-svn: 123030
2011-01-07 20:42:20 +00:00
Evan Cheng
ae26b91353 Revert r122955. It seems using movups to lower memcpy can cause massive regression (even on Nehalem) in edge cases. I also didn't see any real performance benefit.
llvm-svn: 123015
2011-01-07 19:35:30 +00:00
David Greene
e9b2fb7e0d Rename lisp-like functions as suggested by Gabor Greif as loooong time
ago.  This is both easier to learn and easier to read.

llvm-svn: 123001
2011-01-07 17:05:37 +00:00
Benjamin Kramer
a842a10fc1 Try to unbreak the arm buildbot.
llvm-svn: 122999
2011-01-07 11:35:21 +00:00
Bob Wilson
8341c8971d Add testcases for PR8411 (vget_low and vget_high implemented as shuffles).
llvm-svn: 122997
2011-01-07 06:44:14 +00:00
Bob Wilson
22f18a7e94 Add ARM patterns to match EXTRACT_SUBVECTOR nodes.
Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.

The test changes are needed to keep those spill-q tests from testing aligned
spills and restores.  If the only aligned stack objects are spill slots, we
no longer realign the stack frame.  Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.

llvm-svn: 122995
2011-01-07 04:59:04 +00:00
Duncan Sands
06444485ee Fix the other problem reported in PR8582. Testcase and patch by
Nadav Rotem.

llvm-svn: 122983
2011-01-06 23:45:22 +00:00
Duncan Sands
883391f3f2 Add a testcase for PR8582, which mysteriously fixed itself, in case the problem
comes back some day.

llvm-svn: 122982
2011-01-06 23:04:29 +00:00
Bob Wilson
461eb28678 PR8921: LDM/POP do not support interworking prior to v5t.
llvm-svn: 122970
2011-01-06 19:24:41 +00:00
Rafael Espindola
64814fff0b Correctly disassemble truncated asm.
Patch by Richard Simth.

llvm-svn: 122962
2011-01-06 16:48:42 +00:00
Benjamin Kramer
fb2bb22b6f InstCombine: Turn _chk functions into the "unsafe" variant if length and max langth are equal.
This happens when we take the (non-constant) length from a malloc.

llvm-svn: 122961
2011-01-06 14:22:52 +00:00
Benjamin Kramer
5834b2bab8 InstCombine: If we call llvm.objectsize on a malloc call we can replace it with the size passed to malloc.
llvm-svn: 122959
2011-01-06 13:11:05 +00:00
Benjamin Kramer
d5e1c24646 InstCombine: Teach llvm.objectsize folding to look through GEPs.
llvm-svn: 122958
2011-01-06 13:07:49 +00:00
Evan Cheng
1a1771584e Use movups to lower memcpy and memset even if it's not fast (like corei7).
The theory is it's still faster than a pair of movq / a quad of movl. This
will probably hurt older chips like P4 but should run faster on current
and future Intel processors. rdar://8817010

llvm-svn: 122955
2011-01-06 07:58:36 +00:00
Evan Cheng
cb39cc2164 Re-implement r122936 with proper target hooks. Now getMaxStoresPerMemcpy
etc. takes an option OptSize. If OptSize is true, it would return
the inline limit for functions with attribute OptSize.

llvm-svn: 122952
2011-01-06 06:52:41 +00:00
Chris Lattner
83067bc3e7 implement constant folding support for an exotic constant expr:
ret i64 ptrtoint (i8* getelementptr ([1000 x i8]* @X, i64 1, i64 sub (i64 0, i64 ptrtoint ([1000 x i8]* @X to i64))) to i64)

to "ret i64 1000".  This allows us to correctly compute the trip count
on a loop in PR8883, which occurs with std::fill on a char array.  This
allows us to transform it into a memset with a constant size.

llvm-svn: 122950
2011-01-06 06:19:46 +00:00
Evan Cheng
70711ea54d Revert r122936. I'll re-implement the change.
llvm-svn: 122949
2011-01-06 06:17:53 +00:00
Bill Wendling
b3bf7cd562 Fix test to coincide with r122934 change from PR8919.
llvm-svn: 122937
2011-01-06 01:09:35 +00:00
Evan Cheng
d425aa5d2a r105228 reduced the memcpy / memset inline limit to 4 with -Os to avoid blowing
up freebsd bootloader. However, this doesn't make much sense for Darwin, whose
-Os is meant to optimize for size only if it doesn't hurt performance.
rdar://8821501

llvm-svn: 122936
2011-01-06 01:04:47 +00:00
Evan Cheng
2af40ae781 Avoid zero extend bit test operands to pointer type if all the masks fit in
the original type of the switch statement key.
rdar://8781238

llvm-svn: 122935
2011-01-06 01:02:44 +00:00
Evan Cheng
bf92316fab Optimize:
r1025 = s/zext r1024, 4
  r1026 = extract_subreg r1025, 4
to:
  r1026 = copy r1024

llvm-svn: 122925
2011-01-05 23:06:49 +00:00
Chris Lattner
3ef9db5cd4 fix PR8900, a shuffle miscompilation. Patch by Nadav Rotem!
llvm-svn: 122921
2011-01-05 22:28:46 +00:00
Frits van Bommel
4c9ed7d055 Fix lit for people whose LLVM path contains 'opt', which is a common directory name on Unix-like systems.
llvm-svn: 122873
2011-01-05 15:10:24 +00:00
Chris Lattner
dbb1b09731 fix an off-by-one bug that caused a crash analyzing
ashr's with huge shift amounts, PR8896

llvm-svn: 122814
2011-01-04 18:19:15 +00:00
Tobias Grosser
e8df8a7c39 Include llvm-gcc dir before llvm_tools_dir
This ensures that always the recently compiled tools are picked for testing.

llvm-svn: 122810
2011-01-04 16:01:17 +00:00
Chris Lattner
1f58120bfe Teach loop-idiom to turn a loop containing a memset into a larger memset
when safe.

The testcase is basically this nested loop:
void foo(char *X) {
  for (int i = 0; i != 100; ++i) 
    for (int j = 0; j != 100; ++j)
      X[j+i*100] = 0;
}

which gets turned into a single memset now.  clang -O3 doesn't optimize
this yet though due to a phase ordering issue I haven't analyzed yet.

llvm-svn: 122806
2011-01-04 07:46:33 +00:00
David Greene
3926d82c53 Don't pattern match "/clang" so we don't mangle directory names. Some
tests use absolute paths to clang.

llvm-svn: 122796
2011-01-04 01:05:30 +00:00
Evan Cheng
5f20fa7fed Convert MC tests to .s so codegen changes won't break them.
llvm-svn: 122786
2011-01-03 23:47:14 +00:00
Chris Lattner
cd13979300 Duncan deftly points out that readnone functions aren't
invalidated by stores, so they can be handled as 'simple'
operations.

llvm-svn: 122785
2011-01-03 23:38:13 +00:00
Evan Cheng
25f7df1bce Use pushq / popq instead of subq $8, %rsp / addq $8, %rsp to adjust stack in
prologue and epilogue if the adjustment is 8. Similarly, use pushl / popl if
the adjustment is 4 in 32-bit mode.

In the epilogue, takes care to pop to a caller-saved register that's not live
at the exit (either return or tailcall instruction).
rdar://8771137

llvm-svn: 122783
2011-01-03 22:53:22 +00:00
David Greene
73d8d1dc69 Don't pattern match "clang-" as it may be part of a tool name with a
triple suffix.

llvm-svn: 122779
2011-01-03 21:55:08 +00:00
Chris Lattner
a7735a573d fix rdar://8813415 - a miscompilation of 164.gzip that loop-idiom
exposed.  It turns out to be a latent bug in basicaa, scary.

llvm-svn: 122772
2011-01-03 21:03:33 +00:00
Chris Lattner
76b74870be filecheckize
llvm-svn: 122771
2011-01-03 21:01:26 +00:00
David Greene
124695aed0 Reapply 122341 to fix PR8199 now that clang changes are in.
llvm-svn: 122754
2011-01-03 17:30:25 +00:00
Chris Lattner
c1ebe702b1 earlycse can do trivial with-a-block dead store
elimination as well.  This deletes 60 stores in 176.gcc
that largely come from bitfield code.

llvm-svn: 122736
2011-01-03 04:17:24 +00:00
Chris Lattner
d19ae32f2f now that loads are in their own table, we can implement
store->load forwarding.  This allows EarlyCSE to zap 600 more
loads from 176.gcc.

llvm-svn: 122732
2011-01-03 03:46:34 +00:00
Chris Lattner
57b02a342e add a testcase for readonly call CSE
llvm-svn: 122730
2011-01-03 03:33:47 +00:00
Chris Lattner
4cfdaa3f02 Teach EarlyCSE to do trivial CSE of loads and read-only calls.
On 176.gcc, this catches 13090 loads and calls, and increases the
number of simple instructions CSE'd from 29658 to 36208.

llvm-svn: 122727
2011-01-03 03:18:43 +00:00
Chris Lattner
39d1fb3320 add DEBUG and -stats output to earlycse.
Teach it to CSE the rest of the non-side-effecting instructions.

llvm-svn: 122716
2011-01-02 23:19:45 +00:00
Chris Lattner
dbad0b5e40 Enhance earlycse to do CSE of casts, instsimplify and die.
Add a testcase.

llvm-svn: 122715
2011-01-02 23:04:14 +00:00
Chris Lattner
0c2cfcf430 fix a miscompilation of tramp3d-v4: when forming a memcpy, we have to make
sure that the loop we're promoting into a memcpy doesn't mutate the input
of the memcpy.  Before we were just checking that the dest of the memcpy
wasn't mod/ref'd by the loop.

llvm-svn: 122712
2011-01-02 21:14:18 +00:00
Chris Lattner
c655681b32 If a loop iterates exactly once (has backedge count = 0) then don't
mess with it.  We'd rather peel/unroll it than convert all of its 
stores into memsets.

llvm-svn: 122711
2011-01-02 20:24:21 +00:00
Benjamin Kramer
a58b69aa9d Try to reuse the value when lowering memset.
This allows us to compile:
  void test(char *s, int a) {
    __builtin_memset(s, a, 15);
  }
into 1 mul + 3 stores instead of 3 muls + 3 stores.

llvm-svn: 122710
2011-01-02 19:57:05 +00:00
Benjamin Kramer
38491f47ce Lower the i8 extension in memset to a multiply instead of a potentially long series of shifts and ors.
We could implement a DAGCombine to turn x * 0x0101 back into logic operations
on targets that doesn't support the multiply or it is slow (p4) if someone cares
enough.

Example code:
  void test(char *s, int a) {
      __builtin_memset(s, a, 4);
  }
before:
  _test:                                  ## @test
    movzbl  8(%esp), %eax
    movl  %eax, %ecx
    shll  $8, %ecx
    orl %eax, %ecx
    movl  %ecx, %eax
    shll  $16, %eax
    orl %ecx, %eax
    movl  4(%esp), %ecx
    movl  %eax, 4(%ecx)
    movl  %eax, (%ecx)
    ret
after:
  _test:                                  ## @test
    movzbl  8(%esp), %eax
    imull $16843009, %eax, %eax   ## imm = 0x1010101
    movl  4(%esp), %ecx
    movl  %eax, 4(%ecx)
    movl  %eax, (%ecx)
    ret

llvm-svn: 122707
2011-01-02 19:44:58 +00:00
Chris Lattner
c78e4bc366 enhance loop idiom recognition to scan *all* unconditionally executed
blocks in a loop, instead of just the header block.  This makes it more
aggressive, able to handle Duncan's Ada examples.

llvm-svn: 122704
2011-01-02 19:01:03 +00:00
Duncan Sands
2d1c116071 Fix PR8702 by not having LoopSimplify claim to preserve LCSSA form. As described
in the PR, the pass could break LCSSA form when inserting preheaders.  It probably
would be easy enough to fix this, but since currently we always go into LCSSA form
after running this pass, doing so is not urgent.

llvm-svn: 122695
2011-01-02 13:38:21 +00:00
Chris Lattner
f669d6a901 Allow loop-idiom to run on multiple BB loops, but still only scan the loop
header for now for memset/memcpy opportunities.  It turns out that loop-rotate
is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for 
loops" into 2 basic block loops that loop-idiom was ignoring.

With this fix, we form many *many* more memcpy and memsets than before, including
on the "history" loops in the viterbi benchmark, which look like this:

        for (j=0; j<MAX_history; ++j) {
          history_new[i][j+1] = history[2*i][j];
        }

Transforming these loops into memcpy's speeds up the viterbi benchmark from
11.98s to 3.55s on my machine.  Woo.

llvm-svn: 122685
2011-01-02 07:58:36 +00:00
Chris Lattner
34a61ab676 teach loop idiom recognition to form memcpy's from simple loops.
llvm-svn: 122678
2011-01-02 03:37:56 +00:00
Chris Lattner
fda382af51 fix a globalopt crash on two Adobe-C++ testcases that the recent
loop idiom pass exposed.

llvm-svn: 122674
2011-01-01 22:31:46 +00:00
Rafael Espindola
2600dec19b Fix darwin bots.
llvm-svn: 122672
2011-01-01 21:58:41 +00:00
Rafael Espindola
55f7a5057d Add support for the 'H' modifier.
llvm-svn: 122667
2011-01-01 20:58:46 +00:00
Anton Korobeynikov
c3dd871e95 Update the test
llvm-svn: 122666
2011-01-01 20:57:26 +00:00
Chris Lattner
b9c1684fce add a validity check that was missed, fixing a crash on the
new testcase.

llvm-svn: 122662
2011-01-01 20:12:04 +00:00
Duncan Sands
aaddf57af9 Revert commit 122654 at the request of Chris, who reckons that instsimplify
is the wrong hammer for this nail, and is probably right.

llvm-svn: 122661
2011-01-01 20:08:02 +00:00
Chris Lattner
9a9f43c4a2 improve validity check to handle constant-trip-count loops more
aggressively.  In practice, this doesn't help anything though,
see the todo.

llvm-svn: 122660
2011-01-01 19:54:22 +00:00
Chris Lattner
4651f8b037 implement the "no aliasing accesses in loop" safety check. This pass
should be correct now.

llvm-svn: 122659
2011-01-01 19:39:01 +00:00
Rafael Espindola
e223c0aa14 Fix PR8878.
llvm-svn: 122658
2011-01-01 19:05:35 +00:00
Duncan Sands
ec8b2b4cc5 Fix a README item by having InstructionSimplify do a mild form of value
numbering, in which it considers (for example) "%a = add i32 %x, %y" and
"%b = add i32 %x, %y" to be equal because the operands are equal and the
result of the instructions only depends on the values of the operands.
This has almost no effect (it removes 4 instructions from gcc-as-one-file),
and perhaps slows down compilation: I measured a 0.4% slowdown on the large
gcc-as-one-file testcase, but it wasn't statistically significant.

llvm-svn: 122654
2011-01-01 16:12:09 +00:00
Che-Liang Chiou
a188cfb574 ptx: remove reg-reg addressing mode and st.const
llvm-svn: 122653
2011-01-01 11:58:58 +00:00
Che-Liang Chiou
995a853724 ptx: add store instruction
llvm-svn: 122652
2011-01-01 10:50:37 +00:00
Nick Lewycky
5cb84ee2cf Add another non-commutable instruction that gas accepts commuted forms for.
Fixes PR8861.

llvm-svn: 122641
2010-12-30 22:10:49 +00:00
Che-Liang Chiou
f4aaa1b2e1 ptx: add state spaces
llvm-svn: 122638
2010-12-30 10:41:27 +00:00
Daniel Dunbar
9ee74282a6 MC/Mach-O/Thumb: Set the thumb bit in the symbol table.
llvm-svn: 122630
2010-12-29 14:14:06 +00:00
Rafael Espindola
7b1b3f5d82 Correctly encode pcrel|indirect.
llvm-svn: 122624
2010-12-29 04:31:26 +00:00
NAKAMURA Takumi
68063487b1 test/Transforms/ConstProp/logicaltest.ll: FileCheck-ize.
llvm-svn: 122620
2010-12-29 03:58:56 +00:00
NAKAMURA Takumi
e16c40416e test/CodeGen/X86/negative-sin.ll: FileCheck-ize.
llvm-svn: 122619
2010-12-29 03:58:47 +00:00
NAKAMURA Takumi
d66bcf9ada test/CodeGen/X86/fp-in-intregs.ll: FileCheck-ize.
llvm-svn: 122618
2010-12-29 03:58:36 +00:00
Rafael Espindola
0408a378e9 Fix bug when trying to output uint16_t or uint32_t.
llvm-svn: 122615
2010-12-29 02:30:49 +00:00
Rafael Espindola
d51ed1fc6a Implement cfi_def_cfa. Also don't convert to dwarf reg numbers twice. Looks
like 6 is a fixed point of that and so the previous tests were OK :-)

llvm-svn: 122614
2010-12-29 01:42:56 +00:00
Rafael Espindola
257de5c4a2 Implement cfi_def_cfa_register.
llvm-svn: 122612
2010-12-29 00:26:06 +00:00
Rafael Espindola
061209eaf3 Initial .cfi_offset implementation.
llvm-svn: 122611
2010-12-29 00:09:59 +00:00
Rafael Espindola
d93ec3572d Don't produce a "DW_CFA_advance_loc 0".
llvm-svn: 122609
2010-12-28 23:38:03 +00:00
Rafael Espindola
3fdd045643 Implement .cfi_remember_state and .cfi_restore_state.
llvm-svn: 122602
2010-12-28 18:36:23 +00:00
Rafael Espindola
c97d642bf7 Relax address updates in the eh_frame section.
llvm-svn: 122591
2010-12-28 05:39:27 +00:00
Rafael Espindola
0552cb0638 Start adding basic support for emitting the call frame instructions.
llvm-svn: 122590
2010-12-28 04:15:37 +00:00
Rafael Espindola
12c30aed07 Add support for .cfi_lsda.
llvm-svn: 122584
2010-12-27 15:56:22 +00:00
Daniel Dunbar
2d0cf8e149 MC/Mach-O/Thumb: Select appropriate relocation types for Thumb.
llvm-svn: 122583
2010-12-27 14:49:49 +00:00
Rafael Espindola
7f947794d7 Handle reloc_riprel_4byte_movq_load. Should make the bots happy.
llvm-svn: 122579
2010-12-27 02:03:24 +00:00
Rafael Espindola
e7e67fce10 Add support for the same encodings of the personality function that gnu as
supports.

llvm-svn: 122577
2010-12-27 00:36:05 +00:00
Chris Lattner
d4daf9f002 implement enough of the memset inference algorithm to recognize and insert
memsets.  This is still missing one important validity check, but this is enough
to compile stuff like this:

void test0(std::vector<char> &X) {
  for (std::vector<char>::iterator I = X.begin(), E = X.end(); I != E; ++I)
    *I = 0;
}

void test1(std::vector<int> &X) {
  for (long i = 0, e = X.size(); i != e; ++i)
    X[i] = 0x01010101;
}

With:
 $ clang t.cpp -S -o - -O2 -emit-llvm | opt -loop-idiom | opt -O3 | llc 

to:

__Z5test0RSt6vectorIcSaIcEE:            ## @_Z5test0RSt6vectorIcSaIcEE
## BB#0:                                ## %entry
	subq	$8, %rsp
	movq	(%rdi), %rax
	movq	8(%rdi), %rsi
	cmpq	%rsi, %rax
	je	LBB0_2
## BB#1:                                ## %bb.nph
	subq	%rax, %rsi
	movq	%rax, %rdi
	callq	___bzero
LBB0_2:                                 ## %for.end
	addq	$8, %rsp
	ret
...
__Z5test1RSt6vectorIiSaIiEE:            ## @_Z5test1RSt6vectorIiSaIiEE
## BB#0:                                ## %entry
	subq	$8, %rsp
	movq	(%rdi), %rax
	movq	8(%rdi), %rdx
	subq	%rax, %rdx
	cmpq	$4, %rdx
	jb	LBB1_2
## BB#1:                                ## %for.body.preheader
	andq	$-4, %rdx
	movl	$1, %esi
	movq	%rax, %rdi
	callq	_memset
LBB1_2:                                 ## %for.end
	addq	$8, %rsp
	ret

llvm-svn: 122573
2010-12-26 23:42:51 +00:00
Chris Lattner
9007b56712 start using irbuilder to make mem intrinsics in a few passes.
llvm-svn: 122572
2010-12-26 22:57:41 +00:00
Rafael Espindola
2ebe553431 Add support for @note. Patch by Jörg Sonnenberger.
llvm-svn: 122568
2010-12-26 21:30:59 +00:00
Rafael Espindola
99f1527316 Add basic support for .cfi_personality.
llvm-svn: 122566
2010-12-26 20:20:31 +00:00
Chris Lattner
2129ce0891 Generalize a previous change, fixing PR8855 - an valid large immediate
rejected by the mc assembler.

llvm-svn: 122557
2010-12-25 21:36:35 +00:00
Benjamin Kramer
49e40d4c4b MemCpyOpt: Turn memcpys from a constant into a memset if possible.
This allows us to compile "int cst[] = {-1, -1, -1};" into
  movl  $-1, 16(%rsp)
  movq  $-1, 8(%rsp)
instead of
  movl  _cst+8(%rip), %eax
  movl  %eax, 16(%rsp)
  movq  _cst(%rip), %rax
  movq  %rax, 8(%rsp)

llvm-svn: 122548
2010-12-24 21:17:12 +00:00
Daniel Dunbar
592854a10a MC/Mach-O/ARM: Start handling some Thumb branches.
llvm-svn: 122547
2010-12-24 16:41:46 +00:00
Kevin Enderby
ff7e68c5e7 In llvm-mc parse a Hash token as a full line comment. Allows handling of
preprocessed .s files and matches darwin gas.  rdar://8798690
Also fix a comment on the next line of AsmParser.cpp after this new code.

llvm-svn: 122531
2010-12-24 00:12:02 +00:00
Owen Anderson
6afd90810e When determining if we can fold (x >> C1) << C2, the bits that we need to verify are zero
are not the low bits of x, but the bits that WILL be the low bits after the operation completes.

llvm-svn: 122529
2010-12-23 23:56:24 +00:00
Bob Wilson
85dbc89f44 Radar 8803471: Fix expansion of ARM BCCi64 pseudo instructions.
If the basic block containing the BCCi64 (or BCCZi64) instruction ends with
an unconditional branch, that branch needs to be deleted before appending
the expansion of the BCCi64 to the end of the block.

llvm-svn: 122521
2010-12-23 22:45:49 +00:00
Torok Edwin
cfd30fdf42 XFAIL vg_leak the new test as the rest.
llvm-svn: 122517
2010-12-23 21:22:09 +00:00
Torok Edwin
2acdd67db2 Fix OCaml bindings crash, PR8847.
See http://caml.inria.fr/mantis/view.php?id=4166
If we call only external functions from a module, then its 'let _' bindings
don't get executed, which means that the exceptions don't get registered for use
in the C code.
This in turn causes llvm_raise to call raise_with_arg() with a NULL pointer and
cause a segmentation fault.

The workaround is to declare all 'external' functions as 'val' in these .mli
files.

Also added a separate testcase (the testcase must call only external functions
for the bug to occur).

llvm-svn: 122497
2010-12-23 15:49:26 +00:00
Andrew Trick
cc701bcfdc Fixes PR8823: add-with-overflow-128.ll
In the bottom-up selection DAG scheduling, handle two-address
instructions that read/write unspillable registers. Treat
the entire chain of two-address nodes as a single live range.

llvm-svn: 122472
2010-12-23 03:15:51 +00:00
Benjamin Kramer
49942a90b7 DAGCombine add (sext i1), X into sub X, (zext i1) if sext from i1 is illegal. The latter usually compiles into smaller code.
example code:
unsigned foo(unsigned x, unsigned y) {
  if (x != 0) y--;
  return y;
}

before:
  _foo:                           ## @foo
    cmpl  $1, 4(%esp)             ## encoding: [0x83,0x7c,0x24,0x04,0x01]
    sbbl  %eax, %eax              ## encoding: [0x19,0xc0]
    notl  %eax                    ## encoding: [0xf7,0xd0]
    addl  8(%esp), %eax           ## encoding: [0x03,0x44,0x24,0x08]
    ret                           ## encoding: [0xc3]

after:
  _foo:                           ## @foo
    cmpl  $1, 4(%esp)             ## encoding: [0x83,0x7c,0x24,0x04,0x01]
    movl  8(%esp), %eax           ## encoding: [0x8b,0x44,0x24,0x08]
    adcl  $-1, %eax               ## encoding: [0x83,0xd0,0xff]
    ret                           ## encoding: [0xc3]

llvm-svn: 122455
2010-12-22 23:17:45 +00:00
Benjamin Kramer
27d13684f5 InstCombine: creating selects from -1 and 0 is fine, they combine into a sext from i1.
llvm-svn: 122453
2010-12-22 23:12:15 +00:00
Benjamin Kramer
d8387aa9bd X86: Lower a select directly to a setcc_carry if possible.
int test(unsigned long a, unsigned long b) { return -(a < b); }
compiles to
  _test:                              ## @test
    cmpq  %rsi, %rdi                  ## encoding: [0x48,0x39,0xf7]
    sbbl  %eax, %eax                  ## encoding: [0x19,0xc0]
    ret                               ## encoding: [0xc3]
instead of
  _test:                              ## @test
    xorl  %ecx, %ecx                  ## encoding: [0x31,0xc9]
    cmpq  %rsi, %rdi                  ## encoding: [0x48,0x39,0xf7]
    movl  $-1, %eax                   ## encoding: [0xb8,0xff,0xff,0xff,0xff]
    cmovael %ecx, %eax                ## encoding: [0x0f,0x43,0xc1]
    ret                               ## encoding: [0xc3]

llvm-svn: 122451
2010-12-22 23:09:28 +00:00
Daniel Dunbar
e6ec0e7149 MC/Mach-O/ARM: Don't try to use scattered relocs for BR24 fixups.
llvm-svn: 122441
2010-12-22 21:26:43 +00:00
Rafael Espindola
5004de4d8b Add reduced test from 8845.
llvm-svn: 122438
2010-12-22 21:15:13 +00:00
Duncan Sands
68d969c2f5 When determining whether the new instruction was already present in
the original instruction, half the cases were missed (making it not
wrong but suboptimal).  Also correct a typo (A <-> B) in the second
chunk. 

llvm-svn: 122414
2010-12-22 17:15:25 +00:00
Duncan Sands
e1522867e6 Make this test not depend on how the variable is named.
llvm-svn: 122413
2010-12-22 17:08:04 +00:00
Daniel Dunbar
cb8ac619a2 MC/Mach-O/ARM: We always use the SECTDIFF reloc type on ARM, which is
esp. important given that the LOCAL_SECTDIFF enumeration got redefined.

llvm-svn: 122412
2010-12-22 16:52:19 +00:00
Daniel Dunbar
e44a2c1166 MC/Mach-O/ARM: Add enough relocation logic to get BR24 relocations.
llvm-svn: 122407
2010-12-22 16:19:24 +00:00
Rafael Espindola
7c995a90fc Simplify the handling of .size expressions.
llvm-svn: 122404
2010-12-22 16:03:00 +00:00
Duncan Sands
922251757b Add a generic expansion transform: A op (B op' C) -> (A op B) op' (A op C)
if both A op B and A op C simplify.  This fires fairly often but doesn't
make that much difference.  On gcc-as-one-file it removes two "and"s and
turns one branch into a select.

llvm-svn: 122399
2010-12-22 13:36:08 +00:00
Che-Liang Chiou
e73ad4387e ptx: add ld instruction and test
llvm-svn: 122398
2010-12-22 10:38:51 +00:00
Chris Lattner
04ef853e23 Fix a bug in ReduceLoadWidth that wasn't handling extending
loads properly.  We miscompiled the testcase into:

_test:                                  ## @test
	movl	$128, (%rdi)
	movzbl	1(%rdi), %eax
	ret

Now we get a proper:

_test:                                  ## @test
	movl	$128, (%rdi)
	movsbl	(%rdi), %eax
	movzbl	%ah, %eax
	ret

This fixes PR8757.

llvm-svn: 122392
2010-12-22 08:02:57 +00:00
Owen Anderson
b4f1511864 Give GVN back the ability to perform simple conditional propagation on conditional branch values.
I still think that LVI should be handling this, but that capability is some ways off in the future,
and this matters for some significant benchmarks.

llvm-svn: 122378
2010-12-21 23:54:34 +00:00
Dale Johannesen
e0fb87c3d7 Reapply 122353-122355 with fixes. 122354 was wrong;
the shift type was needed one place, the shift count
type another.  The transform in 123555 had the same
problem.

llvm-svn: 122366
2010-12-21 21:55:50 +00:00
Benjamin Kramer
369872edfc Add some x86 specific dagcombines for conditional increments.
(add Y, (sete  X, 0)) -> cmp X, 1; adc  0, Y
(add Y, (setne X, 0)) -> cmp X, 1; sbb -1, Y
(sub (sete  X, 0), Y) -> cmp X, 1; sbb  0, Y
(sub (setne X, 0), Y) -> cmp X, 1; adc -1, Y

for
  unsigned foo(unsigned a, unsigned b) {
    if (a == 0) b++;
    return b;
  }
we now get:
  foo:
    cmpl  $1, %edi
    movl  %esi, %eax
    adcl  $0, %eax
    ret
instead of:
  foo:
    testl %edi, %edi
    sete  %al
    movzbl  %al, %eax
    addl  %esi, %eax
    ret

llvm-svn: 122364
2010-12-21 21:41:44 +00:00
Dale Johannesen
972aba543a Revert 122353-122355 for the moment, they broke stuff.
llvm-svn: 122360
2010-12-21 21:22:27 +00:00
Dale Johannesen
39186cfb0b Add a new transform to DAGCombiner.
llvm-svn: 122355
2010-12-21 20:10:51 +00:00
Dale Johannesen
5f3e7b08f6 Get the type of a shift from the shift, not from its shift
count operand.  These should be the same but apparently are
not always, and this is cleaner anyway.  This improves the
code in an existing test.

llvm-svn: 122354
2010-12-21 20:06:19 +00:00
David Greene
33a91c0c9a Revert 122341. It breaks some darwin tests.
llvm-svn: 122346
2010-12-21 17:25:43 +00:00
David Greene
28140b5288 Fix PR 8199. This patch prepends the build tool dir to LLVM programs
being tested.  This ensures that we test the tools just built and not
some random tools that might happen to be in the user's PATH.  This
makes LLVM testing much more stable and predictable.

llvm-svn: 122341
2010-12-21 16:55:53 +00:00
Duncan Sands
658dd68e10 Add an additional InstructionSimplify factorization test.
llvm-svn: 122333
2010-12-21 15:12:22 +00:00
Duncan Sands
b4497c7e0f While I don't think any later transforms can fire, it seems cleaner to
not assume this (for example in case more transforms get added below
it).  Suggested by Frits van Bommel.

llvm-svn: 122332
2010-12-21 15:03:43 +00:00
Duncan Sands
3ceeaf218e Fix typo in comment, spotted by Deewiant.
llvm-svn: 122329
2010-12-21 13:39:20 +00:00
Duncan Sands
0bd25425b6 Teach InstructionSimplify about distributive laws. These transforms fire
quite often, but don't make much difference in practice presumably because
instcombine also knows them and more.

llvm-svn: 122328
2010-12-21 13:32:22 +00:00
Duncan Sands
5880f299da Add generic simplification of associative operations, generalizing
a couple of existing transforms.  This fires surprisingly often, for
example when compiling gcc "(X+(-1))+1->X" fires quite a lot as well
as various "and" simplifications (usually with a phi node operand).
Most of the time this doesn't make a real difference since the same
thing would have been done elsewhere anyway, eg: by instcombine, but
there are a few places where this results in simplifications that we
were not doing before.

llvm-svn: 122326
2010-12-21 08:49:00 +00:00
Bob Wilson
01593c55a2 Add ARM-specific DAG combining to cast i64 vector element load/stores to f64.
Type legalization splits up i64 values into pairs of i32 values, which leads
to poor quality code when inserting or extracting i64 vector elements.
If the vector element is loaded or stored, it can be treated as an f64 value
and loaded or stored directly from a VPR register.  Use the pre-legalization
DAG combiner to cast those vector elements to f64 types so that the type
legalizer won't mess them up.  Radar 8755338.

llvm-svn: 122319
2010-12-21 06:43:19 +00:00
Wesley Peck
e8ec7a4d1f Teach the MBlaze disassembler to disassemble special purpose registers.
llvm-svn: 122269
2010-12-20 21:18:04 +00:00
Roman Divacky
42b3eee794 Set the value of absolute symbols.
llvm-svn: 122268
2010-12-20 21:14:39 +00:00
Roman Divacky
13b5260f62 Print all 64bits for st_value and st_size. Adjust tests accordingly.
llvm-svn: 122263
2010-12-20 20:49:43 +00:00
Wesley Peck
af2890a051 Teach the MBlaze asm parser how to parse special purpose register names.
llvm-svn: 122261
2010-12-20 20:43:24 +00:00
Dale Johannesen
036c3da142 Cosmetic changes.
llvm-svn: 122259
2010-12-20 20:10:50 +00:00
Benjamin Kramer
bec7a6be15 Teach InstCombine to merge (icmp ult (X + CA), C1) | (icmp eq X, C2) into (icmp ult (X + CA), C1 + 1) if C2 + CA == C1.
InstCombine creates these so now we compile x == 23 || x == 24 || x == 25 to
  %x.off = add i32 %x, -23
  %1 = icmp ult i32 %x.off, 3
instead of
  %x.off = add i32 %x, -23
  %1 = icmp ult i32 %x.off, 2
  %cmp3 = icmp eq i32 %x, 25
  %ret2 = or i1 %1, %cmp3

llvm-svn: 122248
2010-12-20 16:18:51 +00:00
Duncan Sands
f72cfa961d Have SimplifyBinOp dispatch Xor, Add and Sub to the corresponding methods
(they had just been forgotten before).  Adding Xor causes "main" in the
existing testcase 2010-11-01-lshr-mask.ll to be hugely more simplified.

llvm-svn: 122245
2010-12-20 14:47:04 +00:00
Chris Lattner
b27b5d0a3a fix PR8807 by making transformConstExprCastCall aware of byval arguments.
llvm-svn: 122238
2010-12-20 08:36:38 +00:00
Chris Lattner
ba962825a4 when eliding a byval copy due to inlining a readonly function, we have
to make sure that the reused alloca has sufficient alignment.

llvm-svn: 122236
2010-12-20 08:10:40 +00:00
Chris Lattner
c0a48df9f9 pull byval processing out to its own helper function.
llvm-svn: 122235
2010-12-20 07:57:41 +00:00
Chris Lattner
029952c844 fix PR8769, a miscompilation by inliner when inlining a function with a byval
argument.  The generated alloca has to have at least the alignment of the
byval, if not, the client may be making assumptions that the new alloca won't
satisfy.

llvm-svn: 122234
2010-12-20 07:45:28 +00:00
Chris Lattner
52149d6e21 merge two tests.
llvm-svn: 122233
2010-12-20 07:39:57 +00:00
Chris Lattner
2fa128c4c5 filecheckize
llvm-svn: 122232
2010-12-20 07:38:24 +00:00
Chris Lattner
4c3662e299 temporarily disable this: PR8823.
llvm-svn: 122222
2010-12-20 02:11:23 +00:00
Chris Lattner
bee7320c3c now that addc/adde are gone, "ADDC" in the X86 backend uses EFLAGS results,
the same as setcc.  Optimize ADDC(0,0,FLAGS) -> SET_CARRY(FLAGS).  This is
a step towards finishing off PR5443.  In the testcase in that bug we now  get:

	movq	%rdi, %rax
	addq	%rsi, %rax
	sbbq	%rcx, %rcx
	testb	$1, %cl
	setne	%dl
	ret

instead of:

	movq	%rdi, %rax
	addq	%rsi, %rax
	movl	$0, %ecx
	adcq	$0, %rcx
	testq	%rcx, %rcx
	setne	%dl
	ret

llvm-svn: 122219
2010-12-20 01:37:09 +00:00
Chris Lattner
2d4e17d195 We lower setb to sbb with the hope that the and will go away, when it
doesn't, match it back to setb.

On a 64-bit version of the testcase before we'd get:

	movq	%rdi, %rax
	addq	%rsi, %rax
	sbbb	%dl, %dl
	andb	$1, %dl
	ret

now we get:

	movq	%rdi, %rax
	addq	%rsi, %rax
	setb	%dl
	ret

llvm-svn: 122217
2010-12-20 01:16:03 +00:00
Mon P Wang
236fa96503 Test case for r122215 when InstCombine optimizes memset
llvm-svn: 122216
2010-12-20 01:06:23 +00:00
Mon P Wang
666259546c Add comment for testcase for 122206
llvm-svn: 122210
2010-12-20 00:54:26 +00:00
Mon P Wang
d3adab7a64 Prevents PerformShuffleCombine from creating a node with an illegal type after legalize types
has run, e.g., prevent creating an i64 node from a v2i64 when i64 is not a legal type.

llvm-svn: 122206
2010-12-19 23:55:53 +00:00
Chris Lattner
297259f6f1 improve the setcc -> setcc_carry optimization to happen more
consistently by moving it out of lowering into dag combine.

Add some missing patterns for matching away extended versions of setcc_c.

llvm-svn: 122201
2010-12-19 22:08:31 +00:00
Chris Lattner
ad85635a93 now that generic vector types aren't selected onto MMX registers, these
tests don't need -disable-mmx.

llvm-svn: 122188
2010-12-19 20:12:58 +00:00
Chris Lattner
29475c23d0 X86 supports i8/i16 overflow ops (except i8 multiplies), we should
generate them.  

Now we compile:

define zeroext i8 @X(i8 signext %a, i8 signext %b) nounwind ssp {
entry:
  %0 = tail call %0 @llvm.sadd.with.overflow.i8(i8 %a, i8 %b)
  %cmp = extractvalue %0 %0, 1
  br i1 %cmp, label %if.then, label %if.end

into:

_X:                                     ## @X
## BB#0:                                ## %entry
	subl	$12, %esp
	movb	16(%esp), %al
	addb	20(%esp), %al
	jo	LBB0_2

Before we were generating:

_X:                                     ## @X
## BB#0:                                ## %entry
	pushl	%ebp
	movl	%esp, %ebp
	subl	$8, %esp
	movb	12(%ebp), %al
	testb	%al, %al
	setge	%cl
	movb	8(%ebp), %dl
	testb	%dl, %dl
	setge	%ah
	cmpb	%cl, %ah
	sete	%cl
	addb	%al, %dl
	testb	%dl, %dl
	setge	%al
	cmpb	%al, %ah
	setne	%al
	andb	%cl, %al
	testb	%al, %al
	jne	LBB0_2

llvm-svn: 122186
2010-12-19 20:03:11 +00:00
Chris Lattner
2d6a408c4c add a general coverage test for overflow intrinsics.
llvm-svn: 122185
2010-12-19 20:01:13 +00:00
Chris Lattner
3bc741a0d2 recognize an unsigned add with overflow idiom into uadd.
This resolves a README entry and technically resolves PR4916,
but we still get poor code for the testcase in that PR because
GVN isn't CSE'ing uadd with add, filed as PR8817.

Previously we got:

_test7:                                 ## @test7
	addq	%rsi, %rdi
	cmpq	%rdi, %rsi
	movl	$42, %eax
	cmovaq	%rsi, %rax
	ret

Now we get:

_test7:                                 ## @test7
	addq	%rsi, %rdi
	movl	$42, %eax
	cmovbq	%rsi, %rax
	ret

llvm-svn: 122182
2010-12-19 19:37:52 +00:00
Chris Lattner
faef9b6bfb optimize uadd(x, cst) into a comparison when the normal
result is dead.  This is required for my next patch to not
regress the testsuite.

llvm-svn: 122181
2010-12-19 19:35:32 +00:00
Chris Lattner
d1f114d8f2 generalize the sadd creation code to not require that the
sadd formed is half the size of the original type. We can
now compile this into a sadd.i8:

unsigned char X(char a, char b) {
  int res = a+b;
  if ((unsigned )(res+128) > 255U)
    abort();
  return res;
}

llvm-svn: 122178
2010-12-19 18:35:09 +00:00
Chris Lattner
bb0d067691 fix another miscompile in the llvm.sadd formation logic: it wasn't
checking to see if the high bits of the original add result were dead.
Inserting a smaller add and zexting back to that size is not good enough.

This is likely to be the fix for 8816.

llvm-svn: 122177
2010-12-19 18:22:06 +00:00
Chris Lattner
c7876edb16 fix a bug (possibly 8816) in the sadd forming xform: it isn't
profitable (or safe) to promote code when the add-with-constant
has other uses.

llvm-svn: 122175
2010-12-19 17:59:02 +00:00
Chris Lattner
bb93cd80d6 Enhance LICM to promote alias sets whose pointers themselves are stored,
which doesn't affect the memory address being promoted.

llvm-svn: 122172
2010-12-19 05:57:25 +00:00
Chris Lattner
71fcecf597 fix PR8602, a bug in an assertion: a volatile store *of* a pointer
does not make the alias set for that pointer volatile, just stores
*to* the pointer.

llvm-svn: 122171
2010-12-19 05:51:54 +00:00
Chris Lattner
ac82ea26da fix PR8642: if a critical edge has a PHI value that can trap,
isel is *required* to split the edge.  PHI values get evaluated
on the edge, not in their predecessor block.

llvm-svn: 122170
2010-12-19 04:58:57 +00:00
Chris Lattner
0965f3f76d revert r122164, I'm going to go with a different approach.
llvm-svn: 122168
2010-12-19 04:23:03 +00:00
Chris Lattner
14a3e26146 first step to fixing PR8642: don't fold away empty basic blocks
which have trapping constant exprs in them due to PHI nodes.
Eliminating them can cause the constant expr to be evalutated
on new paths if the input edges are critical.

llvm-svn: 122164
2010-12-19 03:02:34 +00:00
Chris Lattner
1cc35d2472 move this test into the ARM test so that it is only run when the arm backend
is enabled.

llvm-svn: 122163
2010-12-19 02:58:14 +00:00
Anton Korobeynikov
f49c9c02d6 Restore the behavior of frame lowering before my refactoring.
It turns out that ppc backend has really weird interdependencies
over different hooks and all stuff is fragile wrt small changes.
This should fix PR8749

llvm-svn: 122155
2010-12-18 19:53:14 +00:00
Benjamin Kramer
84d3e6cfd0 Just rename the functions, relying on matching a instruction that has the same name as a symbol is way too fragile.
llvm-svn: 122154
2010-12-18 14:23:57 +00:00
Benjamin Kramer
4d591385d1 Test more than just label names and make test work on non-x86 hosts.
llvm-svn: 122153
2010-12-18 14:07:28 +00:00
Roman Divacky
ed5bb14415 Add support for lexing single quotes like 'c'.
This fixed 8615.

llvm-svn: 122150
2010-12-18 08:56:37 +00:00
Rafael Espindola
4c1b83e53f Add a test that shows that we produce no fixups when computing the difference
of two symbols in the same fragment.

llvm-svn: 122145
2010-12-18 05:07:45 +00:00
Rafael Espindola
dff6c18a5e Test for push being relaxed.
llvm-svn: 122124
2010-12-18 01:16:59 +00:00
Bob Wilson
776d3f73eb Fix result type of Neon floating-point comparisons against zero.
The result vector elements are always integers.  Radar 8782191.

llvm-svn: 122112
2010-12-18 00:04:33 +00:00
Nate Begeman
063d88d6fb Add vector versions of some existing scalar transforms to aid codegen in matching psign & pblend operations to the IR produced by clang/gcc for their C idioms.
llvm-svn: 122105
2010-12-17 23:12:19 +00:00
Bill Wendling
c16f9b1ccc During local stack slot allocation, the materializeFrameBaseRegister function
may be called. If the entry block is empty, the insertion point iterator will be
the "end()" value. Calling ->getParent() on it (among others) causes problems.

Modify materializeFrameBaseRegister to take the machine basic block and insert
the frame base register at the beginning of that block. (It's very similar to
what the code does all ready. The only difference is that it will always insert
at the beginning of the entry block instead of after a previous materialization
of the frame base register. I doubt that that matters here.)

<rdar://problem/8782198>

llvm-svn: 122104
2010-12-17 23:09:14 +00:00
Bob Wilson
c57c2d755b Fix a DAGCombiner crash when folding binary vector operations with constant
BUILD_VECTOR operands where the element type is not legal.  I had previously
changed this code to insert TRUNCATE operations, but that was just wrong.

llvm-svn: 122102
2010-12-17 23:06:49 +00:00
Bob Wilson
347efe0b29 Combine several vector-related DAGCombiner tests.
llvm-svn: 122101
2010-12-17 23:06:46 +00:00
Nate Begeman
ef5f3c0fa7 Add support for matching psign & plendvb to the x86 target
Remove unnecessary pandn patterns, 'vnot' patfrag looks through bitcasts

llvm-svn: 122098
2010-12-17 22:55:37 +00:00
Dale Johannesen
c2c6ebd82a Add a transform to DAG Combiner. This improves the
code for the case where 32-bit divide by constant is
turned into 64-bit multiply by constant.  8771012.

llvm-svn: 122090
2010-12-17 21:45:49 +00:00
Owen Anderson
6acf8c9125 Reapply r121905 (automatic synthesis of @llvm.sadd.with.overflow) with a fix for a bug that manifested itself
on the DragonEgg self-host bot.  Unfortunately, the testcase is pretty messy and doesn't reduce well due to
interactions with other parts of InstCombine.

llvm-svn: 122072
2010-12-17 18:08:00 +00:00
Benjamin Kramer
39b30b18fa SimplifyCFG: Ranges can be larger than 64 bits. Fixes Release-selfhost build.
llvm-svn: 122054
2010-12-17 10:48:14 +00:00
Kalle Raiskila
68f221707a Don't feed 19 bit immediates to ILA.
Patch (slightly modified) by Visa Putkinen.

llvm-svn: 122052
2010-12-17 09:36:09 +00:00
Chris Lattner
e92f8121d4 improve switch formation to handle small range
comparisons formed by comparisons.  For example,
this:

void foo(unsigned x) {
  if (x == 0 || x == 1 || x == 3 || x == 4 || x == 6) 
    bar();
}

compiles into:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	cmpl	$6, %edi
	ja	LBB0_2
## BB#1:                                ## %entry
	movl	%edi, %eax
	movl	$91, %ecx
	btq	%rax, %rcx
	jb	LBB0_3

instead of:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	cmpl	$2, %edi
	jb	LBB0_4
## BB#1:                                ## %switch.early.test
	cmpl	$6, %edi
	ja	LBB0_3
## BB#2:                                ## %switch.early.test
	movl	%edi, %eax
	movl	$88, %ecx
	btq	%rax, %rcx
	jb	LBB0_4

This catches a bunch of cases in GCC, which look like this:

 %804 = load i32* @which_alternative, align 4, !tbaa !0
 %805 = icmp ult i32 %804, 2
 %806 = icmp eq i32 %804, 3
 %or.cond121 = or i1 %805, %806
 %807 = icmp eq i32 %804, 4
 %or.cond124 = or i1 %or.cond121, %807
 br i1 %or.cond124, label %.thread, label %808

turning this into a range comparison.

llvm-svn: 122045
2010-12-17 06:20:15 +00:00
Daniel Dunbar
1f9fd0b79b MC/Expr: Implemnt more aggressive folding during symbol evaluation using
IsSymbolRefDifferenceFullyResolved(). For example, we will now fold away
something like:
--
_a:
...
L0:
...
L1:
...
.long (L1 - L0) / 2
--

llvm-svn: 122043
2010-12-17 05:50:33 +00:00
Bob Wilson
e06f6eabe7 Fix crash compiling a QQQQ REG_SEQUENCE for a Neon vld3_lane operation.
Radar 8776599

llvm-svn: 122018
2010-12-17 01:21:12 +00:00
Dan Gohman
f8949c3d1a Revert r64460. strtol and friends cannot be marked readonly, even with
a null endptr argument, because they may write to errno.

This fixes a seflhost miscompile observed on Linux targets when TBAA
was enabled.

llvm-svn: 122014
2010-12-17 01:09:43 +00:00
Rafael Espindola
bd13ceed72 "Fix" FDE alignment to match what gas does.
llvm-svn: 122006
2010-12-17 00:28:02 +00:00
Rafael Espindola
3ee4530406 Make pushq produce signed relocations.
llvm-svn: 122005
2010-12-16 22:50:01 +00:00
Duncan Sands
22de496ae3 Speculatively revert commit 121905 since it looks like it might have broken the
dragonegg self-host buildbot.  Original commit message:

Add an InstCombine transform to recognize instances of manual overflow-safe addition
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics.  This currently only supports signed-addition, but could
be generalized if someone works out the magic constant formulas for other operations.

llvm-svn: 121965
2010-12-16 09:40:54 +00:00
Jason W Kim
2f4a8d7553 1. ARM/MC/ELF: A few more ELF relocs for .o
2. Fixed EmitLocalCommonSymbol for ELF (Yes, they exist. :)
   Test added.

llvm-svn: 121951
2010-12-16 03:12:17 +00:00
Dan Gohman
29a260015a -enable-tbaa is on by default now.
llvm-svn: 121945
2010-12-16 02:53:48 +00:00
Dan Gohman
e106936414 Make memcpyopt TBAA-aware.
llvm-svn: 121944
2010-12-16 02:51:19 +00:00
Jason W Kim
529d762fff Fix elf-dump --dump-section-data for .bss section
llvm-svn: 121927
2010-12-16 00:15:10 +00:00
Dan Gohman
a2fd4f2e22 Preserve TBAA tags when doing load PRE.
llvm-svn: 121921
2010-12-15 23:53:55 +00:00
Jim Grosbach
84c2b29b58 Thumb1 had two patterns for the same load-from-constant-pool instruction.
Canonicalize on tLDRpci and remove tLDRcp.

llvm-svn: 121920
2010-12-15 23:52:36 +00:00
Eric Christopher
339499f8f3 Don't handle -arm-long-calls in fast isel for now.
llvm-svn: 121919
2010-12-15 23:47:29 +00:00
Owen Anderson
aefeb448a9 Add an InstCombine transform to recognize instances of manual overflow-safe addition
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics.  This currently only supports signed-addition, but could
be generalized if someone works out the magic constant formulas for other operations.

Fixes <rdar://problem/8558713>.

llvm-svn: 121905
2010-12-15 22:32:38 +00:00
Evan Cheng
68e1ed8752 Teach machine cse to commute instructions.
llvm-svn: 121903
2010-12-15 22:16:21 +00:00
Bob Wilson
438a9a1367 Add Neon VCVT instructions for f32 <-> f16 conversions.
Clang is now providing intrinsics for these and so we need to support them
in the backend.  Radar 8068427.

llvm-svn: 121902
2010-12-15 22:14:12 +00:00
Bob Wilson
1082705e72 Fix misspelled target triples in MC/ARM test commands.
llvm-svn: 121901
2010-12-15 22:14:01 +00:00
Wesley Peck
2a376c535e Lower the MBlaze target specific calling conventions for "interrupt_handler"
and "save_volatiles" correctly. This completes the custom calling convention
functionality changes for the MBlaze backend that were started in 121888.

llvm-svn: 121891
2010-12-15 20:27:28 +00:00
Duncan Sands
2699fb1072 Move Sub simplifications and additional Add simplifications out of
instcombine and into InstructionSimplify.

llvm-svn: 121861
2010-12-15 14:07:39 +00:00
Frits van Bommel
83b7c3773f Teach jump threading to "look through" a select when the branch direction of a terminator depends on it.
When it sees a promising select it now tries to figure out whether the condition of the select is known in any of the predecessors and if so it maps the operands appropriately.

llvm-svn: 121859
2010-12-15 09:51:20 +00:00
Rafael Espindola
94d026d157 Relax alignment fragments.
With this we don't need the EffectiveSize field anymore. Without that field
LayoutFragment only updates offsets and we don't need to invalidate the
current fragment when it is relaxed (only the ones following it).

This is also a very small improvement in the accuracy of the layout info as
we now use the after relaxation size immediately.

llvm-svn: 121857
2010-12-15 08:45:53 +00:00
Rafael Espindola
f55f520a7a Patch by David Meyer to avoid a O(N^2) behaviour when relaxing fragments.
Since we now don't update addresses so early, we might relax a bit more than
we need to. This is simillar to the issue in PR8467.

llvm-svn: 121856
2010-12-15 07:39:29 +00:00
Chris Lattner
81815cd4db take care of some todos, transforming [us]mul_lohi into
a wider mul if the wider mul is legal.

llvm-svn: 121848
2010-12-15 06:04:19 +00:00
Chris Lattner
3bec2e7d0d merge two tests
llvm-svn: 121847
2010-12-15 05:58:59 +00:00
Kevin Enderby
6b3ae489f8 Add some more MC tests for ARM arithmetic instructions that update or don't
update the condition codes.  These come from my test generator and are just
the ones that MC currently assembles correctly.

llvm-svn: 121830
2010-12-15 01:24:36 +00:00
Owen Anderson
de42e1136e Fix PR8790, another instance where unreachable code can cause instruction simplification to fail,
this case involve a select that simplifies to itself.

llvm-svn: 121817
2010-12-15 00:55:35 +00:00
Evan Cheng
7e96e67d98 Fix a minor bug in two-address pass. It was missing a commute opportunity.
regB = move RCX
regA = op regB, regC
RAX  = move regA
where both regB and regC are killed. If regB is constrainted to non-compatible
physical registers but regC is not constrainted at all, then it's better to
commute the instruction.
       movl    %edi, %eax
       shlq    $32, %rcx
       leaq    (%rcx,%rax), %rax
=>
       movl    %edi, %eax
       shlq    $32, %rcx
       orq     %rcx, %rax
rdar://8762995

llvm-svn: 121793
2010-12-14 21:34:53 +00:00
Daniel Dunbar
3f9b9dc852 MC/ARM: Fix-up fixup offset for fixup_arm_branch target specific fixup.
llvm-svn: 121772
2010-12-14 17:37:16 +00:00
Chris Lattner
c1aaf52608 - Insert new instructions before DomBlock's terminator,
which is simpler than finding a place to insert in BB.
 - Don't perform the 'if condition hoisting' xform on certain
   i1 PHIs, as it interferes with switch formation.

This re-fixes "example 7", without breaking the world hopefully.

llvm-svn: 121764
2010-12-14 08:46:09 +00:00
Chris Lattner
22d4dc5a4d fix two significant issues with FoldTwoEntryPHINode:
first, it can kick in on blocks whose conditions have been
folded to a constant, even though one of the edges will be
trivially folded.

second, it doesn't clean up the "if diamond" that it just 
eliminated away.  This is a problem because other simplifycfg
xforms kick in depending on the order of block visitation,
causing pointless work.

llvm-svn: 121762
2010-12-14 08:01:53 +00:00
Chris Lattner
5d4aea9791 fix yet anohter broken line
llvm-svn: 121750
2010-12-14 06:09:07 +00:00
Chris Lattner
093b5b256d reapply my recent change that disables a piece of the switch formation
work, but fixes 400.perlbmk.

llvm-svn: 121749
2010-12-14 05:57:30 +00:00
Evan Cheng
6a2bed92f5 bfi A, (and B, C1), C2) -> bfi A, B, C2 iff C1 & C2 == C1. rdar://8458663
llvm-svn: 121746
2010-12-14 03:22:07 +00:00
Jason W Kim
0cf0f7a078 fix fixme case typo :-)
llvm-svn: 121743
2010-12-14 01:42:38 +00:00
Owen Anderson
5536134dc4 Fix recent buildbot breakage by pulling SimplifyCFG back to its state as of r121694, the most recent state
where I'm confident there were no crashes or miscompilations.  XFAIL the test added since then for now.

llvm-svn: 121733
2010-12-13 23:49:28 +00:00
Jason W Kim
b5cc5dad79 First cut of ARM/MC/ELF PIC relocations.
Test has fixme, to move to .s -> .o test when AsmParser works better.

llvm-svn: 121732
2010-12-13 23:16:07 +00:00
Bob Wilson
33e5e902b0 Remove the rest of the *_sfp Neon instruction patterns.
Use the same COPY_TO_REGCLASS approach as for the 2-register *_sfp instructions.
This change made a big difference in the code generated for the
CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing
a fine job, but some instructions that were previously moved outside the loop
are not moved now.  It's using fewer VFP registers now, which is generally
a good thing, so I think the estimates for register pressure changed and that
affected the LICM behavior.  Since that isn't obviously wrong, I've just
changed the test file.  This completes the work for Radar 8711675.

llvm-svn: 121730
2010-12-13 23:02:37 +00:00
Chris Lattner
dcba81d96f temporarily disable part of my previous patch, which causes an iterator invalidation issue, causing a crash on some versions of perlbmk.
llvm-svn: 121728
2010-12-13 23:02:19 +00:00
Dan Gohman
b187cce266 Reapply r121520, PartialAlias implementation for BasicAA, now that
memdep is updated to handle it.

llvm-svn: 121725
2010-12-13 22:50:24 +00:00
Benjamin Kramer
7f1cdac1e4 Fix sort predicate. qsort(3)'s predicate semantics differ from std::sort's. Fixes PR 8780.
llvm-svn: 121705
2010-12-13 18:20:38 +00:00
Chris Lattner
d17cbf803b rename test
llvm-svn: 121697
2010-12-13 08:39:40 +00:00
Chris Lattner
14810c808b Add a couple dag combines to transform mulhi/mullo into a wider multiply
when the wider type is legal.  This allows us to compile:

define zeroext i16 @test1(i16 zeroext %x) nounwind {
entry:
	%div = udiv i16 %x, 33
	ret i16 %div
}

into:

test1:                                  # @test1
	movzwl	4(%esp), %eax
	imull	$63551, %eax, %eax      # imm = 0xF83F
	shrl	$21, %eax
	ret

instead of:

test1:                                  # @test1
        movw    $-1985, %ax             # imm = 0xFFFFFFFFFFFFF83F
        mulw    4(%esp)
        andl    $65504, %edx            # imm = 0xFFE0
        movl    %edx, %eax
        shrl    $5, %eax
        ret

Implementing rdar://8760399 and example #4 from:
http://blog.regehr.org/archives/320

We should implement the same thing for [su]mul_hilo, but I don't
have immediate plans to do this.

llvm-svn: 121696
2010-12-13 08:39:01 +00:00