1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-25 14:02:52 +02:00
Commit Graph

5278 Commits

Author SHA1 Message Date
Jakob Stoklund Olesen
83d6dda738 Delete stale comment.
llvm-svn: 144542
2011-11-14 18:03:05 +00:00
Chandler Carruth
462bb16130 Fix an overflow bug in MachineBranchProbabilityInfo. This pass relied on
the sum of the edge weights not overflowing uint32, and crashed when
they did. This is generally safe as BranchProbabilityInfo tries to
provide this guarantee. However, the CFG can get modified during codegen
in a way that grows the *sum* of the edge weights. This doesn't seem
unreasonable (imagine just adding more blocks all with the default
weight of 16), but it is hard to come up with a case that actually
triggers 32-bit overflow. Fortuately, the single-source GCC build is
good at this. The solution isn't very pretty, but its no worse than the
previous code. We're already summing all of the edge weights on each
query, we can sum them, check for an overflow, compute a scale, and sum
them again.

I've included a *greatly* reduced test case out of the GCC source that
triggers it. It's a pretty lame test, as it clearly is just barely
triggering the overflow. I'd like to have something that is much more
definitive, but I don't understand the fundamental pattern that triggers
an explosion in the edge weight sums.

The buggy code is duplicated within this file. I'll colapse them into
a single implementation in a subsequent commit.

llvm-svn: 144526
2011-11-14 08:50:16 +00:00
Chad Rosier
0e5094ca87 Add support for ARM halfword load/stores and signed byte loads with negative
offsets.
rdar://10412592

llvm-svn: 144518
2011-11-14 04:09:28 +00:00
Chandler Carruth
b7f21af176 Teach machine block placement to cope with unnatural loops. These don't
get loop info structures associated with them, and so we need some way
to make forward progress selecting and placing basic blocks. The
technique used here is pretty brutal -- it just scans the list of blocks
looking for the first unplaced candidate. It keeps placing blocks like
this until the CFG becomes tractable.

The cost is somewhat unfortunate, it requires allocating a vector of all
basic block pointers eagerly. I have some ideas about how to simplify
and optimize this, but I'm trying to get the logic correct first.

Thanks to Benjamin Kramer for the reduced test case out of GCC. Sadly
there are other bugs that GCC is tickling that I'm reducing and working
on now.

llvm-svn: 144516
2011-11-14 00:00:35 +00:00
Chandler Carruth
e67c92282f Rewrite #3 of machine block placement. This is based somewhat on the
second algorithm, but only loosely. It is more heavily based on the last
discussion I had with Andy. It continues to walk from the inner-most
loop outward, but there is a key difference. With this algorithm we
ensure that as we visit each loop, the entire loop is merged into
a single chain. At the end, the entire function is treated as a "loop",
and merged into a single chain. This chain forms the desired sequence of
blocks within the function. Switching to a single algorithm removes my
biggest problem with the previous approaches -- they had different
behavior depending on which system triggered the layout. Now there is
exactly one algorithm and one basis for the decision making.

The other key difference is how the chain is formed. This is based
heavily on the idea Andy mentioned of keeping a worklist of blocks that
are viable layout successors based on the CFG. Having this set allows us
to consistently select the best layout successor for each block. It is
expensive though.

The code here remains very rough. There is a lot that needs to be done
to clean up the code, and to make the runtime cost of this pass much
lower. Very much WIP, but this was a giant chunk of code and I'd rather
folks see it sooner than later. Everything remains behind a flag of
course.

I've added a couple of tests to exercise the issues that this iteration
was motivated by: loop structure preservation. I've also fixed one test
that was exhibiting the broken behavior of the previous version.

llvm-svn: 144495
2011-11-13 11:20:44 +00:00
Chad Rosier
58ab241006 The order in which the predicate is added differs between Thumb and ARM mode. Fix predicate when in ARM mode and restore SelectIntrinsicCall.
llvm-svn: 144494
2011-11-13 09:44:21 +00:00
Chad Rosier
8cfccc356e Temporarily disable SelectIntrinsicCall when in ARM mode. This is causing failures.
llvm-svn: 144492
2011-11-13 05:14:43 +00:00
Chad Rosier
acd199b5a4 Add support for emitting both signed- and zero-extend loads. Fix
SimplifyAddress to handle either a 12-bit unsigned offset or the ARM +/-imm8
offsets (addressing mode 3).  This enables a load followed by an integer 
extend to be folded into a single load.

For example:
ldrb r1, [r0]       ldrb r1, [r0]
uxtb r2, r1     =>
mov  r3, r2         mov  r3, r1

llvm-svn: 144488
2011-11-13 02:23:59 +00:00
Jakob Stoklund Olesen
3eaaa93104 Remove the -color-ss-with-regs option.
It was off by default.

The new register allocators don't have the problems that made it
necessary to reallocate registers during stack slot coloring.

llvm-svn: 144481
2011-11-13 00:31:23 +00:00
Jakob Stoklund Olesen
d0ddec5771 Delete the 'standard' spiller with used the old spilling framework.
The current register allocators all use the inline spiller.

llvm-svn: 144477
2011-11-12 23:29:02 +00:00
Jakob Stoklund Olesen
bb527a67c0 Remove histogram tests.
Counting the number of occurences of each opcode is not a useful test.

llvm-svn: 144474
2011-11-12 22:39:40 +00:00
Jakob Stoklund Olesen
9195bec6e7 RAGreedy is better about hinting now.
Or maybe we are just getting lucky.

llvm-svn: 144473
2011-11-12 22:39:37 +00:00
Jakob Stoklund Olesen
4aa9c6888f Linear scan is going away.
llvm-svn: 144472
2011-11-12 22:39:34 +00:00
Jakob Stoklund Olesen
e1b1bbb882 XFAIL test that depends on linear scan to remove dead code.
Filed PR11364 to track the problem.  Should the register allocator
eliminate dead code?

llvm-svn: 144471
2011-11-12 22:39:30 +00:00
Jakob Stoklund Olesen
43b7a3871b Remove obsolete test.
This test was committed with a bugfix to RemoveCopyByCommutingDef, but
that optimization is no longer triggered by this test.

llvm-svn: 144470
2011-11-12 22:39:27 +00:00
Jakob Stoklund Olesen
6a290484cb Remove obsolete test.
This test is for a very specific LocalRewriter bug.  LocalRewriter is
going away.

llvm-svn: 144469
2011-11-12 22:39:24 +00:00
Jakob Stoklund Olesen
005eabf28a Remove obsolete test.
I don't think this test does what is was supposed to do, and
LocalRewriter is going away anyway.

llvm-svn: 144463
2011-11-12 20:37:57 +00:00
Jakob Stoklund Olesen
c11d7a9b4d Eliminate more linear scan tests.
llvm-svn: 144462
2011-11-12 20:35:26 +00:00
Jakob Stoklund Olesen
0fe59856fd Switch a couple -O0 tests to RABasic.
llvm-svn: 144461
2011-11-12 20:11:04 +00:00
Jakob Stoklund Olesen
94ce588b20 Switch a few tests off linearscan.
llvm-svn: 144460
2011-11-12 19:53:52 +00:00
Jakob Stoklund Olesen
f8fed2a3a7 Delete old test of a VirtRegRewriter feature.
This test doesn't expose the issue with RAGreedy.

I filed PR11363 to track the missing InlineSpiller feature.

llvm-svn: 144459
2011-11-12 19:53:48 +00:00
Jakob Stoklund Olesen
49118cf9a5 Remove old test that doesn't make sense.
The test is checking that the output doesn't contains any 'mov '
strings. It does contain movl, though.

llvm-svn: 144458
2011-11-12 19:53:45 +00:00
Craig Topper
0458cdf64a Add more AVX2 shift lowering support. Move AVX2 variable shift to use patterns instead of custom lowering code.
llvm-svn: 144457
2011-11-12 09:58:49 +00:00
Eli Friedman
8563e57e38 Don't try to form pre/post-indexed loads/stores until after LegalizeDAG runs. Fixes PR11029.
llvm-svn: 144438
2011-11-12 00:35:34 +00:00
Chad Rosier
a2a0fbeded Add support in fast-isel for selecting memset/memcpy/memmove intrinsics.
llvm-svn: 144426
2011-11-11 23:31:03 +00:00
Chad Rosier
88ab27405f Loosen test by using REs. Approved by Devang.
llvm-svn: 144425
2011-11-11 23:25:38 +00:00
Andrew Trick
6ff75a5d8d Preserve MachineMemOperands in ARMLoadStoreOptimizer.
Fixes PR8113.

llvm-svn: 144409
2011-11-11 22:18:09 +00:00
Dan Bailey
ad6c209a79 allow non-device function calls in PTX when natively handling device-side printf
llvm-svn: 144388
2011-11-11 14:45:12 +00:00
Craig Topper
50df7c3842 Add lowering for AVX2 shift instructions.
llvm-svn: 144380
2011-11-11 07:39:23 +00:00
Chad Rosier
feb72bfc08 Add support for using immediates with select instructions.
rdar://10412592

llvm-svn: 144376
2011-11-11 06:20:39 +00:00
Eli Friedman
285b451941 Make sure to expand SIGN_EXTEND_INREG for NEON vectors. PR11319, round 3.
llvm-svn: 144361
2011-11-11 03:16:38 +00:00
Chad Rosier
ac92994773 Add support for using MVN to materialize negative constants.
rdar://10412592

llvm-svn: 144348
2011-11-11 00:36:21 +00:00
Chad Rosier
7b7dced006 When in ARM mode, LDRH/STRH require special handling of negative offsets.
For correctness, disable this for now.
rdar://10418009

llvm-svn: 144316
2011-11-10 21:09:49 +00:00
NAKAMURA Takumi
ea14fd81c6 test/CodeGen/X86/lsr-loop-exit-cond.ll: Try to appease linux and freebsd bots to specify explicit -mtriple=x86_64-darwin.
I guess it expects -relocation-model=pic.

llvm-svn: 144290
2011-11-10 14:18:59 +00:00
Evan Cheng
4760ff0763 Use a bigger hammer to fix PR11314 by disabling the "forcing two-address
instruction lower optimization" in the pre-RA scheduler.

The optimization, rather the hack, was done before MI use-list was available.
Now we should be able to implement it in a better way, perhaps in the
two-address pass until a MI scheduler is available.

Now that the scheduler has to backtrack to handle call sequences. Adding
artificial scheduling constraints is just not safe. Furthermore, the hack
is not taking all the other scheduling decisions into consideration so it's just
as likely to pessimize code. So I view disabling this optimization goodness
regardless of PR11314.

llvm-svn: 144267
2011-11-10 07:43:16 +00:00
Chad Rosier
69cdae5eb9 For immediate encodings of icmp, zero or sign extend first. Then
determine if the value is negative and flip the sign accordingly.
rdar://10422026

llvm-svn: 144258
2011-11-10 01:30:39 +00:00
Jakob Stoklund Olesen
bc48cd34b6 Strip old implicit operands after foldMemoryOperand.
The TII.foldMemoryOperand hook preserves implicit operands from the
original instruction.  This is not what we want when those implicit
operands refer to the register being spilled.

Implicit operands referring to other registers are preserved.

This fixes PR11347.

llvm-svn: 144247
2011-11-10 00:17:03 +00:00
Eli Friedman
c93f8aa514 Make sure we correctly unroll conversions between v2f64 and v2i32 on ARM.
llvm-svn: 144241
2011-11-09 23:36:02 +00:00
Eli Friedman
b01f15653c Add check so we don't try to perform an impossible transformation. Fixes issue from PR11319.
llvm-svn: 144216
2011-11-09 22:25:12 +00:00
Nadav Rotem
ddc6bfa543 AVX2: Add patterns for variable shift operations
llvm-svn: 144212
2011-11-09 21:22:13 +00:00
Chad Rosier
228dc76221 Use REs to remove dependencies on the register allocation order.
llvm-svn: 144209
2011-11-09 20:06:13 +00:00
Duncan Sands
2934a0eaeb Speculatively revert commit 144124 (djg) in the hope that the 32 bit
dragonegg self-host buildbot will recover (it is complaining about object
files differing between different build stages).  Original commit message:

Add a hack to the scheduler to disable pseudo-two-address dependencies in
basic blocks containing calls. This works around a problem in which
these artificial dependencies can get tied up in calling seqeunce
scheduling in a way that makes the graph unschedulable with the current
approach of using artificial physical register dependencies for calling
sequences. This fixes PR11314.

llvm-svn: 144188
2011-11-09 14:20:48 +00:00
Nadav Rotem
e66a72a2c4 Add AVX2 support for vselect of v32i8
llvm-svn: 144187
2011-11-09 13:21:28 +00:00
Craig Topper
432dd8d623 Enable execution dependency fix pass for YMM registers when AVX2 is enabled. Add AVX2 logical operations to list of replaceable instructions.
llvm-svn: 144179
2011-11-09 09:37:21 +00:00
Craig Topper
7ff77dc2b1 Add instruction selection for AVX2 integer comparisons.
llvm-svn: 144176
2011-11-09 08:06:13 +00:00
Craig Topper
d82abb7156 Add AVX2 instruction lowering for add, sub, and mul.
llvm-svn: 144174
2011-11-09 07:28:55 +00:00
Chad Rosier
e32fed6868 Add support for encoding immediates in icmp and fcmp. Hopefully, this will
remove a fair number of unnecessary materialized constants.
rdar://10412592

llvm-svn: 144163
2011-11-09 03:22:02 +00:00
Jakob Stoklund Olesen
1239fed1e2 Collapse DomainValues across loop back-edges.
During the initial RPO traversal of the basic blocks, remember the ones
that are incomplete because of back-edges from predecessors that haven't
been visited yet.

After the initial RPO, revisit all those loop headers so the incoming
DomainValues on the back-edges can be properly collapsed.

This will properly fix execution domains on software pipelined code,
like the included test case.

llvm-svn: 144151
2011-11-09 01:06:56 +00:00
Dan Gohman
b6cf7c4e94 Add a hack to the scheduler to disable pseudo-two-address dependencies in
basic blocks containing calls. This works around a problem in which
these artificial dependencies can get tied up in calling seqeunce
scheduling in a way that makes the graph unschedulable with the current
approach of using artificial physical register dependencies for calling
sequences. This fixes PR11314.

llvm-svn: 144124
2011-11-08 21:29:06 +00:00
Evan Cheng
08e61752f2 Add workaround for Cortex-M3 errata 602117 by replacing ldrd x, y, [x] with ldm or ldr pairs.
llvm-svn: 144123
2011-11-08 21:21:09 +00:00