mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00
54776 Commits

Author SHA1 Message Date
Nadav Rotem
7eefbe1005 Add AutoUpgrade support for the SSE4 ptest intrinsics.
Patch by Michael Kuperstein.

llvm-svn: 158295
2012-06-10 18:42:51 +00:00
Hal Finkel
6416ab5c36 Use critical anti-dep. breaking on all PPC targets, but also add other register classes.
Using 'all' instead of 'critical' would be better because it would make it easier to
satisfy the bundling constraints, but, as noted in the FIXME, that is currently not
possible with the crs.

This yields an average 1% speedup over the entire test suite (on Power 7). Largest speedups:
SingleSource/Benchmarks/Shootout-C++/moments - 40%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
SingleSource/Benchmarks/BenchmarkGame/nsieve-bits - 26%
SingleSource/Benchmarks/McGill/misr - 23%
MultiSource/Applications/JM/ldecod/ldecod - 22%

Largest slowdowns:
SingleSource/Benchmarks/Shootout-C++/matrix - -29%
SingleSource/Benchmarks/Shootout-C++/ary3 - -22%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -18%
SingleSource/Benchmarks/Shootout-C++/ary - -17%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - -15%

llvm-svn: 158294
2012-06-10 11:15:36 +00:00
Craig Topper
b355582afd Add intrinsics for immediate form of XOP vprot instructions. Use i128mem instead of f128mem for integer XOP instructions.
llvm-svn: 158291
2012-06-10 07:31:56 +00:00
Hal Finkel
a9b329fcf1 Improve ext/trunc patterns on PPC64.
The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that
would leave self-moves in the final assembly. Replacing those patterns with ones
based on the SUBREG builtins yields better-looking code.

Thanks to Jakob and Owen for their suggestions in this matter.

llvm-svn: 158283
2012-06-09 22:10:19 +00:00
Craig Topper
633e88fd15 Use XOP vpcom intrinsics in patterns instead of a target specific SDNode type. Remove the custom lowering code that selected the SDNode type.
llvm-svn: 158279
2012-06-09 17:02:24 +00:00
Craig Topper
ad5e38e410 Replace XOP vpcom intrinsics with fewer intrinsics that take the immediate as an argument.
llvm-svn: 158278
2012-06-09 16:46:13 +00:00
Aaron Ballman
4f2525a0ee Disabling a spurious deprecation warning about using PathV1 from within the PathV1 implementation file.
llvm-svn: 158274
2012-06-09 13:59:29 +00:00
Aaron Ballman
c8f84c4f1a Fixing a typo in the comments.
llvm-svn: 158273
2012-06-09 13:46:36 +00:00
Benjamin Kramer
bbdde33ff0 Allocate the contents of DwarfDebug's StringMaps in a single big BumpPtrAllocator.
llvm-svn: 158265
2012-06-09 10:34:15 +00:00
Duncan Sands
18d2eeeba4 Silence a gcc-4.6 warning: GCC fails to understand that secondReg and cmpOp2 are
correlated, and thinks that cmpOp2 may be used uninitialized.

llvm-svn: 158263
2012-06-09 10:04:03 +00:00
Hal Finkel
d2d71dd821 Enable tail merging on PPC.
Tail merging had been disabled on PPC because it would disturb bundling decisions
made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions
are made during post-RA scheduling, and tail merging is generally beneficial (the
average test-suite speedup is insignificantly positive).

Largest test-suite speedups:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23%
SingleSource/Benchmarks/Shootout-C++/ary - 21%
SingleSource/Benchmarks/Stanford/Queens - 17%

Largest slowdowns:
MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22%
MultiSource/Applications/JM/ldecod/ldecod - 14%
MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9%

This is improved by using full (instead of just critical) anti-dependency breaking,
but doing so still causes miscompiles and so cannot yet be enabled by default.

llvm-svn: 158259
2012-06-09 03:14:50 +00:00
Andrew Trick
e5a1b98d5d Register pressure: added getPressureAfterInstr.
llvm-svn: 158256
2012-06-09 02:16:58 +00:00
Jakob Stoklund Olesen
c590d6ca6d Sketch a LiveRegMatrix analysis pass.
The LiveRegMatrix represents the live range of assigned virtual
registers in a Live interval union per register unit. This is not
fundamentally different from the interference tracking in RegAllocBase
that both RABasic and RAGreedy use.

The important differences are:

- LiveRegMatrix tracks interference per register unit instead of per
  physical register. This makes interference checks cheaper and
  assignments slightly more expensive. For example, the ARM D7 register
  has 24 aliases, so we would check 24 physregs before assigning to one.
  With unit-based interference, we check 2 units before assigning to 2
  units.

- LiveRegMatrix caches regmask interference checks. That is currently
  duplicated functionality in RABasic and RAGreedy.

- LiveRegMatrix is a pass which makes it possible to insert
  target-dependent passes between register allocation and rewriting.
  Such passes could tweak the register assignments with interference
  checking support from LiveRegMatrix.

Eventually, RABasic and RAGreedy will be switched to LiveRegMatrix.

llvm-svn: 158255
2012-06-09 02:13:10 +00:00
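
The per-register-unit interference tracking described in the LiveRegMatrix commit can be sketched as follows. This is a minimal illustration in Python, not LLVM's implementation: `REG_UNITS`, `UnitMatrix`, and the half-open `(start, end)` live ranges are hypothetical names and simplifications, and the register table is only a small hand-written subset of the ARM FP registers (D7 covers S14 and S15, Q3 covers S12 to S15).

```python
# Hypothetical subset of a register-to-units table: each physical register
# maps to the register units it covers.
REG_UNITS = {
    "S14": ("S14",),
    "S15": ("S15",),
    "D7": ("S14", "S15"),
    "Q3": ("S12", "S13", "S14", "S15"),
}


class UnitMatrix:
    """Tracks assigned live ranges per register unit (illustrative only)."""

    def __init__(self):
        # unit name -> list of (start, end, vreg), with half-open ranges
        self.units = {}

    def interferes(self, physreg, start, end):
        """Return True if [start, end) overlaps any range on physreg's units."""
        for unit in REG_UNITS[physreg]:
            for s, e, _ in self.units.get(unit, []):
                if start < e and s < end:
                    return True
        return False

    def assign(self, vreg, physreg, start, end):
        """Record vreg's live range on every unit of physreg."""
        if self.interferes(physreg, start, end):
            raise ValueError("interference on " + physreg)
        for unit in REG_UNITS[physreg]:
            self.units.setdefault(unit, []).append((start, end, vreg))


matrix = UnitMatrix()
matrix.assign("%vreg1", "D7", 0, 10)   # touches 2 units: S14 and S15
print(matrix.interferes("Q3", 9, 12))  # Q3 shares S14/S15 with D7
```

Checking D7 here visits two unit lists instead of every aliasing physical register, which is the trade-off the commit message describes: cheaper checks, slightly more work per assignment.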
Jack Carter
15c7f01638 Test commit
llvm-svn: 158250
2012-06-09 00:27:55 +00:00
Jakob Stoklund Olesen
b5a23c5c71 Also compute MBB live-in lists in the new rewriter pass.
This deduplicates some code from the optimizing register allocators, and
it means that it is now possible to change the register allocators'
solutions simply by editing the VirtRegMap between the register
allocator pass and the rewriter.

llvm-svn: 158249
2012-06-09 00:14:47 +00:00
Dmitri Gribenko
6319fd5eb8 Convert comments to proper Doxygen comments.
llvm-svn: 158248
2012-06-09 00:01:45 +00:00
Jakob Stoklund Olesen
c0bb0e899d Reintroduce VirtRegRewriter.
OK, not really. We don't want to reintroduce the old rewriter hacks.

This patch extracts virtual register rewriting as a separate pass that
runs after the register allocator. This is possible now that
CodeGen/Passes.cpp can configure the full optimizing register allocator
pipeline.

The rewriter pass uses register assignments in VirtRegMap to rewrite
virtual registers to physical registers, and it inserts kill flags based
on live intervals.

These finalization steps are the same for the optimizing register
allocators: RABasic, RAGreedy, and PBQP.

llvm-svn: 158244
2012-06-08 23:44:45 +00:00
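
The idea of a separate rewriting step that consumes a virtual-to-physical map can be sketched as follows. This is a rough Python illustration under stated assumptions, not the LLVM pass: the instruction tuples, the `rewrite` function, and the plain dict standing in for VirtRegMap are all hypothetical, and kill flags are derived from the last use in a single straight-line block rather than from real live intervals.

```python
def rewrite(instrs, virt_reg_map):
    """Replace virtual registers with their assigned physical registers.

    instrs is a list of (opcode, defs, uses) tuples for one straight-line
    block. Each rewritten use becomes (physreg, is_kill), where is_kill
    marks the last instruction in the block that reads that virtual register.
    """
    # Find the last instruction index that reads each register.
    last_use = {}
    for idx, (_, _, uses) in enumerate(instrs):
        for reg in uses:
            last_use[reg] = idx

    rewritten = []
    for idx, (opcode, defs, uses) in enumerate(instrs):
        new_defs = [virt_reg_map.get(reg, reg) for reg in defs]
        new_uses = [(virt_reg_map.get(reg, reg), last_use[reg] == idx)
                    for reg in uses]
        rewritten.append((opcode, new_defs, new_uses))
    return rewritten


program = [
    ("li", ["%v0"], []),
    ("addi", ["%v1"], ["%v0"]),
    ("add", ["%v2"], ["%v0", "%v1"]),
]
assignment = {"%v0": "r3", "%v1": "r4", "%v2": "r3"}
for instr in rewrite(program, assignment):
    print(instr)
```

Because the rewrite reads only the map, anything that edits the map between allocation and rewriting changes the final assignment without touching the allocator itself.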
Nuno Lopes
4485a55890 canonicalize:
-%a + 42
into
42 - %a

previously we were emitting:
-(%a + 42)

This fixes the infinite loop in PR12338. The generated code is still not perfect, though.
Will work on that next.

llvm-svn: 158237
2012-06-08 22:30:05 +00:00
Evan Cheng
287aa9fb18 Start implementing pre-ra if-converter: using speculation and selects to eliminate branches.
llvm-svn: 158234
2012-06-08 21:53:50 +00:00
Andrew Trick
151209d9dc TargetInstrInfo hooks implemented in codegen should be declared pure virtual.
llvm-svn: 158233
2012-06-08 21:52:38 +00:00
Duncan Sands
03f9c316e2 Reapply commit 158073 with a fix (the testcase was already committed). The
problem was that by moving instructions around inside the function, the pass
could accidentally move the iterator being used to advance over the function
too.  Fix this by only processing the instruction equal to the iterator, and
leaving processing of instructions that might not be equal to the iterator
to later (later = after traversing the basic block; it could also wait until
after traversing the entire function, but this might make the sets quite big).
Original commit message:

Grab-bag of reassociate tweaks.  Unify handling of dead instructions and
instructions to reoptimize.  Exploit this to more systematically eliminate
dead instructions (this isn't very useful in practice but is convenient for
analysing some testcase I am working on).  No need for WeakVH any more: use
an AssertingVH instead.

llvm-svn: 158226
2012-06-08 20:15:33 +00:00
Hal Finkel
96aa8d0716 Remove the TODO statement in the PPC README re: CTR loops
As Chris points out, this can now be removed!

TODO: check if the associated section on viterbi's inner loop can also be removed.
llvm-svn: 158224
2012-06-08 20:02:09 +00:00
Hal Finkel
1424f01791 Enable PPC CTR loop formation by default.
Thanks to Jakob's help, this now causes no new test suite failures!

Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
SingleSource/Benchmarks/Misc/pi - 108%
SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
SingleSource/Benchmarks/Shootout/ary3 - 32%
SingleSource/Benchmarks/Shootout-C++/matrix - 30%

The largest slowdowns are:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
MultiSource/Applications/d/make_dparser - -14%
SingleSource/Benchmarks/Shootout-C++/ary - -13%

In light of these slowdowns, additional profiling work is obviously needed!

llvm-svn: 158223
2012-06-08 19:19:53 +00:00
Hal Finkel
44387d4528 Mark the PPC CTRRC and CTRRC8 register classes as non-allocatable.
Marking these classes as non-allocatable allows CTR loop generation to
work correctly with the block placement passes, etc. These register
classes are currently used only by some unused TCRETURN patterns.
In future cleanup, these will be removed.

Thanks again to Jakob for suggesting this fix to the CTR loop problem!

llvm-svn: 158221
2012-06-08 19:02:08 +00:00
Manman Ren
186346ff90 Enable optimization for integer ABS on X86 if Subtarget has CMOV.
llvm-svn: 158220
2012-06-08 18:58:26 +00:00
Chad Rosier
42c1d8b369 Fix a crash in APInt::lshr when shiftAmt > BitWidth.
Patch by James Benton <jbenton@vmware.com>.

llvm-svn: 158213
2012-06-08 18:04:52 +00:00
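
The kind of guard this fix implies can be sketched as follows. This is a minimal Python illustration, not APInt's code: the `lshr` function and its parameters are hypothetical, and it assumes the intended result of shifting by the full width or more is zero.

```python
def lshr(value, bit_width, shift_amt):
    """Logical shift right of a fixed-width unsigned value.

    Assumption for this sketch: shifting by bit_width or more yields 0
    instead of touching storage outside the value.
    """
    if bit_width <= 0 or shift_amt < 0:
        raise ValueError("bit_width must be positive, shift_amt non-negative")
    if shift_amt >= bit_width:
        return 0  # guard covers both == and > bit_width
    mask = (1 << bit_width) - 1
    return (value & mask) >> shift_amt


print(hex(lshr(0xF0, 8, 4)))
print(lshr(0xF0, 8, 9))
```

The point of the sketch is the comparison: a check written as "equal to the width" misses larger shift amounts, while "greater than or equal" covers both.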
Andrew Trick
d4053e5a11 Fix Target->Codegen dependence.
Bulk move of TargetInstrInfo implementation into
TargetInstrInfoImpl. This is dirty because the code isn't part of
TargetInstrInfoImpl class, nor should it be, because the methods are
not target hooks. However, it's the current mechanism for keeping
libTarget useful outside the backend. You'll get a not-so-nice link
error if you invoke a TargetInstrInfo method that depends on CodeGen.

The TargetInstrInfoImpl class should probably be removed since it
doesn't really solve this problem.

To really fix this, we probably need separate interfaces for the
CodeGen/nonCodeGen sides of TargetInstrInfo.

llvm-svn: 158212
2012-06-08 17:23:27 +00:00
Nuno Lopes
c6a0165f7f BoundsChecking: add support for ConstantPointerNull. fixes a bunch of instrumentation failures in loops with reallocs
llvm-svn: 158210
2012-06-08 16:31:42 +00:00
Hal Finkel
d05ff520b8 Disable the PPC CTR-Loops pass by default.
The pass itself works well, but something in the Machine* infrastructure
does not understand terminators which define registers. Without the ability
to use the block-placement pass, etc. this causes performance regressions (and
so is turned off by default). Turning off the analysis turns off the problems
with the Machine* infrastructure.

llvm-svn: 158206
2012-06-08 15:38:25 +00:00
Hal Finkel
a6629c556e Fix a bug in the new PPC CTR-Loops pass.
The code which tests for an induction operation cannot assume that any
ADDI instruction will have a register operand because the operand could
also be a frame index; for example:
    %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16

llvm-svn: 158205
2012-06-08 15:38:23 +00:00
Hal Finkel
bb4e499e94 Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form CTR-based loop branching code.
This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, invalid preheader DebugLoc is not used.

llvm-svn: 158204
2012-06-08 15:38:21 +00:00
Duncan Sands
e6b780ada5 Revert commit 158073 while waiting for a fix. The issue is that reassociate
can move instructions within the instruction list.  If the instruction just
happens to be the one the basic block iterator is pointing to, and it is
moved to a different basic block, then we get into an infinite loop due to
the iterator running off the end of the basic block (for some reason this
doesn't fire any assertions).  Original commit message:

Grab-bag of reassociate tweaks.  Unify handling of dead instructions and
instructions to reoptimize.  Exploit this to more systematically eliminate
dead instructions (this isn't very useful in practice but is convenient for
analysing some testcase I am working on).  No need for WeakVH any more: use
an AssertingVH instead.

llvm-svn: 158199
2012-06-08 13:37:30 +00:00
Manman Ren
f51a6d5fae X86: optimize generated code for integer ABS
This patch will generate the following for integer ABS:
      movl    %edi, %eax
      negl    %eax
      cmovll  %edi, %eax
INSTEAD OF
      movl    %edi, %ecx
      sarl    $31, %ecx
      leal    (%rdi,%rcx), %eax
      xorl    %ecx, %eax

There exists a target-independent DAG combine for integer ABS, which converts
integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov. 
This is implemented in PerformXorCombine.

rdar://10695237

llvm-svn: 158175
2012-06-07 22:39:10 +00:00
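
The equivalence of the two integer ABS sequences can be checked with a small model. This is a sketch in Python that emulates 32-bit registers by masking; the function names are hypothetical, and the flag handling is simplified to the sign and overflow flags that `cmovll` (move if SF != OF) reads after `negl`.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def abs_sar_add_xor(x):
    """sar + add + xor form: sign = x >> 31 (arithmetic); (x + sign) ^ sign."""
    sign = to_signed32(x) >> 31            # 0 if non-negative, -1 if negative
    return ((x + sign) & MASK32) ^ (sign & MASK32)


def abs_neg_cmov(x):
    """neg + cmov form: negate a copy, restore the original if SF != OF."""
    x &= MASK32
    neg = (-x) & MASK32                    # negl %eax
    sign_flag = neg >> 31
    overflow_flag = 1 if x == 0x80000000 else 0
    if sign_flag != overflow_flag:         # cmovll %edi, %eax
        neg = x
    return neg


for value in (0, 1, 5, -5, 0x7FFFFFFF, -0x80000000):
    pattern = value & MASK32
    assert abs_sar_add_xor(pattern) == abs_neg_cmov(pattern)
```

Both forms leave the most negative value unchanged, which is why the overflow flag matters in the `cmovll` condition.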
Nadav Rotem
7d996eafba Do not optimize the used bits of the x86 vselect condition operand, when the condition operand is a vector of 1-bit predicates.
This may happen on MIC devices.

llvm-svn: 158168
2012-06-07 20:53:48 +00:00
Nadav Rotem
e3db9cf2fd Fix a bug in FoldSelectOpOp. Bitcast ops may change the number of vector elements, which may disagree with the select condition type.
llvm-svn: 158166
2012-06-07 20:28:57 +00:00
Andrew Trick
4fe40f02fd Continue factoring computeOperandLatency. Use it for ARM hasHighOperandLatency.
llvm-svn: 158164
2012-06-07 19:42:04 +00:00
Andrew Trick
1429abc731 ARM getOperandLatency rewrite.
Match expectations of the new latency API. Cleanup and make the logic consistent.

llvm-svn: 158163
2012-06-07 19:42:00 +00:00
Andrew Trick
cbd1d0a130 ARM getOperandLatency should return -1 for unknown, consistent with API
llvm-svn: 158162
2012-06-07 19:41:58 +00:00
Andrew Trick
d0f85a1d12 Fix ARM getInstrLatency logic to work with the current API.
llvm-svn: 158161
2012-06-07 19:41:55 +00:00
Manman Ren
c8e46bcf47 PR13046: we can't replace usage of SUB with CMP in the lowering phase.
It will cause assertion failure later on.

llvm-svn: 158160
2012-06-07 19:27:33 +00:00
Rafael Espindola
4c9d611360 Use a base register instead of an index register with the local dynamic model.
Fixes pr13048.

llvm-svn: 158158
2012-06-07 18:39:19 +00:00
Pete Cooper
089cb1dcc5 Move terminator machine verification to check MachineBasicBlock::instr_iterator instead of MBB::iterator
llvm-svn: 158154
2012-06-07 17:41:39 +00:00
Manman Ren
1d91fc3342 X86: replace SUB with CMP if possible
This patch will optimize the following
    movq    %rdi, %rax
    subq    %rsi, %rax
    cmovsq  %rsi, %rdi
    movq    %rdi, %rax
to
    cmpq    %rsi, %rdi
    cmovsq  %rsi, %rdi
    movq    %rdi, %rax

Perform this optimization if the actual result of SUB is not used.

rdar: 11540023
llvm-svn: 158126
2012-06-07 00:42:47 +00:00
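
Why CMP can stand in for SUB here can be shown with a small model of the two sequences. This is a sketch in Python with 64-bit registers emulated by masking; the function names are hypothetical, and only the sign flag that `cmovsq` reads is modelled.

```python
MASK64 = (1 << 64) - 1


def with_sub(rdi, rsi):
    """Original sequence: the SUB result lands in rax but is never read."""
    rax = rdi                          # movq  %rdi, %rax
    rax = (rax - rsi) & MASK64         # subq  %rsi, %rax
    sign_flag = rax >> 63
    if sign_flag:                      # cmovsq %rsi, %rdi
        rdi = rsi
    rax = rdi                          # movq  %rdi, %rax (overwrites SUB result)
    return rax


def with_cmp(rdi, rsi):
    """Optimized sequence: CMP sets the same flags and writes no register."""
    sign_flag = ((rdi - rsi) & MASK64) >> 63   # cmpq %rsi, %rdi
    if sign_flag:                      # cmovsq %rsi, %rdi
        rdi = rsi
    rax = rdi                          # movq  %rdi, %rax
    return rax


for a, b in ((3, 5), (5, 3), (7, 7), (0, MASK64), (1 << 63, 1)):
    assert with_sub(a, b) == with_cmp(a, b)
```

The subtraction result is dead as soon as the final `movq` overwrites `%rax`, so only the flags are needed, and CMP produces those without the extra copy.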
Manman Ren
f591de61da Revert r157755.
The commit is intended to fix rdar://11540023.
It is implemented as part of peephole optimization. We can actually implement
this in the SelectionDAG lowering phase.

llvm-svn: 158122
2012-06-06 23:53:03 +00:00
Jakob Stoklund Olesen
5dbcc6898b Properly verify liveness with bundled machine instructions.
Bundles should be treated as one atomic transaction when checking
liveness. That is how the register allocator (and VLIW targets) treats
bundles.

llvm-svn: 158116
2012-06-06 22:34:30 +00:00
Benjamin Kramer
0a2c816e75 Add accessors for all private members of DisasmContext.
LLVM should be -Wunused-private-field clean now.

llvm-svn: 158103
2012-06-06 20:45:10 +00:00
Andrew Trick
3e809a2fba Move RegisterClassInfo.h.
Allow targets to access this API. It's required for RegisterPressure.

llvm-svn: 158102
2012-06-06 20:29:31 +00:00
Andrew Trick
c1ae96787a Move RegisterPressure.h.
Make it a general utility for use by Targets.

llvm-svn: 158097
2012-06-06 19:47:35 +00:00
Benjamin Kramer
58b98297ac Round 2 of dead private variable removal.
LLVM is now -Wunused-private-field clean except for
- lib/MC/MCDisassembler/Disassembler.h. Not sure why it keeps all those inaccessible fields.
- gtest.

llvm-svn: 158096
2012-06-06 19:47:08 +00:00
Benjamin Kramer
d93c18846c Remove unused private fields found by clang's new -Wunused-private-field.
There are some that I didn't remove this round because they looked like
obvious stubs. There are dead variables in gtest too, they should be
fixed upstream.

llvm-svn: 158090
2012-06-06 18:25:08 +00:00