1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 06:22:56 +02:00
Commit Graph

21478 Commits

Author SHA1 Message Date
Hal Finkel
ab5027a1a2 Add POWER6 and POWER7 CPU types to the PPC backend.
No functional change; these will be used by upcoming scheduler enhancements.

llvm-svn: 158313
2012-06-11 15:43:08 +00:00
Bill Wendling
2320ff76d7 Re-enable the CMN instruction.
We turned off the CMN instruction because it had semantics which we weren't
getting correct. If we are comparing with an immediate, then it's okay to use
the CMN instruction.
<rdar://problem/7569620>

llvm-svn: 158302
2012-06-11 08:07:26 +00:00
Hal Finkel
b6ac451381 Enable ILP scheduling for all nodes by default on PPC.
Over the entire test-suite, this has an insignificantly negative average
performance impact, but reduces some of the worst slowdowns from the
anti-dep. change (r158294).

Largest speedups:
SingleSource/Benchmarks/Stanford/Quicksort - 28%
SingleSource/Benchmarks/Stanford/Towers - 24%
SingleSource/Benchmarks/Shootout-C++/matrix - 23%
MultiSource/Benchmarks/SciMark2-C/scimark2 - 19%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15%
(matrix and automotive-bitcount were both in the top-5 slowdown list from the
anti-dep. change)

Largest slowdowns:
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26%
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21%
SingleSource/Benchmarks/CoyoteBench/lpbench - 20%
MultiSource/Applications/d/make_dparser - 16%

llvm-svn: 158296
2012-06-10 19:32:29 +00:00
Hal Finkel
6416ab5c36 Use critical anti-dep. breaking on all PPC targets, but also add other register classes.
Using 'all' instead of 'critical' would be better because it would make it easier to
satisfy the bundling constraints, but, as noted in the FIXME, that is currently not
possible with the crs.

This yields an average 1% speedup over the entire test suite (on Power 7). Largest speedups:
SingleSource/Benchmarks/Shootout-C++/moments - 40%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
SingleSource/Benchmarks/BenchmarkGame/nsieve-bits - 26%
SingleSource/Benchmarks/McGill/misr - 23%
MultiSource/Applications/JM/ldecod/ldecod - 22%

Largest slowdowns:
SingleSource/Benchmarks/Shootout-C++/matrix - -29%
SingleSource/Benchmarks/Shootout-C++/ary3 - -22%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -18%
SingleSource/Benchmarks/Shootout-C++/ary - -17%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - -15%

llvm-svn: 158294
2012-06-10 11:15:36 +00:00
Craig Topper
b355582afd Add intrinsics for immediate form of XOP vprot instructions. Use i128mem instead of f128mem for integer XOP instructions.
llvm-svn: 158291
2012-06-10 07:31:56 +00:00
Hal Finkel
a9b329fcf1 Improve ext/trunc patterns on PPC64.
The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that
would leave self-moves in the final assembly. Replacing those patterns with ones
based on the SUBREG builtins yields better-looking code.

Thanks to Jakob and Owen for their suggestions in this matter.

llvm-svn: 158283
2012-06-09 22:10:19 +00:00
Craig Topper
633e88fd15 Use XOP vpcom intrinsics in patterns instead of a target specific SDNode type. Remove the custom lowering code that selected the SDNode type.
llvm-svn: 158279
2012-06-09 17:02:24 +00:00
Craig Topper
ad5e38e410 Replace XOP vpcom intrinsics with fewer intrinsics that take the immediate as an argument.
llvm-svn: 158278
2012-06-09 16:46:13 +00:00
Duncan Sands
18d2eeeba4 Silence a gcc-4.6 warning: GCC fails to understand that secondReg and cmpOp2 are
correlated, and thinks that cmpOp2 may be used uninitialized.

llvm-svn: 158263
2012-06-09 10:04:03 +00:00
Hal Finkel
d2d71dd821 Enable tail merging on PPC.
Tail merging had been disabled on PPC because it would disturb bundling decisions
made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions
are made during post-RA scheduling, and tail merging is generally beneficial (the
average test-suite speedup is insignificantly positive).

Largest test-suite speedups:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23%
SingleSource/Benchmarks/Shootout-C++/ary - 21%
SingleSource/Benchmarks/Stanford/Queens - 17%

Largest slowdowns:
MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22%
MultiSource/Applications/JM/ldecod/ldecod - 14%
MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9%

This is improved by using full (instead of just critical) anti-dependency breaking,
but doing so still causes miscompiles and so cannot yet be enabled by default.

llvm-svn: 158259
2012-06-09 03:14:50 +00:00
Jack Carter
15c7f01638 Test commit
llvm-svn: 158250
2012-06-09 00:27:55 +00:00
Hal Finkel
96aa8d0716 Remove the TODO statement in the PPC README re: CTR loops
As Chris points out, this can now be removed!

TODO: check if the associated section on viterbi's inner loop can also be removed.
llvm-svn: 158224
2012-06-08 20:02:09 +00:00
Hal Finkel
1424f01791 Enable PPC CTR loop formation by default.
Thanks to Jakob's help, this now causes no new test suite failures!

Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
SingleSource/Benchmarks/Misc/pi - 108%
SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
SingleSource/Benchmarks/Shootout/ary3 - 32%
SingleSource/Benchmarks/Shootout-C++/matrix - 30%

The largest slowdowns are:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
MultiSource/Applications/d/make_dparser - -14%
SingleSource/Benchmarks/Shootout-C++/ary - -13%

In light of these slowdowns, additional profiling work is obviously needed!

llvm-svn: 158223
2012-06-08 19:19:53 +00:00
Hal Finkel
44387d4528 Mark the PPC CTRRC and CTRRC8 register classes as non-allocatable.
Marking these classes as non-alocatable allows CTR loop generation to
work correctly with the block placement passes, etc. These register
classes are currently used only by some unused TCRETURN patterns.
In future cleanup, these will be removed.

Thanks again to Jakob for suggesting this fix to the CTR loop problem!

llvm-svn: 158221
2012-06-08 19:02:08 +00:00
Manman Ren
186346ff90 Enable optimization for integer ABS on X86 if Subtarget has CMOV.
llvm-svn: 158220
2012-06-08 18:58:26 +00:00
Andrew Trick
d4053e5a11 Fix Target->Codegen dependence.
Bulk move of TargetInstrInfo implementation into
TargetInstrInfoImpl. This is dirty because the code isn't part of
TargetInstrInfoImpl class, nor should it be, because the methods are
not target hooks. However, it's the current mechanism for keeping
libTarget useful outside the backend. You'll get a not-so-nice link
error if you invoke a TargetInstrInfo method that depends on CodeGen.

The TargetInstrInfoImpl class should probably be removed since it
doesn't really solve this problem.

To really fix this, we probably need separate interfaces for the
CodeGen/nonCodeGen sides of TargetInstrInfo.

llvm-svn: 158212
2012-06-08 17:23:27 +00:00
Hal Finkel
d05ff520b8 Disable the PPC CTR-Loops pass by default.
The pass itself works well, but the something in the Machine* infrastructure
does not understand terminators which define registers. Without the ability
to use the block-placement pass, etc. this causes performance regressions (and
so is turned off by default). Turning off the analysis turns off the problems
with the Machine* infrastructure.

llvm-svn: 158206
2012-06-08 15:38:25 +00:00
Hal Finkel
a6629c556e Fix a bug in the new PPC CTR-Loops pass.
The code which tests for an induction operation cannot assume that any
ADDI instruction will have a register operand because the operand could
also be a frame index; for example:
    %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16

llvm-svn: 158205
2012-06-08 15:38:23 +00:00
Hal Finkel
bb4e499e94 Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form CTR-based loop branching code.
This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, invalid preheader DebugLoc is not used.

llvm-svn: 158204
2012-06-08 15:38:21 +00:00
Manman Ren
f51a6d5fae X86: optimize generated code for integer ABS
This patch will generate the following for integer ABS:
      movl    %edi, %eax
      negl    %eax
      cmovll  %edi, %eax
INSTEAD OF
      movl    %edi, %ecx
      sarl    $31, %ecx
      leal    (%rdi,%rcx), %eax
      xorl    %ecx, %eax

There exists a target-independent DAG combine for integer ABS, which converts
integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov. 
This is implemented in PerformXorCombine.

rdar://10695237

llvm-svn: 158175
2012-06-07 22:39:10 +00:00
Nadav Rotem
7d996eafba Do not optimize the used bits of the x86 vselect condition operand, when the condition operand is a vector of 1-bit predicates.
This may happen on MIC devices.

llvm-svn: 158168
2012-06-07 20:53:48 +00:00
Andrew Trick
4fe40f02fd Continue factoring computeOperandLatency. Use it for ARM hasHighOperandLatency.
llvm-svn: 158164
2012-06-07 19:42:04 +00:00
Andrew Trick
1429abc731 ARM getOperandLatency rewrite.
Match expectations of the new latency API. Cleanup and make the logic consistent.

llvm-svn: 158163
2012-06-07 19:42:00 +00:00
Andrew Trick
cbd1d0a130 ARM getOperandLatency should return -1 for unknown, consistent with API
llvm-svn: 158162
2012-06-07 19:41:58 +00:00
Andrew Trick
d0f85a1d12 Fix ARM getInstrLatency logic to work with the current API.
llvm-svn: 158161
2012-06-07 19:41:55 +00:00
Manman Ren
c8e46bcf47 PR13046: we can't replace usage of SUB with CMP in the lowering phase.
It will cause assertion failure later on.

llvm-svn: 158160
2012-06-07 19:27:33 +00:00
Rafael Espindola
4c9d611360 Use a base register instead of an index register with the local dynamic model.
Fixes pr13048.

llvm-svn: 158158
2012-06-07 18:39:19 +00:00
Manman Ren
1d91fc3342 X86: replace SUB with CMP if possible
This patch will optimize the following
    movq    %rdi, %rax
    subq    %rsi, %rax
    cmovsq  %rsi, %rdi
    movq    %rdi, %rax
to
    cmpq    %rsi, %rdi
    cmovsq  %rsi, %rdi
    movq    %rdi, %rax

Perform this optimization if the actual result of SUB is not used.

rdar: 11540023
llvm-svn: 158126
2012-06-07 00:42:47 +00:00
Manman Ren
f591de61da Revert r157755.
The commit is intended to fix rdar://11540023.
It is implemented as part of peephole optimization. We can actually implement
this in the SelectionDAG lowering phase.

llvm-svn: 158122
2012-06-06 23:53:03 +00:00
Benjamin Kramer
58b98297ac Round 2 of dead private variable removal.
LLVM is now -Wunused-private-field clean except for
- lib/MC/MCDisassembler/Disassembler.h. Not sure why it keeps all those unaccessible fields.
- gtest.

llvm-svn: 158096
2012-06-06 19:47:08 +00:00
Benjamin Kramer
d93c18846c Remove unused private fields found by clang's new -Wunused-private-field.
There are some that I didn't remove this round because they looked like
obvious stubs. There are dead variables in gtest too, they should be
fixed upstream.

llvm-svn: 158090
2012-06-06 18:25:08 +00:00
Chad Rosier
5a354cd5e8 Add support for dynamic stack realignment in the presence of dynamic allocas on
X86.
rdar://11496434

llvm-svn: 158087
2012-06-06 17:37:40 +00:00
Richard Barton
eedaf01042 Correct decoder for T1 conditional B encoding
llvm-svn: 158055
2012-06-06 09:12:53 +00:00
Craig Topper
8d23b98818 Mark several instructions SSE2 instead of SSE3 as they should be.
llvm-svn: 158049
2012-06-06 06:45:27 +00:00
Andrew Trick
24cce40009 misched: API for minimum vs. expected latency.
Minimum latency determines per-cycle scheduling groups.
Expected latency determines critical path and cost.

llvm-svn: 158021
2012-06-05 21:11:27 +00:00
Yuan Lin
00b608400c Fix header file include order in NVPTX backend NV_CONTRIB
llvm-svn: 158013
2012-06-05 19:06:13 +00:00
Roman Divacky
f8e2e4beaa PPC32 uses R2 as the TLS register. Fix the copy and paste.
llvm-svn: 158004
2012-06-05 17:14:17 +00:00
Andrew Trick
6d6fa07808 X86 itinerary properties.
llvm-svn: 157981
2012-06-05 03:44:46 +00:00
Andrew Trick
80ddb55a53 ARM itinerary properties.
llvm-svn: 157980
2012-06-05 03:44:43 +00:00
Andrew Trick
e7159e6731 misched: Added MultiIssueItineraries.
This allows a subtarget to explicitly specify the issue width and
other properties without providing pipeline stage details for every
instruction.

llvm-svn: 157979
2012-06-05 03:44:40 +00:00
Andrew Trick
8b333df134 whitespace
llvm-svn: 157976
2012-06-05 03:44:29 +00:00
Joel Jones
6fb688ff18 Revert commit r157966
llvm-svn: 157972
2012-06-05 00:47:21 +00:00
Joel Jones
fdd5284d04 This change handles a another case for generating the bic instruction
when a compile time constant is known.  This occurs when implicitly zero 
extending function arguments from 16 bits to 32 bits.

<rdar://problem/11481151>

llvm-svn: 157966
2012-06-04 23:38:57 +00:00
Akira Hatanaka
f860bcf5fc Fix a bug in MipsTargetLowering::LowerLOAD. A shift-right-logical node is
inserted after the shift-left-logical node.

llvm-svn: 157937
2012-06-04 17:46:29 +00:00
Roman Divacky
0daa2c0556 Implement local-exec TLS on PowerPC.
llvm-svn: 157935
2012-06-04 17:36:38 +00:00
Hans Wennborg
ff55c96bc2 MIPS TLS: use the model selected by TargetMachine::getTLSModel().
This was mostly done already in r156162, but I missed one place.

llvm-svn: 157929
2012-06-04 14:02:08 +00:00
Hans Wennborg
60ee5bbc4f Better comments for TLS-related X86 MachineOperand flags.
llvm-svn: 157920
2012-06-04 09:55:36 +00:00
Craig Topper
52bf0cfb27 Add intrinsic forms for FMA instructions to opcode folding tables.
llvm-svn: 157917
2012-06-04 07:46:16 +00:00
Craig Topper
eb2d859f52 Add VFMADDSUB and VFMSUBADD FMA instructions to folding tables. Also add 213 forms of scalar FMA instructions.
llvm-svn: 157914
2012-06-04 07:08:21 +00:00
Hal Finkel
c1daf7daf8 Fix a copy-and-paste duplication error in the PPC 440 and A2 schedules (no functionality change).
llvm-svn: 157912
2012-06-04 02:39:52 +00:00