1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00
Commit Graph

31726 Commits

Author SHA1 Message Date
Bill Schmidt
c18bf56926 [PowerPC] Yet another approach to __tls_get_addr
This patch is a third attempt to properly handle the local-dynamic and
global-dynamic TLS models.

In my original implementation, calls to __tls_get_addr were hidden
from view until the asm-printer phase, at which point the underlying
branch-and-link instruction was created with proper relocations.  This
mostly worked well, but I used some repellent techniques to ensure
that the TLS_GET_ADDR nodes at the SD and MI levels correctly received
input from GPR3 and produced output into GPR3.  This proved to work
badly in the presence of multiple TLS variable accesses, with the
copies to and from GPR3 being scheduled incorrectly and generally
creating havoc.

In r221703, I addressed that problem by representing the calls to
__tls_get_addr as true calls during instruction lowering.  This had
the advantage of removing all of the bad hacks and relying on the
existing call machinery to properly glue the copies in place. It
looked like this was going to be the right way to go.

However, as a side effect of the recent discovery of problems with
linker optimizations for TLS, we discovered cases of suboptimal code
generation with this strategy.  The problem comes when tls_get_addr is
called for the same address, and there is a resulting CSE
opportunity.  It turns out that in such cases MachineCSE will common
the addis/addi instructions that set up the input value to
tls_get_addr, but will not common the calls themselves.  MachineCSE
does not have any machinery to common idempotent calls.  This is
perfectly sensible, since presumably this would be done at the IR
level, and introducing calls in the back end isn't commonplace.  In
any case, we end up with two calls to __tls_get_addr when one would
suffice, and that isn't good.

I presumed that the original design would have allowed commoning of
the machine-specific nodes that hid the __tls_get_addr calls, so as
suggested by Ulrich Weigand, I went back to that design and cleaned it
up so that the copies were properly held together by glue
nodes.  However, it turned out that this didn't work either...the
presence of copies to physical registers kept the machine-specific
nodes from being commoned also.

All of which leads to the design presented here.  This is a return to
the original design, except that no attempt is made to introduce
copies to and from GPR3 during instruction lowering.  Virtual registers
are used until prior to register allocation.  At that point, a special
pass is run that identifies the machine-specific nodes that hide the
tls_get_addr calls and introduces the copies to and from GPR3 around
them.  The register allocator then coalesces these copies away.  With
this design, MachineCSE succeeds in commoning tls_get_addr calls where
possible, and we get nice optimal code generation (better than GCC at
the moment, which does not common these calls).

One additional problem must be dealt with:  After introducing the
mentions of the physical register GPR3, the aggressive anti-dependence
breaker sees opportunities to improve scheduling by selecting a
different register instead.  Flags must be used on the instruction
descriptions to tell the anti-dependence breaker to keep its hands in
its pockets.

One thing missing from the original design was recording a definition
of the link register on the GET_TLS_ADDR nodes.  Doing this was found
to be insufficient to force a stack frame to be created, which led to
looping behavior because two different LR values were stored at the
same address.  This appears to have been an oversight in
PPCFrameLowering::determineFrameLayout(), which is repaired here.

Because MustSaveLR() returns true for calls to builtin_return_address,
this changed the expected behavior of
test/CodeGen/PowerPC/retaddr2.ll, which now stacks a frame but
formerly did not.  I've fixed the test case to reflect this.

There are existing TLS tests to catch regressions; the checks in
test/CodeGen/PowerPC/tls-store2.ll proved to be too restrictive in the
face of instruction scheduling with these changes, so I fixed that
up.

I've added a new test case based on the PrettyStackTrace module that
demonstrated the original problem. This checks that we get correct
code generation and that CSE of the calls to __get_tls_addr has taken
place.

llvm-svn: 227976
2015-02-03 16:16:01 +00:00
Bruno Cardoso Lopes
c6e5ceb6dd [X86][MMX] Improve transfer from mmx to i32
Improve EXTRACT_VECTOR_ELT DAG combine to catch conversion patterns
between x86mmx and i32 with more layers of indirection.

Before:
  movq2dq %mm0, %xmm0
  movd %xmm0, %eax
After:
  movd %mm0, %eax

llvm-svn: 227969
2015-02-03 14:46:49 +00:00
Craig Topper
5a9b4168e7 [X86] Make fxsave64/fxrstor64/xsave64/xsrstor64/xsaveopt64 parseable in AT&T syntax. Also make them the default output.
llvm-svn: 227963
2015-02-03 11:03:57 +00:00
Craig Topper
48c1e3f17c [X86] Add Requires[In64BitMode] around MOVSX64rr32/MOVSX64rm32. This makes it more strictly mutexed with the ARPL instruction 32-bit mode. Helps with some disassembler changes I'm experimenting with. Should be NFC.
llvm-svn: 227962
2015-02-03 11:03:43 +00:00
Eric Christopher
cc62f1ae1b Only access TLOF via the TargetMachine, not TargetLowering.
llvm-svn: 227949
2015-02-03 07:22:52 +00:00
Eric Christopher
a26193de86 Define a runOnMachineFunction for the Hexagon AsmPrinter and
use it to initialize the subtarget.

llvm-svn: 227948
2015-02-03 06:40:22 +00:00
Eric Christopher
2ba773b6bd Migrate away from using a Subtarget except for the one place we want
to use it. Use the triple to determine OS format bits at the module
level.

llvm-svn: 227947
2015-02-03 06:40:19 +00:00
Eric Christopher
0cf178e495 Migrate to using the subtarget on the machine function and update
all uses.

llvm-svn: 227891
2015-02-02 23:03:45 +00:00
Eric Christopher
af3426aea9 Use the function template getSubtarget off of the machine function,
and use it in all locations.

llvm-svn: 227890
2015-02-02 23:03:43 +00:00
Eric Christopher
feb4ffa8ee Use the cached subtarget on the MachineFunction.
llvm-svn: 227885
2015-02-02 22:40:56 +00:00
Eric Christopher
0224cb6674 Remove dead header.
llvm-svn: 227884
2015-02-02 22:40:54 +00:00
Eric Christopher
a3a294848d Remove dead code in the HexagonMCInst classes. This also fixes
a layering violation in the port and removes calls to getSubtargetImpl.

llvm-svn: 227883
2015-02-02 22:40:53 +00:00
Eric Christopher
bfcc1f0544 80-col fixup.
llvm-svn: 227882
2015-02-02 22:40:51 +00:00
Eric Christopher
da02fbafc4 Remove dead code in the HexagonMCInst classes. This also fixes
a layering violation in the port and removes calls to getSubtargetImpl.

llvm-svn: 227880
2015-02-02 22:28:48 +00:00
Eric Christopher
922293f3ca 80-col fixup.
llvm-svn: 227879
2015-02-02 22:28:46 +00:00
Eric Christopher
cddc0aa5d2 Remove unused class variables and update all callers/uses from
the HexagonSplitTFRCondSet pass. Use the subtarget off the machine
function at the same time.

llvm-svn: 227878
2015-02-02 22:28:44 +00:00
Eric Christopher
3e95522a4f Migrate the HexagonSplitConst32AndConst64 pass from TargetMachine
based getSubtarget to the one cached on the MachineFunction.
Remove unused class variables and update all callers/uses.

llvm-svn: 227874
2015-02-02 22:11:43 +00:00
Eric Christopher
9e0beab8b0 Remove #if'd code and update comment.
llvm-svn: 227873
2015-02-02 22:11:42 +00:00
Eric Christopher
3ccabc821b Move HexagonMachineScheduler to use the subtarget off of the
MachineFunction and update all uses accordingly including
VLIWResourceModel.

llvm-svn: 227872
2015-02-02 22:11:40 +00:00
Eric Christopher
3ce7338e3c Cache and use the subtarget that owns the target lowering.
llvm-svn: 227871
2015-02-02 22:11:36 +00:00
Alexei Starovoitov
d7a5a8cce0 bpf: Use the getSubtarget call off of the MachineFunction rather than the TargetMachine
Summary:
Hi Eric,

this patch cleans up the layering violation that you're fixing across backends.
Anything else I need to fix on bpf backend side?

Thanks

Reviewers: echristo

Reviewed By: echristo

Differential Revision: http://reviews.llvm.org/D7355

llvm-svn: 227865
2015-02-02 21:24:27 +00:00
Jingyue Wu
34a8e5e1ea Resurrect the assertion removed by r227717
Summary: MSVC can compile "LoopID->getOperand(0) == LoopID" when LoopID is MDNode*.

Test Plan: no regression

Reviewers: mkuper

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D7327

llvm-svn: 227853
2015-02-02 20:41:11 +00:00
Eric Christopher
5cb620ce06 Migrate HexagonISelDAGToDAG to setting a subtarget pointer during
runOnMachineFunction. Update all uses of the Subtarget accordingly.

llvm-svn: 227840
2015-02-02 19:22:03 +00:00
Eric Christopher
eabd2452d1 Use the getSubtarget call off of the MachineFunction rather than
the TargetMachine.

llvm-svn: 227839
2015-02-02 19:22:01 +00:00
Eric Christopher
fa6d0be9d4 Remove unused class variables and update calls to get the subtarget
off of the machine function.

llvm-svn: 227837
2015-02-02 19:05:28 +00:00
Eric Christopher
1a48904da5 Sink queries into asserts since the variable is unused otherwise.
llvm-svn: 227836
2015-02-02 18:58:24 +00:00
Eric Christopher
5120f67861 Update CMake build for removed files.
llvm-svn: 227834
2015-02-02 18:52:49 +00:00
Eric Christopher
815e721ecd Get TargetRegisterInfo and TargetInstrInfo off of the MachineFunction
and remove unnecessary class variables.

llvm-svn: 227832
2015-02-02 18:46:31 +00:00
Eric Christopher
5e75c7a7c4 Use the function template getSubtarget to remove an explicit cast.
llvm-svn: 227831
2015-02-02 18:46:29 +00:00
Eric Christopher
201e4d3e21 Grab TargetInstrInfo off of the MachineFunction and remove
unnecessary class variables.

llvm-svn: 227830
2015-02-02 18:46:27 +00:00
Eric Christopher
ef475ed72a Remove unused files.
llvm-svn: 227829
2015-02-02 18:46:23 +00:00
Tom Stellard
5e1403ebec R600/SI: 64-bit and larger memory access must be at least 4-byte aligned
This is true for SI only. CI+ supports unaligned memory accesses,
but this requires driver support, so for now we disallow unaligned
accesses for all GCN targets.

llvm-svn: 227822
2015-02-02 18:02:28 +00:00
Ahmed Bougacha
53ed9373bc [AArch64] Prefer DUP/MOV ("CPY") to INS for vector_extract.
This avoids a partial false dependency on the previous content of
the upper lanes of the destination vector register.

Differential Revision: http://reviews.llvm.org/D7307

llvm-svn: 227820
2015-02-02 17:55:57 +00:00
Eric Christopher
e7361de869 Since TargetLowering is already subtarget dependent just pass
in the subtarget and stash it in the class so that lookups are
easier and safer.

llvm-svn: 227819
2015-02-02 17:52:27 +00:00
Eric Christopher
6ad9db4ce3 Use the function template getSubtarget on the MachineFunction
rather than a larger explicit cast.

llvm-svn: 227818
2015-02-02 17:52:25 +00:00
Eric Christopher
f136200ef8 Remove unused class variable.
llvm-svn: 227817
2015-02-02 17:52:23 +00:00
Eric Christopher
2d72ffac12 Remove unused class variable.
llvm-svn: 227816
2015-02-02 17:52:20 +00:00
Eric Christopher
b584389296 Reuse a bunch of cached subtargets and remove getSubtarget calls
without a Function argument.

llvm-svn: 227814
2015-02-02 17:38:43 +00:00
Eric Christopher
8eda3c8e33 Remove some unused forward declarations.
llvm-svn: 227812
2015-02-02 17:38:37 +00:00
Jan Wen Voung
9bc6ef1872 Fix ARM peephole optimizeCompare to avoid optimizing unsigned cmp to 0.
Summary:
Previously it only avoided optimizing signed comparisons to 0.
Sometimes the DAGCombiner will optimize the unsigned comparisons
to 0 before it gets to the peephole pass, but sometimes it doesn't.

Fix for PR22373.

Test Plan: test/CodeGen/ARM/sub-cmp-peephole.ll

Reviewers: jfb, manmanren

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D7274

llvm-svn: 227809
2015-02-02 16:56:50 +00:00
Hal Finkel
9be22b1e21 [PowerPC] Put PPCEarlyReturn into its own source file
PPCInstrInfo.cpp has ended up containing several small MI-level passes, and
this is making the file harder to read than necessary. Split out
PPCEarlyReturn into its own source file. NFC.

Now that PPCInstrInfo.cpp does not also contain pass implementations, I hope
that it will be slightly less unwieldy.

llvm-svn: 227775
2015-02-01 22:58:46 +00:00
Hal Finkel
1ad18b8c2c [PowerPC] Remove unnecessary include
llvm-svn: 227772
2015-02-01 22:03:13 +00:00
Hal Finkel
287f0c7ac8 [PowerPC] Put PPCVSXCopy into its own source file
PPCInstrInfo.cpp has ended up containing several small MI-level passes, and
this is making the file harder to read than necessary. Split out
PPCVSXCopy into its own source file. NFC.

llvm-svn: 227771
2015-02-01 22:01:29 +00:00
Hal Finkel
a67f242e9a [PowerPC] Put PPCVSXFMAMutate into its own source file
PPCInstrInfo.cpp has ended up containing several small MI-level passes, and
this is making the file harder to read than necessary. Split out
PPCVSXFMAMutate into its own source file. NFC.

llvm-svn: 227770
2015-02-01 21:51:22 +00:00
Hal Finkel
7435b67236 [PowerPC] Remove the PPCVSXCopyCleanup pass
This MI-level pass was necessary when VSX support was first being developed,
specifically, before the ABI code had been updated to use VSX registers for
arguments (the register assignments did not change, in a physical sense, but
the VSX super-registers are now used). Unfortunately, I never went back and
removed this pass after that was done. I believe this code is now effectively
dead.

llvm-svn: 227767
2015-02-01 21:20:58 +00:00
Hal Finkel
63bdc9fc20 [PowerPC] Add implicit ops to conditional returns in PPCEarlyReturn
When PPCEarlyReturn, it should really copy implicit ops from the old return
instruction to the new one. This currently does not matter much, because we run
PPCEarlyReturn very late in the pipeline (there is nothing to do DCE on
definitions of those registers). However, for completeness, we should do it
anyway.

Noticed by inspection (and there should be no functional change); thus, no
test case.

llvm-svn: 227763
2015-02-01 20:16:10 +00:00
Hal Finkel
276d13a6e1 [PowerPC] VSX stores don't also read
The VSX store instructions were also picking up an implicit "may read" from the
default pattern, which was an intrinsic (and we don't currently have a way of
specifying write-only intrinsics).

This was causing MI verification to fail for VSX spill restores.

llvm-svn: 227759
2015-02-01 19:07:41 +00:00
Hal Finkel
4a7dffd074 [PowerPC] Better scheduling for isel on P7/P8
isel is actually a cracked instruction on the P7/P8, and must start a dispatch
group. The scheduling model should reflect this so that we don't bunch too many
of them together when possible.

Thanks to Bill Schmidt and Pat Haugen for helping to sort this out.

llvm-svn: 227758
2015-02-01 17:52:16 +00:00
Michael Kuperstein
41ae9af2e3 [X86] Convert esp-relative movs of function arguments to pushes, step 2
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a 
reserved call frame), and perform rudimentary call folding. It still doesn't 
have a heuristic, so it is enabled only for optsize/minsize, with stack 
alignment <= 8, where it ought to be a fairly clear win.

(Re-commit of r227728)

Differential Revision: http://reviews.llvm.org/D6789

llvm-svn: 227752
2015-02-01 16:56:04 +00:00
Michael Kuperstein
f73ce6a4c9 Revert r227728 due to bad line endings.
llvm-svn: 227746
2015-02-01 16:15:07 +00:00