mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 19:12:56 +02:00
llvm-mirror/test
Bill Schmidt c18bf56926 [PowerPC] Yet another approach to __tls_get_addr
This patch is a third attempt to properly handle the local-dynamic and
global-dynamic TLS models.

In my original implementation, calls to __tls_get_addr were hidden
from view until the asm-printer phase, at which point the underlying
branch-and-link instruction was created with proper relocations.  This
mostly worked well, but I used some repellent techniques to ensure
that the GET_TLS_ADDR nodes at the SD and MI levels correctly received
input from GPR3 and produced output into GPR3.  This proved to work
badly in the presence of multiple TLS variable accesses, with the
copies to and from GPR3 being scheduled incorrectly and generally
creating havoc.

In r221703, I addressed that problem by representing the calls to
__tls_get_addr as true calls during instruction lowering.  This had
the advantage of removing all of the bad hacks and relying on the
existing call machinery to properly glue the copies in place. It
looked like this was going to be the right way to go.

However, as a side effect of the recent discovery of problems with
linker optimizations for TLS, we discovered cases of suboptimal code
generation with this strategy.  The problem comes when __tls_get_addr is
called for the same address, and there is a resulting CSE
opportunity.  It turns out that in such cases MachineCSE will common
the addis/addi instructions that set up the input value to
__tls_get_addr, but will not common the calls themselves.  MachineCSE
does not have any machinery to common idempotent calls.  This is
perfectly sensible, since presumably this would be done at the IR
level, and introducing calls in the back end isn't commonplace.  In
any case, we end up with two calls to __tls_get_addr when one would
suffice, and that isn't good.
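
As an illustration of the kind of source that exposes this, here is a
minimal sketch.  The variable and function names are hypothetical and do
not come from the patch or from the PrettyStackTrace module.  When code
like this is built as position-independent code under the general-dynamic
TLS model, each access to the thread-local variable may be lowered to its
own address computation ending in a call to __tls_get_addr; whether one
or two calls are actually emitted depends on the compiler version, target,
and flags.

```cpp
#include <cassert>

// Hypothetical thread-local counter; the name is illustrative only.
thread_local int tls_counter = 0;

// Two accesses to the same thread-local variable in one function.
// Both accesses need the same address, so the second address
// computation is a candidate for common-subexpression elimination.
int bump_twice() {
    tls_counter += 1;  // first access to the address of tls_counter
    tls_counter += 2;  // second access to the same address
    return tls_counter;
}
```

Each thread starts with its own copy of tls_counter at zero, so the first
call on a thread returns 3 and the second returns 6.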

I presumed that the original design would have allowed commoning of
the machine-specific nodes that hid the __tls_get_addr calls, so as
suggested by Ulrich Weigand, I went back to that design and cleaned it
up so that the copies were properly held together by glue
nodes.  However, it turned out that this didn't work either: the
presence of copies to physical registers also kept the machine-specific
nodes from being commoned.

All of which leads to the design presented here.  This is a return to
the original design, except that no attempt is made to introduce
copies to and from GPR3 during instruction lowering.  Virtual registers
are used until just before register allocation.  At that point, a special
pass is run that identifies the machine-specific nodes that hide the
__tls_get_addr calls and introduces the copies to and from GPR3 around
them.  The register allocator then coalesces these copies away.  With
this design, MachineCSE succeeds in commoning __tls_get_addr calls where
possible, and we get nice optimal code generation (better than GCC at
the moment, which does not common these calls).
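
The ordering argument can be sketched with a toy model.  This is not
LLVM code and does not reproduce MachineCSE's real rules: the Inst
struct, the opcode strings "addi_setup" and "tls_call", and the
functions toyCSE and addR3Copies are all hypothetical, and the only
assumption carried over from the description above is that instructions
touching the physical register are not commoned.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy instruction: opcode, defined register, single input register.
// Names starting with '%' are virtual registers; "r3" is physical.
struct Inst {
    std::string op, def, use;
};

// Toy CSE: drop an instruction whose (op, use) pair was already seen and
// rewrite later uses of its result. Anything touching "r3" is left alone.
std::vector<Inst> toyCSE(const std::vector<Inst>& in) {
    std::map<std::pair<std::string, std::string>, std::string> seen;
    std::map<std::string, std::string> rename;
    std::vector<Inst> out;
    for (Inst i : in) {
        auto r = rename.find(i.use);
        if (r != rename.end()) i.use = r->second;
        bool physical = (i.def == "r3" || i.use == "r3");
        auto key = std::make_pair(i.op, i.use);
        auto hit = seen.find(key);
        if (!physical && hit != seen.end()) {
            rename[i.def] = hit->second;  // reuse the earlier result
            continue;
        }
        if (!physical) seen[key] = i.def;
        out.push_back(i);
    }
    return out;
}

// Toy version of the late pass: wrap each "tls_call" in copies to/from r3.
std::vector<Inst> addR3Copies(const std::vector<Inst>& in) {
    std::vector<Inst> out;
    for (const Inst& i : in) {
        if (i.op != "tls_call") { out.push_back(i); continue; }
        out.push_back({"COPY", "r3", i.use});
        out.push_back({"tls_call", "r3", "r3"});
        out.push_back({"COPY", i.def, "r3"});
    }
    return out;
}

int countCalls(const std::vector<Inst>& v) {
    int n = 0;
    for (const Inst& i : v) if (i.op == "tls_call") ++n;
    return n;
}

// Two accesses to the same TLS address, expressed in virtual registers.
std::vector<Inst> twoAccesses() {
    return {{"addi_setup", "%1", "%toc"}, {"tls_call", "%2", "%1"},
            {"addi_setup", "%3", "%toc"}, {"tls_call", "%4", "%3"}};
}
```

In this model, countCalls(addR3Copies(toyCSE(twoAccesses()))) is 1,
while running the passes in the other order,
countCalls(toyCSE(addR3Copies(twoAccesses()))), leaves 2: the setup
instruction is still commoned, but the calls pinned to r3 are not.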

One additional problem must be dealt with:  After introducing the
mentions of the physical register GPR3, the aggressive anti-dependence
breaker sees opportunities to improve scheduling by selecting a
different register instead.  Flags must be used on the instruction
descriptions to tell the anti-dependence breaker to keep its hands in
its pockets.

One thing missing from the original design was recording a definition
of the link register on the GET_TLS_ADDR nodes.  Doing this was found
to be insufficient to force a stack frame to be created, which led to
looping behavior because two different LR values were stored at the
same address.  This appears to have been an oversight in
PPCFrameLowering::determineFrameLayout(), which is repaired here.

Because MustSaveLR() returns true for calls to __builtin_return_address,
this changed the expected behavior of
test/CodeGen/PowerPC/retaddr2.ll, which now stacks a frame but
formerly did not.  I've fixed the test case to reflect this.

There are existing TLS tests to catch regressions; the checks in
test/CodeGen/PowerPC/tls-store2.ll proved to be too restrictive in the
face of instruction scheduling with these changes, so I fixed that
up.

I've added a new test case based on the PrettyStackTrace module that
demonstrated the original problem. This checks that we get correct
code generation and that CSE of the calls to __tls_get_addr has taken
place.

llvm-svn: 227976
2015-02-03 16:16:01 +00:00
Analysis [PM] Change the core design of the TTI analysis to use a polymorphic 2015-01-31 03:43:40 +00:00
Assembler IR: Update references to temporaries before deleting 2015-01-22 21:36:45 +00:00
Bindings Propagate a better error message to the C api. 2015-02-03 01:53:03 +00:00
Bitcode Check bit widths before trying to get a type. 2015-01-30 18:13:50 +00:00
BugPoint IR: Move MDLocation into place 2015-01-14 22:27:36 +00:00
CodeGen [PowerPC] Yet another approach to __tls_get_addr 2015-02-03 16:16:01 +00:00
DebugInfo Debug Info: Relax assertion in isUnsignedDIType() to allow floats to be 2015-02-02 18:31:58 +00:00
ExecutionEngine [Orc] Make OrcMCJITReplacement::addObject calls transfer buffer ownership to the 2015-02-02 19:51:18 +00:00
Feature IR: Move MDLocation into place 2015-01-14 22:27:36 +00:00
FileCheck
Instrumentation tsan: properly instrument unaligned accesses 2015-01-27 20:19:17 +00:00
Integer [tests] Cleanup initialization of test suffixes. 2013-08-16 00:37:11 +00:00
JitListener IR: Move MDLocation into place 2015-01-14 22:27:36 +00:00
Linker IR: Move MDLocation into place 2015-01-14 22:27:36 +00:00
LTO Introduce llvm/test/LTO/X86. LTO tests may be assumed as target-specific. 2015-01-30 10:09:26 +00:00
MC [X86] Make fxsave64/fxrstor64/xsave64/xsrstor64/xsaveopt64 parseable in AT&T syntax. Also make them the default output. 2015-02-03 11:03:57 +00:00
Object [ELFYAML] Provide default value 0 for YAML relocation addendum field 2015-01-29 06:56:24 +00:00
Other [PM] Teach the module-to-function adaptor to not run function passes 2015-02-01 10:47:25 +00:00
SymbolRewriter SymbolRewriter: allow rewriting with comdats 2015-01-27 22:57:39 +00:00
TableGen
tools llvm-readobj: add a test case for ARM_MOV32(T) base relocation 2015-01-31 04:46:50 +00:00
Transforms Fix: SLPVectorizer crashes with assertion when vectorizing a cmp instruction. 2015-02-02 12:45:34 +00:00
Unit
Verifier Fix statepoint verifier tests to actually test verifier. 2015-01-30 23:18:42 +00:00
YAMLParser
.clang-format
CMakeLists.txt Revert r224149, llvm-dsymutil was already here. 2014-12-12 21:25:07 +00:00
lit.cfg llvm/test/lit.cfg: have_ld_plugin_support(): Use decode() for stdout. 2015-01-05 14:18:04 +00:00
lit.site.cfg.in Reverting r226937: lit: Make MCJIT's supported arch check case insensitive 2015-01-24 01:42:44 +00:00
Makefile [lit] Make config.llvm_lib_dir available on cmake, too. 2014-12-30 03:24:11 +00:00
Makefile.tests
TestRunner.sh