1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00
Commit Graph

17233 Commits

Author SHA1 Message Date
Renato Golin
aea9ec6761 Revert 202433 - Provide a target override for the latest regalloc heuristic
That commit was introduced in order to help investigate a problem in ARM
codegen breaking from commit 202304 (Add a limit to the heuristic that register
allocates instructions in local order). Recent analisys indicated that the
problem no longer exists, so I'm reverting this change.

See PR18996.

llvm-svn: 218981
2014-10-03 12:20:53 +00:00
Chandler Carruth
48b57382a8 Fix the threshold added in r186434 (a re-apply of r185393) and updaated
to be a ManagedStatic in r218163 to not be a global variable written and
read to from within the innards of SpillPlacement.

This will fix a really scary race condition for anyone that has two
copies of LLVM running spill placement concurrently. Yikes!

This will also fix a really significant compile time hit that r218163
caused because the spill placement threshold read is actually in the
*very* hot path of this code. The memory fence on each read was showing
up as huge compile time regressions when spilling is responsible for
most of the compile time. For example, optimizing sanitized code showed
over 50% compile time regressions here. =/

llvm-svn: 218921
2014-10-02 22:23:14 +00:00
Duncan P. N. Exon Smith
fb6bcc4eb2 Revert "DI: Fold constant arguments into a single MDString"
This reverts commit r218914 while I investigate some bots.

llvm-svn: 218918
2014-10-02 22:15:31 +00:00
Duncan P. N. Exon Smith
58b6077a79 DI: Fold constant arguments into a single MDString
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 218914
2014-10-02 21:56:57 +00:00
Adrian Prantl
2b1df58ebe Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
2014-10-01 18:55:02 +00:00
Adrian Prantl
0959156fa3 Revert r218778 while investigating buldbot breakage.
"Move the complex address expression out of DIVariable and into an extra"

llvm-svn: 218782
2014-10-01 18:10:54 +00:00
Adrian Prantl
229943585f Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

llvm-svn: 218778
2014-10-01 17:55:39 +00:00
Jingyue Wu
784352be06 Revert r216862 due to a performance regression
Reported by Alexey Volkov in PR21115

llvm-svn: 218771
2014-10-01 15:22:13 +00:00
David Blaikie
fe64c97aaa Implement DW_TAG_subrange_type with DW_AT_count rather than DW_AT_upper_bound
This allows proper disambiguation of unbounded arrays and arrays of zero
bound ("struct foo { int x[]; };" and "struct foo { int x[0]; }"). GCC
instead produces an upper bound of -1 in the latter situation, but count
seems tidier. This way lower_bound is provided if it's not the language
default and count is provided if the count is known, otherwise it's
omitted. Simple.

If someone wants to look at rdar://problem/12566646 and see if this
change is acceptable to that bug/fix, that might be helpful (see the
empty-and-one-elem-array.ll test case which cites that radar).

llvm-svn: 218726
2014-10-01 00:56:55 +00:00
David Blaikie
5720297fff Omit DW_AT_inline under -gmlt to save a little more space.
llvm-svn: 218719
2014-09-30 23:29:16 +00:00
David Blaikie
0fda7951bb DebugInfo: Sink the code emitting DW_AT_APPLE_omit_frame_ptr down to a more common spot.
No functional change. Pre-emptive refactoring before I start pushing
some of this subprogram creation down into DWARFCompileUnit so I can
build different subprograms in the skeleton unit from the dwo unit for
adding -gmlt-like data to the skeleton.

llvm-svn: 218713
2014-09-30 22:32:49 +00:00
David Blaikie
c3f2904ef4 Disable the -gmlt optimization implemented in r218129 under Darwin due to issues with dsymutil.
r218129 omits DW_TAG_subprograms which have no inlined subroutines when
emitting -gmlt data. This makes -gmlt very low cost for -O0 builds.

Darwin's dsymutil reasonably considers a CU empty if it has no
subprograms (which occurs with the above optimization in -O0 programs
without any force_inline function calls) and drops the line table, CU,
and everything in this situation, making backtraces impossible.

Until dsymutil is modified to account for this, disable this
optimization on Darwin to preserve the desired functionality.
(see r218545, which should be reverted after this patch, for other
discussion/details)

Footnote:
In the long term, it doesn't look like this scheme (of simplified debug
info to describe inlining to enable backtracing) is tenable, it is far
too size inefficient for optimized code (the DW_TAG_inlined_subprograms,
even once compressed, are nearly twice as large as the line table
itself (also compressed)) and we'll be considering things like Cary's
two level line table proposal to encode all this information directly in
the line table.

llvm-svn: 218702
2014-09-30 21:28:32 +00:00
Sanjay Patel
abcb3acee5 Use the target-specified iteration count to opt out of any further refinement of an estimate. NFC.
llvm-svn: 218700
2014-09-30 20:44:23 +00:00
Sanjay Patel
c095454ca2 Split the estimate() interface into separate functions for each type. NFC.
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use it's 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...

llvm-svn: 218698
2014-09-30 20:28:48 +00:00
Andrea Di Biagio
a5f619dff6 [DAG] Check in advance if a build_vector has a legal type before attempting to convert it into a shuffle.
Currently, the DAG Combiner only tries to convert type-legal build_vector nodes
into shuffles. This patch simply moves the logic that checks if a
build_vector has a legal value type up before we even start analyzing the
operands. This allows to early exit immediately from method
'visitBUILD_VECTOR' if the node type is known to be illegal.

No functional change intended.

llvm-svn: 218677
2014-09-30 15:30:22 +00:00
Matt Arsenault
23f3740054 Add MachineOperand::ChangeToFPImmediate and setFPImm
llvm-svn: 218579
2014-09-28 19:24:59 +00:00
James Molloy
3c39c7c1b4 [AArch64] Redundant store instructions should be removed as dead code
If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed.

This problem is found in spec2006-197.parser.

For example,
  stur    w10, [x11, #-4]
  stur    w10, [x11, #-4]
Then one of the two stur instructions can be removed.

Patch by David Xu!

llvm-svn: 218569
2014-09-27 17:02:54 +00:00
Sanjay Patel
98a98574c5 Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2).
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

And:

z = y / x

into:

z = y * rcpe(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.

Differential Revision: http://reviews.llvm.org/D5484

llvm-svn: 218553
2014-09-26 23:01:47 +00:00
David Xu
c351ac640d Revert patch ofr218493
llvm-svn: 218494
2014-09-26 02:28:03 +00:00
David Xu
43a5d5bdc1 Redundant store instructions should be removed as dead code
llvm-svn: 218493
2014-09-26 02:02:09 +00:00
Eric Christopher
3b65e1ff31 Move resetTargetOptions from taking a MachineFunction to a Function
since we are accessing the TargetMachine that we're a member
function of.

llvm-svn: 218489
2014-09-26 01:28:10 +00:00
Bruno Cardoso Lopes
d68585f076 [MachineSink+PGO] Teach MachineSink to use BlockFrequencyInfo
Machine Sink uses loop depth information to select between successors BBs to
sink machine instructions into, where BBs within smaller loop depths are
preferable.  This patch adds support for choosing between successors by using
profile information from BlockFrequencyInfo instead, whenever the information
is available.

Tested it under SPEC2006 train (average of 30 runs for each program); ~1.5%
execution speedup in average on x86-64 darwin.

<rdar://problem/18021659>

llvm-svn: 218472
2014-09-25 23:14:26 +00:00
Tom Stellard
fbc414f7ff SelectionDAG: Remove #if NDEBUG from check for a post-isel hook
The InstrEmitter will skip the check of MI.hasPostISelHook()
before calling AdjustInstrPostInstrSelection() when NDEBUG
is not defined.

This was added in r140228, and I'm not sure if it is intentional or not,
but it is a likely source for bugs, because it means with
Release+Asserts builds you can forget to set the hasPostISelHook
flag on TableGen definitions and AdjustInstrPostInstrSelection() will
still be called.

llvm-svn: 218458
2014-09-25 18:59:22 +00:00
Robin Morisset
98b0bed638 Lower idempotent RMWs to fence+load
Summary:
I originally tried doing this specifically for X86 in the backend in D5091,
but it was rather brittle and generally running too late to be general.
Furthermore, other targets may want to implement similar optimizations.
So I reimplemented it at the IR-level, fitting it into AtomicExpandPass
as it interacts with that pass (which could not be cleanly done before
at the backend level).

This optimization relies on a new target hook, which is only used by X86
for now, as the correctness of the optimization on other targets remains
an open question. If it is found correct on other targets, it should be
trivial to enable for them.

Details of the optimization are discussed in D5091.

Test Plan: make check-all + a new test

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5422

llvm-svn: 218455
2014-09-25 17:27:43 +00:00
Jiangning Liu
92ab0f5880 Clear PreferredExtendType for in each function-specific state FunctionLoweringInfo.
llvm-svn: 218364
2014-09-24 03:22:56 +00:00
Robin Morisset
1d079fe807 [X86] Make wide loads be managed by AtomicExpand
Summary:
AtomicExpand already had logic for expanding wide loads and stores on LL/SC
architectures, and for expanding wide stores on CmpXchg architectures, but
not for wide loads on CmpXchg architectures. This patch fills this hole,
and makes use of this new feature in the X86 backend.

Only one functionnal change: we now lose the SynchScope attribute.
It is regrettable, but I have another patch that I will submit soon that will
solve this for all of AtomicExpand (it seemed better to split it apart as it
is a different concern).

Test Plan: make check-all (lots of tests for this functionality already exist)

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5404

llvm-svn: 218332
2014-09-23 20:59:25 +00:00
Robin Morisset
a7da34c778 Add AtomicExpandPass::bracketInstWithFences, and use it whenever getInsertFencesForAtomic would trigger in SelectionDAGBuilder
Summary:
The goal is to eventually remove all the code related to getInsertFencesForAtomic
in SelectionDAGBuilder as it is wrong (designed for ARM, not really portable, works
mostly by accident because the backends are overly conservative), and repeats the
same logic that goes in emitLeading/TrailingFence.

In this patch, I make AtomicExpandPass insert the fences as it knows better
where to put them. Because this requires getting the fences and not just
passing an IRBuilder around, I had to change the return type of
emitLeading/TrailingFence.
This code only triggers on ARM for now. Because it is earlier in the pipeline
than SelectionDAGBuilder, it triggers and lowers atomic accesses to atomic so
SelectionDAGBuilder does not add barriers anymore on ARM.

If this patch is accepted I plan to implement emitLeading/TrailingFence for all
backends that setInsertFencesForAtomic(true), which will allow both making them
less conservative and simplifying SelectionDAGBuilder once they are all using
this interface.

This should not cause any functionnal change so the existing tests are used
and not modified.

Test Plan: make check-all, benefits from existing tests of atomics on ARM

Reviewers: jfb, t.p.northover

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5179

llvm-svn: 218329
2014-09-23 20:31:14 +00:00
Lang Hames
f3ae47a1c7 [MCJIT] Nuke MachineRelocation and MachineCodeEmitter. Now that the old JIT is
gone they're no longer needed.

llvm-svn: 218320
2014-09-23 18:08:47 +00:00
Sanjay Patel
14a8712c3d Use SDValue bool operator to reduce code. No functional change.
llvm-svn: 218314
2014-09-23 16:24:20 +00:00
David Majnemer
c7ddf568e6 MC: ReadOnlyWithRel section kinds should map to rdata in COFF
Don't consider ReadOnlyWithRel as a writable section in COFF, they
really belong in .rdata.

llvm-svn: 218268
2014-09-22 20:39:23 +00:00
Sanjay Patel
1235f854b3 Refactor reciprocal square root estimate into target-independent function; NFC.
This is purely a plumbing patch. No functional changes intended.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner.

Next steps:

    The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added.
    Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner.
    X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added.

Differential Revision: http://reviews.llvm.org/D5425

llvm-svn: 218219
2014-09-21 15:19:15 +00:00
Sanjay Patel
6ce3689f46 mop up: "Don’t duplicate function or class name at the beginning of the comment."
llvm-svn: 218218
2014-09-21 14:48:16 +00:00
Sanjay Patel
48e944fd89 mop up: "Don’t duplicate function or class name at the beginning of the comment."
llvm-svn: 218194
2014-09-20 22:39:16 +00:00
David Majnemer
8ffc0f9fcb MC: Treat ReadOnlyWithRel and ReadOnlyWithRelLocal as ReadOnly for COFF
A problem with our old behavior becomes observable under x86-64 COFF
when we need a read-only GV which has an initializer which is referenced
using a relocation: we would mark the section as writable.  Marking the
section as writable interferes with section merging.

This fixes PR21009.

llvm-svn: 218179
2014-09-20 07:31:46 +00:00
Peter Collingbourne
203d8801c7 Fix crash with an insertvalue that produces an empty object.
llvm-svn: 218171
2014-09-20 00:10:47 +00:00
Chris Bieneman
a553d90b36 Converting SpillPlacement's BlockFrequency threshold to a ManagedStatic to avoid static constructors and destructors.
llvm-svn: 218163
2014-09-19 22:46:28 +00:00
David Blaikie
ab086ab341 Omit DW_TAG_subprograms for subprograms without inlined subroutines when producing -gmlt data
To reduce the size of -gmlt data, skip the subprograms without any
inlined subroutines. Since we've now got the ability to make these
determinations in the backend (funnily enough - we added the flag so we
wouldn't produce ranges under -gmlt, but with this change we use the
flag, but go back to producing ranges under -gmlt).

Instead, just produce CU ranges to inform the consumer which parts of
the code are described by this CU's line table. Tools could inspect the
line table directly to compute the range, but the CU ranges only seem to
be about 0.5% of object/executable size, so I'm not too worried about
teaching llvm-symbolizer that trick just yet - it's certainly a possible
piece of future work.

Update an llvm-symbolizer test just to demonstrate that this schema is
acceptable there (if it wasn't, the compiler-rt tests would catch this,
but good to have an in-llvm-tree test for llvm-symbolizer's behavior
here)

Building the clang binary with -gmlt with this patch reduces the total
size of object files by 5.1% (5.56% without ranges) without compression
and the executable by 4.37% (4.75% without ranges).

llvm-svn: 218129
2014-09-19 17:03:16 +00:00
Frederic Riss
f436abc737 Change DwarfCompileUnit::createGlobalVariable to getOrCreateGlobalVariable.
Summary:
This will allow to request the creation of a forward delacred variable
at is point of use (for imported declarations, this will be
DwarfDebug::constructImportedEntityDIE) rather than having to put the
forward decl in a retention list.

Note that getOrCreateGlobalVariable returns the actual definition DIE when the
routine creates a declaration and a definition DIE. If you agree this is the
right behavior, then I'll have a followup patch that registers the definition
in the DIE map instead of the declaration as it is today (this 'breaks' only
one test, where we test that the imported entity is the declaration). I'm
not sure what's best here, but it's easy enough for a consumer to follow the
DW_AT_specification link to get to the declaration, whereas it takes more
work to find the actual definition from a declaration DIE.

Reviewers: echristo, dblaikie, aprantl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5381

llvm-svn: 218126
2014-09-19 15:12:03 +00:00
Hal Finkel
0c0c256ad7 Optionally enable more-aggressive FMA formation in DAGCombine
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction, and furthermore, forming the FMA leaving the FMUL can still allow
for higher overall throughput and decreased critical-path length.

Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.

Patch by Olivier Sallenave, thanks!

llvm-svn: 218120
2014-09-19 11:42:56 +00:00
Jiangning Liu
9dd58d584c Optimize sext/zext insertion algorithm in back-end.
With this optimization, we will not always insert zext for values crossing
basic blocks, but insert sext if the users of a value crossing basic block
has preference of sign predicate.

llvm-svn: 218101
2014-09-19 05:30:35 +00:00
David Blaikie
8a66037575 Omit DW_AT_frame_base under -gmlt for size
llvm-svn: 218100
2014-09-19 04:55:05 +00:00
David Blaikie
1a16b76ce1 Describe the -gmlt optimization committed in the previous revision.
llvm-svn: 218099
2014-09-19 04:47:46 +00:00
David Blaikie
5019d606b7 Omit all the extra static attributes on subprograms in -gmlt
This omission will be done in a fancier manner once we're dealing with
"put gmlt in the skeleton CUs under fission" - it'll have to be
conditional on the kind of CU we're emitting into (skeleton or gmlt).

llvm-svn: 218098
2014-09-19 04:30:36 +00:00
Hans Wennborg
0eae0ac98d Fix an it's vs. its typo.
llvm-svn: 218093
2014-09-19 01:14:56 +00:00
Frederic Riss
b1025475ae Revert part of r218041.
The patch moved some logic around in an attempt to generate potentially more
DW_AT_declaration attributes. The patch was flawed though and it stopped
generating the attribute in some cases.

llvm-svn: 218060
2014-09-18 16:41:04 +00:00
Frederic Riss
79b1a240fa Always emit DW_AT_declaration attribute when the variable isn't a definition.
Summary:
This doesn't show up today as we don't emit decalration only variables. This
will be tested when the followup patches implementing import of forward
declared entities lands in clang.

Reviewers: echristo, dblaikie, aprantl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5382

llvm-svn: 218041
2014-09-18 09:38:23 +00:00
Eric Christopher
c75fbbac7c Add a new pass FunctionTargetTransformInfo. This pass serves as a
shim between the TargetTransformInfo immutable pass and the Subtarget
via the TargetMachine and Function. Migrate a single call from
BasicTargetTransformInfo as an example and provide shims where TargetMachine
begins taking a Function to determine the subtarget.

No functional change.

llvm-svn: 218004
2014-09-18 00:34:14 +00:00
Robin Morisset
4c9d292205 [X86] Use the generic AtomicExpandPass instead of X86AtomicExpandPass
This required a new hook called hasLoadLinkedStoreConditional to know whether
to expand atomics to LL/SC (ARM, AArch64, in a future patch Power) or to
CmpXchg (X86).

Apart from that, the new code in AtomicExpandPass is mostly moved from
X86AtomicExpandPass. The main result of this patch is to get rid of that
pass, which had lots of code duplicated with AtomicExpandPass.

llvm-svn: 217928
2014-09-17 00:06:58 +00:00
Quentin Colombet
324286d9e6 [CodeGenPrepare][AddressingModeMatcher] The promotion mechanism was expecting
instructions when truncate, sext, or zext were created. Fix that.

llvm-svn: 217926
2014-09-16 22:36:07 +00:00
Owen Anderson
dded4a7289 Add back a fallback case for targets that do not or cannot implement getNoopForMachoTarget().
llvm-svn: 217899
2014-09-16 20:28:00 +00:00