The method PerformCodePlacement was doing too much (i.e. 3x loops, lots of
different checking). This refactoring separates the analysis section of the
method into a separate function while leaving the actual code placement and
analysis preparation in PerformCodePlacement.
*NOTE* Really this part of ObjCARC should be refactored out of the main pass
class into its own seperate class/struct. But, it is not time to make that
change yet though (don't want to make such an invasive change without fixing all
of the bugs first).
llvm-svn: 173201
- Add list of physical registers clobbered in pseudo atomic insts
Physical registers are clobbered when pseudo atomic instructions are
expanded. Add them in clobber list to prevent DAG scheduler to
mis-schedule them after these insns are declared side-effect free.
- Add test case from Michael Kuperstein <michael.m.kuperstein@intel.com>
llvm-svn: 173200
the body does not use them and it appears the body has positional parameters.
This can cause unexpected results as in the added test case. As the darwin
version of gas(1) which only supported positional parameters, happened to
ignore the named parameters. Now that we want to support both styles of
macros we issue a warning in this specific case.
rdar://12861644
llvm-svn: 173199
Use the AttributeSet when we're talking about more than one attribute. Add a
function that adds a single attribute. No functionality change intended.
llvm-svn: 173196
Add the x32 environment kind to the triple, and separate the concept of
pointer size and callee save stack slot size, since they're not equal
on x32.
llvm-svn: 173175
generic function calls and intrinsics. This is somewhat overlapping with
an existing intrinsic cost method, but that one seems targetted at
vector intrinsics. I'll merge them or separate their names and use cases
in a separate commit.
This sinks the test of 'callIsSmall' down into TTI where targets can
control it. The whole thing feels very hack-ish to me though. I've left
a FIXME comment about the fundamental design problem this presents. It
isn't yet clear to me what the users of this function *really* care
about. I'll have to do more analysis to figure that out. Putting this
here at least provides it access to proper analysis pass tools and other
such. It also allows us to more cleanly implement the baseline cost
interfaces in TTI.
With this commit, it is now theoretically possible to simplify much of
the inline cost analysis's handling of calls by calling through to this
interface. That conversion will have to happen in subsequent commits as
it requires more extensive restructuring of the inline cost analysis.
The CodeMetrics class is now really only in the business of running over
a block of code and aggregating the metrics on that block of code, with
the actual cost evaluation done entirely in terms of TTI.
llvm-svn: 173148
Previously we tried to infer it from the bit width size, with an added
IsIEEE argument for the PPC/IEEE 128-bit case, which had a default
value. This default value allowed bugs to creep in, where it was
inappropriate.
llvm-svn: 173138
This is more code to isolate the use of the Attribute class to that of just
holding one attribute instead of a collection of attributes.
llvm-svn: 173094
BLOB (i.e., large, performance intensive data) in a bitcode file was switched to
invoking one virtual method call per byte read. Now we do one virtual call per
BLOB.
llvm-svn: 173065
A SparseMultiSet adds multiset behavior to SparseSet, while retaining SparseSet's desirable properties. Essentially, SparseMultiSet provides multiset behavior by storing its dense data in doubly linked lists that are inlined into the dense vector. This allows it to provide good data locality as well as vector-like constant-time clear() and fast constant time find(), insert(), and erase(). It also allows SparseMultiSet to have a builtin recycler rather than keeping SparseSet's behavior of always swapping upon removal, which allows it to preserve more iterators. It's often a better alternative to a SparseSet of a growable container or vector-of-vector.
llvm-svn: 173064
Patch by: Michel Dänzer
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 173053
Patch by: Michel Dänzer
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 173052
Patch by: Michel Dänzer
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 173051
is free. The whole CodeMetrics API should probably be reworked more, but
this is enough to allow deleting the duplicate code there for computing
whether an instruction is free.
All of the passes using this have been updated to pull in TTI and hand
it to the CodeMetrics stuff. Further, a dead CodeMetrics API
(analyzeFunction) is nuked for lack of users.
llvm-svn: 173036
analysis. How cute that it wasn't previously. ;]
Part of this confusion stems from the flattened header file tree. Thanks
to Benjamin for pointing out the goof on IRC, and we're considering
un-flattening the headers, so speak now if that would bug you.
llvm-svn: 173033
old CodeMetrics system. TTI has the specific advantage of being
extensible and customizable by targets to reflect target-specific cost
metrics.
llvm-svn: 173032
depend on and use other analyses (as long as they're either immutable
passes or CGSCC passes of course -- nothing in the pass manager has been
fixed here). Leverage this to thread TargetTransformInfo down through
the inline cost analysis.
No functionality changed here, this just threads things through.
llvm-svn: 173031
a dynamic analysis done on each call to the routine. However, now it can
use the standard pass infrastructure to reference other analyses,
instead of a silly setter method. This will become more interesting as
I teach it about more analysis passes.
This updates the two inliner passes to use the inline cost analysis.
Doing so highlights how utterly redundant these two passes are. Either
we should find a cheaper way to do always inlining, or we should merge
the two and just fiddle with the thresholds to get the desired behavior.
I'm leaning increasingly toward the latter as it would also remove the
Inliner sub-class split.
llvm-svn: 173030
lowered cost.
Currently, this is a direct port of the logic implementing
isInstructionFree in CodeMetrics. The hope is that the interface can be
improved (f.ex. supporting un-formed instruction queries) and the
implementation abstracted so that as we have test cases and target
knowledge we can expose increasingly accurate heuristics to clients.
I'll start switching existing consumers over and kill off the routine in
CodeMetrics in subsequent commits.
llvm-svn: 172998
It is not possible to distinguish 3r instructions from 2r / rus instructions
using only the fixed bits. Therefore if an instruction doesn't match the
2r / rus format try to decode it as a 3r instruction before returning Fail.
llvm-svn: 172984
The optimization handles esoteric cases but adds a lot of complexity both to the X86 backend and to other backends.
This optimization disables an important canonicalization of chains of SEXT nodes and makes SEXT and ZEXT asymmetrical.
Disabling the canonicalization of consecutive SEXT nodes into a single node disables other DAG optimizations that assume
that there is only one SEXT node. The AVX mask optimizations is one example. Additionally this optimization does not update the cost model.
llvm-svn: 172968
We ignore the cpu frontend and focus on pipeline utilization. We do this because we
don't have a good way to estimate the loop body size at the IR level.
llvm-svn: 172964
We weren't encoding boolean constants correctly due to modeling boolean as a
signed type & then sign extending an i1 up to a byte & getting 255.
llvm-svn: 172926
through a BitstreamCursor that produce it: advance() and
advanceSkippingSubblocks(), representing the two most common ways clients
want to walk through bitcode.
llvm-svn: 172919
This separates the check for "too few elements to run the vector loop" from the
"memory overlap" check, giving a lot nicer code and allowing to skip the memory
checks when we're not going to execute the vector code anyways. We still leave
the decision of whether to emit the memory checks as branches or setccs, but it
seems to be doing a good job. If ugly code pops up we may want to emit them as
separate blocks too. Small speedup on MultiSource/Benchmarks/MallocBench/espresso.
Most of this is legwork to allow multiple bypass blocks while updating PHIs,
dominators and loop info.
llvm-svn: 172902
but I cannot reproduce the problem and have scrubed my sources and
even tested with llvm-lit -v --vg.
Formatting fixes. Mostly long lines and
blank spaces at end of lines.
Contributer: Jack Carter
llvm-svn: 172882
but I cannot reproduce the problem and have scrubed my sources and
even tested with llvm-lit -v --vg.
Support for Mips register information sections.
Mips ELF object files have a section that is dedicated
to register use info. Some of this information such as
the assumed Global Pointer value is used by the linker
in relocation resolution.
The register info file is .reginfo in o32 and .MIPS.options
in 64 and n32 abi files.
This patch contains the changes needed to create the sections,
but leaves the actual register accounting for a future patch.
Contributer: Jack Carter
llvm-svn: 172847
Some instructions like memory reads/writes are executed
asynchronously, so we need to insert S_WAITCNT instructions
to block before accessing their results. Previously we have
just inserted S_WAITCNT instructions after each async
instruction, this patch fixes this and adds a prober
insertion pass.
Patch by: Christian König
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
llvm-svn: 172846
We shouldn't insert KILL optimization if we don't have a
kill instruction at all.
Patch by: Christian König
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
llvm-svn: 172845
Because the Attribute class is going to stop representing a collection of
attributes, limit the use of it as an aggregate in favor of using AttributeSet.
This replaces some of the uses for querying the function attributes.
llvm-svn: 172844
but I cannot reproduce the problem and have scrubed my sources and
even tested with llvm-lit -v --vg.
Removal of redundant code and formatting fixes.
Contributers: Jack Carter/Vladimir Medic
llvm-svn: 172842
Okay, here's how to reproduce the problem:
1) Build a Release (or Release+Asserts) version of clang in the normal way.
2) Using the clang & clang++ binaries from (1), build a Release (or
Release+Asserts) version of the same sources, but this time enable LTO ---
specify the `-flto' flag on the command line.
3) Run the ARC migrator tests:
$ arcmt-test --args -triple x86_64-apple-darwin10 -fsyntax-only -x objective-c++ ./src/tools/clang/test/ARCMT/cxx-rewrite.mm
You'll see that the output isn't correct (the whitespace is off).
The mis-compile is in the function `RewriteBuffer::RemoveText' in the
clang/lib/Rewrite/Core/Rewriter.cpp file. When that function and RewriteRope.cpp
are compiled with LTO and the `arcmt-test' executable is regenerated, you'll see
the error. When those files are not LTO'ed, then the output of the `arcmt-test'
is fine.
It is *really* hard to get a testcase out of this. I'll file a PR with what I
have currently.
--- Reverse-merging r172363 into '.':
U include/llvm/Analysis/MemoryBuiltins.h
U lib/Analysis/MemoryBuiltins.cpp
--- Reverse-merging r171325 into '.':
U test/Transforms/InstCombine/objsize.ll
G include/llvm/Analysis/MemoryBuiltins.h
G lib/Analysis/MemoryBuiltins.cpp
llvm-svn: 172756
- This code is dead, and the "right" way to get this support is to use the
platform-specific linker-integrated LTO mechanisms, or the forthcoming LLVM
linker.
llvm-svn: 172749
_Complex float and _Complex long double, by simply increasing the
number of floating point registers available for return values.
The test case verifies that the correct registers are loaded.
llvm-svn: 172733
Move the early if-conversion pass into this group.
ILP optimizations usually need to find the right balance between
register pressure and ILP using the MachineTraceMetrics analysis to
identify critical paths and estimate other costs. Such passes should run
together so they can share dominator tree and loop info analyses.
Besides if-conversion, future passes to run here here could include
expression height reduction and ARM's MLxExpansion pass.
llvm-svn: 172687
but I cannot reproduce the problem and have scrubed my sources and
even tested with llvm-lit -v --vg.
The Mips RDHWR (Read Hardware Register) instruction was not
tested for assembler or dissassembler consumption. This patch
adds that functionality.
Contributer: Vladimir Medic
llvm-svn: 172685
Moving the X86CostTable to a common place, so that other back-ends
can share the code. Also simplifying it a bit and commoning up
tables with one and two types on operations.
llvm-svn: 172658
- Instead of computing a bunch of buckets of different flag types, just do an
incremental link resolving conflicts as they arise.
- This also has the advantage of making the link result deterministic and not
dependent on map iteration order.
llvm-svn: 172634
AT_producer. Which includes clang's version information so we can tell
which version of the compiler was used.
This is the first of two steps to allow us to do that. This is the llvm-mc
change to provide a method to set the AT_producer string. The second step,
coming soon to a clang near you, will have the clang driver pass the value
of getClangFullVersion() via an flag when invoking the integrated assembler
on assembly source files.
rdar://12955296
llvm-svn: 172630
In r143502, we renamed getHostTriple() to getDefaultTargetTriple()
as part of work to allow the user to supply a different default
target triple at configure time. This change also affected the JIT.
However, it is inappropriate to use the default target triple in the
JIT in most circumstances because this will not necessarily match
the current architecture used by the process, leading to illegal
instruction and other such errors at run time.
Introduce the getProcessTriple() function for use in the JIT and
its clients, and cause the JIT to use it. On architectures with a
single bitness, the host and process triples are identical. On other
architectures, the host triple represents the architecture of the
host CPU, while the process triple represents the architecture used
by the host CPU to interpret machine code within the current process.
For example, when executing 32-bit code on a 64-bit Linux machine,
the host triple may be 'x86_64-unknown-linux-gnu', while the process
triple may be 'i386-unknown-linux-gnu'.
This fixes JIT for the 32-on-64-bit (and vice versa) build on non-Apple
platforms.
Differential Revision: http://llvm-reviews.chandlerc.com/D254
llvm-svn: 172627
Specifically according to the semantics of ARC -fno-objc-arc-exception simply
states that it is expected that the unwind path out of a call *MAY* not release
objects. Thus we can have the situation where a release gets moved into a catch
block which we ignore when we remove a retain/release pair resulting in (even
though we assume the program is exiting anyways) the cleanup code path
potentially blowing up before program exit.
llvm-svn: 172599
Hope you are feeling better.
The Mips RDHWR (Read Hardware Register) instruction was not
tested for assembler or dissassembler consumption. This patch
adds that functionality.
Contributer: Vladimir Medic
llvm-svn: 172579
using the DW_FORM_GNU_addr_index and a separate .debug_addr section which
stays in the executable and is fully linked.
Sneak in two other small changes:
a) Print out the debug_str_offsets.dwo section.
b) Change form we're expecting the entries in the debug_str_offsets.dwo
section to take from ULEB128 to U32.
Add tests for all of this in the fission-cu.ll test.
llvm-svn: 172578
into which we can emit single instructions without fixups (which is most
instructions). This is an optimization required because MCDataFragment
is prety large (240 bytes on x64), with no change in functionality.
For large programs, this reduces memory usage overhead required for bundling
by 40%.
To make the code as palatable as possible, the MCEncodedFragment interface was
further fragmented (no pun intended) and MCEncodedFragmentWithFixups is used
as the interface to work against when the user expects fixups. MCDataFragment
and MCRelaxableFragment implement this interface, while the new
MCCompactEncodedInstFragment implements MCEncodeFragment.
llvm-svn: 172572
// FIXME: Constraints are hard coded to 'm', but we need an 'r'
// constraint for addressof. This needs to be cleaned up!
Test cases are already in place. Specifically,
clang/test/CodeGen/ms-inline-asm.c t15(), t16(), and t24().
llvm-svn: 172569
After discussing the refactoring with Jim and Daniel, the following changes were
made:
* All generic directive parsing is now done by AsmParser itself. The previous
division between it and GenericAsmParser did not have clear boundaries and
just produced unnatural code of GenericAsmParser juggling the internals of
AsmParser through an interface.
The division of responsibilities is now clear: target-specific directives,
other extensions (used by platform-specific parseres), and generic directives.
* Priority for directive parsing was reshuffled to ask extensions first and
check the generic directives later.
No change in functionality.
llvm-svn: 172568
some optimization opportunities (in the enclosing supper-expressions).
rule 1. (-0.0 - X ) * Y => -0.0 - (X * Y)
if expression "-0.0 - X" has only one reference.
rule 2. (0.0 - X ) * Y => -0.0 - (X * Y)
if expression "0.0 - X" has only one reference, and
the instruction is marked "noSignedZero".
2. Eliminate negation (The compiler was already able to handle these
opt if the 0.0s are replaced with -0.0.)
rule 3: (0.0 - X) * (0.0 - Y) => X * Y
rule 4: (0.0 - X) * C => X * -C
if the expr is flagged "noSignedZero".
3.
Rule 5: (X*Y) * X => (X*X) * Y
if X!=Y and the expression is flagged with "UnsafeAlgebra".
The purpose of this transformation is two-fold:
a) to form a power expression (of X).
b) potentially shorten the critical path: After transformation, the
latency of the instruction Y is amortized by the expression of X*X,
and therefore Y is in a "less critical" position compared to what it
was before the transformation.
4. Remove the InstCombine code about simplifiying "X * select".
The reasons are following:
a) The "select" is somewhat architecture-dependent, therefore the
higher level optimizers are not able to precisely predict if
the simplification really yields any performance improvement
or not.
b) The "select" operator is bit complicate, and tends to obscure
optimization opportunities. It is btter to keep it as low as
possible in expr tree, and let CodeGen to tackle the optimization.
llvm-svn: 172551
This simplifies the usage and implementation of ELFObjectFile by using ELFType
to replace:
<endianness target_endianness, std::size_t max_alignment, bool is64Bits>
This does complicate the base ELF types as they must now use template template
parameters to partially specialize for the 32 and 64bit cases. However these
are only defined once.
llvm-svn: 172515
we need to generate a N64 compound relocation
R_MIPS_GPREL_32/R_MIPS_64/R_MIPS_NONE.
The bug was exposed by the SingleSourcetest case
DuffsDevice.c.
Contributer: Jack Carter
llvm-svn: 172496
simply use the getParser method from MCAsmParserExtension, working through the
MCAsmParser interface. There's no longer a need to overload that method to
cast it to the concrete AsmParser.
llvm-svn: 172491
This finally allows AsmParser to no longer list GenericAsmParser as a friend.
All member vars directly accessed by GenericAsmParser have been properly
encapsulated and exposed through the MCAsmParser interface. This reduces the
coupling between AsmParser and GenericAsmParser.
llvm-svn: 172490
---------------------------------------------------------------------------
C_A: reassociation is allowed
C_R: reciprocal of a constant C is appropriate, which means
- 1/C is exact, or
- reciprocal is allowed and 1/C is neither a special value nor a denormal.
-----------------------------------------------------------------------------
rule1: (X/C1) / C2 => X / (C2*C1) (if C_A)
=> X * (1/(C2*C1)) (if C_A && C_R)
rule 2: X*C1 / C2 => X * (C1/C2) if C_A
rule 3: (X/Y)/Z = > X/(Y*Z) (if C_A && at least one of Y and Z is symbolic value)
rule 4: Z/(X/Y) = > (Z*Y)/X (similar to rule3)
rule 5: C1/(X*C2) => (C1/C2) / X (if C_A)
rule 6: C1/(X/C2) => (C1*C2) / X (if C_A)
rule 7: C1/(C2/X) => (C1/C2) * X (if C_A)
llvm-svn: 172488
The included test case is derived from one of the GCC compatibility tests.
The problem arises after the selection DAG has been converted to type-legalized
form. The combiner first sees a 64-bit load that can be converted into a
pre-increment form. The original load feeds into a SRL that isolates the
upper 32 bits of the loaded doubleword. This looks like an opportunity for
DAGCombiner::ReduceLoadWidth() to replace the 64-bit load with a 32-bit load.
However, this transformation is not valid, as the replacement load is not
a pre-increment load. The pre-increment load produces an extra result,
which feeds a subsequent add instruction. The replacement load only has
one result value, and this value is propagated to all uses of the pre-
increment load, including the add. Because the add is looking for the
second result value as its operand, it ends up attempting to add a constant
to a token chain, resulting in a crash.
So the patch simply disables this transformation for any load with more than
two result values.
llvm-svn: 172480
Now that it behaves itself in terms of streamer independence (r172450), this
method can be moved to MCAsmParser to be available to all extensions,
overriding, etc.
-- -This line, and those below, will be ignored--
M lib/MC/MCParser/AsmParser.cpp
M include/llvm/MC/MCParser/MCAsmParser.h
llvm-svn: 172451
The aim of this patch is to fix the following piece of code in the
platform-independent AsmParser:
void AsmParser::CheckForValidSection() {
if (!ParsingInlineAsm && !getStreamer().getCurrentSection()) {
TokError("expected section directive before assembly directive");
Out.SwitchSection(Ctx.getMachOSection(
"__TEXT", "__text",
MCSectionMachO::S_ATTR_PURE_INSTRUCTIONS,
0, SectionKind::getText()));
}
}
This was added for the "-n" option of llvm-mc.
The proposed fix adds another virtual method to MCStreamer, called
InitToTextSection. Conceptually, it's similar to the existing
InitSections which initializes all common sections and switches to
text. The new method is implemented by each platform streamer in a way
that it sees fit. So AsmParser can now do this:
void AsmParser::CheckForValidSection() {
if (!ParsingInlineAsm && !getStreamer().getCurrentSection()) {
TokError("expected section directive before assembly directive");
Out.InitToTextSection();
}
}
Which is much more reasonable.
llvm-svn: 172450
Since it's used by extensions. One further step to fully decoupling
GenericAsmParser from an intimate knowledge of the internals of AsmParser,
pointing it to the MCASmParser interface instead (like all other parser
extensions do).
Since this change moves the MacroArgument type to the interface header, it's
renamed to be a bit more descriptive in a general context.
llvm-svn: 172449
The methods are also exposed via the MCAsmParser interface, which allows more
than one client to control them. Previously, GenericAsmParser was playing with
a member var in AsmParser directly (by virtue of being its friend).
llvm-svn: 172440