Use MachineOperand::ChangeToImmediate rather than reassigning
MachineOperands to new values created from MachineOperand::CreateImm,
so that their parent pointers are preserved.
This fixes "Instruction has operand with wrong parent set" errors
reported by the MachineVerifier.
llvm-svn: 341389
Summary:
Control height reduction merges conditional blocks of code and reduces the
number of conditional branches in the hot path based on profiles.
if (hot_cond1) { // Likely true.
do_stg_hot1();
}
if (hot_cond2) { // Likely true.
do_stg_hot2();
}
->
if (hot_cond1 && hot_cond2) { // Hot path.
do_stg_hot1();
do_stg_hot2();
} else { // Cold path.
if (hot_cond1) {
do_stg_hot1();
}
if (hot_cond2) {
do_stg_hot2();
}
}
This speeds up some internal benchmarks up to ~30%.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: xbolva00, dmgreen, mehdi_amini, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D50591
llvm-svn: 341386
The -diff option makes it easy to diff dwarf by hiding addresses and
offsets. However not all of them were hidden, which should be fixed by
this patch.
Differential revision: https://reviews.llvm.org/D51593
llvm-svn: 341377
One of the tests is failing 50% of the time when expensive checks are
enabled. Not sure how deep the problem is so just reverting while the
author can investigate so that the bots stop repeatedly failing and
blaming things incorrectly. Will respond with details on the original
commit.
llvm-svn: 341365
Check for definedness of the __cpp_sized_deallocation and
__cpp_aligned_new feature test macros. These will not be defined
when the feature is not available, and that prevents any code that
includes this header from compiling with -Wundef -Werror.
Differential Revision: https://reviews.llvm.org/D51171
llvm-svn: 341364
Load Hardening.
Wires up the existing pass to work with a proper IR attribute rather
than just a hidden/internal flag. The internal flag continues to work
for now, but I'll likely remove it soon.
Most of the churn here is adding the IR attribute. I talked about this
Kristof Beyls and he seemed at least initially OK with this direction.
The idea of using a full attribute here is that we *do* expect at least
some forms of this for other architectures. There isn't anything
*inherently* x86-specific about this technique, just that we only have
an implementation for x86 at the moment.
While we could potentially expose this as a Clang-level attribute as
well, that seems like a good question to defer for the moment as it
isn't 100% clear whether that or some other programmer interface (or
both?) would be best. We'll defer the programmer interface side of this
for now, but at least get to the point where the feature can be enabled
without relying on implementation details.
This also allows us to do something that was really hard before: we can
enable *just* the indirect call retpolines when using SLH. For x86, we
don't have any other way to mitigate indirect calls. Other architectures
may take a different approach of course, and none of this is surfaced to
user-level flags.
Differential Revision: https://reviews.llvm.org/D51157
llvm-svn: 341363
GCC triggers false positives if a nothrow function is called through a template argument. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80985 for details. The LLVM libraries have no stable C++ API, so the warning is not useful.
llvm-svn: 341361
Also reverts follow-up commits r341343 and r341344.
The primary commit continues to break some build bots even after the
fixes in r341343 for UBSan issues:
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-full/builds/5823
It is also failing for me locally (linux, x86-64).
llvm-svn: 341360
implementing the proposed mitigation technique described in the original
design document.
The idea is to check after calls that the return address used to arrive
at that location is in fact the correct address. In the event of
a mis-predicted return which reaches a *valid* return but not the
*correct* return, this will detect the mismatch much like it would
a mispredicted conditional branch.
This is the last published attack vector that I am aware of in the
Spectre v1 space which is not mitigated by SLH+retpolines. However,
don't read *too* much into that: this is an area of ongoing research
where we expect more issues to be discovered in the future, and it also
makes no attempt to mitigate Spectre v4. Still, this is an important
completeness bar for SLH.
The change here is of course delightfully simple. It was predicated on
cutting support for post-instruction symbols into LLVM which was not at
all simple. Many thanks to Hal Finkel, Reid Kleckner, and Justin Bogner
who helped me figure out how to do a bunch of the complex changes
involved there.
Differential Revision: https://reviews.llvm.org/D50837
llvm-svn: 341358
retpolines.
This implements the core design of tracing the intended target into the
target, checking it, and using that to update the predicate state. It
takes advantage of a few interesting aspects of SLH to make it a bit
easier to implement:
- We already split critical edges with conditional branches, so we can
assume those are gone.
- We already unfolded any memory access in the indirect branch
instruction itself.
I've left hard errors in place to catch if any of these somewhat subtle
invariants get violated.
There is some code that I can factor out and share with D50837 when it
lands, but I didn't want to couple landing the two patches, so I'll do
that in a follow-up cleanup commit if alright.
Factoring out the code to handle different scenarios of materializing an
address remains frustratingly hard. In a bunch of cases you want to fold
one of the cases into an immediate operand of some other instruction,
and you also have both symbols and basic blocks being used which require
different methods on the MI builder (and different operand kinds).
Still, I'll take a stab at sharing at least some of this code in
a follow-up if I can figure out how.
Differential Revision: https://reviews.llvm.org/D51083
llvm-svn: 341356
Summary:
Refactoring done by rL340872 accidentally appeared to be non-NFC, changing the way how
multiple instances of the same pass are handled - aggregation of results by PassName
forced data for multiple instances to be merged together and reported as one line.
Getting back to creating/reporting timers per pass instance.
Reporting was a bit enhanced by counting pass instances and adding #<num> suffix
to the pass description. Note that it is instances that are being counted,
not invocations of them.
time-passes test updated to account for multiple passes being run.
Reviewers: paquette, jhenderson, MatzeB, skatkov
Reviewed By: skatkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51535
llvm-svn: 341346
This patch removes the function `expandSCEVIfNeeded` which behaves not as
it was intended. This function tries to make a lookup for exact existing expansion
and only goes to normal expansion via `expandCodeFor` if this lookup hasn't found
anything. As a result of this, if some instruction above the loop has a `SCEVConstant`
SCEV, this logic will return this instruction when asked for this `SCEVConstant` rather
than return a constant value. This is both non-profitable and in some cases leads to
breach of LCSSA form (as in PR38674).
Whether or not it is possible to break LCSSA with this algorithm and with some
non-constant SCEVs is still in question, this is still being investigated. I wasn't
able to construct such a test so far, so maybe this situation is impossible. If it is,
it will go as a separate fix.
Rather than do it, it is always correct to just invoke `expandCodeFor` unconditionally:
it behaves smarter about insertion points, and as side effect of this it will choose a
constant value for SCEVConstants. For other SCEVs it may end up finding a better insertion
point. So it should not be worse in any case.
NOTE: So far the only known case for which this transform may break LCSSA is mapping
of SCEVConstant to an instruction. However there is a suspicion that the entire algorithm
can compromise LCSSA form for other cases as well (yet not proved).
Differential Revision: https://reviews.llvm.org/D51286
Reviewed By: etherzhhb
llvm-svn: 341345
Usage:
llvm-objcopy --compress-debug-sections=zlib foo.o
llvm-objcopy --compress-debug-sections=zlib-gnu foo.o
In both cases the debug section contents is compressed with zlib. In the GNU
style case the header is the "ZLIB" magic string followed by the uint64 big-
endian decompressed size. In the non-GNU mode the header is the
Elf(32|64)_Chdr.
Decompression support is coming soon.
Differential Revision: https://reviews.llvm.org/D49678
llvm-svn: 341342
This patch modifies hasStandardEncoding() / inMicroMipsMode() /
inMips16Mode() methods of the MipsSubtarget class so only one can be
true at any one time. That prevents the selection of microMIPS and MIPS
instructions and patterns that are defined in TableGen files at the same
time. A few new patterns and instruction definitions hae been added to
keep test cases passed.
Differential revision: https://reviews.llvm.org/D51483
llvm-svn: 341338
Summary:
Original changeset (https://reviews.llvm.org/D46776) by @modocache. It was
reverted after the PS4 bot failed.
The issue has been determined to be with the way the PS4 SDK handles this
particular option. https://reviews.llvm.org/D50410 removes this test, so we
can push this again.
Patch by Arnaud Coomans!
Reviewers: cfe-commits, modocache
Reviewed By: modocache
Differential Revision: https://reviews.llvm.org/D50515
llvm-svn: 341329
A ReadAdvance was incorrectly added to the SchedReadWrite list associated with
the following SSE instructions:
sqrtss
sqrtsd
rsqrtss
rcpss
As a consequence, a wrong operand latency was computed for the register operand
used as the base address of the folded load operand.
This patch removes the wrong ReadAdvance, and updates the llvm-mca test cases.
There is still a problem with correctly modeling partial register writes on XMM
registers This other problem is currently tracked here:
https://bugs.llvm.org/show_bug.cgi?id=38813
Differential Revision: https://reviews.llvm.org/D51542
llvm-svn: 341326
Also adjust some of dsymutil's headers to put the header guards at the top,
otherwise the compiler will not recognize them as header guards.
llvm-svn: 341323
According to the standard, for the .debug_names (the "dwarf accelerator
tables"):
> If a subprogram or inlined subroutine is included, and has a
> DW_AT_linkage_name attribute, there will be an additional index entry
> for the linkage name.
For Swift we generate DW_structure_types with a linkage name and the
verifier was incorrectly rejecting this. This patch fixes that by only
considering the linkage name in those particular cases. The test is the
"reduced" debug info of the failing swift test on swift.org.
Differential revision: https://reviews.llvm.org/D51420
llvm-svn: 341311
When initial support for dllimport was added for aarch64 in
SVN r316555, ClassifyGlobalReference didn't set the MO_DLLIMPORT
flag - that was only completed in SVN r323810. Reuse the return
value from ClassifyGlobalReference for this purpose as well.
llvm-svn: 341310
A condition in isSpillInstruction() updates a small vector rather
than the 'FI' by-ref parameter, which was used in a subsequent
call to 'isSpillSlotObjectIndex()'. This patch fixes the condition
to check the FIs in the vector instead.
llvm-svn: 341305
For instructions that spill/fill to and from multiple frame-indices
in a single instruction, hasStoreToStackSlot and hasLoadFromStackSlot
should return an array of accesses, rather than just the first encounter
of such an access.
This better describes FI accesses for AArch64 (paired) LDP/STP
instructions.
Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D51537
llvm-svn: 341301
Remove braces around two, single statement "if" blocks in line with rest
of the file and the general LLVM code style. NFC, testing commit access.
llvm-svn: 341294
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling,
because we can still get same latency due to default values.
With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.
This will has impact on the count of RetiredMOps, affects the Pending/Available Queue,
then causing different scheduling or suboptimal scheduling further.
Patch by jsji (Jinsong Ji)
Differential Revision: https://reviews.llvm.org/D51506
llvm-svn: 341293
The fold was implemented for the general case but use-limitation,
but the later constant version which didn't check uses was only
matching splat constants.
llvm-svn: 341292
In lib/CodeGen/LiveDebugVariables.cpp, it uses std::prev(MBBI) to
get DebugValue's SlotIndex. However, the previous instruction may be
also a debug instruction. It could not use a debug instruction to query
SlotIndex in mi2iMap.
Scan all debug instructions and use the first debug instruction to query
SlotIndex for following debug instructions. Only handle DBG_VALUE in
handleDebugValue().
Differential Revision: https://reviews.llvm.org/D50621
llvm-svn: 341289
If we have a pair of binops feeding another pair of binops, rearrange the operands so
the matching pair are together because that allows easy factorization folds to happen
in instcombine:
((X << S) & Y) & (Z << S) --> ((X << S) & (Z << S)) & Y (reassociation)
--> ((X & Z) << S) & Y (factorize shift from 'and' ops optimization)
This is part of solving PR37098:
https://bugs.llvm.org/show_bug.cgi?id=37098
Note that there's an instcombine version of this patch attached there, but we're trying
to make instcombine have less responsibility to improve compile-time efficiency.
For reasons I still don't completely understand, reassociate does this kind of transform
sometimes, but misses everything in my motivating cases.
This patch on its own is gluing an independent cleanup chunk to the end of the existing
RewriteExprTree() loop. We can build on it and do something stronger to better order the
full expression tree like D40049. That might be an alternative to the proposal to add a
separate reassociation pass like D41574.
Differential Revision: https://reviews.llvm.org/D45842
llvm-svn: 341288