Some gardening/refactoring.
It's cleaner to copy the instructions into the MachineFunction using the first
candidate instead of going to the mapper.
Also, by doing this we can remove the Seq member from OutlinedFunction entirely.
llvm-svn: 348390
Summary:
r336838 allowed these to be toggleable.
r336858 reverted r336838.
r336943 made the generation of these sections conditional on LDPO_REL.
This commit brings back the toggle-ability. You can specify:
-plugin-opt=-function-sections
-plugin-opt=-data-sections
For your linker flags to disable the changes made in r336943.
Without toggling r336943 off, arm64 linux kernels linked with gold-plugin
see significant boot time regressions, but with r336943 outright reverted
x86_64 linux kernels linked with gold-plugin fail to boot.
Reviewers: pcc, void
Reviewed By: pcc
Subscribers: javed.absar, kristof.beyls, llvm-commits, srhines
Differential Revision: https://reviews.llvm.org/D55291
llvm-svn: 348389
Because we're potentially peeking through a bitcast in this transform,
we need to use overall bitwidths rather than number of elements to
determine when it's safe to proceed.
Should fix:
https://bugs.llvm.org/show_bug.cgi?id=39893
llvm-svn: 348383
Summary: debug intrinsics might be marked norecurse to enable the caller function to be norecurse and optimized if needed. This avoids code gen optimisation differences when -g is used, as in globalOpt.cpp:processInternalGlobal checks.
Reviewers: chandlerc, jmolloy, aprantl
Reviewed By: aprantl
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D55187
llvm-svn: 348381
Revert https://reviews.llvm.org/D55217 due to warnings-turned-into-errors in
AMGPU targets. I'll fix the warnings first, then re-commit this patch.
llvm-svn: 348375
Whenever we effectively take the address of a basic block we need to
manually update that basic block to reflect that fact or later passes
such as tail duplication and tail merging can break the invariants of
the code. =/ Sadly, there doesn't appear to be any good way of
automating this or even writing a reasonable assert to catch it early.
The change seems trivially and obviously correct, but sadly the only
really good test case I have is 1000s of basic blocks. I've tried
directly writing a test case that happens to make tail duplication do
something that crashes later on, but this appears to require an
*amazingly* complex set of conditions that I've not yet reproduced.
The change is technically covered by the tests because we mark the
blocks as having their address taken, but that doesn't really count as
properly testing the functionality.
llvm-svn: 348374
fidelity checking of RIP-based references to basic blocks and other
labels.
These labels are super important for SLH tests so we should keep them
readable in the test cases.
llvm-svn: 348373
Summary:
Many functions on `llvm::AttributeList` and `llvm::AttributeSet` are
documented with "returns a new {list,set} because attribute
{lists,sets} are immutable." This documentation can be aided by the
addition of an attribute, `LLVM_NODISCARD`. Adding this prevents
unsuspecting users of the API from expecting
`AttributeList::setAttributes` from modifying the underlying list.
At the very least, it would have saved me a few hours of debugging, since I
had been doing just that! I had a bug in my program where I was calling
`setAttributes` but then passing in the unmutated `AttributeList`.
I tried adding LLVM_NODISCARD and confirmed that it would have made my bug
immediately obvious.
Reviewers: rnk, javed.absar
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55217
llvm-svn: 348372
The tests here are based on the motivating cases from D54827.
More background:
1. We don't get these cases in general with SimplifyCFG because the root
of the pattern match is an icmp, not a branch. I'm not sure how often
we encounter this pattern vs. the seemingly more likely case with
branches, but I don't see evidence to leave the minimal pattern
unoptimized.
2. This has a chance of increasing compile-time because we're using a
ValueTracking call to handle the match. The motivating cases could be
handled with a simpler pair of calls to isImpliedTrueByMatchingCmp/
isImpliedFalseByMatchingCmp, but I saw that we have a more
comprehensive wrapper around those, so we might as well use it here
unless there's evidence that it's significantly slower.
3. Ideally, we'd handle the fold to constants in InstSimplify, but as
with the existing code here, we could extend this to handle cases
where the result is not a constant, but a new combined predicate.
That would mean splitting the logic across the 2 passes and possibly
duplicating the pattern-matching cost.
4. As mentioned in D54827, this seems like the kind of thing that should
be handled in Correlated Value Propagation, but that pass is currently
limited to dealing with instructions with constant operands, so extending
this bit of InstCombine is the smallest/easiest way to get these patterns
optimized.
llvm-svn: 348367
Prep work for PR38243 - mainly adding comments on where we need to add modulo support (doing so at the moment causes massive codegen regressions).
I've also consistently added support for modulo folding for uniform constants (although at the moment we have no way to trigger this) and removed the old assertions.
llvm-svn: 348366
Skip the ThinLTO cache tests on NetBSD. They require 'touch' being
able to alter atime of files, while NetBSD inhibits atime updates
when filesystem is mounted noatime.
Differential Revision: https://reviews.llvm.org/D55273
llvm-svn: 348355
Split timestamp preservation tests into atime and mtime test, and skip
the former on NetBSD. When the filesystem is mounted noatime, NetBSD
not only inhibits implicit atime updates but also prevents setting atime
via utime(), causing the test to fail.
Differential Revision: https://reviews.llvm.org/D55271
llvm-svn: 348354
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.
Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.
Differential Revision: https://reviews.llvm.org/D54698
llvm-svn: 348353
Fix potential issue with the ISD::INSERT_VECTOR_ELT case tweaking the DemandedElts mask instead of using a local copy - so later uses of the mask use the tweaked version.....
Noticed while investigating adding zero/undef folding to SimplifyDemandedVectorElts and the altered DemandedElts mask was causing mismatches.
llvm-svn: 348348
Summary:
The remaining code paths that ControlFlowHoisting introduced that were
not disabled, increased compile time by 3x for some benchmarks.
The time is spent in DominatorTree updates.
Reviewers: john.brawn, mkazantsev
Subscribers: sanjoy, jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D55313
llvm-svn: 348345
Functions annotated with `__fastcall` or `__attribute__((__fastcall__))`
or `__attribute__((__swiftcall__))` may contain SEH handlers even on
Win64. This matches the behaviour of cl which allows for
`__try`/`__except` inside a `__fastcall` function. This was detected
while trying to self-host clang on Windows ARM64.
llvm-svn: 348337
It looks like MCRegAliasIterator can visit the same physical register twice. When this happens in this code in LICM we end up setting the PhysRegDef and then later in the same loop visit the register again. Now we see that PhysRegDef is set from the earlier iteration so now set PhysRegClobber.
This patch splits the loop so we have one that uses the previous value of PhysRegDef to update PhysRegClobber and second loop that updates PhysRegDef.
The X86 atomic test is an improvement. I had to add sideeffect to the two shrink wrapping tests to prevent hoisting from occurring. I'm not sure about the AMDGPU tests. It looks like the branch instruction changed at end the of the loops. And in the branch-relaxation test I think there is now "and vcc, exec, -1" instruction that wasn't there before.
Differential Revision: https://reviews.llvm.org/D55102
llvm-svn: 348330
Summary:
This fixes support in DAGISelMatcher backend for DAG nodes with multiple
result values. Previously the order of results in selected DAG nodes always
matched the order of results in ISel patterns. After the change the order of
results matches the order of operands in OutOperandList instead.
For example, given this definition from the attached test case:
def INSTR : Instruction {
let OutOperandList = (outs GPR:$r1, GPR:$r0);
let InOperandList = (ins GPR:$t0, GPR:$t1);
let Pattern = [(set i32:$r0, i32:$r1, (udivrem i32:$t0, i32:$t1))];
}
the DAGISelMatcher backend currently produces a matcher that creates INSTR
nodes with the first result `$r0` and the second result `$r1`, contrary to the
order in the OutOperandList. The order of operands in OutOperandList does not
matter at all, which is unexpected (and unfortunate) because the order of
results of a DAG node does matters, perhaps a lot.
With this change, if the order in OutOperandList does not match the order in
Pattern, DAGISelMatcherGen emits CompleteMatch opcodes with the order of
results taken from OutOperandList. Backend writers can use it to express
result reorderings in TableGen.
If the order in OutOperandList matches the order in Pattern, the result of
DAGISelMatcherGen is unaffected.
Patch by Eugene Sharygin
Reviewers: andreadb, bjope, hfinkel, RKSimon, craig.topper
Reviewed By: craig.topper
Subscribers: nhaehnle, craig.topper, llvm-commits
Differential Revision: https://reviews.llvm.org/D55055
llvm-svn: 348326
There's a 64k limit on the number of SDNode operands, and some very large
functions with 64k or more loads can cause crashes due to this limit being hit
when a TokenFactor with this many operands is created. To fix this, create
sub-tokenfactors if we've exceeded the limit.
No test case as it requires a very large function.
rdar://45196621
Differential Revision: https://reviews.llvm.org/D55073
llvm-svn: 348324
Like the already existing zip_shortest/zip_first iterators, zip_longest
iterates over multiple iterators at once, but has as many iterations as
the longest sequence.
This means some iterators may reach the end before others do.
zip_longest uses llvm::Optional's None value to mark a
past-the-end value.
zip_longest is not reverse-iteratable because the tuples iterated over
would be different for different length sequences (IMHO for the same
reason neither zip_shortest nor zip_first should be reverse-iteratable;
one can still reverse the ranges individually if that's the expected
behavior).
In contrast to zip_shortest/zip_first, zip_longest tuples contain
rvalues instead of references. This is because llvm::Optional cannot
contain reference types and the value-initialized default does not have
a memory location a reference could point to.
The motivation for these iterators is to use C++ foreach to compare two
lists of ordered attributes in D48100 (SemaOverload.cpp and
ASTReaderDecl.cpp).
Idea by @hfinkel.
This re-commits r348301 which was reverted by r348303.
The compilation error by gcc 5.4 was resolved using make_tuple in the in
the initializer_list.
The compileration error by msvc14 was resolved by splitting
ZipLongestValueType (which already was a workaround for msvc15) into
ZipLongestItemType and ZipLongestTupleType.
Differential Revision: https://reviews.llvm.org/D48348
llvm-svn: 348323
This breaks C and C++ semantics because it can cause the address
of the global inside the module to differ from the address outside
of the module.
Differential Revision: https://reviews.llvm.org/D55237
llvm-svn: 348321
We previously disabled this in r323371 because of a bug where we selected an
extending load, but didn't delete the old G_LOAD, resulting in two loads being
generated for volatile loads.
Since we now have dedicated G_SEXTLOAD/G_ZEXTLOAD operations, and that the
tablegen patterns should no longer be able to select (ext(load x)) patterns, it
should be safe to re-enable it.
The old test case should still work as expected.
llvm-svn: 348320
This is no longer used and is just taking up space in the structure.
Heap allocation of this structure is on the critical path, so space
actually matters.
llvm-svn: 348318
Ideally, we would fold all of these in InstSimplify in a
similar way to rL347896, but this is a bit awkward when
we're trying to simplify a compare directly because the
ValueTracking API expects the compare as an input, but
in InstSimplify, we just have the operands of the compare.
Given that we can do transforms besides just simplifications,
we might as well just extend the code in InstCombine (which
already does simplifications with constant operands).
llvm-svn: 348312
Previously these were dropped. We now understand them sufficiently
well to start emitting them. From the debugger's perspective, this
now enables us to have debug info about typedefs (both global and
function-locally scoped)
Differential Revision: https://reviews.llvm.org/D55228
llvm-svn: 348306