With the new LTO API in r278338, we stopped emitting the individual
index files and imports files for some modules in the distributed backend
case (thinlto-index-only plugin option).
Specifically, this is when the linker decides not to include a module in the
link, because it was in an archive library and did not have a strong
reference to it. Not creating the expected output files makes the
distributed build system implementation more difficult, in terms of
checking for the expected outputs of the thin link, and scheduling the
backend jobs. To address this, the gold-plugin will write dummy empty
.thinlto.bc and .imports files for modules not included in the link
(which LTO never sees).
Augmented a gold v1.12+ test, since that version of gold has the handling
for notifying on modules not being included in the link.
llvm-svn: 282100
We still don't really have an equivalent of "AssertXExt" in DAG, so we don't
exploit the guarantees on the receiving side yet, but this should produce
conservatively correct code on iOS ABIs.
llvm-svn: 282069
The only implementation that exists immediately looks it up anyway, and the
information is needed to handle various parameter attributes (stored on the
function itself).
llvm-svn: 282068
TargetMachine::getNameWithPrefix and inline the result into the singular
caller." and "Remove more guts of TargetMachine::getNameWithPrefix and
migrate one check to the TLOF mach-o version." temporarily until I can
get the whole call migrated out of the TargetMachine as we could hit
places where TLOF isn't valid.
This reverts commits r281981 and r281983.
llvm-svn: 282028
This should match the existing behaviour for passing complicated struct and
array types, in particular HFAs come through like that from Clang.
For C & C++ we still need to somehow support all the weird ABI flags, or at
least those that are present in the IR (signext, byval, ...), and stack-based
parameter passing.
llvm-svn: 281977
The routines llvm::ConvertDebugDeclareToDebugValue() always returned
a true value which was never checked at the call site; change the
function return type to void.
This NFC cleanup was approved in the review https://reviews.llvm.org/D23715
llvm-svn: 281964
Use SmallVector instead of dynamically allocated arrays for the mapping of the
operands in the InstructionMapping. That way we avoid heap allocation for most
of the cases. Ultimately, we should not have to rely on such tricky, the
instances of InstructionMapping would be TableGen'ed.
This improves the compilation time of the RegBankSelect pass.
llvm-svn: 281955
The comments of SplitBlockAndInsertIfThenElse say the SplitBefore instruction will stay in the old block.
But according to the implementation(split the block at SplitBefore by using splitBasicBlock), the SplitBefore will be moved to the new block.
This patch fixes the comments.
Patch by Zhe Yu Wu.
llvm-svn: 281939
The OperandsMapper class is used heavy in RegBankSelect and each
instantiation triggered a heap allocation for the array of operands.
Instead, use a SmallVector with a big enough size such that most of the
cases do not have to use dynamically allocated memory.
This improves the compile time of the RegBankSelect pass.
llvm-svn: 281916
Summary: The call target count profile is directly derived from LBR branch->target data. This is more reliable than instruction frequency profiles that could be moved across basic block boundaries. This patches uses call target count profile to annotate call instructions.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24410
llvm-svn: 281911
When porting large bodies of code from using const char*
to StringRef, it is helpful to be able to treat nullptr
as an empty string, since that it is often what it is used
to indicate in C-style code.
Differential Revision: https://reviews.llvm.org/D24697
llvm-svn: 281906
When phi nodes are created in the -mem2reg phase, the @llvm.dbg.declare
entries are converted to @llvm.dbg.value entries at the place where the
store instructions existed. However no entry is created to describe
the resulting value of the phi node.
The effect of this is especially noticeable in for loops which have a
constant for the intial value; the loop control variable's location
would be described as the intial constant value in the loop body once
the -mem2reg optimization phase was run.
This change adds the creation of the @llvm.dbg.value entries to describe
variables whose location is the result of a phi node created in -mem2reg.
Also when the phi node is finally lowered to a machine instruction it
is important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.
Differential Revision: https://reviews.llvm.org/D23715
llvm-svn: 281895
This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:
https://reviews.llvm.org/D23932 (Clang test)
https://reviews.llvm.org/D23933 (compiler-rt)
Differential Revision: https://reviews.llvm.org/D23931
llvm-svn: 281878
The ValueSymbolTable is used to detect name conflict and rename
instructions automatically. This is not needed when the value
names are automatically discarded by the LLVMContext.
No functional change intended, just saving a little bit of memory.
This is a recommit of r281806 after fixing the accessor to return
a pointer instead of a reference and updating all the call-sites.
llvm-svn: 281813
The ValueSymbolTable is used to detect name conflict and rename
instructions automatically. This is not needed when the value
names are automatically discarded by the LLVMContext.
No functional change intended, just saving a little bit of memory.
llvm-svn: 281806
Recommitting after fixing AsmParser initialization and X86 inline asm
error cleanup.
Allow errors to be deferred and emitted as part of clean up to simplify
and shorten Assembly parser code. This will allow error messages to be
emitted in helper functions and be modified by the caller which has
better context.
As part of this many minor cleanups to the Parser:
* Unify parser cleanup on error
* Add Workaround for incorrect return values in ParseDirective instances
* Tighten checks on error-signifying return values for parser functions
and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are
now fixed.
These changes should be backwards compatible with current Target Parsers
so long as the error status are correctly returned in appropriate
functions.
Reviewers: rnk, majnemer
Subscribers: aemerson, jyknight, llvm-commits
Differential Revision: https://reviews.llvm.org/D24047
llvm-svn: 281762
Enhance SCEV to compute the trip count for some loops with unknown stride.
Patch by Pankaj Chawla
Differential Revision: https://reviews.llvm.org/D22377
llvm-svn: 281732
When a phi node is finally lowered to a machine instruction it is
important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.
Renamed the existing SkipPHIsAndLabels to SkipPHIsLabelsAndDebug to
more fully describe that it also skips debug entries. Then used the
"new" function SkipPHIsAndLabels when the debug information should not
be skipped when placing the lowered "load" instructions so that it is
placed before the debug entries.
Differential Revision: https://reviews.llvm.org/D23760
llvm-svn: 281727
Summary:
In runThinLTO we start the task numbering for ThinLTO backend
tasks depending on whether there was also a regular LTO object
(CombinedModule). However, the CombinedModule is moved at
the end of runRegularLTO, so we need to save this information and
pass it into runThinLTO. Otherwise the AddOutput callback to the client
will use the same task number for both the regular LTO object
and the first ThinLTO object, which in gold-plugin caused only
one to be end up in the output filename array and therefore passed
back to gold for the final native link.
Reviewers: pcc, mehdi_amini
Subscribers: mehdi_amini, kromanova
Differential Revision: https://reviews.llvm.org/D24643
llvm-svn: 281725
Summary:
`TargetLoweringObjectFile` can be re-used and thus `TargetLoweringObjectFile::Initialize()`
can be called multiple times causing `Mang` pointer memory leak.
Reviewers: echristo
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24659
llvm-svn: 281718
LazyCallGraph to support repeated, stable iterations, even in the face
of graph updates.
This is particularly important to allow the CGSCC pass manager to walk
the RefSCCs (and thus everything else) in a module more than once. Lots
of unittests and other tests were hard or impossible to write because
repeated CGSCC pass managers which didn't invalidate the LazyCallGraph
would conclude the module was empty after the first one. =[ Really,
really bad.
The interesting thing is that in many ways this simplifies the code. We
can now re-use the same code for handling reference edge insertion
updates of the RefSCC graph as we use for handling call edge insertion
updates of the SCC graph. Outside of adapting to the shared logic for
this (which isn't trivial, but is *much* simpler than the DFS it
replaces!), the new code involves putting newly created RefSCCs when
deleting a reference edge into the cached list in the correct way, and
to re-formulate the iterator to be stable and effective even in the face
of these kinds of updates.
I've updated the unittests for the LazyCallGraph to re-iterate the
postorder sequence and verify that this all works. We even check for
using alternating iterators to trigger the lazy formation of RefSCCs
after mutation has occured.
It's worth noting that there are a reasonable number of likely
simplifications we can make past this. It isn't clear that we need to
keep the "LeafRefSCCs" around any more. But I've not removed that mostly
because I want this to be a more isolated change.
Differential Revision: https://reviews.llvm.org/D24219
llvm-svn: 281716
The IPI stream is structurally identical to the TPI stream, but it
contains different record types. So we just re-use the TPI writing
code.
llvm-svn: 281638
It was only really there as a sentinel when instructions had to have precisely
one type. Now that registers are typed, each register really has to have a type
that is sized.
llvm-svn: 281599
Otherwise everything that needs to work out what size they are has to keep a
DataLayout handy, which is a bit silly and very annoying.
llvm-svn: 281597
The `CVType` had two redundant fields which were confusing and
error-prone to fill out. By treating member records as a distinct
type from leaf records, we are able to simplify this quite a bit.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D24432
llvm-svn: 281556
This completes being able to write all the interesting
values of a PDB TPI stream.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D24370
llvm-svn: 281555
Summary:
This fixes a dumpbin warning on objects produced by the MC assembler
when starting from text. All .debug$S sections are supposed to be marked
IMAGE_SCN_MEM_DISCARDABLE. The main, non-COMDAT .debug$S section had
this set, but any comdat ones were not being marked discardable because
there was no .section flag for it.
This change does two things:
- If the section name starts with .debug, implicitly mark the section as
discardable. This means we do the same thing as gas on .s files with
DWARF from GCC, which is important.
- Adds the 'D' flag to the .section directive on COFF to explicitly say
a section is discardable. We only emit this flag if the section name
does not start with .debug. This allows the user to explicitly tweak
their section flags without worrying about magic section names.
The only thing you can't do in this scheme is to create a
non-discardable section with a name starting with ".debug", but
hopefully users don't need that.
Reviewers: majnemer, rafael
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24582
llvm-svn: 281554
If TBAA is on an intrinsic and it gets upgraded, it'll delete the call
instruction that we collected in a vector. Even if we were to use
WeakVH, it'll drop the TBAA and we'll hit the assert on the upgrade
path.
r263673 gave a shot to make sure the TBAA upgrade happens before
intrinsics upgrade, but failed to account for all cases.
Instead of collecting instructions in a vector, this patch makes it
just upgrade the TBAA on the fly, because metadata are always
already loaded at this point.
Differential Revision: https://reviews.llvm.org/D24533
llvm-svn: 281549
Previously the prevailing information was not honored, and commons
symbols could override a strong definition. This patch fixes it and
propose the following semantic for commons: the client should mark
as prevailing the commons that it expects the LTO implementation to
merge (i.e. take the maximum size and alignment).
It implies that commons are allowed to have multiple prevailing
definitions.
Differential Revision: https://reviews.llvm.org/D24545
llvm-svn: 281538
Summary:
It was previously not possible for tools to use solely the stackmap
information emitted to reconstruct the return addresses of callsites in
the map, which is necessary to use the information to walk a stack. This
patch adds per-function callsite counts when emitting the stackmap
section in order to resolve the problem. Note that this slightly alters
the stackmap format, so external tools parsing these maps will need to
be updated.
**Problem Details:**
Records only store their offset from the beginning of the function they
belong to. While these records and the functions are output in program
order, it is not possible to determine where the end of one function's
records are without the callsite count when processing the records to
compute return addresses.
Patch by Kavon Farvardin!
Reviewers: atrick, ributzka, sanjoy
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D23487
llvm-svn: 281532
is an uint64_t. However, getter function getFlags returned an unsigned,
and in function hasProperty (1 << MCFlag) was used instead of (1ULL << MCFlag).
llvm-svn: 281483
Recommitting after fixing AsmParser Initialization.
Allow errors to be deferred and emitted as part of clean up to simplify
and shorten Assembly parser code. This will allow error messages to be
emitted in helper functions and be modified by the caller which has
better context.
As part of this many minor cleanups to the Parser:
* Unify parser cleanup on error
* Add Workaround for incorrect return values in ParseDirective instances
* Tighten checks on error-signifying return values for parser functions
and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are
now fixed.
These changes should be backwards compatible with current Target Parsers
so long as the error status are correctly returned in appropriate
functions.
Reviewers: rnk, majnemer
Subscribers: aemerson, jyknight, llvm-commits
Differential Revision: https://reviews.llvm.org/D24047
llvm-svn: 281336
descriptions now tag add instructions, and the Hexagon backend is using this to
identify loop induction statements.
Patch by Sam Parker and Sjoerd Meijer.
Differential Revision: https://reviews.llvm.org/D23601
llvm-svn: 281304
This should allow users of the library to get a range to iterate through
all the subcommands that are registered to the global parser. This
allows users to define subcommands in libraries that self-register to
have dispatch done at a different stage (like main). It allows for
writing code like the following:
for (auto *S : cl::getRegisteredSubcommands()) {
if (*S) {
// Dispatch on S->getName().
}
}
This change also contains tests that show this usage pattern.
Reviewers: zturner, dblaikie, echristo
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24489
llvm-svn: 281290
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.
Fixes PR30362. A program for upgrading test cases is attached to that
bug.
Differential Revision: http://reviews.llvm.org/D20147
llvm-svn: 281284
class.
SerializationTraits provides serialize and deserialize methods corresponding to
the earlier functions, but also provides a name for the type. In future, this
name will be used to render function signatures as strings, which will in turn
be used to negotiate and verify API support between RPC clients and servers.
llvm-svn: 281254
Allow errors to be deferred and emitted as part of clean up to simplify
and shorten Assembly parser code. This will allow error messages to be
emitted in helper functions and be modified by the caller which has
better context.
As part of this many minor cleanups to the Parser:
* Unify parser cleanup on error
* Add Workaround for incorrect return values in ParseDirective instances
* Tighten checks on error-signifying return values for parser functions
and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are
now fixed.
These changes should be backwards compatible with current Target Parsers
so long as the error status are correctly returned in appropriate
functions.
Reviewers: rnk, majnemer
Subscribers: aemerson, jyknight, llvm-commits
Differential Revision: https://reviews.llvm.org/D24047
llvm-svn: 281249
Unlike SDag, we use a separate G_GEP instruction (much simplified, only taking
a single byte offset) to preserve the pointer type information through
selection.
llvm-svn: 281205
Remove createNode() and any API that depending on it, and add
HasCreateNode to the list of checks for HasObsoleteCustomizations. Now
an ilist *never* allocates (this was already true for iplist).
This factors out all the differences between iplist and ilist. I'll aim
to rename both to "owning_ilist" eventually, to call out the interesting
(not exactly intrusive) ownership semantics. In the meantime, I've left
both names around to reduce code churn.
One of the deleted APIs is the ilist copy constructor. I've lifted up
and tested iplist::cloneFrom (ala simple_ilist::cloneFrom) as a
replacement.
Users of ilist<> and iplist<> that want the list to allocate nodes have
a few options:
- use std::list;
- use AllocatorList or BumpPtrList (or build a similarly trivial list);
- use cloneFrom (which is explicit at the call site); or
- allocate at the call site.
See r280573, r281177, r281181, and r281182 for examples of what to do if
you're updating out-of-tree code.
llvm-svn: 281184
- Add AllocatorList, a non-intrusive list that owns an LLVM-style
allocator and provides a std::list-like interface (trivially built on
top of simple_ilist),
- add a typedef (and unit tests) for BumpPtrList, and
- use BumpPtrList for the list of llvm::yaml::Token (i.e., TokenQueueT).
TokenQueueT has no need for the complexity of an intrusive list. The
only reason to inherit from ilist was to customize the allocator.
TokenQueueT was the only example in-tree of using ilist<> in a truly
non-intrusive way.
Moreover, this removes the final use of the non-intrusive
ilist_traits<>::createNode (after r280573, r281177, and r281181). I
have a WIP patch that removes this customization point (and the API that
relies on it) that I plan to commit soon.
Note: AllocatorList owns the allocator, which limits the viable API
(e.g., splicing must be on the same list). For now I've left out
any problematic API. It wouldn't be hard to split AllocatorList into
two layers: an Impl class that calls DerivedT::getAlloc (via CRTP), and
derived classes that handle Allocator ownership/reference/etc semantics;
and then implement splice with appropriate assertions; but TBH we should
probably just customize the std::list allocators at that point.
llvm-svn: 281182
Force IVUsers to be moved instead of copied, properly update Parent
pointers in IVStrideUse when IVUsers is moved, and make sure we have
move constructors available in iplist and ilist.
I came across this in a WIP patch that deleted the copy constructors
from ilist. I was surprised to find that IVUsersAnalysis couldn't be
registered in the new pass manager.
It's not clear to me whether IVUsers was getting moved only when empty,
but if it was being moved when it was non-empty then this fixes a
pointer invalidation bug and should give some sort of speedup. Note
that the bugfix would be necessary even for a copy constructor.
llvm-svn: 281181
ilist_iterator::reset was unnecessary API, and wasn't any clearer (or
safer) at the call site than constructing a temporary and assigning it
to the iterator.
llvm-svn: 281175
Now that MachineBasicBlock::reverse_instr_iterator knows when it's at
the end (since r281168 and r281170), implement
MachineBasicBlock::reverse_iterator directly on top of an
ilist::reverse_iterator by adding an IsReverse template parameter to
MachineInstrBundleIterator. This replaces another hard-to-reason-about
use of std::reverse_iterator on list iterators, matching the changes for
ilist::reverse_iterator from r280032 (see the "out of scope" section at
the end of that commit message). MachineBasicBlock::reverse_iterator
now has a handle to the current node and has obvious invalidation
semantics.
r280032 has a more detailed explanation of how list-style reverse
iterators (invalidated when the pointed-at node is deleted) are
different from vector-style reverse iterators like std::reverse_iterator
(invalidated on every operation). A great motivating example is this
commit's changes to lib/CodeGen/DeadMachineInstructionElim.cpp.
Note: If your out-of-tree backend deletes instructions while iterating
on a MachineBasicBlock::reverse_iterator or converts between
MachineBasicBlock::iterator and MachineBasicBlock::reverse_iterator,
you'll need to update your code in similar ways to r280032. The
following table might help:
[Old] ==> [New]
delete &*RI, RE = end() delete &*RI++
RI->erase(), RE = end() RI++->erase()
reverse_iterator(I) std::prev(I).getReverse()
reverse_iterator(I) ++I.getReverse()
--reverse_iterator(I) I.getReverse()
reverse_iterator(std::next(I)) I.getReverse()
RI.base() std::prev(RI).getReverse()
RI.base() ++RI.getReverse()
--RI.base() RI.getReverse()
std::next(RI).base() RI.getReverse()
(For more details, have a look at r280032.)
llvm-svn: 281172
Add an assertion to the MachineInstrBundleIterator from instr_iterator
that the underlying iterator is valid. This is possible know that we
can check ilist_node::isSentinel (since r281168), and is consistent with
the constructors from MachineInstr* and MachineInstr&.
Avoiding the new assertion in operator== and operator!= requires four
(!!!!) new overloads each.
(As an aside, I'm strongly in favour of:
- making the conversion from instr_iterator explicit;
- making the conversion from pointer explicit;
- making the conversion from reference explicit; and
- removing all the extra overloads of operator== and operator!= except
const_instr_iterator.
I'm not signing up for that at this point, but being clear about when
something is an MachineInstr-iterator (possibly instr_end()) vs
MachineInstr-bundle-iterator (possibly end()) vs MachineInstr* (possibly
nullptr) vs MachineInstr& (known valid) would surely make code
cleaner... and it would remove a ton of boilerplate from
MachineInstrBundleIterator operators.)
llvm-svn: 281170
This is a prep commit before fixing MachineBasicBlock::reverse_iterator
invalidation semantics, ala r281167 for ilist::reverse_iterator. This
changes MachineBasicBlock::Instructions to track which node is the
sentinel regardless of LLVM_ENABLE_ABI_BREAKING_CHECKS.
There's almost no functionality change (aside from ABI). However, in
the rare configuration:
#if !defined(NDEBUG) && !defined(LLVM_ENABLE_ABI_BREAKING_CHECKS)
the isKnownSentinel() assertions in ilist_iterator<>::operator* suddenly
have teeth for MachineInstr. If these assertions start firing for your
out-of-tree backend, have a look at the suggestions in the commit
message for r279314, and at some of the commits leading up to it that
avoid dereferencing the end() iterator.
llvm-svn: 281168
This adds two declarative configuration options for intrusive lists
(available for simple_ilist, iplist, and ilist). Both of these options
affect ilist_node interoperability and need to be passed both to the
node and the list. Instead of adding a new traits class, they're
specified as optional template parameters (in any order).
The two options:
1. Pass ilist_sentinel_tracking<true> or ilist_sentinel_tracking<false>
to control whether there's a bit on ilist_node "prev" pointer
indicating whether it's the sentinel. The default behaviour is to
use a bit if and only if LLVM_ENABLE_ABI_BREAKING_CHECKS.
2. Pass ilist_tag<TagA> and ilist_tag<TagB> to allow insertion of a
single node into two different lists (simultaneously).
I have an immediate use-case for (1) ilist_sentinel_tracking: fixing the
validation semantics of MachineBasicBlock::reverse_iterator to match
ilist::reverse_iterator (ala r280032: see the comments at the end of the
commit message there). I'm adding (2) ilist_tag in the same commit to
validate that the options framework supports expansion. Justin Bogner
mentioned this might enable a possible cleanup in SelectionDAG, but I'll
leave this to others to explore. In the meantime, the unit tests and
the comments for simple_ilist and ilist_node have usage examples.
Note that there's a layer of indirection to support optional,
out-of-order, template paramaters. Internal classes are templated on an
instantiation of the non-variadic ilist_detail::node_options.
User-facing classes use ilist_detail::compute_node_options to compute
the correct instantiation of ilist_detail::node_options.
The comments for ilist_detail::is_valid_option describe how to add new
options (e.g., ilist_packed_int<int NumBits>).
llvm-svn: 281167
Summary:
An IR load can be invariant, dereferenceable, neither, or both. But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.
This patch splits up the notions of invariance and dereferenceability at
the MI level. It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23371
llvm-svn: 281151
should have been (1ULL << MCID::XYZ). Currently this works because enum Flag
has 31 values, but extending it will result in a compile warnings/errors.
This was part of the accepted patch in https://reviews.llvm.org/D23601, but
it was suggested to apply this first as a separate patch.
llvm-svn: 281149
... and make a few ilist-internal API changes, in preparation for
changing how ilist_node is templated. The only effect for ilist users
should be changing the friend target from llvm::ilist_node_access to
llvm::ilist_detail::NodeAccess (which is only necessary when they
inherit privately from ilist_node).
- Split out SpecificNodeAccess, which has overloads of getNodePtr and
getValuePtr that are untemplated.
- Use more typedefs to prevent more changes later.
- Force inheritance to use *NodeAccess (to emphasize that ilist *users*
shouldn't be doing this).
There should be no functionality change here.
llvm-svn: 281142
Summary:
I want to separate out the notions of invariance and dereferenceability
at the MI level, so that they correspond to the equivalent concepts at
the IR level. (Currently an MI load is MI-invariant iff it's
IR-invariant and IR-dereferenceable.)
First step is renaming this function.
Reviewers: chandlerc
Subscribers: MatzeB, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D23370
llvm-svn: 281125
I tested this with "ninja check-llvm-codegen" on a Release build with
all architectures enabled, and again with a Debug build on x86.
Found with llvm-cov.
Differential Revision: https://reviews.llvm.org/D24433
llvm-svn: 281120
SmallVectors are convenient, but they don't cover every use case.
In particular, they are fairly large (3 pointers + one element) and
there is no way to take ownership of the buffer to put it somewhere
else. This patch then adds a lower lever interface that works with
any buffer.
llvm-svn: 281082
This simplifies a lot of code, and will actually be necessary for
an upcoming patch to serialize TPI record hash values.
The idea before was that visitors should be examining records, not
modifying them. But this is no longer true with a visitor that
constructs a CVRecord from Yaml. To handle this until now, we
were doing some fixups on CVRecord objects at a higher level, but
the code is really awkward, and it makes sense to just have the
visitor write the bytes into the CVRecord. In doing so I uncovered
a few bugs related to `Data` and `RawData` and fixed those.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D24362
llvm-svn: 281067
This writes the full sequence of type records described in
Yaml to the TPI stream of the PDB file.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D24316
llvm-svn: 281063
These were added in r281051, which, I am embarrassed to admit, has an
incomplete commit message that I forgot to update before pushing. You
can ignore element (2) in that list.
llvm-svn: 281054
1) On some platforms, sizeof(SDNodeBits) == 1, so we were only copying
one byte out of the bitfield when we wanted to copy two, and we were
leaving half of the return value of getRawSubclassData() undefined.
2) Something something bitfields, not sure exactly what the issue or fix
is, yet. (TODO)
Summary:
Previously we were assuming that SDNodeBits covered all of SDNode's
anonymous subclass data bitfield union. But that's not right; it might
have size 1, in which it clearly doesn't.
This patch adds a field that does cover the whole union and adds
static_asserts to ensure it stays correct.
Reviewers: ahatanak, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24223
llvm-svn: 281051
These instructions were only necessary when type information was stored in the
MachineInstr (because only generic MachineInstrs possessed a type). Now that
it's in MachineRegisterInfo, COPY and PHI work fine.
llvm-svn: 281037
We want each register to have a canonical type, which means the best place to
store this is in MachineRegisterInfo rather than on every MachineInstr that
happens to use or define that register.
Most changes following from this are pretty simple (you need an MRI anyway if
you're going to be doing any transformations, so just check the type there).
But legalization doesn't really want to check redundant operands (when, for
example, a G_ADD only ever has one type) so I've made use of MCInstrDesc's
operand type field to encode these constraints and limit legalization's work.
As an added bonus, more validation is possible, both in MachineVerifier and
MachineIRBuilder (coming soon).
llvm-svn: 281035
Summary:
While woring on mapping attributes in the C API, it clearly appeared that the recent changes in the API on the C++ side left Function and Call/Invoke with an attribute API that grew in an ad hoc manner. This makes it difficult to work with it, because one doesn't know which overloads exists and which do not.
Make sure that getter/setter function exists for both enum and string version. Remove inconsistent getter/setter, unless they have many callsites.
This should make it easier to work with attributes in the future.
This doesn't change how attribute works.
Reviewers: bkramer, whitequark, mehdi_amini, void
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21514
llvm-svn: 281019
mapping a yaml field to an object in code has always been
a stateless operation. You could still pass state by using the
`setContext` function of the YAMLIO object, but this represented
global state for the entire yaml input. In order to have
context-sensitive state, it is necessary to pass this state in
at the granularity of an individual mapping.
This patch adds support for this type of context-sensitive state.
You simply pass an additional argument of type T to the
`mapRequired` or `mapOptional` functions, and provided you have
specialized a `MappingContextTraits<U, T>` class with the
appropriate mapping function, you can pass this context into
the mapping function.
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D24162
llvm-svn: 280977
And associated commits, as they broke the Thumb bots.
This reverts commit r280935.
This reverts commit r280891.
This reverts commit r280888.
llvm-svn: 280967
Summary:
This allows specifying instructions that are available only in specific assembler variant. If AsmVariantName is specified then instruction will be presented only in MatchTable for this variant. If not specified then assembler variants will be determined based on AsmString.
Also this allows splitting assembler match tables in same way as it is done in dissasembler.
Reviewers: ab, tstellarAMD, craig.topper, vpykhtin
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D24249
llvm-svn: 280952
Refactor replaceDominatedUsesWith to have a flag to control whether to replace uses in BB itself.
Summary: This is in preparation for LoopSink pass which calls replaceDominatedUsesWith to update after sinking.
llvm-svn: 280949
CGP tail-duplicates rets into blocks that end with a call that feed the ret.
This puts the call in tail position, potentially allowing the DAG builder to
lower it as a tail call. To avoid tail duplication in cases where we won't
form the tail call, CGP tried to predict whether this is going to be possible,
and avoids doing it when lowering as a tail call will definitely fail.
However, it was being too conservative by always throwing away calls to
functions with a signext/zeroext attribute on the return type.
Instead, we can use the same logic the builder uses to determine whether the
attributes work out.
Differential Revision: https://reviews.llvm.org/D24315
llvm-svn: 280894
This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:
1. https://reviews.llvm.org/D23932 (Clang test)
2. https://reviews.llvm.org/D23933 (compiler-rt)
Differential Revision: https://reviews.llvm.org/D23931
llvm-svn: 280888
When branching to a block that immediately tail calls, it is possible to fold
the call directly into the branch if the call is direct and there is no stack
adjustment, saving one byte.
Example:
define void @f(i32 %x, i32 %y) {
entry:
%p = icmp eq i32 %x, %y
br i1 %p, label %bb1, label %bb2
bb1:
tail call void @foo()
ret void
bb2:
tail call void @bar()
ret void
}
before:
f:
movl 4(%esp), %eax
cmpl 8(%esp), %eax
jne .LBB0_2
jmp foo
.LBB0_2:
jmp bar
after:
f:
movl 4(%esp), %eax
cmpl 8(%esp), %eax
jne bar
.LBB0_1:
jmp foo
I don't expect any significant size savings from this (on a Clang bootstrap I
saw 288 bytes), but it does make the code a little tighter.
This patch only does 32-bit, but 64-bit would work similarly.
Differential Revision: https://reviews.llvm.org/D24108
llvm-svn: 280832
Summary:
Previously we were trying to represent this with the "contains" list of
the .cv_inline_linetable directive, which was not enough information.
Now we directly represent the chain of inlined call sites, so we know
what location to emit when we encounter a .cv_loc directive of an inner
inlined call site while emitting the line table of an outer function or
inlined call site. Fixes PR29146.
Also fixes PR29147, where we would crash when .cv_loc directives crossed
sections. Now we write down the section of the first .cv_loc directive,
and emit an error if any other .cv_loc directive for that function is in
a different section.
Also fixes issues with discontiguous inlined source locations, like in
this example:
volatile int unlikely_cond = 0;
extern void __declspec(noreturn) abort();
__forceinline void f() {
if (!unlikely_cond) abort();
}
int main() {
unlikely_cond = 0;
f();
unlikely_cond = 0;
}
Previously our tables gave bad location information for the 'abort'
call, and the debugger wouldn't snow the inlined stack frame for 'f'.
It is important to emit good line tables for this code pattern, because
it comes up whenever an asan bug occurs in an inlined function. The
__asan_report* stubs are generally placed after the normal function
epilogue, leading to discontiguous regions of inlined code.
Reviewers: majnemer, amccarth
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24014
llvm-svn: 280822
This adds a copy of the demangler in libcxxabi.
The code also has no dependencies on anything else in LLVM. To enforce
that I added it as another library. That way a BUILD_SHARED_LIBS will
fail if anyone adds an use of StringRef for example.
The no llvm dependency combined with the fact that this has to build
on linux, OS X and Windows required a few changes to the code. In
particular:
No constexpr.
No alignas
On OS X at least this library has only one global symbol:
__ZN4llvm16itanium_demangleEPKcPcPmPi
My current plan is:
Commit something like this
Change lld to use it
Change lldb to use it as the fallback
Add a few #ifdefs so that exactly the same file can be used in
libcxxabi to export abi::__cxa_demangle.
Once the fast demangler in lldb can handle any names this
implementation can be replaced with it and we will have the one true
demangler.
llvm-svn: 280732
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set). However we could still have
regions of the CFG that does not have branch weights collected (e.g. a
cold region). In this case we'd use static estimates. Since static
estimates for branches are determined independently, they are
inconsistent. Updating them can "randomly" inflate block frequencies.
I've run into this in a completely cold loop of h264ref from
SPEC. -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).
The new testcase demonstrate the problem. We check array elements
against 1, 2 and 3 in a loop. The check against 3 is the loop-exiting
check. The block names should be self-explanatory.
In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).
There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated. These are the resulting branch and block
frequencies for the loop body:
check_1 (16)
(8) / |
eq_1 | (8)
\ |
check_2 (16)
(8) / |
eq_2 | (8)
\ |
check_3 (16)
(1) / |
(loop exit) | (15)
|
(back edge)
First we thread eq_1 -> check_2 to check_3. Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2. Changed frequencies are highlighted with * *:
check_1 (16)
(8) / |
eq_1~ | (8)
/ |
/ check_2 (*8*)
/ (8) / |
\ eq_2 | (*0*)
\ \ |
` --- check_3 (16)
(1) / |
(loop exit) | (15)
|
(back edge)
Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges. Frequencies are updated to remove the frequency of eq_1 and
eq_3 from check_3 and then the false edge leaving check_3 (changed
frequencies are highlighted with * *):
check_1 (16)
(8) / |
eq_1~ | (8)
/ |
/ check_2 (*8*)
/ (8) / |
/-- eq_2~ | (*0*)
(back edge) |
check_3 (*0*)
(*0*) / |
(loop exit) | (*0*)
|
(back edge)
As a result, the loop exit edge ends up with 0 frequency which in turn makes
the loop header to have maximum frequency.
There are a few potential problems here:
1. The profile data seems odd. There is a single profile sample of the
loop being entered. On the other hand, there are no weights inside the
loop.
2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.
3. We shouldn't create profile metadata that is calculated from static
estimation. I am not sure what policy is but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling. Estimated probabilities should only be reflected in BPI/BFI.
Any one of these would probably fix the immediate problem. I went for 3
because I think it's a good policy to have and added a FIXME about 2.
Differential Revision: https://reviews.llvm.org/D24118
llvm-svn: 280713
Use ADT/BitmaskEnum for DINode::DIFlags for the following purposes:
Get rid of unsigned int for flags to avoid problems on platforms with sizeof(int) < 4
Flags are now strongly typed
Patch by: Victor Leschuk <vleschuk@gmail.com>
Differential Revision: https://reviews.llvm.org/D23766
llvm-svn: 280700
Use ADT/BitmaskEnum for DINode::DIFlags for the following purposes:
* Get rid of unsigned int for flags to avoid problems on platforms with sizeof(int) < 4
* Flags are now strongly typed
Patch by: Victor Leschuk <vleschuk@gmail.com>
Differential Revision: https://reviews.llvm.org/D23766
llvm-svn: 280686
The latest MSVC update apparently resolve the call from the
const ref variant to itself, leading to an infinite
recursion. It is not clear to me why the r-value overload is
not selected. `ValueT` is a pointer type, and the functional-style
cast in the call `insert_as(ValueT(V), LookupKey);` should result
in a r-value ref. A bug in MSVC?
Differential Revision: https://reviews.llvm.org/D23956
llvm-svn: 280685
CGP currently drops select's MD_prof profile data when
generating conditional branch which can lead to bad
code layout. The patch fixes the issue.
Differential Revision: http://reviews.llvm.org/D24169
llvm-svn: 280600
Because the recent change about ODR type uniquing in the context,
we can reach types defined in another module during IR linking.
This triggered some assertions in case we IR link without starting
from an empty module. To alleviate that, we can self-map metadata
defined in the destination module so that they won't be visited.
Differential Revision: https://reviews.llvm.org/D23841
llvm-svn: 280599
The only intrusive thing about SparseBitVector's usage of ilist<> was
that new was usually called externally. There were no custom traits.
It seems like the reason to switch to ilist in r41855 was to avoid
pointer invalidation, but std::list<> has that feature too. Maybe
std::list<>::emplace makes this a little more obvious than it was then.
Switch over to std::list<> and simplify the code.
llvm-svn: 280573
Inheriting from std::iterator uses more boiler-plate than manual
typedefs. Avoid that in both ilist_iterator and
MachineInstrBundleIterator.
This has the side effect of removing ilist_iterator from certain ADL
lookups in namespace std; calls to std::next need to be qualified by
"std::" that didn't have to before. The one case of this in-tree was
operating on a temporary, so I used the more compact operator++.
llvm-svn: 280570
Split out iplist_impl from iplist, and change SymbolTableList to inherit
directly from iplist_impl. This makes it more straightforward to add
new template paramaters to iplist [*]:
- iplist_impl takes a "base" list that provides the intrusive
functionality (usually simple_ilist<T>) and a traits class.
- iplist no longer takes a "Traits" template parameter. It only takes
the value_type, T, and instantiates iplist_impl with simple_ilist<T>
and ilist_traits<T>.
- SymbolTableList now inherits from iplist_impl, instead of iplist.
Note for out-of-tree code: if you have an iplist whose second template
parameter was *not* the default (i.e., not ilist_traits<YourT>), you
have three options:
- Stop using a custom traits class, and instead specialize
ilist_traits<YourT>. This is the usual thing to do.
- Specialize iplist<YourT> to pass your custom traits class into
iplist_impl.
- Create your own trivial list type that passes your custom traits class
into iplist_impl (see SymbolTableList<> for an example).
[*]: The eventual goal is to start tracking a sentinel bit on the
MachineInstr list even when LLVM_ENABLE_ABI_BREAKING_CHECKS is off,
which will enable MachineBasicBlock::reverse_iterator to have normal
list invalidation semantics that matching the new
iplist<>::reverse_iterator from r280032.
llvm-svn: 280569
Delete the dead code for Write(ilist_iterator) in the IR Verifier,
inline report(ilist_iterator) at its call sites in the MachineVerifier,
and use simple_ilist<>::iterator in SymbolTableListTraits.
The only remaining reference to ilist_iterator outside of the ilist
implementation is from MachineInstrBundleIterator. I'll get rid of that
in a follow-up.
llvm-svn: 280565
This test was using the wrong type, and so not actually testing much.
ilist_iterator constructors weren't going through ilist_node_access, so
they didn't actually work with private inheritance.
llvm-svn: 280564
1. 0xNN and NNh are accepted as valid hexadecimal numbers, but 0xNNh is not.
0xNN and NNh may come with optional U or L suffix.
2. NNb is accepted as a valid binary (base-2) number, but 0bNN is not.
NNb may come with optional U or L suffix.
Differential Revision: https://reviews.llvm.org/D22112
llvm-svn: 280555
Previously we were splitting our records at 0xFFFF bytes, which the
Microsoft tools don't like.
Should fix failure on the new Windows self-host buildbot.
This length appears in microsoft-pdb/PDB/dbi/dbiimpl.h
llvm-svn: 280522
For the store of a wide value merged from a pair of values, especially int-fp pair,
sometimes it is more efficent to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.
Now the feature is only enabled on x86 target, and only store of int-fp pair is
splitted. It is possible that the application scope gets extended with perf evidence
support in the future.
Differential Revision: https://reviews.llvm.org/D22840
llvm-svn: 280505
Crash was possible if match() method
was called on object that was moved or object
created with empty constructor.
Testcases updated.
DIfferential revision: https://reviews.llvm.org/D24123
llvm-svn: 280473
r280425 | dehao | 2016-09-01 16:15:50 -0700 (Thu, 01 Sep 2016) | 9 lines
Refactor LICM pass in preparation for LoopSink pass.
Summary: LoopSink pass uses some common function in LICM. This patch refactor the LICM code to make it usable by LoopSink pass (https://reviews.llvm.org/D22778).
r280429 | dehao | 2016-09-01 16:31:25 -0700 (Thu, 01 Sep 2016) | 9 lines
Refactor LICM to expose canSinkOrHoistInst to LoopSink pass.
Summary: LoopSink pass shares the same canSinkOrHoistInst functionality with LICM pass. This patch exposes this function in preparation of https://reviews.llvm.org/D22778
llvm-svn: 280453
Summary:
This is pretty useful especially in connection with
BFI's -view-block-freq-propagation-dags. It helped me to track down the
bug that is being fixed in D24118.
While -view-block-freq-propagation-dags displays the high-level
information with static heuristics included (and block frequencies), the
new thing only shows the raw weight as presented by PGO without any of
the static estimates. This helps to distinguished what has been
measured vs. estimated.
For the sample loop in D24118, -view-block-freq-propagation-dags=integer
looks like this:
https://reviews.llvm.org/F2381352
While with -view-cfg-only you can see the underlying branch weights:
https://reviews.llvm.org/F2392296
Reviewers: dexonsmith, bogner, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24144
llvm-svn: 280442
Summary: LoopSink pass shares the same canSinkOrHoistInst functionality with LICM pass. This patch exposes this function in preparation of https://reviews.llvm.org/D22778
Reviewers: chandlerc, davidxl, danielcdh
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24171
llvm-svn: 280429
Summary: This is in preparation for LoopSink pass which calls replaceDominatedUsesWith to update after sinking.
Reviewers: chandlerc, davidxl, danielcdh
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24170
llvm-svn: 280427
They're another source of generic vregs, which are going to need a type on the
definition when we remove the register width from MachineRegisterInfo.
llvm-svn: 280412
Previous change broke the C API for creating an EarlyCSE pass w/
MemorySSA by adding a bool parameter to control whether MemorySSA was
used or not. This broke the OCaml bindings. Instead, change the old C
API entry point back and add a new one to request an EarlyCSE pass with
MemorySSA.
llvm-svn: 280379
If an attribute name has special characters such as '\01', it is not
properly printed in LLVM assembly language format. Since the format
expects the special characters are printed as it is, it has to contain
escape characters to make it printable.
Before:
attributes #0 = { ... "counting-function"="^A__gnu_mcount_nc" ...
After:
attributes #0 = { ... "counting-function"="\01__gnu_mcount_nc" ...
Reviewers: hfinkel, rengolin, rjmccall, compnerd
Subscribers: nemanjai, mcrosier, hans, shenhan, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D23792
llvm-svn: 280357
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:
ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)
where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but the better to directly lower the CFA construct
itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.
This change introduces a ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and updates PowerPC to do the same.
Fixes PR26761.
Differential Revision: https://reviews.llvm.org/D24038
llvm-svn: 280350
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.
Because we don't want the presence of these functions to affect optimizaton,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.
Differential Revision: https://reviews.llvm.org/D22825
llvm-svn: 280347
Summary:
This change promotes the 'isTailCall(...)' member function to
TargetInstrInfo as a query interface for determining on a per-target
basis whether a given MachineInstr is a tail call instruction. We build
upon this in the XRay instrumentation pass to emit special sleds for
tail call optimisations, where we emit the correct kind of sled.
The tail call sleds look like a mix between the function entry and
function exit sleds. Form-wise, the sled comes before the "jmp"
instruction that implements the tail call similar to how we do it for
the function entry sled. Functionally, because we know this is a tail
call, it behaves much like an exit sled -- i.e. at runtime we may use
the exit trampolines instead of a different kind of trampoline.
A follow-up change to recognise these sleds will be done in compiler-rt,
so that we can start intercepting these initially as exits, but also
have the option to have different log entries to more accurately reflect
that this is actually a tail call.
Reviewers: echristo, rSerge, majnemer
Subscribers: mehdi_amini, dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D23986
llvm-svn: 280334
Older versions of clang defined __has_cpp_attribute in C mode, but
would choke on scoped attributes, as per llvm.org/PR23435. Since we
support building with clang all the way back to 3.1, we have to work
around this issue.
llvm-svn: 280326
Previously we were assuming that any visitation of types would
necessarily be against a type we had binary data for. Reasonable
assumption when were just reading PDBs and dumping them, but once
we start writing PDBs from Yaml this breaks down, because we have
no binary data yet, only Yaml, and from that we need to read the
record kind and perform the switch based on that.
So this patch does that. Instead of having the visitor switch
on the kind that is already in the CVType record, we change the
visitTypeBegin() method to return the Kind, and switch on the
returned value. This way, the default implementation can still
return the value from the CVType, but the implementation which
visits Yaml records and serializes binary PDB type records can
use the field in the Yaml as the source of the switch.
llvm-svn: 280307
This reverts commit r280268, it causes all MSVC 2013 to ICE. This
appears to have been fixed in a later MSVC 2013 update, because I cannot
reproduce it locally. That said, all upstream LLVM bots are broken right
now, so I am reverting.
Also reverts dependent change r280275, "[Hexagon] Deal with undefs when
extending live intervals".
llvm-svn: 280301
We were kind of hacking this together before by embedding the
ability to forward requests into the TypeDeserializer. When
we want to start adding more different kinds of visitor callback
interfaces though, this doesn't scale well and is very inflexible.
So introduce the notion of a pipeline, which itself implements
the TypeVisitorCallbacks interface, but which contains an internal
list of other callbacks to invoke in sequence.
Also update the existing uses of CVTypeVisitor to use this new
pipeline class for deserializing records before visiting them
with another visitor.
llvm-svn: 280293
More preparation for dropping source types from MachineInstrs: regsters coming
out of already-selected code (i.e. non-generic instructions) don't have a type,
but that information is needed so we must add it manually.
This is done via a new G_TYPE instruction.
llvm-svn: 280292
Summary:
Current implementation of LI verifier isn't ideal and fails to detect
some cases when LI is incorrect. For instance, it checks that all
recorded loops are in a correct form, but it has no way to check if
there are no more other (unrecorded in LI) loops in the function. This
patch adds a way to detect such bugs.
Reviewers: chandlerc, sanjoy, hfinkel
Subscribers: llvm-commits, silvas, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23437
llvm-svn: 280280
Summary:
Use MemorySSA, if requested, to do less conservative memory dependency
checking.
This change doesn't enable the MemorySSA enhanced EarlyCSE in the
default pipelines, so should be NFC.
Reviewers: dberlin, sanjoy, reames, majnemer
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19821
llvm-svn: 280279
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use cases is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.
The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.
Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)
My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.
Differential Revision: https://reviews.llvm.org/D24000
llvm-svn: 280250
types. This is the LLVM counterpart and it adds options that map onto FP
exceptions and denormal build attributes allowing better fp math library
selections.
Differential Revision: https://reviews.llvm.org/D24070
llvm-svn: 280246
Move the comparison function into the only place there it is used,
i.e. the call to std::stable_sort in CoverageMappingWriter::write().
Add sorting by region kinds as it is required to ensure stable order
in our tests and to simplify D23987.
Differential Revision: https://reviews.llvm.org/D24034
llvm-svn: 280198
Legalization ends up creating many G_SEQUENCE/G_EXTRACT pairs which leads to
inefficient codegen (even for -O0), so add a quick pass over the function to
remove them again.
llvm-svn: 280155
When binaries are compressed by UPX, information about symbol table
offset and symbol count remain unchanged (but became invalid due to
compression).
This causes failure in the constructor and the rest of the binary cannot
be processed.
Instead, reset symbol related information (symbol/string table pointers,
sizes) - this should disable the related iterators and functions while
the rest of the binary can still be processed.
Patch by Bandzi Michal!
llvm-svn: 280147
Add constants for additional GNU note types and the GNU Notes OS type id. This
is needed to support printing the notes in ELF binaries.
llvm-svn: 280130
Many lists want to override only allocation semantics, or callbacks for
iplist. Split these up to prevent code duplication.
- Specialize ilist_alloc_traits to change the implementations of
deleteNode() and createNode().
- One common desire is to do nothing deleteNode() and disable
createNode(). Specialize ilist_alloc_traits to inherit from
ilist_noalloc_traits for that behaviour.
- Specialize ilist_callback_traits to use the addNodeToList(),
removeNodeFromList(), and transferNodesFromList() callbacks.
As a drive-by, add some coverage to the callback-related unit tests.
llvm-svn: 280128
The existing code hard-coded a limit of 20 instructions for duplication
when a block ended with an indirect branch. Extract this as an option.
No functional change intended.
llvm-svn: 280125
Guarantee that ilist_traits<T>::transferNodesFromList is only called
when nodes are actually changing lists.
I also moved all the callbacks to occur *first*, before the operation.
This is the only choice for iplist<T>::merge, so we might as well be
consistent. I expect this to have no effect in practice, although it
simplifies the logic in both iplist<T>::transfer and iplist<T>::insert.
llvm-svn: 280122
This is a prep commit before splitting up ilist_node_traits and
updating/simplifying call sites.
- Move to top of file (I considered moving to a different file,
llvm/ADT/ilist_traits.h, but it's really not much code).
- Clang-format.
- Convert comments to doxygen, clean them up, and add TODOs for what I'm
doing next.
llvm-svn: 280109
Split out a new, low-level intrusive list type with clear semantics.
Unlike iplist (and ilist), all operations on simple_ilist are intrusive,
and simple_ilist never takes ownership of its nodes. This enables an
intuitive API that has the right defaults for intrusive lists.
- insert() takes references (not pointers!) to nodes (in iplist/ilist,
passing a reference will cause the node to be copied).
- erase() takes only iterators (like std::list), and does not destroy
the nodes.
- remove() takes only references and has the same behaviour as erase().
- clear() does not destroy the nodes.
- The destructor does not destroy the nodes.
- New API {erase,remove,clear}AndDispose() take an extra Disposer
functor for callsites that want to call some disposal routine (e.g.,
std::default_delete).
This list is not currently configurable, and has no callbacks.
The initial motivation was to fix iplist<>::sort to work correctly (even
with callbacks in ilist_traits<>). iplist<> uses simple_ilist<>::sort
directly. The new test in unittests/IR/ModuleTest.cpp crashes without
this commit.
Fixing sort() via a low-level layer provided a good opportunity to:
- Unit test the low-level functionality thoroughly.
- Modernize the API, largely inspired by other intrusive list
implementations.
Here's a sketch of a longer-term plan:
- Create BumpPtrList<>, a non-intrusive list implemented using
simple_ilist<>, and use it for the Token list in
lib/Support/YAMLParser.cpp. This will factor out the only real use of
createNode().
- Evolve the iplist<> and ilist<> APIs in the direction of
simple_ilist<>, making allocation/deallocation explicit at call sites
(similar to simple_ilist<>::eraseAndDispose()).
- Factor out remaining calls to createNode() and deleteNode() and remove
the customization from ilist_traits<>.
- Transition uses of iplist<>/ilist<> that don't need callbacks over to
simple_ilist<>.
llvm-svn: 280107
Or they were not instantiated as expected;
llvm::InnerAnalysisManagerProxy<llvm::AnalysisManager<llvm::Function>, llvm::LazyCallGraph::SCC>::PassID
llvm::InnerAnalysisManagerProxy<llvm::AnalysisManager<llvm::Function>, llvm::LazyCallGraph::SCC>::PassID
llvm-svn: 280105
This reverts commit r280016, and the followups of r280017, r280027,
r280051, r280058, and r280059.
MSVC's implementation of std::promise does not get along with
llvm::Error. It uses its promised value too much like a normal value
type.
llvm-svn: 280100
only) for Expected<T> so that it can interoperate with MSVC's std::future
implementation.
MSVC 2013's std::future implementation requires the wrapped type to be default
constructible.
Hopefully this will fix the bot breakage in
http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/9937 .
llvm-svn: 280058
The former is simply wrong -- the code will either never be used or will
always be used, rather than being dependent upon whether it's built with
debug assertions enabled.
The macro DEBUG isn't ever set by the llvm build system. But, the macro
DEBUG(X) is defined (unconditionally) if you happen to include
llvm/Support/Debug.h.
The code in Value.h which was erroneously protected by the #ifdef DEBUG
didn't even compile -- you can't cast<> from an LLVMOpaqueValue
directly. Fortunately, it was never invoked, as Core.cpp included
Value.h before Debug.h.
The conditionalized code in AArch64CollectLOH.cpp was previously always
used, as it includes Debug.h.
llvm-svn: 280056
behaviors, and add a callB (blacking call) primitive.
callB is a blocking call primitive for threaded code where the RPC responses are
being processed on a separate thread. (For single threaded code callST should
continue to be used instead).
No unit test yet: Last time I commited a threaded unit test it deadlocked on
one of the s390x builders. I'll try to re-enable that test first, and add a new
test if I can sort out the deadlock issue.
llvm-svn: 280051
I'm working on a lower-level intrusive list that can be used
stand-alone, and splitting the files up a bit will make the code easier
to organize. Explode the ilist headers in advance to improve blame
lists in the future.
- Move ilist_node_base from ilist_node.h to ilist_node_base.h.
- Move ilist_base from ilist.h to ilist_base.h.
- Move ilist_iterator from ilist.h to ilist_iterator.h.
- Move ilist_node_access from ilist.h to ilist_node.h to support
ilist_iterator.
- Update unit tests to #include smaller headers.
- Clang-format the moved things.
I noticed in transit that there is a simplify_type specialization for
ilist_iterator. Since there is no longer an implicit conversion from
ilist<T>::iterator to T*, this doesn't make sense (effectively it's a
form of implicit conversion). For now I've added a FIXME.
llvm-svn: 280047
Reverse iterators to doubly-linked lists can be simpler (and cheaper)
than std::reverse_iterator. Make it so.
In particular, change ilist<T>::reverse_iterator so that it is *never*
invalidated unless the node it references is deleted. This matches the
guarantees of ilist<T>::iterator.
(Note: MachineBasicBlock::iterator is *not* an ilist iterator, but a
MachineInstrBundleIterator<MachineInstr>. This commit does not change
MachineBasicBlock::reverse_iterator, but it does update
MachineBasicBlock::reverse_instr_iterator. See note at end of commit
message for details on bundle iterators.)
Given the list (with the Sentinel showing twice for simplicity):
[Sentinel] <-> A <-> B <-> [Sentinel]
the following is now true:
1. begin() represents A.
2. begin() holds the pointer for A.
3. end() represents [Sentinel].
4. end() holds the poitner for [Sentinel].
5. rbegin() represents B.
6. rbegin() holds the pointer for B.
7. rend() represents [Sentinel].
8. rend() holds the pointer for [Sentinel].
The changes are #6 and #8. Here are some properties from the old
scheme (which used std::reverse_iterator):
- rbegin() held the pointer for [Sentinel] and rend() held the pointer
for A;
- operator*() cost two dereferences instead of one;
- converting from a valid iterator to its valid reverse_iterator
involved a confusing increment; and
- "RI++->erase()" left RI invalid. The unintuitive replacement was
"RI->erase(), RE = end()".
With vector-like data structures these properties are hard to avoid
(since past-the-beginning is not a valid pointer), and don't impose a
real cost (since there's still only one dereference, and all iterators
are invalidated on erase). But with lists, this was a poor design.
Specifically, the following code (which obviously works with normal
iterators) now works with ilist::reverse_iterator as well:
for (auto RI = L.rbegin(), RE = L.rend(); RI != RE;)
fooThatMightRemoveArgFromList(*RI++);
Converting between iterator and reverse_iterator for the same node uses
the getReverse() function.
reverse_iterator iterator::getReverse();
iterator reverse_iterator::getReverse();
Why doesn't iterator <=> reverse_iterator conversion use constructors?
In order to catch and update old code, reverse_iterator does not even
have an explicit conversion from iterator. It wouldn't be safe because
there would be no reasonable way to catch all the bugs from the changed
semantic (see the changes at call sites that are part of this patch).
Old code used this API:
std::reverse_iterator::reverse_iterator(iterator);
iterator std::reverse_iterator::base();
Here's how to update from old code to new (that incorporates the
semantic change), assuming I is an ilist<>::iterator and RI is an
ilist<>::reverse_iterator:
[Old] ==> [New]
reverse_iterator(I) (--I).getReverse()
reverse_iterator(I) ++I.getReverse()
--reverse_iterator(I) I.getReverse()
reverse_iterator(++I) I.getReverse()
RI.base() (--RI).getReverse()
RI.base() ++RI.getReverse()
--RI.base() RI.getReverse()
(++RI).base() RI.getReverse()
delete &*RI, RE = end() delete &*RI++
RI->erase(), RE = end() RI++->erase()
=======================================
Note: bundle iterators are out of scope
=======================================
MachineBasicBlock::iterator, also known as
MachineInstrBundleIterator<MachineInstr>, is a wrapper to represent
MachineInstr bundles. The idea is that each operator++ takes you to the
beginning of the next bundle. Implementing a sane reverse iterator for
this is harder than ilist. Here are the options:
- Use std::reverse_iterator<MBB::i>. Store a handle to the beginning of
the next bundle. A call to operator*() runs a loop (usually
operator--() will be called 1 time, for unbundled instructions).
Increment/decrement just works. This is the status quo.
- Store a handle to the final node in the bundle. A call to operator*()
still runs a loop, but it iterates one time fewer (usually
operator--() will be called 0 times, for unbundled instructions).
Increment/decrement just works.
- Make the ilist_sentinel<MachineInstr> *always* store that it's the
sentinel (instead of just in asserts mode). Then the bundle iterator
can sniff the sentinel bit in operator++().
I initially tried implementing the end() option as part of this commit,
but updating iterator/reverse_iterator conversion call sites was
error-prone. I have a WIP series of patches that implements the final
option.
llvm-svn: 280032
Optional.
For void functions the return type of a nonblocking call changes from
Expected<future<Optional<bool>>> to Expected<future<Error>>, and for functions
returning T the return type changes from Expected<future<Optional<T>>> to
Expected<future<Expected<T>>>.
Inner results need to be checked (since the RPC connection may have dropped
out before a result came back) and Error/Expected provide stronger checking
requirements. It also allows us drop the crufty 'optionalToError' function and
just collapse Errors in the single-threaded call primitives.
llvm-svn: 280016
Instead of putting all possible requests into a single table, we can perform
the extremely dense lookup based on opcode and type-index in constant time
using multi-dimensional array-like things.
This roughly halves the time spent doing legalization, which was dominated by
queries against the Actions table.
llvm-svn: 280011
There should be no functional change here, I'm just making the implementation
of "frem" (to libcall) legalization easier for a followup.
llvm-svn: 279987
Summary: No functional changes, just refactoring to make D23947 simpler.
Reviewers: eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23954
llvm-svn: 279982
Summary:
[Coroutines] Part 9: Add cleanup subfunction.
This patch completes coroutine heap allocation elision. Now, the heap elision example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex3.ll)
Intrinsic Changes:
* coro.free gets a token parameter tying it to coro.id to allow reliably discovering all coro.frees associated with a particular coroutine.
* coro.id gets an extra parameter that points back to a coroutine function. This allows to check whether a coro.id describes the enclosing function or it belongs to a different function that was later inlined.
CoroSplit now creates three subfunctions:
# f$resume - resume logic
# f$destroy - cleanup logic, followed by a deallocation code
# f$cleanup - just the cleanup code
CoroElide pass during devirtualization replaces coro.destroy with either f$destroy or f$cleanup depending whether heap elision is performed or not.
Other fixes, improvements:
* Fixed buglet in Shape::buildFrame that was not creating coro.save properly if coroutine has more than one suspend point.
* Switched to using variable width suspend index field (no longer limited to 32 bit index field can be as little as i1 or as large as i<whatever-size_t-is>)
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23844
llvm-svn: 279971
Assuming the default FP env, we should not treat fdiv and frem any differently in terms of
trapping behavior than any other FP op. Ie, FP ops do not trap with the default FP env.
This matches how we treat these ops in IR with isSafeToSpeculativelyExecute(). There's a
similar bug in Constant::canTrap().
This bug manifests in PR29114:
https://llvm.org/bugs/show_bug.cgi?id=29114
...as a sequence of scalar divisions instead of a vector division on x86 for a <3 x float>
type.
Differential Revision: https://reviews.llvm.org/D23974
llvm-svn: 279970
switch to using one indirect stub manager per logical dylib rather than one per
input module.
LogicalDylib is a helper class used by the CompileOnDemandLayer to manage
symbol resolution between modules during lazy compilation. In particular, it
ensures that internal symbols resolve correctly even in the case where multiple
input modules contain the same internal symbol name (which must to be promoted
to external hidden linkage so that functions in any given module can be split
out by lazy compilation). LogicalDylib's resolution scheme (before this commit)
required one stub-manager per input module. This made recompilation of functions
(by adding a module containing a new definition) difficult, as the stub manager
for any given symbol was bound to the module that supplied the original
definition. By using one stubs manager for the whole logical dylib symbols can
be more easily replaced, although support for doing this is not included in this
patch (it will be implemented in a follow up).
llvm-svn: 279952
Fixed a bug in run-time checks for possible memory conflicts inside loop.
The bug is in Low <-> High boundaries calculation. The High boundary should be calculated as "last memory access pointer + element size".
Differential revision: https://reviews.llvm.org/D23176
llvm-svn: 279930
When global-isel fails on a MachineFunction MF, MF will be cleaned up
and given to SDISel.
Thanks to this fallback, we can already perform correctness test even if
we support only a small portion of the functions in a test.
llvm-svn: 279891
This is used to communicate that the instruction selection pipeline
failed at some point.
Another way to achieve that would be to have some kind of conditional
scheduling in the PassManager, such that we only schedule a pass based
on the success/failure of another one. The property approach has the
advantage of being lightweight and solve the problem at stake.
llvm-svn: 279885
Summary:
Have the cache pass back the path to the cache entry when it
is ready to be loaded, instead of a buffer.
For gold-plugin we can simply pass this file back to gold directly,
which avoids expensive writing of a separate tmp file. Ensure
the cache entry is not deleted on cleanup by adjusting the setting
of the IsTemporary flags.
Moved the loading of the buffer into llvm-lto2 to maintain current
behavior.
Reviewers: mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23946
llvm-svn: 279883
By default, this hook tells GlobalISel to abort (report a fatal error)
when it encounters an error. The alternative will be to fall back on
SDISel.
This fall back will be removed when the bring-up of GlobalISel is over.
llvm-svn: 279879
This method allows to reset the state of a MachineFunction as if it was
just created. This will be used during the bring-up of GlobalISel to
provide a way to fallback on SelectionDAG. That way, we can start doing
correctness testing even if we are not able to select all functions via
the global instruction selector.
llvm-svn: 279876
Summary:
This is obviously an interesting case because it may motivate code
restructuring or LTO.
Reporting this requires instantiation of ORE in the loop where the call
sites are first gathered. I've checked compile-time
overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
mcf and quickly tailing off. As before without -Rpass-with-hotness
there is no overhead.
Because this could be a pretty noisy diagnostics, it is currently
qualified as 'verbose'. As of this patch, 'verbose' diagnostics are
only emitted with -Rpass-with-hotness, i.e. when the output is expected
to be filtered.
Reviewers: eraman, chandlerc, davidxl, hfinkel
Subscribers: tejohnson, Prazek, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D23415
llvm-svn: 279860
We already have obtained a pointer to the underlying GlobalObject,
use it directly to find the comdat, rather than using the
GlobalValue::getComdat which will do the same thing again.
llvm-svn: 279856
MCContext already has many tasks, and separating CodeView out from it is
probably a good idea. The .cv_loc tracking was modelled on the DWARF
tracking which lived directly in MCContext.
Removes the inclusion of MCCodeView.h from MCContext.h, so now there are
only 10 build actions while I hack on CodeView support instead of 265.
llvm-svn: 279847
Summary: Dead store elimination gets very expensive when large numbers of instructions need to be analyzed. This patch limits the number of instructions analyzed per store to the value of the memdep-block-scan-limit parameter (which defaults to 100). This resulted in no observed difference in performance of the generated code, and no change in the statistics for the dead store elimination pass, but improved compilation time on some files by more than an order of magnitude.
Reviewers: dexonsmith, bruno, george.burgess.iv, dberlin, reames, davidxl
Subscribers: davide, chandlerc, dberlin, davidxl, eraman, tejohnson, mbodart, llvm-commits
Differential Revision: https://reviews.llvm.org/D15537
llvm-svn: 279833
We can't mark ORE (a function pass) preserved as required by the loop
passes because that is how we ensure that the required passes like
LazyBFI are all available any time ORE is used. See the new comments in
the patch.
Instead we use it directly just like the inliner does in D22694.
As expected there is some additional overhead after removing the caching
provided by analysis passes. The worst case, I measured was
LNT/CINT2006_ref/401.bzip2 which regresses by 12%. As before, this only
affects -Rpass-with-hotness and not default compilation.
llvm-svn: 279829
This function allows getting arbitrary sized block of random bytes.
Primary motivation is support for --build-id=uuid in lld.
Differential revision: https://reviews.llvm.org/D23671
llvm-svn: 279807
The assertion doesn't always hold true as sizeof(SDNodeBits) isn't equal
to sizeof(uint16_t) for some targets. For example, sizeof(SDNodeBits)
evaluates to 1, not 2, for ARM's APCS targets.
llvm-svn: 279797
MMI must match the function passed, and MF has a handle on MMI. Use that instead
of accepting it as separate argument. No Functional Change.
llvm-svn: 279701
Save the function in the class, and then don't pass it around. This reduces the
number of parameters and makes calls to member functions simpler.
No Functional Change.
llvm-svn: 279700
Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of
running after register and simply describes that no vregs are used in
a machine function. With that we can simply compute the property and do
not need to dump/parse it in .mir files.
Differential Revision: http://reviews.llvm.org/D23850
llvm-svn: 279698
tracksSubRegLiveness only depends on the Subtarget and a cl::opt, there
is not need to change it or save/parse it in a .mir file.
Make the field const and move the initialization LiveIntervalAnalysis to the
MachineRegisterInfo constructor. Also cleanup some code and fix some
instances which better use MachineRegisterInfo::subRegLivenessEnabled() instead
of TargetSubtargetInfo::enableSubRegLiveness().
llvm-svn: 279676
Summary:
This patch implements readlane/readfirstlane intrinsics.
TODO: need to define a new register class to consider the case
that the source could be a vector register or M0.
Reviewed by:
arsenm and tstellarAMD
Differential Revision:
http://reviews.llvm.org/D22489
llvm-svn: 279660