As with other Attributor interfaces we often want to know if assumed
information was used to answer a query. This is important if only
known information is allowed or if known information can lead to an
early fixpoint. The users have been adjusted but none of them utilizes
the new information yet.
The test case here hits machine verifier problems. There are volatile
long loads that the results of do not get used, loading into two dead
registers. IfCvt will predicate them and as it does will add implicit
uses of the predicating registers due to thinking they are live in. As
nothing has used the register, the machine verifier disagrees that they
are really live and we end up with a failure.
The registers come from Pristine regs that LivePhysRegs counts as live.
This patch adds a addLiveInsNoPristines method to be used instead in
IfCvt, so that only really live in regs need to be added as implicit
operands.
Differential Revision: https://reviews.llvm.org/D90965
Originally committed as 04c203e310bd3fb58e16c936c0200d680100526e
Reverted in 768510632c5ddbf9438693d9c7db1903e39295ad due to the test
failing when encountering windows directory separators.
Fix the path separator platform issue with a FileCheck pattern {{[/\\]}}
Original commit message:
A followup to the feature added in 69da27c7496ea373567ce5121e6fe8613846e7a5
that added the optional "start file name" to match "start line" - but this
didn't work with Split DWARF because of the need for the decl file number
resolution code to refer back to the skeleton unit to find its .debug_line
contribution. So this patch adds the necessary infrastructure to track the
skeleton unit corresponding to a split full unit for the purpose of this
lookup.
In the spirit of TRegions [0], this patch analyzes a kernel and tracks
if it can be executed in SPMD-mode. If so, we flip the arguments of
the __kmpc_target_init and deinit call to enable the mode. We also
update the `<kernel>_exec_mode` flag to indicate to the runtime we
changed the mode to SPMD.
The code analysis is done interprocedurally by extending the
AAKernelInfo abstract attribute to track SPMD compatibility as well.
[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11
Differential Revision: https://reviews.llvm.org/D102307
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.
The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.
[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11
This was in parts extracted from D59319.
Reviewed By: ABataev, JonChesterfield
Differential Revision: https://reviews.llvm.org/D101976
In order to simplify future extensions, e.g., the merge of
AAHeapToShared in to AAHeapToStack, we reorganize AAHeapToStack and the
state we keep for each malloc-like call. The result is also less
confusing as we only track malloc-like calls, not all calls. Further, we
only perform the updates necessary for a malloc-like to argue it can go
to the stack, e.g., we won't check all uses if we moved on to the
"must-be-freed" argument.
This patch also uses Attributor helps to simplify the allocated size,
alignment, and the potentially freed objects.
Overall, this is mostly a reorganization and only the use of the
optimistic helpers should change (=improve) the capabilities a bit.
Differential Revision: https://reviews.llvm.org/D104993
We have to be careful when we replace values to not use a non-dominating
instruction. It makes sense that simplification offers those as
"simplified values" but we can't manifest them in the IR without PHI
nodes. In the future we should consider potentially adding those PHI
nodes.
We should use AAValueSimplify for all value simplification, however
there was some leftover logic that predates AAValueSimplify in
AAReturnedValues. This remove the AAReturnedValues part and provides a
replacement by making AAValueSimplifyReturned strong enough to handle
all previously covered cases. Further, this improve
AAValueSimplifyCallSiteReturned to handle returned arguments.
AAReturnedValues is now much easier and the collected returned
values/instructions are now from the associated function only, making it
much more sane. We also do not have the brittle logic anymore that looks
for unresolved calls. Instead, we use AAValueSimplify to handle
recursion.
Useful code has been split into helper functions, e.g., an Attributor
interface to get a simplified value.
Differential Revision: https://reviews.llvm.org/D103860
Broke check-clang, see https://reviews.llvm.org/D102307#2869065
Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`
As with other Attributor interfaces we often want to know if assumed
information was used to answer a query. This is important if only
known information is allowed or if known information can lead to an
early fixpoint. The users have been adjusted but none of them utilizes
the new information yet.
In the spirit of TRegions [0], this patch analyzes a kernel and tracks
if it can be executed in SPMD-mode. If so, we flip the arguments of
the __kmpc_target_init and deinit call to enable the mode. We also
update the `<kernel>_exec_mode` flag to indicate to the runtime we
changed the mode to SPMD.
The code analysis is done interprocedurally by extending the
AAKernelInfo abstract attribute to track SPMD compatibility as well.
[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11
Differential Revision: https://reviews.llvm.org/D102307
We have to be careful when we replace values to not use a non-dominating
instruction. It makes sense that simplification offers those as
"simplified values" but we can't manifest them in the IR without PHI
nodes. In the future we should consider potentially adding those PHI
nodes.
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.
The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.
[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11
This was in parts extracted from D59319.
Reviewed By: ABataev, JonChesterfield
Differential Revision: https://reviews.llvm.org/D101976
In order to simplify future extensions, e.g., the merge of
AAHeapToShared in to AAHeapToStack, we reorganize AAHeapToStack and the
state we keep for each malloc-like call. The result is also less
confusing as we only track malloc-like calls, not all calls. Further, we
only perform the updates necessary for a malloc-like to argue it can go
to the stack, e.g., we won't check all uses if we moved on to the
"must-be-freed" argument.
This patch also uses Attributor helps to simplify the allocated size,
alignment, and the potentially freed objects.
Overall, this is mostly a reorganization and only the use of the
optimistic helpers should change (=improve) the capabilities a bit.
Differential Revision: https://reviews.llvm.org/D104993
We should use AAValueSimplify for all value simplification, however
there was some leftover logic that predates AAValueSimplify in
AAReturnedValues. This remove the AAReturnedValues part and provides a
replacement by making AAValueSimplifyReturned strong enough to handle
all previously covered cases. Further, this improve
AAValueSimplifyCallSiteReturned to handle returned arguments.
AAReturnedValues is now much easier and the collected returned
values/instructions are now from the associated function only, making it
much more sane. We also do not have the brittle logic anymore that looks
for unresolved calls. Instead, we use AAValueSimplify to handle
recursion.
Useful code has been split into helper functions, e.g., an Attributor
interface to get a simplified value.
Differential Revision: https://reviews.llvm.org/D103860
Instead of performing the isMoreProfitable() operation on
InstructionCost::CostTy the operation is performed on InstructionCost
directly, so that it can handle the case where one of the costs is
Invalid.
This patch also changes the CostTy to be int64_t, so that the type is
wide enough to deal with multiplications with e.g. `unsigned MaxTripCount`.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D105113
This patch makes the operations on InstructionCost saturate, so that when
costs are accumulated they saturate to <max value>.
One of the compelling reasons for wanting to have saturation support
is because in various places, arbitrary values are used to represent
a 'high' cost, but when accumulating the cost of some set of operations
or a loop, overflow is not taken into account, which may lead to unexpected
results. By defining the operations to saturate, we can express the cost
of something 'very expensive' as InstructionCost::getMax().
Reviewed By: kparzysz, dmgreen
Differential Revision: https://reviews.llvm.org/D105108
The original motivation for this was to implement moreElementsVector of shuffles
on AArch64, which resulted in complex sequences of artifacts like unmerge(unmerge(concat...))
which the combiner couldn't handle. It seemed here that the better option,
instead of writing ever-more-complex combines, was to have a way to find
the original "non-artifact" source registers for a given definition, walking
through arbitrary expressions of unmerge/concat/insert. As long as the bits
aren't extended or truncated, this is a pretty simple algorithm that avoids
the need for lots of combines and instead jumps straight to the final result
we want.
I've only used this new technique in 2 places within tryCombineUnmerge, using it
in more general situations resulted in infinite loops in AMDGPU. So for now
it's used when we would otherwise fail to combine and that seems to work.
In order to support looking through G_INSERTs, I also had to add it as an
artifact in isArtifact(), which caused a whole lot of issues in tests. AMDGPU
started infinite looping since full legalization of G_INSERT doensn't seem to
be there. To work around this, I've temporarily added a CLI option to use the
old behaviour so that the MIR tests will still run and terminate.
Other minor changes include no longer making >128b G_MERGE/UNMERGE legal.
We never had isel support for that anyway and it was a remnant of the legacy
legalizer rules. However being legal prevented the combiner from checking if it
was dead and deleting them.
Differential Revision: https://reviews.llvm.org/D104355
Renames CommonOrcRuntimeTypes.h to ExecutorAddress.h and moves ExecutorAddress
into the 'orc' namespace (rather than orc::shared).
Also makes ExecutorAddress a class, adds an ExecutorAddrDiff type and some
arithmetic operations on the pair (subtracting two addresses yields an addrdiff,
adding an addrdiff and an address yields an address).
Replace the clang builtin function and LLVM intrinsic previously used to select
the f64x2.promote_low_f32x4 instruction with custom combines from standard
SelectionDAG nodes. Implement the new combines to share code with the similar
combines for f64x2.convert_low_i32x4_{s,u}. Resolves PR50232.
Differential Revision: https://reviews.llvm.org/D105675
A followup to the feature added in
69da27c7496ea373567ce5121e6fe8613846e7a5 that added the optional "start
file name" to match "start line" - but this didn't work with Split DWARF
because of the need for the decl file number resolution code to refer
back to the skeleton unit to find its .debug_line contribution. So this
patch adds the necessary infrastructure to track the skeleton unit
corresponding to a split full unit for the purpose of this lookup.
This to protect against non-sensical instruction sequences being assembled,
which would either cause asserts/crashes further down, or a Wasm module being output that doesn't validate.
Unlike a validator, this type checker is able to give type-errors as part of the parsing process, which makes the assembler much friendlier to be used by humans writing manual input.
Because the MC system is single pass (instructions aren't even stored in MC format, they are directly output) the type checker has to be single pass as well, which means that from now on .globaltype and .functype decls must come before their use. An extra pass is added to Codegen to collect information for this purpose, since AsmPrinter is normally single pass / streaming as well, and would otherwise generate this information on the fly.
A `-no-type-check` flag was added to llvm-mc (and any other tools that take asm input) that surpresses type errors, as a quick escape hatch for tests that were not intended to be type correct.
This is a first version of the type checker that ignores control flow, i.e. it checks that types are correct along the linear path, but not the branch path. This will still catch most errors. Branch checking could be added in the future.
Differential Revision: https://reviews.llvm.org/D104945
In order to mirror the GetElementPtrInst::indices() API.
Wanted to use this in the IRForTarget code, and was surprised to
find that it didn't exist yet.
Reapply after fixing another occurrence in lldb that was relying
on this in the preceding commit.
-----
GetElementPtrInst::Create() (and IRBuilder methods based on it)
currently accept nullptr as the element type, and will fetch the
element type from the pointer in that case. Remove this fallback,
as it is incompatible with opaque pointers. I've removed a handful
of leftover calls using this behavior as a preliminary step.
Out-of-tree code affected by this change should either pass a proper
type, or can temporarily explicitly call getPointerElementType(),
if the newly added assertion is encountered.
Differential Revision: https://reviews.llvm.org/D105653
Reapply with fixes for clang tests.
-----
This is a simple enum attribute. Test changes are because enum
attributes are sorted before type attributes, so mustprogress is
now in a different position.
This change is intended as initial setup. The plan is to add
more semantic checks later. I plan to update the documentation
as more semantic checks are added (instead of documenting the
details up front). Most of the code closely mirrors that for
the Swift calling convention. Three places are marked as
[FIXME: swiftasynccc]; those will be addressed once the
corresponding convention is introduced in LLVM.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D95561
This reverts commit 84ed3a794b4ffe7bd673f1e5a17d507aa3113d12.
A number of clang tests are also affected by this change. Revert
until I can update them.
While working on the elementtype attribute, I felt that the type
attribute handling in AttrBuilder is overly repetitive. This patch
converts the separate Type* members into an std::array<Type*>, so
that all type attribute kinds can be handled generically.
There's more room for improvement here (especially when it comes to
converting the AttrBuilder to an Attribute), but this seems like a
good starting point.
Differential Revision: https://reviews.llvm.org/D105658
GetElementPtrInst::Create() (and IRBuilder methods based on it)
currently accept nullptr as the element type, and will fetch the
element type from the pointer in that case. Remove this fallback,
as it is incompatible with opaque pointers. I've removed a handful
of leftover calls using this behavior as a preliminary step.
Out-of-tree code affected by this change should either pass a proper
type, or can temporarily explicitly call getPointerElementType(),
if the newly added assertion is encountered.
Differential Revision: https://reviews.llvm.org/D105653
Currently InstructionSimplify.cpp knows how to simplify floating point
instructions that have a NaN operand. It does not know how to handle the
matching constrained FP intrinsic.
This patch teaches it how to simplify so long as the exception handling
is not "fpexcept.strict".
Differential Revision: https://reviews.llvm.org/D103169
Summary:
The bit order of the has_vec and longtbtable bits in the traceback table generated by the XL compiler flipped at some point after v12.1. This is different from the definition is the AIX header debug.h. The change in the XL compiler that caused the deviation from the OS header definition was unintentional. Since both orderings are extant and the XL compiler runtime also expects the ordering defined by the OS, we will correct the output from LLVM to match the defined ordering given by the OS (which is also consistent with the Assembler Language Reference). Mitigation for traceback tables encoded with the wrong ordering is required for either ordering.
Reviewers: XingXue, HubertTong
Differential Revision: https://reviews.llvm.org/D105487
We keep a record of substitutions between debug value numbers post-isel,
however we never actually look them up until the end of compilation. As a
result, there's nothing gained by the collection being a std::map. This
patch downgrades it to being a vector, that's then sorted at the end of
compilation in LiveDebugValues.
Differential Revision: https://reviews.llvm.org/D105029
This reverts commit 5b350183cdabd83573bc760ddf513f3e1d991bcb (and
also "[NFC][ScalarEvolution] Cleanup howManyLessThans.",
009436e9c1fee1290d62bc0faafe0c0295542f56, to make it apply).
See https://reviews.llvm.org/D105216 for discussion on various
miscompilations caused by that commit.
This patch removes the IsPairwiseForm flag from the Reduction Cost TTI
hooks, along with some accompanying code for pattern matching reductions
from trees starting at extract elements. IsPairWise is now assumed to be
false, which was the predominant way that the value was used from both
the Loop and SLP vectorizers. Since the adjustments such as D93860, the
SLP vectorizer has not relied upon this distinction between paiwise and
non-pairwise reductions.
This also removes some code that was detecting reductions trees starting
from extract elements inside the costmodel. This case was
double-counting costs though, adding the individual costs on the
individual instruction _and_ the total cost of the reduction. Removing
it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to
not double count. The cost of reduction intrinsics is still tested
through the various tests in
llvm/test/Analysis/CostModel/X86/reduce-xyz.ll.
Differential Revision: https://reviews.llvm.org/D105484
This reverts commit e2d30846327c7ec5cc9d2a46aa9bcd9c2c4eff93.
MSVC doesn't seem to resolve the intended ambiguity in implicit
conversion contexts correctly: https://godbolt.org/z/ee16aqv4v
See bug for full details, but basically there's an upcoming ambiguity in
the conversion in `StringRef(SomeSmallString)` - either the implicit
conversion operator (SmallString::operator StringRef) could be used, or
the std::string_view range-based ctor (& then `StringRef(std::string_view)`
would be used)
To address this, make such a conversion invalid up-front - most uses are
more tersely written as `SomeSmallString.str()` anyway, or more clearly
written as `StringRef x = y;` rather than `StringRef x(y);` - so if you
hit this in out-of-tree code, please update in one of those ways.
Hopefully I've fixed everything in tree prior to this patch landing.
SelectionDAG's equivalents in ISD::InputArg/OutputArg track the
original argument index. Mips relies on this, and its currently
reinventing its own parallel CallLowering infrastructure which tracks
these indexes on the side. Add this to help move towards deleting the
custom mips handling.
There are two issues with the current implementation of computeBECount:
1. It doesn't account for the possibility that adding "Stride - 1" to
Delta might overflow. For almost all loops, it doesn't, but it's not
actually proven anywhere.
2. It doesn't account for the possibility that Stride is zero. If Delta
is zero, the backedge is never taken; the value of Stride isn't
relevant. To handle this, we have to make sure that the expression
returned by computeBECount evaluates to zero.
To deal with this, add two new checks:
1. Use a variety of tricks to try to prove that the addition doesn't
overflow. If the proof is impossible, use an alternate sequence which
never overflows.
2. Use umax(Stride, 1) to handle the possibility that Stride is zero.
Differential Revision: https://reviews.llvm.org/D105216
As pointed out in post-commit review on rG8e22539067d9, it's
necessary to call getScalarType() to support GEPs with a vector
base. Dropping that call was an oversight on my side.
This adds a new llvm::thread class with the same interface as std::thread
except there is an extra constructor that allows us to set the new thread's
stack size. On Darwin even the default size is boosted to 8MB to match the main
thread.
It also switches all users of the older C-style `llvm_execute_on_thread` API
family over to `llvm::thread` followed by either a `detach` or `join` call and
removes the old API.
Moved definition of DefaultStackSize into the .cpp file to hopefully
fix the build on some (GCC-6?) machines.