This adds a pattern for i64 zext_inreg(i32 extract_vector_elt X),
producing a single UMOVvi16 instruction that is already expected to
clear the top bits. The exact pattern that this matches is
and(anyext(vector_extract X, lane), 0xff), similar to the sext patterns
higher up in the same file.
Differential Revision: https://reviews.llvm.org/D98599
This patch adds additional load/store test cases involving scalars, vectors,
and PC-Rel in preparation for the refactored load and store implementation
introduced in D93370.
Differential Revision: https://reviews.llvm.org/D97391
When tracking defined lanes through phi nodes in the live range
graph each branch of the phi must be handled independently.
Also rewrite the marking algorithm to reduce unnecessary
operations.
Previously a shared set of defined lanes was used which caused
marking to stop prematurely. This was observable in existing lit
tests, but test patterns did not cover this detail.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D98614
Generate a json file containing descriptions of AST classes and their
public accessors which return SourceLocation or SourceRange.
Use the JSON file to generate a C++ API and implementation for accessing
the source locations and method names for accessing them for a given AST
node.
This new API can be used to implement 'srcloc' output in clang-query:
http://ce.steveire.com/z/m_kTIo
The JSON file can also be used to generate bindings for other languages,
such as Python and Javascript:
https://steveire.wordpress.com/2019/04/30/the-future-of-ast-matching
In this first version of this feature, only the accessors for Stmt
classes are generated, not Decls, TypeLocs etc. Those can be added
after this change is reviewed, as this change is mostly about
infrastructure of these code generators.
Also in this version, the platforms/cmake configurations are excluded as
much as possible so that support can be added iteratively. Currently a
break on any platform causes a revert of the entire feature. This way,
the `OR WIN32` can be removed in a future commit and if it breaks the
buildbots, only that commit gets reverted, making the entire process
easier to manage.
Differential Revision: https://reviews.llvm.org/D93164
Normally, this function just doesn't bother about cycles,
and hopes that the caller supplied small-enough depth
so that at worst it will take a potentially large,
but limited amount of time. But that obviously doesn't work
if there is no depth limit.
This reapples 36f1c3db66f7268ea3183bcf0bbf05b3e1c570b4,
but without asserting, just bailout once cycle is detected.
This patch adds fixed-length vector support to the calling convention
when RVV is used to lower fixed-length vectors. The scheme follows the
regular vector calling convention for the argument/return registers, but
uses scalable vector container types as the LocVTs, and converts to/from
the fixed-length vector value types as required.
Fixed-length vector types may be split when the combination of minimum
VLEN and the maximum allowable LMUL is not large enough to fully contain
the vector. In this case the behaviour differs between fixed-length
vectors passed as parameters and as return values:
1. For return values, vectors must be passed entirely via registers or
via the stack.
2. For parameters, unlike scalar values, split vectors continue to be
passed by value, and are split across multiple registers until there are
no remaining registers. Thus vector parameters may be found partly in
registers and partly on the stack.
As with scalable vectors, the first fixed-length mask vector is passed
via v0. Split mask fixed-length vectors are passed first via v0 and then
via the next available vector register: v8,v9,etc.
The handling of vector return values uses all available argument
registers v8-v23 which does not adhere to the calling convention we're
supposedly implementing, but since this issue affects both fixed-length
and scalable-vector values, it was left as-is.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97954
The reason for this is to avoid deep recursion in DFS() which can cause
stack overflow on large CFGs, especially on Windows.
Differential Revision: https://reviews.llvm.org/D98528
For slow-hop targets, see if any single-op hops are duplicating work already done on another (dual-op) hop, which can sometimes occur as isHorizontalBinOp tries to find potential duplicates (but can't merge them itself). If so, reuse the other hop and shuffle the result.
Jeroen Dobbelaere in
https://lists.llvm.org/pipermail/llvm-dev/2021-March/149206.html
is reporting that this function can end up in an endless loop
when called from SROA w/ full restrict patches.
For now, simply ensure that such problems are caught earlier/easier.
Types of fractional LMUL and LMUL=1 are all using VR register class. When
using inline asm, it will use the first type in the register class as the
type for the register. It is not necessary the same as the value type. We
need to use INSERT_SUBVECTOR/EXTRACT_SUBVECToR/BITCAST to make it legal
to put the value in the corresponding register class.
Differential Revision: https://reviews.llvm.org/D97480
I encountered a project that uses llvm that passes "generic" by
default. While I could fix that project, I wouldn't be surprised
if other projects did something similar. So it seems like
a good idea to provide a better error here.
I've also added validation of the 64Bit feature against the
triple so that we can catch a mismatched CPU before failing in
a mysterious way. We can make it pretty far in isel because we
calculate XLenVT from the triple and use that to set up the legal
integer type.
Reviewed By: luismarques, khchen
Differential Revision: https://reviews.llvm.org/D98307
Generate a json file containing descriptions of AST classes and their
public accessors which return SourceLocation or SourceRange.
Use the JSON file to generate a C++ API and implementation for accessing
the source locations and method names for accessing them for a given AST
node.
This new API can be used to implement 'srcloc' output in clang-query:
http://ce.steveire.com/z/m_kTIo
The JSON file can also be used to generate bindings for other languages,
such as Python and Javascript:
https://steveire.wordpress.com/2019/04/30/the-future-of-ast-matching
In this first version of this feature, only the accessors for Stmt
classes are generated, not Decls, TypeLocs etc. Those can be added
after this change is reviewed, as this change is mostly about
infrastructure of these code generators.
Also in this version, the platforms/cmake configurations are excluded as
much as possible so that support can be added iteratively. Currently a
break on any platform causes a revert of the entire feature. This way,
the `OR WIN32` can be removed in a future commit and if it breaks the
buildbots, only that commit gets reverted, making the entire process
easier to manage.
Differential Revision: https://reviews.llvm.org/D93164
Generate a json file containing descriptions of AST classes and their
public accessors which return SourceLocation or SourceRange.
Use the JSON file to generate a C++ API and implementation for accessing
the source locations and method names for accessing them for a given AST
node.
This new API can be used to implement 'srcloc' output in clang-query:
http://ce.steveire.com/z/m_kTIo
The JSON file can also be used to generate bindings for other languages,
such as Python and Javascript:
https://steveire.wordpress.com/2019/04/30/the-future-of-ast-matching
In this first version of this feature, only the accessors for Stmt
classes are generated, not Decls, TypeLocs etc. Those can be added
after this change is reviewed, as this change is mostly about
infrastructure of these code generators.
Also in this version, the platforms/cmake configurations are excluded as
much as possible so that support can be added iteratively. Currently a
break on any platform causes a revert of the entire feature. This way,
the `OR WIN32` can be removed in a future commit and if it breaks the
buildbots, only that commit gets reverted, making the entire process
easier to manage.
Differential Revision: https://reviews.llvm.org/D93164
D98289 was erroneously reporting `invalid range list offset 0x20110`
instead of `invalid range list table index 0`.
Differential Revision: https://reviews.llvm.org/D98589
This makes M68k match other platforms in this regard.
This was done as part of the AsmParser/Disassembler work since the entry
functions of those modules usually reference `getTheXXXTarget()`.
Differential Revision: https://reviews.llvm.org/D98517
read_raw_stdin() was opening a file in binary mode, but Popen
was being told to use text mode (universal_newlines). This is
benign on Python 2 but an error on Python 3.
Differential Revision: https://reviews.llvm.org/D98428
This is an alternative to D98120. Herein, instead of deleting the transformation entirely, we check
that the underlying objects are both the same and therefore this transformation wouldn't incur a
provenance change, if applied.
https://alive2.llvm.org/ce/z/SYF_yv
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D98588
The load/store instruction will be transformed to amx intrinsics
in the pass of AMX type lowering. Prohibiting the pointer cast
make that pass happy.
Differential Revision: https://reviews.llvm.org/D98247
Adjust the Win64 calling convention for Swift to pass self in R13, which
is traditionally a CSR. This makes the behaviour similar to the SysV CC
for Swift as well. This should improve the argument passing on Windows,
although it comes at a high cost of ABI incompatibility. Fortunately in
this case, there is no guarantee of ABI stability, and so we can make
this incompatible change.
Change was reverted in commit 8d20f2c2c66eb486ff23cc3d55a53bd840b36971 because it was causing an infinite loop. 9228f2f32 fixed the root issue in the code structure, this change just reapplies the original change w/adaptation to the new code structure.
This fixes the bug demonstrated by the test case in the commit message of 8d20f2c2 (which was a revert of cf82700). The root issue was that we have two transforms which are inverses of each other. We use one for simple induction variables (where we can use the post-inc form), and the other for everything else. The problem was that the two transforms could disagree about whether something was an induction variable.
The reverted commit made a change to one of the matcher routines which was used for one of the two transforms without updating the other matcher. However, it's worth noting the existing code w/o the reverted change also has cases where the decision could differ between the two paths.
The fix is simply to consolidate the code such that two paths must agree by construction, and to add an assert to catch any potential future re-divergence.
Triggering the infinite loop requires side stepping the SunkAddrs cache. The SunkAddrs cache has the effect of suppressing the iteration in the common case, but there are codepaths through CGP which restart iteration and clear this cache.
Unfortunately, I have not been able to construct a standalone IR test case for this. The original test case is a c++ program which when compiled by clang demonstrates the infinite loop, but all of my attempts at extracting an IR test case runnable through opt/llc have failed to reproduce. (Including capturing the IR at point of the transform itself!) I have no idea what weird state clang is creating here.
I also tried creating a test case by hand, but gave up after about an hour of trying to find the right combination to dance through multiple transforms to create the end result needed to trip the bug.
This fixes a regression from the MemDep-based implementation:
MemDep completely ignores lifetime.start intrinsics that aren't
MustAlias -- this is probably unsound, but it does mean that the
MemDep based implementation successfully eliminated memcpy's from
lifetime.start if the memcpy happens at an offset, rather than
the base address of the alloca.
Add a special case for the case where the lifetime.start spans the
whole alloca (which is pretty much the only kind of lifetime.start
that frontends ever emit), as we don't need to figure out our exact
aliasing relationship in that case, the whole alloca is dead prior
to the call.
If this doesn't cover all practically relevant cases, then it
would be possible to make use of the recently added PartialAlias
clobber offsets to make this more precise.
A 1-bit smulo overflows is both inputs are -1 since the result
should be +1 which can't be represented in a signed 1 bit value.
We can detect this with an AND and a setcc. The multiply result
can also use the same AND.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D97634