This patch clarifies the semantics of nocapture attribute.
A 'Pointer Capture' subsection is added to describe the semantics of pointer capture first.
For the nocapture example with two same pointer arguments, it is consistent with the semantics that Alive2 used to run lit tests.
Reviewed By: nlopes
Differential Revision: https://reviews.llvm.org/D97924
These were misleading, they're more of a "clear" than an "invalidate".
We shouldn't be individually clearing analysis results. Either we clear
all analyses when some IR becomes invalid, or we properly go through
invalidation.
There was only one use of this, which can be simulated with
AM.invalidate(F, PA).
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D100519
This allows for walking all nested locations of a given location, and is generally useful when processing locations.
Differential Revision: https://reviews.llvm.org/D100437
str(n)cat appends a copy of the second argument to the end of the first
argument. To find the end of the first argument, str(n)cat has to read
from it until it finds the terminating 0. So it should not be marked as
writeonly. I think this means the argument should not be marked as
writeonly.
(This is causing a mis-compile with legacy DSE, before it got removed)
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D100601
When we pass a AArch64 Homogeneous Floating-Point
Aggregate (HFA) argument with increased alignment
requirements, for example
struct S {
__attribute__ ((__aligned__(16))) double v[4];
};
Clang uses `[4 x double]` for the parameter, which is passed
on the stack at alignment 8, whereas it should be at
alignment 16, following Rule C.4 in
AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules)
Currently we don't have a way to express in LLVM IR the
alignment requirements of the function arguments. The align
attribute is applicable to pointers only, and only for some
special ways of passing arguments (e..g byval). When
implementing AAPCS32/AAPCS64, clang resorts to dubious hacks
of coercing to types, which naturally have the needed
alignment. We don't have enough types to cover all the
cases, though.
This patch introduces a new use of the stackalign attribute
to control stack slot alignment, when and if an argument is
passed in memory.
The attribute align is left as an optimizer hint - it still
applies to pointer types only and pertains to the content of
the pointer, whereas the alignment of the pointer itself is
determined by the stackalign attribute.
For byval arguments, the stackalign attribute assumes the
role, previously perfomed by align, falling back to align if
stackalign` is absent.
On the clang side, when passing arguments using the "direct"
style (cf. `ABIArgInfo::Kind`), now we can optionally
specify an alignment, which is emitted as the new
`stackalign` attribute.
Patch by Momchil Velikov and Lucas Prates.
Differential Revision: https://reviews.llvm.org/D98794
hasMode was looking up the map once. Then we'd either call get which
would look up again, or we'd insert into the map which requires
walking the map to find the insertion point.
I believe the hasMode was needed because get has a special case
to look for DefaultMode if the mode being asked for doesn't exist.
We don't want that here so we were using hasMode to make sure we
wouldn't hit that case.
Simplify to a regular operator[] access which will default
construct a SetType if the lookup fails.
Move some utility functions which are used within LDS lowering pass to a separate utils
file so that other LDS related passes can make use of them when required.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D100526
This generalizes RVInstIShift/RVInstIShiftW to take the upper
5 or 7 bits of the immediate as an input instead of only bit 30. Then
we can share them.
For RVInstIShift I left a hardcoded 0 at bit 26 where RV128 gets
a 7th bit for the shift amount.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D100424
Avoid visiting repeated instructions for processHeaderPhiOperands as it can cause a scenario of endless loop. Test case is attached and can be ran with `opt -basic-aa -tbaa -loop-unroll-and-jam -allow-unroll-and-jam -unroll-and-jam-count=4`.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D97407
Being lazy with printing the banner seems hard to reason with, we should print it
unconditionally first (it could also lead to duplicate banners if we
have multiple functions in -filter-print-funcs).
The printIR() functions were doing too many things. I separated out the
call from PrintPassInstrumentation since we were essentially doing two
completely separate things in printIR() from different callers.
There were multiple ways to generate the name of some IR. That's all
been moved to getIRName(). The printing of the IR name was also
inconsistent, now it's always "IR Dump on $foo" where "$foo" is the
name. For a function, it's the function name. For a loop, it's what's
printed by Loop::print(), which is more detailed. For an SCC, it's the
list of functions in parentheses. For a module it's "[module]", to
differentiate between a possible SCC with a function called "module".
To preserve D74814, we have to check if we're going to print anything at
all first. This is unfortunate, but I would consider this a special
case that shouldn't be handled in the core logic.
Reviewed By: jamieschmeiser
Differential Revision: https://reviews.llvm.org/D100231
There are four new PowerPC instructions that are introduced in
Power 10. They are hashst, hashchk, hashstp, hashchkp.
These instructions will be used for ROP Protection.
This patch adds the four instructions.
Reviewed By: nemanjai, amyk, #powerpc
Differential Revision: https://reviews.llvm.org/D99375
This patch changed the isLegalUse check to ensure that
LSRInstance::GenerateConstantOffsetsImpl generates an
offset that results in a legal addressing mode and
formula. The check is changed to look similar to the
assert check used for illegal formulas.
Differential Revision: https://reviews.llvm.org/D100383
Change-Id: Iffb9e32d59df96b8f072c00f6c339108159a009a
Value::replaceUsesOutsideBlock doesn't replace debug uses which leads to an
unnecessary reduction in variable location coverage. Fix this, add a unittest for
it, and add a regression test demonstrating the change through instcombine's
replacedSelectWithOperand.
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D99169
This also fixes a CHECK line in @fadd_strict_unroll which ensures the
changes made to fixReduction() to support in-order reductions with
unrolling are being tested correctly.
The `e_flags` contains a mixture of bitfields and regular ones, ensure all of them can be serialized and deserialized.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D100250
Returning in memory is not supported, so fall back to sret.
Also, extend i1 and i16 to i32. Otherwise, they would be passed through
memory.
Differential Revision: https://reviews.llvm.org/D100543
If we are truncating from a i32 source before comparing the result against zero, then see if we can directly compare the source value against zero.
If the upper (truncated) bits are known to be zero then we can compare against that, hopefully increasing the chances of us folding the compare into a EFLAG result of the source's operation.
Fixes PR49028.
Differential Revision: https://reviews.llvm.org/D100491
With this patch vbslq_f32(vnegq_s32(a), b, c) lowers to a BIT instruction.
Co-authored-by: Paul Walker <paul.walker@arm.com>
Differential Revision: https://reviews.llvm.org/D100304
Add an initial version of a helper to determine whether a recipe may
have side-effects.
Reviewed By: a.elovikov
Differential Revision: https://reviews.llvm.org/D100259
There were a few places in widenPHIInstruction where calculations of
offsets were failing to take the runtime calculation of VF into
account for scalable vectors. I've fixed those cases in this patch
as well as adding an assert that we should not be scalarising for
scalable vectors.
Tests are added here:
Transforms/LoopVectorize/AArch64/sve-widen-phi.ll
Differential Revision: https://reviews.llvm.org/D99254
At the moment, getMemoryOpCost returns 1 for all inputs if CostKind is
CodeSize or SizeAndLatency. This fools LoopUnroll into thinking memory
operations on large vectors have a cost of one, even if they will get
expanded to a large number of memory operations in the backend.
This patch updates getMemoryOpCost to return the cost for the type
legalization for both CodeSize and SizeAndLatency. This should more
accurately reflect the number of memory operations required.
I am not sure how latency should properly be included in SizeAndLatency
from the description, but returning the size cost should be clearly more
accurate.
This does not cause any binary changes when building
MultiSource/SPEC2000/SPEC2006 with -O3 -flto for AArch64, likely because
large vector memops are not really formed by code emitted from Clang.
But using the C/C++ matrix extension can easily result in code with very
large vector operations directly from Clang, e.g.
https://clang.godbolt.org/z/6xzxcTGvb
Reviewed By: samparker
Differential Revision: https://reviews.llvm.org/D100291
There are a few places in LoopVectorize.cpp where we have been too
cautious in adding VF.isScalable() asserts and it can be confusing.
It also makes it more difficult to see the genuine places where
work needs doing to improve scalable vectorization support.
This patch changes getMemInstScalarizationCost to return an
invalid cost instead of firing an assert for scalable vectors. Also,
vectorizeInterleaveGroup had multiple asserts all for the same
thing. I have removed all but one assert near the start of the
function, and added a new assert that we aren't dealing with masks
for scalable vectors.
Differential Revision: https://reviews.llvm.org/D99727
On Windows, float arguments are normally passed in float registers
in the calling convention for regular functions. For variable
argument functions, floats are passed in integer registers. This
already was done correctly since many years.
However, the surprising bit was that floats among the fixed arguments
also are supposed to be passed in integer registers, contrary to regular
functions. (This also seems to be the behaviour on ARM though, both
on Windows, but also on e.g. hardfloat linux.)
In the calling convention, don't promote shorter floats to f64, but
convert them to integers of the same length. (Floats passed as part of
the actual variable arguments are promoted to double already on the
C/Clang level; the LLVM vararg calling convention doesn't do any
extra promotion of f32 to f64 - this matches how it works on X86 too.)
Technically, this is an ABI break compared to older LLVM versions,
but it fixes compatibility with the official platform ABI. (In practice,
floats among the fixed arguments in variable argument functions is
a pretty rare construct.)
Differential Revision: https://reviews.llvm.org/D100365
Keep running "not --crash" via the external "not" executable, but
for plain negations, and for cases that use the shell "!" operator,
just skip that argument and invert the return code.
The libcxx tests only use the shell operator "!" for negations,
never the "not" executable, because libcxx tests can be run without
having a fully built llvm tree available providing the "not"
executable.
This allows using the internal shell for libcxx tests.
It should be possible to reland this now that D99938 fixed the
one test failure in clang-tidy that broke when "not" was handled
internally, letting lit/python execute grep.exe directly instead
of via not.exe. (See D99330 and D99406 for more commentery on the
exact issue that broke and other potential ways of fixing it.)
Differential Revision: https://reviews.llvm.org/D98859
If the PHI-of-ops simplifies to an existing value, no real PHI is
created, which means the dependencies between the
PHI-of-ops and its operands is not materialized in IR. At the
moment, we fail to create a real PHI node for the PHI-of-ops,
because the PHI-of-ops root instruction is not re-visited if
one of the PHI-of-ops operands changes. We need to add the
operands as additional users in this case.
Even with this patch, there are still some dependencies
missing. I will continue tackling the outstanding
reporeted crashes in this area.
Fixes PR36501, PR42422, PR42557.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D66924
Now since LDS uses within non-kernel functions are being handled in the
pass - LowerModuleLDS, we *NO* need to *forcefully* inline non-kernel
functions just because they use LDS. Do forceful inlining only when the
pass - LowerModuleLDS is not enabled. It is enabled by default.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D100481
When DIE is extracted manually, the DieArray is empty. When dump is invoked on aforementioned DIE it tries to extract child, even if Dump options say otherwise. Resulting in crash.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D99698