It should be enabled only when the load alignment is at least 8 bytes.
Fixes: SWDEV-256824
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D90404
These expansions were rather inefficient and used more code than
necessary. This change optimizes them to use expansions closer to what
GCC emits. When optimizing for code size the output is the same size,
although LLVM reorders blocks in a non-optimal way. Still, this should
be an improvement, with a code-size reduction of around 0.12% when
building compiler-rt.
Differential Revision: https://reviews.llvm.org/D86418
This patch updates DSE + MemorySSA to use the same check as the legacy
implementation to determine if a location is killed by a free call.
This changes the existing behavior so that a free does not kill
locations before the start of the freed pointer.
This should fix PR48036.
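For illustration, a rough sketch of such a check (simplified, with an
illustrative name and signature; not the exact DSE code):

  #include "llvm/Analysis/MemoryLocation.h"
  #include "llvm/Analysis/ValueTracking.h"
  #include "llvm/IR/DataLayout.h"

  using namespace llvm;

  // A location may be treated as killed by free(FreedPtr) only if it has
  // the same base and does not start before the freed pointer.
  static bool maybeKilledByFree(const MemoryLocation &DefLoc,
                                const Value *FreedPtr, const DataLayout &DL) {
    int64_t DefOffset = 0, FreeOffset = 0;
    const Value *DefBase =
        GetPointerBaseWithConstantOffset(DefLoc.Ptr, DefOffset, DL);
    const Value *FreeBase =
        GetPointerBaseWithConstantOffset(FreedPtr, FreeOffset, DL);
    return DefBase == FreeBase && DefOffset >= FreeOffset;
  }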
This reverts the revert commit a1b53db32418cb6ed6f5b2054d15a22b5aa3aeb9.
This patch includes a fix for a reported issue caused by
matchSelectPattern returning UMIN for selects of pointers in
some cases where it looked through connected casts.
For now, ensure integer intrinsics are only returned for selects of
ints or int vectors.
If the element size is unknown because the type is a pointer, a
comparison against 0 will cause an assert. Make sure the element size
is large enough before comparing, and for the moment just return the
scalar cost.
This is the last of the rotate->funnel shift InstCombine generalizations for PR46896.
We still have foldGuardedRotateToFunnelShift to deal with in AggressiveInstCombine.
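For reference, the kind of rotate idiom this targets (illustrative only;
the masked shift amounts keep it well defined for a shift of 0):

  #include <cstdint>

  // InstCombine can fold this pattern to a funnel shift, i.e.
  // llvm.fshl(X, X, N).
  uint32_t rotl32(uint32_t X, uint32_t N) {
    return (X << (N & 31)) | (X >> (-N & 31));
  }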
Differential Revision: https://reviews.llvm.org/D90382
This is likely to be a regression introduced by my last refactoring of the
LSUnit (commit 5578ec32f9c4f). Before this patch, the
"CurrentStoreBarrierGroupID" index was not correctly reset on store barrier
executions. This was leading to unexpected crashes like the one reported as
PR48024.
Previously, !noalias and !alias.scope metadata on the call site was
applied as part of CloneAliasScopeMetadata(), which short-circuits
if the callee does not use any noalias metadata itself. However,
these two things have no relation to each other.
Consistently apply !noalias and !alias.scope metadata by integrating
this into an existing function that handled !llvm.access.group and
!llvm.mem.parallel_loop_access metadata. The handling for all of
these metadata kinds is essentially the same.
This patch mainly makes the following changes:
1. Support AVX-VNNI instructions;
2. Introduce an ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpwssd/vpdpwssds instructions only use VEX encoding when the user explicitly adds the {vex} prefix.
Differential Revision: https://reviews.llvm.org/D89105
Also added a general wasm64 DWARF test.
Also added asserts for unsupported reloc combinations that triggered this bug.
Differential Revision: https://reviews.llvm.org/D90503
This reverts commit 19225704890632cd2552f41ada41600a20db1371.
This appears to cause a crash in the following example:
a, b, c;
l() {
  int e = a, f = l, g, h, i, j;
  float *d = c, *k = b;
  for (;;)
    for (; g < f; g++) {
      k[h] = d[i];
      k[h - 1] = d[j];
      h += e << 1;
      i += e;
    }
}
clang -cc1 -triple i386-unknown-linux-gnu -emit-obj -target-cpu pentium-m -O1 -vectorize-loops -vectorize-slp reduced.c
llvm::Type *llvm::Type::getWithNewBitWidth(unsigned int) const: Assertion `isIntOrIntVectorTy() && "Original type expected to be a vector of integers or a scalar integer."' failed.
Add support for match-all tags and GOT-free runtime calls, which
are both required for the kernel to be able to support outlined
checks. This requires extending the access info to let the backend
know when to enable these features. To make the code easier to maintain,
introduce an enum with the bit field positions for the access info.
Allow outlined checks to be enabled with -mllvm
-hwasan-inline-all-checks=0. Kernels that contain runtime support for
outlined checks may pass this flag. Kernels lacking runtime support
will continue to link because they do not pass the flag. Old versions
of LLVM will ignore the flag and continue to use inline checks.
With a separate kernel patch [1] I measured the code size of defconfig
+ tag-based KASAN, as well as boot time (i.e. time to init launch)
on a DragonBoard 845c with an Android arm64 GKI kernel. The results
are below:
          code size    boot time
before    92824064     6.18s
after     38822400     6.65s
[1] https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76
Depends on D90425
Differential Revision: https://reviews.llvm.org/D90426
Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass.
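For context, a VECREDUCE_SEQ_FADD corresponds to an in-order
floating-point reduction like the following (illustrative source-level
view; the function name is made up):

  float seq_fadd(const float *V, int N, float Start) {
    float Acc = Start;
    for (int I = 0; I < N; ++I)
      Acc += V[I]; // additions must stay strictly in order
    return Acc;
  }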
Differential Revision: https://reviews.llvm.org/D90247
This is a workaround for poor heuristics in the backend where we can
end up materializing the constant multiple times. This is particularly
bad when using outlined checks because we materialize it for every call
(because the backend considers it trivial to materialize).
As a result, the field containing the shadow base value will always
be set, so simplify the code to take that into account.
Differential Revision: https://reviews.llvm.org/D90425
In a kernel (or in general in environments where bit 55 of the address
is set) the shadow base needs to point to the end of the shadow region,
not the beginning. Bit 55 needs to be sign extended into bits 52-63
of the shadow base offset, otherwise we end up loading from an invalid
address. We can do this by using SBFX instead of UBFX.
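A scalar model of the difference (a sketch assuming the usual 52-bit
field starting at bit 4 of the address; not the actual codegen):

  #include <cstdint>

  uint64_t shadowOffsetUBFX(uint64_t Addr) {
    return (Addr << 8) >> 12;                      // zero-extends bit 55
  }
  uint64_t shadowOffsetSBFX(uint64_t Addr) {
    return (uint64_t)((int64_t)(Addr << 8) >> 12); // sign-extends bit 55
                                                   // into bits 52-63
  }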
Using SBFX should have no effect in the userspace case where bit 55
of the address is clear so we do so unconditionally. I don't think
we need an ABI version bump for this (but one will come anyway when
we switch to x20 for the shadow base register).
Differential Revision: https://reviews.llvm.org/D90424
From a code size perspective it turns out to be better to use a
callee-saved register to pass the shadow base. For non-leaf functions
it avoids the need to reload the shadow base into x9 after each
function call, at the cost of an additional stack slot to save the
caller's x20. But with x9 there is also a stack size cost, either
as a result of copying x9 to a callee-saved register across calls or
by spilling it to stack, so for the non-leaf functions the change to
stack usage is largely neutral.
It is also code size (and stack size) neutral for many leaf functions.
Although they now need to save/restore x20, this can typically be
combined via LDP/STP into the x30 save/restore. In the case where
the function needs callee-saved registers or stack spills, we end up
needing, on average, 8 more bytes of stack and 1 more instruction,
but given the improvements to other functions this seems like the
right tradeoff.
Unfortunately we cannot change the register for the v1 (non short
granules) check because the runtime assumes that the shadow base
register is stored in x9, so the v1 check still uses x9.
Aside from that there is no change to the ABI because the choice
of shadow base register is a contract between the caller and the
outlined check function, both of which are compiler generated. We do
need to rename the v2 check functions though because the functions
are deduplicated based on their names, not on their contents, and we
need to make sure that when object files from old and new compilers
are linked together we don't end up with a function that uses x9
calling an outlined check that uses x20 or vice versa.
With this change code size of /system/lib64/*.so in an Android build
with HWASan goes from 200066976 bytes to 194085912 bytes, or a 3%
decrease.
Differential Revision: https://reviews.llvm.org/D90422
If more than one prefix is provided - e.g. --check-prefixes=CHECK,FOO - we
don't report if (say) FOO is never used. This may lead to a gap in our
test coverage.
This patch introduces a new option, --allow-unused-prefixes. It
currently is set to true, keeping today's behavior. After we explicitly
set it in tests where this behavior was actually intentional, we will
switch it to false by default.
Differential Revision: https://reviews.llvm.org/D90281
This option was hardcoded to 32. Expose it as a command-line option,
since we have seen some cases downstream where increasing this limit
allows us to disprove reachability.
Reviewed-By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90487
Just return the new node, which is the standard practice.
I also noticed what appeared to be an unnecessary attempt at
creating an ANY_EXTEND where the type should already be correct.
I replaced it with an assert to verify the type.
Differential Revision: https://reviews.llvm.org/D90444
On Windows, after commit 881ba104656c40098d4bc90c52613c08136f0fe1, tools
using TempFile would error with "bad file descriptor" when writing the
file on a network drive. It appears that setting the delete-on-close bit via
SetFileInformationByHandle/FileDispositionInfo prevented it from
accessing the file on network drives, and although using
FILE_DISPOSITION_INFO_EX seems to work, it causes other troubles.
Differential Revision: https://reviews.llvm.org/D81803
Make DebugLogging a member variable so that users of PassBuilder don't
need to pass it around so much.
Move the call to TargetMachine::registerPassBuilderCallbacks() into
PassBuilder so users don't need to remember to call it.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D90437
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
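For illustration, a sketch of building such metadata with 64-bit
operands (not the exact updateProfWeight code; the helper name is made
up):

  #include "llvm/IR/Constants.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Metadata.h"
  #include "llvm/IR/Type.h"

  using namespace llvm;

  // Builds !{!"branch_weights", i64 TrueW, i64 FalseW}.
  static MDNode *makeBranchWeights(LLVMContext &Ctx, uint64_t TrueW,
                                   uint64_t FalseW) {
    Type *I64 = Type::getInt64Ty(Ctx);
    Metadata *Ops[] = {
        MDString::get(Ctx, "branch_weights"),
        ConstantAsMetadata::get(ConstantInt::get(I64, TrueW)),
        ConstantAsMetadata::get(ConstantInt::get(I64, FalseW))};
    return MDNode::get(Ctx, Ops);
  }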
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
This patch modifies two for loops to use the range-based syntax.
Since they are equivalent, this patch is tagged NFC.
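For illustration only (not the loops touched by this patch; the names
are made up), the same iteration written both ways:

  #include <cstddef>
  #include <vector>

  int sumIndexed(const std::vector<int> &Xs) {
    int S = 0;
    for (std::size_t I = 0, E = Xs.size(); I != E; ++I)
      S += Xs[I]; // index-based form
    return S;
  }

  int sumRanged(const std::vector<int> &Xs) {
    int S = 0;
    for (int X : Xs)
      S += X; // equivalent range-based form
    return S;
  }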
Differential Revision: https://reviews.llvm.org/D90069
I'm assuming the standard-size integer instructions for this end up as something like:
mulq %rsi
seto %al
And the 'mul' generally has reciprocal throughput of 1 on typical implementations
(higher latency, but that's not handled here).
The default costs may end up much higher than that, and that's what we see in the test diffs.
Vector types are left as a 'TODO'.
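For reference, the kind of scalar code being priced (illustrative; the
function name is made up): a umul-with-overflow such as the one produced
by __builtin_mul_overflow, which typically lowers to the mul/seto pair
above.

  bool mulOverflows(unsigned long long A, unsigned long long B,
                    unsigned long long *Out) {
    return __builtin_mul_overflow(A, B, Out); // mulq + seto on x86-64
  }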
Differential Revision: https://reviews.llvm.org/D90431
- As convergent intrinsics/calls can only be moved to
  control-equivalent blocks, or, more precisely, blocks under the same
  divergent branch, PRE needs to skip them.
Differential Revision: https://reviews.llvm.org/D90391
Currently isOverwrite returns OW_MaybePartial even for accesses known
not to overlap. This is not a big problem for the legacy implementation
(since isPartialOverwrite follows isOverwrite and clarifies the result).
In contrast, the MemorySSA-based version does a lot of work only to later
find out that the accesses don't overlap. Besides the negative impact on
compile time, we quickly reach MemorySSAPartialStoreLimit and miss
optimization opportunities.
Note: In fact, I think it would be a cleaner implementation if
isOverwrite returned a fully clarified result in the first place,
without the need to call isPartialOverwrite. This can be done as a
follow-up. What do you think?
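An example of the kind of accesses this affects (illustrative; assumes
4-byte ints):

  void storePair(int *P) {
    P[0] = 1; // earlier store, bytes [0, 4)
    P[1] = 2; // later store, bytes [4, 8): clearly disjoint, yet
              // currently classified as OW_MaybePartial
  }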
Reviewed By: fhahn, asbirlea
Differential Revision: https://reviews.llvm.org/D90371
Split up the monolithic VETargetLowering ctor into three initialization phases:
1. initRegisterClasses()
2. initSPUActions()
3. // TODO initVPUActions()
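Rough shape of the split constructor (a sketch only; the member
initializer list is an assumption, not the exact VE code):

  VETargetLowering::VETargetLowering(const TargetMachine &TM,
                                     const VESubtarget &STI)
      : TargetLowering(TM), Subtarget(&STI) {
    initRegisterClasses(); // 1. register class setup
    initSPUActions();      // 2. scalar (SPU) legalization actions
    // initVPUActions();   // 3. TODO: vector (VPU) legalization actions
  }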
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D90463
As per the comment in VPRecipeBase, clients should not rely on
getVPRecipeID, as it may change in the future. It should only be used in
classof implementations. Use isa instead in getFirstNonPhi.