If the result of an atomic operation is not used then it can be more
efficient to build a reduction across all lanes instead of a scan. Do
this for GFX10, where the permlanex16 instruction makes it viable. For
wave64 this saves a couple of dpp operations. For wave32 it saves one
readlane (which are generally bad for performance) and one dpp
operation.
Differential Revision: https://reviews.llvm.org/D98953
This patch adds a few test cases where currently NoAlias is returned,
but the pointers can alias if the multiply overflows while computing
a GEP index value.
This was originally just an XFAIL test, but I modified it
to check output. To make that bot-friendly, I'm moving it
to the x86 dir since it specified an x86 target.
Add the constraint when destination EEW not equals the source EEW for
correctness.
The RVV spec has three register overlap rules and I implement the first
stricter constraint because the others are difficult to enforce.
Reviewed By: frasercrmck, craig.topper
Differential Revision: https://reviews.llvm.org/D98920
This reverts commit 3c8473ba534daa3 and includes test diffs to
maintain testing status.
There's at least 1 place that was not updated with 7202f47508 ,
so we can crash mismatching select and intrinsics as shown in
PR49730.
The tests, test/Transforms/InstCombine/AArch64/sve-*,
have been shown to not be AArch64 specific. These tests
have been renamed and moved to reflect this.
Differential Revision: https://reviews.llvm.org/D99253
The implementation of `llvm_struct_name` before this diff calls
`caml_copy_string`, which allocates, while the `result` local variable
points to a block allocated by `caml_alloc_small` that has not yet
been initialized. If the allocation in `caml_copy_string` triggers a
garbage collection, then the GC root `result` contains a pointer to
uninitialized data, which may crash the GC or lead to a memory
corruption.
This diff fixes this by allocating and initializing the string first
and then allocating and initializing the option, thereby leaving no
dangling pointers when allocations are made.
The conversion from a C string to an OCaml string option is refactored
into a function, `cstr_to_string_option`. This function is also used
to simplify the definitions of `llvm_get_mdstring` and
`llvm_string_of_const`.
Differential Revision: https://reviews.llvm.org/D99393
There are a number of compilation warnings regarding disregarding
const qualifiers, and casting between pointers to integer types with
different sign.
The incompatible sign warnings are due to treating the result of
`LLVMGetModuleIdentifier` as `const unsigned char *`, but it is
declared as `const char *`.
The dropped const qualifiers are due to the code pattern
`memcpy(String_val(_),_,_)` which ought to be (following the
implementation of the OCaml runtime)
`memcpy((char *)String_val(_),_,_)`. The issue is that `String_val` is
usually used to get the value of an immutable string. But in the
context of the `memcpy` calls, the string is in the process of being
initialized, so is not yet constant.
Differential Revision: https://reviews.llvm.org/D99392
This diff uses ptr_to_option to convert a nullable C pointer to an
OCaml option instead of the redundant implementation in
llvm_global_initializer.
Differential Revision: https://reviews.llvm.org/D99391
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.
unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
return N * TTI.getArithmeticInstrCost(...
After some investigation it seems that there are only these cases that occur
in practice:
1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast with scalar uses, or b) this is an update to an induction variable
that remains scalar.
I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. For all other cases I have added an assert that none of the
users needs scalarising, which didn't fire in any unit tests.
Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.
Differential Revision: https://reviews.llvm.org/D98512
This patch should fix the errors shown on the Windows bots by turning off text mode. I plan to investigate a better fix but this should unblock the buildbots for now.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D99363
This patch enables the cost-benefit-analysis-based inliner by default
if we have instrumentation profile.
- SPEC CPU 2017 shows a 0.4% improvement.
- An internal large benchmark shows a 0.9% reduction in the cycle
count along with 14.6% reduction in the number of call instructions
executed.
Differential Revision: https://reviews.llvm.org/D98213
When prioritize call site to consider for inlining in sample loader, use number of samples as a first tier breaker before using name/guid comparison. This would favor smaller functions when hotness is the same (from the same block). We could try to retrieve accurate function size if this turns out to be more important.
Differential Revision: https://reviews.llvm.org/D99370
JITLink now requires section names to be unique. In MachO section names are only
guaranteed to be unique within their containing segment (e.g. a '__const' section
in the '__DATA' segment does not clash with a '__const' section in the '__TEXT'
segment), so we need to use the fully qualified <segment>,<section> section
names (e.g. '__DATA,__const' or '__TEXT,__const') when constructing
jitlink::Sections for MachO objects.
Darwin platforms for both AArch64 and X86 can provide optimized `bzero()`
routines. In this case, it may be preferable to use `bzero` in place of a
memset of 0.
This adds a G_BZERO generic opcode, similar to G_MEMSET et al. This opcode can
be generated by platforms which may want to use bzero.
To emit the G_BZERO, this adds a pre-legalize combine for AArch64. The
conditions for this are largely a port of the bzero case in
`AArch64SelectionDAGInfo::EmitTargetCodeForMemset`.
The only difference in comparison to the SelectionDAG code is that, when
compiling for minsize, this will fire for all memsets of 0. The original code
notes that it's not beneficial to do this for small memsets; however, using
bzero here will save a mov from wzr. For minsize, I think that it's preferable
to prioritise omitting the mov.
This also fixes a bug in the libcall legalization code which would delete
instructions which could not be legalized. It also adds a check to make sure
that we actually get a libcall name.
Code size improvements (Darwin):
- CTMark -Os: -0.0% geomean (-0.1% on pairlocalalign)
- CTMark -Oz: -0.2% geomean (-0.5% on bullet)
Differential Revision: https://reviews.llvm.org/D99358
getPointersDiff would previously round down the difference between two
pointers to a multiple of the element size of the pointee, which could
result in a pointer value being decreased a little.
Alexey Bataev has graciously agreed to add a testcase for this;
submitting the bugfix now to unblock.
The `CHECK` prefix was dropped in e0bf2349303f. This lead to all CHECK
lines having no effect.
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D99316
This permits extern function (BTF_KIND_FUNC) be added
to BTF_KIND_DATASEC if a section name is specified.
For example,
-bash-4.4$ cat t.c
void foo(int) __attribute__((section(".kernel.funcs")));
int test(void) {
foo(5);
return 0;
}
The extern function foo (BTF_KIND_FUNC) will be put into
BTF_KIND_DATASEC with name ".kernel.funcs".
This will help to differentiate two kinds of external functions,
functions in kernel and functions defined in other bpf programs.
Differential Revision: https://reviews.llvm.org/D93563
loop:
%cmp.0 = phi i32 [ 3, %entry ], [ %inc, %loop ]
%pos.0 = phi i32 [ 1, %entry ], [ %cmp.0, %loop ]
...
%inc = add i32 %cmp.0, 1
br label %loop
On above example, %pos.0 uses previous iteration's %cmp.0 with backedge
according to PHI's instruction's defintion. If the %inc is not same among
iterations, we can say the two PHIs are not same.
Differential Revision: https://reviews.llvm.org/D98422
In DeadArgumentElimination pass, if a function's argument is never used, corresponding caller's parameter can be changed to undef. If the param/arg has attribute noundef or other related attributes, LLVM LangRef(https://llvm.org/docs/LangRef.html#parameter-attributes) says its behavior is undefined. SimplifyCFG(D97244) takes advantage of this behavior and does bad transformation on valid code.
To avoid this undefined behavior when change caller's parameter to undef, this patch removes noundef attribute and other attributes imply noundef on param/arg.
Differential Revision: https://reviews.llvm.org/D98899
As noted in the LangRef, these are semantically readnone projections from the result value of the associated statepoint. However, it turned out we had a few latent bugs being covered up by the fact we were only marking them readonly (see PR49607 for context).
As of this change, all known issues are resolved. This is a deliberately minimal patch to make it easy to test downstream and revert with minimal change if that turns out to be necessary.
Differential Revision: https://reviews.llvm.org/D98729
All of these are scoped allocations which remain dereferenceable during the lifetime of the callee.
Differential Revision: https://reviews.llvm.org/D99310
getMinRVVVectorSizeInBits() asserts if the V extension isn't
enabled. So check that gather/scatter is legal first since it
already contains a check for V extension being enabled. It
also already checks getMinRVVVectorSizeInBits for fixed length
vectors so we don't need a check in getGatherScatterOpCost.
Instructions that have more uops than the processor's IssueWidth are
issued in multiple cycles.
The patch fixes PR49712.
Differential Revision: https://reviews.llvm.org/D99339
This *only* changes the cases where we *really* don't care
about the iteration order of the underlying contained,
namely when we will use the values from it to form DTU updates.
Rather than special-casing assume in BasicAA getModRefBehavior(),
do this one level higher, in the attribute handling of CallBase.
For assumes with operand bundles, the inaccessiblememonly attribute
applies regardless of operand bundles.
The function utilizes Windows' SearchPathW function, which as I found out today, may also return directories. After looking at the Unix implementation of the file I found that it contains a check whether the found path is also executable. While fixing the Windows implementation, I also learned that sys::fs::access returns successfully when querying whether directories are executable, which the Unix version does not.
This patch makes both of these functions equivalent to their Unix implementation and insures that any path returned by sys::findProgramByName on Windows may only be executable, just like the Unix implementation.
The equivalent additions I have made to the Windows implementation, in the Unix implementation are here:
sys::findProgramByName: 39ecfe6143/llvm/lib/Support/Unix/Program.inc (L90)
sys::fs::access: c2a84771bb/llvm/lib/Support/Unix/Path.inc (L608)
I encountered this issue when running the LLVM testsuite. Commands of the form not test ... would fail to correctly execute test.exe, which is part of GnuWin32, as it actually tried to execute a folder called test, which happened to be in a directory on my PATH.
Differential Revision: https://reviews.llvm.org/D99357
If a WhileLoopStartLR is reverted due to calls in the preheader, we may
still be able to instead create a DoLoopStart, preserving the low
overhead loop. This adds code for that, only reverting the
WhileLoopStartR to a Br/Cmp, leaving the rest of the low overhead loop
in place.
Differential Revision: https://reviews.llvm.org/D98413
We look for this pattern frequently in isel patterns so its a
good idea to try to preserve it.
This also let's us remove our special isel handling for srliw
and use a direct pattern match of (srl (and X, 0xffffffff), C)
since no bits will be removed from the and mask.
Differential Revision: https://reviews.llvm.org/D99042