When prioritize call site to consider for inlining in sample loader, use number of samples as a first tier breaker before using name/guid comparison. This would favor smaller functions when hotness is the same (from the same block). We could try to retrieve accurate function size if this turns out to be more important.
Differential Revision: https://reviews.llvm.org/D99370
JITLink now requires section names to be unique. In MachO section names are only
guaranteed to be unique within their containing segment (e.g. a '__const' section
in the '__DATA' segment does not clash with a '__const' section in the '__TEXT'
segment), so we need to use the fully qualified <segment>,<section> section
names (e.g. '__DATA,__const' or '__TEXT,__const') when constructing
jitlink::Sections for MachO objects.
Darwin platforms for both AArch64 and X86 can provide optimized `bzero()`
routines. In this case, it may be preferable to use `bzero` in place of a
memset of 0.
This adds a G_BZERO generic opcode, similar to G_MEMSET et al. This opcode can
be generated by platforms which may want to use bzero.
To emit the G_BZERO, this adds a pre-legalize combine for AArch64. The
conditions for this are largely a port of the bzero case in
`AArch64SelectionDAGInfo::EmitTargetCodeForMemset`.
The only difference in comparison to the SelectionDAG code is that, when
compiling for minsize, this will fire for all memsets of 0. The original code
notes that it's not beneficial to do this for small memsets; however, using
bzero here will save a mov from wzr. For minsize, I think that it's preferable
to prioritise omitting the mov.
This also fixes a bug in the libcall legalization code which would delete
instructions which could not be legalized. It also adds a check to make sure
that we actually get a libcall name.
Code size improvements (Darwin):
- CTMark -Os: -0.0% geomean (-0.1% on pairlocalalign)
- CTMark -Oz: -0.2% geomean (-0.5% on bullet)
Differential Revision: https://reviews.llvm.org/D99358
getPointersDiff would previously round down the difference between two
pointers to a multiple of the element size of the pointee, which could
result in a pointer value being decreased a little.
Alexey Bataev has graciously agreed to add a testcase for this;
submitting the bugfix now to unblock.
The `CHECK` prefix was dropped in e0bf2349303f. This lead to all CHECK
lines having no effect.
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D99316
This permits extern function (BTF_KIND_FUNC) be added
to BTF_KIND_DATASEC if a section name is specified.
For example,
-bash-4.4$ cat t.c
void foo(int) __attribute__((section(".kernel.funcs")));
int test(void) {
foo(5);
return 0;
}
The extern function foo (BTF_KIND_FUNC) will be put into
BTF_KIND_DATASEC with name ".kernel.funcs".
This will help to differentiate two kinds of external functions,
functions in kernel and functions defined in other bpf programs.
Differential Revision: https://reviews.llvm.org/D93563
loop:
%cmp.0 = phi i32 [ 3, %entry ], [ %inc, %loop ]
%pos.0 = phi i32 [ 1, %entry ], [ %cmp.0, %loop ]
...
%inc = add i32 %cmp.0, 1
br label %loop
On above example, %pos.0 uses previous iteration's %cmp.0 with backedge
according to PHI's instruction's defintion. If the %inc is not same among
iterations, we can say the two PHIs are not same.
Differential Revision: https://reviews.llvm.org/D98422
In DeadArgumentElimination pass, if a function's argument is never used, corresponding caller's parameter can be changed to undef. If the param/arg has attribute noundef or other related attributes, LLVM LangRef(https://llvm.org/docs/LangRef.html#parameter-attributes) says its behavior is undefined. SimplifyCFG(D97244) takes advantage of this behavior and does bad transformation on valid code.
To avoid this undefined behavior when change caller's parameter to undef, this patch removes noundef attribute and other attributes imply noundef on param/arg.
Differential Revision: https://reviews.llvm.org/D98899
As noted in the LangRef, these are semantically readnone projections from the result value of the associated statepoint. However, it turned out we had a few latent bugs being covered up by the fact we were only marking them readonly (see PR49607 for context).
As of this change, all known issues are resolved. This is a deliberately minimal patch to make it easy to test downstream and revert with minimal change if that turns out to be necessary.
Differential Revision: https://reviews.llvm.org/D98729
All of these are scoped allocations which remain dereferenceable during the lifetime of the callee.
Differential Revision: https://reviews.llvm.org/D99310
getMinRVVVectorSizeInBits() asserts if the V extension isn't
enabled. So check that gather/scatter is legal first since it
already contains a check for V extension being enabled. It
also already checks getMinRVVVectorSizeInBits for fixed length
vectors so we don't need a check in getGatherScatterOpCost.
Instructions that have more uops than the processor's IssueWidth are
issued in multiple cycles.
The patch fixes PR49712.
Differential Revision: https://reviews.llvm.org/D99339
This *only* changes the cases where we *really* don't care
about the iteration order of the underlying contained,
namely when we will use the values from it to form DTU updates.
Rather than special-casing assume in BasicAA getModRefBehavior(),
do this one level higher, in the attribute handling of CallBase.
For assumes with operand bundles, the inaccessiblememonly attribute
applies regardless of operand bundles.
The function utilizes Windows' SearchPathW function, which as I found out today, may also return directories. After looking at the Unix implementation of the file I found that it contains a check whether the found path is also executable. While fixing the Windows implementation, I also learned that sys::fs::access returns successfully when querying whether directories are executable, which the Unix version does not.
This patch makes both of these functions equivalent to their Unix implementation and insures that any path returned by sys::findProgramByName on Windows may only be executable, just like the Unix implementation.
The equivalent additions I have made to the Windows implementation, in the Unix implementation are here:
sys::findProgramByName: 39ecfe6143/llvm/lib/Support/Unix/Program.inc (L90)
sys::fs::access: c2a84771bb/llvm/lib/Support/Unix/Path.inc (L608)
I encountered this issue when running the LLVM testsuite. Commands of the form not test ... would fail to correctly execute test.exe, which is part of GnuWin32, as it actually tried to execute a folder called test, which happened to be in a directory on my PATH.
Differential Revision: https://reviews.llvm.org/D99357
If a WhileLoopStartLR is reverted due to calls in the preheader, we may
still be able to instead create a DoLoopStart, preserving the low
overhead loop. This adds code for that, only reverting the
WhileLoopStartR to a Br/Cmp, leaving the rest of the low overhead loop
in place.
Differential Revision: https://reviews.llvm.org/D98413
We look for this pattern frequently in isel patterns so its a
good idea to try to preserve it.
This also let's us remove our special isel handling for srliw
and use a direct pattern match of (srl (and X, 0xffffffff), C)
since no bits will be removed from the and mask.
Differential Revision: https://reviews.llvm.org/D99042
The SCEV commit b46c085d2b6d1 [NFCI] SCEVExpander:
emit intrinsics for integral {u,s}{min,max} SCEV expressions
seems to reveal a new crash in SLPVectorizer.
SLP crashes expecting a SelectInst as an externally used value
but umin() call is found.
The patch relaxes the assumption to make the IR flag propagation safe.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D99328
Summary:
The colour characters currently added to the output of -print-changed=diff
and -print-changed=diff-quiet cause difficulties when capturing the output
and examining it in an editor. Change the function to not have the colour
characters and add 2 new choices (-print-changed=cdiff and
-print-changed=cdiff-quiet) to retain the existing functionality of adding
the colour characters.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks) yrouban (Yevgeny Rouban)
Differential Revision: https://reviews.llvm.org/D97398
D95598 added a cost model for broadcast shuffle, which should enable loops
such as the following to vectorize, where the load of b[42] is invariant
and can be done using a scalar load + splat:
for (int i=0; i<n; ++i)
a[i] = b[i] + b[42];
This patch adds tests to verify that we can vectorize such loops.
Reviewed By: joechrisellis
Differential Revision: https://reviews.llvm.org/D98506
Userspace page aliasing allows us to use middle pointer bits for tags
without untagging them before syscalls or accesses. This should enable
easier experimentation with HWASan on x86_64 platforms.
Currently stack, global, and secondary heap tagging are unsupported.
Only primary heap allocations get tagged.
Note that aliasing mode will not work properly in the presence of
fork(), since heap memory will be shared between the parent and child
processes. This mode is non-ideal; we expect Intel LAM to enable full
HWASan support on x86_64 in the future.
Reviewed By: vitalybuka, eugenis
Differential Revision: https://reviews.llvm.org/D98875
In future patches I will be setting the IsText parameter frequently so I will refactor the args to be in the following order. I have removed the FileSize parameter because it is never used.
```
static ErrorOr<std::unique_ptr<MemoryBuffer>>
getFile(const Twine &Filename, bool IsText = false,
bool RequiresNullTerminator = true, bool IsVolatile = false);
static ErrorOr<std::unique_ptr<MemoryBuffer>>
getFileOrSTDIN(const Twine &Filename, bool IsText = false,
bool RequiresNullTerminator = true);
static ErrorOr<std::unique_ptr<MB>>
getFileAux(const Twine &Filename, uint64_t MapSize, uint64_t Offset,
bool IsText, bool RequiresNullTerminator, bool IsVolatile);
static ErrorOr<std::unique_ptr<WritableMemoryBuffer>>
getFile(const Twine &Filename, bool IsVolatile = false);
```
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D99182
We do not need to scan further if the upper end or lower end of the
basic block is reached already and the instruction is not found. It
means that the instruction is definitely in the lower part of basic
block or in the upper block relatively.
This should improve compile time for the very big basic blocks.
Differential Revision: https://reviews.llvm.org/D99266
In order to test the preservation of the original Debug Info metadata
in your projects, a front end option could be very useful, since users
usually report that a concrete entity (e.g. variable x, or function fn2())
is missing debug info. The [0] is an example of running the utility
on GDB Project.
This depends on: D82546 and D82545.
Differential Revision: https://reviews.llvm.org/D82547
This patch adds a small optimization for vector shuffle lowering,
detecting shuffles which can be re-expressed as vector selects.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99270
Unswitching a loop on a non-trivial divergent branch is expensive
since it serializes the execution of both version of the
loop. But identifying a divergent branch needs divergence analysis,
which is a function level analysis.
The legacy pass manager handles this dependency by isolating such a
loop transform and rerunning the required function analyses. This
functionality is currently missing in the new pass manager, and there
is no safe way for the SimpleLoopUnswitch pass to depend on
DivergenceAnalysis. So we conservatively assume that all non-trivial
branches are divergent if the target has divergence.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D98958
This patch adds further optimization techniques to RVV BUILD_VECTOR
lowering. It teaches the compiler to find splats of larger vector
element types "hidden" in smaller ones. For example, a v4i8 build_vector
(0x1, 0x2, 0x1, 0x2) could be splat as v2i16 0x0201. This is generally
more optimal than the dominant-element BUILD_VECTORs and so takes
priority.
This optimization is currently limited to all-constant-or-undef
BUILD_VECTORs as those were found to be the most common. There's no
reason this couldn't be extended to other BUILD_VECTORs, but the
additional bit-manipulation instructions may require more sophisticated
heuristics.
There are some cases where the materialization of the larger constant
takes more scalar instructions than it does to build the vector with
vector instructions. We could add heuristics to try and catch this.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99195
Until AVX512 we don't have any vector truncation instructions, and always lower using shuffles instead.
combineVectorTruncation performs this earlier than lowering as it makes it easier to use any sign/zero-extended bits in the truncated bits with PACKSS/PACKUS to perform the shuffle.
We currently don't attempt to use combineVectorTruncation on AVX2 targets as in the past 256-bit PACKSS/PACKUS tended to cause 128-bit lane shuffle regressions - but these should now be all resolved with combineHorizOpWithShuffle and in all cases we now reduce the amount of cross-lane shuffling and variable shuffle mask usage.
Differential Revision: https://reviews.llvm.org/D96609
LowerVSETCC calls splitIntVSETCC after canonicalizing certain patterns, in particular (X & CPow2 != 0) -> (X & CPow2 == CPow2).
Unfortunately if we're splitting for AVX1/non-AVX512BW cases, we lose these canonicalizations as we call the split with the original SetCC node, and when the split nodes are later lowered in LowerVSETCC the patterns are lost behind extract_subvector etc. But if we pass the canonicalized operands for splitting we retain the optimizations.
Differential Revision: https://reviews.llvm.org/D99256
This may occur when swifterror codegen in the translator generates these,
but we shouldn't try to handle them since they should have regclasses anyway.
rdar://75784009
Differential Revision: https://reviews.llvm.org/D99287