This patch adds strings content checking to printf-2.ll via --check-globals flag.
Split off from D100724.
Reviewed By: xbolva00
Differential Revision: https://reviews.llvm.org/D101034
It is proper to relax non-negative limitation of step_vector.
Also this patch adds more combines for step_vector:
(sub X, step_vector(C)) -> (add X, step_vector(-C))
Differential Revision: https://reviews.llvm.org/D100812
SmallSet may use operator `<` when we insert MIRef elements, so we
cannot limit the comparison between different BBs.
We allow MIRef() to be less that any initialized MIRef object, otherwise,
we always reture false when compare between different BBs.
Differential Revision: https://reviews.llvm.org/D101039
This patch fixes an issue in which ConstantAsMetadata arguments to a
DIArglist, as well as the Constant values referenced by that metadata,
would not be always be emitted correctly into bitcode. This patch fixes
this issue firstly by searching for ConstantAsMetadata in DIArgLists
(previously we would only search for them when directly wrapped in
MetadataAsValue), and secondly by enumerating all of a DIArgList's
arguments directly prior to enumerating the DIArgList itself.
This patch also adds a number of asserts, and no longer treats the
arguments to a DIArgList as optional fields when reading/writing to
bitcode.
Differential Revision: https://reviews.llvm.org/D100572
We currently do not utilize instructions that convert single
precision vectors to doubleword integer vectors. These conversions
come up in code occasionally and this improvement allows us to
open code some functions that need to be added to altivec.h.
When inspecting the calling convention, for calling windows functions
from a non-windows function, inspect the calling convention of
the called function, not the caller.
Also remove an unnecessary parameter to AArch64CallLowering
OutgoingArgHandler.
Differential Revision: https://reviews.llvm.org/D100890
STRICT_WWM and STRICT_WQM are already defined with Uses = [EXEC], so
there is no need to add another implicit use of $exec when lowering them
to V_MOV_B32 instructions.
Differential Revision: https://reviews.llvm.org/D100969
In quite a few cases in LoopVectorize.cpp we call createStepForVF
with a step value of 0, which leads to unnecessary generation of
llvm.vscale intrinsic calls. I've optimised IRBuilder::CreateVScale
and createStepForVF to return 0 when attempting to multiply
vscale by 0.
Differential Revision: https://reviews.llvm.org/D100763
The change adds support for triming and merging cold context when mergine CSSPGO profiles using llvm-profdata. This is similar to the context profile trimming in llvm-profgen, however the flexibility to trim cold context after profile is generated can be useful.
Differential Revision: https://reviews.llvm.org/D100528
The value is always an immediate and can never be in a register.
This the kind of thing TargetConstant is for.
Saves a step GenDAGISel to convert a Constant to a TargetConstant.
This patch allows PRE of the following type of loads:
```
preheader:
br label %loop
loop:
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
br label %merge
merge:
...
br i1 ..., label %loop, label %exit
```
Into
```
preheader:
%x0 = load %p
br label %loop
loop:
%x.pre = phi(x0, x2)
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
%x1 = load %p
br label %merge
merge:
x2 = phi(x.pre, x1)
...
br i1 ..., label %loop, label %exit
```
So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern which it is trying to address is: hot loop, with all code inlined and
provably having no side effects, and some side-effecting calls on cold path.
The worst overhead from it is, if we always take clobber block, we make 1 more load
overall (in preheader). It only matters if loop has very few iteration. If clobber block is not taken
at least once, the transform is neutral or profitable.
There are several improvements prospect open up:
- We can sometimes be smarter in loop-exiting blocks via split of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
we don't know if their sum is colder than the header.
Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames
This recognizes the case when Hi is (sra Lo, 31). We can use
SPLAT_VECTOR_I64 rather than splatting the high bits and
combining them in the vector register.
Summary: The original logic seems to be we could collecting a CoroBegin
if one of the terminators could be dominated by one of coro.destroy,
which doesn't make sense.
This patch rewrites the logics to collect CoroBegin if all of
terminators are dominated by one coro.destroy. If there is no such
coro.destroy, we would call hasEscapePath to evaluate if we should
collect it.
Test Plan: check-llvm
Reviewed by: lxfind
Differential Revision: https://reviews.llvm.org/D100614
Don't phrase the semantics in terms of the optimizer. Instead have a
more straightforward execution based semantic.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D63439
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions.
Reviewed By: jdoerfert, Meinersbur
Differential Revision: https://reviews.llvm.org/D95976
Each of the cases marked as legal here have an imported pattern in
AArch64GenGlobalISel.inc. So, if we mark them as legal, we get selection for
free.
Technically this is only supposed to happen if we have NEON support. But, we
fall back if we don't have that in the legalizer right now. I suppose it'd be
better to have a FIXME so we can write the testcase when the time comes.
(Plus, it'd just fall back in selection if NEON isn't available, so it's not
*wrong*, I guess?)
This fixes some fallbacks in the test suite.
(Also use `isScalar` from LegalityPredicates.cpp while we're here just to tidy
things a little bit.)
Differential Revision: https://reviews.llvm.org/D100916
Report dangling probes for frames that have real samples collected. Dangling probes are the probes associated to an empty block. When reported, sample count on a dangling probe will not be trusted by the compiler and we will rely on the counts inference algorithm to get the probe a reasonable count. This actually fixes a bug where previously only those dangling probes with samples collected were reported.
This patch also fixes two existing issues. Pseudo probes are stored in `Address2ProbesMap` and their pointers are used in `PseudoProbeInlineTree`. Previously `std::vector` was used to store probes and the pointers to probes may get obsolete as the vector grows. I'm changing `std::vector` to `std::list` instead.
The other issue is that all outlined functions shared the same inline frame previously due to the unchanged `Index` value as the dummy inlineSite identifier.
Good results seen for SPEC2017 in general regarding profile quality.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D100235
On ELF targets, if a function has uwtable or personality, or does not have
nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module.
Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified.
(i.e. If -g[123], every function gets `.eh_frame`.
This behavior is strange but that is the status quo on GCC and Clang.)
Let's take asan as an example. Other sanitizers are similar.
`asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true,
so every function gets `.eh_frame` if `-g[123]` is specified.
This is the root cause that
`-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame
while
`-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame.
This patch
* sets the nounwind attribute on sanitizer module ctor/dtor.
* let Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute.
The "uwtable" mechanism is generic: synthesized functions not cloned/specialized
from existing ones should consider `Function::createWithDefaultAttr` instead of
`Function::create` if they want to get some default attributes which
have more of module semantics.
Other candidates: "frame-pointer" (https://github.com/ClangBuiltLinux/linux/issues/955https://github.com/ClangBuiltLinux/linux/issues/1238), dso_local, etc.
Differential Revision: https://reviews.llvm.org/D100251
This change adds debug information about whether PGO is being used or
not.
Microsoft performance tooling (e.g. xperf, WPA) uses this information to
show whether functions are optimized with PGO or not, as well as whether
PGO information is invalid.
This information is useful for validating whether training scenarios are
providing good coverage of real world scenarios, showing if profile data
is out of date, etc.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D99994
This previously made references to 2.3-draft which was a short
lived version number in 2017. It was replaced by date based
versions leading up to ratification.
This patch uses the latest ratified version number and just says
what the behavior is. Nothing here is in flux.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D100878
This was checked in some asserts, but not enforced by the
instruction matching.
There's still a second bug that we don't check that vt and vd
are different registers, but that will require custom checking.
Differential Revision: https://reviews.llvm.org/D100928
This fixed issue introduced in 16af97393346ad636298605930a8b503a55eb40a
and 796feb61637c407aefcc0d462f24a1cc41f350d8.
Differential Revision: https://reviews.llvm.org/D100909
This makes the memcpy-memcpy and memcpy-memset optimizations work for
variable sizes as long as they are equal, relaxing the old restriction
that they are constant integers. If they're not equal, the old
requirement that they are constant integers with certain size
restrictions is used.
The implementation works by pushing the length tests further down in the
code, which reveals some places where it's enough that the lengths are
equal (but not necessarily constant).
Differential Revision: https://reviews.llvm.org/D100870
Trying to evaluate a GEP would assert with
"Ty == cast<PointerType>(C->getType()->getScalarType())->getElementType()"
because the type of the pointer we would evaluate the GEP argument to
would be a different type than the GEP was expecting. We should treat
pointer stripping as a bitcast.
The test adds a redundant GEP that would crash due to type mismatch.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D100970
This reverts commit 9423f78240a216e3f38b394a41fe3427dee22c26.
A performance regression with this patch has been reported at
https://reviews.llvm.org/rG9423f78240a2#990953. Reverting for now.
Discovered during attributor testing comparing stats with
and without the attributor. Willreturn should not be inferred
for nonexact definitions.
Differential Revision: https://reviews.llvm.org/D100988
Fix for PR49984
This was discovered during Attributor testing.
Memset was always created with alignment of 1
and in case when strncpy alignment was changed
it triggered an assertion in the AttrBuilder.
Memset will now be created with appropriate alignment.
Differential Revision: https://reviews.llvm.org/D100875
PR50049 demonstrated an infinite loop between OR(SHUFFLE,SHUFFLE) <-> BLEND(SHUFFLE,SHUFFLE) patterns.
The UNDEF elements were allowing a combined shuffle mask to be widened which lost the undef element, resulting us needing to use the BLEND pattern (as the undef element would need to be zero for the OR pattern). But then bitcast folds would re-expose the undef element allowing us to use OR again.....