If SimplifyWithOpReplaced() cannot simplify the value, null should
be returned. Make sure this really does happen in all cases,
including those where SimplifyBinOp() returns the original value.
This does not matter for existing users, but does mattter for
D87480, which would go into an infinite loop otherwise.
With this extension the effects of `omp begin declare variant` will be
applied to template function declarations. The behavior is opt-in and
controlled by the `extension(allow_templates)` trait. While generally
useful, this will enable us to implement complex math function calls by
overloading the templates of the standard library with the ones in
libc++.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D85735
This extension allows to declare variants in between `omp begin/end
declare variant` that do not match the type of the existing function
with that name. Without this extension we would not find a base function
(with a compatible type), therefore create a new one, which would
cause conflicting declarations. With this extension we will not create
"missing" base functions, which basically renders these specializations
harmless. They will be generated but never called.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D85878
Without this patch, obj2yaml decodes the content of only one ".stack_size" section. Other sections are dumped with their full contents.
Reviewed By: grimar, MaskRay
Differential Revision: https://reviews.llvm.org/D87727
If you want to build everything, building the default target
via just `ninja` is better, but `ninja all` shouldn't give you
compile errors -- this fixes that.
https://reviews.llvm.org/D86393
Patch adds five new `GICombinerRules`, one for each of the following unary
FP instrs: `G_FNEG`, `G_FABS`, `G_FPTRUNC`, `G_FSQRT`, and `G_FLOG2`. The
combine rules perform the FP operation on the constant operand and replace
the original instr with the result. Patch additionally adds new combiner
tests for the AArch64 target to test these new combiner rules.
A new hidden option -print-changed is added along with code to support
printing the IR as it passes through the opt pipeline in the new pass
manager. Only those passes that change the IR are reported, with others
only having the banner reported, indicating that they did not change the
IR, were filtered out or ignored. Filtering of output via the
-filter-print-funcs is supported and a new supporting hidden option
-filter-passes is added. The latter takes a comma separated list of pass
names and filters the output to only show those passes in the list that
change the IR. The output can also be modified via the -print-module-scope
function.
The code introduces a template base class that generalizes the comparison
of IRs that takes an IR representation as template parameter. The
constructor takes a series of lambdas that provide an event based API
for generalized reporting of IRs as they are changed in the opt pipeline
through the new pass manager.
The first of several instantiations is provided that prints the IR
in a form similar to that produced by -print-after-all with the above
mentioned filtering capabilities. This version, and the others to
follow will be introduced at the upcoming developer's conference.
Reviewed By: aeubanks (Arthur Eubanks), yrouban (Yevgeny Rouban), ychen (Yuanfang Chen)
Differential Revision: https://reviews.llvm.org/D86360
This currently has no impact on code, but prevents sizeable code size
regressions after D52010. This prevents spilling and reloading all
values inside blocks that loop back. Add a baseline test which would
regress without this patch.
Most clients only need CVType and CVSymbol, not structs for every type
and symbol. Move CVSymbol and CVType to CVRecord.h to accomplish this.
Update some of the common headers that need CVSymbol and CVType to use
the new location.
eliminateFrameIndex won't fix up the offset register when the direct
frame index reference is moved to a separate move instruction. Switch
the offset to a base 0 (which it probably should be to begin with).
Additional sanity checks were added to get.active.lane.mask's second argument,
the loop tripcount/elementcount, in rG635b87511ec3. Like the other (overflow)
checks, skip this if tail-predication is forced.
Differential Revision: https://reviews.llvm.org/D87769
This patch prevents the `llvm.masked.gather` and `llvm.masked.scatter` intrinsics to be scalarized when invoked on scalable vectors.
The change in `Function.cpp` is needed to prevent the warning that is raised when `getNumElements` is used in place of `getElementCount` on `VectorType` instances. The tests guards for regressions on this change.
The tests makes sure that calls to `llvm.masked.[gather|scatter]` are still scalarized when:
# the intrinsics are operating on fixed size vectors, and
# the compiler is not targeting fixed length SVE code generation.
Reviewed By: efriedma, sdesmalen
Differential Revision: https://reviews.llvm.org/D86249
'require<globals-aa>' is needed to make globals-aa work in NPM, since
globals-aa is a module analysis but function passes cannot run module
analyses on demand.
So don't skip translating alias analyses to 'require<>'.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87743
WeakRefDirective should specify a directive to declare "a global as being a weak undefined symbol".
The directive used by AMDGPU was incorrect - ".weakref" was intended for other purposes.
The correct directive is ".weak" and it is already defined as default for ELF.
So the redefinition was removed.
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D87762
Also renamed the fields to follow style guidelines.
Accessors help with readability - weight mutation, in particular,
is easier to follow this way.
Differential Revision: https://reviews.llvm.org/D87725
Fix lowering and instruction selection for v3x16 types
and enable InstCombine to emit them.
This patch only implements it for the selection dag.
GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and
GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work.
Differential Revision: https://reviews.llvm.org/D84420
Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is
marked as having unmodeled side effects, which makes the machine
scheduler treat it as a barrier. Now that we have proper implicit $mode
operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead
for setregs that only touch the FP MODE bits, to give the scheduler more
freedom.
Differential Revision: https://reviews.llvm.org/D87446
Now that we're getting better at combining shuffles of different vector widths, this can now be performed as part of the standard target shuffle combines and isn't required for cleanup.
Exposed a minor issue in combineX86ShufflesRecursively where we failed to check if a shuffle's src ops were simple types.
https://bugs.llvm.org/show_bug.cgi?id=45932
assert(OutlinedFunctionCost >= Cloner.OutlinedRegionCost && "Outlined function cost should be no less than the outlined region") getting triggered in computeBBInlineCost.
Intrinsics like "assume" are considered regular function calls while computing costs.
This patch enables computeBBInlineCost to queries TTI for intrinsic call cost.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87132
We've fixed the case where this could return an instruction after the
given instruction, but also means that we can falsely return a
'unique' def when they could be one coming from the backedge of a
loop.
Differential Revision: https://reviews.llvm.org/D87751