By replacing a lambda expression with a functor class instance, this
patch works around an issue encountered on AIX where the IBM XL compiler
appears to make no progress for many hours.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D106554
This is not the transform direction we want in general,
but by the time we have a CMOV, we've already tried
everything else that could be better.
The transform increases the uses of the other add operand,
but that is safe according to Alive2:
https://alive2.llvm.org/ce/z/Yn6p-A
We could probably extend this to other binops (not just add).
This is the motivating pattern discussed in:
https://llvm.org/PR51069
The test with i8 shows a missed fold because there's a trunc
sitting in front of the add. That can be handled with a small
follow-up.
Differential Revision: https://reviews.llvm.org/D106607
This adds custom lowering for truncating stores when operating on
fixed length vectors in SVE. It also includes a DAG combine to
fold extends followed by truncating stores into non-truncating
stores in order to prevent this pattern appearing once truncating
stores are supported.
Currently truncating stores are not used in certain cases where
the size of the vector is larger than the target vector width.
Differential Revision: https://reviews.llvm.org/D104471
Deduplication in OpenMPOpt finds redundant OpenMP runtime calls and replaces them with a single call placed in the earliest safe location in the IR. When deduplication happens in a target region this patch makes sure replacement calls are put after target_init.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106556
This adds some missing single source shuffle costs for AArch64, of i16
and i8 vectors. v4i16 are the same as v4i32 with a worse case cost of 3
coming from the perfect shuffle tables. The larger vector sizes expand
into a constant pool, plus a load (and adrp) and a tbl. I arbitrarily
chose 8 for the cost to be expensive but not too expensive.
Differential Revision: https://reviews.llvm.org/D106241
Constfold constrained variants of operations fadd, fsub, fmul, fdiv,
frem, fma and fmuladd.
The change also sets up some means to support for removal of unused
constrained intrinsics. They are declared as accessing memory to model
interaction with floating point environment, so they were not removed,
as they have side effect. Now constrained intrinsics that have
"fpexcept.ignore" as exception behavior are removed if they have no uses.
As for intrinsics that have exception behavior other than "fpexcept.ignore",
they can be removed if it is known that they do not raise floating point
exceptions. It happens when doing constant folding, attributes of such
intrinsic are changed so that the intrinsic is not claimed as accessing
memory.
Differential Revision: https://reviews.llvm.org/D102673
Clear the map when running the analysis multiple times.
The assertion that should ensure that every function is only
analyzed once triggered sometimes (once every ~70 compiles of some
graphics pipelines) when two functions of subsequent runs were allocated
at the same address.
Differential Revision: https://reviews.llvm.org/D106452
Add maximum NSA size limit as an ISA feature.
Use this to reduce NSA usage on GFX10.1 to avoid stability issues
with 4 and 5 dwords NSA instructions.
Maintain use of longer NSA instructions on GFX10.3.
Note: this also contains some minor fixes for GlobalISel which
did not work correctly with non-NSA form instructions on GFX10.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D103348
These tests show missed opportunities in the SelectionDAG layer when
dealing with scalable-vector splats. All of these are handled for the
equivalent `ISD::BUILD_VECTOR` code, and the tests have largely been
translated from the equivalent X86 tests.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106574
A simplification callback can mean that the IR value is modified beyond
the apparent IR semantics. That is, a `i1 true` could be replaced by an
`i1 false` based on high-level domain-specific information. If a user
provides a simplification callback we will not look at the IR but
instead give up if the callback returns a nullptr.
SPMDization D102307 detects incompatible OpenMP runtime calls to abort converting a target region to SPMD mode. Calls to memory allocation/de-allocation routines kmpc_alloc_shared, kmpc_free_shared are incompatible unless they are removed by AAHeapToStack/AAHeapToShared analysis. This patch extends SPMDization detection to include AAHeapToStack/AAHeapToShared analysis results for enlarging the scope of possible SPMDized regions detected.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D105634
Fixing a typo in SampleContextTracker to use debug name when debug linkage name is no present. This should only affect C programs.
Saw 0.6% perf win on Cinder which is mostly C code.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D106599
This patch adds llvm-readobj and the binutils symlink for readelf to
LLVM_TOOLCHAIN_TOOLS.
Tvoid *thread, void *attr,hey are required by some (most?)
autoconf-built libraries, adding these allows me to build newlib with
the toolchain generated this way.
Also opened an issue for that some days ago, see
https://bugs.llvm.org/show_bug.cgi?id=50698
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D104957
This should fix build breaks for 'development' mode. The other modes
were unaffected - 'release' because it doesn't use TFUtils.cpp, and the
mixed mode because the AOT compiled code brings in the necessary include
dirs anyway.
This function is called when some predecessor of an empty return block
ends with a conditional branch, with both successors being empty ret blocks.
Now, because of the way SimplifyCFG works, it might happen to simplify
one of the blocks in a way that makes a conditional branch
into an unconditional one, since it's destinations are now identical,
but it might not have actually simplified said conditional branch
into an unconditional one yet.
So, we have to check that ourselves first,
especially now that SimplifyCFG aggressively tail-merges
all ret and resume blocks.
Even if it was an unconditional branch already,
`SimplifyCFGOpt::simplifyReturn()` doesn't call `FoldReturnIntoUncondBranch()`
by default.
Reland of 31859f896.
This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D104797
We previously had issues identifying macros not registered with a lowercase name.
Reviewed By: mstorsjo, thakis
Differential Revision: https://reviews.llvm.org/D106453