This adds some missing single source shuffle costs for AArch64, of i16
and i8 vectors. v4i16 are the same as v4i32 with a worse case cost of 3
coming from the perfect shuffle tables. The larger vector sizes expand
into a constant pool, plus a load (and adrp) and a tbl. I arbitrarily
chose 8 for the cost to be expensive but not too expensive.
Differential Revision: https://reviews.llvm.org/D106241
Constfold constrained variants of operations fadd, fsub, fmul, fdiv,
frem, fma and fmuladd.
The change also sets up some means to support for removal of unused
constrained intrinsics. They are declared as accessing memory to model
interaction with floating point environment, so they were not removed,
as they have side effect. Now constrained intrinsics that have
"fpexcept.ignore" as exception behavior are removed if they have no uses.
As for intrinsics that have exception behavior other than "fpexcept.ignore",
they can be removed if it is known that they do not raise floating point
exceptions. It happens when doing constant folding, attributes of such
intrinsic are changed so that the intrinsic is not claimed as accessing
memory.
Differential Revision: https://reviews.llvm.org/D102673
Clear the map when running the analysis multiple times.
The assertion that should ensure that every function is only
analyzed once triggered sometimes (once every ~70 compiles of some
graphics pipelines) when two functions of subsequent runs were allocated
at the same address.
Differential Revision: https://reviews.llvm.org/D106452
Add maximum NSA size limit as an ISA feature.
Use this to reduce NSA usage on GFX10.1 to avoid stability issues
with 4 and 5 dwords NSA instructions.
Maintain use of longer NSA instructions on GFX10.3.
Note: this also contains some minor fixes for GlobalISel which
did not work correctly with non-NSA form instructions on GFX10.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D103348
These tests show missed opportunities in the SelectionDAG layer when
dealing with scalable-vector splats. All of these are handled for the
equivalent `ISD::BUILD_VECTOR` code, and the tests have largely been
translated from the equivalent X86 tests.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106574
A simplification callback can mean that the IR value is modified beyond
the apparent IR semantics. That is, a `i1 true` could be replaced by an
`i1 false` based on high-level domain-specific information. If a user
provides a simplification callback we will not look at the IR but
instead give up if the callback returns a nullptr.
SPMDization D102307 detects incompatible OpenMP runtime calls to abort converting a target region to SPMD mode. Calls to memory allocation/de-allocation routines kmpc_alloc_shared, kmpc_free_shared are incompatible unless they are removed by AAHeapToStack/AAHeapToShared analysis. This patch extends SPMDization detection to include AAHeapToStack/AAHeapToShared analysis results for enlarging the scope of possible SPMDized regions detected.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D105634
Fixing a typo in SampleContextTracker to use debug name when debug linkage name is no present. This should only affect C programs.
Saw 0.6% perf win on Cinder which is mostly C code.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D106599
This patch adds llvm-readobj and the binutils symlink for readelf to
LLVM_TOOLCHAIN_TOOLS.
Tvoid *thread, void *attr,hey are required by some (most?)
autoconf-built libraries, adding these allows me to build newlib with
the toolchain generated this way.
Also opened an issue for that some days ago, see
https://bugs.llvm.org/show_bug.cgi?id=50698
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D104957
This should fix build breaks for 'development' mode. The other modes
were unaffected - 'release' because it doesn't use TFUtils.cpp, and the
mixed mode because the AOT compiled code brings in the necessary include
dirs anyway.
This function is called when some predecessor of an empty return block
ends with a conditional branch, with both successors being empty ret blocks.
Now, because of the way SimplifyCFG works, it might happen to simplify
one of the blocks in a way that makes a conditional branch
into an unconditional one, since it's destinations are now identical,
but it might not have actually simplified said conditional branch
into an unconditional one yet.
So, we have to check that ourselves first,
especially now that SimplifyCFG aggressively tail-merges
all ret and resume blocks.
Even if it was an unconditional branch already,
`SimplifyCFGOpt::simplifyReturn()` doesn't call `FoldReturnIntoUncondBranch()`
by default.
Reland of 31859f896.
This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D104797
We previously had issues identifying macros not registered with a lowercase name.
Reviewed By: mstorsjo, thakis
Differential Revision: https://reviews.llvm.org/D106453
The logical (select) form of and/or will now be a source of problems.
We don't really account for it's inverted form, yet it exists,
and presumably we should treat it just like non-inverted form:
https://alive2.llvm.org/ce/z/BU9AXkhttps://bugs.llvm.org/show_bug.cgi?id=51149 reports a reportedly-serious
perf regression that will hopefully be mitigated by this.
Update shl/lshr/ashr costs based on the worst case costs from the script in D103695 - many of the 128-bit shifts (usually where integer multiplies aren't used) have similar behaviour to AVX1 so we can merge them.
Noticed while trying to clean up the shift costs model for SSE4 targets using the script in D10369 - SLM double-pumps all the 128-bit vector conversion ops and only use FP0 pipe - numbers taken from Intel AOM + Agner.