GFX10 image instructions use one or more address operands starting at
vaddr0, instead of a single vaddr operand, to allow for NSA forms.
Differential Revision: https://reviews.llvm.org/D81675
Fix the division/remainder algorithm by adding a second quotient
refinement step, which is required in some cases like
0xFFFFFFFFu / 0x11111111u (https://bugs.llvm.org/show_bug.cgi?id=46212).
Also document, rewrite and simplify it by ensuring that we always have a
lower bound on inv(y), which simplifies the UNR step and the quotient
refinement steps.
Differential Revision: https://reviews.llvm.org/D83381
This patch extends D58439 (`llvm/test/{yaml2obj,obj2yaml}/**/*.yaml`) and runs all
`llvm/test/**/*.yaml`
Many directories have configured `.yaml` (see the deleted lit.local.cfg
files). Yet still some don't configure .yaml and have caused stale tests:
* 8c5825befb7bbb2e76f7eccedc6d3bf26e9b2a6a test/llvm-readobj
* bdc3134e237737dd46b51cd1ecd41ecbbe9f921a test/ExecutionEngine
Just hoist .yaml to `llvm/test/lit.cfg.py`. Also delete .cxx which is
not used. The number of tests running on my machine increases from 38304 to 38309.
The list of new tests:
```
ExecutionEngine/RuntimeDyld/X86/ELF_x86-64_none.yaml
Object/archive-error-tmp.txt
tools/llvm-ar/coff-weak.yaml
tools/llvm-readobj/ELF/verneed-flags.yaml
tools/obj2yaml/COFF/bss.s
```
Reviewed By: grimar, jhenderson, rupprecht
Differential Revision: https://reviews.llvm.org/D83350
This removes existing code duplication and allows us to
assert that we are handling the expected cases.
We have a list of outstanding bugs that could benefit by
handling truncated source values, so that's a possible
addition going forward.
by default.
sample-profile-top-down-load is an internal option which can enable top-down
order of inlining and profile annotation in sample profile load pass. It was
found to be beneficial for better profile annotation.
Recently we found it could also solve some build time issue. Suppose function
A has many callsites in function B. In the last release binary where sample
profile was collected, the outline copy of A is large because there are many
other functions inlined into A. However although all the callsites calling A
in B are inlined, but every inlined body is small (A was inlined into B
before other functions are inlined into A), there is no build time issue in
last release.
In an optimized build using the sample profile collected from last release,
without top-down inlining, we saw a case that A got very large because of
inlining, and then multiple callsites of A got inlined into B, and that led
to a huge B which caused significant build time issue besides profile
annotation issue.
To solve that problem, the patch enables the flag
sample-profile-top-down-load by default. sample-profile-top-down-load can
have better performance when it is enabled together with
sample-profile-merge-inlinee so in this patch we also enable
sample-profile-merge-inlinee by default.
Differential Revision: https://reviews.llvm.org/D82919
Revision e1de2773a534957305d7a559c6d88c4b5ac354e2 provided support for
accepting integer registers in inline asm i.e.
__asm("lhi %r0, 5") -> lhi %r0, 5
__asm("lhi 0, 5") -> lhi 0,5
This patch aims to extend this support to instructions which compute
addresses as well. (i.e instructions of type BDMem and BD[X|R|V|L]Mem)
Author: anirudhp
Differential Revision: https://reviews.llvm.org/D83251
Summary:
Almost all uses of these iterators, including implicit ones, really
only need the const variant (as it should be). The only exception is
in NewGVN, which changes the order of dominator tree child nodes.
Change-Id: I4b5bd71e32d71b0c67b03d4927d93fe9413726d4
Reviewers: arsenm, RKSimon, mehdi_amini, courbet, rriddle, aartbik
Subscribers: wdng, Prazek, hiraditya, kuhar, rogfer01, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, vkmr, Kayjukh, jurahul, msifontes, cfe-commits, llvm-commits
Tags: #clang, #mlir, #llvm
Differential Revision: https://reviews.llvm.org/D83087
There's no reason to introduce a new option for the NPM.
The various PGO options are shared in this manner.
Reviewed By: echristo
Differential Revision: https://reviews.llvm.org/D83368
This cleans up the stack allocated by a @llvm.call.preallocated.setup.
Should either call the teardown or the preallocated call to clean up the
stack. Calling both is UB.
Add LangRef.
Add verifier check that the token argument is a @llvm.call.preallocated.setup.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D83354
ExpandVectorBuildThroughStack is also used for CONCAT_VECTORS.
However, when calculating the offsets for each of the operands we
incorrectly use the element size rather than actual size and thus
the stores overlap.
Differential Revision: https://reviews.llvm.org/D83303
The approach is simple: if a pass reports that it's not modifying a
Function/Module, compute a loose hash of that Function/Module and compare it
with the original one. If we report no change but there's a hash change, then we
have an error.
This approach misses a lot of change but it's not super intrusive and can
detect most of the simple mistakes.
Differential Revision: https://reviews.llvm.org/D80916
Summary:
D82193 exposed a problem with global type definitions in
`OMPConstants.h`. This causes a race when running in thinLTO mode.
Types now live inside of OpenMPIRBuilder to prevent this from happening.
Reviewers: jdoerfert
Subscribers: yaxunl, hiraditya, guansong, dexonsmith, aaron.ballman, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D83176
At the moment this place does not check maximum size set
by TTI and just creates a maximum possible vectors.
Differential Revision: https://reviews.llvm.org/D82227
Summary:
The following combine currently breaks in the DAGCombiner:
```
extract_vector_elt (concat_vectors v4i16:a, v4i16:b), x
-> extract_vector_elt a, x
```
This happens because after we have combined these nodes we have inserted nodes
that use individual instances of the vector element type. In the above example
i16. However this isn't a legal type on all backends, and when the combining pass calls
the legalizer it breaks as it expects types to already be legal. The type legalizer has
already been run, and running it again would make a mess of the nodes.
In the example code at least, the generated code is still efficient after the change.
Reviewers: miyuki, arsenm, dmgreen, lebedev.ri
Reviewed By: miyuki, lebedev.ri
Subscribers: lebedev.ri, wdng, hiraditya, steven.zhang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83231
vselect ((X & Pow2C) == 0), LHS, RHS --> vselect ((shl X, C') < 0), RHS, LHS
Follow-up to D83073 - the non-splat mask cases where we actually see an
improvement are quite limited from what I can tell. AVX1 needs multiply
and blend capabilities and AVX2 needs vector shift and blend capabilities.
The intersection of those 2 constraints is only vectors with 32-bit or
64-bit elements.
XOP is/was better.
Differential Revision: https://reviews.llvm.org/D83181
The name of the make program does not necessarily match "ninja",
especially if an alternative implementation like samurai is used.
Using CMAKE_GENERATOR is a more robust detection method, and is
already used elsewhere in this file.
Differential revision: https://reviews.llvm.org/D77091
On SKX targets we end up loading a v16i8 PSHUFB mask from a v32i8 constant and scaling incorrectly indexes the demanded elts mask - we're missing a check that the constant pool is the same size as the loaded mask.
Test case from D81791 post-commit review.
https://reviews.llvm.org/D69701 added support for on-the-fly argument
changes for update scripts. I recently wanted to keep some manual check
lines in a test generated by update_cc_test_checks.py in our CHERI fork, so
this commit adds support for UTC_ARGS in update_cc_test_checks.py. And since
I was refactoring the code to be in common.py, I also added it for
update_llc_test_checks.py.
Reviewed By: jdoerfert, MaskRay
Differential Revision: https://reviews.llvm.org/D78478
I intend to reuse this to add UTC_ARGS support for update_llc_test_checks.py
and update_cc_test_checks.py in D78478.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D78618
We use extact_subvector and insert_subvector to "cast" between
fixed length and scalable vectors. This patch adds custom c++
based ISel for the following cases:
fixed_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0
scalable_vector = ISD::INSERT_SUBVECTOR undef(scalable_vector), fixed_vector, 0
Which result in either EXTRACT_SUBREG/INSERT_SUBREG for NEON sized
vectors or COPY_TO_REGCLASS otherwise.
Differential Revision: https://reviews.llvm.org/D82871
Occasionally we see absolutely massive basic blocks, typically in global
constructors that are vulnerable to heavy inlining. When these blocks are
dense with DBG_VALUE instructions, we can hit near quadratic complexity in
DwarfDebug's validThroughout function. The problem is caused by:
* validThroughout having to step through all instructions in the block to
examine their lexical scope,
* and a high proportion of instructions in that block being DBG_VALUEs
for a unique variable fragment,
Leading to us stepping through every instruction in the block, for (nearly)
each instruction in the block.
By adding this guard, we force variables in large blocks to use a location
list rather than a single-location expression, as shown in the added test.
This shouldn't change the meaning of the output DWARF at all: instead we
use a less efficient DWARF encoding to avoid a poor-performance code path.
Differential Revision: https://reviews.llvm.org/D83236
Summary:
I think, this results in much more understandable/readable flow.
At least the original logic was perhaps the most hard thing for me to grasp when taking an initial look on the delta passes.
Reviewers: nickdesaulniers, dblaikie, diegotf, george.burgess.iv
Reviewed By: nickdesaulniers
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83287
There are following issues with `CFIProgram::parse` code:
1) Invalid CFI opcodes were never tested. And currently a test would fail
when the `LLVM_ENABLE_ABI_BREAKING_CHECKS` is enabled. It happens because
the `DataExtractor::Cursor C` remains unchecked when the
"Invalid extended CFI opcode" error is reported:
```
.eh_frame section at offset 0x1128 address 0x0:
Program aborted due to an unhandled Error:
Error value was Success. (Note: Success values must still be checked prior to being destroyed).
```
2) It is impossible to reach the "Invalid primary CFI opcode" error with the current code.
There are 3 possible primary opcode values and all of them are handled. Hence this error
should be replaced with llvm_unreachable.
3) Errors currently reported are upper-case.
This patch refines the code in the `CFIProgram::parse` method to fix all issues mentioned
and adds unit tests for all possible invalid extended CFI opcodes.
Differential revision: https://reviews.llvm.org/D82868
This is a follow-up for D83225. This does the following:
1) Adds missing tests for existent errors.
2) Stops using `unwrapOrError` to propagate errors to caller.
(I am trying to get rid of all `unwrapOrErr` calls in the llvm-readelf code).
3) Improves error messages reported slightly.
Differential revision: https://reviews.llvm.org/D83314
In DAGTypeLegalizer::SplitVecRes_ExtendOp I have replaced an invalid
call to getVectorNumElements() with a call to getVectorMinNumElements(),
since the code path works for both fixed and scalable vectors.
This fixes up a warning in the following test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83197
Calling getVectorNumElements() is not safe for scalable vectors and we
should normally use getVectorElementCount() instead. However, for the
code changed in this patch I decided to simply move the instantiation of
the variable 'OutNumElems' lower down to the place where only fixed-width
vectors are used, and hence it is safe to call getVectorNumElements().
Fixes up one warning in this test:
sve-sext-zext.ll
Differential Revision: https://reviews.llvm.org/D83195
For the GetElementPtr case in function
AddressingModeMatcher::matchOperationAddr
I've changed the code to use the TypeSize class instead of relying
upon the implicit conversion to a uint64_t. As part of this we now
check for scalable types and if we encounter one just bail out for
now as the subsequent optimisations doesn't currently support them.
This changes fixes up all warnings in the following tests:
llvm/test/CodeGen/AArch64/sve-ld1-addressing-mode-reg-imm.ll
llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll
Differential Revision: https://reviews.llvm.org/D83124
`__stack_chk_fail` does not return, but `unreachable` was not generated
following `call __stack_chk_fail`. This had a possibility to generate an
invalid binary for functions with a return type, because
`__stack_chk_fail`'s return type is void and `call __stack_chk_fail` can
be the last instruction in the function whose return type is non-void.
Generating `unreachable` after it makes sure CFGStackify's
`fixEndsAtEndOfFunction` handles it correctly.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D83277