This diff adds an llvm-bitcode-strip driver to llvm-objcopy.
In the future, this will enable us to build a replacement for the
bitcode_strip tool.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D87212
The code that validates the value of -id is moved into the function parseInstallNameToolOptions.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D87855
This patch implements the vec_gen[b|h|w|d|q]m function prototypes in altivec.h
in order to utilize the move to VSR with mask instructions introduced in Power10.
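A hedged usage sketch (assumed prototypes matching the naming above;
needs a Power10 target, e.g. -mcpu=power10; presumably lowering to the
mtvsr[b|d]m instructions):

  #include <altivec.h>

  // vec_genbm expands each of the low 16 mask bits into a byte element
  // of all-ones or all-zeros.
  vector unsigned char byte_mask(unsigned long long bm) {
    return vec_genbm(bm);
  }

  // vec_gendm uses the low 2 mask bits for the two doubleword elements.
  vector unsigned long long dword_mask(unsigned long long bm) {
    return vec_gendm(bm);
  }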
Differential Revision: https://reviews.llvm.org/D82725
If the mask of a pdep or pext instruction is a shifted mask (i.e. one contiguous block of ones), we need at most one AND and one shift to represent the operation without the intrinsic. On all platforms I know of, this is faster than the pdep/pext.
The cost modelling for multiple contiguous blocks might be worth exploring in a follow-up, but it's not relevant for my current use case. It would almost certainly be a win on AMD chips, though, where these are really, really slow.
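An illustrative equivalence (not the patch's code; BMI2 intrinsics from
immintrin.h, x86-64 with -mbmi2):

  #include <immintrin.h>
  #include <stdint.h>

  // Mask 0x00FF0000 is one contiguous block of ones (bits 16..23), so
  // pdep deposits the low 8 bits of x into bits 16..23:
  uint64_t via_pdep(uint64_t x)   { return _pdep_u64(x, 0x00FF0000); }
  uint64_t via_shift(uint64_t x)  { return (x << 16) & 0x00FF0000; }

  // Likewise pext extracts bits 16..23 into the low bits:
  uint64_t via_pext(uint64_t x)   { return _pext_u64(x, 0x00FF0000); }
  uint64_t via_shift2(uint64_t x) { return (x >> 16) & 0xFF; }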
Differential Revision: https://reviews.llvm.org/D87861
This changes the order of output sections and the output assembly, but
is otherwise NFC.
It simplifies the TLOF interface by removing two COFF-only methods.
Currently, -object takes a comma-separated list of objects as an
argument, which prevents it from working with path names that contain a
comma. Drop comma-separated support; multiple objects are now specified
by passing the -object flag multiple times.
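For example, what used to be a single comma-separated argument becomes
repeated flags:

  -object first.o,second.o            (no longer supported)
  -object first.o -object second.o    (equivalent replacement)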
Patch by Andrew Gallagher!
Differential Revision: https://reviews.llvm.org/D87003
Under NPM, the TSan passes are split into a module pass and a function
pass. A couple of tests were testing for inserted module constructors,
which are emitted only by the module pass.
We cannot iterate over a scalable vector; the number of elements is unknown at compile time.
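A minimal sketch of the guard this implies (illustrative, not the
patch's code):

  #include "llvm/IR/DerivedTypes.h"
  using namespace llvm;

  static bool forEachElement(VectorType *VTy) {
    // Only fixed-width vectors have a compile-time element count; a
    // ScalableVectorType's count is a multiple of vscale, unknown
    // until run time.
    auto *FVTy = dyn_cast<FixedVectorType>(VTy);
    if (!FVTy)
      return false; // scalable: bail out instead of iterating
    for (unsigned I = 0, E = FVTy->getNumElements(); I != E; ++I) {
      // ... per-element work would go here ...
    }
    return true;
  }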
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87918
findPHICopyInsertPoint special cases placement in a block with a
callbr or invoke in it. In that case, we must ensure that the copy is
placed before the INLINEASM_BR or call instruction, if the register is
defined prior to that instruction, because it may jump out of the
block.
Previously, the code placed it immediately after the last def _or
use_. This is wrong if the use is the instruction that may jump. We
could correctly place it immediately after the last def (ignoring
uses), but that is non-optimal for register pressure.
Instead, place the copy after the last def, or before the
call/inlineasm_br, whichever is later.
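A minimal sketch of that rule (names and structure are illustrative,
not the actual findPHICopyInsertPoint code; it assumes at most one
instruction in the block can jump out, the callbr/invoke case this
patch is about):

  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include "llvm/CodeGen/MachineInstr.h"
  #include "llvm/CodeGen/TargetOpcodes.h"
  #include "llvm/CodeGen/TargetRegisterInfo.h"
  using namespace llvm;

  static MachineBasicBlock::iterator
  pickPHICopyInsertPoint(MachineBasicBlock &MBB, Register SrcReg,
                         const TargetRegisterInfo *TRI) {
    MachineBasicBlock::iterator InsertPt = MBB.begin();
    for (MachineBasicBlock::iterator MI = MBB.begin(), E = MBB.end();
         MI != E; ++MI) {
      // Latest point that is still before the instruction that may
      // jump out of the block.
      if (MI->isCall() || MI->getOpcode() == TargetOpcode::INLINEASM_BR)
        InsertPt = MI;
      // The copy must follow the last def; if the jumping instruction
      // itself defines SrcReg, "after the def" is the later point.
      if (MI->definesRegister(SrcReg, TRI))
        InsertPt = std::next(MI);
    }
    return InsertPt;
  }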
Differential Revision: https://reviews.llvm.org/D87865
This rewrites big parts of the fast register allocator. The basic
strategy of doing block-local allocation hasn't changed but I tweaked
several details:
- Track register state on register units instead of physical registers.
  This simplifies and speeds up handling of register aliases.
- Process basic blocks in reverse order: definitions are known to end
  register lifetimes when walking backwards (whereas when walking
  forward, a use may or may not be a kill, so heuristics are needed);
  see the sketch below.
- Check register mask operands (calls) instead of conservatively
  assuming everything is clobbered.
- Enhance heuristics to detect killing uses: in the case of a small
  number of defs/uses, check whether they are all in the same basic
  block; if so, the last one is a killing use.
- Enhance the heuristic for copy coalescing through hinting: we check
  the first k defs of a register for COPYs rather than relying on there
  being just a single definition.

When testing this on the full llvm test-suite, including SPEC
externals, I measured:
- an average 5.1% reduction in code size on X86 and 4.9% on AArch64
  (ranging between 0% and 20% depending on the test);
- 0.5% faster compile time (some analysis suggests the pass itself is
  slightly slower than before, but we more than make up for it because
  later passes are faster with the reduced instruction count).
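A toy sketch of why the backward walk makes kill detection trivial
(hypothetical helpers, not the actual RegAllocFast code):

  #include "llvm/ADT/STLExtras.h"
  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include "llvm/CodeGen/MachineInstr.h"
  using namespace llvm;

  // Hypothetical helpers standing in for the allocator's internal state.
  void freeRegUnitsOf(Register R);
  void assignIfUnassigned(Register R);

  void allocateBlock(MachineBasicBlock &MBB) {
    // Walking backward, a def always ends a value's lifetime, so its
    // register can be freed on the spot; walking forward we would need
    // kill flags or heuristics to know whether a use is the last one.
    for (MachineInstr &MI : llvm::reverse(MBB)) {
      for (MachineOperand &MO : MI.operands()) {
        if (!MO.isReg() || !MO.getReg().isVirtual())
          continue;
        if (MO.isDef())
          freeRegUnitsOf(MO.getReg());     // lifetime ends here
        else
          assignIfUnassigned(MO.getReg()); // first use seen backward is
                                           // the last use in the block
      }
    }
  }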
Also adds a few testcases that were broken without this patch, in
particular bug 47278.
Patch mostly by Matthias Braun
The regressions this caused should be fixed when
https://reviews.llvm.org/D52010 is applied.
This reverts commit a21387c65470417c58021f8d3194a4510bb64f46.
Since 6524a7a2b9ca072bd7f7b4355d1230e70c679d2f, this would sometimes
not emit the OR to exec at the beginning of the block, where it really
has to be. If there is an instruction that defines one of the source
operands, split the block and turn the si_end_cf into a terminator.
This avoids regressions when regalloc fast is switched to inserting
reloads at the beginning of the block, instead of spills at the end of
the block.
In a future change, this should always split the block.
We already handle the cases where we have a 'zero-extended splat' build vector (a, 0, 0, 0, a, 0, 0, 0, ...) but were missing the case where the 'a' scalar was zero-extended as well - such as i64 -> vXi64 splat cases on 32-bit targets.
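An assumed source pattern for the 32-bit case (illustrative, not from
the patch): splatting a zero-extended 32-bit value into a v2i64 shows
up as the i32 build vector (a, 0, a, 0):

  #include <emmintrin.h>

  __m128i splat_zext(unsigned a) {
    // On a 32-bit x86 target the i64 splat is built from two i32
    // halves; the high half is the zero from the zero-extension.
    return _mm_set1_epi64x((unsigned long long)a);
  }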
This reverts commit c3492a1aa1b98c8d81b0969d52cea7681f0624c2.
I think this is the wrong strategy and wrong place to do this
transform anyway. Also reverts the follow-up commit
7d593d0d6905b55ca1124fca5e4d1ebb17203138.
If some leaves have the same instructions to be vectorized, we may
incorrectly evaluate the best order for the root node (it is built for
the vector of instructions without repeated instructions and, thus, has
fewer elements than the root node). In this case we simply cannot try
to reorder the tree, and we may calculate the wrong number of nodes
that require the same reordering.
For example, if the root node is <a+b, a+c, a+d, f+e>, then the leaves
are <a, a, a, f> and <b, c, d, e>. When we try to vectorize the first
leaf, it will be shrunk to <a, f>. If the instructions in this leaf
should be reordered, the best order will be <1, 0>. We need to extend
this order for the root node. For the root node this order should look
like <3, 0, 1, 2>. This patch allows extension of the orders of the
nodes with the reused instructions.
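A toy sketch of the order extension (hypothetical helper, not the
SLPVectorizer code): ReuseIndex[i] records which unique leaf entry root
lane i maps to, so for the example above ReuseIndex = {0, 0, 0, 1}
(lanes 0-2 reuse 'a', lane 3 uses 'f') and LeafOrder = {1, 0} extends
to {3, 0, 1, 2}:

  #include <vector>

  std::vector<unsigned>
  extendOrder(const std::vector<unsigned> &LeafOrder,
              const std::vector<unsigned> &ReuseIndex) {
    std::vector<unsigned> RootOrder;
    // Emit all root lanes that map to the leaf entry scheduled first,
    // then those mapping to the next entry, preserving relative order.
    for (unsigned LeafPos : LeafOrder)
      for (size_t Lane = 0; Lane < ReuseIndex.size(); ++Lane)
        if (ReuseIndex[Lane] == LeafPos)
          RootOrder.push_back(Lane);
    return RootOrder;
  }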
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D45263
Alignment requirements for ds_read/write_b96/b128 for gfx9 and onward are
now the same as for other GCN subtargets. This way we can avoid any
unintentional use of these instructions on systems that do not support dword
alignment and instead require natural alignment.
This also makes 'SH_MEM_CONFIG.alignment_mode == STRICT' the default.
Differential Revision: https://reviews.llvm.org/D87821