Summary:
This completes the needed glueing to support reading tbd files from nm.
This includes specifying which slice filtering with `--arch` and a new
option specifically for tbd files `--add-inlinedinfo` which will show
the reexported libraries that are appended in the tbd file.
Reviewers: ributzka, steven_wu, JDevlieghere, jhenderson
Reviewed By: JDevlieghere
Subscribers: hiraditya, MaskRay, dexonsmith, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81614
This was implicitly assuming the branch instruction was the next after
the pseudo. It's possible for another non-terminator instruction to be
inserted between the intrinsic and the branch, so adjust the insertion
point. Fixes a non-terminator after terminator verifier error (which
without the verifier, manifested itself as an infinite loop in
analyzeBranch much later on).
- Renaming the printer class, flag
- Refactoring
- Changing some tests
This patch is a preparational stage for introducing a new printing pass and new
functionality to the existing Annotation Writer. I plan to extend
this functionality for this tool to be more useful when looking at the inline
process.
This reverts part of D81156.
Accessing errs() concurrently was safe before and racy after D81156.
(`errs() << 'a'` is always racy)
Accessing outs() and errs() concurrently was safe before and racy after D81156.
Don't tie errs() to outs() by default to fix the fallout.
llvm-dwarfdump is single-threaded and opting in the tie behavior is safe.
The exact same #if is already inside isCpuIdSupported and causes
it to return true. The definition of isCpuIdSupported isn't
conditional so we should be able just rely on its body doing
the right thing.
Summary:
After their range checks were removed in 7f50c15be5c0, br_tables
started being duplicated into their predecessors by tail
folding. Unfortunately, when the br_tables were in loops this
transformation introduced bad irreducible control flow which was later
expanded into even more br_tables. This commit abuses the
`isNotDuplicable` property to prevent this irreducible control flow
from being introduced. This change saves a few dozen bytes of code
size and has a negligible affect on performance for most of the large
Emscripten benchmarks, but can improve performance significantly on
microbenchmarks of switches in loops.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81628
The spec for these says they need 0xf3 but also mentions REP
before the mnemonic. But I don't think its fair to users to make
them write REP first. And gas doesn't make them. objdump seems to
disassemble with or without the prefix and just prints any 0xf3
as REP.
Add a unit test that shows how CSE works if we install an observer
at the machine function level and not use the CSEMIRBuilder to build
instructions.
https://reviews.llvm.org/D81625
This is a rule that seems to have been enforced for the better part of
the decade, so we should document it for new contributors.
Differential Revision: https://reviews.llvm.org/D80947
'NP' means that the instruction is not recognized with a 66, F2 or F3
prefix. It will either #UD or decode to a different instruction.
All of the cases are here should fall into the #UD variety since
we should be detecting the collision with other instructions when
we build the disassembler tables.
SUMMARY:
Since we deal with aix emitLinkage in the PPCAIXAsmPrinter::emitLinkage() in the patch https://reviews.llvm.org/D75866. It do not go to AsmPrinter::emitLinkage() any more, we clean up some aix related code in the AsmPrinter::emitLinkage()
Reviewers: Jason liu
Differential Revision: https://reviews.llvm.org/D81613
Put AND before ADD in LegalizerHelper::lowerFPTRUNC_F64_TO_F16
in order to match algorithm from AMDGPUTargetLowering::LowerFP_TO_FP16.
Differential Revision: https://reviews.llvm.org/D81666
Place the instruction at the 24th column (0-based indexing), matching
GNU objdump ARM/AArch64/powerpc/etc when the address is low.
This is beneficial for non-x86 targets which have short instruction
lengths.
```
// GNU objdump AArch64
0: 91001062 add x2, x3, #0x4
400078: 91001062 add x2, x3, #0x4
// llvm-objdump, with this patch
0: 62 10 00 91 add x2, x3, #4
400078: 62 10 00 91 add x2, x3, #4
// llvm-objdump, if we change to print a word instead of bytes in the future
0: 91001062 add x2, x3, #4
400078: 91001062 add x2, x3, #4
// GNU objdump Thumb
0: bf00 nop
// GNU objdump Power ISA 3.1 64-bit instruction
// 0: 00 00 10 04 plwa r3,0
// 4: 00 00 60 a4
```
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D81590
Summary:
Other derivations will all want to emit optimization remarks and, as
part of that, use debug info.
Additionally, drive-by const-ing.
Reviewers: davidxl, dblaikie
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81507
Convert shift+or bool vector patterns into CONCAT_VECTORS if we know this will be lowered to KUNPCK (which requires 16+ vector elements).
Fixes PR32547
Change BasicBlock::removePredecessor to optionally return a vector of
instructions which might be dead. Use this in ConstantFoldTerminator to
delete them if they are dead.
Reapply with a bug fix: don't drop the "!KeepOneInputPHIs" argument when
removePredecessor calls PHINode::removeIncomingValue.
Differential Revision: https://reviews.llvm.org/D80206
Change BasicBlock::removePredecessor to optionally return a vector of
instructions which might be dead. Use this in ConstantFoldTerminator to
delete them if they are dead.
Differential Revision: https://reviews.llvm.org/D80206
Previously these functions either returned a "changed" flag or a "repeat
instruction" flag, and could also modify an iterator to control which
instruction would be processed next.
Simplify this by always returning a "changed" flag, and handling all of
the "repeat instruction" functionality by modifying the iterator.
No functional change intended except in this case:
// If the source and destination of the memcpy are the same, then zap it.
... where the previous code failed to process the instruction after the
zapped memcpy.
Differential Revision: https://reviews.llvm.org/D81540
This teaches yaml2obj to allocate file space for a no-bits section
when there is a non-nobits section in the same segment that follows it.
It was discussed in D78005 thread and matches GNU linkers and LLD behavior.
Differential revision: https://reviews.llvm.org/D80629
AVX512 mask types are often bitcasted to scalar integers for various ops before being bitcast back to be used as a predicate. In many cases we can avoid these KMASK<->GPR transfers and perform equivalent operations on the mask unit.
If the destination mask type is legal, and we can confirm that the scalar op originally came from a mask/vector/float/double type then we should try to avoid the scalar entirely.
This avoids some codegen issues noticed while working on PTEST/MOVMSK improvements.
Partially fixes PR32547 - we don't create a KUNPCK yet, but OR(X,KSHIFTL(Y)) can be handled in a separate patch.
Differential Revision: https://reviews.llvm.org/D81548
Summary:
Fix crash when using -debug caused by the GlobalISel observer trying to print
an incomplete DBG_VALUE instruction. This was caused by the MachineIRBuilder
using buildInstr, which immediately inserts the instruction causing print,
instead of using BuildMI to first build up the instruction and using
insertInstr when finished.
Add RUN-line to existing debug-insts.ll test with -debug flag set to make sure
no crash is happening.
Also fixed a missing %s in the 2nd RUN-line of the same test.
Reviewers: t.p.northover, aditya_nandakumar, aemerson, dsanders, arsenm
Reviewed By: arsenm
Subscribers: wdng, arsenm, rovka, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76934
By moving target-independent code from
llvm/lib/Target/X86/X86IndirectThunks.cpp
to
llvm/include/llvm/CodeGen/IndirectThunks.h
Differential Revision: https://reviews.llvm.org/D81401
This appears to have been added when In64BitMode was added to a
bunch of instructions that don't have register operands. When an
instruction uses a register the parser will prevent a 64-bit
register from being parsed on a 32-bit target. But with only
memory and immediate operands this doesn't happen.
TEST64ri32 does have a register operand so the issue the predicate
was supposed to fix doesn't apply.
Until we have a real need for computing known bits for scalable
vectors I have simply changed the code to bail out for now and
pretend we know nothing. I've also fixed up some simple callers of
computeKnownBits too.
Differential Revision: https://reviews.llvm.org/D80437
Some processors may speculatively execute the instructions immediately
following RET (returns) and BR (indirect jumps), even though
control flow should change unconditionally at these instructions.
To avoid a potential miss-speculatively executed gadget after these
instructions leaking secrets through side channels, this pass places a
speculation barrier immediately after every RET and BR instruction.
Since these barriers are never on the correct, architectural execution
path, performance overhead of this is expected to be low.
On targets that implement that Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted that acts as a speculation barrier.
On other targets, a DSB SYS followed by a ISB is emitted to act as a
speculation barrier.
These speculation barriers are implemented as pseudo instructions to
avoid later passes to analyze them and potentially remove them.
Even though currently LLVM does not produce BRAA/BRAB/BRAAZ/BRABZ
instructions, these are also mitigated by the pass and tested through a
MIR test.
The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.
Differential Revision: https://reviews.llvm.org/D81400