When performing call slot optimization for a non-local destination,
we need to check whether there may be throwing calls between the
call and the copy. Otherwise, the early write to the destination
may be observable by the caller.
This was already done for call slot optimization of load/store,
but not for memcpys. For the sake of clarity, I'm moving this check
into the common optimization function, even if that does need an
additional instruction scan for the load/store case.
As efriedma pointed out, this check is not sufficient due to
potential accesses from another thread. This case is left as a TODO.
Differential Revision: https://reviews.llvm.org/D88799
PR47632
This allows MC to match `data32 ...` as one instruction instead of two (data32 without insn + insn).
The compatibility with GNU as improves: `data32 ljmp` will be matched as ljmpl.
`data32 lgdt 4(%eax)` will be matched as `lgdtl` (prefixes: 0x67 0x66, instead
of 0x66 0x67).
GNU as supports many other `data32 *w` as `*l`. We currently just hard code
`data32 callw` and `data32 ljmpw`. Generalizing the suffix replacement is
tricky and requires a think about the "bwlq" appending suffix rules in MatchAndEmitATTInstruction.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D88772
This involves porting BPFAbstractMemberAccess and BPFPreserveDIType to
NPM, then adding them BPFTargetMachine::registerPassBuilderCallbacks
(the NPM equivalent of adjustPassManager()).
Reviewed By: yonghong-song, asbirlea
Differential Revision: https://reviews.llvm.org/D88855
Some of these depended on analyses being present that aren't provided
automatically in NPM.
early_dce_clobbers_callgraph.ll was previously inlining a noinline function?
cast-call-combine.ll relied on the legacy always-inline pass being a
CGSCC pass and getting rerun.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D88187
This one is weird...
globals-aa needs to be already computed at licm, or else a function pass
can't run a module analysis and won't have access to globals-aa.
But the globals-aa result is impacted by instcombine in a way that
affects what the test is expecting. If globals-aa is computed before
instcombine, it is cached and globals-aa used in licm won't contain the
necessary info provided by instcombine.
Another catch is that if we don't invalidate AAManager, it will use the
cached AAManager that instcombine requested, which may not contain
globals-aa. So we have to invalidate<aa> so that licm can recompute
an AAManager with the globals-aa created by the require<globals-aa>.
This is essentially the problem described in https://reviews.llvm.org/D84259.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D88118
When we assume a return value is dead we might still visit return
instructions via `Attributor::checkForAllReturnedValuesAndReturnInsts(..)`.
When we do so the "returned value" is potentially simplified to `undef`
as it is the assumed "returned value". This is a problem if there was a
preexisting `noundef` attribute that will only be removed as we manifest
the `undef` return value. We should not use this combination to derive
`unreachable` though. Two test cases fixed.
In AAMemoryBehaviorFloating we used to track benign uses in a SetVector.
With this change we look through benign uses eagerly to reduce the
number of elements (=Uses) we look at during an update.
The test does actually not fail prior to this commit but I already wrote
it so I kept it.
We know V is a IntToPtrInst or PtrToIntInst type so we know its a CastInst - so use cast<> directly.
Prevents clang static analyzer warning that we could deference a null pointer.
This still only gets used for scalar types but now always uses ConstantExpr in preparation for vector support - it was using APInt methods in some places.
This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV.
Differential Revision: https://reviews.llvm.org/D87836
This patch makes the parser
- reject higher vector registers (>=16) in operands where they should not
be accepted.
- accept higher integers (>=16) in vector register operands.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D88888
This diff adds support for universal binaries to llvm-objcopy.
This is a recommit of 32c8435ef70031 with the asan issue fixed.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D88400
Current Statepoint MI format is this:
STATEPOINT
<id>, <num patch bytes >, <num call arguments>, <call target>,
[call arguments...],
<StackMaps::ConstantOp>, <calling convention>,
<StackMaps::ConstantOp>, <statepoint flags>,
<StackMaps::ConstantOp>, <num deopt args>, [deopt args...],
<gc base/derived pairs...> <gc allocas...>
Note that GC pointers are listed in pairs <base,derived>.
This causes base pointers to appear many times (at least twice) in
instruction, which is bad for us when VReg lowering is ON.
The problem is that machine operand tiedness is 1-1 relation, so
it might look like this:
%vr2 = STATEPOINT ... %vr1, %vr1(tied-def0)
Since only one instance of %vr1 is tied, that may lead to incorrect
codegen (see PR46917 for more details), so we have to always spill
base pointers. This mostly defeats new VReg lowering scheme.
This patch changes statepoint instruction format so that every
gc pointer appears only once in operand list. That way they all can
be tied. Additional set of operands is added to preserve base-derived
relation required to build stackmap.
New statepoint has following format:
STATEPOINT
<id>, <num patch bytes>, <num call arguments>, <call target>,
[call arguments...],
<StackMaps::ConstantOp>, <calling convention>,
<StackMaps::ConstantOp>, <statepoint flags>,
<StackMaps::ConstantOp>, <num deopt args>, [deopt args...],
<StackMaps::ConstantOp>, <num gc pointers>, [gc pointers...],
<StackMaps::ConstantOp>, <num gc allocas>, [gc allocas...]
<StackMaps::ConstantOp>, <num entries in gc map>, [base/derived indices...]
Changes are:
- every gc pointer is listed only once in a flat length-prefixed list;
- alloca list is prefixed with its length too;
- following alloca list is length-prefixed list of base-derived
indices of pointers from gc pointer list. Note that indices are
logical (number of pointer), not absolute (index of machine operand).
Differential Revision: https://reviews.llvm.org/D87154
This reverts commit 6e25586990b93e2c9eaaa4f473b6720ccd646c46. It depends
on 32c8435ef70031d7bd3dce48e41bdce65747e123, which I'm reverting due to
ASan failures. Details in https://reviews.llvm.org/D88400.
Regarding this bug I posted earlier: https://bugs.llvm.org/show_bug.cgi?id=47035
After reading through LLVM source code and getting familiar with VPlan I was able to vectorize the code using by enabling VPlan native path. After talking with @fhahn he suggested that I contribute this as a test case. So here it is. I tried to follow the available guides how to do this best I could. I modified IR code by hand to have more clear variable names instead of numbers.
One thing what I'd like to get input from someone is that is current CHECK lines sufficient enough to verify that the inner loop has been vectorized properly?
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87564
This removed 2 last precompiled binaries from the mips-got.test.
YAML descriptions are used instead.
Differential revision: https://reviews.llvm.org/D88565
uint8_t types are implicitly promoted to int, leading to a
unsigned-signed comparison.
Thanks for the heads-up @uabelho.
Differential Revision: https://reviews.llvm.org/D88876
In DAGCombiner::ForwardStoreValueToDirectLoad I have fixed up some
implicit casts from TypeSize -> uint64_t and replaced calls to
getVectorNumElements() with getVectorElementCount(). There are some
simple cases of forwarding that we can definitely support for
scalable vectors, i.e. when the store and load are both scalable
vectors and have the same size. I have added tests for the new
code paths here:
CodeGen/AArch64/sve-forward-st-to-ld.ll
Differential Revision: https://reviews.llvm.org/D87098
This patch enables basic BSS section handling, and improves a couple of error
messages in the ELF section parsing code.
Patch by Christian Schafmeister. Thanks Christian!
Differential Revision: https://reviews.llvm.org/D88867
Drop `noundef` for return values that are replaced by void and make it
illegal to put `noundef` on a void value.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87306
Alignment attributes need to be dropped for non-pointer values.
This also introduces a check into the verifier to ensure you don't use
`align` on anything but a pointer. Test needed to be adjusted
accordingly.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D87304
Use context to prove that load can be safely executed at a point where load is being hoisted.
Postpone the decision about safety of speculative load execution till the moment we know
where we hoist load and check safety at that context.
Reviewers: nikic, fhahn, mkazantsev, lebedev.ri, efriedma, reames
Reviewed By: reames, mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D88725
This makes the NPM skip not required passes on functions marked optnone.
If this causes a pass that should be required but has not been marked
required to be skipped, add
`static bool isRequired() { return true; }`
to the pass class. AlwaysInlinerPass is an example.
clang/test/CodeGen/O0-no-skipped-passes.c is useful for checking that
no passes are skipped under -O0.
The -enable-npm-optnone option will be removed once this has been stable
for long enough without issues.
Reviewed By: ychen, asbirlea
Differential Revision: https://reviews.llvm.org/D87869
Refactor exit block creation to a single call ensureEarlyExitBlock.
Add support for generating an early exit block which clears the
exec mask, but only add this instruction when required.
These changes are to facilitate adding more forms of early
termination for PS shaders in the near future.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D88775
When unbundling COPY bundles in VirtRegRewriter the start of the
bundle is not correctly referenced in the unbundling loop.
The effect of this is that unbundled instructions are sometimes
inserted out-of-order, particular in cases where multiple
reordering have been applied to avoid clobbering dependencies.
The resulting instruction sequence clobbers dependencies.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D88821
Register context information was already being passed into the DWARFDebugFrame code that dumps unwind information but it wasn't being used. This change adds the ability to dump registers names of a valid MC register context was passed in and if it knows about the register. Updated the tests to use the newly returned register names.
Differential Revision: https://reviews.llvm.org/D88767
This and its friend X86ISD::LCMPXCHG8_SAVE_RBX_DAG are used if we need to avoid clobbering the frame pointer in EBX/RBX. EBX/RBX are only used a frame pointer in 64-bit mode. In 64-bit mode we don't use CMPXCHG8B since we have a GR64 cmpxchg available. So we don't need special handling for LCMPXCHG8B.
Split from D88808
Differential Revision: https://reviews.llvm.org/D88853
getNode handling for ISD:SETCC calls FoldSETCC which can canonicalize
FP constants to the RHS. When this happens we should create the node
with the FMF that was requested. By using FlagInserter when can ensure
any calls to getNode/getSetcc during canonicalization will also get the flags.
Differential Revision: https://reviews.llvm.org/D88063
This reverts commit 20797989ea190f2ef22d13c5a7a0535fe9afa58b.
This patch (https://reviews.llvm.org/D69257) cannot complete a stage2
build due to the change:
```
CI->getCalledFunction()->getName().contains("longjmp")
```
There are several concrete issues here:
- The callee may not be a function, so `getCalledFunction` can assert.
- The called value may not have a name, so `getName` can assert.
- There's no distinction made between "my_longjmp_test_helper" and the
actual longjmp libcall.
At a higher level, there's a serious layering problem here. The
splitting pass makes policy decisions in a general way (e.g. based on
attributes or profile data). Special-casing certain names breaks the
layering. It subverts the work of library maintainers (who may now need
to opt-out of unexpected optimization behavior for any affected
functions) and can lead to inconsistent optimization behavior (as not
all llvm passes special-case ".*longjmp.*" in the same way).
The patch may need significant revision to address these issues.
But the immediate issue is that this crashes while compiling llvm's unit
tests in a stage2 build (due to the `getName` problem).