The -phony-externals option adds a generator which explicitly defines any
otherwise unresolved externals as null. This transforms link-time
unresolved-symbol errors into potential runtime null pointer accesses
(if an unresolved external is actually accessed during execution).
This option can be useful in -harness mode to avoid having to mock a
large number of symbols that are not reachable at runtime (e.g. unused
methods referenced by a class vtable).
As mentioned on D70376, LVI can currently cause performance issues
when running under NewPM. The problem is that, unlike the legacy
pass manager, NewPM will not immediately discard the LVI analysis
if the following pass does not need it. This is a problem, because
LVI has a high memory requirement, and mass invalidation of LVI
values is very inefficient. LVI should only be alive during passes
that actively interact with it.
This patch addresses the issue by explicitly abandoning LVI after CVP,
which gets us back to the LegacyPM behavior.
Differential Revision: https://reviews.llvm.org/D84959
These prefixes should override the default behavior and force a larger immediate size. I don't believe gas issues any warning if you use {disp8} when a 32-bit displacement is already required. And this patch doesn't either.
This completes the {disp8} and {disp32} support from PR46650.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D84793
It's always safe to pick the earlier abs regardless of the nsw flag. We'll just lose it if it is on the outer abs but not the inner abs.
Differential Revision: https://reviews.llvm.org/D85053
Negating the input doesn't matter. I left a FIXME to copy the nsw flag if its present on the neg but not on the abs.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D85055
formLCSSAForInstructions is used by SCEVExpander, which tracks all
inserted instructions including LCSSA phis using asserting value
handles. This means cleanup needs to happen in the caller.
Extend formLCSSAForInstructions to take an optional pointer to a
vector. If this argument is non-nullptr, instead of directly deleting
the phis, add them to the vector, so the caller can process them.
This should address various PPC buildbot failures, including
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/40567
Noticed while investigating combining from concatenated shuffle vectors, we weren't checking that PSHUFLW/PSHUFHW was legal - we were depending on lowering splitting to subvectors.
Use IRBuilder instead PHINode::Create. This should not impact the
generated code, but IRBuilder provides a way to register callbacks for
inserted instructions, which is convenient for some users.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D85037
This adds sign/zero extending scalar loads/stores to the MVE
instructions added in D77813, allowing us to create up more post-inc
instructions. These are comparatively simple, compared to LDR/STR (which
may be better turned into an LDRD/LDM), but still require some additions
over MVE instructions. Because there are i12 and i8 variants of the
offset loads/stores dealing with different signs, we may need to convert
an i12 address to a i8 negative instruction. t2LDRBi12 can also be
shrunk to a tLDRi under the right conditions, so we need to be careful
with codesize too.
Differential Revision: https://reviews.llvm.org/D78625
abs() should be rare enough that using value tracking is not going
to be a compile-time cost burden, so use it to reduce a variety of
potential patterns. We do this in DAGCombiner too.
Differential Revision: https://reviews.llvm.org/D85043
Now we try to load and broadcast together for operand 1. Followed
by load and broadcast for operand 1. Previously we tried load
operand 1, load operand 1, broadcast operand 0, broadcast operand 1.
Now we have a single helper that tries load and broadcast for
one operand that we can just call twice.
querying getSCEV() for incomplete phis leads to wrong cache value in `ExprToIVMap`,
because incomplete phis may be simplified to same value before get SCEV expression.
Reviewed By: lebedev.ri, mkazantsev
Differential Revision: https://reviews.llvm.org/D77560
SPE doesn't have a fsel instruction, so don't try to lower to it.
This fixes a "Cannot select: tN: f64 = PPCISD::FSEL tX, tY, tZ" error.
Reviewed By: #powerpc, lkail
Differential Revision: https://reviews.llvm.org/D77773
The patterns were incorrect copies from the FPU code, and are
unnecessary, since there's no extended load for SPE. Just let LLVM
itself do the work by marking it expand.
Reviewed By: #powerpc, lkail
Differential Revision: https://reviews.llvm.org/D78670
Change to expand all arguments and return values to i64 to follow ABI.
Update regression tests also.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D84581
Refer to LangRef http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics
'llvm.masked.load/store.*’ intrinsics are overloaded intrinsic, which allow the
load/store data to be a vector of any integer, floating-point or pointer data type.
Therefore, allow pointer data type when checking 'isLegalMaskedLoadStore()'.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D85045
Rather than hardcoding immediate values for 12 different combinations
in a nested pair of switches, we can perform the matched logic
operation on 3 magic constants to calculate the immediate.
Special thanks to this tweet https://twitter.com/rygorous/status/1187034321992871936
for making me realize I could do this.
Add the optimizations we have in the SelectionDAG version.
Known non-negative copies all known bits. Any known one other than
the sign bit makes result non-negative.
Differential Revision: https://reviews.llvm.org/D85000
D68049 created options for basic block sections: -fbasic-block-sections=,
-funique-basic-block-section-names. Rename options in llc and lld (--lto-)
to be consistent. Specifically,
+ Rename basicblock-sections to basic-block-sections
+ Rename unique-bb-section-names to unique-basic-block-section-names
Differential Revision: https://reviews.llvm.org/D84462
Summary: This patch separates the Loop Peeling Utilities from Loop Unrolling.
The reason for this change is that Loop Peeling is no longer only being used by
loop unrolling; Patch D82927 introduces loop peeling with fusion, such that
loops can be modified to have to same trip count, making them legal to be
peeled.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D83056
computeHostNumPhysicalCores() is designed to respect CPU affinity.
D84764 used sysconf(_SC_NPROCESSORS_ONLN) which does not respect
affinity.
SupportTests Threading.PhysicalConcurrency may fail if taskset -c is specified.
This fixes the ExecutionEngine/MCJIT/stubs-sm-pic.ll test in no-asserts
builds which is set to XFAIL on some platforms like 32-bit x86. More
importantly, we probably don't want to silently error in these cases.
Differential revision: https://reviews.llvm.org/D84390
If absolute value needs turn a negative number into a positive number it reduces the number of sign bits by at most 1.
Differential Revision: https://reviews.llvm.org/D84971
I found that propagateAttributes was ~23% of a thin link's run time
(almost 4x higher than the second hottest function). The main reason is
that it re-examines a global var each time it is referenced. This
becomes unnecessary once it is marked both non read only and non write
only. I added a set to avoid doing redundant work, which dropped the
runtime of that thin link by almost 15%.
I made a smaller efficiency improvement (no measurable impact) to skip
all summaries for a VI if the first copy is dead. I added an assert to
ensure that all copies are dead if any is. The code in
computeDeadSymbols marks all summaries for a VI as live. There is one
corner case where it was skipping marking an alias as live, that I
fixed. However, since the code earlier marked all copies of a preserved
GUID's VI as live, and each 'visit' marks all copies live, the only case
where this could make a difference is summaries that were marked live
when they were built initially, and that is only a few special compiler
generated symbols and inline assembly symbols, so it likely is never
provoked in practice.
Differential Revision: https://reviews.llvm.org/D84985
This patch implements the instruction definition and MC tests for the vector
string isolate instructions.
Differential Revision: https://reviews.llvm.org/D84197
https://reviews.llvm.org/D84909
Patch adds two new GICombinerRules, one for G_INTTOPTR and one for
G_PTRTOINT. The G_INTTOPTR elides ptr2int(int2ptr(x)) to a copy of x, if
the cast is within the same address space. The G_PTRTOINT elides
int2ptr(ptr2int(x)) to a copy of x. Patch additionally adds new combiner
tests for the AArch64 target to test these new combiner rules.
Patch by mkitzan
A function call can be replicated by optimizations like loop unroll and jump threading and the replicates end up sharing the sample nested callee profile. Therefore when it comes to merging samples for uninlined callees in the sample profile inliner, a callee profile can be merged multiple times which will cause an assert to fire.
This change avoids merging same callee profile for duplicate callsites by filtering out callee profiles with a non-zero head sample count.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D84997
Refactoring `Slice` class and function `createUniversalBinary` from
`llvm-lipo` into MachOUniversalWriter. This refactoring is necessary so
as to use the refactored code for creating universal binaries under
llvm-libtool-darwin.
Reviewed by alexshap, smeenai
Differential Revision: https://reviews.llvm.org/D84662
This patch makes the 'debug_aranges' entry optional. If the entry is
empty, yaml2obj will only emit the header for it.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D84921
In this patch, we add a helper function getDWARFEmitterByName(). This
function returns the proper DWARF section emitting method by the name.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D84952
In this patch, emitDebugPubnames(), emitDebugPubtypes(),
emitDebugGNUPubnames(), emitDebugGNUPubtypes() are added.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D85003
Scheduler will try to retrieve the offset and base addr to determine if two
loads/stores are disjoint memory access. PowerPC failed to handle this for
frame index which will bring extra memory dependency for loads/stores.
Reviewed By: jji
Differential Revision: https://reviews.llvm.org/D84308
This patch allows SimplifyPartiallyRedundantLoad work when
the branch condition was frozen.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84944
We can preserve make.implicit metadata in the split block if it is
guaranteed that after following the branch we always reach the block
where processing of null case happens, which is equivalent to
"initial condition must execute if the loop is entered".
Differential Revision: https://reviews.llvm.org/D84925
Reviewed By: asbirlea
Non-trivial unswitching simply moves terminator being unswitch from the loop
up to the switch block. It also preserves all metadata that was there. It might not
be a correct thing to do for `make.implicit` metadata. Consider case:
```
for (...) {
cond = // computed in loop
if (cond) return X;
if (p == null) throw_npe(); !make implicit
}
```
Before the unswitching, if `p` is null and we reach this check, we are guaranteed
to go to `throw_npe()` block. Now we unswitch on `p == null` condition:
```
if (p == null) !make implicit {
for (...) {
if (cond) return X;
throw_npe()
}
} else {
for (...) {
if (cond) return X;
}
}
```
Now, following `true` branch of `p == null` does not always lead us to
`throw_npe()` because the loop has side exit. Now, if we run ImplicitNullCheck
pass on this code, it may end up making the unswitch condition implicit. This may
lead us to turning normal path to `return X` into signal-throwing path, which is
not efficient.
Note that this does not happen during trivial unswitch: it guarantees that we do not
have side exits before condition being unswitched.
This patch fixes this situation by unconditional dropping of `make.implicit` metadata
when we perform non-trivial unswitch. We could preserve it if we could prove that the
condition always executes. This can be done as a follow-up.
Differential Revision: https://reviews.llvm.org/D84916
Reviewed By: asbirlea
is enabled.
When -sample-profile-merge-inlinee is enabled, new FunctionSamples may be
created during profile merge without GUIDToFuncNameMap being initialized.
That will occasionally cause compiler crash. The patch fixes it.
Differential Revision: https://reviews.llvm.org/D84994