It turned out that the D78704 included a private LLVM header, which is excluded
from the LLVM install target.
I'm substituting that `#include` with the public one by moving the necessary
`#define` into that. There was a discussion about this at D78704 and on the
cfe-dev mailing list.
I'm also placing a note to remind others of this pitfall.
Reviewed By: mgorny
Differential Revision: https://reviews.llvm.org/D84929
Scheduler will try to retrieve the offset and base addr to determine if two
loads/stores are disjoint memory access. PowerPC failed to handle this for
frame index which will bring extra memory dependency for loads/stores.
Reviewed By: jji
Differential Revision: https://reviews.llvm.org/D84308
This patch allows SimplifyPartiallyRedundantLoad work when
the branch condition was frozen.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84944
We can preserve make.implicit metadata in the split block if it is
guaranteed that after following the branch we always reach the block
where processing of null case happens, which is equivalent to
"initial condition must execute if the loop is entered".
Differential Revision: https://reviews.llvm.org/D84925
Reviewed By: asbirlea
Non-trivial unswitching simply moves terminator being unswitch from the loop
up to the switch block. It also preserves all metadata that was there. It might not
be a correct thing to do for `make.implicit` metadata. Consider case:
```
for (...) {
cond = // computed in loop
if (cond) return X;
if (p == null) throw_npe(); !make implicit
}
```
Before the unswitching, if `p` is null and we reach this check, we are guaranteed
to go to `throw_npe()` block. Now we unswitch on `p == null` condition:
```
if (p == null) !make implicit {
for (...) {
if (cond) return X;
throw_npe()
}
} else {
for (...) {
if (cond) return X;
}
}
```
Now, following `true` branch of `p == null` does not always lead us to
`throw_npe()` because the loop has side exit. Now, if we run ImplicitNullCheck
pass on this code, it may end up making the unswitch condition implicit. This may
lead us to turning normal path to `return X` into signal-throwing path, which is
not efficient.
Note that this does not happen during trivial unswitch: it guarantees that we do not
have side exits before condition being unswitched.
This patch fixes this situation by unconditional dropping of `make.implicit` metadata
when we perform non-trivial unswitch. We could preserve it if we could prove that the
condition always executes. This can be done as a follow-up.
Differential Revision: https://reviews.llvm.org/D84916
Reviewed By: asbirlea
is enabled.
When -sample-profile-merge-inlinee is enabled, new FunctionSamples may be
created during profile merge without GUIDToFuncNameMap being initialized.
That will occasionally cause compiler crash. The patch fixes it.
Differential Revision: https://reviews.llvm.org/D84994
Similar to what was recently done to ParseATTOperand. Make
ParseIntelOperand directly responsible for adding to the operand
vector instead of returning the operand. Return a bool for error.
Remove ErrorOperand since it is no longer used.
For consistency with legacy pass name.
Helps with 37 instances of "unknown pass name 'tbaa'" in check-llvm under NPM.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D84967
If an analysis is actually invalidated, there's already a log statement
for that: 'Invalidating analysis: FooAnalysis'.
Otherwise the statement is not very useful.
Reviewed By: asbirlea, ychen
Differential Revision: https://reviews.llvm.org/D84981
findAllocaForValue uses AllocaForValue to cache resolved values.
The function is used only to resolve arguments of lifetime
intrinsic which usually are not fare for allocas. So result reuse
is likely unnoticeable.
In followup patches I'd like to replace the function with
GetUnderlyingObjects.
Depends on D84616.
Differential Revision: https://reviews.llvm.org/D84617
Fix for the issue raised in https://github.com/rust-lang/rust/issues/74632.
The current heuristic for inserting LFENCEs uses a quadratic-time algorithm. This can apparently cause substantial compilation slowdowns for building Rust projects, where functions > 5000 LoC are apparently common.
The updated heuristic in this patch implements a linear-time algorithm. On a set of benchmarks, the slowdown factor for the generated code was comparable (2.55x geo mean for the quadratic-time heuristic, vs. 2.58x for the linear-time heuristic). Both heuristics offer the same security properties, namely, mitigating LVI.
This patch also includes some formatting fixes.
Differential Revision: https://reviews.llvm.org/D84471
After the recent change to the tuning settings for pentium4 to improve our default 32-bit behavior, I've decided to see about implementing -mtune support. This way we could have a default architecture CPU of "pentium4" or "x86-64" and a default tuning cpu of "generic". And we could change our "pentium4" tuning settings back to what they were before.
As a step to supporting this, this patch separates all of the features lists for the CPUs into 2 lists. I'm using the Proc class and a new ProcModel class to concat the 2 lists before passing to the target independent ProcessorModel. Future work to truly support mtune would change ProcessorModel to take 2 lists separately. I've diffed the X86GenSubtargetInfo.inc file before and after this patch to ensure that the final feature list for the CPUs isn't changed.
Differential Revision: https://reviews.llvm.org/D84879
This patch addes time trace functionality to have a better understanding
of the analysis times.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84980
This includes basic support for computeKnownBits on abs. I've left FIXMEs for more complicated things we could do.
Differential Revision: https://reviews.llvm.org/D84963
Just the obvious implementation that rewrites the result type. Also fix
warning from EXTRACT_SUBVECTOR legalization that triggers on the test.
Differential Revision: https://reviews.llvm.org/D84706
The -harness option enables new testing use-cases for llvm-jitlink. It takes a
list of objects to treat as a test harness for any regular objects passed to
llvm-jitlink.
If any files are passed using the -harness option then the following
transformations are applied to all other files:
(1) Symbols definitions that are referenced by the harness files are promoted
to default scope. (This enables access to statics from test harness).
(2) Symbols definitions that clash with definitions in the harness files are
deleted. (This enables interposition by test harness).
(3) All other definitions in regular files are demoted to local scope.
(This causes untested code to be dead stripped, reducing memory cost and
eliminating spurious unresolved symbol errors from untested code).
These transformations allow the harness files to reference and interpose
symbols in the regular object files, which can be used to support execution
tests (including fuzz tests) of functions in relocatable objects produced by a
build.
This allows clients to detect invalid transformations applied by JITLink passes
(e.g. inserting or removing symbols in unexpected ways) and terminate linking
with an error.
This change is used to simplify the error propagation logic in
ObjectLinkingLayer.
Avoid recursively calling copyPhysReg for AGPR handling. This was
dropping the necessary super register implicit defs to avoid liveness
verifier errors.
Pass the abs poison flag to the underlying ConstantRange
implementation, allowing CVP to simplify based on it.
Importantly, this recognizes that abs with poison flag is actually
non-negative...
This fixes an assertion failure that was being triggered in
SelectionDAG::getZeroExtendInReg(), where it was trying to extend the <2xi32>
to i64 (which should have been <2xi64>).
Fixes: rdar://66016901
Differential Revision: https://reviews.llvm.org/D84884
Determine whether switch edges are feasible based on range information,
and remove non-feasible edges lateron.
This does not try to determine whether the default edge is dead,
as we'd have to determine that the range is fully covered by the
cases for that.
Another limitation here is that we don't remove dead cases that
have the same successor as a live case. I'm not handling this
because I wanted to keep the edge removal based on feasible edges
only, rather than inspecting ranges again there -- this does not
seem like a particularly useful case to handle.
Differential Revision: https://reviews.llvm.org/D84270
Currently we skip alias sets with only reads or a single write and no
reads, but still add the pointers to the list of pointers in RtCheck.
This can lead to cases where we try to access a pointer that does not
exist when grouping checks. In most cases, the way we access
PositionMap masked that, as the value would default to index 0.
But in the example in PR46854 it causes a crash.
This patch updates the logic to avoid adding pointers for alias sets
that do not need any checks. It makes things slightly more verbose, by
first checking the numbers of reads/writes and bailing out early if we don't
need checks for the alias set.
I think this makes the logic a bit simpler to follow.
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D84608
ThinLTO is run using a single thread on Linux on Power. The
compute_thread_count() routine calls getHostNumPhysicalCores which
returns -1 by default, and so `MaxThreadCount is set to 1.
unsigned llvm::ThreadPoolStrategy::compute_thread_count() const {
int MaxThreadCount = UseHyperThreads
? computeHostNumHardwareThreads()
: sys::getHostNumPhysicalCores();
if (MaxThreadCount <= 0)
MaxThreadCount = 1;
…
}
Fix: provide custom implementation of getHostNumPhysicalCores for
Linux on Power and Linux on Z.
Reviewed By: Kai, uweigand
Differential Revision: https://reviews.llvm.org/D84764
LLVM selection dag assumes "switch" indices are pointer sized, which causes problems for our 32-bit br_table. The new function ensures 32-bit operands don't get unnecessarily extended, and 64-bit operands get truncated.
Note that the changes to the existing test test exactly that: the addition of -NEXT in 2 places ensures no extension is inserted (which the test previously ignored) and that the wrap is present (previously omitted in wasm64 mode).
Differential Revision: https://reviews.llvm.org/D84705
We are using undef on the indirect move source subreg and then
using implicit super-reg. This creates a problem in RA when
Greedy decides to split the register. It reassigns the implicit
super-reg but does not bother to change undef source because
it is really does not matter. The fix is to stop lying to RA and
drop undef flag.
This has also hit a problem in SIFoldOperands as it can fold
immediate into an indirect move since there is no undef flag
anymore. That results in multiple test failures, so added the
check for this case.
Differential Revision: https://reviews.llvm.org/D84899
Problem:
Right now, our "Running pass" is not accurate when passes are wrapped in adaptor because adaptor is never skipped and a pass could be skipped. The other problem is that "Running pass" for a adaptor is before any "Running pass" of passes/analyses it depends on. (for example, FunctionToLoopPassAdaptor). So the order of printing is not the actual order.
Solution:
Doing things like PassManager::Debuglogging is very intrusive because we need to specify Debuglogging whenever adaptor is created. (Actually, right now we're not specifying Debuglogging for some sub-PassManagers. Check PassBuilder)
This patch move debug logging for pass as a PassInstrument callback. We could be sure that all running passes are logged and in the correct order.
This could also be used to implement hierarchy pass logging in legacy PM. We could also move logging of pass manager to this if we want.
The test fixes looks messy. It includes changes:
- Remove PassInstrumentationAnalysis
- Remove PassAdaptor
- If a PassAdaptor is for a real pass, the pass is added
- Pass reorder (to the correct order), related to PassAdaptor
- Add missing passes (due to Debuglogging not passed down)
Reviewed By: asbirlea, aeubanks
Differential Revision: https://reviews.llvm.org/D84774
Get rid of all fixmes and base heuristic on `num-clustered-dwords`. The main intuition behind this is as
follows. The existing heuristic roughly summarizes as below:
* Assume, all the mem ops instructions participating in the clustering process, loads/stores same num bytes
* If num bytes loaded by each mem op is 4 bytes, then cluster at max 5 mem ops, that is at max 20 bytes
* If num bytes loaded by each mem op is 8 bytes, then cluster at max 3 mem ops, that is at max 24 bytes
* If num bytes loaded by each mem op is 16 bytes, then cluster at max 2 mem ops, that is at max 32 bytes
So, we need to make sure that the new heuristic do not completey deviate away from the above one, and it
properly handles both the sub-word loads and the wide loads.
Reviewed By: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D84354
In cases where the alignment of the datatype is smaller than
expected by the instruction, the address is aligned. The aligned
address is used for the load, but wasn't used for the store
conditional, which resulted in a run-time alignment exception.
We parse .arch so that some `.arch i386; .code32` code can assemble. It seems
that X86AsmParser does not do a good job tracking what features are needed to
assemble instructions. GNU as's x86 port supports a very wide range of .arch
operands. Ignore the operand for now.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D84900
We need to keep track of the alloca insertion point (which we already
communicate via the callback to the user) as we place allocas as well.
Reviewed By: fghanim, SouraVX
Differential Revision: https://reviews.llvm.org/D82470
The operand to these instructions is both input and output.
These are not yet emitted by the compiler and the assembler already
works fine, so can't test in this patch. But D75044 will use XPACI
and provide test coverage for this patch as well.
Differential Revision: https://reviews.llvm.org/D84298