Or else on optnone functions we get the following during instruction selection:
fatal error: error in backend: Cannot select: intrinsic %llvm.preserve.struct.access.index
Currently the -O0 pipeline doesn't properly run passes registered via
TargetMachine::registerPassBuilderCallbacks(), so don't add that RUN
line yet. That will be fixed after this.
Reviewed By: yonghong-song
Differential Revision: https://reviews.llvm.org/D89083
Based on offline discussions regarding D89139 and D88783 - we want to make sure targets aren't doing anything particularly dumb
Tests copied from aarch64 which has a mixture of general, legalization and special case tests
There are cases that generated OpenMP code consists of multiple,
consecutive OpenMP parallel regions, either due to high-level
programming models, such as RAJA, Kokkos, lowering to OpenMP code, or
simply because the programmer parallelized code this way. This
optimization merges consecutive parallel OpenMP regions to: (1) reduce
the runtime overhead of re-activating a team of threads; (2) enlarge the
scope for other OpenMP optimizations, e.g., runtime call deduplication
and synchronization elimination.
This implementation defensively merges parallel regions, only when they
are within the same BB and any in-between instructions are safe to
execute in parallel.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D83635
In the NPM, a pass cannot depend on another non-analysis pass. So pin
the test that tests that -lowerswitch is run automatically to legacy PM.
Reviewed By: sameerds
Differential Revision: https://reviews.llvm.org/D89051
Following on from D88890, this makes the newly added patterns
conditional on NoFP32Denormals. mad/mac f32 instructions always flush
denormals regardless of the MODE register setting, and I believe the
legacy variants do the same.
Differential Revision: https://reviews.llvm.org/D89123
There might be a better way to specify the pre-conditions,
but this is hopefully clearer than the way it was written:
https://rise4fun.com/Alive/Jhk3
Pre: C2 < 0 && isShiftedMask(C2) && (C1 == C1 & C2)
%a = and %x, C2
%r = add %a, C1
=>
%a2 = add %x, C1
%r = and %a2, C2
We cannot guarantee that the replacement expression is loop-invariant in
all AddRecs in the source expression. Use a rewriter that skips
AddRecExpr for now.
Fixes PR47776.
IV widening is sometimes a strictly harmful transform (some examples
of this are shown in tests 11, 12 in widen-loop-comp.ll). One of the
reasons of this is that sometimes SCEV fails to prove some facts after
part of guards has been widened.
Though each single such case looks like a bug that can be addressed,
it seems that disabling of IV widening may be profitable in some cases.
We want to have an option to do so. By default, existing behavior is
preserved and IV widening is on.
Summary: This patch is derived from D87384.
In this patch we expand the existing decomposition of mul-by-constant to be more general by implementing 2 patterns:
```
mul x, (2^N + 2^M) --> (add (shl x, N), (shl x, M))
mul x, (2^N - 2^M) --> (sub (shl x, N), (shl x, M))
```
The conversion will be trigged if the multiplier is a big constant that the target can't use a single multiplication instruction to handle. This is controlled by the hook `decomposeMulByConstant`.
More over, the conversion benefits from an ILP improvement since the instructions are independent. A case with the sequence like following also gets benefit since a shift instruction is saved.
```
*res1 = a * 0x8800;
*res2 = a * 0x8080;
```
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D88201
This removes the precompiled binary and rewrites test to use YAML.
After this change we'll have no more precompiled inputs in `llvm-readobj/ELF/Inputs`.
Differential revision: https://reviews.llvm.org/D89097
It is possible to get a fltSemantics of a particular Type,
but there is no way to produce a Type based on a
fltSemantics.
This adds the function Type::getFloatingPointTy, which
will return the appropriate floating point Type for a given
fltSemantics.
ConstantFP is modified to use this function instead of
implementing it itself. Also some minor refactors to use
Type::getFltSemantics instead of a hand-rolled version.
Differential Revision: https://reviews.llvm.org/D87512
This patch rewrites test case verify_die_ranges.s in YAML which helps
simplify the test.
Reviewed By: jhenderson, JDevlieghere, dblaikie
Differential Revision: https://reviews.llvm.org/D88200
This patch makes the opcode_base and the standard_opcode_lengths fields
of the line table optional. When both of them are not specified,
yaml2obj emits them according to the line table's version.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D88355
In `rescheduleKillAboveMI`, current implementation uses `SmallSet` to track reg's defs and uses. When comparing, use `SmallSet.count` to find out if it's clobbered or used. It's not correct if involving subregisters. This patch uses `regOverlapsSet` already used by `rescheduleMIBelowKill` to fix the issue.
Fixed https://bugs.llvm.org/show_bug.cgi?id=47707.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D88716
Annoyingly vectors aren't supported by shouldChangeType(), but we have precedents for always performing this on vector types (e.g. narrowBinOp).
Differential Revision: https://reviews.llvm.org/D89067
The IRTranslator depends on the branch probability info pass when the
optimization level is different than None and it depends all the time on
the StackProtector pass.
We have to explicitly call out pass dependencies otherwise the pass manager
may not be able to schedule the IRTranslator.
Before this patch, we were lucky because previous passes depend on the branch
probability info pass (like the Global Variable Optimization) and the stack
protector pass is initialized in initializeCodeGen.
However, if the target has a custom pipeline without any passes like Global
Variable Optimization, the pipeline creation will fail, at least because of
the branch probability info pass dependency (it is unlikely that
initializeCodeGen is not called).
This patch adds the missing dependencies to the IRTranslator.
Differential Revision: https://reviews.llvm.org/D89063
Use cast<> as we immediately dereference the pointer afterwards - cast<> will assert if we fail.
Prevents clang static analyzer warning that we could deference a null pointer.
Pre-conditions seem to be optimal, but we don't need a use check
because we are only replacing an add with a sub.
https://rise4fun.com/Alive/hzN
Pre: (~C1 | C2 == -1) && isPowerOf2(C2+1)
%m = and i8 %x, C1
%f = xor i8 %m, C2
%r = add i8 %f, C3
=>
%r = sub i8 C2 + C3, %m
In LowerEmscriptenEHSjLj, `longjmp` used to be replaced with
`emscripten_longjmp_jmpbuf(jmp_buf*, i32)`, which will eventually be
lowered to `emscripten_longjmp(i32, i32)`. The reason we used two
different names was because they had different signatures in the IR
pass.
D88697 fixed this by only using `emscripten_longjmp(i32, i32)` and
adding a `ptrtoint` cast to its first argument, so
```
longjmp(buf, 0)
```
becomes
```
emscripten_longjmp((i32)buf, 0)
```
But this assumed all uses of `longjmp` was a direct call to it, which
was not the case. This patch handles indirect uses of `longjmp` by
replacing
```
longjmp
```
with
```
(i32(*)(jmp_buf*, i32))emscripten_longjmp
```
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D89032