This reverts commit 0345d88de654259ae90494bf9b015416e2cccacb.
A Google-internal backend uses EntrySU; we are looking into removing
the dependency on it.
Differential Revision: https://reviews.llvm.org/D88018
This commit was originally reverted because it was suspected to cause a crash,
but a reproducer did not surface.
A crash that was exposed by this change was fixed in 1d8f2e52925b.
This reverts the revert commit 0581c0b0eeba03da590d1176a4580cf9b9e8d1e3.
This adds lowering for f32 values using vmov.f16, which zeroes the
top bits whilst setting the lower bits to a pattern. This range of
values does not often come up, except where an f16 constant value has
been converted to an f32.
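To illustrate (a minimal sketch; the constant and name are mine), such an f32 has a zero top half and an f16 bit pattern in the low half:
```
; f16 1.0 has bit pattern 0x3C00 (== 15360); as an f32 bit pattern with the
; top 16 bits zeroed, it is exactly the kind of value this lowering covers:
%f = bitcast i32 15360 to float
```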
Differential Revision: https://reviews.llvm.org/D87790
This does not result in changes for any of the current tests, but it might
improve debug information in some cases.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D86522
SelectionDAGBuilder was inconsistently mangling values based on ABI
Calling Conventions when getting them through copyFromRegs in
SelectionDAGBuilder, causing duplicate value type conversions for
function arguments. The check for the mangling requirement was based
on the value's originating instruction and was performed outside of, and
in spite of, the regular Calling Convention Lowering.
The issue could be observed in a scenario such as:
```
%arg1 = load half, half* %const, align 2
%arg2 = call fastcc half @someFunc()
call fastcc void @otherFunc(half %arg1, half %arg2)
; Here, %arg2 was incorrectly mangled twice, as the CallConv data from
; the call to @someFunc() was taken into consideration for the check
; when getting the value for processing the call to @otherFunc(...),
; after the proper conversion had taken place when lowering the return
; value of the first call.
```
This patch fixes the issue by disregarding the Calling Convention
information for such copyFromRegs, making sure the ABI mangling is
properly contained in the Calling Convention Lowering.
This fixes Bugzilla #47454.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87844
LSR claims to preserve MemorySSA, but we also have to make sure it is preserved
when splitting critical edges. This can be done by passing MSSAU to
SplitCriticalEdge.
Fixes PR47557.
This is a follow-up to D86605. For a strict DAG FP node, if its FP
exception behavior metadata is 'ignore', it should have the nofpexcept
flag. But during custom lowering, this flag isn't passed down.
This is also seen on the X86 target.
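For reference, a strict FP operation with ignored exception behavior looks like this in IR (a minimal sketch inside a strictfp function; the names are illustrative):
```
; the !"fpexcept.ignore" metadata means the resulting strict DAG node should
; carry the nofpexcept flag, and custom lowering must keep it:
%r = call float @llvm.experimental.constrained.fadd.f32(float %a, float %b, metadata !"round.dynamic", metadata !"fpexcept.ignore")
```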
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87390
The scalar elements of the vXi1 build_vector will have been type legalized to i8 by padding with 0s. So we can't check for all ones. Instead we should just look at bit 0 of the constant.
Differential Revision: https://reviews.llvm.org/D87863
This adds simple constant folding for VMOVrh, to constant fold fp16
constants to integer values. It can help especially with soft calling
conventions, but some of the results are not optimal as we end up
loading using a vldr. This will be improved in a follow up patch.
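At the IR level, the analogous fold is a bitcast of an fp16 constant to its integer bit pattern (a hedged sketch; VMOVrh itself only appears during ISel):
```
; half 1.0 has bit pattern 0x3C00 == 15360:
%i = bitcast half 0xH3C00 to i16   ; folds to i16 15360
```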
Differential Revision: https://reviews.llvm.org/D87789
InstCombine likes to canonicalize comparisons of the form
X == C || X == C+1 into (X & -2) == C'. Make sure LVI can still
recover the value range from this. This can of course also be
useful for proper mask comparisons.
For the sake of clarity, the implementation goes through KnownBits
to compute the range.
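A small example of the canonicalized pattern (constants chosen for illustration):
```
; x == 8 || x == 9 is canonicalized by InstCombine to:
%m = and i32 %x, -2
%c = icmp eq i32 %m, 8
; when %c is true, LVI should recover the range [8, 10) for %x.
```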
Rewrite this in a way where the core logic is in a separate
function that is invoked with swapped operands. This makes it
easier to add handling for additional icmp patterns.
It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops, so I'm conservatively putting this inside X86ISelLowering.cpp
We do similar factorization folds in SimplifyUsingDistributiveLaws,
but that drops no-wrap properties. Propagating those optimally may
help solve:
https://llvm.org/PR47430
The propagation is all-or-nothing for these patterns: when all
3 incoming ops have nsw or nuw, the 2 new ops should have the
same no-wrap property:
https://alive2.llvm.org/ce/z/Dv8wsU
This also solves:
https://llvm.org/PR47584
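As an illustrative sketch (one shape such a factorization can take; the alive2 link above has the verified patterns), with all three incoming ops carrying nsw:
```
; (X << C) + (Y << C), all nsw:
%a = shl nsw i8 %x, 2
%b = shl nsw i8 %y, 2
%r = add nsw i8 %a, %b
; can be factored with nsw preserved on both new ops:
%s  = add nsw i8 %x, %y
%r2 = shl nsw i8 %s, 2
```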
The test (currently crashing) is reduced from the example provided
in the post-commit discussion in D87149.
Differential Revision: https://reviews.llvm.org/D87965
It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops, so I'm conservatively putting this inside X86ISelLowering.cpp
After moving, WidenedMask is in an undefined state, so reduce the scope of the variable so that it is reinitialized every iteration; we should still retain any memory allocation savings.
I want to export this function, and the current API was a bit
weird: it took an additional Alignment argument that didn't really
have anything to do with what the function does. Drop it, and
perform a max at the callsite.
Also rename it to tryEnforceAlignment().
Currently SCEVExpander creates inttoptr for non-integral pointers if,
for example, the base is a null constant. This results in invalid IR.
This patch changes InsertNoopCastOfTo to emit a GEP & bitcast to convert
to a non-integral pointer. First, a GEP of i8* null is generated and the
integral value is used as index. The GEP is then bitcasted to the target
type.
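A minimal sketch of the emitted pattern (assuming address space 3 is declared non-integral via 'ni:3' in the datalayout; the names are illustrative):
```
; instead of the invalid: %p = inttoptr i64 %offset to float addrspace(3)*
; emit a GEP off null with the integral value as index, then a bitcast:
%gep = getelementptr i8, i8 addrspace(3)* null, i64 %offset
%p = bitcast i8 addrspace(3)* %gep to float addrspace(3)*
```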
This was exposed by D71539.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87827
The output here may not be optimal (yet), but it should be
consistent for commuted operands (it was not before) and
correct. We can do better by checking FMF and NaN if needed.
Code in InstSimplify generally assumes that we have already
folded code like this, so it was not handling 2 constant
inputs by commuting consistently.
Currently, newer clang-format options cannot be used in .clang-format files if not all users can be forced to use an updated clang-format version.
This patch tries to solve this by adding an option to clang-format that allows unknown (newer) options to be ignored.
Differential Revision: https://reviews.llvm.org/D86137
We were breaking out of the switch, which falls into the default
implementation of SimplifyDemandedBitsForTargetNode, a wrapper
around computeKnownBits. So we ended up doing the recursion
and known bits calculation all over again. Instead we should return
with the known bits we calculated in the switch.
When address sanitizing a function, stack unpoisoning code is inserted before each ret instruction. However, if the ret instruction is preceded by a musttail call, such a transformation breaks the musttail call contract and generates invalid IR.
This patch fixes the issue by moving the insertion point prior to the musttail call if there is one.
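For context, the musttail contract requires the call to be immediately followed by the ret (optionally through a bitcast), so nothing may be inserted between them (a sketch; the names are illustrative):
```
; valid: unpoisoning code goes above the musttail call, not below it
; <stack unpoisoning code here>
%r = musttail call i32 @callee(i32 %x)
ret i32 %r
```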
Differential Revision: https://reviews.llvm.org/D87777
The IRInstructionData structs are a different representation of the
program. This list treats the program as if it were "flattened",
with this list as the only parent. This lets us easily create ranges of
instructions.
Differential Revision: https://reviews.llvm.org/D86969
This patch implements the vec_gen[b|h|w|d|q]m function prototypes in altivec.h
in order to utilize the Move to VSR with Mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82725
If the mask of a pdep or pext instruction is a shifted mask (i.e. one contiguous block of ones), we need at most one AND and one shift to represent the operation without the intrinsic. On all platforms I know of, this is faster than the pdep/pext.
The cost modelling for multiple contiguous blocks might be worth exploring in a follow-up, but it's not relevant for my current use case. It would almost certainly be a win on AMD processors, where these are really, really slow, though.
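For example (a sketch with a concrete contiguous mask; the general lowering derives the shift amount from the mask):
```
; pdep with the contiguous mask 0xF0 deposits the low 4 bits of %x into
; bits 4..7:
%r = call i32 @llvm.x86.bmi.pdep.32(i32 %x, i32 240)
; which is equivalent to one shift and one and:
%s  = shl i32 %x, 4
%r2 = and i32 %s, 240
```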
Differential Revision: https://reviews.llvm.org/D87861