Previously we emitted an fmadd and an fmadd+fneg and combined them with a shufflevector. But this doesn't follow the correct exception behavior for the unselected elements, so the backend can't merge them into the fmaddsub/fmsubadd instructions.
This patch restores the fmaddsub intrinsics so we don't have two arithmetic operations. We lose some optimization opportunities in the non-strict FP case, but I don't think this is a big loss. If someone gives us a test case we can look into adding instcombine/dagcombine improvements. I'd rather not have the frontend do completely different things for strict and non-strict.
This still has problems because target-specific intrinsics don't support strict semantics yet. We also still have all of the problems with masking. But we at least generate the right instruction in constrained mode now.
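For reference, here is a minimal sketch of the old IR pattern this patch replaces (the operand order and shuffle mask are illustrative, not copied from Clang's actual output):

```
; fmaddsub subtracts c in the even elements and adds c in the odd elements.
; The old lowering materialized both results and blended them:
%add  = call <4 x float> @llvm.fma.v4f32(<4 x float> %a, <4 x float> %b, <4 x float> %c)
%negc = fneg <4 x float> %c
%sub  = call <4 x float> @llvm.fma.v4f32(<4 x float> %a, <4 x float> %b, <4 x float> %negc)
%res  = shufflevector <4 x float> %sub, <4 x float> %add, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
```

In the strict-FP case the unselected lanes of each fma may still raise exceptions, which is why the backend cannot legally fold this pattern into fmaddsub.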
Differential Revision: https://reviews.llvm.org/D74268
GCC 9.2 seems to incorrectly issue a warning about an out-of-bounds
access. This situation cannot actually occur.
Differential Revision: https://reviews.llvm.org/D75071
This patch adds bindings to C and Go for
addCoroutinePassesToExtensionPoints, which is used to add coroutine
passes to the correct locations in PassManagerBuilder.
Differential Revision: https://reviews.llvm.org/D51642
This is the second patch as part of https://bugs.llvm.org/show_bug.cgi?id=36544
This merges in the ConstantSDNode variant of FoldConstantArithmetic. After this, I will begin merging in FoldConstantVectorArithmetic.
I've ensured this patch builds and passes all lit tests in both Windows and Linux environments.
Patch by @justice_adams (Justice Adams)
Differential Revision: https://reviews.llvm.org/D74881
Simply by implementing a few functions, I was able to correctly
disassemble a much larger number of instructions.
Differential Revision: https://reviews.llvm.org/D74045
Not all operands are correctly disassembled at the moment. This means
that some machine instructions won't have all the necessary operands
set.
To avoid asserting, print an error instead until the necessary support
has been implemented.
Differential Revision: https://reviews.llvm.org/D73958
A number of multiplication instructions (muls, mulsu, fmul, fmuls,
fmulsu) had the wrong register class for an operand. This resulted in
the wrong register being used for the instruction.
Example:
target datalayout = "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr-atmel-none"
define i16 @sliceAppend(i16, i16, i16, i16, i16, i16) addrspace(1) {
  %d = mul i16 %0, %5
  ret i16 %d
}
Before this patch, the first instruction would be muls r24, r31. The
r31 should have been r15 if you look at the intermediate forms during
instruction selection / register allocation, but the generated
instruction used r31. After this patch, an extra movw is inserted to get
%5 in range for muls.
To make sure this bug is fixed everywhere, I checked all instructions
and found that most multiplication instructions suffered from this bug,
which I have fixed with this patch. No other instructions appear to be
affected.
Differential Revision: https://reviews.llvm.org/D74281
Summary:
Function descriptor csect on AIX should be 4-byte aligned instead of 1-byte aligned.
Reviewer: daltenty
Differential Revision: https://reviews.llvm.org/D74974
Pass plugins introduced in D61446 do not support dynamic linking on
Windows, hence the option LLVM_${name_upper}_LINK_INTO_TOOLS can only
work when set to "ON". Currently it defaults to "OFF", which makes such
plugins inoperable by default on Windows. Change the default for
subprojects to follow LLVM_ENABLE_PROJECTS.
Reviewed By: serge-sans-paille, MaskRay
Differential Revision: https://reviews.llvm.org/D72372
And add llvm-go back to the test dependencies.
No longer necessary now that llvm-go has been brought back.
This reverts commit e8f8873da5eaad187f82dad78ebdb3ab3df22b36.
I'm hoping to begin improving shuffle combining across different vector sizes, but before that we must ensure that all existing getTargetShuffleInputs calls bail if the inputs aren't the same size.
Extends the existing support for spilling and restoring the condition
register to the linkage area for 32-bit targets, and enables it for AIX.
Differential Revision: https://reviews.llvm.org/D74349
D74976 will handle larger vector types, but since SLM doesn't support AVX+, we will always be extracting from 128-bit vectors, so we don't need to scale the cost.
This adds infrastructure to print and parse MIR MachineOperand comments.
The motivation for the ARM backend is to print condition code names instead of
magic constants that are difficult to read (for human beings). For example,
instead of this:
dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14, $noreg
t2Bcc %bb.4, 0, killed $cpsr
we now print this:
dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14 /* CC::always */, $noreg
t2Bcc %bb.4, 0 /* CC::eq */, killed $cpsr
This shows that MachineOperand comments are enclosed between /* and */. In this
example, the EOR instruction is not conditionally executed (i.e. it is "always
executed"), which is encoded by the 14 immediate machine operand. Thus, now
this machine operand has /* CC::always */ as a comment. The 0 on the next
conditional branch instruction represents the equal condition code, thus now
this operand has /* CC::eq */ as a comment.
As it is a comment, the MI lexer/parser completely ignores it. The benefit is
that this keeps the change in the lexer extremely minimal and no target
specific parsing needs to be done. The changes on the MIPrinter side are also
minimal, as there is only one target hook that is used to create the machine
operand comments.
Differential Revision: https://reviews.llvm.org/D74306
Summary:
Implements the @llvm.aarch64.sve.dupq.lane intrinsic.
As specified in the ACLE, the behaviour of:
svdupq_lane_u64(data, index)
...is identical to:
svtbl(data, svadd_x(svptrue_b64(),
                    svand_x(svptrue_b64(), svindex_u64(0, 1), 1),
                    index * 2))
If the index is in the range [0,3], the operation is equivalent
to a single DUP (.q) instruction.
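At the IR level, a call to the intrinsic looks roughly like this (a sketch for the 64-bit element variant; the exact name mangling is an assumption):

```
; Broadcast the 128-bit quadword selected by %index to every quadword of the result.
%r = call <vscale x 2 x i64> @llvm.aarch64.sve.dupq.lane.nxv2i64(<vscale x 2 x i64> %data, i64 %index)
```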
Reviewers: sdesmalen, c-rhodes, cameron.mcinally, efriedma, dancgr, rengolin
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74734
Change the way that we remove the redundant iteration count code in
the presence of IT blocks. collectLocalKilledOperands has been
introduced to scan an instruction's operands, collecting the killed
instructions and then visiting them too. This is used to delete the
code in the preheader which calculates the iteration count. We also
track any IT blocks within the preheader and, if we remove all the
instructions from the IT block, we also remove the IT instruction.
isSafeToRemove is used to remove any redundant uses of the iteration
count within the loop body.
Differential Revision: https://reviews.llvm.org/D74975
Summary:
This patch adds intrinsics and ISelDAG nodes for signed
and unsigned saturating fixed-point division:
```
llvm.sdiv.fix.sat.*
llvm.udiv.fix.sat.*
```
These intrinsics perform scaled, saturating division
on two integers or vectors of integers. They are
required for the implementation of the Embedded-C
fixed-point arithmetic in Clang.
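For example, a saturating division of two Q15 fixed-point values (scale 15) looks like this (a sketch following the LangRef form of the intrinsic):

```
declare i16 @llvm.sdiv.fix.sat.i16(i16, i16, i32)

; %a and %b hold Q15 fixed-point values; the third operand is the scale.
; On overflow the result saturates to [-32768, 32767] instead of wrapping.
%q = call i16 @llvm.sdiv.fix.sat.i16(i16 %a, i16 %b, i32 15)
```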
Reviewers: bjope, leonardchan, craig.topper
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71550
Summary:
There is no need to write out gcdas when forking because we can just reset the counters in the child process.
Say a counter is N before the fork; after the fork, it is reset to 0 in the child process.
In the parent process the counter is then incremented by P, and in the child process it is incremented by C.
When the dump is run at exit, the parent process will dump N+P for the given counter and the child process will dump 0+C, so when the gcdas are merged the resulting counter will be N+P+C.
As for the exec** functions, since the current process is replaced by another one, there is no need to reset the counters; we just write out the gcdas, since the counters would otherwise be lost.
To avoid leaving the lists in a bad state, we lock them during the fork and the flush (if called explicitly), and we lock them whenever an element is added.
Reviewers: marco-c
Reviewed By: marco-c
Subscribers: hiraditya, cfe-commits, #sanitizers, llvm-commits, sylvestre.ledru
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D74953
Summary:
This should produce slightly better error messages in case of failures.
Only slightly, because this code was pretty careful about that to begin
with -- I've seen code which does much worse.
Reviewers: jhenderson, dblaikie
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74899
Summary:
The type used to represent functional units in MC is
'unsigned', which is 32 bits wide. This is currently
not a problem in any upstream target as no one seems
to have hit the limit on this yet, but in our
downstream one, we need to define more than 32
functional units.
Increasing the size does not seem to cause a huge
size increase in the binary (an llc debug build went
from 1366497672 to 1366523984 bytes, a difference of about 26k),
so perhaps it would be acceptable to have this patch
applied upstream as well.
Subscribers: hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71210
This optimization bypasses GOT loads and calls/branches through stubs when the
ultimate target of the access/branch is found to be within range of the
reference.
Extra debugging output is also added to the generic JITLink algorithm and
basic GOT and Stubs builder utility to aid debugging.
The gather intrinsics use a floating-point mask when the result
type is FP, but we call DemandedBits on the mask assuming it's an
integer type. We also use integer types when we create it from
generic IR. So add a bitcast to the intrinsic path to guarantee
the integer type.
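A minimal sketch of the kind of cast involved (the types here are illustrative):

```
; The mask arrives as an FP vector when the gather result is FP; bitcast it
; so DemandedBits always sees an integer-typed mask.
%mask.int = bitcast <8 x float> %mask to <8 x i32>
```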
The type profile we use for the isel patterns lied about how
many operands the gather/scatter node has, in order to skip the index
and scale operands. This allowed us to expand the baseptr
operand into base, displacement, and segment and then merge
the index and scale with them in the final instruction during
isel. This is kind of a hack that relies on isel not checking the
number of operands at all.
This commit switches to custom isel where we can manage this
directly without relying on holes in the isel checking.
Summary:
With the new pass manager, IR printing always prints out all functions in a module, even with -filter-print-funcs specified. This change fixes that. There are two exceptions, however: with the user-specified wildcard switch -filter-print-funcs=* or with -print-module-scope, the IR of all functions should still be printed.
Test Plan:
make check-clang
make check-llvm
Reviewers: wenlei
Reviewed By: wenlei
Subscribers: wenlei, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74814
Leave the gather/scatter subclasses, but make them inherit from
MemIntrinsicSDNode and delete their constructor and destructor.
This way we can still have the getIndex, getMask, etc. convenience
functions.
In order to build the Linux kernel, the back chain must be supported with
packed-stack. The back chain is then stored topmost in the register save
area.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D74506
This version fixes a buildbot failure caused by picking the wrong insert
point for XORs. We cannot pick the XOR binary operator as insert point,
as it is not guaranteed that both input operands for the overflow
intrinsic are defined before it.
This reverts the revert commit
c7fc0e5da6c3c36eb5f3a874a6cdeaedb26856e0.
A question about this behavior came up on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139003.html
...and as part of backend improvements in D73978.
We decided not to implement a more general change that would have
folded any FP binop with a nearly arbitrary constant + undef operand
to undef, because that is not theoretically correct (even if it is
practically correct).
This is the SDAG equivalent of the IR change in D74713.
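For illustration, a pattern of the kind in question (a sketch, not taken from the patch itself):

```
; Folding this to undef is practically harmless but not theoretically
; correct: not every result value is reachable for some choice of the
; undef operand (NaN payloads and signed zeros are the usual
; counterexamples), so the result is not truly arbitrary.
%r = fadd float undef, 1.0
```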
Add a map from BasicBlocks to overlap intervals. For partial writes, we
can keep track of those in IOLs. We only add candidates that are valid
for elimination.
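As a sketch of the kind of partial overwrites these intervals track (an assumed example, not from the patch):

```
; The i64 store becomes dead piecewise: the two i32 stores together
; overwrite bytes [0,4) and [4,8), so DSE records the overlapped
; intervals and can eventually eliminate the i64 store.
define void @f(i64* %p) {
  store i64 0, i64* %p
  %lo = bitcast i64* %p to i32*
  store i32 1, i32* %lo
  %hi = getelementptr i32, i32* %lo, i64 1
  store i32 2, i32* %hi
  ret void
}
```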
Reviewers: dmgreen, bryant, asbirlea, Tyker
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D73757