Add llvm.call.preallocated.{setup,arg} intrinsics.
Add "preallocated" operand bundle which takes a token produced by llvm.call.preallocated.setup.
Add "preallocated" parameter attribute, which is like byval but without the copy.
Verifier changes for these IR constructs.
See https://github.com/rnk/llvm-project/blob/call-setup-docs/llvm/docs/CallSetup.md
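As a rough illustration of how the pieces fit together, here is a hedged IRBuilder sketch. The callee use_foo, its argument type FooTy, and the surrounding setup are invented for the example, and the preallocated parameter attribute that also belongs on the call-site argument (plus the attribute the arg intrinsic call needs) is omitted; see the linked doc for the exact rules.
```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"

using namespace llvm;

// Roughly emits:
//   %t = call token @llvm.call.preallocated.setup(i32 1)
//   %a = call i8* @llvm.call.preallocated.arg(token %t, i32 0)
//   ... initialize the argument through %a ...
//   call void @use_foo(%Foo* %arg) [ "preallocated"(token %t) ]
// (attributes required by the verifier are left out of this sketch)
static void emitPreallocatedCall(IRBuilder<> &B, Module &M,
                                 FunctionCallee UseFoo, Type *FooTy) {
  Function *Setup =
      Intrinsic::getDeclaration(&M, Intrinsic::call_preallocated_setup);
  Function *ArgFn =
      Intrinsic::getDeclaration(&M, Intrinsic::call_preallocated_arg);

  // One preallocated argument for the eventual call.
  Value *Token = B.CreateCall(Setup, {B.getInt32(1)});
  // Pointer to argument slot 0, cast to the argument's type.
  Value *Slot = B.CreateCall(ArgFn, {Token, B.getInt32(0)});
  Value *TypedSlot = B.CreateBitCast(Slot, FooTy->getPointerTo());

  // The call consuming the preallocated space carries the token in a
  // "preallocated" operand bundle.
  B.CreateCall(UseFoo, {TypedSlot},
               {OperandBundleDef("preallocated", std::vector<Value *>{Token})});
}
```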
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74651
The code assumed that zero-extending the integer constant to the
designated alloc size would be fine even for BE targets, but that's not
the case as that pulls in zeros from the MSB side while we actually
expect the padding zeros to go after the LSB.
I've changed the codepath handling the constant integers to use the
store size for both small (smaller than u64) and big constants, and then
add zero padding right after that.
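For illustration, a standalone sketch of the intended byte layout on a big-endian target (not the actual LLVM code path, just the invariant it now maintains):
```
#include <cassert>
#include <cstdint>
#include <vector>

// Serialize a constant integer into an allocation of AllocSize bytes for a
// big-endian target. Only StoreSize bytes of the value are meaningful; the
// remaining AllocSize - StoreSize bytes are zero padding placed *after* the
// value. A plain zero-extension to AllocSize would instead put the zeros in
// front of the value on a big-endian target, which is the bug described above.
std::vector<uint8_t> serializeBEConstant(uint64_t Value, unsigned StoreSize,
                                         unsigned AllocSize) {
  assert(StoreSize <= sizeof(Value) && StoreSize <= AllocSize);
  std::vector<uint8_t> Buf(AllocSize, 0);
  for (unsigned I = 0; I < StoreSize; ++I)
    // Big-endian: most significant byte of the store-sized value first.
    Buf[I] = uint8_t(Value >> (8 * (StoreSize - 1 - I)));
  // Bytes [StoreSize, AllocSize) stay zero: padding after the LSB.
  return Buf;
}
```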
Differential Revision: https://reviews.llvm.org/D78011
SmallVector currently uses 32bit integers for size and capacity to reduce
sizeof(SmallVector). This limits the number of elements to UINT32_MAX.
For a SmallVector<char>, this limits the SmallVector size to only 4GB.
Buffering bitcode output uses SmallVector<char>, but needs >4GB output.
This changes SmallVector size and capacity to conditionally use word-size
integers if the element type is small (<4 bytes). For larger element types,
the vector size can reach ~16GB with 32bit size.
Making this conditional on the element type provides both the smaller
sizeof(SmallVector) for larger types which are unlikely to grow so large,
and supports larger capacities for smaller element types.
This recommit fixes the same template being instantiated twice on platforms
where uintptr_t is the same as uint32_t.
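The selection can be expressed as a conditional type alias; a sketch of the idea (close in spirit to the change, but not guaranteed to match the in-tree code exactly):
```
#include <cstdint>
#include <type_traits>

// Pick the size/capacity type based on the element size: small elements
// (< 4 bytes) on 64-bit hosts get a word-sized size type so they can exceed
// UINT32_MAX elements; everything else keeps 32-bit size/capacity fields to
// keep sizeof(SmallVector) small.
template <class T>
using SmallVectorSizeType =
    typename std::conditional<sizeof(T) < 4 && sizeof(void *) >= 8, uint64_t,
                              uint32_t>::type;

static_assert(sizeof(void *) < 8 ||
                  std::is_same<SmallVectorSizeType<char>, uint64_t>::value,
              "char vectors get a 64-bit size type on 64-bit hosts");
static_assert(std::is_same<SmallVectorSizeType<uint64_t>, uint32_t>::value,
              "larger element types keep the 32-bit size type");
```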
We may want to identify sequences that are not
reductions, but still qualify as load-combines
in the back-end, so make most of the body a
helper function.
"unskipableSimplifyCode()" was added to handle unsafe BL8_NOTOC instruction
when TOC was not completely removed. The function is not needed after confirming
TOC pointer is not used in a function that uses PC-Relative addressing.
Differential Revision: https://reviews.llvm.org/D78517
like .cfi_restore"
Insert .cfi_offset/.cfi_register when IncomingCSRSaved of the current block
is larger than OutgoingCSRSaved of its previous block.
Original commit message:
https://reviews.llvm.org/D42848 only handled CFA related cfi directives but
didn't handle CSR related cfi. The patch adds the CSR part. Basically it reuses
the framework created in D42848. For each basic block, the patch tracks which
CSRs have been saved at its CFG predecessors' exits, and compares that CSR
set with the set at its previous basic block's exit (the previous block is the
block laid out before the current block). If the saved CSR set at the previous
basic block's exit is larger, .cfi_restore will be inserted.
The patch also generates proper .cfi_restore directives in the epilogue to make
sure the saved CSR set is consistent across the incoming edges of each block.
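As a standalone sketch of the set arithmetic described above (llvm::BitVector based, with invented helper names rather than the patch's actual code):
```
#include "llvm/ADT/BitVector.h"

using namespace llvm;

struct CSRFixup {
  BitVector NeedRestore; // CSRs to emit .cfi_restore for
  BitVector NeedSave;    // CSRs to emit .cfi_offset/.cfi_register for
};

// IncomingCSRSaved: CSRs saved on entry to the current block (intersection of
// its CFG predecessors' outgoing sets). PrevOutgoingCSRSaved: CSRs saved at
// the exit of the block laid out immediately before the current block. Both
// BitVectors index the same set of callee saved registers.
CSRFixup computeCSRFixups(const BitVector &IncomingCSRSaved,
                          const BitVector &PrevOutgoingCSRSaved) {
  CSRFixup Fix;
  // Saved at the previous block's exit but not expected here: restore them.
  Fix.NeedRestore = PrevOutgoingCSRSaved;
  Fix.NeedRestore.reset(IncomingCSRSaved);
  // Expected here but not saved at the previous block's exit: re-emit the
  // corresponding .cfi_offset/.cfi_register directives.
  Fix.NeedSave = IncomingCSRSaved;
  Fix.NeedSave.reset(PrevOutgoingCSRSaved);
  return Fix;
}
```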
Differential Revision: https://reviews.llvm.org/D74303
Summary:
Reviewing failures identified in D78586, I was finding the identifiers
for these iterators hard to read.
Reviewers: efriedma, MaskRay, jyknight
Reviewed By: MaskRay
Subscribers: hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78849
All avx512 truncate instructions except vXi64->vXi32 are 2 uops
on port 5, so raise their costs to 2, except when we have an
earlier, faster sequence like pshufb for 128-bit input vectors.
Add a lower cost of 3 for v16i16->v16i8 with avx512f, where we can
extend to v16i32 then truncate. And a cost of 2 for avx512bw with
and without avx512vl. There we can use vpmovwb with either a ymm
or zmm input. Both of these beat masking, splitting, and using
packuswb which is our avx/avx2 codegen.
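In the X86 TTI these decisions end up as type-conversion cost-table entries, roughly like the following (illustrative entries mirroring the costs described above, not a verbatim copy of the tables):
```
#include "llvm/CodeGen/CostTable.h"
#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/Support/MachineValueType.h"

using namespace llvm;

// {Opcode, Dst, Src, Cost}: the 2-uop port-5 truncates cost 2, except
// vXi64->vXi32; v16i16->v16i8 costs 3 with plain AVX512F (extend to v16i32,
// then truncate) and 2 with AVX512BW (vpmovwb on a ymm or zmm input).
static const TypeConversionCostTblEntry ExampleAVX512FTbl[] = {
    {ISD::TRUNCATE, MVT::v16i8, MVT::v16i16, 3},
    {ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, 2},
    {ISD::TRUNCATE, MVT::v8i32, MVT::v8i64, 1},
};

static const TypeConversionCostTblEntry ExampleAVX512BWTbl[] = {
    {ISD::TRUNCATE, MVT::v16i8, MVT::v16i16, 2},
};
```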
The tl;dr story is that this causes jumps in the emitted line
tables, even at `-O0`. We could at some point consider more fancy
solutions to preserve locations, but it doesn't seem to be worth
the effort for now.
<rdar://problem/62460788>
Differential Revision: https://reviews.llvm.org/D78947
Tail Calls were initially disabled for PC Relative code because it was not safe
to make certain assumptions about the tail calls (namely that all compiled
functions no longer used the TOC pointer in R2). However, once all of the
TOC pointer references have been removed it is safe to tail call everything
that was tail called prior to the PC relative additions as well as a number of
new cases.
For example, it is now possible to tail call indirect functions as there is no
need to save and restore the TOC pointer for indirect functions if the caller
is marked as may clobber R2 (st_other=1). For the same reason it is now also
possible to tail call functions that are external.
Differential Revision: https://reviews.llvm.org/D77788
With a fix to unittests/Support/TarWriterTest.cpp
This makes lld's --reproduce output more compatible with tar 1.13 and
before. This is a very old version of tar, but it's the version in
both gnuwin and unxutils, and the cost of supporting them is very
low, so we might as well just do that.
https://bugs.chromium.org/p/chromium/issues/detail?id=1073524#c21
and onward has more details.
Differential Revision: https://reviews.llvm.org/D78945
D63847 added `MCInstrAnalysis::evaluateMemoryOperandAddress()`. This patch
leverages the feature to print the target addresses for evaluable instructions.
```
-400a: movl 4080(%rip), %eax
+400a: movl 4080(%rip), %eax # 5000 <data1>
```
This patch also deletes the `MIA->isCall(Inst) || MIA->isUnconditionalBranch(Inst) || MIA->isConditionalBranch(Inst)`
check that was used to guard `MCInstrAnalysis::evaluateBranch()`.
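A hedged sketch of how a disassembler loop might use the hook (the printing helper and its parameters are invented; symbolizing the address into the `<data1>`-style name is left out):
```
#include <cstdint>
#include "llvm/ADT/Optional.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstrAnalysis.h"
#include "llvm/Support/Format.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

// Append "# <target address>" to the disassembly line when the instruction's
// memory operand can be folded to a constant address (e.g. rip-relative
// loads); Addr is the instruction's address and Size its byte length.
static void printMemOperandTarget(const MCInstrAnalysis &MIA,
                                  const MCInst &Inst, uint64_t Addr,
                                  uint64_t Size, raw_ostream &OS) {
  if (Optional<uint64_t> Target =
          MIA.evaluateMemoryOperandAddress(Inst, Addr, Size))
    OS << " # " << format_hex_no_prefix(*Target, 1);
}
```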
Reviewed By: jhenderson, skan
Differential Revision: https://reviews.llvm.org/D78776
Profile and profile summary are usually read only once and then annotated
on IR. The profile summary metadata on IR should include the value of the
newly added partial profile flag, so that compilation phases like the ThinLTO
postlink can get the full set of profile information.
Differential Revision: https://reviews.llvm.org/D78310
Summary:
When generating code for the LLVM IR zeroinitialiser operation, if
the vector type is scalable we should be using SPLAT_VECTOR instead
of BUILD_VECTOR.
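A minimal sketch of the lowering decision (node construction only; the surrounding legalization context and integer element types are assumed):
```
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// Build an all-zeros vector of type VT. A scalable vector has no fixed
// element count, so a BUILD_VECTOR of zeros cannot be formed; splat a single
// zero element with SPLAT_VECTOR instead.
static SDValue getZeroVector(SelectionDAG &DAG, const SDLoc &DL, EVT VT) {
  SDValue Zero = DAG.getConstant(0, DL, VT.getVectorElementType());
  if (VT.isScalableVector())
    return DAG.getNode(ISD::SPLAT_VECTOR, DL, VT, Zero);
  return DAG.getSplatBuildVector(VT, DL, Zero);
}
```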
Differential Revision: https://reviews.llvm.org/D78636
There are some intrinsics like this that currently block tail
predication, but should be fine. This allows fma through, as it is the one
that I ran into. There may be others that need the same treatment, but
I've only done this one here.
Differential Revision: https://reviews.llvm.org/D78385
The insert(truncate/extend(extract(vec0,c0)),vec1,c1) case in rGacbc5ede99 wasn't combining the 'mineltsize' with the src vector elt size, which may be smaller due to implicit extension during extraction.
Reduced from test case provided by @mstorsjo
hasNoSchedulingInfo should be used for Pseudos and other instructions
that are never expected to be scheduled. This removes the flag from new
ARM instructions, instead fixing the A57 schedule by marking the related
architecture features as unsupported.
When compiling for an armv5te CPU from clang, the +dsp attribute is set.
This meant we could try to generate qadd8 instructions where we would
end up having no pattern. I've changed the condition here to be hasV6Ops
&& hasDSP, which is what other parts of ARMISelLowering seem to use for
similar instructions.
Fixed PR45677.
Differential Revision: https://reviews.llvm.org/D78877
This is an NFC patch for D77319. The idea is to hide getNegatibleCost inside getNegatedExpression(),
have it return null if the cost is expensive, and add some helper functions for ease of use. It also
renames the old getNegatedExpression to negateExpression to avoid a semantic conflict.
Reviewed By: RKSimon
Differential revision: https://reviews.llvm.org/D78291
We're currently getting this from the default implementation. But
I don't like how the cost model came to this answer and I might
be making some changes there.
Summary:
Extending the Function::viewCFG prototypes to allow printing block probability info in the form of .dot files during debugging.
Also avoids an access violation (AV) when no BFI/BPI is available.
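For reference, a hedged usage sketch of the extended prototype (the overload shape and parameter order are assumptions based on the description above):
```
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/IR/Function.h"

using namespace llvm;

// Dump F's CFG as a .dot file, annotated with frequency/probability info when
// the analyses are available. Passing null BFI/BPI should fall back to the
// plain CFG instead of crashing.
static void debugViewCFG(const Function &F, const BlockFrequencyInfo *BFI,
                         const BranchProbabilityInfo *BPI) {
  F.viewCFG(/*ViewCFGOnly=*/false, BFI, BPI);
}
```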
Reviewers: wenlei, davidxl, knaumov
Reviewed By: wenlei, davidxl
Subscribers: MaskRay, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77978
When folding tail, branch taken count is computed during initial VPlan execution
and recorded to be used by the compare computing the loop's mask. This recording
should directly set the State, instead of reusing the Value2VPValue mapping, which
serves original Values present prior to vectorization.
The branch taken count may be a constant Value, which may be used elsewhere in
the loop; trying to employ Value2VPValue for both leads to the issue reported in
https://reviews.llvm.org/D76992#inline-721028
Differential Revision: https://reviews.llvm.org/D78847
This means AttrBuilder will always create a sorted set of attributes and
we can skip the sorting step. Sorting attributes is surprisingly
expensive, and I recently made it worse by making it use array_pod_sort.
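The general pattern, as a standalone illustration (not the actual AttrBuilder code): keep the container sorted on insertion so consumers that need a sorted list never have to sort it.
```
#include <algorithm>
#include <vector>

// Minimal sorted-set sketch: insert in order with a binary search instead of
// appending and sorting later. The per-insert cost is O(log n) comparisons
// plus a shift, and the "produce a sorted list" step becomes free.
template <typename T> class SortedSet {
  std::vector<T> Storage;

public:
  void insert(const T &V) {
    auto It = std::lower_bound(Storage.begin(), Storage.end(), V);
    if (It == Storage.end() || *It != V) // it is a set: skip duplicates
      Storage.insert(It, V);
  }
  const std::vector<T> &items() const { return Storage; } // already sorted
};
```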
Followup to the PR45604 fix at rGe71dd7c011a3 where we disabled most of these cases.
By creating the shuffle at the byte level we can handle any extension/truncation as long as we track how small the scalar got and assume that the upper bytes will need to be zero.
Integer ranges can be used for loaded/stored values. Note that widening
can be disabled for loads/stores, as we only rely on instructions that
cause continued increases to ranges to be widened (like binary
operators).
Reviewers: efriedma, mssimpso, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78433
The change in D78624 had a noticeable negative compile-time impact.
It seems that going through a function call for the MaxUsesToExplore
default is fairly expensive, at least if LLVM is not built with LTO.
This patch makes MaxUsesToExplore default to 0 and assigns the actual
default in the implementation instead. This recovers most of the
regression.
Differential Revision: https://reviews.llvm.org/D78734
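The pattern is to make a cheap sentinel the declared default and resolve it once inside the implementation; a minimal sketch with an abridged signature (the real PointerMayBeCaptured takes more parameters, and the concrete default value here is illustrative):
```
// Header: 0 means "use the implementation's default", so evaluating the
// default argument no longer requires a function call at every call site.
bool PointerMayBeCaptured(const void *V, unsigned MaxUsesToExplore = 0);

// Implementation file: resolve the sentinel once.
static const unsigned DefaultMaxUsesToExplore = 20; // illustrative value

bool PointerMayBeCaptured(const void *V, unsigned MaxUsesToExplore) {
  if (MaxUsesToExplore == 0)
    MaxUsesToExplore = DefaultMaxUsesToExplore;
  // ... walk up to MaxUsesToExplore uses of V ...
  (void)V;
  return false;
}
```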