Summary:
This is a follow up on D61634. It adds an LLVM IR intrinsic to allow better implementation of memcpy from C++.
A follow up CL will add the intrinsics in Clang.
Reviewers: courbet, theraven, t.p.northover, jdoerfert, tejohnson
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71710
This patch updates the remark to also include a summary of the number of
vector operations generated for each matrix expression.
Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D72480
We have some ! patterns in the .gitignore (for the projects and runtimes
directories), and those patterns end up overriding the previous file
ignores, such that e.g. a .swp file inside the runtimes directory isn't
ignored. Move the file ignores last to ensure they take effect.
Differential Revision: https://reviews.llvm.org/D73253
The installation target we create should trigger the corresponding
installation target in the runtimes external project.
Differential Revision: https://reviews.llvm.org/D73251
from DenseMap to MapVector
The iteration order of LoopVectorizationLegality::Reductions matters for the
final code generation, so we better use MapVector instead of DenseMap for it
to remove the nondeterminacy. reduction-order.ll in the patch is an example
reduced from the case we saw. In the output of opt command, the order of the
select instructions in the vector.body block keeps changing from run to run
currently.
Differential Revision: https://reviews.llvm.org/D73490
Generate remarks for matrix operations in a function. To generate remarks
for matrix expressions, the following approach is used:
1. Collect leafs of matrix expressions (done in
RemarkGenerator::getExpressionLeafs). Leafs are lowered matrix
instructions without other matrix users (like stores).
2. For each leaf, create a remark containing a linearizied version of the
matrix expression.
The following improvements will be submitted as follow-ups:
* Summarize number of vector instructions generated for each expression.
* Account for shared sub-expressions.
* Propagate matrix remarks up the inlining chain.
The information provided by the matrix remarks helps users to spot cases
where matrix expression got split up, e.g. due to inlining not
happening. The remarks allow users to address those issues, ensuring
best performance.
Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D72453
This behavior appears to have changed unintentionally in
b0e979724f2679e4e6f5b824144ea89289bd6d56.
Instead of printing the leading newline in printFunction, print it when
printing a module. This ensures that `OS << *Func` starts printing
immediately on the current line, but whole modules are printed nicely.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D73505
Summary:
Treat scalable allocas as if they have storage size of 0, and
scalable-typed memory accesses as if their range is unlimited.
This is not a proper support of scalable vector types in the analysis -
we can do better, but not today.
Reviewers: vitalybuka
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73394
This patch adds a new option to enable/disable register renaming in the
load-store optimizer. Defaults to disabled, as there is a potential
mis-compile caused by this.
Trivial type predicates should be moved into the tablegen pattern
itself, and not checked inside complex patterns. This eliminates a
redundant complex pattern, and fixes select source modifiers for
GlobalISel.
I have further patches which fully handle select in tablegen and
remove all of the C++ selection, although it requires the ugliness to
support the entire range of legal register types.
Summary:
This is mostly NFC. computeForAddSub may give more precise results in
some cases, but that doesn't seem to affect any existing GlobalISel
tests.
Subscribers: rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73431
D47163 created a rule that we should not change the casted
type of a select when we have matching types in its compare condition.
That was intended to help vector codegen, but it also could create
situations where we miss subsequent folds as shown in PR44545:
https://bugs.llvm.org/show_bug.cgi?id=44545
By using shouldChangeType(), we can continue to get the vector folds
(because we always return false for vector types). But we also solve
the motivating bug because it's ok to narrow the scalar select in that
example.
Our canonicalization rules around select are a mess, but AFAICT, this
will not induce any infinite looping from the reverse transform (but
we'll need to watch for that possibility if committed).
Side note: there's a similar use of shouldChangeType() for phi ops
just below this diff, and the source and destination types appear to
be reversed.
Differential Revision: https://reviews.llvm.org/D72733
This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
Differential Revision: This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
This patch fixes an assertion failure in DwarfExpression that is
triggered when a complex fragment has exactly the size of a
subregister of the register the DBG_VALUE points to *and* there is no
DWARF encoding for the super-register.
I took the opportunity to replace/document some magic values with
static constructor functions to make this code less confusing to read.
rdar://problem/58489125
Differential Revision: https://reviews.llvm.org/D72938
Followup to D72978. This moves existing negation handling in
InstCombine into freelyNegateValue(), which make it composable.
In particular, root negations of div/zext/sext/ashr/lshr/sub can
now always be performed through a shl/trunc as well.
Differential Revision: https://reviews.llvm.org/D73288
This is a revert-of-revert (i.e. this reverts commit 802bec89, which
itself reverted fa4701e1 and 79daafc9) with a fix folded in. The problem
was that call site tags weren't emitted properly when LTO was enabled
along with split-dwarf. This required a minor fix. I've added a reduced
test case in test/DebugInfo/X86/fission-call-site.ll.
Original commit message:
This allows a call site tag in CU A to reference a callee DIE in CU B
without resorting to creating an incomplete duplicate DIE for the callee
inside of CU A.
We already allow cross-CU references of subprogram declarations, so it
doesn't seem like definitions ought to be special.
This improves entry value evaluation and tail call frame synthesis in
the LTO setting. During LTO, it's common for cross-module inlining to
produce a call in some CU A where the callee resides in a different CU,
and there is no declaration subprogram for the callee anywhere. In this
case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram
in order to fill in the call site tag. That empty 'definition' defeats
entry value evaluation etc., because the debugger can't figure out what
it means.
As a follow-up, maybe we could add a DWARF verifier check that a
DW_TAG_subprogram at least has a DW_AT_name attribute.
Update #1:
Reland with a fix to create a declaration DIE when the declaration is
missing from the CU's retainedTypes list. The declaration is left out
of the retainedTypes list in two cases:
1) Re-compiling pre-r266445 bitcode (in which declarations weren't added
to the retainedTypes list), and
2) Doing LTO function importing (which doesn't update the retainedTypes
list).
It's possible to handle (1) and (2) by modifying the retainedTypes list
(in AutoUpgrade, or in the LTO importing logic resp.), but I don't see
an advantage to doing it this way, as it would cause more DWARF to be
emitted compared to creating the declaration DIEs lazily.
Update #2:
Fold in a fix for call site tag emission in the split-dwarf + LTO case.
Tested with a stage2 ThinLTO+RelWithDebInfo build of clang, and with a
ReleaseLTO-g build of the test suite.
rdar://46577651, rdar://57855316, rdar://57840415, rdar://58888440
Differential Revision: https://reviews.llvm.org/D70350
We want to have more load/store clustering but we also want
to maintain low register pressure which are oposit targets.
Allow scheduler to reschedule regions without mutations
applied if we hit a register limit.
Differential Revision: https://reviews.llvm.org/D73386
This changes the generated (Instr|Asm|Reg|Regclass)Name tables from this
form:
extern const char HexagonInstrNameData[] = {
/* 0 */ 'G', '_', 'F', 'L', 'O', 'G', '1', '0', 0,
/* 9 */ 'E', 'N', 'D', 'L', 'O', 'O', 'P', '0', 0,
/* 18 */ 'V', '6', '_', 'v', 'd', 'd', '0', 0,
/* 26 */ 'P', 'S', '_', 'v', 'd', 'd', '0', 0,
[...]
};
...to this:
extern const char HexagonInstrNameData[] = {
/* 0 */ "G_FLOG10\0"
/* 9 */ "ENDLOOP0\0"
/* 18 */ "V6_vdd0\0"
/* 26 */ "PS_vdd0\0"
[...]
};
This should make debugging and exploration a lot easier for mortals,
while providing a significant compile-time reduction for common compilers.
To avoid issues with low implementation limits, this is disabled by
default for visual studio.
To force output one way or the other, pass
`--long-string-literals=<bool>` to `tablegen`
Reviewers: mstorsjo, rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D73044
A variation of this patch was originally committed in ce23515f5ab011 and
then reverted in e464b31c due to build failures.
TAPI currently lacks a way to emit the macCatalyst platform. For TBD_V3
is does support zippered frameworks given that both macOS and
macCatalyst are part of the PlatformSet.
Differential revision: https://reviews.llvm.org/D73325
Use intermediate instructions, unlike with buffer stores. This is
necessary because of the need to have an internal way to distinguish
between signed and unsigned extloads. This introduces some duplication
and near duplication with the buffer store selection path. The store
handling should maybe be moved into legalization to match and
eliminate the duplication.
Summary:
This is in preparation for adding more test cases for D69661 and other
bug fixes in the same area.
Reviewers: tpr, dstuttard, critson, nhaehnle, arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70708