This is a recommit that fixes unwanted STP generation by checking that
the base register has not been modified or used elsewhere.
Our initial motivating case was memcpy's with alignments > 16. The
loads/stores, to which small memcpy's expand, are kept together in
several places so that we get a sequence like this for a 64 bit copy:
LD w0
LD w1
ST w0
ST w1
The load/store optimiser can generate a LDP/STP w0, w1 from this because
the registers read/written are consecutive. In our case however, the
sequence is optimised during ISel, resulting in:
LD w0
ST w0
LD w0
ST w0
This instruction reordering allows reuse of registers. Since the registers
are no longer consecutive (i.e. they are the same), it inhibits LDP/STP
creation. The approach here is to perform renaming:
LD w0
ST w0
LD w1
ST w1
to enable the folding of the stores into a STP. We do not yet generate
the LDP due to a limitation in the renaming implementation, but plan to
look at that in a follow-up so that we fully support this case. While
this was initially motivated by certain memcpy's, this is a general
approach and thus is beneficial for other cases too, as can be seen
in some test changes.
Differential Revision: https://reviews.llvm.org/D103597
There was a bug from cost calculation for partially invariant unswitch.
The costs of non-duplicated blocks are substracted from the total LoopCost, so
anything that is duplicated should not be counted.
Differential Revision: https://reviews.llvm.org/D103816
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.
This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D97680
Don't do this while stipping pointer casts, instead fetch it at
the end. This improves compatibility with opaque pointers for the
case where the base object is not opaque.
Currently we drop wrapping flags for expressions like (A + C1)<flags> - C2.
But we can retain flags under certain conditions:
* Adding a smaller constant is NUW if the original AddExpr was NUW.
* Adding a constant with the same sign and small magnitude is NSW, if the
original AddExpr was NSW.
This can improve results after using `SimplifyICmpOperands`, which may
subtract one in order to use stricter predicates, as is the case for
`isKnownPredicate`.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D104319
These serve as a convenient combination of consume_front/back and
startswith_lower/endswith_lower, consistent with other existing
case insensitive methods named <operation>_lower.
Differential Revision: https://reviews.llvm.org/D104218
Handle to gep p, 0-v case separately, and not as part of the loop
that ensures all indices are constant integers. Those two things
are not really related.
This patch adds the `afn`, `contract`, and `reassoc` fast-math flags.
It also fixes up `fneg`'s order in the alphabetized list.
Reviewed By: MaskRay, craig.topper
Differential Revision: https://reviews.llvm.org/D104541
This patch aims to add the scalable property to LLT. The rest of the
patch-series changes the interfaces to take/return ElementCount and
TypeSize, which both have the ability to represent the scalable property.
The changes are mostly mechanical and aim to be non-functional changes
for fixed-width vectors.
For scalable vectors some unit tests have been added, but no effort has
been put into making any of the GlobalISel algorithms work with scalable
vectors yet. That will be left as future work.
The work is split into a series of 5 patches to make reviews easier.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D104450
Patch was reverted due to a bug that existed before it and was exposed
by it. Returning after the underlying bug has been fixed.
Differential Revision: https://reviews.llvm.org/D103959
This changes the encoding of the `attribute` field, which currently only
contains the value `0` denoting this tag is for an exception, from
`varuint32` to `uint8`. This field is effectively unused at the moment
and reserved for future use, and it is not likely to need `varuint32`
even in future.
See https://github.com/WebAssembly/exception-handling/pull/162.
This does not change any encoded binaries because `0` is encoded in the
same way both in `varuint32` and `uint8`.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D104571
Since this method can apply to cmpxchg operations, make sure it's clear
what value we're actually retrieving. This will help ensure we don't
accidentally ignore the failure ordering of cmpxchg in the future.
We could potentially introduce a getOrdering() method on AtomicSDNode
that asserts the operation isn't cmpxchg, but not sure that's
worthwhile.
Differential Revision: https://reviews.llvm.org/D103338
A backedge-taken count doesn't refer to memory; returning a pointer type
is nonsense. So make sure we always return an integer.
The obvious way to do this would be to just convert the operands of the
icmp to integers, but that doesn't quite work out at the moment:
isLoopEntryGuardedByCond currently gets confused by ptrtoint operations.
So we perform the ptrtoint conversion late for lt/gt operations.
The test changes are mostly innocuous. The most interesting changes are
more complex SCEV expressions of the form "(-1 * (ptrtoint i8* %ptr to
i64)) + %ptr)". This is expected: we can't fold this to zero because we
need to preserve the pointer base.
The call to isLoopEntryGuardedByCond in howFarToZero is less precise
because of ptrtoint operations; this shows up in the function
pr46786_c26_char in ptrtoint.ll. Fixing it here would require more
complex refactoring. It should eventually be fixed by future
improvements to isImpliedCond.
See https://bugs.llvm.org/show_bug.cgi?id=46786 for context.
Differential Revision: https://reviews.llvm.org/D103656
Add a parameter of IsFSDiscriminator to function
getBaseDiscriminatorFromDiscriminator().
This function currently checks the internal flag of
--enable-fs-discriminator. This is not good because we might
change the default value of the internal flag.
Note that we have a default parameter. This is just
because create_afdo_tool has a call-site to it.
I will remove the default parameter in a later patch.
Differential Revision: https://reviews.llvm.org/D104584
`IMAGE_REL_ARM64_REL64/IMAGE_REL_AMD64_REL64` do not exist and `.quad a - .` is
currently not representable.
For instrumentation, `.quad a - .` is useful representing a cross-section
reference in a metadata section, to allow ELF medium/large code models. The COFF
limitation makes such generic instrumentations inconvenient. I plan to make a
PGO/coverage metadata section field relative in D104556.
Differential Revision: https://reviews.llvm.org/D104564
According to IR LangRef, the FMF flag:
contract
Allow floating-point contraction (e.g. fusing a multiply followed by an
addition into a fused multiply-and-add).
reassoc
Allow reassociation transformations for floating-point instructions.
This may dramatically change results in floating-point.
My understanding is that these two flags shouldn't imply each other,
as we might have a SDNode that can be reassociated with others, but
not contractble.
eg: We may want following fmul/fad/fsub to freely reassoc, but don't
want fma being generated here.
%F = fmul reassoc double %A, %B ; <double> [#uses=1]
%G = fmul reassoc double %C, %D ; <double> [#uses=1]
%H = fadd reassoc double %F, %G ; <double> [#uses=1]
%I = fsub reassoc double %H, %E ; <double> [#uses=1]
Before https://reviews.llvm.org/D45710, `reassoc` flag actually
did not imply isContratable either.
The current implementation also only check the flag in fadd node,
ignoring fmul node, this patch update that as well.
Reviewed By: spatel, qiucf
Differential Revision: https://reviews.llvm.org/D104247
Without this patch, llvm/utils/update_cc_test_checks.py fails to
perform `--replace-value-regex` replacements when two RUN lines
produce the same output and use the same single FileCheck prefix. The
problem is that replacements in a RUN line's output are not performed
until after comparing against previous RUN lines' output, where
replacements have already been performed. This patch fixes that.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D104566
Reapplied without changes -- this was reverted together with an
underlying patch.
-----
Bitcasts having opaque pointer source or result type cannot be
converted into a zero-index GEP, GEP source and result types
always have the same opaque-ness.
Relative to the original patch, an InstCombine test has been
added to show a previously missed pattern, and the Coroutine
test that resulted in the revert has been regenerated.
-----
Move this into a separate function, to make sure that early
returns do not accidentally skip other transforms. This previously
happened for the isSized() check, which skipped folds like
distributing a bitcast over a select.
The argument reduction pass shouldn't remove arguments of
intrinsics, because the resulting module is ill-formed, and so
inherently uninteresting.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D103129
This reverts commit d9f5d7b959de36085944d4a99a73f3053f953796.
This reverts commit 5780611d7e044ef56c4214df2c236ef5e15545ab.
This causes a failure in Coroutine tests.
As these are no longer passed to UnrollLoop(), there is no need to
modify them in computeUnrollCount(). Make them non-reference parameters.
Differential Revision: https://reviews.llvm.org/D104590
Perform better analysis when trying to vectorize PHIs.
1. Do not try to vectorize vector PHIs.
2. Do deeper analysis for more profitable nodes for the vectorization.
Before we just tried to vectorize the PHIs of the same type. Patch
improves this and tries to vectorize PHIs with incoming values which
come from the same basic block, have the same and/or alternative
opcodes.
It allows to save the compile time and provides better vectorization
results in general.
Part of D57059.
Differential Revision: https://reviews.llvm.org/D103638
Bitcasts having opaque pointer source or result type cannot be
converted into a zero-index GEP, GEP source and result types
always have the same opaque-ness.
Move this into a separate function, to make sure that early
returns do not accidentally skip other transforms. There is
already one isSized() check that could run into this issue,
thus this change is not strictly NFC.
This makes it more convenient to get a text format profile.
Add an error for printing non-text format output to a terminal for instrumentation profile.
(It cannot be portably tested. For sample profile, raw_fd_ostream is hidden deeply so it's inconvenient to add a diagnostic.)
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D104600
It's not possible to bitcast between different address spaces,
and this is ensured by the IR verifier. As such, this bitcast to
addrspacecast canonicalization can never be hit.
Adjust assertions to use isOpaqueOrPointeeTypeMatches() and make
it return an opaque pointer result for an opaque base pointer. We
also need to enumerate the element type, as it is no longer
implicitly enumerated through the pointer type.
Differential Revision: https://reviews.llvm.org/D104655
They are not conducive to being stored in git. Instead, we autogenerate
mock model artifacts for use in tests. Production models can be
specified with the cmake flag LLVM_INLINER_MODEL_PATH.
LLVM_INLINER_MODEL_PATH has two sentinel values:
- download, which will download the most recent compatible model.
- autogenerate, which will autogenerate a "fake" model for testing the
model uptake infrastructure.
Differential Revision: https://reviews.llvm.org/D104251
This reverts commit bb1dc876ebb8a2eef38d5183d00c2db1437f1c91.
This patch causes an assertion failure when building an arm64 defconfig
Linux kernel.
See https://reviews.llvm.org/D103959 for a link to the original bug
report and a reduced reproducer.