This patch adds support to the instruction-referencing LiveDebugValues
implementation for emitting entry values. The instruction referencing
implementations tracking by value rather than location means that we can
get around two of the issues with VarLocs. DBG_VALUE instructions that
re-assign the same value to a variable are no longer a problem, because we
can "see through" to the value being assigned. We also don't need to do
anything special during the dataflow stages: the "variable value problem"
doesn't need to know whether a value is available most of the time, and the
times it deoes need to know are always when entry values need to be
terminated.
The patch modifies the "TransferTracker" class, adding methods to identify
when a variable ias an entry value candidate, and when a machine value is
an entry value. recoverAsEntryValue tests these two things and emits an
entry-value expression if they're true. It's used when we clobber or
otherwise lose a value and can't find a replacement location for the value
it contained.
Differential Revision: https://reviews.llvm.org/D88406
This enables proper lowering of non-byte sized loads. We still aren't
faithfully preserving memory types everywhere, so the legality checks
still only consider the size.
Previously we didn't preserve the memory type and had to blindly
interpret a number of bytes. Now that non-byte memory accesses are
representable, we can handle these correctly.
Ported from DAG version (minus some weird special case i1 legality
checking which I don't fully understand, and we don't have a way to
query for)
For now, this is NFC and the test changes are placeholders. Since the
legality queries are still relying on byte-flattened memory sizes, the
legalizer can't actually see these non-byte accesses. This keeps this
change self contained without merging it with the larger patch to
switch to LLT memory queries.
Enable the emission of a GNU attributes section by reusing the code for
emitting the ARM build attributes section.
The GNU attributes follow the exact same section format as the ARM
BuildAttributes section, so this can be factored out and reused for GNU
attributes generally.
The immediate motivation for this is to emit a GNU attributes section for the
vector ABI on SystemZ (https://reviews.llvm.org/D105067).
Review: Logan Chien, Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D102894
This will currently accept the old number of bytes syntax, and convert
it to a scalar. This should be removed in the near future (I think I
converted all of the tests already, but likely missed a few).
Not sure what the exact syntax and policy should be. We can continue
printing the number of bytes for non-generic instructions to avoid
test churn and only allow non-scalar types for generic instructions.
This will currently print the LLT in parentheses, but accept parsing
the existing integers and implicitly converting to scalar. The
parentheses are a bit ugly, but the parser logic seems unable to deal
without either parentheses or some keyword to indicate the start of a
type.
This is an ELF specific option which isn't supported for Windows/MinGW
targets, even if the MinGW linker otherwise uses an ld.bfd like linker
interface.
Differential Revision: https://reviews.llvm.org/D105148
Use separate variable for adjusted scale used for GCD computations. This
fixes an issue where we incorrectly determined that all indices are
non-negative and returned noalias because of that.
Follow up to 91fa3565da16.
This is to allow 64 bit constant rematerialization. If a constant
is split into two separate moves initializing sub0 and sub1 like
now RA cannot rematerizalize a 64 bit register.
This gives 10-20% uplift in a set of huge apps heavily using double
precession math.
Fixes: SWDEV-292645
Differential Revision: https://reviews.llvm.org/D104874
Relevant discussion can be found at: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148197.html
In the existing design, An SCC that contains a coroutine will go through the folloing passes:
Inliner -> CoroSplitPass (fake) -> FunctionSimplificationPipeline -> Inliner -> CoroSplitPass (real) -> FunctionSimplificationPipeline
The first CoroSplitPass doesn't do anything other than putting the SCC back to the queue so that the entire pipeline can repeat.
As you can see, we run Inliner twice on the SCC consecutively without doing any real split, which is unnecessary and likely unintended.
What we really wanted is this:
Inliner -> FunctionSimplificationPipeline -> CoroSplitPass -> FunctionSimplificationPipeline
(note that we don't really need to run Inliner again on the ramp function after split).
Hence the way we do it here is to move CoroSplitPass to the end of the CGSCC pipeline, make it once for real, insert the newly generated SCCs (the clones) back to the pipeline so that they can be optimized, and also add a function simplification pipeline after CoroSplit to optimize the post-split ramp function.
This approach also conforms to how the new pass manager works instead of relying on an adhoc post split cleanup, making it ready for full switch to new pass manager eventually.
By looking at some of the changes to the tests, we can already observe that this changes allows for more optimizations applied to coroutines.
Reviewed By: aeubanks, ChuanqiXu
Differential Revision: https://reviews.llvm.org/D95807
This prevents constant gep operands from being hoisted by the Constant
Hoisting pass, leaving them to CodegenPrepare which can usually do a
better job at splitting large offsets. This can, in general, improve
performance and decrease codesize, especially for v6m where many
constants have a high cost.
Differential Revision: https://reviews.llvm.org/D104877
Summary:
in the patch https://reviews.llvm.org/D103651 [AIX][XCOFF] generate eh_info when vector registers are saved according to the traceback table.
when generate eh_info, it switch to other section, when it done, it need to switch back to text section again.
Reviewers: Jason Liu
Differential Revision: https://reviews.llvm.org/105195
This demonstrates a possible fix for PR48760 - for compares with constants, canonicalize the SGT/UGT condition code to use SGE/UGE which should reduce the number of EFLAGs bits we need to read.
As discussed on PR48760, some EFLAG bits are treated independently which can require additional uops to merge together for certain CMOVcc/SETcc/etc. modes.
I've limited this to cases where the constant increment doesn't result in a larger encoding or additional i64 constant materializations.
Differential Revision: https://reviews.llvm.org/D101074
There must be a better way to describe this pattern in words?
(X + C2) >u C --> X <s -C2 (if C == C2 + SMAX)
This could be extended to handle the more general (non-constant)
pattern too:
https://alive2.llvm.org/ce/z/rdfNFP
define i1 @src(i8 %a, i8 %c1) {
%t = add i8 %a, %c1
%c2 = add i8 %c1, 127 ; SMAX
%ov = icmp ugt i8 %t, %c2
ret i1 %ov
}
define i1 @tgt(i8 %a, i8 %c1) {
%neg_c1 = sub i8 0, %c1
%ov = icmp slt i8 %a, %neg_c1
ret i1 %ov
}
The pattern was noticed as a by-product of D104932.
We already implemented this for the select form, but the intrinsic form was missing. Note that this doesn't change poison behavior as 1 is non-poison, and the optimized form is still poison exactly when x is.
Dynamically loaded plugins for the new pass manager are initialised by
calling llvmGetPassPluginInfo. This is defined as a weak symbol so that
it is continually redefined by each plugin that is loaded. When loading
a plugin from a shared library, the intention is that
llvmGetPassPluginInfo will be resolved to the definition in the most
recent plugin. However, using a global search for this resolution can
fail in situations where multiple plugins are loaded.
Currently:
* If a plugin does not define llvmGetPassPluginInfo, then it will be
silently resolved to the previous plugin's definition.
* If loading the same plugin twice with another in between, e.g. plugin
A/plugin B/plugin A, then the second load of plugin A will resolve to
llvmGetPassPluginInfo in plugin B.
* The previous case can also occur when a dynamic library defines both
NPM and legacy plugins; the legacy plugins are loaded first and then
with `-fplugin=A -fpass-plugin=B -fpass-plugin=A`: A will be loaded as
a legacy plugin and define llvmGetPassPluginInfo; B will be loaded
and redefine it; and finally when A is loaded as an NPM plugin it will
be resolved to the definition from B.
Instead of searching globally, restrict the symbol lookup to the library
that is currently being loaded.
Differential Revision: https://reviews.llvm.org/D104916
In various circumstances, when we clobber a register there may be
alternative locations that the value is live in. The classic example would
be a value loaded from the stack, and then clobbered: the value is still
available on the stack. InstrRefBasedLDV was coping with this at block
starts where it's forced to pick a location, however it wasn't searching
for alternative locations when values were clobbered.
This patch notifies the "Transfer Tracker" object when clobbers occur, and
it's able to find alternatives and issue DBG_VALUEs for that location. See:
the added test.
Differential Revision: https://reviews.llvm.org/D88405
The remarks will trigger on some functions that are marked cold, such as the
`__muldc3` intrinsic functions. Change the remarks to avoid these functions.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D105196
We have analogous rules in instsimplify, etc.., but were missing the same in SCEV. The fold is near trivial, but came up in the context of a larger change.
I believe this Changed flag should be initialized to false,
otherwise the if (!Changed) is always dead. This doesn't
manifest in a functional issue because the PHINode checks will
fail if nothing changed. They are identical to the earlier
checks that must have already failed to get into this else block.
While there remove an else after return to reduce indentation.
Differential Revision: https://reviews.llvm.org/D105159
This patch augments Lit with the ability to parse regular expressions
in boolean expressions. This includes REQUIRES:, XFAIL:, UNSUPPORTED:,
and all other special Lit markup that evaluates to a boolean expression.
Regular expressions can be specified by enclosing them in {{...}},
similarly to how FileCheck handles such regular expressions. The regular
expression can either be on its own, or it can be part of an identifier.
For example, a match expression like {{.+}}-apple-darwin{{.+}} would match
the following variables:
x86_64-apple-darwin20.0
arm64-apple-darwin20.0
arm64-apple-darwin22.0
etc...
In the long term, this could be used to remove the need to handle the
target triple specially when parsing boolean expressions.
Differential Revision: https://reviews.llvm.org/D104572
Based off the worse case numbers generated by D103695, the AVX1/2/512 sitofp/uitofp/fptosi/fptoui costs were higher than necessary (based off instruction counts instead of actual throughput).
The SSE costs still need further fixes, but I hit an issue with the order in which SSE costs are checked - we need to check CUSTOM costs (with non-legal types) first, and then fallback to LEGALIZED types. I'm looking at this now, and this should let us start thinning out a lot of the duplicates in the costs tables.
Then we can finally start work on vXi64 / vXi16 / vXi8 / vXi1 integers, which should let us look at sub-128-bit vectorization (D103925).
This patch adds additional remarks, suggesting the use of `noescape` for failed
globalization and indicating when internalization failed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D105150
Details: https://reviews.llvm.org/D96805 changed the GCNTTIImpl::getCFInstrCost to return 1 for the PHI nodes
for the TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency. This is incorrect because the value moves that are the
result of the PHI lowering are inserted into the basic block predecessors - not into the block itself.
As a result of this change LoopRotate and LoopUnroll were broken because of the incorrect Loop header and loop
body size/cost estimation.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D105104
When clamping the index for a memory access to a stacked vector we must
take into account the entire type being accessed, not just assume that
we are accessing only a single element.
Differential Revision: https://reviews.llvm.org/D105016
This is a followup patch on D103636 where
it seemed checking on amdgpu-calls and
amdgpu-stack-objects is unnecessary. Removing these
checks didn't regress any tests functionally.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D104513
Currently UREM & SREM on constant ranges produces overly pessimistic
results for single element constant ranges.
Delegate to APInt's implementation if both operands are single element
constant ranges. We already do something similar for other binary
operators, like binary AND.
Fixes PR49731.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D105115
Looking at PostDominatorTree::dominates, we can see that has the same
logic (with the addition of handling Phi nodes - which are not used as inputs in
this pass) as the helper function.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105141
My use case for this is illustrated in the test case: I want to define
the same instruction twice with different (disjoint) predicates, because
the instruction has different operands on different subtargets. It's
convenient to do this with a multiclass that also defines an alias for
the instruction.
Previously tablegen would complain if this alias was defined twice with
no predicate. One way to fix this would be to add a predicate on each
definition of the alias, matching the predicate on the instruction. But
this (a) is slightly awkward to do in the real world use case I had, and
(b) leads to an inefficient matcher that will do something like this:
if (Mnemonic == "foo_alias") {
if (Features.test(Feature_Subtarget1Bit))
Mnemonic == "foo";
else if (Features.test(Feature_Subtarget2Bit))
Mnemonic == "foo";
return;
}
It would be more efficient to skip the feature tests and return "foo"
unconditionally.
Overall it seems better to allow multiple definitions of the identical
alias with no predicate.
Differential Revision: https://reviews.llvm.org/D105033
`ARMInstPrinter::printMveAddrModeQOperand()` was added in D62680, but
was never used. It looks like `printT2AddrModeImm8Operand<false>()` is
used instead.
Differential Revision: https://reviews.llvm.org/D105124