As noticed in D90554 ,
the AVX2 costs for 256-bit vectors did not include FMAXNUM entries,
so we fell back to AVX1 which assumes those ops will be split into
128-bit halves or something close to that.
Differential Revision: https://reviews.llvm.org/D90613
The code is looking for (sext_inreg (or (shl X, C2), (shr (and Y, C3), C1))).
We need to ensure X and Y are the same.
Differential Revision: https://reviews.llvm.org/D90580
The `LiveRegUnits` utility (as well as `LivePhysRegs`) considers
callee-saved registers to be alive at the point after the return
instruction in a block. In the ARM backend, the `LR` register is
classified as callee-saved, which is not really correct (from an ARM
eABI or just common sense point of view). These two conditions cause
the `MachineOutliner` to overestimate the liveness of `LR`, which
results in unnecessary saves/restores of `LR` around calls to outlined
sequences. It also causes the `MachineVerifer` to crash in some
cases, because the save instruction reads a dead `LR`, for example
when the following program:
int h(int, int);
int f(int a, int b, int c, int d) {
a = h(a + 1, b - 1);
b = b + c;
return 1 + (2 * a + b) * (c - d) / (a - b) * (c + d);
}
int g(int a, int b, int c, int d) {
a = h(a - 1, b + 1);
b = b + c;
return 2 + (2 * a + b) * (c - d) / (a - b) * (c + d);
}
is compiled with `-target arm-eabi -march=armv7-m -Oz`.
This patch computes the liveness of `LR` in return blocks only, while
taking into account the few ARM instructions, which read `LR`, but
nevertheless the register is not mentioned (explicitly or implicitly)
in the instruction operands.
Differential Revision: https://reviews.llvm.org/D89189
This reverts the revert commit 408c4408facc3a79ee4ff7e9983cc972f797e176.
This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.
Original message:
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.
This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.
This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.
I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
Patch fixes case when sched class has write and read variants belonging
to different processor models.
Differential revision: https://reviews.llvm.org/D89777
For PS4 development we support dllimport/export annotations in
source code. This patch enables the dllimport/export attributes
on PS4 by adding a new function to query the triple for whether
dllimport/export are used and using that function to decide
whether these attributes are supported. This replaces the current
method of checking if the target is Windows.
This means we can drop the use of "TargetArch" in the .td file
(which is an improvement as dllimport/export support isn't really
a function of the architecture).
I have included a simple codgen test to show that the attributes
are accepted and have an effect on codegen for PS4. I have also
enabled the DLLExportStaticLocal and DLLImportStaticLocal
attributes, which we support downstream. However, I am unable to
write a test for these attributes until other patches for PS4
dllimport/export handling land upstream. Whilst writing this
patch I noticed that, as these attributes are internal, they do
not need to be target specific (when these attributes are added
internally in Clang the target specific checks have already been
run); however, I think leaving them target specific is fine
because it isn't harmful and they "really are" target specific
even if that has no functional impact.
Differential Revision: https://reviews.llvm.org/D90442
As discussed on D90322, some MSVC builds are failing with is_trivially_copyable static asserts (see D86126) - we can avoid this by not using the std::pair<unsigned,unsigned> which held both the FP+DP Registers, just handle the FP register and convert to DP on the fly.
These sections are implicit and handled a bit differently.
Currently the "Offset" is ignored for them.
This patch fixes an issue.
Differential revision: https://reviews.llvm.org/D90446
Only the aliases 'xzr' and 'sp' exist for the physical register x31.
The reason for wanting to remove the alias 'x31' is because it allows users
to write invalid asm that is not accepted by the GNU assembler.
Is there any objection to removing this alias? Or do we want to keep
this for compatibility with existing code that uses w31/x31?
Differential Revision: https://reviews.llvm.org/D90153
When validating C3 in (sext_inreg (or (shl X, C2), (shr (and Y, C3), C1)), i32)
we are truncating it to 32 bits before checking its value. We need
to check all 64 bits.
Variable InnerIsSel references FalseRes, while FalseRes might be
zext/sext. So InnerIsSel should reference SetOrSelCC, otherwise a crash
will happen.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D90142
The function is matching (sext_inreg (or (shl X, C2), (shr (and Y, C3), C1))),
with appropriate checks for the constants to be a rotate. But it
fails to check that X and Y are the same which is also necessary.
fshl/fshr intrinsics turn into rotl/rotr ISD opcodes and we don't
have a complete set of patterns.
We pattern match rotl, but we have a custom match for rori that gets
priority. We don't pattern match rotr and we don't have patterns
or custom code for rori from rotr.
We have added a new load/store cluster algorithm in D85517. However, AArch64 see
some compiling deg with the new algorithm as the IsReachable() is not cheap if
the DAG is complex. O(M+N) See https://bugs.llvm.org/show_bug.cgi?id=47966
So, this patch added a heuristic to switch to old cluster algorithm if the DAG is too complex.
Reviewed By: Owen Anderson
Differential Revision: https://reviews.llvm.org/D90144
Similar to -fprofile-generate=, add -fmemory-profile= which takes a
directory path. This is passed down to LLVM via a new module flag
metadata. LLVM in turn provides this name to the runtime via the new
__memprof_profile_filename variable.
Additionally, always pass a default filename (in $cwd if a directory
name is not specified vi the = form of the option). This is also
consistent with the behavior of the PGO instrumentation. Since the
memory profiles will generally be fairly large, it doesn't make sense to
dump them to stderr. Also, importantly, the memory profiles will
eventually be dumped in a compact binary format, which is another reason
why it does not make sense to send these to stderr by default.
Change the existing memprof tests to specify log_path=stderr when that
was being relied on.
Depends on D89086.
Differential Revision: https://reviews.llvm.org/D89087
Strengthening nowrap flags is relatively expensive. Make sure we
only do it if we're actually going to use the flags -- we don't
use them for many recursive invocations. Additionally, if we're
reusing an existing SCEV node, there's no point in trying to
strengthen the flags if we don't have any new baseline facts.
This change falls slightly short of being NFC, because the way
flags during add+addrec / mul+addrec folding are handled may be
more precise (as less operands are included in the calculation).
The fshr intrinsic with same inputs produces rotr ISD node. The
fshl intrinsic produces rotl ISD node.
There were only test cases and isel patterns for the fshl/rotl case.
This patch adds fshr/rotr test cases.
This reverts 781917254dba17df7fb357a5f621ada2ac1b36e3 and recommits
781917254dba17df7fb357a5f621ada2ac1b36e3.
I've changed getRegForInlineAsmConstraint to not use a std::pair
of Register in a previous commit. Hopefully that fixes the reported
issue with expensive checks on Windows. I'm still not sure exactly
why this commit removing an include affected a different file.
Original message:
RISCVRegisterInfo.h is part of the CodeGen layer. The Utils library
is intended to be shared with the MC layer so shouldn't use files
from the CodeGen layer.
The register enum names are already available from
RISCVMCTargetDesc.h. It appears what was coming from this include
was a transitive include of the Register class which I've replaced
with MCRegister. Register has a constructor from MCRegister so it
should be convertible.
The return value of this interface still uses an 'unsigned' on all
targets. So we convert Register back to unsigned at the end.
I'm hoping this will prevent the issue that caused the revert of
D90322.