This patch proposes how to deal with RISC-V vector frame objects. The
layout of RISC-V vector frame will look like
|---------------------------------|
| scalar callee-saved registers |
|---------------------------------|
| scalar local variables |
|---------------------------------|
| scalar outgoing arguments |
|---------------------------------|
| RVV local variables && |
| RVV outgoing arguments |
|---------------------------------| <- end of frame (sp)
If there is realignment or variable length array in the stack, we will use
frame pointer to access fixed objects and stack pointer to access
non-fixed objects.
|---------------------------------| <- frame pointer (fp)
| scalar callee-saved registers |
|---------------------------------|
| scalar local variables |
|---------------------------------|
| ///// realignment ///// |
|---------------------------------|
| scalar outgoing arguments |
|---------------------------------|
| RVV local variables && |
| RVV outgoing arguments |
|---------------------------------| <- end of frame (sp)
If there are both realignment and variable length array in the stack, we
will use frame pointer to access fixed objects and base pointer to access
non-fixed objects.
|---------------------------------| <- frame pointer (fp)
| scalar callee-saved registers |
|---------------------------------|
| scalar local variables |
|---------------------------------|
| ///// realignment ///// |
|---------------------------------| <- base pointer (bp)
| RVV local variables && |
| RVV outgoing arguments |
|---------------------------------|
| /////////////////////////////// |
| variable length array |
| /////////////////////////////// |
|---------------------------------| <- end of frame (sp)
| scalar outgoing arguments |
|---------------------------------|
In this version, we do not save the addresses of RVV objects in the
stack. We access them directly through the polynomial expression
(a x VLENB + b). We do not reserve frame pointer when there is any RVV
object in the stack. So, we also access the scalar frame objects through the
polynomial expression (a x VLENB + b) if the access across RVV stack
area.
Differential Revision: https://reviews.llvm.org/D94465
1. Emit warnings for files without symbols.
2. Add -no_warning_for_no_symbols.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D95843
The AMD GPU SIMemoryLegalizer was using the ordering address space
rather than the instruction address space when determining the
s_waitcnt to generate to ensure that a read-modify-write atomic has
completed. This resulted in additional unnecessary counters being
waited on.
Differential Revision: https://reviews.llvm.org/D96743
Reapply patch after fixing buildbot failure.
Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.
Differential Revision: https://reviews.llvm.org/D96455
Basic block sections enables function sections implicitly, this is not needed
and is inefficient with "=list" option.
We had basic block sections enable function sections implicitly in clang. This
is particularly inefficient with "=list" option as it places functions that do
not have any basic block sections in separate sections. This causes unnecessary
object file overhead for large applications.
This patch disables this implicit behavior. It only creates function sections
for those functions that require basic block sections.
Further, there was an inconistent behavior with llc as llc was not turning on
function sections by default. This patch makes llc and clang consistent and
tests are added to check the new behavior.
This is the first of two patches and this adds functionality in LLVM to
create a new section for the entry block if function sections is not
enabled.
Differential Revision: https://reviews.llvm.org/D93876
This change introduces support for zero flag ELF section groups to LLVM.
LLVM already supports COMDAT sections, which in ELF are a special type
of ELF section groups. These are generally useful to enable linker GC
where you want a group of sections to always travel together, that is to
be either retained or discarded as a whole, but without the COMDAT
semantics. Other ELF assemblers already support zero flag ELF section
groups and this change helps us reach feature parity.
Differential Revision: https://reviews.llvm.org/D95851
This reverts commit 310b35304cdf5a230c042904655583c5532d3e91.
The build is broken with -DBUILD_SHARED_LIBS=ON :
lib/ProfileData/CMakeFiles/LLVMProfileData.dir/SampleProfileLoaderBaseUtil.cpp.o: In function `llvm::sampleprofutil::callsiteIsHot(llvm::sampleprof::FunctionSamples const*, llvm::ProfileSummaryInfo*, bool)':
SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x1a): undefined reference to `llvm::ProfileSummaryInfo::isColdCount(unsigned long) const'
SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x28): undefined reference to `llvm::ProfileSummaryInfo::isHotCount(unsigned long) const'
...
When the `DWOPath` is absolute, we want to use `DWOPath` as is, without prepending any other
components to the path. The `sys::path::append` does not join, but rather unconditionally appends
the paths, so something like `sys::path::append("/tmp", "/tmp/banana")` will result in
`/tmp/tmp/banana` rather than the desired `/tmp/banana`.
This then causes `llvm-dwp` to fail in a following situation:
```
$ clang -gsplit-dwarf /tmp/banana/test.c -c -o /tmp/outdir/foo.o
$ clang outdir/foo.o -o outdir/hm
$ llvm-dwarfdump outdir/hm | grep -C2 foo.dwo
DW_AT_comp_dir ("/tmp")
DW_AT_GNU_pubnames (true)
DW_AT_GNU_dwo_name ("/tmp/outdir/foo.dwo")
DW_AT_GNU_dwo_id (0xde4d396f3bf0e257)
DW_AT_low_pc (0x0000000000401100)
$ strace -o trace llvm-dwp -e outdir/hm -o outdir/hm.dwp
error: No such file or directory
$ cat trace | grep foo.dwo
openat(AT_FDCWD, "/tmp/tmp/outdir/foo.dwo", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
```
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D96678
We currently represent TOC entries by an MCSymbol. This is not enough in some situations.
For example, when accessing an initialized TLS variable v on AIX using the general dynamic
model, we need to generate the two following entries for v:
.tc .v[TC],v@m
.tc v[TC],v
One is for the region handle (with the @m relocation), the other is for the variable offset.
This refactoring allows storing several entries for the same symbol with different VariantKind
in the TOC. If the VariantKind is not specified, we default to VK_None.
The AIX TLS implementation using this refactoring to generate the two entries will be posted
in a subsequent patch.
Patched By: bsaleil
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D96346
This patch fixes a warning:
llvm-project/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseImpl.h:69:7:
error: 'llvm::SampleProfileLoaderBaseImpl' has virtual functions but
non-virtual destructor [-Werror,-Wnon-virtual-dtor]
Differential Revision: https://reviews.llvm.org/D96810
This reverts commit 5dfba562dd247f731528448ee83785b099f93629.
That commit causes an assertion failure with the following repro:
typedef long b __attribute__((__vector_size__(16)));
b *d;
b e;
b __attribute__((__always_inline__)) c(b h, b i) {
return (__attribute__((__vector_size__(8 * sizeof(short)))) short)h + i;
}
j() {
b k, l, m, n, o[6], p, q;
m = d[5];
b r = m;
b s = f(r, 8);
q = s;
l = d[1];
p = l;
t(q);
n = c(m, l);
o[1] = c(s, f(p, 8));
k = __builtin_shufflevector(n, o[1], 0, 2);
e = __builtin_ia32_psrlwi128(k, j);
}
./bin/clang -cc1 -triple x86_64-grtev4-linux-gnu -emit-obj -O1 -std=c99 test.c
real_path returns an `std::error_code` which evaluates to `true` in case an
error happens and `false` if not. This code was checking the inverse, so
case-insensitive file systems ended up being detected as case sensitive.
Tested using an LLDB reproducer test as we anyway need a real file system and
also some matching logic to detect whether the respective file system is
case-sensitive (which the test is doing via some Python checks that we can't
really emulate with the usual FileCheck logic).
Fixes rdar://67003004
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D96795
Refactor SampleProfile.cpp to use the core code in CodeGen.
The main changes are:
(1) Move SampleProfileLoaderBaseImpl class to a header file.
(2) Split SampleCoverageTracker to a head file and a cpp file.
(3) Move the common codes (common options and callsiteIsHot())
to the common cpp file.
Differential Revision: https://reviews.llvm.org/D96455
This reverts commit 61b4702a408834228c1c139b0e9af98616774db4.
We were seeing some test failures in SPECINT2006 due to this change. Reverting
to investigate.
The tile directive is in OpenMP's Technical Report 8 and foreseeably will be part of the upcoming OpenMP 5.1 standard.
This implementation is based on an AST transformation providing a de-sugared loop nest. This makes it simple to forward the de-sugared transformation to loop associated directives taking the tiled loops. In contrast to other loop associated directives, the OMPTileDirective does not use CapturedStmts. Letting loop associated directives consume loops from different capture context would be difficult.
A significant amount of code generation logic is taking place in the Sema class. Eventually, I would prefer if these would move into the CodeGen component such that we could make use of the OpenMPIRBuilder, together with flang. Only expressions converting between the language's iteration variable and the logical iteration space need to take place in the semantic analyzer: Getting the of iterations (e.g. the overload resolution of `std::distance`) and converting the logical iteration number to the iteration variable (e.g. overload resolution of `iteration + .omp.iv`). In clang, only CXXForRangeStmt is also represented by its de-sugared components. However, OpenMP loop are not defined as syntatic sugar. Starting with an AST-based approach allows us to gradually move generated AST statements into CodeGen, instead all at once.
I would also like to refactor `checkOpenMPLoop` into its functionalities in a follow-up. In this patch it is used twice. Once for checking proper nesting and emitting diagnostics, and additionally for deriving the logical iteration space per-loop (instead of for the loop nest).
Differential Revision: https://reviews.llvm.org/D76342
Similar to D96622, we're better off just promoting uaddsat(x,y) -> umin(add(x,y),c) instead of trying to perform a shifted uaddsat.
I initially tried to just use shifted promotion in cases where we didn't have a legal/custom umin - but we don't appear to have any targets that have uaddsat but not umin, so imo we're better off always using the umin and avoid an untested shifted uaddsat code path.
Differential Revision: https://reviews.llvm.org/D96767
fde24661718c7812a20a10e518cd853e8e060107 added support for
scalable vectors to matchUnaryPredicate by handling SPLAT_VECTOR in
addition to BUILD_VECTOR. This was used to enabled UDIV/SDIV/UREM/SREM
by constant expansion in BuildUDIV/BuildSDIV in TargetLowering.cpp
The caller there expects to call getBuildVector from the match factors.
This leads to a crash right now if there is a SPLAT_VECTOR of
fixed vectors since the number of vectors won't match the number
of elements.
To fix this, this patch updates the callers to check the opcode
instead of whether the type is fixed or scalable. This assumes
that only 3 opcodes are handled by matchUnaryPredicate so
I've added an assertion to the final else to check that opcode.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D96174
ICMP & SELECT patterns extracting the sign of a value can be simplified
to OR & ASR (see https://alive2.llvm.org/ce/z/Xx4iZ0).
This does not save any instructions in IR, but it is profitable on
AArch64, because we need at least 2 extra instructions to materialize 1
and -1 for the SELECT.
The improvements result in ~5% speedups on loops of the form
static int sign_of(int x) {
if (x < 0) return -1;
return 1;
}
void foo(const int *x, int *res, int cnt) {
for (int i=0;i<cnt;i++)
res[i] = sign_of(x[i]);
}
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D96596
Also don't call function to update the call graph if there are no
clones. The function will fail.
rdar://74277860
Differential Revision: https://reviews.llvm.org/D96620
From what I can tell, a writeback is unpredictable with LR for both
loads and stores. This changes the operand from a gprnopc to a rGPR in
both cases (which I believe is essentially a NFC due to the tied-def
already being a rGPR.)
Differential Revision: https://reviews.llvm.org/D96723
In a future commit, soft clauses will be hinted with kill instructions
rather than forced together with bundles. Look for kills that look
like this, and erase them. I'm not sure if the check for specific uses
is worthwhile, or if it would be better to just unconditionally erase
kills.
This reduces test churn in a future patch.
Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))
Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles.
Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle.
This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise.
Differential Revision: https://reviews.llvm.org/D96345
This was allowing debug instructions to break the bundling, which
would change scheduling behavior. Bundle debug info / kills inside
the bundle. This seems to work OK, although the asm printer doesn't
understand these in a bundle. This implicitly expects the memory
legalizer to unbundle. It would probably be slightly nicer to move
these after.
Rewrite the loop to be clearer and make sure we don't end a bundle on
a meta instruction, only allow them in between other valid bundle
instructions.
This is a split patch of D96644.
Explicitly pass both `InnerLoop` and `OuterLoop` to function `processLoop` to remove the need to swap elements in loop list and allow making loop list an `ArrayRef`.
Also, fix inconsistent spellings of `OuterLoopId` and `Inner Loop Id` in debug log.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D96650
When a literal that cannot fit in the immediate form of the fmov instruction
is used to initialise an SVE vector, an extra unnecessary fmov is currently
generated. This patch adds an extra codegen pattern preventing the extra
instruction from being generated.
Differential Revision: https://reviews.llvm.org/D96700
Co-Authored-By: Paul Walker <paul.walker@arm.com>
This patch enables scalable vectorization of loops with integer/fast reductions, e.g:
```
unsigned sum = 0;
for (int i = 0; i < n; ++i) {
sum += a[i];
}
```
A new TTI interface, isLegalToVectorizeReduction, has been added to prevent
reductions which are not supported for scalable types from vectorizing.
If the reduction is not supported for a given scalable VF,
computeFeasibleMaxVF will fall back to using fixed-width vectorization.
Reviewed By: david-arm, fhahn, dmgreen
Differential Revision: https://reviews.llvm.org/D95245
These directives force the associated address to be interpreted as a
function or data respectively. CODE is the default when not specified.
Differential Revision: https://reviews.llvm.org/D96712
Reviewed by: MaskRay
Non-splatted non-integer build_vector nodes were mistakenly being
lowered as VID expressions, which should not happen. VID can only be
used to select integer build_vector nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96718
The patterns mostly follow the scalar counterparts, save for some extra
optimizations to match the vector/scalar forms.
The patch adds a DAGCombine for ISD::FCOPYSIGN to try and reorder
ISD::FNEG around any ISD::FP_EXTEND or ISD::FP_TRUNC of the second
operand. This helps us achieve better codegen to match vfsgnjn.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96028
This patch changes costAndCollectOperands to use InstructionCost for
accumulated cost values.
isHighCostExpansion will return true if the cost has exceeded the budget.
Reviewed By: CarolineConcatto, ctetreau
Differential Revision: https://reviews.llvm.org/D92238