Don't shrink VOP3 instructions if there are any uses of a carry-out
operand, because the shrunken form of the instruction would write the
carry-out to vcc instead of to a virtual register.
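A minimal sketch of the guard this implies, with operand and helper names assumed rather than taken from the actual patch:

```
// Sketch: skip shrinking when the VOP3 carry-out is used, since the VOP2
// encoding implicitly writes the carry to vcc instead of a virtual register.
if (const MachineOperand *SDst =
        TII->getNamedOperand(MI, AMDGPU::OpName::sdst)) {
  if (SDst->getReg().isVirtual() && !MRI.use_empty(SDst->getReg()))
    continue; // keep the VOP3 form so the carry-out users stay correct
}
```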
Differential Revision: https://reviews.llvm.org/D100760
This patch is related to https://reviews.llvm.org/D100032, which defines
some illegal types and operations for x86_amx. There are no arguments,
arrays, pointers, vectors or constants of x86_amx.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D100472
Previously we would use the pointee type to determine what to cast the
result of constant folding a load to. To aid with opaque pointer types,
we should explicitly pass the type of the load rather than looking at
pointee types.
ConstantFoldLoadThroughBitcast() converts the const prop'd value to the
proper load type (e.g. [1 x i32] -> i32). Instead of calling this in
every intermediate step like bitcasts, we only call this when we
actually see the global initializer value.
In some existing uses of this API, we don't know the exact type we're
loading from immediately (e.g. first we visit a bitcast, then we visit
the load using the bitcast). In those cases we have to manually call
ConstantFoldLoadThroughBitcast() when simplifying the load to make sure
that we cast to the proper type.
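A hedged sketch of the resulting calling convention (simplified shapes; tryFoldLoad is a hypothetical wrapper, while the folding entry point is real):

```
#include "llvm/Analysis/ConstantFolding.h"
using namespace llvm;

// Sketch: the load's result type is passed in explicitly instead of being
// derived from the pointer operand's pointee type.
Constant *tryFoldLoad(Constant *PtrC, Type *LoadTy, const DataLayout &DL) {
  if (Constant *C = ConstantFoldLoadFromConstPtr(PtrC, LoadTy, DL))
    return C;
  return nullptr;
}
```

When the exact load type is only discovered later (e.g. a bitcast visited before the load), the caller coerces the folded constant itself via ConstantFoldLoadThroughBitcast(), e.g. [1 x i32] -> i32.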
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D100718
This patch is the last one in the backend to support the fp128 type in
pre-POWER9 subtargets with VSX, removing the temporary option and
updating the remaining tests.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92374
This patch relaxes the requirement that the STEP_VECTOR step constant
must be of a type at least as large as the vector element type. That
requirement did not permit its use on targets which have legal vector
element types larger than the largest legal scalar type, such as i64
vectors on RV32. As such, the requirement has been loosened so that the
step operand may be of any scalar type, so long as the constant
immediate is non-negative and the value fits inside the vector element
type.
This limits combining optimizations in certain circumstances but in
practice it's unlikely to be a hindrance.
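A rough sketch of the loosened legality check (helper shape assumed, not the patch's exact code):

```
// Sketch: the step operand may be any scalar type, provided the immediate
// is non-negative and representable in the vector element type.
bool isLegalStepVectorImm(const APInt &Step, EVT EltVT) {
  return Step.isNonNegative() &&
         Step.getActiveBits() <= EltVT.getScalarSizeInBits();
}
```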
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D100660
This patch adds basic CSKY branch instructions and symbol address series instructions.
These two kinds of instructions are related to each other, and they involve a fair amount of work on fixups.
For now, the basic instructions are enabled, except for disassembler support.
We support generating basic codegen assembly first and defer the disassembler work until later.
Differential Revision: https://reviews.llvm.org/D95029
This patch adds basic CSKY integer instructions, except for the branch series such as bsr and br.
It mainly includes basic ALU, load & store, compare and data move instructions.
The branch series instructions need to handle complex symbol operands, so they will follow in a later patch.
Differential Revision: https://reviews.llvm.org/D94007
This basic parser will handle basic instructions with register or immediate operands.
With the addition of CSKYInstPrinter, we can now make use of lit tests.
Differential Revision: https://reviews.llvm.org/D93798
It's necessary to calculate the correct instruction size because
PseudoVRELOAD and PseudoSPILL will be expanded into multiple
instructions.
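Roughly, the idea is as follows (helper names hypothetical; the real code keys off the concrete pseudo opcodes):

```
// Sketch: report the post-expansion size for the spill/reload pseudos,
// since each expands into several 4-byte instructions.
unsigned getInstSizeInBytes(const MachineInstr &MI) {
  if (isVectorSpillOrReloadPseudo(MI.getOpcode())) // hypothetical predicate
    return getExpandedInstrCount(MI.getOpcode()) * 4; // hypothetical helper
  return MI.getDesc().getSize();
}
```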
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D100702
This reverts commit 328377307ad2da961b3be0f2bbf1814a6f1f4ed3.
This commit causes buildbot failures because some clang tests were not updated.
Temporarily reverting to fix the clang tests.
Beginners who want to try a new experimental target might not be aware of this variable.
Although this variable is mentioned in the Writing a Backend documentation, it becomes easier to find when listed in the cmake.rst doc where most variables are listed.
Reviewed By: myhsu
Differential Revision: https://reviews.llvm.org/D100729
The cost of a spill location is computed based on the relative branch
frequency of the block where the corresponding spill/reload/copy is located.
While the number itself highly depends on the incoming IR,
the total cost can be used when making changes in RA.
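Conceptually, the accounting looks something like this (a sketch with assumed plumbing, not the patch's exact code):

```
// Sketch: sum the relative block frequency of every spill, reload, and
// copy the register allocator left behind.
double SpillCost = 0.0, ReloadCost = 0.0, CopyCost = 0.0;
for (const MachineBasicBlock &MBB : MF) {
  double Freq = MBFI.getBlockFreqRelativeToEntryBlock(&MBB);
  for (const MachineInstr &MI : MBB) {
    int FI;
    if (TII->isStoreToStackSlot(MI, FI))
      SpillCost += Freq;
    else if (TII->isLoadFromStackSlot(MI, FI))
      ReloadCost += Freq;
    else if (MI.isCopy())
      CopyCost += Freq;
  }
}
```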
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100020
Pseudo probes are currently given a slot index like other regular instructions. This affects register pressure and lifetime weight computation because of the enlarged lifetime length with pseudo probe instructions. As a consequence, a program could get different code generated w/ and w/o pseudo probes. I'm closing the gap by excluding pseudo probes from the slot index and downstream register allocation related passes.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D100334
Flipping the default value of SkipPseudoOp to true for those MIR APIs to favor maximum performance. Note that certain spots like branch folding and MIR if-conversion are disabled for better counts quality. For these two optimizations, this is a no-diff change.
The counts quality with SPEC2017 before/after this change is unchanged.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D100332
This patch improves https://reviews.llvm.org/D76971 (Deduce attributes for aligned_alloc in InstCombine) and implements the "TODO" item mentioned in the review of that patch.
> The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a multiple of alignment.
Currently, we simply bail out if we see a non-constant size - change that.
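As a concrete illustration (hypothetical user code, not from the patch): since aligned_alloc requires the size to be a multiple of the alignment, a constant alignment is enough to annotate the result even when the size is unknown:

```
#include <cstdlib>

// n is not a constant, but aligned_alloc's contract still lets InstCombine
// mark the returned pointer as 128-byte aligned instead of bailing out.
void *make_buffer(std::size_t n) {
  return std::aligned_alloc(128, n);
}
```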
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D100785
M68kDisassembler should list M68kDesc as a direct library dependency
since it uses logic related to code beads. Otherwise the build will
fail when building LLVM libraries as shared objects (building LLVM
libraries statically won't have this problem, though).
In an environment that reuses compiler instances for multiple compilations, this
omission results in non-deterministic assembly output (names of the
auto-generated labels) if the order or full set of Modules compiled
varies.
Differential Revision: https://reviews.llvm.org/D100797
This also includes PC-relative addresses since they are still
referenced as absolute addresses in assembly and converted to
relative addresses by the assembler.
This changes, for example:
- `bra #-2` -> `bra $100`
- `jsr #16` -> `jsr $10`
Differential Revision: https://reviews.llvm.org/D100697
The unnamedaddr property of a function is lost when using
`-fwhole-program-vtables` and ThinLTO, which causes a size increase under the
linker's safe ICF mode.
The size increase of Chrome on Linux when switching from all ICF to safe ICF
drops from 5 MB to 3 MB after this change, and from 6 MB to 4 MB on Windows.
Here is a repro:
```
# a.h
struct A {
  virtual int f();
  virtual int g();
};

# a.cpp
#include "a.h"
int A::f() { return 10; }
int A::g() { return 10; }

# main.cpp
#include "a.h"
int g(A* a) {
  return a->f();
}
int main(int argv, char** args) {
  A a;
  return g(&a);
}

$ clang++ -O2 -ffunction-sections -flto=thin -fwhole-program-vtables -fsplit-lto-unit -c main.cpp -o main.o && clang++ -Wl,--icf=safe -fuse-ld=lld -flto=thin main.o -o a.out && llvm-readobj -t a.out | grep -A 1 -e _ZN1A1fEv -e _ZN1A1gEv
Name: _ZN1A1fEv (480)
Value: 0x201830
--
Name: _ZN1A1gEv (490)
Value: 0x201840
```
Differential Revision: https://reviews.llvm.org/D100498
SLP supports perfect diamond matching for the vectorized tree entries,
but it does not support such matching for gathered entries, nor does it
support non-perfect (shuffled) matching with 1 or 2 tree entries. This
patch adds support for this matching to improve the cost of the
vectorized tree.
Differential Revision: https://reviews.llvm.org/D100495
When the ProcResGroup has BufferSize=0,
1. if there is a subunit in the list of write resources for the
scheduling class, do not attempt to schedule the ProcResGroup.
2. if there is not a subunit in the list of write resources for the
scheduling class, choose a subunit to use instead of the ProcResGroup.
3. having both the ProcResGroup and any of its subunits in the resources
implied by an InstRW is not supported.
This can be used to model parallel uses from a pool of resources.
Differential Revision: https://reviews.llvm.org/D98976
This is used to model structural hazards on FP issue, where some
instructions take up two issue slots and others one, as well as
similar structural hazards on load issue, where some
instructions take up two load lanes and others one.
Differential Revision: https://reviews.llvm.org/D98977
This is mostly stylistic cleanup after D100226, but not entirely. When skimming the code, I found one case where we weren't accounting for attributes on the callsite at all. I'm also suspicious we had some latent bugs related to operand bundles (which are supposed to be able to *override* attributes on declarations), but I don't have concrete test cases for those, just suspicions.
Aside: The only case left in the file which directly checks attributes on the declaration is the norecurse logic. I left that because I didn't understand it; it looks obviously wrong, so I suspect I'm misinterpreting the intended semantics of the attribute.
Differential Revision: https://reviews.llvm.org/D100689
This change fixes a latent bug which was exposed by a change currently in review (https://reviews.llvm.org/D99802#2685032).
The story on this is a bit involved. Without this change, what ended up happening with the pending review was that we'd strip attributes off intrinsics, and then selectiondag would fail to lower the intrinsic. Why? Because the lowering of the intrinsic relies on the presence of the readonly attribute. We don't have a matcher to select the case where there's a glue node needed.
Now, on the surface, this still seems like a codegen bug. However, here it gets fun. I was unable to reproduce this with a standalone test at all, and was pretty much stuck until skatkov provided the critical detail. This reproduces only when RS4GC and codegen are run in the same process and context. Why? Because it turns out we can't roundtrip the stripped attribute through serialized IR!
We'll happily print out the missing attribute, but when we parse it back, the auto-upgrade logic has a side effect of blindly overwriting attributes on intrinsics with those specified in Intrinsics.td. This makes it impossible to exercise SelectionDAG from a standalone test case.
At this point, I decided to treat this as an RS4GC bug as a) we don't need to strip in this case, and b) I could write a test which shows the correct behavior to ensure this doesn't break again in the future.
As an aside, I'd originally set out to handle libfuncs too - since in theory they might have the same issues - but backed away quickly when I realized how the semantics of builtin, nobuiltin, and no-builtin-x all interacted. I'm utterly convinced that no part of the optimizer handles that correctly, and decided not to open that can of worms here.
We previously used splats instead of v128.const to materialize vector constants
because V8 did not support v128.const. Now that V8 supports v128.const, we can
use v128.const instead. Although this increases code size, it should also
increase performance (or at least require fewer engine-side optimizations), so
it is an appropriate change to make.
Differential Revision: https://reviews.llvm.org/D100716
This patch removes the -fixed-abi check for indirect calls
and also adds the queue-ptr, which is required for indirect calls to work.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D100633
Comparisons to zero or one after cset instructions can be safely
removed in examples like:
cset w9, eq           cset w9, eq
cmp  w9, #1   --->    <removed>
b.ne .L1              b.ne .L1

cset w9, eq           cset w9, eq
cmp  w9, #0   --->    <removed>
b.ne .L1              b.eq .L1
A peephole optimization has been added to detect suitable cases and
get rid of such comparisons.
Differential Revision: https://reviews.llvm.org/D98564
During store promotion, we check whether the pointer was captured
to exclude potential reads from other threads. However, we're only
interested in captures before or inside the loop. Check this using
PointerMayBeCapturedBefore against the loop header.
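Roughly (a sketch; the flag values and surrounding plumbing are assumed):

```
#include "llvm/Analysis/CaptureTracking.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Dominators.h"
using namespace llvm;

// Sketch: a capture inside the loop can reach the header again via the
// backedge, so one query against the header's first instruction covers
// both "before the loop" and "inside the loop".
bool mayBeVisibleToOtherThreads(const Value *Ptr, const Loop *L,
                                const DominatorTree *DT) {
  const Instruction *HeaderBegin = &*L->getHeader()->begin();
  return PointerMayBeCapturedBefore(Ptr, /*ReturnCaptures=*/false,
                                    /*StoreCaptures=*/true, HeaderBegin, DT);
}
```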
Differential Revision: https://reviews.llvm.org/D100706
As noted in the FIXME, there's a sort of agreement that any
extra bits stored will be 0.
The generated code is pretty terrible. I was really hoping we
could use a tail undisturbed trick, but tail undisturbed no
longer applies to masked destinations in the current draft
spec.
Fingers crossed that it isn't common to do this. I doubt IR
from clang or the vectorizer would ever create this kind of store.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D100618
This is a partial port of AArch64TargetLowering::LowerCTPOP.
This custom lowering tries to use NEON instructions to give a more efficient
CTPOP lowering when possible.
In the non-NEON/noimplicitfloat case, this should use the generic lowering
(see: https://godbolt.org/z/GcaPvWe4x). I think that's worth implementing after
implementing the widening code for s16/s8 though.
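For reference, the NEON trick counts bits per byte and then sums across the vector; for a 32-bit scalar the well-known sequence is along these lines (register choice illustrative):

fmov s0, w0
cnt v0.8b, v0.8b
addv b0, v0.8b
fmov w0, s0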
Differential Revision: https://reviews.llvm.org/D100399
These constraints are machine agnostic; there's no reason to handle
these per-arch. If arches don't support these constraints, then they
will fail elsewhere during instruction selection. We don't need virtual
calls to look these up; TargetLowering::getInlineAsmMemConstraint should
only be overridden by architectures with additional unique memory
constraints.
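The generic handling then looks roughly like this (a sketch of the base-class hook; "m", "o", and "X" are shown as plausible machine-agnostic examples):

```
// Sketch: machine-agnostic constraints are resolved once in the base
// class; targets override only for their own extra memory constraints.
virtual unsigned getInlineAsmMemConstraint(StringRef ConstraintCode) const {
  return StringSwitch<unsigned>(ConstraintCode)
      .Case("m", InlineAsm::Constraint_m)
      .Case("o", InlineAsm::Constraint_o)
      .Case("X", InlineAsm::Constraint_X)
      .Default(InlineAsm::Constraint_Unknown);
}
```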
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D100416