All other floating point math optimization related attribute are merged
in a conservative way during function inlining. This commit adds the
merge rule for the 'no-signed-zeros-fp-math' attribute.
Differential Revision: https://reviews.llvm.org/D81714
Some sequences of optimizations can generate call sites which may never be
executed during runtime, and through constant propagation result in dynamic
allocas being converted to static allocas with very large allocation amounts.
The inliner tries to move these to the caller's entry block, resulting in the
stack limits being reached/bypassed. Avoid inlining functions if this would
result.
The threshold of 64k currently doesn't get triggered on the test suite with an
-Os LTO build on arm64, care should be taken in changing this in future to avoid
needlessly pessimising inlining behaviour.
Differential Revision: https://reviews.llvm.org/D81765
The 'Descriptor' field of .debug_gnu_pubnames and .debug_gnu_pubtypes
section should be 1-byte rather than 4-byte. This patch helps resolve
this issue.
This patch enables printing of constants to see which instructions were
constant-folded. Needed for tests and better visiual analysis of
inliner's work.
Reviewers: apilipenko, mtrofin, davidxl, fedor.sergeev
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D81024
Summary:
The printer seems to intend to not print the trailing comma but has a
copy-paste error for the last value in the escape, and the parser
enforces having no trailing comma, but somehow a test was never included
to actually confirm it.
Reviewers: thegameg, arsenm
Reviewed By: thegameg, arsenm
Subscribers: wdng, arsenm, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82478
Provided test case crashes otherwise.
If NewTy is already DL.getIntPtrType(NewTy),
CreateBitCast() won't actually create any bitcast,
so we are better off just doing the general thing.
This makes it usable from outside of SCEV,
while previously it was internal to the ScalarEvolution.cpp
In particular, i want to use it in an WIP alloca promotion helper pass,
to analyze if some SCEV is a multiple of some other SCEV.
The dependency was introduced in
5134020ea62d1e1e125fdac48d251a26b80e9781. The only functional change
from this removal would be the new PM interface for the two codegen
passes. This is not necessary since we don't have codegen pipeline using
new PM yet. This removal is to break the potential circular dependency between
Passes and CodeGen once the codegen begins to gain new PM support.
Patch has a memory leak bug that broke the ASan buildbots. More info
available at: https://reviews.llvm.org/D80330
This reverts commit b5740105d270a2d76da8812cafb63e4b799ada73.
The ARM ARM considers p10/p11 valid arguments for MCR/MRC instructions.
MRC instructions with p10 arguments are also used in kernel code which
is shared for different architectures. Turn usage of p10/p11 to warnings
for ARMv7/ARMv8-M.
Reviewers: rengolin, olista01, t.p.northover, efriedma, psmith, simon_tatham
Reviewed By: simon_tatham
Subscribers: hiraditya, danielkiss, jcai19, tpimh, nickdesaulniers, peter.smith, javed.absar, kristof.beyls, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59733
This class allows to see the inliner's decisions for better
optimization verifications and tests. To use, use flag
"-passes="print<inline-cost>"".
This is the second attempt to integrate the patch.
The problem from the first try has been discussed and
fixed in D82205.
Reviewers: apilipenko, mtrofin, davidxl, fedor.sergeev
Reviewed By: mtrofin
Differential revision: https://reviews.llvm.org/D81743
This patch implements builtins for the following prototypes:
unsigned long long __builtin_cntlzdm (unsigned long long, unsigned long long)
unsigned long long __builtin_cnttzdm (unsigned long long, unsigned long long)
vector unsigned long long vec_cntlzm (vector unsigned long long, vector unsigned long long)
vector unsigned long long vec_cnttzm (vector unsigned long long, vector unsigned long long)
Differential Revision: https://reviews.llvm.org/D80941
Summary:
Get back `const` partially lost in one of recent changes.
Additionally specify explicit qualifiers in few places.
Reviewers: samparker
Reviewed By: samparker
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82383
For the upcoming changes, we need to have an ability to dump
InlineCostCallAnalyzer info in non-debug builds as well.
Reviewed-By: mtrofin
Differential Revision: https://reviews.llvm.org/D82205
Summary:
Add the --hot-func-list feature to llvm-profdata show for sample profiles. This feature prints a list of hot functions whose max sample count are above the 99% threshold, with their numbers of total samples, total samples percentage, max samples, entry samples, and their function names.
Test Plan:
Reviewers: wenlei, hoyFB
Reviewed By: wenlei, hoyFB
Subscribers: hoyFB, wenlei, weihe, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82355
This diff merges help message tests for llvm-objcopy, llvm-strip and
llvm-install-name-tool.
Patch by Sameer Arora!
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D82012
D68667 introduced a tighter limit to the number of GEPs to simplify
together. The limit was based on the vector element size of the pointer,
but the pointers themselves are not actually put in vectors.
IIUC we try to vectorize the index computations here, so we should base
the limit on the vector element size of the computation of the index.
This restores the test regression on AArch64 and also restores the
vectorization for a important pattern in SPEC2006/464.h264ref on
AArch64 (@test_i16_extend). We get a large benefit from doing a single
load up front and then processing the index computations in vectors.
Note that we could probably even further improve the AArch64 codegen, if
we would do zexts to i32 instead of i64 for the sub operands and then do
a single vector sext on the result of the subtractions. AArch64 provides
dedicated vector instructions to do so. Sketch of proof in Alive:
https://alive2.llvm.org/ce/z/A4xYAB
Reviewers: craig.topper, RKSimon, xbolva00, ABataev, spatel
Reviewed By: ABataev, spatel
Differential Revision: https://reviews.llvm.org/D82418
This diff updates the help messages for llvm-objcopy, llvm-strip and
llvm-install-name-tool.
Patch by Sameer Arora!
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D81907
Currently module asm ends up emitted twice and at the wrong place in the PTX.
This patch moves module asm generation into emitStartOfAsmFile() which puts at
the correct location in the generated PTX.
Differential Revision: https://reviews.llvm.org/D82280
Eric Cristopher asked me about possibly disabling some passes at
-O1/Og. Figured a good first step was to test all the pipelines.
They all appear to be the same for now. Hoping we can use FileCheck
prefixes for differences to avoid repeating the contents 3 times.
Summary:
In D52514 I had fixed a bug with WPD after indirect call promotion, by
checking that a type test being analyzed dominates potential virtual
calls. With that fix I included a small effiency enhancement to avoid
processing a devirt candidate multiple times (when there are multiple
type tests). This latter change wasn't in response to any measured
efficiency issues, it was merely theoretical. Unfortuantely, it turns
out to limit optimization opportunities after inlining.
Specifically, consider code that looks like:
class A {
virtual void foo();
};
class B : public A {
void foo();
}
void callee(A *a) {
a->foo(); // Call 1
}
void caller(B *b) {
b->foo(); // Call 2
callee(b);
}
After inlining callee into caller, because of the existing call to
b->foo() in caller there will be 2 type tests in caller for the vtable
pointer of b: the original type test against B from Call 2, and the
inlined type test against A from Call 1. If the code was compiled with
-fstrict-vtable-pointers, then after optimization WPD will see that
both type tests are associated with the inlined virtual Call 1.
With my earlier change to only process a virtual call against one type
test, we may only consider virtual Call 1 against the base class A type
test, which can't be devirtualized. With my change here to remove this
restriction, it also gets considered for the type test against the
derived class B type test, where it can be devirtualized.
Note that if caller didn't include it's own earlier virtual call
b->foo() we will not be able to devirtualize after inlining callee even
after this fix, since there would not be a type test against B in the
IR. As a future enhancement we can consider inserting type tests at call
sites that pass pointers to classes with virtual calls, to enable
context-sensitive devirtualization after inlining.
Reviewers: pcc, vitalybuka, evgeny777
Subscribers: Prazek, hiraditya, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79235
This patch removes the PROC macro in favor of CPUKind enum and a
table that contains information about CPUs.
The current information in the table is the CPU name, CPUKind enum
value, key feature for target multiversioning, and Is64Bit capable.
For the strings that are aliases, I've duplicated the information
in the table. This means there are more rows in the table than
CPUKind enums.
This replaces multiple StringSwitch's with loops through the table.
They are linear searches due to the table being more logically
ordered than alphabetical. The StringSwitch's would have also been
linear. I've used StringLiteral on the strings in the table so we
can quickly check the length while searching.
I contemplated having a CPUKind for each string so there was a 1:1
mapping, but didn't want to spread more names to the places that
use the enum.
My ultimate goal here is to store the features for each CPU as a
bitset within the table. Hoping to use constexpr to make this
composable so we can group features and inherit them. After the
table lookup we can turn the bitset into a list of strings for the
frontend. The current switch we have for selecting features for
CPUs has become difficult to maintain while trying to express
inheritance relationships.
Differential Revision: https://reviews.llvm.org/D82414
This change includes the following:
- Add additional information in the relevant table-gen files to encode
the necessary information to automatically parse the argument into a
CompilerInvocation instance and to generate the appropriate command
line argument from a CompilerInvocation instance.
- Extend OptParserEmitter to emit the necessary macro tables as well as
constant tables to support parsing and generating command line
arguments for options that provide the necessary information.
- Port some options to use this new system for parsing and generating
command line arguments.
Differential Revision: https://reviews.llvm.org/D79796
This reverts commit 62841415e685fe8857f75edd1fa92b7d1d08b875.
The commit is a misnomer, and it "made its way in" unintentionally,
through a patch that had it as a depdendency. The change itself ended up
to be just a comment update, but the description is completely wrong.