When trying to return a type such as <vscale x 1 x i32> from a
function we crash in DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR
when attempting to get the fixed number of elements in the vector.
For the simple case we are dealing with, i.e. extracting
<vscale x 1 x i32> from index 0 of input vector <vscale x 4 x i32>
we can simply rely upon existing code that just returns the input.
Differential Revision: https://reviews.llvm.org/D102605
Haswell, Excavator and early Ryzen all have slower 256-bit non-uniform vector shifts (confirmed on AMDSoG/Agner/instlatx64 and llvm models) - so bump the worst case costs accordingly.
Noticed while investigating PR50364
This adds custom lowering for the MLOAD and MSTORE ISD nodes when
passed fixed length vectors in SVE. This is done by converting the
vectors to VLA vectors and using the VLA code generation.
Fixed length extending loads and truncating stores currently produce
correct code, but do not use the built in extend/truncate in the
load and store instructions. This will be fixed in a future patch.
Differential Revision: https://reviews.llvm.org/D101834
This will allow to use llvm-strip with file names that begin with dashes.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D102825
This patch prepares llvm-objcopy to move its implementation
into a separate library. To make it possible it is necessary
to minimize internal dependencies.
Differential Revision: https://reviews.llvm.org/D99055
When attempting to return something like a <vscale x 1 x i32>
type from a function we end up trying to widen the vector by
inserting a <vscale x 1 x i32> subvector into an undefined
<vscale x 4 x i32> vector. However, during legalisation we
then attempt to widen the INSERT_SUBVECTOR operands and hit
an error in WidenVectorOperand.
This patch adds a new WidenVecOp_INSERT_SUBVECTOR function
that currently only supports inserting subvectors into undefined
vectors.
Differential Revision: https://reviews.llvm.org/D102501
We have been handling filters and landingpads incorrectly all along. We
pass clauses' (catches') types to `__cxa_find_matching_catch` in JS glue
code, which returns the thrown pointer and sets the selector using
`setTempRet0()`.
We apparently have been doing the same for filters' (exception specs')
types; we pass them to `__cxa_find_matching_catch` just the same way as
clauses. And `__cxa_find_matching_catch` treats all given types as
clauses. So it is a little surprising; maybe we intended to do something
from the JS side and didn't end up doing?
So anyway, I don't think supporting exception specs in Emscripten EH is
a priority, but this can actually cause incorrect results for normal
catches when functions are inlined and the inlined spec type has a
parent-child relationship with the catch's type.
---
The below is an example of a bug that can happen when inlining and class
hierarchy is mixed. If you are busy you can skip this part:
```
struct A {};
struct B : A {};
void bar() throw (B) { throw B(); }
void foo() {
try {
bar();
} catch (A &) {
fputs ("Expected result\n", stdout);
}
}
```
In the unoptimized code, `bar`'s landingpad will have a filter for `B`
and `foo`'s landingpad will have a clause for `A`. But when `bar` is
inlined into `foo`, `foo`'s landingpad has both a filter for `B` and a
clause for `A`, and it passes the both types to
`__cxa_find_matching_catch`:
```
__cxa_find_matching_catch(typeinfo for B, typeinfo for A)
```
`__cxa_find_matching_catch` thinks both are clauses, and looks at the
first type `B`, which belongs to a filter. And the thrown type is `B`,
so it thinks the first type `B` is caught. But this makes it return an
incorrect selector, because it is supposed to catch the exception using
the second type `A`, which is a parent of `B`. As a result, the `foo` in
the example program above does not print "Expected result" but just
throws the exception to the caller. (This wouldn't have happened if `A`
and `B` are completely disjoint types, such as `float` and `int`)
Fixes https://bugs.llvm.org/show_bug.cgi?id=50357.
Reviewed By: dschuff, kripken
Differential Revision: https://reviews.llvm.org/D102795
llvm::Any::TypeId::Id relies on the uniqueness of the address of a static
variable defined in a template function. hidden visibility implies vague linkage
for that variable, which does not guarantee the uniqueness of the address across
a binary and a shared library. This totally breaks the implementation of
llvm::Any.
Ideally, setting visibility to llvm::Any::TypeId::Id should be enough,
unfortunately this doesn't work as expected and we lack time (before 12.0.1
release) to understand why setting the visibility to llvm::Any does work.
See https://gcc.gnu.org/wiki/Visibility and
https://gcc.gnu.org/onlinedocs/gcc/Vague-Linkage.html
for more information on that topic.
Differential Revision: https://reviews.llvm.org/D101972
bswap.v2i16 + sitofp in LLVM IR generate a sequence of:
- REV32 + USHR for bswap.v2i16
- SHL + SSHR + SCVTF for sext to v2i32 and scvt
The shift instructions are excessive as noted in PR24820, and they can
be optimized to just SSHR.
Differential Revision: https://reviews.llvm.org/D102333
In LAM model X86_64 will use bits 57-62 (of 0-63) as HWASAN tag.
So here we make sure the tag shift position and tag mask is correct for x86-64.
Differential Revision: https://reviews.llvm.org/D102472
Currently 1 byte global object has a ridiculous 63 bytes redzone.
This patch reduces the redzone size to be less than 32 if the size of global object is less than or equal to half of 32 (the minimal size of redzone).
A 12 bytes object has a 20 bytes redzone, a 20 bytes object has a 44 bytes redzone.
Reviewed By: MaskRay, #sanitizers, vitalybuka
Differential Revision: https://reviews.llvm.org/D102469
To track security issues, we're starting with the chromium bug tracker
(using the llvm project there).
We considered using Github Security Advisories. However, they are
currently intended as a way for project owners to publicize their
security advisories, and aren't well-suited to reporting issues.
This also moves the issue-reporting paragraph to the beginning of the
document, in part to make it more discoverable, in part to allow the
anchor-linking to actually display the paragraph at the top of the page.
Note that this doesn't update the concrete list of security-sensitive
areas, which is still an open item. When we do, we may want to move the
list of security-sensitive areas next to the issue-reporting paragraph
as well, as it seems like relevant information needed in the reporting
process.
Finally, when describing the discission medium, this splits the topics
discussed into two: the concrete security issues, discussed in the
issue tracker, and the logistics of the group, in our mailing list,
as patches on public lists, and in the monthly sync-up call.
While there, add a SECURITY.md page linking to the relevant paragraph.
Differential Revision: https://reviews.llvm.org/D100873
The ROR instruction can only handle immediates between 1 and 31. The
would-be encoding for ROR #0 is actually the RRX instruction.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D102455
This change tries to fix a place missing `moveAndDanglePseudoProbes `. In FoldValueComparisonIntoPredecessors, it folds the BB into predecessors and then marked the BB unreachable. However, the original logic from the BB is still alive, deleting the probe will mislead the SampleLoader mark it as zero count sample.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D102721
FullTy is only necessary when we need to figure out what type an
instruction works with given a pointer's pointee type. However, we just
end up using the value operand's type, so FullTy isn't necessary.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102788
No verifier changes needed, the verifier currently doesn't check that
the pointer operand's pointee type matches the GEP type. There is a
similar check in GetElementPtrInst::Create() though.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102744
Summary:
Currently, only `OptimizationRemarks` can be emitted using a Function.
Add constructors to allow this for `OptimizationRemarksAnalysis` and
`OptimizationRemarkMissed` as well.
Reviewed By: jdoerfert thegameg
Differential Revision: https://reviews.llvm.org/D102784
This is another FMF gap exposed by D90901, but I don't see a way
to show the difference in a regression test as with:
f66ba4c
6025663
We will see an asm difference if we add a test as part of D90901.
Similar to 8854b27 -
All of the CHECK lines should be identical to before,
but without any of the x86-specific calls that were
replaced with generic FMA long ago.
The file still has value because it shows a miscompile
as demonstrated in D90901, but we probably need to
add tests with FMF to make that explicit without
losing coverage.
For source-based coverage, the frontend sets the counter IDs and the
constraints of counter IDs is not defined. For e.g., the Rust frontend
until recently had a reserved counter #0
(https://github.com/rust-lang/rust/pull/83774). Rust coverage
instrumentation also creates counters on edges in addition to basic
blocks. Some functions may have more counters than regions.
This breaks an assumption in CoverageMapping.cpp where the number of
counters in a function is assumed to be bounded by the number of
regions:
Counts.assign(Record.MappingRegions.size(), 0);
This assumption causes CounterMappingContext::evaluate() to fail since
there are not enough counter values created in the above call to
`Counts.assign`. Consequently, some uncovered functions are not
reported in coverage reports.
This change walks a Function's CoverageMappingRecord to find the maximum
counter ID, and uses it to initialize the counter array when instrprof
records are missing for a function in sparse profiles.
Differential Revision: https://reviews.llvm.org/D101780
In order to create the code regions for llvm-mca to analyze, llvm-mca creates an
AsmCodeRegionGenerator and calls AsmCodeRegionGenerator::parseCodeRegions().
Within this function, both an MCAsmParser and MCTargetAsmParser are created so
that MCAsmParser::Run() can be used to create the code regions for us.
These parser classes were created for llvm-mc so they are designed to emit code
with an MCStreamer and MCTargetStreamer that are expected to be setup and passed
into the MCAsmParser constructor. Because llvm-mca doesn’t want to emit any
code, an MCStreamerWrapper class gets created instead and passed into the
MCAsmParser constructor. This wrapper inherits from MCStreamer and overrides
many of the emit methods to just do nothing. The exception is the
emitInstruction() method which calls Regions.addInstruction(Inst).
This works well and allows llvm-mca to utilize llvm-mc’s MCAsmParser to build
our code regions, however there are a few directives which rely on the
MCTargetStreamer. llvm-mc assumes that the MCStreamer that gets passed into the
MCAsmParser’s constructor has a valid pointer to an MCTargetStreamer. Because
llvm-mca doesn’t setup an MCTargetStreamer, when the parser encounters one of
those directives, a segfault will occur.
In x86, each one of these 7 directives will cause this segfault if they exist in
the input assembly to llvm-mca:
.cv_fpo_proc
.cv_fpo_setframe
.cv_fpo_pushreg
.cv_fpo_stackalloc
.cv_fpo_stackalign
.cv_fpo_endprologue
.cv_fpo_endproc
I haven’t looked at other targets, but I wouldn’t be surprised if some of the
other ones also have certain directives which could result in this same
segfault.
My proposed solution is to simply initialize an MCTargetStreamer after we
initialize the MCStreamerWrapper. The MCTargetStreamer requires an ostream
object, but we don’t actually want any of these directives to be emitted
anywhere, so I use an ostream created with the nulls() function. Since this
needs to happen after the MCStreamerWrapper has been initialized, it needs to
happen within the AsmCodeRegionGenerator::parseCodeRegions() function. The
MCTargetStreamer also needs an MCInstPrinter which is easiest to initialize
within the main() function of llvm-mca. So this MCInstPrinter gets constructed
within main() then passed into the parseCodeRegions() function as a parameter.
(If you feel like it would be appropriate and possible to create the
MCInstPrinter within the parseCodeRegions() function, then feel free to modify
my solution. That would stop us from having to pass it into the function and
would limit its scope / lifetime.)
My solution stops the segfault from happening and still passes all of the
current (expected) llvm-mca tests. I also added a new test for x86 that checks
for this segfault on an input that includes one of the .cv_fpo directives (this
test fails without my solution, but passes with it).
As far as I can tell, all of the functions that I modified are only called from
within llvm-mca so there shouldn’t be any worries about breaking other tools.
Differential Revision: https://reviews.llvm.org/D102709
Turns out simplifyLoopIVs sometimes returns a non-dead instruction in it's DeadInsts out param. I had done a bit of NFC cleanup which was only NFC if simplifyLoopIVs obeyed it's documentation. I'm simplfy dropping that part of the change.
Commit message from try 3:
Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduce the bug. Oops. :)
Original commit message:
The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case.
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
Differential Revision: https://reviews.llvm.org/D102511