No verifier changes are needed; the verifier currently doesn't check that
the pointer operand's pointee type matches the GEP type. There is a
similar check in GetElementPtrInst::Create() though.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102744
Summary:
Currently, only `OptimizationRemark` can be emitted using a Function.
Add constructors to allow this for `OptimizationRemarkAnalysis` and
`OptimizationRemarkMissed` as well.
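For illustration, emitting a function-level remark with one of the new constructors could look like this sketch (the pass name, remark name, and helper are made up; the constructor shape is assumed to mirror the existing Function-based `OptimizationRemark` constructor):

```cpp
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Function.h"

using namespace llvm;

// Hypothetical helper: report that a whole function was skipped, without
// needing an Instruction or DebugLoc to anchor the remark on.
static void reportSkippedFunction(OptimizationRemarkEmitter &ORE, Function &F) {
  ORE.emit([&]() {
    // Assumed new constructor: (PassName, RemarkName, Function).
    return OptimizationRemarkMissed("my-pass", "FunctionSkipped", &F)
           << "skipped function " << ore::NV("Function", F.getName());
  });
}
```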
Reviewed By: jdoerfert, thegameg
Differential Revision: https://reviews.llvm.org/D102784
This is another FMF gap exposed by D90901, but I don't see a way
to show the difference in a regression test as with:
f66ba4c
6025663
We will see an asm difference if we add a test as part of D90901.
Similar to 8854b27 -
All of the CHECK lines should be identical to before,
but without any of the x86-specific calls that were
replaced with generic FMA long ago.
The file still has value because it shows a miscompile
as demonstrated in D90901, but we probably need to
add tests with FMF to make that explicit without
losing coverage.
For source-based coverage, the frontend sets the counter IDs and the
constraints on counter IDs are not defined. For example, the Rust frontend
until recently had a reserved counter #0
(https://github.com/rust-lang/rust/pull/83774). Rust coverage
instrumentation also creates counters on edges in addition to basic
blocks. Some functions may have more counters than regions.
This breaks an assumption in CoverageMapping.cpp where the number of
counters in a function is assumed to be bounded by the number of
regions:
Counts.assign(Record.MappingRegions.size(), 0);
This assumption causes CounterMappingContext::evaluate() to fail since
there are not enough counter values created in the above call to
`Counts.assign`. Consequently, some uncovered functions are not
reported in coverage reports.
This change walks a Function's CoverageMappingRecord to find the maximum
counter ID, and uses it to initialize the counter array when instrprof
records are missing for a function in sparse profiles.
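Conceptually, the fix looks like this (a simplified sketch; the real change also has to account for counter expressions, which reference counters indirectly):

```cpp
#include "llvm/ProfileData/Coverage/CoverageMapping.h"
#include <algorithm>

using namespace llvm;
using namespace llvm::coverage;

// Simplified sketch: size the counter array by the largest referenced
// counter ID rather than by the number of regions.
static unsigned getMaxCounterID(const CoverageMappingRecord &Record) {
  unsigned MaxID = 0;
  for (const CounterMappingRegion &Region : Record.MappingRegions)
    if (!Region.Count.isExpression())
      MaxID = std::max(MaxID, Region.Count.getCounterID());
  return MaxID;
}

// Used when instrprof records are missing for a function (sparse profile):
//   Counts.assign(getMaxCounterID(Record) + 1, 0);
```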
Differential Revision: https://reviews.llvm.org/D101780
In order to create the code regions for llvm-mca to analyze, llvm-mca creates an
AsmCodeRegionGenerator and calls AsmCodeRegionGenerator::parseCodeRegions().
Within this function, both an MCAsmParser and MCTargetAsmParser are created so
that MCAsmParser::Run() can be used to create the code regions for us.
These parser classes were created for llvm-mc, so they are designed to emit code
with an MCStreamer and MCTargetStreamer that are expected to be set up and passed
into the MCAsmParser constructor. Because llvm-mca doesn’t want to emit any
code, an MCStreamerWrapper class gets created instead and passed into the
MCAsmParser constructor. This wrapper inherits from MCStreamer and overrides
many of the emit methods to do nothing. The exception is the
emitInstruction() method, which calls Regions.addInstruction(Inst).
This works well and allows llvm-mca to utilize llvm-mc’s MCAsmParser to build
our code regions; however, there are a few directives which rely on the
MCTargetStreamer. llvm-mc assumes that the MCStreamer that gets passed into the
MCAsmParser’s constructor has a valid pointer to an MCTargetStreamer. Because
llvm-mca doesn’t set up an MCTargetStreamer, when the parser encounters one of
those directives, a segfault will occur.
On x86, each of these 7 directives will cause this segfault if it appears in
the input assembly given to llvm-mca:
.cv_fpo_proc
.cv_fpo_setframe
.cv_fpo_pushreg
.cv_fpo_stackalloc
.cv_fpo_stackalign
.cv_fpo_endprologue
.cv_fpo_endproc
I haven’t looked at other targets, but I wouldn’t be surprised if some of the
other ones also have certain directives which could result in this same
segfault.
My proposed solution is to simply initialize an MCTargetStreamer after we
initialize the MCStreamerWrapper. The MCTargetStreamer requires an ostream
object, but we don’t actually want any of these directives to be emitted
anywhere, so I use an ostream created with the nulls() function. Since this
needs to happen after the MCStreamerWrapper has been initialized, it needs to
happen within the AsmCodeRegionGenerator::parseCodeRegions() function. The
MCTargetStreamer also needs an MCInstPrinter which is easiest to initialize
within the main() function of llvm-mca. So this MCInstPrinter gets constructed
within main() then passed into the parseCodeRegions() function as a parameter.
(If you feel like it would be appropriate and possible to create the
MCInstPrinter within the parseCodeRegions() function, then feel free to modify
my solution. That would stop us from having to pass it into the function and
would limit its scope / lifetime.)
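Roughly, the added initialization looks like this (a sketch; local names are illustrative, and InstPrinter is the MCInstPrinter now passed in from main()):

```cpp
// Inside AsmCodeRegionGenerator::parseCodeRegions(), after the
// MCStreamerWrapper has been constructed (sketch, not the literal patch):
MCStreamerWrapper Streamer(Ctx, Regions);

// Create a target streamer so that target-specific directives such as the
// x86 .cv_fpo_* family have a valid object to call into. Its output goes to
// nulls() because llvm-mca never wants to emit any code. The created
// MCTargetStreamer registers itself with the MCStreamer it is given.
formatted_raw_ostream NullStream(nulls());
TheTarget.createAsmTargetStreamer(Streamer, NullStream, InstPrinter,
                                  /*IsVerboseAsm=*/true);
```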
My solution stops the segfault from happening and still passes all of the
current (expected) llvm-mca tests. I also added a new test for x86 that checks
for this segfault on an input that includes one of the .cv_fpo directives (this
test fails without my solution, but passes with it).
As far as I can tell, all of the functions that I modified are only called from
within llvm-mca so there shouldn’t be any worries about breaking other tools.
Differential Revision: https://reviews.llvm.org/D102709
Turns out simplifyLoopIVs sometimes returns a non-dead instruction in its DeadInsts out param. I had done a bit of NFC cleanup which was only NFC if simplifyLoopIVs obeyed its documentation. I'm simply dropping that part of the change.
Commit message from try 3:
Recommitting after fixing a bug found post-commit. Amusingly, try 1 had been correct, and by reverting to incorporate last-minute review feedback, I introduced the bug. Oops. :)
Original commit message:
The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case.
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
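The cleanup now looks conceptually like this (a sketch; the real code lives in LoopUnroll and, per the note above, also has to guard against iterator invalidation):

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/ValueHandle.h"
#include "llvm/Transforms/Utils/Local.h"

using namespace llvm;

// Sketch: collect trivially dead instructions first, then delete each one
// recursively. The WeakTrackingVHs tolerate an instruction being deleted as
// a side effect of an earlier recursive deletion (e.g. through a dead PHI),
// which is what broke plain iteration before.
static void simpleDCE(BasicBlock &BB, const TargetLibraryInfo *TLI) {
  SmallVector<WeakTrackingVH, 16> DeadCandidates;
  for (Instruction &I : BB)
    if (isInstructionTriviallyDead(&I, TLI))
      DeadCandidates.push_back(&I);

  for (WeakTrackingVH &VH : DeadCandidates) {
    Value *V = VH;
    if (auto *I = dyn_cast_or_null<Instruction>(V))
      // Deletes I and, transitively, any operands that become trivially dead.
      RecursivelyDeleteTriviallyDeadInstructions(I, TLI);
  }
}
```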
Differential Revision: https://reviews.llvm.org/D102511
Sample profile loader can be run in both LTO prelink and postlink. Currently the count annotation in postlink doesn't fully overwrite what's done in prelink. I'm adding a switch (`-overwrite-existing-weights=1`) to enable a full overwrite, which includes:
1. Clear old metadata for calls when their parent block has a zero count. This could be caused by prelink code duplication.
2. Clear indirect call metadata if somehow all the remaining targets sum to a zero count.
3. Overwrite branch weight for basic blocks.
With a CS profile, I was seeing #1 and #2 help reduce code size by preventing post-sample ICP and the CGSCC inliner from working on obsolete metadata, which comes from partial global inlining in prelink. It's not expected to work as well for the non-CS case, where post-inline count quality is less accurate.
It's worth calling out that some prelink optimizations can damage count quality in an irreversible way. One example is the loop rotate optimization. Due to the lack of an exact loop entry count (profiling can only give the loop iteration count and loop exit count), moving one iteration out of the loop body leaves the remaining iteration count unknown. We had to turn off prelink loop rotate to achieve better postlink count quality. An even better postlink count quality can be achieved by turning off prelink CGSCC inlining, which is not context-sensitive.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D102537
lld/MachO/Driver.cpp and lld/MachO/SyntheticSections.cpp include
llvm/Config/config.h which doesn't exist when building standalone lld.
This patch replaces the llvm/Config/config.h include with llvm/Config/llvm-config.h,
just like it is in lld/ELF/Driver.cpp, replaces HAVE_LIBXAR with LLVM_HAVE_LIBXAR, and
moves LLVM_HAVE_LIBXAR from config.h to llvm-config.h.
It also adds LLVM_HAVE_LIBXAR to LLVMConfig.cmake and links liblldMachO2.so
with XAR_LIB if LLVM_HAVE_LIBXAR is set.
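In code, the include change amounts to roughly this (a sketch, not the literal diff):

```cpp
// Before: config.h is private to LLVM's build tree, so a standalone lld
// build can't find it.
//   #include "llvm/Config/config.h"   // defined HAVE_LIBXAR

// After: llvm-config.h is installed with LLVM, and the macro is exported
// from there (and via LLVMConfig.cmake) under the LLVM_ prefix.
#include "llvm/Config/llvm-config.h"

#ifdef LLVM_HAVE_LIBXAR
// ... XAR-dependent code in Driver.cpp / SyntheticSections.cpp ...
#endif
```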
Differential Revision: https://reviews.llvm.org/D102084
The operations of some VP intrinsics do not (or will not) map to regular
instruction opcodes. Returning 'None' seems more intuitive here than
'Instruction::Call'.
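For illustration, a caller then looks roughly like this (the helper and handler names here are assumptions, not the exact API):

```cpp
// Sketch: a return value of None now means "this VP intrinsic has no
// corresponding regular instruction opcode", instead of the old
// placeholder answer of Instruction::Call.
if (Optional<unsigned> Opc =
        VPIntrinsic::getFunctionalOpcodeForVP(VPI.getIntrinsicID())) {
  // e.g. llvm.vp.add.* maps to Instruction::Add; treat it like that opcode.
  handleAsOpcode(*Opc, VPI);
} else {
  // No functional opcode exists for this VP intrinsic.
  handleAsGenericOperation(VPI);
}
```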
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D102778
- This patch is one in a series of patches which introduce HLASM Parser support (for the first parameter of inline asm statements) to LLVM ([[ https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html | main RFC here ]])
- This patch in particular introduces HLASM Parser support for Z machine instructions.
- The approach taken here was to subclass `AsmParser`, and mark various functions and variables as "protected" wherever appropriate.
- The `HLASMAsmParser` class overrides the `parseStatement` function. Two new private functions `parseAsHLASMLabel` and `parseAsMachineInstruction` are introduced as well.
The general syntax is laid out as follows (more information available in [[ https://www.ibm.com/support/knowledgecenter/SSENW6_1.6.0/com.ibm.hlasm.v1r6.asm/asmr1023.pdf | HLASM V1R6 Language Reference Manual ]] - Chapter 2 - Instruction Statement Format):
```
<TokA><spaces.*><TokB><spaces.*><TokC><spaces.*><TokD>
```
1. TokA is referred to as the Name Entry. This token is optional.
2. TokB is referred to as the Operation Entry. This token is mandatory.
3. TokC is referred to as the Operand Entry. This token is mandatory.
4. TokD is referred to as the Remarks Entry. This token is optional.
- If TokA is provided, then we either parse TokA as a possible comment or as a label (Name Entry), TokB as the Operation Entry, and so on.
- If TokA is not provided (i.e. we have one or more spaces and then the first token), then we parse the first token (i.e. TokB) as a possible Z machine instruction, TokC as the operands to the Z machine instruction, and TokD as a possible Remarks field.
- For TokC (the Operand Entry), no spaces are allowed between operands; if a space occurs, it is classified as an error.
- TokD, if provided, is taken as is and emitted as a comment.
The following additional approach was examined, but not taken:
- Adding custom private-only functions to the base AsmParser class, and only invoking them for z/OS. While this would eliminate the need for another child class, these private functions would be of no use to any other target. Similarly, adding any pure virtual functions to the base MCAsmParser class and overriding them in AsmParser would have the same disadvantage.
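For reference, a rough sketch of the class shape the chosen subclassing approach gives (abridged and illustrative, not the literal patch):

```cpp
// Relevant AsmParser members are assumed to have been made "protected"
// so that the subclass can reuse them.
class HLASMAsmParser final : public AsmParser {
  // Parse TokA as either a possible comment or a label (Name Entry).
  bool parseAsHLASMLabel(ParseStatementInfo &Info);
  // Parse the Operation Entry as a Z machine instruction, its Operand
  // Entry, and an optional Remarks Entry (emitted as a comment).
  bool parseAsMachineInstruction(ParseStatementInfo &Info);

public:
  using AsmParser::AsmParser;
  bool parseStatement(ParseStatementInfo &Info,
                      MCAsmParserSemaCallback *SI) override;
};
```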
Testing:
- This patch doesn't have tests added with it, for the sole reason that MCStreamer support and object file support haven't been added for the z/OS target (yet). Hence, it's not possible to generate code outright for the z/OS target. Those are in the process of being committed / worked on.
- Any comments / feedback on how to combat this "lack of testing" due to other missing required features is appreciated.
Reviewed By: Kai, uweigand
Differential Revision: https://reviews.llvm.org/D98276
The current implementation assumes the destination type of the shuffle is the same as that of the decomposed ones. Add a check to avoid a crash when the condition is not satisfied.
This fixes PR37616.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D102751
Generalize the fix from rGd0902a8665b1 by ensuring we widen/narrow the indices subvector first and then perform the ZERO_EXTEND_VECTOR_INREG (if necessary), which should allow us to perform the variable permutes with source/destination/indices vectors of any widths.
Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - these are all Port0 only.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
Intrinsics reading or writing the FFR register need to model the fact
that there is additional state being read/written.
Model this state as inaccessible memory.
* setffr => write inaccessiblememonly
* rdffr => read inaccessiblememonly
* ldff* => read arg memory, write inaccessiblemem
* ldnf => read arg memory, write inaccessiblemem
In InnerLoopVectorizer::setDebugLocFromInst we were previously
asserting that the VF is not scalable. This is because we want to
use the number of elements to create a duplication factor for the
debug profiling data. However, for scalable vectors we only know the
minimum number of elements. I've simply removed the assert for now
and added a FIXME saying that we assume vscale is always 1. When
vscale is not 1 it just means that the profiling data isn't as
accurate, but shouldn't cause any functional problems.
This is a step towards relying more on node-level FMF rather than function-wide
or target settings.
I think it was just an oversight that we didn't get this path in D87361
or follow-on patches.
The lack of FMF propagation is blocking D90901 from converting tests to IR-level FMF.
We can't do much more than this currently because we also fail to propagate flags
from the x86-specific node to the generic FMA node. That would be another patch, so the
test just verifies that we can transfer from IR to initial SDAG node.
Differential Revision: https://reviews.llvm.org/D102725
This required some changes: instead of eagerly making the PHIs
in the UnwindDest valid as if the BB were already not a predecessor,
they are now kept valid while BB is still a predecessor.
There doesn't seem to be a need to support recursive locking,
and a recursive mutex is unnecessarily inefficient.
Differential Revision: https://reviews.llvm.org/D102486
Do the single hash calculation before acquiring the lock, to reduce
lock contention. If Copy is true, and the string was not yet contained
in the StringStorage, use the new address from StringStorage, but
reuse the hash we already calculated.
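In generic form, the pattern is (an illustrative sketch, not the actual data structure):

```cpp
#include <cstdint>
#include <functional>
#include <list>
#include <mutex>
#include <string>
#include <unordered_map>

// Illustrative interner demonstrating the pattern: hash once before taking
// the lock, then reuse that hash for both the lookup and the insertion.
class StringInterner {
  std::mutex Mutex; // plain mutex: recursive locking is not needed
  // Buckets keyed by the precomputed hash; std::list keeps addresses stable.
  std::unordered_map<uint64_t, std::list<std::string>> Buckets;

public:
  const std::string &intern(const std::string &S) {
    // Single hash calculation, done before acquiring the lock.
    const uint64_t H = std::hash<std::string>{}(S);

    std::lock_guard<std::mutex> Guard(Mutex);
    std::list<std::string> &Bucket = Buckets[H]; // reuses the hash
    for (const std::string &Existing : Bucket)
      if (Existing == S)
        return Existing; // already stored: return the existing address
    Bucket.push_back(S); // copy into storage (the "Copy == true" case)
    return Bucket.back(); // the new address, paired with the same hash
  }
};
```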
Differential Revision: https://reviews.llvm.org/D102484
This patch adds a new option to the LoopVectorizer to control how
scalable vectors can be used.
Initially, this proposes three levels to control scalable
vectorization, although other, more aggressive options can be added in
the future.
The possible options are:
- Disabled: Disables vectorization with scalable vectors.
- Enabled: Vectorizes loops using scalable vectors or fixed-width
vectors, but favors fixed-width vectors when the cost
is a tie.
- Preferred: Like 'Enabled', but favoring scalable vectors when the
cost-model is inconclusive.
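A sketch of how such a three-level knob is typically exposed (the enum and flag names below are illustrative assumptions, not necessarily what this patch uses):

```cpp
#include "llvm/Support/CommandLine.h"

// Illustrative three-level switch (names are hypothetical).
enum ScalableVectorizationKind { SVK_Disabled, SVK_Enabled, SVK_Preferred };

static llvm::cl::opt<ScalableVectorizationKind> ScalableVectorization(
    "scalable-vectorization", llvm::cl::init(SVK_Disabled),
    llvm::cl::desc("Control the use of scalable vectors in the loop vectorizer"),
    llvm::cl::values(
        clEnumValN(SVK_Disabled, "off",
                   "Disable vectorization with scalable vectors"),
        clEnumValN(SVK_Enabled, "on",
                   "Allow scalable vectors, but favor fixed-width on a cost tie"),
        clEnumValN(SVK_Preferred, "preferred",
                   "Favor scalable vectors when the cost model is inconclusive")));
```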
Reviewed By: paulwalker-arm, vkmr
Differential Revision: https://reviews.llvm.org/D101945
This allows llvm-objcopy to be used with file names that begin with dashes.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D102665
As with element extraction for these vectors, we choose to promote up to
an i8 vector type and perform the insertion there.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D102697