The fold was implemented for the general case but use-limitation,
but the later constant version which didn't check uses was only
matching splat constants.
llvm-svn: 341292
In lib/CodeGen/LiveDebugVariables.cpp, it uses std::prev(MBBI) to
get DebugValue's SlotIndex. However, the previous instruction may be
also a debug instruction. It could not use a debug instruction to query
SlotIndex in mi2iMap.
Scan all debug instructions and use the first debug instruction to query
SlotIndex for following debug instructions. Only handle DBG_VALUE in
handleDebugValue().
Differential Revision: https://reviews.llvm.org/D50621
llvm-svn: 341289
If we have a pair of binops feeding another pair of binops, rearrange the operands so
the matching pair are together because that allows easy factorization folds to happen
in instcombine:
((X << S) & Y) & (Z << S) --> ((X << S) & (Z << S)) & Y (reassociation)
--> ((X & Z) << S) & Y (factorize shift from 'and' ops optimization)
This is part of solving PR37098:
https://bugs.llvm.org/show_bug.cgi?id=37098
Note that there's an instcombine version of this patch attached there, but we're trying
to make instcombine have less responsibility to improve compile-time efficiency.
For reasons I still don't completely understand, reassociate does this kind of transform
sometimes, but misses everything in my motivating cases.
This patch on its own is gluing an independent cleanup chunk to the end of the existing
RewriteExprTree() loop. We can build on it and do something stronger to better order the
full expression tree like D40049. That might be an alternative to the proposal to add a
separate reassociation pass like D41574.
Differential Revision: https://reviews.llvm.org/D45842
llvm-svn: 341288
Summary:
A follow-up for D49266 / rL337166 + D49497 / rL338044.
This is still the same pattern to check for the [lack of]
signed truncation, but in this case the constants and the predicate
are negated.
https://rise4fun.com/Alive/BDVhttps://rise4fun.com/Alive/n7Z
Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma, dmgreen
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51532
llvm-svn: 341287
Removes the implicit conversion to the underlying type for
JITSymbolFlags::FlagNames and replaces it with some bitwise and comparison
operators.
llvm-svn: 341282
This is no-outwardly-visible-change intended, so no test.
But the code is smaller and more efficient. The check for
a 'not' op is intended to avoid the expensive value tracking
call when it should not be necessary, and it might prevent
infinite looping when we resurrect:
rL300977
llvm-svn: 341280
The 'rol Rd' instruction is equivalent to 'adc Rd'.
This caused compile warnings from tablegen because of conflicting bits
shared between each instruction.
llvm-svn: 341275
Summary:
Reid suggested making HasWinCFI a plain bool defaulting to false in D50288.
It's needed in order to add HasWinCFI to MIRPrinter. Otherwise, we'll get the
assertion:
HasWinCFI.hasValue() && "HasWinCFI not set yet!"'
Also, a few ARM64 Windows test cases will fail with the same assert if the ARM64
MCLayer part of EH work (D50166) goes in before the frame lowering part that
sets HasWinCFI (D50288 as of now).
Reviewers: rnk, mstorsjo, hans, javed.absar
Reviewed By: rnk
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D51560
llvm-svn: 341270
Leverage existing logic in constant hoisting pass to transform constant GEP
expressions sharing the same base global variable. Multi-dimensional GEPs are
rewritten into single-dimensional GEPs.
Differential Revision: https://reviews.llvm.org/D51396
llvm-svn: 341269
ModuleCount = InstrCount was incorrect. It should have been
InstrCount = ModuleCount. This was making it emit an extra, incorrect remark
for Print Module IR.
The test didn't catch this, because it didn't ensure that the only remark
output was from the desired pass. So, it was possible to have an extra remark
come through and not fail. Updated the test so that we ensure that the last
remark that's output comes from the desired pass. This is done by ensuring
that whatever is being read after the last remark is YAML output rather than
some incorrect garbage.
llvm-svn: 341267
- Remove duplication: Both TestingGuide and TestSuiteMakefileGuide
would give a similar overview over the test-suite.
- Present cmake/lit as the default/normal way of running the test-suite:
- Move information about the cmake/lit testsuite into the new
TestSuiteGuide.rst file. Mark the remaining information in
TestSuiteMakefilesGuide.rst as deprecated.
- General simplification and shorting of language.
- Remove paragraphs about tests known to fail as everything should pass
nowadays.
- Remove paragraph about zlib requirement; it's not required anymore
since we copied a zlib source snapshot into the test-suite.
- Remove paragraph about comparison with "native compiler". Correctness is
always checked against reference outputs nowadays.
- Change cmake/lit quickstart section to recommend `pip` for installing
lit and use `CMAKE_C_COMPILER` and a cache file in the example as that
is what most people will end up doing anyway. Also a section about
compare.py to quickstart.
- Document `Bitcode` and `MicroBenchmarks` directories.
- Add section with commonly used cmake configuration options.
- Add section about showing and comparing result files via compare.py.
- Add section about using external benchmark suites.
- Add section about using custom benchmark suites.
- Add section about profile guided optimization.
- Add section about cross-compilation and running on external devices.
Differential Revision: https://reviews.llvm.org/D51465
llvm-svn: 341260
These intrinsics use the same implementation as PTEST intrinsics, but use vXi1 vectors.
New clang builtins will be accompanying them shortly.
llvm-svn: 341259
In basic block, loop, and function passes, we already have a function that
we can use to emit optimization remarks. We can use that instead of searching
the module for the first suitable function (that is, one that contains at
least one basic block.)
llvm-svn: 341253
Instead of counting the size of the entire module every time we run a pass,
pass along a delta instead and use that to emit the remark.
This means we only have to use (on average) smaller IR units to calculate
instruction counts. E.g, in a BB pass, we only need to look at the delta of
the BB instead of the delta of the entire module.
6/6
(This improved compile time for size remarks on sqlite3 + O2 significantly)
llvm-svn: 341250
Same vein as the previous commits. Pre-calculate the size of
the module and use that to decide if we're going to emit a
remark.
This one comes with a FIXME and TODO. First off, CallGraphSCC
and CallGraphNode don't have a getInstructionCount function. So,
for now, we do the same thing as in a module pass.
Second off, we're not really saving anything here yet, because
as before, I need to change emitInstrCountChangedRemark to take
in a delta. Keeping the patches small though, so that's coming up
next.
5/6
llvm-svn: 341249
Same as the previous NFC commits in the same vein.
This one introduces a TODO. I'm going to change emitInstrCountChangedRemark
so that it takes in a delta. Since the delta isn't necessary yet, it's not
there. For now, this means that we're calculating the size of the module
twice.
Just done separately to keep the patches small.
4/6
llvm-svn: 341248
Another commit reducing compile time in size remarks.
Cache the size of the module and loop, and update values based
off of deltas instead. Avoid recalculating the size of the
whole module whenever possible.
3/6
llvm-svn: 341247
Size remarks are slow due to lots of recalculation of the module.
This is similar to the previous commit. Cache the size of the module and
update counts in basic block passes based off a less-expensive delta.
2/6
llvm-svn: 341246
Size remarks are slow due to lots of recalculation of the module.
Pre-calculate the module size and initial function size for a remark. Use
deltas calculated using the less-expensive function IR count to update the
module counts for Function passes.
1/6
llvm-svn: 341245
Summary:
The python executable may not exist on all systems so use sys.executable
instead.
Reviewers: ddunbar, stella.stamenova
Subscribers: delcypher, llvm-commits
Differential Revision: https://reviews.llvm.org/D51511
llvm-svn: 341244
Since we changed the storage for the PID in PIDRecord instances, we need
to also update the way we load the data from a DataExtractor through the
RecordInitializer.
llvm-svn: 341243
Previously we've been reading and writing the wrong types which only
worked in little endian implementations. This time we're writing the
same typed values the runtime is using, and reading them appropriately
as well.
llvm-svn: 341241
I changed the seed slightly, but forgot to run the tests on a 32-bit system, so
tests which hard-code a specific hash value started breaking.
llvm-svn: 341240
This change allows us to let the compiler do the right thing for when
handling big-endian and little-endian records for FDR mode function
records.
Previously, we assumed that the encoding was little-endian that reading
the first byte to look for the function id and function record types was
ordered in a little-endian manner. This change allows us to better
handle function records where the first four bytes may actually be
encoded in big-endian thus giving us the wrong bytes where we're seeking
the function information from.
This is a follow-up to D51210 and D51289.
llvm-svn: 341236
This change makes the writer implementation more consistent with the way
fields are written down to avoid assumptions on bitfield order and
padding. We also fix an inconsistency between the type returned by the
`delta()` accessor to match the data member it's returning.
This is a follow-up to D51289 and D51210.
llvm-svn: 341230
Following D50807, and heading towards D50664, this intermediary change does the following:
1. Upgrade all custom Error types in llvm/trunk/lib/DebugInfo/ to use the new StringError behavior (D50807).
2. Implement std::is_error_code_enum and make_error_code() for DebugInfo error enumerations.
3. Rename GenericError -> PDBError (the file will be renamed in a subsequent commit)
4. Update custom error messages to follow the same formatting: (\w\s*)+\.
5. Keep generic "file not found" (ENOENT) errors as they are in PDB code. Previously, there used to be a custom enumeration for that purpose.
6. Remove a few extraneous LF in log() implementations. Printing LF is a responsability at a higher level, not at the error level.
Differential Revision: https://reviews.llvm.org/D51499
llvm-svn: 341228
This patch recognizes shuffles that shift elements and fill with zeros. I've copied and modified the shift matching code we use for normal vector registers to do this. I'm not sure if there's a good way to share more of this code without making the existing function more complex than it already is.
This will be used to enable kshift intrinsics in clang.
Differential Revision: https://reviews.llvm.org/D51401
llvm-svn: 341227
This change makes the XRay Trace loading functions first use a
little-endian data extractor, then on failures try a big-endian data
extractor. Without this change, the trace loading facility will not work
with data written from a big-endian machine.
Follow-up to D51210 and D51289.
llvm-svn: 341226
Before this patch, the FDRTraceWriter would not take endianness into
account when writing data into the output stream.
This is a follow-up to D51289 and D51210.
llvm-svn: 341223
The presence of a ReadAdvance for input operand #0 is problematic
because it changes the input latency of the register used as the base address
for the folded load.
A broadcast cannot start executing if the load address hasn't been computed yet.
In the llvm-mca example, the VBROADCASTSS is dependent on the address generated
by the LEAQ. That means, it cannot start until LEAQ reaches the write-back
stage. If we apply ReadAdvance, then we wrongly assume that the load can start 3
cycles in advance.
Differential Revision: https://reviews.llvm.org/D51534
llvm-svn: 341222
The `mtc1` and `mfc1` definitions in the MipsInstrFPU.td have MMRel,
but do not have StdMMR6Rel tags. When these instructions are emitted
for microMIPS R6 targets, `Mips::MipsR62MicroMipsR6` nor
`Mips::Std2MicroMipsR6` cannot find correct op-codes and as a result the
backend uses mips32 variant of the instructions encoding.
The patch fixes this problem by adding the StdMMR6Rel tag and check
instructions encoding in the test case.
Differential revision: https://reviews.llvm.org/D51482
llvm-svn: 341221
The intention is to enable the extract_vector_elt load combine,
and doing this for other operations interferes with more
useful optimizations on vectors.
Handle any type of load since in principle we should do the
same combine for the various load intrinsics.
llvm-svn: 341219