Instead of computing GUID based on some assumption about symbol mangling
rule from IRName to symbol name, lookup the IRName from all the symtabs
from all the input files to see if there are any matching symbols entry
provides the IRName for GUID computation.
rdar://65853754
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D84803
Summary:
Support TOCU and TOCL relocation type for object file generation.
Reviewed by: DiggerLin
Differential Revision: https://reviews.llvm.org/D84549
The non-standard header file `<sysexits.h>` provides some return values.
`EX_IOERR` is used to as a special value to signal a broken pipe to the clang driver.
On z/OS Unix System Services, this header file does not exists. This patch
- adds a check for `<sysexits.h>`, removing the dependency on `LLVM_ON_UNIX`
- adds a new header file `llvm/Support/ExitCodes`, which either includes
`<sysexits.h>` or defines `EX_IOERR`
- updates the users of `EX_IOERR` to include the new header file
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D83472
When floating point callee-saved registers were used, the frame pointer would
incorrectly point to the bottom of the CSR space (containing saved floating-point
registers), rather than to the frame record.
While all frame offsets were calculated consistently, resulting in working code,
this prevented stack walkers from being about to traverse the frame list.
This implements 2 different vectorisation fallback strategies if tail-folding
fails: 1) don't vectorise at all, or 2) vectorise using a scalar epilogue. This
can be controlled with option -prefer-predicate-over-epilogue, that has been
changed to take a numeric value corresponding to the tail-folding preference
and preferred fallback.
Patch by: Pierre van Houtryve, Sjoerd Meijer.
Differential Revision: https://reviews.llvm.org/D79783
This patch adds a few tests in DebugInfo/MIR/InstrRef/ of interesting
behaviour that the instruction referencing implementation of
LiveDebugValues has. Mostly, these tests exist to ensure that if you
give the "-experimental-debug-variable-locations" command line switch,
the right implementation runs; and to ensure it behaves the same way as
the VarLoc LiveDebugValues implementation.
I've also touched roughly 30 other tests, purely to make the tests less
rigid about what output to accept. DBG_VALUE instructions are usually
printed with a trailing !debug-location indicating its scope:
!debug-location !1234
However InstrRefBasedLDV produces new DebugLoc instances on the fly,
meaning there sometimes isn't a numbered node when they're printed,
making the output:
!debug-location !DILocation(line: 0, blah blah)
Which causes a ton of these tests to fail. This patch removes checks for
that final part of each DBG_VALUE instruction. None of them appear to
be actually checking the scope is correct, just that it's present, so
I don't believe there's any loss in coverage here.
Differential Revision: https://reviews.llvm.org/D83054
If the condition output is negated, swap the branch targets. This is
similar to what SelectionDAG does for when SelectionDAGBuilder
decides to invert the condition and swap the branches.
This is leaving behind a dead constant def for some reason.
This produces less work for addressing mode matching. I think this is
safe since I don't think machine IR is supposed to give the same
aliasing properties as getelementptr in the IR.
If a workgroup size is known to be not greater than wavefront size
the s_barrier instruction is not needed since all threads are guaranteed
to come to the same point at the same time.
This is the same optimization that was implemented for SelectionDAG in
D31731.
Differential Revision: https://reviews.llvm.org/D86609
This patch makes the unit_length and header_length fields of line tables
optional. yaml2obj is able to infer them for us.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D86590
Before calling target hook to determine if two loads/stores are clusterable,
we put them into different groups to avoid fake cluster due to dependency.
For now, we are putting the loads/stores into the same group if they have
the same predecessor. We assume that, if two loads/stores have the same
predecessor, it is likely that, they didn't have dependency for each other.
However, one SUnit might have several predecessors and for now, we just
pick up the first predecessor that has non-data/non-artificial dependency,
which is too arbitrary. And we are struggling to fix it.
So, I am proposing some better implementation.
1. Collect all the loads/stores that has memory info first to reduce the complexity.
2. Sort these loads/stores so that we can stop the seeking as early as possible.
3. For each load/store, seeking for the first non-dependency instruction with the
sorted order, and check if they can cluster or not.
Reviewed By: Jay Foad
Differential Revision: https://reviews.llvm.org/D85517
Currently, `dyn_cast<XCOFFObjectFile>` always does cast and returns a pointer,
even when we pass `ELF`/`Wasm`/`Mach-O` or `COFF` instead of `XCOFF`.
It happens because `XCOFFObjectFile` class does not implement `classof`.
I've fixed it and added a unit test.
Differential revision: https://reviews.llvm.org/D86542
MVE Gather scatter codegeneration is looking a lot better than it used
to, but still has some issues. The instructions we currently model as 1
cycle per element, which is a bit low for some cases. Increasing the
cost by the MVECostFactor brings them in-line with our other instruction
costs. This will have the effect of only generating then when the extra
benefit is more likely to overcome some of the issues. Notably in
running out of registers and vectorizing loops that could otherwise be
SLP vectorized.
In the short-term whilst we look at other ways of dealing with those
more directly, we can increase the costs of gathers to make them more
likely to be beneficial when created.
Differential Revision: https://reviews.llvm.org/D86444
We have no tests for OS/ABI values specific to
EM_TI_C6000, ELFOSABI_AMDGPU_MESA3D and ELFOSABI_ARM machines.
Also, related arrays in the code are not grouped together.
(That is why such testing was missed I guess).
The patch fixes that all.
Differential revision: https://reviews.llvm.org/D86341
If the basic block of the instruction passed to getUniqueReachingMIDef
is a transitive predecessor of itself and has a definition of the
register, the function will return that definition even if it is after
the instruction given to the function. This patch stops the function
from scanning the instruction's basic block to prevent this.
Differential Revision: https://reviews.llvm.org/D86607
This removes Error.cpp/.h files from obj2yaml.
These files are not needed because we are
using `Error`s instead of error codes widely and do
not need a logic related to obj2yaml specific
error codes anymore.
I had to adjust just a few lines of tool's code
to remove remaining dependencies.
Differential revision: https://reviews.llvm.org/D86536
llvm-readobj crashes when `-S --section-symbols` is used
on an object that has no symbol table.
The patch fixes it.
Differential revision: https://reviews.llvm.org/D86520
The `sections-ext.test` is a test for ELF that is used
to test `--st`, `--sr` and `--sd` extension options for `-S`.
There are 2 problems with it:
1) It is broken, because for CHECK lines it contains there is
no corresponding `FileCheck` call.
2) It uses the precompiled object: `trivial.obj.elf-i386`.
This is the last ELF test where `trivial.obj.elf-i386` is used so we can get
rid of the binary and use an YAML description.
Also, there is a `Inputs/trivial.ll` file that describes how `trivial*` objects
in `Inputs` folders are created. I've removed it from `ELF`, because it is not
actual anymore (we have no more input binaries created with the use of trivial.ll there)
and copied the refined versions of it to `COFF`, `MachO` and `wasm` Input folders.
Differential revision: https://reviews.llvm.org/D86462
This change extend the CMake files with the necessary additions
to build LLVM for z/OS.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D83866
pointer.
mwaitx uses EBX as one of its argument.
Using this instruction clobbers RBX as it is defined to hold one of the
input. When the backend uses dynamically allocated stack, RBX is used as
a reserved register for the base pointer.
This patch is adapted from @qcolombet patch for cmpxchg at r263325.
This fixes PR43528.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D73475
The documentation was missing a '*/' in '/*<2x32-bit> vadd {0, 64, VPR}',
and the example code are now aligned to improve readability.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86201
When optimizing the table, PointerToAnyOperandMatchers would be
incorrectly reported as identical even though they have different
SizeInBits values. This bug was due to failing to overload the
isIdentical() method, which this patch addresses.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86199
Intrinsic properties can now be set to default and applied to all
intrinsics. If the attributes are not needed, the user can opt-out by
setting the DisableDefaultAttributes flag to true.
Differential Revision: https://reviews.llvm.org/D70365
The switch in AArch64Operand::print was changed in D45688 so the shift
can be printed after printing the register. This is implemented with
LLVM_FALLTHROUGH and was broken in D52485 when BTIHint was put between
the register and shift operands.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D86535
This fixes an issue where the restore point of callee-saves in the
function epilogues was incorrectly calculated when the basic block
consisted of only a RET instruction. This caused dealloc instructions
to be inserted in between the block of callee-save restore instructions,
rather than before it.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D86099
Currently `strace llvm-dwarfdump x.debug >/tmp/file`:
ioctl(1, TCGETS, 0x7ffd64d7f340) = -1 ENOTTY (Inappropriate ioctl for device)
write(1, " DW_AT_decl_line\t(89)\n"..., 4096) = 4096
ioctl(1, TCGETS, 0x7ffd64d7f400) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(1, TCGETS, 0x7ffd64d7f410) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(1, TCGETS, 0x7ffd64d7f400) = -1 ENOTTY (Inappropriate ioctl for device)
After this patch:
write(1, "0000000000001102 \"strlen\")\n "..., 4096) = 4096
write(1, "site\n DW_AT_low"..., 4096) = 4096
write(1, "d53)\n\n0x000e4d4d: DW_TAG_G"..., 4096) = 4096
The same speedup can be achieved by `--color=0` but that is not much convenient.
This implementation has been suggested by Joerg Sonnenberger.
Differential Revision: https://reviews.llvm.org/D86406
This patch produces an edge-based interface in AAIsDead.
By this, we can query a set of basic blocks that are directly reachable from a given basic block.
This is specifically useful for implementation of AAReachability.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85547
While since D86306 we do it's sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.
And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:
```
| statistic name | baseline | proposed | Δ | % | |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts | 7945095 | 7942507 | -2588 | -0.03% | 0.03% |
| assembler.ObjectBytes | 273209920 | 273069800 | -140120 | -0.05% | 0.05% |
| early-cse.NumCSE | 2183363 | 2183398 | 35 | 0.00% | 0.00% |
| early-cse.NumSimplify | 541847 | 550017 | 8170 | 1.51% | 1.51% |
| instcombine.NumAggregateReconstructionsSimplified | 2139 | 108 | -2031 | -94.95% | 94.95% |
| instcombine.NumCombined | 3601364 | 3635448 | 34084 | 0.95% | 0.95% |
| instcombine.NumConstProp | 27153 | 27157 | 4 | 0.01% | 0.01% |
| instcombine.NumDeadInst | 1694521 | 1765022 | 70501 | 4.16% | 4.16% |
| instcombine.NumPHIsOfExtractValues | 0 | 37546 | 37546 | 0.00% | 0.00% |
| instcombine.NumSunkInst | 63158 | 63686 | 528 | 0.84% | 0.84% |
| instcount.NumBrInst | 874304 | 871857 | -2447 | -0.28% | 0.28% |
| instcount.NumCallInst | 1757657 | 1758402 | 745 | 0.04% | 0.04% |
| instcount.NumExtractValueInst | 45623 | 11483 | -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst | 4983 | 580 | -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst | 61018 | 59478 | -1540 | -2.52% | 2.52% |
| instcount.NumLandingPadInst | 35334 | 34215 | -1119 | -3.17% | 3.17% |
| instcount.NumPHIInst | 344428 | 331116 | -13312 | -3.86% | 3.86% |
| instcount.NumRetInst | 100773 | 100772 | -1 | 0.00% | 0.00% |
| instcount.TotalBlocks | 1081154 | 1077166 | -3988 | -0.37% | 0.37% |
| instcount.TotalFuncs | 101443 | 101442 | -1 | 0.00% | 0.00% |
| instcount.TotalInsts | 8890201 | 8833747 | -56454 | -0.64% | 0.64% |
| instsimplify.NumSimplified | 75822 | 75707 | -115 | -0.15% | 0.15% |
| simplifycfg.NumHoistCommonCode | 24203 | 24197 | -6 | -0.02% | 0.02% |
| simplifycfg.NumHoistCommonInstrs | 48201 | 48195 | -6 | -0.01% | 0.01% |
| simplifycfg.NumInvokes | 2785 | 4298 | 1513 | 54.33% | 54.33% |
| simplifycfg.NumSimpl | 997332 | 1018189 | 20857 | 2.09% | 2.09% |
| simplifycfg.NumSinkCommonCode | 7088 | 6464 | -624 | -8.80% | 8.80% |
| simplifycfg.NumSinkCommonInstrs | 15117 | 14021 | -1096 | -7.25% | 7.25% |
```
... which tells us that this new fold fires whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).
This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions
The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform
`InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
it also sees that the extract-insert chain recreates the original aggregate,
and replaces it with the original aggregate.
The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check if what other patterns remain, and how they can be resolved.
(i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)
This is a reland of the original commit fcb51d8c2460faa23b71e06abb7e826243887dd6,
because originally i forgot to ensure that the base aggregate types match.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D86530
Update the comment stating the aim of the test - this is currently
only checking that these assembler directives doesn't cause the
assembler to fail, but the results of the testcase aren't particularly
correct yet.
Remove bits of the testcase that are even less likely to be found in
the wild (the .seh_startchained/.seh_endchained block), where the
testcase currently doesn't really generate anything interesting
anyway.
Differential Revision: https://reviews.llvm.org/D86524
This reverts commit fcb51d8c2460faa23b71e06abb7e826243887dd6.
As buildbots report, there's apparently some missing check to ensure
that the types of incoming values match the type of PHI.
Let's revert for a moment.
While since D86306 we do it's sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.
And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:
```
| statistic name | baseline | proposed | Δ | % | |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts | 7945095 | 7942507 | -2588 | -0.03% | 0.03% |
| assembler.ObjectBytes | 273209920 | 273069800 | -140120 | -0.05% | 0.05% |
| early-cse.NumCSE | 2183363 | 2183398 | 35 | 0.00% | 0.00% |
| early-cse.NumSimplify | 541847 | 550017 | 8170 | 1.51% | 1.51% |
| instcombine.NumAggregateReconstructionsSimplified | 2139 | 108 | -2031 | -94.95% | 94.95% |
| instcombine.NumCombined | 3601364 | 3635448 | 34084 | 0.95% | 0.95% |
| instcombine.NumConstProp | 27153 | 27157 | 4 | 0.01% | 0.01% |
| instcombine.NumDeadInst | 1694521 | 1765022 | 70501 | 4.16% | 4.16% |
| instcombine.NumPHIsOfExtractValues | 0 | 37546 | 37546 | 0.00% | 0.00% |
| instcombine.NumSunkInst | 63158 | 63686 | 528 | 0.84% | 0.84% |
| instcount.NumBrInst | 874304 | 871857 | -2447 | -0.28% | 0.28% |
| instcount.NumCallInst | 1757657 | 1758402 | 745 | 0.04% | 0.04% |
| instcount.NumExtractValueInst | 45623 | 11483 | -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst | 4983 | 580 | -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst | 61018 | 59478 | -1540 | -2.52% | 2.52% |
| instcount.NumLandingPadInst | 35334 | 34215 | -1119 | -3.17% | 3.17% |
| instcount.NumPHIInst | 344428 | 331116 | -13312 | -3.86% | 3.86% |
| instcount.NumRetInst | 100773 | 100772 | -1 | 0.00% | 0.00% |
| instcount.TotalBlocks | 1081154 | 1077166 | -3988 | -0.37% | 0.37% |
| instcount.TotalFuncs | 101443 | 101442 | -1 | 0.00% | 0.00% |
| instcount.TotalInsts | 8890201 | 8833747 | -56454 | -0.64% | 0.64% |
| instsimplify.NumSimplified | 75822 | 75707 | -115 | -0.15% | 0.15% |
| simplifycfg.NumHoistCommonCode | 24203 | 24197 | -6 | -0.02% | 0.02% |
| simplifycfg.NumHoistCommonInstrs | 48201 | 48195 | -6 | -0.01% | 0.01% |
| simplifycfg.NumInvokes | 2785 | 4298 | 1513 | 54.33% | 54.33% |
| simplifycfg.NumSimpl | 997332 | 1018189 | 20857 | 2.09% | 2.09% |
| simplifycfg.NumSinkCommonCode | 7088 | 6464 | -624 | -8.80% | 8.80% |
| simplifycfg.NumSinkCommonInstrs | 15117 | 14021 | -1096 | -7.25% | 7.25% |
```
... which tells us that this new fold fires whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).
This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions
The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform
`InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
it also sees that the extract-insert chain recreates the original aggregate,
and replaces it with the original aggregate.
The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check if what other patterns remain, and how they can be resolved.
(i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D86530
This happens when using -flto and -Wl,--plugin-opt=emit-llvm to create a linked LTO bitcode file, and the bitcode file has a strtab with size > 2^29.
All the issues relate to a pattern like this
size_t x64 = y64 + z32 * C
When z32 is >= (2^32)/C, z32 * C overflows.
Reviewed-by: MaskRay
Differential Revision: https://reviews.llvm.org/D86500
A Mach-O universal binary may contain bitcode as a slice.
This diff adds proper handling of such binaries to llvm-lipo.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D85740
Since we can only copy to GR32 we had to EXTRACT from GR32, but
we would first go to GR16 and then the truncate would extra again
to GR8. This adds a special case to go directly from GR32 to GR8.
This would eventually get cleaned up, but though maybe we should
avoid doing it in the first place. Our k-register handling is weird
and we could probably stand to have some more special ISD nodes
for the conversions so the i32 type would be explicit.
The IsExtractedElement already called getOperand(0) so Extract
here is the source vector. We shouldn't call getOperand(0). This
worked for the original test cases because the result was a
bitcast so the getOperand(0) accidently peeked through the bitcast
which is what we wanted.
In the failing case here, the operand turns out to be undef so
the getOperand(0) asserts because undef has no operands.
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25184
Differential Revision: https://reviews.llvm.org/D86428