This allow it to recognize more loads as being consecutive when the load's address are complex at the start.
Differential Revision: https://reviews.llvm.org/D74444
Summary:
The DWARF transformer is added as a class so it can be unit tested fully.
The DWARF is converted to GSYM format and handles many special cases for functions:
- omit functions in compile units with 4 byte addresses whose address is UINT32_MAX (dead stripped)
- omit functions in compile units with 8 byte addresses whose address is UINT64_MAX (dead stripped)
- omit any functions whose high PC is <= low PC (dead stripped)
- StringTable builder doesn't copy strings, so we need to make backing copies of strings but only when needed. Many strings come from sections in object files and won't need to have backing copies, but some do.
- When a function doesn't have a mangled name, store the fully qualified name by creating a string by traversing the parent decl context DIEs and then. If we don't do this, we end up having cases where some function might appear in the GSYM as "erase" instead of "std::vector<int>::erase".
- omit any functions whose address isn't in the optional TextRanges member variable of DwarfTransformer. This allows object file to register address ranges that are known valid code ranges and can help omit functions that should have been dead stripped, but just had their low PC values set to zero. In this case we have many functions that all appear at address zero and can omit these functions by making sure they fall into good address ranges on the object file. Many compilers do this when the DWARF has a DW_AT_low_pc with a DW_FORM_addr, and a DW_AT_high_pc with a DW_FORM_data4 as the offset from the low PC. In this case the linker can't write the same address to both the high and low PC since there is only a relocation for the DW_AT_low_pc, so many linkers tend to just zero it out.
Reviewers: aprantl, dblaikie, probinson
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74450
This reverts commit 80a34ae31125aa46dcad47162ba45b152aed968d with fixes.
Previously, since bots turning on EXPENSIVE_CHECKS are essentially turning on
MachineVerifierPass by default on X86 and the fact that
inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll
are not expected to generate functioning machine code, this would go
down to `report_fatal_error` in MachineVerifierPass. Here passing
`-verify-machineinstrs=0` to make the intent explicit.
This reverts commit 80a34ae31125aa46dcad47162ba45b152aed968d with fixes.
On bots llvm-clang-x86_64-expensive-checks-ubuntu and
llvm-clang-x86_64-expensive-checks-debian only,
llc returns 0 for these two tests unexpectedly. I tweaked the RUN line a little
bit in the hope that LIT is the culprit since this change is not in the
codepath these tests are testing.
llvm\test\CodeGen\X86\inline-asm-avx-v-constraint-32bit.ll
llvm\test\CodeGen\X86\inline-asm-avx512vl-v-constraint-32bit.ll
MemorySSA is often taking up an unreasonable fraction of runtime in
assertion enabled builds. Turns out that there is one code-path that
runs verifyMemorySSA() even if VerifyMemorySSA is not enabled. This
patch makes it conditional as well.
Differential Revision: https://reviews.llvm.org/D74505
Also greatly improve i64 lowering. LegalizeIntegerTypes does the
correct narrowing if i64 isn't legal. Just workaround this for
SelectionDAG by making i64 legal and splitting in the patterns.
If the target has FP64 but not FP16 then we have custom lowering for FP_EXTEND
and STRICT_FP_EXTEND with type f64. However if the extend is from f32 to f64 the
current implementation will cause in infinite loop for STRICT_FP_EXTEND due to
emitting a merge_values of the original node which after replacement becomes a
merge_values of itself.
Fix this by not doing anything for f32 to f64 extend when we have FP64, though
for STRICT_FP_EXTEND we have to do the strict-to-nonstrict mutation as that
doesn't happen automatically for opcodes with custom lowering.
Differential Revision: https://reviews.llvm.org/D74559
We want the extra-use tests to be consistent with the
earlier single-use tests and be as cheap as possible
in vector form to show cost model edge cases. So use
i8 and extract from element 0 since that should be
cheap for all x86 targets.
Skip the loop over the CalleSavedInfos in 'restoreCalleeSavedRegisters' when
the register is a CR field and we are not targeting 32-bit ELF. This is safe
because:
1) The helper function 'restoreCRs' returns if the target is not 32-bit ELF,
making all the code in the loop related to CR fields dead for every other
subtarget. This code is only called on ELF right now, but the patch
to extend it for AIX also needs to skip 'restoreCRs'.
2) The loop will not otherwise modify the iterator, so the iterator
manipulations at the bottom of the loop end up setting 'I' to its
current value.
This simplifciation allows us to remove one argument from 'restoreCRs'.
Also add a helper function to determine if a register is one of the
callee saved condition register fields.
Before, the script used `git log -SFoo.cpp` to find a commit where
the number of occurrences of "Foo.cpp" changed -- but since
a patch with
+ LLVMFoo.cpp
- Foo.cpp
contains the same number of instances of "Foo.cpp", the script
incorrectly skipped this type of rename.
As fix, look for '\bFoo\.cpp\b' instead and pass --pickaxe-regex
so that we can grep for word boundaries.
To test, check out 7531a5039fd (which renamed in llvm/lib/IR
RemarkStreamer.cpp to LLVMRemarkStreamer.cpp) and look at the output of
the script. Before this change, it correctly assigned the addition
of LLVMRemarkStreamer.cpp to 7531a5039fd but incorrectly assigned
the removal of RemarkStreamer.cpp to b8a847c. With this, it
correctly assigns both to 7531a5039fd.
Basically change the layout to please `go build` and remove references to
`llvm-go`.
Update llvm/test/Bindings/Go/ to use the system go compiler
Differential Revision: https://reviews.llvm.org/D74540
Exploit native VSX rounding instruction, x(v|s)r(d|s)pic, which does
rounding using current rounding mode.
According to C standard library, rint may raise INEXACT exception while
nearbyint won't.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D72685
This reverts commit 61b35e4111160fe834a00c33d040e01150b576ac.
This commit causes a timeout in chromium builds; likely to have a
similar cause to the previous timeout issue caused by this commit (see
6ded69f294a9 for more details). It is possible that there is no way to
fix this bug that will not cause this issue; further investigations as
to the efficiency of handling large amounts of debug info will be
necessary.
In some cases BTI landing pad is inserted even compatible instruction
was there already. Meta instruction does not count in this case
therefore skip them in the check for first instructions in the function.
Differential revision: https://reviews.llvm.org/D74492
Simon pointed out that this function is doing a bitcast, which can be
incorrect for big endian. That makes the lowering of VMOVN in MVE
wrong, but the function is shared between Neon and MVE so both can
be incorrect.
This attempts to fix things by using the newly added VECTOR_REG_CAST
instead of the BITCAST. As it may now be used on Neon, I've added the
relevant patterns for it there too. I've also added a quick dag combine
for it to remove them where possible.
Differential Revision: https://reviews.llvm.org/D74485
We do not keep the actual value of the CIE ID field, because it is
predefined, and use a constant when dumping a CIE record. The issue
was that the predefined value is different for .debug_frame and
.eh_frame sections, but we always printed the one which corresponds
to .debug_frame. The patch fixes that by choosing an appropriate
constant to print.
See the following for more information about .eh_frame sections:
https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html
Differential Revision: https://reviews.llvm.org/D73627
Finalization can introduce new blocks we need to outline as well so it
makes sense to identify the blocks that need to be outlined after
finalization happened. There was also a minor unit test adjustment to
account for the fact that we have a single outlined exit block now.
On KNL targets, we widen 128/256-bit strict_fsetcc nodes to
512-bits without forcing the upper bits to zero. This can cause
spurious exceptions due to garbage upper bits. This behavior was
inherited from the non-strict case where the spurious exception
isn't a problem.
Currently, BPF does not support dynamic static allocation.
For a program like below:
extern void bar(int *);
void foo(int n) {
int a[n];
bar(a);
}
The current error message looks like:
unimplemented operand
UNREACHABLE executed at /.../llvm/lib/Target/BPF/BPFISelLowering.cpp:199!
Let us make error message explicit so it will be clear to the user
what is the problem. With this patch, the error message looks like:
fatal error: error in backend: Unsupported dynamic stack allocation
...
Differential Revision: https://reviews.llvm.org/D74521
Reapply 8a56d64d7620b3764f10f03f3a1e307fcdd72c2f with minor fixes.
The problem was that cancellation can cause new edges to the parallel
region exit block which is not outlined. The CodeExtractor will encode
the information which "exit" was taken as a return value. The fix is to
ensure we do not return any value from the outlined function, to prevent
control to value conversion we ensure a single exit block for the
outlined region.
This reverts commit 3aac953afa34885a72df96f2b703b65f85cbb149.
Patchable statepoint is lowered into sequence of nops, so zeroed call target
should not be on register. It is better to use getTargetConstant instead
of getConstant to select zero constant for call target.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D74465
When SI_IF is inserted, it constrains the source register with a
register class, which was quite likely a G_ICMP. This was incorrectly
treating it as a scalar, and then applyMappingImpl would end up
producing invalid MIR since this was unexpected.
Also fix not using all VGPR sources for vcc outputs.
These tests fail when the default is switched to assume IEEE denormal
handling. I'm not sure if PPC really has a way to control the denormal
input handling.