Split out from D59749. The current implementation of isWrappedSet()
doesn't do what it says on the tin, and treats ranges like
[X, Max] as wrapping, because they are represented as [X, 0) when
using half-inclusive ranges. This also makes it inconsistent with
the semantics of isSignWrappedSet().
This patch renames isWrappedSet() to isUpperWrapped(), in preparation
for the introduction of a new isWrappedSet() method with corrected
behavior.
llvm-svn: 357107
Right now, if you try to use optdiff.py on any opt records, it will fail because
its calls to gather_results weren't updated to support filtering.
Since filters are supposed to be optional, this makes them None by default in
get_remarks and in gather_results. This allows other tools that don't support
filtering to still use the functions as is.
Differential Revision: https://reviews.llvm.org/D59894
llvm-svn: 357106
If there were only dbg_values in the block, recede would hit the
beginning of the block and try to use thet dbg_value as a real
instruction.
llvm-svn: 357105
Start using the uadd.sat and usub.sat intrinsics for the existing
canonicalizations. These intrinsics should optimize better than
expanded IR, have better handling in the X86 backend and should
be no worse than expanded IR in other backends, as far as we know.
rL357012 already introduced use of uadd.sat for the add+umin pattern.
Differential Revision: https://reviews.llvm.org/D58872
llvm-svn: 357103
The artifact combiners push instructions which have been marked for deletion
onto an list for the legalizer to deal with on return. However, for trunc(ext)
combines the combiner routine recursively calls itself. When it does this the
dead instructions list may not be empty, and the other combiners don't expect
to be dealing with essentially invalid MIR (multiple vreg defs etc).
This change fixes it by ensuring that the dead instructions are processed on
entry into tryCombineInstruction.
As a result, this fix exposed a few places in tests where G_TRUNC instructions
were not being deleted even though they were dead.
Differential Revision: https://reviews.llvm.org/D59892
llvm-svn: 357101
This reapplies r356149, using the correct overload of findUnusedReg
which passes the current iterator.
This worked most of the time, because the scavenger iterator was moved
at the end of the frame index loop in PEI. This would fail if the
spill was the first instruction. This was further hidden by the fact
that the scavenger wasn't passed in for normal frame index
elimination.
llvm-svn: 357098
Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the register is not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cyc latency and 0.5 cycle reciprocal throughput for any SHLD.
When FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization.
This patch adds psuedo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work.
Fixes PR41055
Differential Revision: https://reviews.llvm.org/D59391
llvm-svn: 357096
This patch removes an overly conservative check that would prevent
simplifying copies when the value we were tracking would go through
several subregister indices.
Indeed, the intend of this check was to not track values whenever
we have to compose subregister, but actually what the check was
doing was bailing anytime we see a second subreg, even if that
second subreg would actually be the new source of truth (as opposed
to a part of that subreg).
Differential Revision: https://reviews.llvm.org/D59891
llvm-svn: 357095
This patch fixes an assembler bug that allowed SVE vector registers to contain a
type suffix when not expected. The SVE unpredicated movprfx instruction is the
only instruction affected.
The following are examples of what was previously valid:
movprfx z0.b, z0.b
movprfx z0.b, z0.s
movprfx z0, z0.s
These instructions are now erroneous.
Patch by Cullen Rhodes (c-rhodes)
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D59636
llvm-svn: 357094
Also includes one example of how this transform is unsound. This isn't
verifying the copies are used in the control flow intrinisic patterns.
Also add option to disable exec mask opt pass. Since this pass is
unsound, it may be useful to turn it off until it is fixed.
llvm-svn: 357091
Currently this is called before the frame size is set on the
function. For AMDGPU, the scavenger is used for large frames where
part of the offset needs to be materialized in a register, so
estimating the frame size is useful for knowing whether the scavenger
is useful.
llvm-svn: 357087
The AMDGPU implementation of getReservedRegs depends on
MachineFunctionInfo fields that are parsed from the YAML section. This
was reserving the wrong register since it was setting the reserved
regs before parsing the correct one.
Some tests were relying on the default reserved set for the assumed
default calling convention.
llvm-svn: 357083
The .BTF.ext FuncInfoTable and LineInfoTable contain
information organized per ELF section. Current definition
of FuncInfoTable/LineInfoTable is:
std::unordered_map<uint32_t, std::vector<BTFFuncInfo>> FuncInfoTable
std::unordered_map<uint32_t, std::vector<BTFLineInfo>> LineInfoTable
where the key is the section name off in the string table.
The unordered_map may cause the order of section output
different for different platforms.
The same for unordered map definition of
std::unordered_map<std::string, std::unique_ptr<BTFKindDataSec>>
DataSecEntries
where BTF_KIND_DATASEC entries may have different ordering
for different platforms.
This patch fixed the issue by using std::map.
Test static-var-derived-type.ll is modified to generate two
DataSec's which will ensure the ordering is the same for all
supported platforms.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 357077
There is no reason why stages should be visited in reverse order.
This patch allows the definition of stages that push instructions forward from
their cycleEnd() routine.
llvm-svn: 357074
Rework BaseIndexOffset and isAlias to fully work with lifetime nodes
and fold in lifetime alias analysis.
This is mostly NFC.
Reviewers: courbet
Reviewed By: courbet
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59794
llvm-svn: 357070
Original commit by Ayonam Ray.
This commit adds a regression test for the issue discovered in the
previous commit: that the range check for the jump table can only be
omitted if the fall-through destination of the jump table is
unreachable, which isn't necessarily true just because the default of
the switch is unreachable.
This addresses the missing optimization in PR41242.
> During the lowering of a switch that would result in the generation of a
> jump table, a range check is performed before indexing into the jump
> table, for the switch value being outside the jump table range and a
> conditional branch is inserted to jump to the default block. In case the
> default block is unreachable, this conditional jump can be omitted. This
> patch implements omitting this conditional branch for unreachable
> defaults.
>
> Differential Revision: https://reviews.llvm.org/D52002
> Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev
llvm-svn: 357067
but the implementation is hard to extend. It doesn't currently have an
easy way to support intrinsics that, for example, lack a rounding mode.
This will be needed for impending new constrained intrinsics.
This code is split out of D55897 <https://reviews.llvm.org/D55897>, which
itself was split out of D43515 <https://reviews.llvm.org/D43515>.
Reviewed by: arsenm
Differential Revision: http://reviews.llvm.org/D59830
llvm-svn: 357065
Cleanup isAArch64FrameOffsetLegal by:
- Merging the large switch statement to reuse AArch64InstrInfo::getMemOpInfo().
- Using AArch64InstrInfo::getUnscaledLdSt() to determine whether an instruction
has an unscaled variant.
- Simplifying the logic that calculates the offset to fit the immediate.
Reviewers: paquette, evandro, eli.friedman, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D59636
llvm-svn: 357064
This patch mirrors the change made to the Unix equivalent in
r351916. This in turn fixes bugs related to the use of FileOutputBuffer
to output to "-", i.e. stdout, on Windows.
Differential Revision: https://reviews.llvm.org/D59663
llvm-svn: 357058
Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure.
This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain.
........
Causes PR41249
llvm-svn: 357057
getAsCarry() checks that the input argument is a carry-producing node before
allowing a transformation to addcarry. This patch adds a check to make sure
that the carry-producing node is legal. If it is not, it may not remain in a
form that is manageable by the target backend. The test case caused a
compilation failure during instruction selection for this reason on SystemZ.
Patch by Ulrich Weigand.
Review: Sanjay Patel
https://reviews.llvm.org/D59822
llvm-svn: 357052
ToolOutputFile handles '-' so no need to specialize here.
Also, we neither reassign the variable nor pass it around, thus no need
to use std::unique_ptr<ToolOutputFile>.
exit(1) -> return 1; to call the destructor of raw_fd_stream
llvm-svn: 357051
We handle the case where the C2 does not fit in a signed 32-bit immediate, but
(C2>>C1) does. But there's also some 64-bit opportunities when C2 is not an unsigned
32-bit immediate, but (C2>>C1) is. For OR/XOR this allows us to load the
immediate with with MOV32ri instead of a movabsq. For AND it allows us to use a
32-bit AND and fold the immediate.
llvm-svn: 357050
Previously we manually selected the AND/OR/XOR with immediate and the SHL(or ADD if the shift is 1). But this was missing out on the opportunity to use a 64 bit AND with a 32-bit immediate and possibly other isel tricks we have built into the tables.
Instead, insert the new nodes into the DAG using insertDAGNode and allow them each to be selected through the normal table.
llvm-svn: 357049
This patch lays the groundwork for extending the generic machine scheduler by providing a PPC-specific implementation.
There are no functional changes as this is an incremental patch that simply provides the necessary overrides which just
encapsulate the behavior of the generic scheduler. Subsequent patches will add specific behavior.
Differential Revision: https://reviews.llvm.org/D59284
llvm-svn: 357047
We were manually outputting the code we would get from selecting ANY_EXTEND. We
can save some code by just letting an ANY_EXTEND go through isel on its own.
llvm-svn: 357045
A section containing metadata on remark diagnostics will be emitted if
the flag (-mllvm) -remarks-section is present.
For now, the metadata is:
* a magic number for remarks: "REMARKS\0"
* the version number: a little-endian uint64_t
* the absolute file path to the serialized remark diagnostics: a
null-terminated string.
Differential Revision: https://reviews.llvm.org/D59571
llvm-svn: 357043
Split off from D59749. This uses a simpler and more efficient
implementation of isSignWrappedSet(), and considers full sets
as non-wrapped, to be consistent with isWrappedSet(). Otherwise
the behavior is unchanged.
There are currently only two users of this function and both already
check for isFullSet() || isSignWrappedSet(), so this is not going to
cause a change in overall behavior.
Differential Revision: https://reviews.llvm.org/D59848
llvm-svn: 357039
A bunch of macros use the same variable name, and since CMake macros
don't get their own scope, the value persists across macro invocations,
and we can end up exporting targets which shouldn't be exported. Clear
the variable before each use to avoid this.
Converting these macros to functions would also help, since it would
avoid the variable leaking into its parent scope, and that's something I
plan to follow up with. It won't fully address the problem, however,
since functions still inherit variables from their parent scopes, so if
someone in the parent scope just happened to use the same variable name
we'd still have the same issue.
llvm-svn: 357036