This patch adds bnez and beqz instructions which represent alias definitions for bne and beq instructions as follows:
bnez $rs,$imm => bne $rs,$zero,$imm
beqz $rs,$imm => beq $rs,$zero,$imm
The corresponding test cases are added.
Patch by Vladimir Medic
llvm-svn: 182040
This lane mask provides information about which register lanes
completely cover super-registers. See the block comment before
getCoveringLanes().
llvm-svn: 182034
This is the second part of the change to always return "true"
offset values from getPreIndexedAddressParts, tackling the
case of "memrix" type operands.
This is about instructions like LD/STD that only have a 14-bit
field to encode immediate offsets, which are implicitly extended
by two zero bits by the machine, so that in effect we can access
16-bit offsets as long as they are a multiple of 4.
The PowerPC back end currently handles such instructions by
carrying the 14-bit value (as it will get encoded into the
actual machine instructions) in the machine operand fields
for such instructions. This means that those values are
in fact not the true offset, but rather the offset divided
by 4 (and then truncated to an unsigned 14-bit value).
Like in the case fixed in r182012, this makes common code
operations on such offset values not work as expected.
Furthermore, there doesn't really appear to be any strong
reason why we should encode machine operands this way.
This patch therefore changes the encoding of "memrix" type
machine operands to simply contain the "true" offset value
as a signed immediate value, while enforcing the rules that
it must fit in a 16-bit signed value and must also be a
multiple of 4.
This change must be made simultaneously in all places that
access machine operands of this type. However, just about
all those changes make the code simpler; in many cases we
can now just share the same code for memri and memrix
operands.
llvm-svn: 182032
While testing some experimental code to add vector-scalar registers to
PowerPC, I noticed that a couple of independent instructions were
flipped by the scheduler. The new CHECK-DAG support is perfect for
avoiding this problem.
llvm-svn: 182020
DAGCombiner::CombineToPreIndexedLoadStore calls a target routine to
decompose a memory address into a base/offset pair. It expects the
offset (if constant) to be the true displacement value in order to
perform optional additional optimizations; in particular, to convert
other uses of the original pointer into uses of the new base pointer
after pre-increment.
The PowerPC implementation of getPreIndexedAddressParts, however,
simply calls SelectAddressRegImm, which returns a TargetConstant.
This value is appropriate for encoding into the instruction, but
it is not always usable as true displacement value:
- Its type is always MVT::i32, even on 64-bit, where addresses
ought to be i64 ... this causes the optimization to simply
always fail on 64-bit due to this line in DAGCombiner:
// FIXME: In some cases, we can be smarter about this.
if (Op1.getValueType() != Offset.getValueType()) {
- Its value is truncated to an unsigned 16-bit value if negative.
This causes the above opimization to generate wrong code.
This patch fixes both problems by simply returning the true
displacement value (in its original type). This doesn't
affect any other user of the displacement.
llvm-svn: 182012
getExceptionHandlingType is not ExceptionHandling::DwarfCFI on xcore, so
etFrameInstructions is never called. There is no point creating cfi
instructions if they are never used.
llvm-svn: 181979
Without this change nothing was covering this addFrameMove:
// For 64-bit SVR4 when we have spilled CRs, the spill location
// is SP+8, not a frame-relative slot.
if (Subtarget.isSVR4ABI()
&& Subtarget.isPPC64()
&& (PPC::CR2 <= Reg && Reg <= PPC::CR4)) {
MachineLocation CSDst(PPC::X1, 8);
MachineLocation CSSrc(PPC::CR2);
MMI.addFrameMove(Label, CSDst, CSSrc);
continue;
}
llvm-svn: 181976
This creates stubs that help Mips32 functions call Mips16
functions which have floating point parameters that are normally passed
in floating point registers.
llvm-svn: 181972
We only want to check this once, not for every conditional block in the loop.
No functionality change (except that we don't perform a check redudantly
anymore).
llvm-svn: 181942
Increase the number of instructions LLVM recognizes as setting the ZF
flag. This allows us to remove test instructions that redundantly
recalculate the flag.
llvm-svn: 181937
The old PPCCTRLoops pass, like the Hexagon pass version from which it was
derived, could only handle some simple loops in canonical form. We cannot
directly adapt the new Hexagon hardware loops pass, however, because the
Hexagon pass contains a fundamental assumption that non-constant-trip-count
loops will contain a guard, and this is not always true (the result being that
incorrect negative counts can be generated). With this commit, we replace the
pass with a late IR-level pass which makes use of SE to calculate the
backedge-taken counts and safely generate the loop-count expressions (including
any necessary max() parts). This IR level pass inserts custom intrinsics that
are lowered into the desired decrement-and-branch instructions.
The most fragile part of this new implementation is that interfering uses of
the counter register must be detected on the IR level (and, on PPC, this also
includes any indirect branches in addition to function calls). Also, to make
all of this work, we need a variant of the mtctr instruction that is marked
as having side effects. Without this, machine-code level CSE, DCE, etc.
illegally transform the resulting code. Hopefully, this can be improved
in the future.
This new pass is smaller than the original (and much smaller than the new
Hexagon hardware loops pass), and can handle many additional cases correctly.
In addition, the preheader-creation code has been copied from LoopSimplify, and
after we decide on where it belongs, this code will be refactored so that it
can be explicitly shared (making this implementation even smaller).
The new test-case files ctrloop-{le,lt,ne}.ll have been adapted from tests for
the new Hexagon pass. There are a few classes of loops that this pass does not
transform (noted by FIXMEs in the files), but these deficiencies can be
addressed within the SE infrastructure (thus helping many other passes as well).
llvm-svn: 181927
If the input operands to SETCC are promoted, we need to make sure that we
either use the promoted form of both operands (or neither); a mixture is not
allowed. This can happen, for example, if a target has a custom promoted
i1-returning intrinsic (where i1 is not a legal type). In this case, we need to
use the promoted form of both operands.
This change only augments the behavior of the existing logic in the case where
the input types (which may or may not have already been legalized) disagree,
and should not affect existing target code because this case would otherwise
cause an assert in the SETCC operand promotion code.
This will be covered by (essentially all of the) tests for the new PPCCTRLoops
infrastructure.
llvm-svn: 181926
IR optimisation passes can result in a basic block that contains:
llvm.lifetime.start(%buf)
...
llvm.lifetime.end(%buf)
...
llvm.lifetime.start(%buf)
Before this change, calculateLiveIntervals() was ignoring the second
lifetime.start() and was regarding %buf as being dead from the
lifetime.end() through to the end of the basic block. This can cause
StackColoring to incorrectly merge %buf with another stack slot.
Fix by removing the incorrect Starts[pos].isValid() and
Finishes[pos].isValid() checks.
Just doing:
Starts[pos] = Indexes->getMBBStartIdx(MBB);
Finishes[pos] = Indexes->getMBBEndIdx(MBB);
unconditionally would be enough to fix the bug, but it causes some
test failures due to stack slots not being merged when they were
before. So, in order to keep the existing tests passing, treat LiveIn
and LiveOut separately rather than approximating the live ranges by
merging LiveIn and LiveOut.
This fixes PR15707.
Patch by Mark Seaborn.
llvm-svn: 181922
We want the order to be deterministic on all platforms. NAKAMURA Takumi
fixed that in r181864. This patch is just two small cleanups:
* Move the function to the cpp file. It is only passed to array_pod_sort.
* Remove the ppc implementation which is now redundant
llvm-svn: 181910
This patch matches GCC behavior: the code used to only allow unaligned
load/store on ARM for v6+ Darwin, it will now allow unaligned load/store for
v6+ Darwin as well as for v7+ on other targets.
The distinction is made because v6 doesn't guarantee support (but LLVM assumes
that Apple controls hardware+kernel and therefore have conformant v6 CPUs),
whereas v7 does provide this guarantee (and Linux behaves sanely).
Overall this should slightly improve performance in most cases because of
reduced I$ pressure.
Patch by JF Bastien
llvm-svn: 181897
Now that PowerPC no longer uses adjustFixupOffset, and no other
back-end (ever?) did, we can remove the infrastructure itself
(incidentally addressing a FIXME to that effect).
llvm-svn: 181895
Now that applyFixup understands differently-sized fixups, we can define
fixup_ppc_lo16/fixup_ppc_lo16_ds/fixup_ppc_ha16 to properly be 2-byte
fixups, applied at an offset of 2 relative to the start of the
instruction text.
This has the benefit that if we actually need to generate a real
relocation record, its address will come out correctly automatically,
without having to fiddle with the offset in adjustFixupOffset.
Tested on both 64-bit and 32-bit PowerPC, using external and
integrated assembler.
llvm-svn: 181894