The LDinto_toc pattern has been part of 64-bit PowerPC for a long
time, and represents loading from a memory location into the TOC
register (X2). However, this pattern doesn't explicitly record that
it modifies that register. This patch adds the missing dependency.
It was very surprising to me that this has never shown up as a problem
in the past, and that we only saw this problem recently in a single
scenario when building a self-hosted clang. It turns out that in most
cases we have another dependency present that keeps the LDinto_toc
instruction tied in place. LDinto_toc is used for TOC restore
following a call site, so this is a typical sequence:
BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
LDinto_toc 24, %X1
ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>
Because the LDinto_toc is inserted prior to the ADJCALLSTACKUP, there
is a natural anti-dependency between the two that keeps it in place.
Therefore we don't usually see a problem. However, in one particular
case, one call is followed immediately by another call, and the second
call requires a parameter that is a TOC-relative address. This is the
code sequence:
BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
LDinto_toc 24, %X1
ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>
ADJCALLSTACKDOWN 96, %R1<imp-def>, %R1<imp-use>
%vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
%vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39
Note that the back-to-back stack adjustments are the same size! The
back end is smart enough to recognize this and optimize them away:
BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
LDinto_toc 24, %X1
%vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
%vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39
Now there is nothing to prevent the ADDIStocHA instruction from moving
ahead of the LDinto_toc instruction, and because of the longest-path
heuristic, this is what happens.
With the accompanying patch, %X2 is represented as an implicit def:
BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
LDinto_toc 24, %X1, %X2<imp-def,dead>
ADJCALLSTACKUP 96, 0, %R1<imp-def,dead>, %R1<imp-use>
ADJCALLSTACKDOWN 96, %R1<imp-def,dead>, %R1<imp-use>
%vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
%vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39
So now when the two stack adjustments are removed, ADDIStocHA is
prevented from being moved above LDinto_toc.
I have not yet created a test case for this, because the original
failure occurs on a relatively large function that needs reduction.
However, this is a fairly serious bug, despite its infrequency, and I
wanted to get this patch onto the list as soon as possible so that it
can be considered for a 3.5 backport. I'll work on whittling down a
test case.
Have we missed the boat for 3.5 at this point?
Thanks,
Bill
llvm-svn: 215685
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)
Changes made by clang-tidy with minor tweaks.
llvm-svn: 215558
This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store
intrinsics. As with the construction of the MachineMemOperands for the
intrinsic calls used for unaligned load/store lowering, the only slight
complication is that we need to represent a larger memory range than the
loaded/stored value-type size (because the address is rounded down to an
aligned address, and we need to conservatively represent the entire possible
range of the actual access). This required adding an extra size field to
TargetLowering::IntrinsicInfo, and this was done in a way that required no
modifications to other targets (the size defaults to the store size of the
provided memory data type).
This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed).
llvm-svn: 215512
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reverts commits r215111, 215115, 215116, 215117, 215136.
llvm-svn: 215154
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.
Thanks to Lang Hames for making MCJIT a good replacement!
llvm-svn: 215111
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.
llvm-svn: 214988
Commits r213915 and r214718 fix recognition of shuffle masks for vmrg*
and vpku*um instructions for a little-endian target, by swapping the
input arguments. The vsldoi instruction requires similar treatment,
and also needs its shift count adjusted for little endian.
Reviewed by Ulrich Weigand.
This is a bug fix candidate for release 3.5 (and hopefully the last of
those for PowerPC).
llvm-svn: 214923
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.
llvm-svn: 214838
My original LE implementation of the vsldoi instruction, with its
altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect
shufflevector operations in the LLVM IR. Correct code is generated
because the back end handles the incorrect shufflevector in a
consistent manner.
This patch and a companion patch for Clang correct this problem by
removing the fixup from altivec.h and the corresponding fixup from the
PowerPC back end. Several test cases are also modified to reflect the
now-correct LLVM IR.
llvm-svn: 214800
In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl*
by swapping the input arguments. As it turns out, the exact same fix
is also required for the vpkuhum/vpkuwum patterns.
This fixes another regression in llvmpipe when vector support is
enabled.
Reviewed by Bill Schmidt.
llvm-svn: 214718
I ran into some test failures where common code changed vector division
by constant into a multiply-high operation (MULHU). But these are not
implemented by the back-end, so we failed to recognize the insn.
Fixed by marking MULHU/MULHS as Expand for vector types.
llvm-svn: 214716
This patch refactors code generation of vector comparisons.
This fixes a wrong code-gen bug for ISD::SETGE for floating-point types,
and improves generated code for vector comparisons in general.
Specifically, the patch moves all logic deciding how to implement vector
comparisons into getVCmpInst, which gets two extra boolean outputs
indicating to its caller whether its needs to swap the input operands
and/or negate the result of the comparison. Apart from implementing
these two modifications as directed by getVCmpInst, there is no need
to ever implement vector comparisons in any other manner; in particular,
there is never a need to perform two separate comparisons (e.g. one for
equal and one for greater-than, as code used to do before this patch).
Reviewed by Bill Schmidt.
llvm-svn: 214714