Instead of emitting config values in a predefined order, the code
emitter will now emit a 32-bit register index followed by the 32-bit
config value.
llvm-svn: 179546
This fixes an ABI bug for non-Darwin PPC64. For the callee-saved condition
registers, the spill location is specified relative to the stack pointer (SP +
8). However, this is not relative to the SP after the new stack frame is
established, but instead relative to the caller's stack pointer (it is stored
into the linkage area of the parent's stack frame).
So, like with the link register, we don't directly spill the CRs with other
callee-saved registers, but just mark them to be spilled during prologue
generation.
In practice, this reverts r179457 for PPC64 (but leaves it in place for PPC32).
llvm-svn: 179500
For functions that need to spill CRs, and have dynamic stack allocations, the
value of the SP during the restore is not what it was during the save, and so
we need to use the FP in these cases (as for all of the other spills and
restores, but the CR restore has a special code path because its reserved slot,
like the link register, is specified directly relative to the adjusted SP).
llvm-svn: 179457
The register allocator expects minimal physreg live ranges. Schedule
physreg copies accordingly. This is slightly tricky when they occur in
the middle of the scheduling region. For now, this is handled by
rescheduling the copy when its associated instruction is
scheduled. Eventually we may instead bundle them, but only if we can
preserve the bundles as parallel copies during regalloc.
llvm-svn: 179449
When debugging performance regressions we often ask ourselves whether the
regression we see is due to poor isel/sched/ra or due to some micro-architectural
problem. When comparing two code sequences, one good way to rule out front-end
bottlenecks (and other issues) is to force code alignment. This pass adds
a flag that forces the alignment of all of the basic blocks in the program.
llvm-svn: 179353
As packed comparisons in AVX/SSE produce all 0s or all 1s in each SIMD lane,
a vector select can be simplified to AND/OR or removed entirely if one or both
of the values being selected is all 0s or all 1s.
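To make the identity concrete, here is a per-lane sketch in plain C (illustrative
only; the names are made up and this is not the DAG-combine code itself):
unsigned select_lane(unsigned mask, unsigned a, unsigned b) {
  /* mask is a packed-compare result: all 0s or all 1s */
  unsigned r = (mask & a) | (~mask & b);   /* general blend */
  /* if b is all 0s:  r == (mask & a)  -> a single AND */
  /* if a is all 1s:  r == (mask | b)  -> a single OR  */
  return r;
}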
llvm-svn: 179267
This patch is revised based on a patch from Victor Umansky
<victor.umansky@intel.com>. More cases are handled in X86's bool
simplification, i.e.
- SETCC_CARRY
- value is truncated to i1 with AND
As a by-product, PR5443 is also fixed.
llvm-svn: 179265
In the simple and triangle if-conversion cases, when CopyAndPredicateBlock is
used because the to-be-predicated block has other predecessors, we need to
explicitly remove the old copied block from the successors list. Normally,
if-conversion relies on TII->AnalyzeBranch combined with BB->CorrectExtraCFGEdges
to clean up the successors list, but if the predicated block contained an
un-analyzable branch (such as a now-predicated return), then this will fail.
These extra successors were causing a problem on PPC because they caused
later passes (such as PPCEarlyReturn) to leave dead return-only basic blocks in
the code.
llvm-svn: 179227
Mips32 code as Mips16 unless it can't be compiled as Mips16. For now this
would happen as long as floating point instructions are not needed.
It would probably also make sense to compile as Mips32 if atomic operations
are needed. There may be other cases as well.
A module pass prescans the IR and adds the mips16 or nomips16 attribute
to functions depending on the function's needs.
Mips16 mode can result in 40% code compression by utilizing 16-bit
encodings of many instructions.
The hope is for this to replace the traditional gcc way of dealing with
Mips16 code that uses floating point, which essentially means using soft float
but with a library implemented using Mips32 floating point. That gcc
method also requires creating stubs so that Mips32 code can interact with
these Mips16 functions that have floating point needs. My conjecture is
that in reality this traditional gcc method would never win over this
new method.
I will also be implementing the traditional gcc method. Some of it is already
done, but I needed to do the stubs to finish the work, and those required
this mips16/32 mixed-mode capability.
I have more ideas to make this new method much better, and I think the old
method will just live in llvm for anyone that needs the backward compatibility,
though I don't know for what reason that would be needed.
llvm-svn: 179185
Depending on the number of bits set in the writemask.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 179166
Modifier 'D' selects the second word of a double-word integer.
We had previously implemented the pure register variant of
the modifier; this patch implements the memory-reference form.
#include <stdio.h>
int b[8] = {0,1,2,3,4,5,6,7};
int main()
{
  int i;
  // The first word. Notice, no 'D'
  {asm (
    "lw %0,%1;"
    : "=r" (i)
    : "m" (*(b+4))
  );}
  printf("%d\n",i);
  // The second word
  {asm (
    "lw %0,%D1;"
    : "=r" (i)
    : "m" (*(b+4))
  );}
  printf("%d\n",i);
}
llvm-svn: 179135
This enables us to form predicated branches (which are the same conditional
branches we had before) and also a larger set of predicated returns (including
instructions like bdnzlr which is a conditional return and loop-counter
decrement all in one).
At the moment, if-conversion does not capture all possible opportunities. A
simple example is provided in early-ret2.ll, where if-conversion forms one
predicated return, and then the PPCEarlyReturn pass picks up the other one. So,
at least for now, we'll keep both mechanisms.
llvm-svn: 179134
and mips16 on a per-function basis.
Because this patch is somewhat involved, I have provided an overview of the
key pieces of it.
The patch is written so as to not change the behavior of the non-mixed
mode. We have tested this a lot, but switching subtargets is something new,
so we don't want any chance of regression in the mainline compiler until
we have more confidence in this.
Mips32/64 are very different from Mips16, as in the case of ARM vs. Thumb1.
For that reason there are derived versions of the register info, frame info,
instruction info and instruction selection classes.
Now we register three separate passes for instruction selection:
one which is used to switch subtargets (MipsModuleISelDAGToDAG.cpp), and then
one for each of the current subtargets (Mips16ISelDAGToDAG.cpp and
MipsSEISelDAGToDAG.cpp).
When the ModuleISel pass runs, it determines if there is a need to switch
subtargets and if so, the owning pointers in MipsTargetMachine are
appropriately changed.
When 16Isel or SEIsel is run, they will return immediately without doing
any work if the current subtarget mode does not apply to them.
In addition, MipsAsmPrinter needs to be reset on a function basis.
The pass BasicTargetTransformInfo is substituted with a null pass since the
pass is immutable and really needs to be a function pass for it to be
used with changing subtargets. This will be fixed in a follow on patch.
llvm-svn: 179118
This pattern occurs in SROA output due to the way vector arguments are lowered
on ARM.
The testcase from PR15525 now compiles into this, which is better than the code
we got with the old scalarrepl:
_Store:
        ldr.w   r9, [sp]
        vmov    d17, r3, r9
        vmov    d16, r1, r2
        vst1.8  {d16, d17}, [r0]
        bx      lr
Differential Revision: http://llvm-reviews.chandlerc.com/D647
llvm-svn: 179106
On PowerPC, non-vector loads and stores have r+i forms; however, in functions
with large stack frames these were not being used to access slots far from the
stack pointer because such slots were out of range for the signed 16-bit
immediate offset field. This increases register pressure because we need a
separate register for each offset (when the r+r form is used). By enabling
virtual base registers, we can deal with large stack frames without unduly
increasing register pressure.
llvm-svn: 179105
The save area is twice as big and there is no struct return slot. The
stack pointer is always 16-byte aligned (after adding the bias).
Also eliminate the stack adjustment instructions around calls when the
function has a reserved stack frame.
llvm-svn: 179083
PowerPC has a conditional branch to the link register (return) instruction: BCLR.
This should be used any time we'd otherwise have a conditional branch to a
return. This adds a small pass, PPCEarlyReturn, which runs just prior to the
branch selection pass (and, importantly, after block placement) to generate
these conditional returns when possible. It will also eliminate unconditional
branches to returns (these happen rarely; most of the time these have already
been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice
optimization for small functions that do not maintain a stack frame.
llvm-svn: 179026
I've managed to convince myself that AArch64's acquire/release
instructions are sufficient to guarantee C++11's required semantics,
even in the sequentially-consistent case.
llvm-svn: 179005
First, we should not cheat: fsel-based lowering of select_cc is a
finite-math-only optimization (the ISA manual, section F.3 of v2.06, makes
this clear, as does a note in our own README).
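To see why finite math matters, here is a rough C model of what the fsel form
computes (illustrative only, not the lowering code): select_cc (setlt x, y), a, b
becomes a select on the sign of x - y.
double select_lt_via_fsel(double x, double y, double a, double b) {
  /* what the fsel-based form computes for "x < y ? a : b" */
  return (x - y) >= 0.0 ? b : a;
  /* if x or y is an infinity, x - y can be NaN and the result no longer
     matches the original comparison; hence finite-math-only */
}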
This also adds fsel-based lowering of EQ and NE condition codes. As it turned
out, fsel generation was covered by a grand total of zero regression test
cases. I've added some test cases to cover the existing behavior (which is now
finite-math only), as well as the new EQ cases.
llvm-svn: 179000
Integer return values are sign or zero extended by the callee, and
structs up to 32 bytes in size can be returned in registers.
The CC_Sparc64 CallingConv definition is shared between
LowerFormalArguments_64 and LowerReturn_64. Function arguments and
return values are passed in the same registers.
The inreg flag is also used for return values. This is required to handle
C functions returning structs containing floats and ints:
struct ifp {
  int i;
  float f;
};
struct ifp f(void);
LLVM IR:
define inreg { i32, float } @f() {
  ...
  ret { i32, float } %retval
}
The ABI requires that %retval.i is returned in the high bits of %i0
while %retval.f goes in %f1.
Without the inreg return value attribute, %retval.i would go in %i0 and
%retval.f would go in %f3, which is a more efficient way of returning
multiple values, but it is not ABI compliant for returning C structs.
llvm-svn: 178966
64-bit SPARC v9 processes use biased stack and frame pointers, so the
current function's stack frame is located at %sp+BIAS .. %fp+BIAS where
BIAS = 2047.
This makes more local variables directly accessible via [%fp+simm13]
addressing.
llvm-svn: 178965
There are certain PPC instructions into which we can fold a zero immediate
operand. We can detect such cases by looking at the register class required
by the using operand (so long as it is not otherwise constrained).
llvm-svn: 178961
All arguments are formally assigned to stack positions and then promoted
to floating point and integer registers. Since there are more floating
point registers than integer registers, this can cause situations where
floating point arguments are assigned to registers after integer
arguments that were assigned to the stack.
Use the inreg flag to indicate 32-bit fragments of structs containing
both float and int members.
The three-way shadowing between stack, integer, and floating point
registers requires custom argument lowering. The good news is that
return values are passed in the exact same way, and we can share the
code.
Still missing:
- Update LowerReturn to handle structs returned in registers.
- LowerCall.
- Variadic functions.
llvm-svn: 178958
SITargetLowering::analyzeImmediate() was converting the 64-bit values
to 32-bit and then checking if they were an inline immediate. Some
of these conversions caused this check to succeed and produced
S_MOV instructions with 64-bit immediates, which are illegal.
v2:
- Clean up logic
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178927
On cores for which we know the misprediction penalty, and we have
the isel instruction, we can profitably perform early if conversion.
This enables us to replace some small branch sequences with selects
and avoid the potential stalls from mispredicting the branches.
Enabling this feature required implementing canInsertSelect and
insertSelect in PPCInstrInfo; isel code in PPCISelLowering was
refactored to use these functions as well.
llvm-svn: 178926
The DAGCombine logic that recognized a/sqrt(b) and transformed it into
a multiplication by the reciprocal sqrt did not handle cases where the
sqrt and the division were separated by an fpext or fptrunc.
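For example, a mixed-precision fragment like the following (hypothetical, not
from the original report) places an fpext between the sqrt and the division:
#include <math.h>
double mixed(double a, float b) {
  /* sqrtf produces a float; the fpext to double sits between the sqrt
     and the fdiv, which the old combine did not look through */
  return a / (double)sqrtf(b);
}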
llvm-svn: 178801
The Thumb2SizeReduction pass avoids false CPSR dependencies, except it
still aggressively creates tMOVi8 instructions because they are so
common.
Avoid creating false CPSR dependencies even for tMOVi8 instructions when
the CPSR flags are known to have high latency. This allows integer
computation to overlap floating point computations.
Also process blocks in a reverse post-order and propagate high-latency
flags to successors.
<rdar://problem/13468102>
llvm-svn: 178773
This requires v9 cmov instructions using the %xcc flags instead of the
%icc flags.
Still missing:
- Select floats on %xcc flags.
- Select i64 on %fcc flags.
llvm-svn: 178737
For this we need to use a libcall. Previously LLVM didn't implement
libcall support for frem, so I've added it in the usual
straightforward manner. A test case from the bug report is included.
llvm-svn: 178639
The same compare instruction is used for 32-bit and 64-bit compares. It
sets two different sets of flags: icc and xcc.
This patch adds a conditional branch instruction using the xcc flags for
64-bit compares.
llvm-svn: 178621
When unsafe FP math operations are enabled, we can use the fre[s] and
frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together
with some Newton iteration, in order to quickly generate floating-point
division and sqrt results. All of these instructions are separately optional,
and so each has its own feature flag (except for the Altivec instructions,
which are covered under the existing Altivec flag). Doing this is not only
faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these
computations to be pipelined with other computations in order to hide their
overall latency.
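For reference, a sketch (in C, not the backend code) of a single Newton-Raphson
refinement step applied to the hardware estimates; "a" is the input and "est"
is the raw frsqrte/fre result:
double refine_rsqrt(double a, double est) {
  return est * (1.5 - 0.5 * a * est * est);   /* improves an estimate of 1/sqrt(a) */
}
double refine_recip(double a, double est) {
  return est * (2.0 - a * est);               /* improves an estimate of 1/a */
}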
I've also added a couple of fnmsub patterns which turned out to be missing
(but are necessary for good code generation of the Newton iterations).
Altivec needs a similar fix, but that will probably be more complicated because
fneg is expanded for Altivec's v4f32.
llvm-svn: 178617
This patch initializes t9 to the handler address, but only if the relocation
model is pic. This handles the case where handler to which eh.return jumps
points to the start of the function.
Patch by Sasa Stankovic.
llvm-svn: 178588
When doing a partword atomic operation, a lwarx was being paired with
a stdcx. instead of a stwcx. when compiling for a 64-bit target. The
target has nothing to do with it in this case; we always need a stwcx.
Thanks to Kai Nacke for reporting the problem.
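A minimal example of a partword atomic operation (illustrative; any 8- or
16-bit atomic read-modify-write takes this path):
short add16(short *p, short v) {
  /* a 16-bit atomic add: lowered to a lwarx/stwcx. loop on the containing
     word, on both 32-bit and 64-bit PPC targets */
  return __sync_fetch_and_add(p, v);
}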
llvm-svn: 178559
This helps on architectures where i8 and i16 are not legal but we have byte
and short loads/stores, allowing us to merge copies like the one below on ARM.
void copy(char *a, char *b, int n) {
  do {
    int t0 = a[0];
    int t1 = a[1];
    b[0] = t0;
    b[1] = t1;
radar://13536387
llvm-svn: 178546
The last resort pattern produces 6 instructions, and there are still
opportunities for materializing some immediates in fewer instructions.
llvm-svn: 178526
SPARC v9 defines new 64-bit shift instructions. The 32-bit shift right
instructions are still usable as zero and sign extensions.
This adds new F3_Sr and F3_Si instruction formats that probably should
be used for the 32-bit shifts as well. They don't really encode a
simm13 field.
llvm-svn: 178525
This is far from complete, but it is enough to make it possible to write
test cases using i64 arguments.
Missing features:
- Floating point arguments.
- Receiving arguments on the stack.
- Calls.
llvm-svn: 178523
We would also like to merge sequences that involve a variable index like in the
example below.
int index = *idx++;
int i0 = c[index+0];
int i1 = c[index+1];
b[0] = i0;
b[1] = i1;
By extending the parsing of the base pointer to handle dags that contain a
base, index, and offset we can handle examples like the one above.
The dag for the code above will look something like:
(load (i64 add (i64 copyfromreg %c)
               (i64 signextend (i8 load %index))))
(load (i64 add (i64 copyfromreg %c)
               (i64 signextend (i32 add (i32 signextend (i8 load %index))
                                        (i32 1)))))
The code that parses the tree ignores the intermediate sign extensions. However,
if there is a sign extension it needs to be on all indexes.
(load (i64 add (i64 copyfromreg %c)
               (i64 signextend (add (i8 load %index)
                                    (i8 1))))
vs
(load (i64 add (i64 copyfromreg %c)
               (i64 signextend (i32 add (i32 signextend (i8 load %index))
                                        (i32 1)))))
radar://13536387
llvm-svn: 178483
The P7 and A2 have additional floating-point conversion instructions which
allow a direct two-instruction sequence (plus load/store) to convert from all
combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores,
only some combinations were directly available).
llvm-svn: 178480
The popcntw instruction is available whenever the popcntd instruction is
available, and performs a separate popcnt on the lower and upper 32-bits.
Ignoring the high-order count, this can be used for the 32-bit input case
(saving on the explicit zero extension otherwise required to use popcntd).
llvm-svn: 178470
This instruction is available on modern PPC64 CPUs, and is now used
to improve the SINT_TO_FP lowering (by eliminating the need for the
separate sign extension instruction and decreasing the amount of
needed stack space).
llvm-svn: 178446
The existing SINT_TO_FP code for i32 -> float/double conversion was disabled
because it relied on broken EXTSW_32/STD_32 instruction definitions. The
original intent had been to enable these 64-bit instructions to be used on CPUs
that support them even in 32-bit mode. Unfortunately, this form of lying to
the infrastructure was buggy (as explained in the FIXME comment) and had
therefore been disabled.
This re-enables this functionality, using regular DAG nodes, but only when
compiling in 64-bit mode. The old STD_32/EXTSW_32 definitions (which were dead)
are removed.
llvm-svn: 178438
'@SECREL' is what is used by the Microsoft assembler, but GNU as expects '@SECREL32'.
With the patch, the MC-generated code works fine in combination with a recent GNU as (2.23.51.20120920 here).
Patch by David Nadlinger!
Differential Revision: http://llvm-reviews.chandlerc.com/D429
llvm-svn: 178427
specific code paths.
This allows us to write code like:
if (__nvvm_reflect("FOO"))
  // Do something
else
  // Do something else
and compile into a library, then give "FOO" a value at kernel
compile-time so the check becomes a no-op.
llvm-svn: 178416
derived class MipsSETargetLowering.
We shouldn't be generating madd/msub nodes if the target is Mips16, since Mips16
doesn't have support for multiply-add/sub instructions.
llvm-svn: 178404
Like nearbyint, rint can be implemented on PPC using the frin instruction. The
complication comes from the fact that rint needs to set the FE_INEXACT flag
when the result does not equal the input value (and frin does not do that). As
a result, we use a custom inserter which, after the rounding, compares the
rounded value with the original, and if they differ, explicitly sets the XX bit
in the FPSCR register (which corresponds to FE_INEXACT).
Once LLVM has better modeling of the floating-point environment we should be
able to (often) eliminate this extra complexity.
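A rough C model of the semantics the custom inserter provides (assuming frin
behaves like nearbyint with respect to the inexact flag):
#include <fenv.h>
#include <math.h>
double rint_model(double x) {
  double r = nearbyint(x);      /* like frin: rounds, but does not raise FE_INEXACT */
  if (r != x)
    feraiseexcept(FE_INEXACT);  /* the inserted compare-and-set of the XX bit in the FPSCR */
  return r;
}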
llvm-svn: 178362
These instructions are available on the P5x (and later) and on the A2. They
implement the standard floating-point rounding operations (floor, trunc, etc.).
One caveat: frin (round to nearest) does not implement "ties to even", and so
is only enabled in fast-math mode.
llvm-svn: 178337
- RDRAND always clears the destination value when a random value is not
available (i.e. CF == 0). This value is truncated or zero-extended as
the false boolean value to be returned. Boolean simplification needs
to skip this 'zext' or 'trunc' node.
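For reference, a minimal use of the corresponding intrinsic (assuming the usual
immintrin.h interface); the boolean being simplified is the success flag derived
from CF:
#include <immintrin.h>
int try_random(unsigned int *out) {
  /* returns 1 if a random value was available (CF == 1); on failure
     (CF == 0) the destination has been cleared by the hardware */
  return _rdrand32_step(out);
}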
llvm-svn: 178312
Compiling in 32-bit mode on a P7 would assert after 64-bit DAG combines were
added for bswap with load/store. This is because these combines are really only
valid in 64-bit mode, regardless of the CPU (and this was not being checked).
llvm-svn: 178286
These are 64-bit load/store with byte-swap, and available on the P7 and the A2.
Like the similar instructions for 16- and 32-bit words, these are matched in the
target DAG-combine phase against load/store-bswap pairs.
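A sketch of the kind of source that produces such a pair (hypothetical example):
a 64-bit load followed by a byte swap, which the DAG combine can now match to a
single ldbrx:
#include <stdint.h>
uint64_t load_swapped(const uint64_t *p) {
  return __builtin_bswap64(*p);   /* load + bswap pair -> ldbrx on the P7/A2 */
}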
llvm-svn: 178276
PPC ISA 2.06 (P7, A2, etc.) has a popcntd instruction. Add this instruction and
tell TTI about it so that popcount-loop recognition will know about it.
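For illustration (a hypothetical example, not from the patch), this is the sort
of loop that popcount-loop recognition can now replace with a single popcntd:
int count_bits(unsigned long long x) {
  int n = 0;
  while (x) {
    x &= x - 1;   /* clears the lowest set bit each iteration */
    ++n;
  }
  return n;
}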
llvm-svn: 178233
There were a few places where kill flags were not being set correctly, and
where 32-bit instruction variants were being used with 64-bit registers. After
r178180, this code was being triggered causing llc to assert.
llvm-svn: 178220
This reverts commit 342d92c7a0adeabc9ab00f3f0d88d739fe7da4c7.
Turns out we're going with a different schema design to represent
DW_TAG_imported_modules so we won't need this extra field.
llvm-svn: 178215
form of call in preference to memory indirect on Atom.
In this case, the patch applies the optimization to the code for reloading
spilled registers.
The patch also includes changes to sibcall.ll and movgs.ll, which were
failing on the Atom buildbot after the first patch was applied.
This patch by Sriram Murali.
llvm-svn: 178193
indirect through a memory address is to load the memory address into
a register and then call indirect through the register.
This patch implements this improvement by modifying SelectionDAG to
force a function address which is a memory reference to be loaded
into a virtual register.
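A sketch of the kind of call this affects (hypothetical C):
void (*handlers[8])(void);
void dispatch(unsigned i) {
  /* previously: call directly through the memory operand;
     now: load handlers[i] into a (virtual) register, then call the register */
  handlers[i]();
}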
Patch by Sriram Murali.
llvm-svn: 178171
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 178127
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 178126
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 178125
Some implementation detail in the forgotten past required the link
register to be placed in the GPRC and G8RC register classes. This is
just wrong on the face of it, and causes several extra intersection
register classes to be generated. I found this was having evil
effects on instruction scheduling, by causing the wrong register class
to be consulted for register pressure decisions.
No code generation changes are expected, other than some minor changes
in instruction order. Seven tests in the test bucket required minor
tweaks to adjust to the new normal.
llvm-svn: 178114
This is just the basic groundwork for supporting DW_TAG_imported_module but I
wanted to commit this before pushing support further into Clang or LLVM so that
this rather churny change is isolated from the rest of the work. The major
churn here is obviously adding another field (within the common DIScope prefix)
to all DIScopes (files, classes, namespaces, lexical scopes, etc). This should
be the last big churny change needed for DW_TAG_imported_module/using directive
support/PR14606.
llvm-svn: 178099
As Bill Schmidt pointed out to me, only on Darwin do we need to spill/restore
VRSAVE in the SjLj code. For non-Darwin, don't spill/restore VRSAVE (and I've
added some asserts to make sure that we're not).
As it turns out, we're not currently handling the Darwin case correctly (I've
added a FIXME in the test case). I've tried adding various implied register
definitions/uses to force the spill without success, so I'll need to address
this later.
llvm-svn: 178096
All Intel CPUs since Yonah look a lot alike, at least at the granularity
of the scheduling models. We can add more accurate models for
processors that aren't Sandy Bridge if required. Haswell will probably
need its own.
The Atom processor and anything based on NetBurst is completely
different. So are the non-Intel chips.
llvm-svn: 178080
Now that the register scavenger can support multiple spill slots, and PEI can
use virtual-register-based scavenging for multiple simultaneous registers, we
can use a virtual register for the transfer register in the CR spilling code.
This should eliminate the last place (outside of the prologue/epilogue) where
we depend on the unconditional availability of the r0 register. We will soon be
able to allocate it (in a somewhat restricted sense) as a GPR.
llvm-svn: 178060
The previous algorithm could not deal properly with scavenging multiple virtual
registers because it kept only one live virtual -> physical mapping (and
iterated through operands in order). Now we don't maintain a current mapping,
but rather use replaceRegWith to completely remove the virtual register as
soon as the mapping is established.
In order to allow the register scavenger to return a physical register killed
by an instruction for definition by that same instruction, we now call
RS->forward(I) prior to eliminating virtual registers defined in I. This
requires a minor update to forward to ignore virtual registers.
These new features will be tested in forthcoming commits.
llvm-svn: 178058
- 'prefetch' intrinsics are only lowered when SSE is available. On non-X86
  builds, the 'generic' CPU is used, and that prevents any prefetch intrinsics
  from being lowered.
llvm-svn: 178046
- It's still considered aligned when the specified alignment is larger
than the natural alignment;
- The new alignment for the high 128-bit vector should be min(16,
alignment) as the pointer is advanced by 16, a power-of-2 offset.
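The rule in the second point can be stated as a tiny helper (a sketch, not the
actual code):
unsigned high_half_alignment(unsigned align) {
  /* the pointer is advanced by 16 bytes (a power of two), so the high
     128-bit half is aligned to min(16, original alignment) */
  return align < 16 ? align : 16;
}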
llvm-svn: 177947
- Handle the case where the result of 'insert_subvect' is bitcasted
before 'extract_subvec'. This removes the redundant insertf128/extractf128
pair on unaligned 256-bit vector load/store for vectors of non-64-bit integers.
llvm-svn: 177945
For instance, the following transformation will be disabled:
x + x + x => 3.0f * x
The problem with these transformations is that they introduce an FP constant,
which the following instruction-selection pass cannot handle.
Reviewed by Nadav, thanks a lot!
rdar://13445387
llvm-svn: 177933
test/CodeGen/Generic/2008-02-20-MatchingMem.ll: Test contains inline assembly not supported by Hexagon.
Following tests are XFAILed due to multiple return values which Hexagon doesn't support.
test/CodeGen/Generic/multiple-return-values-cross-block-with-invoke.ll
test/CodeGen/Generic/select-cc.ll
test/CodeGen/Generic/vector.ll
llvm-svn: 177912
Performing this check unilaterally prevented us from generating FMAs when the incoming IR contained illegal vector types which would eventually be legalized to underlying types that *did* support FMA.
For example, an @llvm.fmuladd on an OpenCL float16 should become a sequence of float4 FMAs, not float4 fmul+fadd's.
NOTE: Because we still call the target-specific profitability hook, individual targets can reinstate the old behavior, if desired, by simply performing the legality check inside their callback hook. They can also perform more sophisticated legality checks, if, for example, some illegal vector types can be productively implemented as FMAs, but not others.
llvm-svn: 177820
Thanks to Jakob for isolating the underlying problem from the
test case in r177423. The original commit had introduced
asymmetric copy operations, but these turned out to be a work-around
for the real problem (the use of == instead of hasSubClassEq in PPCCTRLoops).
llvm-svn: 177679
This implements SJLJ lowering on PPC, making the Clang functions
__builtin_{setjmp/longjmp} functional on PPC platforms. The implementation
strategy is similar to that on X86, with the exception that a branch-and-link
variant is used to get the right jump address. Credit goes to Bill Schmidt for
suggesting the use of the unconditional bcl form (instead of the regular bl
instruction) to limit return-address-cache pollution.
Benchmarking the speed at -O3 of:
static jmp_buf env_sigill;
void foo() {
  __builtin_longjmp(env_sigill,1);
}
main() {
  ...
  for (int i = 0; i < c; ++i) {
    if (__builtin_setjmp(env_sigill)) {
      goto done;
    } else {
      foo();
    }
    done:;
  }
  ...
}
vs. the same code using the libc setjmp/longjmp functions on a P7 shows that
this builtin implementation is ~4x faster with Altivec enabled and ~7.25x
faster with Altivec disabled. This comparison is somewhat unfair because the
libc version must also save/restore the VSX registers which we don't yet
support.
llvm-svn: 177666
Although there is only one Altivec VRSAVE register, it is a member of
a register class, and we need the ability to spill it. Because this
register is normally callee-preserved and handled by special code this
has never before been necessary. However, this capability will be required by
a forthcoming commit adding SjLj support.
llvm-svn: 177654
The old code used to lower FRAMEADDR tried to replicate the logic in the real
frame-lowering code that determines whether or not the frame pointer (r31) will
be used. When it seemed as though the frame pointer would not be used, the
stack pointer (r1) was used instead. Unfortunately, because the stack size is
not yet known, this does not work. Instead, this change introduces new
always-reserved pseudo-registers (FP and FP8) that are replaced during prologue
insertion with the real frame-pointer register (either r1 or r31).
It is important that this intrinsic always return a valid frame address because
it is used by Clang to store the frame address as part of code generation for
__builtin_setjmp.
llvm-svn: 177653
NEON is not IEEE 754 compliant, so we should avoid lowering single-precision
floating point operations with NEON unless unsafe-math is turned on. The
equivalent VFP instructions are IEEE 754 compliant, but in some cores they're
much slower, so some archs/OSs might still request it to be on by default,
such as Swift and Darwin.
llvm-svn: 177651
This makes DIType's first non-tag parameter the same as DIFile's, allowing them
to both share the common implementation of getFilename/getDirectory in DIScope.
llvm-svn: 177467
A node's ordering is only propagated during legalization if (a) the new node does
not have an ordering (is not a CSE'd node), or (b) the new node has an ordering
that is higher than the node being legalized.
llvm-svn: 177465
This is another step along the way to making all DIScopes have a common prefix
which can be added to in a general manner to support using directives
(DW_TAG_imported_module).
llvm-svn: 177462
Currently, pre-increment store patterns are written to use two separate
operands to represent address base and displacement:
stwu $rS, $ptroff($ptrreg)
This causes problems when implementing the assembler parser, so this
commit changes the patterns to use standard (complex) memory operands
like in all other memory access instruction patterns:
stwu $rS, $dst
To still match those instructions against the appropriate pre_store
SelectionDAG nodes, the patch uses the new feature that allows a Pat
to match multiple DAG operands against a single (complex) instruction
operand.
Approved by Hal Finkel.
llvm-svn: 177429
Currently the PPC r0 register is unconditionally reserved. There are two reasons
for this:
1. r0 is treated specially (as the constant 0) by certain instructions, and so
cannot be used with those instructions as a regular register.
2. r0 is used as a temporary register in the CR-register spilling process
(where, under some circumstances, we require two GPRs).
This change addresses the first reason by introducing a restricted register
class (without r0) for use by those instructions that treat r0 specially. These
register classes have a new pseudo-register, ZERO, which represents the r0-as-0
use. This has the side benefit of making the existing target code simpler (and
easier to understand), and will make it clear to the register allocator that
uses of r0 as 0 don't conflict with real uses of the r0 register.
Once the CR spilling code is improved, we'll be able to allocate r0.
Adding these extra register classes, for some reason unclear to me, causes
requests to the target to copy 32-bit registers to 64-bit registers. The
resulting code seems correct (and causes no test-suite failures), and the new
test case covers this new kind of asymmetric copy.
As r0 is still reserved, no functionality change intended.
llvm-svn: 177423
Remove an accidentally-added instruction definition and add a comment in the
test case. This is in response to a post-commit review by Bill Schmidt.
No functionality change intended.
llvm-svn: 177404
The ARM backend currently has poor codegen for long sext/zext
operations, such as v8i8 -> v8i32. This patch addresses this
by performing a custom expansion in ARMISelLowering. It also
adds/changes the cost of such lowering in ARMTTI.
This partially addresses PR14867.
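For illustration (a hypothetical loop, not the PR testcase), vectorizing a
simple widening copy produces exactly these long zext operations:
void widen(const unsigned char *src, unsigned int *dst, int n) {
  for (int i = 0; i < n; ++i)
    dst[i] = src[i];   /* vectorized as v8i8 loads zero-extended to v8i32 */
}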
Patch by Pete Couperus
llvm-svn: 177380
PPC64 supports unaligned loads and stores of 64-bit values, but
in order to use the r+i forms, the offset must be a multiple of 4.
Unfortunately, this cannot always be determined by examining the
immediate itself because it might be available only via a TOC entry.
In order to get around this issue, we additionally predicate the
selection of the r+i form on the alignment of the load or store
(forcing it to be at least 4 in order to select the r+i form).
llvm-svn: 177338
Hal Finkel recently added code to allow unaligned memory references
for PowerPC. Two tests were temporarily modified with
-disable-ppc-unaligned to keep them from failing. This patch adjusts
the expected code generation for the unaligned references.
llvm-svn: 177328
Apparently my final cleanup to use a relevant suffix for these tests before
committing r176831 caused them to stop running since lit wasn't configured to
run tests with that suffix in those directories (why don't we just have a
global suffix list?). So, add the suffix to the relevant directories & fix the
test that has bitrotted over the last week due to my debug info schema changes.
llvm-svn: 177315
This commit fixes an assert that would occur on loops with large constant counts
(like looping for ((uint32_t) -1) iterations on PPC64). The existing code did
not handle counts that it computed to be negative (asserting instead), but
these can be created with valid inputs.
This bug was discovered by bugpoint while I was attempting to isolate a
completely different problem.
Also, in writing test cases for the negative-count problem, I discovered that
the ori/lis handling was broken (there was a typo which caused the logic that
was supposed to detect these pairs and extract the iteration count to always
fail). This has now also been corrected (and is covered by one of the new test
cases).
llvm-svn: 177295
Because the initial-value constants had not been added to the list
of instructions considered for DCE the resulting code had redundant
constant-materialization instructions.
llvm-svn: 177294
This is the first step to making all DIScopes have a common metadata prefix (so
that things (using directives, for example) that can appear in any scope can be
added to that common prefix). DIFile is itself a DIScope so the common prefix
of all DIScopes cannot be a DIFile - instead it's the raw filename/directory
name pair.
llvm-svn: 177239
This change cleans up two issues with Altivec register spilling:
1. The spilling code was inefficient (using two instructions, an add and a
load, when just one would do)
2. The code assumed that r0 would always be available (true for now, but this
will change)
The new code handles VR spilling just like GPR spills but forced into r+r mode.
As a result, when any VR spills are present, we must now always allocate the
register-scavenger spill slot.
llvm-svn: 177231
I was too pessimistic in r177105. Vector selects that fit into a legal register
type lower just fine. I was misled by the code fragment that I was using. The
stores/loads that I saw in those cases came from lowering the conditional off
an address.
Changing the code fragment to:
%T0_3 = type <8 x i18>
%T1_3 = type <8 x i1>
define void @func_blend3(%T0_3* %loadaddr, %T0_3* %loadaddr2,
                         %T1_3* %blend, %T0_3* %storeaddr) {
  %v0 = load %T0_3* %loadaddr
  %v1 = load %T0_3* %loadaddr2
==> FROM:
  ;%c = load %T1_3* %blend
==> TO:
  %c = icmp slt %T0_3 %v0, %v1
==> USE:
  %r = select %T1_3 %c, %T0_3 %v0, %T0_3 %v1
  store %T0_3 %r, %T0_3* %storeaddr
  ret void
}
revealed this mistake.
radar://13403975
llvm-svn: 177170
Unaligned access is supported on PPC for non-vector types, and is generally
more efficient than manually expanding the loads and stores.
A few of the existing test cases were using expanded unaligned loads and stores
to test other features (like load/store with update), and for these test cases,
unaligned access remains disabled.
llvm-svn: 177160
In preparation for the addition of other SIMD ISA extensions (such as QPX) we
need to make sure that all Altivec patterns are properly predicated on having
Altivec support.
No functionality change intended (one test case needed to be updated b/c it
assumed that Altivec intrinsics would be supported without enabling Altivec
support).
llvm-svn: 177152
For spills into a large stack frame, the FI-elimination code uses the register
scavenger to obtain a free GPR for use with an r+r-addressed load or store.
When there are no available GPRs, the scavenger gets one by using its spill
slot. Previously, we were not always allocating that spill slot and the RS
would assert when the spill slot was needed.
I don't currently have a small test that triggered the assert, but I've
created a small regression test that verifies that the spill slot is now
added when the stack frame is sufficiently large.
llvm-svn: 177140
We used to add a spill slot for the register scavenger whenever the function
has a frame pointer. This is unnecessarily conservative: We may need the spill
slot for dynamic stack allocations, and functions with dynamic stack
allocations always have a FP, but we might also have a FP for other reasons
(such as the user explicitly disabling frame-pointer elimination), and we don't
necessarily need a spill slot for those functions.
The structsinregs test needed adjustment because it disables FP elimination.
llvm-svn: 177106
By terrible I mean we store/load from the stack.
This matters on PAQp8 in _Z5trainPsS_ii (which is inlined into Mixer::update)
where we decide to vectorize a loop with a VF of 8 resulting in a 25%
degradation on a cortex-a8.
LV: Found an estimated cost of 2 for VF 8 For instruction: icmp slt i32
LV: Found an estimated cost of 2 for VF 8 For instruction: select i1, i32, i32
The bug that tracks the CodeGen part is PR14868.
radar://13403975
llvm-svn: 177105
In r176898 I updated the cost model to reflect the fact that sext/zext/cast on
v8i32 <-> v8i8 and v16i32 <-> v16i8 are expensive.
This test case is so that we make sure to update the cost model once we fix
CodeGen.
llvm-svn: 176955
This is the next step towards making the metadata for DIScopes have a common
prefix rather than having to delegate based on their tag type.
llvm-svn: 176913
This could be 'null' or the empty string; DIDescriptor::getStringField
coalesces the two cases anyway, so it's just a matter of legible/efficient
representation.
The change in behavior of the DICompileUnit::get* functions could be
subsumed by the full verification check - but ideally that should just be an
assertion if we could front-load the actual debug info metadata failure paths.
llvm-svn: 176907
Now that only the register-scavenger version of the CR spilling code remains,
we no longer need the Darwin R2 hack. Darwin can use R0 as a spare register in
any case where the System V ABI uses it (R0 is special architecturally, and so
is reserved under all common ABIs).
A few test cases needed to be updated to reflect the register-allocation changes.
llvm-svn: 176868
These cases were found by further work to remove support for debug info
versioning. Common cleanups (other than changing the version info in the tag
field) included adding the last parameter to compile_units (recently added for
fission support) and other cases of trailing fields in lexical blocks, compile
units, and subprograms.
llvm-svn: 176834
Summary:
Statistics are still available in Release+Asserts (any +Asserts builds),
and stats can also be turned on with LLVM_ENABLE_STATS.
Move some of the FastISel stats that were moved under DEBUG()
back out of DEBUG(), since stats are disabled across the board now.
Many tests depend on grepping "-stats" output. Move those into
an orig_dir/Stats/ directory so that they can be marked as unsupported
when building without statistics.
Differential Revision: http://llvm-reviews.chandlerc.com/D486
llvm-svn: 176733
fold selectcc (selectcc x, y, a, b, cc), b, a, b, setne ->
selectcc x, y, a, b, cc
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176700
Two changes:
1. Prefer SET* instructions when possible
2. Handle the CND*_INT case with floating-point args
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176699
Code generation makes some basic assumptions about the IR it's been given. In
particular, if there is only one 'invoke' in the function, then that invoke
won't be going away. However, with the advent of the `llvm.donothing' intrinsic,
those invokes may go away. If all of them go away, the landing pad no longer has
any users. This confuses the back-end, which asserts.
This happens with SjLj exceptions, because that's the model that modifies the IR
based on there being invokes, etc. in the function.
Remove any invokes of `llvm.donothing' during SjLj EH preparation. This will
give us a CFG that the back-end won't be confused about. If all of the invokes
in a function are removed, then the SjLj EH prepare pass won't insert the bogus
code that relies upon the invokes being there.
<rdar://problem/13228754&13316637>
llvm-svn: 176677
Mostly this is just changing the named metadata (llvm.dbg.sp, llvm.dbg.gv,
llvm.dbg.<func>.lv, etc -> llvm.dbg.cu), adding a few fields to older records
(DIVariable: flags/inlined-at, DICompileUnit: sp/gv/types,
DISubprogram: local variables list)
The tests to update were discovered by a change I'm working on to remove debug
info version support - so any tests using old debug info versions I haven't
updated probably are bad tests or just not actually designed to test debug
info.
llvm-svn: 176671
That can usually be lowered efficiently and is common in Sandy Bridge code.
It would be nice to do this in DAGCombiner but we can't insert arbitrary
BUILD_VECTORs this late.
Fixes PR15462.
llvm-svn: 176634