Mips32 code as Mips16 unless it can't be compiled as Mips 16. For now this
would happen as long as floating point instructions are not needed.
Probably it would also make sense to compile as mips32 if atomic operations
are needed too. There may be other cases too.
A module pass prescans the IR and adds the mips16 or nomips16 attribute
to functions depending on the functions needs.
Mips 16 mode can result in a 40% code compression by utililizing 16 bit
encoding of many instructions.
The hope is for this to replace the traditional gcc way of dealing with
Mips16 code using floating point which involves essentially using soft float
but with a library implemented using mips32 floating point. This gcc
method also requires creating stubs so that Mips32 code can interact with
these Mips 16 functions that have floating point needs. My conjecture is
that in reality this traditional gcc method would never win over this
new method.
I will be implementing the traditional gcc method also. Some of it is already
done but I needed to do the stubs to finish the work and those required
this mips16/32 mixed mode capability.
I have more ideas for to make this new method much better and I think the old
method will just live in llvm for anyone that needs the backward compatibility
but I don't for what reason that would be needed.
llvm-svn: 179185
Summary:
I did a local comparison between using bash and using lit's runner, and
more of the suite passes with lit than passes with bash. Most of the
bash failures have to do with /dev/null, which is nonsensical on
Windows, but the lit runner handles it.
The lit shell runner is also much faster than bash, so I would expect
most Windows devs would want it by default.
The behavior can be overridden on any OS by setting
LIT_USE_INTERNAL_SHELL to 0 or 1 in the environment.
Reviewers: chapuni, ddunbar
CC: llvm-commits, timurrrr
Differential Revision: http://llvm-reviews.chandlerc.com/D559
llvm-svn: 179173
These instructions aren't universally available, but depend on a specific
extension to the normal ARM architecture (rather than, say, v6/v7/...) so a new
feature is appropriate.
This also enables the feature by default on A-class cores which usually have
these extensions, to avoid breaking existing code and act as a sensible
default.
llvm-svn: 179171
Depending on the number of bits set in the writemask.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 179166
This adds in-principle support for if-converting the bctr[l] instructions.
These instructions are used for indirect branching. It seems, however, that the
current if converter will never actually predicate these. To do so, it would
need the ability to hoist a few setup insts. out of the conditionally-executed
block. For example, code like this:
void foo(int a, int (*bar)()) { if (a != 0) bar(); }
becomes:
...
beq 0, .LBB0_2
std 2, 40(1)
mr 12, 4
ld 3, 0(4)
ld 11, 16(4)
ld 2, 8(4)
mtctr 3
bctrl
ld 2, 40(1)
.LBB0_2:
...
and it would be safe to do all of this unconditionally with a predicated
beqctrl instruction.
llvm-svn: 179156
The target hooks are getting out of hand. What does it mean to run
before or after regalloc anyway? Allowing either Pass* or AnalysisID
pass identification should make it much easier for targets to use the
substitutePass and insertPass APIs, and create less need for badly
named target hooks.
llvm-svn: 179140
Modifier 'D' is to use the second word of a double integer.
We had previously implemented the pure register varient of
the modifier and this patch implements the memory reference.
#include "stdio.h"
int b[8] = {0,1,2,3,4,5,6,7};
void main()
{
int i;
// The first word. Notice, no 'D'
{asm (
"lw %0,%1;"
: "=r" (i)
: "m" (*(b+4))
);}
printf("%d\n",i);
// The second word
{asm (
"lw %0,%D1;"
: "=r" (i)
: "m" (*(b+4))
);}
printf("%d\n",i);
}
llvm-svn: 179135
This enables us to form predicated branches (which are the same conditional
branches we had before) and also a larger set of predicated returns (including
instructions like bdnzlr which is a conditional return and loop-counter
decrement all in one).
At the moment, if conversion does not capture all possible opportunities. A
simple example is provided in early-ret2.ll, where if conversion forms one
predicated return, and then the PPCEarlyReturn pass picks up the other one. So,
at least for now, we'll keep both mechanisms.
llvm-svn: 179134
and mips16 on a per function basis.
Because this patch is somewhat involved I have provide an overview of the
key pieces of it.
The patch is written so as to not change the behavior of the non mixed
mode. We have tested this a lot but it is something new to switch subtargets
so we don't want any chance of regression in the mainline compiler until
we have more confidence in this.
Mips32/64 are very different from Mip16 as is the case of ARM vs Thumb1.
For that reason there are derived versions of the register info, frame info,
instruction info and instruction selection classes.
Now we register three separate passes for instruction selection.
One which is used to switch subtargets (MipsModuleISelDAGToDAG.cpp) and then
one for each of the current subtargets (Mips16ISelDAGToDAG.cpp and
MipsSEISelDAGToDAG.cpp).
When the ModuleISel pass runs, it determines if there is a need to switch
subtargets and if so, the owning pointers in MipsTargetMachine are
appropriately changed.
When 16Isel or SEIsel is run, they will return immediately without doing
any work if the current subtarget mode does not apply to them.
In addition, MipsAsmPrinter needs to be reset on a function basis.
The pass BasicTargetTransformInfo is substituted with a null pass since the
pass is immutable and really needs to be a function pass for it to be
used with changing subtargets. This will be fixed in a follow on patch.
llvm-svn: 179118
This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations.
The infrastructure has three potential users:
1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).
2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.
3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.
This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code:
void SAXPY(int *x, int *y, int a, int i) {
x[i] = a * x[i] + y[i];
x[i+1] = a * x[i+1] + y[i+1];
x[i+2] = a * x[i+2] + y[i+2];
x[i+3] = a * x[i+3] + y[i+3];
}
llvm-svn: 179117
parse an identifier. Otherwise, parseExpression may parse multiple tokens,
which makes it impossible to properly compute an immediate displacement.
An example of such a case is the source operand (i.e., [Symbol + ImmDisp]) in
the below example:
__asm mov eax, [Symbol + ImmDisp]
The existing test cases exercise this patch.
rdar://13611297
llvm-svn: 179115
therefore not at all) of the pc or statement list. We also don't
need to emit the compilation dir so save so space and time
and don't bother.
Fix up the testcase accordingly and verify that we don't emit
the attributes or the items that they use.
llvm-svn: 179114
Some general cleanup and only scan the end of a BB for branches (once we're
done with the terminators and debug values, then there should not be any other
branches). These address post-commit review suggestions by Bill Schmidt.
No functionality change intended.
llvm-svn: 179112
rather than deriving the StringRef from the Start and End SMLocs.
Using the Start and End SMLocs works fine for operands such as [Symbol], but
not for operands such as [Symbol + ImmDisp]. All existing test cases that
reference a variable exercise this patch.
rdar://13602265
llvm-svn: 179109
This pattern occurs in SROA output due to the way vector arguments are lowered
on ARM.
The testcase from PR15525 now compiles into this, which is better than the code
we got with the old scalarrepl:
_Store:
ldr.w r9, [sp]
vmov d17, r3, r9
vmov d16, r1, r2
vst1.8 {d16, d17}, [r0]
bx lr
Differential Revision: http://llvm-reviews.chandlerc.com/D647
llvm-svn: 179106
On PowerPC, non-vector loads and stores have r+i forms; however, in functions
with large stack frames these were not being used to access slots far from the
stack pointer because such slots were out of range for the signed 16-bit
immediate offset field. This increases register pressure because we need a
separate register for each offset (when the r+r form is used). By enabling
virtual base registers, we can deal with large stack frames without unduly
increasing register pressure.
llvm-svn: 179105
Some translations here are not 1x1 because there are grep|grep
chains that are non-trivial to implement in terms of FileCheck features. I
made an effort for the tests to remain as similar as possible; do let me know
if you notice anything fishy. The good news are that some buggy tests were
fixed (grep | not grep - a bug waiting to happen).
llvm-svn: 179102
The save area is twice as big and there is no struct return slot. The
stack pointer is always 16-byte aligned (after adding the bias).
Also eliminate the stack adjustment instructions around calls when the
function has a reserved stack frame.
llvm-svn: 179083