1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 13:02:52 +02:00
Commit Graph

224 Commits

Author SHA1 Message Date
Richard Sandiford
9d3cacb101 [SystemZ] Allow integer XOR involving high words
llvm-svn: 191759
2013-10-01 14:08:44 +00:00
Richard Sandiford
d2a449d3de [SystemZ] Allow integer OR involving high words
llvm-svn: 191755
2013-10-01 13:22:41 +00:00
Richard Sandiford
3af32e8cab [SystemZ] Allow integer insertions with a high-word destination
llvm-svn: 191753
2013-10-01 13:18:56 +00:00
Richard Sandiford
497097c027 [SystemZ] Allow selects with a high-word destination
llvm-svn: 191751
2013-10-01 13:10:16 +00:00
Richard Sandiford
8c8e2f0237 [SystemZ] Add patterns to load a constant into a high word (IIHF)
Similar to low words, we can use the shorter LLIHL and LLIHH if it turns
out that the other half of the GR64 isn't live.

llvm-svn: 191750
2013-10-01 13:02:28 +00:00
Richard Sandiford
ac3360b004 [SystemZ] Add register zero extensions involving at least one high word
llvm-svn: 191746
2013-10-01 12:49:07 +00:00
Richard Sandiford
192be1070b [SystemZ] Add truncating high-word stores (STCH and STHH)
llvm-svn: 191743
2013-10-01 12:22:49 +00:00
Richard Sandiford
de433bf58d [SystemZ] Add zero-extending high-word loads (LLCH and LLHH)
llvm-svn: 191742
2013-10-01 12:19:08 +00:00
Richard Sandiford
dd8ae7a617 [SystemZ] Add sign-extending high-word loads (LBH and LHH)
llvm-svn: 191740
2013-10-01 12:11:47 +00:00
Richard Sandiford
c2e496f7ba [SystemZ] Use upper words of GR64s for codegen
This just adds the basics necessary for allocating the upper words to
virtual registers (move, load and store).  The move support is parameterised
in a way that makes it easy to handle zero extensions, but the associated
zero-extend patterns are added by a later patch.

The easiest way of testing this seemed to be add a new "h" register
constraint for high words.  I don't expect the constraint to be useful
in real inline asms, but it should work, so I didn't try to hide it
behind an option.

llvm-svn: 191739
2013-10-01 11:26:28 +00:00
Manman Ren
ad317a135a TBAA: update tbaa format from scalar format to struct-path aware format.
llvm-svn: 191690
2013-09-30 18:17:55 +00:00
Manman Ren
2ef9ca7627 TBAA: handle scalar TBAA format and struct-path aware TBAA format.
Remove the command line argument "struct-path-tbaa" since we should not depend
on command line argument to decide which format the IR file is using. Instead,
we check the first operand of the tbaa tag node, if it is a MDNode, we treat
it as struct-path aware TBAA format, otherwise, we treat it as scalar TBAA
format.

When clang starts to use struct-path aware TBAA format no matter whether
struct-path-tbaa is no, and we can auto-upgrade existing bc files, the support
for scalar TBAA format can be dropped.

Existing testing cases are updated to use the struct-path aware TBAA format.

llvm-svn: 191538
2013-09-27 18:34:27 +00:00
Richard Sandiford
e1db330ce8 [SystemZ] Rein back the use of block operations
The backend tries to use block operations like MVC, NC, OC and XC for
simple scalar operations.  For correctness reasons, it rejects any case
in which the regions might partially overlap.  However, for performance
reasons, it should also reject cases where the regions might be equal,
since the instruction might then not use the fast path.

This fixes a performance regression seen in bzip2.  We may want to limit
the optimisation even more in future, or even remove it entirely, but I'll
try with this for now.

llvm-svn: 191525
2013-09-27 15:29:20 +00:00
Richard Sandiford
cae9d29151 [SystemZ] Improve handling of PC-relative addresses
The backend previously folded offsets into PC-relative addresses
whereever possible.  That's the right thing to do when the address
can be used directly in a PC-relative memory reference (using things
like LRL).  But if we have a register-based memory reference and need
to load the PC-relative address separately, it's better to use an anchor
point that could be shared with other accesses to the same area of the
variable.

Fixes a FIXME.

llvm-svn: 191524
2013-09-27 15:14:04 +00:00
Richard Sandiford
76d1801e90 [SystemZ] Add unsigned compare-and-branch instructions
For some reason I never got around to adding these at the same time as
the signed versions.  No idea why.

I'm not sure whether this SystemZII::BranchC* stuff is useful, or whether
it should just be replaced with an "is normal" flag.  I'll leave that
for later though.

There are some boundary conditions that can be tweaked, such as preferring
unsigned comparisons for equality with [128, 256), and "<= 255" over "< 256",
but again I'll leave those for a separate patch.

llvm-svn: 190930
2013-09-18 09:56:40 +00:00
Richard Sandiford
ea2b1a8b94 [SystemZ] Improve extload handling
The port originally had special patterns for extload, mapping them to the
same instructions as sextload.  It seemed neater to have patterns that
match "an extension that is allowed to be signed" and "an extension that
is allowed to be unsigned".

This was originally meant to be a clean-up, but it does improve the handling
of promoted integers a little, as shown by args-06.ll.

llvm-svn: 190777
2013-09-16 09:03:10 +00:00
Richard Sandiford
30374b51cb [SystemZ] Try to fold shifts into TMxx
E.g. "SRL %r2, 2; TMLL %r2, 1" => "TMLL %r2, 4".

llvm-svn: 190672
2013-09-13 09:09:50 +00:00
Richard Sandiford
bfcf129b8e [SystemZ] Add TM and TMY
The main complication here is that TM and TMY (the memory forms) set
CC differently from the register forms.  When the tested bits contain
some 0s and some 1s, the register forms set CC to 1 or 2 based on the
value the uppermost bit.  The memory forms instead set CC to 1
regardless of the uppermost bit.

Until now, I've tried to make it so that a branch never tests for an
impossible CC value.  E.g. NR only sets CC to 0 or 1, so branches on the
result will only test for 0 or 1.  Originally I'd tried to do the same
thing for TM and TMY by using custom matching code in ISelDAGToDAG.
That ended up being very ugly though, and would have meant duplicating
some of the chain checks that the common isel code does.

I've therefore gone for the simpler alternative of adding an extra
operand to the TM DAG opcode to say whether a memory form would be OK.
This means that the inverse of a "TM;JE" is "TM;JNE" rather than the
more precise "TM;JNLE", just like the inverse of "TMLL;JE" is "TMLL;JNE".
I suppose that's arguably less confusing though...

llvm-svn: 190400
2013-09-10 10:20:32 +00:00
Richard Sandiford
8d6edc5218 [SystemZ] Tweak integer comparison code
The architecture has many comparison instructions, including some that
extend one of the operands.  The signed comparison instructions use sign
extensions and the unsigned comparison instructions use zero extensions.
In cases where we had a free choice between signed or unsigned comparisons,
we were trying to decide at lowering time which would best fit the available
instructions, taking things like extension type into account.  The code
to do that was getting increasingly hairy and was also making some bad
decisions.  E.g. when comparing the result of two LLCs, it is better to use
CR rather than CLR, since CR can be fused with a branch while CLR can't.

This patch removes the lowering code and instead adds an operand to
integer comparisons to say whether signed comparison is required,
whether unsigned comparison is required, or whether either is OK.
We can then leave the choice of instruction up to the normal isel code.

llvm-svn: 190138
2013-09-06 11:51:39 +00:00
Richard Sandiford
ea5b4917b9 [SystemZ] Use XC for a memset of 0
llvm-svn: 190130
2013-09-06 10:25:07 +00:00
Richard Sandiford
399318ba38 [SystemZ] Add NC, OC and XC
For now these are just used to handle scalar ANDs, ORs and XORs in which
all operands are memory.

llvm-svn: 190041
2013-09-05 10:36:45 +00:00
Richard Sandiford
2543e2b36c [SystemZ] Add support for TMHH, TMHL, TMLH and TMLL
For now this just handles simple comparisons of an ANDed value with zero.
The CC value provides enough information to do any comparison for a
2-bit mask, and some nonzero comparisons with more populated masks,
but that's all future work.

llvm-svn: 189819
2013-09-03 15:38:35 +00:00
Richard Sandiford
9fc2e5cdff [SystemZ] Add support for TMHH, TMHL, TMLH and TMLL
For now just handles simple comparisons of an ANDed value with zero.
The CC value provides enough information to do any comparison for a
2-bit mask, and some nonzero comparisons with more populated masks,
but that's all future work.

llvm-svn: 189469
2013-08-28 10:31:43 +00:00
Richard Sandiford
96af6a5cf1 [SystemZ] Extend memcmp support to all constant lengths
This uses the infrastructure added for memcpy and memmove in r189331.

llvm-svn: 189458
2013-08-28 09:01:51 +00:00
Richard Sandiford
b585b1e48e [SystemZ] Extend memcpy and memset support to all constant lengths
Lengths up to a certain threshold (currently 6 * 256) use a series of MVCs.
Lengths above that threshold use a loop to handle X*256 bytes followed
by a single MVC to handle the excess (if any).  This loop will also be
needed in future when support for variable lengths is added.

Because the same tablegen classes are used to define MVC and CLC,
the patch also has the side-effect of defining a pseudo loop instruction
for CLC.  That instruction isn't used yet (and wouldn't be handled correctly
if it were).  I'm planning to use it soon though.

llvm-svn: 189331
2013-08-27 09:54:29 +00:00
Richard Sandiford
9867b44c59 [SystemZ] Add basic prefetch support
Just the instructions and intrinsics for now.

llvm-svn: 189100
2013-08-23 11:36:42 +00:00
Richard Sandiford
152d2f09a8 [SystemZ] Try reversing comparisons whose first operand is in memory
This allows us to make more use of the many compare reg,mem instructions.

llvm-svn: 189099
2013-08-23 11:27:19 +00:00
Richard Sandiford
de9eba2208 [SystemZ] Prefer LHI;ST... over LAY;MV...
If we had a store of an integer to memory, and the integer and store size
were suitable for a form of MV..., we used MV... no matter what.  We could
then have sequences like:

    lay %r2, 0(%r3,%r4)
    mvi 0(%r2), 4

In these cases it seems better to force the constant into a register
and use a normal store:

    lhi %r2, 4
    stc %r2, 0(%r3, %r4)

since %r2 is more likely to be hoisted and is easier to rematerialize.

llvm-svn: 189098
2013-08-23 11:18:53 +00:00
Richard Sandiford
b195d89bde Turn MipsOptimizeMathLibCalls into a target-independent scalar transform
...so that it can be used for z too.  Most of the code is the same.
The only real change is to use TargetTransformInfo to test when a sqrt
instruction is available.

The pass is opt-in because at the moment it only handles sqrt.

llvm-svn: 189097
2013-08-23 10:27:02 +00:00
Richard Sandiford
1dc05c13d2 [SystemZ] Define remainig *MUL_LOHI patterns
The initial port used MLG(R) for i64 UMUL_LOHI but left the other three
combinations as not-legal-or-custom.  Although 32x32->{32,32}
multiplications exist, they're not as quick as doing a normal 64-bit
multiplication, so it didn't seem like i32 SMUL_LOHI and UMUL_LOHI
would be useful.  There's also no direct instruction for i64 SMUL_LOHI,
so it needs to be implemented in terms of UMUL_LOHI.

However, not defining these patterns means that we don't convert
division by a constant into multiplication, so this patch fills
in the other cases.  The new i64 SMUL_LOHI sequence is simpler
than the one that we used previously for 64x64->128 multiplication,
so int-mul-08.ll now tests the full sequence.

llvm-svn: 188898
2013-08-21 09:34:56 +00:00
Richard Sandiford
e6e07910e3 [SystemZ] Use FI[EDX]BRA for codegen
llvm-svn: 188895
2013-08-21 09:04:20 +00:00
Richard Sandiford
add1a68f21 [SystemZ] Use SRST to optimize memchr
SystemZTargetLowering::emitStringWrapper() previously loaded the character
into R0 before the loop and made R0 live on entry.  I'd forgotten that
allocatable registers weren't allowed to be live across blocks at this stage,
and it confused LiveVariables enough to cause a miscompilation of f3 in
memchr-02.ll.

This patch instead loads R0 in the loop and leaves LICM to hoist it
after RA.  This is actually what I'd tried originally, but I went for
the manual optimisation after noticing that R0 often wasn't being hoisted.
This bug forced me to go back and look at why, now fixed as r188774.

We should also try to optimize null checks so that they test the CC result
of the SRST directly.  The select between null and the SRST GPR result could
then usually be deleted as dead.

llvm-svn: 188779
2013-08-20 09:38:48 +00:00
Richard Sandiford
6a0b1638b4 Fix test typo and add usual "br %r14" test
llvm-svn: 188775
2013-08-20 09:14:46 +00:00
Richard Sandiford
fcd54a3b89 Fix overly pessimistic shortcut in post-RA MachineLICM
Post-RA LICM keeps three sets of registers: PhysRegDefs, PhysRegClobbers
and TermRegs.  When it sees a definition of R it adds all aliases of R
to the corresponding set, so that when it needs to test for membership
it only needs to test a single register, rather than worrying about
aliases there too.  E.g. the final candidate loop just has:

    unsigned Def = Candidates[i].Def;
    if (!PhysRegClobbers.test(Def) && ...) {

to test whether register Def is multiply defined.

However, there was also a shortcut in ProcessMI to make sure we didn't
add candidates if we already knew that they would fail the final test.
This shortcut was more pessimistic than the final one because it
checked whether _any alias_ of the defined register was multiply defined.
This is too conservative for targets that define register pairs.
E.g. on z, R0 and R1 are sometimes used as a pair, so there is a
128-bit register that aliases both R0 and R1.  If a loop used
R0 and R1 independently, and the definition of R0 came first,
we would be able to hoist the R0 assignment (because that used
the final test quoted above) but not the R1 assignment (because
that meant we had two definitions of the paired R0/R1 register
and would fail the shortcut in ProcessMI).

This patch just uses the same check for the ProcessMI shortcut as
we use in the final candidate loop.

llvm-svn: 188774
2013-08-20 09:11:13 +00:00
Richard Sandiford
5e32ef0acd [SystemZ] Add negative integer absolute (load negative)
For now this matches the equivalent of (neg (abs ...)), which did hit a few
times in projects/test-suite.  We should probably also match cases where
absolute-like selects are used with reversed arguments.

llvm-svn: 188671
2013-08-19 12:56:58 +00:00
Richard Sandiford
aee0958460 [SystemZ] Add integer absolute (load positive)
llvm-svn: 188670
2013-08-19 12:48:54 +00:00
Richard Sandiford
841d24aa5a [SystemZ] Add support for sibling calls
This first cut is pretty conservative.  The final argument register (R6)
is call-saved, so we would need to make sure that the R6 argument to a
sibling call is the same as the R6 argument to the calling function,
which seems worth keeping as a separate patch.

Saying that integer truncations are free means that we no longer
use the extending instructions LGF and LLGF for spills in int-conv-09.ll
and int-conv-10.ll.  Instead we treat the registers as 64 bits wide and
truncate them to 32-bits where necessary.  I think it's unlikely we'd
use LGF and LLGF for spills in other situations for the same reason,
so I'm removing the tests rather than replacing them.  The associated
code is generic and applies to many more instructions than just
LGF and LLGF, so there is no corresponding code removal.

llvm-svn: 188669
2013-08-19 12:42:31 +00:00
Richard Sandiford
06a13f49c8 [SystemZ] Use SRST to implement strlen and strnlen
It would also make sense to use it for memchr; I'm working on that now.

llvm-svn: 188547
2013-08-16 11:41:43 +00:00
Richard Sandiford
93a75a2a56 [SystemZ] Use MVST to implement strcpy and stpcpy
llvm-svn: 188546
2013-08-16 11:29:37 +00:00
Richard Sandiford
353c7bc810 [SystemZ] Use CLST to implement strcmp
llvm-svn: 188544
2013-08-16 11:21:54 +00:00
Richard Sandiford
159b694b6e [SystemZ] Fix handling of 64-bit memcmp results
Generalize r188163 to cope with return types other than MVT::i32, just
as the existing visitMemCmpCall code did.  I've split this out into a
subroutine so that it can be used for other upcoming patches.

I also noticed that I'd used the wrong API to record the out chain.
It's a load that uses DAG.getRoot() rather than getRoot(), so the out
chain should go on PendingLoads.  I don't have a testcase for that because
we don't do any interesting scheduling on z yet.

llvm-svn: 188540
2013-08-16 10:55:47 +00:00
Richard Sandiford
7d2dfd7cf5 [SystemZ] Fix sign of integer memcmp result
r188163 used CLC to implement memcmp.  Code that compares the result
directly against zero can test the CC value produced by CLC, but code
that needs an integer result must use IPM.  The sequence I'd used was:

   ipm <reg>
   sll <reg>, 2
   sra <reg>, 30

but I'd forgotten that this inverts the order, so that CC==1 ("less")
becomes an integer greater than zero, and CC==2 ("greater") becomes
an integer less than zero.  This sequence should only be used if the
CLC arguments are reversed to compensate.  The problem then is that
the branch condition must also be reversed when testing the CLC
result directly.

Rather than do that, I went for a different sequence that works with
the natural CLC order:

   ipm <reg>
   srl <reg>, 28
   rll <reg>, <reg>, 31

One advantage of this is that it doesn't clobber CC.  A disadvantage
is that any sign extension to 64 bits must be done separately,
rather than being folded into the shifts.

llvm-svn: 188538
2013-08-16 10:22:54 +00:00
Daniel Dunbar
a496d61c01 [tests] Cleanup initialization of test suffixes.
- Instead of setting the suffixes in a bunch of places, just set one master
   list in the top-level config. We now only modify the suffix list in a few
   suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).

 - Aside from removing the need for a bunch of lit.local.cfg files, this enables
   4 tests that were inadvertently being skipped (one in
   Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
   CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
   XFAILED).

 - This commit also fixes a bunch of config files to use config.root instead of
   older copy-pasted code.

llvm-svn: 188513
2013-08-16 00:37:11 +00:00
Richard Sandiford
4980a32ba3 [SystemZ] Use CLC and IPM to implement memcmp
For now this is restricted to fixed-length comparisons with a length
in the range [1, 256], as for memcpy() and MVC.

llvm-svn: 188163
2013-08-12 10:28:10 +00:00
Richard Sandiford
b6323e0b21 [SystemZ] Optimize floating-point comparisons with zero
This follows the same lines as the integer code.  In the end it seemed
easier to have a second 4-bit mask in TSFlags to specify the compare-like
CC values.  That eats one more TSFlags bit than adding a CCHasUnordered
would have done, but it feels more concise.

llvm-svn: 187883
2013-08-07 11:10:06 +00:00
Richard Sandiford
5960348422 [SystemZ] Add floating-point load-and-test instructions
These instructions can also be used as comparisons with zero.

llvm-svn: 187882
2013-08-07 11:03:34 +00:00
Richard Sandiford
39f379d037 [SystemZ] Use BRCT and BRCTG to eliminate add-&-compare sequences
This patch just uses a peephole test for "add; compare; branch" sequences
within a single block.  The IR optimizers already convert loops to
decrement-and-branch-on-nonzero form in some cases, so even this
simplistic test triggers many times during a clang bootstrap and
projects/test-suite run.  It looks like there are still cases where we
need to more strongly prefer branches on nonzero though.  E.g. I saw a
case where a loop that started out with a check for 0 ended up with a
check for -1.  I'll try to look at that sometime.

I ended up adding the Reference class because MachineInstr::readsRegister()
doesn't check for subregisters (by design, as far as I could tell).

llvm-svn: 187723
2013-08-05 11:23:46 +00:00
Richard Sandiford
eefa00392f [SystemZ] Use LOAD AND TEST to eliminate comparisons against zero
llvm-svn: 187720
2013-08-05 11:03:20 +00:00
Richard Sandiford
9b9d87ef99 [SystemZ] Reuse CC results for integer comparisons with zero
This also fixes a bug in the predication of LR to LOCR: I'd forgotten
that with these in-place instruction builds, the implicit operands need
to be added manually.  I think this was latent until now, but is tested
by int-cmp-45.c.  It also adds a CC valid mask to STOC, again tested by
int-cmp-45.c.

llvm-svn: 187573
2013-08-01 10:39:40 +00:00
Richard Sandiford
6d6df38281 [SystemZ] Prefer comparisons with zero
Convert >= 1 to > 0, etc.  Using comparison with zero isn't a win on its own,
but it exposes more opportunities for CC reuse (the next patch).

llvm-svn: 187571
2013-08-01 10:29:45 +00:00
Richard Sandiford
5a382b8c6f [SystemZ] Implement isLegalAddressingMode()
The loop optimizers were assuming that scales > 1 were OK.  I think this
is actually a bug in TargetLoweringBase::isLegalAddressingMode(),
since it seems to be trying to reject anything that isn't r+i or r+r,
but it has no default case for scales other than 0, 1 or 2.  Implementing
the hook for z means that z can no longer test any change there though.

llvm-svn: 187497
2013-07-31 12:58:26 +00:00
Richard Sandiford
987e271aaa [SystemZ] Be more careful about inverting CC masks (conditional loads)
Extend r187495 to conditional loads.  I split this out because the
easiest way seemed to be to force a particular operand order in
SystemZISelDAGToDAG.cpp.

llvm-svn: 187496
2013-07-31 12:38:08 +00:00
Richard Sandiford
b3ecd3b03e [SystemZ] Be more careful about inverting CC masks
System z branches have a mask to select which of the 4 CC values should
cause the branch to be taken.  We can invert a branch by inverting the mask.
However, not all instructions can produce all 4 CC values, so inverting
the branch like this can lead to some oddities.  For example, integer
comparisons only produce a CC of 0 (equal), 1 (less) or 2 (greater).
If an integer EQ is reversed to NE before instruction selection,
the branch will test for 1 or 2.  If instead the branch is reversed
after instruction selection (by inverting the mask), it will test for
1, 2 or 3.  Both are correct, but the second isn't really canonical.
This patch therefore keeps track of which CC values are possible
and uses this when inverting a mask.

Although this is mostly cosmestic, it fixes undefined behavior
for the CIJNLH in branch-08.ll.  Another fix would have been
to mask out bit 0 when generating the fused compare and branch,
but the point of this patch is that we shouldn't need to do that
in the first place.

The patch also makes it easier to reuse CC results from other instructions.

llvm-svn: 187495
2013-07-31 12:30:20 +00:00
Richard Sandiford
7320349682 [SystemZ] Move compare-and-branch generation even later
r187116 moved compare-and-branch generation from the instruction-selection
pass to the peephole optimizer (via optimizeCompare).  It turns out that even
this is a bit too early.  Fused compare-and-branch instructions don't
interact well with predication, where a CC result is needed.  They also
make it harder to reuse the CC side-effects of earlier instructions
(not yet implemented, but the subject of a later patch).

Another problem was that the AnalyzeBranch family of routines weren't
handling compares and branches, so we weren't able to reverse the fused
form in cases where we would reverse a separate branch.  This could have
been fixed by extending AnalyzeBranch, but given the other problems,
I've instead moved the fusing to the long-branch pass, which is also
responsible for the opposite transformation: splitting out-of-range
compares and branches into separate compares and long branches.

I've added a test for the AnalyzeBranch problem.  A test for the
predication problem is included in the next patch, which fixes a bug
in the choice of CC mask.

llvm-svn: 187494
2013-07-31 12:11:07 +00:00
Richard Sandiford
32c979f9e1 [SystemZ] Postpone NI->RISBG conversion to convertToThreeAddress()
r186399 aggressively used the RISBG instruction for immediate ANDs,
both because it can handle some values that AND IMMEDIATE can't,
and because it allows the destination register to be different from
the source.  I realized later while implementing the distinct-ops
support that it would be better to leave the choice up to
convertToThreeAddress() instead.  The AND IMMEDIATE form is shorter
and is less likely to be cracked.

This is a problem for 32-bit ANDs because we assume that all 32-bit
operations will leave the high word untouched, whereas RISBG used in
this way will either clear the high word or copy it from the source
register.  The patch uses the z196 instruction RISBLG for this instead.

This means that z10 will be restricted to NILL, NILH and NILF for
32-bit ANDs, but I think that should be OK for now.  Although we're
using z10 as the base architecture, the optimization work is going
to be focused more on z196 and zEC12.

llvm-svn: 187492
2013-07-31 11:36:35 +00:00
Richard Sandiford
d3155041a0 [SystemZ] Rework compare and branch support
Before the patch we took advantage of the fact that the compare and
branch are glued together in the selection DAG and fused them together
(where possible) while emitting them.  This seemed to work well in practice.
However, fusing the compare so early makes it harder to remove redundant
compares in cases where CC already has a suitable value.  This patch
therefore uses the peephole analyzeCompare/optimizeCompareInstr pair of
functions instead.

No behavioral change intended, but it paves the way for a later patch.

llvm-svn: 187116
2013-07-25 09:34:38 +00:00
Richard Sandiford
a8ce38895a [SystemZ] Add LOCR and LOCGR
llvm-svn: 187113
2013-07-25 09:11:15 +00:00
Richard Sandiford
ef1a928c13 [SystemZ] Add LOC and LOCG
As with the stores, these instructions can trap when the condition is false,
so they are only used for things like (cond ? x : *ptr).

llvm-svn: 187112
2013-07-25 09:04:52 +00:00
Richard Sandiford
8de9a94d5e [SystemZ] Add STOC and STOCG
These instructions are allowed to trap even if the condition is false,
so for now they are only used for "*ptr = (cond ? x : *ptr)"-style
constructs.

llvm-svn: 187111
2013-07-25 08:57:02 +00:00
Richard Sandiford
49d61a017c [SystemZ] Add tests for ALHSIK and ALGHSIK
The insn definitions themselves crept into r186689, sorry.
This should be the last of the distinct-ops instructions.

llvm-svn: 186690
2013-07-19 16:44:32 +00:00
Richard Sandiford
fe2faa0d5b [SystemZ] Add ALRK, AGLRK, SLRK and SGLRK
Follows the same lines as r186686, but much more limited, since we only
use ADD LOGICAL for multi-i64 additions.

llvm-svn: 186689
2013-07-19 16:37:00 +00:00
Richard Sandiford
8ae3d3dbfc [SystemZ] Add AHIK and AGHIK
I did these as a separate patch because it uses a slightly different
form of RIE layout.

llvm-svn: 186687
2013-07-19 16:32:12 +00:00
Richard Sandiford
8dca7aa469 [SystemZ] Add ARK, AGRK, SRK and SGRK
The testsuite changes follow the same lines as for r186683.

llvm-svn: 186686
2013-07-19 16:26:39 +00:00
Richard Sandiford
ae52d0261f [SystemZ] Add NGRK, OGRK and XGRK
Like r186683, but for 64 bits.

llvm-svn: 186685
2013-07-19 16:24:22 +00:00
Richard Sandiford
f4f67cacd7 [SystemZ] Add NRK, ORK and XRK
The atomic tests assume the two-operand forms, so I've restricted them to z10.

Running and-01.ll, or-01.ll and xor-01.ll for z196 as well as z10 shows why
using convertToThreeAddress() is better than exposing the three-operand forms
first and then converting back to two operands where possible (which is what
I'd originally tried).  Using the three-operand form first stops us from
taking advantage of NG, OG and XG for spills.

llvm-svn: 186683
2013-07-19 16:21:55 +00:00
Richard Sandiford
499b1e1400 [SystemZ] Use SLLK, SRLK and SRAK for codegen
This patch uses the instructions added in r186680 for codegen.

llvm-svn: 186681
2013-07-19 16:12:08 +00:00
Richard Sandiford
be2932828b [SystemZ] Use RNSBG
This should be the last of the R.SBG patches for now.

llvm-svn: 186573
2013-07-18 10:40:35 +00:00
Richard Sandiford
106d44afb8 [SystemZ] Generalize RxSBG SRA case
The original code only folded SRA into ROTATE ... SELECTED BITS
if there was no outer shift.  This patch splits out that check
and generalises it slightly.  The extra cases aren't really that
interesting, but this is paving the way for RNSBG support.

llvm-svn: 186571
2013-07-18 10:14:55 +00:00
Richard Sandiford
8e6d7fea3c [SystemZ] Use RXSBG
Extend the previous R.SBG patches to handle XORs.

llvm-svn: 186570
2013-07-18 10:06:15 +00:00
Richard Sandiford
ab0fb439a9 [SystemZ] Use ROSBG and non-zero form of RISBG for OR nodes
llvm-svn: 186405
2013-07-16 11:55:57 +00:00
Richard Sandiford
c99f769478 [SystemZ] Use RISBG for (shift (and ...))
Another patch in the series to make more use of R.SBG.  This one extends
r186072 and r186073 to handle cases where the AND is inside the shift.

llvm-svn: 186399
2013-07-16 11:02:24 +00:00
Stephen Lin
7e501cf4c3 Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally.
This update was done with the following bash script:

  find test/CodeGen -name "*.ll" | \
  while read NAME; do
    echo "$NAME"
    if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
      TEMP=`mktemp -t temp`
      cp $NAME $TEMP
      sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
      while read FUNC; do
        sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
      done
      sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
      sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
      sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
      sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
      mv $TEMP $NAME
    fi
  done

llvm-svn: 186280
2013-07-14 06:24:09 +00:00
Richard Sandiford
bab8ad599d [SystemZ] Add test missing from r186148
Sigh, twice in two days sorry.  One day I'll remember...

llvm-svn: 186150
2013-07-12 09:20:14 +00:00
Richard Sandiford
8fe174c649 [SystemZ] Optimize sign-extends of vector setccs
Normal (sext (setcc ...)) sequences are optimised into
(select_cc ..., -1, 0) by DAGCombiner::visitSIGN_EXTEND.
However, this is deliberately not done for vectors, and after
vector type legalization we have (sext_inreg (setcc ...)) instead.

I wondered about trying to extend DAGCombiner to handle this case too,
but it seemed to be a loss on some other targets I tried, even those for
which SETCC isn't "legal" and SELECT_CC is.

llvm-svn: 186149
2013-07-12 09:17:10 +00:00
Richard Sandiford
b7ea44c800 [SystemZ] Improve spilling of LGDR and LDGR
If the source of these instructions is spilled we should load the destination.
If the destination is spilled we should store the source.

llvm-svn: 186147
2013-07-12 08:37:17 +00:00
Richard Sandiford
f53d62a725 [SystemZ] Add testcase missing from r186073
llvm-svn: 186074
2013-07-11 09:10:38 +00:00
Richard Sandiford
e5c5a78828 [SystemZ] Use zeroing form of RISBG for shift-and-AND sequences
Extend r186072 to handle shifts and ANDs.

llvm-svn: 186073
2013-07-11 09:10:09 +00:00
Richard Sandiford
fa42560424 [SystemZ] Use zeroing form of RISBG for some AND sequences
RISBG can handle some ANDs for which no AND IMMEDIATE exists.
It also acts as a three-operand AND for some cases where an
AND IMMEDIATE could be used instead.

It might be worth adding a pass to replace RISBG with AND IMMEDIATE
in cases where the register operands end up being the same and where
AND IMMEDIATE is smaller.

llvm-svn: 186072
2013-07-11 08:59:12 +00:00
Richard Sandiford
03cd63553a [SystemZ] Use MVC for simple load/store pairs
Look for patterns of the form (store (load ...), ...) in which the two
locations are known not to partially overlap.  (Identical locations are OK.)
These sequences are better implemented by MVC unless either the load or
the store could use RELATIVE LONG instructions.

The testcase showed that we weren't using LHRL and LGHRL for extload16,
only sextloadi16.  The patch fixes that too.

llvm-svn: 185919
2013-07-09 09:46:39 +00:00
Richard Sandiford
9295f19189 [SystemZ] Use "STC;MVC" for memset
Use "STC;MVC" for memsets that are too big for two STCs or MV...Is yet
small enough for a single MVC.  As with memcpy, I'm leaving longer cases
till later.

The number of tests might seem excessive, but f33 & f34 from memset-04.ll
failed the first cut because I'd not added the "?:" on the calculation
of Size1.

llvm-svn: 185918
2013-07-09 09:32:42 +00:00
Richard Sandiford
537b8d7bec [SystemZ] Use MVC for memcpy
Use MVC for memcpy in cases where a single MVC is enough.  Using MVC is
a win for longer copies too, but I'll leave that for later.

llvm-svn: 185802
2013-07-08 09:35:23 +00:00
Richard Sandiford
c0fe83c1b6 [SystemZ] Remove no-op MVCs
The stack coloring pass has code to delete stores and loads that become
trivially dead after coloring.  Extend it to cope with single instructions
that copy from one frame index to another.

The testcase happens to show an example of this kicking in at the moment.
It did occur in Real Code too though.

llvm-svn: 185705
2013-07-05 14:38:48 +00:00
Richard Sandiford
8e414e2aef Fix double renaming bug in stack coloring pass
The stack coloring pass renumbered frame indexes with a loop of the form:

  for each frame index FI
    for each instruction I that uses FI
      for each use of FI in I
        rename FI to FI'

This caused problems if an instruction used two frame indexes F0 and F1
and if F0 was renamed to F1 and F1 to F2.  The first time we visited the
instruction we changed F0 to F1, then we changed both F1s to F2.

In other words, the problem was that SSRefs recorded which instructions
used an FI, but not which MachineOperands and MachineMemOperands within
that instruction used it.

This is easily fixed for MachineOperands by walking the instructions
once and processing each operand in turn.  There's already a loop to
do that for dead store elimination, so it seemed more efficient to
fuse the two at the block level.

MachineMemOperands are more tricky because they can be shared between
instructions.  The patch handles them by making SSRefs an array of
MachineMemOperands rather than an array of MachineInstrs.  We might end
up processing the same MachineMemOperand twice, but that's OK because
we always know from the SSRefs index what the original frame index was.

llvm-svn: 185703
2013-07-05 14:24:47 +00:00
Richard Sandiford
d84ed5f34a [SystemZ] Enable the use of MVC for frame-to-frame spills
...now that the problem that prompted the restriction has been fixed.

The original spill-02.py was a compromise because at the time I couldn't
find an example that actually failed without the two scavenging slots.
The version included here did.

llvm-svn: 185701
2013-07-05 14:02:01 +00:00
Richard Sandiford
acd92ea1e1 [SystemZ] Allocate a second register scavenging slot
This is another prerequisite for frame-to-frame MVC copies.
I'll commit the patch that makes use of the slot separately.

The downside of trying to test many corner cases with each of the
available addressing modes is that a fair few tests need to account
for the new frame layout.  I do still think it's useful to have all
these tests though, since it's something that wouldn't get much coverage
otherwise.

llvm-svn: 185698
2013-07-05 13:11:52 +00:00
Richard Sandiford
c7495a0fca [SystemZ] Fold more spills
Add a mapping from register-based <INSN>R instructions to the corresponding
memory-based <INSN>.  Use it to cut down on the number of spill loads.

Some instructions extend their operands from smaller fields, so this
required a new TSFlags field to say how big the unextended operand is.

This optimisation doesn't trigger for C(G)R and CL(G)R because in practice
we always combine those instructions with a branch.  Adding a test for every
other case probably seems excessive, but it did catch a missed optimisation
for DSGF (fixed in r185435).

llvm-svn: 185529
2013-07-03 10:10:02 +00:00
Richard Sandiford
750b064fa2 [SystemZ] Use DSGFR over DSGR in more cases
Fixes some cases where we were using full 64-bit division for (sdiv i32, i32)
and (sdiv i64, i32).

The "32" in "SDIVREM32" just refers to the second operand.  The first operand
of all *DIVREM*s is a GR128.

llvm-svn: 185435
2013-07-02 15:40:22 +00:00
Richard Sandiford
33deb195f9 [SystemZ] Use MVC to spill loads and stores
Try to use MVC when spilling the destination of a simple load or the source
of a simple store.  As explained in the comment, this doesn't yet handle
the case where the load or store location is also a frame index, since
that could lead to two simultaneous scavenger spills, something the
backend can't handle yet.  spill-02.py tests that this restriction kicks in,
but unfortunately I've not yet found a case that would fail without it.
The volatile trick I used for other scavenger tests doesn't work here
because we can't use MVC for volatile accesses anyway.

I'm planning on relaxing the restriction later, hopefully with a test
that does trigger the problem...

Tests @f8 and @f9 also showed that L(G)RL and ST(G)RL were wrongly
classified as SimpleBDX{Load,Store}.  It wouldn't be easy to test for
that bug separately, which is why I didn't split out the fix as a
separate patch.

llvm-svn: 185434
2013-07-02 15:28:56 +00:00
Richard Sandiford
56466fde7f [SystemZ] Fix some embarrassing test typos
llvm-svn: 185070
2013-06-27 09:49:34 +00:00
Richard Sandiford
609a7eb0a1 [SystemZ] Allow LA and LARL to be rematerialized
llvm-svn: 185069
2013-06-27 09:42:10 +00:00
Richard Sandiford
a2d164d53e [SystemZ] Allow immediate moves to be rematerialized
llvm-svn: 185068
2013-06-27 09:38:48 +00:00
Richard Sandiford
964ffa104f [SystemZ] Add conditional store patterns
Add pseudo conditional store instructions, so that we use:

    branch foo:
    store
foo:

instead of:

    load
    branch foo:
    move
foo:
    store

z196 has real 32-bit and 64-bit conditional stores, but we don't use
any z196 instructions yet.

llvm-svn: 185065
2013-06-27 09:27:40 +00:00
Richard Sandiford
77f91408dd [SystemZ] Don't use LOAD and STORE REVERSED for volatile accesses
Unlike most -- hopefully "all other", but I'm still checking -- memory
instructions we support, LOAD REVERSED and STORE REVERSED may access
the memory location several times.  This means that they are not suitable
for volatile loads and stores.

This patch is a prerequisite for better atomic load and store support.
The same principle applies there: almost all memory instructions we
support are inherently atomic ("block concurrent"), but LOAD REVERSED
and STORE REVERSED are exceptions.

Other instructions continue to allow volatile operands.  I will add
positive "allows volatile" tests at the same time as the "allows atomic
load or store" tests.

llvm-svn: 183002
2013-05-31 13:25:22 +00:00
Richard Sandiford
b7ab6fd782 [SystemZ] Enable unaligned accesses
The code to distinguish between unaligned and aligned addresses was
already there, so this is mostly just a switch-on-and-test process.

llvm-svn: 182920
2013-05-30 09:45:42 +00:00
Richard Sandiford
b70371b744 [SystemZ] Two tests missing from previous commit
llvm-svn: 182847
2013-05-29 11:59:26 +00:00
Richard Sandiford
b62e20c071 [SystemZ] Immediate compare-and-branch support
This patch adds support for the CIJ and CGIJ instructions.

llvm-svn: 182846
2013-05-29 11:58:52 +00:00
Richard Sandiford
4b6cfd7cec [SystemZ] Register compare-and-branch support
This patch adds support for the CRJ and CGRJ instructions.  Support for
the immediate forms will be a separate patch.

The architecture has a large number of comparison instructions.  I think
it's generally better to concentrate on using the "best" comparison
instruction first and foremost, then only use something like CRJ if
CR really was the natual choice of comparison instruction.  The patch
therefore opportunistically converts separate CR and BRC instructions
into a single CRJ while emitting instructions in ISelLowering.

llvm-svn: 182764
2013-05-28 10:41:11 +00:00
Richard Sandiford
10059d8172 [SystemZ] Tighten branch tests
After r182274, the branches in these tests must always be short.

llvm-svn: 182358
2013-05-21 08:53:17 +00:00
Richard Sandiford
cc815ef1d8 [SystemZ] Add long branch pass
Before this change, the SystemZ backend would use BRCL for all branches
and only consider shortening them to BRC when generating an object file.
E.g. a branch on equal would use the JGE alias of BRCL in assembly output,
but might be shortened to the JE alias of BRC in ELF output.  This was
a useful first step, but it had two problems:

(1) The z assembler isn't traditionally supposed to perform branch shortening
    or branch relaxation.  We followed this rule by not relaxing branches
    in assembler input, but that meant that generating assembly code and
    then assembling it would not produce the same result as going directly
    to object code; the former would give long branches everywhere, whereas
    the latter would use short branches where possible.

(2) Other useful branches, like COMPARE AND BRANCH, do not have long forms.
    We would need to do something else before supporting them.

    (Although COMPARE AND BRANCH does not change the condition codes,
    the plan is to model COMPARE AND BRANCH as a CC-clobbering instruction
    during codegen, so that we can safely lower it to a separate compare
    and long branch where necessary.  This is not a valid transformation
    for the assembler proper to make.)

This patch therefore moves branch relaxation to a pre-emit pass.
For now, calls are still shortened from BRASL to BRAS by the assembler,
although this too is not really the traditional behaviour.

The first test takes about 1.5s to run, and there are likely to be
more tests in this vein once further branch types are added.  The feeling
on IRC was that 1.5s is a bit much for a single test, so I've restricted
it to SystemZ hosts for now.

The patch exposes (and fixes) some typos in the main CodeGen/SystemZ tests.
A later patch will remove the {{g}}s from that directory.

llvm-svn: 182274
2013-05-20 14:23:08 +00:00
Richard Sandiford
1ccc224047 [SystemZ] Make use of SUBTRACT HALFWORD
Thanks to Ulrich Weigand for noticing that this instruction was missing.

llvm-svn: 181893
2013-05-15 15:05:29 +00:00