1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

1090 Commits

Author SHA1 Message Date
David Green
5a76ea81c6 [ARM] More unpredictable VCVT instructions.
These extra vcvt instructions were missed from 74ca67c109 because they
live in a different Domain, but should be treated in the same way.

Differential Revision: https://reviews.llvm.org/D83204
2020-07-21 07:24:37 +01:00
David Green
b082241dfa [ARM] Predicated MVE reduction tests. NFC 2020-07-21 06:47:48 +01:00
Elvina Yakubova
6cd76408bf [llvm-readobj] Update tests because of changes in llvm-readobj behavior
This patch updates tests using llvm-readobj and llvm-readelf, because
soon reading from stdin will be achievable only via a '-' as described
here: https://bugs.llvm.org/show_bug.cgi?id=46400. Patch with changes to
llvm-readobj behavior is here: https://reviews.llvm.org/D83704

Differential Revision: https://reviews.llvm.org/D83912

Reviewed by: jhenderson, MaskRay, grimar
2020-07-20 10:39:04 +01:00
David Green
5eb849eb9e [ARM] Don't mark vctp as having sideeffects
As far as I can tell, it should not be necessary for VCTP to be
unpredictable in tail predicated loops. Either it has a a valid loop
counter as a operand which will naturally keep it in the right loop, or
it doesn't and it won't be converted to a tail predicated loop. Not
marking it as having side effects allows it to be scheduled more cleanly
for cases where it is not expected to become a tail predicate loop.

Differential Revision: https://reviews.llvm.org/D83907
2020-07-19 09:28:09 +01:00
Sam Tebbs
6329f986c9 [HWLoops] Stop converting to a while loop when it would be unsafe to
There were cases where a do-while loop would be converted to a while
loop before finding out that it would be unsafe to expand the SCEV in
this situation and then bailing out of hardware loop conversion.

This patch checks if it would be unsafe to expand the SCEV and if so stops converting the do-while into a while, allowing conversion to a hardware loop.

Differential Revision: https://reviews.llvm.org/D83953
2020-07-17 11:47:08 +01:00
Matt Arsenault
ccceacf068 DAG: Try scalarizing when expanding saturating add/sub
In an upcoming AMDGPU patch, the scalar cases will be legal and vector
ops should be scalarized, rather than producing a long sequence of
vector ops which will also need to be scalarized.

Use a lazy heuristic that seems to work and improves the thumb2 MVE
test.
2020-07-16 14:05:16 -04:00
Pavel Iliin
ced6d5f200 [ARM] VBIT/VBIF support added.
Vector bitwise selects are matched by pseudo VBSP instruction
and expanded to VBSL/VBIT/VBIF after register allocation
depend on operands registers to minimize extra copies.
2020-07-16 11:25:53 +01:00
David Green
a9b961b8da [ARM] CSEL generation
This adds a peephole optimisation to turn a t2MOVccr that could not be
folded into any other instruction into a CSEL on 8.1-m. The t2MOVccr
would usually be expanded into a conditional mov, that becomes an IT;
MOV pair. We can instead generate a CSEL instruction, which can
potentially be smaller and allows better register allocation freedom,
which can help reduce codesize. Performance is more variable and may
depend on the micrarchitecture details, but initial results look good.
If we need to control this per-cpu, we can add a subtarget feature as we
need it.

Original patch by David Penry.

Differential Revision: https://reviews.llvm.org/D83566
2020-07-16 11:10:53 +01:00
Pavel Iliin
5410e8da94 [ARM][NFC] More detailed vbsl checks in ARM & Thumb2 tests. 2020-07-13 17:00:43 +01:00
Sjoerd Meijer
ad0b9a0feb [ARM][MVE] Refactor option -disable-mve-tail-predication
This refactors option -disable-mve-tail-predication to take different arguments
so that we have 1 option to control tail-predication rather than several
different ones.

This is also a prep step for D82953, in which we want to reject reductions
unless that is requested with this option.

Differential Revision: https://reviews.llvm.org/D83133
2020-07-13 13:40:33 +01:00
David Green
684e62e531 [ARM] Remove hasSideEffects from FP converts
Whether an instruction is deemed to have side effects in determined by
whether it has a tblgen pattern that emits a single instruction.
Because of the way a lot of the the vcvt instructions are specified
either in dagtodag code or with patterns that emit multiple
instructions, they don't get marked as not having side effects.

This just marks them as not having side effects manually. It can help
especially with instruction scheduling, to not create artificial
barriers, but one of these tests also managed to produce fewer
instructions.

Differential Revision: https://reviews.llvm.org/D81639
2020-07-05 16:23:24 +01:00
David Green
b9acbecea9 [ARM][HWLoops] Create hardware loops for sibling loops
Given a loop with two subloops, it should be possible for both to be
converted to hardware loops. That's what this patch does, simply enough.
It slightly alters the loop iterating order to try and convert all
subloops. If one (or more) succeeds, it stops as before.

Differential Revision: https://reviews.llvm.org/D78502
2020-07-03 17:20:02 +01:00
Nicholas Guy
f4899fdead [ARM] Rearrange SizeReduction when using -Oz
Move the Thumb2SizeReduce pass to before IfConversion when optimising
for minimal code size.

Running the Thumb2SizeReduction pass before IfConversionallows T1
instructions to propagate to the final output, rather than the
ifConverter modifying T2 instructions and preventing them from being
reduced later.

This change does introduce a regression regarding execution time, so
it's only applied when optimising for size.

Running the LLVM Test Suite with this change produces a geomean
difference of -0.1% for the size..text metric.

Differential Revision: https://reviews.llvm.org/D82439
2020-07-02 09:19:38 +01:00
Sam Parker
dd087b6361 [NFC][ARM] Add test. 2020-07-01 09:28:56 +01:00
Sam Parker
e85afea7aa [ARM][LowOverheadLoops] Handle reductions
While validating live-out values, record instructions that look like
a reduction. This will comprise of a vector op (for now only vadd),
a vorr (vmov) which store the previous value of vadd and then a vpsel
in the exit block which is predicated upon a vctp. This vctp will
combine the last two iterations using the vmov and vadd into a vector
which can then be consumed by a vaddv.

Once we have determined that it's safe to perform tail-predication,
we need to change this sequence of instructions so that the
predication doesn't produce incorrect code. This involves changing
the register allocation of the vadd so it updates itself and the
predication on the final iteration will not update the falsely
predicated lanes. This mimics what the vmov, vctp and vpsel do and
so we then don't need any of those instructions.

Differential Revision: https://reviews.llvm.org/D75533
2020-07-01 08:31:49 +01:00
Samuel Tebbs
dcd6d8787a [ARM] Allow the fabs intrinsic to be tail predicated
This patch stops the fabs intrinsic from blocking tail predication.

Differential Revision: https://reviews.llvm.org/D82570
2020-06-30 17:27:28 +01:00
Samuel Tebbs
d908315744 [ARM] Allow the usub_sat and ssub_sat intrinsics to be tail predicated
This patch stops the usub_sat and ssub_sat intrinsics from blocking tail predication.

Differential Revision: https://reviews.llvm.org/D82571
2020-06-30 17:16:58 +01:00
Sjoerd Meijer
4fb902bfc8 [ARM][MVE] Tail-predication: clean-up of unused code
After the rewrite of this pass (D79175) I missed one thing: the inserted VCTP
intrinsic can be cloned to exit blocks if there are instructions present in it
that perform the same operation, but this wasn't triggering anymore. However,
it turns out that for handling reductions, see D75533, it's actually easier not
not to have the VCTP in exit blocks, so this removes that code.

This was possible because it turned out that some other code that depended on
this, rematerialization of the trip count enabling more dead code removal
later, wasn't doing much anymore due to more aggressive dead code removal that
was added to the low-overhead loops pass.

Differential Revision: https://reviews.llvm.org/D82773
2020-06-30 17:09:36 +01:00
Samuel Tebbs
4fc642e8a5 [ARM] Allow rounding intrinsics to be tail predicated
This patch stops the trunc, rint, round, floor and ceil intrinsics from blocking tail predication.

Differential Revision: https://reviews.llvm.org/D82553
2020-06-30 16:52:25 +01:00
Sam Parker
5b21582cc5 [NFC][ARM] Tail predication reduction tests 2020-06-30 13:27:22 +01:00
David Green
6b973734b0 [ARM] Better reductions
MVE has native reductions for integer add and min/max. The others need
to be expanded to a series of extract's and scalar operators to reduce
the vector into a single scalar. The default codegen for that expands
the reduction into a series of in-order operations.

This modifies that to something more suitable for MVE. The basic idea is
to use vector operations until there are 4 remaining items then switch
to pairwise operations. For example a v8f16 fadd reduction would become:
Y = VREV X
Z = ADD(X, Y)
z0 = Z[0] + Z[1]
z1 = Z[2] + Z[3]
return z0 + z1

The awkwardness (there is always some) comes in from something like a
v4f16, which is first legalized by adding identity values to the extra
lanes of the reduction, and which can then not be optimized away through
the vrev; fadd combo, the inserts remain. I've made sure they custom
lower so that we can produce the pairwise additions before the extra
values are added.

Differential Revision: https://reviews.llvm.org/D81397
2020-06-29 16:04:13 +01:00
David Green
33b3ee4e0a [ARM] VCVTT fpround instruction selection
Similar to the recent patch for fpext, this adds vcvtb and vcvtt with
insert into vector instruction selection patterns for fptruncs. This
helps clear up a lot of register shuffling that we would otherwise do.

Differential Revision: https://reviews.llvm.org/D81637
2020-06-26 10:24:06 +01:00
David Green
6ea3fc012b [ARM] VCVTT instruction selection
We current extract and convert from a top lane of a f16 vector using a
VMOVX;VCVTB pair. We can simplify that to use a single VCVTT. The
pattern is mostly copied from a vector extract pattern, but produces a
VCVTTHS f32 directly.

This had to move some code around so that ARMInstrVFP had access to the
required pattern frags that were previously part of ARMInstrNEON.

Differential Revision: https://reviews.llvm.org/D81556
2020-06-26 08:58:55 +01:00
Sjoerd Meijer
ef2fc8e627 [SelectionDAG] Lower @llvm.get.active.lane.mask to setcc
This lowers intrinsic @llvm.get.active.lane.mask to a setcc node, i.e. an icmp
ule, and creates vectors for its 2 arguments on which the comparison is
performed.

Differential Revision: https://reviews.llvm.org/D82292
2020-06-26 07:46:38 +01:00
Sjoerd Meijer
9e7cf4b604 [ARM] Don't revert get.active.lane.mask in ARM Tail-Predication pass
Don't revert intrinsic get.active.lane.mask here, this is moved to isel
legalization in D82292.

Differential Revision: https://reviews.llvm.org/D82105
2020-06-26 07:42:39 +01:00
David Green
bc2c8f2f3b [ARM] Split FPExt loads
This extends PerformSplittingToWideningLoad to also handle FP_Ext, as
well as sign and zero extends. It uses an integer extending load
followed by a VCVTL on the bottom lanes to efficiently perform an fpext
on a smaller than legal type.

The existing code had to be rewritten a little to not just split the
node in two and let legalization handle it from there, but to actually
split into legal chunks.

Differential Revision: https://reviews.llvm.org/D81340
2020-06-25 21:55:13 +01:00
David Green
f40cf5ffb9 [ARM] MVE VCVT lowering for f16->f32 extends
This adds code to lower f16 to f32 fp_exts's using an MVE VCVT
instructions, similar to a recent similar patch for fp_trunc. Again it
goes through the lowering of a BUILD_VECTOR, but is slightly simpler
only having to deal with interleaved indices. It adds a VCVTL node to
lower to, similar to VCVTN.

Differential Revision: https://reviews.llvm.org/D81339
2020-06-25 20:54:26 +01:00
David Green
5a95ea8473 [ARM] Add FP_ROUND handling to splitting MVE stores
This splits MVE vector stores of a fp_trunc in the same way that we do
for standard trunc's. It extends PerformSplittingToNarrowingStores to
handle fp_round, splitting the store into pieces and adding a VCVTNb to
perform the actual fp_round. The actual store is then converted to an
integer store so that it can truncate bottom lanes of the result.

Differential Revision: https://reviews.llvm.org/D81141
2020-06-25 19:37:15 +01:00
David Green
6e22f03ef5 [ARM] MVE VCVT lowering for f32->f16 truncs
This adds code to lower f32 to f16 fp_trunc's using a pair of MVE VCVT
instructions. Due to v4f16 not being legal, fp_round are often split up
fairly early. So this reconstructs the vcvt's from a buildvector of
fp_rounds from two vector inputs. Something like:

BUILDVECTOR(FP_ROUND(EXTRACT_ELT(X, 0),
            FP_ROUND(EXTRACT_ELT(Y, 0),
            FP_ROUND(EXTRACT_ELT(X, 1),
            FP_ROUND(EXTRACT_ELT(Y, 1), ...)

It adds a VCVTN node to handle this, which like VMOVN or VQMOVN lowers
into the top/bottom lanes of an MVE instruction.

Differential Revision: https://reviews.llvm.org/D81139
2020-06-25 15:59:36 +01:00
Sam Tebbs
2aff9bba46 [ARM] Allow tail predication on sadd_sat and uadd_sat intrinsics
This patch stops the sadd_sat and uadd_sat intrinsics from blocking tail predication.

Differential revision: https://reviews.llvm.org/D82377
2020-06-25 11:54:29 +01:00
David Green
680cf8ff46 [ARM] Mark more integer instructions as not having side effects.
LDRD and STRD along with UBFX and SBFX are selected from DAGToDAG
transforms, so do not have tblgen patterns. They don't get marked as
having side effects so cannot be scheduled as efficiently as you would
like.

This specifically marks then as not having side effects.

Differential Revision: https://reviews.llvm.org/D82358
2020-06-23 22:45:51 +01:00
Tres Popp
e6cca551e7 Revert "[CGP] Enable CodeGenPrepares phi type convertion."
This reverts commit 67121d7b82ed78a47ea32f0c87b7317e2b469ab2.

This is causing compile times to be 2x slower on some large binaries.
2020-06-22 13:06:18 +02:00
David Green
2825171bc9 [CGP] Enable CodeGenPrepares phi type convertion. 2020-06-21 16:46:16 +01:00
Sjoerd Meijer
c43371a1bf [ARM][MVE] tail-predication: renamed internal option.
Renamed -force-tail-predication to -force-mve-tail-predication because
that's more descriptive and consistent.
2020-06-19 15:07:06 +01:00
Lucas Prates
a515304bf7 [ARM] Supporting lowering of half-precision FP arguments and returns in AArch32's backend
Summary:
Half-precision floating point arguments and returns are currently
promoted to either float or int32 in clang's CodeGen and there's
no existing support for the lowering of `half` arguments and returns
from IR in AArch32's backend.

Such frontend coercions, implemented as coercion through memory
in clang, can cause a series of issues in argument lowering, as causing
arguments to be stored on the wrong bits on big-endian architectures
and incurring in missing overflow detections in the return of certain
functions.

This patch introduces the handling of half-precision arguments and returns in
the backend using the actual "half" type on the IR. Using the "half"
type the backend is able to properly enforce the AAPCS' directions for
those arguments, making sure they are stored on the proper bits of the
registers and performing the necessary floating point convertions.

Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer

Reviewed By: ostannard

Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75169
2020-06-18 13:15:13 +01:00
Sjoerd Meijer
0d40769e87 [ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask
To set up a tail-predicated loop, we need to to calculate the number of
elements processed by the loop. We can now use intrinsic
@llvm.get.active.lane.mask() to do this, which is emitted by the vectoriser in
D79100. This intrinsic generates a predicate for the masked loads/stores, and
consumes the Backedge Taken Count (BTC) as its second argument. We can now use
that to reconstruct the loop tripcount, instead of the IR pattern match
approach we were using before.

Many thanks to Eli Friedman and Sam Parker for all their help with this work.

This also adds overflow checks for the different, new expressions that we
create: the loop tripcount, and the sub expression that calculates the
remaining elements to be processed. For the latter, SCEV is not able to
calculate precise enough bounds, so we work around that at the moment, but is
not entirely correct yet, it's conservative. The overflow checks can be
overruled with a force flag, which is thus potentially unsafe (but not really
because the vectoriser is the only place where this intrinsic is emitted at the
moment). It's also good to mention that the tail-predication pass is not yet
enabled by default.  We will follow up to see if we can implement these
overflow checks better, either by a change in SCEV or we may want revise the
definition of llvm.get.active.lane.mask.

Differential Revision: https://reviews.llvm.org/D79175
2020-06-17 15:17:42 +01:00
David Green
9c4e61f00c [LSR] Filter for postinc formulae
In more complicated loops we can easily hit the complexity limits of
loop strength reduction. If we do and filtering occurs, it's all too
easy to remove the wrong formulae for post-inc preferring accesses due
to it attempting to maximise register re-use. The patch adds an
alternative filtering step when the target is preferring postinc to pick
postinc formulae instead, hopefully lowering the complexity to below the
limit so that aggressive filtering is not needed.

There is also a change in here to stop considering existing addrecs as
free under postinc. We should already be modelling them as a reg so
don't want it to cause us to get the cost wrong. (I'm not sure that code
makes sense in general, but there are X86 tests specifically for it
where it seems to be helping so have left it around for the standard
non-post-inc case).

Differential Revision: https://reviews.llvm.org/D80273
2020-06-17 12:32:04 +01:00
David Green
3175ed5612 [ARM] Fix crash trying to generate i1 immediates
These code patterns attempt to call isVMOVModifiedImm on a splat of i1
values, leading to an unreachable being hit. I've guarded the call on a
more specific set of sizes, as i1 vectors are legal under MVE.

Differential Revision: https://reviews.llvm.org/D81860
2020-06-16 12:27:24 +01:00
David Green
8a5c6f3865 [ARM] Add some MVE vecreduce tests. NFC 2020-06-09 12:07:19 +01:00
David Green
499648daa2 [ARM] VQMOVN demand bits analysis
Similar to VMOVN, a VQMOVN will only demand the top/bottom lanes of it's
first input. However unlike VMOVN it will need access to the entire
second argument, as that value is saturated not just moved in place.

Differential Revision: https://reviews.llvm.org/D80515
2020-06-05 18:41:02 +01:00
David Green
5cd7010367 [ARM] FP16 conversion tests. NFC 2020-06-04 13:13:56 +01:00
David Green
9a7508e506 [ARM] Extra MVE VMLAV reduction patterns
These patterns for i8 and i16 VMLA's were missing. They end up from
legalized vector.reduce.add.v8i16 and vector.reduce.add.v16i8, and
although the instruction works differently (the mul and add are
performed in a higher precision), I believe it is OK because only an
i8/i16 are demanded from them, and so the results will be the same. At
least, they pass any testing I can think to run on them.

There are some tests that end up looking worse, but are quite artificial
due to passing half vector types through a call boundary. I would not
expect the vmull to realistically come up like that, and a vmlava is
likely better a lot of the time.

Differential Revision: https://reviews.llvm.org/D80524
2020-05-29 16:23:24 +01:00
David Green
6307317d50 [ARM] More tests for MVE LSR and float issues. NFC 2020-05-28 22:04:12 +01:00
Victor Campos
8541a94229 [ARM] Fix rewrite of frame index in Thumb2's address mode i8s4
Summary:
In Thumb2's frame index rewriting process, the address mode i8s4, which
is used by LDRD and STRD instructions, is handled by taking the
immediate offset operand and multiplying it by 4.

This behaviour is wrong, however. In this specific address mode, the
MachineInstr's immediate operand is already in the expected form. By
consequence of that, multiplying it once more by 4 yields a flawed
offset value, four times greater than it should be.

Differential Revision: https://reviews.llvm.org/D80557
2020-05-27 13:09:13 +01:00
David Green
5d1d2066e7 [ARM] MVE VMINV/VMAXV test additions. NFC 2020-05-26 14:00:14 +01:00
David Green
92f08097a7 [ARM] VMULH tests for when other parts are working. NFC 2020-05-25 12:46:18 +01:00
Jean-Michel Gorius
2d66ce0e5e Revert "[CodeGen] Add support for multiple memory operands in MachineInstr::mayAlias"
This temporarily reverts commit 7019cea26dfef5882c96f278c32d0f9c49a5e516.

It seems that, for some targets, there are instructions with a lot of memory operands (probably more than would be expected). This causes a lot of buildbots to timeout and notify failed builds. While investigations are ongoing to find out why this happens, revert the changes.
2020-05-22 21:26:46 +02:00
Jean-Michel Gorius
b6e158e140 [CodeGen] Add support for multiple memory operands in MachineInstr::mayAlias
Summary:
To support all targets, the mayAlias member function needs to support instructions with multiple operands.

This revision also changes the order of the emitted instructions in some test cases.

Reviewers: efriedma, hfinkel, craig.topper, dmgreen

Reviewed By: efriedma

Subscribers: MatzeB, dmgreen, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80161
2020-05-21 23:02:54 +02:00
Sjoerd Meijer
a5cc4d095a [HardwareLoops] llvm.loop.decrement.reg definition
This is split off from D80316, slightly tightening the definition of overloaded
hardwareloop intrinsic llvm.loop.decrement.reg specifying that both operands
its result have the same type.
2020-05-21 10:48:16 +01:00
Pierre-vh
66d4cf9f6d [Target][ARM] Make Low Overhead Loops coexist with VPT blocks.
Previously, the LowOverheadLoops pass couldn't handle VPT blocks
with conditions, or with multiple VCTPs. This patch improves the
LowOverheadLoops pass so it can handle those cases.

It also adds support for VCMPs before the VCTP.

Differential Revision: https://reviews.llvm.org/D78206
2020-05-20 12:24:55 +01:00