1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

3796 Commits

Author SHA1 Message Date
Amara Emerson
dd34b50099 [AArch64][GlobalISel] Add legalization & selection support for G_INTRINSIC_LRINT.
Differential Revision: https://reviews.llvm.org/D84552
2020-07-30 16:14:56 -07:00
Jon Roelofs
50a1ea2ba8 [SelectionDAG] Fix lowering of vector geps
This fixes an assertion failure that was being triggered in
SelectionDAG::getZeroExtendInReg(), where it was trying to extend the <2xi32>
to i64 (which should have been <2xi64>).

Fixes: rdar://66016901

Differential Revision: https://reviews.llvm.org/D84884
2020-07-30 14:56:53 -06:00
Florian Hahn
28a367c7c3 [AArch64] Add machine-combiner tests with instruction level FMFs. 2020-07-30 11:41:09 +01:00
David Sherwood
cf36aeaf65 [SVE][CodeGen] At -O0 fallback to DAG ISel when translating alloca with scalable types
When building code at -O0 We weren't falling back to DAG ISel correctly
when encountering alloca instructions with scalable vector types. This
is because the alloca has no operands that are scalable. I've fixed this by
adding a check in AArch64ISelLowering::fallBackToDAGISel for alloca
instructions with scalable types.

Differential Revision: https://reviews.llvm.org/D84746
2020-07-30 08:40:53 +01:00
Matt Arsenault
71c6e8a505 GlobalISel: Handle assorted no-op intrinsics
SelectionDAGBuilder just drops these, so do the same.
2020-07-29 21:26:20 -04:00
Matt Arsenault
e2b102c48a GlobalISel: Handle llvm.roundeven
I still think it's highly questionable that we have two intrinsics
with identical behavior and only vary by the name of the libcall used
if it happens to be lowered that way, but try to reduce the feature
delta between SDAG and GlobalISel for recently added intrinsics. I'm
not sure which opcode should be considered the canonical one, but
lower roundeven back to round.
2020-07-29 20:01:12 -04:00
Amara Emerson
d7527ffdb2 [GlobalISel] Add G_INTRINSIC_LRINT and translate from llvm.lrint
Differential Revision: https://reviews.llvm.org/D84551
2020-07-29 11:51:04 -07:00
Amara Emerson
9d49eb2e5c [AArch64][GlobalISel] Selection support for vector DUP[X]lane instructions.
In future, we'd like to use the perfect-shuffle mechanism to deal with these
shuffle permutations. For now, this improves performance by avoiding the
super-expensive const-pool load + tbl instruction.

Differential Revision: https://reviews.llvm.org/D84866
2020-07-29 11:41:37 -07:00
Jessica Paquette
6898e4fc01 [AArch64][GlobalISel] Select XRO addressing mode with wide immediates
Port the wide immediate case from AArch64DAGToDAGISel::SelectAddrModeXRO.

If we have a wide immediate which can't be represented in an add, we can end up
with code like this:

```
mov  x0, imm
add x1, base, x0
ldr  x2, [x1, 0]
```

If we use the [base, xN] addressing mode instead, we can produce this:

```
mov  x0, imm
ldr  x2, [base, x0]
```

This saves 0.4% code size on 7zip at -O3, and gives a geomean code size
improvement of 0.1% on CTMark.

Differential Revision: https://reviews.llvm.org/D84784
2020-07-29 11:02:10 -07:00
David Sherwood
db7d870d70 [SVE][CodeGen] Add simple integer add tests for SVE tuple types
I have added tests to:

  CodeGen/AArch64/sve-intrinsics-int-arith.ll

for doing simple integer add operations on tuple types. Since these
tests introduced new warnings due to incorrect use of
getVectorNumElements() I have also fixed up these warnings in the
same patch. These fixes are:

1. In narrowExtractedVectorBinOp I have changed the code to bail out
early for scalable vector types, since we've not yet hit a case that
proves the optimisations are profitable for scalable vectors.
2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced
calls to getVectorNumElements with getVectorMinNumElements in cases
that work with scalable vectors. For the other cases I have added
asserts that the vector is not scalable because we should not be
using shuffle vectors and build vectors in such cases.

Differential revision: https://reviews.llvm.org/D84016
2020-07-29 13:32:10 +01:00
David Sherwood
6c7452123a [SVE] Add checks for no warnings in CodeGen/AArch64/sve-sext-zext.ll
Previous patches fixed up all the warnings in this test:

  llvm/test/CodeGen/AArch64/sve-sext-zext.ll

and this change simply checks that no new warnings are added in future.

Differential revision: https://reviews.llvm.org/D83205
2020-07-29 13:06:39 +01:00
Matt Arsenault
d26779aca3 GlobalISel: Translate llvm.convert.{to|from}.fp16 intrinsics
I think these were added as a workaround for SelectionDAG lacking half
legalization support in the past. I think they should probably be
removed from the IR, but clang does still have a target control to
emit these instead of the native half fpext/fptrunc.
2020-07-28 11:46:05 -04:00
Sander de Smalen
d04f687d3b [AArch64][SVE] Fix epilogue for SVE when the stack is realigned.
While deallocating the stackframe, the offset used to reload the
callee-saved registers was not pointing to the SVE callee-saves,
but rather to the whole SVE area.

   +--------------+
   | GRP callee   |
   |     saves    |
   +--------------+ <- FP
   | SVE callee   |
   |     saves    |
   +--------------+ <- Should restore SVE callee saves from here
   |  SVE Spills  |
   |  and Locals  |
   +--------------+ <- instead of from here.
   |              |
   :              :
   |              |
   +--------------+ <- SP

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D84539
2020-07-28 15:45:53 +01:00
Sander de Smalen
7093403ad8 [AArch64][SVE] Don't align the last SVE callee save.
Instead of aligning the last callee-saved-register slot to the stack
alignment (16 bytes), just align the SVE callee-saved block. This also
simplifies the code that allocates space for the callee-saves.

This change is needed to make sure the offset to which the callee-saved
register is spilled, corresponds to the offset used for e.g. unwind call
frame instructions.

Reviewers: efriedma, paulwalker-arm, david-arm, rengolin

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84042
2020-07-28 15:45:53 +01:00
Sander de Smalen
275d51c545 [AArch64][SVE] Don't support fixedStack for SVE objects.
Fixed stack objects are preallocated and defined to be allocated before
any of the regular stack objects. These are normally used to model stack
arguments.

The AAPCS does not support passing SVE registers on the stack by value
(only by reference). The current layout also doesn't place them before
all stack objects, but rather before all SVE objects. Removing this
simplifies the code that emits the allocation/deallocation
around callee-saved registers (D84042).

This patch also removes all uses of fixedStack from from
framelayout-sve.mir, where this was used purely for testing purposes.

Reviewers: paulwalker-arm, efriedma, rengolin

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D84538
2020-07-28 15:45:53 +01:00
Francesco Petrogalli
79f86bfe26 [llvm][CodeGen] Addressing modes for SVE ldN.
Reviewers: c-rhodes, efriedma, sdesmalen

Subscribers: huihuiz, tschuett, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77251
2020-07-27 22:18:28 +00:00
Arthur Eubanks
fca7fa2021 Prefix some AArch64/ARM passes with "aarch64-"/"arm-"
For consistency with other target specific passes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D84560
2020-07-27 11:00:39 -07:00
Jon Roelofs
c0e6798650 [AArch64] fjcvtzs,rmif,cfinv,setf* all clobber nzcv
Differential Revision: https://reviews.llvm.org/D83818
2020-07-27 09:17:53 -06:00
David Sherwood
7010946aee [SVE] Don't use LocalStackAllocation for SVE objects
I have introduced a new TargetFrameLowering query function:

  isStackIdSafeForLocalArea

that queries whether or not it is safe for objects of a given stack
id to be bundled into the local area. The default behaviour is to
always bundle regardless of the stack id, however for AArch64 this is
overriden so that it's only safe for fixed-size stack objects.
There is future work here to extend this algorithm for multiple local
areas so that SVE stack objects can be bundled together and accessed
from their own virtual base-pointer.

Differential Revision: https://reviews.llvm.org/D83859
2020-07-27 08:22:01 +01:00
QingShan Zhang
ab3ffdcb1e [Scheduling] Improve group algorithm for store cluster
Store Addr and Store Addr+8 are clusterable pair. They have memory(ctrl) dependency on different loads.
Current implementation will put these two stores into different group and miss to cluster them.

Reviewed By: evandro

Differential Revision: https://reviews.llvm.org/D84139
2020-07-27 02:02:40 +00:00
Amara Emerson
8003a99b34 [AArch64][GlobalISel] Make <8 x s16> and <16 x s8> legal types for G_SHUFFLE_VECTOR and G_IMPLICIT_DEF.
Trivial change, we're still missing support for rev matching for these types
in the combiner.
2020-07-26 00:48:09 -07:00
Jessica Paquette
0266a8b395 [AArch64][GlobalISel] Look through constants when selection stores of 0
Very minor code size improvements (hits 8 times in Bullet at -O3), but still
something.

Also very minor NFC change to make sure we only search for a 0 constant when
selecting a store. Before, we'd do this for loads as well.

Differential Revision: https://reviews.llvm.org/D84573
2020-07-24 22:46:14 -07:00
Jessica Paquette
b6d2a38769 [AArch64][GlobalISel] Use wzr/xzr for 16 and 32 bit stores of zero
We weren't performing this optimization on 16 and 32 bit stores. SDAG happily
does this though.

e.g. https://godbolt.org/z/cWocKr

This saves about 0.2% in code size on CTMark at -O3.

Differential Revision: https://reviews.llvm.org/D84568
2020-07-24 17:15:20 -07:00
Matt Arsenault
16ca0e0369 GlobalISel: Define mulfix/divfix opcodes
The full expansion involves the funnel shifts, which depend on another
patch to expand those.
2020-07-24 20:02:20 -04:00
Amara Emerson
8f9bed84e6 [AArch64][GlobalISel] Promote G_UITOFP vector operands to same elt size as result.
Fixes legalization failures.
2020-07-24 17:00:50 -07:00
Eli Friedman
d2a7f965f0 [AArch64][SVE] Add "fast" fcmp operations.
dacf8d3 added support for most fcmp operations, but there are some extra
variations I hadn't considered: SelectionDAG supports float comparisons
that are neither ordered nor unordered. Add support for the missing
operations.

Differential Revision: https://reviews.llvm.org/D84460
2020-07-24 13:22:41 -07:00
Francesco Petrogalli
5f0a55fcfc [llvm][sve] Reg + Imm addressing mode for ld1ro.
Reviewers: kmclaughlin, efriedma, sdesmalen

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83357
2020-07-24 17:48:47 +00:00
Eli Friedman
f7eaa43249 [AArch64][SVE] Teach copyPhysReg to copy ZPR2/3/4.
It's sort of tricky to hit this in practice, but not impossible. I have
a synthetic C testcase if anyone is interested.

The implementation is identical to the equivalent NEON register copies.

Differential Revision: https://reviews.llvm.org/D84373
2020-07-23 16:41:37 -07:00
Amara Emerson
b33e55da04 [AArch64][GlobalISel] Add post-legalize combine for sext(trunc(sextload)) -> trunc/copy
On AArch64 we generate redundant G_SEXTs or G_SEXT_INREGs because of this.

Differential Revision: https://reviews.llvm.org/D81993
2020-07-23 12:06:35 -07:00
Evgeny Leviant
355534b549 [CodeGen][TargetPassConfig] Add unreachable-mbb-elimination pass explicitly
Differential revision: https://reviews.llvm.org/D84228
2020-07-23 18:05:11 +03:00
Konstantin Schwarz
e3dfca8d38 [GlobalISel][InlineAsm] Add register class ID to the flags of register input operands
Summary: We do this already for output operands, but missed it for (non-tied) input operands.

Reviewers: arsenm, Petar.Avramovic

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, llvm-commits, kerbowa

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83763
2020-07-23 13:35:01 +02:00
Sander de Smalen
d7fe459f10 [AArch64][SVE] Correctly allocate scavenging slot in presence of SVE.
This patch addresses two issues:

* Forces the availability of the base-pointer (x19) when the frame has
  both scalable vectors and variable-length arrays. Otherwise it will
  be expensive to access non-SVE locals.

* In presence of SVE stack objects, it will allocate the emergency
  scavenging slot close to the SP, so that they can be accessed from
  the SP or BP if available. If accessed from the frame-pointer, it will
  otherwise need an extra register to access the scavenging slot because
  of mixed scalable/non-scalable addressing modes.

Reviewers: efriedma, ostannard, cameron.mcinally, rengolin, david-arm

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D70174
2020-07-22 10:50:36 +01:00
Amara Emerson
751c6ffec8 Revert "[AArch64][GlobalISel] Add post-legalize combine for sext_inreg(trunc(sextload)) -> copy"
This reverts commit 64eb3a4915f00cca9af4c305a9ff36209003cd7b.

It caused miscompiles with optimizations enabled. Reverting while I investigate.
2020-07-21 16:01:18 -07:00
Amara Emerson
fb2c0a0ef7 [AArch64][GlobalISel] Fix TLS accesses clobbering registers incorrectly.
This was happening because the BLR didn't have a use of the X0 arg register,
which would end up being re-used in high reg pressure situations.
The change also avoids hard coding the use of X0 for the sequence except to
copy the value for the call. ld64 should still be able to optimize it.

rdar://65438258
2020-07-21 16:01:17 -07:00
Matt Arsenault
f9701c3fe0 GlobalISel: Translate llvm.powi intrinsic
There are a few questionable things about this intrinsic and existing
DAG implementation. For some reason the intrinsic hardcodes the second
operand to be scalar-only i32, and SelectionDAG builder makes a
legalization decision based on whether the operand is constant.
2020-07-21 18:13:04 -04:00
Sander de Smalen
cd95c17a17 [AArch64][SVE] Fix PCS for functions taking/returning scalable types.
The default calling convention needs to save/restore the SVE callee
saves according to the SVE PCS when the function takes or returns
scalable types, even when the `aarch64_sve_vector_pcs` CC is not
specified for the function.

Reviewers: efriedma, paulwalker-arm, david-arm, rengolin

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D84041
2020-07-21 15:55:39 +01:00
Petre-Ionut Tudor
1210771898 [ARM] Generate [SU]HADD from ((a + b) >> 1)
Summary:
Teach LLVM to recognize the above pattern, where the operands are
either signed or unsigned types.

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83777
2020-07-21 13:22:07 +01:00
Eli Friedman
941026e51d [AArch64][SVE] Add support for trunc to <vscale x N x i1>.
This isn't a natively supported operation, so convert it to a
mask+compare.

In addition to the operation itself, fix up some surrounding stuff to
make the testcase work: we need concat_vectors on i1 vectors, we need
legalization of i1 vector truncates, and we need to fix up all the
relevant uses of getVectorNumElements().

Differential Revision: https://reviews.llvm.org/D83811
2020-07-20 13:11:02 -07:00
Yuanfang Chen
bf8086d1c1 [llc] (almost) remove --print-machineinstrs
Its effect could be achieved by
`-stop-after`,`-print-after`,`-print-after-all`. But a few tests need to
print MIR after ISel which could not be done with
`-print-after`/`-stop-after` since isel pass does not have commandline name.
That's the reason `--print-machineinstrs` is downgraded to
`--print-after-isel` in this patch. `--print-after-isel` could be
removed after we switch to new pass manager since isel pass would have a
commandline text name to use `print-after` or equivalent switches.

The motivation of this patch is to reduce tests dependency on
would-be-deprecated feature.

Reviewed By: arsenm, dsanders

Differential Revision: https://reviews.llvm.org/D83275
2020-07-20 10:43:28 -07:00
Matt Arsenault
b4f7c23c9a AArch64/GlobalISel: Fix hardcoded registers in error message checks 2020-07-20 10:06:18 -04:00
Paul Walker
75f1125292 [SVE] Add lowering for fixed length vector fdiv, fma, fmul and fsub operations.
Differential Revision: https://reviews.llvm.org/D84034
2020-07-20 11:57:34 +00:00
Elvina Yakubova
6cd76408bf [llvm-readobj] Update tests because of changes in llvm-readobj behavior
This patch updates tests using llvm-readobj and llvm-readelf, because
soon reading from stdin will be achievable only via a '-' as described
here: https://bugs.llvm.org/show_bug.cgi?id=46400. Patch with changes to
llvm-readobj behavior is here: https://reviews.llvm.org/D83704

Differential Revision: https://reviews.llvm.org/D83912

Reviewed by: jhenderson, MaskRay, grimar
2020-07-20 10:39:04 +01:00
Tim Northover
24f7263412 AArch64: emit @llvm.debugtrap as brk #0xf000 on all platforms
It's useful for a debugger to be able to distinguish an @llvm.debugtrap
from a (noreturn) @llvm.trap, so this extends the existing Windows
behaviour to other platforms.
2020-07-20 10:31:26 +01:00
Evgeny Leviant
3d264ff874 [CodeGen][TargetPassConfig] Add TargetTransformInfo pass correctly
Patch adds tti pass directly enforcing its execution with correctly set
TargetTransformInfo.

Differential revision: https://reviews.llvm.org/D84047
2020-07-18 14:11:40 +03:00
Jay Foad
3f23d4b8c3 [MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics
tryLatency compares two sched candidates. For the top zone it prefers
the one with lesser depth, but only if that depth is greater than the
total latency of the instructions we've already scheduled -- otherwise
its latency would be hidden and there would be no stall.

Unfortunately it only tests the depth of one of the candidates. This can
lead to situations where the TopDepthReduce heuristic does not kick in,
but a lower priority heuristic chooses the other candidate, whose depth
*is* greater than the already scheduled latency, which causes a stall.

The fix is to apply the heuristic if the depth of *either* candidate is
greater than the already scheduled latency.

All this also applies to the BotHeightReduce heuristic in the bottom
zone.

Differential Revision: https://reviews.llvm.org/D72392
2020-07-17 11:02:13 +01:00
Paul Walker
bb56ec9a61 [SVE] Add lowering for scalable vector fadd, fdiv, fmul and fsub operations.
Lower the operations to predicated variants.  This is prep work
required for fixed length code generation but also fixes a bug
whereby these operations fail selection when "unpacked" vector
types (e.g. MVT::nxv2f32) are used.

This patch also adds the missing "unpacked" patterns for FMA.

Differential Revision: https://reviews.llvm.org/D83765
2020-07-16 11:31:35 +00:00
Kerry McLaughlin
30eb603e95 [SVE][CodeGen] Legalisation of masked loads and stores
Summary:
This patch modifies IncrementMemoryAddress to use a vscale
when calculating the new address if the data type is scalable.

Also adds tablegen patterns which match an extract_subvector
of a legal predicate type with zip1/zip2 instructions

Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: efriedma, david-arm

Subscribers: tschuett, hiraditya, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83137
2020-07-16 10:55:45 +01:00
Hiroshi Yamauchi
bd196de5cc [PGO][PGSO] Add profile guided size optimization to LegalizeDAG.
Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83333
2020-07-15 10:03:38 -07:00
Roger Ferrer Ibanez
97e701dc7a [DAGCombiner] Rebuild (setcc x, y, ==) from (xor (xor x, y), 1)
The existing code already considered this case. Unfortunately a typo in
the condition prevents it from triggering. Also the existing code, had
it run, forgot to do the folding.

This fixes PR42876.

Differential Revision: https://reviews.llvm.org/D65802
2020-07-15 07:34:22 +00:00
Roger Ferrer Ibanez
7ef6c6e4c3 [NFC] Add tests for boolean comparisons
They currently show that the not equal case may be improved.

See PR42876

Differential Revision: https://reviews.llvm.org/D65801
2020-07-15 07:33:43 +00:00