1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

61169 Commits

Author SHA1 Message Date
Thomas Symalla
ee664c032b Added and used new target pseudo for v_cvt_pk_i16_i32, changes due to code review. 2021-02-02 09:14:53 +01:00
Thomas Symalla
43278a1cb3 Formatting changes 2021-02-02 09:14:53 +01:00
Thomas Symalla
46f1f49a56 Formatting changes. 2021-02-02 09:14:53 +01:00
Thomas Symalla
8aacc5adcd Updating formatting changes. 2021-02-02 09:14:53 +01:00
Thomas Symalla
8955159f2d Resolve formatting changes. 2021-02-02 09:14:53 +01:00
Thomas Symalla
38676d07e0 Code changes yielded from review. 2021-02-02 09:14:53 +01:00
Thomas Symalla
ea43201600 Move step to PreLegalizer 2021-02-02 09:14:53 +01:00
Thomas Symalla
122a71a9f1 Move Combiner to PreLegalize step 2021-02-02 09:14:53 +01:00
Thomas Symalla
9a5185f66f Reverted unintended git-format change. 2021-02-02 09:14:52 +01:00
Thomas Symalla
5e6c38bc76 Fixed the lit tests and a bug in the implementation. 2021-02-02 09:14:52 +01:00
Thomas Symalla
8206639df8 Refactored the pattern matching. 2021-02-02 09:14:52 +01:00
Thomas Symalla
555bb61a39 Added early exit. 2021-02-02 09:14:52 +01:00
Thomas Symalla
1f11e485f5 Added comments. 2021-02-02 09:14:52 +01:00
Thomas Symalla
ae887237b4 clang-format 2021-02-02 09:14:52 +01:00
Thomas Symalla
1663fb4919 Added clamp i64 to i16 global isel pattern. 2021-02-02 09:14:52 +01:00
Craig Topper
6d008ca25f [RISCV] Replace NoX0 SDNodeXForm with a ComplexPattern to do the selection of the VL operand.
I think this is a more standard way of doing this.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D95833
2021-02-02 00:08:58 -08:00
Puyan Lotfi
293c83b00b Revert "[AArch64] Homogeneous Prolog and Epilog Size Optimization"
This reverts commit 0426be3df6180747bd68706db87a70580f064f0f.

Reverting due to some expensive-checks failures in tests.
2021-02-02 02:33:44 -05:00
Kyungwoo Lee
28c9f1933e [AArch64] Homogeneous Prolog and Epilog Size Optimization
Prologs and epilogs handle callee-save registers and tend to be irregular with
different immediate offsets that are not often handled by the MachineOutliner.
Commit D18619/a5335647d5e8 (combining stack operations) stretched irregularity
further.

This patch tries to emit homogeneous stores and loads with the same offset for
prologs and epilogs respectively. We have observed that this canonicalizes
(homogenizes) prologs and epilogs significantly and results in a greatly
increased chance of outlining, resulting in a code size reduction.

Despite the above results, there are still size wins to be had that the
MachineOutliner does not provide due to the special handling X30/LR. To handle
the LR case, his patch custom-outlines prologs and epilogs in place. It does
this by doing the following:

  * Injects HOM_Prolog and HOM_Epilog pseudo instructions during a Prolog and
    Epilog Injection Pass.
  * Lowers and optimizes said pseudos in a AArchLowerHomogneousPrologEpilog Pass.
  * Outlined helpers are created on demand. Identical helpers are merged by the linker.
  * An opt-in flag is introduced to enable this feature. Another threshold flag
    is also introduced to control the aggressiveness of outlining for application's need.

This reduced an average of 4% of code size on LLVM-TestSuite/CTMark targeting arm64/-Oz.

Differential Revision: https://reviews.llvm.org/D76570
2021-02-02 00:26:51 -05:00
Matt Arsenault
e64514a4fe AMDGPU: Fix dbg_value handling when forming soft clause bundles
DBG_VALUES placed between memory instructions would change
codegen. Skip over these and re-insert them after the bundle instead
of giving up on bundling.
2021-02-01 22:16:35 -05:00
Philip Reames
29acfb48a9 [x86] introduce no_callee_saved_registers attribute
This is directly analogous to the existing no_caller_saved_registers, but with the opposite intention.  A function or call so marked shifts the responsibility of spilling the usual CSRs to it's caller.

An indirect call site and callee which don't agree on the attribute is ill defined.

The motivation for this change is that being able to prune callee saves (without modifying other details of the calling convention) is sometimes useful when generating stubs and adapters.  There's no intention to expose this as a source language feature; this is expected to be used by frontends to implement adapters where warranted.

Some specific examples of use cases:
* GC compatible compiled code wants to call an externally defined library function without needing to track pointer values through CSRs.
* debug enabled code wants to call precompiled library which doesn't provide enough information to track CSRs while preserving debug quality in caller.
* adapter stub entering hand written assembler which doesn't follow normal calling conventions.
2021-02-01 16:19:14 -08:00
Philip Reames
dfa3b662c8 [NFC][X86] Use CallBase interface to simplify code 2021-02-01 15:24:41 -08:00
Philip Reames
1af57bc4bf [NFC][X86] Avoid redundant work inspecting callee 2021-02-01 15:24:41 -08:00
David Green
b21519cd48 [ARM] Flatten identity shuffles through vqdmulh nodes
Given a shuffle(vqdmulh(shuffle, shuffle), we can flatter the shuffles
out if they become an identity mask. This can come up during lane
interleaving, when we do that better.

Differential Revision: https://reviews.llvm.org/D94034
2021-02-01 19:14:20 +00:00
Craig Topper
41a3080ffd [X86] Accept 64-bit GPRs for vextractps when using a register that requires EVEX.
This is consistent with the VEX version. It also fixes a sorting
issue in the matching table that caused the EVEX version to be
prioritized over VEX in intel syntax.

Fixes issue [2] from PR48991.
2021-02-01 11:01:32 -08:00
Simon Pilgrim
446a01964c [X86][SSE] LowerScalarImmediateShift - use APInt::getLowBitsSet for vXi8 ISD::SRL mask generation. NFCI.
Match what we do for ISD::SHL
2021-02-01 18:17:40 +00:00
Jessica Paquette
08fd0a40ed [AArch64][GlobalISel] Emit G_ASSERT_ZEXT in assignValueToReg
When we have a zeroext parameter, emit G_ASSERT_ZEXT.

Add a check that we actually emit it.

This is a 0.1% code size win on CTMark/7zip and CTMark/consumer-typeset at -Os.

Differential Revision: https://reviews.llvm.org/D95567
2021-02-01 10:01:52 -08:00
Craig Topper
6ab0fefa1b [RISCV] Add scalable vector support for floating point FMA instructions
A follow up patch will add support for commuting operands or
changing opcode to vfmacc and friends.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D95662
2021-02-01 09:52:43 -08:00
Craig Topper
c8be45ab39 [RISCV] Update comment text from D95774. NFC 2021-02-01 09:52:43 -08:00
Craig Topper
136420859e [RISCV] Optimize (srl (and X, 0xffff), C) -> (srli (slli X, 16), 16 + C).
Rather than materializing the 0xffff immediate for the AND, use
a shift left to remove the upper bits and then shift in zeros
from the right.

This pattern occurs when type legalizing an i16 right shift.

I've implemented this with custom selection code for a number of
reasons. I've limited this to the AND having a single use. We need
to compensate for SimplifyDemandedBits altering the AND mask. I'm
using *W opcodes on RV64. We may want to generlize this in the
future. For all these reason it seemed easiest to do it this way.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D95774
2021-02-01 09:37:55 -08:00
Austin Kerbow
b099a1ce22 [AMDGPU] Fix release build after 0397dca0. 2021-02-01 08:55:14 -08:00
Austin Kerbow
411730e6c6 [AMDGPU] Fix crash with sgpr spills to vgpr disabled
This would assert with amdgpu-spill-sgpr-to-vgpr disabled when trying to
spill the FP.

Fixes: SWDEV-262704

Reviewed By: RamNalamothu

Differential Revision: https://reviews.llvm.org/D95768
2021-02-01 08:35:25 -08:00
David Green
dbdd240132 [ARM] Simplify VMOVRRD from extracts of buildvectors
Under the softfp calling convention, we are often left with
VMOVRRD(extract(bitcast(build_vector(a, b, c, d)))) for the return value
of the function. These can be simplified to a,b or c,d directly,
depending on the value of the extract.

Big endian is a little different because the bitcast switches the lanes
around, meaning we end up with b,a or d,c.

Differential Revision: https://reviews.llvm.org/D94989
2021-02-01 16:09:25 +00:00
Kerry McLaughlin
267255edd9 [SVE][CodeGen] Remove performMaskedGatherScatterCombine
The AArch64 DAG combine added by D90945 & D91433 extends the index
of a scalable masked gather or scatter to i32 if necessary.

This patch removes the combine and instead adds shouldExtendGSIndex, which
is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether
the index should be extended before calling getMaskedGather/Scatter.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D94525
2021-02-01 14:10:00 +00:00
Dmitry Preobrazhensky
d2dbf6a661 [AMDGPU][MC] Corrected error position for invalid operands
Generic parser may report an incorrect error position when an offending operand is followed by a comma.
See bug 48884 for details: https://bugs.llvm.org/show_bug.cgi?id=48884.

Differential Revision: https://reviews.llvm.org/D95674
2021-02-01 14:31:08 +03:00
David Green
9248cc33ab [ARM] Turn sext_inreg(VGetLaneu) into VGetLaneu
This adds a DAG combine for converting sext_inreg of VGetLaneu into
VGetLanes, providing the types match correctly.

Differential Revision: https://reviews.llvm.org/D95073
2021-02-01 11:10:35 +00:00
Simon Pilgrim
4720508a81 [X86][AVX] combineExtractWithShuffle - combine extracts from 256/512-bit vector shuffles.
We can only legally extract from the lowest 128-bit subvector, so extract the correct subvector to allow us to handle 256/512-bit vector element extracts.
2021-02-01 10:31:43 +00:00
David Green
96fb4c7ba1 [ARM] Simplify extract of VMOVDRR
Under SoftFP calling conventions, we can be left with
extract(bitcast(BUILD_VECTOR(VMOVDRR(a, b), ..))) patterns that can
simplify to a or b, depending on the extract lane.

Differential Revision: https://reviews.llvm.org/D94990
2021-02-01 10:24:57 +00:00
Kazushi (Jam) Marukawa
1b359f6379 [VE] Change inetger constants 32-bit friendly
Correct integer constants like `1UL << 63` to `UINT64_C(1) << 63` in
order to make them work on 32-bit machines.  Tested on both an i386
and x86_64 machines.

Reviewed By: mgorny

Differential Revision: https://reviews.llvm.org/D95724
2021-02-01 19:00:47 +09:00
Craig Topper
a5363d614d [Mips] Cleanup isel patterns to use 'vnot' instead of (xor X, immAllOnesV). NFCI
A couple patterns used bitconvert on the immAllOnesV, but
the isel matching uses ISD::isBuildVectorAllOnes which
is able to look through bitcasts. So isel patterns don't need
to do it explicitly.
2021-01-31 20:01:05 -08:00
Craig Topper
92b45d2660 [PowerPC] Remove vnot_ppc and replace with the standard vnot.
immAllOnesV has special support for looking through bitcasts
automatically so isel patterns don't need to explicitly look
for the bitconvert.
2021-01-31 19:41:33 -08:00
Craig Topper
00ea1c43c6 [X86] Cleanup isel patterns to use 'vnot' instead of (xor X, immAllOnesV) to improve readability. NFC 2021-01-31 18:53:40 -08:00
Craig Topper
ebb87abce3 [RISCV] Custom lower fshl/fshr with Zbt extension.
We need to add a mask to the shift amount for these operations
to use the FSR/FSL instructions. We were previously doing this
in isel patterns, but custom lowering will make the mask
visible to optimizations earlier.
2021-01-31 17:49:15 -08:00
Kazu Hirata
e0a8f45e5f [llvm] Drop unnecessary const from return types (NFC)
Identified with const-return-type.
2021-01-31 10:23:43 -08:00
Kazu Hirata
8323f01e46 [VE] Fix compiler warnings (NFC) 2021-01-31 10:23:39 -08:00
Matt Arsenault
d78e6d4720 AMDGPU: Add missing consts 2021-01-31 10:47:57 -05:00
Craig Topper
c021b33a1e [RISCV] Use MVT instead of EVT in RISCVISelDAGToDAG.cpp
All this code runs post type legalization so we should have
exclusively legal types. The methods on MVT should be more
efficient than EVT.
2021-01-30 15:57:15 -08:00
Kazu Hirata
d4906cf2ad [llvm] Add missing header guards (NFC)
Identified with llvm-header-guard.
2021-01-30 09:53:42 -08:00
Kazu Hirata
58d7d68bee [AMDGPU] Forward-declare AMDGPUTargetMachine (NFC)
AMDGPUTargetTransformInfo.h needs AMDGPUTargetMachine but relies on a
forward declaration of AMDGPUTargetMachine in AMDGPU.h.  This patch
adds a forward declaration right in AMDGPUTargetTransformInfo.h.

While we are at it, this patch removes the one in
AMDGPU.h, where it is unnecessary.
2021-01-30 09:53:40 -08:00
Kazu Hirata
ab8cd6c8fd [llvm] Use isa instead of dyn_cast (NFC) 2021-01-29 23:23:37 -08:00
Kazu Hirata
95dfb94fbd [llvm] Use llvm::lower_bound and llvm::upper_bound (NFC) 2021-01-29 23:23:36 -08:00