mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 20:23:11 +01:00
Commit Graph

61364 Commits

Author SHA1 Message Date
Sjoerd Meijer
e48baaecdb Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode"
This reverts commit effc3b079927a6dd3084b4ff712ec07f926366f0, with the build
problem fixed.
2021-02-15 11:33:00 +00:00
Sjoerd Meijer
dda4f352fa Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode"
This reverts commit cd6de0e8de4a5fd558580be4b1a07116914fc8ed.
2021-02-15 11:01:23 +00:00
Sjoerd Meijer
c1c4d25c71 [TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode
This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into
getPreferredAddressingMode() so that we have one interface to steer LSR in
generating the preferred addressing mode.
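
For reference, the unified hook has roughly this shape (a sketch, with names approximated from the description above, not copied verbatim from the patch):

```cpp
// Forward declarations stand in for the real LLVM types.
class Loop;
class ScalarEvolution;

// One enum query replaces the two boolean hooks: AMK_PreIndexed
// roughly corresponds to shouldFavorBackedgeIndex() and
// AMK_PostIndexed to shouldFavorPostInc().
enum AddressingModeKind {
  AMK_PreIndexed,
  AMK_PostIndexed,
  AMK_None
};

// LSR asks the target once for its preferred addressing mode.
AddressingModeKind getPreferredAddressingMode(const Loop *L,
                                              ScalarEvolution *SE);
```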

Differential Revision: https://reviews.llvm.org/D96600
2021-02-15 10:44:15 +00:00
Fraser Cormack
adeeb489c1 [RISCV] Convert VSLIDE(UP|DOWN) nodes to "VL" versions (NFC)
This patch converts the RISCV VSLIDEUP and VSLIDEDOWN custom nodes into
ones carrying additional mask and vector-length operands. This is
primarily so they can be used by both systems.

This also takes the opportunity to create some helper functions to deal
with the common task of getting the default (unmasked) VL operands.

Reviewed By: craig.topper, arcbbb

Differential Revision: https://reviews.llvm.org/D96505
2021-02-15 10:32:56 +00:00
Arlo Siemsen
d4eefe6819 Add ehcont section support
In the future, Windows will enable Control-flow Enforcement Technology (CET, aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section, which is validated when the context is set during exception handling.

This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker.

This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag.
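
For illustration, a front end could set the flag the way clang sets the cfguard flag (a hedged sketch using the generic module-flag API; the flag value 1 here is an assumption, not taken from the patch):

```cpp
#include "llvm/IR/Module.h"

// Hypothetical sketch: request EH continuation metadata for a module,
// modelled on how the "cfguard" module flag is set.
void enableEHContGuard(llvm::Module &M) {
  M.addModuleFlag(llvm::Module::Warning, "ehcontguard", 1); // value assumed
}
```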

The change includes a test that when the module flag is enabled the section is correctly generated.

The set of exception continuation information includes returns from exceptional control flow (catchret in llvm).

In order to collect catchret we:
1) Include an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation,
2) Introduce a new machine function pass to insert and collect symbols at the start of each block, and
3) Combine these targets with the other EHCont targets that were already being collected.

Change originally authored by Daniel Frampton <dframpto@microsoft.com>

For more details, see MSVC documentation for `/guard:ehcont`
  https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D94835
2021-02-15 14:27:12 +08:00
Carl Ritson
730fd62aad [AMDGPU] Add llvm.amdgcn.wqm.demote intrinsic
Add an intrinsic which demotes all active lanes to helper lanes.
This is used to implement the Vulkan demote-to-helper extension.

In practice demoting a lane to helper simply means removing it
from the mask of live lanes used for WQM/WWM/Exact mode.
Where the shader does not use WQM, demotes just become kills.

Additionally add llvm.amdgcn.live.mask intrinsic to complement
demote operations. In theory llvm.amdgcn.ps.live can be used
to detect helper lanes; however, ps.live can be moved by LICM.
The movement of ps.live cannot be remedied without changing
its type signature and such a change would require ps.live
users to update as well.
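
As a rough illustration (an assumption based on the description above, not code from the patch), a front end might emit the two intrinsics like this:

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"

// Conditionally demote the current lane to a helper lane.
void emitDemote(llvm::IRBuilder<> &B, llvm::Value *Cond) {
  B.CreateIntrinsic(llvm::Intrinsic::amdgcn_wqm_demote, {}, {Cond});
}

// Query whether the current lane is still live (i.e. not a helper).
llvm::Value *emitLiveMask(llvm::IRBuilder<> &B) {
  return B.CreateIntrinsic(llvm::Intrinsic::amdgcn_live_mask, {}, {});
}
```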

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D94747
2021-02-15 08:45:46 +09:00
Tony Tye
69551096db [AMDGPU] Limit memory scope for scratch, LDS and GDS
Changes for AMD GPU SIMemoryLegalizer:

- Limit the memory scope to maximum supported by the scratch, LDS and
  GDS address spaces.

- Improve assertion checking.

- Correct toSIAtomicScope argument name.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D96643
2021-02-14 17:34:12 +00:00
Kazu Hirata
9458e140a1 [AMDGPU] Fix build breakage 2021-02-14 09:02:55 -08:00
Kazu Hirata
ff49587cde [llvm] Use llvm::is_contained (NFC) 2021-02-14 08:36:20 -08:00
Ben Shi
fd0f322cf0 [AVR] Fix a bug in 16-bit shifts
Reviewed By: aykevl

Differential Revision: https://reviews.llvm.org/D96590
2021-02-14 11:54:55 +08:00
Craig Topper
9d5ee57922 [RISCV] Rename the RVVBaseAddr ComplexPattern to just BaseAddr and use it to merge some scalar load/store patterns too. 2021-02-13 12:01:51 -08:00
Heejin Ahn
4214250471 [WebAssembly] Fix rethrow's argument computation
Previously we assumed `rethrow`'s argument was always 0, but it turned
out `rethrow` follows the same rule as `br` or `delegate`:
https://github.com/WebAssembly/exception-handling/pull/137
https://github.com/WebAssembly/exception-handling/issues/146#issuecomment-777349038

Currently `rethrow`s generated by our backend always rethrow the
exception caught by the innermost enclosing catch, so this adds a
function to compute that and replaces `rethrow`'s argument with its
computed result.

This also renames `EHPadStack` in `InstPrinter` to `TryStack`, because
in CFGStackify we use `EHPadStack` to mean the range between
`catch`~`end`, while in `InstPrinter` we used it to mean the range
between `try`~`catch`, so choosing different names looks clearer.
This doesn't contain any functional changes in `InstPrinter`.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D96595
2021-02-13 03:43:15 -08:00
Kazu Hirata
f0327193f8 [AMDGPU] Drop unnecessary const from a return type (NFC)
Identified with readability-const-return-type.
2021-02-12 23:44:32 -08:00
Serge Pavlov
c5d859bfe2 [FPEnv][ARM] Implement lowering of llvm.set.rounding
Differential Revision: https://reviews.llvm.org/D96501
2021-02-13 11:16:29 +07:00
Craig Topper
25ee8ece36 [RISCV] Move riscv_vfmv_v_f_vl patterns to RISCVInstrInfoVVLPatterns.td for consistency with riscv_vmv_v_x_vl. NFC 2021-02-12 16:08:27 -08:00
Craig Topper
542e18bed6 [RISCV] Add support for fixed vector fabs 2021-02-12 15:33:36 -08:00
Craig Topper
a3618cb881 [RISCV] Add support for fixed vector sqrt. 2021-02-12 15:33:29 -08:00
Jessica Paquette
1f262f8352 [AArch64][GlobalISel] Fold constants into G_GLOBAL_VALUE
This pretty much just ports `performGlobalAddressCombine` from
AArch64ISelLowering. (AArch64 doesn't use the generic DAG combine for this.)

This adds a pre-legalize combine which looks for this pattern:

```
  %g = G_GLOBAL_VALUE @x
  %ptr1 = G_PTR_ADD %g, cst1
  %ptr2 = G_PTR_ADD %g, cst2
  ...
  %ptrN = G_PTR_ADD %g, cstN
```

And then, if possible, transforms it like so:

```
  %g = G_GLOBAL_VALUE @x
  %offset_g = G_PTR_ADD %g, -min(cst)
  %ptr1 = G_PTR_ADD %offset_g, cst1
  %ptr2 = G_PTR_ADD %offset_g, cst2
  ...
  %ptrN = G_PTR_ADD %offset_g, cstN
```

Where min(cst) is the smallest out of the G_PTR_ADD constants.

This means we should save at least one G_PTR_ADD.

This also updates code in the legalizer + selector which assumes that
G_GLOBAL_VALUE will never have an offset and adds/updates relevant tests.

Differential Revision: https://reviews.llvm.org/D96624
2021-02-12 14:55:15 -08:00
Craig Topper
42bd4878de [RISCV] Use a ComplexPattern to merge the PatFrags for removing unneeded masks on shift amounts.
Rather than having patterns with and without an AND, use a
ComplexPattern to handle both cases.

Reduces the isel table by about 700 bytes.
2021-02-12 14:03:23 -08:00
Stanislav Mekhanoshin
525e98279b [AMDGPU] Fix Windows build
A trivial fix: a 64-bit constant is 1ull, not 1ul, on Windows.
Fixed build broken by c0d7a8bc6241.
2021-02-12 12:30:52 -08:00
Amara Emerson
988eea1dc5 [GlobalISel] Propagate extends through G_PHIs into the incoming value blocks.
This combine tries to do inter-block hoisting of extends of G_PHIs, into the
originating blocks of the phi's incoming value. The idea is to expose further
optimization opportunities that are normally obscured by the PHI.

Some basic heuristics and a target hook for AArch64 are added to allow tuning.
E.g. if the extend is used by a G_PTR_ADD, it doesn't perform this combine
since it may be folded into the addressing mode during selection.

There are very minor code size improvements on AArch64 -Os, but the real benefit
is that it unlocks optimizations like AArch64 conditional compares on some
benchmarks.

Differential Revision: https://reviews.llvm.org/D95703
2021-02-12 11:52:52 -08:00
David Green
0c8d4b9ca1 [ARM] Optimize fp store of extract to integer store if already available.
Given a floating point store from an extracted vector, with an integer
VGETLANE that already exists, storing the existing VGETLANEu directly
can be better for performance. As the value is known to already be in an
integer register, this can help reduce fp register pressure, removes
the need for the fp extract, and allows use of more integer post-inc
stores not available with vstr.

This can be a bit narrow in scope, but helps with certain biquad kernels
that store shuffled vector elements.

Differential Revision: https://reviews.llvm.org/D96159
2021-02-12 18:34:58 +00:00
Simon Pilgrim
8450c87080 [DAG] Move basic USUBSAT pattern matches from X86 to DAGCombine
Begin transitioning the X86 vector code to recognise sub(umax(a,b),b) or sub(a,umin(a,b)) USUBSAT patterns to make it more generic and available to all targets.
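
For intuition, both patterns compute the same unsigned saturating subtraction; a standalone check (illustrative only, not from the patch):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// usubsat(a, b) == (a > b) ? a - b : 0
uint32_t usubsat_via_umax(uint32_t A, uint32_t B) {
  return std::max(A, B) - B; // sub(umax(a,b),b)
}

uint32_t usubsat_via_umin(uint32_t A, uint32_t B) {
  return A - std::min(A, B); // sub(a,umin(a,b))
}

int main() {
  assert(usubsat_via_umax(5, 9) == 0 && usubsat_via_umin(5, 9) == 0);
  assert(usubsat_via_umax(9, 5) == 4 && usubsat_via_umin(9, 5) == 4);
}
```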

This initial patch just moves the basic umin/umax patterns to DAG, removing some vector-only checks on the way - these are some of the patterns that the legalizer will try to expand back to so we can be reasonably relaxed about matching these pre-legalization.

We can handle the trunc(sub(..)) variants as well, which helps with patterns where we were promoting to a wider type to detect overflow/saturation.

The remaining x86 code requires some cleanup first - some of it isn't actually tested etc. I also need to resurrect D25987.

Differential Revision: https://reviews.llvm.org/D96413
2021-02-12 18:22:57 +00:00
Stanislav Mekhanoshin
485c24bc42 [AMDGPU] Allow accvgpr_read/write decode with opsel
These two instructions are VOP3P and have op_sel_hi bits, but do not
use them. It is recommended to set unused op_sel_hi bits to 1. However,
we cannot decode both representations (bits set to 1 and bits set to 0)
if the bits are given the default value 1, and if the bits are marked
as ignored with the '?' initializer, then the encoder defaults them
to 0.

The patch is a hack to force ignored '?' bits to 1 on
encoding for these instructions.

Canonicalization still happens on disassembly printing if incoming
values are non-default, so the disassembly output does not match the
binary input, but this is a pre-existing problem for all instructions
with '?' bits.

Fixes: SWDEV-272540

Differential Revision: https://reviews.llvm.org/D96543
2021-02-12 10:04:47 -08:00
Akira Hatanaka
078c441e76 [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds the operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result (see the sketch after this list). In addition, it emits a
  call to @llvm.objc.clang.arc.noop.use, which consumes the call result,
  to prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.
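
As referenced in the first item above, a pass can test for the bundle with the generic operand-bundle API; a minimal sketch (the helper name is hypothetical):

```cpp
#include "llvm/IR/InstrTypes.h"

// Returns true if the call carries the "clang.arc.attachedcall"
// operand bundle described above.
bool hasAttachedCallBundle(const llvm::CallBase &CB) {
  return CB.getOperandBundle("clang.arc.attachedcall").hasValue();
}
```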

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-12 09:51:57 -08:00
Craig Topper
38a1f47551 [RISCV] Add support for integer fixed vector setcc
I believe I've covered all orderings of splat operands here. Better
canonicalization in lowering might help reduce this. I did not handle
the immediate adjustments needed for set(u)gt/set(u)lt.

Testing here is limited to byte types because the scalable vector
type used for masks for the store is calculated assuming 8-byte
elements, but for the setcc it's based on the element count of the
container type for the setcc input, so they don't agree. We'll need
to enhance D96352 to handle this, I think.

Differential Revision: https://reviews.llvm.org/D96443
2021-02-12 09:29:41 -08:00
Craig Topper
72741c7878 [RISCV] Add support for matching .vx and .vi forms of binary instructions for fixed vectors.
Unlike scalable vectors, I'm only using a ComplexPattern for
the immediate itself. The vmv_v_x is matched explicitly. We ignore
the VL argument when matching a binary operator, but we do check
it when matching splat directly.

I left out tests for vXi64 as they fail on rv32 right now.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96365
2021-02-12 09:18:10 -08:00
Jay Foad
3fbdec87d1 [TableGen][GlobalISel] Allow duplicate RendererFns
Allow different GICustomOperandRenderers to use the same RendererFn.
This avoids the need for targets to define a bunch of identical C++
renderer functions with different names.

Without this fix TableGen would have emitted code that tried to define
the GICR enumeration with duplicate enumerators.

Differential Revision: https://reviews.llvm.org/D96587
2021-02-12 15:05:32 +00:00
David Green
22b5fa9203 [ARM] Single source VMOVNT
Our current lowering of VMOVNT goes via a shuffle vector of the form
<0, N, 2, N+2, 4, N+4, ..>. That can of course also be a single input
shuffle of the form <0, 0, 2, 2, 4, 4, ..>, where we use a VMOVNT to
insert a vector into the top lanes of itself. This adds lowering of that
case, re-using the existing isVMOVNMask.

Differential Revision: https://reviews.llvm.org/D96065
2021-02-12 14:28:57 +00:00
Sanjay Patel
d1a8bb697a [Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics
The vector reduction intrinsics started life as experimental ops, so backend support
was lacking. As part of promoting them to 1st-class intrinsics, however, codegen
support was added/improved:
D58015
D90247

So I think it is safe to now remove this complication from IR.

Note that we still have an IR-level codegen expansion pass for these as discussed
in D95690. Removing that is another step in simplifying the logic. Also note that
x86 was already unconditionally forming reductions in IR, so there should be no
difference for x86.

I spot checked a couple of the tests here by running them through opt+llc and did
not see any asm diffs.

If we do find functional differences for other targets, it should be possible
to (at least temporarily) restore the shuffle IR with the ExpandReductions IR
pass.

Differential Revision: https://reviews.llvm.org/D96552
2021-02-12 08:13:50 -05:00
luxufan
7124b02d55 [RISCV] Change parseVTypeI function
Change the parseVTypeI function so that the added vset instruction test cases report more concrete error messages.

Differential Revision: https://reviews.llvm.org/D96218
2021-02-12 19:38:34 +08:00
Fraser Cormack
765390acdd [RISCV] Add support for integer fixed min/max
This patch extends the initial fixed-length vector support to include
smin, smax, umin, and umax.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96491
2021-02-12 09:19:45 +00:00
Heejin Ahn
3c77486de7 [WebAssembly] Fix delegate's argument computation
I previously assumed `delegate`'s immediate argument computation
followed a different rule than that of branches, but we agreed to make
it the same
(https://github.com/WebAssembly/exception-handling/issues/146). This
removes the need for a separate `DelegateStack` in both CFGStackify and
InstPrinter.

When computing the immediate argument, we use a different function for
`delegate` computation because in MIR the `DELEGATE` instruction's
destination is the destination catch BB or delegate BB, and when it is a
catch BB, we need an additional step of getting its corresponding `end`
marker.

Reviewed By: tlively, dschuff

Differential Revision: https://reviews.llvm.org/D96525
2021-02-11 21:57:28 -08:00
Craig Topper
ef5423cf23 [RISCV] Add a pattern for a scalable vector mask vnot.
We can use a vnand.mm with the same register for both inputs.
This avoids materializing an all-ones constant with vmset.mm.
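
The trick relies on the identity NOT x == NAND(x, x): feeding the same mask register to both inputs of vnand.mm yields its complement. A scalar sanity check (illustrative only):

```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint8_t Mask = 0b10110100;
  // nand(x, x) == not(x), so no all-ones operand is needed.
  assert(uint8_t(~(Mask & Mask)) == uint8_t(~Mask));
}
```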
2021-02-11 15:34:58 -08:00
ShihPo Hung
934bc977ad [RISCV] Initial support for insert/extract subvector
This patch handles cast-like insert_subvector & extract_subvector
in the case where:
1. the index starts from 0, and
2. a fixed-width vector is inserted into a scalable vector,
   or a fixed-width vector is extracted from a scalable vector.

Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D96352
2021-02-11 14:35:49 -08:00
Pengxuan Zheng
cd15616305 [AArch64] Adding Neon Sm3 & Sm4 Intrinsics
This adds SM3 and SM4 Intrinsics support for AArch64, specifically:
        vsm3ss1q_u32
        vsm3tt1aq_u32
        vsm3tt1bq_u32
        vsm3tt2aq_u32
        vsm3tt2bq_u32
        vsm3partw1q_u32
        vsm3partw2q_u32
        vsm4eq_u32
        vsm4ekeyq_u32

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D95655
2021-02-11 14:20:20 -08:00
Stanislav Mekhanoshin
8e1a6005f7 [AMDGPU] Fix promote alloca with double use in the same insn
If we have an instruction where more than one pointer operand is
derived from the same promoted alloca, we fix it for one argument
but do not fix a second use, considering this user already done.

Fix this by deferring processing of memory intrinsics until all
potential operands are replaced.

Fixes: SWDEV-271358

Differential Revision: https://reviews.llvm.org/D96386
2021-02-11 11:42:25 -08:00
Matt Arsenault
0ac1411820 AMDGPU: Restrict soft clause bundling at half of the available regs
Fixes a testcase that was overcommitting large register tuples to a
bundle, which the register allocator could not possibly satisfy.  This
was producing a bundle which used nearly all of the available SGPRs
with a series of 16-dword loads (not all of which are freely available
to use).

This is a quick hack for some deeper issues with how the clause
bundler tracks register pressure.

Overall the pressure tracking used here doesn't make sense and is too
imprecise for what it needs to avoid the allocator failing. The
pressure estimate does not account for the alignment requirements of
large SGPR tuples, so this was really underestimating the pressure
impact. This also ignores the impact of the extended live range of the
use registers after the bundle is introduced. Additionally, it didn't
account for some wide tuples not being available due to reserved
registers.

This regresses a few cases. These end up introducing more
spilling. This is also a function of the global pressure being used in
the decision to bundle, not the local pressure impact of the bundle
itself.
2021-02-11 14:08:59 -05:00
Yonghong Song
18184b22d0 BPF: Add LLVMAnalysis in CMakefile LINK_COMPONENTS
buildbot reported a build error like below:
  BPFTargetMachine.cpp:(.text._ZN4llvm19TargetTransformInfo5ModelINS_10BPFTTIImplEED2Ev
    [_ZN4llvm19TargetTransformInfo5ModelINS_10BPFTTIImplEED2Ev]+0x14):
    undefined reference to `llvm::TargetTransformInfo::Concept::~Concept()'
  lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BPFTargetMachine.cpp.o:
    In function `llvm::TargetTransformInfo::Model<llvm::BPFTTIImpl>::~Model()':

Commit a260ae716030 ("BPF: Implement TTI.IntImmCost() properly")
added TargetTransformInfo to BPF, which requires LLVMAnalysis
dependence. In certain cmake configurations, lacking an explicit
LLVMAnalysis dependency may cause a compilation error.
Similar to other targets, this patch adds LLVMAnalysis
to LINK_COMPONENTS in the CMake file explicitly.
2021-02-11 10:24:22 -08:00
Jay Foad
2f91cc65b2 [AMDGPU] Better selection of base offset when merging DS reads/writes
When merging a pair of DS reads or writes that needs to materialize the
base offset in a vgpr, choose a value that is aligned to as high a power of
two as possible. This maximises the chance that different pairs can use
the same base offset, in which case the base offset registers can be
commoned up by MachineCSE.
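
One plausible way to pick such a base (an illustrative sketch, not the code from the patch):

```cpp
#include <cstdint>

// Return the value in [Lo, Hi] aligned to the highest power of two,
// scanning candidate alignments from largest to smallest.
uint64_t mostAlignedInRange(uint64_t Lo, uint64_t Hi) {
  for (uint64_t Align = uint64_t(1) << 63; Align > 1; Align >>= 1) {
    uint64_t Candidate = (Lo + Align - 1) & ~(Align - 1); // round Lo up
    if (Candidate >= Lo && Candidate <= Hi) // >= Lo guards against wrap
      return Candidate;
  }
  return Lo; // every value is aligned to 1
}
```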

Differential Revision: https://reviews.llvm.org/D96421
2021-02-11 17:46:09 +00:00
Craig Topper
32e1c9e6be [TargetLowering][RISCV][AArch64][PowerPC] Enable BuildUDIV/BuildSDIV on illegal types before type legalization if we can find a larger legal type that supports MUL.
If we wait until the type is legalized, we'll lose information
about the original type and need to use larger magic constants.
This gets especially bad on RISCV64 where i64 is the only legal
type.
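
For intuition, this is the classic magic-number strength reduction: an unsigned divide becomes a widening multiply plus a shift. A worked example for division by 3 (illustrative, not from the patch):

```cpp
#include <cassert>
#include <cstdint>

// While the value is still i32, the magic constant fits in 32 bits;
// once legalized to i64, a wider constant would be required.
uint32_t udiv3(uint32_t X) {
  return uint32_t((uint64_t(X) * 0xAAAAAAABull) >> 33); // == X / 3
}

int main() {
  for (uint32_t X : {0u, 1u, 2u, 3u, 299u, 0xFFFFFFFFu})
    assert(udiv3(X) == X / 3);
}
```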

I've limited this to simple scalar types so it only works for
i8/i16/i32 which are most likely to occur. For more odd types
we might want to do a small promotion to a type where MULH is legal
instead.

Unfortunately, this does prevent some urem/srem+seteq matching since
that still require legal types.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D96210
2021-02-11 09:43:13 -08:00
Craig Topper
5d204c33c0 [RISCV] Add support for loads, stores, and splats of vXi1 fixed vectors.
This refines how we determine which masks types are legal and adds
support for loads, stores, and all ones/zeros splats.

I left a fixme in store handling where I think we need to zero
extra bits if the type isn't a multiple of a byte. If I remember
right from X86 there was some case we could have a store of a
1, 2, or 4 bit mask and have a scalar zextload that then expected the
bits to be 0. It's tricky to zero the bits with RVV. We need to do
something like round VL up, zero a register, lower the VL back down,
then do a tail undisturbed move into the zero register. Another
option might be to generate a mask of 1/2/4 bits set with a VL of 8
and use that to mask off the bits.
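
For context, the invariant the fixme is about is easy to state in scalar code: a zero-extending load of a stored 1/2/4-bit mask expects the padding bits of the byte to be zero (hypothetical helper, for illustration):

```cpp
#include <cstdint>

// Keep the low NumBits of Mask and zero the rest, so a later scalar
// zextload of the byte sees zeros in the padding bits.
uint8_t maskStoreBits(uint8_t Mask, unsigned NumBits) {
  return Mask & uint8_t((1u << NumBits) - 1);
}
```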

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96468
2021-02-11 09:13:16 -08:00
Yonghong Song
d55275bb4b BPF: Implement TTI.IntImmCost() properly
This patch implemented TTI.IntImmCost() properly.
Each BPF insn has a 32-bit immediate field, so any immediate
which can be represented as a 32-bit signed int is technically
free. If an int cannot be represented as a 32-bit signed int,
an ld_imm64 instruction is needed and a TCC_Basic is returned.
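
A hedged sketch of that rule (not the verbatim patch):

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/Support/MathExtras.h"

// Immediates that fit a BPF insn's 32-bit signed immediate field are
// free; anything wider needs ld_imm64 and costs TCC_Basic.
unsigned bpfIntImmCost(const llvm::APInt &Imm) {
  if (Imm.getBitWidth() <= 64 && llvm::isInt<32>(Imm.getSExtValue()))
    return 0; // TTI::TCC_Free
  return 1;   // TTI::TCC_Basic
}
```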

This change is motivated when we observed that
several bpf selftests failed with latest llvm trunk, e.g.,
  #10/16 strobemeta.o:FAIL
  #10/17 strobemeta_nounroll1.o:FAIL
  #10/18 strobemeta_nounroll2.o:FAIL
  #10/19 strobemeta_subprogs.o:FAIL
  #96 snprintf_btf:FAIL

The reason for the failures is that SpeculateAroundPHIsPass did an
aggressive transformation that alters control flow, which the verifier
currently cannot handle well. In llvm12, SpeculateAroundPHIsPass
is not called.

SpeculateAroundPHIsPass relied on TTI.getIntImmCost()
and TTI.getIntImmCostInst() for profitability
analysis. This patch implemented TTI.getIntImmCost()
properly for the BPF backend, which also prevents the
transformation that caused the above test failures.

Differential Revision: https://reviews.llvm.org/D96448
2021-02-11 08:35:25 -08:00
David Green
91764313a1 [ARM] Add CostKind to getMVEVectorCostFactor.
This adds the CostKind to getMVEVectorCostFactor, so that it can
automatically account for CodeSize costs, where it returns a cost of 1
not the MVEFactor used for Throughput/Latency. This helps simplify the
caller code and allows us to get the codesize cost more correct in more
cases.
2021-02-11 15:33:59 +00:00
Simon Tatham
84aee28dc2 [ARM] Copy-paste error in ARMv87a architecture definition.
In the tablegen architecture definition, the Name field for the
ARMv87a record read "ARMv86a". All the other records contain their own
names.

Corrected it to "ARMv87a", and added the necessary value in
ARMArchEnum for that to refer to.

Reviewed By: pratlucas

Differential Revision: https://reviews.llvm.org/D96493
2021-02-11 13:35:56 +00:00
David Green
71e1b7feea [ARM] Change getScalarizationOverhead overload used in gather costs. NFC
This changes which of the getScalarizationOverhead overloads is used in
the gather/scatter cost to use the base variant directly, not relying on
the version that uses heuristics on the number of args when no args are
provided. It should still produce the same costs for scalarized
gathers/scatters.
2021-02-11 11:58:55 +00:00
Carl Ritson
fb4b457dbd [AMDGPU] Move kill lowering to WQM pass and add live mask tracking
Move implementation of kill intrinsics to WQM pass. Add live lane
tracking by updating a stored exec mask when lanes are killed.
Use live lane tracking to enable early termination of shader
at any point in control flow.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D94746
2021-02-11 20:31:29 +09:00
David Green
c7af7bf359 [ARM] Remove dead mov's in preheader of tail predicated loops
With t2DoLoopDec we can be left with some extra MOV's in the preheaders
of tail predicated loops. This removes them, in the same way we remove
other dead variables.

Differential Revision: https://reviews.llvm.org/D91857
2021-02-11 10:48:20 +00:00
Sander de Smalen
e6c099942e [TTI] Change TargetTransformInfo::getMinimumVF to return ElementCount
This will be needed in the loop-vectorizer where the minimum VF
requested may be a scalable VF. getMinimumVF now takes an additional
operand 'IsScalableVF' that indicates whether a scalable VF is required.
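
Sketch of the changed query (names approximate; see D96020 for the actual signature):

```cpp
#include "llvm/Support/TypeSize.h"

// An ElementCount carries both the element count and whether that
// count is scalable, so one return type covers fixed and scalable VFs.
llvm::ElementCount exampleMinimumVF(bool IsScalableVF) {
  return IsScalableVF ? llvm::ElementCount::getScalable(4)  // vscale x 4
                      : llvm::ElementCount::getFixed(4);    // exactly 4
}
```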

Reviewed By: kparzysz, rampitec

Differential Revision: https://reviews.llvm.org/D96020
2021-02-11 09:08:48 +00:00
David Green
a86aaa927d [ARM] Make a BE predicate bitcast consistent with the rest of llvm
We were storing predicate registers, such as an <8 x i1>, in the opposite
order to how the rest of llvm expects. This actually turns out to be
correct for the one place that usually uses it - the
ScalarizeMaskedMemIntrin pass, but only because the pass was incorrect
itself. This fixes the order so that bits are stored in the opposite
order and bitcasts work as expected. This allows the Scalarization pass
to be fixed, as in https://reviews.llvm.org/D94765.

Differential Revision: https://reviews.llvm.org/D94867
2021-02-11 08:59:52 +00:00