mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00
Commit Graph

216117 Commits

Author SHA1 Message Date
Adam Nemet
885b8e3e0a [Matrix] Fold the transpose into the matmul operand used to fetch scalars
For column-major this is:
  A * B^t
whereas for row-major:
  A^t * B
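
A hedged C++ illustration of how such a pattern can arise at the source level, assuming Clang's matrix-type extension (built with -fenable-matrix); the type and function names are my own and the snippet is not taken from the patch:

```
// A transposed operand feeding a matrix multiply; the lowering can now fold
// the transpose when it only needs to fetch individual scalars from A.
typedef float m4x4_t __attribute__((matrix_type(4, 4)));

m4x4_t multiply_transposed(m4x4_t A, m4x4_t B) {
  return __builtin_matrix_transpose(A) * B;
}
```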

Differential Revision: https://reviews.llvm.org/D101762
2021-05-17 17:40:46 -07:00
Philip Reames
9a9cd48cbe [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3)
Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll

Previous commit message...

This is a resubmit of 3e5ce4 (which was reverted by 7fe41ac).  The original commit caused a PPC build bot failure we never really got to the bottom of.  I can't reproduce the issue, and the bot owner was non-responsive.  In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in 80e8025.  My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess.

Original commit message follows...

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know the scalar epilogue must run - these questions are ill-formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable it separately. You can think of this as NFCI-ish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-05-17 16:59:25 -07:00
Stanislav Mekhanoshin
dc20bc576c [AMDGPU] Do not check denorm for LDS FP atomic with unsafe flag
This is already how it is handled for global and flat atomics.

Differential Revision: https://reviews.llvm.org/D102366
2021-05-17 16:53:09 -07:00
Philip Reames
91a232a867 Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute"
This reverts commit c23ce54b36b1a52eb280ea1d59802b56d6dd9800.  I apparently missed some newly added non-x86 tests.
2021-05-17 16:49:32 -07:00
Philip Reames
0998b43333 [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute
This is a resubmit of 3e5ce4 (which was reverted by 7fe41ac).  The original commit caused a PPC build bot failure we never really got to the bottom of.  I can't reproduce the issue, and the bot owner was non-responsive.  In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in 80e8025.  My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess.

Original commit message follows...

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know the scalar epilogue must run - these questions are ill-formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable it separately. You can think of this as NFCI-ish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-05-17 16:33:56 -07:00
Ben Shi
0ac3dead80 [RISCV][test] Add new tests of or/xor in the zbs extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102625
2021-05-18 07:10:17 +08:00
Scott Linder
9078fb1b63 [ADT] Add new type traits for type pack indexes
Similar versions of these already exist; this effectively just
factors them out into STLExtras. I plan to use these in future patches.

Differential Revision: https://reviews.llvm.org/D100672
2021-05-17 22:28:55 +00:00
Scott Linder
337d1f312b [ADT] Factor out in_place_t and expose in Optional ctor
Differential Revision: https://reviews.llvm.org/D100671
2021-05-17 22:25:39 +00:00
Eli Friedman
6944a84c4d [AArch64][SVE] Implement extractelement of i1 vectors.
The implementation just extends the vector to a larger element type, and
extracts from that.  Not fancy, but generates reasonable code.

There was discussion in the review of doing the promotion in
target-independent code, but I'm sticking with this to avoid making
LegalizeDAG infrastructure more complicated.

Differential Revision: https://reviews.llvm.org/D87651
2021-05-17 14:51:11 -07:00
Philip Reames
03638ef4ae Do actual DCE in LoopUnroll (try 3)
Recommitting after fixing a bug found post-commit.  Amusingly, try 1 had been correct, and by reverting to incorporate last-minute review feedback, I introduced the bug.  Oops.  :)

The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration.  Test case added in LoopUnroll/dce.ll to cover this case.

LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.

Differential Revision: https://reviews.llvm.org/D102511
2021-05-17 14:47:02 -07:00
Arthur Eubanks
dcd63c6bcf [test] Free triple in PassBuilderBindingsTest 2021-05-17 13:58:16 -07:00
Heejin Ahn
6eb058e5ac [WebAssembly] Nullify DBG_VALUE_LISTs in DebugValueManager
The WebAssemblyDebugValueManager class currently does not handle
DBG_VALUE_LIST instructions correctly for two reasons, which are
explained in https://bugs.llvm.org/show_bug.cgi?id=50361.

This effectively nullifies DBG_VALUE_LISTs in
WebAssemblyDebugValueManager so that the info will appear as "optimized
out" in debuggers but still be at least correct in the meantime.

Reviewed By: dschuff, jmorse

Differential Revision: https://reviews.llvm.org/D102589
2021-05-17 13:47:36 -07:00
Mitch Phillips
c43a4a465c Revert "X86: support Swift Async context"
This reverts commit 747e5cfb9f5d944b47fe014925b0d5dc2fda74d7.

Reason: The new frame layout broke the sanitizer unwinder. It's not clear
why, but it seems like some of the changes aren't always guarded by Swift
checks. See
https://reviews.llvm.org/rG747e5cfb9f5d944b47fe014925b0d5dc2fda74d7 for
more information.
2021-05-17 12:44:57 -07:00
LLVM GN Syncbot
3e066dbc01 [gn build] Port 0c557db61711 2021-05-17 18:56:03 +00:00
Sanjay Patel
0595724bb0 [InstCombine] fold fnegs around select
This is one of the folds requested in:
https://llvm.org/PR39480

https://alive2.llvm.org/ce/z/NczU3V

Note - this uses the normal FMF propagation logic
(flags transfer from the final value to new/intermediate ops).
It's not clear if this matches what Alive2 implements,
so we may want to adjust one or the other.
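
A minimal C++ sketch (my own example, not taken from the patch) of source that hands InstCombine an fneg wrapped around a select, the shape of pattern this fold rearranges:

```
float neg_of_select(bool c, float x, float y) {
  // fneg(select(c, x, y)); the fold moves the negations between the select's
  // result and its arms, i.e. select(c, fneg(x), fneg(y)).
  return -(c ? x : y);
}
```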
2021-05-17 14:53:49 -04:00
Sanjay Patel
a545a0416b [InstCombine] add tests for fneg-of-select; NFC 2021-05-17 14:53:48 -04:00
Nick Desaulniers
e282834e98 [AArch64] Support customizing stack protector guard
Follow-up to D88631 but for aarch64; the Linux kernel uses the
command-line flags:

1. -mstack-protector-guard=sysreg
2. -mstack-protector-guard-reg=sp_el0
3. -mstack-protector-guard-offset=0

to use the system register sp_el0 for the stack canary, enabling the
kernel to have a unique stack canary per task (like a thread, but not
limited to userspace as the kernel can preempt itself).

Address pr/47341 for aarch64.

Fixes: https://github.com/ClangBuiltLinux/linux/issues/289
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen

Differential Revision: https://reviews.llvm.org/D100919
2021-05-17 11:49:22 -07:00
Peter Collingbourne
6ff12b1f0b gn build: Only build the hwasan runtime in aliasing mode on x86.
The LAM mode is currently untested by check-hwasan, so we only need
to build the runtime in aliasing mode. Because LAM mode will always
need to be conditional (only certain hardware will support it), we
can simply disable the LAM lit tests if it ever starts being tested.
2021-05-17 11:48:49 -07:00
Mats Larsen
76874c8e3b [NewPM] Add C bindings for new pass manager
This patch contains the bare minimum to run the new Pass Manager from the LLVM-C APIs. It does not feature PGOOptions, PassPlugins or Debugify in its current state. Bugzilla: PR48499
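
A hedged sketch of what driving the new pass manager through these C bindings could look like; the entry-point names and header path (llvm-c/Transforms/PassBuilder.h, LLVMCreatePassBuilderOptions, LLVMRunPasses) are my assumption of the API surface, so treat them as approximate rather than authoritative:

```
#include "llvm-c/Core.h"
#include "llvm-c/Error.h"
#include "llvm-c/TargetMachine.h"
#include "llvm-c/Transforms/PassBuilder.h"

// Run the default -O2 pipeline over a module via the C API.
void optimizeModule(LLVMModuleRef M, LLVMTargetMachineRef TM) {
  LLVMPassBuilderOptionsRef Opts = LLVMCreatePassBuilderOptions();
  LLVMErrorRef Err = LLVMRunPasses(M, "default<O2>", TM, Opts);
  if (Err)
    LLVMConsumeError(Err); // a real caller would report the error message
  LLVMDisposePassBuilderOptions(Opts);
}
```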

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102136
2021-05-17 11:45:47 -07:00
Nico Weber
9187a7ce7e Revert "[NewPM] Add C bindings for new pass manager"
This reverts commit cd220a06782c3da13a53de2fdf10d928eef6460c.
Doesn't build.
2021-05-17 13:59:12 -04:00
Mats Larsen
2251d44074 [NewPM] Add C bindings for new pass manager
This patch contains the bare minimum to run the new Pass Manager from the LLVM-C APIs. It does not feature PGOOptions, PassPlugins or Debugify in its current state. Bugzilla: PR48499

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102136
2021-05-17 10:48:45 -07:00
Roman Lebedev
4892585b46 [LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition.
I think I've added exhaustive test coverage, and I have verified that Alive2 is happy with all the tests,
so in principle I'm fine with landing this without review, but just in case...

This adds support for the "count active bits" pattern, i.e.:
```
int countActiveBits(unsigned val) {
    int cnt = 0;
    for( ; (val >> cnt) != 0; ++cnt)
        ;
    return cnt;
}
```
but a somewhat more general one, since that is what I need:
```
int countActiveBits(unsigned val, int start, int off) {
    int cnt;
    for (cnt = start; val >> (cnt + off); cnt++)
        ;
    return cnt;
}
```
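
For reference, a hedged closed-form equivalent of the general pattern (my own derivation, assuming `val != 0` and in-range shift amounts), which is why a ctlz-based rewrite can replace the loop:

```
#include <limits.h>

int countActiveBitsClosedForm(unsigned val, int start, int off) {
    // Highest set bit index + 1, i.e. bit width minus leading zeros.
    int active = (int)(sizeof(unsigned) * CHAR_BIT) - __builtin_clz(val);
    int cnt = active - off;
    return cnt > start ? cnt : start;
}
```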

I've followed in the footsteps of the 'left-shift until bittest' idiom (D91038),
in the sense that iff the `ctlz` intrinsic is cheap, we'll transform
regardless of all other factors.

This can have a shocking effect on certain benchmarks:
```
raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf
RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm
2021-05-09T01:06:05+03:00
Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench
Run on (32 X 3600.24 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 5.26, 6.29, 3.49
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean          145 ms          145 ms          128   0.145319         0.999981   10.1568M       69.8949M        69.8936M      6.88159       6.88146   0.145322
p1319978.orf/threads:32/process_time/real_time_median        145 ms          145 ms          128   0.145317         0.999986   10.1568M       69.8941M        69.8931M      6.88151       6.88141   0.145319
p1319978.orf/threads:32/process_time/real_time_stddev      0.766 ms        0.766 ms          128   766.586u         15.1302u          0       354.167k        354.098k    0.0348699     0.0348631   766.469u
RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0
2021-05-09T01:06:24+03:00
Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Run on (32 X 3599.95 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 4.05, 5.95, 3.43
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean         99.8 ms         99.8 ms          128  0.0997758         0.999972   10.1568M       101.797M        101.794M      10.0225       10.0222  0.0997786
p1319978.orf/threads:32/process_time/real_time_median       99.7 ms         99.7 ms          128  0.0997165         0.999985   10.1568M       101.857M        101.854M      10.0284       10.0281  0.0997195
p1319978.orf/threads:32/process_time/real_time_stddev      0.224 ms        0.224 ms          128   224.166u          34.345u          0        226.81k        227.231k    0.0223309     0.0223723   224.586u
Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Benchmark                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 128 vs 128
p1319978.orf/threads:32/process_time/real_time_mean                  -0.3134         -0.3134           145           100           145           100
p1319978.orf/threads:32/process_time/real_time_median                -0.3138         -0.3138           145           100           145           100
p1319978.orf/threads:32/process_time/real_time_stddev                -0.7073         -0.7078             1             0             1             0

```

Reviewed By: craig.topper, zhuhan0

Differential Revision: https://reviews.llvm.org/D102116
2021-05-17 20:33:33 +03:00
Steffen Larsen
1e7a7bb573 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `redux.sync` instructions
for `sm_80` architecture or newer.

PTX ISA description of `redux.sync`:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-redux-sync

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>

Differential Revision: https://reviews.llvm.org/D100124
2021-05-17 09:46:59 -07:00
Stuart Adams
4b94b88699 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `cp.async` instructions for
`sm_80` architecture or newer.

PTX ISA description of `cp.async`:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-asynchronous-copy
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive

Authored-by: Stuart Adams <stuart.adams@codeplay.com>
Co-Authored-by: Alexander Johnston <alexander@codeplay.com>

Differential Revision: https://reviews.llvm.org/D100394
2021-05-17 09:46:59 -07:00
Alex Zinenko
01e701738c [llvm][doc] fix header for read/write_register intrinsics in LangRef
Multi-line headers are not allowed in RST; reformat the header to be a
single wide line.
2021-05-17 18:38:16 +02:00
Florian Hahn
f33ce47836 [LoopUnroll] Add multi-exit test which does not exit through latch.
This patch adds a new test for loop-unrolling with multiple exiting
blocks, where the latch does not exit, but the header does. This can
happen when the loop has not been rotated, e.g. due to minsize.

Inspired by the following end-to-end test, using -Oz
https://godbolt.org/z/fP6sna8qK

    bool foo(int *ptr, int limit) {
        #pragma clang loop unroll(full)
        for (unsigned int i = 0; i < 4; i++) {
            if (ptr[i] > limit)
                return false;
            ptr[i]++;
        }
        return true;
    }
2021-05-17 17:08:15 +01:00
Stanislav Mekhanoshin
ca4e8cd7a6 [AMDGPU] Set unused dst_sel to '?' in the encoding
This is to allow disassembly of encodings with any bits set in the unused fields.

Differential Revision: https://reviews.llvm.org/D102526
2021-05-17 08:38:52 -07:00
Sanjay Patel
afbea032ea [x86] update fma test with deprecated intrinsics; NFC
All of the CHECK lines should be identical to before,
but without any of the x86-specific calls that were
replaced with generic FMA long ago.

The file still has value because it shows a miscompile
as demonstrated in D90901, but we probably need to
add tests with FMF to make that explicit without
losing coverage.
2021-05-17 11:06:32 -04:00
Simon Pilgrim
a3d5fb0749 [X86] Don't dereference a dyn_cast<> - use a cast<> instead. NFCI.
dyn_cast<> can return null if the cast fails; by using cast<> we assert that the cast is correct, helping to avoid a potential null dereference.
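
A hedged, generic illustration of the two casting idioms (using CallBase for the sketch rather than the actual X86 code touched here):

```
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

unsigned argCountIfCall(Value *V) {
  // dyn_cast<> may return null, so the result must be checked before use.
  if (auto *CB = dyn_cast<CallBase>(V))
    return CB->arg_size();
  return 0;
}

unsigned argCountOfKnownCall(Value *V) {
  // cast<> asserts that V really is a CallBase, so it is safe to dereference.
  return cast<CallBase>(V)->arg_size();
}
```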
2021-05-17 15:58:32 +01:00
Jay Foad
257b9bc0d9 [AMDGPU] Tweak VOP3_INTERP16 profile
Set the output register class based on the output type, instead of
hard-coding VGPR_32. I think this is more correct. It doesn't make any
difference at the moment because we use the same class for 16- and
32-bit results, but it might in future if we make more use of true
16-bit register classes.

Differential Revision: https://reviews.llvm.org/D102622
2021-05-17 15:28:00 +01:00
Fraser Cormack
6af6b413bf [RISCV][NFC] Correct alignment in scatter/gather tests
This lays the groundwork so that the alignment changes in D102493 are
more apparent.
2021-05-17 15:12:55 +01:00
Andy Yankovsky
dc2e44c588 [APInt][NFC] Fix typo vlalue->value
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D102618
2021-05-17 16:18:22 +02:00
Irina Dobrescu
f3133f1d74 [AArch64] Lower bitreverse in ISel
Adding lowering support for bitreverse.

Previously, lowering bitreverse would expand it into a series of other instructions. This patch makes it produce a single rbit instruction instead.
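
A hedged example (mine, not from the patch) of source that produces the generic bitreverse intrinsic, assuming Clang's __builtin_bitreverse32 builtin:

```
#include <stdint.h>

uint32_t reverse_bits(uint32_t x) {
  // Lowers to llvm.bitreverse.i32, which can now become a single rbit.
  return __builtin_bitreverse32(x);
}
```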

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D102397
2021-05-17 13:35:27 +01:00
Benjamin Kramer
3ae6f7dd7e Put back the trailing commas on TYPED_TEST_SUITE
This avoids a -pedantic warning:
warning: ISO C++11 requires at least one argument for the "..." in a variadic macro

See also https://github.com/google/googletest/issues/2271
2021-05-17 14:14:13 +02:00
Roman Lebedev
faba1577fb [InstCombine] isFreeToInvert(): constant expressions aren't free to invert (PR50370)
This fixes https://bugs.llvm.org/show_bug.cgi?id=50370,
which reports yet another endless combine loop,
this one a regression from 554b1bced325a8d860ad00bd59020d66d01c95f8,
which fixed yet another endless combine loop (PR50308).

This code had fallen into the very typical pitfall of forgetting
that constant expressions exist and that they aren't free to invert:
the `not` won't be absorbed by the "constant",
but will remain a (constant) expression...
2021-05-17 14:58:05 +03:00
Simon Pilgrim
e099d7d19d [X86] Regenerate cmov.ll tests 2021-05-17 12:50:38 +01:00
James Henderson
b7e39fd3ed [debuginfo-tests] Fix environment variable used to specify LLDB
Currently, if the user specifies the environment variable 'CLANG', tests
will attempt to use the value as a path to the clang executable.
Previously, lldb could also be specified via the CLANG environment
variable, but this was almost certainly a bug, because that meant both
clang and lldb would have the same path. This patch changes the
environment variable for lldb to 'LLDB'.

Reviewed by: thopre, teemperor

Differential Revision: https://reviews.llvm.org/D101982
2021-05-17 12:50:10 +01:00
Benjamin Kramer
1e6c6cdc2d Clean up uses of gmock Invoke in an attempt to make it work with GCC 6.2. NFCI. 2021-05-17 13:48:45 +02:00
Nemanja Ivanovic
8655d76475 [PowerPC] Add patterns for vselect of v1i128
These patterns are missing even though the underlying instruction
doesn't really care about the type. Added these patterns to resolve
https://bugs.llvm.org/show_bug.cgi?id=50084
2021-05-17 06:37:46 -05:00
Max Kazantsev
4cff0ef1cf [Test] Auto-generate checks in a test (preparing to update) 2021-05-17 18:26:47 +07:00
Nemanja Ivanovic
5fe52009aa [PowerPC] Do not emit dssall on AIX
This instruction is a nop on all server cores (certainly on all
cores that AIX supports) so it is fine to emit a nop instead of it.
In fact, that is exactly what XL emits. So we emit a nop on AIX
and we leave the codegen as is on other platforms since there may
indeed be cores out there for which this actually does some prefetching.
2021-05-17 06:08:06 -05:00
Nico Weber
b25c3747e2 [gn build] reformat all gn files
$ git ls-files '*.gn' '*.gni' | xargs llvm/utils/gn/gn.py format
2021-05-17 06:59:43 -04:00
Nico Weber
3e66c8c917 [gn build] Add build file for msan runtime
Works for the examples on
https://clang.llvm.org/docs/MemorySanitizer.html

Differential Revision: https://reviews.llvm.org/D102554
2021-05-17 06:58:10 -04:00
Tim Northover
4f5c40575e X86: support Swift Async context
This adds support to the X86 backend for the newly committed swiftasync
function parameter. If such a (pointer) parameter is present it gets stored
into an augmented frame record (populated in IR, but generally containing
enhanced backtrace for coroutines using lots of tail calls back and forth).

The context frame is identical to AArch64 (primarily so that unwinders etc
don't get extra complexity). Specifically, the new frame record is [AsyncCtx,
%rbp, ReturnAddr], and its presence is signalled by bit 60 of the stored %rbp
being set to 1. %rbp still points to the frame pointer in memory for backwards
compatibility (only partial on x86, but OTOH the weird AsyncCtx before the rest
of the record is because of x86).
2021-05-17 11:56:16 +01:00
Tim Northover
47e1a53df5 AArch64: mark x22 livein if it's an async context that gets stored.
This fixes a crash with expensive checks enabled (the verifier was not happy).
2021-05-17 11:56:03 +01:00
Max Kazantsev
a6b214f432 [Test] Fix test to make the transform for which it was added legal
%limit in these tests is supposed to be positive.
2021-05-17 17:19:01 +07:00
Simon Pilgrim
633bafaf5f [TargetLowering] prepareUREMEqFold/prepareSREMEqFold - account for non legal shift types
Ensure we tell getShiftAmountTy that we're working with pre-legalized types to prevent cases where the (legalized) shift type can no longer handle the (non-legalized) type width.
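
For context, a hedged worked example of the kind of rewrite the UREM-equality fold performs; the divisor 5, the constants, and the function names are my own illustration, not taken from the patch:

```
#include <assert.h>
#include <stdint.h>

// x % 5 == 0 rewritten as multiply-by-inverse and compare:
// 0xCCCCCCCD is the multiplicative inverse of 5 modulo 2^32,
// 0x33333333 is (2^32 - 1) / 5.
static int rem5_is_zero(uint32_t x) {
  return x * 0xCCCCCCCDu <= 0x33333333u;
}

int main(void) {
  for (uint32_t x = 0; x < 100000; ++x)
    assert(rem5_is_zero(x) == (x % 5 == 0));
  return 0;
}
```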

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34366
2021-05-17 11:03:27 +01:00
Tim Northover
fc5daa6083 IR/AArch64/X86: add "swifttailcc" calling convention.
Swift's new concurrency features are going to require guaranteed tail calls so
that they don't consume excessive amounts of stack space. This would normally
mean "tailcc", but there are also Swift-specific ABI desires that don't
naturally go along with "tailcc", so this adds another calling convention that's
the combination of "swiftcc" and "tailcc".

Support is added for AArch64 and X86 for now.
2021-05-17 10:48:34 +01:00
Jacob Bramley
2320a8cd94 [AArch64] Lower fpto*i.sat intrinsics.
AArch64's fcvt* instructions implement the saturating behaviour that the
fpto*i.sat intrinsics require, in cases where the destination width
matches the saturation width. Lowering them removes a lot of unnecessary
generated code.

Only scalar lowerings are supported for now.
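
A hedged sketch of the scalar saturating semantics as I understand them from LangRef (NaN maps to 0, out-of-range values clamp), written as plain code rather than the intrinsic itself; the function name is my own:

```
#include <math.h>
#include <stdint.h>

static int32_t fptosi_sat_i32(float x) {
  if (isnan(x))
    return 0;                          // NaN saturates to 0
  if (x <= (float)INT32_MIN)
    return INT32_MIN;                  // clamp below
  if (x >= 2147483648.0f /* 2^31 */)
    return INT32_MAX;                  // clamp above
  return (int32_t)x;                   // in range: ordinary conversion
}
```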

Differential Revision: https://reviews.llvm.org/D102353
2021-05-17 10:19:19 +01:00
Fraser Cormack
69fc258fc0 [DAGCombiner] Relax an assertion to an early return
The select-of-constants transform was asserting that its constant vector
inputs did not implicitly truncate their input, without that being an
explicit precondition of the function. This patch relaxes that assertion
into an early return that skips the optimization.
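
A tiny hedged before/after sketch of the general pattern (a hypothetical function, not the actual DAGCombiner code): the precondition check becomes a bail-out, so unexpected-but-legal inputs skip the fold instead of tripping an assert:

```
#include <optional>

std::optional<int> foldSelectOfConstants(bool noImplicitTruncation) {
  // Before: assert(noImplicitTruncation && "unexpected truncating constant");
  if (!noImplicitTruncation)
    return std::nullopt;   // After: quietly skip the optimization
  return 42;               // placeholder for the actual folded result
}
```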

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D102393
2021-05-17 09:15:55 +01:00