mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 20:23:11 +01:00
llvm-mirror/test
Simon Pilgrim 32c30ddee7 [X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to i32/i64 and vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction.
We know that "CVTTPS2SI" returns 0x80000000 for out-of-range inputs (and for FP_TO_UINT, the result for negative float values is undefined). We can use this to make unsigned conversions from vXf32 to vXi32 more efficient, particularly on targets without blend instructions, using the following logic:

small := CVTTPS2SI(x);
fp_to_ui(x) := small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31))

Even on targets where "BLENDVPS"/"PBLENDVB" exists, it is often a latency-2, low-throughput instruction, so this logic is applied there too (in particular for AVX2). It also removes one high-latency floating-point comparison from the previous lowering.
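
The logic above can be sketched as a scalar emulation in Python. This is an illustration only, not the code from the patch: the helper names f32, cvttps2si and fp_to_ui are hypothetical, and the instruction is modelled by its documented behaviour (truncate toward zero, return 0x80000000 for NaN and out-of-range inputs) rather than executed.

```python
import math
import struct

INT_INDEFINITE = 0x80000000  # bit pattern returned for NaN and out-of-range inputs


def f32(x):
    """Round a Python float to the nearest single-precision value (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def cvttps2si(x):
    """Model one lane of the truncating float -> signed i32 conversion.

    The result is returned as an unsigned 32-bit bit pattern.
    """
    if math.isnan(x) or x >= 2.0**31 or x <= -(2.0**31) - 1.0:
        return INT_INDEFINITE
    return int(x) & 0xFFFFFFFF  # int() truncates toward zero


def fp_to_ui(x):
    """small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31))."""
    x = f32(x)
    small = cvttps2si(x)
    big = cvttps2si(f32(x - 2.0**31))
    # An arithmetic right shift by 31 broadcasts the sign bit of `small`:
    # all ones when the first conversion overflowed, zero otherwise.
    sign_mask = 0xFFFFFFFF if small & 0x80000000 else 0
    return small | (big & sign_mask)


if __name__ == "__main__":
    for value in (0.0, 5.7, 2.0**31, 3.0e9, 4294967040.0):
        print(value, fp_to_ui(value))
```

For inputs below 2^31 the first conversion is already correct and the mask is zero. For inputs in [2^31, 2^32) the first conversion yields 0x80000000, the mask is all ones, and OR-ing in the conversion of x - 2^31 restores the low 31 bits.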

@TomHender checked the correctness of this for all possible floats between -1 and 2^32 (both ends excluded).

Original Patch by @TomHender (Tom Hender)

Differential Revision: https://reviews.llvm.org/D89697
2021-07-14 12:03:49 +01:00
Analysis [X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to i32/i64 and vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction. 2021-07-14 12:03:49 +01:00
Assembler [remangleIntrinsicFunction] Detect and resolve name clash 2021-07-13 11:21:12 +02:00
Bindings
Bitcode Revert "[DebugInfo] Enforce implicit constraints on distinct MDNodes" 2021-07-02 15:57:07 -07:00
BugPoint
CodeGen [X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to i32/i64 and vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction. 2021-07-14 12:03:49 +01:00
DebugInfo [DebugInfo] Correctly update dbg.values with duplicated location ops 2021-07-14 11:17:24 +01:00
Demangle [Clang] Introduce Swift async calling convention. 2021-07-09 11:50:10 -07:00
Examples [Orc] At CBindings for LazyRexports 2021-07-01 21:52:05 +02:00
ExecutionEngine [JITLink][ELF] Move ELF section and symbol parsing into ELFLinkGraphBuilder. 2021-06-29 09:59:49 +10:00
Feature
FileCheck
Instrumentation [DebugInfo] Correctly update dbg.values with duplicated location ops 2021-07-14 11:17:24 +01:00
Integer
JitListener
Linker
LTO
MachineVerifier CodeGen: Print/parse LLTs in MachineMemOperands 2021-06-30 16:54:13 -04:00
MC [AArch64][SME] Add matrix register definitions and parsing support 2021-07-14 08:25:49 +00:00
Object [llvm-readobj] Switch command line parsing from llvm::cl to OptTable 2021-07-12 10:14:42 -07:00
ObjectYAML
Other [NewPM][SimpleLoopUnswitch] Add option to not trivially unswitch 2021-07-13 16:09:42 -07:00
SafepointIRVerifier
Support [llvm-readobj] Switch command line parsing from llvm::cl to OptTable 2021-07-12 10:14:42 -07:00
SymbolRewriter
TableGen [TableGen] Allow identical MnemonicAliases with no predicate 2021-06-30 10:53:39 +01:00
ThinLTO/X86 [CSSPGO] Do not import pseudo probe desc in thinLTO 2021-07-13 18:26:36 -07:00
tools [CSSPGO][llvm-profgen] Allow multiple executable load segments. 2021-07-13 18:22:24 -07:00
Transforms [X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to i32/i64 and vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction. 2021-07-14 12:03:49 +01:00
Unit
Verifier [OpaquePtr] Support VecOfAnyPtrsToElt intrinsics 2021-07-01 20:35:33 +02:00
YAMLParser
.clang-format Add .clang-format without column limit to subdirectory tests/. 2013-11-19 04:26:05 +00:00
CMakeLists.txt [Orc] At CBindings for LazyRexports 2021-07-01 21:52:05 +02:00
lit.cfg.py [Orc] At CBindings for LazyRexports 2021-07-01 21:52:05 +02:00
lit.site.cfg.py.in Make lit configs relocatable again after c747b7d1d9a 2021-06-22 15:27:32 -04:00
TestRunner.sh