mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00
llvm-mirror/lib
Eli Friedman ca89c6b055 [SelectionDAG] Improve the legalisation lowering of UMULO.
There is no way in the universe that doing a full-width division in
software will be faster than doing overflowing multiplication in
software in the first place, especially given that this same full-width
multiplication needs to be done anyway.

This patch replaces the previous implementation with a direct lowering
into an overflowing multiplication algorithm based on half-width
operations.
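
As a rough illustration of what a lowering built on half-width operations can look like, here is a minimal Python sketch. The name `umulo` and its `bits` parameter are hypothetical and chosen for this example only; the code models fixed-width unsigned integers with masks and is not the DAG code from the patch.

```python
def umulo(lhs, rhs, bits=16):
    """Return (wrapped product, overflow flag) for unsigned `bits`-wide inputs.

    Hypothetical helper for illustration; `bits` must be even.
    """
    half = bits // 2
    half_mask = (1 << half) - 1
    full_mask = (1 << bits) - 1

    lhs_lo, lhs_hi = lhs & half_mask, lhs >> half
    rhs_lo, rhs_hi = rhs & half_mask, rhs >> half

    # If both high halves are non-zero, the product is at least 2**bits.
    both_high = lhs_hi != 0 and rhs_hi != 0

    # Cross products, treated as half-width overflowing multiplications.
    cross1 = lhs_hi * rhs_lo
    cross2 = rhs_hi * lhs_lo
    cross1_ovf = cross1 > half_mask
    cross2_ovf = cross2 > half_mask

    # Low halves multiplied into a full-width result; this cannot overflow.
    low = lhs_lo * rhs_lo

    # Truncated cross products land in the upper half (wrapping addition).
    upper = (((cross1 & half_mask) + (cross2 & half_mask)) << half) & full_mask

    # Final full-width addition; a carry out of `bits` signals overflow.
    total = upper + low
    add_ovf = total > full_mask

    overflow = both_high or cross1_ovf or cross2_ovf or add_ovf
    return total & full_mask, overflow
```

For example, `umulo(255, 257)` yields the exact product with the flag clear, while `umulo(256, 256)` wraps to zero with the flag set. Only half-width multiplications and full-width additions are used, so no division is involved.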

Correctness of the algorithm was verified by exhaustively checking the
output of this algorithm for overflowing multiplication of 16 bit
integers against an obviously correct widening multiplication. Barring
any oversights introduced by porting the algorithm to DAG, confidence in
correctness of this algorithm is extremely high.
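
An exhaustive check of this kind can be sketched in a few lines of Python. Everything named here (`widening_reference`, `division_based`, `find_mismatches`) is hypothetical and for illustration only. The candidate under test is a simple division-based overflow check used as a stand-in, not the code from the patch, and the width is 8 bits rather than 16 so that the loop finishes quickly.

```python
def widening_reference(a, b, bits):
    """Obviously correct reference: multiply at full precision, then inspect."""
    wide = a * b  # Python integers do not wrap
    return wide & ((1 << bits) - 1), (wide >> bits) != 0


def division_based(a, b, bits):
    """Stand-in candidate: detect overflow by dividing the wrapped product."""
    product = (a * b) & ((1 << bits) - 1)
    overflow = a != 0 and product // a != b
    return product, overflow


def find_mismatches(candidate, bits):
    """Return every (a, b) pair where `candidate` disagrees with the reference."""
    limit = 1 << bits
    return [
        (a, b)
        for a in range(limit)
        for b in range(limit)
        if candidate(a, b, bits) != widening_reference(a, b, bits)
    ]
```

An empty list from `find_mismatches(division_based, 8)` means the candidate agrees with the reference on every pair of 8-bit inputs. Any other implementation with the same signature can be passed in as `candidate`.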

The following table shows the change in both t = runtime and s = space.
The change is expressed as a multiplier of the original, so anything
under 1 is “better” and anything above 1 is worse.

+-------+-----------+-----------+-------------+-------------+
| Arch  | u64*u64 t | u64*u64 s | u128*u128 t | u128*u128 s |
+-------+-----------+-----------+-------------+-------------+
|   X64 |     -     |     -     |    ~0.5     |    ~0.64    |
|  i686 |   ~0.5    |   ~0.6666 |    ~0.05    |    ~0.9     |
| armv7 |     -     |   ~0.75   |      -      |    ~1.4     |
+-------+-----------+-----------+-------------+-------------+

Performance numbers have been collected by running overflowing
multiplication in a loop under `perf` on two x86_64-based machines (one
Intel Haswell, the other AMD Ryzen). Size numbers have been collected by
looking at the size of the function containing an overflowing multiply
in a loop.

All in all, it can be seen that both performance and size have improved
except in the case of armv7, where code size has regressed for the
128-bit multiply. u128*u128 overflowing multiply on 32-bit platforms
seems to benefit from this change a lot, taking only 5% of the time
compared to the original algorithm to calculate the same thing.

The final benefit of this change is that LLVM is now capable of lowering
the overflowing unsigned multiply for integers of any bit-width as long
as the target is capable of lowering regular multiplication for the same
bit-width. Previously, 128-bit overflowing multiply was the widest
possible.

Patch by Simonas Kazlauskas!

Differential Revision: https://reviews.llvm.org/D50310

llvm-svn: 339922
2018-08-16 18:39:39 +00:00
..
Analysis [NFC] Add missing const modifier 2018-08-16 06:28:04 +00:00
AsmParser [DebugInfoMetadata] Added DIFlags interface in DIBasicType. 2018-08-14 19:35:34 +00:00
BinaryFormat [dwarfdump] Add pretty printer for accelerator table based on Atom. 2018-07-13 17:21:51 +00:00
Bitcode [DebugInfoMetadata] Added DIFlags interface in DIBasicType. 2018-08-14 19:35:34 +00:00
CodeGen [SelectionDAG] Improve the legalisation lowering of UMULO. 2018-08-16 18:39:39 +00:00
DebugInfo [codeview] Use push_macro to avoid conflicts instead of a prefix 2018-08-16 17:34:31 +00:00
Demangle Fix memory leak in demangling of string literals. 2018-08-16 17:48:32 +00:00
ExecutionEngine [MCJIT] Fix a case of Error::success() being passed to report_fatal_error. 2018-08-15 20:11:21 +00:00
Fuzzer
FuzzMutate Remove trailing space 2018-07-30 19:41:25 +00:00
IR [X86] Remove masking from the 512-bit padds and psubs intrinsics. Use select in IR instead. 2018-08-16 06:20:24 +00:00
IRReader
LineEditor
Linker [NFC] Remove an empty line. 2018-07-27 06:50:45 +00:00
LTO Remove trailing space 2018-07-30 19:41:25 +00:00
MC [MC] Cleanup noop default case spelling. NFC. 2018-08-16 17:22:31 +00:00
Object llvm-readobj: Fix addend in relocations for android packed format 2018-08-15 17:58:22 +00:00
ObjectYAML [yaml2obj] - Add a support for changing EntSize. 2018-08-07 08:11:38 +00:00
Option
Passes Revert "[GVNHoist] Re-enable GVNHoist by default" 2018-07-30 20:07:33 +00:00
ProfileData [Coverage] Ignore 'unused' functions with non-zero execution counts 2018-08-07 22:25:36 +00:00
Support [ADT] Replace APInt::WORD_MAX with APInt::WORDTYPE_MAX 2018-08-16 11:08:23 +00:00
TableGen Remove trailing space 2018-07-30 19:41:25 +00:00
Target [codeview] Use push_macro to avoid conflicts instead of a prefix 2018-08-16 17:34:31 +00:00
Testing
ToolDrivers Give llvm-lib rudimentary help output. 2018-07-14 02:29:44 +00:00
Transforms [InstCombine] Expand the simplification of pow(x, 0.5) to sqrt(x) 2018-08-16 15:58:08 +00:00
WindowsManifest
XRay [XRay] Improve error reporting when loading traces 2018-08-07 04:42:39 +00:00
CMakeLists.txt
LLVMBuild.txt