A conversion from `pow` to `sqrt` shall not call an `errno`-setting
`sqrt` with -infinity: the `sqrt` will set `EDOM` where the `pow`
call need not.
This patch avoids the erroneous (pun not intended) transformation by
applying the restrictions discussed in the thread for
https://lists.llvm.org/pipermail/llvm-dev/2020-September/145051.html.
The existing tests are updated (depending on each test's emphasis:
checks for library calls, avoidance of overlap, and overall coverage):
- to add `ninf`, retaining the intended library call,
- to use the intrinsic, retaining the use of `select`, or
- to expect the replacement to not occur.
The following is tested:
- The pow intrinsic folds to a `select` instruction to
  handle -infinity.
- The pow library call folds, with `ninf`, to `sqrt` without the
  `select` instruction associated with handling -infinity.
- The pow library call does not fold to `sqrt` without `ninf`.
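To make the difference concrete, here is a minimal host-level sketch, assuming a C library where `math_errhandling` includes `MATH_ERRNO` (e.g. glibc):
```
#include <cerrno>
#include <cmath>
#include <cstdio>

int main() {
  errno = 0;
  double P = std::pow(-INFINITY, 0.5); // +inf; pow raises no domain error here
  std::printf("pow : %f errno=%d\n", P, errno);

  errno = 0;
  double S = std::sqrt(-INFINITY); // domain error: NaN, errno becomes EDOM
  std::printf("sqrt: %f errno=%d\n", S, errno);
  return 0;
}
```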
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D87877
The motivation here is that MachineBlockPlacement relies on analyzeBranch to remove branches to fallthrough blocks when the branch is not fully analyzable. With the introduction of the FAULTING_OP pseudo for implicit null checking (see D87861), this case becomes important. Note that it is otherwise hard to exercise this path, as BranchFolding handles any fully analyzable branch sequence without using this interface.
p.s. For anyone who saw my comment in the original review, what I thought was an issue in BranchFolding originally turned out to simply be a bug in my patch. (Now fixed.)
Differential Revision: https://reviews.llvm.org/D88035
When pairing two ldr instructions into an ldp instruction, we cannot pair them
if one destination register is a sub- or super-register of the other.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D86906
This completes the circle, complementing -lto-embed-bitcode
(specifically, post-merge-pre-opt). Using -thinlto-assume-merged skips
function importing. The index file is still needed for the other data it
contains.
Differential Revision: https://reviews.llvm.org/D87949
Similar to ConstantRange::getActiveBits(), and to the similarly-named
methods in APInt, this returns the bitwidth needed to represent
the given signed constant range.
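A minimal sketch of how such a method could be computed (free-function form for illustration; the upstream implementation may differ):
```
#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"
#include <algorithm>

// Sketch only: every value in the range lies between the signed min and max
// endpoints, so the wider of the two endpoints bounds the whole range.
unsigned minSignedBits(const llvm::ConstantRange &CR) {
  if (CR.isEmptySet())
    return 0; // nothing to represent
  return std::max(CR.getSignedMin().getMinSignedBits(),
                  CR.getSignedMax().getMinSignedBits());
}
```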
As an exhaustive test shows, this logic is fully identical to the old
implementation, with the exception of the case where both of the operands
had empty ranges:
```
TEST_F(ConstantRangeTest, CVP_UDiv) {
  unsigned Bits = 4;
  EnumerateConstantRanges(Bits, [&](const ConstantRange &CR0) {
    if (CR0.isEmptySet())
      return;
    EnumerateConstantRanges(Bits, [&](const ConstantRange &CR1) {
      if (CR1.isEmptySet())
        return;
      unsigned MaxActiveBits = 0;
      for (const ConstantRange &CR : {CR0, CR1})
        MaxActiveBits = std::max(MaxActiveBits, CR.getActiveBits());
      // Start from the empty set and union in both operand ranges.
      ConstantRange OperandRange(Bits, /*isFullSet=*/false);
      for (const ConstantRange &CR : {CR0, CR1})
        OperandRange = OperandRange.unionWith(CR);
      unsigned NewWidth = OperandRange.getUnsignedMax().getActiveBits();
      EXPECT_EQ(MaxActiveBits, NewWidth) << CR0 << " " << CR1;
    });
  });
}
```
Much like APInt::getActiveBits(), this computes how many bits are needed
to represent every value in this constant range,
treating the values as unsigned.
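In code, that definition is short. A minimal sketch (free function for illustration; the in-tree method may differ):
```
#include "llvm/IR/ConstantRange.h"

// Sketch only: treating all members as unsigned, the largest member
// (the unsigned max) needs the most bits.
unsigned activeBits(const llvm::ConstantRange &CR) {
  if (CR.isEmptySet())
    return 0; // no values, no bits needed
  return CR.getUnsignedMax().getActiveBits();
}
```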
Use the fact that `~X` is equivalent to `-1 - X`, which gives us a
fully precise answer; we only need to special-handle the wrapped case.
This fires ~16k times for vanilla llvm test-suite + RawSpeed.
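A minimal sketch of the identity at the ConstantRange level, assuming the existing range subtraction is used (the helper name is illustrative):
```
#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"

// Sketch only: { -1 - x : x in CR }, computed with the existing range
// subtraction. As noted above, a wrapped input range would still need
// dedicated handling, which this sketch omits.
llvm::ConstantRange rangeBinaryNot(const llvm::ConstantRange &CR) {
  llvm::APInt MinusOne = llvm::APInt::getAllOnesValue(CR.getBitWidth());
  return llvm::ConstantRange(MinusOne).sub(CR);
}
```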
This is a continuation of 8d487668d09fb0e4e54f36207f07c1480ffabbfd;
the logic is pretty much identical for SRem:
```
Name: pos pos
Pre: C0 >= 0 && C1 >= 0
%r = srem i8 C0, C1
=>
%r = urem i8 C0, C1

Name: pos neg
Pre: C0 >= 0 && C1 <= 0
%r = srem i8 C0, C1
=>
%r = urem i8 C0, -C1

Name: neg pos
Pre: C0 <= 0 && C1 >= 0
%r = srem i8 C0, C1
=>
%t0 = urem i8 -C0, C1
%r = sub i8 0, %t0

Name: neg neg
Pre: C0 <= 0 && C1 <= 0
%r = srem i8 C0, C1
=>
%t0 = urem i8 -C0, -C1
%r = sub i8 0, %t0
```
https://rise4fun.com/Alive/Vd6
Now, this new logic does not result in any new catches
on vanilla llvm test-suite + RawSpeed,
but it should be virtually compile-time free,
and it may be important to handle sdiv and srem consistently:
if we had an sdiv-srem pair and only converted one of them,
-divrempairs would no longer see them as a pair,
and thus would not "merge" them.
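The four cases above collapse into one small routine. A minimal sketch under assumed helpers (`Domain` and `rewriteSRem` are illustrative names, not the actual CVP code):
```
#include "llvm/IR/IRBuilder.h"

using namespace llvm;

// Sketch only: the signs of both operands must be known for the rewrite.
enum class Domain { NonNegative, NonPositive, Unknown };

static Value *rewriteSRem(IRBuilder<> &B, Value *X, Value *Y, Domain DX,
                          Domain DY) {
  if (DX == Domain::Unknown || DY == Domain::Unknown)
    return nullptr;
  // Make each operand non-negative. The sign of srem follows the dividend,
  // so only a negated dividend requires negating the result back.
  Value *AbsX = DX == Domain::NonPositive ? B.CreateNeg(X) : X;
  Value *AbsY = DY == Domain::NonPositive ? B.CreateNeg(Y) : Y;
  Value *R = B.CreateURem(AbsX, AbsY);
  return DX == Domain::NonPositive ? B.CreateNeg(R) : R;
}
```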
The current code for handling pow(x, y) where y is an integer plus 0.5
is not explicitly guarded against attempting to transform the case where
abs(y) is exactly 0.5.
The latter case is meant to be handled by `replacePowWithSqrt`. Indeed,
if the pow(x, integer+0.5) case proceeds past a certain point, it will
hit an assertion by attempting to form pow(x, 0) using `getPow`.
This patch adds an explicit check to prevent attempting the
pow(x, integer+0.5) transformation on pow(x, +/-0.5) as suggested during
the review of D87877. This has the effect of retaining the shrinking of
`pow` to `powf` when the `sqrt` libcall cannot be formed.
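A hedged sketch of the shape of the check, with `ExpoA` standing for the absolute value of the constant exponent as in LibCallSimplifier (names are illustrative):
```
#include "llvm/ADT/APFloat.h"

// Sketch only: pow(x, +/-0.5) must be left to replacePowWithSqrt(), which
// knows when the sqrt libcall may (or may not) be formed.
static bool isHalfExponent(const llvm::APFloat &ExpoA) {
  return ExpoA.isExactlyValue(0.5); // ExpoA is already an absolute value
}
```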
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D88066
The subject test was not actually running. This patch adds the
relevant suffix to the list of lit case filename extensions for the
enclosing directory.
Minor adjustments are also made to deal with bit rot.
Reviewed By: daltenty
Differential Revision: https://reviews.llvm.org/D87122
This patch implements the vector string isolate (predicate and non-predicate
versions) builtins. The predicate builtins are custom selected within PPCISelDAGToDAG.
Differential Revision: https://reviews.llvm.org/D87671
This patch implements the 128-bit vector divide extended builtins in Clang/LLVM.
These builtins map to the vdivesq and vdiveuq instructions respectively.
Differential Revision: https://reviews.llvm.org/D87729
Just scalarize trunc stores - GenWidenVectorTruncStores does the same thing but is flawed (PR42046) and unused.
Differential Revision: https://reviews.llvm.org/D87708
Refactored __tgt_target_data_begin_mapper_<issue|wait> to receive the handle as an input/output argument.
This avoids the compiler warning about returning the handle by copy.
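Illustratively, the refactor follows the usual return-by-copy to in/out-parameter pattern (parameter lists below are elided, and the `_before`/`_after` suffixes are hypothetical):
```
struct __tgt_async_info; // opaque here; real fields elided

// Before (sketch): the handle was returned by copy, triggering the warning.
__tgt_async_info __tgt_target_data_begin_mapper_issue_before(/* ... */);

// After (sketch): the caller-owned handle is filled through an in/out pointer.
void __tgt_target_data_begin_mapper_issue_after(/* ..., */
                                                __tgt_async_info *Handle);
```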
Differential Revision: https://reviews.llvm.org/D88029
Re-use an optimization from the old LTO API (used by ld64).
This sorts modules in descending order of bitcode size, so that larger modules are processed first and smaller modules are processed last; the small modules fill free thread 'slots' toward the end and thus give better multi-threaded load balancing.
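A minimal sketch of the scheduling idea (`ModuleSpec` and its fields are hypothetical stand-ins for the real LTO data structures):
```
#include <algorithm>
#include <cstddef>
#include <vector>

struct ModuleSpec {
  const char *Name;
  std::size_t BitcodeSize;
};

// Largest bitcode first: the long-running backends start early, and the
// small modules left at the end fill idle thread 'slots'.
void sortForBalancedBackends(std::vector<ModuleSpec> &Modules) {
  std::sort(Modules.begin(), Modules.end(),
            [](const ModuleSpec &A, const ModuleSpec &B) {
              return A.BitcodeSize > B.BitcodeSize;
            });
}
```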
In our case (on dual Intel Xeon Gold 6140, Windows 10 version 2004, two-stage build), this saves 15 sec when linking `clang.exe` with LLD & `-flto=thin`, `/opt:lldltojobs=all`, no ThinLTO cache, -DLLVM_INTEGRATED_CRT_ALLOC=d:\git\rpmalloc.
Before patch: 102 sec
After patch: 85 sec
Inspired by the work done by David Callahan in D60495.
Differential Revision: https://reviews.llvm.org/D87966
This provides a convenient way to print VPValues and recipes in a
debugger. In particular, it saves the user from instantiating a
VPSlotTracker just to print recipes or values.
Stop combining loads and stores with PPCISD::ADD_TLS before we can merge the
node with TLS_LOCAL_EXEC_MAT_ADDR. The issue is that
TLS_LOCAL_EXEC_MAT_ADDR cannot be selected by itself and requires the previous
ADD_TLS node that goes with it. However, we sometimes try to combine ADD_TLS
with loads and stores that come after it. If this happens, the ADD_TLS is
removed and TLS_LOCAL_EXEC_MAT_ADDR cannot be selected.
While this bug fix addresses the issue, it may not be ideal from a performance
perspective: we may be able to add patterns that combine TLS_LOCAL_EXEC_MAT_ADDR
with ADD_TLS and the load or store that comes after it, all in one. However,
this is beyond the scope of this patch.
Reviewed By: NeHuang
Differential Revision: https://reviews.llvm.org/D88030