1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 03:02:36 +01:00
Commit Graph

204005 Commits

Author SHA1 Message Date
Lang Hames
cafb3faae9 [ORC][examples] Add missing library dependencies. 2020-09-22 16:18:00 -07:00
Hubert Tong
ecf7743d95 [InstCombine][NFC][tests] Add ninf base value case to pow-sqrt.ll 2020-09-22 18:58:05 -04:00
Hubert Tong
07e12367b2 [InstCombine] Fix errno bug in pow expansion to sqrt
A conversion from `pow` to `sqrt` shall not call an `errno`-setting
`sqrt` with -//infinity//: the `sqrt` will set `EDOM` where the `pow`
call need not.

This patch avoids the erroneous (pun not intended) transformation by
applying the restrictions discussed in the thread for
https://lists.llvm.org/pipermail/llvm-dev/2020-September/145051.html.

The existing tests are updated (depending on emphasis in the checks for
library calls, avoidance of overlap, and overall coverage):
  - to add `ninf`, retaining the intended library call,
  - to use the intrinsic, retaining the use of `select`, or
  - to expect the replacement to not occur.

The following is tested:
  - The pow intrinsic folds to a `select` instruction to
    handle -//infinity//.
  - The pow library call folds, with `ninf`, to `sqrt` without the
    `select` instruction associated with handling -//infinity//.
  - The pow library call does not fold to `sqrt` without `ninf`.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D87877
2020-09-22 18:58:05 -04:00
Alexey Bataev
47531738d7 [SLP]Fix coding style, NFC. 2020-09-22 17:44:29 -04:00
Philip Reames
c9c01d3eea [AArch64] Teach analyzeBranch to remove branch equivelent to fallthrough
The motivation here is that MachineBlockPlacement relies on analyzeBranch to remove branches to fallthrough blocks when the branch is not fully analyzeable. With the introduction of the FAULTING_OP psuedo for implicit null checking (see D87861), this case becomes important. Note that it's hard to otherwise exercise this path as BranchFolding handle's any fully analyzeable branch sequence without using this interface.

p.s. For anyone who saw my comment in the original review, what I thought was an issue in BranchFolding originally turned out to simply be a bug in my patch. (Now fixed.)

Differential Revision: https://reviews.llvm.org/D88035
2020-09-22 14:38:27 -07:00
Michael Liao
6003dbfc22 Fix build due to renaming in LoopInfo. 2020-09-22 17:33:38 -04:00
Fangrui Song
9e917a5915 Change LoopInfo::empty to isInnermost after D82895 2020-09-22 14:07:40 -07:00
Stefanos Baziotis
22876f12a9 Small fixes for "[LoopInfo] empty() -> isInnermost(), add isOutermost()" 2020-09-22 23:59:34 +03:00
Reid Kleckner
6fefa81755 Revert "[CodeGen] emit CG profile for COFF object file"
This reverts commit 91aed9bf975f1e4346cc8f4bdefc98436386ced2, it is
causing link errors.
2020-09-22 13:47:39 -07:00
Stefanos Baziotis
b245c32b48 [LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
2020-09-22 23:28:51 +03:00
Congzhe Cao
a77f090539 [AArch64] Avoid pairing loads with same result reg
When pairing ldr instructions to an ldp instruction, we cannot pair two ldr
destination registers where one is a sub or super register of the other.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D86906
2020-09-22 16:25:08 -04:00
Mircea Trofin
3c9a842f53 [ThinLTO] Option to bypass function importing.
This completes the circle, complementing -lto-embed-bitcode
(specifically, post-merge-pre-opt). Using -thinlto-assume-merged skips
function importing. The index file is still needed for the other data it
contains.

Differential Revision: https://reviews.llvm.org/D87949
2020-09-22 13:12:11 -07:00
Paul C. Anagnostopoulos
54f163f8a4 Two patches to fix the broken build.
One to fix a C++ compiler warning.
One to allow Sphinx to find a new document.
2020-09-22 16:00:31 -04:00
Roman Lebedev
15f7fc0a26 [CVP] Narrow SDiv/SRem to the smallest power-of-2 that's sufficient to contain its operands
This is practically identical to what we already do for UDiv/URem:
  https://rise4fun.com/Alive/04K

Name: narrow udiv
Pre: C0 u<= 255 && C1 u<= 255
%r = udiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = udiv i8 %t0, %t1
%r = zext i8 %t2 to i16

Name: narrow exact udiv
Pre: C0 u<= 255 && C1 u<= 255
%r = udiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = udiv exact i8 %t0, %t1
%r = zext i8 %t2 to i16

Name: narrow urem
Pre: C0 u<= 255 && C1 u<= 255
%r = urem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = urem i8 %t0, %t1
%r = zext i8 %t2 to i16

... only here we need to look for 'min signed bits', not 'active bits',
and there's an UB to be aware of:
  https://rise4fun.com/Alive/KG86
  https://rise4fun.com/Alive/LwR

Name: narrow sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = sdiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = sdiv i9 %t0, %t1
%r = sext i9 %t2 to i16

Name: narrow exact sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = sdiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = sdiv exact i9 %t0, %t1
%r = sext i9 %t2 to i16

Name: narrow srem
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = srem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = srem i9 %t0, %t1
%r = sext i9 %t2 to i16


Name: narrow sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = sdiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = sdiv i8 %t0, %t1
%r = sext i8 %t2 to i16

Name: narrow exact sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = sdiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = sdiv exact i8 %t0, %t1
%r = sext i8 %t2 to i16

Name: narrow srem
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = srem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = srem i8 %t0, %t1
%r = sext i8 %t2 to i16


The ConstantRangeTest.losslessSignedTruncationSignext test sanity-checks
the logic, that we can losslessly truncate ConstantRange to
`getMinSignedBits()` and signext it back, and it will be identical
to the original CR.

On vanilla llvm test-suite + RawSpeed, this fires 1262 times,
while the same fold for UDiv/URem only fires 384 times. Sic!

Additionally, this causes +606.18% (+1079) extra cases of
aggressive-instcombine.NumDAGsReduced, and +473.14% (+1145)
of aggressive-instcombine.NumInstrsReduced folds.
2020-09-22 21:37:30 +03:00
Roman Lebedev
6323cbc627 [NFC][CVP] Add tests for SDiv/SRem narrowing 2020-09-22 21:37:30 +03:00
Roman Lebedev
6e34bfa0f0 [NFC][CVP] Give a better name STATISTIC() counting udiv i16 -> udiv i8 xforms 2020-09-22 21:37:30 +03:00
Roman Lebedev
7f7780d776 [ConstantRange] Introduce getMinSignedBits() method
Similar to the ConstantRange::getActiveBits(), and to similarly-named
methods in APInt, returns the bitwidth needed to represent
the given signed constant range
2020-09-22 21:37:30 +03:00
Roman Lebedev
8b5ad85b2f [NFC][APInt] Refactor getMinSignedBits() in terms of getNumSignBits()
This is fully identical to the old implementation, just easier to read.
2020-09-22 21:37:29 +03:00
Roman Lebedev
c2fd2cd7a1 [NFC][CVP] processUDivOrURem(): refactor to use ConstantRange::getActiveBits()
As an exhaustive test shows, this logic is fully identical to the old
implementation, with exception of the case where both of the operands
had empty ranges:

```
TEST_F(ConstantRangeTest, CVP_UDiv) {
  unsigned Bits = 4;
  EnumerateConstantRanges(Bits, [&](const ConstantRange &CR0) {
    if(CR0.isEmptySet())
      return;
    EnumerateConstantRanges(Bits, [&](const ConstantRange &CR1) {
      if(CR0.isEmptySet())
        return;

      unsigned MaxActiveBits = 0;
      for (const ConstantRange &CR : {CR0, CR1})
        MaxActiveBits = std::max(MaxActiveBits, CR.getActiveBits());

      ConstantRange OperandRange(Bits, /*isFullSet=*/false);
      for (const ConstantRange &CR : {CR0, CR1})
        OperandRange = OperandRange.unionWith(CR);
      unsigned NewWidth = OperandRange.getUnsignedMax().getActiveBits();

      EXPECT_EQ(MaxActiveBits, NewWidth) << CR0 << " " << CR1;
    });
  });
}
```
2020-09-22 21:37:29 +03:00
Roman Lebedev
ebabbd05e9 [ConstantRange] Introduce getActiveBits() method
Much like APInt::getActiveBits(), computes how many bits are needed
to be able to represent every value in this constant range,
treating the values as unsigned.
2020-09-22 21:37:29 +03:00
Roman Lebedev
753cbd3c43 [ConstantRange] binaryXor(): special-case binary complement case - the result is precise
Use the fact that `~X` is equivalent to `-1 - X`, which gives us
fully-precise answer, and we only need to special-handle the wrapped case.

This fires ~16k times for vanilla llvm test-suite + RawSpeed.
2020-09-22 21:37:29 +03:00
Roman Lebedev
3c7246f864 [CVP] Enhance SRem -> URem fold to work not just on non-negative operands
This is a continuation of 8d487668d09fb0e4e54f36207f07c1480ffabbfd,
the logic is pretty much identical for SRem:

Name: pos pos
Pre: C0 >= 0 && C1 >= 0
%r = srem i8 C0, C1
  =>
%r = urem i8 C0, C1

Name: pos neg
Pre: C0 >= 0 && C1 <= 0
%r = srem i8 C0, C1
  =>
%r = urem i8 C0, -C1

Name: neg pos
Pre: C0 <= 0 && C1 >= 0
%r = srem i8 C0, C1
  =>
%t0 = urem i8 -C0, C1
%r = sub i8 0, %t0

Name: neg neg
Pre: C0 <= 0 && C1 <= 0
%r = srem i8 C0, C1
  =>
%t0 = urem i8 -C0, -C1
%r = sub i8 0, %t0

https://rise4fun.com/Alive/Vd6

Now, this new logic does not result in any new catches
as of vanilla llvm test-suite + RawSpeed.
but it should be virtually compile-time free,
and it may be important to be consistent in their handling,
because if we had a pair of sdiv-srem, and only converted one of them,
-divrempairs will no longer see them as a pair,
and thus not "merge" them.
2020-09-22 21:37:28 +03:00
Roman Lebedev
7819096dae [NFC][CVP] Add tests for srem with potentially different sigdness domains 2020-09-22 21:37:28 +03:00
Arthur Eubanks
31377eefb4 [test][NewPM] Pin do-nothing-intrinsic.ll to legacy PM
It tests CallGraph infra around the legacy PM which isn't relevant in NPM.
2020-09-22 11:33:38 -07:00
Arthur Eubanks
9c2efe1cd8 [LoopInfo][NewPM] Fix tests in Analysis/LoopInfo under NPM 2020-09-22 11:31:00 -07:00
Hubert Tong
d242891bee [InstCombine] For pow(x, +/-0.5), stop falling into pow(x, 1.5), etc. case
The current code for handling pow(x, y) where y is an integer plus 0.5
is not explicitly guarded against attempting to transform the case where
abs(y) is exactly 0.5.

The latter case is meant to be handled by `replacePowWithSqrt`. Indeed,
if the pow(x, integer+0.5) case proceeds past a certain point, it will
hit an assertion by attempting to form pow(x, 0) using `getPow`.

This patch adds an explicit check to prevent attempting the
pow(x, integer+0.5) transformation on pow(x, +/-0.5) as suggested during
the review of D87877. This has the effect of retaining the shrinking of
`pow` to `powf` when the `sqrt` libcall cannot be formed.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D88066
2020-09-22 14:23:32 -04:00
Hubert Tong
d179ce53d3 [NFC] Replace tabs with spaces in PPCInstrPrefix.td 2020-09-22 14:23:32 -04:00
Hubert Tong
c17d225889 [test][MC] Rehabilitate llvm/test/MC/COFF/bigobj.py
The subject test was not actually running. This patch adds the
relevant suffix to the list of lit case filename extensions for the
enclosing directory.

Minor adjustments are also made to deal with bit rot.

Reviewed By: daltenty

Differential Revision: https://reviews.llvm.org/D87122
2020-09-22 14:23:32 -04:00
LLVM GN Syncbot
260bab443f [gn build] Port 8a64689e264 2020-09-22 18:07:36 +00:00
LLVM GN Syncbot
a70054367a [gn build] Port 848d66fafd2 2020-09-22 18:07:35 +00:00
Paul C. Anagnostopoulos
82dfae475d Version 0.5 of the new "TableGen Backend Developer's Guide."
Files modified to take comments into account.
MLIR documentation updated for new TableGen documentation files.
2020-09-22 14:01:52 -04:00
Mircea Trofin
b436344fb4 [NFC][regalloc] Simplify/conform to style guide indvars in Greedy
Differential Revision: https://reviews.llvm.org/D88055
2020-09-22 10:55:52 -07:00
Amy Kwan
4778b6148c [PowerPC] Implement Vector String Isolate Builtins in Clang/LLVM
This patch implements the vector string isolate (predicate and non-predicate
versions) builtins. The predicate builtins are custom selected within PPCISelDAGToDAG.

Differential Revision: https://reviews.llvm.org/D87671
2020-09-22 11:31:44 -05:00
Amy Kwan
8864893cb8 [PowerPC] Implement the 128-bit Vector Divide Extended Builtins in Clang/LLVM
This patch implements the 128-bit vector divide extended builtins in Clang/LLVM.
These builtins map to the vdivesq and vdiveuq instructions respectively.

Differential Revision: https://reviews.llvm.org/D87729
2020-09-22 11:31:44 -05:00
Simon Pilgrim
024914484d [DAG] Remove DAGTypeLegalizer::GenWidenVectorTruncStores (PR42046)
Just scalarize trunc stores - GenWidenVectorTruncStores does the same thing but is flawed (PR42046) and unused.

Differential Revision: https://reviews.llvm.org/D87708
2020-09-22 17:24:45 +01:00
Alexandre Ganea
cdedaef996 Silence 'warning: unused variable' when compiling with Clang 10.0 2020-09-22 12:17:40 -04:00
Hamilton Tobon Mosquera
178800adfb [OpenMPOpt] Refactored "issue" and "wait" declarations for data map runtime call.
Refactored __tgt_target_data_begin_mapper_<issue|wait> to receive the handle as an input/output argument.
This given the compiler warning of returning the handle as copy.

Differential Revision: https://reviews.llvm.org/D88029
2020-09-22 10:50:17 -05:00
Arthur Eubanks
6d75c366ed [DI][ASan][NewPM] Fix some DebugInfo ASan tests under NPM 2020-09-22 08:28:54 -07:00
Alexandre Ganea
09fd7108ad [ThinLTO] Re-order modules for optimal multi-threaded processing
Re-use an optimizition from the old LTO API (used by ld64).
This sorts modules in ascending order, based on bitcode size, so that larger modules are processed first. This allows for smaller modules to be process last, and better fill free threads 'slots', and thusly allow for better multi-thread load balancing.

In our case (on dual Intel Xeon Gold 6140, Windows 10 version 2004, two-stage build), this saves 15 sec when linking `clang.exe` with LLD & `-flto=thin`, `/opt:lldltojobs=all`, no ThinLTO cache, -DLLVM_INTEGRATED_CRT_ALLOC=d:\git\rpmalloc.

Before patch: 102 sec
After patch: 85 sec

Inspired by the work done by David Callahan in D60495.

Differential Revision: https://reviews.llvm.org/D87966
2020-09-22 11:25:59 -04:00
Arthur Eubanks
ac65fcae49 [GVNSink][NewPM] Add GVNSinkPass to PassRegistry.def 2020-09-22 08:24:09 -07:00
Florian Hahn
f248f61115 [VPlan] Add dump() helper to VPValue & VPRecipeBase.
This provides a convenient way to print VPValues and recipes in a
debugger. In particular it saves the user from instantiating
VPSlotTracker to print recipes or values.
2020-09-22 15:55:16 +01:00
Michael Liao
43a4d1cdda [PeepholeOptimizer] Enhance the redundant COPY elimination.
- Eliminate redundant COPYs from the same register & subregister pair.

Differential Revision: https://reviews.llvm.org/D87939
2020-09-22 10:11:37 -04:00
Simon Pilgrim
f0dc47fc06 [X86] Add missing namespace closure comments. NFCI.
Fixes some clang-tidy llvm-namespace-comment warnings.
2020-09-22 15:06:59 +01:00
Simon Pilgrim
f735dd1bec [X86] Cleanup/add namespace closure comments. NFCI.
Fixes some clang-tidy llvm-namespace-comment warnings.
2020-09-22 15:06:58 +01:00
Stefan Pintilie
12f9b13618 [PowerPC] Fix for compiler side issue in PCRelative Local Exec
Stop combining loads and stores with PPCISD::ADD_TLS before we can merge the
node with with TLS_LOCAL_EXEC_MAT_ADDR. The issue is that
TLS_LOCAL_EXEC_MAT_ADDR cannot be selected by itself and requires the previous
ADD_TLS node that goes with it. However, we sometimes try to combine ADD_TLS
with loads and stores that come after it. If this happens then the ADD_TLS is
removed and TLS_LOCAL_EXEC_MAT_ADDR cannot be selected.

While this bug fix will address the issue it my not be ideal from a performance
perspective as we may be able to add patterns to combine TLS_LOCAL_EXEC_MAT_ADDR
with ADD_TLS with the load and store that comes after it all in one. However,
this is beyond the scope of this patch.

Reviewed By: NeHuang

Differential Revision: https://reviews.llvm.org/D88030
2020-09-22 08:28:06 -05:00
Sanjay Patel
db9cccedfd [SLP] reduce code duplication for checking parent block; NFC 2020-09-22 09:21:20 -04:00
Sanjay Patel
87c6ed7e4f [SLP] move misplaced code comments; NFC 2020-09-22 09:21:20 -04:00
Sanjay Patel
207c01f738 [SLP] clean up code in gather(); NFC
1. Use range for-loop to avoid repeatedly accessing end index.
2. Better variable names.
2020-09-22 09:21:20 -04:00
Simon Pilgrim
85b5840339 [SLP] Merge null and dyn_cast<> checks into dyn_cast_or_null<>. NFCI. 2020-09-22 14:01:47 +01:00
Sam Parker
bd6fe01b0b [ARM] Trying to fix asan buildbot 2020-09-22 13:43:23 +01:00