1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 04:32:44 +01:00
Commit Graph

201615 Commits

Author SHA1 Message Date
Vitaly Buka
567623e324 Revert "[NFC][StackSafety] Add index test"
This reverts commit 5fd49911db546cda6b35dffb6be440385e8d96d5.

GUIDs don't match.
2020-08-08 21:26:35 -07:00
Vitaly Buka
49cd07bb08 [NFC][StackSafety] Add index test
This directly covers generateParamAccessSummary
2020-08-08 19:11:02 -07:00
Vitaly Buka
115fa28020 [NFC][StackSafety] noinline in alias tests 2020-08-08 18:21:52 -07:00
weihe
68e894ab3c [llvm-profdata] Implement llvm-profdata overlap for sample profiles
Implemented the `llvm-profdata overlap` feature for sample profiles. It reports weighted //similarity// and unweighted //overlap// metrics at program and function level for two input profiles. Similarity metrics are symmetric with regards to the order of two input profiles. By default, the tool only reports program-level summary. Users can look into function-level details via additional options `--function`, `--similarity-cutoff`, and `--value-cutoff`.

The similarity metrics are designed as follows:
* Program-level summary
    * Whole program profile similarity is an aggregate over function-level similarity `FS`: `PS = sum(FS(A) * avg_weight(A))` for all function `A`.
    * Whole program sample overlap: `PSO = common_samples / total_samples`.
    * Function overlap: `FO = #common_function / #total_function`.
    * Hot-function overlap: `HFO = #common_hot_function / #total_hot_function`.
    * Hot-block overlap: `HBO = #common_hot_block / #total_hot_block`.
* Function-level details
    * Function-level similarity is an aggregate over line/block-level similarities `BS` of all sample lines/blocks in the function, weighted by the closeness of the function's weights in two profiles: `FS = sum(BS(i)) * (1 - weight_distance(A))`.
    * Function-level sample overlap: `FSO = common_samples / total_samples` for samples in the function.

Reviewed By: wenlei, hoyFB, wmi

Differential Revision: https://reviews.llvm.org/D83852
2020-08-08 17:49:48 -07:00
Petr Hosek
6c27d09879 Revert "[CMake] Simplify CMake handling for zlib"
This reverts commit ccbc1485b55ff4acd21bcfafbf7aec4ed0fd818d which
is still failing on the Windows MLIR bots.
2020-08-08 17:08:23 -07:00
Petr Hosek
af8170b5ad [CMake] Simplify CMake handling for zlib
Rather than handling zlib handling manually, use find_package from CMake
to find zlib properly. Use this to normalize the LLVM_ENABLE_ZLIB,
HAVE_ZLIB, HAVE_ZLIB_H. Furthermore, require zlib if LLVM_ENABLE_ZLIB is
set to YES, which requires the distributor to explicitly select whether
zlib is enabled or not. This simplifies the CMake handling and usage in
the rest of the tooling.

This is a reland of abb0075 with all followup changes and fixes that
should address issues that were reported in PR44780.

Differential Revision: https://reviews.llvm.org/D79219
2020-08-08 16:44:08 -07:00
Thomas Lively
23fb49ac5f [WebAssembly] Fix FastISel address calculation bug
Fixes PR47040, in which an assertion was improperly triggered during
FastISel's address computation. The issue was that an `Address` set to
be relative to the FrameIndex with offset zero was incorrectly
considered to have an unset base. When the left hand side of an add
set the Address to be 0 off the FrameIndex, the right side would not
detect that the Address base had already been set and could try to set
the Address to be relative to a register instead, triggering an
assertion.

This patch fixes the issue by explicitly tracking whether an `Address`
has been set rather than interpreting an offset of zero to mean the
`Address` has not been set.

Differential Revision: https://reviews.llvm.org/D85581
2020-08-08 15:23:11 -07:00
Craig Topper
2826efbf81 [X86] Remove a DCI.isBeforeLegalize() call from combineVSelectWithAllOnesOrZeros.
This was blocking isTypeLegal call so that we could do a particular
transform on illegal types before type legalization. But the we
create a target specific node using that type. We shouldn't do
that if the type isn't legal. So I think we should just always
make sure the type is legal.

I suspect that in order to get the condition VT to not be a vector
of i1 we already completed type legalization anyway so this probably
doesn't matter much in practice.
2020-08-08 14:19:13 -07:00
Roman Lebedev
2f56715ea7 [Reduce] Rewrite function body delta pass again
It is not enough to replace all uses of users of the function with undef,
the users, we only drop instruction users, so they may stick around.

Let's try different approach - first drop bodies for all the functions
we will drop, which should take care of blockaddress issue the previous
rewrite was dealing with; then, after dropping *all* such bodies,
replace remaining uses with undef (thus all the uses are either
outside of functions, or are in kept functions)
and then finally drop functions.

This seems to work, and passes the *existing* test coverage,
but it is possible that a new issue will be discovered later :)

A new (previously crashing) test added.
2020-08-08 23:48:44 +03:00
Craig Topper
ffa193cdec [X86] Support matching VPTERNLOG when the root node is X86ISD::ANDNP. 2020-08-08 13:11:47 -07:00
Craig Topper
83ecd17734 [X86] Add VPTERNLOG test cases where the root node will be X86ISD::ANDNP. NFC
We currently fail to match this.
2020-08-08 12:53:28 -07:00
Dávid Bolvanský
5bc09382f4 [AArch64RegisterInfo] Supress new warning 2020-08-08 21:47:01 +02:00
Craig Topper
3821fbba14 [X86] Remove isSafeToClobberEFLAGS helper and just inline it into the call sites.
This is just a thin wrapper around computeRegisterLivness which
we can just call directly. The only real difference is that
isSafeToClobberEFLAGS returns a bool and computeRegisterLivness
returns an enum. So we need to check for the specific enum value
that isSafeToClobberEFLAGS was hiding.

I've also adjusted which sites pass an explicit value for
Neighborhood since the default for computeRegisterLivness is 10.
2020-08-08 12:31:58 -07:00
Craig Topper
c01ba6831e Recommit "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places"
I messed up the bug numbers in the commit message before

Previously this function searched 4 instructions forwards or
backwards to determine if it was ok to clobber eflags.

This is called in 3 places: rematerialization, turning 2 operand
leas into adds or splitting 3 ops leas into an lea and add on some
CPU targets.

This patch increases the search limit to 10 instructions for
rematerialization and 2 operand lea to add. I've left the old
treshold for 3 ops lea spliting as that increases code size.

Fixes PR47024 and PR46315.
2020-08-08 11:53:14 -07:00
Craig Topper
2d70f5fabd Revert "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places"
This reverts commit 44b260cb0aab387d85e4d59c16fc7b8866264f5e.

I messed up the bug number in the commit message so I'm reverting
to fix it.
2020-08-08 11:53:14 -07:00
Dávid Bolvanský
b26d8d1ccf [FileCheckTest] Supress new warning 2020-08-08 20:45:24 +02:00
Simon Pilgrim
9b68c08ed2 [X86][SSE] combineTargetShuffle - use scaleShuffleMask helper to widen shuffle mask. NFCI.
Use scaleShuffleMask helper for the shuffle(hadd,hadd) canonicalization.
2020-08-08 19:36:18 +01:00
Craig Topper
b998790f91 [X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places
Previously this function searched 4 instructions forwards or
backwards to determine if it was ok to clobber eflags.

This is called in 3 places: rematerialization, turning 2 operand
leas into adds or splitting 3 ops leas into an lea and add on some
CPU targets.

This patch increases the search limit to 10 instructions for
rematerialization and 2 operand lea to add. I've left the old
treshold for 3 ops lea spliting as that increases code size.

Fixes PR47024 and PR43014
2020-08-08 11:29:41 -07:00
Simon Pilgrim
733733f727 [InstCombine] Use CreateVectorSplat(ElementCount) variant directly
This was introduced at rGe20223672100, and the CreateVectorSplat(unsigned NumElements) variant calls it internally
2020-08-08 19:26:02 +01:00
Roman Lebedev
903dd081e7 [SimplifyCFG] Fix invoke->call fold w/ multiple invokes in presence of lifetime intrinsics
SimplifyCFG has two main folds for resumes - one when resume is directly
using the landingpad, and the other one where resume is using a PHI node.

While for the first case, we were already correctly ignoring all the
PHI nodes, and both the debug info intrinsics and lifetime intrinsics,
in the PHI-based-one, we weren't ignoring PHI's in the resume block,
and weren't ignoring lifetime intrinsics. That is clearly a bug.

On RawSpeed library, this results in +9.34% (+81) more invoke->call folds,
-0.19% (-39) landing pads, -0.24% (-81) invoke instructions
but +51 call instructions and -132 basic blocks.

Though, the run-time performance impact appears to be within the noise.
2020-08-08 20:00:28 +03:00
Roman Lebedev
5027b49a96 [NFC][SimplifyCFG] Rewrite isCleanupBlockEmpty() to be iterator_range-based 2020-08-08 20:00:28 +03:00
Roman Lebedev
b96154f169 [NFC][SimplifyCFG] Add a test showing invoke->call simplification failure 2020-08-08 20:00:28 +03:00
Roman Lebedev
152f29684a [NFC][SimplifyCFG] Count the number of invokes turned into calls due to empty cleanup blocks 2020-08-08 20:00:27 +03:00
Sanjay Patel
33d6e8d6a8 [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division, part 2
Follow-up to D82716 / rGea71ba11ab11
We do not have the fabs removal fold in IR yet for the case
where the sqrt operand is repeated, so that's another potential
improvement.
2020-08-08 10:38:06 -04:00
Sanjay Patel
d58ed6d476 [x86] add tests for another reciprocal sqrt pattern; NFC 2020-08-08 10:38:06 -04:00
Benjamin Kramer
ba661f5c49 lib/CodeGen doesn't depend on lib/Passes. 2020-08-08 13:40:24 +02:00
Rainer Orth
2697fb8b1e [test][DebugInfo] Adapt two tests for Sun assembler syntax on Sparc
Two DebugInfo tests currently `FAIL` on Sparc:

  LLVM :: DebugInfo/Generic/2010-06-29-InlinedFnLocalVar.ll
  LLVM :: DebugInfo/Generic/array.ll

both in a similar way.  E.g.

  : 'RUN: at line 1';   /var/llvm/local-sparcv9-A/bin/llc -O2 /vol/llvm/src/llvm-project/local/llvm/test/DebugInfo/Generic/2010-06-29-InlinedFnLocalVar.ll -o - | /var/llvm/local-sparcv9-A/bin/FileCheck /vol/llvm/src/llvm-project/local/llvm/test/DebugInfo/Generic/2010-06-29-InlinedFnLocalVar.ll

  /vol/llvm/src/llvm-project/local/llvm/test/DebugInfo/Generic/2010-06-29-InlinedFnLocalVar.ll:4:10: error: CHECK: expected string not found in input
  ; CHECK: debug_info,
           ^

On `amd64-pc-solaris2.11`, the corresponding line is

  .section        .debug_info,"",@progbits

while on `sparcv9-sun-solaris2.11` we have only

  .section        .debug_info

This happens because Sparc currently emits `.section` directives using the
style of the Solaris/SPARC assembler (controlled by `SunStyleELFSectionSwitchSyntax`).

This patch takes the easy way out and allows both forms while tightening the
check to only match the `.section` directive.

Tested on `sparcv9-sun-solaris2.11`, `amd64-pc-solaris2.11`,
`x86_64-pc-linux-gnu`, and `x86_64-apple-darwin20.0.0`.

Differential Revision: https://reviews.llvm.org/D85414
2020-08-08 09:13:47 +02:00
Juneyoung Lee
8d60604a96 [InstCombine] Optimize select(freeze(icmp eq/ne x, y), x, y)
This patch adds an optimization that folds select(freeze(icmp eq/ne x, y), x, y)
to x or y.
This was needed to resolve slowdown after D84940 is applied.

I tried to bake this logic into foldSelectInstWithICmp, but it wasn't clear.
This patch conservatively writes the pattern in a separate function,
foldSelectWithFrozenICmp.

The output does not need freeze; https://alive2.llvm.org/ce/z/X49hNE (from @nikic)

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85533
2020-08-08 15:22:29 +09:00
Juneyoung Lee
2677ef03a3 [InstCombine] Add tests for select(freeze(icmp x, y), x, y); NFC 2020-08-08 15:09:08 +09:00
Craig Topper
b2ec81eb93 [X86] Limit the scope of the min/max canonicalization in combineSelect
Previously the transform was doing these two canonicalizations
(x > y) ? x : y -> (x >= y) ? x : y
(x < y) ? x : y -> (x <= y) ? x : y

But those don't seem to be useful generally. And they actively
pessimize the cases in PR47049.

This patch limits it to
(x > 0) ? x : 0 -> (x >= 0) ? x : 0
(x < -1) ? x : -1 -> (x <= -1) ? x : -1

These are the cases mentioned in the comments as the motivation
for the canonicalization. These allow the CMOV to use the S
flag from the compare thus improving opportunities to use a TEST
or the flags from an arithmetic instruction.
2020-08-07 22:51:49 -07:00
Keno Fischer
d9083126e6 [X86] Don't produce bad x86andp nodes for i1 vectors
In D85499, I attempted to fix this same issue by canonicalizing
andnp for i1 vectors, but since there was some opposition to such
a change, this commit just fixes the bug by using two different
forms depending on which kind of vector type is in use. We can
then always decide to switch the canonical forms later.

Description of the original bug:
We have a DAG combine that tries to fold (vselect cond, 0000..., X) -> (andnp cond, x).
However, it does so by attempting to create an i64 vector with the number
of elements obtained by truncating division by 64 from the bitwidth. This is
bad for mask vectors like v8i1, since that division is just zero. Besides,
we don't want i64 vectors anyway. For i1 vectors, switch the pattern
to (andnp (not cond), x), which is the canonical form for `kandn`
on mask registers.

Fixes https://github.com/JuliaLang/julia/issues/36955.

Differential Revision: https://reviews.llvm.org/D85553
2020-08-07 20:05:47 -04:00
LLVM GN Syncbot
f4add55acc [gn build] Port f5b5ccf2a68 2020-08-07 23:43:14 +00:00
Yuanfang Chen
526e29b5b9 Reland "Revert "[NewPM][CodeGen] Introduce machine pass and machine pass manager""
This relands commit 320eab2d558fde0b61437e9b9075bfd301c2c474.

The test failed because it was looking for x86-linux target
unconditionally. Now it gets the default target.
2020-08-07 16:40:49 -07:00
Matt Arsenault
dee6e5cc20 AMDGPU: Avoid explicitly listing all the memory nodes 2020-08-07 19:22:46 -04:00
Vitaly Buka
3e043ca45f [NFC][StackSafety] Fix statistics 2020-08-07 16:18:52 -07:00
Arthur Eubanks
834a6fc438 [NewPM] Print 'Skipping pass' as pass instrumentation
If OptNoneInstrumentation prints it instead, 'Skipping pass' will print for even required passes.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D85493
2020-08-07 15:02:02 -07:00
Mircea Trofin
be6dadeffd [NFC][MLInliner] Refactor logging implementation
This prepares it for logging externally-specified outputs.

Differential Revision: https://reviews.llvm.org/D85451
2020-08-07 14:56:56 -07:00
Sameer Arora
629a98ce69 [llvm-libtool-darwin] Add support for -D and -U options
Add support for `-D` and `-U` options for llvm-libtool-darwin. `-D`
allows for using zero for timestamps and UIDs/GIDs. `-U` allows for
using actual timestamps and UIDs/GIDs.

Reviewed by jhenderson, smeenai

Differential Revision: https://reviews.llvm.org/D84209
2020-08-07 14:44:32 -07:00
Sameer Arora
a753e1d6dd [llvm-libtool-darwin] Add support for -filelist option
Add support for `-filelist` option for llvm-libtool-darwin. `-filelist`
option allows for passing in a file containing a list of filenames.

Reviewed by jhenderson, smeenai

Differential Revision: https://reviews.llvm.org/D84206
2020-08-07 14:29:24 -07:00
Sameer Arora
df248b9fac [llvm-libtool-darwin] Add constant CPU_SUBTYPE_ARM64_V8
Add support for constant MachO::CPU_SUBTYPE_ARM64_V8. This constant is
needed so as to match `llvm-libtool-darwin`'s behavior to that of
cctools' libtool when `-arch_only` flag is passed in on command line.

Reviewed by jhenderson, alexshap, smeenai

Differential Revision: https://reviews.llvm.org/D85041
2020-08-07 14:09:27 -07:00
Vitaly Buka
6669d78639 Revert "[StackSafety] Skip ambiguous lifetime analysis"
This reverts commit 0b2616a8045cb776ea1514c3401d0a8577de1060.

Crashes with safe-stack.
2020-08-07 14:02:50 -07:00
Vitaly Buka
ae3ad63414 [StackSafety,NFC] Add Stats counters 2020-08-07 14:02:50 -07:00
Sameer Arora
a27cdf02e5 Add symlinks for libtool and install_name_tool
Add symlinks for `llvm-libtool-darwin` and
`llvm-install-name-tool`.

Reviewed by jhenderson, smeenai

Differential Revision: https://reviews.llvm.org/D85054
2020-08-07 13:46:36 -07:00
Matt Arsenault
fc0dd4b853 GlobalISel: Handle zext(sext x) in artifact combiner
This eliminates the illegal intermediate s8 value in the added test.
2020-08-07 16:37:46 -04:00
Sameer Arora
218d68982d [FileCheck] Add docs for --allow-empty
This diff adds documentation for `allow-empty` flag under FileCheck
docs.

Reviewed by jhenderson, smeenai, thopre

Differential Revision: https://reviews.llvm.org/D83682
2020-08-07 13:27:57 -07:00
Sameer Arora
8d6c074d8c [llvm-install-name-tool] Adds docs for llvm-install-name-tool
Adding documentation for llvm-install-name-tool.

Reviewed by smeenai, Ktwu

Differential Revision: https://reviews.llvm.org/D81944
2020-08-07 12:51:58 -07:00
Gui Andrade
d255b7f7cb Revert "[MSAN] Instrument libatomic load/store calls"
Problems with instrumenting atomic_load when the call has no successor,
blocking compiler roll

This reverts commit 33d239513c881d8c11c60d5710c55cf56cc309a5.
2020-08-07 19:45:51 +00:00
LLVM GN Syncbot
79c68f2fb8 [gn build] Port 320eab2d558 2020-08-07 19:01:40 +00:00
Yuanfang Chen
cee8d8ef70 Revert "[NewPM][CodeGen] Introduce machine pass and machine pass manager"
This reverts commit 911565d1085d9447363fe8ad041817436c4998fe.

Broke some non-Linux bots.
2020-08-07 11:59:58 -07:00
Jianzhou Zhao
b6143710cb Reduce dropTriviallyDeadConstantArrays cumulative time percentage from 17% to 4%
The history of dropTriviallyDeadConstantArrays is like this. Because the appending linkage uses too much memory (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150105/251381.html), dropTriviallyDeadConstantArrays was introduced (https://reviews.llvm.org/rG81f385b0c6ea37dd7195a65be162c75bbdef29d2) to release unused constant arrays. Recently, dropTriviallyDeadConstantArrays was improved (https://reviews.llvm.org/rG81f385b0c6ea37dd7195a65be162c75bbdef29d2) to reduce its quadratic cost.

Our recent LTO profiling shows that when a target is large, 15-20% of time cost is from the SetVector::insert called by dropTriviallyDeadConstantArrays.

A large application has hundreds or thousands of modules; each module calls dropTriviallyDeadConstantArrays once for cleaning up tens of thousands of ConstantArrays a module has. In those ConstantArrays, usually around 5 can be deleted; a very very few deleted ConstantArrays reference other ConstantArrays: less than 10 out of millions.

Given this, the cost of SetVector::insert is mainly from the construction of WorkList from ArrayConstants. This motivated the fix that iterates ArrayConstants directly, and uses WorkList only when necessary.

Our evaluation shows that
1) The cumulative time percentage of dropTriviallyDeadConstantArrays is reduced from 15-17% to 4-6%.
2) For targets with LTO time > 20min, the time reduction is about 20%.
3) No observable performance impact for build without using LTO.

{F12506218}
{F12506221}

Reviewed By: mehdi_amini, tejohnson, jdoerfert

Differential Revision: https://reviews.llvm.org/D85379
2020-08-07 11:36:30 -07:00