1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-18 18:42:46 +02:00
Commit Graph

201641 Commits

Author SHA1 Message Date
Kazu Hirata
fc9284d5ff [docs] Fix typos 2020-08-09 19:31:49 -07:00
Juneyoung Lee
6fcfc1d6c4 [BuildLibCalls] Add noundef to standard I/O functions
This patch adds noundef to return value and arguments of standard I/O functions.
With this patch, passing undef or poison to the functions becomes undefined
behavior in LLVM IR. Since undef/poison is lowered from operations having UB in C/C++,
passing undef to them was already UB in source.

With this patch, the functions cannot return undef or poison anymore as well.
According to C17 standard, ungetc/ungetwc/fgetpos/ftell can generate unspecified
value; 3.19.3 says unspecified value is a valid value of the relevant type,
and using unspecified value is unspecified behavior, which is not UB, so it
cannot be undef (using undef is UB when e.g. it is used at branch condition).

— The value of the file position indicator after a successful call to the ungetc function for a text stream, or the ungetwc function for any stream, until all pushed-back characters are read or discarded (7.21.7.10, 7.29.3.10).
— The details of the value stored by the fgetpos function (7.21.9.1).
— The details of the value returned by the ftell function for a text stream (7.21.9.4).

In the long run, most of the functions listed in BuildLibCalls should have noundefs; to remove redundant diffs which will anyway disappear in the future, I added noundef to a few more non-I/O functions as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85345
2020-08-10 10:58:25 +09:00
Shinji Okumura
6da8fa8433 [Attributor][NFC][AAPotentialValues] Change interface of PotentialValuesState
Previously `PotentialValuesState` inherited `BooleanState`.
We have to add `getAssumed` to the state in order to use `clampStateAndIndicateChange` (which will be used in `AAPotentialValuesArgument`).
However `BooleanState::getAssumed` is not a virtual function and we cannot override it.
Therefore, I changed the state not to inherit `BooleanState` and add `getAssumed` to it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85610
2020-08-10 09:18:10 +09:00
Vitaly Buka
00ccee5400 [StackSafety] Don't keep FullSet in index
Optimization. Missing record is enterpreted as FullSet anyway.
2020-08-09 15:01:46 -07:00
Vitaly Buka
4a5ca55ac8 [StackSafety] Use getSignedMin() to serialize ranges
Almost NFC as it's important only for full sets which should not
be serialized at all.
2020-08-09 14:53:13 -07:00
Vitaly Buka
231239123a [NFC][StackSafety] Add index test
This directly covers generateParamAccessSummary
2020-08-09 14:34:00 -07:00
Vitaly Buka
dfa0038a6e [NFC][StackSafety] Add shell test requirement 2020-08-09 14:31:17 -07:00
Dávid Bolvanský
96beb2ab81 [X86] Added testcases for PR47024 and PR46315 2020-08-09 22:05:21 +02:00
Vitaly Buka
7703373325 [NFC][StackSafety] Avoid some duplications in tests 2020-08-09 12:38:53 -07:00
Craig Topper
b4370f3b78 [X86][GlobalISel] Enable a test case for sext i32->i64 that was commented out.
Test case seems to work so I guess whatever lead to it being
commented out with a FIXME has been fixed.
2020-08-09 12:01:34 -07:00
Piotr Sobczak
cb35142c34 Fix 64-bit copy to SCC
Fix 64-bit copy to SCC by restricting the pattern resulting
in such a copy to subtargets supporting 64-bit scalar compare,
and mapping the copy to S_CMP_LG_U64.

Before introducing the S_CSELECT pattern with explicit SCC
(0045786f146e78afee49eee053dc29ebc842fee1), there was no need
for handling 64-bit copy to SCC ($scc = COPY sreg_64).

The proposed handling to read only the low bits was however
based on a false premise that it is only one bit that matters,
while in fact the copy source might be a vector of booleans and
all bits need to be considered.

The practical problem of mapping the 64-bit copy to SCC is that
the natural instruction to use (S_CMP_LG_U64) is not available
on old hardware. Fix it by restricting the problematic pattern
to subtargets supporting the instruction (hasScalarCompareEq64).

Differential Revision: https://reviews.llvm.org/D85207
2020-08-09 20:50:30 +02:00
Florian Hahn
9f66518378 [InstSimplify] Make sure CanUseUndef is initialized in all cases.
This should fix a bunch of buildbot failures.
2020-08-09 19:47:16 +01:00
Florian Hahn
564f5c4ac7 [InstSimplify/NewGVN] Add option to control the use of undef.
Making use of undef is not safe if the simplification result is not used
to replace all uses of the result. This leads to problems in NewGVN,
which does not replace all uses in the IR directly. See PR33165 for more
details.

This patch adds an option to SimplifyQuery to disable the use of undef.

Note that I've only guarded uses if isa<UndefValue>/m_Undef where
SimplifyQuery is currently available. If we agree on the general
direction, I'll update the remaining uses.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84792
2020-08-09 19:16:56 +01:00
Florian Hahn
2c34117c7c [SCEVExpander] Make sure cast properly dominates Builder's IP.
The selected cast must properly dominate the Builder's IP, so we cannot
re-use the cast, if it matches the builder's IP.
2020-08-09 16:51:19 +01:00
Aditya Kumar
b003049fb3 [HotColdSplit] Add options for splitting cold functions in separate section
Add support for (if enabled) splitting cold functions into a separate section
in order to further boost locality of hot code.

Authored by: rjf (Ruijie Fang)
Reviewed by: hiraditya,rcorcs,vsk

Differential Revision: https://reviews.llvm.org/D85331
2020-08-09 08:48:12 -07:00
Dávid Bolvanský
1bcc62d24e [Tests] Precommit tests for D85593 2020-08-09 16:07:36 +02:00
Sanjay Patel
0cee2d7cd7 [VectorCombine] try to create vector loads from scalar loads
This patch was adjusted to match the most basic pattern that starts with an insertelement
(so there's no extract created here). Hopefully, that removes any concern about
interfering with other passes. Ie, the transform should almost always be profitable.

We could make an argument that this could be part of canonicalization, but we
conservatively try not to create vector ops from scalar ops in passes like instcombine.

If the transform is not profitable, the backend should be able to re-scalarize the load.

Differential Revision: https://reviews.llvm.org/D81766
2020-08-09 09:05:06 -04:00
Florian Hahn
4fd7f95ec0 [SCEVExpander] Avoid re-using existing casts if it means updating users.
Currently the SCEVExpander tries to re-use existing casts, even if they
are not exactly at the insertion point it was asked to create the cast.
To do so in some case, it creates a new cast at the insertion point and
updates all users to use the new cast.

This behavior is problematic, because it changes the IR outside of the
instructions created during the expansion. Therefore we cannot
completely undo all changes made during expansion.

This re-use should be only an extra optimization, so only using the new
cast in the expanded instructions should not be a correctness issue.
There are many cases equivalent instructions are created during
expansion.

This patch also adjusts findInsertPointAfter to skip instructions
inserted during expansion. This enables re-using existing casts without
the renaming any uses, by picking a better insertion point.

Reviewed By: efriedma, lebedev.ri

Differential Revision: https://reviews.llvm.org/D84399
2020-08-09 13:25:17 +01:00
David Green
2fe5a9cb44 [ARM] Add VADDV and VMLAV patterns for v16i16
This adds patterns for v16i16's vecreduce, using all the existing code
to go via an i32 VADDV/VMLAV and truncating the result.

Differential Revision: https://reviews.llvm.org/D85452
2020-08-09 11:09:49 +01:00
David Green
9669ed5ad9 [ARM] Allow vecreduce_add in tail predicated loops
This allows vecreduce_add in loops so that we can tailpredicate them.

Differential Revision: https://reviews.llvm.org/D85454
2020-08-09 10:57:17 +01:00
David Green
0bb44acbe5 [ARM] Some formatting and predicate VRHADD patterns. NFC
This formats some of the MVE patterns, and adds a missing
Predicates = [HasMVEInt] to some VRHADD patterns I noticed
as going through. Although I don't believe NEON would ever
use the patterns (as it would use ADDL and VSHRN instead)
they should ideally be predicated on having MVE instructions.
2020-08-09 10:07:52 +01:00
Georgii Rymar
54d50f63c3 [llvm-readelf/obj] - Refine the implementation of printMipsReginfo().
It adds the proper warnings reporting and updates the mips-reginfo.test to
remove using of the precompiled binary.

Differential revision: https://reviews.llvm.org/D85511
2020-08-09 11:10:12 +03:00
Georgii Rymar
160db82d87 [llvm-readobj] - Remove 3 excessive test cases.
This patch does the following:

1) Removes mips-options.test and the corresponding Inputs/mips-options.elf-mips64el binary:
This is a test that checks that --dynamic-table is able to print the DT_MIPS_OPTIONS tag.
We are testing it in dynamic-tags-machine-specific.test already.
(https://github.com/llvm/llvm-project/blob/master/llvm/test/tools/llvm-readobj/ELF/dynamic-tags-machine-specific.test#L235)

2) Removes mips-rld-map-rel.test and the corresponding Inputs/mips-rld-map-rel.elf-mipsel binary.
This is a test that checks that --dynamic-table is able to print the DT_MIPS_RLD_MAP_REL tag.
We are testing it in dynamic-tags-machine-specific.test already.
(https://github.com/llvm/llvm-project/blob/master/llvm/test/tools/llvm-readobj/ELF/dynamic-tags-machine-specific.test#L257)

3) Removes ppc64-glink.test test and the corresponding Inputs/ppc64.exe binary.
This is a test that checks that --dynamic-table is able to print the DT_PPC64_GLINK tag.
We are testing it in dynamic-tags-machine-specific.test already.
(https://github.com/llvm/llvm-project/blob/master/llvm/test/tools/llvm-readobj/ELF/dynamic-tags-machine-specific.test#L337)

Differential revision: https://reviews.llvm.org/D85515
2020-08-09 11:00:49 +03:00
Craig Topper
e8fb646664 [X86][GlobalISel] Remove unneeded code for handling zext i8->16, i8->i64, i16->i64, i32->i64.
These all seem to be handled by tablegen pattern imports.
2020-08-09 00:26:15 -07:00
Craig Topper
7201018573 [DAGCombiner] Teach SimplifySetCC SETUGE X, SINTMIN -> SETLT X, 0 and SETULE X, SINTMAX -> SETGT X, -1.
These aren't the canonical forms we'd get from InstCombine, but
we do have X86 tests for them. Recognizing them is pretty cheap.

While there make use of APInt:isSignedMinValue/isSignedMaxValue
instead of creating a new APInt to compare with. Also use
SelectionDAG::getAllOnesConstant helper to hide the all ones
APInt creation.
2020-08-08 22:27:16 -07:00
Craig Topper
3488483f10 [X86] Autogenerate complete checks. NFC 2020-08-08 22:12:34 -07:00
Vitaly Buka
567623e324 Revert "[NFC][StackSafety] Add index test"
This reverts commit 5fd49911db546cda6b35dffb6be440385e8d96d5.

GUIDs don't match.
2020-08-08 21:26:35 -07:00
Vitaly Buka
49cd07bb08 [NFC][StackSafety] Add index test
This directly covers generateParamAccessSummary
2020-08-08 19:11:02 -07:00
Vitaly Buka
115fa28020 [NFC][StackSafety] noinline in alias tests 2020-08-08 18:21:52 -07:00
weihe
68e894ab3c [llvm-profdata] Implement llvm-profdata overlap for sample profiles
Implemented the `llvm-profdata overlap` feature for sample profiles. It reports weighted //similarity// and unweighted //overlap// metrics at program and function level for two input profiles. Similarity metrics are symmetric with regards to the order of two input profiles. By default, the tool only reports program-level summary. Users can look into function-level details via additional options `--function`, `--similarity-cutoff`, and `--value-cutoff`.

The similarity metrics are designed as follows:
* Program-level summary
    * Whole program profile similarity is an aggregate over function-level similarity `FS`: `PS = sum(FS(A) * avg_weight(A))` for all function `A`.
    * Whole program sample overlap: `PSO = common_samples / total_samples`.
    * Function overlap: `FO = #common_function / #total_function`.
    * Hot-function overlap: `HFO = #common_hot_function / #total_hot_function`.
    * Hot-block overlap: `HBO = #common_hot_block / #total_hot_block`.
* Function-level details
    * Function-level similarity is an aggregate over line/block-level similarities `BS` of all sample lines/blocks in the function, weighted by the closeness of the function's weights in two profiles: `FS = sum(BS(i)) * (1 - weight_distance(A))`.
    * Function-level sample overlap: `FSO = common_samples / total_samples` for samples in the function.

Reviewed By: wenlei, hoyFB, wmi

Differential Revision: https://reviews.llvm.org/D83852
2020-08-08 17:49:48 -07:00
Petr Hosek
6c27d09879 Revert "[CMake] Simplify CMake handling for zlib"
This reverts commit ccbc1485b55ff4acd21bcfafbf7aec4ed0fd818d which
is still failing on the Windows MLIR bots.
2020-08-08 17:08:23 -07:00
Petr Hosek
af8170b5ad [CMake] Simplify CMake handling for zlib
Rather than handling zlib handling manually, use find_package from CMake
to find zlib properly. Use this to normalize the LLVM_ENABLE_ZLIB,
HAVE_ZLIB, HAVE_ZLIB_H. Furthermore, require zlib if LLVM_ENABLE_ZLIB is
set to YES, which requires the distributor to explicitly select whether
zlib is enabled or not. This simplifies the CMake handling and usage in
the rest of the tooling.

This is a reland of abb0075 with all followup changes and fixes that
should address issues that were reported in PR44780.

Differential Revision: https://reviews.llvm.org/D79219
2020-08-08 16:44:08 -07:00
Thomas Lively
23fb49ac5f [WebAssembly] Fix FastISel address calculation bug
Fixes PR47040, in which an assertion was improperly triggered during
FastISel's address computation. The issue was that an `Address` set to
be relative to the FrameIndex with offset zero was incorrectly
considered to have an unset base. When the left hand side of an add
set the Address to be 0 off the FrameIndex, the right side would not
detect that the Address base had already been set and could try to set
the Address to be relative to a register instead, triggering an
assertion.

This patch fixes the issue by explicitly tracking whether an `Address`
has been set rather than interpreting an offset of zero to mean the
`Address` has not been set.

Differential Revision: https://reviews.llvm.org/D85581
2020-08-08 15:23:11 -07:00
Craig Topper
2826efbf81 [X86] Remove a DCI.isBeforeLegalize() call from combineVSelectWithAllOnesOrZeros.
This was blocking isTypeLegal call so that we could do a particular
transform on illegal types before type legalization. But the we
create a target specific node using that type. We shouldn't do
that if the type isn't legal. So I think we should just always
make sure the type is legal.

I suspect that in order to get the condition VT to not be a vector
of i1 we already completed type legalization anyway so this probably
doesn't matter much in practice.
2020-08-08 14:19:13 -07:00
Roman Lebedev
2f56715ea7 [Reduce] Rewrite function body delta pass again
It is not enough to replace all uses of users of the function with undef,
the users, we only drop instruction users, so they may stick around.

Let's try different approach - first drop bodies for all the functions
we will drop, which should take care of blockaddress issue the previous
rewrite was dealing with; then, after dropping *all* such bodies,
replace remaining uses with undef (thus all the uses are either
outside of functions, or are in kept functions)
and then finally drop functions.

This seems to work, and passes the *existing* test coverage,
but it is possible that a new issue will be discovered later :)

A new (previously crashing) test added.
2020-08-08 23:48:44 +03:00
Craig Topper
ffa193cdec [X86] Support matching VPTERNLOG when the root node is X86ISD::ANDNP. 2020-08-08 13:11:47 -07:00
Craig Topper
83ecd17734 [X86] Add VPTERNLOG test cases where the root node will be X86ISD::ANDNP. NFC
We currently fail to match this.
2020-08-08 12:53:28 -07:00
Dávid Bolvanský
5bc09382f4 [AArch64RegisterInfo] Supress new warning 2020-08-08 21:47:01 +02:00
Craig Topper
3821fbba14 [X86] Remove isSafeToClobberEFLAGS helper and just inline it into the call sites.
This is just a thin wrapper around computeRegisterLivness which
we can just call directly. The only real difference is that
isSafeToClobberEFLAGS returns a bool and computeRegisterLivness
returns an enum. So we need to check for the specific enum value
that isSafeToClobberEFLAGS was hiding.

I've also adjusted which sites pass an explicit value for
Neighborhood since the default for computeRegisterLivness is 10.
2020-08-08 12:31:58 -07:00
Craig Topper
c01ba6831e Recommit "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places"
I messed up the bug numbers in the commit message before

Previously this function searched 4 instructions forwards or
backwards to determine if it was ok to clobber eflags.

This is called in 3 places: rematerialization, turning 2 operand
leas into adds or splitting 3 ops leas into an lea and add on some
CPU targets.

This patch increases the search limit to 10 instructions for
rematerialization and 2 operand lea to add. I've left the old
treshold for 3 ops lea spliting as that increases code size.

Fixes PR47024 and PR46315.
2020-08-08 11:53:14 -07:00
Craig Topper
2d70f5fabd Revert "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places"
This reverts commit 44b260cb0aab387d85e4d59c16fc7b8866264f5e.

I messed up the bug number in the commit message so I'm reverting
to fix it.
2020-08-08 11:53:14 -07:00
Dávid Bolvanský
b26d8d1ccf [FileCheckTest] Supress new warning 2020-08-08 20:45:24 +02:00
Simon Pilgrim
9b68c08ed2 [X86][SSE] combineTargetShuffle - use scaleShuffleMask helper to widen shuffle mask. NFCI.
Use scaleShuffleMask helper for the shuffle(hadd,hadd) canonicalization.
2020-08-08 19:36:18 +01:00
Craig Topper
b998790f91 [X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places
Previously this function searched 4 instructions forwards or
backwards to determine if it was ok to clobber eflags.

This is called in 3 places: rematerialization, turning 2 operand
leas into adds or splitting 3 ops leas into an lea and add on some
CPU targets.

This patch increases the search limit to 10 instructions for
rematerialization and 2 operand lea to add. I've left the old
treshold for 3 ops lea spliting as that increases code size.

Fixes PR47024 and PR43014
2020-08-08 11:29:41 -07:00
Simon Pilgrim
733733f727 [InstCombine] Use CreateVectorSplat(ElementCount) variant directly
This was introduced at rGe20223672100, and the CreateVectorSplat(unsigned NumElements) variant calls it internally
2020-08-08 19:26:02 +01:00
Roman Lebedev
903dd081e7 [SimplifyCFG] Fix invoke->call fold w/ multiple invokes in presence of lifetime intrinsics
SimplifyCFG has two main folds for resumes - one when resume is directly
using the landingpad, and the other one where resume is using a PHI node.

While for the first case, we were already correctly ignoring all the
PHI nodes, and both the debug info intrinsics and lifetime intrinsics,
in the PHI-based-one, we weren't ignoring PHI's in the resume block,
and weren't ignoring lifetime intrinsics. That is clearly a bug.

On RawSpeed library, this results in +9.34% (+81) more invoke->call folds,
-0.19% (-39) landing pads, -0.24% (-81) invoke instructions
but +51 call instructions and -132 basic blocks.

Though, the run-time performance impact appears to be within the noise.
2020-08-08 20:00:28 +03:00
Roman Lebedev
5027b49a96 [NFC][SimplifyCFG] Rewrite isCleanupBlockEmpty() to be iterator_range-based 2020-08-08 20:00:28 +03:00
Roman Lebedev
b96154f169 [NFC][SimplifyCFG] Add a test showing invoke->call simplification failure 2020-08-08 20:00:28 +03:00
Roman Lebedev
152f29684a [NFC][SimplifyCFG] Count the number of invokes turned into calls due to empty cleanup blocks 2020-08-08 20:00:27 +03:00
Sanjay Patel
33d6e8d6a8 [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division, part 2
Follow-up to D82716 / rGea71ba11ab11
We do not have the fabs removal fold in IR yet for the case
where the sqrt operand is repeated, so that's another potential
improvement.
2020-08-08 10:38:06 -04:00