1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 11:13:28 +01:00
Commit Graph

203834 Commits

Author SHA1 Message Date
Simon Pilgrim
c9cecd455f SymbolizableObjectFile.h - remove unnecessary includes. NFCI.
Use forward declarations where possible, move includes down to SymbolizableObjectFile.cpp and avoid duplicate includes.
2020-09-17 13:18:53 +01:00
Sam Parker
457d1ca725 [NFC][ARM] Tail fold test changes
Run update script on one test and add another.
2020-09-17 13:09:10 +01:00
David Green
380e2e6128 [ARM] Additional tests for qr intrinsics in loops. NFC 2020-09-17 12:39:21 +01:00
Simon Pilgrim
76278d6f79 DwarfStringPool.cpp - remove unnecessary StringRef include. NFCI.
Already included in DwarfStringPool.h
2020-09-17 12:18:27 +01:00
Simon Pilgrim
1a0145c2c3 DwarfFile.h - remove unnecessary includes. NFCI.
Use forward declarations where possible, move includes down to DwarfFile.cpp and avoid duplicate includes.
2020-09-17 12:12:18 +01:00
David Green
2fb62cdb38 [ARM] Extra fp16 bitcast tests. NFC 2020-09-17 12:10:23 +01:00
Nico Weber
8542f0e600 [gn build] (manually) port c9af34027bc 2020-09-17 06:33:24 -04:00
Simon Pilgrim
e8a01fe357 [AsmPrinter] DwarfDebug - use DebugLoc const references where possible. NFC.
Avoid unnecessary copies.
2020-09-17 10:45:54 +01:00
Simon Pilgrim
ff1c52e27b [AMDGPU] Remove orphan SITargetLowering::LowerINT_TO_FP declaration. NFCI.
Method implementation no longer exists.
2020-09-17 10:45:53 +01:00
Simon Pilgrim
2d987e9462 [AsmPrinter] Remove orphan DwarfUnit::shareAcrossDWOCUs declaration. NFCI.
Method implementation no longer exists.
2020-09-17 10:45:52 +01:00
Rainer Orth
a7ca9604c1 [X86] Fix stack alignment on 32-bit Solaris/x86
On Solaris/x86, several hundred 32-bit tests `FAIL`, all in the same way:

  env ASAN_OPTIONS=halt_on_error=false ./halt_on_error_suppress_equal_pcs.cpp.tmp
  Segmentation Fault (core dumped)

They segfault during startup:

  Thread 2 received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 1 (LWP 1)]
  0x080f21f0 in __sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long) () at /vol/llvm/src/llvm-project/dist/compiler-rt/lib/sanitizer_common/sanitizer_solaris.cpp:65
  65	                             int prot, int flags, int fd, OFF_T offset) {
  1: x/i $pc
  => 0x80f21f0 <_ZN11__sanitizer13internal_mmapEPvmiiiy+16>:	movaps 0x30(%esp),%xmm0
  (gdb) p/x $esp
  $3 = 0xfeffd488

The problem is that `movaps` expects 16-byte alignment, while 32-bit Solaris/x86
only guarantees 4-byte alignment following the i386 psABI.

This patch updates `X86Subtarget::initSubtargetFeatures` accordingly,
handles Solaris/x86 in the corresponding testcase, and allows for some
variation in address alignment in
`compiler-rt/test/ubsan/TestCases/TypeCheck/vptr.cpp`.

Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.

Differential Revision: https://reviews.llvm.org/D87615
2020-09-17 11:17:11 +02:00
Douglas Yung
b4c47725ed Revert "Re-land: Add new hidden option -print-changed which only reports changes to IR"
The test added in this commit is failing on Windows bots:

http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/1269

This reverts commit f9e6d1edc0dad9afb26e773aa125ed62c58f7080 and follow-up commit 6859d95ea2d0f3fe0de2923a3f642170e66a1a14.
2020-09-17 01:32:29 -07:00
Roman Lebedev
36de144674 [NFC] EliminateDuplicatePHINodes(): small-size optimization: if there are <= 32 PHI's, O(n^2) algo is faster (geomean -0.08%)
This is functionally equivalent to the old implementation.

As per https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=4739e6e4eb54d3736e6457249c0919b30f6c855a&stat=instructions
this is a clear geomean compile-time regression-free win with overall geomean of `-0.08%`

32 PHI's appears to be the sweet spot; both the 16 and 64 performed worse:
https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=c4efe1fbbfdf0305ac26cd19eacb0c7774cdf60e&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=e4989d1c67010d3339d1a40ff5286a31f10cfe82&stat=instructions

If we have more PHI's than that, we fall-back to the original DenseSet-based implementation,
so the not-so-fast cases will still be handled.

However compile-time isn't the main motivation here.
I can name at least 3 limitations of this CSE:
1. Assumes that all PHI nodes have incoming basic blocks in the same order (can be fixed while keeping the DenseMap)
2. Does not special-handle `undef` incoming values (i don't see how we can do this with hashing)
3. Does not special-handle backedge incoming values (maybe can be fixed by hashing backedge as some magical value)

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87408
2020-09-17 11:29:03 +03:00
Jay Foad
654464aac1 [SplitKit] Only copy live lanes
When splitting a live interval with subranges, only insert copies for
the lanes that are live at the point of the split. This avoids some
unnecessary copies and fixes a problem where copying dead lanes was
generating MIR that failed verification. The test case for this is
test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir.

Without this fix, some earlier live range splitting would create %430:

%430 [256r,848r:0)[848r,2584r:1)  0@256r 1@848r L0000000000000003 [848r,2584r:0)  0@848r L0000000000000030 [256r,2584r:0)  0@256r weight:1.480938e-03
...
256B     undef %430.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
...
848B     %430.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec
...
2584B    %431:vreg_128 = COPY %430:vreg_128

Then RAGreedy::tryLocalSplit would split %430 into %432 and %433 just
before 848B giving:

%432 [256r,844r:0)  0@256r L0000000000000030 [256r,844r:0)  0@256r weight:3.066802e-03
%433 [844r,848r:0)[848r,2584r:1)  0@844r 1@848r L0000000000000030 [844r,2584r:0)  0@844r L0000000000000003 [844r,844d:0)[848r,2584r:1)  0@844r 1@848r weight:2.831776e-03
...
256B     undef %432.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
...
844B     undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128 {
           internal %433.sub2:vreg_128 = COPY %432.sub2:vreg_128
848B     }
  %433.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec
...
2584B    %431:vreg_128 = COPY %433:vreg_128

Note that the copy from %432 to %433 at 844B is a curious
bundle-without-a-BUNDLE-instruction that SplitKit creates deliberately,
and it includes a copy of .sub0 which is not live at this point, and
that causes it to fail verification:

*** Bad machine code: No live subrange at use ***
- function:    zextload_global_v64i16_to_v64i64
- basic block: %bb.0  (0x7faed48) [0B;2848B)
- instruction: 844B    undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128
- operand 1:   %432.sub0:vreg_128
- interval:    %432 [256r,844r:0)  0@256r L0000000000000030 [256r,844r:0)  0@256r weight:3.066802e-03
- at:          844B

Using real bundles with a BUNDLE instruction might also fix this
problem, but the current fix is less invasive and also avoids some
unnecessary copies.

https://bugs.llvm.org/show_bug.cgi?id=47492

Differential Revision: https://reviews.llvm.org/D87757
2020-09-17 09:26:11 +01:00
Jay Foad
84f59a0ebd [AMDGPU] Generate test checks for splitkit-copy-bundle.mir
This is a pre-commit for D87757 "[SplitKit] Only copy live lanes".
2020-09-17 09:26:09 +01:00
Sjoerd Meijer
e3e147deb9 [Lint] Add check for intrinsic get.active.lane.mask
As @efriedma pointed out in D86301, this "not equal to 0 check" of
get.active.lane.mask's second operand needs to live here in Lint and not the
Verifier.

Differential Revision: https://reviews.llvm.org/D87228
2020-09-17 09:22:03 +01:00
Qiu Chaofan
e076bde382 [SelectionDAG] Check any use of negation result before removal
2508ef01 fixed a bug about constant removal in negation. But after
sanitizing check I found there's still some issue about it so it's
reverted.

Temporary nodes will be removed if useless in negation. Before the
removal, they'd be checked if any other nodes used it. So the removal
was moved after getNode. However in rare cases the node to be removed is
the same as result of getNode. We missed that and will be fixed by this
patch.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D87614
2020-09-17 16:00:54 +08:00
Fangrui Song
1bd4869627 [llvm-cov gcov] Add --demangled-names (-m)
gcov 4.9 introduced the option.
2020-09-16 23:18:50 -07:00
Igor Kudrin
2fc6c9d1e9 [DebugInfo] Simplify DIEInteger::SizeOf().
An AsmPrinter should always be provided to the method because some forms
depend on its parameters. The only place in the codebase which passed
a nullptr value was found in the unit tests, so the patch updates it to
use some dummy AsmPrinter instead.

Differential Revision: https://reviews.llvm.org/D85293
2020-09-17 12:47:38 +07:00
Fangrui Song
c5286d0793 [llvm-cov gcov][test] Move tests to gcov/
And rename llvm-cov.test (misnomer) to basic.test
2020-09-16 22:42:49 -07:00
Jianzhou Zhao
58b21e82ac Fix the arguments of std::min
fixing
11201315d5
2020-09-17 04:03:31 +00:00
Jianzhou Zhao
a32b89877c Add the header of std::min
fixing
11201315d5
2020-09-17 03:48:36 +00:00
Jianzhou Zhao
76fc5249d5 Flush bitcode incrementally for LTO output
Bitcode writer does not flush buffer until the end by default. This is
fine to small bitcode files. When -flto,--plugin-opt=emit-llvm,-gmlt are
used, the final bitcode file is large, for example, >8G. Keeping all
data in memory consumes a lot of memory.

This change allows bitcode writer flush data to disk early when buffered
data size is above some threshold. This is only enabled when lld emits
LLVM bitcode.

One issue to address is backpatching bitcode: subblock length, function
body indexes, meta data indexes need to backfill. If buffer can be
flushed partially, we introduced raw_fd_stream that supports
read/seek/write, and enables backpatching bitcode flushed in disk.

Reviewed-by: tejohnson, MaskRay

Differential Revision: https://reviews.llvm.org/D86905
2020-09-17 03:32:31 +00:00
LLVM GN Syncbot
0dbccab8b5 [gn build] Port a895040eb02 2020-09-17 03:02:00 +00:00
Stella Stamenova
0778e74e6b Revert "[IRSim] Adding IR Instruction Mapper"
This reverts commit b04c1a9d3127730c05e8a22a0e931a12a39528df.
2020-09-16 20:00:43 -07:00
David Blaikie
2d08a455d6 debug_rnglists/symbolizing: reduce memory usage by not caching rnglists
This matches the debug_ranges behavior - though is currently implemented
differently. (the debug_ranges parsing was handled by creating a new
ranges parser during DIE address querying, and just destroying it after
the query - whereas the rnglists parser is a member of the DWARFUnit
currently - so the API doesn't cache anymore)

I think this could/should be improved by not parsing debug_rnglists
headers at all when dumping debug_info or symbolizing - do it the way
DWARF (roughly) intended: take the rnglists_base, add addr*index to it,
read the offset, parse the list at rnglists_base+offset. This would have
no error checking for valid index (because the number of valid indexes
is stored in the header, which has a negative offset from rnglists_base
- and is sort of only intended for use by dumpers, not by parsers going
from debug_info to a rnglist) or out of contribution bounds access
(since it wouldn't know the length of the contribution, also in the
header) - nor any error-checking that the rnglist contribution was using
the same properties as the debug_info (version, DWARF32/64, address
size, etc).
2020-09-16 19:36:07 -07:00
Qiu Chaofan
51f005dc6b [PowerPC] Fix store-fptoi combine of f128 on Power8
llc would crash for (store (fptosi-f128-i32)) when -mcpu=pwr8, we should
not generate FP_TO_(S|U)INT_IN_VSR for f128 types at this time. This
patch fixes it.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D86686
2020-09-17 10:21:35 +08:00
Chen Zheng
adc072cade [MachineSink] add one more mir case - nfc 2020-09-16 22:03:06 -04:00
LLVM GN Syncbot
94a2e109f9 [gn build] Port b04c1a9d312 2020-09-17 01:54:10 +00:00
Andrew Litteken
5831702c50 [IRSim] Adding IR Instruction Mapper
This introduces the IRInstructionMapper, and the associated wrapper for
instructions, IRInstructionData, that maps IR level Instructions to
unsigned integers.

Mapping is done mainly by using the "isSameOperationAs" comparison
between two instructions.  If they return true, the opcode, result type,
and operand types of the instruction are used to hash the instruction
with an unsigned integer.  The mapper accepts instruction ranges, and
adds each resulting integer to a list, and each wrapped instruction to
a separate list.

At present, branches, phi nodes are not mapping and exception handling
is illegal.  Debug instructions are not considered.

The different mapping schemes are tested in
unittests/Analysis/IRSimilarityIdentifierTest.cpp

Differential Revision: https://reviews.llvm.org/D86968
2020-09-16 20:49:21 -05:00
Arthur Eubanks
42f2f416c4 [NewPM] Port -print-alias-sets to NPM
Really it should be named print<alias-sets>, but for the sake of
changing fewer tests, added a TODO to rename after NPM switch and test
cleanup.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87713
2020-09-16 18:34:56 -07:00
Alina Sbirlea
1da924dccf [MemorySSA] Rename uses in blocks with Phis.
Renaming should include blocks with existing Phis.

Resolves PR45927.

Differential Revision: https://reviews.llvm.org/D87661
2020-09-16 17:24:17 -07:00
Craig Topper
5e975a1c0f [DAGCombiner] Teach visitMSTORE to replace an all ones mask with an unmasked store.
Similar to what done in D87788 for MLOAD.

Again I've skipped indexed, truncating, and compressing stores.
2020-09-16 16:42:22 -07:00
Daniel Kiss
3aa2ecd346 [AArch64] Add -mmark-bti-property flag.
Writing the .note.gnu.property manually is error prone and hard to
maintain in the assembly files.
The -mmark-bti-property is for the assembler to emit the section with the
GNU_PROPERTY_AARCH64_FEATURE_1_BTI. To be used when C/C++ is compiled
with -mbranch-protection=bti.

This patch refactors the .note.gnu.property handling.

Reviewed By: chill, nickdesaulniers

Differential Revision: https://reviews.llvm.org/D81930

Reland with test dependency on aarch64 target.
2020-09-17 01:18:36 +02:00
Daniel Kiss
2099a5d915 Revert "[AArch64] Add -mmark-bti-property flag."
This reverts commit 95e43f84b7b9c61011aece7583c0367297dd67d8.
2020-09-17 01:17:23 +02:00
Michael Liao
aa0a69fb07 [EarlyCSE] Simplify max/min pattern matching. NFC. 2020-09-16 18:34:46 -04:00
Nico Weber
54684569c2 [gn build] (manually) port 1321160a2 2020-09-16 18:29:07 -04:00
Daniel Kiss
6c5fe458bd [AArch64] Add -mmark-bti-property flag.
Writing the .note.gnu.property manually is error prone and hard to
maintain in the assembly files.
The -mmark-bti-property is for the assembler to emit the section with the
GNU_PROPERTY_AARCH64_FEATURE_1_BTI. To be used when C/C++ is compiled
with -mbranch-protection=bti.

This patch refactors the .note.gnu.property handling.

Reviewed By: chill, nickdesaulniers

Differential Revision: https://reviews.llvm.org/D81930
2020-09-17 00:24:14 +02:00
jasonliu
989dca51e1 Disable a large test for EXPENSIVE_CHECKS and debug build
Summary:
When running a large test in LLVM_ENABLE_EXPENSIVE_CHECKS=ON mode,
buildbot could hit timeout.
Disable the test when this mode is on.
Also disable it for debug so that the test won't hang for too long.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D87794
2020-09-16 21:57:34 +00:00
Rahman Lavaee
e3e9dd3c7b [obj2yaml] - Match ".stack_size" with the original section name, and not the uniquified name.
Without this patch, obj2yaml decodes the content of only one ".stack_size" section. Other sections are dumped with their full contents.

Reviewed By: grimar, MaskRay

Differential Revision: https://reviews.llvm.org/D87727
2020-09-16 14:17:29 -07:00
Mircea Trofin
e41d0852e6 [NFC][regalloc] type LiveInterval::reg() as Register
We have the Register type which precisely captures the role of this
member. Storage-wise, it's an unsigned.

This helps readability & maintainability.

Differential Revision: https://reviews.llvm.org/D87768
2020-09-16 14:11:26 -07:00
Stanislav Mekhanoshin
c2dd96e0bf [AMDGPU] gfx1030 test update. NFC. 2020-09-16 13:56:16 -07:00
Lang Hames
a2132fa761 [ORC] Add operations to create and lookup JITDylibs to OrcV2 C bindings. 2020-09-16 13:49:30 -07:00
Craig Topper
dcfdc54cf8 [DAGCombiner] Teach visitMLOAD to replace an all ones mask with an unmasked load
If we have an all ones mask, we can just a regular masked load. InstCombine already gets this in IR. But the all ones mask can appear after type legalization.

Only avx512 test cases are affected because X86 backend already looks for element 0 and the last element being 1. It replaces this with an unmasked load and blend. The all ones mask is a special case of that where the blend will be removed. That transform is only enabled on avx2 targets. I believe that's because a non-zero passthru on avx2 already requires a separate blend so its more profitable to handle mixed constant masks.

This patch adds a dedicated all ones handling to the target independent DAG combiner. I've skipped extending, expanding, and index loads for now. X86 doesn't use index so I don't know much about it. Extending made me nervous because I wasn't sure I could trust the memory VT had the right element count due to some weirdness in vector splitting. For expanding I wasn't sure if we needed different undef handling.

Differential Revision: https://reviews.llvm.org/D87788
2020-09-16 13:21:16 -07:00
Craig Topper
87cd53cd27 [X86] Add test case for a masked load mask becoming all ones after type legalization.
We should be able to turn this into a unmasked load. X86 has an
optimization to detect that the first and last element aren't masked
and then turn the whole thing into an unmasked load and a blend.
That transform is disabled on avx512 though.

But if we know the blend isn't needed, then the unmasked load by
itself should always be profitable.
2020-09-16 13:10:04 -07:00
Philip Reames
1d6082b5e3 [aarch64][tests] Add tests which show current lack of implicit null support
I will be posting a patch which adds appropriate target support shortly; landing the tests so that the diffs are clear.
2020-09-16 12:55:29 -07:00
David Greene
eb1409d08e [UpdateTestChecks] Allow $ in function names
Some compilers generation functions with '$' in their names, so recognize those
functions.

This also requires recognizing function names inside quotes in some contexts in
order to escape certain characters.

Differential Revision: https://reviews.llvm.org/D82995
2020-09-16 14:34:18 -05:00
LLVM GN Syncbot
05cd8277f1 [gn build] Port 56069b5c71c 2020-09-16 19:03:25 +00:00
Nikita Popov
4a43cab9e0 Reapply [InstCombine] Simplify select operand based on equality condition
Reapply after fixing SimplifyWithOpReplaced() to never return
the original value, which would lead to an infinite loop in this
transform.

-----

For selects of the type X == Y ? A : B, check if we can simplify A
by using the X == Y equality and replace the operand if that's
possible. We already try to do this in InstSimplify, but will only
fold if the result of the simplification is the same as B, in which
case the select can be dropped entirely. Here the select will be
retained, just one operand simplified.

As we are performing an actual replacement here, we don't have
problems with refinement / poison values.

Differential Revision: https://reviews.llvm.org/D87480
2020-09-16 20:53:58 +02:00
Nikita Popov
3a8ed708c6 [InstSimplify] Clarify SimplifyWithOpReplaced() return value
If SimplifyWithOpReplaced() cannot simplify the value, null should
be returned. Make sure this really does happen in all cases,
including those where SimplifyBinOp() returns the original value.

This does not matter for existing users, but does mattter for
D87480, which would go into an infinite loop otherwise.
2020-09-16 20:53:26 +02:00