mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00
Commit Graph

202623 Commits

Author SHA1 Message Date
Christopher Tetreault
c849a98c1b [SVE] Remove calls to VectorType::getNumElements from IR
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D81500
2020-08-27 11:16:10 -07:00
Matt Arsenault
9b9f0675a4 GlobalISel: Use & operator on KnownBits
Avoid repeating for zero and one
2020-08-27 14:07:18 -04:00
Matt Arsenault
b3488037c4 GlobalISel: Implement known bits for G_MERGE_VALUES 2020-08-27 14:07:18 -04:00
Mikhail Maltsev
510acaee86 [ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics
Add bitcode files which got truncated to 0 length in phabricator.

Differential Revision: https://reviews.llvm.org/D86146
2020-08-27 18:52:59 +01:00
Matt Arsenault
942e4bdca0 GlobalISel: Remove leftover lit.local.cfg
The global-isel feature has been required for a long time and was
removed in c9455d3c579292e7ae5b7559ad0302d459e69a95, so this was
causing all tests to be skipped.
2020-08-27 13:49:06 -04:00
Mikhail Maltsev
f7e914e2c5 [ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics
This patch adjusts the following ARM/AArch64 LLVM IR intrinsics:
- neon_bfmmla
- neon_bfmlalb
- neon_bfmlalt
so that they take and return bf16 and float types. Previously these
intrinsics used <8 x i8> and <4 x i8> vectors (a remnant from an
implementation that lacked the bf16 IR type).

The neon_vbfdot[q] intrinsics are adjusted similarly. This change
required some additional selection patterns for vbfdot itself and
also for vector shuffles (in a previous patch) because of SelectionDAG
transformations kicking in and mangling the original code.

This patch makes the generated IR cleaner (fewer useless bitcasts are
produced), but it does not affect the final assembly.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D86146
2020-08-27 18:43:16 +01:00
Craig Topper
5106495e6e [X86] Don't call hasFnAttribute and getFnAttribute for 'prefer-vector-width' and 'min-legal-vector-width' in getSubtargetImpl
We only need to call getFnAttribute and then check if the Attribute
is None or not.
2020-08-27 10:40:20 -07:00
Owen Anderson
36eea6621c Reapply D70800: Fix AArch64 AAPCS frame record chain
Original Commit Message:
After commit r368987 (rG643adb55769e) landed, the frame record (FP and LR registers)
may be placed in the middle of a stack frame if a function has both callee-saved
general-purpose registers and floating point registers. This will break the stack unwinders
that simply walk through the frame records (based on the guarantee from AAPCS64
"The Frame Pointer" section). This commit fixes the problem by adding the frame record offset.

Patch By: logan
Differential Revision: D70800
2020-08-27 17:29:41 +00:00
Teresa Johnson
362b3e8fee [HeapProf] Fix bot failures from instrumentation pass
Fix bot failure from 7ed8124d46f94601d5f1364becee9cee8538265e:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/8533

Since we are always using dynamic shadow,
insertDynamicShadowAtFunctionEntry should always return true for
modifying the function.
2020-08-27 10:21:19 -07:00
LLVM GN Syncbot
67bc8c2cb7 [gn build] Port 7ed8124d46f 2020-08-27 17:08:02 +00:00
Arthur Eubanks
eaf6c380e3 [gn build] Manually port c9455d3 2020-08-27 10:05:34 -07:00
Arthur Eubanks
0ab0d3a7dd [test][Inliner] Make always-inline.ll work with NPM
The NPM doesn't support call-site alwaysinline as described in the comments.

Also make NPM runs more similar to legacy PM runs.

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D86663
2020-08-27 10:00:54 -07:00
Aditya Nandakumar
90acb0696f [GISel] Add new GISel combiners for G_SELECT
https://reviews.llvm.org/D83833

Patch adds two new GICombinerRules for G_SELECT. The rules include:
combining selects with undef comparisons into their first selectee value,
and combining away selects with constant comparisons. The patch additionally
adds a new combiner test for the AArch64 target to test these new G_SELECT
combiner rules and the existing select_same_val combiner rule.

Patch by mkitzan
2020-08-27 09:40:15 -07:00
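As a rough illustration of the two folds, here is the IR-level analogue using a plain `select` (the actual combiners operate on generic MachineIR `G_SELECT`; the value names are invented):
```
; Undef condition: fold to the first selectee value.
%r1 = select i1 undef, i32 %a, i32 %b   ; -> %a

; Constant condition: fold the select away entirely.
%r2 = select i1 true,  i32 %a, i32 %b   ; -> %a
%r3 = select i1 false, i32 %a, i32 %b   ; -> %b
```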
Arthur Eubanks
104df8538a [OCaml] Remove add_constant_propagation
After https://reviews.llvm.org/D85159.
2020-08-27 09:30:21 -07:00
Simon Moll
2afc6e83ea [sda][nfc] clang-formatting 2020-08-27 18:27:44 +02:00
Shinji Okumura
1054ad3409 [Attributor] Add a phase flag to Attributor
Add a new flag that indicates which stage of the process we are in.
This flag is used to adjust the behavior of `getAAFor` according to the stage (discussed in D86635).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86678
2020-08-28 01:16:38 +09:00
Aditya Nandakumar
46055bcec5 [GISel]: Fix one more CSE Non determinism
https://reviews.llvm.org/D86676

Sometimes we can have the following code

 x:gpr(s32) = G_OP

Say we build G_OP2 to the same x and then delete the previous instruction. Using something like

 Register X = ...;
 auto NewMIB = CSEBuilder.buildOp2(X, ... args);

Currently there's a mismatch in how NewMIB is profiled and inserted into the CSEMap (i.e. it doesn't consider the register bank/register class along with the type). Unify the profiling by refactoring and calling the common method.

This was found by turning on CSEInfo::verify at the end of each of our GISel passes, which turns inconsistent state/non-determinism in CSEing into crashes; this usually indicates missing calls to the Observer on mutations (the most common case). Here non-determinism usually means failing to CSE in some runs, almost never producing incorrect code.
This patch also adds this verification at the end of the combiners.
2020-08-27 09:06:21 -07:00
Lucas Prates
4265553ed8 [CodeGen] Properly propagating Calling Convention information when lowering vector arguments
When joining the legal parts of vector arguments into their original value
during the lowering of formal arguments in SelectionDAGBuilder, the Calling
Convention information was not being propagated for the handling of each
individual part. The same omission did not occur when lowering calls, causing
a mismatch.

This patch fixes the issue by properly propagating the Calling
Convention details.

This fixes Bugzilla #47001.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D86715
2020-08-27 17:01:10 +01:00
Teresa Johnson
bce40acc54 [HeapProf] Clang and LLVM support for heap profiling instrumentation
See RFC for background:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html

Note that the runtime changes will be sent separately (hopefully this
week, need to add some tests).

This patch includes the LLVM pass to instrument memory accesses with
either inline sequences to increment the access count in the shadow
location, or alternatively to call into the runtime. It also changes
calls to memset/memcpy/memmove to the equivalent runtime version.
The pass is modeled on the address sanitizer pass.

The clang changes add the driver option to invoke the new pass, and to
link with the upcoming heap profiling runtime libraries.

Currently there is no attempt to optimize the instrumentation, e.g. to
aggregate updates to the same memory allocation. That will be
implemented as follow-on work.

Differential Revision: https://reviews.llvm.org/D85948
2020-08-27 08:50:35 -07:00
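A purely illustrative sketch of the "inline sequence" variant, with a made-up shadow mapping and invented names (the real mapping and runtime interface are defined by D85948 and the separate runtime patches):
```
define void @store_with_profiling(i64* %p, i64 %v) {
  ; hypothetical shadow computation: one 8-byte counter per 64-byte granule
  %addr    = ptrtoint i64* %p to i64
  %granule = lshr i64 %addr, 6
  %slot    = shl i64 %granule, 3
  %shaddr  = add i64 %slot, 140737488355328    ; invented shadow base
  %shptr   = inttoptr i64 %shaddr to i64*
  ; increment the access count in the shadow location
  %cnt     = load i64, i64* %shptr
  %cnt.inc = add i64 %cnt, 1
  store i64 %cnt.inc, i64* %shptr
  ; the original access
  store i64 %v, i64* %p
  ret void
}
```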
Roman Lebedev
2088bfe3c4 [InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block
Apparently, we don't do this in EarlyCSE, InstSimplify,
or (old) GVN, but we do in NewGVN and SimplifyCFG of all places.

While I could teach EarlyCSE how to hash PHI nodes,
we can't really do much (anything?) even if we find two identical
PHI nodes in different basic blocks; the same-BB case is the interesting one,
and if we teach InstSimplify about it (which is what I wanted originally,
https://reviews.llvm.org/D86530), we get EarlyCSE support for free.

So I would think this is pretty uncontroversial.

On vanilla llvm test-suite + RawSpeed, this has the following effects:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instsimplify.NumPHICSE                             | 0         | 23779     |  23779 |    0.00% |    0.00% |
| asm-printer.EmittedInsts                           | 7942328   | 7942392   |     64 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 273069192 | 273084704 |  15512 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18539     |    127 |    0.69% |    0.69% |
| early-cse.NumCSE                                   | 2183283   | 2183227   |    -56 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 542090    |  -8015 |   -1.46% |    1.46% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640264   | 3664769   |  24505 |    0.67% |    0.67% |
| instcombine.NumDeadInst                            | 1778193   | 1783183   |   4990 |    0.28% |    0.28% |
| instcount.NumCallInst                              | 1758401   | 1758799   |    398 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330533    |    -24 |   -0.01% |    0.01% |
| instcount.TotalInsts                               | 8831952   | 8832286   |    334 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019808   | 999607    | -20201 |   -1.98% |    1.98% |
```
I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call`
transforms, and counter-intuitively results in *more* instructions total.

That being said, the PHI count doesn't decrease that much,
and looking at some examples, it seems at least some of them
were previously getting PHI CSE'd in SimplifyCFG of all places.

I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time.
As a comment in `InstCombinerImpl::visitPHINode()` already stated,
there are no guarantees on the ordering of the operands of a PHI node,
so if we just naively compare them, we may false-negatively say that
the nodes are not equal when the only difference is operand order,
which is especially important since the fold is in InstSimplify,
so we can't rely on InstCombine sorting them beforehand.

Fixing this for the general case is costly (geomean +0.02%),
and does not appear to catch anything in test-suite, but for
the same-BB case, it's trivial, so let's fix at least that.

As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions
this appears to cause geomean +0.03% compile time increase (regression),
but geomean -0.01%..-0.04% code size decrease (improvement).
2020-08-27 18:47:04 +03:00
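A small same-BB sketch of what now gets CSE'd (block and value names are invented): the second PHI has the same incoming values as the first, just listed in a different operand order, and can be replaced by it:
```
merge:                                          ; preds = %then, %else
  %p1 = phi i32 [ %x, %then ], [ %y, %else ]
  %p2 = phi i32 [ %y, %else ], [ %x, %then ]    ; identical to %p1, operands reordered
  ; uses of %p2 can be rewritten to use %p1
```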
Roman Lebedev
46ec32bc09 [NFC][EarlyCSE][InstSimplify] Add tests for CSE of PHI nodes
PHI nodes depend on the block they're in,
so we can only deal with the most basic case of same-BB PHIs.
2020-08-27 18:47:03 +03:00
Russell Gallop
16726ba1da [Test] Tidy up loose ends from LLVM_HAS_GLOBAL_ISEL
This hasn't been allowed as a build option since r309990

Remove leftover REQUIRES: global-isel

Differential Revision: https://reviews.llvm.org/D86714
2020-08-27 16:36:27 +01:00
Alexandre Ganea
1c64f56c35 [Support] On Windows, add optional support for {rpmalloc|snmalloc|mimalloc}
This patch optionally replaces the CRT allocator (i.e., malloc and free) with rpmalloc (mixed public domain licence/MIT licence) or snmalloc (MIT licence) or mimalloc (MIT licence). Please note that the source code for these allocators must be available outside of LLVM's tree.

To enable, use `cmake ... -DLLVM_INTEGRATED_CRT_ALLOC=D:/git/rpmalloc -DLLVM_USE_CRT_RELEASE=MT` where `D:/git/rpmalloc` has already been git clone'd from `https://github.com/mjansson/rpmalloc`. The same applies to snmalloc and mimalloc.

When enabled, the allocator will be embedded (statically linked) into the LLVM tools & libraries. This currently only works with the static CRT (/MT), although using the dynamic CRT (/MD) could potentially work as well in the future.

When enabled, this changes the memory stack from:
  new/delete -> MS VC++ CRT malloc/free -> HeapAlloc -> VirtualAlloc
to:
  new/delete -> {rpmalloc|snmalloc|mimalloc} -> VirtualAlloc

The goal of this patch is to bypass the application's global heap - which is thread-safe and thus induces locking - and instead take advantage of a modern lock-free, thread-caching allocator. On a 6-core Xeon Skylake we observe a 2.5x decrease in execution time when linking a large scale application with LLD and ThinLTO (12 min 20 sec -> 5 min 34 sec), when all hardware threads are being used (using LLD's flag /opt:lldltojobs=all). On a dual 36-core Xeon Skylake with all hardware threads used, we observe a 24x decrease in execution time (1 h 2 min -> 2 min 38 sec) when linking a large application with LLD and ThinLTO. Clang build times also see a decrease in the range of 5-10% depending on the configuration.

Differential Revision: https://reviews.llvm.org/D71786
2020-08-27 11:09:46 -04:00
diggerlin
5d038d06f5 Revert "[AIX][XCOFF] emit symbol visibility for xcoff object file."
This reverts commit a0818689213234d5a078641432d10eccccf61a13.

Based on Hubert Tong's comment https://reviews.llvm.org/D84265#inline-799085
2020-08-27 11:07:58 -04:00
Benjamin Kramer
1af5e41f38 [Hexagon] Fold another layer of single-use variable into assert. NFCI. 2020-08-27 16:52:34 +02:00
Benjamin Kramer
f7ce567fa8 [Hexagon] Fold single-use variable into assert. NFCI. 2020-08-27 16:44:22 +02:00
Matt Arsenault
6d03e1f153 AMDGPU: Hoist subtarget lookup 2020-08-27 10:27:56 -04:00
Krzysztof Parzyszek
fd8ba7e648 [Hexagon] Widen short vector stores to HVX vectors using masked stores
Also invent a flag -hexagon-hvx-widen=N to set the minimum threshold
for widening short vectors to HVX vectors.
2020-08-27 09:25:08 -05:00
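A hedged IR-level sketch of the widening idea, using illustrative (non-HVX-sized) vector widths and invented names; real HVX vectors are 64 or 128 bytes wide:
```
; Original short store:
store <4 x i8> %v, <4 x i8>* %p, align 1

; Widened form: pad the value, then store with only the original lanes enabled.
%wide = shufflevector <4 x i8> %v, <4 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
%wp = bitcast <4 x i8>* %p to <8 x i8>*
call void @llvm.masked.store.v8i8.p0v8i8(<8 x i8> %wide, <8 x i8>* %wp, i32 1, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false>)
```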
Florian Hahn
cdab0a3e82 [SimplifyLibCalls] Remove over-eager early return in strlen optzns.
Currently we bail out early for strlen calls with a GEP operand if none
of the GEP-specific optimizations fire. But there could be later
optimizations that still apply, which we currently miss out on.

An example is that we do not apply the following optimization
   strlen(x) == 0 --> *x == 0

Unless I am missing something, there seems to be no reason for bailing
out early there.

Fixes PR47149.

Reviewed By: lebedev.ri, xbolva00

Differential Revision: https://reviews.llvm.org/D85886
2020-08-27 15:19:45 +01:00
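A minimal IR sketch of that fold, using typed pointers as in the IR of the time (value names are invented):
```
; strlen(x) == 0
%len = call i64 @strlen(i8* %x)
%iszero = icmp eq i64 %len, 0

; simplifies to *x == 0
%ch = load i8, i8* %x
%iszero.2 = icmp eq i8 %ch, 0
```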
Pavel Labath
e258c2d84b [cmake] Make gtest include directories a part of the library interface
This applies the same fix that D84748 did for macro definitions.
Appropriate include path is now automatically set for all libraries
which link against gtest targets, which avoids the need to set
include_directories in various parts of the project.

Differential Revision: https://reviews.llvm.org/D86616
2020-08-27 15:35:57 +02:00
serge-sans-paille
65b0561260 Fix OpenMP deduplicateRuntimeCalls return status
Differential Revision: https://reviews.llvm.org/D86705
2020-08-27 15:01:04 +02:00
serge-sans-paille
7d0233352d Fix Attributor return status
Differential Revision: https://reviews.llvm.org/D86703
2020-08-27 15:01:04 +02:00
Jay Foad
a697cc2007 [AMDGPU] Remove unused variable introduced in r251860 2020-08-27 13:28:32 +01:00
Drew Wock
2a5c21f1b1 [FPEnv] Allow fneg + strict_fadd -> strict_fsub in DAGCombiner
This is the first of a set of DAGCombiner changes enabling strictfp
optimizations. I want to test the waters with this to make sure changes
like these are acceptable for the strictfp case - this particular change
should preserve exception ordering and result precision perfectly, and
many other possible changes appear to be able to as well.

Copied from regular fadd combines but modified to preserve ordering via
the chain, this change allows strict_fadd x, (fneg y) to become
strict_fsub x, y and strict_fadd (fneg x), y to become strict_fsub y, x.

Differential Revision: https://reviews.llvm.org/D85548
2020-08-27 08:17:01 -04:00
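An illustrative constrained-intrinsic (IR-level) view of the pattern; the combine itself runs on the corresponding strict_fadd/strict_fsub DAG nodes:
```
; strict_fadd x, (fneg y) ...
%negy = fneg float %y
%r = call float @llvm.experimental.constrained.fadd.f32(float %x, float %negy, metadata !"round.dynamic", metadata !"fpexcept.strict")

; ... becomes strict_fsub x, y, preserving exception ordering via the chain
%r2 = call float @llvm.experimental.constrained.fsub.f32(float %x, float %y, metadata !"round.dynamic", metadata !"fpexcept.strict")
```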
Russell Gallop
c9d8b1aa6b Fix for PS4 bots after 0b7f6cc71a72a85f8a0cbee836a7a8e31876951a 2020-08-27 12:47:26 +01:00
Florian Hahn
c351843892 [DSE,MemorySSA] Remove short-cut to check if all paths are covered.
The early continue based on the post-order number does not work in some cases, e.g.
if a path from EarlierAccess to an exit includes a node that dominates
EarlierAccess in a cycle.

The short-cut has only a very minor impact on compile time, so it seems
straightforward to remove it for now:

http://llvm-compile-time-tracker.com/compare.php?from=062412e79fcfedf2cf004433e42036b0333e3f83&to=d7386016a77ce1387bdbbf360f1de157faea9d31&stat=instructions

Fixes PR47285.
2020-08-27 12:42:40 +01:00
OCHyams
5a27fa8d7e Revert "[DWARF] Add cuttoff guarding quadratic validThroughout behaviour"
This reverts commit b9d977b0ca60c54f11615ca9d144c9f08b29fd85.

This cutoff is no longer required. The commit 34ffa7fc501 (D86153) introduces a
performance improvement which was tested against the motivating case for this
patch.

Discussed in differential revision: https://reviews.llvm.org/D86153
2020-08-27 11:52:30 +01:00
OCHyams
3e2347db56 [DwarfDebug] Improve validThroughout performance (4/4)
Almost NFC (see end).

The backwards scan in validThroughout significantly contributed to compile time
for a pathological case, causing the 'X86 Assembly Printer' pass to account for
roughly 70% of the run time. This patch guards the loop against running
unnecessarily, bringing the pass contribution down to 4%.

Almost NFC: There is a hack in validThroughout which promotes single constant
value DBG_VALUEs in the prologue to be live throughout the function. We're more
likely to hit this code path with this patch applied. Similarly to the parent
patches there is a small coverage change reported in the order of 10s of bytes.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D86153
2020-08-27 11:52:30 +01:00
OCHyams
4d679a7eae [DwarfDebug] Improve multi-BB single location detection in validThroughout (3/4)
With the changes introduced in D86151 we can now check for single locations
which span multiple blocks for inlined scopes and blocks.

D86151 introduced the InstructionOrdering parameter, replacing a scan through
MBB instructions. The functionality to compare instruction positions across
blocks was added there, and this patch just removes the exit checks that were
previously (but no longer) required.

CTMark shows a geomean binary size reduction of 2.2% for RelWithDebInfo builds.
llvm-locstats (using D85636) shows a very small variable location coverage
change in 5 of 10 binaries, but just like in D86151 it is only in the order of
10s of bytes.

Reviewed By: djtodoro

Differential Revision: https://reviews.llvm.org/D86152
2020-08-27 11:52:29 +01:00
OCHyams
463930d1cb [DwarfDebug] Improve single location detection in validThroughout (2/4)
With this patch we're now accounting for two more cases which should be
considered 'valid throughout': First, where RangeEnd is ScopeEnd. Second, where
RangeEnd comes before ScopeEnd when including meta instructions, but both are
preceded by the same non-meta instruction.

CTMark shows a geomean binary size reduction of 1.5% for RelWithDebInfo builds.
`llvm-locstats` (using D85636) shows a very small variable location coverage
change in 2 of 10 binaries, but it is in the order of 10s of bytes which lines
up with my expectations.

I've added a test which checks both of these new cases. The first check in the
test isn't strictly necessary for this patch. But I'm not sure that it is
explicitly tested anywhere else, and is useful for the final patch in the
series.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D86151
2020-08-27 11:52:29 +01:00
OCHyams
adfe449a17 [NFC][DebugInfo] Create InstructionOrdering helper class (1/4)
Group the map and methods used to query instruction ordering for trimVarLocs
(D82129) into a class. This will make it easier to reuse the functionality
in upcoming patches.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D86150
2020-08-27 11:52:29 +01:00
Florian Hahn
40dcb7eec6 [DSE,MemorySSA] Add test for PR47285. 2020-08-27 11:27:06 +01:00
Vitaly Buka
c66ff34562 [NFC][ValueTracking] Cleanup a test 2020-08-27 03:25:10 -07:00
Mikhail Maltsev
63df72fb38 [AArch64] Optimize instruction selection for certain vector shuffles
This patch adds code to recognize vector shuffles which can be
represented as a VDUP (splat) of a vector lane of a different
(wider) type than the original vector lane type.

For example:
    shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
is essentially:
    shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 0, i32 0>

Such patterns are generated by the SelectionDAG machinery in some cases
(see DAGCombiner::visitBITCAST in DAGCombiner.cpp, the "Remove double
bitcasts from shuffles" part).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D86225
2020-08-27 11:06:49 +01:00
Vitaly Buka
368c7592ca [NFC][ValueTracking] Fix typo in test 2020-08-27 03:01:30 -07:00
Paul Walker
a71aa114da [SVE] Fall back to default expansion when lowering SIGN_EXTEND_INREG from non-byte based source.
Differential Revision: https://reviews.llvm.org/D86394
2020-08-27 10:57:37 +01:00
Sander de Smalen
33c91e76d2 [AArch64][SVE] Add missing debug info for ACLE types.
This patch adds type information for SVE ACLE vector types,
by describing them as vectors, with a lower bound of 0, and
an upper bound described by a DWARF expression using the
AArch64 Vector Granule register (VG), which contains the
runtime multiple of 64-bit granules in an SVE vector.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86101
2020-08-27 10:56:42 +01:00
Sjoerd Meijer
7ef96b82c5 Follow up of rGca243b07276a: fixed a typo. NFC. 2020-08-27 10:53:41 +01:00
Alex Richardson
1928b323b7 [RISC-V] fmv.s/fmv.d should be as cheap as a move
Since the canonical floating-point move is fsgnj rd, rs, rs, we should
handle this case in RISCVInstrInfo::isAsCheapAsAMove().

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D86518
2020-08-27 10:32:23 +01:00
Alex Richardson
aa5340b34a [RISC-V] Mark C_MV as a move instruction
Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D86517
2020-08-27 10:32:23 +01:00