mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 02:52:53 +02:00
216220 Commits

Author SHA1 Message Date
Simon Pilgrim
4fa97550f8 [CostModel][X86] Improve accuracy of vXi64 MUL costs on AVX2/AVX512 targets
By llvm-mca analysis, Haswell/Broadwell has the worst v4i64 recip-throughput cost of the AVX2 targets at 6 (vs the currently used cost of 8). Similarly, SkylakeServer (our only AVX512 target model) implements PMULLQ with an average cost of 1.5 (rounded up to 2.0), and the PMULUDQ-sequence (without AVX512DQ) at a cost of 6.
2021-05-24 09:48:32 +01:00
Chen Zheng
7ccc4dd201 [Debug-Info]update section name to match AIX behaviour; nfc 2021-05-24 04:33:41 -04:00
Florian Hahn
930085daca [VectorCombine] Scalarize vector load/extract.
This patch adds a new combine that tries to scalarize chains of
`extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is
profitable when extracting only a few elements out of a large vector.

At the moment, `store (extractelement (load %ptr), %idx), %ptr`
operations on large vectors result in huge code in the backend.

This can easily be triggered by using the matrix extension, e.g.
https://clang.godbolt.org/z/qsccPdPf4

This should complement D98240.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D100273
2021-05-24 09:29:08 +01:00
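
The equivalence behind this combine can be illustrated with a toy model. This is a sketch in Python rather than LLVM IR; `memory`, `load_vector`, `extract_then_load`, and `scalarized_load` are hypothetical names for illustration and are not LLVM APIs. The model assumes the index is in bounds for the vector.

```python
# Toy model: memory is a flat list of scalars; a "pointer" is an index into it.
memory = list(range(100, 164))  # 64 scalar slots


def load_vector(ptr, width):
    """Model of a vector load: reads `width` consecutive scalars."""
    return memory[ptr:ptr + width]


def extract_then_load(ptr, width, idx):
    """Original form: extractelement (load %ptr), %idx."""
    return load_vector(ptr, width)[idx]


def scalarized_load(ptr, idx):
    """Scalarized form: load (gep %ptr, %idx), touching one scalar only."""
    return memory[ptr + idx]
```

For any in-bounds `idx`, both forms return the same scalar, but the second reads one element instead of the whole vector.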
Johannes Doerfert
e8185bdd21 [Attributor] Introduce a helper to deal with constant type mismatches
If we simplify values we sometimes end up with type mismatches. If the
value is a constant we can often cast it though to still allow
propagation. The logic is now put into a helper and it replaces some
ad hoc things we did before.

This also introduces the AA namespace for abstract attribute related
functions and types.
2021-05-23 23:00:40 -05:00
Johannes Doerfert
637e90a523 [Attributor] Teach AAIsDead about undef values
We can delay AAIsDead exploration not only if the branch or switch
condition is dead but also if it is assumed `undef`.
2021-05-23 23:00:40 -05:00
Johannes Doerfert
8fbdaf26cb [Attributor] Deal with address spaces gracefully
When we do value propagation we need to cast address spaces properly.
2021-05-23 23:00:39 -05:00
Johannes Doerfert
3c31dd18ce [Attributor] Be more careful to not disturb the CG outside the SCC
We have seen various problems when the call graph was not updated or
the update did not succeed because it involved functions outside the
SCC. This patch adds assertions and checks to avoid accidentally
changing something outside the SCC that would impact the call graph.
It also prevents us from reanalyzing functions outside the current
SCC which could cause problems on its own. Note that the transformations
we do might cause the CG to be "more precise" but the original one would
always be a super set of the most precise one. Since the call graph is
by nature an approximation, it is good enough to have a super set of all
call edges.
2021-05-23 23:00:39 -05:00
Johannes Doerfert
70d6e43525 [Attributor][FIX] Account for undef in the constant value lattice
The constant value lattice looks like this

```
  <None>
     |
  <undef>
  /  |   \
... <0>  ...
 \   |   /
 <unknown>
```
We did not account for the undef and assumed a value meant we could not
change anymore. Now we actually check if we have the same value as
before, which will signal CHANGED to the users when we go from undef to
a specific constant.

This fixes, among other things, the bug exposed by @ipccp4 in
`value-simplify.ll`.
2021-05-23 20:47:06 -05:00
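
The lattice in this commit message can be sketched as follows. This is a minimal Python model, not the Attributor's implementation; `Kind`, `LatticeValue`, and `merge` are hypothetical names. It shows why comparing the result against the previous value matters: moving from undef to a specific constant is a change.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    NONE = 0     # top: nothing known yet
    UNDEF = 1    # assumed undef; may still become any constant
    CONST = 2    # one specific constant
    UNKNOWN = 3  # bottom: no single constant


@dataclass(frozen=True)
class LatticeValue:
    kind: Kind = Kind.NONE
    const: object = None


def merge(old, new):
    """Move down the lattice; returns (result, changed)."""
    if old.kind in (Kind.NONE, Kind.UNDEF) and new.kind.value >= old.kind.value:
        result = new  # NONE/UNDEF give way to anything at or below them
    elif new.kind in (Kind.NONE, Kind.UNDEF):
        result = old  # new information is weaker than what we have
    elif old.kind == Kind.CONST and new == old:
        result = old  # same constant seen again
    else:
        result = LatticeValue(Kind.UNKNOWN)  # conflicting constants
    # Compare against the previous value instead of assuming "has a value"
    # means "cannot change anymore".
    return result, result != old
```

In this model, `merge(LatticeValue(Kind.UNDEF), LatticeValue(Kind.CONST, 0))` reports a change, while merging the same constant twice does not.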
Johannes Doerfert
0473f9be3c [Attributor][FIX] Ensure we replace undef if we see the first "real" value
The state of AAPotentialValues tracks if undef is contained. It should
fold undef into the first non-undef value. However we missed a case
before. There was also a shadowing definition of two variables that
caused trouble. The test exposes both problems.
2021-05-23 20:47:06 -05:00
Johannes Doerfert
42ec472f5c [Attributor][NFC] Precommit test case with branch on undef
This test exposes a bug in the module pass as it simplifies ipccp4 to
unreachable, which is unfortunately wrong.
2021-05-23 20:47:06 -05:00
Johannes Doerfert
ced56b844d [Attributor][NFC] Add helpful debug outputs 2021-05-23 20:47:05 -05:00
Johannes Doerfert
06fcc8738f [Attributor][NFC] Clang format the Attributor source files 2021-05-23 20:47:05 -05:00
Johannes Doerfert
27d2bed19b [Attributor][NFC] Rerun update_test_checks script on Attributor tests 2021-05-23 20:47:05 -05:00
Fady Ghanim
34efa8f465 [NFC] Removing leftover debug code
Removing a missed value::dump() used to debug during development of
OMPBuilder atomic.
2021-05-23 19:13:33 -04:00
Fangrui Song
7d66664724 [AArch64] Delete unneeded fixup_aarch64_ldr_pcrel_imm19 VK_GOT special case
An AArch64 VK_GOT fixup must have a symbol. MCAssembler::evaluateFixup considers
such a fixup not resolved. The code path cannot trigger.
2021-05-23 15:20:56 -07:00
Fady Ghanim
bb0b21b662 [OpenMP][OMPIRBuilder]Adding support for omp atomic
This patch adds support for generating `omp atomic` for all different
atomic clauses
2021-05-23 17:44:09 -04:00
Philipp Krones
df7a8b162e [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo
This makes it possible for targets to define their own MCObjectFileInfo.
This MCObjectFileInfo is then used to determine things like section alignment.

This is a follow up to D101462 and prepares for the RISCV backend defining the
text section alignment depending on the enabled extensions.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D101921
2021-05-23 14:15:23 -07:00
Nikita Popov
130d03950d [LoopUnroll] Add test for partial unrolling against non-latch exit (NFC)
This test case would get miscompiled by the current version of
D102982, because unrolling does not respect the PreserveCondBr
flag for partial unrolling.
2021-05-23 23:10:23 +02:00
Fangrui Song
68c6e43bf6 [AArch64][MC] Remove unneeded "in .xxx directive" from diagnostics
The prevailing style does not add the message. The directive name is not useful
because the next line replicates the error line which includes the directive.
2021-05-23 13:58:16 -07:00
Joerg Sonnenberger
4ecac84bb1 [SPARC] recognize the "rd %pc, reg" special form
Differential Revision: https://reviews.llvm.org/D96312
2021-05-23 22:52:59 +02:00
Sander de Smalen
849ddeeded NFC: cleaned up and renamed scalable-vf-analysis.ll -> scalable-vectorization.ll
* Removes unnecessary loop hints.
* Use RUN line with '-scalable-vectorization=preferred' instead of 'on'
  for the maximize-bandwidth behaviour. This prepares the test for enabling
  scalable vectorization; with a forced instruction-cost of 1, 'on' will
  always favour fixed-width VF to be chosen, whereas with 'preferred'
  we can check that the maximize-bandwidth option in combination with
  scalable-vectorization=preferred actually picks a scalable VF.
* Renamed to scalable-vectorization.ll, because a follow-up patch will
  test more than just analysis.
2021-05-23 19:53:51 +01:00
Roman Lebedev
9f647330f9 [NFC][X86][Costmodel] Add tests with masked loads/stores w/non-power-of-two vectors 2021-05-23 21:45:36 +03:00
Fangrui Song
653c782076 [AArch64] Use \t in AsmStreamer to match the prevailing style 2021-05-23 11:35:42 -07:00
Simon Pilgrim
da358a290c Fix bugs URL for PR relocations
The PR works from llvm.org, not bugs.llvm.org
2021-05-23 17:19:36 +01:00
Simon Pilgrim
f8e2239648 [CostModel][X86] Align v2i64 MUL costs on SSE42+ targets with worst case
Based on worst case of sandybridge (which seems to match nehalem for this SSE sequence) (vs btver2 + bdver2) llvm-mca analysis
2021-05-23 16:20:57 +01:00
Nico Weber
9cfe1a59a7 [gn build] (semi-manually) port 0bccdf82f705 2021-05-23 10:01:06 -04:00
Sanjay Patel
645f84951e [InstSimplify] add more tests for rem-mul-div; NFC
See D102864 for discussion.
2021-05-23 09:46:29 -04:00
maekawatoshiki
0614e76510 [LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D99149
2021-05-23 22:32:01 +09:00
Nikita Popov
8df1db8535 [LoopUnroll] Add test for unrollable non-latch multi-exit (NFC)
This test case requires unrolling against a non-latch exit in
a multiple-exit loop with exiting latch. It's not covered by
exiting heuristics or the extension in D102635.
2021-05-23 10:51:45 +02:00
David Green
77bb6ccfb3 [ARM] Add extra debug messages for gather/scatter lowering. NFC 2021-05-23 08:52:13 +01:00
Martin Storsjö
a30052855f [Windows] Use TerminateProcess to exit without running destructors
If exiting using _Exit or ExitProcess, DLLs are still unloaded
cleanly before exiting, running destructors and other cleanup in those
DLLs. When the caller expects to exit without cleanup, running
destructors in some loaded DLLs (which can be either libLLVM.dll or
e.g. libc++.dll) can cause deadlocks occasionally.

This is an alternative to D102684.

Differential Revision: https://reviews.llvm.org/D102944
2021-05-22 23:41:40 +03:00
Martin Storsjö
68aa000dc9 [MinGW] Mark a number of library functions unavailable for mingw targets
These functions were marked unavailable for MSVC targets before,
within an "T.isOSWindows() && !T.isOSCygMing()" block, but these ones
are unavailable on MinGW targets too.

This avoids generating calls to stpcpy for MinGW targets, which has
been happening since 6dbf0cfcf789365493f70ae69df8a7a59be41c75 (in
some cases).

This fixes https://github.com/mstorsjo/llvm-mingw/issues/201.

Differential Revision: https://reviews.llvm.org/D102946
2021-05-22 23:40:19 +03:00
Simon Pilgrim
f6dc684c0c [CostModel][X86] Align v4i64 MUL costs on AVX1 targets with worst case
Based on worst case of sandybridge (vs btver2 + bdver2) llvm-mca analysis - which is a lot less than what we were predicting (I think based off total uop count).
2021-05-22 20:07:55 +01:00
Nikita Popov
66053da05c [IR] Optimize no-op removal from AttributeList (NFC)
When removing an AttrBuilder from an index of an AttributeList,
directly return the original list if no attributes were actually
removed.
2021-05-22 19:03:27 +02:00
Nikita Popov
632878de15 [IR] Optimize no-op removal from AttributeSet (NFC)
When removing an AttrBuilder from an AttributeSet, first check
whether there is any overlap. If nothing is being removed, we can
directly return the original set.
2021-05-22 18:55:25 +02:00
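
The early-return idea in the two commits above can be sketched in a few lines. This is a Python illustration using `frozenset` as a stand-in for an immutable attribute set; `remove_attributes` is a hypothetical helper, not the LLVM `AttributeSet` API.

```python
def remove_attributes(attrs: frozenset, to_remove: frozenset) -> frozenset:
    """Return `attrs` without `to_remove`, reusing `attrs` when nothing overlaps."""
    if attrs.isdisjoint(to_remove):
        return attrs  # no-op removal: hand back the original object unchanged
    return attrs - to_remove  # real removal: build a new set
```

When nothing overlaps, the caller gets the identical object back, so no new set has to be constructed.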
Simon Pilgrim
0bdfd66523 [CostModel][X86] Pull out X86/X64 scalar int arithmetic costs from SSE tables. NFCI.
These aren't dependent on any SSE level (and don't tend to get quicker either).
2021-05-22 16:13:49 +01:00
Lang Hames
da54af9961 [ORC] Add more synchronization to TestLookupWithUnthreadedMaterialization.
Don't run tasks until their corresponding thread has been added to the running
threads vector. This is an extension to fda4300da82, which doesn't seem to have
been enough to fix the synchronization issues on its own.
2021-05-22 07:59:24 -07:00
Lang Hames
635b337d6d [JITLink] Move some Block bitfields into Addressable to improve packing.
Moving these bitfields from Block to Addressable allows them to be packed with
the bitfields at the end of Addressable, reducing the size of Block by eight
bytes.
2021-05-22 07:59:24 -07:00
Yaxun (Sam) Liu
80152597c5 [HIP] support ThinLTO
Add options -[no-]offload-lto and -foffload-lto=[thin,full] for controlling
LTO for offload compilation. Allow LTO for AMDGPU target.

AMDGPU target does not support codegen of object files containing
call of external functions, therefore the LLVM module passed to
AMDGPU backend needs to contain definitions of all the callees.
An LLVM option is added to allow function importer to import
functions with noinline attribute.

HIP toolchain passes proper LLVM options to lld to make sure
function importer imports definitions of all the callees.

Reviewed by: Teresa Johnson, Artem Belevich

Differential Revision: https://reviews.llvm.org/D99683
2021-05-22 10:48:34 -04:00
Nikita Popov
8c07a78a96 Reapply [InstCombine] Fold multiuse shr eq zero
This was reverted due to performance regressions in ARM benchmarks,
which have since been addressed by D101196 (SCEV analysis improvement)
and D101778 (CGP reverse transform).

-----

The single-use case is handled implicitly by converting the icmp
into a mask check first. When comparing with zero in particular,
we don't need the one-use restriction, as we only produce a single
icmp.

https://alive2.llvm.org/ce/z/MSixcm
https://alive2.llvm.org/ce/z/GwpG0M
2021-05-22 14:46:50 +02:00
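
The equivalences this fold relies on can be checked exhaustively on a small bit width. This is a Python sketch for a logical shift right on 8-bit values; the function names are hypothetical, and the commit message does not state which exact form InstCombine emits, so the single-compare form shown here is an assumption based on the standard identity.

```python
BITS = 8
MASK = (1 << BITS) - 1


def shr_eq_zero(x, c):
    """Original form: icmp eq (lshr x, c), 0."""
    return (x >> c) == 0


def mask_check(x, c):
    """Mask form: icmp eq (and x, -1 << c), 0, truncated to BITS bits."""
    return (x & ((MASK << c) & MASK)) == 0


def ult_pow2(x, c):
    """Single-compare form (assumed): icmp ult x, (1 << c)."""
    return x < (1 << c)


def all_forms_agree():
    """Exhaustively compare the three forms for every 8-bit x and shift c."""
    return all(
        shr_eq_zero(x, c) == mask_check(x, c) == ult_pow2(x, c)
        for x in range(1 << BITS)
        for c in range(BITS)
    )
```

Because the comparison against zero needs only one `icmp` either way, the shift can keep its other users without the fold adding instructions.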
David Green
aca1951bd7 [ARM] Clean up some tests, removing dead instructions. NFC 2021-05-22 13:38:00 +01:00
Simon Pilgrim
99610346a2 [CostModel][X86] vXi8 MUL is always promoted to vXi16 2021-05-22 11:56:49 +01:00
Florian Hahn
2e9363c644 [Matrix] Bail out early if there are no matrix intrinsics.
If there are no matrix intrinsics in a function, we can directly bail
out, as there's nothing left to do.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D102931
2021-05-22 11:37:25 +01:00
Simon Pilgrim
4e6ce44315 [CostModel][X86] Add test coverage for sub-64bit vXi8 multiplication costs
These can be cheaply promoted to a single v8i16 vector for multiplication
2021-05-22 11:33:36 +01:00
Simon Pilgrim
1482151467 [CostModel][X86] Improve v8i32 MUL costs on AVX1 targets to account for slower btver2
BTVER2 has a 2 cycle throughput for v4i32 multiplies (same as SSE41 targets), which is only partially hidden by the subvector extracts/insert when splitting v8i32.
2021-05-22 11:13:07 +01:00
Tomasz Miąsko
f7d35d6801 [Demangle][Rust] Parse function signatures
Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102581
2021-05-22 11:49:08 +02:00
Tomasz Miąsko
c8ca4bcf8b [Demangle][Rust] Parse references
Reviewed By: dblaikie

Part of https://reviews.llvm.org/D102580
2021-05-22 11:49:08 +02:00
Tomasz Miąsko
563ebde96d [Demangle][Rust] Parse raw pointers
Reviewed By: dblaikie

Part of https://reviews.llvm.org/D102580
2021-05-22 11:49:08 +02:00
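
As a rough illustration of what parsing references and raw pointers involves, here is a Python sketch of the type tags as I understand them from the Rust v0 symbol mangling scheme (RFC 2603). It is not LLVM's demangler: `demangle_type` is a hypothetical helper, and lifetimes, paths, function signatures, and back-references are deliberately left out.

```python
# Single-letter basic types from the v0 mangling grammar (subset).
BASIC_TYPES = {
    "a": "i8", "b": "bool", "c": "char", "d": "f64", "e": "str",
    "f": "f32", "h": "u8", "i": "isize", "j": "usize", "l": "i32",
    "m": "u32", "n": "i128", "o": "u128", "s": "i16", "t": "u16",
    "u": "()", "x": "i64", "y": "u64", "z": "!",
}
# Reference and raw-pointer tags, each followed by the pointee type.
PREFIXES = {"R": "&", "Q": "&mut ", "P": "*const ", "O": "*mut "}


def demangle_type(mangled: str) -> str:
    """Demangle a chain of reference/pointer tags ending in a basic type."""
    out = []
    for pos, tag in enumerate(mangled):
        if tag in PREFIXES:
            out.append(PREFIXES[tag])
        elif tag in BASIC_TYPES:
            if pos != len(mangled) - 1:
                raise ValueError("trailing characters after basic type")
            return "".join(out) + BASIC_TYPES[tag]
        else:
            raise ValueError(f"unsupported tag {tag!r}")
    raise ValueError("missing basic type")
```

For example, under these assumptions `demangle_type("QPh")` yields `&mut *const u8`.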
Nikita Popov
76bc4c8b9d [CVP] Add test for PR50399 (NFC) 2021-05-22 11:21:34 +02:00
Roman Lebedev
325fca859f Reland [X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()
Now that getMemoryOpCost() correctly handles all the vector variants,
we should no longer hand-roll our own version of it, but use it directly.

The AVX512 variant probably needs a similar change,
but there it is less obvious.

This was initially landed in 69ed93a4355123a45c1d7216aea7cd53d07a361b,
but was reverted in 6b95fd199d96e3ba5c28a23b17b74203522bdaa8
because the patch it depends on was reverted.
2021-05-22 11:47:08 +03:00