mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00
Commit Graph

218470 Commits

Simon Pilgrim
e0937686e0 [InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183) (REAPPLIED)
As discussed on PR50183, we already fold to prefer 'select-of-idx' vs 'select-of-gep':

define <4 x i32>* @select0a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %gep0 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %gep1 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a3
  %sel = select i1 %a2, <4 x i32>* %gep0, <4 x i32>* %gep1
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %sel = select i1 %a2, i64 %a1, i64 %a3
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

This patch adds basic handling for the 'fallthrough' cases where the gep idx == 0 has been folded away to the base address:

define <4 x i32>* @select0(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %sel = select i1 %a2, <4 x i32>* %a0, <4 x i32>* %gep
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %sel = select i1 %a2, i64 0, i64 %a1
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

Reapplied with a fix for the bpf "-bpf-disable-avoid-speculation" tests

Differential Revision: https://reviews.llvm.org/D105901
2021-07-14 12:21:01 +01:00
Chuanqi Xu
6ee3360389 [NFC] [Coroutines] Remove unused CoroFree 2021-07-14 19:13:12 +08:00
Simon Pilgrim
32c30ddee7 [X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to i32/i64 and vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantics of the CVTTPS2SI instruction.
We know that "CVTTPS2SI" returns 0x80000000 for out-of-range inputs (and that for FP_TO_UINT, negative float values are undefined). We can use this to make unsigned conversions from vXf32 to vXi32 more efficient, particularly on targets without blend instructions, using the following logic:

small := CVTTPS2SI(x);
fp_to_ui(x) := small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31))

Even on targets where "PBLENDVPS"/"PBLENDVB" exist, they are often latency-2, low-throughput instructions, so this logic is applied there too (in particular for AVX2). It furthermore gets rid of one high-latency floating-point comparison in the previous lowering.

@TomHender checked the correctness of this for all possible floats between -1 and 2^32 (both ends excluded).
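
For illustration only (not part of the patch), here is a minimal scalar C sketch of the same identity; cvttps2si() below is a hand-written stand-in for the hardware instruction's behaviour, not a real intrinsic:

#include <stdint.h>

/* Stand-in for CVTTPS2SI: truncating convert that returns 0x80000000
   (INT32_MIN) for NaN and out-of-range inputs. */
static int32_t cvttps2si(float x) {
  if (!(x >= -2147483648.0f && x < 2147483648.0f))
    return INT32_MIN;
  return (int32_t)x;
}

/* fp_to_ui(x) := small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31)).
   Assumes >> on a negative int32_t is an arithmetic shift, as it is on the
   targets this lowering concerns. */
uint32_t fp_to_ui(float x) {
  int32_t small = cvttps2si(x);
  int32_t big = cvttps2si(x - 2147483648.0f);
  return (uint32_t)(small | (big & (small >> 31)));
}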

Original Patch by @TomHender (Tom Hender)

Differential Revision: https://reviews.llvm.org/D89697
2021-07-14 12:03:49 +01:00
LLVM GN Syncbot
b00ffab852 [gn build] Port c08dabb0f476 2021-07-14 10:49:08 +00:00
Simon Pilgrim
9be15018bd Revert rGb803294cf78714303db2d3647291a2308347ef23 : "[InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183)"
Missed some BPF test changes that need addressing
2021-07-14 11:48:37 +01:00
Nico Weber
0eb0133948 [gn build] (manually) merge 462d4de35b0c 2021-07-14 06:43:23 -04:00
Stefan Pintilie
d3442926d3 [NFC][PowerPC] Added test to check register allocation for ACC registers
ACC registers are a combination of 4 consecutive vector registers and therefore
sometimes require special treatment for register allocation. This patch only
adds a test.
2021-07-14 05:22:24 -05:00
Stephen Tozer
6422696166 [DebugInfo] Correctly update dbg.values with duplicated location ops
This patch fixes code that incorrectly handled dbg.values with duplicate
location operands, i.e. !DIArgList(i32 %a, i32 %a). The errors in
question were caused by either applying an update to dbg.value multiple
times when the update is only valid once, or by updating the
DIExpression for only the first instance of a value that appears
multiple times.

Differential Revision: https://reviews.llvm.org/D105831
2021-07-14 11:17:24 +01:00
Simon Pilgrim
417d5c9876 [InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183)
As discussed on PR50183, we already fold to prefer 'select-of-idx' vs 'select-of-gep':

define <4 x i32>* @select0a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %gep0 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %gep1 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a3
  %sel = select i1 %a2, <4 x i32>* %gep0, <4 x i32>* %gep1
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %sel = select i1 %a2, i64 %a1, i64 %a3
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

This patch adds basic handling for the 'fallthrough' cases where the gep idx == 0 has been folded away to the base address:

define <4 x i32>* @select0(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %sel = select i1 %a2, <4 x i32>* %a0, <4 x i32>* %gep
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %sel = select i1 %a2, i64 0, i64 %a1
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

Differential Revision: https://reviews.llvm.org/D105901
2021-07-14 10:49:12 +01:00
Fraser Cormack
0a5760dcd6 [RISCV] Fix the neutral element in vector 'fadd' reductions
Using positive zero as the neutral element in 'fadd' reductions, while
it generates better code, is incorrect. The correct neutral element is
negative zero: 0.0 + -0.0 = 0.0, whereas -0.0 + -0.0 = -0.0.
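
As a tiny, illustrative C reproducer of why the seed value matters (not from the patch itself):

#include <math.h>
#include <stdio.h>

int main(void) {
  double v[2] = {-0.0, -0.0};
  double acc_pos = 0.0;  /* positive zero seed: loses the sign */
  double acc_neg = -0.0; /* negative zero seed: preserves it */
  for (int i = 0; i < 2; ++i) {
    acc_pos += v[i];
    acc_neg += v[i];
  }
  /* acc_pos ends up +0.0 while acc_neg ends up -0.0. */
  printf("%g (signbit %d)\n", acc_pos, signbit(acc_pos) != 0);
  printf("%g (signbit %d)\n", acc_neg, signbit(acc_neg) != 0);
  return 0;
}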

There are perhaps more optimal lowerings of negative zero avoiding
constant-pool loads which could be left as future work.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D105902
2021-07-14 10:18:38 +01:00
Sebastian Neubauer
811bfa448e [AMDGPU] Init scratch only if necessary
If no scratch or flat instructions are used, we do not need to
initialize the flat scratch hardware register.

Differential Revision: https://reviews.llvm.org/D105920
2021-07-14 10:45:22 +02:00
Sebastian Neubauer
5a48184106 [AMDGPU] Precommit flat-scratch-init.ll test 2021-07-14 10:45:22 +02:00
Cullen Rhodes
fcd9253fa0 [AArch64][SME] Add matrix register definitions and parsing support
SME introduces the ZA array, a new piece of architectural register state
consisting of a matrix of [SVLb x SVLb] bytes, where SVL is the
implementation-defined Streaming SVE vector length and SVLb is the
number of 8-bit elements in a vector of SVL bits.

SME instructions operate on three types of matrix operands:

  * Tiles: a ZA tile is a square, two-dimensional sub-array of elements
  within the ZA array. These tiles make up the larger accumulator array
  and the granularity varies based on the element size, i.e.
    - ZAQ0..ZAQ15 (smallest tile granule)
    - ZAD0..ZAD7
    - ZAS0..ZAS3
    - ZAH0..ZAH1
    or ZAB0       (largest tile granule, single tile)
  * Tile vectors: similar to regular tiles, but have an extra 'h' or 'v'
  to tell how the vector at [reg+offset] is laid out in the tile,
  horizontally or vertically. E.g. za1h.h or za15v.q, which correspond
  to vectors in registers ZAH1 and ZAQ15, respectively.
  * Accumulator matrix: this is the entire accumulator array ZA.

This patch adds the register classes and related operands and parsing
for SME instructions operating on the accumulator array.

The ADDHA and ADDVA instructions which operate on tiles are also added
in this patch to make some use of the code added; later patches will
make use of the other operands introduced here.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Co-authored by: Sander de Smalen (@sdesmalen)

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105570
2021-07-14 08:25:49 +00:00
Ruiling Song
09d83ebb79 [AMDGPU] Don't handle export done when unify exit nodes
This patch aims to revert the changes introduced by D70781, D71192 and D76364.

D70781 was introduced to fix a hardware hang where we did not insert exp-
null-done for a kill inside an infinite loop. At that time we had not added
exp-null-done for kill early termination, but as of now we will
always add the exp-null-done for the early termination case in SILateBranchLowering.

D71192 was introduced to handle the only_kill case, which has also been
handled by the kill early termination work.

D76364 was used to fix a regression caused by D71192, where we cleared the done
bit of the export in the existing program and did not let the normal return
block branch to the new unified return block.

With this change, we just trust that frontends have set up exp-done correctly,
which is true for all existing frontends. The backend only inserts
exp-null-done for the kill cases, which are handled in SILateBranchLowering.cpp.

Reviewed by: critson

Differential Revision: https://reviews.llvm.org/D105610
2021-07-14 14:54:37 +08:00
Ruiling Song
d76956bb01 [NFC][AMDGPU] autogenerate kill-infinite-loop.ll checks
This would help us to track the assembly changes to these tests.

Reviewed by: foad

Differential Revision: https://reviews.llvm.org/D105609
2021-07-14 14:52:31 +08:00
Ruiling Song
5d215901cd [RegisterCoalescer] Resolve conflict based on liveness of subregister
Currently we resolve lane/subregister conflicts by visiting
instructions sequentially in the current block to see whether there is any
use of the tainted lanes. To save compile time, we do not do further
checks in successor blocks. This sounds reasonable without subregister liveness.

But since we have added subregister liveness tracking capability to the
register coalescer, we can easily determine whether we have a subregister
liveness conflict by checking subranges. This helps coalesce more
COPYs for targets that enable subregister liveness tracking.

Reviewed by: arsenm, qcolombet

Differential Revision: https://reviews.llvm.org/D104509
2021-07-14 14:43:22 +08:00
Yuichi Yoshida
a1f5d65f19 Reformulate OrcJIT tutorial doc to make it more clear.
Fixed a minor writing error. The text was hard to understand.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D105899
2021-07-14 05:48:01 +00:00
Jinsong Ji
74491befb9 [AIX] Update testcase to use aix triple
Now that we have implemented the basic MCAsmParser, we can use the triple directly.
2021-07-14 03:32:37 +00:00
Hongtao Yu
d565cac9a2 [CSSPGO][llvm-profgen] Fix a missing initialization
Fixing a missing initialization that was accidentally caused by https://reviews.llvm.org/D103178.
2021-07-13 19:49:55 -07:00
Hongtao Yu
886d278938 Revert "[CSSPGO][llvm-profgen] Fix a missing initialization"
This reverts commit fef5f4456abcb1ea052206db6c232468d70b07f2.
2021-07-13 19:48:58 -07:00
Hongtao Yu
66715cb6e7 [CSSPGO][llvm-profgen] Fix a missing initialization
Fixing a missing initialization that was accidentally caused by https://reviews.llvm.org/D103178.
2021-07-13 19:46:18 -07:00
Shilei Tian
7cc27af3c6 [AbstractAttributor] Fold function calls to __kmpc_is_spmd_exec_mode if possible
In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode`
to query the execution mode of the current kernel. In many cases, user programs
only contain target regions executing in one mode. As a consequence, those runtime
function calls will only return one value. If we can get rid of these function
calls during compilation, it can potentially improve performance.
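
As an illustrative example (not from the patch), a user program along these lines has only a single SPMD-style target region, so any execution-mode query inside the device runtime can only ever observe one answer and becomes a candidate for folding:

#include <stdio.h>

int main(void) {
  double sum = 0.0;
  /* Combined construct, typically compiled in SPMD mode; the device runtime
     queries the execution mode internally, and for a program like this the
     answer is always the same. */
  #pragma omp target teams distribute parallel for reduction(+:sum) map(tofrom:sum)
  for (int i = 0; i < 1024; ++i)
    sum += i;
  printf("%f\n", sum);
  return 0;
}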

In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for
each kernel (device) function `F`, we collect all kernel entries `K` that can
reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each
iteration, it will check all reaching kernel entries, and update the folded value
accordingly.

In the future we will support more functions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105787
2021-07-13 22:28:35 -04:00
Philip Reames
dbffa1f255 [SCEV] Handle zero stride correctly in howManyLessThans
This is split from D105216, but the code is hoisted much earlier into
the path where we can actually get a zero stride flowing through. Some
fairly simple proofs handle the cases which show up in practice. The
only test changes are the cases where we really do need a non-zero
divider to produce the right result.

Recommitting with isLoopInvariant() check.

Differential Revision: https://reviews.llvm.org/D105921
2021-07-13 19:14:01 -07:00
Hongtao Yu
56987d8f22 [NFC][CSSPGO] Rename an enum value. 2021-07-13 18:30:16 -07:00
Hongtao Yu
5474d8b64f [CSSPGO] Do not import pseudo probe desc in thinLTO
Previously we relied on pseudo probe descriptors to look up precomputed GUIDs during probe emission for inlined probes. Since we are moving to always using unique linkage names, GUIDs for functions can be computed in place from DWARF names. This eliminates the need to import pseudo probe descs in ThinLTO, since those descs should be emitted by the original modules.

This significantly reduces ThinLTO memory footprint in some extreme cases where the number of imported modules for a single module is massive.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D105248
2021-07-13 18:26:36 -07:00
Hongtao Yu
0eee65899b [CSSPGO][llvm-profgen] Allow multiple executable load segments.
The linker or post-link optimizer can create an ELF image with multiple executable segments, each of which will be loaded separately at run time. This breaks the assumption of llvm-profgen, which currently only supports one base load address. What it ends up with is that subsequent mmap events will be treated as an overwrite of the first mmap event, which will in turn screw up address mapping. While it is non-trivial to support multiple separate load addresses, and given that on x64 those segments will always be loaded at consecutive addresses (though via separate mmap sys calls), I'm adding error-checking logic to bail out if that assumption is violated and to keep using a single load address, which is the address of the first executable segment.

Also changing the disassembly output from printing section offset to printing the virtual address instead, which matches the behavior of objdump.

Differential Revision: https://reviews.llvm.org/D103178
2021-07-13 18:22:24 -07:00
Jon Roelofs
f09d547d01 [AArch64] rm unused subregs 2021-07-13 18:06:31 -07:00
Jon Roelofs
940b1930e5 [AArch64] Fix AArch64::dsub's size 2021-07-13 18:06:31 -07:00
Arthur Eubanks
222304d82f Revert "[SCEV] Handle zero stride correctly in howManyLessThans"
This reverts commit 4df591b5c960affd1612e330d0c9cd3076c18053.

Causes crashes, see comments on D105921.
2021-07-13 17:53:48 -07:00
Jessica Paquette
9f6c8f9557 [AArch64][GlobalISel] Mark v2s64 -> v2p0 G_INTTOPTR as legal
Allow

```
%x:_<2 x p0> = G_INTTOPTR %y:_<2 x s64>
```

This shows up when building clang for AArch64 with GlobalISel.

Also show that we can select it.

This should match SDAG's behaviour: https://godbolt.org/z/33oqYoaYv

Differential Revision: https://reviews.llvm.org/D105944
2021-07-13 17:28:14 -07:00
Matt Arsenault
ae4bdf805a AMDGPU: Try to fix test failure with EXPENSIVE_CHECKS
The machine verifier is enabled by default for EXPENSIVE_CHECKS, so
its pass runs would pollute the output here.
2021-07-13 19:34:14 -04:00
Arthur Eubanks
78dcef41e1 [NewPM][SimpleLoopUnswitch] Add option to not trivially unswitch
To help with debugging non-trivial unswitching issues.

Don't care about the legacy pass; nobody is using it.

If a pass's string params are empty (e.g. "simple-loop-unswitch"), don't
default to the empty constructor for the pass params. We should still
let the parser take care of it in case the parser has its own defaults.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D105933
2021-07-13 16:09:42 -07:00
Matt Arsenault
74be1319be RegAlloc: Allow targets to split register allocation
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes were handled at the same time, this was problematic: we don't
know ahead of time how many registers will need to be reserved to
handle the spilling. If no VGPRs were left for spilling, we would have
to try to spill to memory. If the spilled SGPRs were required for exec
mask manipulation, it is highly problematic because the lanes active
at the point of spill are not necessarily the same as at the restore
point.

Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run.  This fixes the most serious
issues, but it is still possible using inline asm to make all VGPRs
unavailable. Start erroring in the case where we ever would require
memory for an SGPR spill.

This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.

In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.

One disadvantage of this is currently StackSlotColoring is no longer
used for SGPR spills. It would need to be run again, which will
require more work.

Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQP is not currently supported, so this also
prevents using the unhandled allocator.
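
A hypothetical invocation using the new flags (the flag names come from this patch; the allocator values shown are only illustrative) might look like:

  llc -mtriple=amdgcn-- -sgpr-regalloc=greedy -vgpr-regalloc=fast foo.ll -o foo.s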
2021-07-13 18:49:29 -04:00
Eli Friedman
60fc4afb9a [ScalarEvolution] Make isKnownNonZero handle more cases.
Using an unsigned range instead of signed ranges is a bit more precise.

Differential Revision: https://reviews.llvm.org/D105941
2021-07-13 15:36:45 -07:00
Victor Huang
6d07c374d0 [PowerPC] Add PowerPC compare and multiply related builtins and intrinsics for XL compatibility
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch adds the builtins and intrinsics for compare
and multiply related operations.

Reviewed By: nemanjai, #powerpc

Differential revision: https://reviews.llvm.org/D102875
2021-07-13 16:55:09 -05:00
Derek Schuff
ca2e9aa30c [WebAssembly] Run varargs codegen test with non-emscripten triple
This is a followup from D105749 to cover both triples in the case
where they differ.
2021-07-13 14:31:19 -07:00
Alexander Yermolovich
8ba7df9183 [LLD] Adding support for RELA for CG Profile.
This is a follow-up to https://reviews.llvm.org/D104080 and ca3bdb57fa (diff-e64a48fabe31db213a631fdc5f2acb51bdddf3f16a8fb2928784f4c579229585). The implementation of the call graph profile was changed from a black-box section to a relocation-based approach. This was done to be compatible with post-processing tools like strip/objcopy and their llvm equivalents. When they are invoked on an object file before the final linking step, this new approach preserves the correctness of the symbol indices.

The GNU binutils tools, unlike the llvm tools, change the REL section to a RELA section, for example when strip -S is run on ELF object files as an intermediate step before linking. To preserve compatibility, this patch extends the implementation in LLD and ELFDumper to support both REL and RELA sections for the call graph profile.

Reviewed By: MaskRay, jhenderson

Differential Revision: https://reviews.llvm.org/D105217
2021-07-13 13:56:30 -07:00
Philip Reames
50475a84ee [SCEV] Handle zero stride correctly in howManyLessThans
This is split from D105216, but the code is hoisted much earlier into the path where we can actually get a zero stride flowing through. Some fairly simple proofs handle the cases which show up in practice. The only test changes are the cases where we really do need a non-zero divider to produce the right result.

Differential Revision: https://reviews.llvm.org/D105921
2021-07-13 13:31:40 -07:00
Vedant Kumar
eca523aea7 [docs/llvm-cov] Document -compilation-dir
Document the `-compilation-dir` option added in D100232.

Differential Revision: https://reviews.llvm.org/D105826
2021-07-13 13:10:02 -07:00
Eli Friedman
240a6f4cf4 [NFC] Use CHECK-LABEL in trip-count-unknown-stride.ll 2021-07-13 12:21:13 -07:00
Eli Friedman
309b52e532 [LoopReroll] Add an extra defensive check to avoid SCEV assertion.
Make sure getMinusSCEV() didn't return a pointer.  The following check
would never succeed if it was a pointer, anyway, but calling
getMulExpr() on a pointer SCEV now asserts.
2021-07-13 12:17:09 -07:00
Nico Weber
fa89c8064f [gn build] (manually) port 303ddb60a2d2 2021-07-13 15:15:38 -04:00
Philip Reames
b8173e8f1b [tests] Precommit a test case from D105216 2021-07-13 12:02:44 -07:00
Philip Reames
3fbf51e263 [SCEV] Strengthen inference of RHS > Start in howManyLessThans
Split off from D105216 to simplify review.  Rewritten with a lambda to be easier to follow.  Comments clarified.

Sorry for the lack of a test case; this is tricky to exercise with the current structure of the code. It's about to be hit more frequently in a follow-up patch, and the change itself is simple.
2021-07-13 11:54:07 -07:00
Jon Roelofs
657f0ebfbd [Tests] Fix test broken by: 43c7ca8e4963 [AArch64][GlobalISel] Legalize store <2 x i16> 2021-07-13 11:36:36 -07:00
Krishna Kariya
f29381cd87 [InstCombine] Precommit tests for D105088 (NFC)
Add tests for D105088, as well as an option to disable the
(generally) unsound inttoptr of ptrtoint optimization.

Differential Revision: https://reviews.llvm.org/D105771
2021-07-13 20:35:04 +02:00
Thomas Lively
a15b5073d0 [WebAssembly] Generate checks for simd-load-store-alignment.ll
This will make it easier to update these tests as we add support for generating
more SIMD loads and stores with custom alignments.

Differential Revision: https://reviews.llvm.org/D105862
2021-07-13 11:25:33 -07:00
Victor Huang
9de7900a54 [PowerPC][NFC] Power ISA features for Semachecking
[NFC] This patch adds features for pwr7, pwr8, and pwr9 that can be
used for sema checking of builtin functions that are only valid for certain
versions of PPC.

Reviewed By: nemanjai, #powerpc
Authored By: Quinn Pham <Quinn.Pham@ibm.com>

Differential revision: https://reviews.llvm.org/D105501
2021-07-13 13:13:34 -05:00
Victor Huang
d46c7e167b Revert "[PowerPC][NFC] Power ISA features for Semachecking"
This reverts commit 10e0cdfc6526578c8892d895c0448e77cb9ba876.
2021-07-13 13:13:34 -05:00
Jon Roelofs
8feee53f22 [AArch64][GlobalISel] Legalize load <2 x i16>
Differential revision: https://reviews.llvm.org/D105913
2021-07-13 11:12:05 -07:00