1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 02:52:53 +02:00
Commit Graph

191565 Commits

Author SHA1 Message Date
Matt Arsenault
fabdf2e6da GlobalISel: Fix narrowScalar for G_{CTLZ|CTTZ}_ZERO_UNDEF
Narrow these for 64-bit VALU for AMDGPU.
2020-02-09 19:02:38 -05:00
Matt Arsenault
4dfb5b8a70 AMDGPU/GlobalISel: Split 64-bit G_CTPOP in RegBankSelect 2020-02-09 18:39:33 -05:00
Matt Arsenault
a025afb406 GlobalISel: Fix narrowing of G_CTLZ/G_CTTZ
The result type is separate from the source type.
2020-02-09 18:11:43 -05:00
Matt Arsenault
49e5ee334c AMDGPU/GlobalISel: Don't mis-select vector index on a constant
Vector indexing with a constant index should be folded out in the
legalizer, but this was accidentally falling through. This would
produce the indexing operation with $noreg. Handle this case as a
dynamic index just in case a bug like this happens again in the
future.
2020-02-09 18:02:37 -05:00
Matt Arsenault
11aefea8e8 AMDGPU/GlobalISel: Look through casts when legalizing vector indexing
We were failing to find constants that were casted. I feel like the
artifact combiner should have folded the constant in the trunc before
the custom lowering, but that doesn't happen.
2020-02-09 18:02:10 -05:00
Matt Arsenault
7f0a98ae35 AMDGPU: Remove dead kill handling
At one point a custom node was used for kill handling, but now the
intrinsic is directly selected. Remove leftover pattern machinery.
2020-02-09 17:59:24 -05:00
Matt Arsenault
780e89dd58 AMDGPU: Fix SI_IF lowering when the save exec reg has terminator uses
Reverts part of 6524a7a2b9ca072bd7f7b4355d1230e70c679d2f. Since that
commit, the expansion was ignoring the actual save exec register
produced by the instruction, and looking at other instructions. I do
not understand why it was looking at other instructions, but relying
on this scan was wrong.

Fixes verifier errors after SI_IF is tail duplicated, which should be
correct to do. The results were fed into a phi, which was lowered to
the S_MOV_B64_term instructions.
2020-02-09 17:59:19 -05:00
Simon Pilgrim
4d7c42b93d [X86] combineConcatVectorOps - combine VROTLI/VROTRI ops
Fix issue mentioned on rGe82e17d4d4ca - non-AVX512BW targets failed to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
2020-02-09 21:50:10 +00:00
Craig Topper
d20adfcf42 [X86] Use custom isel for (X86sbb_flag 0, 0) so we can use 32-bit SBB for i8/i16.
We were using MOV32r0 and an extract_subreg as an input. By using
custom isel we can move the extract_subreg to after the SBB instead
of on the input.
2020-02-09 13:19:35 -08:00
Craig Topper
11d30da96a [X86] Add flag result VT to a MOV32r0 created in X86DAGToDAGISel::Select
The flag isn't used, but I believe this matches the MOV32r0 that
would be created by the table emitter. This should allow this node
to be CSEed with any others created by the table.
2020-02-09 13:19:21 -08:00
Simon Pilgrim
1f239dd6be [X86] Add lowerShuffleAsBitRotate (PR44379)
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.

This patch lowers to uniform ISD:ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway.

There might be cases where targets without ISD:ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.

Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
2020-02-09 21:15:03 +00:00
Craig Topper
ac35a8a35e [X86] Use MVT::i32 for the type of a MOV32r0 created in X86DAGToDAGISel::Select.
Not sure if this really matters. The VT isn't really used after
this point. At best it might affect CSE.
2020-02-09 11:57:42 -08:00
Craig Topper
c35ae80199 [X86] Remove isel patterns that include a vselect/X86selects and a strict FP node.
A vselect+strictfp node is not equivalent to a masked operation.
The exceptions of the strictfp node are not masked by a vselect
after it so we can't match it to a masked operation.

We already had a hack in IsLegalToFold to prevent these patterns from
matching. This patch removes that hack and removes the patterns.
2020-02-09 11:45:54 -08:00
Simon Pilgrim
574e93990b [X86][XOP] Add XOP target to vXi16/vXi8 shuffle tests
Helps with bit rotation test coverage for PR44379
2020-02-09 18:35:51 +00:00
Simon Pilgrim
ddaedac538 [X86][SSE] Add more tests showing failure to lower shuffles as bit rotations 2020-02-09 18:35:51 +00:00
Simon Pilgrim
9fac692aaa [X86] Rename matchShuffleAsRotate - matchShuffleAsByteRotate. NFCI.
A matchShuffleAsBitRotate variant will be added soon and we need to make the difference more obvious.
2020-02-09 18:35:50 +00:00
LLVM GN Syncbot
286adde9cb [gn build] Port a17f03bd939 2020-02-09 15:41:05 +00:00
Sanjay Patel
35167a94b8 [VectorCombine] new IR transform pass for partial vector ops
We have several bug reports that could be characterized as "reducing scalarization",
and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector
pass that runs before/after other vectorization passes.

There are 4 alternate options that I can think of to deal with this kind of problem
(and we've seen various attempts at all of these), but they all have flaws:

    InstCombine - can't happen without TTI, but we don't want target-specific
                  folds there.
    SDAG - too late to assist other vectorization passes; TLI is not equipped
           for these kind of cost queries; limited to a single basic block.
    CGP - too late to assist other vectorization passes; would need to re-implement
          basic cleanups like CSE/instcombine.
    SLP - doesn't fit with existing transforms; limited to a single basic block.

This initial patch/transform is based on existing code in AggressiveInstCombine:
we walk backwards through the function looking for a pattern match. But we diverge
from that cost-independent IR canonicalization pass by using TTI to decide if the
vector alternative is profitable.

We probably have at least 10 similar bug reports/patterns (binops, constants,
inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements.
It's possible that we could iterate on a worklist to fix-point like InstCombine does,
but it's safer to start with a most basic case and evolve from there, so I didn't
try to do anything fancy with this initial implementation.

Differential Revision: https://reviews.llvm.org/D73480
2020-02-09 10:04:41 -05:00
Simon Pilgrim
c912faf386 Fix signed/unsigned warning. 2020-02-09 13:35:03 +00:00
Simon Pilgrim
d1b038d961 [X86] Recognise ROTLI/ROTRI rotations as faux shuffles
Allows us to combine rotations with shuffles.

One of many things necessary to fix PR44379 (lowering shuffles to rotations)
2020-02-09 12:25:49 +00:00
Ehud Katz
f6fd87fc2b [LoopExtractor] Convert LoopExtractor from LoopPass to ModulePass
The LoopExtractor created new functions (by definition), which violates
the restrictions of a LoopPass.
The correct implementation of this pass should be as a ModulePass.
Includes reverting rL82990 implications on the LoopExtractor.

Fixes PR3082 and PR8929.

Differential Revision: https://reviews.llvm.org/D69069
2020-02-09 12:25:21 +02:00
Ayman Musa
ab4b452855 [AggressiveInstCombine] Add test with baseline CHECKs for aggressive inst combine for SELECT. 2020-02-09 12:07:25 +02:00
serge_sans_paille
35ac7e7659 Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

This a recommit of 39f50da2a357a8f685b3540246c5d762734e035f with proper LiveIn
declaration, better option handling and more portable testing.

Differential Revision: https://reviews.llvm.org/D68720
2020-02-09 10:42:45 +01:00
serge-sans-paille
85fea1133e Revert "Support -fstack-clash-protection for x86"
This reverts commit 0fd51a4554f5f4f90342f40afd35b077f6d88213.

Failures:

http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/4354
2020-02-09 10:06:31 +01:00
serge_sans_paille
9120fd7c93 Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

This a recommit of 39f50da2a357a8f685b3540246c5d762734e035f with proper LiveIn
declaration, better option handling and more portable testing.

Differential Revision: https://reviews.llvm.org/D68720
2020-02-09 09:35:42 +01:00
Craig Topper
5ed35d6e72 [X86] Add more scalar intrinsic instructions to isNonFoldablePartialRegisterLoad.
I think this covers most if not all of the scalar intrinsic
instructions.
2020-02-08 20:41:36 -08:00
Johannes Doerfert
9354843292 [Attributor] Add an Attributor CGSCC pass and run it
In addition to the module pass, this patch introduces a CGSCC pass that
runs the Attributor on a strongly connected component of the call graph
(both old and new PM). The Attributor was always design to be used on a
subset of functions which makes this patch mostly mechanical.

The one change is that we give up `norecurse` deduction in the module
pass in favor of doing it during the CGSCC pass. This makes the
interfaces simpler but can be revisited if needed.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D70767
2020-02-08 21:27:34 -06:00
Fangrui Song
4104d396cb Fix -Wunused-lambda-capture for -DLLVM_ENABLE_ASSERTIONS=off builds after 6556c615f3c3aae8af876806777065961ae20024 2020-02-08 19:03:58 -08:00
Johannes Doerfert
d90ff4aa0e [FIX] Ordering problem accidentally introduced with D72304 2020-02-08 20:14:41 -06:00
Craig Topper
a065cf8a6f [X86] Add the recently added (V)CVTSS2SI/CVTSD2SI instructions used for LRINT/LLRINT to the load folding tables. 2020-02-08 17:54:48 -08:00
Johannes Doerfert
9686ad56e1 [FIX] Fix warning in LazyCallGraphTest caused by D70927 2020-02-08 18:58:16 -06:00
fady
1e5209e528 [OpenMP][OMPIRBuilder] Add Directives (master and critical) to OMPBuilder.
Add support for Master and Critical directive in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions called `emitInlinedRegion` which was added in this patch as well.

Also this patch modifies clang to use the new directives when  `-fopenmp-enable-irbuilder` commandline option is passed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D72304
2020-02-08 18:55:48 -06:00
Johannes Doerfert
681051992b [OpenMP][Opt] Delete terminating and read-only parallel regions
Parallel regions known to be read-only, e.g., after we removed all dead
write accesses, and terminating (`willreturn`) can be removed.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D69954
2020-02-08 18:52:04 -06:00
Johannes Doerfert
dd0a5d8390 [OpenMP][Opt] Annotate known runtime functions and deduplicate more
This adds ~27 more runtime calls to the OpenMPKinds.def file, all with
attributes. We deduplicate 16 of those automatically in function =
thread scope. And we annotate all of them automatically during the
OpenMPOpt discovery step. A test with all omp_XXXX runtime calls to
track annotation coverage is included.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D69984
2020-02-08 18:35:39 -06:00
Nico Weber
2d6e96ab96 [gn build] (manually) port 72277ecd62e and the LLVMBuild bit of 9548b74a83 2020-02-08 19:01:55 -05:00
Craig Topper
3a411551c2 [X86] Use any_fadd/sub/mul/div/sqrt with the AVX512 scalar_*_patterns.
Making sure not to use them with patterns for masked instructions.

Also fix FMA patterns that were matching strict_fma+x86selects to
masked instructions.
2020-02-08 15:54:40 -08:00
Fangrui Song
c6af7f4d3a [gn build] Add OpenMPOpt.cpp to LLVMipo after D69930/9548b74a831e 2020-02-08 14:18:43 -08:00
Simon Pilgrim
be66870296 Fix test name typo 2020-02-08 21:28:46 +00:00
Nikita Popov
99ce33a9ed [InstCombine] Refactor foldICmpAndShift(); NFCI
Separate out handling for shl, lshr and ashr. The combined handling
obscured some overly pessimistic requirements for the transform.
2020-02-08 22:27:43 +01:00
Johannes Doerfert
d3c9ee991a [FIX] Update PM tests after D69930 landed 2020-02-08 15:22:40 -06:00
Simon Pilgrim
e58baea86e [X86][SSE] Add test cases from PR44379 2020-02-08 21:03:03 +00:00
Simon Pilgrim
3d26f85dd1 [X86] Test showing inability to combine ROTLI/ROTRI rotations into shuffles
One of many things necessary to fix PR44379 (lowering shuffles to rotations)
2020-02-08 21:03:02 +00:00
Johannes Doerfert
96971b934b [OpenMP] Introduce the OpenMPOpt transformation pass
The OpenMPOpt pass is a CGSCC pass in which OpenMP specific
optimizations can reside.

The OpenMPOpt pass uses the OpenMPKinds.def file to identify runtime
calls and their uses. This allows targeted transformations and eases
their implementation.

This initial patch deduplicates `__kmpc_global_thread_num` and
`omp_get_thread_num` calls. We can also identify arguments that are
equivalent to such a call result and use it instead. Later we can
determine "gtid" arguments based on the use in kernel functions etc.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D69930
2020-02-08 14:47:03 -06:00
Johannes Doerfert
1b1628ecca Introduce a CallGraph updater helper class
The CallGraphUpdater is a helper that simplifies the process of updating
the call graph, both old and new style, while running an CGSCC pass.

The uses are contained in different commits, e.g. D70767.

More functionality is added as we need it.

Reviewed By: modocache, hfinkel

Differential Revision: https://reviews.llvm.org/D70927
2020-02-08 14:16:48 -06:00
George Burgess IV
3ad731fcef [SimplifyLibCalls] Add __strlen_chk.
Bionic has had `__strlen_chk` for a while. Optimizing that into a
constant is quite profitable, when possible.

Differential Revision: https://reviews.llvm.org/D74079
2020-02-08 11:51:00 -08:00
Nikita Popov
0a6ffdfab7 [InstCombine] Fix infinite min/max canonicalization loop (PR44541)
While D72944 also fixes https://bugs.llvm.org/show_bug.cgi?id=44541,
it does so in a more roundabout manner and there might be other
loopholes to trigger the same issue. This is a more direct fix,
that prevents the transform if the min/max is based on a
non-canonical sub X, 0 instruction.

Differential Revision: https://reviews.llvm.org/D73849
2020-02-08 20:42:17 +01:00
Craig Topper
079ee50b5e [LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results.
Remove code from LegalizeTypes that allowed this to work.

We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
2020-02-08 09:52:31 -08:00
Simon Pilgrim
62c0dbfd59 [X86] X86InstComments - add FMA4 comments
These typically match the FMA3 equivalents, although the multiply operands sometimes get flipped due to the FMA3 permute variants.
2020-02-08 17:02:00 +00:00
Simon Pilgrim
8ef6412f67 [X86] Standardize BROADCAST enum names (PR31079)
Tweak EVEX implementation names so it matches the other variants by adding the 'r' prefix. Oddly some of the subvec broadcast ops already matched.
2020-02-08 16:55:00 +00:00
Nikita Popov
230f67c40c [InstCombine] Remove unnecessary worklist push; NFCI
This is no longer needed after d4627b90a0462c90a834c2f7b9c9228b3ec7a45b,
should have dropped it there...
2020-02-08 17:09:28 +01:00