1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

10777 Commits

Author SHA1 Message Date
Michael Liao
2c215416cc [DAGCombine] Generalize the case (add (or x, c1), c2) -> (add x, (c1 + c2))
Reviewers: arsenm

Subscribers: sdardis, wdng, hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, ecnelises, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81708
2020-06-12 13:53:08 -04:00
Simon Pilgrim
558c76c79d [DAG] foldAddSubOfSignBit - add support for non-uniform vector constants 2020-06-12 14:58:15 +01:00
David Sherwood
4d25f3947a [CodeGen] Let computeKnownBits do something sensible for scalable vectors
Until we have a real need for computing known bits for scalable
vectors I have simply changed the code to bail out for now and
pretend we know nothing. I've also fixed up some simple callers of
computeKnownBits too.

Differential Revision: https://reviews.llvm.org/D80437
2020-06-11 08:17:11 +01:00
Sanjay Patel
2d91437b89 [DAGCombiner] allow more folding of fadd + fmul into fma
If fmul and fadd are separated by an fma, we can fold them together
to save an instruction:
fadd (fma A, B, (fmul C, D)), N1 --> fma(A, B, fma(C, D, N1))

The fold implemented here is actually a specialization - we should
be able to peek through >1 fma to find this pattern. That's another
patch if we want to try that enhancement though.

This transform was guarded by the TLI hook enableAggressiveFMAFusion(),
so it was done for some in-tree targets like PowerPC, but not AArch64
or x86. The hook is protecting against forming a potentially more
expensive computation when fma takes longer to execute than a single
fadd. That hook may be needed for other transforms, but in this case,
we are replacing fmul+fadd with fma, and the fma should never take
longer than the 2 individual instructions.

'contract' FMF is all we need to allow this transform. That flag
corresponds to -ffp-contract=fast in Clang, so we are allowed to form
fma ops freely across expressions.

Differential Revision: https://reviews.llvm.org/D80801
2020-06-09 10:41:27 -04:00
Guillaume Chatelet
9bd39147ee Revert "[Alignment][NFC] Migrate TargetLowering::allowsMemoryAccess"
This reverts commit f21c52667ed147903015a94643b0057319189d4e.
2020-06-09 10:43:59 +00:00
Simon Wallis
031a325c42 [ARM] prologue instructions emitted for naked function with >64 byte argument
Summary:

The naked function attribute is meant to suppress all function
prologue/epilogue instructions.

On ARM, some are still emitted if an argument greater than 64 bytes in size
(the threshold for using the byval attribute in IR) is passed partially
in registers.

Perform the check for Attribute::Naked and early exit in
SelectionDAGISel::LowerArguments().

Checking in ARMFrameLowering::determineCalleeSaves() is too late.

A test case is included.

Reviewers: llvm-commits, olista01, danielkiss

Reviewed By: danielkiss

Subscribers: kristof.beyls, hiraditya, danielkiss

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80715

Change-Id: Icedecf2a4ad31bc3c35ab0df7489a9d346e1f7cc
2020-06-09 11:33:03 +01:00
Guillaume Chatelet
574b28f02f [Alignment][NFC] Migrate TargetLowering::allowsMemoryAccess
Summary:
Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMemoryAccess` without marking it override.

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81379
2020-06-09 10:11:07 +00:00
Guillaume Chatelet
5893bc59fd Fix unused variable warning 2020-06-09 08:56:05 +00:00
David Sherwood
0bf1bb94f0 [CodeGen] Ensure callers of CreateStackTemporary use sensible alignments
In two instances of CreateStackTemporary we are sometimes promoting
alignments beyond the stack alignment. I have introduced a new function
called getReducedAlign that will return the alignment for the broken
down parts of illegal vector types. For example, on NEON a <32 x i8>
type is made up of two <16 x i8> types - in this case the sensible
alignment is 16 bytes, not 32.

In the legalization code wherever we create stack temporaries I have
started using the reduced alignments instead for illegal vector types.

I added a test to

  CodeGen/AArch64/build-one-lane.ll

that tries to insert an element into an illegal fixed vector type
that involves creating a temporary stack object.

Differential Revision: https://reviews.llvm.org/D80370
2020-06-09 08:10:17 +01:00
Christopher Tetreault
c4c014f213 [SVE] Eliminate calls to default-false VectorType::get() from CodeGen
Reviewers: efriedma, c-rhodes, david-arm, spatel, craig.topper, aqjune, paquette, arsenm, gchatelet

Reviewed By: spatel, gchatelet

Subscribers: wdng, tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80313
2020-06-08 10:26:10 -07:00
Sanjay Patel
4644aeb774 [DAGCombiner] clean-up FMA+FMUL folds; NFC
D80801 suggests some readability improvements before mocing this block.
2020-06-06 10:32:54 -04:00
Matt Arsenault
56ef93b0b0 GlobalISel: Make known bits/alignment API more consistent
Just computing the alignment makes sense without caring about the
general known bits, such as for non-integral pointers. Separate the
two and start calling into the TargetLowering hooks for frame indexes.

Start calling the TargetLowering implementation for FrameIndexes,
which improves the AMDGPU matching for stack addressing modes. Also
introduce a new hook for returning known alignment of target
instructions. For AMDGPU, it would be useful to report the known
alignment implied by certain intrinsic calls.

Also stop using MaybeAlign.
2020-06-05 14:57:22 -04:00
Sander de Smalen
31807863bb Reland D80640: [CodeGen][SVE] Calculate correct type legalization for scalable vectors.
This reverts commit 9bcef270d7a319c6c0fdffc6c80984a8f0a30ecb.
2020-06-05 18:09:31 +01:00
Sander de Smalen
9dac7984eb Revert "[CodeGen][SVE] Calculate correct type legalization for scalable vectors."
Seems to break some buildbots, reverting the patch for now.

This reverts commit 164f4b9d26fdf3cd640a09b63b5ec44d033cbe8a.
2020-06-05 16:03:52 +01:00
Sander de Smalen
b11af615da [CodeGen][SVE] Calculate correct type legalization for scalable vectors.
This patch updates TargetLoweringBase::computeRegisterProperties and
TargetLoweringBase::getTypeConversion to support scalable vectors,
and make the right calls on how to legalise them. These changes are required
to legalise both MVTs and EVTs.

Reviewers: efriedma, david-arm, ctetreau

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80640
2020-06-05 15:20:34 +01:00
Kerry McLaughlin
5e3af5dc50 [CodeGen][SVE] Legalisation of extends with scalable types
Summary:
This patch adds legalisation of extensions where the operand
of the extend is a legal scalable type but the result is not.

EXTRACT_SUBVECTOR is used to split the result, before
being replaced by target-specific [S|U]UNPK[HI|LO] operations.

For example:

```
zext <vscale x 16 x i8> %a to <vscale x 16 x i16>
```
should emit:

```
uunpklo z2.h, z0.b
uunpkhi z1.h, z0.b
```

Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79587
2020-06-05 12:08:42 +01:00
Philip Reames
d66e1d17e1 [Statepoint] Migrate a few tests to gc-live bundle format and fix assert
The assert was missed in 0e7c7705, migrating the test revealed the problem.
2020-06-04 18:15:58 -07:00
Matt Arsenault
47a0b3e0e7 DAG: Change computeKnownBitsForFrameIndex to be usable by GISel
This wasn't getting much value from the DAG or depth arguments, since
it's only called on the frame index root nodes. FrameIndexes can also
only return a scalar value, so it also didn't need DemandedElts.
2020-06-04 10:50:26 -04:00
Sanjay Patel
37da3e96e3 [x86] add test/code comment for chain value use (PR46195); NFC 2020-06-04 09:15:17 -04:00
Simon Pilgrim
7f449074ca [DAG] scalarizeBinOpOfSplats - extract from the source of splat vector (PR46189)
D79003/rG9fa58d1bf2f8 exposed an issue with scalarizeBinOpOfSplats that we were extracting from the splatted vector result instead of the source, the splat index is only valid for the source vector not the result, which may contain undefs, including at the splat index.
2020-06-04 11:58:59 +01:00
Tim Northover
6f80acd90a Revert "[DAGCombiner] avoid unnecessary indirection from SDNode/SDValue; NFCI"
This reverts commit 21dadd774f56778ef68c1ce307205dfbdacc793a.

In at least PromoteIntBinOps, they wanted to know about users of *all* values
produced by the node not just the integer being promoted. For example not
replacing chain users if the operation was a load breaks the ordering of the
DAG.
2020-06-04 11:53:14 +01:00
Madhur Amilkanthwar
4d31a6ef5a Utility to dump .dot representation of SelectionDAG without firing viewer
Summary:
This patch adds support for dumping .dot
representation of SelectionDAG. It is inspired from the fact that,
a developer may want to just dump the graph at
a predictable path with a simple name to compare.
The exisitng utility (i.e. viewGraph) are overkill
for this motive hence this patch adds the requires support
while using the core routines from GraphWriter.

Example usage: DAG.dumpDotGraph("/tmp/graph.dot", "MyGraph")
will create /tmp/graph.dot file when DAG is an
object of SelectionDAG class.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D80711
2020-06-04 11:51:48 +05:30
Philip Reames
77b9606e79 [Statepoint] Remove last of old ImmutableStatepoint code
To do so, I had to sink the old school inline operand handling into GCStatepointInst which is non ideal.  This code should be removed shortly and I was able to at least clean it up a bunch.
2020-06-03 20:31:17 -07:00
Philip Reames
409ef9346e [Statepoint] Delete more dead code from old wrappers
The verify() routine duplicates IR/Verifier.cpp checks, so while not technically dead it doesn't add any value either.
2020-06-03 20:10:30 -07:00
Henry Kao
d1cc2de77b [CodeGen][SVE] Replace deprecated calls in getCopyFromPartsVector()
Summary: Replaced getVectorNumElements() with getVectorElementCount(). Added operator overloads for class ElementCount. Fixes warning in several AArch64 unit tests.

Reviewers: sdesmalen, kmclaughlin, dancgr, efriedma, each, andwar, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80826
2020-06-03 11:20:02 -04:00
Simon Pilgrim
8ca5ce94d4 [DAG] SimplifyDemandedBits - peek through SHL if we only demand sign bits.
If we're only demanding the (shifted) sign bits of the shift source value, then we can use the value directly.

This handles SimplifyDemandedBits/SimplifyMultipleUseDemandedBits for both ISD::SHL and X86ISD::VSHLI.

Differential Revision: https://reviews.llvm.org/D80869
2020-06-03 16:11:54 +01:00
Simon Pilgrim
fbe2fc786a [DAG] GetDemandedBits - don't bother asserting for a non-null cast<> result. NFC.
cast<> will assert on failure anyhow.

This lets us fold the cast<> with the getAPIntValue() that uses it.
2020-06-03 12:43:07 +01:00
Kadir Cetinkaya
def2e34328 [llvm] Fix unused variable warning 2020-06-02 22:46:24 +02:00
Amy Kwan
97fd4517d5 [DAGCombiner] Combine shifts into multiply-high
This patch implements a target independent DAG combine to produce multiply-high
instructions from shifts. This DAG combine will combine shifts for any type as
long as the MULH on the narrow type is legal.

For now, it is enabled on PowerPC as PowerPC is the only target that has an
implementation of the isMulhCheaperThanMulShift TLI hook introduced in
D78271.

Moreover, this DAG combine focuses on catching the pattern:
(shift (mul (ext <narrow_type>:$a to <wide_type>), (ext <narrow_type>:$b to <wide_type>)), <narrow_width>)
to produce mulhs when we have a sign-extend, and mulhu when we have
a zero-extend.

The patch performs the following checks:
- Operation is a right shift arithmetic (sra) or logical (srl)
- Input to the shift is a multiply
- Both operands to the shift are sext/zext nodes
- The extends into the multiply are both the same
- The narrow type is half the width of the wide type
- The shift amount is the width of the narrow type
- The respective mulh operation is legal

Differential Revision: https://reviews.llvm.org/D78272
2020-06-02 15:22:48 -05:00
Denis Antrushin
40fd2ba064 [StatepointLowering] Handle UNDEF gc values.
Do not spill UNDEF GC values. Instead, replace corresponding
gc.relocate intrinsic with an (arbitrary, but recognizable) constant.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D80714
2020-06-02 10:18:33 +03:00
Matt Arsenault
0ba1ec26dd DAG: Fix getNode dropping flags if there's a glue output
The AMDGPU non-strict fdiv lowering needs to introduce an FP mode
switch in some cases, and has custom nodes to provide chain/glue for
the intermediate FP operations. We need to propagate nofpexcept here,
but getNode was dropping the flags.

Adding nofpexcept in the AMDGPU custom lowering is left to a future
patch.

Also fix a second case where flags were dropped, but in this case it
seems it just didn't handle this number of operands.

Test will be included in future AMDGPU patch.
2020-06-01 13:48:02 -04:00
Florian Hahn
4c3ac27019 [ScheduleDAG] Avoid unnecessary recomputation of topological order.
In some cases ScheduleDAGRRList has to add new nodes to resolve problems
with interfering physical registers. When new nodes are added, it
completely re-computes the topological order, which can take a long
time, but is unnecessary. We only add nodes one by one, and initially
they do not have any predecessors. So we can just insert them at the end
of the vector. Later we add predecessors, but the helper function
properly updates the topological order much more efficiently. With this
change, the compile time for the program below drops from 300s to 30s on
my machine.

    define i11129 @test1() {
      %L1 = load i11129, i11129* undef
      %B30 = ashr i11129 %L1, %L1
      store i11129 %B30, i11129* undef
      ret i11129 %L1
    }

This should be generally beneficial, as we can skip a large amount of
work. Theoretically there are some scenarios where we might not safe
much, e.g. when we add a dependency between the first and last node.
Then we would have to shift all nodes. But we still do not have to spend
the time re-computing the initial order.

Reviewers: MatzeB, atrick, efriedma, niravd, paquette

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D59722
2020-05-31 11:04:35 +01:00
Craig Topper
87390781b5 [DAGCombiner] Move debug message and statistic update into CommitTargetLoweringOpt.
This code was repeated in two callers of CommitTargetLoweringOpt.
But CommitTargetLoweringOpt is also called from TargetLowering.
We should print a message for those calls to. So sink the
repeated code into CommitTargetLoweringOpt to catch those calls.
2020-05-30 19:47:07 -07:00
Simon Pilgrim
220d9ef6ed [TargetLowering] SimplifyDemandedBits - remove shift amount clamps from getValidShiftAmountConstant calls. NFC.
getValidShiftAmountConstant only returns a value if the shift amount is in range, so we don't need to check it again.
2020-05-30 14:04:55 +01:00
Simon Pilgrim
440c8825f9 [SelectionDAG] ComputeNumSignBits - use Valid Min/Max shift amount helpers directly. NFCI.
We are calling getValidShiftAmountConstant first followed by getValidMinimumShiftAmountConstant/getValidMaximumShiftAmountConstant if that failed. But both are used in the same way in ComputeNumSignBits and the Min/Max variants call getValidShiftAmountConstant internally anyhow.
2020-05-30 14:02:14 +01:00
Simon Pilgrim
10547eab71 [SelectionDAG] Remove repeated getOperand() call. NFC. 2020-05-30 10:21:36 +01:00
Tobias Bosch
06b3595fcf [DebugInfo][DAG] Don't reuse debug location on COPY if width changes.
Summary:
This caused incorrect debug information for parameters:
Previously, after a COPY of a parameter that changes the width,
we would emit a DBG_VALUE that continues to be associated to that
parameter, even though it now used a different width.
This made the LiveDebugValues pass assume the parameter value
got clobbered and it stopped tracking the parameter entry
value, leading to incorrect debug information.

Fixes https://bugs.llvm.org/show_bug.cgi?id=39715

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80819
2020-05-29 13:24:33 -07:00
Zequan Wu
25d7f2f024 Add NoMerge MIFlag to avoid MIR branch folding
Let the codegen recognized the nomerge attribute and disable branch folding when the attribute is given

Differential Revision: https://reviews.llvm.org/D79537
2020-05-29 12:31:06 -07:00
Guozhi Wei
7118b8a8dd [DAGCombiner] Add command line options to guard store width reduction
optimizations

As discussed in the thread http://lists.llvm.org/pipermail/llvm-dev/2020-May/141838.html,
some bit field access width can be reduced by ReduceLoadOpStoreWidth, some
can't. If two accesses are very close, and the first access width is reduced,
the second is not. Then the wide load of second access will be stalled for long
time.

This patch add command line options to guard ReduceLoadOpStoreWidth and
ShrinkLoadReplaceStoreWithStore, so users can use them to disable these
store width reduction optimizations.

Differential Revision: https://reviews.llvm.org/D80745
2020-05-29 09:41:41 -07:00
David Sherwood
bff701c7ed [CodeGen] Fix warning in visitShuffleVector
Make sure we only ask for the number of elements after we've
bailed out for scalable vectors.

Differential revision: https://reviews.llvm.org/D80632
2020-05-29 17:09:59 +01:00
Sanjay Patel
07aa15d865 [DAGCombiner] avoid unnecessary indirection from SDNode/SDValue; NFCI 2020-05-29 09:31:52 -04:00
Florian Hahn
855b755233 [DAGComb] Do not turn insert_elt into shuffle for single elt vectors.
Currently combineInsertEltToShuffle turns insert_vector_elt into a
vector_shuffle, even if the inserted element is a vector with a single
element. In this case, it should be unlikely that the additional shuffle
would be more efficient than a insert_vector_elt.

Additionally, this fixes a infinite cycle in DAGCombine, where
combineInsertEltToShuffle turns a insert_vector_elt into a shuffle,
which gets turned back into a insert_vector_elt/extract_vector_elt by
a custom AArch64 lowering (in visitVECTOR_SHUFFLE).

Such insert_vector_elt and extract_vector_elt combinations can be
lowered efficiently using mov on AArch64.

There are 2 test changes in arm64-neon-copy.ll: we now use one or two
mov instructions instead of a single zip1. The reason that we need a
second mov in ins1f2 is that we have to move the result to the result
register and is not really related to the DAGCombine fold I think.
But in any case, on most uarchs, mov should be cheaper than zip1. On a
Cortex-A75 for example, zip1 is twice as expensive as mov
(https://developer.arm.com/docs/101398/latest/arm-cortex-a75-software-optimization-guide-v20)

Reviewers: spatel, efriedma, dmgreen, RKSimon

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D80710
2020-05-29 13:21:13 +01:00
Paul Walker
a604bdd390 [SelectionDAG] Update getNode asserts for EXTRACT/INSERT_SUBVECTOR.
Summary:
The description of EXTACT_SUBVECTOR and INSERT_SUBVECTOR has been
changed to accommodate scalable vectors (see ISDOpcodes.h). This
patch updates the asserts used to verify these requirements when
using SelectionDAG's getNode interface.

This patch introduces the MVT function getVectorMinNumElements
that can be used against fixed-length and scalable vectors when
only the known minimum vector length is required.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80709
2020-05-29 11:02:18 +00:00
David Sherwood
87771f8fd7 [CodeGen] Fix warnings in getZeroExtendInReg
We should be using getVectorElementCount() to assert that two types
have the same numbers of elements. I encountered the warnings while
compiling this test:

  CodeGen/AArch64/sve-intrinsics-ld1.ll

Differential Revision: https://reviews.llvm.org/D80616
2020-05-29 11:51:07 +01:00
David Sherwood
b0b422a9b8 [CodeGen] Add support for extracting elements of scalable vectors
I have tried to ensure that SelectionDAG and DAGCombiner do
sensible things for scalable vectors, and added support for a
limited number of simple folds. Codegen support for the vector
extract patterns have also been added to the AArch64 backend.

New vector extract tests have been added here:

  CodeGen/AArch64/sve-extract-element.ll

and I have also added new folds using inserts and extracts here:

  CodeGen/AArch64/sve-insert-element.ll

Differential Revision: https://reviews.llvm.org/D80208
2020-05-29 07:49:43 +01:00
Eric Christopher
04de7bc77f unsigned -> Register for readability. 2020-05-28 15:21:55 -07:00
Philip Reames
071bbe0177 [Statepoints] Sink routines for grabbing projections to GCStatepointInst [NFC]
Mechanical movement, nothing more.
2020-05-28 13:51:59 -07:00
Philip Reames
56c98b3720 [Statepoint] Sink actual_args and gc_args to GCStatepointInst [NFC]
These are the two operand sets which are expected to survive more than another week or so.  Instead of bothering to update the deopt and gc-transition operands, we'll just wait until those are removed and delete the code.

For those following along, this is likely to be the last (major) change in this sequence for about a week.  I want to wait until all of this has been merged downstream to ensure I haven't introduced any bugs (and migrate some downstream code to the new interfaces).  Once that's done, we should be able to delete Statepoint/ImmutableStatepoint without too much work.
2020-05-28 13:51:59 -07:00
Philip Reames
321a5cfc7a [Statepoint] Use iterate_range.empty [NFC] 2020-05-28 13:51:59 -07:00
Philip Reames
0fb404dfdf [Statepoint] Convert a few more isStatepoint calls to idiomatic isa/cast
I'd apparently only grepped in the lib directories and missed a few used in the Statepoint header itself.  Beyond simple mechanical cleanup, changed the type of one routine to reflect the fact it also returns a statepoint.
2020-05-28 11:35:36 -07:00