1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 12:43:36 +01:00
Commit Graph

30071 Commits

Author SHA1 Message Date
Fangrui Song
7cc3572b85 [XRay] Support DW_TAG_call_site and delete unneeded PATCHABLE_EVENT_CALL/PATCHABLE_TYPED_EVENT_CALL lowering 2021-01-25 00:49:18 -08:00
Fangrui Song
18f43f5ac9 [XRay] Make __xray_customevent support non-Linux 2021-01-25 00:48:21 -08:00
QingShan Zhang
d5b70bbb38 [NFC] [DAGCombine] Correct the result for sqrt even the iteration is zero
For now, we correct the result for sqrt if iteration > 0. This doesn't make
sense as they are not strict relative.

Reviewed By: dmgreen, spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D94480
2021-01-25 04:02:44 +00:00
Chen Zheng
db58448497 [PowerPC] support register pressure reduction in machine combiner.
Reassociating some patterns to generate more fma instructions to
reduce register pressure.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D92071
2021-01-24 21:28:21 -05:00
Kazu Hirata
0ec2908ce9 [llvm] Use pop_back_val (NFC) 2021-01-24 12:18:57 -08:00
Kazu Hirata
2fb5578408 [CodeGen] Forward-declare TargetMachine (NFC)
InstrEmitter.h needs TargetMachine but relies on a forward declaration
of TargetMachine in MachineOperand.h.  This patch adds a forward
declaration right in InstrEmitter.h.

While we are at it, this patch removes the one in MachineOperand.h,
where it is unnecessary.
2021-01-24 12:18:54 -08:00
Roger Ferrer Ibanez
e9cd01d82d [RISCV][PrologEpilogInserter] "Float" emergency spill slots to avoid making them immediately unreachable from the stack pointer
In RISC-V there is a single addressing mode of the form imm(reg) where
imm is a signed integer of 12-bit with a range of [-2048..2047] bytes
from reg.

The test MultiSource/UnitTests/C++11/frame_layout of the LLVM test-suite
exercises several scenarios with the stack, including function calls
where the stack will need to be realigned to to a local variable having
a large alignment of 4096 bytes.

In situations of large stacks, the RISC-V backend (in
RISCVFrameLowering) reserves an extra emergency spill slot which can be
used (if no free register is found) by the register scavenger after the
frame indexes have been eliminated. PrologEpilogInserter already takes
care of keeping the emergency spill slots as close as possible to the
stack pointer or frame pointer (depending on what the function will
use). However there is a final alignment step to honour the maximum
alignment of the stack that, when using the stack pointer to access the
emergency spill slots, has the side effect of setting them farther from
the stack pointer.

In the case of the frame_layout testcase, the net result is that we do
have an emergency spill slot but it is so far from the stack pointer
(more than 2048 bytes due to the extra alignment of a variable to 4096
bytes) that it becomes unreachable via any immediate offset.

During elimination of the frame index, many (regular) offsets of the
stack may be immediately unreachable already. Their address needs to be
computed using a register. A virtual register is created and later
RegisterScavenger should be able to find an unused (physical) register.
However if no register is available, RegisterScavenger will pick a
physical register and spill it onto an emergency stack slot, while we
compute the offset (restoring the chosen register after all this). This
assumes that the emergency stack slot is easily reachable (this is,
without requiring another register!).

This is the assumption we seem to break when we perform the extra
alignment in PrologEpilogInserter.

We can "float" the emergency spill slots by increasing (in absolute
value) their offsets from the incoming stack pointer. This way the
emergency spill slots will remain close to the stack pointer (once the
function has allocated storage for the stack, including the needed
realignment). The new size computed in PrologEpilogInserter is padding
so it should be OK to move the emergency spill slots there. Also because
we're increasing the alignment, the new location should stay aligned for
the purpose of the emergency spill slots.

Note that this change also impacts other backends as shown by the tests.
Changes are minor adjustments to the emergency stack slot offset.

Differential Revision: https://reviews.llvm.org/D89239
2021-01-23 09:10:03 +00:00
Craig Topper
4a8863f5e1 [TargetLowering] Use isOneConstant to simplify some code. NFC 2021-01-22 19:32:19 -08:00
Stanislav Mekhanoshin
dd3332e2e5 Change materializeFrameBaseRegister() to return register
The only caller of this function is in the LocalStackSlotAllocation
and it creates base register of class returned by the target's
getPointerRegClass(). AMDGPU wants to use a different reg class
here so let materializeFrameBaseRegister to just create and return
whatever it wants.

Differential Revision: https://reviews.llvm.org/D95268
2021-01-22 15:51:06 -08:00
Mitch Phillips
1c4f0a4fdd Revert "[AArch64][GlobalISel] Implement widenScalar for signed overflow"
This reverts commit 541d98efa222b00e16c67348810898c2fa11f398.

Reason: Dependent patch 3dedad475da45c05bc4f66cd14e9f44581edf0bc broke
UBSan on Android: http://lab.llvm.org:8011/#/builders/77/builds/3082
2021-01-22 14:32:11 -08:00
Mitch Phillips
7a51025e46 Revert "[GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method"
This reverts commit 2bb92bf451d7eb2c817f3e5403353e7c0c14d350.

Dependent patch broke UBSan on Android:
3dedad475da45c05bc4f66cd14e9f44581edf0bc
2021-01-22 14:32:11 -08:00
Cassie Jones
166f6f7864 [GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method
The widenScalar implementation for signed and unsigned overflowing
operations were very similar: both are checked by truncating the result
and then re-sign/zero-extending it and checking that it matches the
computed operation.

Using a truncate + zero-extend for the unsigned case instead of manually
producing the AND instruction like before leads to an extra copy
instruction during legalization, but this should be harmless.

Differential Revision: https://reviews.llvm.org/D95035
2021-01-22 14:08:46 -08:00
Simon Pilgrim
82a418b65b [DAG] Commute shuffle(splat(A,u), shuffle(C,D)) -> shuffle'(shuffle(C,D), splat(A,u))
We only merge shuffles if the inner (LHS) shuffle is a non-splat, so commute these shuffles to improve merging of multiple shuffles.
2021-01-22 11:43:18 +00:00
Craig Topper
cac4215eb0 [TargetLowering] Use getBoolConstant instead of assuming zero or one for boolean contents.
Noticed while I was touching other nearby code. I don't have a
test where this matters because the targets I work on
use zero or one boolean contents. And the tests cases I've seen
this fire on happen before type legalization where the result type
is MVT::i1 so the distinction doesn't matter.
2021-01-22 00:26:14 -08:00
Craig Topper
39e0e13396 [TargetLowering] Simplify some code in SimplifySetCC that tries to handle SIGN_EXTEND_INREG operand types that should never happen. NFCI
There was code to handle the first operand being different than
the result type. And code to handle first operand having the
same type as the type to extend from. This should never happen
for a correctly formed SIGN_EXTEND_INREG. I've replace the
code with asserts.

I also noticed we created the same APInt twice so I've reused it.
2021-01-21 23:56:37 -08:00
Cassie Jones
6bc2033895 [AArch64][GlobalISel] Implement widenScalar for signed overflow
Implement widening for G_SADDO and G_SSUBO. Previously it was only
implemented for G_UADDO and G_USUBO. Also add legalize-add/sub tests for
narrow overflowing add/sub on AArch64.

Differential Revision: https://reviews.llvm.org/D95034
2021-01-21 22:55:42 -08:00
Kazu Hirata
69d44af40a [llvm] Use isDigit (NFC) 2021-01-21 19:59:50 -08:00
Kazu Hirata
6f30ddfaf5 [CodeGen] Use llvm::append_range (NFC) 2021-01-21 19:59:46 -08:00
Chen Zheng
b37b3473aa [NFC] [TargetRegisterInfo] add another API to get srcreg through copy.
Reviewed By: nemanjai, jsji

Differential Revision: https://reviews.llvm.org/D92069
2021-01-21 20:10:25 -05:00
Matt Arsenault
a964cd0596 AArch64/GlobalISel: Factor out parametersInCSRMatch
Make this look more like the DAG handling and move to common code.

I also noticed AArch64 seems to not be properly adding the
physreg:virtreg mapping to the function live ins.
2021-01-21 10:32:48 -05:00
Simon Pilgrim
add6aab738 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED).
Add DemandedElts support inside the TRUNCATE analysis.

REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724 and an additional test case added at rG0ca81b90d19d

Differential Revision: https://reviews.llvm.org/D56387
2021-01-21 13:01:34 +00:00
Simon Pilgrim
bf43079680 [DAG] SimplifyDemandedBits - correctly adjust truncated shift amount type
As noticed on D56387, for vectors we must always correctly adjust the shift amount type during truncation (not just after legalization). We were getting away with it as we currently only accepted scalars via the dyn_cast<ConstantSDNode>.
2021-01-21 12:38:36 +00:00
Simon Pilgrim
49e90d18ab [DAG] CombineToPreIndexedLoadStore - use const APInt& for getAPIntValue(). NFCI.
Cleanup some code to use auto* properly from cast, and use const APInt& for getAPIntValue() to avoid an unnecessary copy.
2021-01-21 11:04:09 +00:00
Luo, Yuanke
d25938d612 Revert "[X86][AMX] Fix tile config register spill issue."
This reverts commit 20013d02f3352a88d0838eed349abc9a2b0e9cc0.
2021-01-21 18:11:43 +08:00
Luo, Yuanke
90681ff5c1 [X86][AMX] Fix tile config register spill issue.
Previous code build the model that tile config register is the user of
each AMX instruction. There is a problem for the tile config register
spill. When across function, the ldtilecfg instruction may be inserted
on each AMX instruction which use tile config register. This cause all
tile data register clobber.
To fix this issue, we remove the model of tile config register. We
analyze the regmask of call instruction and insert ldtilecfg if there is
any tile data register live across the call. Inserting the sttilecfg
before the call is unneccessary, because the tile config doesn't change
and we can just reload the config.
Besides we also need check tile config register interference. Since we
don't model the config register we should check interference from the
ldtilecfg to each tile data register def.
             ldtilecfg
             /       \
            BB1      BB2
            /         \
           call       BB3
           /           \
       %1=tileload   %2=tilezero
We can start from the instruction of each tile def, and backward to
ldtilecfg. If there is any call instruction, and tile data register is
not preserved, we should insert ldtilecfg after the call instruction.

Differential Revision: https://reviews.llvm.org/D94155
2021-01-21 16:01:50 +08:00
Kazu Hirata
cd6de67c94 [llvm] Use hasSingleElement (NFC) 2021-01-20 21:35:55 -08:00
Hans Wennborg
87b5059ac7 Revert "[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE"
It caused "Vector shift amounts must be in the same as their first arg"
asserts in Chromium builds. See the code review for repro instructions.

> Add DemandedElts support inside the TRUNCATE analysis.
>
> Differential Revision: https://reviews.llvm.org/D56387

This reverts commit cad4275d697c601761e0819863f487def73c67f8.
2021-01-20 20:06:55 +01:00
Simon Pilgrim
11b2803aa0 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE
Add DemandedElts support inside the TRUNCATE analysis.

Differential Revision: https://reviews.llvm.org/D56387
2021-01-20 15:39:58 +00:00
Amanieu d'Antras
bea54db865 [AArch64] Add support for the GNU ILP32 ABI
Add the aarch64[_be]-*-gnu_ilp32 targets to support the GNU ILP32 ABI for AArch64.

The needed codegen changes were mostly already implemented in D61259, which added support for the watchOS ILP32 ABI. The main changes are:
- Wiring up the new target to enable ILP32 codegen and MC.
- ILP32 va_list support.
- ILP32 TLSDESC relocation support.

There was existing MC support for ELF ILP32 relocations from D25159 which could be enabled by passing "-target-abi ilp32" to llvm-mc. This was changed to check for "gnu_ilp32" in the target triple instead. This shouldn't cause any issues since the existing support was slightly broken: it was generating ELF64 objects instead of the ELF32 object files expected by the GNU ILP32 toolchain.

This target has been tested by running the full rustc testsuite on a big-endian ILP32 system based on the GCC ILP32 toolchain.

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D94143
2021-01-20 13:34:47 +00:00
Mirko Brkusanin
a421260042 [AMDGPU][GlobalISel] Avoid selecting S_PACK with constants
If constants are hidden behind G_ANYEXT we can treat them same way as G_SEXT.
For that purpose we extend getConstantVRegValWithLookThrough with option
to handle G_ANYEXT same way as G_SEXT.

Differential Revision: https://reviews.llvm.org/D92219
2021-01-20 11:54:53 +01:00
Gabriel Hjort Åkerlund
0fd4b778e9 [GlobalISel] Add missing operand update when copy is required
When constraining an operand register using constrainOperandRegClass(),
the function may emit a COPY in case the provided register class does
not match the current operand register class. However, the operand
itself is not updated to make use of the COPY, thereby resulting in
incorrect code. This patch fixes that bug by updating the machine
operand accordingly.

Reviewed By: dsanders

Differential Revision: https://reviews.llvm.org/D91244
2021-01-20 10:32:52 +01:00
Kazu Hirata
e7b66f83f2 [llvm] Use llvm::all_of (NFC) 2021-01-19 20:19:17 -08:00
Kazu Hirata
f118370581 [llvm] Use llvm::find (NFC) 2021-01-19 20:19:14 -08:00
Ian Levesque
cb3e4b9e0e [xray] Honor xray-never function-instrument attribute
function-instrument=xray-never wasn't actually honored before. We were
getting lucky that it worked because CodeGenFunction would omit the
other xray attributes when a function was annotated with
xray_never_instrument. This patch adds proper support.

Differential Revision: https://reviews.llvm.org/D89441
2021-01-19 18:47:09 -05:00
Jeroen Dobbelaere
116cd71f2c [noalias.decl] Look through llvm.experimental.noalias.scope.decl
Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93042
2021-01-19 20:09:42 +01:00
Jessica Paquette
05fff88674 Fix buildbot after cfc60730179042a93cb9cb338982e71d20707a24
Windows buildbots were not happy with using find_if + instructionsWithoutDebug.

In cfc60730179042a9, instructionsWithoutDebug is not technically necessary. So,
just iterate over the block directly.

http://lab.llvm.org:8011/#/builders/127/builds/4732/steps/7/logs/stdio
2021-01-19 10:38:04 -08:00
Jessica Paquette
c4d2d8a4de [GlobalISel] Combine (a[0]) | (a[1] << k1) | ...| (a[m] << kn) into a wide load
This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`.
(See D27861)

This tries to recognize patterns like below (assuming a little-endian target):

```
s8* x = ...
s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
->
s32 val = *((i32)a)

s8* x = ...
s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
->
s32 val = BSWAP(*((s32)a))
```

(This patch also handles the big-endian target case as well, in which the first
example above has a BSWAP, and the second example above does not.)

To recognize the pattern, this searches from the last G_OR in the expression
tree.

E.g.

```
    Reg   Reg
     \    /
      OR_1   Reg
       \    /
        OR_2
          \     Reg
           .. /
          Root
```

Each non-OR register in the tree is put in a list. Each register in the list is
then checked to see if it's an appropriate load + shift logic.

If every register is a load + potentially a shift, the combine checks if those
loads + shifts, when OR'd together, are equivalent to a wide load (possibly with
a BSWAP.)

To simplify things, this patch

(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a
    specific location

An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from
test/CodeGen/AArch64/load-combine.ll)

At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3,
and a 0.4% improvement for CTMark/7zip-benchmark.

Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was
the first instruction in the block.

Differential Revision: https://reviews.llvm.org/D94350
2021-01-19 10:24:27 -08:00
Luo, Yuanke
414d93201a [X86] Fix tile spill merge issue.
This is a additional bug fix for c5be0e0cc0. The distance for
the spill instructions is wrong in previous patch.

Differential Revision: https://reviews.llvm.org/D94772
2021-01-19 10:51:42 +08:00
Chen Zheng
7478f56eb6 Revert "[NFC] [TargetRegisterInfo] add one use check to lookThruCopyLike."
This reverts commit 3bdf4507b66348ad78df4655a8e4f36c3fc10f3c.

Post commit comments need to be addressed first.
2021-01-18 21:33:31 -05:00
Craig Topper
9e854071e5 Recommit "[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results."
This recommits 2c51bef76cbf0149101b9e7c7c658b4a58657929.

I've fixed the broken check line from when I renamed the test function.

Original commit message:
This builds on D94142 where scalable vectors are allowed in structs.

I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
2021-01-18 11:08:28 -08:00
Craig Topper
4299f44938 Revert "[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results."
This reverts commit 2c51bef76cbf0149101b9e7c7c658b4a58657929.

I seem to have messed up the check lines in the test.
2021-01-18 11:00:20 -08:00
Craig Topper
7410032174 [RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results.
This builds on D94142 where scalable vectors are allowed in structs.

I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.

Differential Revision: https://reviews.llvm.org/D94149
2021-01-18 10:41:36 -08:00
Kazu Hirata
8b4d487fa2 [llvm] Use the default value of drop_begin (NFC) 2021-01-18 10:16:36 -08:00
Denis Antrushin
f91dde86ae [Statepoint] Handle undef operands in statepoint.
Currently when spilling statepoint register operands in FixupStatepoints
we do not pay attention that it might be `undef`. We just generate a
spill, which may lead to verifier error because we have a use without def.

To handle it, let FixupStateponts ignore `undef` register operands
completely and change them to some constant value when generating
stack map. Use same value as used by ISel for this purpose (0xFEFEFEFE).

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D94703
2021-01-18 15:20:54 +03:00
Tres Popp
3fdf369051 Revert "[PowerPC] support register pressure reduction in machine combiner."
This reverts commit 26a396c4ef481cb159bba631982841736a125a9c.

See https://reviews.llvm.org/D92071 for a description of the issue.
2021-01-18 12:01:57 +01:00
Simon Pilgrim
ba5c703719 [DAG] SimplifyDemandedBits - use KnownBits comparisons to remove ISD::UMIN/UMAX ops
Use the KnownBits icmp comparisons to determine when a ISD::UMIN/UMAX op is unnecessary should either op be known to be ULT/ULE or UGT/UGE than the other.

Differential Revision: https://reviews.llvm.org/D94532
2021-01-18 10:29:23 +00:00
Craig Topper
4097ff94d8 [IR] Allow scalable vectors in structs to support intrinsics returning multiple values.
RISC-V would like to use a struct of scalable vectors to return multiple
values from intrinsics. This woud also be needed for target independent
intrinsics like llvm.sadd.overflow.

This patch removes the existing restriction for this. I've modified
StructType::isSized to consider a struct containing scalable vectors
as unsized so the verifier won't allow loads/stores/allocas of these
structs.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D94142
2021-01-17 23:29:51 -08:00
Chen Zheng
6f751d75c9 [PowerPC] support register pressure reduction in machine combiner.
Reassociating some patterns to generate more fma instructions to
reduce register pressure.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D92071
2021-01-17 23:56:13 -05:00
Qiu Chaofan
6cc3cf5799 [Legalizer] Promote result type in expanding FP_TO_XINT
This patch promotes result integer type of FP_TO_XINT in expanding.
So crash in conversion from ppc_fp128 to i1 will be fixed.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D92473
2021-01-18 11:56:11 +08:00
Chen Zheng
2bf67e3d19 [NFC] [TargetRegisterInfo] add one use check to lookThruCopyLike.
add one use check to lookThruCopyLike.

The root node is safe to be deleted if we are sure that every
definition in the copy chain only has one use.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D92069
2021-01-17 19:56:42 -05:00