1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

31898 Commits

Author SHA1 Message Date
Simon Pilgrim
4cccff07f7 [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT (REAPPLIED)
This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.

In particular this helps remove some unnecessary scalar->vector->scalar patterns.

The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.

Reapplied after reversion at rL368660 due to PR42982 which was fixed at rGca7fdd41bda0.

Differential Revision: https://reviews.llvm.org/D65887
2020-01-04 13:15:50 +00:00
Craig Topper
ef88147c11 [X86] Update MaxIndex test in x86-cmov-converter.ll to return the index and not use the index to look up the array after the loop.
This represents a more realistic version of the code being tested.
The cmov converter doesn't look at the code after the loop so
it doesn't matter for what's being tested.

But as noted in this twitter thread https://twitter.com/trav_downs/status/1213311159413161987
gcc can turn the previous MaxIndex code into the MaxValue code. So
returning the index makes it a distinct case.
2020-01-03 23:59:54 -08:00
Craig Topper
4ad2df5155 [X86] Autogenerate complete checks. NFC 2020-01-03 17:18:18 -08:00
Matt Arsenault
037bd41e28 AMDGPU: Add gfx9 run lines to a testcase 2020-01-03 15:25:50 -05:00
Sanjay Patel
7ff3934618 [DAGCombiner] fix miscompile in translating (X & undef) to shuffle
See PR42982 for more context:
https://bugs.llvm.org/show_bug.cgi?id=42982
2020-01-03 14:58:49 -05:00
Sanjay Patel
e91f60c7b7 [x86] add test for miscompile in XformToShuffleWithZero(); NFC 2020-01-03 14:49:25 -05:00
Craig Topper
58c0b5e9fe [X86] Improve for v2i32->v2f64 uint_to_fp
This uses an alternative implementation of this conversion derived
from our v2i32->v2f32 handling. We can zero extend the v2i32 to
v2i64, or it with the bit representation of 2.0^52 which will give
us 2.0^52 plus the 32-bit integer since double's mantissa is 52 bits.
Then we just need to subtract 2.0^52 as a double and let the floating
point unit normalize the remaining bits into a valid double.

This is less instructions then our previous code, but does require
a port 5 shuffle for the zero extend or unpack.

Differential Revision: https://reviews.llvm.org/D71945
2020-01-03 11:39:08 -08:00
Reid Kleckner
e7c2cc0e45 Move tail call disabling code to target independent code
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).

There's no major functionality change, except for targets that never
implemented this check.

This LLVM attribute was originally added in
d9699bc7bdf0362173fcd256690f61a4d47429c2 (2015).

Reviewers: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D72118
2020-01-03 11:27:41 -08:00
Fangrui Song
9fd094ca82 [AArch64][test] Merge arm64-$i.ll Linux tests into $i.ll
Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D72061
2020-01-03 09:18:55 -08:00
Matt Arsenault
0ffb461077 AMDGPU/GlobalISel: Fix off by one in operand index
This should be looking at the RHS of the add for a constant.
2020-01-03 10:30:30 -05:00
Roman Lebedev
24c9de9750 [DAGCombiner][X86][AArch64] Generalize A-(A&B)->A&(~B) fold (PR44448)
The fold 'A - (A & (B - 1))' -> 'A & (0 - B)'
added in 8dab0a4a7d691f2704f1079538e0ef29548db159
is too specific. It should/can just be 'A - (A & B)' -> 'A & (~B)'

Even if we don't manage to fold `~` into B,
we have likely formed `ANDN` node.
Also, this way there's less similar-but-duplicate folds.

Name: X - (X & Y)  ->  X & (~Y)
%o = and i32 %X, %Y
%r = sub i32 %X, %o
  =>
%n = xor i32 %Y, -1
%r = and i32 %X, %n

https://rise4fun.com/Alive/kOUl

See
  https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
2020-01-03 17:55:47 +03:00
Roman Lebedev
e5e0775c51 [NFC][X86][AArch64] Add 'A - (A & B)' pattern tests (PR44448)
The fold 'A - (A & (B - 1))' -> 'A & (0 - B)'
added in 8dab0a4a7d691f2704f1079538e0ef29548db159
is too specific. It should just be 'A - (A & B)' -> 'A & (~B)'

Name: X - (X & Y)  ->  X & (~Y)
%o = and i32 %X, %Y
%r = sub i32 %X, %o
  =>
%n = xor i32 %Y, -1
%r = and i32 %X, %n

https://rise4fun.com/Alive/kOUl

See
  https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
2020-01-03 17:55:46 +03:00
Roman Lebedev
122f4cf40d [NFC][X86] Add BMI runlines to align-down.ll test 2020-01-03 17:55:46 +03:00
Roman Lebedev
f9c10b284e [DAGCombiner] ~(add X, -1) -> neg X fold
The fold 'A - (A & (B - 1))' -> 'A & (0 - B)'
added in 8dab0a4a7d691f2704f1079538e0ef29548db159
is too specific. It should just be 'A - (A & B)' -> 'A & (~B)',
but we currently fail to sink that '~' into `(B - 1)`.

Name: ~(X - 1)  ->  (0 - X)
%o = add i32 %X, -1
%r = xor i32 %o, -1
  =>
%r = sub i32 0, %X

https://rise4fun.com/Alive/rjU
2020-01-03 17:55:46 +03:00
Roman Lebedev
976233953a [NFC][DAGCombine][X86] '~(X - 1)' pattern tests
The fold 'A - (A & (B - 1))' -> 'A & (0 - B)'
added in 8dab0a4a7d691f2704f1079538e0ef29548db159
is too specific. It should just be 'A - (A & B)' -> 'A & (~B)',
but we currently fail to sink that '~' into `(B - 1)`.

Name: ~(X - 1)  ->  (0 - X)
%o = add i32 %X, -1
%r = xor i32 %o, -1
  =>
%r = sub i32 0, %X

https://rise4fun.com/Alive/rjU
2020-01-03 17:55:46 +03:00
Roman Lebedev
f8c34dee2f [DAGCombine][X86][Thumb2/LowOverheadLoops] A - (A & C) -> A & (~C) fold (PR44448)
While we do manage to fold integer-typed IR in middle-end,
we can't do that for the main motivational case of pointers.

There is @llvm.ptrmask() intrinsic which may or may not be helpful,
but i'm not sure it is fully considered canonical yet,
not everything is fully aware of it likely.

Name: PR44448  ptr - (ptr & C) -> ptr & (~C)
%bias = and i32 %ptr, C
%r = sub i32 %ptr, %bias
  =>
%r = and i32 %ptr, ~C

See
  https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
2020-01-03 17:55:45 +03:00
Roman Lebedev
65e2edf233 [NFC][DAGCombine][X86] Tests for 'A - (A & C)' pattern (PR44448)
Name: PR44448  ptr - (ptr & C) -> ptr & (~C)
%bias = and i32 %ptr, C
%r = sub i32 %ptr, %bias
  =>
%r = and i32 %ptr, ~C

The main motivational pattern involes pointer-typed values,
so this transform can't really be done in middle-end.

See
  https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
2020-01-03 17:55:45 +03:00
Sam Parker
c88913b0a6 [ARM][NFC] Update MIR test 2020-01-03 14:51:15 +00:00
Roman Lebedev
d15ef79468 [DAGCombine][X86][AArch64] 'A - (A & (B - 1))' -> 'A & (0 - B)' fold (PR44448)
While we do manage to fold integer-typed IR in middle-end,
we can't do that for the main motivational case of pointers.

There is @llvm.ptrmask() intrinsic which may or may not be helpful,
but i'm not sure it is fully considered canonical yet,
not everything is fully aware of it likely.

https://rise4fun.com/Alive/ZVdp

Name: ptr - (ptr & (alignment-1))  ->  ptr & (0 - alignment)
  %mask = add i64 %alignment, -1
  %bias = and i64 %ptr, %mask
  %r = sub i64 %ptr, %bias
=>
  %highbitmask = sub i64 0, %alignment
  %r = and i64 %ptr, %highbitmask

See
  https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
2020-01-03 13:58:36 +03:00
Roman Lebedev
1a3061047d [NFC][DAGCombine][X86][AArch64] Tests for 'A - (A & (B - 1))' pattern (PR44448)
https://rise4fun.com/Alive/ZVdp

Name: ptr - (ptr & (alignment-1))  ->  ptr & (0 - alignment)
  %mask = add i64 %alignment, -1
  %bias = and i64 %ptr, %mask
  %r = sub i64 %ptr, %bias
=>
  %highbitmask = sub i64 0, %alignment
  %r = and i64 %ptr, %highbitmask

The main motivational pattern involes pointer-typed values,
so this transform can't really be done in middle-end.

See
  https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
2020-01-03 13:58:36 +03:00
Craig Topper
01fa15d221 [X86] Re-enable lowerUINT_TO_FP_vXi32 under fast-math by using an FSUB instead of an FADD.
Summary:
We previously disabled this under fast math due to aggressive
reassociation by the machine combiner. But I think we can work
around this by using a FSUB instead of FADD for the first
operation.

This matches the similar algorithm we do for uint_to_fp i64->f64
in TargetLowering::expandUINT_TO_FP. If reassociation hasn't
been a problem for that, hopefully its not a problem here.

Reviewers: RKSimon, spatel, scanon

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71968
2020-01-02 21:46:53 -08:00
QingShan Zhang
0a7fe2ac65 [DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG for vector type as 'expand' instead of 'legal'
For now, we didn't set the default operation action for SIGN_EXTEND_INREG for
vector type, which is 0 by default, that is legal. However, most target didn't
have native instructions to support this opcode. It should be set as expand by
default, as what we did for ANY_EXTEND_VECTOR_INREG.

Differential Revision: https://reviews.llvm.org/D70000
2020-01-03 03:26:41 +00:00
Wang, Pengfei
6277397419 [X86] Enable strict FP by default and remove option -disable-strictnode-mutation. NFCI. 2020-01-03 10:59:34 +08:00
Justin Hibbits
75b225e755 [PowerPC]: Fix predicate handling with SPE
SPE floating-point compare instructions only update the GT bit in the CR
field.  All predicates must therefore be reduced to GT/LE.
2020-01-02 19:30:53 -06:00
Justin Hibbits
72adc6d222 Run update_llc_test_checks against SPE tests.
This is in preparation for further tests which are better generated with
the script.  No functional change.
2020-01-02 19:30:52 -06:00
Wang, Pengfei
1222e272ea [X86] Optimization of inserting vxi1 sub vector into vXi1 vector
Summary:
After bugfix the undef value case here, we used more operations to implement inserting vxi1 sub vector into vXi1 vector, I optimize it by use less operations.

The history information at https://reviews.llvm.org/D68311

Reviewers: craig.topper, LuoYuanke, yubing, annita.zhang, pengfei, LiuChen3, RKSimon

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D71917
2020-01-03 09:25:25 +08:00
Sean Fertile
800b81bc7b [PowerPC][AIX] Enable sret arguments.
Removes the fatal error for sret arguments and adds lit testing.

Differential Revision: https://reviews.llvm.org/D71504
2020-01-02 19:31:01 -05:00
Evgenii Stepanov
9d355c1873 Change dbg-*-tag-offset tests to use llvm-dwarfdump.
Reviewers: dblaikie

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72023
2020-01-02 14:35:54 -08:00
Jonas Paulsson
2c90fb35b9 [SystemZ] Create brcl 0,0 instead of brcl 0,3 in EmitNop for 6 bytes.
For consistency with GCC, the target label is moved to the brcl itself
instead of the next instruction.

Review: Ulrich Weigand
2020-01-02 13:21:04 -08:00
Matt Arsenault
ff6a637c9b AMDGPU/GlobalISel: Correct MMO sizes in some tests
There intended to test non-extloads, but the memory size did not match
the result size.
2020-01-02 16:00:46 -05:00
Matt Arsenault
6596ef63a7 AMDGPU/GlobalISel: Regenerate check lines
This avoids diff noise in a future commit from the check name change
from the G_GEP->G_PTR_ADD rename.
2020-01-02 16:00:45 -05:00
Nemanja Ivanovic
95db3a73f8 [PowerPC] Only legalize FNEARBYINT with unsafe fp math
Commit 0f0330a78709 legalized these nodes on PPC without consideration of
unsafe math which means that we get inexact exceptions raised for nearbyint.
Since this doesn't conform to the standard, switch this legalization to depend
on unsafe fp math.
2020-01-02 13:45:54 -06:00
Craig Topper
2dd345a5d8 [X86] Remove FP0-6 operands from call instructions in FPStackifier pass. Only count defs as returns.
All FP0-6 operands should be removed by the FP stackifier. By
removing these we fix the machine verifier error in PR39437.

I've also made it so that only defs are counted for STReturns
which removes what I think were extra stack cleanup instructions.

And I've removed the regcall assert because it was checking the
attributes of the caller, but here we're concerned with the
attributes of the callee. But I don't know how to get that
information from this level.
2020-01-02 11:10:51 -08:00
Ulrich Weigand
d43411f026 [FPEnv] Default NoFPExcept SDNodeFlag to false
The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all
other such flags. This is a problem, because it implies that all code that
transforms SDNodes without copying flags can introduce a correctness bug,
not just a missed optimization.

This patch changes the default to false. This makes it necessary to move
setting the (No)FPExcept flag for constrained intrinsics from the
visitConstrainedIntrinsic routine to the generic visit routine at the
place where the other flags are set, or else the intersectFlagsWith
call would erase the NoFPExcept flag again.

In order to avoid making non-strict FP code worse, whenever
SelectionDAGISel::SelectCodeCommon matches on a set of orignal nodes
none of which can raise FP exceptions, it will preserve this property
on all results nodes generated, by setting the NoFPExcept flag on
those result nodes that would otherwise be considered as raising
an FP exception.

To check whether or not an SD node should be considered as raising
an FP exception, the following logic applies:

- For machine nodes, check the mayRaiseFPException property of
  the underlying MI instruction
- For regular nodes, check isStrictFPOpcode
- For target nodes, check a newly introduced isTargetStrictFPOpcode

The latter is implemented by reserving a range of target opcodes,
similarly to how memory opcodes are identified. (Note that there a
bit of a quirk in identifying target nodes that are both memory nodes
and strict FP nodes. To simplify the logic, right now all target memory
nodes are automatically also considered strict FP nodes -- this could
be fixed by adding one more range.)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D71841
2020-01-02 16:59:45 +01:00
David Green
5ca377e901 [ARM] Update ifcvt test target triples and opcodes. NFC
Some of the instructions in these tests were technically invalid
combinations (using ARM opcodes in Thumb mode, for example). Update the
targets and the instructions used to be more correct.
2020-01-02 14:18:54 +00:00
Andrzej Warzynski
f863ea0ba6 [AArch64][SVE] Gather loads: pass 32 bit unpacked offsets as nxv2i32
Summary:
Currently 32 bit unpacked offsets are passed as nxv2i64. However, as
pointed out in https://reviews.llvm.org/D71074, using nxv2i32 instead
would improve consistency with:
  * how other arguments are treated
  * how scatter stores are implemented
This patch makes sure that 32 bit unpacked offsets are passes as nxv2i32
instead of nxv2i64.

Reviewers: sdesmalen, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71724
2020-01-02 13:01:28 +00:00
Fangrui Song
524d73508e [XRay][test] Fix xray-empty-firstmbb.mir and delete incorrect xray-empty-function.mir
xray-empty-firstmbb.mir does not test the intended code path. Change
xray-instruction-threshold to 0 to exercise the code path.

Delete xray-empty-function.mir . Empty MachineFunction does not work.
Various passes (e.g. MachineDominatorTree) assume the presence of an
entry block.
2020-01-01 22:21:11 -08:00
Craig Topper
e7e2dd9f70 [X86] Add x86_regcallcc calling convention to function declaration recently added in a test.
The callsite had the calling convention, but not the function itself.
2020-01-01 19:07:37 -08:00
Craig Topper
88590b7c02 [X86] Add test cases for regcall function that takes a long double as a parameter, but does not return a long double.
I believe we are incorrectly doing some FP stack manipulations
after the call.
2020-01-01 18:53:12 -08:00
Craig Topper
0d150f0e68 [X86] Call SimplifyMultipleUseDemandedBits from combineVSelectToBLENDV if the condition is used by something other than select conditions.
We might be able to bypass some nodes on the condition path.

Differential Revision: https://reviews.llvm.org/D71984
2020-01-01 11:16:52 -08:00
David Green
d7d2f17739 [ARM] Add +mve feature to mve tests. NFC 2020-01-01 17:25:20 +00:00
Liu, Chen3
b8ff856aff add strict float for round operation
Differential Revision: https://reviews.llvm.org/D72026
2020-01-01 20:42:12 +08:00
Fangrui Song
fcca6780a8 [MC][TargetMachine] Delete MCTargetOptions::MCPIECopyRelocations
clang/lib/CodeGen/CodeGenModule performs the -mpie-copy-relocations
check and sets dso_local on applicable global variables. We don't need
to duplicate the work in TargetMachine shouldAssumeDSOLocal.

Verified that -mpie-copy-relocations can still emit PC relative
relocations for external variable accesses.

clang -target x86_64 -fpie -mpie-copy-relocations -c => R_X86_64_PC32
clang -target aarch64 -fpie -mpie-copy-relocations -c => R_AARCH64_ADR_PREL_PG_HI21+R_AARCH64_LDST64_ABS_LO12_NC
2020-01-01 00:50:18 -08:00
Craig Topper
c7da2d97ef [X86] Add X87 FCMOV support to X86FlagsCopyLowering.
Fixes PR44396
2019-12-31 20:35:21 -08:00
Matt Arsenault
94bbcd967d DAG: Stop trying to fold FP -(x-y) -> y-x in getNode with nsz
This was increasing the number of instructions when fsub was legalized
on AMDGPU with no signed zeros enabled. This fold should be guarded by
hasOneUse, and I don't think getNode should be doing that. The same
fold is already done as a regular combine through isNegatibleForFree.

This does require duplicating, even though isNegatibleForFree does
this combine already (and properly checks hasOneUse) to avoid one PPC
regression. In the regression, the outer fneg has nsz but the fsub
operand does not. isNegatibleForFree only sees the operand, and
doesn't see it's used from a nsz context. A nsz parameter needs to be
added and threaded through isNegatibleForFree to avoid this.
2019-12-31 22:49:51 -05:00
Craig Topper
3d13b6dc72 [X86] Constant fold KSHIFT of an all zeros vector to just an all zeros vector. 2019-12-31 15:57:39 -08:00
Craig Topper
fad67910ab [X86] Use carry flag from add for (seteq (add X, -1), -1).
If we just subtracted 1 and are checking if the result is -1. We can use the carry flag from the ADD instead of an explicit CMP. I'm using the same checks for the add users as EmitTest.

Fixes one case from PR44412

Differential Revision: https://reviews.llvm.org/D72019
2019-12-31 15:05:23 -08:00
Matt Arsenault
6ce33119b3 AMDGPU: Precommit test showing extra instructions are introduced 2019-12-31 14:54:57 -05:00
Michael Liao
e0db31d08e [amdgpu] Fix scoreboard updating on s_waitcnt_vscnt.
Summary: - Other counters are accidentally cleared.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71866
2019-12-31 14:20:30 -05:00
Craig Topper
dd14b23f4e [X86] Add test case for opposite branch condition for PR44412. NFC 2019-12-31 10:58:04 -08:00