1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 04:52:54 +02:00
Commit Graph

332 Commits

Author SHA1 Message Date
Matt Arsenault
bd0ea56e67 AMDGPU: Return correct type during argument lowering
The type needs to be casted back to the original argument type.
Fixes an assert that for some reason is only run when
using -debug.

Includes an additional combine to avoid test regressions
from having conversions mixed with multiple Assert[SZ]ext
nodes. On subtargets where i16 is legal, this was producing an i32
register with an i16 AssertZExt, truncated to i16 with another i8
AssertZExt.

t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t3: i16 = truncate t2
t5: i16 = AssertZext t3, ValueType:ch:i8
t6: i8 = truncate t5
t7: i32 = zero_extend t6
llvm-svn: 308082
2017-07-15 05:52:59 +00:00
Stanislav Mekhanoshin
9046f7bb15 [AMDGPU] fcaninicalize optimization for GFX9+
Since GFX9 supports denorm modes for v_min_f32/v_max_f32 that
is possible to further optimize fcanonicalize and remove it
if applied to min/max given their operands are known not to be
an sNaN or that sNaNs are not supported.

Additionally we can remove fcanonicalize if denorms are supported
for the VT and we know that its argument is never a NaN.

Differential Revision: https://reviews.llvm.org/D35335

llvm-svn: 307976
2017-07-13 23:59:15 +00:00
Stanislav Mekhanoshin
efe34aaf12 [AMDGPU] fcanonicalize elimination optimization
We are using multiplication by 1.0 to flush denormals and quiet sNaNs.
That is possible to omit this multiplication if source of the
fcanonicalize instruction is known to be flushed/quieted, i.e.
if it comes from another instruction known to do the normalization
and we are using IEEE mode to quiet sNaNs.

Differential Revision: https://reviews.llvm.org/D35218

llvm-svn: 307848
2017-07-12 21:20:28 +00:00
Nirav Dave
c55f8bfffc Add DAG argument to canMergeStoresTo NFC.
llvm-svn: 307583
2017-07-10 20:25:54 +00:00
Simon Pilgrim
7f479d6ac7 [AMDGPU] Fix -Wimplicit-fallthrough warning. NFCI.
llvm-svn: 307485
2017-07-08 19:50:03 +00:00
Stanislav Mekhanoshin
c39b2b753b [AMDGPU] Always use rcp + mul with fast math
Regardless of relaxation options such as -cl-fast-relaxed-math
we are producing rather long code for fdiv via amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace fdiv with 2.5ulp
metadata and does not handle denormals, thus believed to be fast.

An fdiv instruction can also have fast math flag either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both metadata and fast flag:

%div = fdiv fast float %v, %0, !fpmath !12
!12 = !{float 2.500000e+00}

Current implementation ignores fast flag and favors metadata. An
instruction with just fast flag would be lowered to a fastest rcp +
mul, but that never happen on practice because of described mutual
clang and BE behavior.

This change allows an "fdiv fast" to be always lowered as rcp + mul.

Differential Revision: https://reviews.llvm.org/D34844

llvm-svn: 307308
2017-07-06 20:34:21 +00:00
Craig Topper
d8ebaac997 [Constants] If we already have a ConstantInt*, prefer to use isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI
Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.

llvm-svn: 307292
2017-07-06 18:39:47 +00:00
Stanislav Mekhanoshin
9d3604967f [AMDGPU] Simplify setcc (sext from i1 b), -1|0, cc
Depending on the compare code that can be either an argument of
sext or negate of it. This helps to avoid v_cndmask_b64 instruction
for sext. A reversed value can be further simplified and folded into
its parent comparison if possible.

Differential Revision: https://reviews.llvm.org/D34545

llvm-svn: 306446
2017-06-27 18:53:03 +00:00
Stanislav Mekhanoshin
f85be265f1 [AMDGPU] Combine and x, (sext cc from i1) => select cc, x, 0
Also factored out function to check if a boolean is an already
deserialized value which does not require v_cndmask_b32 to be
loaded. Added binary logical operators to its check.

Differential Revision: https://reviews.llvm.org/D34500

llvm-svn: 306439
2017-06-27 18:25:26 +00:00
Matt Arsenault
06d9b51c75 AMDGPU: Whitespace fixes
llvm-svn: 306265
2017-06-26 03:01:36 +00:00
Matt Arsenault
9b6d8d16dd AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling
This should not be treated as a different version of
private_segment_buffer. These are distinct things with
different uses and register classes, and requires the
function argument info to have more context about the
function's type and environment.

Also add missing test coverage for the intrinsic, and
emit an error for HSA. This also encovers that the intrinsic
is broken unless there happen to be stack objects.

llvm-svn: 306264
2017-06-26 03:01:31 +00:00
David Stuttard
acb1b94192 [AMDGPU] Add intrinsics for tbuffer load and store - build error fix
Variable was unused in non-debug build (used in assert) causing compile time
warning and eventual build failure

llvm-svn: 306034
2017-06-22 17:15:49 +00:00
David Stuttard
a19aa0bc03 [AMDGPU] Add intrinsics for tbuffer load and store
Intrinsic already existed for llvm.SI.tbuffer.store

Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.*

Added CodeGen tests for the 2 new variants added.
Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr

Differential Revision: https://reviews.llvm.org/D30687

llvm-svn: 306031
2017-06-22 16:29:22 +00:00
Stanislav Mekhanoshin
9d9d25f071 [AMDGPU] Add FP_CLASS to the add/setcc combine
This is one of the nodes which also compile as v_cmp_*.

Differential Revision: https://reviews.llvm.org/D34485

llvm-svn: 305970
2017-06-21 23:46:22 +00:00
Stanislav Mekhanoshin
10582312cd [AMDGPU] Combine add and adde, sub and sube
If one of the arguments of adde/sube is zero we can fold another
add/sub into it.

Differential Revision: https://reviews.llvm.org/D34374

llvm-svn: 305964
2017-06-21 22:30:01 +00:00
Stanislav Mekhanoshin
ca766419df [AMDGPU] simplify add x, *ext (setcc) => addc|subb x, 0, setcc
This simplification allows to avoid generating v_cndmask_b32
to serialize condition code between compare and use.

Differential Revision: https://reviews.llvm.org/D34300

llvm-svn: 305962
2017-06-21 22:05:06 +00:00
Matt Arsenault
1b9121f71c AMDGPU: Cleanup CreateLiveInRegister
llvm-svn: 305748
2017-06-19 21:52:45 +00:00
Matt Arsenault
cf4059efeb AMDGPU: Teach isLegalAddressingMode about flat offsets
Also fix reporting r+r as a valid addressing mode without
offsets.

llvm-svn: 305203
2017-06-12 17:06:35 +00:00
Chandler Carruth
eb66b33867 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787
2017-06-06 11:49:48 +00:00
Mandeep Singh Grang
efd068d7d5 [llvm] Remove double semicolons
Reviewers: craig.topper, arsenm, mehdi_amini

Reviewed By: mehdi_amini

Subscribers: mehdi_amini, wdng, nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33924

llvm-svn: 304767
2017-06-06 05:08:36 +00:00
Alexander Timofeev
b673edd025 AMDGPUAnnotateUniformValue should always treat volatile loads as divergent
llvm-svn: 304554
2017-06-02 15:25:52 +00:00
Nirav Dave
0855b10b8c [AMDGPU] Prevent too large store merges in AMDGPU Subtargets. NFCI.
Various address spaces on the SI and R600 subtargets have stricter
limits on memory access size that other address spaces. Use
canMergeStoresTo predicate to prevent the DAGCombiner from creating
these stores as they will be split up during legalization.

llvm-svn: 303767
2017-05-24 15:59:09 +00:00
Stanislav Mekhanoshin
6d04b87725 [AMDGPU] Combine and (srl) into shl (bfe)
Perform DAG combine:
and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb
Where nb is a number of trailing zeroes in mask.

It replaces two instructions with two and BFE is generally a more
expensive one. However this is only done if we are selecting a byte
or word at an aligned boundary which results in a proper SDWA
operand pattern. It is only done if SDWA is supported.

TODO: improve SDWA pass to actually convert this pattern. It is not
done now because we have an immediate in the instruction, which has
be moved into a VGPR.

Differential Revision: https://reviews.llvm.org/D33455

llvm-svn: 303681
2017-05-23 19:54:48 +00:00
Matt Arsenault
ab4fb8ba2f AMDGPU: Start defining a calling convention
Partially implement callee-side for arguments and return values.
byval doesn't work properly, and most likely sret or other on-stack
return values most as well.

llvm-svn: 303308
2017-05-17 21:56:25 +00:00
Matt Arsenault
13c6f6f3f1 AMDGPU: Make better use of op_sel with high components
Handle more general swizzles.

llvm-svn: 303296
2017-05-17 20:30:58 +00:00
Matt Arsenault
b41d61b11a AMDGPU: Fix min3/max3 combines for f16/i16
Fix missing instruction definitions for min3/max3.

llvm-svn: 303284
2017-05-17 19:25:06 +00:00
Davide Italiano
2059f769c1 [AMDGPU] Placate unused variable warning in release builds.
llvm-svn: 302821
2017-05-11 19:58:52 +00:00
Matt Arsenault
fd9981d4ab AMDGPU: Pull fneg out of extract_vector_elt
This allows folding source modifiers in more f16 cases.
Makes it easier to select per-component packed neg modifiers.

llvm-svn: 302813
2017-05-11 17:26:25 +00:00
Marek Olsak
372ac26def AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D32645

llvm-svn: 302200
2017-05-04 22:25:20 +00:00
Amara Emerson
7b62022bef Generalize the specialized flag-carrying SDNodes by moving flags into SDNode.
This removes BinaryWithFlagsSDNode, and flags are now all passed by value.

Differential Revision: https://reviews.llvm.org/D32527

llvm-svn: 301803
2017-05-01 15:17:51 +00:00
Marek Olsak
d1c320b134 AMDGPU: Add new amdgcn.init.exec intrinsics
v2: More tests, bug fixes, cosmetic changes.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D31762

llvm-svn: 301677
2017-04-28 20:21:58 +00:00
Craig Topper
af680f3cea [SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.

This is largely a mechanical transformation from KnownZero to Known.Zero.

Differential Revision: https://reviews.llvm.org/D32569

llvm-svn: 301620
2017-04-28 05:31:46 +00:00
Krzysztof Parzyszek
ce1e95e40d Move size and alignment information of regclass to TargetRegisterInfo
1. RegisterClass::getSize() is split into two functions:
   - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const;
   - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const;
2. RegisterClass::getAlignment() is replaced by:
   - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const;

This will allow making those values depend on subtarget features in the
future.

Differential Revision: https://reviews.llvm.org/D31783

llvm-svn: 301221
2017-04-24 18:55:33 +00:00
Matt Arsenault
237f3db4bd AMDGPU: Move trap lowering to DAG
Fixes traps in any block besides the entry block,
and fixes depending on a live-in physical register
by using a virtual register copy.

Also happens to stop emitting a nop in the case
debug trap is not supported.

llvm-svn: 301206
2017-04-24 17:49:13 +00:00
Konstantin Zhuravlyov
079d4d7f9f AMDGPU: Do not lower fast unsafe div for safe, f32, with fp32 denormals
Differential Revision: https://reviews.llvm.org/D32085

llvm-svn: 301023
2017-04-21 19:25:33 +00:00
Akira Hatanaka
6e48df725c [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

This recommits r300932 and r300930, which was causing dag-combine to
loop forever. The problem was that optimizeLogicalImm was returning
true even when there was no change to the immediate node (which happened
when the immediate was all zeros or ones), which caused dag-combine to
push and pop the same node to the work list over and over again without
making any progress.

This commit fixes the bug by returning false early in optimizeLogicalImm
if the immediate is all zeros or ones. Also, it changes the code to
compare the immediate with 0 or Mask rather than calling
countPopulation.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

llvm-svn: 301019
2017-04-21 18:53:12 +00:00
Akira Hatanaka
3808b889a6 Revert r300932 and r300930.
It seems that r300930 was creating an infinite loop in dag-combine when
compling the following file:

MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c

llvm-svn: 300940
2017-04-21 01:31:50 +00:00
Akira Hatanaka
62264dba17 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

This recommits r300913, which broke bots because I didn't fix a call to
ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of
TargetLoweringOpt and TargetLowering.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

llvm-svn: 300930
2017-04-21 00:05:16 +00:00
Matt Arsenault
72dd027fd6 AMDGPU: Custom lower illegal small select types
Promote them to i32 vectors to avoid unpacking and re-packing
the vectors.

llvm-svn: 300754
2017-04-19 20:53:07 +00:00
Matt Arsenault
8e264191d7 AMDGPU: Fix invalid copies when copying i1 to phys reg
Insert a VReg_1 virtual register so the i1 workaround pass
can handle it.

llvm-svn: 300113
2017-04-12 21:58:23 +00:00
Matt Arsenault
c16c71ecdb AMDGPU: Refactor SIMachineFunctionInfo slightly
Prepare for handling non-entry functions.

llvm-svn: 299999
2017-04-11 22:29:28 +00:00
Matt Arsenault
80352fe61c AMDGPU: Refactor argument lowering
Split into smaller functions and prepare for handling
non-entry functions.

llvm-svn: 299998
2017-04-11 22:29:24 +00:00
Konstantin Zhuravlyov
5aca078403 AMDGPU/GFX9: Fix shared and private aperture queries
Differential Revision: https://reviews.llvm.org/D31786

llvm-svn: 299727
2017-04-06 23:02:33 +00:00
Matt Arsenault
5f1e090b9b AMDGPU: Replace fp16SrcZerosHighBits with a whitelist
FCOPYSIGN is lowered to bit operations which don't clear the high
bits.

llvm-svn: 299708
2017-04-06 20:58:30 +00:00
Matt Arsenault
7dae06fe21 AMDGPU: Stop using CCAssignToRegWithShadow
This does not do what it is attempting to use it for
and requires working around in LowerFormalArguments.

llvm-svn: 299667
2017-04-06 17:37:27 +00:00
Stanislav Mekhanoshin
ebf08bfb48 [AMDGPU] Eliminate barrier if workgroup size is not greater than wavefront size
If a workgroup size is known to be not greater than wavefront size
the s_barrier instruction is not needed since all threads are guarantied
to come to the same point at the same time.

Differential Revision: https://reviews.llvm.org/D31731

llvm-svn: 299659
2017-04-06 16:48:30 +00:00
Matt Arsenault
494ba6a33a AMDGPU: Remove legacy export intrinsic
llvm-svn: 299444
2017-04-04 16:34:39 +00:00
Matt Arsenault
7c5896b3a9 AMDGPU: Remove llvm.SI.vs.load.input
llvm-svn: 299391
2017-04-03 21:45:13 +00:00
Matt Arsenault
4041d4247f AMDGPU: Remove legacy bfe intrinsics
llvm-svn: 299372
2017-04-03 18:08:08 +00:00
Matt Arsenault
c3c5eef5bb AMDGPU: Remove unnecessary ands when f16 is legal
Add a new node to act as a fancy bitcast from f16 operations to
i32 that implicitly zero the high 16-bits of the result.

Alternatively could try making v2f16 legal and canonicalizing
on build_vectors.

llvm-svn: 299246
2017-03-31 19:53:03 +00:00