1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 13:02:52 +02:00
Commit Graph

1392 Commits

Author SHA1 Message Date
Chandler Carruth
c387751c70 Enable '-Wstring-conversion' and fix some bad asserts that it helped
find.

Notable is the assert in NewGVN which had no effect because of the bug.

llvm-svn: 290400
2016-12-23 01:38:06 +00:00
Matt Arsenault
1edd642a1d AMDGPU: Invert cmp + select with constant
Canonicalize a select with a constant to the false side. This
enables more instruction shrinking opportunities since an
inline immediate can be used for the false side of v_cndmask_b32_e32.

This seems to usually be better but causes some code size regressions
in some tests.

llvm-svn: 290372
2016-12-22 21:40:08 +00:00
Matt Arsenault
66ebaecd36 AMDGPU: Use i16 for i16 shift amount
llvm-svn: 290351
2016-12-22 16:36:25 +00:00
Matt Arsenault
ca3de8cd67 AMDGPU: Fix missing 16-bit cmpx instructions
llvm-svn: 290349
2016-12-22 16:27:14 +00:00
Matt Arsenault
9419a1ea69 AMDGPU: Use i16 comparison instructions
llvm-svn: 290348
2016-12-22 16:27:11 +00:00
Matt Arsenault
0975fb877d AMDGPU: Fixed '!NodePtr->isKnownSentinel()' assert
Caused by dereferencing end iterator when trying to const cast the iterator.

Patch by Martin Sherburn

llvm-svn: 290347
2016-12-22 16:06:32 +00:00
Sam Kolton
dc4ffc9328 [AMDGPU] Add pseudo SDWA instructions
Summary: This is needed for later SDWA support in CodeGen.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27412

llvm-svn: 290338
2016-12-22 12:57:41 +00:00
Sam Kolton
0ab0b61c0c [AMDGPU] Disassembler: fix for disaasembling v_mac_f32/16_dpp/sdwa
Summary: Real instruction should copy constraints from real instruction. This allows auto-generated disassembler to correctly process tied operands.

Reviewers: nhaustov, vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27847

llvm-svn: 290336
2016-12-22 11:30:48 +00:00
Matt Arsenault
13f610555b AMDGPU: Fix missing commute table entries for cmpx
No tests because these aren't currently used anywhere.

llvm-svn: 290316
2016-12-22 04:39:41 +00:00
Matt Arsenault
ee5d8d2da0 AMDGPU: Swap order of operands in fadd/fsub combine
FMA is canonicalized to constant in the middle operand. Do
the same so fmad matches and avoid an extra combine step.

llvm-svn: 290313
2016-12-22 04:03:40 +00:00
Matt Arsenault
9d4a891569 AMDGPU: Check fast math flags in fadd/fsub combines
llvm-svn: 290312
2016-12-22 04:03:35 +00:00
Matt Arsenault
dd6b858bbf AMDGPU: Form more FMAs if fusion is allowed
Extend the existing fadd/fsub->fmad combines to produce
FMA if allowed.

llvm-svn: 290311
2016-12-22 03:55:35 +00:00
Matt Arsenault
f4e299f829 AMDGPU: Move combines into separate functions
llvm-svn: 290309
2016-12-22 03:44:42 +00:00
Matt Arsenault
5ba9667c15 AMDGPU: Enable some f32 fadd/fsub combines for f16
llvm-svn: 290308
2016-12-22 03:40:39 +00:00
Matt Arsenault
d7ec3d5ba4 AMDGPU: Implement isFMAFasterThanFMulAndFAdd for f16
llvm-svn: 290307
2016-12-22 03:21:48 +00:00
Matt Arsenault
5ecf306700 AMDGPU: Allow rcp and rsq usage with f16
llvm-svn: 290302
2016-12-22 03:05:44 +00:00
Matt Arsenault
a844bf67ff AMDGPU: Custom lower f16 fdiv
llvm-svn: 290301
2016-12-22 03:05:41 +00:00
Matt Arsenault
263e20ee06 AMDGPU: Implement f16 fcanonicalize
llvm-svn: 290300
2016-12-22 03:05:37 +00:00
Matt Arsenault
5979870866 AMDGPU: Update isFPImmLegal for f16
I don't think this matters because ConstantFP is legal.

llvm-svn: 290299
2016-12-22 03:05:30 +00:00
Tom Stellard
4026197eca AMDGPU/SI: Fix file header
llvm-svn: 290265
2016-12-21 19:06:24 +00:00
Davide Italiano
9d09222590 [AMDGPU] Garbage collect dead code. NFCI.
llvm-svn: 290249
2016-12-21 10:19:00 +00:00
Matt Arsenault
ea409cc20d AMDGPU: Allow 16-bit types in inline asm constraints
llvm-svn: 290193
2016-12-20 19:06:12 +00:00
Matt Arsenault
63d92e4ebb AMDGPU: Don't add same instruction multiple times to worklist
When the instruction is processed the first time, it may be
deleted resulting in crashes. While the new test adds the same
user to the worklist twice, this particular case doesn't crash
but I'm not sure why.

llvm-svn: 290191
2016-12-20 18:55:06 +00:00
Tom Stellard
5634d231ec AMDGPU/SI: Make a function const
llvm-svn: 290185
2016-12-20 17:26:34 +00:00
Tom Stellard
2c0dd4ec69 AMDGPU/SI: Add a MachineMemOperand when lowering llvm.amdgcn.buffer.load.*
Reviewers: arsenm, nhaehnle, mareko

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D27834

llvm-svn: 290184
2016-12-20 17:19:44 +00:00
Tom Stellard
ec757acb99 AMDGPU/SI: Add a MachineMemOperand to MIMG instructions
Summary:
Without a MachineMemOperand, the scheduler was assuming MIMG instructions
were ordered memory references, so no loads or stores could be reordered
across them.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27536

llvm-svn: 290179
2016-12-20 15:52:17 +00:00
Konstantin Zhuravlyov
3febbc8b1e [AMDGPU] When unifying metadata, add operands to named metadata individually
Differential Revision: https://reviews.llvm.org/D27725

llvm-svn: 290114
2016-12-19 16:54:24 +00:00
Sam Kolton
fcea1ddb2f AMDGPU: [AMDGPU] Assembler: add .hsa_code_object_metadata directive for functime metadata V2.0
Summary:
Added pair of directives .hsa_code_object_metadata/.end_hsa_code_object_metadata.
Between them user can put YAML string that would be directly put to the generated note. E.g.:
'''
.hsa_code_object_metadata
    {
        amd.MDVersion: [ 2, 0 ]
    }
.end_hsa_code_object_metadata
'''
Based on D25046

Reviewers: vpykhtin, nhaustov, yaxunl, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, mgorny, tony-tye

Differential Revision: https://reviews.llvm.org/D27619

llvm-svn: 290097
2016-12-19 11:43:15 +00:00
Matt Arsenault
5f7fab5b1a AMDGPU: Fix name for v_ashrrev_i16
llvm-svn: 289967
2016-12-16 17:40:11 +00:00
Matt Arsenault
b1034a224d AMDGPU: Select branch on undef to uniform scc branch
llvm-svn: 289877
2016-12-15 21:57:11 +00:00
Matt Arsenault
feb1ec1fb2 AMDGPU: Fix asserting on returned tail calls
llvm-svn: 289868
2016-12-15 20:50:12 +00:00
Matt Arsenault
496e9bc65d AMDGPU: Assembler support for vintrp instructions
llvm-svn: 289866
2016-12-15 20:40:20 +00:00
Alexander Timofeev
aa7ea574e9 Fix for regression after Global Load Scalarization patch
llvm-svn: 289822
2016-12-15 15:17:19 +00:00
Krzysztof Parzyszek
b6cc44c368 Extract LaneBitmask into a separate type
Specifically avoid implicit conversions from/to integral types to
avoid potential errors when changing the underlying type. For example,
a typical initialization of a "full" mask was "LaneMask = ~0u", which
would result in a value of 0x00000000FFFFFFFF if the type was extended
to uint64_t.

Differential Revision: https://reviews.llvm.org/D27454

llvm-svn: 289820
2016-12-15 14:36:06 +00:00
Nico Weber
bc8a99a6d1 fix gcc warning about a superfluous ;
llvm-svn: 289705
2016-12-14 20:33:54 +00:00
Yaxun Liu
d4bb42cc3b Fix build failure due to r289674 on certain systems
Removed a useless include which caused conflict.

llvm-svn: 289700
2016-12-14 20:17:47 +00:00
Yaxun Liu
98de4b3c84 AMDGPU: Emit runtime metadata version 2 as YAML
Differential Revision: https://reviews.llvm.org/D25046

llvm-svn: 289674
2016-12-14 17:16:52 +00:00
Matt Arsenault
6bc4304e35 AMDGPU: Make AllocationPriority of SGPRs higher than VGPRs
Since SGPRs should spill to VGPRs, they should be allocated first.
I don't think this is sufficient for SGPRs to always spill to
VGPRs though.

llvm-svn: 289671
2016-12-14 16:52:06 +00:00
Nirav Dave
9fd3ae9cf9 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
Reverting due to ARM MCJIT and MIPS LLD error.

This reverts commit r289659.

llvm-svn: 289667
2016-12-14 16:43:44 +00:00
Matt Arsenault
c74ace61e1 AMDGPU: Change vintrp printing
llvm-svn: 289664
2016-12-14 16:36:12 +00:00
Nirav Dave
afe2eccae3 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing after removing load-store factoring through
token factors in favor of improved token factor operand pruning

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289659
2016-12-14 15:44:26 +00:00
Stephan Bergmann
aba15d97df Replace APFloatBase static fltSemantics data members with getter functions
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.

Differential Revision: https://reviews.llvm.org/D26671

llvm-svn: 289647
2016-12-14 11:57:17 +00:00
Eugene Zelenko
c816ae3436 [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 289475
2016-12-12 22:23:53 +00:00
Nicolai Haehnle
baedb7e4bc AMDGPU: llvm.amdgcn.interp.mov is a source of divergence
Summary:
While the result is constant across a single primitive, each pixel
shader wave can have pixels from multiple primitives.

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D27572

llvm-svn: 289447
2016-12-12 16:52:19 +00:00
Matt Arsenault
82492d300b AMDGPU: Fix asan errors when folding operands
This was failing when trying to fold immediates into operand 1 of a
phi, which only has one statically known operand.

llvm-svn: 289337
2016-12-10 19:58:00 +00:00
Matt Arsenault
78957bd5fc AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecasts
The users of the addrspacecast were having their types incorrectly
changed, producing invalid bitcasts between address spaces.

llvm-svn: 289307
2016-12-10 00:52:50 +00:00
Matt Arsenault
c2c2a10170 AMDGPU: Fix handling of 16-bit immediates
Since 32-bit instructions with 32-bit input immediate behavior
are used to materialize 16-bit constants in 32-bit registers
for 16-bit instructions, determining the legality based
on the size is incorrect. Change operands to have the size
specified in the type.

Also adds a workaround for a disassembler bug that
produces an immediate MCOperand for an operand that
is supposed to be OPERAND_REGISTER.

The assembler appears to accept out of bounds immediates and
truncates them, but this seems to be an issue for 32-bit
already.

llvm-svn: 289306
2016-12-10 00:39:12 +00:00
Matt Arsenault
365f8ab107 AMDGPU: Fix vintrp disassembly
llvm-svn: 289292
2016-12-10 00:29:55 +00:00
Matt Arsenault
61a1b18506 AMDGPU: Change vintrp printing to better match sc
Some of the immediates need to be printed differently
eventually.

llvm-svn: 289291
2016-12-10 00:23:12 +00:00
Eugene Zelenko
796f37f3bb [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 289282
2016-12-09 22:06:55 +00:00