mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00
Commit Graph

193132 Commits

Author SHA1 Message Date
Anna Welker
d262d5349f [TTI][ARM][MVE] Refine gather/scatter cost model
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.
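
A hypothetical reduced example (using the generic masked-gather intrinsic; the @ext_gather wrapper is assumed): an i16 gather whose result is immediately zero-extended to i32 is the kind of extending gather that is now priced cheaper.

    declare <8 x i16> @llvm.masked.gather.v8i16.v8p0i16(<8 x i16*>, i32, <8 x i1>, <8 x i16>)

    ; Extending gather: load i16 lanes, then widen to i32.
    define <8 x i32> @ext_gather(<8 x i16*> %ptrs, <8 x i1> %mask) {
      %g = call <8 x i16> @llvm.masked.gather.v8i16.v8p0i16(<8 x i16*> %ptrs, i32 2, <8 x i1> %mask, <8 x i16> zeroinitializer)
      %z = zext <8 x i16> %g to <8 x i32>
      ret <8 x i32> %z
    }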

Differential Revision: https://reviews.llvm.org/D75525
2020-03-11 10:23:41 +00:00
Victor Campos
57135c3cb8 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

The code generation explicitly deviates from using the register-offset
variant of LDRD/STRD. In this variant, the register allocated to the
register-offset cannot be reused in any of the remaining operands. Such a
restriction seems non-trivial to implement in LLVM, so it is left as a
to-do.
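
A minimal IR sketch of the affected pattern (hypothetical function names); on
ARMv5TE / Thumb-2 each access should now select a single LDRD or STRD instead
of a pair of 32-bit LDR/STR instructions:

    define i64 @load_volatile_i64(i64* %p) {
      %v = load volatile i64, i64* %p, align 8   ; previously two 32-bit LDRs
      ret i64 %v
    }

    define void @store_volatile_i64(i64* %p, i64 %v) {
      store volatile i64 %v, i64* %p, align 8    ; previously two 32-bit STRs
      ret void
    }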

Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

Reviewed By: efriedma, nickdesaulniers

Subscribers: danielkiss, alanphipps, hans, nathanchance, nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2020-03-11 10:19:27 +00:00
QingShan Zhang
380bb62d3e [NFC][Test] Add a PowerPC test to verify the behavior of a*b +/- c*d 2020-03-11 09:35:40 +00:00
Sebastian Neubauer
d908115756 [AMDGPU] Use script to generate atomic optimizations test
This is preparation for introducing an llvm.amdgcn.ballot intrinsic in
D65088.
2020-03-11 09:59:36 +01:00
QingShan Zhang
2143280c39 [NFC][Test] Format the test PowerPC/recipest.ll with update_llc_test_checks.py 2020-03-11 08:49:53 +00:00
Serge Pavlov
d7d03300e9 Make IEEEFloat::roundToIntegral more standard conformant
The behavior of IEEEFloat::roundToIntegral is aligned with the IEEE-754
operation roundToIntegralExact. In particular this function now:
- returns opInvalid for signaling NaNs,
- returns opInexact if the result of rounding differs from argument.

Differential Revision: https://reviews.llvm.org/D75246
2020-03-11 10:38:46 +07:00
Matt Arsenault
f34467c340 GlobalISel: Don't try to narrow extending loads/trunc store
If the loaded memory size was smaller than the result size, this would
produce out of bounds memory accesses. I'm wondering if we need a
distinct narrow memory legalize action type, since a case I care about
is decomposing a 4-byte unaligned access into 4 extending loads, which
would leave the original result register type. I'm currently awkwardly
using narrowScalar to handle unaligned accesses that need to be split.
2020-03-10 23:34:10 -04:00
Matt Arsenault
dd9ccb691d GlobalISel: Add missing add/sub with carries to MachineIRBuilder 2020-03-10 22:39:55 -04:00
Matt Arsenault
bbcd5f4f1b AMDGPU/GlobalISel: Add some tests that used to infinite loop 2020-03-10 22:12:56 -04:00
Carl Ritson
c7052955c8 [AMDGPU] Allow struct.buffer.*.format intrinsics to accept i32
Summary:
In the same manner as struct.buffer.load / struct.buffer.store,
allow struct.buffer.load.format / struct.buffer.store.format to
return / accept any type.  This simplifies front-end code gen.

Reviewers: tpr, arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75789
2020-03-11 08:20:32 +09:00
Lang Hames
bc59809461 [RuntimeDyld][COFF] Build stubs for COFF dllimport symbols.
Summary:
Enables JIT-linking by RuntimeDyld of COFF objects that contain references to
dllimport symbols. This is done by recognizing symbols that start with the
reserved "__imp_" prefix and building a pointer entry to the target symbol in
the stubs area of the section. References to the "__imp_" symbol are updated to
point to this pointer.

Work in progress: The generic code is in place, but only RuntimeDyldCOFFX86_64
and RuntimeDyldCOFFI386 have been updated to look for and update references to
dllimport symbols.

Reviewers: compnerd

Subscribers: hiraditya, ributzka, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75884
2020-03-10 16:08:40 -07:00
Lang Hames
9a8419c639 [RuntimeDyld] Allow multi-line rtdyld-check and jitlink-check expressions.
This patch allows rtdyld-check / jitlink-check expressions to be extended over
multiple lines by terminating each line with a '\'. E.g.

  # llvm-rtdyld: *{8}X = \
  # llvm-rtdyld:   Y
  X:
    .quad Y

This will be used to break up some long lines in upcoming test cases.
2020-03-10 16:08:40 -07:00
Matt Arsenault
28c8c4e5a8 AMDGPU/GlobalISel: Refine G_TRUNC legality rules
Scalarize most truncates. Avoid touching cases that could end up in
unresolvable infinite loops.
2020-03-10 15:32:22 -07:00
Matt Arsenault
7f383ef5cd GlobalISel: Implement fewerElementsVector for G_TRUNC
Extend fewerElementsVectorBasic to handle operands with different
element types.
2020-03-10 15:17:20 -07:00
Matt Arsenault
275c4b7f94 AMDGPU: Use V_MAC_F32 for fmad.ftz
This avoids regressions in a future patch. I'm confused by gfx9's use of
legacy_mad. Was this a pointless instruction rename, or does it use
fmul_legacy handling? Why is regular mac available in that case?
2020-03-10 14:41:06 -07:00
LLVM GN Syncbot
3bc21436e7 [gn build] Port ebdb98f254f 2020-03-10 20:34:28 +00:00
Jay Foad
582a6adc1c [AMDGPU] Fix the gfx10 scheduling model for f32 conversions
Summary:
As far as I can tell on gfx10 conversions to/from f32 (that are not
converting f32 to/from f64) are full rate instructions, but they were
marked as quarter rate instructions.

I have fixed this for gfx10 only. I assume the scheduling model was
correct for older architectures, though I don't have any documentation
handy to confirm that.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75392
2020-03-10 19:31:24 +00:00
Fangrui Song
a99d809eec [SimplifyLibcalls] Don't replace locked IO (fgetc/fgets/fputc/fputs/fread/fwrite) with unlocked IO (*_unlocked)
This essentially reverts some of the SimplifyLibcalls changes of D45736 ([SimplifyLibcalls] Replace locked IO with unlocked IO).

C11 7.21.5.2 The fflush function

> If stream is a null pointer, the fflush function performs this flushing action on all streams for which the behavior is defined above.

i.e. fopen'ed FILE* is inherently captured.

POSIX.1-2017 getc_unlocked, getchar_unlocked, putc_unlocked, putchar_unlocked - stdio with explicit client locking

> These functions can safely be used in a multi-threaded program if and only if they are called while the invoking thread owns the ( FILE *) object, as is the case after a successful call to the flockfile() or ftrylockfile() functions.

After a thread has fopen'ed a FILE*, if it calls foobar() (now replaced by
foobar_unlocked()) while another thread concurrently calls fflush(0), the
behavior is undefined.

C11 7.22.4.4 The exit function

> Next, all open streams with unwritten buffered data are flushed, all open streams are closed, and all files created by the tmpfile function are removed.

The replacement is only feasible if the program is single threaded, or exit or fflush(0) is never called.
See also http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180528/556615.html
for how the replacement makes libc interceptors difficult to implement.

dalias: in the worst case, it's unbounded data corruption because of
concurrent access to pointers without synchronization. f->wpos or f->rpos
could end up outside the buffer; thread A could do f->wpos += j after
checking that j is in bounds, while thread B changes it concurrently.

This can produce exploitable conditions depending on libc internals.

Revert the SimplifyLibcalls change because the cons clearly outweigh the
pros. Even when the replacement is feasible, the benefit is hard to
demonstrate, even more so in a real application than in an artificial glibc
benchmark. Theoretically the replacement could be beneficial when calling
getc_unlocked/putc_unlocked in a loop, but then it is better to use a block
IO operation, and the user is likely aware of that.
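
For reference, a minimal sketch of the rewrite that is no longer performed
(FILE* modeled here as an opaque i8* for brevity):

    declare i32 @fgetc(i8*)
    declare i32 @fgetc_unlocked(i8*)

    define i32 @read_one(i8* %fp) {
      ; SimplifyLibcalls used to turn this call into @fgetc_unlocked(%fp);
      ; that is unsafe if another thread may call fflush(0) or exit concurrently.
      %c = call i32 @fgetc(i8* %fp)
      ret i32 %c
    }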

The function attribute inference is still useful and thus kept.

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D75933
2020-03-10 11:11:58 -07:00
Matt Arsenault
b868163de4 ARM: Fixup some tests using denormal-fp-math attribute
Don't use the deprecated single-mode form in tests. Also make sure to
parse the attribute when the deprecated form is used.
2020-03-10 14:02:06 -04:00
Benjamin Kramer
5193ea06e9 Give helpers internal linkage. NFC. 2020-03-10 18:27:42 +01:00
LLVM GN Syncbot
ac5dfb4241 [gn build] Port a4cde9ad7b6 2020-03-10 17:04:42 +00:00
Tyker
209ad09066 Fixed [AssumeBundles] Move to IR so it can be used by Analysis
This is a recommit of 57c964aaa76bfaa908398fbd9d8c9d6d19856859
after fixing the modules build.
2020-03-10 18:02:39 +01:00
Kazushi (Jam) Marukawa
ed35b7a19d [VE] Target-specific bit size for sjljehprepare
Summary:
This patch extends the TargetMachine to let targets specify the integer size
used by the sjljehprepare pass. This is 64bit for the VE target and otherwise
defaults to 32bit for all targets, which was hard-wired before.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D71337
2020-03-10 17:51:16 +01:00
Simon Moll
24a623a765 [instcombine] remove fsub to fneg hacks; only emit fneg
Summary: Rewrite the fsub-0.0 idiom to fneg and always emit fneg for fp
negation. This also extends the scalarization cost in instcombine for unary
operators to result in the same IR rewrites for fneg as for the idiom.
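
A minimal before/after sketch of the idiom:

    ; Before: fp negation written as a subtraction from -0.0.
    define float @neg(float %x) {
      %r = fsub float -0.000000e+00, %x
      ret float %r
    }

    ; After this change instcombine emits the dedicated instruction instead:
    ;   %r = fneg float %x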

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D75467
2020-03-10 16:57:02 +01:00
Simon Pilgrim
02722c233c [X86][SSE] getFauxShuffleMask - add support for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT) shuffle pattern
We already do this for PINSRB/PINSRW and SCALAR_TO_VECTOR.
2020-03-10 15:42:37 +00:00
Simon Pilgrim
44a507e52c [X86][SSE] matchShuffleWithSHUFPD - add support for unary shuffles.
This causes one minor test change but is mainly necessary for an upcoming patch.
2020-03-10 15:42:36 +00:00
Simon Pilgrim
7647985257 [X86][SSE] Add some extract+insert shuffle tests
Shows failure to avoid xmm<->gpr transfers by using insertps/blendps
2020-03-10 15:42:36 +00:00
Hiroshi Yamauchi
fb601ea161 [PSI] Add tests for is(Hot|Cold)FunctionInCallGraphNthPercentile.
Summary:
Follow up on D75283.

Also remove the test code that had been moved to another test and was
slated for removal.

Reviewers: davidxl

Subscribers: eraman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75630
2020-03-10 08:21:10 -07:00
Matt Arsenault
3df4600f3b AMDGPU/GlobalISel: Insert readfirstlane on SGPR returns
In case the source value ends up in a VGPR, insert a readfirstlane to
avoid producing an illegal copy later. If it turns out to be
unnecessary, it can be folded out.
2020-03-10 11:18:48 -04:00
Sam Parker
74bf7f8a40 [ARM][MVE] VFMA and VFMS validForTailPredication
Add four instructions to the whitelist.

Differential Revision: https://reviews.llvm.org/D75902
2020-03-10 14:58:29 +00:00
Jonas Paulsson
354d615db9 [SystemZ] Improve foldMemoryOperandImpl().
Swap the compare operands if the LHS is spilled, while updating the CC masks
of the CC users. This is relatively straightforward since the live-in lists
for the CC register can be assumed to be correct during register allocation
(thanks to 659efa2).

Also fold a spilled operand of an LOCR/SELR into an LOC(G).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D67437
2020-03-10 15:54:47 +01:00
LLVM GN Syncbot
ed59983363 [gn build] Port 714466bf367 2020-03-10 14:33:04 +00:00
Florian Hahn
bc7c21fd6c [InstCombine] Support vectors in SimplifyAddWithRemainder.
SimplifyAddWithRemainder currently also matches for vector types, but
tries to create an integer constant, which causes a crash.

By using Constant::getIntegerValue() we can support both the scalar and
vector cases.

The 2 added test cases crash without the fix.
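
A hypothetical reduced example, assuming the usual
(X % C0) + ((X / C0) % C1) * C0 -> X % (C0 * C1) fold performed by this
helper; the vector form below is the kind of input that previously crashed
and can now fold to a single urem by <i32 35, i32 35>:

    define <2 x i32> @add_with_remainder(<2 x i32> %x) {
      %rem  = urem <2 x i32> %x, <i32 7, i32 7>
      %div  = udiv <2 x i32> %x, <i32 7, i32 7>
      %rem2 = urem <2 x i32> %div, <i32 5, i32 5>
      %mul  = mul <2 x i32> %rem2, <i32 7, i32 7>
      %add  = add <2 x i32> %rem, %mul
      ret <2 x i32> %add
    }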

Reviewers: spatel, lebedev.ri

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D75906
2020-03-10 14:29:40 +00:00
Nico Weber
6bec78f65f [gn build] (manually) merge 47edf5bafb 2020-03-10 10:22:39 -04:00
Mikhail Maltsev
ebdfc210f1 [ARM,CDE] Generalize MVE intrinsics infrastructure to support CDE
Summary:
This patch generalizes the existing code to support CDE intrinsics
which will share some properties with existing MVE intrinsics
(some of the intrinsics will be polymorphic and accept/return values
of MVE vector types).
Specifically the patch:
* Adds new tablegen backends -gen-arm-cde-builtin-def,
  -gen-arm-cde-builtin-codegen, -gen-arm-cde-builtin-sema,
  -gen-arm-cde-builtin-aliases, -gen-arm-cde-builtin-header based on
  existing MVE backends.
* Renames the '__clang_arm_mve_alias' attribute into
  '__clang_arm_builtin_alias' (it will be used with CDE intrinsics as
  well as MVE intrinsics)
* Implements semantic checks for the coprocessor argument of the CDE
  intrinsics as well as the existing coprocessor intrinsics.
* Adds one CDE intrinsic __arm_cx1 to test the above changes

Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen

Reviewed By: simon_tatham

Subscribers: sdesmalen, mgorny, kristof.beyls, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75850
2020-03-10 14:03:16 +00:00
Jonas Paulsson
a150110be1 [SimplifyCFG] Skip merging return blocks if it would break a CallBr.
SimplifyCFG should not merge empty return blocks and leave a CallBr behind
with a duplicated destination since the verifier will then trigger an
assert. This patch checks for this case and avoids the transformation.

CodeGenPrepare has a similar check, which also carries a FIXME comment about
why this is needed. It would perhaps be better if these two passes eventually
updated the CallBr instruction instead of just checking and bailing out.

This fixes https://bugs.llvm.org/show_bug.cgi?id=45062.

Review: Craig Topper

Differential Revision: https://reviews.llvm.org/D75620
2020-03-10 14:59:13 +01:00
Sanjay Patel
04f0791d5b [InstCombine] regenerate test checks; NFC
tmp -> t because 'tmp' tends to cause problems for the auto-generation script.
2020-03-10 09:57:41 -04:00
Simon Pilgrim
ee23af1e27 [TargetLowering] SimplifyDemandedVectorElts - add DemandedElts mask to ISD::BITCAST SimplifyDemandedBits call.
This fixes most of the regressions introduced in the rG4bc6f6332028 bugfix. The vector-trunc.ll issue should be fixed by D66004.
2020-03-10 13:39:10 +00:00
Sanjay Patel
7af984d05f [InstCombine] fold gep-of-select-of-constants (PR45084)
As shown in:
https://bugs.llvm.org/show_bug.cgi?id=45084
...we failed to combine a gep with constant indexes whose pointer operand
is a select of constants.
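
A hypothetical reduced example of the now-handled pattern:

    @a = internal global [4 x i32] zeroinitializer
    @b = internal global [4 x i32] zeroinitializer

    define i32* @pick(i1 %cond) {
      %sel = select i1 %cond, [4 x i32]* @a, [4 x i32]* @b
      ; gep with constant indexes on a select of constant pointers;
      ; this can now fold to a select of two constant geps.
      %gep = getelementptr inbounds [4 x i32], [4 x i32]* %sel, i64 0, i64 1
      ret i32* %gep
    }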

Differential Revision: https://reviews.llvm.org/D75807
2020-03-10 09:25:13 -04:00
Sanjay Patel
b22a808ae3 [InstCombine] add/adjust tests for select-gep; NFC
Goes with D75807
2020-03-10 09:25:13 -04:00
Florian Hahn
ac56a0239b [SLP] Support vectorizing functions provided by vector libs.
It seems like the SLPVectorizer is currently not aware of vector
versions of functions provided by libraries like Accelerate [1].
This patch updates SLPVectorizer to use the same infrastructure
the LoopVectorizer uses to detect vectorizable library functions.

For calls, it computes the cost of an intrinsic call (existing behavior)
and the cost of a vector function library call, if available. Like
LoopVectorizer, it assumes the cost of the vector function is simply the
cost of a call to a vector function.

[1] https://developer.apple.com/documentation/accelerate
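
A hypothetical reduced example (function and value names assumed): four
adjacent sinf calls that the SLP vectorizer can now also cost as a single
call to a vector math routine when a vector library such as Accelerate is
enabled, instead of only considering the vector intrinsic:

    declare float @sinf(float) readnone

    define void @sin4(float* %src, float* %dst) {
      %s1 = getelementptr inbounds float, float* %src, i64 1
      %s2 = getelementptr inbounds float, float* %src, i64 2
      %s3 = getelementptr inbounds float, float* %src, i64 3
      %x0 = load float, float* %src
      %x1 = load float, float* %s1
      %x2 = load float, float* %s2
      %x3 = load float, float* %s3
      %r0 = call float @sinf(float %x0)
      %r1 = call float @sinf(float %x1)
      %r2 = call float @sinf(float %x2)
      %r3 = call float @sinf(float %x3)
      %d1 = getelementptr inbounds float, float* %dst, i64 1
      %d2 = getelementptr inbounds float, float* %dst, i64 2
      %d3 = getelementptr inbounds float, float* %dst, i64 3
      store float %r0, float* %dst
      store float %r1, float* %d1
      store float %r2, float* %d2
      store float %r3, float* %d3
      ret void
    }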

Reviewers: ABataev, RKSimon, spatel

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D75878
2020-03-10 13:10:50 +00:00
Simon Pilgrim
13106dbb0f [X86][SSE] Add more accurate costs for fmaxnum/fminnum codegen
Based off llvm-mca reports on codegen in llvm\test\CodeGen\X86\fmaxnum.ll + llvm\test\CodeGen\X86\fminnum.ll
2020-03-10 11:59:40 +00:00
Djordje Todorovic
170a16a73b [NFC][llvm-dwarfdump] Always use 'const Twine &'
According to the comment in Twine.h, Twines should only be passed as
const references in arguments.

Differential Revision: https://reviews.llvm.org/D75727
2020-03-10 12:58:59 +01:00
Simon Pilgrim
77f4566679 [SLPVectorizer][X86] Add fmaxnum/fminnum tests 2020-03-10 11:18:28 +00:00
Simon Pilgrim
2f17854215 [CostModel][X86] Add fmaxnum/fminnum costs tests 2020-03-10 11:18:27 +00:00
Simon Pilgrim
d63e8c5905 [X86][SSE] Add SSE41 coverage for fmaxnum/fminnum tests 2020-03-10 11:18:27 +00:00
alex-t
01c427b18c [AMDGPU] SI_INDIRECT_DST_V* pseudos expansion should place EXEC restore to separate basic block
Summary:
When SI_INDIRECT_DST_V* pseudos have their indexes in a VGPR, they get expanded into a self-looping basic block that modifies EXEC in a loop.

To keep EXEC consistent, it is saved before the expansion and restored after the expanded sequence.

%95:vreg_512 = SI_INDIRECT_DST_V16 %93:vreg_512(tied-def 0), %94:sreg_32, 0, killed %1500:vgpr_32

results to

    s_mov_b64 s[6:7], exec
BB0_16:
    v_readfirstlane_b32 s8, v28
    v_cmp_eq_u32_e32 vcc, s8, v28
    s_and_saveexec_b64 vcc, vcc
    s_set_gpr_idx_on s8, gpr_idx(DST)
    v_mov_b32_e32 v6, v25
    s_set_gpr_idx_off
    s_xor_b64 exec, exec, vcc
    s_cbranch_execnz BB0_16
; %bb.17:
    s_mov_b64 exec, s[6:7]

The bug appears when this expansion occurs in the ELSE block of the control flow.

Originally

  %110:vreg_512 = SI_INDIRECT_DST_V16 %103:vreg_512(tied-def 0), %85:vgpr_32, 0, %107:vgpr_32,
   %112:sreg_64 = SI_ELSE %108:sreg_64, %bb.19, 0, implicit-def dead $exec, implicit-def dead $scc, implicit $exec

expanded to

         ******************   <== here exec has "THEN" context

    s_mov_b64 s[6:7], exec
BB0_16:
    v_readfirstlane_b32 s8, v28
    v_cmp_eq_u32_e32 vcc, s8, v28
    s_and_saveexec_b64 vcc, vcc
    s_set_gpr_idx_on s8, gpr_idx(DST)
    v_mov_b32_e32 v6, v25
    s_set_gpr_idx_off
    s_xor_b64 exec, exec, vcc
    s_cbranch_execnz BB0_16
; %bb.17:
    s_or_saveexec_b64 s[4:5], s[4:5]   <-- exec mask is restored for "ELSE" but immediately overwritten.
    s_mov_b64 exec, s[6:7]

The rest of the "ELSE" block is then executed not by the workitems which constitute the "else mask" but by those which constitute the "then mask".

SILowerControlFlow::emitElse always considers the basic block begin() as the insertion point for s_or_saveexec.

Proposed fix: the SI_INDIRECT_DST_V* expansion should split the remainder block to create a landing pad for the EXEC restoration.

Reviewers: rampitec, vpykhtin, nhaehnle

Reviewed By: vpykhtin

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75472
2020-03-10 14:04:22 +03:00
Kerry McLaughlin
cdd0eae6fc [AArch64][SVE] Add SVE intrinsics for address calculations
Summary: Adds the @llvm.aarch64.sve.adr[b|h|w|d] intrinsics

Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75858
2020-03-10 10:53:37 +00:00
James Greenhalgh
a9bbb2f3a9 [Arm] Do not lower vmax/vmin to Neon instructions
On some Arm cores there is a performance penalty when forwarding from an
S register to a D register.  Calculating VMAX in a D register creates
false forwarding hazards, so don't do that unless we're on a core which
specifically asks for it.

Patch by James Greenhalgh

Differential Revision: https://reviews.llvm.org/D75248
2020-03-10 10:48:48 +00:00
Simon Pilgrim
53e2f64862 [X86][AVX] combineX86ShuffleChain - combine binary shuffles to X86ISD::VPERM2X128
For pre-AVX512 targets, combine binary shuffles to X86ISD::VPERM2X128 if possible. This mainly helps optimize the blend(extract_subvector(x,1),y) pattern.

At some point soon we're going to have to make a decision about when to combine AVX512 shuffles more aggressively - we bail out if there is any change in element size (to protect predicate mask merging), which means we miss out on a lot of optimizations.
2020-03-10 10:44:28 +00:00