llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 12:12:47 +01:00

Author	SHA1	Message	Date
Florian Hahn	b9a535f60f	[Matrix] Add remark propagation along the inlined-at chain. This patch adds support for propagating matrix expressions along the inlined-at chain and emitting remarks at the traversed function scopes. To motivate this new behavior, consider the example below. Without the remark 'up-leveling', we would only get remarks in load.h and store.h, but we could not generate a remark describing the full expression in toplevel.cpp, which is the place where the user has the best chance of spotting/fixing potential problems. With this patch, we generate a remark for the load in load.h, one for the store in store.h and one for the complete expression in toplevel.cpp. For a bigger example, please see remarks-inlining.ll. load.h: `template <typename Ty, unsigned R, unsigned C> Matrix<Ty, R, C> load(Ty *Ptr) { Matrix<Ty, R, C> Result; Result.value = *reinterpret_cast<typename Matrix<Ty, R, C>::matrix_t *>(Ptr); return Result; }` store.h: `template <typename Ty, unsigned R, unsigned C> void store(Matrix<Ty, R, C> M1, Ty *Ptr) { *reinterpret_cast<typename decltype(M1)::matrix_t *>(Ptr) = M1.value; }` toplevel.cpp: `void test(double *A, double *B, double *C) { store(add(load<double, 3, 5>(A), load<double, 3, 5>(B)), C); }` For a given function, we traverse the inlined-at chain for each matrix instruction (= instructions with shape information). We collect the matrix instructions in each DISubprogram we visit. This produces a mapping of DISubprogram -> (list of matrix instructions visible in the subprogram). We then generate remarks using the list of instructions for each subprogram in the inlined-at chain. Note that the list of instructions for a subprogram recursively includes the instructions from the subprograms inlined into it. In the example above, the list for the subprogram 'test' includes the instructions from the inlined functions 'load' and 'store'. This allows surfacing the remarks at a level useful to users. Note that the current approach may create a lot of extra remarks. Additional heuristics to cut off the traversal can be implemented in the future. For example, it might make sense to stop 'up-leveling' once all matrix instructions are at the same debug location. Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D73600	2020-03-11 17:40:08 +00:00
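The DISubprogram -> instruction-list mapping described in the b9a535f60f message can be sketched in a few lines. This is a minimal sketch, not the pass's actual code. `MatrixInst`, `InlinedAtChain` and `collectBySubprogram` are hypothetical stand-ins for llvm::Instruction and its DILocation inlined-at chain, and each scope is reduced to a subprogram name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a matrix instruction (one with shape information).
struct MatrixInst {
  std::string Name;
  // Subprograms along the instruction's inlined-at chain, innermost first,
  // e.g. {"load", "test"} for a load inlined into test().
  std::vector<std::string> InlinedAtChain;
};

// Map each subprogram to all matrix instructions visible in it. An instruction
// is recorded in every subprogram on its chain, so an outer function also sees
// the instructions of everything inlined into it.
std::map<std::string, std::vector<std::string>>
collectBySubprogram(const std::vector<MatrixInst> &Insts) {
  std::map<std::string, std::vector<std::string>> BySubprogram;
  for (const MatrixInst &I : Insts)
    for (const std::string &SP : I.InlinedAtChain)
      BySubprogram[SP].push_back(I.Name);
  return BySubprogram;
}
```

For the two loads and the store from the toplevel.cpp example, 'test' sees all three instructions. 'load' and 'store' see only their own. This is what allows one remark for the whole expression at the top level.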
Andrzej Warzynski	c29228eb19	[AArch64][SVE] Add the @llvm.aarch64.sve.sel intrinsic Reviewers: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75928	2020-03-11 17:05:21 +00:00
Philip Reames	9c84b99ce5	[GC] Remove buggy untested optimization from statepoint lowering A downstream test case (see included reduced test) revealed that we have a bug in how we handle duplicate relocations. If we have the same SDValue relocated twice, and that value happens to be a constant (such as null), we only export one of the two llvm::Values. Exporting on a per llvm::Value basis is required to allow lowering of gc.relocates in following basic blocks (e.g. invokes). Without it, we end up with a use of an undefined vreg and bad things happen. Rather than fixing the optimization - which appears to be hard - I propose we simply remove it. There are no tests in tree that change with this code removed. If we find out later that this did matter for something, we can reimplement a variation of this in CodeGenPrepare to catch the easy cases without complicating the lowering code. Thanks to Denis and Serguei who did all the hard work of figuring out what went wrong here. The patch is by far the easy part. :) Differential Revision: https://reviews.llvm.org/D75964	2020-03-11 10:03:24 -07:00
James Henderson	bfc3be34e7	[Object][unittest] Skip tests on machines with non-64 bit size_t Speculative fix for build bot failures such as http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/14317/	2020-03-11 15:31:30 +00:00
David Green	7a9084d2b2	[ARM] Extra VFMA tests. NFC	2020-03-11 15:14:07 +00:00
Matt Arsenault	ab795ee017	AMDGPU/GlobalISel: Manually RegBankSelect copies This was failing on any pre-assigned copy to the VCC bank. This is something of a workaround for the default implementation in getInstrMappingImpl, and how it treats copy-like operations in general. Copy-like operations are considered to only have one result register bank, rather than separate banks for each source like a normal instruction. To avoid potentially mishandling reg_sequence with impossible operand combinations, the generic implementation errors on impossible costs. If the bank was already assigned, it is treated as if it were an unsatisfiable REG_SEQUENCE mapping. We really don't get any value from any of what getInstrMappingImpl tries to do for copies, so just directly emit the simple mapping we really want.	2020-03-11 11:12:12 -04:00
Christian Sigg	5d2315f6e7	Change to individual pretty printer classes, remove generic `make_printer`. Summary: Follow-up from D72589. Reviewers: dblaikie Reviewed By: dblaikie Subscribers: merge_guards_bot, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73609	2020-03-11 15:04:03 +01:00
Hubert Tong	600da7bd8c	[unittests][Object] Use matching signedness for expected value Speculative fix for buildbot breakage: http://lab.llvm.org:8011/builders/clang-ppc64le-rhel/builds/1899/steps/ninja%20check%201/logs/stdio D75742 introduces checks that cause bots to complain about comparing values where the integer types mismatch on signedness. This patch makes the expected value unsigned in various cases (since the value being tested is unsigned).	2020-03-11 09:58:10 -04:00
Sam Parker	777af3bdba	[NFC][ARM] Add test Precommit test for LowOverheadLoops.	2020-03-11 11:51:52 +00:00
Sam Parker	6dfdec8dbc	[NFC][ARM] Reorder some logic Move some logic around in LowOverheadLoop::ValidateLiveOut	2020-03-11 11:40:09 +00:00
Simon Pilgrim	4c7af40810	[X86] Replace (most) X86ISD::SHLD/SHRD usage with ISD::FSHL/FSHR generic opcodes (PR39467) For i32 and i64 cases, X86ISD::SHLD/SHRD are close enough to ISD::FSHL/FSHR that we can use them directly, we just need to account for the operand commutation for SHRD. The i16 SHLD/SHRD case is annoying as the shift amount is modulo-32 (vs funnel shift modulo-16), so I've added X86ISD::FSHL/FSHR equivalents, which matches the generic implementation in all other terms. Something I'm slightly concerned with is that ISD::FSHL/FSHR legality is controlled by the Subtarget.isSHLDSlow() feature flag - we don't normally use non-ISA features for this but it allows the DAG combines to continue to operate after legalization in a lot more cases. The X86 *bits.ll changes are all affected by the same issue - we now have a "FSHR(-1,-1,amt) -> ROTR(-1,amt) -> (-1)" simplification that reduces the dependencies enough for the branch fall through code to mess up. Differential Revision: https://reviews.llvm.org/D75748	2020-03-11 11:17:49 +00:00
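The operand commutation mentioned for SHRD can be checked with plain integer models. This is a sketch under stated assumptions. `fshl32`/`fshr32` follow the LangRef definition of llvm.fshl/llvm.fshr for i32: concatenate the two operands, shift by the amount modulo 32, and keep the high or low half. `shld32`/`shrd32` are hypothetical, simplified models of the 32-bit x86 instructions, with the count masked to 5 bits.

```cpp
#include <cassert>
#include <cstdint>

// llvm.fshl.i32: high 32 bits of (Hi:Lo) << (Amt % 32).
std::uint32_t fshl32(std::uint32_t Hi, std::uint32_t Lo, std::uint32_t Amt) {
  Amt %= 32;
  if (Amt == 0)
    return Hi; // avoid the undefined C++ shift by 32
  return (Hi << Amt) | (Lo >> (32 - Amt));
}

// llvm.fshr.i32: low 32 bits of (Hi:Lo) >> (Amt % 32).
std::uint32_t fshr32(std::uint32_t Hi, std::uint32_t Lo, std::uint32_t Amt) {
  Amt %= 32;
  if (Amt == 0)
    return Lo;
  return (Lo >> Amt) | (Hi << (32 - Amt));
}

// Simplified SHLD: shift Dst left, filling the vacated low bits from Src.
std::uint32_t shld32(std::uint32_t Dst, std::uint32_t Src, std::uint32_t Cnt) {
  Cnt &= 31;
  if (Cnt == 0)
    return Dst;
  return (Dst << Cnt) | (Src >> (32 - Cnt));
}

// Simplified SHRD: shift Dst right, filling the vacated high bits from Src.
std::uint32_t shrd32(std::uint32_t Dst, std::uint32_t Src, std::uint32_t Cnt) {
  Cnt &= 31;
  if (Cnt == 0)
    return Dst;
  return (Dst >> Cnt) | (Src << (32 - Cnt));
}
```

In these models, SHLD(Dst, Src) matches FSHL(Dst, Src) directly, while SHRD(Dst, Src) only matches FSHR(Src, Dst). The models leave out the i16 case. There the hardware count is still taken modulo 32, but llvm.fshl.i16 takes it modulo 16, which is why the patch adds separate X86ISD::FSHL/FSHR nodes for i16.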
LLVM GN Syncbot	acb9457c82	[gn build] Port 326bc1da45b	2020-03-11 10:47:56 +00:00
James Henderson	a17a111e61	[Object] Fix handling of large archive members The archive library truncated the size of archive members whose size was greater than max uint32_t. This patch fixes the issue and adds some unit tests to verify. Reviewed by: ruiu, MaskRay, grimar, rupprecht Differential Revision: https://reviews.llvm.org/D75742	2020-03-11 10:29:45 +00:00
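The truncation is easy to reproduce outside the library. In the ar format, the member size is a 10-byte, space-padded decimal field. It can therefore hold values up to 9999999999, well above the 4294967295 that fits in a uint32_t. The sketch below parses that field into a 64-bit value. `parseMemberSize` is a hypothetical helper, not the llvm::object API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: parse the space-padded decimal size field of an ar
// member header into a 64-bit value (10 digits always fit in uint64_t).
std::optional<std::uint64_t> parseMemberSize(std::string_view Field) {
  std::uint64_t Size = 0;
  bool SawDigit = false;
  for (char C : Field) {
    if (C == ' ')
      break; // the field is right-padded with spaces
    if (C < '0' || C > '9')
      return std::nullopt; // malformed header
    Size = Size * 10 + static_cast<std::uint64_t>(C - '0');
    SawDigit = true;
  }
  if (!SawDigit)
    return std::nullopt;
  return Size;
}
```

Narrowing the parsed 5000000000 to uint32_t yields 705032704. This is the kind of silent truncation the patch removes.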
Anna Welker	d262d5349f	[TTI][ARM][MVE] Refine gather/scatter cost model Refines the gather/scatter cost model, but also changes the TTI function getIntrinsicInstrCost to accept an additional parameter which is needed for the gather/scatter cost evaluation. This did require trivial changes in some non-ARM backends to adopt the new parameter. Extending gathers and truncating scatters are now priced cheaper. Differential Revision: https://reviews.llvm.org/D75525	2020-03-11 10:23:41 +00:00
Victor Campos	57135c3cb8	[ARM] Improve codegen of volatile load/store of i64 Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. The code generation explicitly deviates from using the register-offset variant of LDRD/STRD. In this variant, the register allocated to the register-offset cannot be reused in any of the remaining operands. Such restriction seems to be non-trivial to implement in LLVM, thus it is left as a to-do. Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers Reviewed By: efriedma, nickdesaulniers Subscribers: danielkiss, alanphipps, hans, nathanchance, nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70072	2020-03-11 10:19:27 +00:00
QingShan Zhang	380bb62d3e	[NFC][Test] Add a PowerPC test to verify the behavior of a*b +/- c*d	2020-03-11 09:35:40 +00:00
Sebastian Neubauer	d908115756	[AMDGPU] Use script to generate atomic optimizations test This is a preparation for introducing a llvm.amdgcn.ballot intrinsic in D65088.	2020-03-11 09:59:36 +01:00
QingShan Zhang	2143280c39	[NFC][Test] Format the test PowerPC/recipest.ll with update_llc_test_checks.py	2020-03-11 08:49:53 +00:00
Serge Pavlov	d7d03300e9	Make IEEEFloat::roundToIntegral more standard conformant Behavior of IEEEFloat::roundToIntegral is aligned with IEEE-754 operation roundToIntegralExact. In particular, this function now: - returns opInvalid for signaling NaNs, - returns opInexact if the result of rounding differs from the argument. Differential Revision: https://reviews.llvm.org/D75246	2020-03-11 10:38:46 +07:00
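The two rules listed above can be modelled on a plain double: opInvalid for signaling NaNs, and opInexact when the rounded value differs from the argument. This is a sketch, not APFloat code. The `Status` enum only loosely mirrors APFloat::opStatus. `std::nearbyint` stands in for rounding in the current rounding mode because it does not raise floating-point exceptions itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical status codes, loosely mirroring APFloat::opStatus bit values.
enum Status { OK = 0x00, Invalid = 0x01, Inexact = 0x10 };

// A binary64 NaN is signaling when its quiet bit (significand bit 51) is clear.
bool isSignalingNaN(double X) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof Bits);
  const std::uint64_t ExpMask = 0x7FF0000000000000ULL;
  const std::uint64_t FracMask = 0x000FFFFFFFFFFFFFULL;
  const std::uint64_t QuietBit = 0x0008000000000000ULL;
  return (Bits & ExpMask) == ExpMask && (Bits & FracMask) != 0 &&
         (Bits & QuietBit) == 0;
}

// Sketch of roundToIntegralExact in the current rounding mode.
Status roundToIntegralExact(double &X) {
  if (isSignalingNaN(X)) {
    X = std::numeric_limits<double>::quiet_NaN(); // result is a quiet NaN
    return Invalid;
  }
  if (std::isnan(X))
    return OK; // quiet NaNs pass through
  double R = std::nearbyint(X);
  Status S = (R != X) ? Inexact : OK;
  X = R;
  return S;
}
```

Under the default round-to-nearest-even mode, 2.5 rounds to 2.0 and reports Inexact. 3.0 stays 3.0 and reports OK. A signaling NaN becomes a quiet NaN and reports Invalid.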
Matt Arsenault	f34467c340	GlobalISel: Don't try to narrow extending loads/trunc store If the loaded memory size was smaller than the result size, this would produce out of bounds memory accesses. I'm wondering if we need a distinct narrow memory legalize action type, since a case I care about is decomposing a 4-byte unaligned access into 4 extending loads, which would leave the original result register type. I'm currently awkwardly using narrowScalar to handle unaligned accesses that need to be split.	2020-03-10 23:34:10 -04:00
Matt Arsenault	dd9ccb691d	GlobalISel: Add missing add/sub with carries to MachineIRBuilder	2020-03-10 22:39:55 -04:00
Matt Arsenault	bbcd5f4f1b	AMDGPU/GlobalISel: Add some tests that used to infinite loop	2020-03-10 22:12:56 -04:00
Carl Ritson	c7052955c8	[AMDGPU] Allow struct.buffer.*.format intrinsics to accept i32 Summary: In the same manner as struct.buffer.load / struct.buffer.store, allow struct.buffer.load.format / struct.buffer.store.format to return / accept any type. This simplifies front-end code gen. Reviewers: tpr, arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75789	2020-03-11 08:20:32 +09:00
Lang Hames	bc59809461	[RuntimeDyld][COFF] Build stubs for COFF dllimport symbols. Summary: Enables JIT-linking by RuntimeDyld of COFF objects that contain references to dllimport symbols. This is done by recognizing symbols that start with the reserved "__imp_" prefix and building a pointer entry to the target symbol in the stubs area of the section. References to the "__imp_" symbol are updated to point to this pointer. Work in progress: The generic code is in place, but only RuntimeDyldCOFFX86_64 and RuntimeDyldCOFFI386 have been updated to look for and update references to dllimport symbols. Reviewers: compnerd Subscribers: hiraditya, ributzka, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75884	2020-03-10 16:08:40 -07:00
Lang Hames	9a8419c639	[RuntimeDyld] Allow multi-line rtdyld-check and jitlink-check expressions. This patch allows rtdyld-check / jitlink-check expressions to be extended over multiple lines by terminating each line with a '\'. E.g. `# llvm-rtdyld: *{8}X = \` followed by `# llvm-rtdyld: Y`, for the input `X: .quad Y`. This will be used to break up some long lines in upcoming test cases.	2020-03-10 16:08:40 -07:00
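The continuation rule can be illustrated with a small line-joining routine. This is a hypothetical sketch, not the RuntimeDyldChecker implementation. `collectChecks` extracts the text after a given check prefix and strips surrounding whitespace. If an expression ends in '\', it is joined with the expression on the next check line.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: gather check expressions that follow Prefix, joining
// any expression ending in '\' with the one on the next check line.
std::vector<std::string> collectChecks(const std::vector<std::string> &Lines,
                                       const std::string &Prefix) {
  std::vector<std::string> Exprs;
  std::string Pending;
  for (const std::string &Line : Lines) {
    std::size_t Pos = Line.find(Prefix);
    if (Pos == std::string::npos)
      continue; // not a check line (e.g. assembly)
    std::string Expr = Line.substr(Pos + Prefix.size());
    // Strip leading and trailing blanks.
    std::size_t First = Expr.find_first_not_of(" \t");
    if (First == std::string::npos)
      Expr.clear();
    else
      Expr = Expr.substr(First, Expr.find_last_not_of(" \t") - First + 1);
    if (!Expr.empty() && Expr.back() == '\\') {
      Expr.pop_back(); // drop the continuation marker, keep accumulating
      Pending += Expr;
      continue;
    }
    Pending += Expr;
    Exprs.push_back(Pending);
    Pending.clear();
  }
  return Exprs;
}
```

Applied to the two check lines from the commit message, this produces the single expression `*{8}X = Y`.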
Matt Arsenault	28c8c4e5a8	AMDGPU/GlobalISel: Refine G_TRUNC legality rules Scalarize most truncates. Avoid touching cases that could end up in unresolvable infinite loops.	2020-03-10 15:32:22 -07:00
Matt Arsenault	7f383ef5cd	GlobalISel: Implement fewerElementsVector for G_TRUNC Extend fewerElementsVectorBasic to handle operands with different element types.	2020-03-10 15:17:20 -07:00
Matt Arsenault	275c4b7f94	AMDGPU: Use V_MAC_F32 for fmad.ftz This avoids regressions in a future patch. I'm confused by the gfx9 usage of legacy_mad. Was this a pointless instruction rename, or does it use fmul_legacy handling? Why is regular mac available in that case?	2020-03-10 14:41:06 -07:00
LLVM GN Syncbot	3bc21436e7	[gn build] Port ebdb98f254f	2020-03-10 20:34:28 +00:00
Jay Foad	582a6adc1c	[AMDGPU] Fix the gfx10 scheduling model for f32 conversions Summary: As far as I can tell on gfx10 conversions to/from f32 (that are not converting f32 to/from f64) are full rate instructions, but they were marked as quarter rate instructions. I have fixed this for gfx10 only. I assume the scheduling model was correct for older architectures, though I don't have any documentation handy to confirm that. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75392	2020-03-10 19:31:24 +00:00
Fangrui Song	a99d809eec	[SimplifyLibcalls] Don't replace locked IO (fgetc/fgets/fputc/fputs/fread/fwrite) with unlocked IO (_unlocked) This essentially reverts some of the SimplifyLibcalls part changes of D45736 [SimplifyLibcalls] Replace locked IO with unlocked IO. C11 7.21.5.2 The fflush function > If stream is a null pointer, the fflush function performs this flushing action on all streams for which the behavior is defined above. i.e. fopen'ed FILE is inherently captured. POSIX.1-2017 getc_unlocked, getchar_unlocked, putc_unlocked, putchar_unlocked - stdio with explicit client locking > These functions can safely be used in a multi-threaded program if and only if they are called while the invoking thread owns the (FILE) object, as is the case after a successful call to the flockfile() or ftrylockfile() functions. After a thread fopen'ed a FILE, when it is calling foobar() which is now replaced by foobar_unlocked(), if another thread is concurrently calling fflush(0), the behavior is undefined. C11 7.22.4.4 The exit function > Next, all open streams with unwritten buffered data are flushed, all open streams are closed, and all files created by the tmpfile function are removed. The replacement is only feasible if the program is single threaded, or exit or fflush(0) is never called. See also http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180528/556615.html for how the replacement makes libc interceptors difficult to implement. dalias: in the worst case, it's unbounded data corruption because of concurrent access to pointers without synchronization. f->wpos or rpos could get outside of the buffer, thread A could do f->wpos += j after knowing j is in bounds, while thread B also changes it concurrently. This can produce exploitable conditions depending on libc internals. Revert the SimplifyLibcalls part change because the cons obviously outweigh the pros. Even when the replacement is feasible, the benefit is indemonstrable, more so in an application instead of an artificial glibc benchmark. Theoretically the replacement could be beneficial when calling getc_unlocked/putc_unlocked in a loop, but then it is better to use a blocked IO operation and the user is likely aware of that. The function attribute inference is still useful and thus kept. Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D75933	2020-03-10 11:11:58 -07:00
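The quoted POSIX requirement is that the calling thread must own the FILE, for example after flockfile(). A user who does want unlocked IO has to satisfy this explicitly. This is a minimal sketch, assuming a POSIX system where `<cstdio>` declares flockfile, funlockfile and putc_unlocked. `writeUnlocked` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Writes S to F one character at a time. putc_unlocked is only safe while the
// calling thread owns F's lock, so the lock is taken explicitly around the
// whole loop (POSIX flockfile/funlockfile).
bool writeUnlocked(std::FILE *F, const std::string &S) {
  flockfile(F);
  bool Ok = true;
  for (char C : S) {
    if (putc_unlocked(C, F) == EOF) {
      Ok = false;
      break;
    }
  }
  funlockfile(F);
  return Ok;
}
```

Here the caller makes the ownership guarantee explicit. A compiler rewriting fputc to fputc_unlocked cannot know whether another thread may call fflush(0) or exit() concurrently.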
Matt Arsenault	b868163de4	ARM: Fixup some tests using denormal-fp-math attribute Don't use the deprecated, single mode form in tests. Also make sure to parse the attribute, in case of the deprecated form.	2020-03-10 14:02:06 -04:00
Benjamin Kramer	5193ea06e9	Give helpers internal linkage. NFC.	2020-03-10 18:27:42 +01:00
LLVM GN Syncbot	ac5dfb4241	[gn build] Port a4cde9ad7b6	2020-03-10 17:04:42 +00:00
Tyker	209ad09066	Fixed [AssumeBundles] Move to IR so it can be used by Analysis This is a recommit of 57c964aaa76bfaa908398fbd9d8c9d6d19856859 after fixing modules build.	2020-03-10 18:02:39 +01:00
Kazushi (Jam) Marukawa	ed35b7a19d	[VE] Target-specific bit size for sjljehprepare Summary: This patch extends the TargetMachine to let targets specify the integer size used by the sjljehprepare pass. This is 64bit for the VE target and otherwise defaults to 32bit for all targets, which was hard-wired before. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D71337	2020-03-10 17:51:16 +01:00
Simon Moll	24a623a765	[instcombine] remove fsub to fneg hacks; only emit fneg Summary: Rewrite the fsub-0.0 idiom to fneg and always emit fneg for fp negation. This also extends the scalarization cost in instcombine for unary operators to result in the same IR rewrites for fneg as for the idiom. Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D75467	2020-03-10 16:57:02 +01:00
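The idiom is fsub -0.0, X rather than fsub 0.0, X because of signed zeros: only the -0.0 form agrees with negation for a +0.0 input. The small C++ check below assumes IEEE-754 doubles and the default rounding mode. `fnegModel`, `fsubIdiom` and `fsubPositiveZero` are hypothetical names. NaN inputs are left out, since fneg is defined to only flip the sign bit while fsub makes no such guarantee for NaNs.

```cpp
#include <cassert>
#include <cmath>

// Negation as fneg defines it: flip the sign bit.
double fnegModel(double X) { return -X; }

// The older idiom: subtract X from negative zero.
double fsubIdiom(double X) { return -0.0 - X; }

// Subtracting from positive zero is not a negation: 0.0 - 0.0 is +0.0.
double fsubPositiveZero(double X) { return 0.0 - X; }
```

For +0.0 the first two both produce -0.0, while the third produces +0.0.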
Simon Pilgrim	02722c233c	[X86][SSE] getFauxShuffleMask - add support for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT) shuffle pattern We already do this for PINSRB/PINSRW and SCALAR_TO_VECTOR.	2020-03-10 15:42:37 +00:00
Simon Pilgrim	44a507e52c	[X86][SSE] matchShuffleWithSHUFPD - add support for unary shuffles. This causes one minor test change but is mainly necessary for an upcoming patch.	2020-03-10 15:42:36 +00:00
Simon Pilgrim	7647985257	[X86][SSE] Add some extract+insert shuffle tests Shows failure to avoid xmm<->gpr transfers by using insertps/blendps	2020-03-10 15:42:36 +00:00
Hiroshi Yamauchi	fb601ea161	[PSI] Add tests for is(Hot|Cold)FunctionInCallGraphNthPercentile. Summary: Follow up on D75283. Also remove the test code that was moved to another test and was to be removed. Reviewers: davidxl Subscribers: eraman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75630	2020-03-10 08:21:10 -07:00
Matt Arsenault	3df4600f3b	AMDGPU/GlobalISel: Insert readfirstlane on SGPR returns In case the source value ends up in a VGPR, insert a readfirstlane to avoid producing an illegal copy later. If it turns out to be unnecessary, it can be folded out.	2020-03-10 11:18:48 -04:00
Sam Parker	74bf7f8a40	[ARM][MVE] VFMA and VFMS validForTailPredication Add four instructions to the whitelist. Differential Revision: https://reviews.llvm.org/D75902	2020-03-10 14:58:29 +00:00
Jonas Paulsson	354d615db9	[SystemZ] Improve foldMemoryOperandImpl(). Swap the compare operands if LHS is spilled while updating the CCMask:s of the CC users. This is relatively straightforward since the live-in lists for the CC register can be assumed to be correct during register allocation (thanks to 659efa2). Also fold a spilled operand of an LOCR/SELR into an LOC(G). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D67437	2020-03-10 15:54:47 +01:00
LLVM GN Syncbot	ed59983363	[gn build] Port 714466bf367	2020-03-10 14:33:04 +00:00
Florian Hahn	bc7c21fd6c	[InstCombine] Support vectors in SimplifyAddWithRemainder. SimplifyAddWithRemainder currently also matches for vector types, but tries to create an integer constant, which causes a crash. By using Constant::getIntegerValue() we can support both the scalar and vector cases. The 2 added test cases crash without the fix. Reviewers: spatel, lebedev.ri Reviewed By: spatel, lebedev.ri Differential Revision: https://reviews.llvm.org/D75906	2020-03-10 14:29:40 +00:00
Nico Weber	6bec78f65f	[gn build] (manually) merge 47edf5bafb	2020-03-10 10:22:39 -04:00
Mikhail Maltsev	ebdfc210f1	[ARM,CDE] Generalize MVE intrinsics infrastructure to support CDE Summary: This patch generalizes the existing code to support CDE intrinsics which will share some properties with existing MVE intrinsics (some of the intrinsics will be polymorphic and accept/return values of MVE vector types). Specifically the patch: * Adds new tablegen backends -gen-arm-cde-builtin-def, -gen-arm-cde-builtin-codegen, -gen-arm-cde-builtin-sema, -gen-arm-cde-builtin-aliases, -gen-arm-cde-builtin-header based on existing MVE backends. * Renames the '__clang_arm_mve_alias' attribute into '__clang_arm_builtin_alias' (it will be used with CDE intrinsics as well as MVE intrinsics) * Implements semantic checks for the coprocessor argument of the CDE intrinsics as well as the existing coprocessor intrinsics. * Adds one CDE intrinsic __arm_cx1 to test the above changes Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen Reviewed By: simon_tatham Subscribers: sdesmalen, mgorny, kristof.beyls, danielkiss, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75850	2020-03-10 14:03:16 +00:00
Jonas Paulsson	a150110be1	[SimplifyCFG] Skip merging return blocks if it would break a CallBr. SimplifyCFG should not merge empty return blocks and leave a CallBr behind with a duplicated destination since the verifier will then trigger an assert. This patch checks for this case and avoids the transformation. CodeGenPrepare has a similar check which also has a FIXME comment about why this is needed. It seems perhaps better if these two passes would eventually instead update the CallBr instruction instead of just checking and avoiding. This fixes https://bugs.llvm.org/show_bug.cgi?id=45062. Review: Craig Topper Differential Revision: https://reviews.llvm.org/D75620	2020-03-10 14:59:13 +01:00
Sanjay Patel	04f0791d5b	[InstCombine] regenerate test checks; NFC tmp -> t because 'tmp' tends to cause problems for the auto-generation script.	2020-03-10 09:57:41 -04:00


193295 Commits