llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-18 18:42:46 +02:00

Author	SHA1	Message	Date
serge-sans-paille	39f31f66e4	[NFC] Wisely nest dyn_cast in FunctionLoweringInfo Take advantage of the inheritance tree to avoid a few comparison.	2021-03-16 10:22:44 +01:00
Caroline Concatto	cb0c62b65c	[SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse This patch adds support for reverse loop vectorization. It is possible to vectorize the following loop: ``` for (int i = n-1; i >= 0; --i) a[i] = b[i] + 1.0; ``` with fixed or scalable vector. The loop-vectorizer will use 'reverse' on the loads/stores to make sure the lanes themselves are also handled in the right order. This patch adds support for scalable vector on IRBuilder interface to create a reverse vector. The IR function CreateVectorReverse lowers to experimental.vector.reverse for scalable vector and keedp the original behavior for fixed vector using shuffle reverse. Differential Revision: https://reviews.llvm.org/D95363	2021-03-16 07:51:59 +00:00
Amara Emerson	7917eb608e	[AArch64][GlobalISel] Fix crash on lowering <1 x half> types.	2021-03-15 23:27:43 -07:00
wlei	6bd02e9a2a	[CSSPGO][llvm-profgen] Fix getCanonicalFnName usage in llvm-profgen Previously we didn't support to keep the unique linkage name(-funique-internal-linkage-name) in llvm-profgen. As discussed in https://reviews.llvm.org/D96932, we choose to do canonicalization for it. Now since "selected" is set as the default parameter of getCanonicalFnName in `D96932`, we don't need to add any attribute here for the previous usage and only fix the missing usage in the pseudo probe decoding. Differential Revision: https://reviews.llvm.org/D98226	2021-03-15 21:00:42 -07:00
Johannes Doerfert	60c629d360	[NVPTX] CUDA does provide malloc/free since compute capability 2.X https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations Reviewed By: tra Differential Revision: https://reviews.llvm.org/D98606	2021-03-15 22:45:56 -05:00
Josh Berdine	984ad28c40	[OCaml][test] Fix Bindings/OCaml/executionengine.ml test It seems that at some point it became necessary to pass `-thread` to the ocaml compiler for this test. Differential Revision: https://reviews.llvm.org/D98593	2021-03-16 02:48:36 +00:00
LLVM GN Syncbot	b19c601bda	[gn build] Port 4f198b0c27b0	2021-03-16 02:41:16 +00:00
Bing1 Yu	ab2b029d8f	[X86] Pass to transform amx intrinsics to scalar operation. This pass runs in any situations but we skip it when it is not O0 and the function doesn't have optnone attribute. With -O0, the def of shape to amx intrinsics is near the amx intrinsics code. We are not able to find a point which post-dominate all the shape and dominate all amx intrinsics. To decouple the dependency of the shape, we transform amx intrinsics to scalar operation, so that compiling doesn't fail. In long term, we should improve fast register allocation to allocate amx register. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D93594	2021-03-16 10:40:22 +08:00
David Blaikie	11757a1309	Skip path separators to make the test portable across Win/Linux	2021-03-15 18:24:40 -07:00
Petr Hosek	fb12fb1321	[CMake] Clean up unnecessary dependency The LINK_COMPONENTS dependency between DebugInfoCodeView and DebugInfoMSF is unnecessary. Breaking them would allow a more fine-controlled distribution. Patch By: dangyi Differential Revision: https://reviews.llvm.org/D98465	2021-03-15 16:29:16 -07:00
LLVM GN Syncbot	4791deff1c	[gn build] Port ecf6466f01c5	2021-03-15 23:01:19 +00:00
Lang Hames	4b389e1c7b	[JITLink][MachO][x86-64] Introduce generic x86-64 support. This patch introduces generic x86-64 edge kinds, and refactors the MachO/x86-64 backend to use these edge kinds. This simplifies the implementation of the MachO/x86-64 backend and makes it possible to write generic x86-64 passes and utilities. The new edge kinds are different from the original set used in the MachO/x86-64 backend. Several edge kinds that were not meaningfully distinguished in that backend (e.g. the PCRelMinusN edges) have been merged into single edge kinds in the new scheme (these edge kinds can be reintroduced later if we find a use for them). At the same time, new edge kinds have been introduced to convey extra information about the state of the graph. E.g. The RequestAndTransformTo* edges represent GOT/TLVP relocations prior to synthesis of the GOT/TLVP entries, and the 'Relaxable' suffix distinguishes edges that are candidates for optimization from edges which should be left as-is (e.g. to enable runtime redirection). ELF/x86-64 will be refactored to use these generic edges at some point in the future, and I anticipate a similar refactor to create a generic arm64 support header too. Differential Revision: https://reviews.llvm.org/D98305	2021-03-15 15:43:07 -07:00
Nico Weber	b7c9aa8059	[gn build] merge af2796c76d2f a bit more The default is fine on non-Win, but on Win this needs an explicit setting now that lit no longer has the right default.	2021-03-15 18:20:54 -04:00
Alexander Yermolovich	0440613b8e	[DWARF] Check for AddrOffsetSectionBase to work with DWO Units. Context: https://lists.llvm.org/pipermail/llvm-dev/2021-February/148521.html A fix for llvm-symbolizer, and other tools like BOLT, that allows retrieving address when built with -gsplit-dwarf=single mode. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D96827	2021-03-15 14:46:09 -07:00
Artem Belevich	7a17da6eb6	[NVPTX] Avoid temp copy of byval kernel parameters. Avoid making a temporary copy of byval argument if all accesses are loads and therefore the pointer to the parameter can not escape. This avoids excessive global memory accesses when each kernel makes its own copy. Differential revision: https://reviews.llvm.org/D98469	2021-03-15 14:27:22 -07:00
Nick Lewycky	40de5cf96a	NFC: Formatting changes. Run clang-format over these files. Capitalize some variable names per clang-tidy's request. Pulled out to simplify review of D98302.	2021-03-15 14:26:39 -07:00
Stanislav Mekhanoshin	17050632cf	[AMDGPU] Fix copyPhysReg to not produce unalined vgpr access RA can insert something like a sub1_sub2 COPY of a wide VGPR tuple which results in the unaligned acces with v_pk_mov_b32 after the copy is expanded. This is regression after D97316. Differential Revision: https://reviews.llvm.org/D98549	2021-03-15 14:14:30 -07:00
Florian Hahn	1aed9ced3b	[AnnotationRemarks] Remove unneeded Function.h include (NFC).	2021-03-15 21:09:35 +00:00
Nico Weber	19f76964c4	[gn build] merge 9bcf0eff99	2021-03-15 17:05:05 -04:00
Nico Weber	f75a03ef22	[gn build] kind of merge af2796c76d2f Good enough for now. If we need more, we'll do the usual platform-dependent hardcoding that in practice works for everything else too.	2021-03-15 17:01:00 -04:00
Stanislav Mekhanoshin	fc6febe595	[AMDGPU] Fixed msan failure with uninitialized value	2021-03-15 13:58:19 -07:00
Kirill Bobyrev	da5206f100	[clangd] Optionally add reflection for clangd-index-server This was originally landed without the optional part and reverted later: `8080ea4c4b` Reviewed By: kadircet Differential Revision: https://reviews.llvm.org/D98404	2021-03-15 21:07:25 +01:00
Markus Böck	9badf35f5b	Revert line accidentally included in af2796c76d2ff4b73165ed47959afd35a769beee	2021-03-15 21:03:46 +01:00
Sanjay Patel	677d887642	[SLP] update stale test comments; NFC These bugs were fixed with 0a8e7ca402eb	2021-03-15 16:02:46 -04:00
Stanislav Mekhanoshin	196e7f3138	[AMDGPU] Use single cache policy operand Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amount of code. These operands are mostly 0 anyway. Additional advantage that parser will accept these flags in any order unlike now. Differential Revision: https://reviews.llvm.org/D96469	2021-03-15 13:00:59 -07:00
Markus Böck	1be4884f17	[test] Add ability to get error messages from CMake for errc substitution Visual Studios implementation of the C++ Standard Library does not use strerror to produce a message for std::error_code unlike other standard libraries such as libstdc++ or libc++ that might be used. This patch adds a cmake script that through running a C++ program gets the error messages for the POSIX error codes and passes them onto lit through an optional config parameter. If the config parameter is not set, or getting the messages failed, due to say a cross compiling configuration without an emulator, it will fall back to using pythons strerror functions. Differential Revision: https://reviews.llvm.org/D98278	2021-03-15 20:56:08 +01:00
Wenlei He	93a5ec7e97	[CSSPGO] Load context profile for external functions in PreLink and populate ThinLTO import list For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to function's entry count metadata. This enables ThinLink to treat such cross module callee as hot in summary index, and later helps postlink to import them for profile guided cross module inlining. For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since profile is flatterned, a few things need to happen for it to work: - When loading input profile in extended binary format, we need to load all child context profile whose parent is in current module, so context trie for current module includes potential cross module inlinee. - In order to make the above happen, we need to know whether input profile is CSSPGO profile before start reading function profile, hence a flag for profile summary section is added. - When searching for cross module inline candidate, we need to walk through the context trie instead of nested inlinee profile (callsite sample of AutoFDO profile). - Now that we have more accurate counts with CSSPGO, we swtiched to use entry count instead of total count to decided if an external callee is potentially beneficial to inline. This make it consistent with how we determine whether call tagert is potential inline candidate. Differential Revision: https://reviews.llvm.org/D98590	2021-03-15 12:22:15 -07:00
Fangrui Song	0fdd473f7e	Change void getNoop(MCInst &NopInst) to MCInst getNop() Prefer (self-documenting) return values to output parameters (which are liable to be used). While here, rename Noop to Nop which is more widely used and improves consistency with hasEmitNops/setEmitNops/emitNop/etc.	2021-03-15 12:05:34 -07:00
Craig Topper	bea426ec19	[RISCV] Add RISCVISD::BR_CC similar to RISCVISD::SELECT_CC. This allows me to introduce similar combines for branches as we have recently added for SELECT_CC. Some of them are less useful for standalone setccs and only help branch instructions. By having a BR_CC node its easier to only affect branches. I'm using CondCodeSDNode to make isel patterns easier to write so we can refer to the codes by name. SELECT_CC uses a constant instead. I've translated the condition code just like SELECT_CC so we need less patterns for the swapped conditions. This includes special cases for X < 1 and X > -1 that get translated to blez and bgez by using a 0 constant. computeKnownBitsForTargetNode support for SELECT_CC is added to allow MaskedValueIsZero to work for cases where the true and false values of the SELECT_CC are setccs and the result of the SELECT_CC is used by a BR_CC. This was needed to avoid regressions in some of the overflow tests. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98159	2021-03-15 11:54:01 -07:00
Philipp Tomsich	e1e0468f14	[RISCV] Add isel-patterns to optimize (a < 1) into blez (a <= 0) The following code-sequence showed up in a testcase (isolated from SPEC2017) for if-conversion and vectorization when searching for the maximum in an array: addi a2, zero, 1 blt a1, a2, .LBB0_5 which can be expressed as `bge zero,a1,.LBB0_5`/`blez a1,/LBB0_5`. More generally, we want to express (a < 1) as (a <= 0). This adds the required isel-pattern and updates the testcases. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98449	2021-03-15 11:32:43 -07:00
Stelios Ioannou	5ca2e09258	[AArch64] Implement __rndr, __rndrrs intrinsics This patch implements the __rndr and __rndrrs intrinsics to provide access to the random number instructions introduced in Armv8.5-A. They are only defined for the AArch64 execution state and are available when __ARM_FEATURE_RNG is defined. These intrinsics store the random number in their pointer argument and return a status code if the generation succeeded. The difference between __rndr __rndrrs, is that the latter intrinsic reseeds the random number generator. The instructions write the NZCV flags indicating the success of the operation that we can then read with a CSET. [1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics [2] https://bugs.llvm.org/show_bug.cgi?id=47838 Differential Revision: https://reviews.llvm.org/D98264 Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc	2021-03-15 17:51:48 +00:00
Juneyoung Lee	5173305fd6	[AssumeBundles] Add nonnull/align to op bundle if noundef exists This is a patch to add nonnull and align to assume's operand bundle only if noundef exists. Since nonnull and align in fn attr have poison semantics, they should be paired with noundef or noundef-implying attributes to be immediate UB. Reviewed By: jdoerfert, Tyker Differential Revision: https://reviews.llvm.org/D98228	2021-03-16 10:23:42 +09:00
Fraser Cormack	33c1eeca58	[CodeGen] Fix issues with scalable-vector INSERT/EXTRACT_SUBVECTORs This patch addresses a few issues when dealing with scalable-vector INSERT_SUBVECTOR and EXTRACT_SUBVECTOR nodes. When legalizing in DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR, we store the low and high halves to the stack separately. The offset for the high half was calculated incorrectly. Additionally, we can optimize this process when we can detect that the subvector is contained entirely within the low/high split vector type. While this optimization is valid on scalable vectors, when performing the 'high' optimization, the subvector must also be a scalable vector. Note that the 'low' optimization is still conservative: it may be possible to insert v2i32 into the low half of a split nxv1i32/nxv1i32, but we can't guarantee it. It is always possible to insert v2i32 into nxv2i32 or v2i32 into nxv4i32+2 as we know vscale is at least 1. Lastly, in SelectionDAG::isSplatValue, we early-exit on the extracted subvector value type being a scalable vector, forgetting that we can also extract a fixed-length vector from a scalable one. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98495	2021-03-15 17:04:21 +00:00
Nico Weber	4f01a02f9f	[gn build] (semi-manually) port b136a74efc54	2021-03-15 12:51:12 -04:00
Christopher Tetreault	909d35077f	[CMake] Require python 3.6 if enabling LLVM test targets The lit test suite uses python 3.6 features. Rather than a strange python syntax error upon running the lit tests, we will require the correct version in CMake. Reviewed By: serge-sans-paille, yln Differential Revision: https://reviews.llvm.org/D95635	2021-03-15 09:50:39 -07:00
Craig Topper	b807b27145	[RISCV] Improve legalization of i32 UADDO/USUBO on RV64. The default legalization uses zero extends that require pair of shifts on RISCV. Instead we can take advantage of the fact that unsigned compares work equally well on sign extended inputs. This allows us to use addw/subw and sext.w. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98233	2021-03-15 09:30:23 -07:00
Simon Pilgrim	ae2fffa9ef	[X86][SSE] isHorizontalBinOp - ensure we clear any unused source operands to improve HADD/SUB matching Our shuffle matching for HADD/SUB patterns wasn't clearing repeated ops in 'fake unary' style shuffle masks (unpack(x,x) etc.), preventing matching of add(fakeunary(),fakeunary()) style patterns.	2021-03-15 16:24:29 +00:00
Zahira Ammarguellat	75a089166d	[NFC] Fix "unused parameter" error revealed in the Linux self-build.	2021-03-15 12:17:11 -04:00
Sanjay Patel	69f00a58ef	[InstSimplify] ctlz({signbit} >>u x) --> x The motivating pattern was handled in 0a2d69480d , but we should have this for symmetry. But this really highlights that we could generalize for any shifted constant if we match this in instcombine. https://alive2.llvm.org/ce/z/MrmVNt	2021-03-15 12:03:35 -04:00
Sanjay Patel	6b99ab8b48	[InstSimplify] add tests for ctlz of shifted constant; NFC	2021-03-15 12:03:35 -04:00
LLVM GN Syncbot	54ae663167	[gn build] Port 13e49dcee48f	2021-03-15 15:24:41 +00:00
Jon Chesterfield	c86d684da8	[amdgpu] Implement lower function LDS pass [amdgpu] Implement lower function LDS pass Local variables are allocated at kernel launch. This pass collects global variables that are used from non-kernel functions, moves them into a new struct type, and allocates an instance of that type in every kernel. Uses are then replaced with a constantexpr offset. Prior to this pass, accesses from a function are compiled to trap. With this pass, most such accesses are removed before reaching codegen. The trap logic is left unchanged by this pass. It is still reachable for the cases this pass misses, notably the extern shared construct from hip and variables marked constant which survive the optimizer. This is of interest to the openmp project because the deviceRTL runtime library uses cuda shared variables from functions that cannot be inlined. Trunk llvm therefore cannot compile some openmp kernels for amdgpu. In addition to the unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled and the function pointer hashing scheme deleted passes the openmp suite. This lowering will use more LDS than strictly necessary. It is intended to be a functionally correct fallback for cases that are difficult to target from future optimisation passes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94648	2021-03-15 15:24:01 +00:00
Simon Pilgrim	2593dc40a4	[X86][SSE] canonicalizeShuffleWithBinOps - handle target shuffles. Fold SHUFFLE(BINOP(SHUFFLE(X),SHUFFLE(Y))) -> BINOP(SHUFFLE'(X),SHUFFLE'(Y)) style patterns as well as the existing shuffles of constants.	2021-03-15 15:01:29 +00:00
David Green	19dfc4f3fc	[AArch64] Zero extended extract_vector_elt pattern This adds a pattern for i64 zext_inreg(i32 extract_vector_elt X), producing a single UMOVvi16 instruction that is already expected to clear the top bits. The exact pattern that this matches is and(anyext(vector_extract X, lane), 0xff), similar to the sext patterns higher up in the same file. Differential Revision: https://reviews.llvm.org/D98599	2021-03-15 14:56:20 +00:00
Amy Kwan	d53e3aade2	[NFC][PowerPC] Add additional load/store test cases This patch adds additional load/store test cases involving scalars, vectors, and PC-Rel in preparation for the refactored load and store implementation introduced in D93370. Differential Revision: https://reviews.llvm.org/D97391	2021-03-15 08:54:38 -05:00
Wael Yehia	187913c5cf	[PATCH] fix location of test case from D97507.	2021-03-15 09:34:24 -04:00
Anton Afanasyev	0e939ab792	[SLP][Test] Precommit test for PR40522	2021-03-15 15:53:54 +03:00
Carl Ritson	aa27a4d8bb	[AMDGPU] Fix shortfalls in WQM marking When tracking defined lanes through phi nodes in the live range graph each branch of the phi must be handled independently. Also rewrite the marking algorithm to reduce unnecessary operations. Previously a shared set of defined lanes was used which caused marking to stop prematurely. This was observable in existing lit tests, but test patterns did not cover this detail. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D98614	2021-03-15 21:44:15 +09:00
Simon Pilgrim	f127ae93b8	[X86][SSE] canonicalizeShuffleWithBinOps - add X86ISD::PSHUFB handling. Recommit rGcd938ab162b0ac560dd0e9fee290980c7e0e47e5 with an early-out if the pshub would introduce zeros across the binop.	2021-03-15 12:43:30 +00:00
Bradley Smith	19a711a469	[AArch64][SVE] Add unpredicated ld1/st1 patterns for reg+reg addressing modes Differential Revision: https://reviews.llvm.org/D95677	2021-03-15 12:36:28 +00:00

1 2 3 4 5 ...

212796 Commits