llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

Author	SHA1	Message	Date
Sjoerd Meijer	1a682582ae	[Matrix] Intrinsic descriptions This changes the matrix load/store intrinsic definitions to load/store from/to a pointer, and not from/to a pointer to a vector, as discussed in D83477. This also includes the recommit of "[Matrix] Tighten LangRef definitions and Verifier checks" which adds improved language reference descriptions of the matrix intrinsics and verifier checks. Differential Revision: https://reviews.llvm.org/D83785	2020-07-14 19:58:16 +01:00
Dmitry Polukhin	b364785790	[yaml][clang-tidy] Fix multiline YAML serialization Summary: New line duplication logic introduced in https://reviews.llvm.org/D63482 has two issues: (1) there is no logic that removes duplicate newlines when clang-apply-replacment reads YAML and (2) in general such logic should be applied to all strings and should happen on string serialization level instead in YAML parser. This diff changes multiline strings quotation from single quote `'` to double `"`. It solves problems with internal newlines because now they are escaped. Also double quotation solves the problem with leading whitespace after newline. In case of single quotation YAML parsers should remove leading whitespace according to specification. In case of double quotation these leading are internal space and they are preserved. There is no way to instruct YAML parsers to preserve leading whitespaces after newline so double quotation is the only viable option that solves all problems at once. Test Plan: check-all Reviewers: gribozavr, mgehre, yvvan Subscribers: xazax.hun, hiraditya, cfe-commits, llvm-commits Tags: #clang-tools-extra, #clang, #llvm Differential Revision: https://reviews.llvm.org/D80301	2020-07-09 02:41:58 -07:00
Florian Hahn	c5d367254d	[Matrix] Use alignment info when lowering loads/stores. This patch updates LowerMatrixIntrinsics to preserve the alignment specified at the original load/stores and the align attribute for the pointer argument of the column.major.load/store intrinsics. We can always use the specified alignment for the load of the first column. For subsequent columns, the alignment may need to be reduced. For ConstantInt strides, compute the offset for the start of the column in bytes and use commonAlignment to get the largest valid alignment. For non-ConstantInt strides, we need to take the common alignment of the initial alignment and the element size in bytes. Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, rjmccall Reviewed By: rjmccall Differential Revision: https://reviews.llvm.org/D81960	2020-06-18 13:19:31 +01:00
Florian Hahn	e49aa96f95	[Matrix] Preserve volatile when loading loads/stores. Currently the matrix lowering turns volatile loads/stores into non-volatile ones. This patch updates the lowering to preserve the volatile bit. Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D81498	2020-06-18 12:14:19 +01:00
Florian Hahn	20cbc7221c	[Matrix] Add volatile load/store tests (NFC).	2020-06-18 09:57:13 +01:00
Florian Hahn	d6537a7e9f	[Matrix] Update load/store intrinsics. This patch adjust the load/store matrix intrinsics, formerly known as llvm.matrix.columnwise.load/store, to improve the naming and allow passing of extra information (volatile). The patch performs the following changes: * Rename columnwise.load/store to column.major.load/store. This is more expressive and also more in line with the naming in Clang. * Changes the stride arguments from i32 to i64. The stride can be larger than i32 and this makes things more uniform with the way things are handled in Clang. * A new boolean argument is added to indicate whether the load/store is volatile. The lowering respects that when emitting vector load/store instructions * MatrixBuilder is updated to require both Alignment and IsVolatile arguments, which are passed through to the generated intrinsic. The alignment is set using the `align` attribute. The changes are grouped together in a single patch, to have a single commit that breaks the compatibility. We probably should be fine with updating the intrinsics, as we did not yet officially support them in the last stable release. If there are any concerns, we can add auto-upgrade rules for the columnwise intrinsics though. Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache, rjmccall, ftynse Reviewed By: anemet, nicolasvasilache Differential Revision: https://reviews.llvm.org/D81472	2020-06-18 09:44:52 +01:00
Florian Hahn	84fdea1445	[Matrix] Add align info to some more loads/stores (NFC). Some tests were missing alignment info. Subsequent changes properly preserve the set alignment. Set it properly beforehand, to avoid unnecessary test changes.	2020-06-16 20:42:59 +01:00
Florian Hahn	c9320fb9a9	[Matrix] Specify missing alignment in tests (NFC). Some tests were missing alignment info. Subsequent changes properly preserve the set alignment. Set it properly beforehand, to avoid unnecessary test changes. It also updates cases where an alignment of 16 was specified, instead of the vector element type alignment.	2020-06-16 15:37:35 +01:00
Florian Hahn	02d648d7f7	[IR] Add nocapture & nosync to matrix intrinsics. As suggested in D81472, the load/store intrinsics' pointer arguments can be marked as nocapture and all matrix intrinsics as nosync. This also re-flows the intrinsic definitions, to make them a little more concise.	2020-06-15 22:07:40 +01:00
Florian Hahn	e79fa2344f	[Matrix] Update check lines for strided intrinsics (NFC). This re-generates some check lines, after the naming of values got improved, to reduce the size of diffs in follow-on patches.	2020-06-09 15:51:00 +01:00
aartbik	0933dc1711	[llvm] [MatrixIntrinsics] Add row-major support for llvm.matrix.transpose Summary: Only column-major was supported so far. This adds row-major support as well. Note that we probably also want very efficient SIMD implementations for the various target platforms. Bug: https://bugs.llvm.org/show_bug.cgi?id=46085 Reviewers: nicolasvasilache, reidtatge, bkramer, fhahn, ftynse, andydavis1, craig.topper, dcaballe, mehdi_amini, anemet Reviewed By: fhahn Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80673	2020-05-28 12:13:32 -07:00
Florian Hahn	82f4f75f07	[Matrix] Add option to use row-major matrix layout as default. This patch adds a -matrix-default-layout option which can be used to set the default matrix layout to row-major or column-major (default). The initial patch updates codegen for loads, stores, binary operators and matrix multiply. Reviewers: anemet, Gerolf, andrew.w.kaylor, LuoYuanke Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D76325	2020-04-06 10:00:56 +01:00
Florian Hahn	8bc8aea47f	[Matrix] Add initial tiling for load/multiply/store chains. This patch adds initial fusion for load/multiply/store chains of matrix operations. The patch contains roughly two parts: 1. Code generation for a fused load/multiply/store chain (LowerMatrixMultiplyFused). First, we ensure that both loads of the multiply operands do not alias the store. If they do, we create new non-aliasing copies of the operands. Note that this may introduce new basic block. Finally we process TileSize x TileSize blocks. That is: load tiles from the input operands, multiply and store them. 2. Identify fusion candidates & matrix instructions. As a first step, collect all instructions with shape info and fusion candidates (currently @llvm.matrix.multiply calls). Next, try to fuse candidates and collect instructions eliminated by fusion. Finally iterate over all matrix instructions, skip the ones eliminated by fusion and lower the rest as usual. Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D75566	2020-04-06 09:28:15 +01:00
Florian Hahn	0a2b72f638	[Matrix] Add new test case with getelementptr constant exprs. The new test mostly ensures we keep doing the right thing for constant expressions while lowering matrix instructions.	2020-04-01 09:32:13 +01:00
Florian Hahn	b9a535f60f	[Matrix] Add remark propagation along the inlined-at chain. This patch adds support for propagating matrix expressions along the inlined-at chain and emitting remarks at the traversed function scopes. To motivate this new behavior, consider the example below. Without the remark 'up-leveling', we would only get remarks in load.h and store.h, but we cannot generate a remark describing the full expression in toplevel.cpp, which is the place where the user has the best chance of spotting/fixing potential problems. With this patch, we generate a remark for the load in load.h, one for the store in store.h and one for the complete expression in toplevel.cpp. For a bigger example, please see remarks-inlining.ll. load.h: template <typename Ty, unsigned R, unsigned C> Matrix<Ty, R, C> load(Ty Ptr) { Matrix<Ty, R, C> Result; Result.value = reinterpret_cast <typename Matrix<Ty, R, C>::matrix_t >(Ptr); return Result; } store.h: template <typename Ty, unsigned R, unsigned C> void store(Matrix<Ty, R, C> M1, Ty Ptr) { reinterpret_cast<typename decltype(M1)::matrix_t >(Ptr) = M1.value; } toplevel.cpp void test(double A, double B, double *C) { store(add(load<double, 3, 5>(A), load<double, 3, 5>(B)), C); } For a given function, we traverse the inlined-at chain for each matrix instruction (= instructions with shape information). We collect the matrix instructions in each DISubprogram we visit. This produces a mapping of DISubprogram -> (List of matrix instructions visible in the subpogram). We then generate remarks using the list of instructions for each subprogram in the inlined-at chain. Note that the list of instructions for a subprogram includes the instructions from its own subprograms recursively. For example using the example above, for the subprogram 'test' this includes inline functions 'load' and 'store'. This allows surfacing the remarks at a level useful to users. Please note that the current approach may create a lot of extra remarks. Additional heuristics to cut-off the traversal can be implemented in the future. For example, it might make sense to stop 'up-leveling' once all matrix instructions are at the same debug location. Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D73600	2020-03-11 17:40:08 +00:00
Florian Hahn	8119a5a24a	[Matrix] Mark matrix memory intrinsics as argmemonly/write\|read mem. matrix.columnwise.load and matrix.columnwise.store only access memory through the argument pointers. Also matrix.columnwise.store only writes memory.	2020-02-04 12:32:45 +00:00
Florian Hahn	421b40f37f	[Matrix] Mark expressions shared between multiple remarks. This patch adds support for explicitly highlighting sub-expressions shared by multiple leaf nodes. For example consider the following code %shared.load = tail call <8 x double> @llvm.matrix.columnwise.load.v8f64.p0f64(double* %arg1, i32 %stride, i32 2, i32 4), !dbg !10, !noalias !10 %trans = tail call <8 x double> @llvm.matrix.transpose.v8f64(<8 x double> %shared.load, i32 2, i32 4), !dbg !10 tail call void @llvm.matrix.columnwise.store.v8f64.p0f64(<8 x double> %trans, double* %arg3, i32 10, i32 4, i32 2), !dbg !10 %load.2 = tail call <30 x double> @llvm.matrix.columnwise.load.v30f64.p0f64(double* %arg3, i32 %stride, i32 2, i32 15), !dbg !10, !noalias !10 %mult = tail call <60 x double> @llvm.matrix.multiply.v60f64.v8f64.v30f64(<8 x double> %trans, <30 x double> %load.2, i32 4, i32 2, i32 15), !dbg !11 tail call void @llvm.matrix.columnwise.store.v60f64.p0f64(<60 x double> %mult, double* %arg2, i32 10, i32 4, i32 15), !dbg !11 We have two leaf nodes (the 2 stores) and the first store stores %trans which is also used by the matrix multiply %mult. We generate separate remarks for each leaf (stores). To denote that parts are shared, the shared expressions are marked as shared (), with a reference to the other remark that shares it. The operation summary also denotes the shared operations separately. Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D72526	2020-01-28 09:27:55 -08:00
Florian Hahn	7b27c55d67	[Matrix] Mark remarks test as AArch64 specific.	2020-01-27 18:00:43 -08:00
Florian Hahn	c497f867de	[Matrix] Add info about number of operations to remarks. This patch updates the remark to also include a summary of the number of vector operations generated for each matrix expression. Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D72480	2020-01-27 17:43:39 -08:00
Florian Hahn	47a33e71d6	[Matrix] Add optimization remarks for matrix expression. Generate remarks for matrix operations in a function. To generate remarks for matrix expressions, the following approach is used: 1. Collect leafs of matrix expressions (done in RemarkGenerator::getExpressionLeafs). Leafs are lowered matrix instructions without other matrix users (like stores). 2. For each leaf, create a remark containing a linearizied version of the matrix expression. The following improvements will be submitted as follow-ups: * Summarize number of vector instructions generated for each expression. * Account for shared sub-expressions. * Propagate matrix remarks up the inlining chain. The information provided by the matrix remarks helps users to spot cases where matrix expression got split up, e.g. due to inlining not happening. The remarks allow users to address those issues, ensuring best performance. Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D72453	2020-01-27 16:39:29 -08:00
Florian Hahn	4d6151bec8	[Matrix] Update shape propagation to iterate until done. This patch updates the shape propagation to iterate until no new shape information is discovered. As initial seed for the forward propagation, we use the matrix intrinsic instructions. Both propagateShapeForward and propagateShapeBackward return new work lists, with the instructions to be used for the next iteration. When propagating forward, we record all instructions we added new shape information for. When propagating backward, we record all users of instructions we added new shape information for. Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70901	2020-01-09 10:52:52 +00:00
Florian Hahn	0a96d7d106	[Matrix] Propagate and use shape information for loads. This patch extends to shape propagation to also include load instructions and implements shape aware lowering for vector loads. Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70900	2020-01-09 10:21:20 +00:00
Florian Hahn	ea4efaf94d	[Matrix] Implement back-propagation of shape information. This patch extends the shape propagation for matrix operations to also propagate the shape of instructions to their operands. Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70899	2020-01-09 09:48:07 +00:00
Florian Hahn	8be26dcc2f	[Matrix] Propagate and use shape info for binary operators. This patch extends the current shape propagation and shape aware lowering to also support binary operators. Those operators are uniform with respect to their shape (shape of the input operands is the same as the shape of their result). Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70898	2019-12-27 15:50:47 +00:00
Florian Hahn	ad8abe7ef4	[Matrix] Use fmuladd for matrix.multiply if allowed. If the matrix.multiply calls have the contract fast math flag, we can use fmuladd. This als adds a command line option to force fmuladd generation. We can retire this option once there is a clang-level option. Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70951	2019-12-23 14:49:14 +01:00
Florian Hahn	70930dfece	[Matrix] Add forward shape propagation and first shape aware lowerings. This patch adds infrastructure for forward shape propagation to LowerMatrixIntrinsics. It also updates the pass to make use of the shape information to break up larger vector operations and to eliminate unnecessary conversion operations between columnwise matrixes and flattened vectors: if shape information is available for an instruction, lower the operation to a set of instructions operating on columns. For example, a store of a matrix is broken down into separate stores for each column. For users that do not have shape information (e.g. because they do not yet support shape information aware lowering), we pack the result columns into a flat vector and update those users. It also adds shape aware lowering for the first non-intrinsic instruction: vector stores. Example: For %c = call <4 x double> @llvm.matrix.transpose(<4 x double> %a, i32 2, i32 2) store <4 x double> %c, <4 x double>* %Ptr We generate the code below without shape propagation. Note %9 which combines the columns of the transposed matrix into a flat vector. %split = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 0, i32 1> %split1 = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 2, i32 3> %1 = extractelement <2 x double> %split, i64 0 %2 = insertelement <2 x double> undef, double %1, i64 0 %3 = extractelement <2 x double> %split1, i64 0 %4 = insertelement <2 x double> %2, double %3, i64 1 %5 = extractelement <2 x double> %split, i64 1 %6 = insertelement <2 x double> undef, double %5, i64 0 %7 = extractelement <2 x double> %split1, i64 1 %8 = insertelement <2 x double> %6, double %7, i64 1 %9 = shufflevector <2 x double> %4, <2 x double> %8, <4 x i32> <i32 0, i32 1, i32 2, i32 3> store <4 x double> %9, <4 x double>* %Ptr With this patch, we propagate the 2x2 shape information from the transpose to the store and we generate the code below. Note that we store the columns directly and do not need an extra shuffle. %9 = bitcast <4 x double>* %Ptr to double* %10 = bitcast double* %9 to <2 x double>* store <2 x double> %4, <2 x double>* %10, align 8 %11 = getelementptr double, double* %9, i32 2 %12 = bitcast double* %11 to <2 x double>* store <2 x double> %8, <2 x double>* %12, align 8 Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70897	2019-12-23 13:51:56 +01:00
Florian Hahn	1643768c5c	[Matrix] Add first set of matrix intrinsics and initial lowering pass. This is the first patch adding an initial set of matrix intrinsics and a corresponding lowering pass. This has been discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html The first patch introduces four new intrinsics (transpose, multiply, columnwise load and store) and a LowerMatrixIntrinsics pass, that lowers those intrinsics to vector operations. Matrixes are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix embedded in a <16 x float> vector) and the intrinsics take the dimension information as parameters. Those parameters need to be ConstantInt. For the memory layout, we initially assume column-major, but in the RFC we also described how to extend the intrinsics to support row-major as well. For the initial lowering, we split the input of the intrinsics into a set of column vectors, transform those column vectors and concatenate the result columns to a flat result vector. This allows us to lower the intrinsics without any shape propagation, as mentioned in the RFC. In follow-up patches, we plan to submit the following improvements: * Shape propagation to eliminate the embedding/splitting for each intrinsic. * Fused & tiled lowering of multiply and other operations. * Optimization remarks highlighting matrix expressions and costs. * Generate loops for operations on large matrixes. * More general block processing for operation on large vectors, exploiting shape information. We would like to add dedicated transpose, columnwise load and store intrinsics, even though they are not strictly necessary. For example, we could instead emit a large shufflevector instruction instead of the transpose. But we expect that to (1) become unwieldy for larger matrixes (even for 16x16 matrixes, the resulting shufflevector masks would be huge), (2) risk instcombine making small changes, causing us to fail to detect the transpose, preventing better lowerings For the load/store, we are additionally planning on exploiting the intrinsics for better alias analysis. Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70456	2019-12-12 15:42:18 +00:00

27 Commits