llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00

Author	SHA1	Message	Date
Mark Murray	70244eebaf	[ARM][MVE][Intrinsics] All vqdmulhq/vqrdmulhq tests should be for signed numbers. Fix broken tests. I can't yet explain how they worked locally pre-commit.	2019-12-13 17:29:59 +00:00
Fangrui Song	510814e7a8	[ARM][MVE] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds after D71062	2019-12-13 09:26:26 -08:00
LLVM GN Syncbot	beedccd28a	gn build: Merge 84728e65e95	2019-12-13 16:20:43 +00:00
Miloš Stojanović	05b9c1cd8b	[llvm-exegesis][mips] Add BenchmarkResultTest unit test Test writing and reading benchmark instructions to and from disc, and check calculations of min, max and avg values from a list of benchmark measures. Differential Revision: https://reviews.llvm.org/D71265	2019-12-13 17:02:19 +01:00
Sam Parker	a3cacd08b6	[ARM][MVE] Make VPT invalid for tail predication We've been marking VPT incompatible instructions as invalid for tail predication too, though this may not strictly be true. VPT are incompatible and, unless it's the first predicate def in a loop, they shouldn't be compatible for tail predication either. Differential Revision: https://reviews.llvm.org/D71410	2019-12-13 15:01:08 +00:00
Kristina Bessonova	965f2386e0	[llvm-dwarfdump][Statistics] Don't count coverage less than 1% as 0% Summary: This is a follow up for D70548. Currently, variables with debug info coverage between 0% and 1% are put into zero-bucket. D70548 changed the way statistics calculate a variable's coverage: we began to use enclosing scope rather than a possible variable life range. Thus more variables might be moved to zero-bucket despite they have some debug info coverage. The patch is to distinguish between a variable that has location info but it's significantly less than its enclosing scope and a variable that doesn't have it at all. Reviewers: djtodoro, aprantl, dblaikie, avl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71070	2019-12-13 17:34:58 +03:00
Nicola Zaghen	8d0fd71b2b	Reland [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with Instcombine's visitPtrToInt; but the unit tests were incorrect, so this remained undiscovered. This fixes the buildbot failures. Differential Revision: https://reviews.llvm.org/D68328 Patch by Joseph Faulls!	2019-12-13 14:30:21 +00:00
Sanjay Patel	356915b055	[x86] add tests for shift-trunc-shift; NFC More coverage for a possible generic transform.	2019-12-13 08:37:06 -05:00
Mikhail Maltsev	c04187c6e9	[ARM][MVE] Add vector reduction intrinsics with two vector operands Summary: This patch adds intrinsics for the following MVE instructions: * VABAV * VMLADAV, VMLSDAV * VMLALDAV, VMLSLDAV * VRMLALDAVH, VRMLSLDAVH Each of the above 4 groups has a corresponding new LLVM IR intrinsic, since the instructions cannot be easily represented using general-purpose IR operations. Reviewers: simon_tatham, ostannard, dmgreen, MarkMurrayARM Reviewed By: MarkMurrayARM Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71062	2019-12-13 13:17:29 +00:00
Kristina Bessonova	f42c025b44	[llvm-dwarfdump][Statistics] Change the coverage buckets representation. NFC Summary: This changes the representation of 'coverage buckets' in llvm-dwarfdump and llvm-locstats to one that makes more clear what the buckets contain. See some related details in D71070. Reviewers: djtodoro, aprantl, cmtice, jhenderson Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71366	2019-12-13 16:08:25 +03:00
Simon Tatham	e5e610c551	[ARM][MVE] Add intrinsics for more immediate shifts. Summary: This fills in the remaining shift operations that take a single vector input and an immediate shift count: the `vqshl`, `vqshlu`, `vrshr` and `vshll[bt]` families. `vshll[bt]` (which shifts each input lane left into a double-width output lane) is the most interesting one. There are separate MC instruction ids for shifting by exactly the input lane width and shifting by less than that, because the instruction encoding is so completely different for the lane-width special case. So I had to write two sets of patterns to match based on the immediate shift count, which involved adding a ComplexPattern matcher to avoid the general-case pattern accidentally matching the special case too. For that family I've made sure to add an llc codegen test for both versions of each instruction. I'm experimenting with a new strategy for parametrising the isel patterns for all these instructions: adding extra fields to the relevant `Instruction` subclass itself, which are ignored by the Tablegen backends that generate the MC data, but can be retrieved from each instance of that instruction subclass when it's passed as a template parameter to the multiclass that generates its isel patterns. A nice effect of that is that I can fill in those informational fields using `let` blocks, rather than having to type them out once per instruction at `defm` time. (As a result, quite a lot of existing instruction `def`s are reindented by this patch, so it's clearer to read with whitespace changes ignored.) Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71458	2019-12-13 13:07:39 +00:00
John Brawn	27b751b877	[ARM] Add custom strict fp conversion lowering when non-strict is custom We have custom lowering for operations converting to/from floating-point types when we don't have hardware support for those types, and this doesn't interact well with the target-independent legalization of the strict versions of these operations. Fix this by adding similar custom lowering of the strict versions. This fixes the last of the assertion failures in the CodeGen/ARM/fp-intrinsics test, with the remaining failures due to poor instruction selection. Differential Revision: https://reviews.llvm.org/D71127	2019-12-13 13:00:00 +00:00
Tim Renouf	030b9c457d	Revert "AMDGPU: Try to commute sub of boolean ext" This reverts commit 69fcfb7d3597e0cdb5554b4e672e9032b411b167. As shown in the test I attached to this commit, the change I reverted causes a problem with "zext(cc1) - zext(cc2)". It commuted the operands to the sub and used different logic to select the addc/subc instruction: sub zext (setcc), x => addcarry 0, x, setcc sub sext (setcc), x => subcarry 0, x, setcc ... but that is bogus. I believe it is not possible to fold those commuted patterns into any form of addcarry or subcarry. It may have worked as intended before "AMDGPU: Change boolean content type to 0 or 1" because the setcc was considered to be -1 rather than 1. Differential Revision: https://reviews.llvm.org/D70978 Change-Id: If2139421aa6c935cbd1d925af58fe4a4aa9e8f43	2019-12-13 12:49:06 +00:00
Djordje Todorovic	7c53d69f00	[llvm-locstats] Avoid the locstats when no scope bytes coverage found If the total number of PC range bytes in each variable's enclosing scope ('scope bytes total') is 0, we will have division by zero. Differential Revision: https://reviews.llvm.org/D71415	2019-12-13 13:45:44 +01:00
Alex Richardson	5244511578	[NFC] Use EVT instead of bool for getSetCCInverse() Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void *)0x12033091e < (void *)0xffffffffffffffff` would return false. Committing this change will significantly reduce our merge conflicts for each upstream merge. Reviewers: spatel, bogner Reviewed By: bogner Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70917	2019-12-13 12:22:03 +00:00
Sjoerd Meijer	ca4474f01a	Revert "[ARM][MVE] findVCMPToFoldIntoVPS. NFC." This reverts commit 9468e3334ba54fbb1b209aaec662d7375451fa1f. There's a test that doesn't like this change. The RDA analysis gets invalidated by changes in the block, which is not taken into account. Revert while I work on a fix for this.	2019-12-13 11:56:44 +00:00
Mark Murray	2d0e359316	[ARM][MVE][Intrinsics] Add _x() variants of my _m() intrinsics. Summary: Better use of multiclass is used, and this helped find some existing bugs in the predicated VMULL* intrinsics, which are now fixed. The refactored VMULL[TB]Q_(INT|POLY)_M() intrinsics were discovered to have an argument ("inactive") with incorrect type, and this required a fix that is included in this whole patch. The argument "inactive" should have been the same width (per vector element) as the return type of the intrinsic, but was not in the case where the return type was double the element width of the input types. To assist in testing the multiclassing, and to thwart further gremlins, the unit tests are improved in scope. The .ll tests are all generated by a small bit of throw-away scripting from the corresponding .c tests, and as such the diffs are large and nasty. Look at the file rather than the diff. Reviewers: dmgreen, miyuki, ostannard, simon_tatham Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71421	2019-12-13 11:51:23 +00:00
Kai Nacke	520375e6f3	[Docs] Fix target feature matrix for PowerPC and SystemZ The target feature matrix in the code generator documentation is outdated. This PR fixes some entries for PowerPC and SystemZ. Both have: - assembly parser - disassembler - .o file writing Reviewers: uweigand Differential Revision: https://reviews.llvm.org/D71004	2019-12-13 06:18:08 -05:00
Kerry McLaughlin	07c7fe4c54	Recommit "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores" Updated pred_load patterns added to AArch64SVEInstrInfo.td by this patch to use reg + imm non-temporal loads to fix previous test failures. Original commit message: Adds the following intrinsics: - llvm.aarch64.sve.ldnt1 - llvm.aarch64.sve.stnt1 This patch creates masked loads and stores with the MONonTemporal flag set when used with the intrinsics above.	2019-12-13 10:08:20 +00:00
David Stenberg	fa448e3ce2	[LiveDebugValues] Omit entry values for DBG_VALUEs with pre-existing expressions Summary: This is a quickfix for PR44275. An assertion that checks that the DIExpression is valid failed due to attempting to create an entry value for an indirect parameter. This started appearing after D69028, as the indirect parameter started being represented using a DW_OP_deref, rather than with the DBG_VALUE's second operand, meaning that the isIndirectDebugValue() check in LiveDebugValues did not exclude such parameters. A DIExpression that has an entry value operation can currently not have any other operation, leading to the failed isValid() check. This patch simply makes us stop considering emitting entry values for such parameters. To support such cases I think we at least need to do the following changes: * In DIExpression::isValid(): Remove the limitation that a DW_OP_LLVM_entry_value operation can be the only operation in a DIExpression. * In LiveDebugValues::emitEntryValues(): Create an entry value of size 1, so that it only wraps the register operand, and not the whole pre-existing expression (the DW_OP_deref). * In LiveDebugValues::removeEntryValue(): Check that the new debug value has the same debug expression as the original, rather than checking that the debug expression is empty. * In DwarfExpression::addMachineRegExpression(): Modify the logic so that a DW_OP_reg* expression is emitted for the entry value. That is how GCC emits entry values for indirect parameters. That will currently not happen due to the DW_OP_deref causing the !HasComplexExpression to fail. The LocationKind needs to be changed also, rather than always emitting a DW_OP_stack_value for entry values. There are probably more things I have missed, but that could hopefully be a good starting point for emitting such entry values. Reviewers: djtodoro, aprantl, jmorse, vsk Reviewed By: aprantl, vsk Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D71416	2019-12-13 10:49:46 +01:00
Georgii Rymar	5ee22c8a68	[yaml2obj] - Add a way to override sh_flags section field. Currently we have the `Flags` property that allows setting flags for a section. The problem is that it does not allow us to set an arbitrary value, because of bit fields validation under the hood. Arbitrary values can be used to test specific broken cases. We probably do not want to relax the validation, so this patch adds a `ShFlags` property that allows overriding the `sh_flags`. It is in line with the other `Sh*` properties we have already. Differential revision: https://reviews.llvm.org/D71411	2019-12-13 11:54:37 +03:00
Georgii Rymar	0c77a265d1	[llvm-readobj] - Fix letters used for dumping section types in GNU style. I've noticed that when we have all regular flags set, we print "WAEXMSILoGTx" instead of "WAXMSILOGTCE" printed by GNU readelf. It happens because: 1) We print SHF_EXCLUDE at the wrong place. 2) We do not recognize SHF_COMPRESSED, we print "x" instead of "C". 3) We print "o" instead of "O" for SHF_OS_NONCONFORMING. This patch fixes differences and adds test cases. Differential revision: https://reviews.llvm.org/D71418	2019-12-13 11:31:24 +03:00
Craig Topper	f2c16e8f5e	[LegalizeTypes] Remove unnecessary if before calling ReplaceValueWith on the chain in SoftenFloatRes_LOAD. I believe this is a leftover from when fp128 was softened to fp128 on X86-64. In that case type legalization must have been able to create a load that was the same as N which would make this replacement fail or assert. Since we no longer do that, this check should be unneeded.	2019-12-13 00:14:41 -08:00
Nikita Popov	76a5a184aa	Reapply [LVI] Normalize pointer behavior This is a rebase of the change over D70376, which fixes an LVI cache invalidation issue that also affected this patch. ----- Related to D69686. As noted there, LVI currently behaves differently for integer and pointer values: For integers, the block value is always valid inside the basic block, while for pointers it is only valid at the end of the basic block. I believe the integer behavior is the correct one, and CVP relies on it via its getConstantRange() uses. The reason for the special pointer behavior is that LVI checks whether a pointer is dereferenced in a given basic block and marks it as non-null in that case. Of course, this information is valid only after the dereferencing instruction, or in conservative approximation, at the end of the block. This patch changes the treatment of dereferenceability: Instead of including it inside the block value, we instead treat it as something similar to an assume (it essentially is a non-nullness assume) and incorporate this information in intersectAssumeOrGuardBlockValueConstantRange() if the context instruction is the terminator of the basic block. This happens either when determining an edge-value internally in LVI, or when a terminator was explicitly passed to getValueAt(). The latter case makes this change not fully NFC, because we can now fold terminator icmps based on the dereferenceability information in the same block. This is the reason why I changed one JumpThreading test (it would optimize the condition away without the change). Of course, we do not want to recompute dereferenceability on each intersectAssume call, so we need a new cache for this. The dereferenceability analysis requires walking the entire basic block and computing underlying objects of all memory operands. This was previously done separately for each queried pointer value. In the new implementation (both because this makes the caching simpler, and because it is faster), I instead only walk the full BB once and cache all the dereferenced pointers. So the traversal is now performed only once per BB, instead of once per queried pointer value. I think the overall model now makes more sense than before, and there will be no more pitfalls due to differing integer/pointer behavior. Differential Revision: https://reviews.llvm.org/D69914	2019-12-13 08:59:58 +01:00
Rui Ueyama	bd7cf0f396	Revert an accidental commit af5ca40b47b3e85c3add81ccdc0b787c4bc355ae	2019-12-13 15:17:40 +09:00
Rui Ueyama	8ba96bda06	temporary	2019-12-13 14:35:03 +09:00
Nate Voorhies	95d42df811	[NFC][AArch64] Fix typo. Summary: Coaleascer should be coalescer. Reviewers: qcolombet, Jim Reviewed By: Jim Subscribers: Jim, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70731	2019-12-13 10:25:36 +08:00
Eric Christopher	a1871cdc3f	Temporarily revert "NFC: DebugInfo: Refactor RangeSpanList to be a struct, like DebugLocStream::List" as it was causing bot and build failures. This reverts commit 8e04896288d22ed8bef7ac367923374f96b753d6.	2019-12-12 17:55:41 -08:00
David Blaikie	dc5cbd92c1	NFC: DebugInfo: Refactor RangeSpanList to be a struct, like DebugLocStream::List Move these data structures closer together so their emission code can eventually share more of its implementation.	2019-12-12 16:53:59 -08:00
David Blaikie	ee634363a8	NFC: DebugInfo: Refactor debug_loc/loclist emission into a common function (except for v4 loclists, which are sufficiently different to not fit well in this generic implementation) In subsequent patches I intend to refactor the DebugLoc and ranges data structures to be more similar so I can common more of the implementation here.	2019-12-12 16:39:12 -08:00
Evgenii Stepanov	1b96f8703a	hwasan: add tag_offset DWARF attribute to optimized debug info Summary: Support alloca-referencing dbg.value in hwasan instrumentation. Update AsmPrinter to emit DW_AT_LLVM_tag_offset when location is in loclist format. Reviewers: pcc Subscribers: srhines, aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70753	2019-12-12 16:18:54 -08:00
Danilo Carvalho Grael	d624de3264	[AArch64][SVE] Add integer arithmetic with immediate instructions. Summary: Add pattern matching for the following instructions: - add, sub, subr, sqadd, sqsub, uqadd, uqsub This patch required complex patterns to match the immediate with optional left shift. I re-used the Select function from the other SVE repo to implement the complex pattern. I plan on doing another patch to also match constant vector of the same immediate. Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan Tags: #llvm Differential Revision: https://reviews.llvm.org/D71370	2019-12-12 17:28:46 -05:00
Johannes Doerfert	bde45fcf38	[Attributor][FIX] Do treat byval arguments special When we reason about the pointer argument that is byval we actually reason about a local copy of the value passed at the call site. This was not the case before and we wrongly introduced attributes based on the surrounding function. AAMemoryBehaviorArgument, AAMemoryBehaviorCallSiteArgument and AANoCaptureCallSiteArgument are made aware of byval now. The code to skip "subsuming positions" for reasoning follows a common pattern and we should refactor it. A TODO was added. Discovered by @efriedma as part of D69748.	2019-12-12 16:04:21 -06:00
Denis Bakhvalov	6605830fb0	[NFC][InstSimplify] Refactoring ThreadCmpOverSelect function Removed code duplication in ThreadCmpOverSelect and broke it into several smaller functions for reusing them. Differential Revision: https://reviews.llvm.org/D71158	2019-12-12 22:45:58 +01:00
Sanjay Patel	05003a5b3c	Revert "[DAGCombiner] fold shift-trunc-shift to shift-mask-trunc" This reverts commit 8963332c3327daa652ba3e26d35f9109b6991985. There was a logic bug typo in this code, but it wasn't visible in the asm for the tests.	2019-12-12 16:24:40 -05:00
Sanjay Patel	183b3675ca	[DAGCombiner] fold shift-trunc-shift to shift-mask-trunc This fold is done in IR by instcombine, and we have a special form of it already here in DAGCombiner, but we want the more general transform too: https://rise4fun.com/Alive/3jZm Name: general Pre: (C1 + zext(C2) < 64) %s = lshr i64 %x, C1 %t = trunc i64 %s to i16 %r = lshr i16 %t, C2 => %s2 = lshr i64 %x, C1 + zext(C2) %a = and i64 %s2, zext((1 << (16 - C2)) - 1) %r = trunc %a to i16 Name: special Pre: C1 == 48 %s = lshr i64 %x, C1 %t = trunc i64 %s to i16 %r = lshr i16 %t, C2 => %s2 = lshr i64 %x, C1 + zext(C2) %r = trunc %s2 to i16 ...because D58017 exposes a regression without this fold.	2019-12-12 15:44:13 -05:00
Teresa Johnson	3a2a2bb628	[LTO] Support for embedding bitcode section during LTO Summary: This adds support for embedding bitcode in a binary during LTO. libLTO gains support for the `-lto-embed-bitcode` flag. The option allows users of the LTO library to embed a bitcode section. For example, LLD can pass the option via `ld.lld -mllvm=-lto-embed-bitcode`. This feature allows doing something comparable to `clang -c -fembed-bitcode`, but on the (LTO) linker level. Having bitcode alongside native code has many use-cases. To give an example, the MacOS linker can create a `-bitcode_bundle` section containing bitcode. Also, having this feature built into LLVM is an alternative to 3rd party tools such as wllvm (https://github.com/travitch/whole-program-llvm) or gllvm (https://github.com/SRI-CSL/gllvm). As with these tools, this feature simplifies creating "whole-program" llvm bitcode files, but in contrast to wllvm/gllvm it does not rely on a specific llvm frontend/driver. Patch by Josef Eisl <josef.eisl@oracle.com> Reviewers: #llvm, #clang, rsmith, pcc, alexshap, tejohnson Reviewed By: tejohnson Subscribers: tejohnson, mehdi_amini, inglorion, hiraditya, aheejin, steven_wu, dexonsmith, dang, cfe-commits, llvm-commits, #llvm, #clang Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68213	2019-12-12 12:34:19 -08:00
Tony	39015f8b4f	[AMDGPU] AMDGPUUsage clarify address space information and other typo and formatting fixes Summary: - Clarify AMDGPU address spaces. - Correct path to AMDGPU backend since now in the mono-repo. - Fix numerous text style and typo issues. - Correct reStructuredText formatting warnings. - Made reStructuredText directive usage more consistent. - Add references for gfx10 ISA specification. Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71392	2019-12-12 14:51:27 -05:00
Kit Barton	6565d62ef3	Rename LoopInfo::isRotated() to LoopInfo::isRotatedForm(). This patch renames the LoopInfo::isRotated() method to LoopInfo::isRotatedForm() to make it clear that the method checks whether the loop is in rotated form, not whether the loop has been rotated by the LoopRotation pass.	2019-12-12 14:22:36 -05:00
Jonas Paulsson	68d81cc406	[SystemZ] Implement the packed stack layout Any llvm function with the "packed-stack" attribute will be compiled to use the packed stack layout which reuses unused parts of the incoming register save area. This is needed for building the Linux kernel. Review: Ulrich Weigand https://reviews.llvm.org/D70821	2019-12-12 10:26:03 -08:00
Sanjay Patel	80a23ffb66	[DAGCombiner] improve readability This is not quite NFC because I changed the SDLoc to use the more standard 'N' (the starting node for the fold). This transform is a special-case of a more general fold that we do in IR, but it seems like the general fold is needed here too to avoid a potential regression seen in D58017. https://rise4fun.com/Alive/3jZm	2019-12-12 13:16:50 -05:00
Sanjay Patel	f806d650f4	[AArch64][PowerPC] add tests for shift sandwich; NFC	2019-12-12 12:37:02 -05:00
Florian Hahn	1991907bad	[BasicAA] Use GEP as context for computeKnownBits in aliasGEP. In order to use assumptions, computeKnownBits needs a context instruction. We can use the GEP, if it is an instruction. We already pass the assumption cache, but it cannot be used without a context instruction. Reviewers: anemet, asbirlea, hfinkel, spatel Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D71264	2019-12-12 17:18:04 +00:00
Michael Liao	8950bb7c13	[amdgpu] Fix `-Wenum-compare` warning. NFC.	2019-12-12 11:44:16 -05:00
LLVM GN Syncbot	156014a0e8	gn build: Merge 526244b187d	2019-12-12 15:45:15 +00:00
Florian Hahn	1643768c5c	[Matrix] Add first set of matrix intrinsics and initial lowering pass. This is the first patch adding an initial set of matrix intrinsics and a corresponding lowering pass. This has been discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html The first patch introduces four new intrinsics (transpose, multiply, columnwise load and store) and a LowerMatrixIntrinsics pass, that lowers those intrinsics to vector operations. Matrixes are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix embedded in a <16 x float> vector) and the intrinsics take the dimension information as parameters. Those parameters need to be ConstantInt. For the memory layout, we initially assume column-major, but in the RFC we also described how to extend the intrinsics to support row-major as well. For the initial lowering, we split the input of the intrinsics into a set of column vectors, transform those column vectors and concatenate the result columns to a flat result vector. This allows us to lower the intrinsics without any shape propagation, as mentioned in the RFC. In follow-up patches, we plan to submit the following improvements: * Shape propagation to eliminate the embedding/splitting for each intrinsic. * Fused & tiled lowering of multiply and other operations. * Optimization remarks highlighting matrix expressions and costs. * Generate loops for operations on large matrixes. * More general block processing for operation on large vectors, exploiting shape information. We would like to add dedicated transpose, columnwise load and store intrinsics, even though they are not strictly necessary. For example, we could emit a large shufflevector instruction instead of the transpose. But we expect that to (1) become unwieldy for larger matrixes (even for 16x16 matrixes, the resulting shufflevector masks would be huge), (2) risk instcombine making small changes, causing us to fail to detect the transpose, preventing better lowerings. For the load/store, we are additionally planning on exploiting the intrinsics for better alias analysis. Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70456	2019-12-12 15:42:18 +00:00
Sjoerd Meijer	6e99e23adb	[ARM][MVE] findVCMPToFoldIntoVPS. NFC. This adds ReachingDefAnalysis (RDA) to the VPTBlock pass, so that we can reimplement findVCMPToFoldIntoVPS with just a few calls to RDA. Differential Revision: https://reviews.llvm.org/D71330	2019-12-12 15:41:20 +00:00
Guillaume Chatelet	60bec22752	[Alignment][NFC] Adding Align compatible methods to IntrinsicInst/IRBuilder Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71420	2019-12-12 16:22:15 +01:00
LLVM GN Syncbot	554f3952e8	gn build: Merge 600d123c6ff	2019-12-12 15:01:28 +00:00
Tom Stellard	c17f304125	AMDGPU/SILoadStoreOptimizer: Simplify function Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71044	2019-12-12 06:53:03 -08:00
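
Note on 183b3675ca (shift-trunc-shift, later reverted in 05003a5b3c): the two Alive patterns quoted in that message can be sanity-checked by brute force. The following is only a sketch in Python, not the DAGCombiner code; the function names are hypothetical, and C2 is assumed to be in the range 0..15 so that the i16 shift is well defined.

```python
import random

MASK64 = (1 << 64) - 1  # models an i64 value
MASK16 = (1 << 16) - 1  # models an i16 value


def shift_trunc_shift(x, c1, c2):
    """Original sequence: lshr i64, trunc to i16, lshr i16."""
    s = (x & MASK64) >> c1
    t = s & MASK16
    return t >> c2


def shift_mask_trunc(x, c1, c2):
    """'general' pattern: lshr i64 by C1 + C2, mask, trunc to i16."""
    s2 = (x & MASK64) >> (c1 + c2)
    a = s2 & ((1 << (16 - c2)) - 1)
    return a & MASK16


def shift_trunc_special(x, c2):
    """'special' pattern (C1 == 48): the mask is not needed."""
    return ((x & MASK64) >> (48 + c2)) & MASK16


def check(trials_per_pair=20, seed=0):
    """Compare the sequences on random inputs for every valid (C1, C2)."""
    rng = random.Random(seed)
    for c1 in range(64):
        for c2 in range(16):
            if c1 + c2 >= 64:  # precondition from the Alive proof
                continue
            for _ in range(trials_per_pair):
                x = rng.getrandbits(64)
                expected = shift_trunc_shift(x, c1, c2)
                if shift_mask_trunc(x, c1, c2) != expected:
                    return False
                if c1 == 48 and shift_trunc_special(x, c2) != expected:
                    return False
    return True
```

Calling `check()` exercises both patterns. Random testing of this kind only supports the algebra of the fold; the revert message attributes the bug to the implementation, which this sketch does not model.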


188857 Commits
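
Note on 0c77a265d1 (GNU-style section flag letters): the expected string "WAXMSILOGTCE" follows from printing one letter per set flag in ascending bit order. The sketch below illustrates that mapping in Python; it is not the llvm-readobj implementation, the names are hypothetical, and the bit values are the standard ELF `SHF_*` constants. It ignores unknown, OS-specific, and processor-specific bits.

```python
# (bit value, GNU readelf letter) pairs, in ascending bit order.
GNU_FLAG_LETTERS = [
    (0x1, "W"),         # SHF_WRITE
    (0x2, "A"),         # SHF_ALLOC
    (0x4, "X"),         # SHF_EXECINSTR
    (0x10, "M"),        # SHF_MERGE
    (0x20, "S"),        # SHF_STRINGS
    (0x40, "I"),        # SHF_INFO_LINK
    (0x80, "L"),        # SHF_LINK_ORDER
    (0x100, "O"),       # SHF_OS_NONCONFORMING
    (0x200, "G"),       # SHF_GROUP
    (0x400, "T"),       # SHF_TLS
    (0x800, "C"),       # SHF_COMPRESSED
    (0x80000000, "E"),  # SHF_EXCLUDE
]


def gnu_flag_string(sh_flags):
    """Return the letters for every known flag bit set in sh_flags."""
    return "".join(letter for bit, letter in GNU_FLAG_LETTERS if sh_flags & bit)


# All regular flags set at once.
ALL_REGULAR = sum(bit for bit, _ in GNU_FLAG_LETTERS)
```

With every regular flag set, `gnu_flag_string(ALL_REGULAR)` yields "WAXMSILOGTCE", the string the commit message quotes for GNU readelf.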