llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

Author	SHA1	Message	Date
Sam Parker	c326fd4f01	[NFC][ARM] Add more tail predication tests	2020-05-19 14:01:10 +01:00
David Green	e11168f2ca	[ARM] Patterns for VQSHRN Given a VQMOVN(VSHR), we can fold that into a VQSHRN simply enough using a few tablegen patterns. Differential Revision: https://reviews.llvm.org/D77720	2020-05-16 17:46:43 +01:00
David Green	c1a15ae1a8	[ARM] Combines for VMOVN This adds two combines for VMOVN, one to fold VMOVN[tb](c, VQMOVNb(a, b)) => VQMOVN[tb](c, b) The other to perform demand bits analysis on the lanes of a VMOVN. We know that only the bottom lanes of the second operand and the top or bottom lanes of the Qd operand are needed in the result, depending on if the VMOVN is bottom or top. Differential Revision: https://reviews.llvm.org/D77718	2020-05-16 15:13:16 +01:00
David Green	4120e7a927	[ARM] MVE saturating truncates This adds some custom lowering for VQMOVN, an instruction that can be used to perform saturating truncates from a pair of min(max(X, -0x8000), 0x7fff), providing those constants are correct. This leaves a VQMOVNBs which saturates the value and inserts that into the bottom lanes of an existing vector. We then need to do something with the other lanes, extending the value using a vmovlb. Ideally, as will often be the case, only the bottom lane of what remains will be demanded, allowing the vmovlb to be removed. Which should mean the instruction is either equal or a win most of the time, and allows some extra follow-up folding to happen. Differential Revision: https://reviews.llvm.org/D77590	2020-05-16 15:10:20 +01:00
David Green	b6aab3138f	[ARM] Extra VQMOVN/VQSHRN tests. NFC	2020-05-16 14:23:26 +01:00
David Green	4c9b5189ca	[ARM] Change more triples to arm-none-none-eabi. NFC	2020-05-15 22:53:07 +01:00
Anna Welker	8fb628942b	[ARM][MVE] Add support for incrementing scatters Adds support to build pre-incrementing scatters. If the increment (i.e., add instruction) that is merged into the scatter is the loop increment, an incrementing write-back scatter can be built, which then assumes the role of the loop increment. Differential Revision: https://reviews.llvm.org/D79859	2020-05-15 17:02:00 +01:00
David Green	439830ee0e	[ARM] Convert floating point splats to integer Under MVE a vdup will always take a gpr register, not a floating point value. During DAG combine we convert the types to a bitcast to an integer in an attempt to fold the bitcast into other instructions. This is OK, but only works inside the same basic block. To do the same trick across a basic block boundary we need to convert the type in codegenprepare, before the splat is sunk into the loop. This adds a convertSplatType function to codegenprepare to do that, putting bitcasts around the splat to force the type to an integer. There is then some adjustment to the code in shouldSinkOperands to handle the extra bitcasts. Differential Revision: https://reviews.llvm.org/D78728	2020-05-13 15:24:16 +01:00
David Green	0021a951bd	[ARM] Sink splats to fma intrinsics Similar to fmul/fadd, we can sink a splat into a loop containing a fma in order to use more register instruction variants. For that there are also adjustments to the sinking code to handle more than 2 arguments. Differential Revision: https://reviews.llvm.org/D78386	2020-05-13 14:58:30 +01:00
Pierre-vh	39a7b5b535	[LSR][ARM] Add new TTI hook to mark some LSR chains as profitable This patch adds a new TTI hook to allow targets to tell LSR that a chain including some instruction is already profitable and should not be optimized. This patch also adds an implementation of this TTI hook for ARM so LSR doesn't optimize chains that include the VCTP intrinsic. Differential Revision: https://reviews.llvm.org/D79418	2020-05-13 14:18:28 +01:00
Pierre-vh	90e5c93ad7	[Target][ARM] Replace re-uses of old VPR values with VPNOTs Differential Revision: https://reviews.llvm.org/D76847	2020-05-12 12:09:57 +01:00
Eli Friedman	f704804dd2	[SelectionDAG] Don't promote the alignment of allocas beyond the stack alignment. allocas in LLVM IR have a specified alignment. When that alignment is specified, the alloca has at least that alignment at runtime. If the specified type of the alloca has a higher preferred alignment, SelectionDAG currently ignores that specified alignment, and increases the alignment. It does this even if it would trigger stack realignment. I don't think this makes sense, so this patch changes that. I was looking into this for SVE in particular: for SVE, overaligning vscale'ed types is extra expensive because it requires realigning the stack multiple times, or using dynamic allocation. (This currently isn't implemented.) I updated the expected assembly for a couple tests; in particular, for arg-copy-elide.ll, the optimization in question does not increase the alignment the way SelectionDAG normally would. For the rest, I just increased the specified alignment on the allocas to match what SelectionDAG was inferring. Differential Revision: https://reviews.llvm.org/D79532	2020-05-11 17:39:00 -07:00
David Green	fb0ed4b40d	[ARM] Convert VDUPLANE to VDUP under MVE Unlike Neon, MVE does not have a way of duplicating from a vector lane, so a VDUPLANE currently selects to a VDUP(move_from_lane(..)). This forces that to be done earlier as a dag combine to allow other folds to happen. It converts to a VDUP(EXTRACT). On FP16 this is then folded to a VGETLANEu to prevent it from creating a vmovx;vmovhr pair, using a single move_from_reg instead. Differential Revision: https://reviews.llvm.org/D79606	2020-05-09 18:58:13 +01:00
Simon Pilgrim	9de716fac1	[DAG] SimplifyMultipleUseDemandedBits - remove superfluous bitcasts If the SimplifyMultipleUseDemandedBits calls BITCASTs that peek through back to the original type then we can remove the BITCASTs entirely. Differential Revision: https://reviews.llvm.org/D79572	2020-05-08 19:04:49 +01:00
David Green	fbc724fa32	[ARM] Change test target to arm-none-none-eabi. NFC	2020-05-08 14:16:31 +01:00
James Y Knight	582ae3d23e	Correctly modify the CFG in IfConverter, and then remove the CorrectExtraCFGEdges function. The latter was a workaround for "Various pieces of code" leaving bogus extra CFG edges in place. Where by "various" it meant only IfConverter::MergeBlocks, which failed to clear all of the successors of dead blocks it emptied out. This wouldn't matter a whole lot, except that the dead blocks remained listed as predecessors of still-useful blocks, inhibiting optimizations. This fix slightly changed two thumb tests, because the correct CFG successors allowed for the "diamond" if-conversion pattern to be detected, when it could only use "simple" before. Additionally, the removal of a now-redundant call to analyzeBranch (with AllowModify=true) in BranchFolder::OptimizeFunction caused a later check for an empty block in BranchFolder::OptimizeBlock to fail. Correct this by moving the call to analyzeBranch in OptimizeBlock higher. Differential Revision: https://reviews.llvm.org/D79527	2020-05-07 18:17:07 -04:00
Anna Welker	5274cd21d2	[ARM][MVE] Add support for incrementing gathers Enables the MVEGatherScatterLowering pass to build pre-incrementing gathers. Incrementing writeback gathers are built when it is possible to replace the loop increment instruction. Differential Revision: https://reviews.llvm.org/D76786	2020-05-07 12:33:50 +01:00
Sam Parker	62d4131f92	[NFC][ARM] Add tail predication test	2020-05-07 08:19:32 +01:00
David Green	8184dbe12c	[ARM] VMOVhr load -> vldr Much like the similar combine added recently for VMOVrh load, this adds a fold for VMOVhr load turning it into a vldr.f16 as opposed to a vldrh and vmov.f16. Differential Revision: https://reviews.llvm.org/D78714	2020-05-06 15:45:56 +01:00
David Green	c9f36a7bf2	[ARM] VMOVrh of VMOVhr A VMOVhr of a VMOVrh can be simply folded to the original HPR value. Differential Revision: https://reviews.llvm.org/D78710	2020-05-06 15:10:01 +01:00
David Green	cca08857e3	[ARM] Extract from a VDUP If we get into the situation where we are extracting from a VDUP, the extracted value is just the origin, so long as the types match or we can bitcast between the two. Differential Revision: https://reviews.llvm.org/D78708	2020-05-06 14:51:25 +01:00
David Green	8aafbb48f2	[ARM] Convert a bitcast VDUP to a VDUP The idea, under MVE, is to introduce more bitcasts around VDUP's in an attempt to get the type correct across basic block boundaries. In order to do that without other regressions we need a few fixups, of which this is the first. If the code is a bitcast of a VDUP, we can convert that straight into a VDUP of the new type, so long as they have the same size. Differential Revision: https://reviews.llvm.org/D78706	2020-05-06 14:14:21 +01:00
David Green	301346d47a	[LSR] Don't require register reuse under postinc LSR has some logic that tries to aggressively reuse registers in formulae. This can lead to sub-optimal decisions in complex loops where the backend is trying to use shouldFavorPostInc. This disables the re-use in those situations. Differential Revision: https://reviews.llvm.org/D79301	2020-05-05 16:04:50 +01:00
David Green	a8d6339086	[ARM] Correct the type on a predicate cast A PREDICATE_CAST(PREDICATE_CAST(X)) can be converted to a PREDICATE_CAST(X) as the operation can convert between any forms of predicates (v4i1/v8i1/v16i1/i32). Unfortunately I got the type wrong on one of the rarer converts, which would lead to invalid nodes during isel. This fixes it up to use the correct type. Differential Revision: https://reviews.llvm.org/D79402	2020-05-05 13:15:10 +01:00
Pierre-vh	4bfc5d4361	[Target][ARM] Fold or(A, B) more aggressively for I1 vectors This patch makes the folding of or(A, B) into not(and(not(A), not(B))) more aggressive for I1 vectors. This only affects Thumb2 MVE and improves codegen, because it removes a lot of msr/mrs instructions on VPR.P0. This patch also adds a xor(vcmp) -> !vcmp fold for MVE. Differential Revision: https://reviews.llvm.org/D77202	2020-05-05 10:03:02 +01:00
Pierre-vh	fed8d066c9	[Target][ARM] Add PerformVSELECTCombine for MVE Integer Ops This patch adds an implementation of PerformVSELECTCombine in the ARM DAG Combiner that transforms vselect(not(cond), lhs, rhs) into vselect(cond, rhs, lhs). Normally, this should be done by the target-independent DAG Combiner, but it doesn't handle the kind of constants that we generate, so we have to reimplement it here. Differential Revision: https://reviews.llvm.org/D77712	2020-05-05 10:03:02 +01:00
David Green	7246186ac6	[ARM] MVE predcast with const test. NFC	2020-05-05 09:53:42 +01:00
David Green	d5bbc099f9	[ARM] Complex LSR test showing inefficient codegen. NFC	2020-05-04 21:50:10 +01:00
LemonBoy	c1a6255491	[SelectionDAG] Unify scalarizeVectorLoad and VectorLegalizer::ExpandLoad The two code paths have the same goal, legalizing a load of a non-byte-sized vector by loading the "flattened" representation in memory, slicing off each single element and then building a vector out of those pieces. The technique employed by `ExpandLoad` is slightly more convoluted and produces slightly better codegen on ARM, AMDGPU and x86 but suffers from some bugs (D78480) and is wrong for BE machines. Differential Revision: https://reviews.llvm.org/D79096	2020-05-02 15:18:10 -07:00
David Green	d0924a703a	[ARM] Always replace FP16 bitcasts with VMOVhr or VMOVrh This changes the logic with lowering fp16 bitcasts to always produce either a VMOVhr or a VMOVrh, instead of only trying to do it with certain surrounding nodes. To perform the same optimisations demand bits and known bits information has been added for them. Differential Revision: https://reviews.llvm.org/D78587	2020-04-28 16:12:53 +01:00
David Green	219eaf384f	[ARM] Allow fma in tail predicated loops There are some intrinsics like this that currently block tail predication, but should be fine. This allows fma through, as the one that I ran into. There may be others that need the same treatment but I've only done this one here. Differential Revision: https://reviews.llvm.org/D78385	2020-04-27 15:32:47 +01:00
David Green	4fe4f92bfe	[ARM] Various tests for MVE and FP16 codegen. NFC	2020-04-24 12:11:46 +01:00
David Green	6e28c84a3a	[ARM] Replace arm vendor with none. NFC	2020-04-22 18:19:35 +01:00
David Green	2d4a711a67	[ARM] Distribute MVE post-increments This adds some extra processing into the Pre-RA ARM load/store optimizer to detect and merge MVE loads/stores and adds of the same base. We don't always turn this into a post-inc during ISel: because the DAG is a graph, we don't always know an order to use for the nodes, so we cannot tell which node to make post-inc and which nodes should use the new post-inc result. After ISel, we have an order that we can use to post-inc the following instructions. So this looks for a load/store with a starting offset of 0, and an add/sub from the same base, plus a number of other loads/stores. We then do some checks and convert the zero offset load/store into a postinc variant. Any loads/stores after it have the offset subtracted from their immediates. For example, the sequence LDR #4; LDR #0; LDR #8; LDR #12; ADD #16 becomes LDR #4; LDR_POSTINC #16; LDR #-8; LDR #-4. It only handles MVE loads/stores at the moment. Normal loads/stores will be added in a followup patch; they just have some extra details to ensure that we keep generating LDRD/LDM successfully. Differential Revision: https://reviews.llvm.org/D77813	2020-04-22 14:16:51 +01:00
David Green	3dec8a2aea	[ARM] MVE FMA loop tests. NFC	2020-04-22 13:27:40 +01:00
Eli Friedman	6f59ec7d26	[ARM] Fix MIR tests with invalid live-ins. A register can't be live if it isn't defined; fix issues in various testcases. Differential Revision: https://reviews.llvm.org/D78529	2020-04-21 12:13:35 -07:00
David Green	54b3bad454	[ARM] MVE and scalar postinc mir tests. NFC	2020-04-20 22:00:07 +01:00
David Green	157dfafe06	[ARM] Add a low overhead sibling loop test. NFC	2020-04-20 18:46:38 +01:00
Sam Parker	e91e7bcdb8	[ARM][MVE] Add patterns for VRHADD Add patterns which use standard add nodes along with arm vshr imm nodes. Differential Revision: https://reviews.llvm.org/D77069	2020-04-20 10:05:21 +01:00
David Green	f40fb7fe60	[ARM] Regenerate tests. NFC	2020-04-19 13:45:39 +01:00
Sam Parker	7e25683abe	[ARM][MVE] Add VHADD and VHSUB patterns Add patterns that use a normal, non-wrapping, add and sub nodes along with an arm vshr imm node. Differential Revision: https://reviews.llvm.org/D77065	2020-04-17 07:45:15 +01:00
David Green	c3a8de9fef	[ARM] MVE postinc tests. NFC	2020-04-16 22:05:28 +01:00
Anna Welker	77dddc5e35	[ARM][MVE] Fix location of optimized gather addresses Fix for the address optimization for gathers and scatters, which in some complex cases would push instructions not only to the vector loop preheader but to other locations as well, which led to a scrambled order and the compilation failing. This patch ensures that said instructions are always pushed to the end of the vector loop preheader. Differential Revision: https://reviews.llvm.org/D78293	2020-04-16 18:15:28 +01:00
Konstantin Schwarz	9edce7f809	[MIR] Add comments to INLINEASM immediate flag MachineOperands Summary: The INLINEASM MIR instructions use immediate operands to encode the values of some operands. The MachineInstr pretty printer function already handles those operands and prints human readable annotations instead of the immediates. This patch adds similar annotations to the output of the MIRPrinter, however uses the new MIROperandComment feature. Reviewers: SjoerdMeijer, arsenm, efriedma Reviewed By: arsenm Subscribers: qcolombet, sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78088	2020-04-16 13:46:14 +02:00
Pierre-vh	86b2ed0199	[Target][ARM] Fix VPT Block Pass miscompilation The pass was incorrectly reverting back to a "T" when something wrote to VPR inside a "E" block. This is not the correct behaviour, the predicate should stay the same. Differential Revision: https://reviews.llvm.org/D77798	2020-04-14 15:16:27 +01:00
Pierre-vh	5ddbe3862b	[Target][ARM] Adding MVE VPT Optimisation Pass Differential Revision: https://reviews.llvm.org/D76709	2020-04-14 15:16:27 +01:00
Anna Welker	5cfdb4f2d2	[ARM][MVE] Optimise offset addresses of gathers/scatters This patch adds an analysis of the offset addresses used by gathers and scatters to the MVEGatherScatterLowering pass to find multiplications and additions that are loop invariant and thus can be moved into the loop preheader, so that they are not executed on every iteration. Differential Revision: https://reviews.llvm.org/D76681	2020-04-08 11:46:57 +01:00
Keith Walker	894061e4f5	[ARM] unwinding .pad instructions missing in execute-only prologue If the stack pointer is altered for local variables and we are generating Thumb2 execute-only code, the .pad directive is missing. Usually the size of the adjustment is stored in a PC-relative location and loaded into a register which is then added to the stack pointer. However, when we are generating execute-only code the size of the adjustment is instead generated using the MOVW/MOVT instruction pair. As a by-product of handling the execute-only case this also fixes an existing issue that in the non-execute-only case the .pad directive was generated against the load of the constant to a register instruction, instead of the instruction which adds the register to the stack pointer. Differential Revision: https://reviews.llvm.org/D76849	2020-04-07 11:51:59 +01:00
Pierre-vh	e102b60201	Revert "[CodeGen][SelectionDAG] Flip Booleans More Often" This reverts commit 23342bdcc888835e744f38a2fcd0a5c651e33a31.	2020-04-07 09:09:10 +01:00
Pierre-vh	c741aac112	[CodeGen][SelectionDAG] Flip Booleans More Often Differential Revision: https://reviews.llvm.org/D77201	2020-04-07 08:19:57 +01:00
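
The offset rewriting described in the "[ARM] Distribute MVE post-increments" entry (2d4a711a67) can be illustrated with a small sketch. This is not the LLVM implementation: the function name `distribute_postinc` and the `(opcode, immediate)` tuple model are hypothetical, and it assumes every instruction in the list uses the same base register, covers loads only, and skips the legality checks the commit message mentions.

```python
def distribute_postinc(instrs):
    """Toy model of the rewrite: instrs is a list of (opcode, immediate)
    tuples that all share one base register (an assumption of this sketch)."""
    # Find the first zero-offset load and the first add on the base.
    zero_idx = next(
        (i for i, (op, imm) in enumerate(instrs) if op == "LDR" and imm == 0),
        None,
    )
    add_idx = next(
        (i for i, (op, _) in enumerate(instrs) if op == "ADD"),
        None,
    )
    # Nothing to do unless both exist and the add follows the load.
    if zero_idx is None or add_idx is None or add_idx < zero_idx:
        return list(instrs)

    inc = instrs[add_idx][1]
    out = []
    for i, (op, imm) in enumerate(instrs):
        if i == add_idx:
            # The add is absorbed into the post-incrementing load.
            continue
        if i == zero_idx:
            out.append(("LDR_POSTINC", inc))
        elif zero_idx < i < add_idx and op == "LDR":
            # The base has already been advanced, so compensate.
            out.append((op, imm - inc))
        else:
            out.append((op, imm))
    return out


before = [("LDR", 4), ("LDR", 0), ("LDR", 8), ("LDR", 12), ("ADD", 16)]
after = distribute_postinc(before)
for op, imm in after:
    print(f"{op} #{imm}")
```

With the sequence from the commit message as input, the zero-offset load becomes the post-incrementing one and the two later loads have 16 subtracted from their immediates, while the load before it is left alone.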


1090 Commits