mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 13:11:39 +01:00

617 Commits

Author SHA1 Message Date
Diogo N. Sampaio
2cc01dcc2a [ARM][FIX] Add missing f16.lane.vldN/vstN lowering
Summary:
Add missing D and Q lane VLDSTLane lowering
for fp16 elements.

Reviewers: efriedma, kosarev, SjoerdMeijer, ostannard

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60874

llvm-svn: 358962
2019-04-23 09:36:39 +00:00
Diogo N. Sampaio
46651abffe [ARM] [FIX] Add missing f16 vector operations lowering
Summary:
Add the missing <8xhalf> shufflevector patterns when using the concat_vectors DAG node,
and allow <8xhalf> and <4xhalf> vldup1 operations.

These instructions are required for v8.2a fp16 lowering of vmul_n_f16, vmulq_n_f16 and vmulq_lane_f16 intrinsics.

Reviewers: olista01, pbarrio, LukeGeeson, efriedma

Reviewed By: efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60319

llvm-svn: 358081
2019-04-10 13:28:06 +00:00
Eli Friedman
e844a79a1e [ARM] Eliminate redundant "mov rN, sp" instructions in Thumb1.
This takes sequences like "mov r4, sp; str r0, [r4]", and optimizes them
to something like "str r0, [sp]".

For regular stack variables, this optimization was already implemented:
we lower loads and stores using frame indexes, which are expanded later.
However, when constructing a call frame for a call with more than four
arguments, the existing optimization doesn't apply.  We need to use
stores which are actually relative to the current value of sp, and don't
have an associated frame index.

This patch adds a special case to handle that construct.  At the DAG
level, this is an ISD::STORE where the address is a CopyFromReg from SP
(plus a small constant offset).

This applies only to Thumb1: in Thumb2 or ARM mode, a regular store
instruction can access SP directly, so the COPY gets eliminated by
existing code.
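
As an illustration (not part of the original commit), a Thumb1 call that
passes more than four integer arguments has to store the extra arguments into
the outgoing call frame, and those SP-relative stores are what the new special
case handles:

  // Hypothetical example: e and f do not fit in r0-r3, so they are stored to
  // the call frame; with this patch those stores address SP directly instead
  // of going through a copied register.
  int callee(int a, int b, int c, int d, int e, int f);

  int caller(int x) {
    return callee(x, x + 1, x + 2, x + 3, x + 4, x + 5);
  }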

The change to ARMDAGToDAGISel::SelectThumbAddrModeSP is a related
cleanup: we shouldn't pretend that it can select anything other than
frame indexes.

Differential Revision: https://reviews.llvm.org/D59568

llvm-svn: 356601
2019-03-20 19:40:45 +00:00
Oliver Stannard
480fa6448d [ARM] Fix selection of VLDR.16 instruction with imm offset
The isScaledConstantInRange function takes upper and lower bounds which are
checked after dividing by the scale, so the bounds checks for half, single and
double precision should all be the same. Previously, we had wrong bounds checks
for half precision, so we could select an immediate that the instruction
can't actually represent.

Differential revision: https://reviews.llvm.org/D58822

llvm-svn: 355305
2019-03-04 09:17:38 +00:00
Craig Topper
ea7e6b3857 Implementation of asm-goto support in LLVM
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

This patch adds a new CallBr IR instruction to support GCC-style
asm-goto inline assembly, as used by the Linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.

This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.
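
For illustration only (GNU-extension syntax on an Arm target, not taken from
the patch), this is the kind of construct that now lowers through CallBr /
INLINEASM_BR:

  // asm goto: the inline assembly may branch to the C label "failed", so the
  // block behaves both as a call and as a terminator with two successors.
  int try_op(int x) {
    asm goto("cmp %0, #0\n\t"
             "beq %l[failed]"
             : /* asm goto takes no outputs */
             : "r"(x)
             : "cc"
             : failed);
    return 1;
  failed:
    return 0;
  }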

There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.

Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii

Differential Revision: https://reviews.llvm.org/D53765

llvm-svn: 353563
2019-02-08 20:48:56 +00:00
Sam Parker
81bbef609c [ARM] Add OptMinSize to ARMSubtarget
In many places in the backend, we like to know whether we're
optimising for code size, which is currently done by checking the
machine function attributes. A subtarget is created on a
per-function basis, so it's possible to know at construction time
whether we're compiling for code size, so record this in the new object.
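
A minimal sketch (simplified, not the actual backend code) of recording this
at subtarget construction:

  // The per-function subtarget can capture the minsize state once, rather
  // than every query re-reading the function attributes.
  #include "llvm/IR/Function.h"

  struct SubtargetSketch {
    bool OptMinSize;
    explicit SubtargetSketch(const llvm::Function &F)
        : OptMinSize(F.hasFnAttribute(llvm::Attribute::MinSize)) {}
  };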

Differential Revision: https://reviews.llvm.org/D57812

llvm-svn: 353501
2019-02-08 07:57:42 +00:00
Sjoerd Meijer
2ef8508947 [ARM] Thumb2: ConstantMaterializationCost
Constants can also be materialised using the negated value and an MVN, and this
case seems to have been missed for Thumb2. To check the constant materialisation
costs, we now call getT2SOImmVal twice, once for the original constant and then
also for its negated value; this function checks whether the constant can be
encoded as either a splatted or a rotated value.
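
One illustrative value (assuming a Thumb2 minsize build) where the negated
check helps:

  // 0xFFFFFF00 has no Thumb2 modified-immediate encoding, but its bitwise NOT
  // (0xFF) does, so "mvn r0, #255" materialises it in one instruction instead
  // of a literal pool load.
  unsigned all_but_low_byte() { return 0xFFFFFF00u; }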

This was revealed by a test that optimises for minsize: instead of a LDR
literal pool load and having a literal pool entry, just a MVN with an immediate
is smaller (and also faster).

Differential Revision: https://reviews.llvm.org/D57327

llvm-svn: 352737
2019-01-31 08:38:06 +00:00
David Green
c802e04e3f [ARM] Use sub for negative offset load/store in thumb1
This attempts to optimise negative values used in load/store operands
a little. We currently try to select them as rr, materialising the
negative constant using a MOV/MVN pair. This instead selects ri with
an immediate of 0, forcing the add node to become a simpler sub.
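
A small source-level illustration (exact codegen depends on the surrounding
code) of the negative-offset accesses this targets:

  // A store at a negative offset from the incoming pointer: rather than
  // building -16 with a MOV/MVN pair and using a register-register store, the
  // address can be formed with a single sub and the store selected with a #0
  // immediate.
  void store_before(int *p, int v) { p[-4] = v; }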

Differential Revision: https://reviews.llvm.org/D57121

llvm-svn: 352475
2019-01-29 10:40:31 +00:00
Chandler Carruth
ae65e281f3 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Sjoerd Meijer
129cbf3900 [ARM] FP16: support vld1.16 for vector loads with post-increment
Differential Revision: https://reviews.llvm.org/D55112

llvm-svn: 348110
2018-12-03 08:26:34 +00:00
Chandler Carruth
81e2f0deb5 [SDAG] Remove the reliance on MI's allocation strategy for
`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.

Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.

This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.

This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.
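
An illustrative sketch (hypothetical types, not the actual SelectionDAG data
structures) of that single-pointer optimization:

  // When a node has exactly one memory operand, keep the pointer inline
  // rather than allocating a one-element array from the DAG's allocator.
  struct MemOp; // stand-in for MachineMemOperand

  class MemOpStorage {
    union {
      MemOp  *Single; // NumOps == 1: stored inline, no allocation
      MemOp **Array;  // NumOps  > 1: array allocated by the SelectionDAG
    };
    unsigned NumOps = 0;

  public:
    void assign(MemOp *Op) { Single = Op; NumOps = 1; }
    void assign(MemOp **Ops, unsigned N) { Array = Ops; NumOps = N; }
    MemOp *get(unsigned I) const { return NumOps == 1 ? Single : Array[I]; }
    unsigned size() const { return NumOps; }
  };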

As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]

Differential Revision: https://reviews.llvm.org/D50680

llvm-svn: 339740
2018-08-14 23:30:32 +00:00
Eli Friedman
251919f579 [ARM] Adjust AND immediates to make them cheaper to select.
LLVM normally prefers to minimize the number of bits set in an AND
immediate, but that doesn't always match the available ARM instructions.
In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer
a two-instruction sequence movs+ands or movs+bics.
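
Two illustrative sources for that preference (expected Thumb1 selection in the
comments, assuming a core with uxtb/uxth):

  // Masks matching a zero-extension become single instructions instead of
  // materialising the mask in a register first.
  unsigned low_byte(unsigned x) { return x & 0xFF; }   // -> uxtb
  unsigned low_half(unsigned x) { return x & 0xFFFF; } // -> uxth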

Some potential improvements outlined in
ARMTargetLowering::targetShrinkDemandedConstant, but seems to work
pretty well already.

The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX
instruction due to a larger-than-expected mask. (It's orthogonal, in
some sense, but as far as I can tell it's either impossible or nearly
impossible to reproduce the bug without this change.)

According to my testing, this seems to consistently improve codesize by
a small amount by forming bic more often for ISD::AND with an immediate.

Differential Revision: https://reviews.llvm.org/D50030

llvm-svn: 339472
2018-08-10 21:21:53 +00:00
Sjoerd Meijer
16212a63c3 [ARM] FP16: codegen support for VTRN
Differential Revision: https://reviews.llvm.org/D50454

llvm-svn: 339340
2018-08-09 12:45:09 +00:00
Sjoerd Meijer
96990242de [ARM] FP16: support vector zip and unzip
This is addressing PR38404.

Differential Revision: https://reviews.llvm.org/D50186

llvm-svn: 338835
2018-08-03 09:24:29 +00:00
Fangrui Song
121474a01b Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

llvm-svn: 338293
2018-07-30 19:41:25 +00:00
Eli Friedman
2655e6dc4e [ARM] Assert that ARMDAGToDAGISel creates valid UBFX/SBFX nodes.
We don't ever check these again (unless you're using
-fno-integrated-as), so make sure the extracted bits are well-defined.

I don't think it's possible to trigger any of the assertions on trunk,
but it's difficult to prove.  (The first one depends on DAGCombine to
minimize the number of set bits in AND masks; I think the others are
mathematically impossible to hit.)

llvm-svn: 335931
2018-06-28 21:49:41 +00:00
Ivan A. Kosarev
e666ad4518 [NEON] Support vldNq intrinsics in AArch32 (LLVM part)
This patch adds support for the q versions of the dup
(load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16().

Currently, non-q versions of the dup intrinsics are implemented
in clang by generating IR that first loads the elements of the
structure into the first lane with the lane (to-single-lane)
intrinsics, and then propagates them to the other lanes. There are at
least two problems with this approach. First, there are no
double-spaced to-single-lane byte-element instructions. For
example, there is no such instruction as 'vld2.8 { d0[0], d2[0]
}, [r0]'. That means we cannot rely on the to-single-lane
intrinsics and instructions to implement the q versions of the
dup intrinsics. Note that to-all-lanes instructions do support
all sizes of data items, including bytes.

The second problem with the current approach is that we need a
separate vdup instruction to propagate the structure to each
lane. So for vld4q_dup_f16() we would need four vdup instructions
in addition to the initial vld instruction.

This patch introduces dup LLVM intrinsics and reworks handling of
the currently supported (non-q) NEON dup intrinsics to expand
them into those LLVM intrinsics, thus eliminating the need for
using to-single-lane intrinsics and instructions.

Additionally, this patch adds support for u64 and s64 dup NEON
intrinsics. These are marked as AArch64-only in the ARM NEON
Reference, but it seems there are no reasons to not support them
in AArch32 mode. Please correct, if that is wrong.

That's what we generate with this patch applied:

vld2q_dup_f16:
  vld2.16 {d0[], d2[]}, [r0]
  vld2.16 {d1[], d3[]}, [r0]

vld3q_dup_f16:
  vld3.16 {d0[], d2[], d4[]}, [r0]
  vld3.16 {d1[], d3[], d5[]}, [r0]

vld4q_dup_f16:
  vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
  vld4.16 {d1[], d3[], d5[], d7[]}, [r0]
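
A usage sketch of one of the new q-form intrinsics (illustrative; requires a
toolchain with FP16 storage support):

  #include <arm_neon.h>

  // With this patch the call below should lower to the vld2.16 pair shown
  // above rather than to per-lane loads plus vdup instructions.
  float16x8x2_t load_dup2(const float16_t *p) { return vld2q_dup_f16(p); }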

Differential Revision: https://reviews.llvm.org/D48439

llvm-svn: 335733
2018-06-27 13:57:52 +00:00
Tim Northover
63b281a203 ARM: convert ORR instructions to ADD where possible on Thumb.
Thumb has more 16-bit encoding space dedicated to ADD than ORR, allowing both a
3-address encoding and a wider range of immediates. So, particularly when
optimizing for code size (but it doesn't make things worse elsewhere), it's
beneficial to select an OR operation as an ADD if we know overflow won't occur.

This is made even better by LLVM's penchant for putting operations in canonical
form by converting the other way.
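
An illustrative case where the transform is safe (the operands' set bits are
disjoint, so no carry can occur):

  // The low two bits are cleared before the OR, so OR and ADD give the same
  // result and the richer 16-bit ADD encoding can be used.
  unsigned tag_pointer(unsigned p) { return (p & ~3u) | 1u; }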

llvm-svn: 335119
2018-06-20 12:09:44 +00:00
Ivan A. Kosarev
871056ae45 [NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47447

llvm-svn: 334361
2018-06-10 09:27:27 +00:00
Eli Friedman
204b6b7989 [ARM] Allow CMPZ transforms even if the input has multiple uses.
It looks like this got left in by accident in r289794; I can't think of
any reason this check would be necessary.  (Maybe it was meant to be a
check that the AND has one use? But we check that a few lines earlier.)

Differential Revision: https://reviews.llvm.org/D47921

llvm-svn: 334322
2018-06-08 21:16:56 +00:00
Ivan A. Kosarev
3a4bdaf295 [NEON] Support VLD1xN intrinsics in AArch32 mode (LLVM part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47120

llvm-svn: 333825
2018-06-02 16:40:03 +00:00
Ivan A. Kosarev
f49f73f5a8 Revert r333819 "[NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)"
The LLVM part was committed instead of the Clang part.

Differential Revision: https://reviews.llvm.org/D47121

llvm-svn: 333824
2018-06-02 16:38:38 +00:00
Ivan A. Kosarev
8c6f4bca6c [NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47121

llvm-svn: 333819
2018-06-02 16:26:42 +00:00
Adrian Prantl
076a6683eb Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Nirav Dave
3fdc3ca230 [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"
Reland ISel cycle checking improvements after simplifying node id
invariant traversal and correcting typo.

llvm-svn: 327898
2018-03-19 20:19:46 +00:00
Sjoerd Meijer
4c4a37be92 [ARM] Support for v4f16 and v8f16 vectors
This is the groundwork for adding the Armv8.2-A FP16 vector intrinsics, which
use v4f16 and v8f16 vector operands and return values. All the moving parts
are tested with two intrinsics, a 1-operand v8f16 and a 2-operand v4f16
intrinsic. In a follow-up patch the rest of the intrinsics and tests will be
added.

Differential Revision: https://reviews.llvm.org/D44538

llvm-svn: 327839
2018-03-19 13:35:25 +00:00
Nirav Dave
c89f3ddc27 Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172""
as it times out building test-suite on PPC.

llvm-svn: 327778
2018-03-17 19:24:54 +00:00
Nirav Dave
5cfea3a8a6 [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"
Reland ISel cycle checking improvements after simplifying and reducing
node id invariant traversal.

llvm-svn: 327777
2018-03-17 17:42:10 +00:00
Nirav Dave
fc86632712 Revert: r327172 "Correct load-op-store cycle detection analysis"
        r327171 "Improve Dependency analysis when doing multi-node Instruction Selection"
        r327170 "[DAG] Enforce stricter NodeId invariant during Instruction selection"

Reverting patch as NodeId invariant change is causing pathological
increases in compile time on PPC

llvm-svn: 327197
2018-03-10 02:16:15 +00:00
Nirav Dave
645ddf853a [DAG] Enforce stricter NodeId invariant during Instruction selection
Instruction Selection makes use of the topological ordering of nodes
by node id (a node's operands have smaller node id than it) when doing
cycle detection.  During selection we may violate this property as a
selection of multiple nodes may induce a use dependence (and thus a
node id restriction) between two unrelated nodes. If a selected node
has an unselected successor, this may allow us to miss a cycle during
detection and make an invalid selection.

This patch fixes this by marking all unselected successors of a
selected node with a negated node id.  We avoid pruning on such negative
ids but can still reconstruct the original id for pruning.
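
A minimal sketch (not the actual SelectionDAG code) of that marking scheme:

  // Unselected successors of a just-selected node get a negated id, which
  // suppresses pruning while keeping the original id recoverable.
  inline int markPendingUpdate(int Id) { return Id > 0 ? -Id : Id; }
  inline int originalNodeId(int Id)    { return Id < 0 ? -Id : Id; }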

In-tree targets have been updated to replace DAG-level replacements
with ISel-level ones which enforce this property.

This preemptively fixes PR36312 before the triggering commit r324359 relands

Reviewers: craig.topper, bogner, jyknight

Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D43198

llvm-svn: 327170
2018-03-09 20:57:15 +00:00
Florian Hahn
06a6d27ef3 [ARM] Fix codegen for VLD3/VLD4/VST3/VST4 with WB
Code generation of VLD3, VLD4, VST3 and VST4 with register writeback is
broken due to 2 separate bugs:

1) VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register are missing
   rules to expand them to non pseudo MIR. These are selected for
   ARMISD::VLD3_UPD/VLD4_UPD with v1i64 vectors in SelectVLD.

2) Selection of the right VLD/VST instruction is broken for load and
   store of 3 and 4 v1i64 vectors. SelectVLD and SelectVST are called
   with MIR opcode for fixed writeback (ie increment is access size)
   and call getVLDSTRegisterUpdateOpcode() to select an opcode with
   register writeback if base register update is of a different size.
   Since getVLDSTRegisterUpdateOpcode() only knows about
   VLD1/VLD2/VST1/VST2, the call is currently conditional on the number
   of elements in the vector.

   However, VLD1/VST1 is selected by SelectVLD/SelectVST's caller for
   loads and stores of 3 or 4 v1i64 vectors. Therefore the opcode is not
   updated, which later leads to a fixed writeback instruction being
   constructed with an extra operand for the register writeback.

This patch addresses the two issues as follows:
- it adds the necessary mapping from VLD1d64TPseudoWB_register and
  VLD1d64QPseudoWB_register to VLD1d64Twb_register and
  VLD1d64Qwb_register respectively. Like for the existing _fixed
  variants, the cost of these is bumped for unaligned access.
- it changes the logic in SelectVLD and SelectVST to call isVLDfixed
  and isVSTfixed respectively to decide whether the opcode should be
  updated. It also reworks the logic and comments for pushing the
  writeback offset operand and r0 operand to clarify the logic:
  writeback offset needs to be pushed if it's a register writeback,
  r0 needs to be pushed if not and the instruction is a
  VLD1/VLD2/VST1/VST2.

Reviewers: rengolin, t.p.northover, samparker

Reviewed By: samparker

Patch by Thomas Preud'homme <thomas.preudhomme@arm.com>

Differential Revision: https://reviews.llvm.org/D42970

llvm-svn: 326570
2018-03-02 13:02:55 +00:00
Sjoerd Meijer
77cd5c40d2 [ARM] Armv8.2-A FP16 code generation (part 1/3)
This is the groundwork for Armv8.2-A FP16 code generation.

Clang passes and returns _Float16 values as floats, together with the required
bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318.
We will implement half-precision argument passing/returning lowering in the ARM
backend soon, but for now this means that this:

_Float16 sub(_Float16 a, _Float16 b) {
  return a + b;
}

gets lowered to this:

define float @sub(float %a.coerce, float %b.coerce) {
entry:
  %0 = bitcast float %a.coerce to i32
  %tmp.0.extract.trunc = trunc i32 %0 to i16
  %1 = bitcast i16 %tmp.0.extract.trunc to half
  <SNIP>
  %add = fadd half %1, %3
  <SNIP>
}

When FullFP16 is *not* supported, we don't make f16 a legal type, and we get
legalization for "free", i.e. nothing changes and everything works as before.
And also f16 argument passing/returning is handled.

When FullFP16 is supported, we do make f16 a legal type, and have 2 places that
we need to patch up: f16 argument passing and returning, which involves minor
tweaks to avoid unnecessary code generation for some bitcasts.

As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description showing
that we can codegen this instruction from IR, but more importantly, also to
some conversion instructions. These conversions were causing issue before in
the FP16 and FullFP16 cases.

I've also added match rules to the VLDRH and VSTRH descriptions, so that we can
actually compile the entire half-precision sub code example above. This showed
that these loads and stores had the wrong addressing mode specified: AddrMode5
instead of AddrMode5FP16, which turned out not to be implemented at all, so that
has also been added.

This is the minimal patch that shows all the different moving parts. In patch
2/3 I will add some efficient lowering of bitcasts, and in 3/3 I will add the
remaining Armv8.2-A FP16 instruction descriptions.


Thanks to Sam Parker and Oliver Stannard for their help and reviews!


Differential Revision: https://reviews.llvm.org/D38315

llvm-svn: 323512
2018-01-26 09:26:40 +00:00
Andre Vieira
a1f92109e8 [ARM] Fix erroneous availability of SMMLS for Armv7-M
Differential Revision: https://reviews.llvm.org/D41855

llvm-svn: 322360
2018-01-12 09:21:09 +00:00
David Blaikie
e01dc73ad2 Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

llvm-svn: 318490
2017-11-17 01:07:10 +00:00
Momchil Velikov
653fa328bc [ARM] Split Arm jump table branch into i12 and rs suffixed versions
This is a refactoring/cleanup of Arm `addrmode2` operand class. The patch
removes it completely.

Differential Revision: https://reviews.llvm.org/D39832

llvm-svn: 318291
2017-11-15 12:02:55 +00:00
Javed Absar
6663c348ab [ARM] Tidy up banked registers encoding
Moves encoding (SYSm) information of banked registers to ARMSystemRegister.td,
where it rightly belongs and forms a single point of reference in the code.

Reviewed by: @fhahn, @rovka, @olista01
Differential Revision: https://reviews.llvm.org/D36219

llvm-svn: 309910
2017-08-03 01:24:12 +00:00
Javed Absar
ca6d1aaf93 [ARM] Unify handling of M-Class system registers
This patch cleans up and fixes issues in the M-Class system register handling:

1. It defines the system registers and the encoding (SYSm values) in one place:
   a new ARMSystemRegister.td using SearchableTable, thereby removing the
   hand-coded values which existed in multiple places.

2. Some system registers that do not exist, e.g. BASEPRI_MAX_NS, were being allowed!
   Ref: ARMv6/7/8M architecture reference manual.

Reviewed by: @t.p.northover, @olista01, @john.brawn
Differential Revision: https://reviews.llvm.org/D35209

llvm-svn: 308456
2017-07-19 12:57:16 +00:00
Sam Parker
7f44c8f183 [ARM] Allow rematerialization of ARM Thumb literal pool loads
Constants are crucial for code size in the ARM Thumb-1 instruction
set. The 16 bit instruction size often does not offer enough space
for immediate arguments. This means that additional instructions are
frequently used to load constants into registers. Since constants are
hoisted, this can lead to significant register spillage if they are
used multiple times in a single function. This can be avoided by
rematerialization, i.e. recomputing a constant instead of reloading
it from the stack. This patch fixes the rematerialization of literal
pool loads in the ARM Thumb instruction set.

Patch by Philip Ginsbach

Differential Revision: https://reviews.llvm.org/D33936

llvm-svn: 308004
2017-07-14 08:23:56 +00:00
Tim Northover
5f903fb289 ARM: avoid handing a deleted node back to TableGen during ISel.
When we replaced the multiplicand, the destination node might already exist.
When that happens the original gets CSEd and deleted. However, it's actually
used as the offset so nonsense is produced.

Should fix PR32726.

llvm-svn: 301983
2017-05-02 22:45:19 +00:00
Tim Northover
cc3adfc204 ARM: handle post-indexed NEON ops where the offset isn't the access width.
Before, we assumed that any ConstantInt offset was precisely the access width,
so we could use the "[rN]!" form. ISelLowering only ever created that kind, but
further simplification during combining could lead to unexpected constants and
incorrect codegen.

Should fix PR32658.

llvm-svn: 300878
2017-04-20 19:54:02 +00:00
Benjamin Kramer
603b1014b9 Fix use-after-frees on memory allocated in a Recycler.
These will become ASan errors once the patch that poisons the memory
after free lands. The x86 change is a hack, but I don't see how to
solve this properly at the moment.

llvm-svn: 300867
2017-04-20 18:29:14 +00:00
Eli Friedman
226d22a6a5 [ARM] Use TableGen patterns to select vtbl. NFC.
Differential Revision: https://reviews.llvm.org/D32103

llvm-svn: 300749
2017-04-19 20:39:39 +00:00
Eli Friedman
12c7517fb8 [ARM] Replace some C++ selection code with TableGen patterns. NFC.
Differential Revision: https://reviews.llvm.org/D30794

llvm-svn: 297768
2017-03-14 18:43:37 +00:00
Sam Parker
1cba526c4c [ARM] Move SMULW[B|T] isel to DAG Combine
Create nodes for smulwb and smulwt and move their selection from
DAGToDAG to DAG combine. smlawb and smlawt can then be selected
using tablegen. Added some helper functions to detect shift patterns
as well as a wrapper around SimplifyDemandedBits. Added a couple of
extra tests.
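
An illustrative source pattern for the new smulwb node (the expected
instruction is inferred from the description above):

  // 32x16-bit multiply whose 48-bit product is used shifted down by 16; on a
  // DSP-capable target this should map to a single smulwb (smlawb adds an
  // accumulate on top).
  int mul_wide_bottom(int x, short y) {
    return (int)(((long long)x * y) >> 16);
  }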

Differential Revision: https://reviews.llvm.org/D30708

llvm-svn: 297716
2017-03-14 09:13:22 +00:00
Artyom Skrobov
2817f1c979 Refactor the multiply-accumulate combines to act on
ARMISD::ADD[CE] nodes, instead of the generic ISD::ADD[CE].

Summary:
This allows for some simplification because the combines
are no longer limited to just one go at the node before
it gets legalized into an ARM target-specific one.

Reviewers: jmolloy, rogfer01

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30401

llvm-svn: 297453
2017-03-10 12:41:33 +00:00
Artyom Skrobov
c1017f58cf In Thumb1 mode, the custom lowering for ARMISD::CMPZ could never emit tADDi3
Reviewers: jmolloy, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D30097

llvm-svn: 295478
2017-02-17 18:59:16 +00:00
John Brawn
b7fc3ce6ba [ARM] Fix incorrect mask bits in MSR encoding for write_register intrinsic
In the encoding of system registers in the M-class MSR instruction, the mask bits
should be 2 for registers that don't take a _<bits> qualifier (the instruction
is unpredictable otherwise), and should also be 2 if the register takes a
_<bits> qualifier but none is present, since omitting _<bits> is an alias for _nzcvq.
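
A hedged usage sketch (the __builtin_arm_wsr spelling is a Clang/ACLE detail,
shown only for illustration) of a write that goes through the write_register
intrinsic:

  // Writing an M-class system register that takes no _<bits> qualifier; the
  // resulting MSR is the encoding whose mask bits this patch corrects.
  void set_basepri(unsigned v) { __builtin_arm_wsr("basepri", v); }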

Differential Revision: https://reviews.llvm.org/D29828

llvm-svn: 294762
2017-02-10 17:41:08 +00:00
Eli Friedman
09b663436a [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.
Currently, there are substantial problems forming vld1_dup even if the
VDUP survives legalization. The lack of an actual node
leads to terrible results: not only can we not form post-increment vld1_dup
instructions, but we form scalar pre-increment and post-increment
loads which force the loaded value into a GPR. This patch fixes that
by combining the vdup+load into an ARMISD node before DAGCombine
messes it up.
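
A small usage sketch of the pattern being combined (expected selection noted
in the comment, assuming NEON is available):

  #include <arm_neon.h>

  // A scalar load splatted to all lanes: with ARMISD::VLD1DUP this can be
  // selected as a single "vld1.32 {d0[]}, [r0]", and the post-increment form
  // becomes reachable too.
  float32x2_t splat_load(const float *p) { return vld1_dup_f32(p); }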

Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).

Recommiting with fix to avoid forming vld1dup if the type of the load
doesn't match the type of the vdup (see
https://llvm.org/bugs/show_bug.cgi?id=31404).

Differential Revision: https://reviews.llvm.org/D27694

llvm-svn: 289972
2016-12-16 18:44:08 +00:00
Nico Weber
b553ce3ab5 Revert 279703, it caused PR31404.
llvm-svn: 289923
2016-12-16 04:51:25 +00:00
Sjoerd Meijer
9792b09ed3 [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently
This is essentially a recommit of r285893, but with a correctness fix. The
problem of the original commit was that this:

bic r5, r7, #31
cbz r5, .LBB2_10

got rewritten into:

lsrs  r5, r7, #5
beq .LBB2_10

The result in destination register r5 is not the same and this is incorrect
when r5 is not dead. So this fix includes checking the uses of the AND
destination register. Also, compared to the original commit, some regression
tests no longer needed changing because of this extra check.

For completeness, this was the original commit message:

For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more
efficient instruction selection if the bitmask is one consecutive sequence of
set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

1) If the bitmask touches the LSB, then we can remove all the upper bits and
set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and
set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit
into the sign bit with one LSLS and change the condition query from NE/EQ to
MI/PL (we could also implement this by shifting into the carry bit and
branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower
zero bits of the mask.

1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two
16-bit instructions but can elide the CMP and doesn't require materializing a
complex immediate, so is also a win.
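
Illustrative sources for cases 1 and 2 (the shift amounts follow from the mask
widths):

  // Case 1: the mask 0x1F touches the LSB, so one LSLS by 27 sets the flags
  // and the CMP is elided.
  int low5_zero(unsigned x)  { return (x & 0x1Fu) == 0; }

  // Case 2: the mask 0xFFE00000 touches the MSB, so one LSRS by 21 does the
  // same from the other end.
  int top11_zero(unsigned x) { return (x & 0xFFE00000u) == 0; }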

Differential Revision: https://reviews.llvm.org/D27761

llvm-svn: 289794
2016-12-15 09:38:59 +00:00