llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 20:23:11 +01:00

Author	SHA1	Message	Date
Sjoerd Meijer	e48baaecdb	Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit effc3b079927a6dd3084b4ff712ec07f926366f0, with the build problem fixed.	2021-02-15 11:33:00 +00:00
Sjoerd Meijer	dda4f352fa	Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit cd6de0e8de4a5fd558580be4b1a07116914fc8ed.	2021-02-15 11:01:23 +00:00
Sjoerd Meijer	c1c4d25c71	[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into getPreferredAddressingMode() so that we have one interface to steer LSR in generating the preferred addressing mode. Differential Revision: https://reviews.llvm.org/D96600	2021-02-15 10:44:15 +00:00
Fraser Cormack	adeeb489c1	[RISCV] Convert VSLIDE(UP\|DOWN) nodes to "VL" versions (NFC) This patch prepares the RISCV VSLIDEUP and VSLIDEDOWN custom nodes to ones carrying additional mask and vector-length operands. This is primarily so they can be used by both systems. This also takes the opportunity to create some helper functions to deal with the common task of getting the default (unmasked) VL operands. Reviewed By: craig.topper, arcbbb Differential Revision: https://reviews.llvm.org/D96505	2021-02-15 10:32:56 +00:00
Arlo Siemsen	d4eefe6819	Add ehcont section support In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling. This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker. This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag. The change includes a test that when the module flag is enabled the section is correctly generated. The set of exception continuation information includes returns from exceptional control flow (catchret in llvm). In order to collect catchret we: 1) Includes an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation, 2) Introduces a new machine function pass to insert and collect symbols at the start of each block, and 3) Combines these targets with the other EHCont targets that were already being collected. Change originally authored by Daniel Frampton <dframpto@microsoft.com> For more details, see MSVC documentation for `/guard:ehcont` https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D94835	2021-02-15 14:27:12 +08:00
Carl Ritson	730fd62aad	[AMDGPU] Add llvm.amdgcn.wqm.demote intrinsic Add intrinsic which demotes all active lanes to helper lanes. This is used to implement demote to helper Vulkan extension. In practice demoting a lane to helper simply means removing it from the mask of live lanes used for WQM/WWM/Exact mode. Where the shader does not use WQM, demotes just become kills. Additionally add llvm.amdgcn.live.mask intrinsic to complement demote operations. In theory llvm.amdgcn.ps.live can be used to detect helper lanes; however, ps.live can be moved by LICM. The movement of ps.live cannot be remedied without changing its type signature and such a change would require ps.live users to update as well. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D94747	2021-02-15 08:45:46 +09:00
Tony Tye	69551096db	[AMDGPU] Limit memory scope for scratch, LDS and GDS Changes for AMD GPU SIMemoryLegalizer: - Limit the memory scope to maximum supported by the scratch, LDS and GDS address spaces. - Improve assertion checking. - Correct toSIAtomicScope argument name. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D96643	2021-02-14 17:34:12 +00:00
Kazu Hirata	9458e140a1	[AMDGPU] Fix build breakage	2021-02-14 09:02:55 -08:00
Kazu Hirata	ff49587cde	[llvm] Use llvm::is_contained (NFC)	2021-02-14 08:36:20 -08:00
Ben Shi	fd0f322cf0	[AVR] Fix a bug in 16-bit shifts Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D96590	2021-02-14 11:54:55 +08:00
Craig Topper	9d5ee57922	[RISCV] Rename the RVVBaseAddr ComplexPattern to just BaseAddr and use it to merge some scalar load/store patterns too.	2021-02-13 12:01:51 -08:00
Heejin Ahn	4214250471	[WebAssembly] Fix rethrow's argument computation Previously we assumed `rethrow`'s argument was always 0, but it turned out `rethrow` follows the same rule as `br` or `delegate`: https://github.com/WebAssembly/exception-handling/pull/137 https://github.com/WebAssembly/exception-handling/issues/146#issuecomment-777349038 Currently `rethrow`s generated by our backend always rethrow the exception caught by the innermost enclosing catch, so this adds a function to compute that and replaces `rethrow`'s argument with its computed result. This also renames `EHPadStack` in `InstPrinter` to `TryStack`, because in CFGStackify we use `EHPadStack` to mean the range between `catch`~`end`, while in `InstPrinter` we used it to mean the range between `try`~`catch`, so choosing different names would look clearer. Doesn't contain any functional changes in `InstPrinter`. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D96595	2021-02-13 03:43:15 -08:00
Kazu Hirata	f0327193f8	[AMDGPU] Drop unnecessary const from a return type (NFC) Identified with readability-const-return-type.	2021-02-12 23:44:32 -08:00
Serge Pavlov	c5d859bfe2	[FPEnv][ARM] Implement lowering of llvm.set.rounding Differential Revision: https://reviews.llvm.org/D96501	2021-02-13 11:16:29 +07:00
Craig Topper	25ee8ece36	[RISCV] Move riscv_vfmv_v_f_vl patterns to RISCVInstrInfoVVLPatterns.td for consistency with riscv_vmv_v_x_vl. NFC	2021-02-12 16:08:27 -08:00
Craig Topper	542e18bed6	[RISCV] Add support for fixed vector fabs	2021-02-12 15:33:36 -08:00
Craig Topper	a3618cb881	[RISCV] Add support for fixed vector sqrt.	2021-02-12 15:33:29 -08:00
Jessica Paquette	1f262f8352	[AArch64][GlobalISel] Fold constants into G_GLOBAL_VALUE This is pretty much just ports `performGlobalAddressCombine` from AArch64ISelLowering. (AArch64 doesn't use the generic DAG combine for this.) This adds a pre-legalize combine which looks for this pattern: ``` %g = G_GLOBAL_VALUE @x %ptr1 = G_PTR_ADD %g, cst1 %ptr2 = G_PTR_ADD %g, cst2 ... %ptrN = G_PTR_ADD %g, cstN ``` And then, if possible, transforms it like so: ``` %g = G_GLOBAL_VALUE @x %offset_g = G_PTR_ADD %g, -min(cst) %ptr1 = G_PTR_ADD %offset_g, cst1 %ptr2 = G_PTR_ADD %offset_g, cst2 ... %ptrN = G_PTR_ADD %offset_g, cstN ``` Where min(cst) is the smallest out of the G_PTR_ADD constants. This means we should save at least one G_PTR_ADD. This also updates code in the legalizer + selector which assumes that G_GLOBAL_VALUE will never have an offset and adds/updates relevant tests. Differential Revision: https://reviews.llvm.org/D96624	2021-02-12 14:55:15 -08:00
Craig Topper	42bd4878de	[RISCV] Use a ComplexPattern to merge the PatFrags for removing unneeded masks on shift amounts. Rather than having patterns with and without an AND, use a ComplexPattern to handle both cases. Reduces the isel table by about 700 bytes.	2021-02-12 14:03:23 -08:00
Stanislav Mekhanoshin	525e98279b	[AMDGPU] Fix Windows build A trivial fix, 64 bit constant is 1ull, not 1ul on Windows. Fixed build broken by c0d7a8bc6241.	2021-02-12 12:30:52 -08:00
Amara Emerson	988eea1dc5	[GlobalISel] Propagate extends through G_PHIs into the incoming value blocks. This combine tries to do inter-block hoisting of extends of G_PHIs, into the originating blocks of the phi's incoming value. The idea is to expose further optimization opportunities that are normally obscured by the PHI. Some basic heuristics, and a target hook for AArch64 is added, to allow tuning. E.g. if the extend is used by a G_PTR_ADD, it doesn't perform this combine since it may be folded into the addressing mode during selection. There are very minor code size improvements on AArch64 -Os, but the real benefit is that it unlocks optimizations like AArch64 conditional compares on some benchmarks. Differential Revision: https://reviews.llvm.org/D95703	2021-02-12 11:52:52 -08:00
David Green	0c8d4b9ca1	[ARM] Optimize fp store of extract to integer store if already available. Given a floating point store from an extracted vector, with an integer VGETLANE that already exists, storing the existing VGETLANEu directly can be better for performance. As the value is known to already be in an integer registers, this can help reduce fp register pressure, removed the need for the fp extract and allows use of more integer post-inc stores not available with vstr. This can be a bit narrow in scope, but helps with certain biquad kernels that store shuffled vector elements. Differential Revision: https://reviews.llvm.org/D96159	2021-02-12 18:34:58 +00:00
Simon Pilgrim	8450c87080	[DAG] Move basic USUBSAT pattern matches from X86 to DAGCombine Begin transitioning the X86 vector code to recognise sub(umax(a,b) ,b) or sub(a,umin(a,b)) USUBSAT patterns to make it more generic and available to all targets. This initial patch just moves the basic umin/umax patterns to DAG, removing some vector-only checks on the way - these are some of the patterns that the legalizer will try to expand back to so we can be reasonably relaxed about matching these pre-legalization. We can handle the trunc(sub(..))) variants as well, which helps with patterns where we were promoting to a wider type to detect overflow/saturation. The remaining x86 code requires some cleanup first - some of it isn't actually tested etc. I also need to resurrect D25987. Differential Revision: https://reviews.llvm.org/D96413	2021-02-12 18:22:57 +00:00
Stanislav Mekhanoshin	485c24bc42	[AMDGPU] Allow accvgpr_read/write decode with opsel These two instructions are VOP3P and have op_sel_hi bits, however do not use op_sel_hi. That is recommended to set unused op_sel_hi bits to 1. However, we cannot decode both representations with 1 and 0 if bits are set to default value 1. If bits are set to be ignored with '?' initializer then encoding defaults them to 0. The patch is a hack to force ignored '?' bits to 1 on encoding for these instructions. There is still canonicalization happens on disasm print if incoming values are non-default, so that disasm output does not match binary input, but this is pre-existing problem for all instructions with '?' bits. Fixes: SWDEV-272540 Differential Revision: https://reviews.llvm.org/D96543	2021-02-12 10:04:47 -08:00
Akira Hatanaka	078c441e76	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-12 09:51:57 -08:00
Craig Topper	38a1f47551	[RISCV] Add support for integer fixed vector setcc I believe I've covered all orderings of splat operands here. Better canonicalization in lowering might help reduce this. I did not handle the immediate adjustments needed for set(u)gt/set(u)lt. Testing here is limited to byte types because the scalable vector type used for masks for the store is calculated assuming 8 byte elements. But for the setcc it's based on the element count of the container type for the setcc input. So they don't agree. We'll need to enhance D96352 to handle this I think. Differential Revision: https://reviews.llvm.org/D96443	2021-02-12 09:29:41 -08:00
Craig Topper	72741c7878	[RISCV] Add support for matching .vx and .vi forms of binary instructions for fixed vectors. Unlike scalable vectors, I'm only using a ComplexPattern for the immediate itself. The vmv_v_x is matched explicitly. We ignore the VL argument when matching a binary operator, but we do check it when matching splat directly. I left out tests for vXi64 as they fail on rv32 right now. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96365	2021-02-12 09:18:10 -08:00
Jay Foad	3fbdec87d1	[TableGen][GlobalISel] Allow duplicate RendererFns Allow different GICustomOperandRenderers to use the same RendererFn. This avoids the need for targets to define a bunch of identical C++ renderer functions with different names. Without this fix TableGen would have emitted code that tried to define the GICR enumeration with duplicate enumerators. Differential Revision: https://reviews.llvm.org/D96587	2021-02-12 15:05:32 +00:00
David Green	22b5fa9203	[ARM] Single source VMOVNT Our current lowering of VMOVNT goes via a shuffle vector of the form <0, N, 2, N+2, 4, N+4, ..>. That can of course also be a single input shuffle of the form <0, 0, 2, 2, 4, 4, ..>, where we use a VMOVNT to insert a vector into the top lanes of itself. This adds lowering of that case, re-using the existing isVMOVNMask. Differential Revision: https://reviews.llvm.org/D96065	2021-02-12 14:28:57 +00:00
Sanjay Patel	d1a8bb697a	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552	2021-02-12 08:13:50 -05:00
luxufan	7124b02d55	[RISCV] Change parseVTypeI function Change the parseVTypeI function so that the added vset instruction test cases report more concrete error messages. Differential Revision: https://reviews.llvm.org/D96218	2021-02-12 19:38:34 +08:00
Fraser Cormack	765390acdd	[RISCV] Add support for integer fixed min/max This patch extends the initial fixed-length vector support to include smin, smax, umin, and umax. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96491	2021-02-12 09:19:45 +00:00
Heejin Ahn	3c77486de7	[WebAssembly] Fix delegate's argument computation I previously assumed `delegate`'s immediate argument computation followed a different rule than that of branches, but we agreed to make it the same (https://github.com/WebAssembly/exception-handling/issues/146). This removes the need for a separate `DelegateStack` in both CFGStackify and InstPrinter. When computing the immediate argument, we use a different function for `delegate` computation because in MIR `DELEGATE`'s instruction's destination is the destination catch BB or delegate BB, and when it is a catch BB, we need an additional step of getting its corresponding `end` marker. Reviewed By: tlively, dschuff Differential Revision: https://reviews.llvm.org/D96525	2021-02-11 21:57:28 -08:00
Craig Topper	ef5423cf23	[RISCV] Add a pattern for a scalable vector mask vnot. We can use a vnand.mm with the same register for both inputs. This avoids materializing an all-ones constant with vmset.mm.	2021-02-11 15:34:58 -08:00
ShihPo Hung	934bc977ad	[RISCV] Initial support for insert/extract subvector This patch handles cast-like insert_subvector & extract_subvector in which case: 1. index starts from 0. 2. inserting a fixed-width vector into a scalable vector, or extracting a fixed-width vector from a scalable vector. Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D96352	2021-02-11 14:35:49 -08:00
Pengxuan Zheng	cd15616305	[AArch64] Adding Neon Sm3 & Sm4 Intrinsics This adds SM3 and SM4 Intrinsics support for AArch64, specifically: vsm3ss1q_u32 vsm3tt1aq_u32 vsm3tt1bq_u32 vsm3tt2aq_u32 vsm3tt2bq_u32 vsm3partw1q_u32 vsm3partw2q_u32 vsm4eq_u32 vsm4ekeyq_u32 Reviewed By: labrinea Differential Revision: https://reviews.llvm.org/D95655	2021-02-11 14:20:20 -08:00
Stanislav Mekhanoshin	8e1a6005f7	[AMDGPU] Fix promote alloca with double use in a same insn If we have an instruction where more than one pointer operands are derived from the same promoted alloca, we are fixing it for one argument and do not fix a second use considering this user done. Fix this by deferring processing of memory intrinsics until all potential operands are replaced. Fixes: SWDEV-271358 Differential Revision: https://reviews.llvm.org/D96386	2021-02-11 11:42:25 -08:00
Matt Arsenault	0ac1411820	AMDGPU: Restrict soft clause bundling at half of the available regs Fixes a testcase that was overcommitting large register tuples to a bundle, which the register allocator could not possibly satisfy. This was producing a bundle which used nearly all of the available SGPRs with a series of 16-dword loads (not all of which are freely available to use). This is a quick hack for some deeper issues with how the clause bundler tracks register pressure. Overall the pressure tracking used here doesn't make sense and is too imprecise for what it needs to avoid the allocator failing. The pressure estimate does not account for the alignment requirements of large SGPR tuples, so this was really underestimating the pressure impact. This also ignores the impact of the extended live range of the use registers after the bundle is introduced. Additionally, it didn't account for some wide tuples not being available due to reserved registers. This regresses a few cases. These end up introducing more spilling. This is also a function of the global pressure being used in the decision to bundle, not the local pressure impact of the bundle itself.	2021-02-11 14:08:59 -05:00
Yonghong Song	18184b22d0	BPF: Add LLVMAnalysis in CMakefile LINK_COMPONENTS buildbot reported a build error like below: BPFTargetMachine.cpp:(.text._ZN4llvm19TargetTransformInfo5ModelINS_10BPFTTIImplEED2Ev [_ZN4llvm19TargetTransformInfo5ModelINS_10BPFTTIImplEED2Ev]+0x14): undefined reference to `llvm::TargetTransformInfo::Concept::~Concept()' lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BPFTargetMachine.cpp.o: In function `llvm::TargetTransformInfo::Model<llvm::BPFTTIImpl>::~Model()': Commit a260ae716030 ("BPF: Implement TTI.IntImmCost() properly") added TargetTransformInfo to BPF, which requires LLVMAnalysis dependence. In certain cmake configurations, lacking explicit LLVMAnalysis dependency may cause compilation error. Similar to other targets, this patch added LLVMAnalysis in CMakefile LINK_COMPONENTS explicitly.	2021-02-11 10:24:22 -08:00
Jay Foad	2f91cc65b2	[AMDGPU] Better selection of base offset when merging DS reads/writes When merging a pair of DS reads or writes needs to materialize the base offset in a vgpr, choose a value that is aligned to as high a power of two as possible. This maximises the chance that different pairs can use the same base offset, in which case the base offset registers can be commoned up by MachineCSE. Differential Revision: https://reviews.llvm.org/D96421	2021-02-11 17:46:09 +00:00
Craig Topper	32e1c9e6be	[TargetLowering][RISCV][AArch64][PowerPC] Enable BuildUDIV/BuildSDIV on illegal types before type legalization if we can find a larger legal type that supports MUL. If we wait until the type is legalized, we'll lose information about the original type and need to use larger magic constants. This gets especially bad on RISCV64 where i64 is the only legal type. I've limited this to simple scalar types so it only works for i8/i16/i32 which are most likely to occur. For more odd types we might want to do a small promotion to a type where MULH is legal instead. Unfortunately, this does prevent some urem/srem+seteq matching since that still requires legal types. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96210	2021-02-11 09:43:13 -08:00
Craig Topper	5d204c33c0	[RISCV] Add support loads, stores, and splats of vXi1 fixed vectors. This refines how we determine which masks types are legal and adds support for loads, stores, and all ones/zeros splats. I left a fixme in store handling where I think we need to zero extra bits if the type isn't a multiple of a byte. If I remember right from X86 there was some case we could have a store of a 1, 2, or 4 bit mask and have a scalar zextload that then expected the bits to be 0. Its tricky to zero the bits with RVV. We need to do something like round VL up, zero a register, lower the VL back down, then do a tail undisturbed move into the zero register. Another option might be to generate a mask of 1/2/4 bits set with a VL of 8 and use that to mask off the bits. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96468	2021-02-11 09:13:16 -08:00
Yonghong Song	d55275bb4b	BPF: Implement TTI.IntImmCost() properly This patch implemented TTI.IntImmCost() properly. Each BPF insn has 32bit immediate space, so for any immediate which can be represented as 32bit signed int, the cost is technically free. If an int cannot be represented as a 32bit signed int, a ld_imm64 instruction is needed and a TCC_Basic is returned. This change is motivated when we observed that several bpf selftests failed with latest llvm trunk, e.g., #10/16 strobemeta.o:FAIL #10/17 strobemeta_nounroll1.o:FAIL #10/18 strobemeta_nounroll2.o:FAIL #10/19 strobemeta_subprogs.o:FAIL #96 snprintf_btf:FAIL The failure is caused by SpeculateAroundPHIsPass doing an aggressive transformation that alters control flow in a way the verifier currently cannot handle well. In llvm12, SpeculateAroundPHIsPass is not called. SpeculateAroundPHIsPass relied on TTI.getIntImmCost() and TTI.getIntImmCostInst() for profitability analysis. This patch implemented TTI.getIntImmCost() properly for BPF backend which also prevented transformation which caused the above test failures. Differential Revision: https://reviews.llvm.org/D96448	2021-02-11 08:35:25 -08:00
David Green	91764313a1	[ARM] Add CostKind to getMVEVectorCostFactor. This adds the CostKind to getMVEVectorCostFactor, so that it can automatically account for CodeSize costs, where it returns a cost of 1 not the MVEFactor used for Throughput/Latency. This helps simplify the caller code and allows us to get the codesize cost more correct in more cases.	2021-02-11 15:33:59 +00:00
Simon Tatham	84aee28dc2	[ARM] Copy-paste error in ARMv87a architecture definition. In the tablegen architecture definition, the Name field for the ARMv87a record read "ARMv86a". All the other records contain their own names. Corrected it to "ARMv87a", and added the necessary value in ARMArchEnum for that to refer to. Reviewed By: pratlucas Differential Revision: https://reviews.llvm.org/D96493	2021-02-11 13:35:56 +00:00
David Green	71e1b7feea	[ARM] Change getScalarizationOverhead overload used in gather costs. NFC This changes which of the getScalarizationOverhead overloads is used in the gather/scatter cost to use the base variant directly, not relying on the version using heuristics on the number of args with no args provided. It should still produce the same costs for scalarized gathers/scatters.	2021-02-11 11:58:55 +00:00
Carl Ritson	fb4b457dbd	[AMDGPU] Move kill lowering to WQM pass and add live mask tracking Move implementation of kill intrinsics to WQM pass. Add live lane tracking by updating a stored exec mask when lanes are killed. Use live lane tracking to enable early termination of shader at any point in control flow. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D94746	2021-02-11 20:31:29 +09:00
David Green	c7af7bf359	[ARM] Remove dead mov's in preheader of tail predicated loops With t2DoLoopDec we can be left with some extra MOV's in the preheaders of tail predicated loops. This removes them, in the same way we remove other dead variables. Differential Revision: https://reviews.llvm.org/D91857	2021-02-11 10:48:20 +00:00
Sander de Smalen	e6c099942e	[TTI] Change TargetTransformInfo::getMinimumVF to return ElementCount This will be needed in the loop-vectorizer where the minimum VF requested may be a scalable VF. getMinimumVF now takes an additional operand 'IsScalableVF' that indicates whether a scalable VF is required. Reviewed By: kparzysz, rampitec Differential Revision: https://reviews.llvm.org/D96020	2021-02-11 09:08:48 +00:00
David Green	a86aaa927d	[ARM] Make a BE predicate bitcast consistent with the rest of llvm We were storing predicate registers, such as a <8 x i1>, in the opposite order to how the rest of llvm expects. This actually turns out to be correct for the one place that usually uses it - the ScalarizeMaskedMemIntrin pass, but only because the pass was incorrect itself. This fixes the order so that bits are stored in the opposite order and bitcasts work as expected. This allows the Scalarization pass to be fixed, as in https://reviews.llvm.org/D94765. Differential Revision: https://reviews.llvm.org/D94867	2021-02-11 08:59:52 +00:00
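
The cost rule described in commit d55275bb4b (BPF: Implement TTI.IntImmCost() properly) can be illustrated with a minimal standalone sketch. It is not the code from the patch: `SketchCost`, `SketchFree`, `SketchBasic`, and `sketchIntImmCost` are hypothetical names introduced here, standing in for LLVM's `TCC_Free`/`TCC_Basic` constants and the BPF TTI hook. It only encodes the stated rule that an immediate representable as a signed 32-bit integer is free, while anything wider needs a `ld_imm64` and gets a basic cost.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for the TCC_Free / TCC_Basic cost constants
// mentioned in the commit message; not LLVM's actual declarations.
enum SketchCost { SketchFree = 0, SketchBasic = 1 };

// Sketch of the rule from the commit message: an immediate that fits
// in a signed 32-bit field costs nothing; a wider one would need a
// ld_imm64 instruction, so it is charged a basic cost.
SketchCost sketchIntImmCost(std::int64_t Imm) {
  const std::int64_t Lo = std::numeric_limits<std::int32_t>::min();
  const std::int64_t Hi = std::numeric_limits<std::int32_t>::max();
  if (Imm >= Lo && Imm <= Hi)
    return SketchFree;
  return SketchBasic;
}
```

For example, `sketchIntImmCost(-1)` falls in the free range, while `sketchIntImmCost(std::int64_t{1} << 32)` does not fit in 32 bits and is charged the basic cost.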

61364 Commits