llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00

Author	SHA1	Message	Date
Jay Foad	899f1c90ad	[GlobalISel] Remove ConstantFoldingMIRBuilder ConstantFoldingMIRBuilder was an experiment which is not used for anything. The constant folding functionality is now part of CSEMIRBuilder. Differential Revision: https://reviews.llvm.org/D101050	2021-04-23 09:13:27 +01:00
Daniel Kiss	30b326d46e	[AArch64] Fix for BTI landing pad insertion with PAC-RET+bkey. EMITBKEY is emitted for PAC-RET+bkey, which is not a machine instruction. PR: 49957 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D100996	2021-04-23 10:07:25 +02:00
KAWASHIMA Takahiro	47186c3ead	[LoopReroll] Fix rerolling loop with extra instructions Fixes PR47627 This fix suppresses rerolling a loop which has an unrerollable instruction. Sample IR for the explanation below: ``` define void @foo([2 x i32]* nocapture %a) { entry: br label %loop loop: ; base instruction %indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ] ; unrerollable instructions %stptrx = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %indvar, i64 0 store i32 999, i32* %stptrx, align 4 ; extra simple arithmetic operations, used by root instructions %plus20 = add nuw nsw i64 %indvar, 20 %plus10 = add nuw nsw i64 %indvar, 10 ; root instruction 0 %ldptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 0 %value0 = load i32, i32* %ldptr0, align 4 %stptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 0 store i32 %value0, i32* %stptr0, align 4 ; root instruction 1 %ldptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 1 %value1 = load i32, i32* %ldptr1, align 4 %stptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 1 store i32 %value1, i32* %stptr1, align 4 ; loop-increment and latch %indvar.next = add nuw nsw i64 %indvar, 1 %exitcond = icmp eq i64 %indvar.next, 5 br i1 %exitcond, label %exit, label %loop exit: ret void } ``` In the loop rerolling pass, `%indvar` and `%indvar.next` are appended to the `LoopIncs` vector in the `LoopReroll::DAGRootTracker::findRoots` function. Before this fix, two instructions with `unrerollable instructions` comment above are marked as `IL_All` at the end of the `LoopReroll::DAGRootTracker::collectUsedInstructions` function, as well as instructions with `extra simple arithmetic operations` comment and `loop-increment and latch` comment. It is incorrect because `IL_All` means that the instruction should be executed in all iterations of the rerolled loop but the `store` instruction should not. 
This fix rejects instructions which may have side effects and don't belong to def-use chains of any root instructions and reductions. See https://bugs.llvm.org/show_bug.cgi?id=47627 for more information.	2021-04-23 15:14:46 +09:00
Wang, Pengfei	83a6f34489	[X86][AMX][NFC] Avoid assert for the same immediate value The previous condition in the assert was too strict. We ought to allow the same immediate value to be loaded more than once. The intention of the assert is to check whether the same AMX register uses multiple different immediate shapes. So this fix is supposed to be NFC. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D101124	2021-04-23 12:17:00 +08:00
Wang, Pengfei	d73d62d45f	[X86][AMX] Try to hoist AMX shapes' def We request no intersections between AMX instructions and their shapes' def when we insert ldtilecfg. However, this is not always true, both because users may not follow the AMX API model and because of optimizations. This patch adds a mechanism that tries to hoist AMX shapes' def as well. It only hoists shapes inside a BB; we can improve it for cases across BBs in future. Currently, it only hoists shapes whose sources' defs are all above the first AMX instruction. We can improve this for the case where the only source below the AMX instruction is one that moves an immediate value to a register. Differential Revision: https://reviews.llvm.org/D101067	2021-04-23 12:17:00 +08:00
Wang, Pengfei	d7776e3283	[X86] Enable compilation of user interrupt handlers. Add __uintr_frame structure and use UIRET instruction for functions with x86 interrupt calling convention when UINTR is present. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D99708	2021-04-23 11:43:57 +08:00
Serguei Katkov	126d78bad9	[InlineSpiller] Clean-up isSpillCandBB This is mostly NFC except that for end of BB not previous slot is used. Idx is used to find a def of sibling live interval in that slot. The def on end of MBB and on previous slot of end MBB should be the same, so it should be NFC. Reviewers: reames, qcolombet, MatzeB, wmi, rnk Reviewed By: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D100922	2021-04-23 10:16:02 +07:00
Nico Weber	412c04b610	[gn build] (manually) port 0b2bc69ba29	2021-04-22 22:40:53 -04:00
Matt Arsenault	d002a57ca7	AMDGPU: Restore atomic fp feature on FP atomic instruction definitions 9931b1f7a4785b6a17fb87b81a3546d61d0cbca1 switched this to checking for the two specific subtargets, instead of the dedicated feature. This broke supporting functions which force added the feature when emitting targets that do not actually support them. This still does not work for the targets that use the gfx6/7 or gfx10 encodings.	2021-04-22 21:32:01 -04:00
Fangrui Song	c83fe04e08	[IR][sanitizer] Add module flag "frame-pointer" and set it for cc1 -mframe-pointer={non-leaf,all} The Linux kernel objtool diagnostic `call without frame pointer save/setup` arises in multiple instrumentation passes (asan/tsan/gcov). With the mechanism introduced in D100251, it's trivial to respect the command line -m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it. Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan) Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan) Also document the function attribute "frame-pointer" which is long overdue. Differential Revision: https://reviews.llvm.org/D101016	2021-04-22 18:07:30 -07:00
Levy Hsu	04656c7e3e	[RISCV] [1/2] Add IR intrinsic for Zbp extension RV32/64: grev grevi gorc gorci shfl shfli unshfl unshfli RV64 ONLY: grevw greviw gorcw gorciw shflw shfli (For non-existing shfliw) unshfli (For non-existing unshfliw) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100830	2021-04-22 16:34:51 -07:00
Keith Smiley	82ce0102d8	llvm-objdump: add --rpaths to macho support This prints the rpaths for the given binary Reviewed By: kastiglione Differential Revision: https://reviews.llvm.org/D100681	2021-04-22 16:01:10 -07:00
Heejin Ahn	5217fbac0b	[WebAssembly] Fix fixEndsAtEndOfFunction for delegate Background: CFGStackify's [[ `398f253400/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp (L1481-L1540)` \| fixEndsAtEndOfFunction ]] fixes block/loop/try's return type when the end of function is unreachable and the function return type is not void. So if a function returns i32 and `block`-`end` wraps the whole function, i.e., the `block`'s `end` is the last instruction of the function, the `block`'s return type should be i32 too: ``` block i32 ... end end_function ``` If there are consecutive `end`s, this signature has to be propagated to those blocks too, like: ``` block i32 ... block i32 ... end end end_function ``` This applies to `try`-`end` too: ``` try i32 ... catch ... end end_function ``` In case of `try`, we not only follow consecutive `end`s but also follow `catch`, because for the type of the whole `try` to be i32, both `try` and `catch` parts have to be i32: ``` try i32 ... block i32 ... end catch ... block i32 ... end end end_function ``` --- Previously we only handled consecutive `end`s or `end` before a `catch`. But now we have `delegate`, which serves like `end` for `try`-`delegate`. So we have to follow `delegate` too and mark its corresponding `try` as i32 (the function's return type): ``` try i32 ... catch ... try i32 ;; Here ... delegate N end end_function ``` Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D101036	2021-04-22 15:32:00 -07:00
Heejin Ahn	4405bf5794	[WebAssembly] Serialize params/results in MachineFunctionInfo This adds support for YAML serialization of `Params` and `Results` fields in `WebAssemblyMachineFunctionInfo`. Types are printed as `MVT`'s string representation. This makes writing MIR tests easier. The tests added are testing simple parsing and printing of `params` / `results` fields under `machineFunctionInfo`. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D101029	2021-04-22 15:31:09 -07:00
Heejin Ahn	37702c2638	[WebAssembly] Put utility functions in Utils directory (NFC) This CL 1. Creates Utils/ directory under lib/Target/WebAssembly 2. Moves existing WebAssemblyUtilities.cpp\|h into the Utils/ directory 3. Creates Utils/WebAssemblyTypeUtilities.cpp\|h and puts type declarations and type conversion functions scattered in various places into this single place. It has been suggested several times that it is not easy to share utility functions between subdirectories (AsmParser, Disassembler, MCTargetDesc, ...). Sometimes we ended up [[ https://reviews.llvm.org/D92840#2478863 \| duplicating ]] the same function because of this. There are already other targets doing this: AArch64, AMDGPU, and ARM have Utils/ subdirectory under their target directory. This extracts the utility functions into a single directory Utils/ and makes them sharable among all passes in WebAssembly/ and its subdirectories. Also I believe gathering all type-related conversion functionalities into a single place makes it more usable. (Actually I was working on another CL that uses various type conversion functions scattered in multiple places, which became the motivation for this CL.) Reviewed By: dschuff, aardappel Differential Revision: https://reviews.llvm.org/D100995	2021-04-22 15:29:43 -07:00
Craig Topper	7a82be4f0f	[RISCV] Fix crash with fptosi.sat/fptoui.sat intrinsics on RV64. Add test cases. Add PromoteIntOp_FP_TO_XINT_SAT to type legalize the bit width operand from i32 to i64 for RV64. Add test cases for the saturating intrinsics for half/float/double and i32/i64. CodeGen is definitely not optimal. We can probably make use of the native behavior of fcvt instructions in many cases. Fixes PR50083	2021-04-22 15:18:15 -07:00
Krzysztof Parzyszek	9949cdf248	[Hexagon] Improve lowering of returns of i1 Emit explicit any-extend to avoid weird tstbit sequences.	2021-04-22 16:47:52 -05:00
Elia Geretto	99885567cb	[dfsan] Fix Len argument type in call to __dfsan_mem_transfer_callback This patch is supposed to solve: https://bugs.llvm.org/show_bug.cgi?id=50075 The function `__dfsan_mem_transfer_callback` takes a `Len` argument of type `i64`; however, when processing a `MemTransferInst` such as `llvm.memcpy.p0i8.p0i8.i32`, the `len` argument has type `i32`. In order to make the type of `len` compatible with the one of the callback argument, this change zero-extends it when necessary. Reviewed By: stephan.yichao.zhao, gbalats Differential Revision: https://reviews.llvm.org/D101048	2021-04-22 21:12:20 +00:00
Nikita Popov	5320e959d4	[GVN] Generate LE and BE check lines (NFC) I accidentally dropped some check lines in my previous commit. Apparently update_test_checks no longer warns on label conflicts???	2021-04-22 22:44:08 +02:00
Nikita Popov	b67e56152d	[GVN] Regenerate test checks (NFC)	2021-04-22 22:38:41 +02:00
Krzysztof Parzyszek	b7fbfec6c7	[Hexagon] Use 'vnot' instead of 'not' in patterns with vectors 'not' expands to checking for an xor with a -1 constant. Since this looks for a ConstantSDNode it will never match for a vector. Co-authored-by: Craig Topper <craig.topper@sifive.com> Differential Revision: https://reviews.llvm.org/D100687	2021-04-22 15:36:20 -05:00
Arthur Eubanks	21048e7590	[GlobalOpt] Don't replace alias with aliasee if aliasee is interposable Both the alias and aliasee linkage are important. PR27866 provides some background. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99629	2021-04-22 13:12:34 -07:00
David Green	1c90e80182	[AArch64] Improve vector reverse lowering This improves the lowering of v8i16 and v16i8 vector reverse shuffles. Instead of going via a generic tbl it uses a rev64; ext pair, as already happens for v4i32. Differential Revision: https://reviews.llvm.org/D100882	2021-04-22 21:01:25 +01:00
Min-Yih Hsu	f97e6f4c0f	[M68k][Disassembler][NFC] Decorate dump methods with LLVM_DUMP_METHOD And guard them with proper macro conditions. NFC.	2021-04-22 12:02:07 -07:00
Min-Yih Hsu	bcae9fcd31	[M68k][AsmParser][NFC] Remove redundant default cases Remove redundant default cases since all enumeration values have been covered (-Wcovered-switch-default). NFC.	2021-04-22 11:50:48 -07:00
Kai Nacke	02764e0318	Fix the triple used in llvm-mca. lookupTarget() can update the passed triple argument. This happens when no triple is given on the command line, and the architecture argument does not match the architecture in the default triple. For example, passing -march=aarch64 on the command line, and the default triple being x86_64-windows-msvc, the triple is changed to aarch64-windows-msvc. However, this triple is not saved, and later in the code, the triple is constructed again from the triple name, which is the default triple at this point. Thus the default triple is passed to the constructor of the MCSubtargetInfo instance. The triple is only used to determine the object file format, and by chance, the AArch64 target also uses the COFF file format, and all is fine. Obviously, the AArch64 target does not support all available binary file formats, e.g. XCOFF and GOFF, and llvm-mca crashes in this case. The fix is to update the triple name with the changed triple name for the target lookup. Then the default object file format for the architecture is used, in the example ELF. Reviewed By: andreadb, abhina.sreeskantharajan Differential Revision: https://reviews.llvm.org/D100992	2021-04-22 14:27:09 -04:00
Vitaly Buka	639348c2c2	Revert "[sanitizer] Use COMPILER_RT_EMULATOR with gtests" Missed review comments. This reverts commit e25082961cb5aaafc817cb55593cf0ea8d3c4c22.	2021-04-22 11:15:55 -07:00
Philip Reames	555456c598	[SCEV] Compute ranges for lshr recurrences Straight forward extension to the recently added infrastructure which was pioneered with shl. Differential Revision: https://reviews.llvm.org/D99687	2021-04-22 11:06:31 -07:00
Philip Reames	8361e53fbe	Revert "[instcombine] Exploit UB implied by nofree attributes" This change effectively reverts 86664638, but since there have been some changes on top and I wanted to leave the tests in, it's not a mechanical revert. Why revert this now? Two main reasons: 1) There is continuing discussion around the semantics of nofree. I am getting increasingly uncomfortable with the seeming possibility we might redefine nofree in a way incompatible with these changes. 2) There was a reported miscompile triggered by this change (https://github.com/emscripten-core/emscripten/issues/9443). At first, I was making good progress on tracking down the issues exposed and those issues appeared to be unrelated latent bugs. Now that we've found at least one bug in the original change, and the investigation has stalled, I'm no longer comfortable leaving this in tree. In retrospect, I probably should have reverted this earlier and investigated the issues once the triggering change was out of tree.	2021-04-22 10:53:17 -07:00
Craig Topper	ae70485f74	[RISCV] Add IR intrinsics for vmsge(u).vv/vx/vi. These instructions don't really exist, but we have ways we can emulate them. .vv will swap operands and use vmsle(u).vv. .vi will adjust the immediate and use vmsgt(u).vi when possible. For .vx we need to use some of the multiple instruction sequences from the V extension spec. For unmasked vmsge(u).vx we use: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd For cases where mask and maskedoff are the same value then we have vmsge{u}.vx v0, va, x, v0.t which is the vd==v0 case that requires a temporary so we use: vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt For other masked cases we use this sequence: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0 We trust that register allocation will prevent vd in vmslt{u}.vx from being v0 since v0 is still needed by the vmxor. Differential Revision: https://reviews.llvm.org/D100925	2021-04-22 10:44:38 -07:00
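The three `vmsge{u}.vx` expansions described in the commit above can be checked with a small lane-by-lane model. This is only an illustrative sketch: masks are modelled as Python lists of booleans, the helper names (`vmslt`, `vmnand`, `vmxor`, `vmandnot`, and the three `vmsge_*` functions) are hypothetical stand-ins for the instructions named in the message, and inactive lanes of a masked `vmslt` are assumed to keep the `maskedoff` value.

```python
def vmslt(va, x, mask=None, maskedoff=None):
    """Model of vmslt.vx: lane i is va[i] < x.
    With a mask, inactive lanes keep maskedoff[i] (assumed mask-undisturbed)."""
    if mask is None:
        return [a < x for a in va]
    return [(a < x) if m else old for a, m, old in zip(va, mask, maskedoff)]


def vmnand(a, b):
    return [not (p and q) for p, q in zip(a, b)]


def vmxor(a, b):
    return [p != q for p, q in zip(a, b)]


def vmandnot(a, b):
    return [p and not q for p, q in zip(a, b)]


def vmsge_unmasked(va, x):
    # vmslt.vx vd, va, x; vmnand.mm vd, vd, vd
    vd = vmslt(va, x)
    return vmnand(vd, vd)


def vmsge_masked(va, x, v0, maskedoff):
    # vmslt.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
    vd = vmslt(va, x, mask=v0, maskedoff=maskedoff)
    return vmxor(vd, v0)


def vmsge_mask_is_maskedoff(va, x, v0):
    # vd == v0 case: vmslt.vx vt, va, x; vmandnot.mm vd, vd, vt
    vt = vmslt(va, x)
    return vmandnot(v0, vt)


def reference(va, x, mask, maskedoff):
    """What vmsge is meant to compute: va[i] >= x on active lanes."""
    return [(a >= x) if m else old for a, m, old in zip(va, mask, maskedoff)]
```

In this model, active lanes of the masked sequence hold `va < x`, so the `vmxor` with `v0` (which is 1 there) inverts them to `va >= x`, while inactive lanes are XORed with 0 and keep `maskedoff`.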
Craig Topper	295197497d	[RISCV] Add missing tests for vector type for second operand of vmsgt and vmsgtu IR intrinsics. Refactor to use new multiclass instead of individual patterns. We already supported this due to SEW=64 on RV32, but we didn't have test cases for all the types we supported. Part of D100925	2021-04-22 10:44:38 -07:00
Craig Topper	4b0b60fb85	[RISCV] Support vector type for second operand of vmfge and vmfgt IR intrinsics. We don't have instructions for these, but can swap the operands to use vmfle/vmflt. This makes the IR interface more consistent and simplifies the frontend implementation. Part of D100925	2021-04-22 10:44:38 -07:00
Vitaly Buka	cd55b6be92	[sanitizer] Use COMPILER_RT_EMULATOR with gtests Differential Revision: https://reviews.llvm.org/D100998	2021-04-22 10:33:50 -07:00
Fangrui Song	dcdc354ce1	Temporarily revert the code part of D100981 "Delete le32/le64 targets" This partially reverts commit 77ac823fd285973cfb3517932c09d82e6a32f46d. Halide uses le32/le64 (https://github.com/halide/Halide/pull/5934). Temporarily brings back the code part to give them some time for migration.	2021-04-22 10:18:44 -07:00
Craig Topper	ba2e1abe2f	[RISCV] Turn splat shuffles of vector loads into strided load with stride of x0. Implementations are allowed to optimize an x0 stride to perform fewer memory accesses. This is the case in SiFive cores. No idea if this is the case in other implementations. We might need a tuning flag for this. Reviewed By: frasercrmck, arcbbb Differential Revision: https://reviews.llvm.org/D100815	2021-04-22 10:02:57 -07:00
Craig Topper	c85259e1c3	[RISCV] Use stack temporary to splat two GPRs into SEW=64 vector on RV32. Rather than splatting each separately and doing bit manipulation to merge them in the vector domain, copy the data to the stack and splat it using a strided load with x0 stride. At least on some implementations this vector load is optimized to not do a load for each element. This is equivalent to how we move i64 to f64 on RV32. I've only implemented this for the intrinsic fallbacks in this patch. I think we do similar splatting/shifting/oring in other places. If this is approved, I'll refactor the others to share the code. Differential Revision: https://reviews.llvm.org/D101002	2021-04-22 09:50:07 -07:00
Krzysztof Parzyszek	e644e52d90	[Hexagon] Add HVX intrinsics for conditional vector loads/stores Intrinsics for the following instructions are added. The intrinsic name is "int_hexagon_<inst>[_128B]", e.g. int_hexagon_V6_vL32b_pred_ai for 64-byte version int_hexagon_V6_vL32b_pred_ai_128B for 128-byte version V6_vL32b_pred_ai if (Pv4) Vd32 = vmem(Rt32+#s4) V6_vL32b_pred_pi if (Pv4) Vd32 = vmem(Rx32++#s3) V6_vL32b_pred_ppu if (Pv4) Vd32 = vmem(Rx32++Mu2) V6_vL32b_npred_ai if (!Pv4) Vd32 = vmem(Rt32+#s4) V6_vL32b_npred_pi if (!Pv4) Vd32 = vmem(Rx32++#s3) V6_vL32b_npred_ppu if (!Pv4) Vd32 = vmem(Rx32++Mu2) V6_vL32b_nt_pred_ai if (Pv4) Vd32 = vmem(Rt32+#s4):nt V6_vL32b_nt_pred_pi if (Pv4) Vd32 = vmem(Rx32++#s3):nt V6_vL32b_nt_pred_ppu if (Pv4) Vd32 = vmem(Rx32++Mu2):nt V6_vL32b_nt_npred_ai if (!Pv4) Vd32 = vmem(Rt32+#s4):nt V6_vL32b_nt_npred_pi if (!Pv4) Vd32 = vmem(Rx32++#s3):nt V6_vL32b_nt_npred_ppu if (!Pv4) Vd32 = vmem(Rx32++Mu2):nt V6_vS32b_pred_ai if (Pv4) vmem(Rt32+#s4) = Vs32 V6_vS32b_pred_pi if (Pv4) vmem(Rx32++#s3) = Vs32 V6_vS32b_pred_ppu if (Pv4) vmem(Rx32++Mu2) = Vs32 V6_vS32b_npred_ai if (!Pv4) vmem(Rt32+#s4) = Vs32 V6_vS32b_npred_pi if (!Pv4) vmem(Rx32++#s3) = Vs32 V6_vS32b_npred_ppu if (!Pv4) vmem(Rx32++Mu2) = Vs32 V6_vS32Ub_pred_ai if (Pv4) vmemu(Rt32+#s4) = Vs32 V6_vS32Ub_pred_pi if (Pv4) vmemu(Rx32++#s3) = Vs32 V6_vS32Ub_pred_ppu if (Pv4) vmemu(Rx32++Mu2) = Vs32 V6_vS32Ub_npred_ai if (!Pv4) vmemu(Rt32+#s4) = Vs32 V6_vS32Ub_npred_pi if (!Pv4) vmemu(Rx32++#s3) = Vs32 V6_vS32Ub_npred_ppu if (!Pv4) vmemu(Rx32++Mu2) = Vs32 V6_vS32b_nt_pred_ai if (Pv4) vmem(Rt32+#s4):nt = Vs32 V6_vS32b_nt_pred_pi if (Pv4) vmem(Rx32++#s3):nt = Vs32 V6_vS32b_nt_pred_ppu if (Pv4) vmem(Rx32++Mu2):nt = Vs32 V6_vS32b_nt_npred_ai if (!Pv4) vmem(Rt32+#s4):nt = Vs32 V6_vS32b_nt_npred_pi if (!Pv4) vmem(Rx32++#s3):nt = Vs32 V6_vS32b_nt_npred_ppu if (!Pv4) vmem(Rx32++Mu2):nt = Vs32	2021-04-22 11:49:29 -05:00
Raphael Isemann	03a8dac306	Fix memory leak in MicrosoftDemangleNodes's Node::toString The buffer we turn into a std::string here is malloc'd and should be free'd before we return from this function. Follow up to LLDB leak fixes such as D100806. Reviewed By: mstorsjo, rupprecht, MaskRay Differential Revision: https://reviews.llvm.org/D100843	2021-04-22 18:44:30 +02:00
Jianzhou Zhao	94cf740f57	[dfsan] Track origin at loads The first version of origin tracking tracks only memory stores. Although this is sufficient for understanding correct flows, it is hard to figure out where an undefined value is read from. To find reads of undefined values, we still have to do a reverse binary search from the last store in the chain with printing and logging at possible code paths. This is quite inefficient. Tracking memory load instructions can help this case. The main issues of tracking loads are performance and code size overheads. With tracking only stores, the code size overhead is 38%, memory overhead is 1x, and cpu overhead is 3x. In practice #load is much larger than #store, so both code size and cpu overhead increase. The first blocker is code size overhead: link fails if we inline tracking loads. The workaround is using external function calls to propagate metadata. This is also the workaround ASan uses. The cpu overhead is ~10x. This is a trade off between debuggability and performance, and will be used only when debugging cases where tracking only stores is not enough. Reviewed By: gbalats Differential Revision: https://reviews.llvm.org/D100967	2021-04-22 16:25:24 +00:00
Irina Dobrescu	53c3a79b92	[flang][openmp] Add General Semantic Checks for Allocate Directive This patch adds semantic checks for the General Restrictions of the Allocate Directive. Since the requires directive is not yet implemented in Flang, the restriction: ``` allocate directives that appear in a target region must specify an allocator clause unless a requires directive with the dynamic_allocators clause is present in the same compilation unit ``` will need to be updated at a later time. A different patch will be made with the Fortran specific restrictions of this directive. I have used the code from https://reviews.llvm.org/D89395 for the CheckObjectListStructure function. Co-authored-by: Isaac Perry <isaac.perry@arm.com> Reviewed By: clementval, kiranchandramohan Differential Revision: https://reviews.llvm.org/D91159	2021-04-22 16:15:06 +00:00
Sanjay Patel	a4fce4e845	[x86] remove stale comment from test file; NFC	2021-04-22 12:11:47 -04:00
Alexey Bataev	dab3d7322e	[SLP]Skip undefs trying to find perfect/shuffled tree entries matching. We can skip check for undefs trying to find perfect/shuffled tree entries matching, they can be ignored completely improving the final cost/vectorization results. Differential Revision: https://reviews.llvm.org/D101061	2021-04-22 08:59:07 -07:00
Hongtao Yu	713ed720d1	[llvm-profgen] A couple tweaks to the testing harness. 1. Remove unnecessary filtering code. 2. Add llvm-profgen to tool substitutions. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D101006	2021-04-22 08:57:14 -07:00
Coplin, Jared	5e52d75c0d	[Hexagon] Unmasked and masked load pair to same base -> one load and selects	2021-04-22 10:15:46 -05:00
Joe Ellis	543196c49f	[AArch64] Block tryCombineToBSL combines for vectors wider than NEON There are no patterns for the AArch64ISD::BSP ISD node for anything other than NEON vectors at the moment. As a result, if we hit these combines for vectors wider than a NEON vector (such as what we might get with fixed length SVE) we will fail to lower. This patch simply prevents us from attempting the combines if the input vector type is too wide. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D100961	2021-04-22 15:09:13 +00:00
Joe Ellis	0427e8801a	[LoopVectorize] Fix bug where predicated loads/stores were dropped This commit fixes a bug where the loop vectoriser fails to predicate loads/stores when interleaving for targets that support masked loads and stores. Code such as: void foo(int *restrict data1, int *restrict data2) { int counter = 1024; while (counter--) if (data1[counter] > data2[counter]) data1[counter] = data2[counter]; } ... could previously be transformed in such a way that the predicated store implied by: if (data1[counter] > data2[counter]) data1[counter] = data2[counter]; ... was lost, resulting in miscompiles. This bug was causing some tests in llvm-test-suite to fail when built for SVE. Differential Revision: https://reviews.llvm.org/D99569	2021-04-22 15:05:54 +00:00
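To see why dropping the predicate on the store is a miscompile, the loop from the commit above can be modelled in a few lines. This is a rough sketch, not the vectoriser's actual output: `scalar_loop`, `masked_store` and `vectorized_loop` are hypothetical names, the vector factor `vf` is arbitrary, and `predicated=False` merely imitates a store whose mask has been lost.

```python
def scalar_loop(data1, data2):
    """Reference semantics of the C loop: store only where data1 > data2."""
    out = list(data1)
    for i in reversed(range(len(out))):
        if out[i] > data2[i]:
            out[i] = data2[i]
    return out


def masked_store(dst, values, mask, start):
    """Write values[lane] to dst[start + lane] only where mask[lane] is set."""
    for lane, (v, m) in enumerate(zip(values, mask)):
        if m:
            dst[start + lane] = v


def vectorized_loop(data1, data2, vf=4, predicated=True):
    """Process vf lanes at a time using a compare plus a masked store."""
    out = list(data1)
    for start in range(0, len(out), vf):
        a = out[start:start + vf]
        b = data2[start:start + vf]
        mask = [x > y for x, y in zip(a, b)]
        if not predicated:
            # Imitates the bug: the store executes on every lane.
            mask = [True] * len(a)
        masked_store(out, b, mask, start)
    return out
```

With the predicate in place the vector version matches the scalar loop; without it every element of `data1` is overwritten, including lanes where the condition was false.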
Alexey Bataev	be1b72b8b6	[SLP]Replace more `TTI` with `TTIRef`, NFC. To pacify MSVC buildbots.	2021-04-22 07:53:20 -07:00
Alexey Bataev	b09b5f35d0	[SLP]Added explicit ref to TargetTransformInfo to try to pacify MSVC buildbots, NFC.	2021-04-22 07:49:48 -07:00
Alexey Bataev	a899f9f408	[SLP]Improve cost model for the vectorized extractelements. 1. No need to call `areAllUsersVectorized` as later the cost is calculated only if the instruction has one use and gets vectorized. 2. Need to calculate the cost of the dead extractelement more precisely, taking the vector type of the vector operand, not the resulting vector type. Part of D57059. Differential Revision: https://reviews.llvm.org/D99980	2021-04-22 07:40:17 -07:00
Dávid Bolvanský	78fb52ea7c	[LoopIdiom] Added testcase for double memset (fixed in LLVM 12); NFC	2021-04-22 16:39:25 +02:00


214617 Commits