llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 19:12:56 +02:00

Author	SHA1	Message	Date
David Blaikie	a7f9ea705a	Make llvm::function_ref's operator bool explicit This can avoid all sorts of mistakes with implicit conversion (indirectly) to int, etc. I'm quite surprised there aren't any things to fix up with this - but I guess most uses of function_ref aren't optional/nullable.	2020-03-26 20:09:57 -07:00
Zakk Chen	028706e67a	Fix typo, targetFeature should be lowercase. This fix also enables llc -mattr=+cpuhelp Reviewers: ziangwan, kongyi Reviewed By: kongyi Tags: #llvm Differential Revision: https://reviews.llvm.org/D76757	2020-03-26 19:40:04 -07:00
Kai Wang	78d8129469	[NFC] Clang format for the ELF header and ARM build attributes. Differential Revision: https://reviews.llvm.org/D76819	2020-03-27 09:53:12 +08:00
Leonard Chan	583d100083	Move setBugReportMsg() out from under a conditional Fixes a build break with LLVM_ENABLE_BACKTRACES=OFF. Differential Revision: https://reviews.llvm.org/D76893	2020-03-26 16:39:03 -07:00
Cyndy Ishida	dbf20123ca	[llvm][TextAPI/MachO] silence clang-tidy warnings, NFC * applies only to tests	2020-03-26 16:32:04 -07:00
Dan Gohman	b72287bdfc	[WebAssembly] Support wasm exports with zero-length names. Zero-length strings are valid export names in WebAssembly, so allow users to specify them. Differential Revision: https://reviews.llvm.org/D71793	2020-03-26 16:20:43 -07:00
Dan Gohman	d964845ffe	[WebAssembly] Fix the order of destructors in the LowerGlobalDtors pass. Fix the LowerGlobalDtors pass to run destructors in the same order as the regular LLVM destructor lowering -- in reverse order. Adjacent destructors with the same associated object are grouped, but destructors are not reordered based on associated objects. Differential Revision: https://reviews.llvm.org/D70685	2020-03-26 16:19:02 -07:00
Stanislav Mekhanoshin	f9ca869bd0	[AMDGPU] Propagate amdgpu-waves-per-eu to callees Differential Revision: https://reviews.llvm.org/D76868	2020-03-26 14:43:44 -07:00
LLVM GN Syncbot	c9d270bf8d	[gn build] Port 9f7d4150b9e	2020-03-26 21:10:45 +00:00
Craig Topper	26b3a1d698	[X86] Move combineLoopMAddPattern and combineLoopSADPattern to an IR pass before SelectionDAG. These transforms rely on a vector reduction flag on the SDNode set by SelectionDAGBuilder. This flag exists because SelectionDAG can't see across basic blocks so SelectionDAGBuilder is looking across and saving the info. X86 is the only target that uses this flag currently. By removing the X86 code we can remove the flag and the SelectionDAGBuilder code. This patch adds a dedicated IR pass for X86 that looks across the blocks and transforms the IR into a form that the X86 SelectionDAG can finish. An advantage of this new approach is that we can enhance it to shrink the phi nodes and final reduction tree based on the zeroes that we need to concatenate to bring the partially reduced reduction back up to the original width. Differential Revision: https://reviews.llvm.org/D76649	2020-03-26 14:10:20 -07:00
Simon Pilgrim	a8c9ffb590	[X86] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on all targets Extends rG9d1721ce3926 to support AVX2+ targets.	2020-03-26 20:46:24 +00:00
Jay Foad	fa0f9a79a6	[AMDGPU] Rename overloaded getMaxWavesPerEU to getWavesPerEUForWorkGroup Summary: I think Max in the name was misleading. NFC. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76860	2020-03-26 20:21:04 +00:00
Jay Foad	6f92f87437	[AMDGPU] Remove getMaxWavesPerCU in favour of getWavesPerWorkGroup. Summary: These methods were identical. I chose to remove getMaxWavesPerCU because I think Max in the name was misleading. NFC. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76859	2020-03-26 20:21:04 +00:00
Derek Schuff	9362eda6c4	[WebAssembly] Clear frame base vreg in explicit-locals when stack pointer is dead Having an alloca in a function causes the stack pointer to be generated in the prolog, but if it's unused other than for debug info, explicit-locals will drop it and not allocate a local. In this case we need to reset the FrameBaseVreg. Differential Revision: https://reviews.llvm.org/D76784	2020-03-26 13:07:32 -07:00
Simon Pilgrim	f0bba60cca	[X86] lowerV16I8Shuffle - create v8i16 mask for PACKUS(AND(),AND()) patterns. We can improve computeKnownBits results by avoiding excess bitcasts. For this pattern we were doing: (v16i8 PACKUS(v8i16 BITCAST(v16i8 AND(V1, MASK)), v8i16 BITCAST(v16i8 AND(V2, MASK)))) By performing the MASK/AND with a v8i16 type and bitcasting V1/V2 directly we can help computeKnownBits see that the mask is clearing the upper bits and allows shuffle combining to peek through later on. This will be necessary to extend rG9d1721ce3926 to AVX2+ targets in a future patch.	2020-03-26 19:59:57 +00:00
diggerlin	fd7184191d	[AIX] discard the label in the csect of function description and use qualname for linkage SUMMARY: for a source file "test.c" void foo() {}; llc will generate assembly code as (assembly patch) .globl foo .globl .foo .csect foo[DS] foo: .long .foo .long TOC[TC0] .long 0 and symbol table as (xcoff object file) [4] m 0x00000004 .data 1 unamex foo [5] a4 0x0000000c 0 0 SD DS 0 0 [6] m 0x00000004 .data 1 extern foo [7] a4 0x00000004 0 0 LD DS 0 0 After first patch, the assembly will be as .globl foo[DS] # -- Begin function foo .globl .foo .align 2 .csect foo[DS] .long .foo .long TOC[TC0] .long 0 and symbol table will be as [6] m 0x00000004 .data 1 extern foo [7] a4 0x00000004 0 0 DS DS 0 0 Change the code for the assembly path and xcoff objectfile patch for llc. Reviewers: Jason Liu Subscribers: wuzish, nemanjai, hiraditya Differential Revision: https://reviews.llvm.org/D76162	2020-03-26 15:46:52 -04:00
Michael Liao	58a5965510	[cuda][hip] Add CUDA builtin surface/texture reference support. Summary: - Even though the bindless surface/texture interfaces are promoted, there is still code using surface/texture references. For example, [PR#26400](https://bugs.llvm.org/show_bug.cgi?id=26400) reports the compilation issue for code using `tex2D` with texture references. For better compatibility, this patch proposes the support of surface/texture references. - Due to the absent documentation and magic headers, it's believed that `nvcc` does use builtins for texture support. From the limited NVVM documentation[^nvvm] and NVPTX backend texture/surface related tests[^test], it's believed that surface/texture references are supported by replacing their reference types, which are annotated with `device_builtin_surface_type`/`device_builtin_texture_type`, with the corresponding handle-like object types, `cudaSurfaceObject_t` or `cudaTextureObject_t`, in the device-side compilation. On the host side, the global handle variables are registered and will be established and updated later when corresponding binding/unbinding APIs are called[^bind]. Surface/texture references are much like device global variables but represented in different types on the host and device sides. - In this patch, the following changes are proposed to support that behavior: + Refine `device_builtin_surface_type` and `device_builtin_texture_type` attributes to be applied on `Type` decl only to check whether a variable is of the surface/texture reference type. + Add hooks in code generation to replace those reference types with the corresponding object types as well as all accesses to them. In particular, `nvvm.texsurf.handle.internal` should be used to load object handles from global reference variables[^texsurf] as well as metadata annotations. + Generate host-side registration with proper template argument parsing. 
--- [^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf [^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll [^bind]: See section 3.2.11.1.2 `Texture reference API` in [CUDA C Programming Guide](https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf). [^texsurf]: According to NVVM IR, `nvvm.texsurf.handle` should be used. But, the current backend doesn't have that supported. We may revise that later. Reviewers: tra, rjmccall, yaxunl, a.sidorin Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76365	2020-03-26 14:44:52 -04:00
Scott Linder	3a0c436c93	[AMDGPU] Fix PC register mapping in wave32 mode Summary: The PC_32 DWARF register is for a 32-bit process address space which we don't implement in AMDGCN; another way of putting this is that the size of the PC register is not a function of the wavefront size. If we ever implement a 32-bit process address space we will need to add two more DwarfFlavours i.e. we will need to represent the product of (wave32, wave64) x (64-bit address space, 32-bit address space). Tags: #llvm Differential Revision: https://reviews.llvm.org/D76732	2020-03-26 14:43:25 -04:00
David Blaikie	8238e95412	Roll otherwise unused subexpressions into an assertion	2020-03-26 11:32:33 -07:00
Sanjay Patel	b80ff02f88	[InstCombine] add shuffle-with-bitcast-operand tests; NFC	2020-03-26 14:28:47 -04:00
Guillaume Chatelet	bb1b18ffd2	[Alignment][NFC] Use llvmTargetFrameLowering::getStackAlign Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Reviewed By: courbet Subscribers: wuzish, arsenm, jyknight, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, fedor.sergeev, jrtc27, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76613	2020-03-26 18:15:53 +00:00
Jonathan Roelofs	f30eee7386	[InstCombine] Fix Incorrect fold of ashr+xor -> lshr w/ vectors Fixes https://bugs.llvm.org/show_bug.cgi?id=43665	2020-03-26 12:09:36 -06:00
Jinsong Ji	012f102459	[docs][Phabricator] git migration related update 1. Add instructions to update author when committing other's patch We have updated DeveloperPolicy to show how to change author in https://reviews.llvm.org/D72468 We should also update Phabricator page to include such information, in case people follow the steps here and forget to update author info. 2. Replace `git llvm push` with `git push` Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D76718	2020-03-26 18:08:06 +00:00
Jay Foad	50b935813e	[AMDGPU] Make use of divideCeil. NFC.	2020-03-26 16:11:35 +00:00
Jay Foad	a62c51da79	[AMDGPU] Remove unused methods. NFC.	2020-03-26 16:11:35 +00:00
Fangrui Song	d7fb6561f1	[llvm-objdump] Fix typo. NFC	2020-03-26 09:10:14 -07:00
Justin Hibbits	6bf4f263ac	[PowerPC]: Don't allow r0 as a target for LD_GOT_TPREL_L/32 Summary: The linker is free to relax this (relocation R_PPC_GOT_TPREL16) against R_PPC_TLS, if it sees fit (initial exec to local exec). If r0 is used, this can generate execution-invalid code (converts to 'addi %rX, %r0, FOO', which translates in PPC-lingo to 'li %rX, FOO'). Forbid this instead. This fixes static binaries using locales on FreeBSD/powerpc (tested on FreeBSD/powerpcspe). Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D76662	2020-03-26 10:59:28 -05:00
Simon Pilgrim	27633c0863	[X86][SSE] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on pre-AVX2 targets As discussed on PR31443, we should be trying to use PACKUS for binary truncation patterns to reduce the number of shuffles. The plan is to support AVX2+ targets once we've worked around PR45315 - we fail to peek through a VBROADCAST_LOAD mask to recognise zero upper bits in a PACKUS pattern. We should also be able to add support for v8i16 and possibly 256/512-bit vectors as well.	2020-03-26 15:47:43 +00:00
Fangrui Song	82533e3f53	[PPCInstPrinter] Change printBranchOperand(calltarget) to print the target address in hexadecimal form ``` // llvm-objdump -d output (before) 0: bl .-4 4: bl .+0 8: bl .+4 // llvm-objdump -d output (after) ; GNU objdump -d 0: bl 0xfffffffc / bl 0xfffffffffffffffc 4: bl 0x4 8: bl 0xc ``` Many `Operand`s are not annotated as OPERAND_PCREL. They are not affected (e.g. `b .+67108860`). I plan to fix them in future patches. Modified test/tools/llvm-objdump/ELF/PowerPC/branch-offset.s to test address space wraparound for powerpc32 and powerpc64. Reviewed By: sfertile, jhenderson Differential Revision: https://reviews.llvm.org/D76591	2020-03-26 08:32:29 -07:00
Fangrui Song	3d2508df1f	[X86InstPrinter] Change printPCRelImm to print the target address in hexadecimal form ``` // llvm-objdump -d output (before) 400000: e8 0b 00 00 00 callq 11 400005: e8 0b 00 00 00 callq 11 // llvm-objdump -d output (after) 400000: e8 0b 00 00 00 callq 0x400010 400005: e8 0b 00 00 00 callq 0x400015 // GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled 400000: e8 0b 00 00 00 callq 400010 400005: e8 0b 00 00 00 callq 400015 ``` In llvm-objdump, we pass the address of the next MCInst. Ideally we should just thread the address of the current instruction; unfortunately, we cannot call X86MCCodeEmitter::encodeInstruction (X86MCCodeEmitter requires MCInstrInfo and MCContext) to get the length of the MCInst. MCInstPrinter::printInst has other callers (e.g. llvm-mc -filetype=asm, llvm-mca) which set Address to 0. They leave MCInstPrinter::PrintBranchImmAsAddress as false and this change is a no-op for them. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D76580	2020-03-26 08:28:59 -07:00
Simon Cook	baf8a6dc1d	[RISCV] Support negative constants in CompressInstEmitter Summary: Some compressed instructions match against negative values; store immediates as a signed value such that these patterns will now match the intended instructions. Reviewers: asb, lenary, PaoloS Reviewed By: asb Subscribers: rbar, johnrusso, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, evandro, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76767	2020-03-26 15:23:38 +00:00
Fangrui Song	7f7bfe12ea	[MCInstPrinter] Pass `Address` parameter to MCOI::OPERAND_PCREL typed operands. NFC Follow-up of D72172 and D72180 This patch passes `uint64_t Address` to print methods of PC-relative operands so that subsequent target specific patches can change `*InstPrinter::print{Operand,PCRelImm,...}` to customize the output. Add MCInstPrinter::PrintBranchImmAsAddress which is set to true by llvm-objdump. ``` // Current llvm-objdump -d output aarch64: 20000: bl #0 ppc: 20000: bl .+4 x86: 20000: callq 0 // Ideal output aarch64: 20000: bl 0x20000 ppc: 20000: bl 0x20004 x86: 20000: callq 0x20005 // GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled aarch64: 20000: bl 20000 ppc: 20000: bl 0x20004 x86: 20000: callq 20005 ``` In `lib/Target/X86/X86GenAsmWriter1.inc` (generated by `llvm-tblgen -gen-asm-writer`): ``` case 12: // CALL64pcrel32, CALLpcrel16, CALLpcrel32, EH_SjLj_Setup, JCXZ, JECXZ, J... - printPCRelImm(MI, 0, O); + printPCRelImm(MI, Address, 0, O); return; ``` Some targets have 2 `printOperand` overloads, one without `Address` and one with `Address`. They should annotate derived `Operand` properly with `let OperandType = "OPERAND_PCREL"`. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D76574	2020-03-26 08:21:15 -07:00
LLVM GN Syncbot	bee7db5b02	[gn build] Port 2aac0c47aed	2020-03-26 15:16:51 +00:00
Dominik Montada	5df600714d	[GlobalISel] add helper function to create arbitrary libcalls Summary: The existing helper function can only create a libcall to functions available in RTLIB. Add a helper function that can create a libcall to a given function name using the provided calling convention. Reviewers: aditya_nandakumar, t.p.northover, rovka, arsenm, dsanders Reviewed By: arsenm Subscribers: wdng, hiraditya, volkan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76845	2020-03-26 16:11:13 +01:00
Louis Dionne	8126913310	[lit] NFC: Remove trailing whitespace I keep having to remove them from my diffs!	2020-03-26 11:05:18 -04:00
Qiu Chaofan	4e091a22bd	[Legalizer] Fix some missing flags in vector results In some scalarize/split result methods (unary, binary, ...), flags in SDNode were not passed down, which may lead to unexpected results in unsafe floating-point optimization. This patch fixes them. (maybe not complete) Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D76832	2020-03-26 22:01:19 +08:00
Sam Parker	33ab418b5a	[NFC] Create X86 subdirectory for indvar tests Many IndVarSimplify tests target an x86 triple, so move them into a target specific folder.	2020-03-26 12:24:45 +00:00
Aaron Ballman	e021b93f38	Clarify use of llvm_unreachable in the coding standard. There has been some ongoing confusion regarding when to use `llvm_unreachable` which this patch attempts to address. Specifically, the confusion has been around whether `llvm_unreachable` is intended to mark only unreachable code paths that the compiler cannot determine itself or to mark a code path which is unconditionally a bug to reach. Based on email and IRC discussions, it sounds like "unconditional bug to reach" is the consensus.	2020-03-26 08:08:23 -04:00
Simon Pilgrim	2950a4b04b	[X86][SSE] getFauxShuffleMask - peek through TRUNCATE/AEXT/ZEXT for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT()) As long as we extract from a source vector with smaller elements and we zero-extend the element in the final shuffle mask then we can safely peek through truncations and any/zero-extensions to find the source extraction.	2020-03-26 11:57:45 +00:00
Jonas Paulsson	e127891baf	[SystemZ] Bugfix in tieOpsIfNeeded() This function did a check which was broken to see if an opcode requires op0 and op1 to be tied. By chance this is NFC. Review: Ulrich Weigand	2020-03-26 12:22:14 +01:00
Georgii Rymar	edd888ac89	[obj2yaml] - Refactor how we dump sections. NFCI. This is an NFC change split from D75342. Previously obj2yaml never dumped a normal SHT_NULL section (i.e. when it is just zeroed) or non-allocatable SHT_STRTAB/SHT_SYMTAB/SHT_DYNSYM sections. This patch does not change the output, but it changes the logic so that we now dump these sections, and then remove them later. It allows us to create and work with our internal representation of sections, i.e. to work with the vector of Chunks, which looks cleaner. It is used by D75342 and also should help us to support dumping a content that does not belong to a section (i.e. to dump some data as `Fill` chunks). Differential revision: https://reviews.llvm.org/D76684	2020-03-26 14:04:07 +03:00
gbreynoo	b1f1108ed2	Tools emit the bug report URL on crash When Clang crashes a useful message is output: "PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script." A similar message is now output for all tools. Differential Revision: https://reviews.llvm.org/D74324	2020-03-26 10:26:59 +00:00
Kang Zhang	4130033848	[PowerPC] Remove the repeated definition for some InstAlias for mtspr/mfspr Summary: The InstAlias entries below have been defined more than once; this patch removes the repeated definitions. mtdec/mfdec mtsdr1/mfsdr1 mtsrr0/mfsrr0 mtsrr1/mfsrr1 mtasr Reviewed By: nemanjai, steven.zhang Differential Revision: https://reviews.llvm.org/D75821	2020-03-26 09:58:30 +00:00
James Henderson	0518ef06da	[NFC][llvm-readobj] Refactor unique warning handler The unique warning handler was previously a property of the dump style, but it is commonly used in the dumper too. Since the two ELF output styles have no impact on the way warnings are printed, this patch moves the handler and related functions into the dumper class, instead of the dump style class. Reviewed by: MaskRay, grimar Differential Revision: https://reviews.llvm.org/D76777	2020-03-26 09:54:55 +00:00
Cullen Rhodes	4020b155d6	[AArch64][SVE] Implement structured store intrinsics Summary: This patch adds initial support for the following intrinsics: * llvm.aarch64.sve.st2 * llvm.aarch64.sve.st3 * llvm.aarch64.sve.st4 For storing two, three and four vectors' worth of data. Basic codegen for reg+immediate forms is implemented. Reg+reg addressing modes will be addressed in a later patch. These intrinsics are intended for use in the Arm C Language Extension (ACLE). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D75947	2020-03-26 09:34:51 +00:00
Ties Stuij	74a8dfdced	[PATCH] [ARM] ARMv8.6-a command-line + BFloat16 Asm Support Summary: This patch introduces command-line support for the Armv8.6-a architecture and assembly support for BFloat16. Details can be found at https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a in addition to the GCC patch for the 8.6-a CLI: https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02647.html In detail this patch adds: - march options for armv8.6-a - BFloat16 assembly This is part of a patch series, starting with command-line and Bfloat16 assembly support. The subsequent patches will upstream intrinsics support for BFloat16, followed by Matrix Multiplication and the remaining Virtualization features of the armv8.6-a architecture. Based on work by: - labrinea - MarkMurrayARM - Luke Cheeseman - Javed Asbar - Mikhail Maltsev - Luke Geeson Reviewers: SjoerdMeijer, craig.topper, rjmccall, jfb, LukeGeeson Reviewed By: SjoerdMeijer Subscribers: stuij, kristof.beyls, hiraditya, dexonsmith, danielkiss, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D76062	2020-03-26 09:17:20 +00:00
Simon Tatham	b77c309b4c	Do export symbols when LLVM_EXPORT_SYMBOLS_FOR_PLUGINS is on. Summary: In D76527, we stopped exporting symbols from clang, opt and llc unless the `LLVM_ENABLE_PLUGINS` cmake variable is true (which causes clang's own plugin collection to be built). But another reasonable build configuration is to ask clang to export its symbols for out-of-tree plugins to use, without building the in-tree ones. That is, you might set `LLVM_EXPORT_SYMBOLS_FOR_PLUGINS` without also setting `LLVM_ENABLE_PLUGINS` (at least if you're using MSVC, where you need to ask explicitly for the symbols to be exported). In that situation, the symbols should still be exported, but after D76527, they weren't being. Reviewers: efriedma, john.brawn Reviewed By: efriedma, john.brawn Subscribers: mgorny, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76760	2020-03-26 09:07:02 +00:00
David Green	0b35e38c5b	[ARM] Sink splats to vector float instructions Some MVE floating point instructions have gpr register variants that take the scalar gpr value and splat them to all lanes. In order to accept them in loops, the shuffle_vector and insert need to be sunk down into the loop, next to the instruction so that ISel can see the whole pattern. This does that sinking for FAdd, FSub, FMul and FCmp. The patterns for mul are slightly more constrained as there are no fms variants taking register arguments. Differential Revision: https://reviews.llvm.org/D76023	2020-03-26 09:02:18 +00:00
Craig Topper	e608f0b2f9	[X86] Update more intrinsic tests to prepare to extend D60940 to scalar fp. I want to extend D60940 to scalar FP which will prevent forming masked instructions if the arithmetic op has another use. To prepare for that, this patch updates tests to avoid repeating the operation multiple times with different masking.	2020-03-25 23:03:20 -07:00
Fangrui Song	5a9a269ca9	[InstCombine] Fix a code-sinking bug after D73832/f1a9efabcb9b - UserParent = PN->getIncomingBlock(I->use_begin()); + UserParent = PN->getIncomingBlock(SingleUse); The first use of I may be droppable (llvm.assume). When compiling llvm/lib/IR/AutoUpgrade.cpp with a bootstrapped clang with ThinLTO with minimized bitcode files, I see such a case in the function _ZN4llvm20UpgradeIntrinsicCallEPNS_8CallInstEPNS_8FunctionE clang -c -fthinlto-index=AutoUpgrade.o.thinlto.bc AutoUpgrade.bc -O3 Unfortunately it is really difficult to get a minimized reproducer.	2020-03-25 22:50:53 -07:00
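
Commits 82533e3f53 and 3d2508df1f above switch llvm-objdump from printing a raw PC-relative displacement to printing the resolved target address. The arithmetic implied by their before/after listings can be sketched as follows. This is a minimal illustration, not the LLVM implementation: `resolvePCRel`, `x86CallTarget` and `ppcBranchTarget` are hypothetical helper names, and the sketch assumes that x86 displacements are relative to the address of the next instruction while PowerPC displacements are relative to the branch itself, as the quoted output suggests.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): add a signed displacement to a base
// address and truncate the result to the target's address width.
inline uint64_t resolvePCRel(uint64_t base, int64_t disp, unsigned addrBits) {
  assert(addrBits == 32 || addrBits == 64);
  // Unsigned addition wraps modulo 2^64, which models address wraparound.
  uint64_t target = base + static_cast<uint64_t>(disp);
  if (addrBits == 32)
    target &= 0xffffffffULL; // keep only the low 32 bits on 32-bit targets
  return target;
}

// x86 (assumption): rel32 is relative to the end of the call instruction,
// i.e. the address of the next instruction.
inline uint64_t x86CallTarget(uint64_t instAddr, unsigned instLen,
                              int32_t rel32) {
  return resolvePCRel(instAddr + instLen, rel32, 64);
}

// PowerPC (assumption): the displacement is relative to the branch itself.
inline uint64_t ppcBranchTarget(uint64_t instAddr, int32_t disp,
                                unsigned addrBits) {
  return resolvePCRel(instAddr, disp, addrBits);
}
```

With these assumptions, the 5-byte `callq` at 0x400000 with displacement 0x0b resolves to 0x400010, and `bl .-4` at address 0 resolves to 0xfffffffc on powerpc32 and 0xfffffffffffffffc on powerpc64, matching the listings quoted in the commit messages.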


194053 Commits