llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

Author	SHA1	Message	Date
Sam Elliott	3627daccde	[RISCV] Lower llvm.trap and llvm.debugtrap Summary: Until this commit, these have lowered to a call to abort(). `llvm.trap()` now lowers to `unimp`, which should trap on all systems. `llvm.debugtrap()` now lowers to `ebreak`, which is exactly what this instruction is for. Reviewers: asb, luismarques Reviewed By: asb Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69390	2019-10-28 09:54:33 +00:00
Seiya Nuta	e73b1ca85f	[llvm-objcopy][MachO] Implement --only-section Reviewers: alexshap, rupprecht, jdoerfert, jhenderson Reviewed By: alexshap, rupprecht, jhenderson Subscribers: mgorny, jakehehrlich, abrachet, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65541	2019-10-28 16:00:20 +09:00
David Zarzycki	119adc8057	[X86] Fix 48/96 byte memcmp code gen Detect scalar ISD::ZERO_EXTEND generated by memcmp lowering and convert it to ISD::INSERT_SUBVECTOR. https://reviews.llvm.org/D69464	2019-10-28 08:41:45 +02:00
Craig Topper	699301587a	[X86] Use 64-bit version of source register in LowerPATCHABLE_EVENT_CALL and LowerPATCHABLE_TYPED_EVENT_CALL Summary: The PATCHABLE_EVENT_CALL uses i32 in the intrinsic. This results in the register allocator picking a 32-bit register. We need to use the 64-bit register when forming the MOV64rr instructions. Otherwise we print illegal assembly in the text output. I think prior to this it was impossible for SrcReg to be equal to DstReg so the NOP code was not reachable. While there use Register instead of unsigned. Also add a FIXME for what looks like a bug. Reviewers: dberris Reviewed By: dberris Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69365	2019-10-27 20:44:41 -07:00
Sanjay Patel	045dec12c6	[SDAG] fold insert_vector_elt with undef index Similar to: rG4c47617627fb This makes the DAG behavior consistent with IR's insertelement. https://bugs.llvm.org/show_bug.cgi?id=42689 I've tried to maintain test intent for AArch64 and WebAssembly by replacing undef index operands with something else.	2019-10-27 15:28:43 -04:00
Craig Topper	b8fb12ab1e	[LegalizeTypes] When promoting BITREVERSE/BSWAP don't take the shift amount into account when determining the shift amount VT. If the target's preferred shift amount VT can't hold any shift amount for the promoted VT, we should use i32. The specific shift amount shouldn't matter. The type will be adjusted later when the shift itself is type legalized. This avoids an assert in getNode. Fixes PR43820.	2019-10-27 12:20:35 -07:00
David Zarzycki	c0f48a3c7e	[X86] Prefer KORTEST on Knights Landing or later for memcmp() PTEST and especially the MOVMSK instructions are slow on Knights Landing or later. As a bonus, this patch increases instruction parallelism by emitting: KORTEST(PCMPNEQ(a, b), PCMPNEQ(c, d)) == 0 Instead of: KORTEST(AND(PCMPEQ(a, b), PCMPEQ(c, d))) == ~0 https://reviews.llvm.org/D69157	2019-10-26 21:14:57 +03:00
David Zarzycki	497fe9daff	[X86] NFC: expand inline memcmp test coverage 1) Adds SSE4.1 coverage. 2) Adds prefer-256-bit or not coverage. 3) Adds more power-of-two tests up to 512 bytes. 4) Adds power-of-two-minus-one tests to verify overlapping loads. 5) Adds power-of-two-plus-one-half tests (48, 96, 192, and 384). 6) Adds greater-than/less-than tests from 16 to 512 bytes. https://reviews.llvm.org/D69222	2019-10-26 21:14:57 +03:00
cdevadas	c0c0cbe1cb	[AMDGPU] Fix Vreg_1 PHI lowering in SILowerI1Copies. There is a minor flaw in the implementation of function lowerPhis. This function replaces values of regclass Vreg_1 (boolean values) involved in PHIs into an SGPR. Currently it iterates over the MBBs and performs an inplace lowering of PHIs and fails to lower any incoming value that itself is another PHI of Vreg_1 regclass. The failure occurs only when the MBB where the incoming PHI value belongs is not visited/lowered yet. To fix this problem, collect all Vreg_1 PHIs upfront and then perform the lowering. Differential Revision: https://reviews.llvm.org/D69182	2019-10-26 14:37:45 +05:30
Sanjay Patel	6fbcb75321	[SDAG] fold extract_vector_elt with undef index This makes the DAG behavior consistent with IR's extractelement after: rGb32e4664a715 https://bugs.llvm.org/show_bug.cgi?id=42689 I've tried to maintain test intent for WebAssembly. The AMDGPU test is trying to test for crashing or other bad behavior, but I'm not sure if that's possible after this change.	2019-10-25 19:27:26 -04:00
Stanislav Mekhanoshin	40bb4ec570	[AMDGPU] Enable SGPR copy folding That used to fail in the last testcase function because after %0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it was further folded into S_STORE_DWORD_IMM. Its legal effective subreg class is SReg_32 while instruction expects more restricted SReg_32_XM0_EXEC. However, SIInstrInfo::isLegalRegOperand() passed the legality check and it was caught in the verifier. Borrowed code from the verifier to check for RC legality. Differential Revision: https://reviews.llvm.org/D69445	2019-10-25 15:08:30 -07:00
Yonghong Song	b5fb93b933	[BPF] fix a CO-RE issue with -mattr=+alu32 Ilya Leoshkevich (<iii@linux.ibm.com>) reported an issue that with -mattr=+alu32 CO-RE has a segfault in BPF MISimplifyPatchable pass. The pattern that will be transformed by the MISimplifyPatchable pass looks like below: r5 = ld_imm64 @"b:0:0$0:0" r2 = ldw r5, 0 ... r2 ... // use r2 The pass will remove the intermediate 'ldw' instruction and replace all r2 with r5 like below: r5 = ld_imm64 @"b:0:0$0:0" ... r5 ... // use r5 Later, the ld_imm64 insn will be replaced with r5 = <patched immediate> for field relocation purpose. With -mattr=+alu32, the input code may become r5 = ld_imm64 @"b:0:0$0:0" w2 = ldw32 r5, 0 ... w2 ... // use w2 Replacing "w2" with "r5" is incorrect and will trigger compiler internal errors. To fix the problem, if the register class of ldw* dest register is sub_32, we just replace the original ldw* register with: w2 = w5 Directly replacing all uses of w2 with in-place constructed w5 for the use operand does not seem to work in all cases. The latest kernel will have -mattr=+alu32 on by default, so added this flag to all CORE tests. Tested with latest kernel bpf-next branch as well with this patch. Differential Revision: https://reviews.llvm.org/D69438	2019-10-25 14:27:25 -07:00
Sanjay Patel	cd60fd42a3	[x86] add tests for extractelement with undef index (PR42689); NFC	2019-10-25 17:22:37 -04:00
Jian Cai	66c120eb49	Revert "[ARM] Uses "Sun Style" syntax for section switching" This reverts commit 03de2f84fc4acf06c719cd007b5459c9d4d0a20c.	2019-10-25 14:03:07 -07:00
Matt Arsenault	86e74b8817	GlobalISel: Implement widenScalar for G_INSERT_VECTOR_ELT	2019-10-25 13:55:07 -07:00
Jian Cai	d5c7197a31	[ARM] Uses "Sun Style" syntax for section switching Summary: Support "Sun Style" syntax for section switching ("#alloc,#write" etc). https://bugs.llvm.org/show_bug.cgi?id=43759 Reviewers: peter.smith, eli.friedman, kristof.beyls, t.p.northover Reviewed By: peter.smith Subscribers: MaskRay, llozano, manojgupta, nickdesaulniers, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69296	2019-10-25 13:27:35 -07:00
Matt Arsenault	f79040be20	AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHG Custom lower this to a target instruction with the merge operands. I think it might be better to directly select this and emit a REG_SEQUENCE, but this would be more work since it would require splitting the tablegen patterns for these cases from the other atomics.	2019-10-25 13:11:09 -07:00
Changpeng Fang	5b1a7b5635	AMDGPU: Fix the broken dominator tree when creating waterfall loop for resource descriptor Summary: In loadSRsrcFromVGPR, if MBB is the same as Succ, Remainder is not the immediate dominator of Succ. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D69358	2019-10-25 13:08:04 -07:00
Daniel Sanders	d70c5273d1	[gicombiner] Add parse failure tests for defs/match	2019-10-25 12:56:49 -07:00
Amy Huang	c8e61a471b	Revert "Add an instruction marker field to the ExtraInfo in MachineInstrs." Reverting commit b85b4e5a6f8579c137fecb59a4d75d7bfb111f79 due to some buildbot failures/ out of memory errors.	2019-10-25 12:41:34 -07:00
Teresa Johnson	a378559c07	[LLD][ThinLTO] Handle GUID collision in import global processing Summary: If there is a GUID collision between two globals, checking the summary list from the import index to make assumptions can be dangerous. Do not assume that a GlobalValue that has a GlobalVarSummary actually is a GlobalVariable as it can be another GlobalValue with the same GUID that the summary is connected to. Patch by Joel Klinghed (the_jk@opera.com) Reviewers: evgeny777, tejohnson Reviewed By: tejohnson Subscribers: tejohnson, dblaikie, MaskRay, mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67322	2019-10-25 12:36:01 -07:00
Sanjay Patel	b45a90bb89	[CVP] add test for poison propagation bug (PR43802); NFC	2019-10-25 15:01:57 -04:00
Alexander Shaposhnikov	0acc35fd3b	[llvm-objcopy][MachO] Add support for min os version load commands Add support for min os version load commands. Test plan: make check-all Differential revision: https://reviews.llvm.org/D69419	2019-10-25 11:42:29 -07:00
Stanislav Mekhanoshin	8136d749b5	[AMDGPU] Fold AGPR reg_sequence initializers Differential Revision: https://reviews.llvm.org/D69413	2019-10-25 11:39:02 -07:00
vpykhtin	7d2c3bcc82	[AMDGPU] Disallow dpp combining for dpp instructions without Src2 operand (when Src2 is required) Differential revision: https://reviews.llvm.org/D69430	2019-10-25 21:30:37 +03:00
Sanjay Patel	e414386896	[DAGCombiner] widen zext of popcount based on target support zext (ctpop X) --> ctpop (zext X) This is a prerequisite step for canonicalizing in the other direction (narrow the popcount) in IR - PR43688: https://bugs.llvm.org/show_bug.cgi?id=43688 I'm not sure if any other targets are affected, but I found a missing fold for PPC, so added tests based on that. The reason we widen all the way to 64-bit in these tests is because the initial DAG looks something like this: t5: i8 = ctpop t4 t6: i32 = zero_extend t5 <-- created based on IR, but unused node? t7: i64 = zero_extend t5 Differential Revision: https://reviews.llvm.org/D69127	2019-10-25 14:10:51 -04:00
Austin Kerbow	2b4fed1026	AMDGPU/GlobalISel: Legalize FDIV16 Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69347	2019-10-25 11:07:17 -07:00
Sanjay Patel	4f884b9be0	[PowerPC] add test for popcnt with any_extend; NFC A zext-specific variation of this case is proposed in D69127.	2019-10-25 12:43:44 -04:00
Amy Huang	9f76a00d08	Add an instruction marker field to the ExtraInfo in MachineInstrs. Summary: Add instruction marker to MachineInstr ExtraInfo. This does almost the same thing as Pre/PostInstrSymbols, except that it doesn't create a label until printing instructions. This allows for labels to be put around instructions that are deleted/duplicated somewhere. Also undo the workaround in r375137. Reviewers: rnk Subscribers: MatzeB, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69136	2019-10-25 09:21:10 -07:00
Scott Linder	e3c14856ec	[AMDGPU] Remove update_llc_test_checks for a test The test split-arg-dbg-value.ll has a host-specific path in the full output captured by update_llc_test_checks. Fix for test failures introduced in https://reviews.llvm.org/D69402 Tags: #llvm	2019-10-25 11:47:33 -04:00
Luís Marques	ef490b4e67	[RISCV] Add support for half-precision floats Complete fp16 support by ensuring that load extension / truncate store operations are properly expanded. Reviewers: asb, lenary Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D69246	2019-10-25 14:02:02 +01:00
Petar Avramovic	f0e9c21e95	[MIPS GlobalISel] Select MSA vector generic and builtin fsqrt selectImpl is able to select G_FSQRT when we set bank for vector operands to fprb. Add detailed tests. Note: G_FSQRT is generated from llvm-ir intrinsics llvm.sqrt., and at the moment MIPS is not able to generate this intrinsic for vector type (some targets generate vector llvm.sqrt. from calls to a builtin function). __builtin_msa_fsqrt_<format> will be transformed into G_FSQRT in legalizeIntrinsic and selected in the same way. Differential Revision: https://reviews.llvm.org/D69376	2019-10-25 14:45:14 +02:00
georgerim	f4f8838607	[yaml2obj, obj2yaml] - Add support for SHT_NOTE sections. SHT_NOTE is the section that consists of namesz, descsz, type, name + padding, desc + padding data. This patch teaches yaml2obj, obj2yaml to dump and parse them. This patch implements the section how it is described here: https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-18048.html Which says: "For 64–bit objects and 32–bit objects, each entry is an array of 4-byte words in the format of the target processor" The official specification is different http://www.sco.com/developers/gabi/latest/ch5.pheader.html#note_section And says: "In 64-bit objects (files with e_ident[EI_CLASS] equal to ELFCLASS64), each entry is an array of 8-byte words in the format of the target processor. In 32-bit objects (files with e_ident[EI_CLASS] equal to ELFCLASS32), each entry is an array of 4-byte words in the format of the target processor" Since LLVM uses the first, 32-bit way, this patch follows it. Differential revision: https://reviews.llvm.org/D68983	2019-10-25 13:25:56 +03:00
georgerim	0dde941ced	[obj2yaml] - Better dumping for relocations without symbols associated. This just reorders the code and removes an assignment of an empty string for the case when a relocation has no symbol associated. With this our output becomes cleaner and shorter. Differential revision: https://reviews.llvm.org/D69255	2019-10-25 12:29:31 +03:00
georgerim	791ddaf763	[llvm/Object] - Fix the error message reported for a broken SHT_SYMTAB_SHNDX section. SHT_SYMTAB_SHNDX should have the same number of entries as the symbol table associated (https://www.sco.com/developers/gabi/latest/ch4.sheader.html) We currently can report the following message: "SHT_SYMTAB_SHNDX section has sh_size (24) which is not equal to the number of symbols (2)" It is just broken. This patch refines/fixes it. Differential revision: https://reviews.llvm.org/D69305	2019-10-25 12:19:46 +03:00
czhengsz	bf78d763b4	[PowerPC] [Peephole] fold frame offset by using index form to save add. renamable $x6 = ADDI8 $x1, -80 ;;; 0 is replaced with -80 renamable $x6 = ADD8 killed renamable $x6, renamable $x5 STW killed renamable $r3, 4, killed renamable $x6 :: (store 4 into %ir.14, !tbaa !2) After PEI there is a peephole opt opportunity to combine above -80 in ADDI8 with 4 in the STW to eliminate unnecessary ADD8. Expected result: renamable $x6 = ADDI8 $x1, -76 STWX killed renamable $r3, renamable $x5, killed renamable $x6 :: (store 4 into %ir.6, !tbaa !2) Reviewed by: stefanp Differential Revision: https://reviews.llvm.org/D66329	2019-10-25 04:13:30 -04:00
Kai Luo	7b7501b312	Test commit via git.	2019-10-25 01:36:55 +00:00
Joerg Sonnenberger	7fc5fc4254	Always flush pending errors in MCAsmParser This has become visible with the --fatal-warnings support.	2019-10-25 00:48:12 +02:00
Scott Linder	d6f495c298	[AMDGPU] Clean up update_llc_test_checks CodeGen tests Summary: Some tests have been hand edited without removing the update_llc_test_checks header, some have slightly outdated CHECK lines which still pass, and some have additional comments which update_llc_test_checks pushes towards the function body. Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69402	2019-10-24 17:35:33 -04:00
Akira Hatanaka	74a0f2314a	[ObjC][ARC] Check whether the return and parameter types of the old and new functions are compatible before upgrading a function call to an intrinsic call. Sometimes users insert calls to ARC runtime functions that are not compatible with the corresponding intrinsic functions (for example, 'i8* @objc_storeStrong' instead of 'void @objc_storeStrong'). Don't upgrade those calls. rdar://problem/56447127	2019-10-24 13:08:50 -07:00
Craig Topper	ad3a6f4017	[GlobalISel][AArch64][AMDGPU][X86] Teach LegalizationArtifactCombiner to combine trunc(g_constant). This allows X86 to properly form shift by immediate instructions since we require an 8-bit constant to match the imported SelectionDAG patterns.	2019-10-24 12:59:26 -07:00
Stanislav Mekhanoshin	6cd7c75463	[AMDGPU] Fix mfma scheduling crash An SUnit can be neither an instruction nor an SDNode. It is all null if it represents a nop. Fixed a crash on using SU->getInstr(). Differential Revision: https://reviews.llvm.org/D69395	2019-10-24 11:01:52 -07:00
Simon Tatham	8eb8e6e5b5	[InstCombine] Known-bits optimization for ARM MVE VADC. The MVE VADC instruction reads and writes the carry bit at bit 29 of the FPSCR register. The corresponding ACLE intrinsic is specified to work with an integer in which the carry bit is stored at bit 0. So if a user writes a code sequence in C that passes the carry from one VADC to the next, like this, s0 = vadcq_u32(a0, b0, &carry); s1 = vadcq_u32(a1, b1, &carry); then clang will generate IR for each of those operations that shifts the carry bit up into bit 29 before the VADC, and after it, shifts it back down and masks off all but the low bit. But in this situation what you really wanted was two consecutive VADC instructions, so that the second one directly reads the value left in FPSCR by the first, without wasting several instructions on pointlessly clearing the other flag bits in between. This commit explains to InstCombine that the other bits of the flags operand don't matter, and adds a test that demonstrates that all the code between the two VADC instructions can be optimized away as a result. Reviewers: dmgreen, miyuki, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67162	2019-10-24 16:33:13 +01:00
Simon Tatham	5205599760	[ARM] Add IR intrinsics for MVE VLD[24] and VST[24]. The VST2 and VST4 instructions take two or four vector registers as input, and store part of each register to memory in an interleaved pattern. They come in variants indicating which part of each register they store (VST20 and VST21; VST40 to VST43 inclusive); the intention is that issuing each of those variants in turn has the combined effect of loading or storing the whole set of registers to a memory block of equal size. The corresponding VLD2 and VLD4 instructions load from memory in the same interleaved format: each one overwrites only part of its output register set, and again, the idea is that if you use VLD4{0,1,2,3} or VLD2{0,1} together, you end up having written to the whole of each register. I've implemented the stores and loads quite differently. The loads were easiest to implement as a single intrinsic that expands to all four VLD4x instructions or both VLD2x, delivering four complete output registers. (Implementing each individual load as a separate instruction taking four input registers to partially overwrite is possible in theory, but pointless, and when I tried it, I found it would need extra work to get the register allocation not to be horrible.) Since that intrinsic delivers multiple outputs, it has to be instruction-selected in custom C++. But the store instructions are easier to model individually, because they don't overwrite any register at all and you can write a DAG Isel pattern in Tablegen for each one. Hence, my new intrinsic `int_arm_mve_vld4q` expands to four load instructions, delivers four full output vectors, and is handled by C++ code, whereas `int_arm_mve_vst4q` expands to just one store instruction, takes four input vectors and a constant indicating which lanes to store, and is handled entirely in Tablegen. (And similarly for vld2q/vst2q.) This is asymmetric, but it was the easiest way to do each one. 
Reviewers: dmgreen, miyuki, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68700	2019-10-24 16:33:13 +01:00
Simon Tatham	41354b8e32	[ARM] Add some sample IR MVE intrinsics with C++ isel. This adds some initial example IR intrinsics for MVE instructions that deliver multiple output values, and hence, have to be instruction- selected by custom C++ code instead of Tablegen patterns. I've added the writeback gather load instructions (taking a vector of base addresses and a single common offset, returning a vector of loaded values and an updated vector of base addresses); one example from the long shift family (taking and returning a 64-bit value in two GPRs); and the VADC instruction (which propagates a carry bit from each vector-lane addition to the next, taking an input carry flag in FPSCR and outputting the final one in FPSCR as well). To support the VPT-predicated forms of these instructions, I've written some helper functions to add the cluster of MVE predicate operands to the end of a MachineInstr. `AddMVEPredicateToOps` is used when the instruction actually is predicated (so it takes a predicate mask argument), and `AddEmptyMVEPredicateToOps` is for when the instruction is unpredicated (so it fills in $noreg for the mask). Each one comes in a form suitable for `vpred_n`, and one for `vpred_r` which takes the extra 'inactive' parameter. For VADC, the representation of the carry flag in the IR intrinsic is a word intended to be moved directly to and from `FPSCR_nzcvqc`, i.e. with the carry flag in bit 29 of the word. (The user-facing ACLE intrinsic will want it to be in bit 0, but I'll do that on the clang side.) Reviewers: dmgreen, miyuki, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68699	2019-10-24 16:33:13 +01:00
Simon Tatham	f9bcd34532	[ARM] Begin adding IR intrinsics for MVE instructions. This commit, together with the next few, will add a representative sample of the kind of IR intrinsics that we'll need in order to implement the user-facing ACLE intrinsics for MVE. Supporting all of them will take more work; the intention of this initial series of commits is to implement an intrinsic or two from lots of different categories, as examples and proofs of concept. This initial commit introduces a small number of IR intrinsics for instructions simple enough that they can use Tablegen ISel patterns: the predicated versions of the VADD and VSUB instructions (both integer and FP), VMIN and VMAX, and the float->half VCVT instruction (predicated and unpredicated). When using VPT-predicated instructions in automatic code generation, it will be convenient to specify the predicate value as a vector of the appropriate number of i1. To make it easy to specify all sizes of an instruction in one go and give each one the matching predicate vector type, I've added a system of Tablegen informational records describing MVE's vector types: each one gives the underlying LLVM IR ValueType (which may not be the same if the MVE vector is of explicitly signed or unsigned integers) and an appropriate vNi1 to use as the predicate vector. (Also, those info records include the usual encoding for the types, so that as we add associations between each instruction encoding and one of the new `MVEVectorVTInfo` records, we can remove some of the existing template parameters and replace them with references to the vector type info's fields.) The user-facing ACLE intrinsics will receive a predicate mask as a 16-bit integer, so I've also provided a pair of intrinsics i2v and v2i, to convert between an integer and a vector of i1 by just changing the register class. 
Reviewers: dmgreen, miyuki, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67158	2019-10-24 16:33:13 +01:00
Michael Liao	b532a94abc	[AMDGPU] Skip additional folding on the same operand. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69355	2019-10-24 11:30:22 -04:00
Petar Avramovic	c4354e2d3b	[MIPS GlobalISel] Select MSA vector generic and builtin fabs selectImpl is able to select G_FABS when we set bank for vector operands to fprb. Add detailed tests. Note: G_FABS is generated from llvm-ir intrinsics llvm.fabs., and at the moment MIPS is not able to generate this intrinsic for vector type (some targets generate vector llvm.fabs. from calls to a builtin function). We can handle fabs using __builtin_msa_fmax_a_<format> and passing same vector as both arguments. __builtin_msa_fmax_a_<format> will be directly selected into FMAX_A_<format> in legalizeIntrinsic. Differential Revision: https://reviews.llvm.org/D69346	2019-10-24 13:45:26 +02:00
Petar Avramovic	5167c00f3d	[MIPS GlobalISel] MSA vector generic and builtin fadd, fsub, fmul, fdiv Select vector G_FADD, G_FSUB, G_FMUL and G_FDIV for MIPS32 with MSA. We have to set bank for vector operands to fprb and selectImpl will do the rest. __builtin_msa_fadd_<format>, __builtin_msa_fsub_<format>, __builtin_msa_fmul_<format> and __builtin_msa_fdiv_<format> will be transformed into G_FADD, G_FSUB, G_FMUL and G_FDIV in legalizeIntrinsic respectively and selected in the same way. Differential Revision: https://reviews.llvm.org/D69340	2019-10-24 10:15:07 +02:00
Petar Avramovic	aa102b103b	[MIPS GlobalISel] MSA vector generic and builtin sdiv, srem, udiv, urem Select vector G_SDIV, G_SREM, G_UDIV and G_UREM for MIPS32 with MSA. We have to set bank for vector operands to fprb and selectImpl will do the rest. __builtin_msa_div_s_<format>, __builtin_msa_mod_s_<format>, __builtin_msa_div_u_<format> and __builtin_msa_mod_u_<format> will be transformed into G_SDIV, G_SREM, G_UDIV and G_UREM in legalizeIntrinsic respectively and selected in the same way. Differential Revision: https://reviews.llvm.org/D69333	2019-10-24 10:03:36 +02:00

66014 Commits