llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 12:12:47 +01:00

Author	SHA1	Message	Date
Simon Atanasyan	156abd7667	[mips] Support 'y' operand code to print exact log2 of the operand llvm-svn: 324477	2018-02-07 12:36:39 +00:00
Simon Atanasyan	6cf477a87b	[mips] Handle 'M' and 'L' operand codes for memory operands Both operand codes now work the same way in case of register or memory operands. It print high-order or low-order word in a double-word register or memory location. llvm-svn: 324476	2018-02-07 12:36:33 +00:00
Sjoerd Meijer	16fa1af9d7	[ARM] FP16 mov imm pattern This is a follow up of r324321, adding a match pattern for mov with a FP16 immediate (also fixing operand vfp_f16imm that wasn't even compiling). Differential Revision: https://reviews.llvm.org/D42973 llvm-svn: 324456	2018-02-07 08:37:17 +00:00
Chandler Carruth	e91e72c6d6	[x86/retpoline] Make the external thunk names exactly match the names that happened to end up in GCC. This is really unfortunate, as the names don't have much rhyme or reason to them. Originally in the discussions it seemed fine to rely on aliases to map different names to whatever external thunk code developers wished to use but there are practical problems with that in the kernel it turns out. And since we're discovering this practical problems late and since GCC has already shipped a release with one set of names, we are forced, yet again, to blindly match what is there. Somewhat rushing this patch out for the Linux kernel folks to test and so we can get it patched into our releases. Differential Revision: https://reviews.llvm.org/D42998 llvm-svn: 324449	2018-02-07 06:16:24 +00:00
Tom Stellard	13a03d5c2b	AMDGPU/GlobalISel: Mark 32-bit G_FPTOUI as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D42152 llvm-svn: 324446	2018-02-07 04:47:59 +00:00
Mark Searles	93f7c20d71	[AMDGPU] Suppress redundant waitcnt instrs. 1. Run the memory legalizer prior to the waitcnt pass; keep the policy that the waitcnt pass does not remove any waitcnts within the incoming IR. 2. The waitcnt pass doesn't (yet) track waitcnts that exist prior to the waitcnt pass (it just skips over them); because the waitcnt pass is ignorant of them, it may insert a redundant waitcnt. To avoid this, check the prev instr. If it and the to-be-inserted waitcnt are the same, suppress the insertion. We keep the existing waitcnt under the assumption that whomever, e.g., the memory legalizer, inserted it knows what they were doing. 3. Follow-on work: teach the waitcnt pass to record the pre-existing waitcnts for better waitcnt production. Differential Revision: https://reviews.llvm.org/D42854 llvm-svn: 324440	2018-02-07 02:21:21 +00:00
Matt Arsenault	4b374680ef	AMDGPU: Select BFI patterns with 64-bit ints llvm-svn: 324431	2018-02-07 00:21:34 +00:00
Craig Topper	8371f1254e	[DAGCombiner][AMDGPU][X86] Turn cttz/ctlz into cttz_zero_undef/ctlz_zero_undef if we can prove the input is never zero X86 currently has a late DAG combine after cttz/ctlz are turned into BSR+BSF+CMOV to detect this and remove the CMOV. But we should be able to do this much earlier and avoid creating the cmov all together. For the changed AMDGPU test case it appears that previously the i8 cttz was type legalized to i16 which introduced an OR with 256 in order to limit the result to 8 on the widened type. At this point the result is known to never be zero, but nothing checked that. Then operation legalization is told to promote all i16 cttz to i32. This introduces an extend and a truncate and another OR with 65536 to limit the result to 16. With the DAG combiner change we are able to prevent the creation of the second OR since the opcode will have been changed to cttz_zero_undef after the first OR. I the lack of the OR caused the instruction to change to v_ffbl_b32_sdwa Differential Revision: https://reviews.llvm.org/D42985 llvm-svn: 324427	2018-02-06 23:54:37 +00:00
Eli Friedman	a8bbefe904	Place undefined globals in .bss instead of .data Following up on the discussion from http://lists.llvm.org/pipermail/llvm-dev/2017-April/112305.html, undef values are now placed in the .bss as well as null values. This prevents undef global values taking up potentially huge amounts of space in the .data section. The following two lines now both generate equivalent .bss data: @vals1 = internal unnamed_addr global [20000000 x i32] zeroinitializer, align 4 @vals2 = internal unnamed_addr global [20000000 x i32] undef, align 4 ; previously unaccounted for This is primarily motivated by the corresponding issue in the Rust compiler (https://github.com/rust-lang/rust/issues/41315). Differential Revision: https://reviews.llvm.org/D41705 Patch by varkor! llvm-svn: 324424	2018-02-06 23:22:14 +00:00
Evandro Menezes	120280b132	[AArch64] Adjust the cost model for Exynos M3 Fix the modeling of long division and SIMD conversion from integer and horizontal minimum and maximum. llvm-svn: 324417	2018-02-06 22:35:47 +00:00
Krzysztof Parzyszek	f51472dbd6	[Hexagon] Extract HVX lowering and selection into HVX-specific files, NFC llvm-svn: 324392	2018-02-06 20:22:20 +00:00
Krzysztof Parzyszek	39e7944170	[Hexagon] Lower concat of more than 2 vectors into build_vector llvm-svn: 324391	2018-02-06 20:18:58 +00:00
Stanislav Mekhanoshin	5d1899b98e	[AMDGPU] removed dead code handling rmw in memory legalizer It was always using cmpxchg path and in rmw and cmpxchg instructions are not distinguishable in the BE. Differential Revision: https://reviews.llvm.org/D42976 llvm-svn: 324383	2018-02-06 19:11:56 +00:00
Krzysztof Parzyszek	660a77777d	[Hexagon] Don't form new-value jumps from floating-point instructions Additionally, verify that the register defined by the producer is a 32-bit register. llvm-svn: 324381	2018-02-06 19:08:41 +00:00
Sjoerd Meijer	d26c465fc0	[ARM] f16 conversions This is a follow up of r324321, adding f16 <-> f32 and f16 <-> f64 conversion match patterns. Differential Revision: https://reviews.llvm.org/D42954 llvm-svn: 324360	2018-02-06 16:28:43 +00:00
Nirav Dave	8f7ea6d7d0	[DAG, X86] Improve Dependency analysis when doing multi-node Instruction Selection Cleanup cycle/validity checks in ISel (IsLegalToFold, HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full search for cycles / dependencies pruning the search when topological property of NodeId allows. As part of this propogate the NodeId-based cutoffs to narrow hasPreprocessorHelper searches. Reviewers: craig.topper, bogner Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41293 llvm-svn: 324359	2018-02-06 16:14:29 +00:00
Marek Olsak	d1440bbe12	AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU Author: Bas Nieuwenhuizen https://reviews.llvm.org/D42881 llvm-svn: 324353	2018-02-06 15:17:55 +00:00
Krzysztof Parzyszek	d8c54aaf3a	[Hexagon] Remove leftover assert llvm-svn: 324352	2018-02-06 15:15:13 +00:00
Krzysztof Parzyszek	e747b538c8	[Hexagon] Split HVX operations on vector pairs Vector pairs are legal types, but not every operation can work on pairs. For those operations that are legal for single vectors, generate a concat of their results on pair halves. llvm-svn: 324350	2018-02-06 14:24:57 +00:00
Krzysztof Parzyszek	227a5b7528	[Hexagon] Add helper functions to identify single/pair vector types, NFC llvm-svn: 324349	2018-02-06 14:21:31 +00:00
Krzysztof Parzyszek	0c375edf16	[Hexagon] Handle lowering of SETCC via setCondCodeAction It was expanded directly into instructions earlier. That was to avoid loads from a constant pool for a vector negation: "xor x, splat(i1 -1)". Implement ISD opcodes QTRUE and QFALSE to denote logical vectors of all true and all false values, and handle setcc with negations through selection patterns. llvm-svn: 324348	2018-02-06 14:16:52 +00:00
Simon Pilgrim	5207077229	[X86][SSE] Add PACKUS support for truncation of clamped values Followup to D42544 that matches PACKUSWB cases for non-AVX512, SSE and PACKUSDW cases will have to wait until we can add support for general SMIN/SMAX matching. llvm-svn: 324347	2018-02-06 14:07:46 +00:00
Tim Renouf	ad7f73b5d3	[AMDGPU] do not generate .AMDGPU.config for amdpal os type Summary: Now we generate PAL metadata for the amdpal os type, there is no need to generate the .AMDGPU.config section. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37760 Change-Id: I303c5fad66656ce97293da60621afac6595b4c18 llvm-svn: 324346	2018-02-06 13:39:38 +00:00
Sander de Smalen	412b309025	[AArch64][SVE] Asm: Add AND_ZI instructions and aliases Summary: Adds support for the SVE AND instruction with vector and logical-immediate operands, and their corresponding aliases. Reviewers: fhahn, rengolin, samparker, echristo, aadg, kristof.beyls Reviewed By: fhahn Subscribers: aemerson, javed.absar, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D42295 llvm-svn: 324343	2018-02-06 13:13:21 +00:00
Simon Pilgrim	1294fdbab0	[X86][SSE] Add PACKSS support for truncation of clamped values Followup to D42544 that matches PACKSSWB cases for non-AVX512, SSE and PACKSSDW cases will have to wait until we can add support for general SMIN/SMAX matching. llvm-svn: 324339	2018-02-06 12:16:10 +00:00
Hiroshi Inoue	1ada022dd5	[PowerPC] fix up in rL324229, NFC This patch fixes up my previous commit (add initialization of local variables). llvm-svn: 324336	2018-02-06 11:34:16 +00:00
Oliver Stannard	743e76e5a4	[AArch64] Fix spelling of ICH_ELRSR_EL2 system register This register was mis-spelled as ICH_ELSR_EL2, but has the correct encoding for ICH_ELRSR_EL2. llvm-svn: 324325	2018-02-06 09:39:04 +00:00
Oliver Stannard	1a18abc9a8	[ARM][AArch64] Add CSDB speculation barrier instruction This adds the CSDB instruction, which is a new barrier instruction described by the whitepaper at [1]. This is in encoding space which was previously executed as a NOP, so it is available for all targets that have the relevant NOP encoding space. This matches the binutils behaviour for these instructions [2][3]. [1] https://developer.arm.com/support/security-update [2] https://sourceware.org/ml/binutils/2018-01/msg00116.html [3] https://sourceware.org/ml/binutils/2018-01/msg00120.html llvm-svn: 324324	2018-02-06 09:24:47 +00:00
Sjoerd Meijer	7941278b18	[ARM] Armv8.2-A FP16 code generation (part 3/3) This adds most of the FP16 codegen support, but these areas need further work: - FP16 literals and immediates are not properly supported yet (e.g. literal pool needs work), - Instructions that are generated from intrinsics (e.g. vabs) haven't been added. This will be addressed in follow-up patches. Differential Revision: https://reviews.llvm.org/D42849 llvm-svn: 324321	2018-02-06 08:43:56 +00:00
Konstantin Zhuravlyov	f29f02b282	AMDGPU/MemoryModel: Fix monotonic atomic loads Those should have glc bit set for system and agent synchronization scopes llvm-svn: 324314	2018-02-06 04:06:04 +00:00
Ahmed Charles	0b697489b2	[RISCV] Add support for %pcrel_lo. llvm-svn: 324303	2018-02-06 00:55:23 +00:00
Reid Kleckner	42169ef1ae	Revert "Don't assume a null GV is local for ELF and MachO." This reverts r323297. It breaks building grub. llvm-svn: 324301	2018-02-06 00:47:14 +00:00
Craig Topper	7eb6729e53	[X86] Relax restrictions on what setcc condition codes can be folded with a sext when AVX512 is enabled. We now allow all signed comparisons and not equal. The complement that needs to be added for this is no worse than the extend. And the vector output forms of pcmpeq/pcmpgt have better latency than the k-register version on SKX. llvm-svn: 324294	2018-02-05 23:57:01 +00:00
Sanjay Patel	a19541c05c	[LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused (PR35681) In the motivating case from PR35681 and represented by the macro-fuse-cmp test: https://bugs.llvm.org/show_bug.cgi?id=35681 ...there's a 37 -> 31 byte size win for the loop because we eliminate the big base address offsets. SPEC2017 on Ryzen shows no significant perf difference. Differential Revision: https://reviews.llvm.org/D42607 llvm-svn: 324289	2018-02-05 23:43:05 +00:00
Nirav Dave	c405b8636c	[X86] Teach DAG unfoldMemoryOperand to reconvert CMPs to tests Summary: Copy MI-level cmp->test conversion to SelectionDAG-level memory unfold. This fixes a regression from upcoming D41293 change. Reviewers: craig.topper, RKSimon Reviewed By: craig.topper Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D42808 llvm-svn: 324261	2018-02-05 18:58:58 +00:00
Craig Topper	6aca35b677	[X86] Artificially lower the complexity of the scalar ANDN patterns so that AND with immediate will match first. This allows the immediate to folded into the and instead of being forced to move into a register. This can sometimes result in shorter encodings since the and can sign extend an immediate. This also allows us to match an and to a movzx after a not. This can cause an extra move if the input to the separate NOT has an additional user which requires a copy before the NOT. llvm-svn: 324260	2018-02-05 18:31:04 +00:00
Krzysztof Parzyszek	298aa9ea0d	[Hexagon] Memoize instruction positions in BitTracker llvm-svn: 324250	2018-02-05 17:12:07 +00:00
Craig Topper	c5caaf9155	[X86] Teach X86DAGToDAGISel::shrinkAndImmediate to preserve upper 32 zeroes of a 64 bit mask. If the upper 32 bits of a 64 bit mask are all zeros, we have special isel patterns to use a 32-bit and instead of a 64-bit and by relying on the impliciting zeroing of 32 bit ops. This patch teachs shrinkAndImmediate not to break that optimization. Differential Revision: https://reviews.llvm.org/D42899 llvm-svn: 324249	2018-02-05 16:54:07 +00:00
Benjamin Kramer	ce1b232618	BitTracker.h needs a full definition of MachineInstr, so include the defining file. Patch by Dean Sturtevant! Differential Revision: https://reviews.llvm.org/D42907 llvm-svn: 324245	2018-02-05 15:56:24 +00:00
Krzysztof Parzyszek	24456c87b2	[Hexagon] Forgot about HexagonISD::VZERO in selecting const vectors llvm-svn: 324244	2018-02-05 15:52:54 +00:00
Krzysztof Parzyszek	e519378660	[Hexagon] Don't use garbage mask in HvxSelector::shuffp2 The function shuffp2 was breaking up a wide shuffle into a pair of narrower ones, except that the narrower shuffle masks were actually uninitialized. llvm-svn: 324243	2018-02-05 15:46:41 +00:00
Krzysztof Parzyszek	9179adab3b	[Hexagon] Use V6_vmpyih for halfword multiplication Unlike V6_vmpyhv, it produces the result in the exact form that is expected without the need for a shuffle. llvm-svn: 324241	2018-02-05 15:40:06 +00:00
Dmitry Preobrazhensky	e8a7910a4d	[AMDGPU][MC] Corrected dst/data size for MIMG opcodes with d16 modifier See bug 36154: https://bugs.llvm.org/show_bug.cgi?id=36154 Differential Revision: https://reviews.llvm.org/D42847 Reviewers: cfang, artem.tamazov, arsenm llvm-svn: 324237	2018-02-05 14:18:53 +00:00
Dmitry Preobrazhensky	1c461e7122	[AMDGPU][MC] Added validation of d16 and r128 modifiers of MIMG opcodes See bugs 36094, 36095: https://bugs.llvm.org/show_bug.cgi?id=36094 https://bugs.llvm.org/show_bug.cgi?id=36095 Differential Revision: https://reviews.llvm.org/D42692 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 324231	2018-02-05 12:45:43 +00:00
Hiroshi Inoue	dc037f8fd3	[PowerPC] Check hot loop exit edge in PPCCTRLoops PPCCTRLoops transform loops using mtctr/bdnz instructions if loop trip count is known and big enough to compensate for the cost of mtctr. But if there is a loop exit edge which is known to be frequently taken (by builtin_expect or by PGO), we should not transform the loop to avoid the cost of mtctr instruction. Here is an example of a loop with hot exit edge: for (unsigned i = 0; i < TripCount; i++) { // do something if (__builtin_expect(check(), 1)) break; // do something } Differential Revision: https://reviews.llvm.org/D42637 llvm-svn: 324229	2018-02-05 12:25:29 +00:00
Craig Topper	9d71e524e3	[X86] Add isel patterns for selecting masked SUBV_BROADCAST with bitcasts. Remove combineBitcastForMaskedOp. Add test cases for the merge masked versions to make sure we have all those covered. llvm-svn: 324210	2018-02-05 08:37:37 +00:00
Craig Topper	f8333368b8	[X86] Remove unused lambda. NFC llvm-svn: 324206	2018-02-05 06:56:33 +00:00
Craig Topper	cb0a8a99fa	[X86] Remove X86ISD::SHUF128 from combineBitcastForMaskedOp. Use isel patterns instead. We always created X86ISD::SHUF128 with a 64-bit element type so we can use isel patterns to detect a bitconvert to 32-bit to handle masking. The test changes are because we also match the bitconvert even if there is no masking. This leads to unnecessary isel pattern, but it requires more multiclass hackery in tablegen to get rid of it. llvm-svn: 324205	2018-02-05 06:00:23 +00:00
Craig Topper	952f7a0d70	[X86] Add DAG combine to turn (bitcast (and/or/xor (bitcast X), Y)) -> (and/or/xor X, (bitcast Y)) when casting between GPRs and mask operations. This reduces the number of transitions between k-registers and GPRs, reducing the number of instructions. There's still some room for improvement to remove more transitions, but this is a good start. llvm-svn: 324184	2018-02-04 01:43:48 +00:00
Craig Topper	49a92b2c7a	[X86] Remove unused function argument. NFC llvm-svn: 324183	2018-02-04 01:43:44 +00:00

1 2 3 4 5 ...

45975 Commits