Sample re-annotation is required at LTO time to achieve reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by pre-LTO code duplication performed by passes such as loop unrolling, jump threading, and indirect call promotion, where samples corresponding to a single source location are aggregated multiple times across the duplicates. In this change we introduce the concept of a distribution factor for pseudo probes, so that samples for duplicated probes can be distributed, scaled by that factor. We expect optimizations that duplicate code to maintain the branch frequency information (BFI) based on which probe distribution factors are calculated. Distribution factors are updated at the end of the pre-LTO pipeline to reflect an estimated portion of the real execution count.
This change also introduces a pseudo probe verifier that can be run after each IR pass to detect duplicated pseudo probes.
A saturated distribution factor stands for 1.0. A pseudo probe carries a factor with a value ranging from 0.0 to 1.0. A 64-bit integral distribution factor field representing [0.0, 1.0] is associated with each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit DWARF discriminator, so a 7-bit distribution factor is used instead.
Changes are also needed in the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by pre-LTO passes, when later inlined at LTO time, should have their callees' probes prorated based on the prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, indirect call promotion results in multiple call sites, and the original samples should be distributed across them; this is fixed by adjusting the call sites' distribution factors.
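As a toy illustration of proration (hand-written numbers and a hypothetical helper, not the actual SampleProfileLoader code), a probe duplicated into two copies with factors 0.7 and 0.3 would split its original samples accordingly:
```
// A hedged sketch: assumes the 64-bit encoding above, where a saturated
// (all-ones) integral factor stands for 1.0.
#include <cstdint>
#include <cstdio>

constexpr uint64_t FullDistributionFactor = UINT64_MAX; // represents 1.0

static uint64_t prorateSamples(uint64_t Samples, uint64_t Factor) {
  // Scale via long double to keep this sketch free of 128-bit arithmetic.
  return (uint64_t)(Samples * ((long double)Factor / FullDistributionFactor));
}

int main() {
  uint64_t CopyA = (uint64_t)(0.7L * FullDistributionFactor);
  uint64_t CopyB = (uint64_t)(0.3L * FullDistributionFactor);
  // 1000 samples on the original probe distribute as roughly 700 and 300.
  std::printf("%llu %llu\n",
              (unsigned long long)prorateSamples(1000, CopyA),
              (unsigned long long)prorateSamples(1000, CopyB));
}
```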
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D93264
Previously, operator== would consider the actual equality of the pairs
(lhs.Value, lhs.State) == (rhs.Value, rhs.State). However, if an invalid
cost was involved in a call to operator<, only the state would be
compared. Thus, it was not the case that ({2, Invalid} < {3, Invalid} ||
{2, Invalid} > {3, Invalid} || {2, Invalid} == {3, Invalid}).
This patch implements a true total ordering, where cost state is
considered first, then value. While it's not really important that
{2, Invalid} be considered to be less than {3, Invalid}, it's not a
problem either. This patch also implements operator== in terms of
operator<, so the two definitions will be kept in sync.
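A minimal standalone sketch of the intended ordering (a simplified model, not the actual InstructionCost class):
```
// State is compared first, then value; equality is derived from operator<
// so the two definitions cannot drift apart.
struct Cost {
  enum State { Valid, Invalid };
  int Value;
  State St;

  bool operator<(const Cost &RHS) const {
    if (St != RHS.St)
      return St < RHS.St; // Valid sorts before Invalid
    return Value < RHS.Value;
  }
  bool operator==(const Cost &RHS) const {
    return !(*this < RHS) && !(RHS < *this);
  }
};
// Now exactly one of {2, Invalid} < {3, Invalid}, {3, Invalid} < {2, Invalid},
// or {2, Invalid} == {3, Invalid} holds, as a total ordering requires.
```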
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D95803
LLVM_TARGETS_TO_BUILD accepts both "host" and "Native" for auto-selecting
the target from the environment. However, the way "Native" was plumbed
would lead to the JIT environment being disabled. This patch makes
"Native" work just like "host".
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D95837
Due to a clerical error, the sdiv operation was mapped to vdivu and
udiv to vdiv, when the opposite mapping is the correct one.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95869
Add per-reloc-type attribute bits and migrate code from per-target files into target-independent code, driven by the reloc attributes.
Many cleanups.
Differential Revision: https://reviews.llvm.org/D95121
Without `-dwarf-version`, llvm-mc uses the default `MCContext::DwarfVersion` 4.
Without `-gdwarf-N`, Clang cc1as uses `clang::driver::ToolChain::GetDefaultDwarfVersion`,
which is 4 on many toolchains. Note: `clang -c` can synthesize .debug_info without -g.
There is currently an MCParser warning upon `.file 0` and an MCParser error upon
`.loc 0` if the DWARF version is less than 5. This causes friction in the
following usage:
```
clang -S -g -gdwarf-5 a.c
// MC warning due to .file 0, MC error due to .loc 0
clang -c a.s
llvm-mc -filetype=obj a.s
```
My idea is that we can just upgrade `MCContext::DwarfVersion` to 5 upon
`.file 0` to make the above commands work.
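A rough sketch of that idea (a hypothetical helper; the actual change lives in the MCParser handling of `.file`/`.loc`):
```
#include "llvm/MC/MCContext.h"

// `.file 0` is only meaningful in DWARF v5, so seeing it implies v5.
static void upgradeDwarfVersionForFile0(llvm::MCContext &Ctx,
                                        unsigned FileNumber) {
  if (FileNumber == 0 && Ctx.getDwarfVersion() < 5)
    Ctx.setDwarfVersion(5);
}
```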
The downside is that for an explicit version, as in `clang -c -gdwarf-4 a.s`, it can be
argued that the new behavior drops the probably intended diagnostic. I think the
downside is small because in most cases the DWARF version for an assembly action
should either match that of the original compile action or be omitted.
There is an ongoing discussion about taking a similar action for GNU as: https://sourceware.org/pipermail/binutils/2021-January/114980.html
Differential Revision: https://reviews.llvm.org/D94882
On Linux target triples, GNU as sets EI_OSABI to ELFOSABI_GNU when SHF_GNU_RETAIN is used.
On `*-*-freebsd`, it usually sets EI_OSABI to ELFOSABI_FREEBSD.
GNU ld respects SHF_GNU_RETAIN only for ELFOSABI_FREEBSD/ELFOSABI_GNU.
https://sourceware.org/bugzilla/show_bug.cgi?id=27282
MC doesn't set ELFOSABI_GNU for SHF_GNU_RETAIN/STB_GNU_UNIQUE/STT_GNU_IFUNC,
so MC-assembled object files do not get these special semantics in GNU ld.
Reviewed By: psmith
Differential Revision: https://reviews.llvm.org/D95730
In binutils, the flag is defined for ELFOSABI_GNU and ELFOSABI_FREEBSD.
It can be used to mark a section as a GC root.
In practice, the flag has generic semantics and can be applied to many
EI_OSABI values, so we consider it generic.
Differential Revision: https://reviews.llvm.org/D95728
Inlining sometimes maps different instructions to be inlined onto the same instruction.
We must ensure that we remap the noalias scopes only once; otherwise a scope might disappear (at best).
This patch ensures that we only replace scopes for which the mapping is known.
This approach is preferred over tracking which instructions we already handled in a SmallPtrSet,
as that would require more memory.
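A simplified illustration of the approach (assumed shapes, not the exact cloning code):
```
#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/Metadata.h"

// Replace a scope only when the mapping knows about it; visiting the same
// (shared) inlined instruction twice then leaves already-remapped scopes
// intact instead of dropping them.
static llvm::Metadata *
remapKnownScope(llvm::Metadata *Scope,
                const llvm::DenseMap<llvm::Metadata *, llvm::Metadata *> &Map) {
  auto It = Map.find(Scope);
  return It == Map.end() ? Scope : It->second;
}
```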
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D95862
This allows the peephole optimizer to know that an MVE_VMOV_to_lane_32 is
the same as an insert subreg, allowing it to optimize some redundant
lane moves.
Differential Revision: https://reviews.llvm.org/D95433
The temporary register is only used to compute the frame pointer.
The frame pointer is overwritten and not used in between, so we
can reuse the frame pointer for the computation, saving one register.
Differential Revision: https://reviews.llvm.org/D95865
Saving callee-saved registers happens in whole wave mode. Exec is saved
to a free register, which can be reused to save the frame pointer.
Therefore, saving the FP needs to happen after saving the CSRs.
Differential Revision: https://reviews.llvm.org/D95861
Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for the AFDO path.
This is a resubmit of D95024, with the build break and an overtightened assertion fixed.
Test Plan:
NOTE: This patch was originally written by Anil Mahmud. His code has been
rebased but otherwise left mostly unchanged.
A new instruction on Power 10 allows for the materialization of 34-bit
immediate values. This patch allows the compiler to take advantage of
the new instruction when materializing such immediates.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D92879
A v4i32 insert of an extract can become a simple lane move, as opposed
to round-tripping via a GPR. This adds a pattern that turns a v4i32
insert-extract pair into an EXTRACT_SUBREG/INSERT_SUBREG, with the
required COPY_TO_REGCLASS. These get optimized into a simple lane
move by the rest of the backend.
Differential Revision: https://reviews.llvm.org/D95428
This is yet another hint that we will eventually need InstCombineInverter,
which would consistently sink inversions, but for that we'll need
to consistently hoist inversions where possible, so let's do that here.
Example of a proof: https://alive2.llvm.org/ce/z/78SbDq
See https://bugs.llvm.org/show_bug.cgi?id=48995
This patch adds tablegen patterns for pairs of i16/f16 inserts/extracts.
If we are inserting into two adjacent vector lanes (0 and 1, for
example), we can use either a vmov;vins or a vmovx;vins to insert the pair
together, avoiding a round trip through GPR registers. These are quite
large patterns with a number of EXTRACT_SUBREG/INSERT_SUBREG/
COPY_TO_REGCLASS nodes, but hopefully, as most of those become copies,
all of that will be cleaned up by further optimizations.
The VINS pattern was also adjusted to allow it to represent that it is
inserting into the top half of an existing register.
Differential Revision: https://reviews.llvm.org/D95381
With predicate masks, AVX512 can efficiently perform variable-index vector insertion with 2 broadcasts + 1 comparison, avoiding a lot of aliased memory traffic.
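As a user-level illustration of the technique (hand-written intrinsics, not the compiler's actual codegen):
```
#include <immintrin.h>

// Variable-index insertion into a 16 x i32 vector: broadcast the index,
// compare it against the lane numbers to form a one-hot mask, then broadcast
// the scalar and blend it in under that mask.
__m512i insert_var(__m512i v, int val, int idx) {
  const __m512i lanes = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7,
                                          8, 9, 10, 11, 12, 13, 14, 15);
  __mmask16 m = _mm512_cmpeq_epi32_mask(lanes, _mm512_set1_epi32(idx));
  return _mm512_mask_blend_epi32(m, v, _mm512_set1_epi32(val));
}
```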
Differential Revision: https://reviews.llvm.org/D95779
For x86-64 the REX.W prefix takes precedence over any other size
override (i.e. 0x66). Therefore, for x86-64, when REX.W is present, set
'hasOpSize' to false to ensure that any size override is ignored.
Fixes PR48901.
Differential Revision: https://reviews.llvm.org/D95682
A `DLS lr, lr` instruction only moves lr to itself. It need not be emitted
on its own, saving an instruction in the loop preheader.
Differential Revision: https://reviews.llvm.org/D78916
This patch updates IRBuilder::CreateMaskedGather/Scatter to work
with ScalableVectorType and adds isLegalMaskedGather/Scatter functions
to AArch64TargetTransformInfo. In addition I've fixed up
isLegalMaskedLoad/Store to return true for supported scalar types,
since this is what the vectorizer asks for.
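A rough, hypothetical sketch of what such a legality hook checks (the real implementation lives in AArch64TargetTransformInfo and is more involved):
```
#include "llvm/IR/DerivedTypes.h"

// Gathers/scatters are only treated as legal for scalable vectors of element
// types SVE can handle, and only when SVE is available at all.
static bool isLegalMaskedGatherSketch(llvm::Type *DataType, bool HasSVE) {
  auto *VTy = llvm::dyn_cast<llvm::ScalableVectorType>(DataType);
  if (!VTy || !HasSVE)
    return false;
  llvm::Type *Elt = VTy->getElementType();
  return Elt->isIntegerTy(32) || Elt->isIntegerTy(64) ||
         Elt->isFloatTy() || Elt->isDoubleTy();
}
```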
In LoopVectorize.cpp I've changed
LoopVectorizationCostModel::getInterleaveGroupCost to return an invalid
cost for scalable vectors, since currently this relies upon using shuffle
vector for reversing vectors. In addition, in
LoopVectorizationCostModel::setCostBasedWideningDecision I have assumed
that the cost of scalarising memory ops is infinite.
I have added some simple masked load/store and gather/scatter tests,
including cases where we use gathers and scatters for conditional invariant
loads and stores.
Differential Revision: https://reviews.llvm.org/D95350
I guess the instructions were marked as frame-setup by accident; they are
restores that are part of the epilog.
Differential Revision: https://reviews.llvm.org/D95783
So far, it was not specified what happens to the VGPRs of inactive
lanes when functions are called. This patch explicitly states that
the VGPR values of inactive lanes need to be preserved for all
registers.
This describes the current behavior, as only active lanes of registers
are saved to scratch. Also, as the multi-lane nature of VGPRs is not
properly modeled, we cannot determine the live VGPRs from inactive lanes
at calls. So we cannot save them, even if we intended to do so.
Differential Revision: https://reviews.llvm.org/D95610
Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for the AFDO path.
Test Plan:
Differential Revision: https://reviews.llvm.org/D95024