llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 02:52:53 +02:00

Author	SHA1	Message	Date
Dmitry Preobrazhensky	e22eed53cb	[AMDGPU][MC] Corrected parsing of optional modifiers Fixed bugs in parsing of "no*" modifiers and improved error handling. See https://bugs.llvm.org/show_bug.cgi?id=41282. Differential Revision: https://reviews.llvm.org/D95675	2021-02-02 14:52:29 +03:00
Simon Pilgrim	8caaa7e0d2	[X86][AVX512] Support variable-index vector insertion on AVX512 targets (PR47924) With predicate masks, AVX512 can efficiently perform variable-index vector insertion with 2 broadcasts + 1 comparison, avoiding a lot of aliased memory traffic. Differential Revision: https://reviews.llvm.org/D95779	2021-02-02 11:41:18 +00:00
Andrew Ng	dd96c8fb71	[X86] Fix disassembly of x86-64 GDTLS code sequence For x86-64 the REX.w prefix takes precedence over any other size override (i.e. 0x66). Therefore, for x86-64 when REX.w is present set 'hasOpSize' to false to ensure that any size override is ignored. Fixes PR48901. Differential Revision: https://reviews.llvm.org/D95682	2021-02-02 11:35:00 +00:00
Simon Pilgrim	0e7f51461d	[X86][AVX] Add missing VEX_WIG tags from VPACKUSDW/VPHSUBD/VPCMPISTRI/VPCMPISTRM/VPCMPESTRI/VPCMPESTRM Fixes PR48877 Differential Revision: https://reviews.llvm.org/D95801	2021-02-02 11:25:44 +00:00
David Green	505ef9e320	[ARM] Remove DLS lr, lr A DLS lr, lr instruction only moves lr to itself. It need not be emitted on its own, saving an instruction in the loop preheader. Differential Revision: https://reviews.llvm.org/D78916	2021-02-02 11:09:31 +00:00
Adrian Kuegel	306c653648	Revert "[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline" This reverts commit 9a03058d6322edb8abc803ba3e436cc62647d979.	2021-02-02 11:51:04 +01:00
Adrian Kuegel	930ba0d06e	Revert "Fix build break from D95024" This reverts commit 09cd849fdef2b2d3de2d0b0a5c512100957e0ef6.	2021-02-02 11:51:04 +01:00
David Green	2ad4eade18	[ARM] Regenerate LowOverheadLoops mir tests. NFC	2021-02-02 10:28:58 +00:00
David Sherwood	81a8f9f3aa	[SVE][LoopVectorize] Add masked load/store and gather/scatter support for SVE This patch updates IRBuilder::CreateMaskedGather/Scatter to work with ScalableVectorType and adds isLegalMaskedGather/Scatter functions to AArch64TargetTransformInfo. In addition I've fixed up isLegalMaskedLoad/Store to return true for supported scalar types, since this is what the vectorizer asks for. In LoopVectorize.cpp I've changed LoopVectorizationCostModel::getInterleaveGroupCost to return an invalid cost for scalable vectors, since currently this relies upon using shuffle vector for reversing vectors. In addition, in LoopVectorizationCostModel::setCostBasedWideningDecision I have assumed that the cost of scalarising memory ops is infinitely expensive. I have added some simple masked load/store and gather/scatter tests, including cases where we use gathers and scatters for conditional invariant loads and stores. Differential Revision: https://reviews.llvm.org/D95350	2021-02-02 09:52:39 +00:00
Benjamin Kramer	96428d51bc	Fold one-use variable into assert. NFCI. Avoids a warning in Release builds.	2021-02-02 10:50:48 +01:00
Sebastian Neubauer	8175a7d5b7	[AMDGPU] Mark epilog restores as frame-destroy I guess the instructions were marked as frame-setup by accident; they are restores that are part of the epilog. Differential Revision: https://reviews.llvm.org/D95783	2021-02-02 10:24:37 +01:00
Sebastian Neubauer	32c7642ef3	[AMDGPU] Clarify calling conv about inactive lanes So far, it was not specified what happens with the VGPRs of inactive lanes when functions are called. This patch explicitly mentions that the VGPR values of inactive lanes need to be preserved for all registers. This describes the current behavior, as only active lanes of registers are saved to scratch. Also, as the multi-lane nature of VGPRs is not properly modeled, we cannot determine the live VGPRs from inactive lanes at calls. So we cannot save them, even if we intended to do so. Differential Revision: https://reviews.llvm.org/D95610	2021-02-02 10:15:09 +01:00
Wenlei He	41fcc00b36	Fix build break from D95024	2021-02-02 01:01:12 -08:00
Wenlei He	60e5015150	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. Test Plan: Differential Revision: https://reviews.llvm.org/D95024	2021-02-02 00:34:06 -08:00
Thomas Symalla	af31f24b1c	Fixed includes. Differential Revision: https://reviews.llvm.org/D93708	2021-02-02 09:14:54 +01:00
Thomas Symalla	298055ddcd	Fixed includes.	2021-02-02 09:14:54 +01:00
Thomas Symalla	a7e61e92bb	Reverted whitespace changes. Differential Revision: https://reviews.llvm.org/D90968	2021-02-02 09:14:54 +01:00
Thomas Symalla	b7a94c0dc2	Added missing includes.	2021-02-02 09:14:54 +01:00
Thomas Symalla	ba95ff2c1c	Renamed med3 opcode, removed superfluous copy.	2021-02-02 09:14:54 +01:00
Thomas Symalla	087ff79f1a	Removed the generic virtual register creations. Reworked the tests.	2021-02-02 09:14:54 +01:00
Thomas Symalla	f548399826	Implemented a MED3_S32 GIR opcode.	2021-02-02 09:14:53 +01:00
Thomas Symalla	ee664c032b	Added and used new target pseudo for v_cvt_pk_i16_i32, changes due to code review.	2021-02-02 09:14:53 +01:00
Thomas Symalla	43278a1cb3	Formatting changes	2021-02-02 09:14:53 +01:00
Thomas Symalla	46f1f49a56	Formatting changes.	2021-02-02 09:14:53 +01:00
Thomas Symalla	8aacc5adcd	Updating formatting changes.	2021-02-02 09:14:53 +01:00
Thomas Symalla	8955159f2d	Resolve formatting changes.	2021-02-02 09:14:53 +01:00
Thomas Symalla	38676d07e0	Code changes yielded from review.	2021-02-02 09:14:53 +01:00
Thomas Symalla	4c17035470	Fixed tests.	2021-02-02 09:14:53 +01:00
Thomas Symalla	ea43201600	Move step to PreLegalizer	2021-02-02 09:14:53 +01:00
Thomas Symalla	122a71a9f1	Move Combiner to PreLegalize step	2021-02-02 09:14:53 +01:00
Thomas Symalla	75cc84f30a	Renamed identifiers in lit	2021-02-02 09:14:53 +01:00
Thomas Symalla	9a5185f66f	Reverted unintended git-format change.	2021-02-02 09:14:52 +01:00
Thomas Symalla	5e6c38bc76	Fixed the lit tests and a bug in the implementation.	2021-02-02 09:14:52 +01:00
Thomas Symalla	8206639df8	Refactored the pattern matching.	2021-02-02 09:14:52 +01:00
Thomas Symalla	2f722f6a50	Renames	2021-02-02 09:14:52 +01:00
Thomas Symalla	555bb61a39	Added early exit.	2021-02-02 09:14:52 +01:00
Thomas Symalla	1f11e485f5	Added comments.	2021-02-02 09:14:52 +01:00
Thomas Symalla	ae887237b4	clang-format	2021-02-02 09:14:52 +01:00
Thomas Symalla	1663fb4919	Added clamp i64 to i16 global isel pattern.	2021-02-02 09:14:52 +01:00
Craig Topper	6d008ca25f	[RISCV] Replace NoX0 SDNodeXForm with a ComplexPattern to do the selection of the VL operand. I think this is a more standard way of doing this. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D95833	2021-02-02 00:08:58 -08:00
Wenlei He	4a2e76b6b1	[CSSPGO] Call site prioritized inlining for sample PGO This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximizes the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`). Motivation: With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO. - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat. - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately. - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inaccurate hotness heuristic. With pseudo-probe and the change above, this is no longer an issue for CSSPGO. 
New FDO Inliner: Putting all the requirements for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here are the reasons why each component is needed. - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655. - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check. - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites. - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path. Note that the new inliner avoids repeatedly evaluating the same set of call sites, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of the same call site (TODO). Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO. Tunings and knobs: I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO. Results: Evaluated with an internal LLVM fork a couple of months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner shows ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. 
The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it). Differential Revision: https://reviews.llvm.org/D94001	2021-02-01 23:46:34 -08:00
Craig Topper	0dc42fa551	[SelectionDAG] Prevent scalable vector warning from ComputeNumSignBits on extract_vector_elt on a scalable vector.	2021-02-01 23:42:03 -08:00
Puyan Lotfi	293c83b00b	Revert "[AArch64] Homogeneous Prolog and Epilog Size Optimization" This reverts commit 0426be3df6180747bd68706db87a70580f064f0f. Reverting due to some expensive-checks failures in tests.	2021-02-02 02:33:44 -05:00
Fangrui Song	511ed9320f	[test] Fix unused FileCheck prefixes in test/Reduce	2021-02-01 23:05:46 -08:00
Fangrui Song	712f0e3c09	[test] Fix unused FileCheck prefixes in clang-tidy and one llvm/test/Reduce test	2021-02-01 22:51:29 -08:00
Lang Hames	1856abe361	[ORC] Clear unused materializing info entries. Once a symbol is Ready its MaterializingInfo entry is unused and can be removed to free up some memory.	2021-02-02 17:47:32 +11:00
Gil Rapaport	1307269e6f	[SCEV] Apply loop guards to divisibility tests Extend applyLoopGuards() to take into account conditions/assumes proving some value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the loop unroller and the loop vectorizer identify more loops as not requiring remainder loops. Differential Revision: https://reviews.llvm.org/D95521	2021-02-02 08:09:39 +02:00
Kyungwoo Lee	28c9f1933e	[AArch64] Homogeneous Prolog and Epilog Size Optimization Prologs and epilogs handle callee-save registers and tend to be irregular with different immediate offsets that are not often handled by the MachineOutliner. Commit D18619/a5335647d5e8 (combining stack operations) stretched irregularity further. This patch tries to emit homogeneous stores and loads with the same offset for prologs and epilogs respectively. We have observed that this canonicalizes (homogenizes) prologs and epilogs significantly and results in a greatly increased chance of outlining, resulting in a code size reduction. Despite the above results, there are still size wins to be had that the MachineOutliner does not provide due to the special handling of X30/LR. To handle the LR case, this patch custom-outlines prologs and epilogs in place. It does this by doing the following: * Injects HOM_Prolog and HOM_Epilog pseudo instructions during a Prolog and Epilog Injection Pass. * Lowers and optimizes said pseudos in an AArchLowerHomogneousPrologEpilog Pass. * Outlined helpers are created on demand. Identical helpers are merged by the linker. * An opt-in flag is introduced to enable this feature. Another threshold flag is also introduced to control the aggressiveness of outlining for the application's needs. This reduced code size by an average of 4% on LLVM-TestSuite/CTMark targeting arm64/-Oz. Differential Revision: https://reviews.llvm.org/D76570	2021-02-02 00:26:51 -05:00
Nathan Hawes	574038ba44	[VFS] Add support to RedirectingFileSystem for mapping a virtual directory to one in the external FS. Previously file entries in the -ivfsoverlay yaml could map to a file in the external file system, but directories had to list their contents in the form of other file entries or directories. Allowing directory entries to map to a directory in the external file system makes it possible to present an external directory's contents in a different location and (in combination with the 'fallthrough' option) overlay one directory's contents on top of another. rdar://problem/72485443 Differential Revision: https://reviews.llvm.org/D94844	2021-02-02 14:56:17 +10:00
Kazu Hirata	e420343cf4	[TableGen] Use range-based for loops (NFC)	2021-02-01 20:55:09 -08:00
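The call site prioritized inlining described in commit 4a2e76b6b1 above (hottest call site first, an explicit size cap, each call site evaluated once) can be illustrated with a small priority-queue sketch. This is not the SampleProfileLoader implementation. The names `CallSite`, `prioritized_inline`, `size_limit` and `hot_threshold` are hypothetical, the size and count values are made up, and skipping a candidate that would exceed the cap (rather than stopping entirely) is an assumption made for the example.

```python
import heapq
from dataclasses import dataclass


@dataclass
class CallSite:
    callee: str  # name of the called function
    count: int   # call site sample count, used as the hotness priority
    size: int    # estimated size added by inlining the callee (assumed > 0)


def prioritized_inline(root_calls, callee_calls, size_limit, hot_threshold):
    """Toy model: pick call sites to inline, hottest first, under a size cap.

    callee_calls maps a callee name to the call sites found in its body;
    they become new candidates once that callee has been inlined.
    """
    heap = []
    seq = 0  # tie-breaker so equal counts pop in insertion order

    def push(cs):
        nonlocal seq
        # heapq is a min-heap, so negate the count to pop the hottest first.
        heapq.heappush(heap, (-cs.count, seq, cs))
        seq += 1

    for cs in root_calls:
        push(cs)

    grown = 0     # total size growth so far
    inlined = []  # callees inlined, in decision order
    while heap:
        _, _, cs = heapq.heappop(heap)  # each call site is evaluated once
        if cs.count < hot_threshold:
            continue  # not hot enough
        if grown + cs.size > size_limit:
            continue  # would exceed the size cap
        grown += cs.size
        inlined.append(cs.callee)
        # Call sites exposed by inlining compete with the remaining ones.
        for inner in callee_calls.get(cs.callee, []):
            push(inner)
    return inlined, grown


roots = [CallSite("a", 100, 10), CallSite("b", 50, 10), CallSite("c", 5, 1)]
nested = {"a": [CallSite("d", 80, 10)]}
print(prioritized_inline(roots, nested, size_limit=25, hot_threshold=10))
# prints (['a', 'd'], 20)
```

In this run `b` is rejected because it would push growth past the cap, and `c` is rejected as cold. Because newly exposed call sites go into the same queue as the remaining top-level ones, the budget is spent by priority rather than depth-first along one call path.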

210676 Commits