llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00

Author	SHA1	Message	Date
Thomas Symalla	298055ddcd	Fixed includes.	2021-02-02 09:14:54 +01:00
Thomas Symalla	a7e61e92bb	Reverted whitespace changes. Differential Revision: https://reviews.llvm.org/D90968	2021-02-02 09:14:54 +01:00
Thomas Symalla	b7a94c0dc2	Added missing includes.	2021-02-02 09:14:54 +01:00
Thomas Symalla	ba95ff2c1c	Renamed med3 opcode, removed superfluous copy.	2021-02-02 09:14:54 +01:00
Thomas Symalla	087ff79f1a	Removed the generic virtual register creations. Reworked the tests.	2021-02-02 09:14:54 +01:00
Thomas Symalla	f548399826	Implemented a MED3_S32 GIR opcode.	2021-02-02 09:14:53 +01:00
Thomas Symalla	ee664c032b	Added and used new target pseudo for v_cvt_pk_i16_i32, changes due to code review.	2021-02-02 09:14:53 +01:00
Thomas Symalla	43278a1cb3	Formatting changes	2021-02-02 09:14:53 +01:00
Thomas Symalla	46f1f49a56	Formatting changes.	2021-02-02 09:14:53 +01:00
Thomas Symalla	8aacc5adcd	Updating formatting changes.	2021-02-02 09:14:53 +01:00
Thomas Symalla	8955159f2d	Resolve formatting changes.	2021-02-02 09:14:53 +01:00
Thomas Symalla	38676d07e0	Code changes yielded from review.	2021-02-02 09:14:53 +01:00
Thomas Symalla	4c17035470	Fixed tests.	2021-02-02 09:14:53 +01:00
Thomas Symalla	ea43201600	Move step to PreLegalizer	2021-02-02 09:14:53 +01:00
Thomas Symalla	122a71a9f1	Move Combiner to PreLegalize step	2021-02-02 09:14:53 +01:00
Thomas Symalla	75cc84f30a	Renamed identifiers in lit	2021-02-02 09:14:53 +01:00
Thomas Symalla	9a5185f66f	Reverted unintended git-format change.	2021-02-02 09:14:52 +01:00
Thomas Symalla	5e6c38bc76	Fixed the lit tests and a bug in the implementation.	2021-02-02 09:14:52 +01:00
Thomas Symalla	8206639df8	Refactored the pattern matching.	2021-02-02 09:14:52 +01:00
Thomas Symalla	2f722f6a50	Renames	2021-02-02 09:14:52 +01:00
Thomas Symalla	555bb61a39	Added early exit.	2021-02-02 09:14:52 +01:00
Thomas Symalla	1f11e485f5	Added comments.	2021-02-02 09:14:52 +01:00
Thomas Symalla	ae887237b4	clang-format	2021-02-02 09:14:52 +01:00
Thomas Symalla	1663fb4919	Added clamp i64 to i16 global isel pattern.	2021-02-02 09:14:52 +01:00
Craig Topper	6d008ca25f	[RISCV] Replace NoX0 SDNodeXForm with a ComplexPattern to do the selection of the VL operand. I think this is a more standard way of doing this. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D95833	2021-02-02 00:08:58 -08:00
Wenlei He	4a2e76b6b1	[CSSPGO] Call site prioritized inlining for sample PGO This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximizes the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`). Motivation With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO. - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat. - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately. - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inaccurate hotness heuristic. With pseudo-probe and the change above, this is no longer an issue for CSSPGO. 
New FDO Inliner Putting all the requirements for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here are the reasons why each component is needed. - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655. - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check. - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites. - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path. Note that the new inliner avoids repeatedly evaluating the same set of call sites, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of the same call site (TODO). Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO. Tunings and knobs I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO. Results Evaluated with an internal LLVM fork a couple of months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner shows ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. 
The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it). Differential Revision: https://reviews.llvm.org/D94001	2021-02-01 23:46:34 -08:00
Craig Topper	0dc42fa551	[SelectionDAG] Prevent scalable vector warning from ComputeNumSignBits on extract_vector_elt on a scalable vector.	2021-02-01 23:42:03 -08:00
Puyan Lotfi	293c83b00b	Revert "[AArch64] Homogeneous Prolog and Epilog Size Optimization" This reverts commit 0426be3df6180747bd68706db87a70580f064f0f. Reverting due to some expensive-checks failures in tests.	2021-02-02 02:33:44 -05:00
Fangrui Song	511ed9320f	[test] Fix unused FileCheck prefixes in test/Reduce	2021-02-01 23:05:46 -08:00
Fangrui Song	712f0e3c09	[test] Fix unused FileCheck prefixes in clang-tidy and one llvm/test/Reduce test	2021-02-01 22:51:29 -08:00
Lang Hames	1856abe361	[ORC] Clear unused materializing info entries. Once a symbol is Ready its MaterializingInfo entry is unused and can be removed to free up some memory.	2021-02-02 17:47:32 +11:00
Gil Rapaport	1307269e6f	[SCEV] Apply loop guards to divisibility tests Extend applyLoopGuards() to take into account conditions/assumes proving some value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the loop unroller and the loop vectorizer identify more loops as not requiring remainder loops. Differential Revision: https://reviews.llvm.org/D95521	2021-02-02 08:09:39 +02:00
Kyungwoo Lee	28c9f1933e	[AArch64] Homogeneous Prolog and Epilog Size Optimization Prologs and epilogs handle callee-save registers and tend to be irregular with different immediate offsets that are not often handled by the MachineOutliner. Commit D18619/a5335647d5e8 (combining stack operations) stretched irregularity further. This patch tries to emit homogeneous stores and loads with the same offset for prologs and epilogs respectively. We have observed that this canonicalizes (homogenizes) prologs and epilogs significantly and results in a greatly increased chance of outlining, resulting in a code size reduction. Despite the above results, there are still size wins to be had that the MachineOutliner does not provide due to the special handling of X30/LR. To handle the LR case, this patch custom-outlines prologs and epilogs in place. It does this by doing the following: * Injects HOM_Prolog and HOM_Epilog pseudo instructions during a Prolog and Epilog Injection Pass. * Lowers and optimizes said pseudos in an AArchLowerHomogneousPrologEpilog Pass. * Outlined helpers are created on demand. Identical helpers are merged by the linker. * An opt-in flag is introduced to enable this feature. Another threshold flag is also introduced to control the aggressiveness of outlining for the application's needs. This reduced an average of 4% of code size on LLVM-TestSuite/CTMark targeting arm64/-Oz. Differential Revision: https://reviews.llvm.org/D76570	2021-02-02 00:26:51 -05:00
Nathan Hawes	574038ba44	[VFS] Add support to RedirectingFileSystem for mapping a virtual directory to one in the external FS. Previously file entries in the -ivfsoverlay yaml could map to a file in the external file system, but directories had to list their contents in the form of other file entries or directories. Allowing directory entries to map to a directory in the external file system makes it possible to present an external directory's contents in a different location and (in combination with the 'fallthrough' option) overlay one directory's contents on top of another. rdar://problem/72485443 Differential Revision: https://reviews.llvm.org/D94844	2021-02-02 14:56:17 +10:00
Kazu Hirata	e420343cf4	[TableGen] Use range-based for loops (NFC)	2021-02-01 20:55:09 -08:00
Kazu Hirata	8694450fb2	[TableGen] Use ListSeparator (NFC)	2021-02-01 20:55:07 -08:00
Kazu Hirata	ee92719619	[llvm] Use pop_back_val (NFC)	2021-02-01 20:55:05 -08:00
Fangrui Song	6659e20d2b	[LoopVectorize] Relax a FCmpInst assert to dyn_cast after D95690 The instruction may be `icmp eq i32`. Noticed in an internal Halide+wasm JIT test.	2021-02-01 19:28:45 -08:00
Matt Arsenault	e64514a4fe	AMDGPU: Fix dbg_value handling when forming soft clause bundles DBG_VALUES placed between memory instructions would change codegen. Skip over these and re-insert them after the bundle instead of giving up on bundling.	2021-02-01 22:16:35 -05:00
Mircea Trofin	ea66bf2f6b	[Utils] Add a switch controlling prefix warnings in UpdateTestChecks The switch controls both unused prefix warnings, and warnings about functions which differ under different runs for a prefix, and, thus, end up not having asserts for that prefix. (If the latter case spans to all functions, then the former case kicks in) The switch is on by default, and can be disabled. Differential Revision: https://reviews.llvm.org/D95829	2021-02-01 18:04:18 -08:00
Philip Reames	29acfb48a9	[x86] introduce no_callee_saved_registers attribute This is directly analogous to the existing no_caller_saved_registers, but with the opposite intention. A function or call so marked shifts the responsibility of spilling the usual CSRs to its caller. An indirect call site and callee which don't agree on the attribute are ill defined. The motivation for this change is that being able to prune callee saves (without modifying other details of the calling convention) is sometimes useful when generating stubs and adapters. There's no intention to expose this as a source language feature; this is expected to be used by frontends to implement adapters where warranted. Some specific examples of use cases: * GC compatible compiled code wants to call an externally defined library function without needing to track pointer values through CSRs. * debug enabled code wants to call precompiled library which doesn't provide enough information to track CSRs while preserving debug quality in caller. * adapter stub entering hand written assembler which doesn't follow normal calling conventions.	2021-02-01 16:19:14 -08:00
Rahman Lavaee	3beeb7f456	[obj2yaml, yaml2obj] Use Hex64 for BBAddressMap fields. This patch lets the yaml encoding use Hex64 values for NumBlocks, BB AddressOffset, BB Size, and BB Metadata. Additionally, it changes the decoded values in elf2yaml to uint64_t to match DataExtractor::getULEB128 return type. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D95767	2021-02-01 15:37:30 -08:00
Philip Reames	dfa3b662c8	[NFC][X86] Use CallBase interface to simplify code	2021-02-01 15:24:41 -08:00
Philip Reames	1af57bc4bf	[NFC][X86] Avoid redundant work inspecting callee	2021-02-01 15:24:41 -08:00
Petr Hosek	62cf2a8725	[InstrProfiling] Use !associated metadata for counters, data and values C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics. The !associated metadata may be attached to a global object declaration with a single argument that references another global object, and it gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded by the linker, setting up !associated metadata allows the linker to discard counters, data and values associated with that function symbol. Note that !associated metadata is only supported by ELF; it does not have any effect on non-ELF targets. Differential Revision: https://reviews.llvm.org/D76802	2021-02-01 15:01:43 -08:00
Patrick Oppenlander	019e9907bd	[llvm-objcopy] -O binary: consider SHT_NOBITS sections to be empty This is consistent with BFD objcopy. Previously llvm objcopy would allocate space for SHT_NOBITS sections often resulting in enormous binary files. New test case (binary-paddr.test %t6). Reviewed By: jhenderson, MaskRay Differential Revision: https://reviews.llvm.org/D95569	2021-02-01 15:01:25 -08:00
Hongtao Yu	7761552c73	[CSSPGO] Tweaking inlining with pseudo probes. Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites. Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95791	2021-02-01 13:56:40 -08:00
Philip Reames	95ddf0834f	[tests] highlight cornercase w/deref hoisting from D95815 The main point of committing this early is to have a negative test in tree. Nothing fails in the current tests if we implement this (currently unsound) optimization.	2021-02-01 13:32:39 -08:00
Sanjay Patel	a3cb545d77	[LoopVectorize] improve IR fast-math-flags propagation in reductions This is another step (see D95452) towards correcting fast-math-flags bugs in vector reductions. There are multiple bugs visible in the test diffs, and this is still not working as it should. We still use function attributes (rather than FMF) to drive part of the logic, but we are not checking for the correct FP function attributes. Note that FMF may not be propagated optimally on selects (example in https://llvm.org/PR35607 ). That's why I'm proposing to union the FMF of a fcmp+select pair and avoid regressions on existing vectorizer tests. Differential Revision: https://reviews.llvm.org/D95690	2021-02-01 16:21:36 -05:00
Florian Hahn	59b7da73aa	[ConstraintElimination] Add support for EQ predicates. A == B maps to A >= B && A <= B (https://alive2.llvm.org/ce/z/_dwxKn). This extends the constraint construction to return a list of constraints, which can be used to properly de-compose nested AND & OR.	2021-02-01 20:48:31 +00:00
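
The EQ decomposition described in commit 59b7da73aa can be illustrated with a minimal sketch. It is not the LLVM implementation: the names `decompose_eq`, `holds` and `check_equivalence` are hypothetical, constraints are modelled as plain tuples over Python integers, and signedness and overflow are ignored. It only shows that one equality can be represented as a list of two inequalities that hold together exactly when the operands are equal.

```python
import operator

# Hypothetical model for illustration only; this is not the LLVM API.
# A constraint is a tuple (op, lhs, rhs) over plain Python integers.
OPS = {">=": operator.ge, "<=": operator.le}


def decompose_eq(a, b):
    """Return the list of inequalities that together stand for a == b."""
    return [(">=", a, b), ("<=", a, b)]


def holds(constraints):
    """True when every constraint in the list is satisfied."""
    return all(OPS[op](lhs, rhs) for op, lhs, rhs in constraints)


def check_equivalence(values):
    """Compare the decomposition with direct equality over all pairs."""
    return all(
        holds(decompose_eq(a, b)) == (a == b)
        for a in values
        for b in values
    )
```

Returning a list rather than a single constraint is what lets one predicate expand into several conjuncts, which matches the commit's note about constraint construction returning a list.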

210661 Commits