1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00
Commit Graph

210661 Commits

Author SHA1 Message Date
Thomas Symalla
298055ddcd Fixed includes. 2021-02-02 09:14:54 +01:00
Thomas Symalla
a7e61e92bb Reverted whitespace changes.
Differential Revision: https://reviews.llvm.org/D90968
2021-02-02 09:14:54 +01:00
Thomas Symalla
b7a94c0dc2 Added missing includes. 2021-02-02 09:14:54 +01:00
Thomas Symalla
ba95ff2c1c Renamed med3 opcode, removed superfluous copy. 2021-02-02 09:14:54 +01:00
Thomas Symalla
087ff79f1a Removed the generic virtual register creations. Reworked the tests. 2021-02-02 09:14:54 +01:00
Thomas Symalla
f548399826 Implemented a MED3_S32 GIR opcode. 2021-02-02 09:14:53 +01:00
Thomas Symalla
ee664c032b Added and used new target pseudo for v_cvt_pk_i16_i32, changes due to code review. 2021-02-02 09:14:53 +01:00
Thomas Symalla
43278a1cb3 Formatting changes 2021-02-02 09:14:53 +01:00
Thomas Symalla
46f1f49a56 Formatting changes. 2021-02-02 09:14:53 +01:00
Thomas Symalla
8aacc5adcd Updating formatting changes. 2021-02-02 09:14:53 +01:00
Thomas Symalla
8955159f2d Resolve formatting changes. 2021-02-02 09:14:53 +01:00
Thomas Symalla
38676d07e0 Code changes yielded from review. 2021-02-02 09:14:53 +01:00
Thomas Symalla
4c17035470 Fixed tests. 2021-02-02 09:14:53 +01:00
Thomas Symalla
ea43201600 Move step to PreLegalizer 2021-02-02 09:14:53 +01:00
Thomas Symalla
122a71a9f1 Move Combiner to PreLegalize step 2021-02-02 09:14:53 +01:00
Thomas Symalla
75cc84f30a Renamed identifiers in lit 2021-02-02 09:14:53 +01:00
Thomas Symalla
9a5185f66f Reverted unintended git-format change. 2021-02-02 09:14:52 +01:00
Thomas Symalla
5e6c38bc76 Fixed the lit tests and a bug in the implementation. 2021-02-02 09:14:52 +01:00
Thomas Symalla
8206639df8 Refactored the pattern matching. 2021-02-02 09:14:52 +01:00
Thomas Symalla
2f722f6a50 Renames 2021-02-02 09:14:52 +01:00
Thomas Symalla
555bb61a39 Added early exit. 2021-02-02 09:14:52 +01:00
Thomas Symalla
1f11e485f5 Added comments. 2021-02-02 09:14:52 +01:00
Thomas Symalla
ae887237b4 clang-format 2021-02-02 09:14:52 +01:00
Thomas Symalla
1663fb4919 Added clamp i64 to i16 global isel pattern. 2021-02-02 09:14:52 +01:00
Craig Topper
6d008ca25f [RISCV] Replace NoX0 SDNodeXForm with a ComplexPattern to do the selection of the VL operand.
I think this is a more standard way of doing this.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D95833
2021-02-02 00:08:58 -08:00
Wenlei He
4a2e76b6b1 [CSSPGO] Call site prioritized inlining for sample PGO
This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximize the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`).

Motivation

With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO.
 - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat.
 - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately.
 - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inacurate hotness heuristic. With pseduo-probe and the change above, this is no longer an issue for CSSPGO.

New FDO Inliner

Putting all the requirement for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here're reasons why each component is needed.
 - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655.
 - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check.
 - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites.
 - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path.

Note that the new inliner avoids repeatedly evaluating same set of call site, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of same call site (TODO).

Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO.

Tunings and knobs

I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO.

Results

Evaluated with an internal LLVM fork couple months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner show ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it).

Differential Revision: https://reviews.llvm.org/D94001
2021-02-01 23:46:34 -08:00
Craig Topper
0dc42fa551 [SelectionDAG] Prevent scalable vector warning from ComputeNumSignBits on extract_vector_elt on a scalable vector. 2021-02-01 23:42:03 -08:00
Puyan Lotfi
293c83b00b Revert "[AArch64] Homogeneous Prolog and Epilog Size Optimization"
This reverts commit 0426be3df6180747bd68706db87a70580f064f0f.

Reverting due to some expensive-checks failures in tests.
2021-02-02 02:33:44 -05:00
Fangrui Song
511ed9320f [test] Fix unused FileCheck prefixes in test/Reduce 2021-02-01 23:05:46 -08:00
Fangrui Song
712f0e3c09 [test] Fix unused FileCheck prefixes in clang-tidy and one llvm/test/Reduce test 2021-02-01 22:51:29 -08:00
Lang Hames
1856abe361 [ORC] Clear unused materializing info entries.
Once a symbol is Ready its MaterializingInfo entry is unused and can be removed
to free up some memory.
2021-02-02 17:47:32 +11:00
Gil Rapaport
1307269e6f [SCEV] Apply loop guards to divisibility tests
Extend applyLoopGuards() to take into account conditions/assumes proving some
value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the
loop unroller and the loop vectorizer identify more loops as not requiring
remainder loops.

Differential Revision: https://reviews.llvm.org/D95521
2021-02-02 08:09:39 +02:00
Kyungwoo Lee
28c9f1933e [AArch64] Homogeneous Prolog and Epilog Size Optimization
Prologs and epilogs handle callee-save registers and tend to be irregular with
different immediate offsets that are not often handled by the MachineOutliner.
Commit D18619/a5335647d5e8 (combining stack operations) stretched irregularity
further.

This patch tries to emit homogeneous stores and loads with the same offset for
prologs and epilogs respectively. We have observed that this canonicalizes
(homogenizes) prologs and epilogs significantly and results in a greatly
increased chance of outlining, resulting in a code size reduction.

Despite the above results, there are still size wins to be had that the
MachineOutliner does not provide due to the special handling X30/LR. To handle
the LR case, his patch custom-outlines prologs and epilogs in place. It does
this by doing the following:

  * Injects HOM_Prolog and HOM_Epilog pseudo instructions during a Prolog and
    Epilog Injection Pass.
  * Lowers and optimizes said pseudos in a AArchLowerHomogneousPrologEpilog Pass.
  * Outlined helpers are created on demand. Identical helpers are merged by the linker.
  * An opt-in flag is introduced to enable this feature. Another threshold flag
    is also introduced to control the aggressiveness of outlining for application's need.

This reduced an average of 4% of code size on LLVM-TestSuite/CTMark targeting arm64/-Oz.

Differential Revision: https://reviews.llvm.org/D76570
2021-02-02 00:26:51 -05:00
Nathan Hawes
574038ba44 [VFS] Add support to RedirectingFileSystem for mapping a virtual directory to one in the external FS.
Previously file entries in the -ivfsoverlay yaml could map to a file in the
external file system, but directories had to list their contents in the form of
other file entries or directories. Allowing directory entries to map to a
directory in the external file system makes it possible to present an external
directory's contents in a different location and (in combination with the
'fallthrough' option) overlay one directory's contents on top of another.

rdar://problem/72485443
Differential Revision: https://reviews.llvm.org/D94844
2021-02-02 14:56:17 +10:00
Kazu Hirata
e420343cf4 [TableGen] Use range-based for loops (NFC) 2021-02-01 20:55:09 -08:00
Kazu Hirata
8694450fb2 [TableGen] Use ListSeparator (NFC) 2021-02-01 20:55:07 -08:00
Kazu Hirata
ee92719619 [llvm] Use pop_back_val (NFC) 2021-02-01 20:55:05 -08:00
Fangrui Song
6659e20d2b [LoopVectorize] Relax a FCmpInst assert to dyn_cast after D95690
The instruction may be `icmp eq i32`. Noticed in an internal Halide+wasm JIT test.
2021-02-01 19:28:45 -08:00
Matt Arsenault
e64514a4fe AMDGPU: Fix dbg_value handling when forming soft clause bundles
DBG_VALUES placed between memory instructions would change
codegen. Skip over these and re-insert them after the bundle instead
of giving up on bundling.
2021-02-01 22:16:35 -05:00
Mircea Trofin
ea66bf2f6b [Utils] Add a switch controlling prefix warnings in UpdateTestChecks
The switch controls both unused prefix warnings, and warnings about
functions which differ under different runs for a prefix, and, thus, end
up not having asserts for that prefix.

(If the latter case spans to all functions, then the former case kicks
in)

The switch is on by default, and can be disabled.

Differential Revision: https://reviews.llvm.org/D95829
2021-02-01 18:04:18 -08:00
Philip Reames
29acfb48a9 [x86] introduce no_callee_saved_registers attribute
This is directly analogous to the existing no_caller_saved_registers, but with the opposite intention.  A function or call so marked shifts the responsibility of spilling the usual CSRs to it's caller.

An indirect call site and callee which don't agree on the attribute is ill defined.

The motivation for this change is that being able to prune callee saves (without modifying other details of the calling convention) is sometimes useful when generating stubs and adapters.  There's no intention to expose this as a source language feature; this is expected to be used by frontends to implement adapters where warranted.

Some specific examples of use cases:
* GC compatible compiled code wants to call an externally defined library function without needing to track pointer values through CSRs.
* debug enabled code wants to call precompiled library which doesn't provide enough information to track CSRs while preserving debug quality in caller.
* adapter stub entering hand written assembler which doesn't follow normal calling conventions.
2021-02-01 16:19:14 -08:00
Rahman Lavaee
3beeb7f456 [obj2yaml, yaml2obj] Use Hex64 for BBAddressMap fields.
This patch let the yaml encoding use Hex64 values for NumBlocks, BB AddressOffset, BB Size, and BB Metadata.
Additionally, it changes the decoded values in elf2yaml to uint64_t to match DataExtractor::getULEB128 return type.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D95767
2021-02-01 15:37:30 -08:00
Philip Reames
dfa3b662c8 [NFC][X86] Use CallBase interface to simplify code 2021-02-01 15:24:41 -08:00
Philip Reames
1af57bc4bf [NFC][X86] Avoid redundant work inspecting callee 2021-02-01 15:24:41 -08:00
Petr Hosek
62cf2a8725 [InstrProfiling] Use !associated metadata for counters, data and values
C identifier name input sections such as __llvm_prf_* are GC roots so
they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the
C identifier name semantics.

The !associated metadata may be attached to a global object declaration
with a single argument that references another global object, and it
gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded
by the linker, setting up !associated metadata allows linker to discard
counters, data and values associated with that function symbol.

Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.

Differential Revision: https://reviews.llvm.org/D76802
2021-02-01 15:01:43 -08:00
Patrick Oppenlander
019e9907bd [llvm-objcopy] -O binary: consider SHT_NOBITS sections to be empty
This is consistent with BFD objcopy.

Previously llvm objcopy would allocate space for SHT_NOBITS sections
often resulting in enormous binary files.

New test case (binary-paddr.test %t6).

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D95569
2021-02-01 15:01:25 -08:00
Hongtao Yu
7761552c73 [CSSPGO] Tweaking inlining with pseudo probes.
Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites.

Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D95791
2021-02-01 13:56:40 -08:00
Philip Reames
95ddf0834f [tests] highlight cornercase w/deref hoisting from D95815
The main point of committing this early is to have a negative test in tree.  Nothing fails in the current tests if we implement this (currently unsound) optimization.
2021-02-01 13:32:39 -08:00
Sanjay Patel
a3cb545d77 [LoopVectorize] improve IR fast-math-flags propagation in reductions
This is another step (see D95452) towards correcting fast-math-flags
bugs in vector reductions.

There are multiple bugs visible in the test diffs, and this is still
not working as it should. We still use function attributes (rather
than FMF) to drive part of the logic, but we are not checking for
the correct FP function attributes.

Note that FMF may not be propagated optimally on selects (example
in https://llvm.org/PR35607 ). That's why I'm proposing to union the
FMF of a fcmp+select pair and avoid regressions on existing vectorizer
tests.

Differential Revision: https://reviews.llvm.org/D95690
2021-02-01 16:21:36 -05:00
Florian Hahn
59b7da73aa [ConstraintElimination] Add support for EQ predicates.
A == B map to A >= B && A <= B
(https://alive2.llvm.org/ce/z/_dwxKn).

This extends the constraint construction to return a list of
constraints, which can be used to properly de-compose nested AND & OR.
2021-02-01 20:48:31 +00:00