Add pseudo instruction to allow early termination of pixel shader
anywhere based on the value of SCC. The intention is to use this
when a mask of live lanes is updated, e.g. live lanes in WQM pass.
This facilitates early termination of shaders even when EXEC is
incomplete, e.g. in non-uniform control flow.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D88777
Copy over the __eh_frame from the binary into the dSYM. This helps
kernel developers that are working with only dSYMs (i.e. no binaries)
when debugging a core file. This only kicks in when the __eh_frame
exists in the linked binary. Most of the time ld64 will remove the
section in favor of compact unwind info. When it is emitted, it's
generally small enough and should not bloat the dSYM.
rdar://69774935
Differential revision: https://reviews.llvm.org/D94460
InlineSpiller::foldMemoryOperand unties registers before an attempt to fold and
does not restore tied-ness in case of failure.
I do not have a particular test for demo of invalid behavior.
This is something of clean-up.
It is better to keep the behavior correct in case some time in future it happens.
Reviewers: reames, dantrushin
Reviewed By: dantrushin, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D94389
Add a warning when the timestmap doesn't match between the object file
and the debug map entry. We were already emitting such warnings for
archive members and swift interface files. This patch also unifies the
warning across all three.
rdar://65614640
Differential revision: https://reviews.llvm.org/D94536
- Merge 6706342f48bea80 -- no more libcxx_needs_site_config, we now
always need it
- Since it was always off in practice, write_config bitrot. Unbitrot
it so that it works
- Remove copy step and let concat step write to final location
immediately -- and fix copy destination directory
As a side effect, libcxx/include/BUILD.gn now has only a single
sources list, which means the cmake sync script should be able to
automatically sync additions and removals of .h files. On the flipside,
this means this file now must be updated after most changes to
libcxx/include/__config_site.in, and looking through the last few months
of changes this looks like it's going to be a wash.
This is a pretty classic optimization. Instead of processing symbol
records and copying them to temporary storage, do a first pass to
measure how large the module symbol stream will be, and then copy the
data into place in the PDB file. This requires defering relocation until
much later, which accounts for most of the complexity in this patch.
This patch avoids copying the contents of all live .debug$S sections
into heap memory, which is worth about 20% of private memory usage when
making PDBs. However, this is not an unmitigated performance win,
because it can be faster to read dense, temporary, heap data than it is
to iterate symbol records in object file backed memory a second time.
Results on release chrome.dll:
peak mem: 5164.89MB -> 4072.19MB (-1,092.7MB, -21.2%)
wall-j1: 0m30.844s -> 0m32.094s (slightly slower)
wall-j3: 0m20.968s -> 0m20.312s (slightly faster)
wall-j8: 0m19.062s -> 0m17.672s (meaningfully faster)
I gathered similar numbers for a debug, component build of content.dll
in Chrome, and the performance impact of this change was in the noise.
The memory usage reduction was visible and similar.
Because of the new parallelism in the PDB commit phase, more cores makes
the new approach faster. I'm assuming that most C++ developer machines
these days are at least quad core, so I think this is a win.
Differential Revision: https://reviews.llvm.org/D94267
promise is a header field but it is not guaranteed that it would be the third
field of the frame due to `performOptimizedStructLayout`.
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D94137
The load/store instruction will be transformed to amx intrinsics in the
pass of AMX type lowering. Prohibiting the pointer cast make that pass
happy.
Differential Revision: https://reviews.llvm.org/D94372
This patch resolves the suboptimal codegen described in http://llvm.org/pr47873 .
When CodeGenPrepare lowers select into a conditional branch, a freeze instruction is inserted.
It is then translated to `BRCOND(FREEZE(SETCC))` in SelDag.
The `FREEZE` in the middle of `SETCC` and `BRCOND` was causing a suboptimal code generation however.
This patch adds `BRCOND(FREEZE(cond))` -> `BRCOND(cond)` fold to DAGCombiner to remove the `FREEZE`.
To make this optimization sound, `BRCOND(UNDEF)` simply should nondeterministically jump to the branch or not, rather than raising UB.
It wasn't clear what happens when the condition was undef according to the comments in ISDOpcodes.h, however.
I updated the comments of `BRCOND` to make it explicit (as well as `BR_CC`, which is also a conditional branch instruction).
Note that it diverges from the semantics of `br` instruction in IR, which is explicitly UB.
Since the UB semantics was necessary to explain optimizations that use branching conditions, and SelDag doesn't seem to have such optimization, I think this divergence is okay.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D92015
This is a small patch stating that a nocapture pointer cannot be returned.
Discussed in D93189.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94386
Previously, instructions which could be
expressed as VOP3 in addition to another
encoding had a _e64 suffix on the tablegen
record name, while those
only available as VOP3 did not. With this
patch, all VOP3s will have the _e64 suffix.
The assembly does not change, only the mir.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D94341
Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
This patches remaining tests, and patches lit.local.cfg to block future
such cases (until we flip FileCheck's flag)
Differential Revision: https://reviews.llvm.org/D94556
Adding sample-profile-suffix-elision-policy attribute to functions whose linkage names are uniquefied so that their unique name suffix won't be trimmed when applying AutoFDO profiles.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D94455
Most uses of this class just use the default MallocAllocator.
As this contains no fields, we can use the empty base optimisation for BumpPtrAllocatorImpl and save 8 bytes of padding for most use cases.
This prevents using a class that is marked as `final` as the `AllocatorT` template argument.
In one must use an allocator that has been marked as `final`, the simplest way around this is a proxy class.
The class should have all the methods that `AllocaterBase` expects and should forward the calls to your own allocator instance.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D94439
This seems to only have overridden cold handling, which we probably
shouldn't do. As far as I can tell the wrapper library functions are
still inlined as appropriate.
This makes sure that assembly output actually can be assembled.
Set the correct MCExpr relocations specifier VK_PAGEOFF - and also
set VK_PAGE consistently even though it's not visible in the assembly
output.
Differential Revision: https://reviews.llvm.org/D94365
This change modifies the source location formatting from:
LineNumber.Discriminator
to:
LineNumber:ColumnNumber.Discriminator
The motivation here is to enhance location information for inline replay that currently exists for the SampleProfile inliner. This will be leveraged further in inline replay for the CGSCC inliner in the related diff.
The ReplayInlineAdvisor is also modified to read the new format and now takes into account the callee for greater accuracy.
Testing:
ninja check-llvm
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D94333
The custom expansion of select operations in the RISC-V backend
interferes with the matching of cmov instructions. Legalizing
select when the Zbt extension is available solves that problem.
Reviewed By: lenary, craig.topper
Differential Revision: https://reviews.llvm.org/D93767
LoopVectorize uses some utilities on LoopVersioning, but doesn't actually use it for, you know, versioning. As a result, the precondition LoopVersioning expects is too strong for this user. At the moment, LoopVectorize supports any loop with a unique exit block, so check the same precondition here.
Really, the whole class structure here is a mess. We should separate the actual versioning from the metadata updates, but that's a bigger problem.
This replicates existing and/or tests to also test variants using
select. This should help us get a more accurate view on which
optimizations we're missing if we disable the select -> and/or
fold.
This relates to the ongoing effort to support vectorization of multiple exit loops (see D93317).
The previous code assumed that LCSSA phis were always single entry before the vectorizer ran. This was correct, but only because the vectorizer allowed only a single exiting edge. There's nothing in the definition of LCSSA which requires single entry phis.
A common case where this comes up is with a loop with multiple exiting blocks which all reach a common exit block. (e.g. see the test updates)
Differential Revision: https://reviews.llvm.org/D93725
Similar to D94125, derive `willreturn` for functions that are `readonly` and
`mustprogress` in FunctionAttrs.
To quote the reasoning from D94125:
Since D86233 we have `mustprogress` which, in combination with
`readonly`, implies `willreturn`. The idea is that every side-effect
has to be modeled as a "write". Consequently, `readonly` means there
is no side-effect, and `mustprogress` guarantees that we cannot "loop"
forever without side-effect.
Reviewed By: jdoerfert, nikic
Differential Revision: https://reviews.llvm.org/D94502
This is a partial fix for https://bugs.llvm.org/show_bug.cgi?id=44403.
Folding gep p, q-p to q is only legal if p and q have the same
provenance. This fold should probably be guarded by something like
getUnderlyingObject(p) == getUnderlyingObject(q).
This patch is a partial fix that removes the special handling for
gep p, 0-p, which will fold to a null pointer, which would certainly
not pass an underlying object check (unless p is also null, in which
case this would fold trivially anyway). Folding to a null pointer
is particularly problematic due to the special handling it receives
in many places, making end-to-end miscompiles more likely.
Differential Revision: https://reviews.llvm.org/D93820
We can use a 0 immediate to avoid needing to materialize 0 into
an FPR first.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D94459
If SETO/SETUO aren't legal, they'll be expanded and we'll end up
with 3 comparisons.
SETONE is equivalent to (SETOGT || SETOLT)
so if one of those operations is supported use that expansion. We
don't need both since we can commute the operands to make the other.
SETUEQ can be implemented with !(SETOGT || SETOLT) or (SETULE && SETUGE).
I've only implemented the first because it didn't look like most of the
affected targets had legal SETULE/SETUGE.
Reviewed By: frasercrmck, tlively, nemanjai
Differential Revision: https://reviews.llvm.org/D94450
Remove the hack adding /usr/local paths on FreeBSD and DragonFlyBSD.
It does not seem to be necessary today, and it breaks cross builds.
Differential Revision: https://reviews.llvm.org/D94491