Running this script gives
```
"llvm-project/llvm/./utils/wciia.py", line 56
if word == "N:":
TabError: inconsistent use of tabs and spaces in indentation
```
Emacs' whitespace-mode shows that the indentation mixes spaces and tabs:
```
for·line·in·code_owners_file:$
····for·word·in·line.split():$
» if·word·==·"N:":$
» » name·=·line[2:].strip()$
» » if·code_owner:$
» » » process_code_owner(code_owner)$
» » » code_owner·=·{}$
```
I formatted the script with `yapf`, and it now runs correctly.
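As a rough illustration of how such lines can be located before reformatting, the sketch below lists the lines whose leading whitespace contains a tab. The helper name `tab_indented_lines` and the sample text are assumptions made for this example; they are not part of wciia.py.

```python
def tab_indented_lines(source: str) -> list:
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # The indentation is everything before the first non-blank character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            flagged.append(number)
    return flagged


# Hypothetical excerpt that mirrors the space/tab mix shown above.
sample = (
    "for line in code_owners_file:\n"
    "    for word in line.split():\n"
    "\tif word == \"N:\":\n"
    "\t\tname = line[2:].strip()\n"
)
print(tab_indented_lines(sample))
```

In a file that otherwise indents with spaces, the flagged lines are the ones Python 3 rejects with TabError.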
This code was re-implementing the same-BB case of
isPotentiallyReachable(). Historically, this was done because
CaptureTracking used additional caching for local dominance
queries. Now that it is no longer needed, the code is effectively
the same as isPotentiallyReachable().
The only differences are the extra checks for invoke/phis. These are
misleading checks related to dominance in the value availability
sense that are not relevant for control reachability. The invoke
check was correct but redundant in that invokes are always
terminators, so `I` could never come before the invoke. The phi
check is a matter of interpretation (should an earlier phi node be
considered reachable from a later phi node in the same block?)
but ultimately doesn't matter because phis don't capture anyway.
Reapply after adjusting the synchronized.m test case, where the
TODO is now resolved. The pointer is only captured on the exception
handling path.
-----
For the CapturesBefore tracker, it is sufficient to check that
I cannot reach BeforeHere. This does not necessarily require
that BeforeHere dominates I; it can also occur if the capture
happens on an entirely disjoint path.
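The disjoint-path case can be pictured on a small diamond-shaped CFG. The sketch below is a block-level simplification under stated assumptions: the graph, the block names, and the helpers `can_reach` and `dominates` are hypothetical and do not reflect the CaptureTracking implementation.

```python
from collections import deque


def can_reach(cfg, src, dst, removed=None):
    """Return True if dst is reachable from src; 'removed' is treated as deleted."""
    if src == removed:
        return False
    seen = {src}
    worklist = deque([src])
    while worklist:
        block = worklist.popleft()
        if block == dst:
            return True
        for succ in cfg.get(block, ()):
            if succ != removed and succ not in seen:
                seen.add(succ)
                worklist.append(succ)
    return False


def dominates(cfg, entry, d, n):
    """d dominates n if every path from entry to n passes through d.

    Assumes n is reachable from entry.
    """
    return d == n or not can_reach(cfg, entry, n, removed=d)


# Hypothetical diamond: the capture sits in "left", BeforeHere sits in "right".
cfg = {"entry": ["left", "right"], "left": ["exit"], "right": ["exit"], "exit": []}
print(dominates(cfg, "entry", "right", "left"))  # BeforeHere does not dominate I
print(can_reach(cfg, "left", "right"))           # yet I cannot reach BeforeHere
```

Both calls print False: a dominance test alone would give up here, while the reachability test shows the capture cannot happen before BeforeHere.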
This change was previously accepted in D90688, but had to be
reverted due to large compile-time impact in some cases: It
increases the number of reachability queries that are performed.
After recent changes, the compile-time impact is largely mitigated,
so I'm reapplying this patch. The remaining compile-time impact
is largely proportional to changes in code-size.
The intention is to be able to run this from additional locations (such as shuffle combining) in the future.
Reapplies rGb95a103808ac (after reversion at rGc012a388a15b), with SSE3/SSSE3 typo fix, test added at rG0afb10de1449.
This reverts commit 6b8b43e7af3074124e3c9e429e1fb08165799be4.
This causes a clang test to fail (CodeGenObjC/synchronized.m).
Revert until I can figure out whether that's an expected change.
For the CapturesBefore tracker, it is sufficient to check that
I cannot reach BeforeHere. This does not necessarily require
that BeforeHere dominates I; it can also occur if the capture
happens on an entirely disjoint path.
This change was previously accepted in D90688, but had to be
reverted due to large compile-time impact in some cases: It
increases the number of reachability queries that are performed.
After recent changes, the compile-time impact is largely mitigated,
so I'm reapplying this patch. The remaining compile-time impact
is largely proportional to changes in code-size.
This is based on the test from D90688, without the argmemonly
attribute. The argmemonly attribute would guarantee no modref
by itself and the question of captures would not arise in the
first place.
The default AsmPrinter prints GV in comments;
AIX should do so too.
This also fixes LLVM :: CodeGen/Generic/inline-asm-mem-clobber.ll.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D102534
This patch makes it possible to do call site specific deductions
for AAValueSimplification and AAIsDead.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84722
Reachability queries are very expensive, and currently performed
for each instruction we look at, even though most of them will
not lead to a capture and are thus ultimately irrelevant. It is
more efficient to walk a few unnecessary instructions than to
perform unnecessary reachability queries.
Theoretically, this may produce worse results, because the additional
instructions considered may cause us to hit the use count limit
earlier. In practice, this does not appear to be a problem, e.g.
on test-suite O3 we report only one more captured-before with this
change, with no resulting codegen differences.
This makes PointerMayBeCapturedBefore() significantly cheaper in
practice, hopefully allowing it to be used in more places.
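The trade-off can be sketched with a toy use walk. This is only an illustration under assumptions: the `walk` helper, the instruction names, and the sets standing in for the capture and reachability checks are hypothetical and are not the CaptureTracking code.

```python
def walk(root, users, captures, reaches, defer_reachability):
    """Walk the transitive users of root.

    Returns (captured, instructions_visited, reachability_queries).
    """
    visited = 0
    queries = 0
    worklist = list(users.get(root, ()))
    while worklist:
        inst = worklist.pop()
        visited += 1
        if not defer_reachability:
            # Eager variant: one reachability query per visited instruction.
            queries += 1
            if inst not in reaches:
                continue  # pruned, so its users are never walked
        if inst in captures:
            if defer_reachability:
                # Deferred variant: query only once a capture is found.
                queries += 1
                if inst not in reaches:
                    continue
            return True, visited, queries
        worklist.extend(users.get(inst, ()))
    return False, visited, queries


# Hypothetical use graph: only the gep1 chain can reach BeforeHere.
users = {"p": ["gep1", "gep2"], "gep1": ["load1"], "gep2": ["load2", "store_escape"]}
captures = {"store_escape"}
reaches = {"gep1", "load1"}

print(walk("p", users, captures, reaches, defer_reachability=False))
print(walk("p", users, captures, reaches, defer_reachability=True))
```

On this input both variants report no capture; the deferred one visits five instructions instead of three but issues one reachability query instead of three.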
This patch introduces source loading and pruning functions.
This will allow the DWARF embedded source to be used and the same code to be shared with the JSON printout.
No functional changes.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102539
This patch adds support for GCC's -fstack-usage flag. With this flag, a stack
usage file (i.e., .su file) is generated for each input source file. The format
of the stack usage file is also similar to what is used by GCC. For each
function defined in the source file, a line with the following information is
produced in the .su file.
<source_file>:<line_number>:<function_name> <size_in_byte> <static/dynamic>
"Static" means that the function's frame size is static and the size info is an
accurate reflection of the frame size. "Dynamic" means the function's
frame size can only be determined at run-time because the function manipulates
the stack dynamically (e.g., due to variable size objects). The size info only
reflects the size of the fixed size frame objects in this case and therefore is
not a reliable measure of the total frame size.
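For tooling that consumes these files, a record in the layout described above could be parsed roughly as follows. This is a sketch under assumptions: the `StackUsage` type, the `parse_su_line` helper, and the sample record are hypothetical, and the code assumes whitespace-separated trailing fields and a source path without ':'.

```python
from typing import NamedTuple


class StackUsage(NamedTuple):
    source_file: str
    line_number: int
    function_name: str
    size_in_bytes: int
    kind: str  # "static" or "dynamic"


def parse_su_line(line: str) -> StackUsage:
    """Parse one record laid out as described in this message."""
    # Peel the two trailing fields off from the right.
    location, size, kind = line.rsplit(None, 2)
    # Split from the left so a "::" inside the function name is preserved.
    # Assumption: the source path itself contains no ':'.
    source_file, line_number, function_name = location.split(":", 2)
    if kind not in ("static", "dynamic"):
        raise ValueError(f"unexpected qualifier: {kind!r}")
    return StackUsage(source_file, int(line_number), function_name, int(size), kind)


# Hypothetical record, not taken from real compiler output.
record = parse_su_line("test.c:5:ns::foo 32 static")
print(record)
```

For a "dynamic" record, the parsed size should be read as a lower bound on the frame size rather than the total, as explained above.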
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D100509
findIndirectCallFunctionSamples leaves Sum uninitialized if it returns an empty vector. We don't really use Sum in this case (although we do make a copy of it, which isn't used either), so initialize the value to zero to at least silence the static analysis warning.
These checks are not specific to the instruction based variant of
isPotentiallyReachable(), they are equally valid for the basic
block based variant. Move them there, to make sure that switching
between the instruction and basic block variants cannot introduce
regressions.
Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - vector integer multiplies are pipelined - all Port0, throughput = 2 @ 128bits, 1 @ 64bits.
Noticed while checking reduction costs - now that we can use in-order models in llvm-mca, the atom model is the "worst case scenario" we have in x86.
All the uses that we have for collectBitParts revolve around us matching down to an operation with a single root value - I don't think we're intending to change that (and a lot of collectBitParts assumes it).
The binops cases (OR/FSHL/FSHR) already check if the providers are the same, but that would still mean we waste time collecting through unaryops before getting to them.
Currently we only match bswap intrinsics from or(shl(),lshr()) style patterns when we could often match bitreverse intrinsics almost as cheaply.
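To illustrate why the two idioms are close relatives, the sketch below writes both a 32-bit byte swap and a 32-bit bit reversal as or(shl(), lshr()) style mask-and-shift expressions. It only shows the kind of source pattern involved; the function names are assumptions for this example and the code is not the matcher itself.

```python
def bswap32(x: int) -> int:
    """Byte swap of a 32-bit value written as or(shl(), lshr()) terms."""
    return (
        ((x & 0x000000FF) << 24)
        | ((x & 0x0000FF00) << 8)
        | ((x >> 8) & 0x0000FF00)
        | (x >> 24)
    )


def bitreverse32(x: int) -> int:
    """Bit reversal of a 32-bit value: three swap stages, then a byte swap."""
    x = ((x & 0x55555555) << 1) | ((x >> 1) & 0x55555555)  # swap adjacent bits
    x = ((x & 0x33333333) << 2) | ((x >> 2) & 0x33333333)  # swap bit pairs
    x = ((x & 0x0F0F0F0F) << 4) | ((x >> 4) & 0x0F0F0F0F)  # swap nibbles
    return bswap32(x)


print(hex(bswap32(0x11223344)))
print(hex(bitreverse32(0x00000001)))
```

Both functions assume the input already fits in 32 bits; in each one, every bit of the result comes from exactly one bit of the single input value.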
Differential Revision: https://reviews.llvm.org/D90170
Reapply rG5ed56a821c06 (after being reverted by rG7aa89c4a22fd) - don't take reference from struct that will be erased in X86FrameLowering::eliminateCallFramePseudoInstr
I'm also adding an explicit data layout, so we can
confirm that alignment requirements/prefs are met.
I tried to use complete/scripted CHECK lines here,
but that fails with 1 of the globals, and I'm not sure why.
Use comesBefore() instead of performing an instruction walk. In
line with the previous implementation, instructions are considered
to reach themselves.
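The equivalence for the straight-line, same-block case can be sketched as below. The helper names and the instruction list are hypothetical, and the precomputed position map only stands in for the ordering that comesBefore() consults; reachability through a loop back-edge is outside this sketch.

```python
def reaches_by_walk(block, src, dst):
    """Scan forward from src; an instruction is considered to reach itself."""
    for inst in block[block.index(src):]:
        if inst == dst:
            return True
    return False


def reaches_by_order(position, src, dst):
    """Same answer from a single position comparison; '<=' keeps self-reachability."""
    return position[src] <= position[dst]


# Hypothetical basic block, listed in program order.
block = ["load", "add", "store", "ret"]
position = {inst: index for index, inst in enumerate(block)}

print(reaches_by_walk(block, "add", "store"), reaches_by_order(position, "add", "store"))
print(reaches_by_walk(block, "store", "add"), reaches_by_order(position, "store", "add"))
```

The comparison replaces a scan whose cost grows with the block length by a constant-time check, given that the positions are available.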
The system's network API is in libnetwork.so, so we need to link to
it explicitly on Haiku. This patch is similar to https://reviews.llvm.org/D97633.
Patch by Niels Reedijk. Thanks Niels!
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D98405