Summary:
This was added in https://reviews.llvm.org/D2449, but I'm not sure it's
necessary since an inalloca value is never a Constant (it should be an
AllocaInst).
Reviewers: hans, rnk
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79350
We check that C is finite and strictly positive, but there's no need to
check that it's normal too. exp2 should be just as accurate on denormals
as pow is.
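A minimal sketch of the relaxed guard, using a plain double as a stand-in for the constant base (the in-tree check presumably operates on the APFloat constant): finite and strictly positive is enough, and a denormal base no longer bails out.

```cpp
#include <cmath>

// Hypothetical helper, not the SimplifyLibCalls code: the base only has to be
// finite and strictly positive; being denormal is fine.
static bool isFoldablePowBase(double C) {
  return std::isfinite(C) && C > 0.0;
}
```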
Differential Revision: https://reviews.llvm.org/D79413
Size==0 triggers `assert(Size != 0)` in mapped_file_region::init.
I plan to use an empty file in D79339 (llvm-objcopy --dump-section).
According to POSIX, "If len is zero, mmap() shall fail and no mapping
shall be established." Just special-case Size == 0 to use
createInMemoryBuffer.
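A rough illustration of why the special case is needed, assuming a plain POSIX mmap call (this is not the actual MemoryBuffer implementation):

```cpp
#include <cstddef>
#include <sys/mman.h>

// mmap() must fail when len == 0, so a zero-sized request has to take a
// different path (here simply signalled to the caller), e.g. an in-memory
// buffer instead of a mapped_file_region.
static void *mapFile(int FD, size_t Size) {
  if (Size == 0)
    return nullptr; // caller should fall back to an in-memory buffer
  return ::mmap(nullptr, Size, PROT_READ, MAP_PRIVATE, FD, 0);
}
```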
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D79338
LSR has some logic that tries to aggressively reuse registers in
formulae. This can lead to sub-optimal decisions in complex loops where
the backend is trying to use shouldFavorPostInc. This disables the
re-use in those situations.
Differential Revision: https://reviews.llvm.org/D79301
VMEM soft clauses only contain VMEM and FLAT instructions. Teaching
GCNHazardRecognizer::checkSoftClauseHazards that other kinds of
instructions will naturally break the clause means there are far fewer
cases where it has to insert an s_nop instruction to forcibly break the
clause.
Differential Revision: https://reviews.llvm.org/D79353
I don't think there's any good reason not to do this transformation when
the pow has multiple uses.
Differential Revision: https://reviews.llvm.org/D79407
This reverts commit 8baa0b9439b5788bc5a0a2ee45dfda01b7a5a43f. This broke the
LLDB Green Dragon bot where htonl is getting miscompiled on macOS 10.14 and 10.15
SDKs, causing networking tests to fail as IP addresses were being inverted
(e.g., 127.0.0.1 became 1.0.0.127 with an enabled modules build).
Reverting until this is fixed.
Marking a section as ALLOC tells the ELF loader to load the section into memory.
As we do not want to load the notes into VRAM, the flag should not be there.
On AMDHSA, .note is still marked as ALLOC; apparently this is currently
needed for OpenCL (see https://reviews.llvm.org/D74995).
Differential Revision: https://reviews.llvm.org/D76278
A PREDICATE_CAST(PREDICATE_CAST(X)) can be converted to a
PREDICATE_CAST(X) as the operation can convert between any forms of
predicates (v4i1/v8i1/v16i1/i32). Unfortunately I got the type wrong on
one of the rarer converts, which would lead to invalid nodes during
isel. This fixes it up to use the correct type.
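A hedged sketch of the fold (PredCastOpc stands in for the MVE predicate cast opcode; this is not the exact backend code), showing the key point that the folded node must use the outer cast's result type:

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Fold cast(cast(X)) -> cast(X). Note the result type comes from N (the outer
// cast), which is what one of the rarer cases previously got wrong.
static SDValue foldDoublePredicateCast(SDNode *N, SelectionDAG &DAG,
                                       unsigned PredCastOpc) {
  SDValue Src = N->getOperand(0);
  if (Src.getOpcode() != PredCastOpc)
    return SDValue();
  return DAG.getNode(PredCastOpc, SDLoc(N), N->getValueType(0),
                     Src.getOperand(0));
}
```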
Differential Revision: https://reviews.llvm.org/D79402
This adds a general combine that can be used to fold:
or(zext(OP(x)), shl(zext(OP(y)),bw/2))
-->
OP(or(zext(x), shl(zext(y),bw/2)))
This allows us to widen 'concat-able' style or+zext patterns. I've just set this up for BSWAP, but we could use this for other similar ops (BITREVERSE, for instance).
We already do something similar for bitop(bswap(x),bswap(y)) --> bswap(bitop(x,y))
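As a scalar sanity check of that existing identity (illustration only, not the DAG code), byte-swapping commutes with bytewise logic ops:

```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint32_t X = 0x12345678u, Y = 0xA1B2C3D4u;
  // bitop(bswap(x), bswap(y)) == bswap(bitop(x, y)) for a bytewise bitop.
  assert((__builtin_bswap32(X) | __builtin_bswap32(Y)) ==
         __builtin_bswap32(X | Y));
  return 0;
}
```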
Fixes PR45715
Reviewed By: @lebedev.ri
Differential Revision: https://reviews.llvm.org/D79041
Unless we're truncating an 'all-bits' result, using PACKSS for vXi64->vXi32 truncation causes problems with later combines, as ComputeNumSignBits struggles to see through BITCASTs to smaller types. If we don't use PACKSS in these cases then we fall back to shuffles, which are usually just as good.
The --output-target documentation has slightly rotted, as the default is
no longer purely based on the input file format, but also on the value of
--input-target. This patch updates the documentation to make this
explicit.
Reviewed by: MaskRay, alexshap
Differential Revision: https://reviews.llvm.org/D79318
Make the kind of cost explicit throughout the cost model, which,
apart from making the cost clear, will allow the generic parts to
calculate better costs. It will also allow some backends to
approximate and correlate the different costs if they wish. Another
benefit is that it will also help simplify the cost model around
immediate and intrinsic costs, where we currently have multiple APIs.
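Roughly, the shape of the change (the enumerator names are modeled on TargetTransformInfo's cost kinds but should be treated as illustrative here, as should the query signature):

```cpp
// Callers now say which kind of cost they are asking for, instead of the kind
// being implied by which of several overlapping APIs gets called.
enum class TargetCostKind {
  RecipThroughput, // reciprocal throughput
  Latency,         // instruction latency
  CodeSize,        // estimated code size
  SizeAndLatency   // combined size-and-latency heuristic
};

// Hypothetical query: the cost kind is an explicit parameter.
int getInstructionCost(unsigned Opcode, TargetCostKind Kind);
```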
RFC thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html
Differential Revision: https://reviews.llvm.org/D79002
This fixes a regression introduced by a very old commit 280ac1fd1dc35 (was
llvm-svn 361950).
Commit 280ac1fd1dc35 redesigned the logic in the LSUnit with the goal of
speeding up isReady() queries, and stabilising the LSUnit API (while also making
the load store unit more customisable).
The concept of MemoryGroup (effectively an alias set) was added by that commit
to better describe and track dependencies between memory operations. However,
that concept was not just used for alias dependencies, but it was also used for
describing memory "order" dependencies (enforced by the memory consistency
model).
Instructions of the same memory group were considered "equivalent", i.e.
independent operations that can potentially execute in parallel. The problem
was that the cost of a dependency (in terms of number of cycles) should have
been different for "order" dependencies. Instructions in an order dependency
simply have to wait until their predecessors are "issued" to an
underlying pipeline (rather than having to wait until predecessors have been
fully executed). For simple "order" dependencies, this was effectively
introducing an artificial delay on the "issue" of independent loads and stores.
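A rough way to picture the difference in cost (purely illustrative; this is not the LSUnit code):

```cpp
// An "order" dependency only delays issue until the predecessor has itself
// been issued to an underlying pipeline; an alias dependency has to wait
// until the predecessor has fully executed.
static unsigned earliestIssueCycle(bool IsOrderDependency,
                                   unsigned PredIssueCycle,
                                   unsigned PredExecuteCycle) {
  return IsOrderDependency ? PredIssueCycle : PredExecuteCycle;
}
```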
This patch fixes the issue and adds a new test named 'independent-load-stores.s'
to a bunch of x86 targets. That test contains the reproducer posted by Fabian
Ritter on PR45793.
I had to rerun the update-mca-tests script on several files. To avoid expected
regressions on some Exynos tests, I have added a -noalias=false flag (to match
the old strict behavior on latencies).
Some tests for processor Barcelona are improved/fixed by this change and they
now show better results. In a few tests we were incorrectly counting the time
spent by instructions in a scheduler queue. In one case in particular we now
correctly see a store executed out of order. That test was affected by the same
underlying issue reported as PR45793.
Reviewers: mattd
Differential Revision: https://reviews.llvm.org/D79351
Summary:
This fixes a few things that are connected. It is very hard to provide
an independent test case for each of those fixes, because they are
interconnected and sometimes one masks another. The provided test case
triggers some of those bugs below but not all.
---
1. Background:
`placeBlockMarker` takes a BB, and if the BB is a destination of some
branch, it places `end_block` marker there, and computes the nearest
common dominator of all predecessors (what we call 'header') and places
a `block` marker there.
When we first place markers, we traverse BBs from top to bottom. For
example, when there are 5 BBs A, B, C, D, and E and B, D, and E are
branch destinations, if we mark the BB given to `placeBlockMarker` with `*`
and draw a rectangle representing the border of `block` and `end_block`
markers, the process is going to look like
```
-------
----- |-----|
--- |---| ||---||
|A| ||A|| |||A|||
--- --> |---| --> ||---||
*B | B | || B ||
C | C | || C ||
D ----- |-----|
E *D | D |
E -------
*E
```
which means when we first place markers, we go from inner to outer
scopes. So when we place a `block` marker, if the header already
contains another `block` or `try` marker, it has to belong to an inner
scope, so the existing `block`/`try` markers should go _after_ the new
marker. This was the assumption we had.
But after placing all markers we run the `fixUnwindMismatches` function.
There we do some control flow transformation and create some branches,
and we call `placeBlockMarker` again to place `block`/`end_block`
markers for those newly created branches. We can't assume that we are
traversing branch destination BBs from top to bottom now because we are
basically inserting some new markers in the middle of existing markers.
Fix:
In `placeBlockMarker`, we no longer assume that the given BBs arrive in
top-to-bottom order, and when placing `block` markers we calculate whether
existing `block` or `try` markers belong to inner or outer scopes with
respect to the current scope.
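For reference, the 'header' computation mentioned above is essentially a nearest-common-dominator walk over the destination's predecessors; a minimal sketch (hypothetical helper, not the CFGStackify code itself):

```cpp
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineDominators.h"
using namespace llvm;

static MachineBasicBlock *computeHeader(MachineBasicBlock &Dest,
                                        MachineDominatorTree &MDT) {
  MachineBasicBlock *Header = nullptr;
  for (MachineBasicBlock *Pred : Dest.predecessors())
    Header = Header ? MDT.findNearestCommonDominator(Header, Pred) : Pred;
  // If Dest has no predecessors at all, there is no meaningful header
  // (see bug 2 below).
  return Header;
}
```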
---
2. Background:
In `fixUnwindMismatches`, when there is a call whose correct unwind
destination mismatches the current destination after initially placing
`try` markers, we wrap that with a new nested `try`/`catch`/`end` and
jump to the correct handler within the new `catch`. The correct handler
code is split out as a separate BB from its original EH pad so it can be
branched to. Here's an example:
- Before
```
mbb:
call @foo <- Unwind destination mismatch!
wrong-ehpad:
catch
...
cont:
end_try
...
correct-ehpad:
catch
[handler code]
```
- After
```
mbb:
try (new)
call @foo
nested-ehpad: (new)
catch (new)
local.set n / drop (new)
br %handleri (new)
nested-end: (new)
end_try (new)
wrong-ehpad:
catch
...
cont:
end_try
...
correct-ehpad:
catch
local.set n / drop (new)
handler: (new)
end_try
[handler code]
```
Note that after this transformation, it is possible that no calls actually
unwind to `correct-ehpad` anymore. `call @foo` now branches to `handler`,
and there may be no other calls that unwind to `correct-ehpad`. In that
case `correct-ehpad` does not have any predecessors anymore.
This can cause a bug in `placeBlockMarker`, because we may need to place an
`end_block` marker in `handler`, and `placeBlockMarker` computes the
nearest common dominator of all predecessors. If one of `handler`'s
predecessors (here `correct-ehpad`) itself has no predecessors, i.e.,
there is no way of reaching it, we cannot correctly compute the common
dominator of `handler`'s predecessors, and we end up placing no
`block`/`end_block` markers. This bug sometimes masks bug 1.
Fix:
When we have an EH pad that does not have any predecessors after this
transformation, we delete all its successors, so that those successors
don't have any dangling predecessors.
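A sketch of that cleanup (hypothetical helper name, not the actual pass code):

```cpp
#include "llvm/CodeGen/MachineBasicBlock.h"
using namespace llvm;

// Once an EH pad has lost all of its predecessors, detach it from all of its
// successors so they are not left with a dangling, unreachable predecessor.
static void detachUnreachableEHPad(MachineBasicBlock &EHPad) {
  if (!EHPad.pred_empty())
    return;
  while (!EHPad.succ_empty())
    EHPad.removeSuccessor(EHPad.succ_begin());
}
```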
---
3. Background:
Actually the `handler` BB in the example shown in bug 2 doesn't need an
`end_block` marker, despite being a new branch destination, because it
already has an `end_try` marker that can serve the same purpose. I put
that example there just for illustration. There is a case where we
actually need to place an `end_block` marker: when the branch destination
is the appendix BB. The appendix BB is created when a call that is
supposed to unwind to the caller ends up unwinding to a wrong EH pad. In
that case we also wrap the call with a nested `try`/`catch`/`end`,
create an 'appendix' BB at the very end of the function, and branch to
that BB, where we rethrow the exception to the caller.
Fix:
When we don't actually need to place block markers, we don't.
---
4. In case we fall through to the continuation BB after the catch block,
after extracting the handler code in `fixUnwindMismatches` (refer to bug 2
for an example), we now have to add a branch to the continuation BB to
bypass the handler.
- Before
```
try
...
(falls through to 'cont')
catch
handler body
end
<-- cont
```
- After
```
try
...
br %cont (new)
catch
end
handler body
<-- cont
```
The problem is, we haven't been placing a new `end_block` marker in the
`cont` BB in this case. We should, and this fixes it. But it is hard to
provide a test case that triggers this bug, because the current
compilation pipeline from .ll to .s does not generate this kind of code;
we always have a `br` after `invoke`. But code without `br` is still
valid, and we can have that kind of code if we have some pipeline
changes or optimizations later. Even MIR test cases cannot trigger this
part for now, because we don't yet encode auxiliary EH-related data
structures (such as `WasmEHFuncInfo`) in MIR. Those functionalities
can be added later, but I don't think we should block this fix on that.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79324
This patch makes the folding of or(A, B) into not(and(not(A), not(B)))
more aggressive for i1 vectors. This only affects Thumb2 MVE and improves
codegen, because it removes a lot of msr/mrs instructions on VPR.P0.
This patch also adds a xor(vcmp) -> !vcmp fold for MVE.
Differential Revision: https://reviews.llvm.org/D77202
This patch adds an implementation of PerformVSELECTCombine in the
ARM DAG Combiner that transforms vselect(not(cond), lhs, rhs) into
vselect(cond, rhs, lhs).
Normally, this should be done by the target-independent DAG Combiner,
but it doesn't handle the kind of constants that we generate, so we
have to reimplement it here.
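A hedged sketch of the combine (not the actual ARM implementation):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// If the select condition is a bitwise NOT (xor with all-ones), drop the NOT
// and swap the two value operands instead.
static SDValue combineVSelectNotCond(SDNode *N, SelectionDAG &DAG) {
  SDValue Cond = N->getOperand(0);
  if (!isBitwiseNot(Cond))
    return SDValue();
  return DAG.getNode(ISD::VSELECT, SDLoc(N), N->getValueType(0),
                     Cond.getOperand(0), N->getOperand(2), N->getOperand(1));
}
```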
Differential Revision: https://reviews.llvm.org/D77712
Summary:
I have fixed several places in getSplatSourceVector and isSplatValue
to work correctly with scalable vectors. I added new support for
the ISD::SPLAT_VECTOR DAG node as one of the obvious cases we can
support with scalable vectors. In other places I have tried to do
the sensible thing, such as bail out for vector types we don't yet
support or don't intend to support.
It's not possible to add IR test cases to cover these changes, since
they are currently only ever exercised on certain targets, e.g.
only X86 targets use the result of getSplatSourceVector. I've
assumed that X86 tests already exist to test these code paths for
fixed vectors. However, I have added some AArch64 unit tests that
test the specific functions I have changed.
Differential revision: https://reviews.llvm.org/D79083
The argparse 'append' action concatenates multiple occurrences of an
argument (even when we specify `nargs=1` or `nargs='?'`). This means
that we create multiple identical output files if the `--output`
argument is given more than once. This isn't useful and we instead want
this to behave like a standard optional argument: last occurrence wins.
This patch threads the virtual file system through dsymutil.
Currently there is no good way to find out exactly what files are
necessary in order to reproduce a dsymutil link, at least not without
knowledge of dsymutil's internals. My motivation for this change is
to add lightweight "reproducers" that automatically gather the input
object files through the FileCollectorFileSystem. The files together
with the YAML mapping will allow us to transparently reproduce a
dsymutil link, even without having to mess with the OSO path prefix.
Differential revision: https://reviews.llvm.org/D79376
This revision adds support for merging identical blocks, or those with the same operations that branch to the same successors. Operands that mismatch between the different blocks are replaced with new block arguments added to the merged block.
Differential Revision: https://reviews.llvm.org/D79134
Summary:
Unless the user requested an output object (--lto-obj-path), an
unused empty combined module is no longer emitted.
This change is helpful for some targets (e.g. RISC-V) which encode the
ABI info in IR module flags (target-abi). An empty unused module has no ABI
info, so the linker would get a linking error when merging
incompatible ABIs.
Reviewers: tejohnson, espindola, MaskRay
Subscribers: emaste, inglorion, arichardson, hiraditya, simoncook, MaskRay, steven_wu, dexonsmith, PkmX, dang, lenary, s.egerton, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78988
Referring to the link order of a dylib better matches the terminology used in
static compilation. As upcoming patches will increase the number of places where
link order matters (for example when closing JITDylibs), it's better to get this
name change out of the way early.
This reverts commit fb5fd74685e728b1d5e68d33e9842bcd734b98e6.
Re-instates commit 53913a65b408ade2956061b4c0aaed6bba907403
The fix is to trim off trailing separators, as in `/foo/bar/` and
produce `/foo/bar`. VFS tests rely on this. I added unit tests for
remove_dots.
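A small usage sketch of the behavior the unit tests pin down (the posix path style is forced here only to keep the example platform-independent):

```cpp
#include "llvm/ADT/SmallString.h"
#include "llvm/Support/Path.h"
using namespace llvm;

int main() {
  SmallString<64> P("/foo/bar/");
  // Trailing separators are trimmed, so this canonicalizes to "/foo/bar".
  sys::path::remove_dots(P, /*remove_dot_dot=*/true, sys::path::Style::posix);
  return 0;
}
```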
Register live ranges may have had gaps that should be removed after
coalescing. This is done by adding a new segment to the range, and merging
it with neighboring segments. When doing so, do not assume that each
subrange of the register ended at the same index. If a subrange ended
earlier, adding this segment could make the live range invalid.
Instead, if the subrange is not live at the start of the segment,
extend it first.
The previous patch broke flang, which has some yet-to-be-resolved cyclic
dependencies. This patch fixes the breakage by restricting the generated
dependencies to public libraries, which is probably more sensible anyway.
Differential Revision: https://reviews.llvm.org/D79366
Summary:
Constrain which metadata nodes are allowed to be, or contain,
DILocations. This ensures that logic for updating DILocations in a
Module is complete.
Currently, !llvm.loop metadata is the only odd duck which contains
nested DILocations. This has caused problems in the past: some passes
forgot to visit the nested locations, leading to subtly broken debug
info and late verification failures.
If there's a compelling reason for some future metadata to nest
DILocations, we'll need to introduce a generic API for updating the
locations attached to an Instruction before relaxing this check.
Reviewers: aprantl, dsanders
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79245
abhinavgaba reported that the custom-result-category.py test hangs
on a Windows build bot [1]. Disable it for now.
[1] https://reviews.llvm.org/D78164#2018178
Today, symbol names generated for machine basic block sections use a
unary encoding to reduce bloat. This is essential when every basic block
in the binary is assigned a symbol. However, with basic block clusters
(rG05192e585ce175b55f2a26b83b4ed7882785c8e6), we only need to
generate a few non-temporary symbols, so we can assign more descriptive,
user-friendly names. With this change:
- The cold cluster section for function foo is named "foo.cold"
- The exception cluster section for function foo is named "foo.eh"
- Other cluster sections, identified by their ids, are named "foo.ID"
Using this format works well with existing tools. It will demangle as
expected and works with existing symbolizers, profilers and debuggers
out of the box.
$ c++filt _Z3foov.cold
foo() [clone .cold]
$ c++filt _Z3foov.eh
foo() [clone .eh]
$ c++filt _Z3foov.1234
foo() [clone 1234]
Tests for basicblock-sections are updated with some cleanup where
appropriate.
Differential Revision: https://reviews.llvm.org/D79221
Fixes PR44357
For ARM ELF, regions covered by data mapping symbols `$d` are dumped as `.byte`, `.short` or `.word`, but inline relocations are not printed. This patch merges the data dumping loop into the normal instruction printing loop so that inline relocations are printed.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D79284