Summary:
This support is needed for the Fortran array variables with pointer/allocatable
attribute. This support enables debugger to identify the status of variable
whether that is currently allocated/associated.
for pointer array (before allocation/association)
without DW_AT_associated
(gdb) pt ptr
type = integer (140737345375288:140737354129776)
(gdb) p ptr
value requires 35017956 bytes, which is more than max-value-size
with DW_AT_associated
(gdb) pt ptr
type = integer (:)
(gdb) p ptr
$1 = <not associated>
for allocatable array (before allocation)
without DW_AT_allocated
(gdb) pt arr
type = integer (140737345375288:140737354129776)
(gdb) p arr
value requires 35017956 bytes, which is more than max-value-size
with DW_AT_allocated
(gdb) pt arr
type = integer, allocatable (:)
(gdb) p arr
$1 = <not allocated>
Testing
- unit test cases added
- check-llvm
- check-debuginfo
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83544
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.
This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.
My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.
This is intended avoid some optimization issues with the current
handling of aggregate arguments, as well as fixes inflexibilty in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.
Background:
There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.
It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These argumets are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.
The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.
This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's real no advantage to an aligment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.
Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
betewen arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.
The immediate plan is to start using this new attribute to handle all
aggregate argumets for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.
Additional context is in D79744.
This patch adds a new variant of the matrix lowering pass that only does
a minimal lowering and only depends on TTI. The main purpose of this pass
is to have a pass with minimal dependencies to run as part of the backend
pipeline.
At the moment, the only difference to the regular lowering pass is that it
does not support remarks. But in subsequent patches add support for tiling
to the lowering pass which will require more analysis, which we do not want
to run in the backend, as the lowering should happen in the middle-end in
practice and running it in the backend is mostly for convenience when
running llc.
Reviewers: anemet, Gerolf, efriedma, hfinkel
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D76867
This patch changes llvm-readelf (and llvm-readobj for consistency)
behavior to print an error when executed with no input files.
Reading from stdin can be achieved via a '-' for the input
object.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46400
Differential Revision: https://reviews.llvm.org/D83704
Reviewed by: jhenderson, MaskRay, sbc, jyknight
It's useful for a debugger to be able to distinguish an @llvm.debugtrap
from a (noreturn) @llvm.trap, so this extends the existing Windows
behaviour to other platforms.
Add narrowScalarFor action.
Add narrow scalar for typeIndex == 0 for G_FPTOSI/G_FPTOUI.
Legalize using narrowScalarFor as s16->s32 G_FPTOSI/G_FPTOUI
followed by s32->s64 G_SEXT/G_ZEXT.
Differential Revision: https://reviews.llvm.org/D84010
There is a strange "feature" of the code: it handles all relocations as `Elf_Rela`.
For handling `Elf_Rel` it converts them to `Elf_Rela` and passes `bool IsRela` to
specify the real type everywhere.
A related issue is that the
`decode_relrs` helper in lib/Object has to return `Expected<std::vector<Elf_Rela>>`
because of that, though it could return a vector of `Elf_Rel`.
I think we should just start using templates for relocation types, it makes the code
cleaner and shorter. This patch does it.
Differential revision: https://reviews.llvm.org/D83871
It fixes/improves the following:
1) Some code was duplicated.
2) A "The .MIPS.abiflags section has a wrong size" error was not reported as a warning,
but was printed to stdout for the LLVM style. Also, it was reported as an error for the GNU style.
This patch changes the behavior to be consistent and to report warnings.
3) `unwrapOrError()` was used before, now a warning is reported instead.
Differential revision: https://reviews.llvm.org/D84033
Common code sinking is already guarded with a (with default-off!) flag,
so add a flag for hoisting, too.
D84108 will hopefully make hoisting off-by-default too.
Summary:
All tests are updated, except wrapper.ll since it is not working nicely
with newly created functions.
Reviewers: jdoerfert, uenoku, baziotis, homerdin
Subscribers: arphaman, jfb, kuter, bbn, okura, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D84130
Identify relocations by (section name, offset) pairs, rather than plain
vmaddrs. This makes it easier to cross-reference debugging output for
relocations with output from standard object inspection tools (otool,
readelf, objdump, etc.).
This patch implements the .debug_rnglists section. We are able to
produce the .debug_rnglists section by the following syntax.
```
debug_rnglists:
- Format: DWARF32 ## Optional
Length: 0x1234 ## Optional
Version: 5 ## Optional
AddressSize: 0x08 ## Optional
SegmentSelectorSize: 0x00 ## Optional
OffsetEntryCount: 2 ## Optional
Offsets: [1, 2] ## Optional
Lists:
- Entries:
- Operator: DW_RLE_base_address
Values: [ 0x1234 ]
```
The generated .debug_rnglists is verified by llvm-dwarfdump, except for
the operator DW_RLE_startx_endx, since llvm-dwarfdump doesn't support
it.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D83624
Summary:
This introduces new flag to the update_test_checks and
update_cc_test_checks that allows for function attributes
to be checked in a check-line. If the flag is not set,
the behavior should remain the same.
Reviewers: jdoerfert
Subscribers: arichardson, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83629
When processing a MachO SUBTRACTOR/UNSIGNED pair, if the UNSIGNED target
is non-extern then check the r_symbolnum field of the relocation to find
the targeted section and use the section's address to find 'ToSymbol'.
Previously 'ToSymbol' was found by loading the initial value stored at
the fixup location and treating this as an address to search for. This
is incorrect, however: the initial value includes the addend and will
point to the wrong block if the addend is less than zero or greater than
the block size.
rdar://65756694
The getAllOnesValue can only handle things that are bitcast from a
ConstantInt, while here we bitcast through a pointer, so we may see more
complex objects (like Array or Struct).
Differential Revision: https://reviews.llvm.org/D83870
The function extractArgumentsFromModule() was passing a one-based index to,
but replaceFunctionCalls() was expecting a zero-based argument index. This
resulted in assertion errors when reducing function call arguments with
different types. Additionally, the
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D84099
This patch
- adds `canCreateUndefOrPoison`
- refactors `canCreatePoison` so it can deal with constantexprs
`canCreateUndefOrPoison` will be used at D83926.
Reviewed By: nikic, jdoerfert
Differential Revision: https://reviews.llvm.org/D84007
Summary:
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.
A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.
Subscribers: mgorny, aprantl, hiraditya, llvm-commits
Tags: #llvm
Resubmit for https://reviews.llvm.org/D84086
fma reassoc A, B, C --> fadd (fmul A, B), C (when target has no FMA hardware)
C/C++ code may use explicit fma() calls (which become LLVM fma
intrinsics in IR) but then gets compiled with -ffast-math or similar.
For targets that do not have FMA hardware, we don't want to go out to
the math library for a precise but slow FMA result.
I tried this as a generic DAGCombine, but it caused infinite looping
on more than 1 other target, so there's likely some over-reaching fma
formation happening.
There's also a potential intersection of strict FP with fast-math here.
Deferring to current behavior for that case (assuming that strict-ness
overrides fast-ness).
Differential Revision: https://reviews.llvm.org/D83981
This reverts commit 4500db8c59621a31c622862a2946457fdee481ce,
which was reverted because lower thresholds exposed a new issue (PR46680).
Now that it was resolved by d12ec0f752e7f2c7f7252539da2d124264ec33f7,
we can reinstate lower limits and wait for a new bugreport before
reverting this again...
Both users of predicteinfo (NewGVN and SCCP) are interested in
getting a cmp constraint on the predicated value. They currently
implement separate logic for this. This patch adds a common method
for this in PredicateBase.
This enables a missing bit of PredicateInfo handling in SCCP: Now
the predicate on the condition itself is also used. For switches
it means we know that the switched-on value is the same as the case
value. For assumes/branches we know that the condition is true or
false.
Differential Revision: https://reviews.llvm.org/D83640
This is a step towards trying to remove unnecessary FP compares
with infinity when compiling with -ffinite-math-only or similar.
I'm intentionally not checking FMF on the fcmp itself because
I'm assuming that will go away eventually.
The analysis part of this was added with rGcd481136 for use with
isKnownNeverNaN. Similarly, that could be an enhancement here to
get predicates like 'one' and 'ueq'.
Differential Revision: https://reviews.llvm.org/D84035
Fixes https://bugs.llvm.org/show_bug.cgi?id=46680.
Just like insertions through IRBuilder, InsertNewInstBefore()
should be using the deferred worklist mechanism, so that processing
of newly added instructions is prioritized.
There's one side-effect of the worklist order change which could be
classified as a regression. An add op gets pushed through a select
that at the time is not a umax. We could add a reverse transform
that tries to push adds in the reverse direction to restore a min/max,
but that seems like a sure way of getting infinite loops... Seems
like something that should best wait on min/max intrinsics.
Differential Revision: https://reviews.llvm.org/D84109
As far as I can tell, it should not be necessary for VCTP to be
unpredictable in tail predicated loops. Either it has a a valid loop
counter as a operand which will naturally keep it in the right loop, or
it doesn't and it won't be converted to a tail predicated loop. Not
marking it as having side effects allows it to be scheduled more cleanly
for cases where it is not expected to become a tail predicate loop.
Differential Revision: https://reviews.llvm.org/D83907
.gcno, .gcda and source files can be modified while we are reading them. If the
concurrent modification of a file being read nullifies the NUL terminator
assumption, llvm-cov can trip over an assertion failure in MemoryBuffer::init.
This is not so rare - the source files can be in an editor and .gcda can be
written by an running process (if the process forks, when .gcda gets written is
probably more unpredictable).
There is no accompanying test because an assertion failure requires data
races with some involved setting.