Currently all code instances within the matrix lowering pass consider
matrix A to be MxN and B to be NxK, producing C which is MxK. Anyone
interacting with this API after reading the docs but without reading the pass
would expect A: MxK, B: KxN, and C: MxN. These changes bring the documentation
in line with the implementation.
One point of concern with this, the original signature as described in the docs
may be better or at least more expected. The interface as it was written
reflected other common matrix multiplication interfaces such as BLAS'[1], where
the matrices are MxK, KxN, MxN respectively. Choosing to honor this requires
changing code and tests instead, but should be mostly just renaming of variables.
Patch by Braedy Kuzma <braedy@ualberta.ca>
[1] http://www.netlib.org/lapack/explore-html/db/dc9/group__single__blas__level3_gafe51bacb54592ff5de056acabd83c260.html#gafe51bacb54592ff5de056acabd83c260
Reviewers: anemet, LuoYuanke, nicolasvasilache, fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D80663
Summary:
An upgrade of LLVM for CrOS [0] containing [1] triggered a bunch of
errors related to writing to reserved registers for a Linux kernel's
arm64 compat vdso (which is a aarch32 image).
After a discussion on LKML [2], it was determined that
-f{no-}omit-frame-pointer was not being specified. Comparing GCC and
Clang [3], it becomes apparent that GCC defaults to omitting the frame
pointer implicitly when optimizations are enabled, and Clang does not.
ie. setting -O1 (or above) implies -fomit-frame-pointer. Clang was
defaulting to -fno-omit-frame-pointer implicitly unless -fomit-frame-pointer
was set explicitly.
Why this becomes a problem is that the Linux kernel's arm64 compat vdso
contains code that uses r7. r7 is used sometimes for the frame pointer
(for example, when targeting thumb (-mthumb)). See useR7AsFramePointer()
in llvm/llvm-project/llvm/lib/Target/ARM/ARMSubtarget.h. This is mostly
for legacy/compatibility reasons, and the 2019 Q4 revision of the ARM
AAPCS looks to standardize r11 as the frame pointer for aarch32, though
this is not yet implemented in LLVM.
Users that are reliant on the implicit value if unspecified when
optimizations are enabled should explicitly choose -fomit-frame-pointer
(new behavior) or -fno-omit-frame-pointer (old behavior).
[0] https://bugs.chromium.org/p/chromium/issues/detail?id=1084372
[1] https://reviews.llvm.org/D76848
[2] https://lore.kernel.org/lkml/20200526173117.155339-1-ndesaulniers@google.com/
[3] https://godbolt.org/z/0oY39t
Reviewers: kristof.beyls, psmith, danalbert, srhines, MaskRay, ostannard, efriedma
Reviewed By: psmith, danalbert, srhines, MaskRay, efriedma
Subscribers: efriedma, olista01, MaskRay, vhscampos, cfe-commits, llvm-commits, manojgupta, llozano, glider, hctim, eugenis, pcc, peter.smith, srhines
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80828
'git push' command, without any other arguments, can do different
things depending on the local configuration of Git. This patch
updates the 'git push' command with extra arguments to be more
resilient to any local configuration.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D79964
Some of the --debug-* options can take an optional offset. Although the
man page does a good job of making that clear, it's much harder to
discover from the help output.
Currently the only reference to this is the following sentence:
> Where applicable these parameters take an optional =<offset> argument
> to dump only the entry at the specified offset.
This patch changes the help output from to print [=<offset>] after the
options that take an offset.
--debug-info[=<offset>] - Dump the .debug_info section
rdar://problem/63150066
Differential revision: https://reviews.llvm.org/D80959
Summary:
Sketch the outline for a new document that explains how to update debug
info in various kinds of code transformations.
Some of the guidelines that belong in HowToUpdateDebugInfo.rst were in
SourceLevelDebugging.rst already under the debugify section. It seems
like the distinction between the two docs ought to be that the former is
more prescriptive, while the latter is more descriptive.
To that end I've consolidated the "how to update debug info" guidelines
which were in SourceLevelDebugging.rst into the new doc, along with the
information about using "debugify" to test transformations. Since we've
added a mir-debugify pass, I've described that as well.
Reviewers: aprantl, jmorse, chrisjackson, dsanders
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80052
This is split off from D79100 and:
- adds a intrinsic description/definition for @llvm.get.active.lane.mask(), and
- describe its semantics in LangRef.
As described (in more detail) in its LangRef section, it is semantically
equivalent to an icmp with the vector induction variable and the back-edge
taken count, and generates a mask of active/inactive vector lanes.
It will have several use cases. First, it will be used by the
ExpandVectorPredication pass for the VP intrinsics, to expand VP intrinsics for
scalable vectors on targets that do not support the `%evl` parameter, see
D78203.
Also, this is part of, and essential for our ARM MVE tail-predication story:
- this intrinsic will be emitted by the LoopVectorizer in D79100, when
the scalar epilogue is tail-folded into the vector body. This new intrinsic
will generate the predicate for the masked loads/stores, and it takes the
back-edge taken count as an argument. The back-edge taken count represents the
number of elements processed by the loop, which we need to setup MVE
tail-predication.
- Emitting the intrinsic is controlled by a new TTI hook, see D80597.
- We pick up this new intrinsic in an ARM MVETailPredication backend pass, see
D79175, and convert it to a MVE target specific intrinsic/instruction to
create a tail-predicated loop.
Differential Revision: https://reviews.llvm.org/D80596
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch adds support signed numeric
values, thus allowing negative numeric values.
As such, the patch adds a new class to represent a signed or unsigned
value and add the logic for type promotion and type conversion in
numeric expression mixing signed and unsigned values. It also adds
the %d format specifier to represent signed value.
Finally, it also adds underflow and overflow detection when performing a
binary operation.
Copyright:
- Linaro (changes up to diff 183612 of revision D55940)
- GraphCore (changes in later versions of revision D55940 and
in new revision created off D55940)
Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson
Reviewed By: jhenderson, arichardson
Subscribers: MaskRay, hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, kristina, hfinkel, rogfer01, JonChesterfield
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60390
The HardwareLoop intrinsics were missing and not described in LangRef. This
adds these descriptions/definitions.
Differential Revision: https://reviews.llvm.org/D80316
With this change it is be possible to write FileCheck expressions such
as [[#(VAR+1)-2]]. Currently, the only supported arithmetic operators are
plus and minus, so this is not particularly useful yet. However, it our
CHERI fork we have tests that benefit from having multiplication in
FileCheck expressions. Allowing parenthesized expressions is the simplest
way for us to work around the current lack of operator precedence in
FileCheck expressions.
Reviewed By: thopre, jhenderson
Differential Revision: https://reviews.llvm.org/D77383
cctools strip has the option "-T" which removes Swift symbols.
This diff implements this option in llvm-strip for MachO.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D80099
Summary:
preallocated and musttail can work together, but we don't want to call
@llvm.call.preallocated.setup() to modify the stack in musttail calls.
So we shouldn't have the "preallocated" operand bundle when a
preallocated call is musttail.
Also disallow use of preallocated on calls without preallocated.
Codegen not yet implemented.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80581
Summary:
A struct argument can be passed-by-value to a callee via a pointer to a
temporary stack copy. Add support for emitting an entry value DBG_VALUE
when an indirect parameter DBG_VALUE becomes unavailable. This is done
by omitting DW_OP_stack_value from the entry value expression, to make
the expression describe the location of an object.
rdar://63373691
Reviewers: djtodoro, aprantl, dstenb
Subscribers: hiraditya, lldb-commits, llvm-commits
Tags: #lldb, #llvm
Differential Revision: https://reviews.llvm.org/D80345
- Added more info about what we refer as a clobber in MSSA.
- Added more info about MemoryDefs and how there is a single Def chain.
- The doc portrayed MSSA as modeling the heap whileit is modeling
the whole memory, so I changed the wording to not be heap-specific.
Differential Revision: https://reviews.llvm.org/D80000
Confusingly, these were unrelated and had different semantics. The
G_PTR_MASK instruction predates the llvm.ptrmask intrinsic, but has a
different format. G_PTR_MASK only allows clearing the low bits of a
pointer, and only a constant number of bits. The ptrmask intrinsic
allows an arbitrary mask. Replace G_PTR_MASK to match the intrinsic.
Only selects the cases that look like the old instruction. More work
is needed to select the general case. Also new legalization code is
still needed to deal with the case where the incoming mask size does
not match the pointer size, which has a specified behavior in the
langref.
This intrinsic implements IEEE-754 operation roundToIntegralTiesToEven,
and performs rounding to the nearest integer value, rounding halfway
cases to even. The intrinsic represents the missed case of IEEE-754
rounding operations and now llvm provides full support of the rounding
operations defined by the standard.
Differential Revision: https://reviews.llvm.org/D75670
After D80096, bots that build clang for distribution and that can't use
system gcc / libstdc++ need to pass a working rpath so that unit test
binaries can run. The method suggested in GettingStarted.rst works fine
for local development, but it results in an absolute local rpath ending
up even in distributed binaries like clang, which is both ugly and
unnecessary.
Add an explicit toggle that can be used to add an rpath only for the
non-distributed binaries that need it.
Differential Revision: https://reviews.llvm.org/D80534
This change makes minor correction to the implementation of intrinsic
`llvm.flt.rounds`:
- Added documentation entry in LangRef,
- Attributes of the intrinsic changed to be in line with other functions
dependent of floating-point environment.
Differential Revision: https://reviews.llvm.org/D79322
Summary: 'A' constraint requires an immediate int or fp constant that can be inlined in an instruction encoding.
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D78494
Summary:
Added a new IRCanonicalizer pass which aims to transform LLVM modules into
a canonical form by reordering and renaming instructions while preserving the
same semantics. The canonicalizer makes it easier to spot semantic differences
when diffing two modules which have undergone different passes.
Presentation: https://www.youtube.com/watch?v=c9WMijSOEUg
Reviewed by: plotfi
Differential Revision: https://reviews.llvm.org/D66029
- Change title to "DWARF For Heterogeneous Debugging".
- Add "Examples" section that references the AMDGPUUsage DWARF section.
- Make the "References" section a top level section.
Differential Revision: https://reviews.llvm.org/D70523
llvm-extract get serveral new options, but we forgot to update doc.
This patch update the doc.
Reviewed By: volkan
Differential Revision: https://reviews.llvm.org/D80413
Summary:
- Correct missing space in some "note" and "TODO" directives in
AMDGPUUsage.rst
- Correct warning for heading underline being too short in
BitCodeFormat.rst
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80407
Add support for generating a dsymutil reproducer. The result is a folder
containing all the object files for linking.
When --gen-reproducer is passed, dsymutil uses a FileCollectorFileSystem
which keeps track of all the files used by dsymutil. These files are
copied into a temporary directory when dsymutil exists.
When this path is passed to --use-reproducer, dsymutil uses a
RedirectingFileSystem that will use the files from the reproducer
directory instead of the actual paths. This means you don't need to mess
with the OSO path prefix.
Differential revision: https://reviews.llvm.org/D79398
If we don't know anything about the alignment of a pointer, Align(1) is
still correct: all pointers are at least 1-byte aligned.
Included in this patch is a bugfix for an issue discovered during this
cleanup: pointers with "dereferenceable" attributes/metadata were
assumed to be aligned according to the type of the pointer. This
wasn't intentional, as far as I can tell, so Loads.cpp was fixed to
stop making this assumption. Frontends may need to be updated. I
updated clang's handling of C++ references, and added a release note for
this.
Differential Revision: https://reviews.llvm.org/D80072
Summary:
Due to deleting the git llvm script, folks were asking for better documentation
about how to use git in order to commit to the Github repo. I added some step
by step git commands to make the usage clearer.
Context link: http://lists.llvm.org/pipermail/llvm-dev/2020-May/141640.html
Reviewed By: spatel, mehdi_amini
Differential Revision: https://reviews.llvm.org/D80088
When the callee requires a dynamic stack realignment,
it is not possible to correcty access the incoming
stack arguments using the stack pointer. We reserve a
base pointer in such cases to access the function arguments
inside the callee. The base pointer will hold the incoming
stack pointer value before any kind of delta added to it.
Reviewed By: arsenm, scott.linder
Differential Revision: https://reviews.llvm.org/D78811
The "null-pointer-is-valid" attribute needs to be checked by many
pointer-related combines. To make the check more efficient, convert
it from a string into an enum attribute.
In the future, this attribute may be replaced with data layout
properties.
Differential Revision: https://reviews.llvm.org/D78862
Summary:
The BFloat IR type is introduced to provide support for, initially, the BFloat16
datatype introduced with the Armv8.6 architecture (optional from Armv8.2
onwards). It has an 8-bit exponent and a 7-bit mantissa and behaves like an IEEE
754 floating point IR type.
This is part of a patch series upstreaming Armv8.6 features. Subsequent patches
will upstream intrinsics support and C-lang support for BFloat.
Reviewers: SjoerdMeijer, rjmccall, rsmith, liutianle, RKSimon, craig.topper, jfb, LukeGeeson, sdesmalen, deadalnix, ctetreau
Subscribers: hiraditya, llvm-commits, danielkiss, arphaman, kristof.beyls, dexonsmith
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78190
This patch adds support for DWARF attribute DW_AT_data_location.
Summary:
Dynamic arrays in fortran are described by array descriptor and
data allocation address. Former is mapped to DW_AT_location and
later is mapped to DW_AT_data_location.
Testing:
unit test cases added (hand-written)
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D79592
llvm rejects DWARF operator DW_OP_push_object_address.This DWARF
operator is needed for Flang to support allocatable array.
Summary:
Currently llvm rejects DWARF operator DW_OP_push_object_address.
below error is produced when llvm finds this operator.
[..]
invalid expression
!DIExpression(151)
warning: ignoring invalid debug info in pushobj.ll
[..]
There are some parts missing in support of this operator, need to
be completed.
Testing
-added a unit testcase
-check-debuginfo
-check-llvm
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D79306
Sometimes you want to disable a FileCheck directive without removing
it entirely, or you want to write comments that mention a directive by
name. The `COM:` directive makes it easy to do this. For example,
you might have:
```
; X32: pinsrd_1:
; X32: pinsrd $1, 4(%esp), %xmm0
; COM: FIXME: X64 isn't working correctly yet for this part of codegen, but
; COM: X64 will have something similar to X32:
; COM:
; COM: X64: pinsrd_1:
; COM: X64: pinsrd $1, %edi, %xmm0
```
Without this patch, you need to use some combination of rewording and
directive syntax mangling to prevent FileCheck from recognizing the
commented occurrences of `X32:` and `X64:` above as directives.
Moreover, FileCheck diagnostics have been proposed that might complain
about the occurrences of `X64` that don't have the trailing `:`
because they look like directive typos:
<http://lists.llvm.org/pipermail/llvm-dev/2020-April/140610.html>
I think dodging all these problems can prove tedious for test authors,
and directive syntax mangling already makes the purpose of existing
test code unclear. `COM:` can avoid all these problems.
This patch also updates the small set of existing tests that define
`COM` as a check prefix:
- clang/test/CodeGen/default-address-space.c
- clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
- clang/test/Driver/hip-device-libs.hip
- llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll
I think lit should support `COM:` as well. Perhaps `clang -verify`
should too.
Reviewed By: jhenderson, thopre
Differential Revision: https://reviews.llvm.org/D79276
We want to add a way to avoid merging identical calls so as to keep the
separate debug-information for those calls. There is also an asan
usecase where having this attribute would be beneficial to avoid
alternative work-arounds.
Here is the link to the feature request:
https://bugs.llvm.org/show_bug.cgi?id=42783.
`nomerge` is different from `noline`. `noinline` prevents function from
inlining at callsites, but `nomerge` prevents multiple identical calls
from being merged into one.
This patch adds `nomerge` to disable the optimization in IR level. A
followup patch will be needed to let backend understands `nomerge` and
avoid tail merge at backend.
Reviewed By: asbirlea, rnk
Differential Revision: https://reviews.llvm.org/D78659
Changed the language in LLVM_USE_LINKER to more strongly recommend LLD
and to specify that the GNU gold linker is only useful if LLD is
unavailable in binary form and it is the first build of LLVM. Added that
LLD will help when used on ELF-based platforms.
Corrected information in CMAKE_BUILD_TYPE regarding the Release build
type and enabling assertions.
Added option LLVM_ENABLE_ASSERTIONS and mentioned enabling this option
with a Release build as an alternative to using a Debug build.
Specified that the LLVM_OPTIMIZED_TABLEGEN
option is only for Debug builds, that the LLVM_USE_SPLIT_DWARF option
is only available on ELF host platforms, and that setting
CLANG_ENABLE_STATIC_ANALYZER to OFF only slightly improves build time.
These changes address comments made in D75425.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D77346
Sometimes you want to disable a FileCheck directive without removing
it entirely, or you want to write comments that mention a directive by
name. The `COM:` directive makes it easy to do this. For example,
you might have:
```
; X32: pinsrd_1:
; X32: pinsrd $1, 4(%esp), %xmm0
; COM: FIXME: X64 isn't working correctly yet for this part of codegen, but
; COM: X64 will have something similar to X32:
; COM:
; COM: X64: pinsrd_1:
; COM: X64: pinsrd $1, %edi, %xmm0
```
Without this patch, you need to use some combination of rewording and
directive syntax mangling to prevent FileCheck from recognizing the
commented occurrences of `X32:` and `X64:` above as directives.
Moreover, FileCheck diagnostics have been proposed that might complain
about the occurrences of `X64` that don't have the trailing `:`
because they look like directive typos:
<http://lists.llvm.org/pipermail/llvm-dev/2020-April/140610.html>
I think dodging all these problems can prove tedious for test authors,
and directive syntax mangling already makes the purpose of existing
test code unclear. `COM:` can avoid all these problems.
This patch also updates the small set of existing tests that define
`COM` as a check prefix:
- clang/test/CodeGen/default-address-space.c
- clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
- clang/test/Driver/hip-device-libs.hip
- llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll
I think lit should support `COM:` as well. Perhaps `clang -verify`
should too.
Reviewed By: jhenderson, thopre
Differential Revision: https://reviews.llvm.org/D79276
Linkage type was only referenced for functions, not for global
variables.
Clarify that LLVM doesn't make assumption about the allocation size when
no definitive initializer for a global variable is known.
Differential Revision: https://reviews.llvm.org/D78952
This patch adds statistics about the contribution of each object file to
the linked debug info. When --statistics is passed to dsymutil, it
prints a table after linking as illustrated below.
It lists the object file name, the size of the debug info in the object
file in bytes, and the absolute size contribution to the linked dSYM and
the percentage difference. The table is sorted by the output size, so
the object files contributing the most to the link are listed first.
.debug_info section size (in bytes)
-------------------------------------------------------------------------------
Filename Object dSYM Change
-------------------------------------------------------------------------------
basic2.macho.x86_64.o 210b 165b -24.00%
basic3.macho.x86_64.o 177b 150b -16.51%
basic1.macho.x86_64.o 125b 129b 3.15%
-------------------------------------------------------------------------------
Total 512b 444b -14.23%
-------------------------------------------------------------------------------
Differential revision: https://reviews.llvm.org/D79513
The 'nsz' flag is different than 'nnan' or 'ninf' in that it does not create poison.
Make that explicit in the LangRef and fix ValueTracking analysis that misinterpreted
the definition.
This manifests as bugs in InstSimplify shown in the test diffs and as discussed in
PR45778:
https://bugs.llvm.org/show_bug.cgi?id=45778
Differential Revision: https://reviews.llvm.org/D79422
The AMDGPU target has a convention that defined all VGPRs
(execept the initial 32 argument registers) as callee-saved.
This convention is not efficient always, esp. when the callee
requiring more registers, ended up emitting a large number of
spills, even though its caller requires only a few.
This patch revises the ABI by introducing more scratch registers
that a callee can freely use.
The 256 vgpr registers now become:
32 argument registers
112 scratch registers and
112 callee saved registers.
The scratch registers and the CSRs are intermixed at regular
intervals (a split boundary of 8) to obtain a better occupancy.
Reviewers: arsenm, t-tye, rampitec, b-sumner, mjbedy, tpr
Reviewed By: arsenm, t-tye
Differential Revision: https://reviews.llvm.org/D76356
The --output-target documentation has slightly rotted, as the default is
no longer purely based on the input file format, but also the value of
--input-target. This patch updates the documentation to make this
explicit.
Reviewed by: MaskRay, alexshap
Differential Revision: https://reviews.llvm.org/D79318
This addresses:
-Clean up the source code
-Refactor the JSON fields
-Fix the test cases
-Improve the docs for the stats output
Differential Revision: https://reviews.llvm.org/D77789
Summary:
FileCheck documentation contains an example of a numeric variable
defined and used on the same line. This is not currently supported by
FileCheck so this commit fixes the example to use CHECK-SAME for the
variable use.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D79253
This option was added several months ago in commit e84468c1.
Reviewed by: MaskRay, erik.pilkington, steven_wu
Differential Revision: https://reviews.llvm.org/D79166
- Rename DW_OP_LLVM_offset_constu to DW_OP_LLVM_offset_uconst to
matches DW_OP_plus_uconst.
- Correct DW_OP_LLVM_call_ref to be DW_OP_call_ref.
- Move proposed changes to a separate section to clarify that the
introduction section is not part of the changes.
- Fix formatting typos and add missing reference.
- Clarify why DW_OP_LLVM_offset et al do not wrap on overflow.
- Correct syntax of augmentation string.
Differential Revision: https://reviews.llvm.org/D70523
Add llvm.call.preallocated.{setup,arg} instrinsics.
Add "preallocated" operand bundle which takes a token produced by llvm.call.preallocated.setup.
Add "preallocated" parameter attribute, which is like byval but without the copy.
Verifier changes for these IR constructs.
See https://github.com/rnk/llvm-project/blob/call-setup-docs/llvm/docs/CallSetup.md
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74651
This change helps improve `dsymutil` documentation.
- Add missing options
- Re-arrange options in alphabetical order
- Wrap inline options in double-back-quote
- `-v` is for `--version` not `--verbose`
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D78479
Summary:
This change mentions CDE assembly in the LLVM release notes and CDE
intrinsics in both Clang and LLVM release notes.
Reviewers: kristof.beyls, simon_tatham
Reviewed By: kristof.beyls
Subscribers: danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78481
Summary:
This patch add the dataflow option to LLVM_USE_SANITIZER and documents
it.
Tested via check-cxx (wip to fix the errors).
Reviewers: morehouse, #libc!
Subscribers: mgorny, cfe-commits, libcxx-commits
Tags: #clang, #libc
Differential Revision: https://reviews.llvm.org/D78390
Summary:
This matches the behavior of GNU addr2line. We previously treated
hexadecimal addresses as binary if they started with 0b, otherwise as
octal if they started with 0, otherwise as decimal.
This only affects llvm-addr2line; the behavior of llvm-symbolize is
unaffected.
Reviewers: ikudrin, rupprecht, jhenderson
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73306
This moves v32i16/v64i8 to a model consistent with how we
treat integer types with avx1.
This does change the ABI for types vXi16/vXi8 vectors larger than
512 bits to pass in multiple zmms instead of multiple ymms. We'd
already hacked some code to make v64i8/v32i16 pass in zmm.
Cost model is still a bit of a mess. In some place I tried to
match existing behavior. But really we need to account for
splitting and concating costs. Cost model for shuffles is
especially pessimistic.
Differential Revision: https://reviews.llvm.org/D76212
- Unify the sections on DWARF expression and location lists.
- Allow a location description to have one or more single location
descriptions.
- Define context of DWARF expression that includes an initial
stack. Allow initial stack to be used when evaluating location list
expression with overlapping PC ranges.
- Reorganize the DWARF proposal in AMDGPUUsage so suitable for
submission to the DWARF site.
- Replace CFI instruction DW_CFA_LLVM_def_cfa_aspace with
DW_CFA_def_aspace_cfa and DW_CFA_def_aspace_cfa_sf. This is to avoid
the problem that DW_CFA_def_cfa and DW_CFA_def_cfa_sf cannot use a
register that is not the size of an address in the CFA address
space.
- Clarify DWARF address class and DWARF address space. Define language
values for DWARF address classes and specify how they are used by
some common source languages.
- Define rules for accessing registers and derefencing memory when the
type size and register size or byte size operand do not match.
- Numerous cleanups for consistency.
Differential Revision: https://reviews.llvm.org/D70523
This patch extracts the RTTI part of llvm::ErrorInfo into its own class
(RTTIExtends) so that it can be used in other non-error hierarchies, and makes
it compatible with the existing LLVM RTTI function templates (isa, cast,
dyn_cast, dyn_cast_or_null) by adding the classof method.
Differential Revision: https://reviews.llvm.org/D39111
LanguageExtensions.rst:2191: WARNING: Title underline too short.
llvm-symbolizer.rst:157: Error in "code-block" directive: maximum 1 argument(s) allowed, 30 supplied.
This patch adds missing description of enable-no-signed-zeros-fp-math
and enable-no-trapping-fp-math options of llc.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D77713
Now compiler defines 5 sets of constants to represent rounding mode.
These are:
1. `llvm::APFloatBase::roundingMode`. It specifies all 5 rounding modes
defined by IEEE-754 and is used in `APFloat` implementation.
2. `clang::LangOptions::FPRoundingModeKind`. It specifies 4 of 5 IEEE-754
rounding modes and a special value for dynamic rounding mode. It is used
in clang frontend.
3. `llvm::fp::RoundingMode`. Defines the same values as
`clang::LangOptions::FPRoundingModeKind` but in different order. It is
used to specify rounding mode in in IR and functions that operate IR.
4. Rounding mode representation used by `FLT_ROUNDS` (C11, 5.2.4.2.2p7).
Besides constants for rounding mode it also uses a special value to
indicate error. It is convenient to use in intrinsic functions, as it
represents platform-independent representation for rounding mode. In this
role it is used in some pending patches.
5. Values like `FE_DOWNWARD` and other, which specify rounding mode in
library calls `fesetround` and `fegetround`. Often they represent bits
of some control register, so they are target-dependent. The same names
(not values) and a special name `FE_DYNAMIC` are used in
`#pragma STDC FENV_ROUND`.
The first 4 sets of constants are target independent and could have the
same numerical representation. It would simplify conversion between the
representations. Also now `clang::LangOptions::FPRoundingModeKind` and
`llvm::fp::RoundingMode` do not contain the value for IEEE-754 rounding
direction `roundTiesToAway`, although it is supported natively on
some targets.
This change defines all the rounding mode type via one `llvm::RoundingMode`,
which also contains rounding mode for IEEE rounding direction `roundTiesToAway`.
Differential Revision: https://reviews.llvm.org/D77379
D72467 updated the shufflevector instruction to include a constant mask
rather than a mask operand. The LangRef text was vague enough to still
make sense, but it is better to update here too, so there's no confusion
about valid mask values. The text here is adapted from the documentation
code comments for "class ShuffleVectorInst".
Differential Revision: https://reviews.llvm.org/D77396
The LitConfig is shared across the whole test suite. However, since
enabling recursive expansion can be a breaking change for some test
suites, it's important to confine the setting to test suites that
enable it explicitly.
Note that other issues were raised with the way recursiveExpansionLimit
operates. However, this commit simply moves the setting to the right
place -- the mechanism by which it works can be improved independently.
Differential Revision: https://reviews.llvm.org/D77415
SUMMARY:
For the llvm-objdump -D, the symbol name is used as a label in the disassembly for the specific address (when a symbol address is equal to the virtual address in the dump).
In XCOFF, multiple symbols may have the same name, being differentiated by their storage mapping class. It is helpful to print the QualName and not just the name when forming the output label for a csect symbol. The symbol index further removes any ambiguity caused by duplicate names.
To maintain compatibility with the binutils objdump, the XCOFF-specific --symbol-description option is added to enable the enhanced format.
Reviewers: hubert.reinterpretcast, James Henderson, Jason Liu ,daltenty
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D72973
Summary:
This patch is to teach `llvm-objdump` dump dynamic symbols (`-T` and `--dynamic-syms`). Currently, this patch is not fully compatible with `gnu-objdump`, but I would like to continue working on this in next few patches. It has two issues.
1. Some symbols shouldn't be marked as global(g). (`-t/--syms` has same issue as well) (Fixed by D75659)
2. `gnu-objdump` can dump version information and *dynamically* insert before symbol name field.
`objdump -T a.out` gives:
```
DYNAMIC SYMBOL TABLE:
0000000000000000 w D *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 printf
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __libc_start_main
0000000000000000 w D *UND* 0000000000000000 __gmon_start__
0000000000000000 w D *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 w DF *UND* 0000000000000000 GLIBC_2.2.5 __cxa_finalize
```
`llvm-objdump -T a.out` gives:
```
DYNAMIC SYMBOL TABLE:
0000000000000000 w D *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 g DF *UND* 0000000000000000 printf
0000000000000000 g DF *UND* 0000000000000000 __libc_start_main
0000000000000000 w D *UND* 0000000000000000 __gmon_start__
0000000000000000 w D *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 w DF *UND* 0000000000000000 __cxa_finalize
```
Reviewers: jhenderson, grimar, MaskRay, espindola
Reviewed By: jhenderson, grimar
Subscribers: emaste, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75756
Summary: I think it would be better to require the alignment to be >= 1. It is currently confusing to allow both values.
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77372
This will likely introduce catastrophic performance regressions on
older subtargets, but should be correct. A follow up change will
remove the old fp32-denormals subtarget features, and switch to using
the new denormal-fp-math/denormal-fp-math-f32 attributes. Frontends
should be making sure to add the denormal-fp-math-f32 attribute when
appropriate to avoid performance regressions.
Uploading output from `git format-patch` fails when version has
more than 2 dots, e.g. git version 2.24.1.windows.2 which is
currently recommended by e.g. GitExtensions or 2.24.1.rc on Linux.
Differential Revision: https://reviews.llvm.org/D72374
Add an option to llvm-dwarfdump to calculate the bytes within
the debug sections. Dump this numbers when using --statistics
option as well.
This is an initial patch (e.g. we should support other units,
since we only support 'bytes' now).
Differential Revision: https://reviews.llvm.org/D74205
Summary:
As noted in documentation, different repetition modes have different trade-offs:
> .. option:: -repetition-mode=[duplicate|loop]
>
> Specify the repetition mode. `duplicate` will create a large, straight line
> basic block with `num-repetitions` copies of the snippet. `loop` will wrap
> the snippet in a loop which will be run `num-repetitions` times. The `loop`
> mode tends to better hide the effects of the CPU frontend on architectures
> that cache decoded instructions, but consumes a register for counting
> iterations.
Indeed. Example:
>>! In D74156#1873657, @lebedev.ri wrote:
> At least for `CMOV`, i'm seeing wildly different results
> | | Latency | RThroughput |
> | duplicate | 1 | 0.8 |
> | loop | 2 | 0.6 |
> where latency=1 seems correct, and i'd expect the througput to be close to 1/2 (since there are two execution units).
This isn't great for analysis, at least for schedule model development.
As discussed in excruciating detail in
>>! In D74156#1924514, @gchatelet wrote:
>>>! In D74156#1920632, @lebedev.ri wrote:
>> ... did that explanation of the question i'm having made any sense?
>
> Thx for digging in the conversation !
> Ok it makes more sense now.
>
> I discussed it a bit with @courbet:
> - We want the analysis tool to stay simple so we'd rather not make it knowledgeable of the repetition mode.
> - We'd like to still be able to select either repetition mode to dig into special cases
>
> So we could add a third `min` repetition mode that would run both and take the minimum. It could be the default option.
> Would you have some time to look what it would take to add this third mode?
there appears to be an agreement that it is indeed sub-par,
and that we should provide an optional, measurement (not analysis!) -time
way to rectify the situation.
However, the solutions isn't entirely straight-forward.
We can just add an actual 'multiplexer' `MinSnippetRepetitor`, because
if we just concatenate snippets produced by `DuplicateSnippetRepetitor`
and `LoopSnippetRepetitor` and run+measure that, the measurement will
naturally be different from what we'd get by running+measuring
them separately and taking the min.
([[ https://www.wolframalpha.com/input/?i=%28x%2By%29%2F2+%21%3D+min%28x%2C+y%29 | `time(D+L)/2 != min(time(D), time(L))` ]])
Also, it seems best to me to have a single snippet instead of generating
a snippet per repetition mode, since the only difference here is that the
loop repetition mode reserves one register for loop counter.
As far as i can tell, we can either teach `BenchmarkRunner::runConfiguration()`
to produce a single report given multiple repetitors (as in the patch),
or do that one layer higher - don't modify `BenchmarkRunner::runConfiguration()`,
produce multiple reports, don't actually print each one, but aggregate them somehow
and only print the final one.
Initially i've gone ahead with the latter approach, but it didn't look like a natural fit;
the former (as in the diff) does seem like a better fit to me.
There's also a question of the test coverage. It sure currently does work here:
```
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8fb949.o
---
mode: inverse_throughput
key:
instructions:
- 'CMOV64rr RAX RAX R11 i_0x0'
- 'CMOV64rr RBP RBP R15 i_0x0'
- 'CMOV64rr RBX RBX RBX i_0x0'
- 'CMOV64rr RCX RCX RBX i_0x0'
- 'CMOV64rr RDI RDI R10 i_0x0'
- 'CMOV64rr RDX RDX RAX i_0x0'
- 'CMOV64rr RSI RSI RAX i_0x0'
- 'CMOV64rr R8 R8 R8 i_0x0'
- 'CMOV64rr R9 R9 RDX i_0x0'
- 'CMOV64rr R10 R10 RBX i_0x0'
- 'CMOV64rr R11 R11 R14 i_0x0'
- 'CMOV64rr R12 R12 R9 i_0x0'
- 'CMOV64rr R13 R13 R12 i_0x0'
- 'CMOV64rr R14 R14 R15 i_0x0'
- 'CMOV64rr R15 R15 R13 i_0x0'
config: ''
register_initial_values:
- 'RAX=0x0'
- 'R11=0x0'
- 'EFLAGS=0x0'
- 'RBP=0x0'
- 'R15=0x0'
- 'RBX=0x0'
- 'RCX=0x0'
- 'RDI=0x0'
- 'R10=0x0'
- 'RDX=0x0'
- 'RSI=0x0'
- 'R8=0x0'
- 'R9=0x0'
- 'R14=0x0'
- 'R12=0x0'
- 'R13=0x0'
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: inverse_throughput, value: 0.819, per_snippet_value: 12.285 }
error: ''
info: instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BF000000000000000048BB000000000000000048B9000000000000000048BF000000000000000049BA000000000000000048BA000000000000000048BE000000000000000049B8000000000000000049B9000000000000000049BE000000000000000049BC000000000000000049BD0000000000000000490F40C3490F40EF480F40DB480F40CB490F40FA480F40D0480F40F04D0F40C04C0F40CA4C0F40D34D0F40DE4D0F40E14D0F40EC4D0F40F74D0F40FD490F40C35B415C415D415E415F5DC3
...
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-051eb3.o
---
mode: inverse_throughput
key:
instructions:
- 'CMOV64rr RAX RAX R11 i_0x0'
- 'CMOV64rr RBP RBP RSI i_0x0'
- 'CMOV64rr RBX RBX R9 i_0x0'
- 'CMOV64rr RCX RCX RSI i_0x0'
- 'CMOV64rr RDI RDI RBP i_0x0'
- 'CMOV64rr RDX RDX R9 i_0x0'
- 'CMOV64rr RSI RSI RDI i_0x0'
- 'CMOV64rr R9 R9 R12 i_0x0'
- 'CMOV64rr R10 R10 R11 i_0x0'
- 'CMOV64rr R11 R11 R9 i_0x0'
- 'CMOV64rr R12 R12 RBP i_0x0'
- 'CMOV64rr R13 R13 RSI i_0x0'
- 'CMOV64rr R14 R14 R14 i_0x0'
- 'CMOV64rr R15 R15 R10 i_0x0'
config: ''
register_initial_values:
- 'RAX=0x0'
- 'R11=0x0'
- 'EFLAGS=0x0'
- 'RBP=0x0'
- 'RSI=0x0'
- 'RBX=0x0'
- 'R9=0x0'
- 'RCX=0x0'
- 'RDI=0x0'
- 'RDX=0x0'
- 'R12=0x0'
- 'R10=0x0'
- 'R13=0x0'
- 'R14=0x0'
- 'R15=0x0'
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: inverse_throughput, value: 0.6083, per_snippet_value: 8.5162 }
error: ''
info: instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000048BE000000000000000048BB000000000000000049B9000000000000000048B9000000000000000048BF000000000000000048BA000000000000000049BC000000000000000049BA000000000000000049BD000000000000000049BE000000000000000049BF000000000000000049B80200000000000000490F40C3480F40EE490F40D9480F40CE480F40FD490F40D1480F40F74D0F40CC4D0F40D34D0F40D94C0F40E54C0F40EE4D0F40F64D0F40FA4983C0FF75C25B415C415D415E415F5DC3
...
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=min
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c7a47d.o
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2581f1.o
---
mode: inverse_throughput
key:
instructions:
- 'CMOV64rr RAX RAX R11 i_0x0'
- 'CMOV64rr RBP RBP R10 i_0x0'
- 'CMOV64rr RBX RBX R10 i_0x0'
- 'CMOV64rr RCX RCX RDX i_0x0'
- 'CMOV64rr RDI RDI RAX i_0x0'
- 'CMOV64rr RDX RDX R9 i_0x0'
- 'CMOV64rr RSI RSI RAX i_0x0'
- 'CMOV64rr R9 R9 RBX i_0x0'
- 'CMOV64rr R10 R10 R12 i_0x0'
- 'CMOV64rr R11 R11 RDI i_0x0'
- 'CMOV64rr R12 R12 RDI i_0x0'
- 'CMOV64rr R13 R13 RDI i_0x0'
- 'CMOV64rr R14 R14 R9 i_0x0'
- 'CMOV64rr R15 R15 RBP i_0x0'
config: ''
register_initial_values:
- 'RAX=0x0'
- 'R11=0x0'
- 'EFLAGS=0x0'
- 'RBP=0x0'
- 'R10=0x0'
- 'RBX=0x0'
- 'RCX=0x0'
- 'RDX=0x0'
- 'RDI=0x0'
- 'R9=0x0'
- 'RSI=0x0'
- 'R12=0x0'
- 'R13=0x0'
- 'R14=0x0'
- 'R15=0x0'
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: inverse_throughput, value: 0.6073, per_snippet_value: 8.5022 }
error: ''
info: instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BA000000000000000048BB000000000000000048B9000000000000000048BA000000000000000048BF000000000000000049B9000000000000000048BE000000000000000049BC000000000000000049BD000000000000000049BE000000000000000049BF0000000000000000490F40C3490F40EA490F40DA480F40CA480F40F8490F40D1480F40F04C0F40CB4D0F40D44C0F40DF4C0F40E74C0F40EF4D0F40F14C0F40FD490F40C3490F40EA5B415C415D415E415F5DC35541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BA000000000000000048BB000000000000000048B9000000000000000048BA000000000000000048BF000000000000000049B9000000000000000048BE000000000000000049BC000000000000000049BD000000000000000049BE000000000000000049BF000000000000000049B80200000000000000490F40C3490F40EA490F40DA480F40CA480F40F8490F40D1480F40F04C0F40CB4D0F40D44C0F40DF4C0F40E74C0F40EF4D0F40F14C0F40FD4983C0FF75C25B415C415D415E415F5DC3
...
```
but i open to suggestions as to how test that.
I also have gone with the suggestion to default to this new mode.
This was irking me for some time, so i'm happy to finally see progress here.
Looking forward to feedback.
Reviewers: courbet, gchatelet
Reviewed By: courbet, gchatelet
Subscribers: mstojanovic, RKSimon, llvm-commits, courbet, gchatelet
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76921
The requirement for deopt parameter to be in gc parameter if it can
be modified by GC is very strong and difficult to follow.
The key example of why this can't work:
%p1 = bitcast i8* %p to i8*
statepoint [gc = (%p1)], [deopt = (%p1)]
The optimizer is allowed to replace either use (or both) of %p1 with %p.
If it updates only one of the two (entirely legal), the two sets do not overlap.
So this change removes the strong wording.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D77122
We already mention that `noalias` is modeled after the C99 `restrict`
qualifier but we did omit one important requirement in the description.
For the restrict guarantees the object affected has to be modified
during the execution of the function, in any way (see 6.7.3.1.4 in [0]).
There are two reasons we want this restriction as well:
1) To match the `restrict` semantics when we lower it to `noalias`.
2) To allow the reasoning that the object pointed to by a `noalias`
pointer is not modified through means not derived from this
pointer. Hence, following the uses of that pointer is sufficient
to determine potential modifications.
The discussion on this came up as part of D73428. In that patch the
Attributor is taught to derive `noalias` for call site arguments based
on alias queries against objects that are accessed in the callee. This
is possible even if the pointer passed at the call site was "not-`noalias`".
To simplify the logic there *and* to allow the use of `noalias` as
described in 2) above, it is beneficial to follow the C `restrict`
semantics in cases where there might be "read-read-aliases". Note that
AliasAnalysis* queries for read only objects already result in
`NoAlias` even if the pointers might "alias".
* From this point of view our Alias Analysis is basically a Dependence
Analysis.
[0] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D74935
Summary:
This patch clarifies the semantics of branching on undef value.
Defining `br undef` as undefined behavior explains optimizations that use branch conditions, such as CVP (D76931) and GVN (propagateEquality).
For `switch cond`, it is defined to raise UB if cond is an expression containing undef && cond is not frozen &&
it may yield different values.
This allows that at the destination block the branch condition can be assumed to be frozen already (otherwise UB was already triggered).
This condition is slightly stricter than MemorySanitizer, which allows undef-y condition if it always leads to the same destination,
but it does not break MemorySanitizer because we are giving stricter constraint.
Reviewers: efriedma, fhahn, nikic, spatel, jdoerfert, nlopes
Reviewed By: nlopes
Subscribers: regehr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76973
I added a list of options to configure should someone have issues with
long build time or running out of memory. This was added under common
problems in the getting started section of the documentation.
Reviewed By: Meinersbur, dim, e-leclercq
Differential Revision: https://reviews.llvm.org/D75425
This allows defining substitutions in terms of other substitutions. For
example, a %build substitution could be defined in terms of a %cxx
substitution as '%cxx %s -o %t.exe' and the script would be properly
expanded.
Differential Revision: https://reviews.llvm.org/D76178
1.Add instructions to update author when committing other's patch
We have updated DeveloperPolicy to show how to change author in
https://reviews.llvm.org/D72468
We should also update Phabricator page to include such infomation,
in case people follow the steps here and forget to update author info.
2. Replace `git llvm push` with `git push`
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D76718
There has been some ongoing confusion regarding when to use `llvm_unreachable`
which this patch attempts to address. Specifically, the confusion has been
around whether `llvm_unreachable` is intended to mark only unreachable code
paths that the compiler cannot determine itself or to mark a code path which is
unconditionally a bug to reach. Based on email and IRC discussions, it sounds
like "unconditional bug to reach" is the consensus.
to remap object file paths (but no source paths) before
processing. This is meant to be used for Clang objects where the
module cache location was remapped using ``-fdebug-prefix-map``; to
help dsymutil find the Clang module cache.
<rdar://problem/55685132>
Differential Revision: https://reviews.llvm.org/D76391
Summary:
The next release of LLVM will support the full ACLE spec for MVE intrinsics,
so it's worth saying so in the release notes.
Reviewers: kristof.beyls
Reviewed By: kristof.beyls
Subscribers: cfe-commits, hans, dmgreen, llvm-commits
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D76513
Summary:
Add new generic MIR opcodes G_SADDSAT etc. Add support in IRTranslator
for translating the saturating add/subtract intrinsics to the new
opcodes.
Reviewers: aemerson, dsanders, paquette, arsenm
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76600
This handles not paths embedded in debug info, but also in sources.
Since the use of this flag is controlled by an option, rather than
replacing the new option, we add a new option.
Differential Revision: https://reviews.llvm.org/D76018
Remove the gap left between the stack pointer (s32) and frame pointer
(s34) now that the scratch wave offset is no longer a part of the
calling convention ABI.
Update llvm/docs/AMDGPUUsage.rst to reflect the change.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75657
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us to removes the scratch wave
offset register from the calling convention ABI.
As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.
Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.
Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75138
This is copied from the suggested text by @regehr in:
https://bugs.llvm.org/show_bug.cgi?id=20895
The way forward was not clear for several years, but now that we
have 'freeze' and Alive2, the behavior should be documented.
Also see comments in D76332.
Makes tests fail on Windows, see https://reviews.llvm.org/D70720#1924542
This reverts commit 3a5ddedadb671e485ce5c638142817879ac14a8c, and
follow-ups:
f4cb9c919e28276222873453cf85de9e5a3c7be5
042eb0482aa758057c4f77616a4696cdb21b4fcc
c0cf5f5da9a7bf1bdf43ed53287b0f634fc53045
18649f48139932377c2a2909f1fb600bf5cf6e57
f62b898c1f5dd77e68b53570dc2679877bcbe4c2
This adds the --debug-vars option to llvm-objdump, which prints
locations (registers/memory) of source-level variables alongside the
disassembly based on DWARF info. A vertical line is printed for each
live-range, with a label at the top giving the variable name and
location, and the position and length of the line indicating the program
counter range in which it is valid.
Currently, this only works for object files, not executables or shared
libraries.
Differential revision: https://reviews.llvm.org/D70720
AVR has been enabled by default since
c480c584a0b7de675dddb2616122fc218cd72c0e, the tests have been stable for
a couple days now, revert extremely unlikely.
LLVM currently supports CSK_MD5 and CSK_SHA1 source file checksums in
debug info. This change adds support for CSK_SHA256 checksums.
The SHA256 checksums are supported by the CodeView debug format.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D75785
Summary: This patch adds the basic utilities to deal with dropable uses. dropable uses are uses that we rather drop than prevent transformations, for now they are limited to uses in llvm.assume.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: uenoku, lebedev.ri, mgorny, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73404
Summary: As discussed in D75505, it's not particularly useful to change the type of a load to/from floating-point/integer because it's followed by a bitcast, and it might lead to surprising code generation. Check that this doesn't generally happen.
Reviewers: lebedev.ri
Subscribers: jkorous, dexonsmith, ributzka, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75644
This is an update to the documentation of our community code-review process.
Based on the RFC: High-Level Code-Review Documentation Update
(http://lists.llvm.org/pipermail/llvm-dev/2019-November/136808.html).
In this patch, I've pulled out the documentation into a separate file, and
broken it into a number of subsections. This is, of course, just one further
step in better documenting our community processes. I expect we'll continue to
improve this over time. Thank you to everyone who provided feedback!
Differential Revision: https://reviews.llvm.org/D71916
Try again with an up-to-date version of D69471 (99317124 was a stale
revision).
---
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
Tools working with object files on Darwin (e.g. lipo) may need to know
properties like the CPU type and subtype of a bitcode file. The logic of
converting a triple to a Mach-O CPU_(SUB_)TYPE should be provided by
LLVM instead of relying on tools to re-implement it.
Differential Revision: https://reviews.llvm.org/D75067
Add CoalescingBitVector to ADT. This is part 1 of a 3-part series to
address a compile-time explosion issue in LiveDebugValues.
---
CoalescingBitVector is a bitvector that, under the hood, relies on an
IntervalMap to coalesce elements into intervals.
CoalescingBitVector efficiently represents sets which predominantly
contain contiguous ranges (e.g. the VarLocSets in LiveDebugValues,
which are very long sequences that look like {1, 2, 3, ...}). OTOH,
CoalescingBitVector isn't good at representing sets with lots of gaps
between elements. The first N coalesced intervals of set bits are stored
in-place (in the initial heap allocation).
Compared to SparseBitVector, CoalescingBitVector offers more predictable
performance for non-sequential find() operations. This provides a
crucial speedup in LiveDebugValues.
Differential Revision: https://reviews.llvm.org/D74984