This saves needing to call getInt32 ourselves. Making the code a little shorter.
The test changes are because insert/extract use getInt64 internally. Shouldn't be a functional issue.
This cleanup because I plan to write similar code for expandload/compressstore.
llvm-svn: 355767
When installing runtimes with install-runtimes-stripped, we don't want
to just strip them, we also want to preserve the debugging information
for potential debugging. To make it possible to later find the stripped
debugging information, we want to use the .build-id layout:
https://fedoraproject.org/wiki/RolandMcGrath/BuildID#Find_files_by_build_ID
That is, for libfoo.so with build ID abcdef1234, the debugging information
will be installed into lib/debug/.build-id/ab/cdef1234. llvm-objcopy
already has support for stripping files and linking the debugging
stripped output into the right location. However, CMake doesn't support
customizing strip invocation for the *-stripped targets. So instead, we
replace CMAKE_STRIP with a custom script that invokes llvm-objcopy with
the right command line flags.
Differential Revision: https://reviews.llvm.org/D59127
llvm-svn: 355765
There are special cases in the scalarization for constant masks. If we hit one of the special cases we don't need to reset the iteration.
Noticed while starting work on adding expandload/compressstore to this pass.
llvm-svn: 355754
Summary:
Floating-point CSRs should be accessible even when F extension is not enabled.
But pseudo instructions that access floating point CSRs still require the F extension.
GNU tools already implement this behavior. RISC-V spec is pending update to reflect
this behavior and to extend it to pseudo instructions that access floating point CSRs.
Reviewers: asb
Reviewed By: asb
Subscribers: asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, llvm-commits
Differential Revision: https://reviews.llvm.org/D58932
llvm-svn: 355753
r44412 fixed a huge compile time regression but it needed ModifiedDT flag to be
maintained correctly in optimizations in optimizeBlock() and optimizeInst().
Function optimizeSelectInst() does not update the flag.
This patch propagates the flag in optimizeSelectInst() back to
optimizeBlock().
This patch also removes ModifiedDT in CodeGenPrepare class (which is not used).
The property of ModifiedDT is now recorded in a ref parameter.
Differential Revision: https://reviews.llvm.org/D59139
llvm-svn: 355751
Specifically, compute and Print Type and Section columns.
This is a re-commit of rL354833, after fixing the Asan problem found a a buildbot.
Differential Revision: https://reviews.llvm.org/D59060
llvm-svn: 355742
An extension of D58282 noted in PR39665:
https://bugs.llvm.org/show_bug.cgi?id=39665
This doesn't answer the request to use movmsk, but that's an
independent problem. We need this and probably still need
scalarization of FP selects because we can't do that as a
target-independent transform (although it seems likely that
targets besides x86 should have this transform).
llvm-svn: 355741
Summary:
This patch works around the bug in the ptxas tool with the processing of bytes
separated by the comma symbol. The emission of the packed string is
temporarily disabled.
Reviewers: tra
Subscribers: jholewinski, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59148
llvm-svn: 355740
Summary:
This change change the instrumentation to allow users to view the registers at the point at which tag mismatch occured. Most of the heavy lifting is done in the runtime library, where we save the registers to the stack and emit unwind information. This allows us to reduce the overhead, as very little additional work needs to be done in each __hwasan_check instance.
In this implementation, the fast path of __hwasan_check is unmodified. There are an additional 4 instructions (16B) emitted in the slow path in every __hwasan_check instance. This may increase binary size somewhat, but as most of the work is done in the runtime library, it's manageable.
The failure trace now contains a list of registers at the point of which the failure occured, in a format similar to that of Android's tombstones. It currently has the following format:
Registers where the failure occurred (pc 0x0055555561b4):
x0 0000000000000014 x1 0000007ffffff6c0 x2 1100007ffffff6d0 x3 12000056ffffe025
x4 0000007fff800000 x5 0000000000000014 x6 0000007fff800000 x7 0000000000000001
x8 12000056ffffe020 x9 0200007700000000 x10 0200007700000000 x11 0000000000000000
x12 0000007fffffdde0 x13 0000000000000000 x14 02b65b01f7a97490 x15 0000000000000000
x16 0000007fb77376b8 x17 0000000000000012 x18 0000007fb7ed6000 x19 0000005555556078
x20 0000007ffffff768 x21 0000007ffffff778 x22 0000000000000001 x23 0000000000000000
x24 0000000000000000 x25 0000000000000000 x26 0000000000000000 x27 0000000000000000
x28 0000000000000000 x29 0000007ffffff6f0 x30 00000055555561b4
... and prints after the dump of memory tags around the buggy address.
Every register is saved exactly as it was at the point where the tag mismatch occurs, with the exception of x16/x17. These registers are used in the tag mismatch calculation as scratch registers during __hwasan_check, and cannot be saved without affecting the fast path. As these registers are designated as scratch registers for linking, there should be no important information in them that could aid in debugging.
Reviewers: pcc, eugenis
Reviewed By: pcc, eugenis
Subscribers: srhines, kubamracek, mgorny, javed.absar, krytarowski, kristof.beyls, hiraditya, jdoerfert, llvm-commits, #sanitizers
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D58857
llvm-svn: 355738
LLVM is always built; including it in LLVM_ENABLE_PROJECTS has no
effect, but since it's in LLVM_ALL_PROJECTS, we produce a confusing
message about it being disabled. Drop it from LLVM_ALL_PROJECTS to avoid
this. Pointed out by David Greene on the mailing list [1].
[1] http://lists.llvm.org/pipermail/llvm-dev/2019-March/130854.html
llvm-svn: 355735
Commit r355068 "Fix IR/Analysis layering issue with OptBisect" uses the
template
return Gate.isEnabled() && !Gate.shouldRunPass(this, getDescription(...));
for all pass kinds. For the RegionPass, it left out the not operator,
causing region passes to be skipped as soon as a pass gate is used.
llvm-svn: 355733
When matching half of the build_vector to a load, there could still be
a hidden dependency on the other half of the build_vector the pattern
wouldn't detect. If there was an additional chain dependency on the
other value, a cycle could be introduced.
I don't think a tablegen pattern is capable of matching the necessary
conditions, so move this into PreprocessISelDAG. Check isPredecessorOf
for the other value to avoid a cycle. This has a warning that it's
expensive, so this should probably be moved into an MI pass eventually
that will have more freedom to reorder instructions to help match
this. That is currently complicated by the lack of a computeKnownBits
type mechanism for the selected function.
llvm-svn: 355731
This avoids breaking possible value dependencies when sorting loads by
offset.
AMDGPU has some load instructions that write into the high or low bits
of the destination register, and have a tied input for the other input
bits. These can easily have the same base pointer, but be a swizzle so
the high address load needs to come first. This was inserting glue
forcing the opposite ordering, producing a cycle the InstrEmitter
would assert on. It may be potentially expensive to look for the
dependency between the other loads, so just skip any where this could
happen.
Fixes bug 40936 by reverting r351379, which added a hacky attempt to
fix this by adding chains in this case, which I think was just working
around broken glue before the InstrEmitter. The core of the patch is
re-implementing the fix for that problem.
llvm-svn: 355728
This was checking the wrong operands for the base register and the
offsets. The indexes are shifted by the number of output registers
from the machine instruction definition, and the chain is moved to the
end.
llvm-svn: 355722
Summary:
If the LLVM module shows that it has debug info, but the file is
actually empty and the real debug info is not emitted, the ptxas tool
emits error 'Debug information not found in presence of .target debug'.
We need at leas one empty debug section to silence this message. Section
`.debug_loc` is not emitted for PTX and we can emit empty `.debug_loc`
section if `debug` option was emitted.
Reviewers: tra
Subscribers: jholewinski, aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D57250
llvm-svn: 355719
many valnos.
Recently we found compile time out problem in several cases when
SpeculativeLoadHardening was enabled. The significant compile time was spent
in register coalescing pass, where register coalescer tried to join many other
live intervals with some very large live intervals with many valnos.
Specifically, every time JoinVals::mapValues is called, computeAssignment will
be called by getNumValNums() times of the target live interval. If the large
live interval has N valnos and has N copies associated with it, trying to
coalescing those copies will at least cost N^2 complexity.
The patch adds some limit to the effort trying to join those very large live
intervals with others. By default, for live interval with > 100 valnos, and
when it has been coalesced with other live interval by more than 100 times,
we will stop coalescing for the live interval anymore. That put a compile
time cap for the N^2 algorithm and effectively solves the compile time
problem we saw.
Differential revision: https://reviews.llvm.org/D59143
llvm-svn: 355714
The indexed variant of vfmal.f16 and vfmsl.f16
instructions use the uppser bits of the indexed
operand to store the index (1 bit for the double
variant, 2 bits for the quad).
This limits the usable registers to d0 - d7 or
s0 - s15. This patch enforces this limitation.
Differential Revision: https://reviews.llvm.org/D59021
llvm-svn: 355707
llvm-readelf prints relocation addends as:
<symbol value>[+-]<absolute addend>
where [+-] is determined from whether addend is less than zero or not.
However, it does not print the +/- if there is no symbol, which meant
that negative addends became their positive value with no indication
that this had happened. This patch stops the absolute conversion when
addends are negative and there is no associated symbol.
Reviewed by: Higuoxing, mattd, MaskRay
Differential Revision: https://reviews.llvm.org/D59095
llvm-svn: 355696
From the Python subprocess docs:
If shell is True, it is recommended to pass args as a string rather than as
a sequence.
[...]
If args is a sequence, the first item specifies the command string, and any
additional items will be treated as additional arguments to the shell itself.
Prior to this change, the `--version` would be passed to the shell, not to
a potential gn binary on $PATH, and running `gn` without any arguments makes
it exit with an exit code != 0, so the script would think that there wasn't
a working gn binary on $PATH.
Fix this by following the documentation's recommendation of using a string
now that we pass shell=True. I tested this on macOS and Windows, each with
the three cases of
- no gn on PATH (should run gn downloaded by get.py if present,
else suggest running get.py)
- broken gn wrapper on PATH (should behave like the previous item)
- working gn on PATH (should use gn on PATH)
llvm-svn: 355694
`os.uname()` doesn't exist on Windows, so use `platform.machine()` which
returns `os.uname()[4]` on non-Win and (on 64-bit systems) "AMD64" on Windows.
Also use `sys.platform` instead of `platform` to check for Windows-ness for the
file extension in gn.py (get.py got this right).
Differential Revision: https://reviews.llvm.org/D59115
llvm-svn: 355693
Use this feature to fix a bug on ARM where 4 byte alignment is
incorrectly assumed.
Differential Revision: https://reviews.llvm.org/D57335
llvm-svn: 355685
Summary:
Right now, when we encounter a string equality check,
e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a
small compile-time constant, and fall back on calling `memcmp()` else.
This is sub-optimal because memcmp has to compute much more than
equality.
This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms
that support `bcmp`.
`bcmp` can be made much more efficient than `memcmp` because equality
compare is trivially parallel while lexicographic ordering has a chain
dependency.
Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits
Differential Revision: https://reviews.llvm.org/D56593
llvm-svn: 355672
We were just checking pointer size and type primitive size. But this caused unintended things like vectors of half being accepted by masked load/store.
For FP we now explicitly check for only double and float.
For pointers we now let any pointer through. Trusting that only 32 and 64 would be used to generate assembly.
We only check bitwidth after checking that the type is an integer.
llvm-svn: 355667
This change is a consequence of the discussion in "RFC: Place libs in
Clang-dedicated directories", specifically the suggestion that
libunwind, libc++abi and libc++ shouldn't be using Clang resource
directory. Tools like clangd make this assumption, but this is
currently not true for the LLVM_ENABLE_PER_TARGET_RUNTIME_DIR build.
This change addresses that by moving the output of these libraries to
lib/<target> and include/ directories, leaving resource directory only
for compiler-rt runtimes and Clang builtin headers.
Differential Revision: https://reviews.llvm.org/D59013
llvm-svn: 355665
Summary:
In r349534, objc arc implementation is switched to use intrinsics and at
the same time, clang.arc.use is renamed to llvm.objc.clang.arc.use to
make the naming more consistent. The side-effect of that is llvm no
longer recognize it as intrinsics and codegen external references to
it instead.
Rather than upgrade the old intrinsics name to the new one and wait for
the arc-contract pass to remove it, simply remove it in the bitcode
upgrader.
rdar://problem/48607063
Reviewers: pete, ahatanak, erik.pilkington, dexonsmith
Reviewed By: pete, dexonsmith
Subscribers: jkorous, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59112
llvm-svn: 355663
to get the modules bots running again.
The LLVM_DEBUG macro only plays well with a modular build of LLVM when
the header is marked as textual, but doing so causes redefinition
errors.
llvm-svn: 355653
Use the system shell to see if we can find a 'gn' binary on $PATH. This solves the error wherein subprocess.call fails ungracefully if the binary doesn't exist.
llvm-svn: 355645