Given a vecreduce_add node, detect the below pattern and convert it to the node
sequence with UABDL, [S|U]ADB and UADDLP.
i32 vecreduce_add(
v16i32 abs(
v16i32 sub(
v16i32 [sign|zero]_extend(v16i8 a), v16i32 [sign|zero]_extend(v16i8 b))))
=================>
i32 vecreduce_add(
v4i32 UADDLP(
v8i16 add(
v8i16 zext(
v8i8 [S|U]ABD low8:v16i8 a, low8:v16i8 b
v8i16 zext(
v8i8 [S|U]ABD high8:v16i8 a, high8:v16i8 b
Differential Revision: https://reviews.llvm.org/D104042
The sorting, obviously, must be stable, else we will have random assembly fluctuations.
Apparently there was no test coverage that would benefit from that,
so i've added one test.
The sorting consists of two parts - just sort the input vectors,
and recompute the shuffle mask -> input vector mapping.
I don't believe we need to do anything else.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D104187
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.
This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.
Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99173
There's no need for `toSmallVector()` as `SmallVector.h` already provides a `to_vector` free function that takes a range.
Reviewed By: Quuxplusone
Differential Revision: https://reviews.llvm.org/D104024
This patch implements vector-predicated intrinsics on IR level for fadd,
fsub, fmul, fdiv and frem. There operate in the default floating-point
environment. We will use constrained fp operand bundles for constrained
vector-predicated fp math (D93455).
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93470
The comment mentions deplibs should be removed in 4.0. Removing it in this patch.
Reviewed By: compnerd, dexonsmith, lattner
Differential Revision: https://reviews.llvm.org/D102763
Handle "short" in a case-insensitive fashion in MASM.
Required to correctly parse z_Windows_NT-586_asm.asm from the OpenMP runtime.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D104195
Did not correctly handle "jecxz short <address>".
Discovered while working on LLVM-ML; shows up in z_Windows_NT-586_asm.asm from the OpenMP runtime
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D104194
Lower truncations and expansions between fp128 and half values into libcalls.
Expand truncating stores into two separate truncation and a store operations.
Reviewed By: jrtc27
Differential Revision: https://reviews.llvm.org/D104185
This adds t2WhileLoopStartTP, similar to the t2DoLoopStartTP added in
D90591. It keeps a reference to both the tripcount register and the
element count register, so that the ARMLowOverheadLoops pass in the
backend can pick the correct one without having to search for it from
the operand of a VCTP.
Differential Revision: https://reviews.llvm.org/D103236
For CMP imm instruction, when the operand 1 is symbol address we should
check if it is immediate first. Here is the example code.
`CMP64mi32 $noreg, 8, killed renamable $rcx, @d, $noreg, @a, implicit-def
$eflags`
Many thanks to Craig, Topper for the test case to reproduce this issue.
Differential Revision: https://reviews.llvm.org/D104037
When we're building the runtimes for multiple platform targets, we
create umbrella build targets for each distribution component, but those
targets didn't have any dependencies and were just no-ops. Make the
umbrella target depend on the sub-targets for each platform to fix this,
which is consistent with the behavior of the umbrella targets for each
runtime, and also consistent with the behavior when we've only specified
the default target.
Since this only comes up with inputs containing sections at least 4GB
large (I guess I could use a bzero section or something, so the input
file doesn't have to be 4GB, but even then the output file would have to
be 4GB, right?) I've skipped testing this. If there's a nice way to test
this without needing 4GB inputs or output files.
The subtlety here is demonstrated by this code:
struct t { operator uint64_t(); };
static_assert(std::is_same_v<int, decltype(std::declval<bool>() ? 0 : std::declval<t>())>);
static_assert(std::is_same_v<uint64_t, decltype(std::declval<bool>() ? 0 : std::declval<uint64_t>())>);
Because of this difference, the original source code was getting an int
type (truncating the actual size) and then extending it again, resulting
in bogus values (I haven't thought through this hard enough to explain
why the resulting value was 0xffff... - sign extension, possible UB, but
in any case it's the wrong answer - in this particular case I was
looking at that resulted in a size so large that we couldn't open a file
large enough to write to and ended up with a rather vague:
error: 'file_name.o': Invalid argument
For CMP imm instruction, when the operand 1 is symbol address we should
check if it is immediate first. Here is the example code.
`CMP64mi32 $noreg, 8, killed renamable $rcx, @d, $noreg, @a, implicit-def
$eflags`
Many thanks to Craig, Topper for the test case to reproduce this issue.
Differential Revision: https://reviews.llvm.org/D104037
IHexWriter was evaluating a section's physical address when deciding if
that section should be written to an output. This approach does not
account for a zero-sized section that has the same physical address as a
sized section. The behavior varies from GNU objcopy, and may result in a
HEX file that does not include all program sections.
The IHexWriter now excludes zero-sized sections when deciding what
should be written to the output. This affects the contents of the
writer's `Sections` collection; we will not try to insert multiple
sections that could have the same physical address. The behavior seems
consistent with GNU objcopy, which always excludes empty sections,
no matter the address.
The new test case evaluates the IHexWriter behavior when provided a
variety of empty sections that overlap or append a filled section. See
the input file's comments for more information. Given that test input,
and the change to the IHexWriter, GNU objcopy and llvm-objcopy produce
the same output.
Reviewed By: jhenderson, MaskRay, evgeny777
Differential Revision: https://reviews.llvm.org/D101332
This patch is to address https://bugs.llvm.org/show_bug.cgi?id=50610.
In computed goto pattern, there are usually a list of basic blocks that are all targets of indirectbr instruction, and each basic block also has address taken and stored in a variable.
CHR pass could potentially clone these basic blocks, which would generate a cloned version of the indirectbr and clonved version of all basic blocks in the list.
However these basic blocks will not have their addresses taken and stored anywhere. So latter SimplifyCFG pass will simply remove all tehse cloned basic blocks, resulting in incorrect code.
To fix this, when searching for scopes, we skip scopes that contains BBs with addresses taken.
Added a few test cases.
Reviewed By: aeubanks, wenlei, hoy
Differential Revision: https://reviews.llvm.org/D103867
This is an attempt to fix clang test failures due to 'nonportable-include-path'
warnings on Windows when a path to llvm-project's base directory contains some
uppercase letters (excluding a drive letter).
The issue originates from 2 problems:
* discovery.py loads site config in lower case causing all the paths
based on __file__ and requested within the config file to be in lowercase as well,
* neither os.path.abspath() nor os.path.realpath() (both used to obtain paths of
config files, sources, object directories, etc) do not return paths in the correct
case for Windows (at least consistently for all python versions).
As os.path library doesn't seem to provide any relaible way to restore
the case for paths on Windows, this patch proposes to use pathlib.resolve().
pathlib is a part of Python 3.4 while llvm lit requires Python 3.6.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D103014