This patch implements a target independent DAG combine to produce multiply-high
instructions from shifts. This DAG combine will combine shifts for any type as
long as the MULH on the narrow type is legal.
For now, it is enabled on PowerPC as PowerPC is the only target that has an
implementation of the isMulhCheaperThanMulShift TLI hook introduced in
D78271.
Moreover, this DAG combine focuses on catching the pattern:
(shift (mul (ext <narrow_type>:$a to <wide_type>), (ext <narrow_type>:$b to <wide_type>)), <narrow_width>)
to produce mulhs when we have a sign-extend, and mulhu when we have
a zero-extend.
The patch performs the following checks:
- Operation is a right shift arithmetic (sra) or logical (srl)
- Input to the shift is a multiply
- Both operands to the shift are sext/zext nodes
- The extends into the multiply are both the same
- The narrow type is half the width of the wide type
- The shift amount is the width of the narrow type
- The respective mulh operation is legal
Differential Revision: https://reviews.llvm.org/D78272
Summary:
Jump tables for most targets cannot handle out of range indices by
themselves, so LLVM emits range checks to guard the jump
tables. WebAssembly, on the other hand, implements jump tables using
the br_table instruction, which takes a default branch target as an
operand, making the range checks redundant. This patch introduces a
new MachineFunction pass in the WebAssembly backend to find and
eliminate the redundant range checks.
Reviewers: aheejin, dschuff
Subscribers: mgorny, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80863
'git push' command, without any other arguments, can do different
things depending on the local configuration of Git. This patch
updates the 'git push' command with extra arguments to be more
resilient to any local configuration.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D79964
Some of the --debug-* options can take an optional offset. Although the
man page does a good job of making that clear, it's much harder to
discover from the help output.
Currently the only reference to this is the following sentence:
> Where applicable these parameters take an optional =<offset> argument
> to dump only the entry at the specified offset.
This patch changes the help output from to print [=<offset>] after the
options that take an offset.
--debug-info[=<offset>] - Dump the .debug_info section
rdar://problem/63150066
Differential revision: https://reviews.llvm.org/D80959
These are scalar instructions that change vector instructions, so they
should not be executed without any active lanes.
The implementation of -amdgpu-skip-threshold also seem to be backwards
from expected, since decreasing it prevents removal.
Summary:
Added initial codegen for 'affinity' clauses on task directives.
Emits next code:
```
kmp_task_affinity_info_t affs[<num_elems>];
void *td = __kmpc_task_alloc(..);
affs[<i>].base = &data_i;
affs[<i>].size = sizeof(data_i);
__kmpc_omp_reg_task_with_affinity(&loc, <gtid>, td, <num_elems>, affs);
```
The result returned by the call of `__kmpc_omp_reg_task_with_affinity`
function is ignored currently sincethe runtime currently ignores args
and returns 0 uncoditionally.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, sstefan1, llvm-commits, cfe-commits, caomhin
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80240
This teaches yaml2obj to allocate file space for a no-bits section
when there is a non-nobits section in the same segment that follows it.
It was discussed in D78005 thread and matches GNU linkers and LLD behavior.
Differential revision: https://reviews.llvm.org/D80629
Instead of using a fake call and metadata to temporarily represent a probed
static alloca, use a pseudo instruction.
This is inspired by the SystemZ approach proposed in https://reviews.llvm.org/D78717.
Differential Revision: https://reviews.llvm.org/D80641
The pass which infers when it's legal to load a global address space
as SMRD was only considering amdgpu_kernel, and ignoring the shader
entry type calling conventions.
The collectCallSiteParameters() method searches for instructions
which load values into registers used for parameters passing.
Previously, interpretation of those values, loaded by one such
instruction, was implemented inside collectCallSiteParameters() method.
This patch moves the interpretation code from collectCallSiteParameters()
method into a separate static method named interpretValue. New method is
called from collectCallSiteParameters() to process each instruction from
targeted instruction scope.
The collectCallSiteParameters() searches for loaded parameter value
among instructions which precede the call instruction, inside the same
basic block. When needed, new method (interpretValue) could be used for
searching any instruction scope.
This is preparation for search of parameter value, loaded inside call
delay slot.
Patch by Nikola Tesic
Differential revision: https://reviews.llvm.org/D78106
Summary:
This is a result of the discussion at D78113. Previously we would be
only giving the current offset at which the error was detected. However,
this was phrased somewhat ambiguously (as it could also mean that end of
data was at that offset). The new error message includes the current
offset as well as the extent of the data being read.
I've changed a couple of file-level static functions into private member
functions in order to avoid passing a bunch of new arguments everywhere.
Reviewers: dblaikie, jhenderson
Subscribers: hiraditya, MaskRay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78558
LV currently only supports power of 2 vectorization factors, which has
been made explicit with the assertion added in
840450549c9199150cbdee29acef756c19660ca1.
However, if the widest type is not a power-of-2 the computed MaxVF won't
be a power-of-2 either. This patch updates computeFeasibleMaxVF to
ensure the returned value is a power-of-2 by rounding down to the
nearest power-of-2.
Fixes PR46139.
Reviewers: Ayal, gilr, rengolin
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D80870
gc.relocate intrinsic is special in that its second and third operands
are not real values, but indices into relocate's parent statepoint list
of GC pointers.
To be CSE'd, they need special handling in `isEqual()` and `getHashCode()`.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D80445
MIPS 64-bit ABI does not provide special PC-relative relocation like
R_MIPS_PC32 in 32-bit case. But we can use a "chain of relocation"
defined by N64 ABIs. In that case one relocation record might contain up
to three relocations which applied sequentially. Width of a final relocation
mask applied to the result of relocation depends on the last relocation
in the chain. In case of 64-bit PC-relative relocation we need the following
chain: `R_MIPS_PC32 | R_MIPS_64`. The first relocation calculates an
offset, but does not truncate the result. The second relocation just
apply calculated result as a 64-bit value.
The 64-bit PC-relative relocation might be useful in generation of
`.eh_frame` sections to escape passing `-Wl,-z,notext` flags to linker.
Differential Revision: https://reviews.llvm.org/D80390
Summary:
Support I32/F32 registers in assembler parser and add regression tests of LD/ST
instructions.
Differential Revision: https://reviews.llvm.org/D80777
Summary:
Using a .data() member on a StringRef was discarding the StringRef
size, breaking llvm-exegesis on machines with counter sums (e.g.
Zen2).
Reviewers: oontvoo
Subscribers: mstojanovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80982
This patch adds clang options:
-fbasic-block-sections={all,<filename>,labels,none} and
-funique-basic-block-section-names.
LLVM Support for basic block sections is already enabled.
+ -fbasic-block-sections={all, <file>, labels, none} : Enables/Disables basic
block sections for all or a subset of basic blocks. "labels" only enables
basic block symbols.
+ -funique-basic-block-section-names: Enables unique section names for
basic block sections, disabled by default.
Differential Revision: https://reviews.llvm.org/D68049
Do not spill UNDEF GC values. Instead, replace corresponding
gc.relocate intrinsic with an (arbitrary, but recognizable) constant.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D80714
Summary:
Combine unmerge(trunc) to enable other merge combines.
Without this combine, the scalar unmerge(trunc(merge))
pattern cannot be combined and easily lead to
hard-to-legalize merge/unmerge artifacts.
Reviewed By: arsenm
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79567
When fixing probability of unreachable edges in
BranchProbabilityInfo::calcMetadataWeights() proportionally distribute
remainder probability over the reachable edges. The old implementation
distributes the remainder probability evenly.
See examples in the fixed tests.
Reviewers: yamauchi, ebrevnov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80611
These cases all follow the same pattern:
struct A {
friend class X;
//...
class X {};
};
But 'friend class X;' injects 'X' into the surrounding namespace scope,
rather than introducing a class member. So the second 'class X {}' is a
completely different type, which changes the meaning of the earlier name
'X' from '::X' to 'A::X'.
Additionally, the friend declaration is pointless -- members of a class
don't need to be befriended to be able to access private members.
We looked through a truncate to get to the load. So we should be
deleting the truncate first.
There is a check that the node is really unused before deleting
so this didn't cause a functional issue.