This patch fixes D81345 and PR46652.
If a loop with a small trip count is compiled w/o -Os/-Oz, Loop Access Analysis
still generates runtime checks for unit strides that will version the loop.
In such cases, the loop vectorizer should either re-run the analysis or bail-out
from vectorizing the loop, as done prior to D81345. The latter is applied for
now as the former requires refactoring.
Differential Revision: https://reviews.llvm.org/D83470
Currently, llvm when see a global variable in .maps section,
it ensures its type must be a struct type. Then pointee
will be further evaluated for the structure members.
In normal cases, the pointee type will be skipped.
Although this is what current all bpf programs are doing,
but it is a little bit restrictive. For example, it is legitimate
for users to have:
typedef struct { int key_size; int value_size; } __map_t;
__map_t map __attribute__((section(".maps")));
This patch lifts this restriction and typedef of
a struct type is also allowed for .maps section variables.
To avoid create unnecessary fixup entries when traversal
started with typedef/struct type, the new implementation
first traverse all map struct members and then traverse
the typedef/struct type. This way, in internal BTFDebug
implementation, no fixup entries are generated.
Two new unit tests are added for typedef and const
struct in .maps section. Also tested with kernel bpf selftests.
Differential Revision: https://reviews.llvm.org/D83638
fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)
This is only allowed when "reassoc" is present on the fadd.
As discussed in D80801, this transform goes beyond
what is allowed by "contract" FMF (-ffp-contract=fast).
That is because we are fusing the trailing add of 'E' with a
multiply, but without "reassoc", the code mandates that the
products A*B and C*D are added together before adding in 'E'.
I've added this example to the LangRef to try to clarify the
meaning of "contract". If that seems reasonable, we should
probably do something similar for the clang docs because
there does not appear to be any formal spec for the behavior
of -ffp-contract=fast.
Differential Revision: https://reviews.llvm.org/D82499
These test cases fail to use vpternlog because the AND was converted
to a blend shuffle and then converted back to AND during shuffle lowering.
This results in the AND having a different type than it started with.
This prevents our custom matching logic from seeing the two logic ops.
Summary:
This patch is enabling the generation of clauses enum sets for semantics check in Flang through
tablegen. Enum sets and directive - sets map is generated by the new tablegen infrsatructure for OpenMP
and other directive languages.
The semantic checks for OpenMP are modified to use this newly generated map.
Reviewers: DavidTruby, sscalpone, kiranchandramohan, ichoyjx, jdoerfert
Reviewed By: DavidTruby, ichoyjx
Subscribers: mgorny, yaxunl, hiraditya, guansong, sstefan1, aaron.ballman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83326
Summary:
- Skip unreachable predecessors during header detection in SCC. Those
unreachable blocks would be generated in the switch lowering pass in
the corner cases or other frontends. Even though they could be removed
through the CFG simplification, we should skip them during header
detection.
Reviewers: sameerds
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83562
It is possible that LowerSwitch pass leaves certain blocks
unreachable from the entry. If not removed, these dead blocks
can cause undefined behavior in the subsequent passes.
It caused a crash in the AMDGPU backend after the instruction
selection when a PHI node has its incoming values coming from
these unreachable blocks.
In the AMDGPU pass flow, the last invocation of UnreachableBlockElim
precedes where LowerSwitch is currently placed and eventually
missed out on the opportunity to get these blocks eliminated.
This patch ensures that LowerSwitch pass get inserted earlier
to make use of the existing unreachable block elimination pass.
Reviewed By: sameerds, arsenm
Differential Revision: https://reviews.llvm.org/D83584
The current implementation of Tail Recursion Elimination has a very restricted
pre-requisite: AllCallsAreTailCalls. i.e. it requires that no function
call receives a pointer to local stack. Generally, function calls that
receive a pointer to local stack but do not capture it - should not
break TRE. This fix allows us to do TRE if it is proved that no pointer
to the local stack is escaped.
Reviewed by: efriedma
Differential Revision: https://reviews.llvm.org/D82085
MSVC throws an error if you use "too many" if-else in a row:
`Frontend/OpenMP/OMPKinds.def(570): fatal error C1061: compiler limit:
blocks nested too deeply`
We work around it now...
In non-SPMD mode we create a state machine like code to identify the
parallel region the GPU worker threads should execute next. The
identification uses the parallel region function pointer as that allows
it to work even if the kernel (=target region) and the parallel region
are in separate TUs. However, taking the address of a function comes
with various downsides. With this patch we will identify the most common
situation and replace the function pointer use with a dummy global
symbol (for identification purposes only). That means, if the parallel
region is only called from a single target region (or kernel), we do not
use the function pointer of the parallel region to identify it but a new
global symbol.
Fixes PR46450.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83271
The module slice describes which functions we can analyze and transform
while working on an SCC as part of the CGSCC OpenMPOpt pass. So far, we
simply restricted it to the SCC. In a follow up we will need to have a
bigger scope which is why this patch introduces a proper identification
of the module slice. In short, everything that has a transitive
reference to a function in the SCC or is transitively referenced by one
is fair game.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D83270
We now identify GPU kernels, that is entry points into the GPU code.
These kernels (can) correspond to OpenMP target regions. With this patch
we identify and on request print them via remarks.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83269
There are various runtime calls in the device runtime with unused, or
always fixed, arguments. This is bad for all sorts of reasons. Clean up
two before as we match them in OpenMPOpt now.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83268
P9 is the only one with InstrSchedModel, but we may have more in the
future, we should not hardcoded it to P9, check hasInstrSchedModel
instead.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D83590
In BUILD_VECTOR lowering, we used to generally prefer using splats
over v128.const instructions because v128.const has a very large
encoding. However, in d5b7a4e2e8 we switched to preferring consts
because they are expected to be more efficient in engines. This patch
updates the ISel patterns to match this current preference.
Differential Revision: https://reviews.llvm.org/D83581
This reverts commit 1d542f0ca83fa1411d6501a8d088450d83abd5b8.
`recollectUses()` is added to prevent looking at dead uses after
Attributor run.
This is the first and most basic ICV Tracking implementation. For this
first version, we only support deduplication within the same BB.
Reviewers: jdoerfert, JonChesterfield, hamax97, jhuber6, uenoku,
baziotis, lebedev.ri
Differential Revision: https://reviews.llvm.org/D81788
Summary:
Diff D83176 moved the last piece of code from OMPConstants.cpp and now this file was only
useful to include the tablegen generated file. This patch replace OMPConstants.cpp with OMP.cpp
generated by tablegen.
Reviewers: sstefan1, jdoerfert, jdenny
Reviewed By: sstefan1
Subscribers: mgorny, yaxunl, hiraditya, guansong, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83583
Summary:
eraseBlock is trying to erase all probability info for the given BB.
This info is stored in a DenseMap organized like so:
using Edge = std::pair<const BasicBlock *, unsigned>;
DenseMap<Edge, BranchProbability> Probs;
where the unsigned in the Edge key is the successor id.
It was walking through every single map entry, checking if the BB in the
key's pair matched the given BB. Much more efficient is to do what
another method (getEdgeProbability) was already doing, which is to walk
the successors of the BB, and simply do a map lookup on the key formed
from each <BB, successor id> pair.
Doing this dropped the overall compile time for a file containing a
very large function by around 32%.
Reviewers: davidxl, xur
Subscribers: llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83596
This risks ODR violations in inline functions that call these functions
(if they remain static) & otherwise just causes some object size
increase, potentially, by these functions not being deduplicated by the
linker.
as it wasn't NFC and is causing issues with thinlto bitcode reading.
I've followed up offline with reproduction instructions and testcases.
This reverts commit 30582457b47004dec8a78144abc919a13ccbd08c.
This doesn't appear used for anything, and is emitted incorrectly
based on the description. This also depends on the IR type, and
pointee element type.
In FileCheck.rst, add `-dump-input-context` and `-dump-input-filter`,
and fix some `-dump-input` documentation.
In `FileCheck -help`, `cl::value_desc("kind")` is being ignored for
`-dump-input-filter`, so just drop it.
Extend `-dump-input=help` to mention FILECHECK_OPTS.
PassInfoMixin should be used for all NPM passes, rater than a custom
`name()`.
This caused ambiguous references in LegacyPassManager.cpp, so had to
remove "using namespace llvm::legacy" and move some things around.
Reviewed By: ychen, asbirlea
Differential Revision: https://reviews.llvm.org/D83498