Now that Loop Peeling has been fixed (80cdd30eb90c3509bf315f1fa1369483e2448bbd),
enable the dominance check by default.
This reverts commit 3b5d36ece21f9baf96d82944b0165cb352443bee.
We can only legally extract from the lowest 128-bit subvector, so extract the correct subvector to allow us to handle 256/512-bit vector element extracts.
Under SoftFP calling conventions, we can be left with
extract(bitcast(BUILD_VECTOR(VMOVDRR(a, b), ..))) patterns that can
simplify to a or b, depending on the extract lane.
Differential Revision: https://reviews.llvm.org/D94990
Correct integer constants like `1UL << 63` to `UINT64_C(1) << 63` in
order to make them work on 32-bit machines. Tested on both an i386
and x86_64 machines.
Reviewed By: mgorny
Differential Revision: https://reviews.llvm.org/D95724
If we determine that the invariant path through the loop has no effects,
we can directly branch to the exit block, instead to unswitching first.
Besides avoiding some extra work (unswitching first, then deleting the
loop again) this allows to be more aggressive than regular unswitching
with respect to cost-modeling. This approach should always be be
desirable.
This is similar in spirit to D93734, just that it uses the previously
added checks for loop-unswitching.
I tried to add the required no-op checks from scratch, as we only check
a subset of the loop. There is potential to unify the checks with
LoopDeletion, at the cost of adding a predicate whether a block should
be considered.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95468
The reduction of a sanitizer build failure when enabling the dominance check (D95335) showed that loop peeling also needs to take care of scope duplication, just like loop unrolling (D92887).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D95544
This primarily occurs with isel patterns using vnot. This reduces
the number of variants in the isel tables.
We generally canonicalize build_vectors of constants to the RHS. I think
we might fail if there is a bitcast on the build_vector, but that
should be easy to fix if we can find a case. Usually the
bitcast is introduced by type legalization or lowering. It's
likely canonicalization would have already occured.
To set non-default rounding mode user usually calls function 'fesetround'
from standard C library. This way has some disadvantages.
* It creates unnecessary dependency on libc. On the other hand, setting
rounding mode requires few instructions and could be made by compiler.
Sometimes standard C library even is not available, like in the case of
GPU or AI cores that execute small kernels.
* Compiler could generate more effective code if it knows that a particular
call just sets rounding mode.
This change introduces new IR intrinsic, namely 'llvm.set.rounding', which
sets current rounding mode, similar to 'fesetround'. It however differs
from the latter, because it is a lower level facility:
* 'llvm.set.rounding' does not return any value, whereas 'fesetround'
returns non-zero value in the case of failure. In glibc 'fesetround'
reports failure if its argument is invalid or unsupported or if floating
point operations are unavailable on the hardware. Compiler usually knows
what core it generates code for and it can validate arguments in many
cases.
* Rounding mode is specified in 'fesetround' using constants like
'FE_TONEAREST', which are target dependent. It is inconvenient to work
with such constants at IR level.
C standard provides a target-independent way to specify rounding mode, it
is used in FLT_ROUNDS, however it does not define standard way to set
rounding mode using this encoding.
This change implements only IR intrinsic. Lowering it to machine code is
target-specific and will be implemented latter. Mapping of 'fesetround'
to 'llvm.set.rounding' is also not implemented here.
Differential Revision: https://reviews.llvm.org/D74729
A couple patterns used bitconvert on the immAllOnesV, but
the isel matching uses ISD::isBuildVectorAllOnes which
is able to look through bitcasts. So isel patterns don't need
to do it explicitly.
This reverts commit 6e58539659aea0ee621c7e267d825aa82d4e7e96.
This failed in http://lab.llvm.org:8011/#/builders/123/builds/2676. I guess
were're still missing some symbols, but unfortunately the specific error is
masked by a bug in python/lit that hides stderr. This test will have to remain
disabled on Windows until I can get help to debug it further.
We need to add a mask to the shift amount for these operations
to use the FSR/FSL instructions. We were previously doing this
in isel patterns, but custom lowering will make the mask
visible to optimizations earlier.
This testcase was failing on windows due to missing definitions. This commit
adds definitions of the missing symbols (as absolute symbols) to eliminate the
errors.
If we're going to end up expanding anyway, we should do it early
so we don't create extra operations to handle the bytes added by
promotion.
This is helfpul on RISCV where we might have to promote i16 all
the way to i64.
Differential Revision: https://reviews.llvm.org/D95756
This patch removes some options that have been duplicated in
LTOCodeGenerator and instead use lto::Config directly to manage the
options.
This is a cleanup after 6a59f0560648.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D95738
With a context instruction, this would produce a context
error. However, it would continue on and do an out of bounds access of
the empty allocation order array.
Current dsymutil implementation of hasLiveMemoryLocation()/hasLiveAddressRange()
and applyValidRelocs() assume that calls should be done in certain order
(from first Dies to last). Multi-thread implementation might call these methods
in other order(it might process compilation units in order other than they are physically
located), so we remove restriction that searching for relocations should be done
in ascending order. This change does not introduce noticable performance degradation.
The testing results for clang binary:
golden-dsymutil/dsymutil 23787992
clang MD5: 5efa8fd9355ebf81b65f24db5375caa2
elapsed time=91sec
build-Release/bin/dsymutil 23855616
clang MD5: 5efa8fd9355ebf81b65f24db5375caa2
elapsed time=91sec
Differential Revision: https://reviews.llvm.org/D93106
We already used emplace_back in at least one other place so be
consistent.
AddPatternToMatch already took PTM as an rvalue reference, but
we need to use std::move again to move it into the PatternToMatch
vector.
Use vector::swap instead of copying to a local vector and clearing
the original. We can just swap into the just created local vector
instead which will move the pointers and not the data.
Use std::move in another place to avoid a copy.
AMDGPUTargetTransformInfo.h needs AMDGPUTargetMachine but relies on a
forward declaration of AMDGPUTargetMachine in AMDGPU.h. This patch
adds a forward declaration right in AMDGPUTargetTransformInfo.h.
While we are at it, this patch removes the one in
AMDGPU.h, where it is unnecessary.
This demonstrates a missed optimization: the `vmv.x.s` instruction is
used to extract the element from the vector, and this instruction
already sign-extends the value to XLEN.
Fixes https://bugs.llvm.org/show_bug.cgi?id=48882.
If the input file does not exist (or has a reading error), the
following code will crash if there are two or more input addresses.
```
auto ResOrErr = Symbolizer.symbolizeInlinedCode(
ModuleName, {Offset, object::SectionedAddress::UndefSection});
Printer << (error(ResOrErr) ? DILineInfo() : ResOrErr.get().getFrame(0));
```
For the first address, `symbolizeInlinedCode` returns an error.
For the second address, `symbolizeInlinedCode` returns an empty result
(not an error) and `.getFrame(0)` will crash.
Differential revision: https://reviews.llvm.org/D95609
This patch fixes updating MemorySSA if the header contains memory
defs that do not clobber a duplicated instruction. We need to find the
first defining access outside the loop body and use that as defining
access of the duplicated instruction.
This fixes a crash caused by bee486851c1a.
This patch adds an option to enable the new pass manager in
LTOCodeGenerator. It also updates a few tests with legacy PM specific
tests, which started failing after 6a59f0560648 when
LLVM_ENABLE_NEW_PASS_MANAGER=true.
This patch updates LTOCodeGenerator to use the utilities provided by
LTOBackend to run middle-end optimizations and backend code generation.
This is a first step towards unifying the code used by libLTO's C API
and the newer, C++ interface (see PR41541).
The immediate motivation is to allow using the new pass manager when
doing LTO using libLTO's C API, which is used on Darwin, among others.
With the changes, there are no codegen/stats differences when building
MultiSource/SPEC2000/SPEC2006 on Darwin X86 with LTO, compared
to without the patch.
Reviewed By: steven_wu
Differential Revision: https://reviews.llvm.org/D94487
Compact unwind entries have 8 bits for the encoding-table offset:
* offsets 0..126 reference the global commmon-encodings table, while
* offsets 127..255 reference a per-second-level-page table.
This diff teaches `llvm-objdump` to print this per-page encodings table.
Differential Revision: https://reviews.llvm.org/D93265
This is an optimized approach for D94155.
Previous code build the model that tile config register is the user of
each AMX instruction. There is a problem for the tile config register
spill. When across function, the ldtilecfg instruction may be inserted
on each AMX instruction which use tile config register. This cause all
tile data register clobber.
To fix this issue, we remove the model of tile config register. Instead,
we analyze the AMX instructions between one call to another. We will
insert ldtilecfg after the first call if we find any AMX instructions.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D95136
Source Drift happens when the sources are updated after profiling the binary
but before building the final optimized binary. If the source has changed since
the profiles were obtained, optimizing basic blocks might be sub-optimal. This
only applies to BasicBlockSection::List as it creates clusters of basic blocks
using basic block ids. Source drift can invalidate these groupings leading to
sub-optimal code generation with regards to performance.
PGO source drift for a particular function can be detected using function
metadata added in D95495.
When source drift is deected, disable basic block clusters by default
which can be re-enabled with -mllvm option
bbsections-detect-source-drift=false.
Differential Revision: https://reviews.llvm.org/D95593
As a fixme notes, both of these directory iterator implementations are
conceptually similar and duplicate the functionality of returning and uniquing
entries across two or more directories. This patch combines them into a single
class 'CombiningDirIterImpl'.
This also drops the 'Redirecting' prefix from RedirectingDirEntry and
RedirectingFileEntry to save horizontal space. There's no loss of clarity as
they already have to be prefixed with 'RedirectingFileSystem::' whenever
they're referenced anyway.
rdar://problem/72485443
Differential Revision: https://reviews.llvm.org/D94857
Not all combinations of SEW and LMUL we need to support. For example, we
only need to support [M1, M2, M4, M8] for SEW = 64. There is no need to
define pseudos for PseudoVLSE64MF8, PseudoVLSE64MF4, and PseudoVLSE64MF2.
Differential Revision: https://reviews.llvm.org/D95667