The case start to fail since https://reviews.llvm.org/D99351.
Looks like to me that the node order within Context Profile Tree depends
on the implmementation of std::hash<std::string>.
Unfortunately, the current clang implementation generate different values on
AIX (or for all big-endian systems?)
On Linux:
main: 2408804140(0x8f936f2c)
external: 896680882(0x357243b2)
externalA: 620231129(0x24f7f9d9)
On AIX:
main: 994322777(0x3b442959)
external: 3548191215(0xd37d19ef)
externalA: 1390365101(0x52df49ad)
XFAIL it first while we discuss and seek for a fix.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D99815
Define -fatal-warnings to make warnings fatal, and accept /WX as an ML.EXE compatible alias for it.
Also make sure that if Warning() returns true, we always treat it as an error.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D92504
Head files are included in a separate patch in case the name needs to be changed.
RV32 / 64:
clmul
clmulh
clmulr
Differential Revision: https://reviews.llvm.org/D99711
Forgot to amend the Author.
Original commit message:
Header files are included in a separate patch in case the name needs to be changed.
RV32 / 64:
orc.b
Differential Revision: https://reviews.llvm.org/D99320
Make variables and text-macro references case-insensitive, to match ml.exe.
Also improve error handling for text-macro expansion.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D92503
Implementation for RISC-V Zbr extension intrinsic.
Header files are included in separate patch in case the name needs to be changed
RV32 / 64:
crc32b
crc32h
crc32w
crc32cb
crc32ch
crc32cw
RV64 Only:
crc32d
crc32cd
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99009
LLVM test CodeGen/Hexagon/hwloop3.ll tries to check for the absence of a
sequence of consecutive instructions with several CHECK-NOT with one of
those directives using a variable defined in another. However CHECK-NOT
are checked independently so that is using a variable defined in a
pattern that should not occur in the input.
This commit merges the two CHECK-NOT into a single CHECK-NOT that
matches the content of two successive non-blank lines, thereby allowing
to preserve the intent of the test.
Reviewed By: bcahoon
Differential Revision: https://reviews.llvm.org/D99778
For positive constants we try shifting left to remove leading zeros
and fill the bottom bits with 1s. We then materialize that constant
shift it right.
This patch adds a new strategy to try filling the bottom bits with
zeros instead. This catches some additional cases.
When run under valgrind, or with a malloc that poisons freed memory,
this can lead to segfaults or other problems.
To avoid modifying the AdditionalUsers DenseMap while still iterating,
save the instructions to be notified in a separate SmallPtrSet, and use
this to later call OperandChangedState on each instruction.
Fixes PR49582.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D98602
Linux-only for now. Some mac bits stubbed out, but not tested.
Good enough for the tiny_race.c example at
https://clang.llvm.org/docs/ThreadSanitizer.html :
$ out/gn/bin/clang -fsanitize=address -g -O1 tiny_race.c
$ while true; do ./a.out || echo $? ; done
While here, also make `-fsanitize=address` work for .c files.
Differential Revision: https://reviews.llvm.org/D99795
This patch moves mapping of IR operands to VPValues out of
tryToCreateWidenRecipe. This allows using existing VPValue operands when
widening recipes directly, which will be introduced in future patches.
The safepoints being inserted exists to free memory, or coordinate with another thread to do so. Thus, we must strip any inferred attributes and reinfer them after the lowering.
I'm not aware of any active miscompiles caused by this, but since I'm working on strengthening inference of both and leveraging them in the optimization decisions, I figured a bit of future proofing was warranted.
Change the definition of G_SBFX and G_UBFX so that the lsb and width
can have different types than the src and dst operands.
Differential Revision: https://reviews.llvm.org/D99739
The ultimate reduction node may have multiple uses, but if the ultimate
reduction is min/max reduction and based on SelectInstruction, the
condition of this select instruction must have only single use.
Differential Revision: https://reviews.llvm.org/D99753
RV32 is able to use the llvm.experimental.vector.insert intrinsics too.
This patch ensures they're tested.
Reviewed By: khchen, asb
Differential Revision: https://reviews.llvm.org/D99655
In order to bring up scalable vector support in LLVM incrementally,
we introduced behaviour to emit a warning, instead of an error, when
asking the wrong question of a scalable vector, like asking for the
fixed number of elements.
This patch puts that behaviour under a flag. The default behaviour is
that the compiler will always error, which means that all LLVM unit
tests and regression tests will now fail when a code-path is taken that
still uses the wrong interface.
The behaviour to demote an error to a warning can be individually enabled
for tools that want to support experimental use of scalable vectors.
This patch enables that behaviour when driving compilation from Clang.
This means that for users who want to try out scalable-vector support,
fixed-width codegen support, or build user-code with scalable vector
intrinsics, Clang will not crash and burn when the compiler encounters
such a case.
This allows us to do away with the following pattern in many of the SVE tests:
RUN: .... 2>%t
RUN: cat %t | FileCheck --check-prefix=WARN
WARN-NOT: warning: ...
The behaviour to emit warnings is only temporary and we expect this flag
to be removed in the future when scalable vector support is more stable.
This patch also has fixes the following tests:
unittests:
ScalableVectorMVTsTest.SizeQueries
SelectionDAGAddressAnalysisTest.unknownSizeFrameObjects
AArch64SelectionDAGTest.computeKnownBitsSVE_ZERO_EXTEND_VECTOR_INREG
regression tests:
Transforms/InstCombine/vscale_gep.ll
Reviewed By: paulwalker-arm, ctetreau
Differential Revision: https://reviews.llvm.org/D98856
The motivation for this patch is to better estimate the cost of
extracelement instructions in cases were they are going to be free,
because the source vector can be used directly.
A simple example is
%v1.lane.0 = extractelement <2 x double> %v.1, i32 0
%v1.lane.1 = extractelement <2 x double> %v.1, i32 1
%a.lane.0 = fmul double %v1.lane.0, %x
%a.lane.1 = fmul double %v1.lane.1, %y
Currently we only consider the extracts free, if there are no other
users.
In this particular case, on AArch64 which can fit <2 x double> in a
vector register, the extracts should be free, independently of other
users, because the source vector of the extracts will be in a vector
register directly, so it should be free to use the vector directly.
The SLP vectorized version of noop_extracts_9_lanes is 30%-50% faster on
certain AArch64 CPUs.
It looks like this does not impact any code in
SPEC2000/SPEC2006/MultiSource both on X86 and AArch64 with -O3 -flto.
This originally regressed after D80773, so if there's a better
alternative to explore, I'd be more than happy to do that.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D99719
D99717 introduced some test cases which showed that the output of one
vsetvli into another would not be picked up by the RISCVCleanupVSETVLI
pass. This patch teaches the optimization about such a pattern. The
pattern is quite common when using the RVV vsetvli intrinsic to pass the
VL onto other intrinsics.
The second test case introduced by D99717 is left unoptimized by this
patch. It is a rarer case and will require us to rewire any uses of the
redundant vset[i]vli's output to the previous one's.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99730
Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available.
Reviewed By: mkazantsev, lebedev.ri
Differential Revision: https://reviews.llvm.org/D88287
Recently we switched to use InvalidProbeCount = UINT64_MAX (instead of 0) to represent dangling probe, but UINT64_MAX is not excluded when computing profile summary. This caused profile summary to produce incorrect hot/cold threshold. The change fixed it by excluding UINT64_MAX from summary builder.
Differential Revision: https://reviews.llvm.org/D99788