The LASTB and LASTA instructions extract the last active element,
or the element after the last active element, from the source vector.
The added variants are:
Scalar:
last(a|b) w0, p0, z0.b
last(a|b) w0, p0, z0.h
last(a|b) w0, p0, z0.s
last(a|b) x0, p0, z0.d
SIMD & FP Scalar:
last(a|b) b0, p0, z0.b
last(a|b) h0, p0, z0.h
last(a|b) s0, p0, z0.s
last(a|b) d0, p0, z0.d
The CLASTB and CLASTA instructions conditionally extract the last
active element, or the element after the last active element, from
the source vector.
The added variants are:
Scalar:
clast(a|b) w0, p0, w0, z0.b
clast(a|b) w0, p0, w0, z0.h
clast(a|b) w0, p0, w0, z0.s
clast(a|b) x0, p0, x0, z0.d
SIMD & FP Scalar:
clast(a|b) b0, p0, b0, z0.b
clast(a|b) h0, p0, h0, z0.h
clast(a|b) s0, p0, s0, z0.s
clast(a|b) d0, p0, d0, z0.d
Vector:
clast(a|b) z0.b, p0, z0.b, z1.b
clast(a|b) z0.h, p0, z0.h, z1.h
clast(a|b) z0.s, p0, z0.s, z1.s
clast(a|b) z0.d, p0, z0.d, z1.d
Please refer to the architecture specification for more details on
the semantics of the added instructions.
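As a rough illustration only, the following Python sketch models the
scalar forms on plain lists. The behaviour when no element is active
(final element for LASTB, element zero for LASTA, destination value kept
for the CLAST forms) and the wrap-around in LASTA are assumptions taken
from a reading of the architecture, not something stated in this commit.
All helper names are hypothetical, and the specification remains the
authority.

```python
def last_active_index(pred):
    """Index of the last true predicate lane, or None if no lane is active."""
    for i in range(len(pred) - 1, -1, -1):
        if pred[i]:
            return i
    return None


def lastb(pred, vec):
    # Last active element; assumed to fall back to the final element.
    i = last_active_index(pred)
    return vec[-1] if i is None else vec[i]


def lasta(pred, vec):
    # Element after the last active one, wrapping to element 0;
    # assumed to fall back to element 0 when nothing is active.
    i = last_active_index(pred)
    return vec[0] if i is None else vec[(i + 1) % len(vec)]


def clastb(pred, dest, vec):
    # Conditional form: assumed to keep the incoming destination value
    # when no lane is active.
    i = last_active_index(pred)
    return dest if i is None else vec[i]


def clasta(pred, dest, vec):
    i = last_active_index(pred)
    return dest if i is None else vec[(i + 1) % len(vec)]
```

The sketch deliberately ignores element sizes and the vector forms of
CLASTA/CLASTB.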
llvm-svn: 336783
This allows us to use SelectionDAG::isKnownNeverZero in DAGCombiner::visitREM (visitSDIVLike/visitUDIVLike handle the checking for constants).
llvm-svn: 336779
llvm-mca doesn't know that on modern AMD processors, portions of a general
purpose register are not treated independently. So, a partial register write has
a false dependency on the super-register.
The issue with partial register writes will be addressed by a follow-up patch.
llvm-svn: 336778
First stage in PR38057 - support non-uniform constant vectors in the combine to reuse the division-by-constant logic.
We can definitely do better for srem pow2 remainders (and avoid that extra multiply....) but this at least helps keep everything on the vector unit.
Differential Revision: https://reviews.llvm.org/D48975
llvm-svn: 336774
gcc 4.7 seems to disagree with gcc 5.3 about whether you need to say
'return std::move(thing)' instead of just 'return thing'. All the
json::Arrays and json::Objects that I was implicitly turning into
json::Values by returning them from functions now have explicit
std::move wrappers, so hopefully 4.7 will be happy now.
llvm-svn: 336772
The aim of this backend is to output everything TableGen knows about
the record set, similarly to the default -print-records backend. But
where -print-records produces output in TableGen's input syntax
(convenient for humans to read), this backend produces it as
structured JSON data, which is convenient for loading into standard
scripting languages such as Python, in order to extract information
from the data set in an automated way.
The output data contains a JSON representation of the variable
definitions in output 'def' records, and a few pieces of metadata such
as which of those definitions are tagged with the 'field' prefix and
which defs are derived from which classes. It doesn't dump out
absolutely every piece of knowledge it _could_ produce, such as type
information and complicated arithmetic operator nodes in abstract
superclasses; the main aim is to allow consumers of this JSON dump to
essentially act as new backends, and backends don't generally need to
depend on that kind of data.
The new backend is implemented as an EmitJSON() function similar to
all of llvm-tblgen's other EmitFoo functions, except that it lives in
lib/TableGen instead of utils/TableGen on the basis that I'm expecting
to add it to clang-tblgen too in a future patch.
To test it, I've written a Python script that loads the JSON output
and tests properties of it based on comments in the .td source - more
or less like FileCheck, except that the CHECK: lines have Python
expressions after them instead of textual pattern matches.
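To give a feel for the intended use, here is a minimal sketch of a
consumer acting as a tiny "backend" in Python. The sample document is
hand-written rather than real backend output, the record and field names
are made up, and the "!"-prefixed metadata keys are an assumption about
the output layout that should be checked against the actual dump.

```python
import json

# Hand-written stand-in for the JSON dump; not real llvm-tblgen output.
# The "!instanceof" / "!superclasses" / "!name" keys are assumed.
SAMPLE = """
{
  "!instanceof": {"Instruction": ["ADD", "SUB"], "Register": ["R0"]},
  "ADD": {"!name": "ADD", "!superclasses": ["Instruction"], "Opcode": 1},
  "SUB": {"!name": "SUB", "!superclasses": ["Instruction"], "Opcode": 2},
  "R0": {"!name": "R0", "!superclasses": ["Register"], "Index": 0}
}
"""


def defs_of_class(data, class_name):
    """Return {def name: def record} for every def derived from class_name."""
    return {name: data[name] for name in data["!instanceof"].get(class_name, [])}


data = json.loads(SAMPLE)
# Build an opcode table from all defs derived from the Instruction class.
opcodes = {name: rec["Opcode"]
           for name, rec in defs_of_class(data, "Instruction").items()}
print(opcodes)
```

A real consumer would read the dump from a file with json.load() instead
of using an inline string.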
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: arichardson, labath, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D46054
llvm-svn: 336771
Summary:
These changes address PR#31399.
Currently the ffs(x) function is lowered to (x != 0) ? llvm.cttz(x) + 1 : 0,
which corresponds to the following LLVM IR:
%cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
%tobool = icmp eq i32 %v, 0
%.op = add nuw nsw i32 %cnt, 1
%add = select i1 %tobool, i32 0, i32 %.op
and x86 asm code:
bsfl %edi, %ecx
addl $1, %ecx
testl %edi, %edi
movl $0, %eax
cmovnel %ecx, %eax
In this case the 'test' instruction can't be eliminated because
the 'add' instruction modifies EFLAGS, in particular the ZF flag
that the 'bsf' instruction sets when 'x' is zero.
With this patch we produce the following code instead:
bsfl %edi, %ecx
movl $-1, %eax
cmovnel %ecx, %eax
addl $1, %eax
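The rewrite relies on moving the add after the select: picking between
-1 and cttz(x) first and then adding 1 gives the same result as picking
between 0 and cttz(x) + 1. A small Python sketch of that equivalence
(the function names are hypothetical, and cttz32 is only a stand-in for
the intrinsic on nonzero inputs):

```python
def cttz32(x):
    """Count trailing zeros of a nonzero 32-bit value."""
    assert 0 < x < 2**32
    return (x & -x).bit_length() - 1


def ffs_add_then_select(x):
    # select(x == 0, 0, cttz(x) + 1): the add sits between bsf and cmov,
    # so the flags from bsf are clobbered and a separate test is needed.
    return 0 if x == 0 else cttz32(x) + 1


def ffs_select_then_add(x):
    # select(x == 0, -1, cttz(x)) + 1: cmov can consume ZF from bsf
    # directly, and the add runs afterwards.
    t = -1 if x == 0 else cttz32(x)
    return t + 1
```

Both forms return 0 for x == 0 and the 1-based index of the lowest set
bit otherwise.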
Patch by Ivan Kulagin
Reviewers: davide, craig.topper, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48765
llvm-svn: 336768
These patterns looked for a MOVSS/SD followed by a scalar_to_vector, or a scalar_to_vector followed by a load.
In both cases we emitted a MOVSS/SD for the MOVSS/SD part, a REG_CLASS for the scalar_to_vector, and a MOVSS/SD for the load.
But we have patterns that do each of those 3 things individually so there's no reason to build large patterns.
Most of the test changes are just reorderings. The one test that had a meaningful change is pr30430.ll, and it appears to be a regression. But it's running at -O0, so I think it missed a lot of opportunities and was just getting lucky before.
llvm-svn: 336762
Let's be conservative here; it matches what we actually implemented, and
it should be rare in practice anyway.
Differential Revision: https://reviews.llvm.org/D49042
llvm-svn: 336744
The original code attempted to do this, but the std::abs() call didn't
actually do anything due to implicit type conversions. Fix the type
conversions, and perform the correct check for negative immediates.
This probably has very little practical impact, but it's worth fixing
just to avoid confusion in the future, I think.
Differential Revision: https://reviews.llvm.org/D48907
llvm-svn: 336742
If we don't include Initialization.h,
`LLVMInitializeAggressiveInstCombiner` won't see its `extern "C"` decl.
This causes sadness, name mangling, and linker errors.
Reported on the mailing lists by Vladimir Vissoultchev. Thanks!
llvm-svn: 336736
Some added 20 and some added 15. It's unclear when to use which value and whether they are required at all.
This patch removes them all. If we start finding real world issues we may need to add them back with proper tests.
llvm-svn: 336735
Isel currently emits movss/movsd a lot of the time and an accidental double commute turns it into a blend.
Ideally we'd select blend directly in isel under optspeed and not rely on the double commute to create blend.
llvm-svn: 336731
These ISD nodes try to select the MOVLPS and MOVLPD instructions, which are special load-only instructions. They load data and merge it into the lower 64 bits of an XMM register. They are logically equivalent to our MOVSD node plus a load.
There was only one place in X86ISelLowering that used MOVLPD and no places that selected MOVLPS. The one place that selected MOVLPD had to choose between it and MOVSD based on whether there was a load. But lowering is too early to tell if the load can really be folded. So in isel we have patterns that use MOVSD for MOVLPD if we can't find a load.
We also had patterns that select the MOVLPD instruction for a MOVSD if we can find a load, but didn't choose the MOVLPD ISD opcode for some reason.
So it seems better to just standardize on MOVSD ISD opcode and manage MOVSD vs MOVLPD instruction with isel patterns.
llvm-svn: 336728
Summary:
I noticed that the .imports files emitted for distributed ThinLTO
backends do not have consistent ordering. This is because StringMap
iteration order is not guaranteed to be deterministic. Since we already
have a std::map with this information, used when emitting the individual
index files (ModuleToSummariesForIndex), use it for the imports files as
well.
This issue is likely causing some unnecessary rebuilds of the ThinLTO
backends in our distributed build system as the imports files are inputs
to those backends.
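The idea can be sketched in a few lines of Python: emitting entries in
sorted key order, as iterating a std::map does, makes the output
independent of insertion or hash order. The one-path-per-line layout and
the module paths below are assumptions for illustration only.

```python
def render_imports(module_to_summaries):
    """Render one module path per line, in sorted key order."""
    return "".join(path + "\n" for path in sorted(module_to_summaries))


# Same contents, different insertion order (hypothetical module paths).
first = {"lib/b.o": {}, "lib/a.o": {}, "lib/c.o": {}}
second = {"lib/c.o": {}, "lib/a.o": {}, "lib/b.o": {}}

# Byte-identical output means the build system sees an unchanged input.
assert render_imports(first) == render_imports(second)
```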
Reviewers: pcc, steven_wu, mehdi_amini
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D48783
llvm-svn: 336721
Memory leaks in tests.
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/6289/steps/check-llvm%20asan/logs/stdio
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x554ea8 in operator new(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:106
#1 0x56cef1 in llvm::VPlanTestBase::doAnalysis(llvm::Function&) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h:53:14
#2 0x56bec4 in llvm::VPlanTestBase::buildHCFG(llvm::BasicBlock*) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h:57:3
#3 0x571f1e in llvm::(anonymous namespace)::VPlanHCFGTest_testVPInstructionToVPRecipesInner_Test::TestBody() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/Transforms/Vectorize/VPlanHCFGTest.cpp:119:15
#4 0xed2291 in testing::Test::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc
#5 0xed44c8 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2656:11
#6 0xed5890 in testing::TestCase::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2774:28
#7 0xef3634 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4649:43
#8 0xef27e0 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc
#9 0xebbc23 in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46
#10 0xebbc23 in main /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/UnitTestMain/TestMain.cpp:51
#11 0x7f65569592e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
and more.
llvm-svn: 336718
Do not use the LLVM_RUNTIMES_LIBDIR_SUFFIX variable, which is an
internal variable used by the runtimes build, from individual runtimes.
Instead, set a per-runtime library directory suffix variable, which is
necessary for the sanitized runtimes build to install libraries into
the correct location.
Differential Revision: https://reviews.llvm.org/D49121
llvm-svn: 336713
I believe isProfitableToFold will stop the load folding that this was intended to overcome.
Given an (xor load, -1), isProfitableToFold will see that the immediate can be folded with the xor using a one byte immediate since it can be sign extended. It doesn't know about NOT, but the one byte immediate check is enough to stop the fold.
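A small sketch of the two facts this argument leans on: xor with -1 is a
bitwise NOT, and -1 is representable as a sign-extended one-byte
immediate. The helper names are hypothetical and do not correspond to
functions in the backend.

```python
def fits_in_simm8(imm):
    """True if imm is representable as a sign-extended 8-bit immediate."""
    return -128 <= imm <= 127


def not32(x):
    """Bitwise NOT of a 32-bit value, written as xor with -1."""
    return (x ^ -1) & 0xFFFFFFFF
```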
llvm-svn: 336712
Automatically codesign all executables and dynamic libraries if a
codesigning identity is given (via LLVM_CODESIGNING_IDENTITY). This
option is darwin only for now.
Also update platforms/iOS.cmake to pick up the right versions of
codesign and codesign_allocate.
llvm-svn: 336708
The llvm_gcov_... routines in compiler-rt are regular C functions that
need to be called using the proper C ABI for the target. The current
code simply calls them using plain LLVM IR types. Since the types are
mostly simple, this happens to just work on certain targets. But other
targets still need special handling; in particular, it may be necessary
to sign- or zero-extend sub-word values to comply with the ABI. This
caused gcov failures on SystemZ in particular.
Now the very same problem was already fixed for the llvm_profile_ calls
here: https://reviews.llvm.org/D21736
This patch uses the same method to fix the llvm_gcov_ calls, in
particular calls to llvm_gcda_start_file, llvm_gcda_emit_function, and
llvm_gcda_emit_arcs.
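For readers unfamiliar with the requirement, the sketch below shows what
sign- and zero-extension of a sub-word value mean arithmetically. It is
only an illustration with hypothetical helper names; which extension a
given argument needs is decided by the target ABI, not by this code.

```python
def zero_extend(value, from_bits):
    """Keep the low from_bits bits and treat them as unsigned."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value, from_bits):
    """Keep the low from_bits bits and interpret them as two's complement."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return (value ^ sign_bit) - sign_bit
```

A callee that assumes an extended register will misread the argument if
the caller leaves the upper bits undefined, which is why the call sites
need to carry the extension information.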
Reviewed By: marco-c
Differential Revision: https://reviews.llvm.org/D49134
llvm-svn: 336692