mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 19:12:56 +02:00
Commit Graph

8538 Commits

Author SHA1 Message Date
Mircea Trofin
85690f6e18 [llvm][NFC] Factor out logic for getting incoming & back Loop edges
Reviewers: davidxl

Reviewed By: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59967

llvm-svn: 357284
2019-03-29 17:39:17 +00:00
Florian Hahn
2f903fe2ac Recommit "[DSE] Preserve basic block ordering using OrderedBasicBlock."
Updated to use DenseMap::insert instead of [] operator for insertion, to
avoid a crash caused by epoch checks.

This reverts commit 2b85de438326f9d27bc96dc934ec98b98abdb337.

llvm-svn: 357257
2019-03-29 14:10:24 +00:00
Florian Hahn
98d7a687e9 Revert Recommit "[DSE] Preserve basic block ordering using OrderedBasicBlock."
Another buildbot failure

http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/20402

clang-9: /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/llvm/include/llvm/ADT/DenseMap.h:1228: llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::value_type* llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::operator->() const [with KeyT = const llvm::Instruction*; ValueT = unsigned int; KeyInfoT = llvm::DenseMapInfo<const llvm::Instruction*>; Bucket = llvm::detail::DenseMapPair<const llvm::Instruction*, unsigned int>; bool IsConst = false; llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::pointer = llvm::detail::DenseMapPair<const llvm::Instruction*, unsigned int>*; llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::value_type = llvm::detail::DenseMapPair<const llvm::Instruction*, unsigned int>]: Assertion `isHandleInSync() && "invalid iterator access!"' failed.

0.	Program arguments: /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/stage1.install/bin/clang-9 -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -disable-free -main-file-name ArchiveCommandLine.cpp -mrelocation-model static -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu skylake-avx512 -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -coverage-notes-file /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/sandbox/build/MultiSource/Benchmarks/7zip/Output/ArchiveCommandLine.llvm.gcno -resource-dir /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/stage1.install/lib/clang/9.0.0 -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/sandbox/build/MultiSource/Benchmarks/7zip -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/include -I ../../../include -D _GNU_SOURCE -D __STDC_LIMIT_MACROS -D NDEBUG -D BREAK_HANDLER -D UNICODE -D _UNICODE -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/C -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP/myWindows -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP/include_windows -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP -I . -D _FILE_OFFSET_BITS=64 -D _LARGEFILE_SOURCE -D NDEBUG -D _REENTRANT -D ENV_UNIX -D _7ZIP_LARGE_PAGES -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/backward -internal-isystem /usr/local/include -internal-isystem /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/stage1.install/lib/clang/9.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -std=gnu++98 -fdeprecated-macro -fdebug-compilation-dir /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/sandbox/build/MultiSource/Benchmarks/7zip -ferror-limit 19 -fmessage-length 0 -pthread -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -vectorize-loops -vectorize-slp -o Output/ArchiveCommandLine.llvm.o -x c++ /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP/7zip/UI/Common/ArchiveCommandLine.cpp -faddrsig

This reverts r357222 (git commit 64cccfcc72c44ea62f441b782d2177a90912769a)

llvm-svn: 357227
2019-03-29 00:22:26 +00:00
Florian Hahn
99b146d4a0 Recommit "[DSE] Preserve basic block ordering using OrderedBasicBlock."
Recommitting after addressing a buildbot failure.

This reverts commit c87869ebea000dd6483de7c7451cb36c1d36f866.

llvm-svn: 357222
2019-03-28 23:11:00 +00:00
Florian Hahn
4959350890 Revert [DSE] Preserve basic block ordering using OrderedBasicBlock.
This reverts r357208 (git commit c0bfd37d385c93711ef3a349599dba20e6b101ef)

This causes a buildbot failure:  http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/16124

FAILED: lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o
/home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/install/stage2/bin/clang++   -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/IR -I/home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/lib/IR -Iinclude -I/home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/include -fPIC -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -flto=thin -O3    -UNDEBUG  -fno-exceptions -fno-rtti -MD -MT lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o -MF lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o.d -o lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o -c /home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/lib/IR/IRBuilder.cpp
clang-9: /home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/lib/Analysis/OrderedBasicBlock.cpp:38: bool llvm::OrderedBasicBlock::comesBefore(const llvm::Instruction *, const llvm::Instruction *): Assertion `!(LastInstFound == BB->end() && NextInstPos != 0) && "Instruction supposed to be in NumberedInsts"' failed.

llvm-svn: 357211
2019-03-28 20:36:24 +00:00
Florian Hahn
58ef1f1bc3 [DSE] Preserve basic block ordering using OrderedBasicBlock.
By extending OrderedBB to allow removing and replacing cached
instructions, we can preserve OrderedBBs in DSE easily. This eliminates
one source of quadratic compile time in DSE.

Fixes PR38829.

Reviewers: rnk, efriedma, hfinkel

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D59789

llvm-svn: 357208
2019-03-28 20:02:33 +00:00
Florian Hahn
9641845c3d [MemDepAnalysis] Allow caller to pass in an OrderedBasicBlock.
If the caller can preserve the OBB, we can avoid recomputing the order
for each getDependency call.

Reviewers: efriedma, rnk, hfinkel

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D59788

llvm-svn: 357206
2019-03-28 19:17:31 +00:00
Chandler Carruth
7cfefe1cbe [NewPM] Fix a nasty bug with analysis invalidation in the new PM.
The issue here is that we actually allow CGSCC passes to mutate IR (and
therefore invalidate analyses) outside of the current SCC. At a minimum,
we need to support mutating parent and ancestor SCCs to support the
ArgumentPromotion pass which rewrites all calls to a function.

However, the analysis invalidation infrastructure is heavily based
around not needing to invalidate the same IR-unit at multiple levels.
With Loop passes for example, they don't invalidate other Loops. So we
need to customize how we handle CGSCC invalidation. Doing this without
gratuitously re-running analyses is even harder. I've avoided most of
these by using an out-of-band preserved set to accumulate the cross-SCC
invalidation, but it still isn't perfect in the case of re-visiting the
same SCC repeatedly *but* it coming off the worklist. Unclear how
important this use case really is, but I wanted to call it out.

Another wrinkle is that in order for this to successfully propagate to
function analyses, we have to make sure we have a proxy from the SCC to
the Function level. That requires pre-creating the necessary proxy.

The motivating test case now works cleanly and is added for
ArgumentPromotion.

Thanks for the review from Philip and Wei!

Differential Revision: https://reviews.llvm.org/D59869

llvm-svn: 357137
2019-03-28 00:51:36 +00:00
Hans Wennborg
20250932e6 Fix the build with GCC 4.8 after r356783
llvm-svn: 356875
2019-03-25 09:27:42 +00:00
Nikita Popov
e4a16093f5 [ConstantRange] Add getFull() + getEmpty() named constructors; NFC
This adds ConstantRange::getFull(BitWidth) and
ConstantRange::getEmpty(BitWidth) named constructors as more readable
alternatives to the current ConstantRange(BitWidth, /* full */ false)
and similar. Additionally private getFull() and getEmpty() member
functions are added which return a full/empty range with the same bit
width -- these are commonly needed inside ConstantRange.cpp.

The IsFullSet argument in the ConstantRange(BitWidth, IsFullSet)
constructor is now mandatory for the few usages that still make use of it.
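
A minimal usage sketch, assuming the constructor signatures described above:

```
#include "llvm/IR/ConstantRange.h"
using namespace llvm;

ConstantRange exampleRanges() {
  // Named constructors instead of ConstantRange(BitWidth, /* full */ true).
  ConstantRange Full = ConstantRange::getFull(/*BitWidth=*/32);
  ConstantRange Empty = ConstantRange::getEmpty(/*BitWidth=*/32);
  // Intersecting anything with the empty set stays empty.
  return Full.intersectWith(Empty);
}
```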

Differential Revision: https://reviews.llvm.org/D59716

llvm-svn: 356852
2019-03-24 09:34:40 +00:00
Nikita Popov
717af54804 [ValueTracking] Avoid redundant known bits calculation in computeOverflowForSignedAdd()
We're already computing the known bits of the operands here. If the
known bits of the operands can determine the sign bit of the result,
we'll already catch this in signedAddMayOverflow(). The only other
way (and as the comment already indicates) we'll get new information
from computing known bits on the whole add, is if there's an assumption
on it.

As such, we change the code to only compute known bits from assumptions,
instead of computing full known bits on the add (which would unnecessarily
recompute the known bits of the operands as well).

Differential Revision: https://reviews.llvm.org/D59473

llvm-svn: 356785
2019-03-22 17:51:40 +00:00
Alina Sbirlea
3fc1696aca [AliasAnalysis] Second prototype to cache BasicAA / anyAA state.
Summary:
Adding contained caching to AliasAnalysis. BasicAA is currently the only one using it.

AA changes:
- This patch is pulling the caches from BasicAAResults to AAResults, meaning the getModRefInfo call benefits from the IsCapturedCache as well when in "batch mode".
- All AAResultBase implementations add the QueryInfo member to all APIs. AAResults APIs maintain wrapper APIs such that all alias()/getModRefInfo call sites are unchanged.
- AA now provides a BatchAAResults type as a wrapper to AAResults. It keeps the AAResults instance and a QueryInfo instantiated to batch mode. It delegates all work to the AAResults instance with the batched QueryInfo. More API wrappers may be needed in BatchAAResults; only the minimum needed is currently added.
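
A minimal sketch of batch-mode queries, assuming the BatchAAResults wrapper described above; the helper function is illustrative only:

```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"
using namespace llvm;

// Issue several queries through one BatchAAResults so BasicAA's caches
// persist across them instead of being reset per top-level query.
bool anyPairMayAlias(AAResults &AA, ArrayRef<MemoryLocation> Locs) {
  BatchAAResults BatchAA(AA);
  for (unsigned I = 0, E = Locs.size(); I != E; ++I)
    for (unsigned J = I + 1; J != E; ++J)
      if (BatchAA.alias(Locs[I], Locs[J]) != NoAlias)
        return true;
  return false;
}
```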

MemorySSA changes:
- All walkers are now templated on the AA used (AliasAnalysis=AAResults or BatchAAResults).
- At build time, we optimize uses; now we create a local walker (lives only as long as OptimizeUses does) using BatchAAResults.
- All Walkers have an internal AA and only use that now, never the AA in MemorySSA. The Walkers receive the AA they will use when built.

- The walker we use for queries after the build is instantiated on AliasAnalysis and is built after building MemorySSA and setting AA.
- All static methods doing walking are now templated on AliasAnalysisType if they are used both during build and after. If used only during build, the method now only takes a BatchAAResults. If used only after build, the method now takes an AliasAnalysis.

Subscribers: sanjoy, arsenm, jvesely, nhaehnle, jlebar, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59315

llvm-svn: 356783
2019-03-22 17:22:19 +00:00
Bixia Zheng
8ce6c7b366 [ConstantFolding] Fix GetConstantFoldFPValue to avoid cast overflow.
Summary:
In C++, the behavior of casting a double value that is beyond the range
of a single precision floating-point to a float value is undefined. This
change replaces such a cast with APFloat::convert to convert the value,
which is consistent with how we convert a double value to a half value.
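
A minimal sketch of the APFloat-based narrowing described above; the helper is illustrative:

```
#include "llvm/ADT/APFloat.h"
using namespace llvm;

// Narrow a double to single precision via APFloat instead of a raw cast,
// so out-of-range values are handled without undefined behavior.
float narrowToFloat(double V) {
  APFloat F(V);
  bool LosesInfo = false;
  F.convert(APFloat::IEEEsingle(), APFloat::rmNearestTiesToEven, &LosesInfo);
  return F.convertToFloat();
}
```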

Reviewers: sanjoy

Subscribers: lebedev.ri, sanjoy, jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59500

llvm-svn: 356781
2019-03-22 16:37:37 +00:00
Craig Topper
2e83bf2dab [ScalarizeMaskedMemIntrin] Add support for scalarizing expandload and compressstore intrinsics.
This adds support for scalarizing these intrinsics as well as the X86TargetTransformInfo support to avoid scalarizing them in the cases X86 can handle.

I've omitted handling special cases for constant masks for this first pass. Though CodeGenPrepare can constant fold the branch conditions and remove some of the control flow anyway.

Fixes PR40994 and covers most of PR3666. Might want to implement constant masks to close that.

Differential Revision: https://reviews.llvm.org/D59180

llvm-svn: 356687
2019-03-21 17:38:52 +00:00
Nikita Popov
ba8e158b08 [ValueTracking] Use ConstantRange based overflow check for signed sub
This is D59450, but for signed sub. This case is not NFC, because
the overflow logic in ConstantRange is more powerful than the existing
check. This resolves the TODO in the function.

I've added two tests to show that this indeed catches more cases than
the previous logic, but the main correctness test coverage here is in
the existing ConstantRange unit tests.

Differential Revision: https://reviews.llvm.org/D59617

llvm-svn: 356685
2019-03-21 17:23:51 +00:00
Fangrui Song
5bc40f99c0 [BasicAA] Use DenseMap::try_emplace after D59151. NFC
llvm-svn: 356651
2019-03-21 08:47:40 +00:00
Alina Sbirlea
cc7694fc25 [BasicAA] Reduce no of map seaches [NFCI].
Summary:
This is a refactoring patch.
- Reduce the number of map searches by reusing the iterator.
- Add asserts to check that the entry is in the cache, as this is something BasicAA relies on to avoid infinite recursion.

Reviewers: chandlerc, aschwaighofer

Subscribers: sanjoy, jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59151

llvm-svn: 356644
2019-03-21 05:02:05 +00:00
George Burgess IV
4e01a42475 [MSSA] Delete move ctor; remove dynamic never-moved verification
Code archaeology in D59315 revealed that MSSA should never be moved.
Rather than trying to check dynamically that this hasn't happened in the
verify() functions of Walkers, it's likely best to just delete its move
constructor.

Since all these verify() functions did was check that MSSA hasn't moved,
this allows us to remove these verify functions.

I can readd the verification checks if someone's super concerned about
us trying to `memcpy` MemorySSA or something somewhere, but I imagine we
have other problems if we're trying anything like that...

llvm-svn: 356641
2019-03-21 03:11:34 +00:00
Nikita Popov
8f1e59217b [ValueTracking] Compute range for abs without nsw
This is a small followup to D59511. The code that was moved into
computeConstantRange() there is a bit overly conservative: If the
abs is not nsw, it does not compute any range. However, abs without
nsw still has a well-defined contiguous unsigned range from 0 to
SIGNED_MIN. This is a lot less useful than the usual 0 to SIGNED_MAX
range, but if we're already here we might as well specify it...

Differential Revision: https://reviews.llvm.org/D59563

llvm-svn: 356586
2019-03-20 18:16:02 +00:00
Nikita Popov
47424b8478 [ValueTracking] Use computeConstantRange() for unsigned add/sub overflow
Improve computeOverflowForUnsignedAdd/Sub in ValueTracking by
intersecting the computeConstantRange() result into the ConstantRange
created from computeKnownBits(). This allows us to detect some
additional never/always overflows conditions that can't be determined
from known bits.

This revision also adds basic handling for constants to
computeConstantRange(). Non-splat vectors will be handled in a followup.

The signed case will also be handled in a followup, as it needs some
more groundwork.

Differential Revision: https://reviews.llvm.org/D59386

llvm-svn: 356489
2019-03-19 17:53:56 +00:00
Simon Pilgrim
3be8fb12c8 [SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect
These changes are related to PR37743 and include:

    SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.

    Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner.

    Add promoting the integer ABS node in the LegalizeIntegerType.

    Expand-based legalization of integer result for the ABS nodes.

    Expand-based legalization of ABS vector operations.

    Add some integer abs testcases for different typesizes for Thumb arch

    Add the custom ABS expansion and change the SAD pattern recognizer for the X86 arch: the i64 result of the ABS is expanded to (see the scalar sketch after this list):
        tmp = (SRA, Hi, 31)
        Lo = (UADDO tmp, Lo)
        Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1))
        Lo = (XOR tmp, Lo)

    The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern:
        (ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))).

    Change integer abs testcases for codegen with the ABS node support for AArch64.
        Indicate that the ABS is legal for the i64 type when the NEON is supported.
        Change the integer abs testcases to show changing of codegen.

    Add combine and legalization of ABS nodes for Thumb arch.

    Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.
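
A scalar sketch of the i64 expansion listed above, using the equivalent sign-mask identity:

```
#include <cstdint>

// tmp is the sign mask; adding it across both halves with carry and then
// XOR-ing both halves with it computes |X|, i.e. (X + tmp) ^ tmp.
uint64_t abs64(int64_t X) {
  uint64_t Mask = uint64_t(X >> 63); // all ones if X is negative, else zero
  return (uint64_t(X) + Mask) ^ Mask;
}
```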

For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743

Patch by: @ikulagin (Ivan Kulagin)

Differential Revision: https://reviews.llvm.org/D49837

llvm-svn: 356468
2019-03-19 16:24:55 +00:00
Simon Pilgrim
c8d8ba0642 [InstSimplify] SimplifyICmpInst - icmp eq/ne %X, undef -> undef
As discussed on PR41125 and D59363, we have a mismatch between icmp eq/ne cases with an undef operand:

When the other operand is constant we fold to undef (handled in ConstantFoldCompareInstruction)
When the other operand is non-constant we fold to a bool constant based on isTrueWhenEqual (handled in SimplifyICmpInst).

Neither is really wrong, but this patch changes the logic in SimplifyICmpInst to consistently fold to undef.

The NewGVN test change is annoying (as with most heavily reduced tests) but AFAICT I have kept the purpose of the test based on rL291968.

Differential Revision: https://reviews.llvm.org/D59541

llvm-svn: 356456
2019-03-19 14:08:23 +00:00
Nikita Popov
d426781611 Revert "[ValueTracking][InstSimplify] Support min/max selects in computeConstantRange()"
This reverts commit 106f0cdefb02afc3064268dc7a71419b409ed2f3.

This change impacts the AMDGPU smed3.ll and umed3.ll codegen tests.

llvm-svn: 356424
2019-03-18 22:26:27 +00:00
Nikita Popov
bd7f742618 [ValueTracking][InstSimplify] Support min/max selects in computeConstantRange()
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This was suggested by spatel as an alternative
approach to D59378. I've also added the infinite looping test from
that revision here.

Differential Revision: https://reviews.llvm.org/D59506

llvm-svn: 356415
2019-03-18 21:35:19 +00:00
Nikita Popov
e840b98cd3 [ValueTracking][InstSimplify] Move abs handling into computeConstantRange(); NFC
This is preparation for D59506. The InstructionSimplify abs handling
is moved into computeConstantRange(), which is the general place for
such calculations. This is NFC and doesn't affect the existing tests
in test/Transforms/InstSimplify/icmp-abs-nabs.ll.

Differential Revision: https://reviews.llvm.org/D59511

llvm-svn: 356409
2019-03-18 21:20:03 +00:00
Warren Ristow
d2b89688b2 [SCEV] Guard movement of insertion point for loop-invariants
This reinstates r347934, along with a tweak to address a problem with
PHI node ordering that that commit created (or exposed). (That commit
was reverted at r348426, due to the PHI node issue.)

Original commit message:

r320789 suppressed moving the insertion point of SCEV expressions with
div/rem operations to the loop header in non-loop-invariant situations.
This, and similar, hoisting is also unsafe in the loop-invariant case,
since there may be a guard against a zero denominator. This is an
adjustment to the fix of r320789 to suppress the movement even in the
loop-invariant case.

This fixes PR30806.

Differential Revision: https://reviews.llvm.org/D57428

llvm-svn: 356392
2019-03-18 18:52:35 +00:00
Nikita Popov
d423c075d2 [ValueTracking] Use ConstantRange overflow check for signed add; NFC
This is the same change as rL356290, but for signed add. It replaces
the existing ripple logic with the overflow logic in ConstantRange.

This is NFC in that it should return NeverOverflow in exactly the
same cases as the previous implementation. However, it does make
computeOverflowForSignedAdd() more powerful by now also determining
AlwaysOverflows conditions. As none of its consumers handle this yet,
this has no impact on optimization. Making use of AlwaysOverflows
in with.overflow folding will be handled as a followup.

Differential Revision: https://reviews.llvm.org/D59450

llvm-svn: 356345
2019-03-17 21:25:26 +00:00
Nikita Popov
e9a00fefdc [ConstantRange] Add fromKnownBits() method
Following the suggestion in D59450, I'm moving the code for constructing
a ConstantRange from KnownBits out of ValueTracking, which also allows us
to test this code independently.

I'm adding this method to ConstantRange rather than KnownBits (which
would have been a bit nicer API wise) to avoid creating a dependency
from Support to IR, where ConstantRange lives.
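
A minimal usage sketch, assuming the fromKnownBits signature described above; the values are illustrative:

```
#include "llvm/IR/ConstantRange.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

ConstantRange rangeFromKnownBits() {
  KnownBits Known(/*BitWidth=*/8);
  Known.Zero = APInt(8, 0xF0); // upper four bits known zero
  Known.One = APInt(8, 0x01);  // lowest bit known one
  // Unsigned interpretation: every possible value is odd and below 16.
  return ConstantRange::fromKnownBits(Known, /*IsSigned=*/false);
}
```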

Differential Revision: https://reviews.llvm.org/D59475

llvm-svn: 356339
2019-03-17 20:24:02 +00:00
Nikita Popov
0687c3e726 [ValueTracking] Use ConstantRange overflow checks for unsigned add/sub; NFC
Use the methods introduced in rL356276 to implement the
computeOverflowForUnsigned(Add|Sub) functions in ValueTracking, by
converting the KnownBits into a ConstantRange.

This is NFC: The existing KnownBits based implementation uses the same
logic as the ConstantRange based one. This is not the case for the
signed equivalents, so I'm only changing unsigned here.

This is in preparation for D59386, which will also intersect the
computeConstantRange() result into the range determined from KnownBits.
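
A minimal sketch of the range-based overflow check this builds on, assuming the ConstantRange overflow helpers from rL356276; the ranges are illustrative:

```
#include "llvm/IR/ConstantRange.h"
using namespace llvm;

// X is known to lie in [16, 32) and Y in [0, 8), so X - Y can never wrap.
bool subNeverWraps() {
  ConstantRange X(APInt(8, 16), APInt(8, 32));
  ConstantRange Y(APInt(8, 0), APInt(8, 8));
  return X.unsignedSubMayOverflow(Y) ==
         ConstantRange::OverflowResult::NeverOverflows;
}
```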

llvm-svn: 356290
2019-03-15 18:37:45 +00:00
Teresa Johnson
2274d19c8e [ThinLTO] Restructure AliasSummary to contain ValueInfo of Aliasee
Summary:
The AliasSummary previously contained the AliaseeGUID, which was only
populated when reading the summary from bitcode. This patch changes it
to instead hold the ValueInfo of the aliasee, and always populates it.
This enables more efficient access to the ValueInfo (specifically in the
recent patch r352438 which needed to perform an index hash table lookup
using the aliasee GUID).

As noted in the comments in AliasSummary, we no longer technically need
to keep a pointer to the corresponding aliasee summary, since it could
be obtained by walking the list of summaries on the ValueInfo looking
for the summary in the same module. However, I am concerned that this
would be inefficient when walking through the index during the thin
link for various analyses. That can be reevaluated in the future.

By always populating this new field, we can remove the guard and special
handling for a 0 aliasee GUID when dumping the dot graph of the summary.

An additional improvement in this patch is when reading the summaries
from LLVM assembly we now set the AliaseeSummary field to the aliasee
summary in that same module, which makes it consistent with the behavior
when reading the summary from bitcode.

Reviewers: pcc, mehdi_amini

Subscribers: inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D57470

llvm-svn: 356268
2019-03-15 15:11:38 +00:00
Sanjay Patel
cd6dce3d39 [InstCombine] canonicalize funnel shift constant shift amount to be modulo bitwidth
The shift argument is defined to be modulo the bitwidth, so if that argument
is a constant, we can always reduce the constant to its minimal form to allow
better CSE and other follow-on transforms.

We need to be careful to ignore constant expressions here, or we will likely
infinite loop. I'm adding a general vector constant query for that case.
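
A scalar sketch of the funnel-shift-left semantics that make the shift amount modulo the bit width:

```
#include <cstdint>

// llvm.fshl.i32 semantics: the shift amount is taken modulo the bit width,
// so a constant amount of 33 is the same as 1.
uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  Amt %= 32;
  if (Amt == 0)
    return Hi; // a zero shift returns the first operand unchanged
  return (Hi << Amt) | (Lo >> (32 - Amt));
}
```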

Differential Revision: https://reviews.llvm.org/D59374

llvm-svn: 356192
2019-03-14 19:22:08 +00:00
Alina Sbirlea
c7915741b9 [MemorySSA] Remove redundant walker assignment [NFC].
Subscribers: llvm-commits
llvm-svn: 356189
2019-03-14 18:45:17 +00:00
Teresa Johnson
476ae3dcb7 [SCEV] Use depth limit for trunc analysis
Summary:
This fixes an extremely long compile time caused by recursive analysis
of truncs, which were not previously subject to any depth limits unlike
some of the other ops. I decided to use the same control used for
sext/zext, since the routines analyzing these are sometimes mutually
recursive with the trunc analysis.

Reviewers: mkazantsev, sanjoy

Subscribers: sanjoy, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58994

llvm-svn: 355949
2019-03-12 18:28:05 +00:00
Sjoerd Meijer
c1c71ee45e [TTI] Enable analysis of clib functions in getIntrinsicCosts. NFCI.
This is addressing the issue that we're not modeling the cost of clib functions
in TTI::getIntrinsicCosts and thus we're basically addressing this fixme:
    
// FIXME: This is wrong for libc intrinsics.

To enable analysis of clib functions, we not only need an intrinsic ID and
formal arguments, but also the actual user of that function so that we can e.g.
look at alignment and values of arguments. So, this is the initial plumbing to
pass the user of an intrinsic on to getCallCosts, which queries
getIntrinsicCosts.

Differential Revision: https://reviews.llvm.org/D59014

llvm-svn: 355901
2019-03-12 09:48:02 +00:00
Sanjoy Das
c6d3c3a4b5 Reland "Relax constraints for reduction vectorization"
Change from original commit: move test (that uses an X86 triple) into the X86
subdirectory.

Original description:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

llvm-svn: 355889
2019-03-12 01:31:44 +00:00
Sanjoy Das
244bc57544 Revert "Relax constraints for reduction vectorization"
This reverts commit r355868.  Breaks hexagon.

llvm-svn: 355873
2019-03-11 22:37:31 +00:00
Sanjoy Das
367bdd4c9b Relax constraints for reduction vectorization
Summary:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

llvm-svn: 355868
2019-03-11 21:36:41 +00:00
Nikita Popov
00449b7b7f [ValueTracking] Move constant range computation into ValueTracking; NFC
InstructionSimplify currently has some code to determine the constant
range of integer instructions for some simple cases. It is used to
simplify icmps.

This change moves the relevant code into ValueTracking as
llvm::computeConstantRange(), so it can also be reused for other
purposes.

In particular this is with the optimization of overflow checks in
mind (ref D59071), where constant ranges cover some cases that
known bits don't.

llvm-svn: 355781
2019-03-09 21:17:42 +00:00
Michael Kruse
54bab5df35 [RegionPass] Fix forgotten "!".
Commit r355068 "Fix IR/Analysis layering issue with OptBisect" uses the
template

   return Gate.isEnabled() && !Gate.shouldRunPass(this, getDescription(...));

for all pass kinds. For the RegionPass, it left out the not operator,
causing region passes to be skipped as soon as a pass gate is used.

llvm-svn: 355733
2019-03-08 21:03:06 +00:00
George Burgess IV
2393dc49fe [CFLAnders] Fix typo in comment; NFC
Patch by Enna1!

Differential Revision: https://reviews.llvm.org/D58756

llvm-svn: 355715
2019-03-08 19:28:55 +00:00
Clement Courbet
4f713f8431 [SelectionDAG] Allow the user to specify a memeq function.
Summary:
Right now, when we encounter a string equality check,
e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a
small compile-time constant, and fall back on calling `memcmp()` else.

This is sub-optimal because memcmp has to compute much more than
equality.

This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms
that support `bcmp`.

`bcmp` can be made much more efficient than `memcmp` because equality
compare is trivially parallel while lexicographic ordering has a chain
dependency.
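
An illustrative sketch of the pattern being targeted; names are hypothetical:

```
#include <cstring>
#include <strings.h> // declares bcmp on platforms that provide it

// Only equality is needed, so the lexicographic result of memcmp is wasted;
// on platforms with bcmp this compare can be lowered to bcmp(A, B, Len) == 0.
bool keysEqual(const void *A, const void *B, size_t Len) {
  return memcmp(A, B, Len) == 0;
}
```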

Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits

Differential Revision: https://reviews.llvm.org/D56593

llvm-svn: 355672
2019-03-08 09:07:45 +00:00
Fangrui Song
6d6e39d714 [IDF] Delete a redundant J-edge test
In the DJ-graph based computation of iterated dominance frontiers,
SuccNode->getIDom() == Node is one of the tests to check if (Node,Succ)
is a J-edge. If it is true, since Node is dominated by Root,

  SuccLevel = level(Node)+1 > RootLevel

which means the next test SuccLevel > RootLevel will also be true. Thus
the check is redundant and can be deleted, as it also involves one
indirection and provides no speed-up.

llvm-svn: 355589
2019-03-07 11:42:59 +00:00
David Green
56efa959c1 [SCEV] Ensure that isHighCostExpansion takes into account what is being divided
A SCEV is not low-cost just because you can divide it by a power of 2. We need to also
check what we are dividing to make sure it too is not a high-cost expansion. This helps
to not expand the exit value of certain loops, helping not to bloat the code.

The change in no-iv-rewrite.ll is reverting back to what it was testing before rL194116,
and looks a lot like the other tests in replace-loop-exit-folds.ll.

Differential Revision: https://reviews.llvm.org/D58435

llvm-svn: 355393
2019-03-05 12:12:18 +00:00
Sanjoy Das
5012dbbadd PHI nodes are not FPMathOperators
Reviewers: chandlerc, arsenm

Reviewed By: arsenm

Subscribers: wdng, arsenm, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58887

llvm-svn: 355362
2019-03-05 01:15:08 +00:00
Sanjay Patel
a55ffabffe [ValueTracking] do not try to peek through bitcasts in computeKnownBitsFromAssume()
There are no tests for this case, and I'm not sure how it could ever work,
so I'm just removing this option from the matcher. This should fix PR40940:
https://bugs.llvm.org/show_bug.cgi?id=40940

llvm-svn: 355292
2019-03-03 18:59:33 +00:00
Fangrui Song
00c264dcc3 [DemandedBits] Remove some redundancy in the work list
InputIsKnownDead check is shared by all operands. Compute it once.

For non-integer instructions, use Visited.insert(I).second to replace a
find() and an insert().
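
A minimal sketch of the pattern, assuming a SmallPtrSet-style container; the helper is illustrative:

```
#include "llvm/ADT/SmallPtrSet.h"
using namespace llvm;

// insert() already reports whether the element was newly added, so a
// separate find() before the insert is unnecessary.
bool markVisited(SmallPtrSetImpl<const void *> &Visited, const void *I) {
  return Visited.insert(I).second; // true only on the first visit
}
```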

llvm-svn: 355290
2019-03-03 14:50:01 +00:00
Fangrui Song
833c076d7b [DemandedBits] Optimize a find()+insert pattern with try_emplace and APInt::operator|=
llvm-svn: 355284
2019-03-03 11:12:57 +00:00
Florian Hahn
961b7ba244 [SCEV] Handle case where MaxBECount is less precise than ExactBECount for OR.
In some cases, MaxBECount can be less precise than ExactBECount for AND
and OR (the AND case was PR26207). In the OR test case, both ExactBECounts are
undef, but the MaxBECounts are different, so we hit the assertion below. This
patch uses the same solution the AND case already uses.

Assertion failed:
   ((isa<SCEVCouldNotCompute>(ExactNotTaken) || !isa<SCEVCouldNotCompute>(MaxNotTaken))
     && "Exact is not allowed to be less precise than Max"), function ExitLimit

This patch also consolidates test cases for both AND and OR in a single
test case.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13245

Reviewers: sanjoy, efriedma, mkazantsev

Reviewed By: sanjoy

Differential Revision: https://reviews.llvm.org/D58853

llvm-svn: 355259
2019-03-02 02:31:44 +00:00
Florian Hahn
73ecb4fe8b [SCEV] Remove undef check for SCEVConstant (NFC)
The value stored in SCEVConstant is of type ConstantInt*, which can
never be UndefValue. So we should never hit that code.

Reviewers: mkazantsev, sanjoy

Reviewed By: sanjoy

Differential Revision: https://reviews.llvm.org/D58851

llvm-svn: 355257
2019-03-02 01:57:28 +00:00
Nikita Popov
8b6244e618 [ValueTracking] Known bits support for unsigned saturating add/sub
We have two sources of known bits:

1. For adds, leading ones of either operand are preserved. For subs,
leading zeros of the LHS and leading ones of the RHS become leading zeros in
the result.

2. The saturating math is a select between add/sub and an all-ones/
zero value. As such we can carry out the add/sub known bits
calculation, and only preserve the known one/zero bits respectively.

Differential Revision: https://reviews.llvm.org/D58329

llvm-svn: 355223
2019-03-01 20:07:04 +00:00
Jonas Hahnfeld
065a990766 Hide two unused debugging methods, NFCI.
GCC correctly moans that PlainCFGBuilder::isExternalDef(llvm::Value*) and
StackSafetyDataFlowAnalysis::verifyFixedPoint() are defined but not used
in Release builds. Hide them behind 'ifndef NDEBUG'.

llvm-svn: 355205
2019-03-01 17:15:21 +00:00
Rong Xu
7da9d9200c [PGO] Context sensitive PGO (part 2)
Part 2 of CSPGO changes (mostly related to ProfileSummary).
Note that I use a default parameter in setProfileSummary() and getSummary().
This is to break the dependency in clang. I will make the parameter explicit
after changing clang in a separate patch.

Differential Revision: https://reviews.llvm.org/D54175

llvm-svn: 355131
2019-02-28 19:55:07 +00:00
Nikita Popov
2cd4c03c6b [ValueTracking] More accurate unsigned sub overflow detection
Second part of D58593.

Compute precise overflow conditions based on all known bits, rather
than just the sign bits. Unsigned a - b overflows iff a < b, and we
can determine whether this always/never happens based on the minimal
and maximal values achievable for a and b subject to the known bits
constraint.

llvm-svn: 355109
2019-02-28 18:04:20 +00:00
Bjorn Pettersson
a490753ab0 Add support for computing "zext of value" in KnownBits. NFCI
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly stated
that the operation is equivalent to zero extending the
value we're tracking. That has not been true; instead,
the user has been forced to explicitly set the extended
bits as known zero afterwards.

This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.
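
A minimal sketch of the two-argument form, with the parameter name assumed from this description:

```
#include "llvm/Support/KnownBits.h"
using namespace llvm;

KnownBits zextKnown(const KnownBits &Known) {
  // With the flag set, the bits introduced by the extension are reported
  // as known zero, matching an actual zext of the tracked value.
  return Known.zext(/*BitWidth=*/64, /*ExtendedBitsAreKnownZero=*/true);
}
```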

Reviewers: craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58650

llvm-svn: 355099
2019-02-28 15:45:29 +00:00
Nikita Popov
ccf8d4c393 [ValueTracking] More accurate unsigned add overflow detection
Part of D58593.

Compute precise overflow conditions based on all known bits, rather
than just the sign bits. Unsigned a + b overflows iff a > ~b, and we
can determine whether this always/never happens based on the minimal
and maximal values achievable for a and ~b subject to the known bits
constraint.
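
A scalar sketch of the overflow condition:

```
#include <cstdint>

// Unsigned A + B wraps exactly when A > ~B, i.e. A > UINT32_MAX - B.
bool addWraps(uint32_t A, uint32_t B) {
  return A > ~B;
}
```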

llvm-svn: 355072
2019-02-28 08:11:20 +00:00
Richard Trieu
24db3e65cf Fix IR/Analysis layering issue with OptBisect
OptBisect is in IR due to LLVMContext using it.  However, it uses IR units from
Analysis as well.  This change moves getDescription functions from OptBisect
to their respective IR units.  Generating names for IR units will now be up
to the callers, keeping the Analysis IR units in Analysis.  To prevent
unnecessary string generation, an isEnabled function is added so that callers know
when the description needs to be generated.

Differential Revision: https://reviews.llvm.org/D58406

llvm-svn: 355068
2019-02-28 04:00:55 +00:00
Alina Sbirlea
8601cb2c88 [MemorySSA] Make insertDef insert corresponding phi nodes.
Summary:
The original assumption for the insertDef method was that it would not
materialize Defs out of nowhere, hence it will not insert phis needed
after inserting a Def.

However, when cloning an instruction (use case used in LICM), we do
materialize Defs "out of no-where". If the block receiving a Def has at
least one other Def, then no processing is needed. If the block just
received its first Def, we must check where Phi placement is needed.
The only new usage of insertDef is in LICM, hence the trigger for the bug.

But the original goal of the method also fails to apply for the move()
method. If we move a Def from the entry point of a diamond to either the
left or right blocks, then the merge block must add a phi.
While this use case does not currently occur, or may be viewed as an
incorrect transformation, MSSA must behave correctly given the scenario.

Resolves PR40749 and PR40754.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58652

llvm-svn: 355040
2019-02-27 22:20:22 +00:00
Sanjay Patel
647ad88eaf [InstSimplify] remove zero-shift-guard fold for general funnel shift
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-February/130491.html

We can't remove the compare+select in the general case because
we are treating funnel shift like a standard instruction (as
opposed to a special instruction like select/phi).

That means that if one of the operands of the funnel shift is
poison, the result is poison regardless of whether we know that
the operand is actually unused based on the instruction's
particular semantics.

The motivating case for this transform is the more specific
rotate op (rather than funnel shift), and we are preserving the
fold for that case because there is no chance of introducing
extra poison when there is no anonymous extra operand to the
funnel shift.

llvm-svn: 354905
2019-02-26 18:26:56 +00:00
Simon Pilgrim
bbece4ffe7 [Vectorizer] Add vectorization support for fixed smul/umul intrinsics
This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix it's the third argument.

Differential Revision: https://reviews.llvm.org/D58616

llvm-svn: 354790
2019-02-25 15:42:02 +00:00
Jordan Rupprecht
2aa95e5635 [NFC] Fix typos: preceeding -> preceding
llvm-svn: 354715
2019-02-23 01:28:32 +00:00
Chijun Sima
b98fe759b3 [DTU] Refine the interface and logic of applyUpdates
Summary:
This patch separates two semantics of `applyUpdates`:
1. User provides an accurate CFG diff and the dominator tree is updated according to the difference of `the number of edge insertions` and `the number of edge deletions` to infer the status of an edge before and after the update.
2. User provides a sequence of hints. Updates mentioned in this sequence might never have happened and may even be duplicated.

Logic changes:

Previously, removing invalid updates is considered a side-effect of deduplication and is not guaranteed to be reliable. To handle the second semantic, `applyUpdates` does validity checking before deduplication, which can cause updates that have already been applied to be submitted again. Then, different calls to `applyUpdates` might cause unintended consequences, for example,
```
DTU(Lazy) and Edge A->B exists.
1. DTU.applyUpdates({{Delete, A, B}, {Insert, A, B}}) // User expects these 2 updates result in a no-op, but {Insert, A, B} is queued
2. Remove A->B
3. DTU.applyUpdates({{Delete, A, B}}) // DTU cancels this update with {Insert, A, B} mentioned above together (Unintended)
```
But by restricting the precondition so that updates of an edge must be submitted in the same order in which the CFG changes were made, we can infer the initial status of this edge and resolve this issue.

Interface changes:
The second semantic of `applyUpdates`  is separated to `applyUpdatesPermissive`.
These changes enable DTU(Lazy) to use the first semantic if needed, which is quite useful in `transforms/utils`.
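
A minimal sketch contrasting the two entry points, assuming the interface described above; the helper and its updates are illustrative:

```
#include "llvm/Analysis/DomTreeUpdater.h"
using namespace llvm;

void recordEdgeChanges(DomTreeUpdater &DTU, BasicBlock *A, BasicBlock *B) {
  // First semantic: the caller passes an exact CFG diff.
  DTU.applyUpdates({{DominatorTree::Delete, A, B}});
  // Second semantic: hints that may be stale or duplicated.
  DTU.applyUpdatesPermissive({{DominatorTree::Insert, A, B}});
}
```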

Reviewers: kuhar, brzycki, dmgreen, grosser

Reviewed By: brzycki

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58170

llvm-svn: 354669
2019-02-22 13:48:38 +00:00
Sanjay Patel
bf014116f5 [InstSimplify] use any-zero matcher for fcmp folds
The m_APFloat matcher does not work with anything but strict
splat vector constants, so we could miss these folds and then
trigger an assertion in instcombine:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13201

The previous attempt at this in rL354406 had a logic bug that
actually triggered a regression test failure, but I failed to
notice it the first time.

llvm-svn: 354467
2019-02-20 14:34:00 +00:00
Sanjay Patel
689e75dc7b Revert "[InstSimplify] use any-zero matcher for fcmp folds"
This reverts commit 058bb8351351d56d2a4e8a772570231f9e5305e5.
Forgot to update another test affected by this change.

llvm-svn: 354408
2019-02-20 00:20:38 +00:00
Sanjay Patel
1e2c4b88b3 [InstSimplify] use any-zero matcher for fcmp folds
The m_APFloat matcher does not work with anything but strict
splat vector constants, so we could miss these folds and then
trigger an assertion in instcombine:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13201

llvm-svn: 354406
2019-02-20 00:09:50 +00:00
Sam Parker
c7acb5681d [BPI] Look through bitcasts in calcZeroHeuristic
Constant hoisting may have hidden a constant behind a bitcast so that
it isn't folded into its users. However, this prevents BPI from
calculating some of its heuristics that are based upon constant
values. So, I've added a simple helper function to look through these
casts.

Differential Revision: https://reviews.llvm.org/D58166

llvm-svn: 354119
2019-02-15 11:50:21 +00:00
Nick Desaulniers
08f5596af4 Revert "[INLINER] allow inlining of address taken blocks"
This reverts commit 19e95fe61182945b7b68ad15348f144fb996633f.

llvm-svn: 354082
2019-02-14 23:42:21 +00:00
Nick Desaulniers
995c559f1d [INLINER] allow inlining of address taken blocks
as long as their uses do not contain calls to functions that capture
the argument (potentially allowing the blockaddress to "escape" the
lifetime of the caller).

TODO:
- add more tests
- fix crash in llvm::updateCGAndAnalysisManagerForFunctionPass when
  invoking Transforms/Inline/blockaddress.ll

llvm-svn: 354079
2019-02-14 23:35:53 +00:00
Max Kazantsev
b7242fec0a Make widenable condition transparent for MemoryWriteTracking
Side effects of the widenable condition intrinsic are modelled via
InaccessibleMemOnly, and there is no way to say that it isn't
really writing any memory. This patch teaches MemoryWriteTracking to
ignore this intrinsic.

llvm-svn: 354021
2019-02-14 11:10:29 +00:00
Max Kazantsev
60734592fa Teach isGuaranteedToTransferExecutionToSuccessor about widenable conditions
The widenable condition intrinsic is guaranteed to return a value; notify
the isGuaranteedToTransferExecutionToSuccessor function about it.

llvm-svn: 354020
2019-02-14 11:10:21 +00:00
Max Kazantsev
916098884d [NFC] Simplify code & reduce nest slightly
llvm-svn: 353832
2019-02-12 11:31:46 +00:00
Evandro Menezes
788b75ce57 [TargetLibraryInfo] Update run time support for Windows
It seems that, since VC19, the `float` C99 math functions are supported for all
targets, unlike the C89 ones.

According to the discussion at https://reviews.llvm.org/D57625.

llvm-svn: 353758
2019-02-11 22:12:01 +00:00
Alina Sbirlea
3b4aff3884 [MemorySSA] Remove verifyClobberSanity.
Summary:
This verification may fail after certain transformations due to
BasicAA's fragility. Added a small explanation and a testcase that
triggers the assert in checkClobberSanity (before its removal).
Addresses PR40509.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, llvm-commits, Prazek

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57973

llvm-svn: 353739
2019-02-11 19:51:21 +00:00
Michael Kruse
54b7706ba0 Refactor setAlreadyUnrolled() and setAlreadyVectorized().
Loop::setAlreadyUnrolled() and
LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that
stops the same loop from being transformed multiple times. This patch
merges both implementations.

In doing so we fix 3 potential issues:

 * setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.*
   metadata even though it will not be used anymore. This already caused
   problems such as http://llvm.org/PR40546. Change the behavior to the
   one of setAlreadyUnrolled which deletes this loop metadata.

 * setAlreadyUnrolled() used to create a new LoopID by calling
   MDNode::get with nullptr as the first operand, then replacing it by
   the returned references using replaceOperandWith. It is possible
   that MDNode::get would instead return an existing node (due to
   de-duplication) that then gets modified. To avoid this, use a fresh
   TempMDNode that does not get uniqued with anything else before
   replacing it with replaceOperandWith.

 * LoopVectorizeHints::matchesHintMetadataName() only compares the
   suffix of the attribute to set the new value for. That is, when
   called with "enable", would erase attributes such as
   "llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and
   "llvm.loop.distribute.enable" instead of the one to replace.
   Fortunately, the function was only called with "isvectorized".

Differential Revision: https://reviews.llvm.org/D57566

llvm-svn: 353738
2019-02-11 19:45:44 +00:00
Evandro Menezes
c0cab68f7b [TargetLibraryInfo] Update run time support for Windows
It seems that the run time for Windows has changed and supports more math
functions than it used to, especially on AArch64, ARM, and AMD64.

Fixes PR40541.

Differential revision: https://reviews.llvm.org/D57625

llvm-svn: 353733
2019-02-11 19:02:28 +00:00
Chandler Carruth
45b3e3ad13 Move CFLGraph and the AA summary code over to the new CallBase
instruction base class rather than the `CallSite` wrapper.

llvm-svn: 353676
2019-02-11 09:25:41 +00:00
Chandler Carruth
1042c3aa50 Remove CallSite from the CodeMetrics analysis, moving it to the new
`CallBase` and simpler APIs therein.

llvm-svn: 353673
2019-02-11 09:03:32 +00:00
Chandler Carruth
45d6040ab3 [CallSite removal] Port InstSimplify over to use CallBase both in its
interface and implementation.

Port code with: `cast<CallBase>(CS.getInstruction())`.

llvm-svn: 353662
2019-02-11 07:54:10 +00:00
Chandler Carruth
25dd71753a [CallSite removal] Migrate ConstantFolding APIs and implementation to
`CallBase`.

Users have been updated. You can see how to update any out-of-tree
usages: pass `cast<CallBase>(CS.getInstruction())`.

llvm-svn: 353661
2019-02-11 07:51:44 +00:00
Craig Topper
ea7e6b3857 Implementation of asm-goto support in LLVM
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
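
An illustrative sketch of the GNU asm-goto construct that CallBr models (hypothetical snippet, kernel-style usage):

```
// Inline assembly that may branch to an explicit C label; lowered through
// the new CallBr terminator when this support is used.
bool isNonZero(int X) {
  asm goto("testl %0, %0; jnz %l[nonzero]"
           : /* no outputs allowed with asm goto */
           : "r"(X)
           : "cc"
           : nonzero);
  return false;
nonzero:
  return true;
}
```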

This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.

There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.

Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii

Differential Revision: https://reviews.llvm.org/D53765

llvm-svn: 353563
2019-02-08 20:48:56 +00:00
Sergey Dmitriev
c02969bade [CodeExtractor] Update function's assumption cache after extracting blocks from it
Summary: The assumption cache's self-updating mechanism does not correctly handle the case when blocks are extracted from the function by the CodeExtractor. As a result, the function's assumption cache may have stale references to the llvm.assume calls that were moved to the outlined function. This patch fixes this problem by removing extracted llvm.assume calls from the function's assumption cache.

Reviewers: hfinkel, vsk, fhahn, davidxl, sanjoy

Reviewed By: hfinkel, vsk

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57215

llvm-svn: 353500
2019-02-08 06:55:18 +00:00
Sam Parker
555093ddca [LSR] Generate cross iteration indexes
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve the performance of DSP-like loops.

Differential Revision: https://reviews.llvm.org/D55373

llvm-svn: 353403
2019-02-07 13:32:54 +00:00
Alina Sbirlea
8c8a782f87 [LICM/MSSA] Add promotion to scalars by building an AliasSetTracker with MemorySSA.
Summary:
Experimentally we found that promotion to scalars carries less benefits
than sinking and hoisting in LICM. When using MemorySSA, we build an
AliasSetTracker on demand in order to reuse the current infrastructure.
We only build it if fewer than AccessCapForMSSAPromotion accesses exist in the
loop, a cap that is by default set to 250. This value ensures there are
no runtime regressions, and there are small compile time gains for
pathological cases. A much lower value (20) was found to yield a single
regression in the llvm-test-suite and much higher benefits for compile
times. Conservatively we set the current cap to a high value, but we will
explore lowering it when MemorySSA is enabled by default.

Reviewers: sanjoy, chandlerc

Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D56625

llvm-svn: 353339
2019-02-06 20:25:17 +00:00
Alina Sbirlea
cbb1b7b00a [AliasSetTracker] Pass MustAlias to addPointer more often.
Summary:
Pass the alias info to addPointer when available. Will save an alias()
call for must sets when adding a known Must or May alias.
[Part of a series of cleanup patches]

Reviewers: reames, mkazantsev

Subscribers: sanjoy, jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D56613

llvm-svn: 353335
2019-02-06 19:55:12 +00:00
Philip Reames
135121a399 [AliasSetTracker] Minor style tweak to avoid a variable w/two distinct live ranges [NFC]
llvm-svn: 353267
2019-02-06 03:46:40 +00:00
Richard Trieu
cc5fa2b650 Move DomTreeUpdater from IR to Analysis
DomTreeUpdater depends on headers from Analysis, but is in IR.  This is a
layering violation since Analysis depends on IR.  Relocate this code from IR
to Analysis to fix the layering violation.

llvm-svn: 353265
2019-02-06 02:52:52 +00:00
Alina Sbirlea
4d489e3f47 [BasicAA] Cache nonEscapingLocalObjects for alias() calls.
Summary:
Use a small cache for Values tested by nonEscapingLocalObject().
Since the calls to PointerMayBeCaptured are fairly expensive, this saves
a good amount of compile time for anything relying heavily on
BasicAA.alias() calls.

This uses the same approach as the AliasCache, i.e. the cache is reset
after each alias() call. The cache is not used or updated by modRefInfo
calls since it's harder to know when to reset the cache.

Testcases that show improvements with this patch are too large to
include. Example compile time improvement: 7s to 6s.

Reviewers: chandlerc, sunfish

Subscribers: sanjoy, jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57627

llvm-svn: 353245
2019-02-05 23:52:08 +00:00
Evandro Menezes
8b9cac7053 [TargetLibraryInfo] Regroup run time functions for Windows (NFC)
Regroup supported and unsupported functions by precision and C standard.

llvm-svn: 353213
2019-02-05 20:24:21 +00:00
Hiroshi Inoue
d74d87c0db [NFC] fix trivial typos in comments
llvm-svn: 353147
2019-02-05 08:30:48 +00:00
Evandro Menezes
a25ad8a4a9 Revert "[PATCH] [TargetLibraryInfo] Update run time support for Windows"
This reverts accidental commit ff5527718d5d3b9966f6e8948866c0dc15ffcf3c.

llvm-svn: 353118
2019-02-04 23:34:50 +00:00
Evandro Menezes
02cfb18433 [PATCH] [TargetLibraryInfo] Update run time support for Windows
It seems that the run time for Windows has changed and supports more math
functions than before.  Since LLVM requires at least VS2015, I assume that
this is the run time that would be redistributed with programs built with
Clang.  Thus, I based this update on the header file `math.h` that
accompanies it.

This patch addresses PR40541.  Unfortunately, I have no access to a
Windows development environment to validate it.

llvm-svn: 353114
2019-02-04 23:29:41 +00:00
David Callahan
37990503cc Adjust cardinality of internal inliner thresholds
Summary:
While compiling OpenJDK 11 (and other workloads), some makefiles would pass both CFLAGS and LDFLAGS at the link step, resulting in duplicate options on the command line when one is using LTO and trying to influence the inliner. Most of the internal flags are ZeroOrMore; this diff changes the remaining ones.

Reviewers: david2050, twoh, modocache

Reviewed By: twoh

Subscribers: mehdi_amini, dexonsmith, eraman, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57537

Patch by: Abdoul-Kader Keita

llvm-svn: 353071
2019-02-04 18:46:25 +00:00
Max Kazantsev
ae64db0c14 [SCEV] Do not bother creating separate SCEVUnknown for unreachable nodes
Currently, SCEV creates SCEVUnknown for every node of unreachable code. If we
have a huge amount of such code, we will be littering SE with these nodes. We could
just state that they all are undef and save some memory.

Differential Revision: https://reviews.llvm.org/D57567
Reviewed By: sanjoy

llvm-svn: 353017
2019-02-04 05:04:19 +00:00
Philip Pfaffe
a414bd14ab [DA][NewPM] Handle transitive dependencies in the new-pm version of DA
Summary:
The analysis result of DA caches pointers to AA, SCEV, and LI, but it
never checks for their invalidation. Fix that.

Reviewers: chandlerc, dmgreen, bogner

Reviewed By: dmgreen

Subscribers: hiraditya, bollu, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D56381

llvm-svn: 352986
2019-02-03 12:25:41 +00:00
Dmitry Venikov
c1c4330b2a [InstSimplify] Missed optimization in math expression: log10(pow(10.0,x)) == x, log2(pow(2.0,x)) == x
Summary: This patch enables folding the following instructions under the -ffast-math flag: log10(pow(10.0,x)) -> x, log2(pow(2.0,x)) -> x

Reviewers: hfinkel, spatel, efriedma, craig.topper, zvi, majnemer, lebedev.ri

Reviewed By: spatel, lebedev.ri

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D41940

llvm-svn: 352981
2019-02-03 03:48:30 +00:00
Yevgeny Rouban
fb40fa7cec Provide reason messages for unviable inlining
InlineCost's isInlineViable() is changed to return InlineResult
instead of bool. This provides messages for failure reasons and
allows getting more specific messages for cases where callsites
are not viable for inlining.

Reviewed By: xbolva00, anemet

Differential Revision: https://reviews.llvm.org/D57089

llvm-svn: 352849
2019-02-01 10:44:43 +00:00
Alina Sbirlea
413877fd20 [MemorySSA] Extend removeMemoryAccess API to optimize MemoryPhis.
Summary:
EarlyCSE needs to optimize MemoryPhis after an access is removed and has
special handling for it. This should be handled by MemorySSA instead.
The default remains that MemoryPhis are *not* optimized after an access
is removed.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D57199

llvm-svn: 352787
2019-01-31 20:13:47 +00:00
Yevgeny Rouban
6f15c9c083 Test commit. NFCI.
llvm-svn: 352738
2019-01-31 08:49:20 +00:00
Max Kazantsev
1f510bd4c6 [SCEV] Prohibit SCEV transformations for huge SCEVs
Currently SCEV attempts to limit transformations so that they do not work with
big SCEVs (which may take almost infinite compile time). But for this, it uses
heuristics such as recursion depth and the number of operands, which do not
guarantee that we don't actually have big SCEVs. This situation is still possible,
though it is not likely to happen. However, PR33494 showed a bunch of simple
corner-case tests where we still produce huge SCEVs without even reaching a big
recursion depth, etc.

This patch introduces a concept of 'huge' SCEVs. A SCEV is huge if its expression
size (introduced in D35989) exceeds some threshold value. We prohibit optimizing
transformations if any of SCEVs we are dealing with is huge. This gives us a reliable
check that we don't spend too much time working with them.

As the next step, we can possibly get rid of old limiting mechanisms, such as recursion
depth thresholds.

Differential Revision: https://reviews.llvm.org/D35990
Reviewed By: reames

llvm-svn: 352728
2019-01-31 06:19:25 +00:00
Erik Pilkington
133b958621 Add a 'dynamic' parameter to the objectsize intrinsic
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.
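
A rough source-level illustration of the intended use via the clang builtin (the helper name is made up):

  #include <cstddef>
  #include <cstdlib>

  // __builtin_object_size would give -1 here (size unknown at compile time);
  // with the 'dynamic' bit set, as __builtin_dynamic_object_size does, the
  // intrinsic may instead be lowered to an expression evaluated at run time
  // (here, effectively 'n').
  size_t allocated_size(size_t n) {
    char *p = static_cast<char *>(malloc(n));
    size_t size = __builtin_dynamic_object_size(p, 0);
    free(p);
    return size;
  }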

rdar://32212419

Differential revision: https://reviews.llvm.org/D56761

llvm-svn: 352664
2019-01-30 20:34:35 +00:00
Hiroshi Inoue
504bb0dad8 [NFC] fix trivial typos in comments
llvm-svn: 352602
2019-01-30 05:26:31 +00:00
Max Kazantsev
d52a6c5060 [NFC] Use ArrayRef instead of SmallVectorImpl where possible
llvm-svn: 352466
2019-01-29 09:39:15 +00:00
Max Kazantsev
7ed87fdb4e [SCEV] Take correct loop in AddRec simplification. PR40420
The AddRec simplification code uses the wrong loop when it creates a new
AddRecExpr. It should use AddRecLoop, which we have saved and against which
all gate checks are made, instead of calling AddRec->getLoop() over and over
again, because AddRec may change and become an add recurrence for an outer
loop during the transform iterations.

Considering this change trivial, committing for post-commit review.

llvm-svn: 352451
2019-01-29 05:37:59 +00:00
Teresa Johnson
0c7ba752d1 [ThinLTO] Add option to dump per-module summary dot graph
Summary:
I found that there currently isn't a way to invoke exportToDot from
the command line for a per-module summary index, and therefore no
testing of that case. Add an internal option and use it to test dumping
of per module summary indexes.

In particular, I am looking at fixing the limitation that causes the
aliasee GUID in the per-module summary to be 0, and want to be able to
test that change.

Reviewers: evgeny777

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D57206

llvm-svn: 352441
2019-01-28 23:43:26 +00:00
Alina Sbirlea
03b47976ac [AliasSetTracker] Cleanup more comments. [NFCI]
llvm-svn: 352416
2019-01-28 19:38:03 +00:00
Alina Sbirlea
4021140d52 [AliasSetTracker] Cleanup comments. [NFCI]
llvm-svn: 352406
2019-01-28 19:01:32 +00:00
Alina Sbirlea
ca31619260 [AliasSetTracker] Update signature to aliasesPointer [NFCI].
llvm-svn: 352399
2019-01-28 18:30:05 +00:00
Johannes Doerfert
b9ee173a80 [ValueTracking] Look through casts when determining non-nullness
Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the
value of their operand and can therefore be looked through when we
determine non-nullness.

Differential Revision: https://reviews.llvm.org/D54956

llvm-svn: 352293
2019-01-26 23:40:35 +00:00
Eli Friedman
d0caf58911 [Analysis] Fix isSafeToLoadUnconditionally handling of volatile.
A volatile operation cannot be used to prove an address points to normal
memory.  (LangRef was recently updated to state it explicitly.)

Differential Revision: https://reviews.llvm.org/D57040

llvm-svn: 352109
2019-01-24 21:31:13 +00:00
Simon Pilgrim
20dbf82008 Move saturated arithmetic intrinsics to other integer intrinsics. NFCI.
They were in the floating point group.

llvm-svn: 351953
2019-01-23 13:49:10 +00:00
Max Kazantsev
03355e4336 [NFC] Add function to parse widenable conditional branches
llvm-svn: 351803
2019-01-22 11:21:32 +00:00
Max Kazantsev
75db383edd [NFC] Add detector for guards expressed as branch by widenable conditions
This patch adds a function to detect guards expressed in explicit control
flow form as branch by `and` with widenable condition intrinsic call:

    %wc = call i1 @llvm.experimental.widenable.condition()
    %guard_cond = and i1 %some_cond, %wc
    br i1 %guard_cond, label %guarded, label %deopt

  deopt:
    <maybe some non-side-effecting instructions>
    deoptimize()

This form can be used as alternative to implicit control flow guard
representation expressed by `experimental_guard` intrinsic.

Differential Revision: https://reviews.llvm.org/D56074
Reviewed By: reames

llvm-svn: 351791
2019-01-22 09:36:22 +00:00
Philip Reames
22e59f8759 [CVP] Use LVI to constant fold deopt operands
Deopt operands are generally intended to record information about a site in code with minimal perturbation of the surrounding code. Idiomatically, they also tend to appear down rare paths. Putting these together, we have an obvious case for extending CVP with deopt operand constant folding. Arguably, we should be doing this for all operands on all instructions, but that's definitely a much larger and riskier change.

Differential Revision: https://reviews.llvm.org/D55678

llvm-svn: 351774
2019-01-22 01:34:33 +00:00
Max Kazantsev
f0c38d90c7 [SCEV][NFC] Introduces expression sizes estimation
This patch introduces the field `ExpressionSize` in SCEV. This field is
calculated only once, on SCEV creation, and it represents the complexity of
the SCEV from an arithmetic point of view (not in terms of the number of
distinct SCEV nodes used in the expression). Roughly speaking, it is the
number of operand and operation symbols when we print this SCEV.

The formal definition is as follows: if a SCEV `X` has operands
  `Op1`, `Op2`, ..., `OpN`,
then
  Size(X) = 1 + Size(Op1) + Size(Op2) + ... + Size(OpN).
The size of a SCEVConstant or SCEVUnknown is one.
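
For illustration, the same rule as a small standalone recursion (toy types, not the actual SCEV classes):

  #include <vector>

  // Toy expression node: leaves (constants/unknowns) have no operands.
  struct Expr {
    std::vector<const Expr *> Ops;
  };

  // Size(X) = 1 + Size(Op1) + ... + Size(OpN); leaves have size one.
  unsigned expressionSize(const Expr *X) {
    unsigned Size = 1;
    for (const Expr *Op : X->Ops)
      Size += expressionSize(Op);
    return Size;
  }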

Expression size may be used as a universal way to limit SCEV transformations
for huge SCEVs. Currently, we have a bunch of options that represent various
limits (such as the recursion depth limit) that may not make any sense to an
LLVM user who is not familiar with SCEV internals, and all of these different
options pursue one goal. A more general rule that may potentially allow us to
get rid of this redundancy in options is "do not make transformations with
SCEVs of huge size". It can apply to all SCEV traversals and transformations
that may need to visit a SCEV node more than once, and hence are prone to
combinatorial explosions.

This patch only introduces SCEV sizes calculation as NFC, its utilization will
be introduced in follow-up patches.

Differential Revision: https://reviews.llvm.org/D35989
Reviewed By: reames

llvm-svn: 351725
2019-01-21 06:19:50 +00:00
Chandler Carruth
ae65e281f3 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Teresa Johnson
e8cb930dbb Revert "[ThinLTO] Add summary entries for index-based WPD"
Mistaken commit of something still under review!

This reverts commit r351453.

llvm-svn: 351455
2019-01-17 16:05:04 +00:00
Teresa Johnson
5b4ceacd72 [ThinLTO] Add summary entries for index-based WPD
Summary:
If LTOUnit splitting is disabled, the module summary analysis computes
the summary information necessary to perform single implementation
devirtualization during the thin link with the index and no IR. The
information collected from the regular LTO IR in the current hybrid WPD
algorithm is summarized, including:
1) For vtable definitions, record the function pointers and their offset
within the vtable initializer (subsumes the information collected from
IR by tryFindVirtualCallTargets).
2) A record for each type metadata summarizing the vtable definitions
decorated with that metadata (subsumes the TypeIdentiferMap collected
from IR).

Also added are the necessary bitcode records, and the corresponding
assembly support.

The index-based WPD will be sent as a follow-on.

Depends on D53890.

Reviewers: pcc

Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54815

llvm-svn: 351453
2019-01-17 15:49:03 +00:00
Tom Stellard
62cb21da04 Only promote args when function attributes are compatible
Summary:
Check to make sure that the caller and the callee have compatible
function arguments before promoting arguments.  This uses the same
TargetTransformInfo queries that are used to determine if attributes
are compatible for inlining.

The goal here is to avoid breaking ABI when a called function's ABI
depends on a target feature that is not enabled in the caller.

This is a very conservative fix for PR37358.  Ideally we would have a more
sophisticated check for ABI compatibility rather than checking if the
attributes are compatible for inlining.

Reviewers: echristo, chandlerc, eli.friedman, craig.topper

Reviewed By: echristo, chandlerc

Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits

Differential Revision: https://reviews.llvm.org/D53554

llvm-svn: 351296
2019-01-16 05:15:31 +00:00
Nikita Popov
3247e0488e Reapply "[DemandedBits] Use SetVector for Worklist"
DemandedBits currently uses a simple vector for the worklist, which
means that instructions may be inserted multiple times into it.
Especially in combination with the deep lattice, this may cause
instructions to be recomputed very often. To avoid this, switch
to a SetVector.

Reapplying with a smaller number of inline elements in the
SmallSetVector, to avoid running into the SmallDenseMap issue
described in D56455.

Differential Revision: https://reviews.llvm.org/D56362

llvm-svn: 350997
2019-01-12 09:09:15 +00:00
Nikita Popov
8654837372 [ConstantFolding] Fold undef for integer intrinsics
This fixes https://bugs.llvm.org/show_bug.cgi?id=40110.

This implements handling of undef operands for integer intrinsics in
ConstantFolding, in particular for the bitcounting intrinsics (ctpop,
cttz, ctlz), the with.overflow intrinsics, the saturating math
intrinsics and the funnel shift intrinsics.

The undef behavior follows what InstSimplify does for the general case
of non-constant operands. For the bitcount intrinsics (where
InstSimplify doesn't do undef handling -- there cannot be a combination
of an undef + non-constant operand) I'm using a 0 result if the intrinsic
is defined for zero and undef otherwise.

Differential Revision: https://reviews.llvm.org/D55950

llvm-svn: 350971
2019-01-11 21:18:00 +00:00
Teresa Johnson
52c19ef944 [LTO] Record whether LTOUnit splitting is enabled in index
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).

The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.

This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.

Note I have also changed the ThinLTOBitcodeWriter to gate the
module splitting on the value of this flag.

Reviewers: pcc

Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D53890

llvm-svn: 350948
2019-01-11 18:31:57 +00:00
Alina Sbirlea
fe33c8f598 [MemorySSA] Disable checkClobberSanity for SkipSelfWalker.
Sanity will fail for this, since we're exploring getting a clobber
further than the sanity check expects.
Ideally we need to teach the sanity check to differentiate between the
two walkers based on the SkipSelf bool in the query.

llvm-svn: 350895
2019-01-10 21:47:15 +00:00
Easwaran Raman
a5daaaae35 Refactor synthetic profile count computation. NFC.
Summary:
Instead of using two separate callbacks to return the entry count and the
relative block frequency, use a single callback to return the callsite
count. This would allow better support of hybrid mode in the future, as
the count of a callsite need not always be derived from the entry count (as in
sample PGO).

Reviewers: davidxl

Subscribers: mehdi_amini, steven_wu, dexonsmith, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D56464

llvm-svn: 350755
2019-01-09 20:10:27 +00:00
Easwaran Raman
372b240afa [Inliner] Assert that the computed inline threshold is non-negative.
Reviewers: chandlerc

Subscribers: haicheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D56409

llvm-svn: 350751
2019-01-09 19:26:17 +00:00
David Callahan
cd641e1e98 refactor BlockFrequencyInfo::view to take a title parameter
Summary: Allow a non-default title for this debugging aid.
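
Illustrative usage (the title string and wrapper function are made up):

  #include "llvm/Analysis/BlockFrequencyInfo.h"
  using namespace llvm;

  // Pop up the block-frequency graph with a caller-chosen title so multiple
  // dumps taken at different points can be told apart.
  void debugViewFrequencies(BlockFrequencyInfo &BFI) {
    BFI.view("frequencies-after-my-pass");
  }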

Reviewers: twoh, Kader, modocache

Reviewed By: twoh

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56499

llvm-svn: 350749
2019-01-09 19:12:38 +00:00
Max Kazantsev
523ec9ed09 [IPT] Drop cache less eagerly in GVN and LoopSafetyInfo
The current strategy for dropping the `InstructionPrecedenceTracking` cache is to
invalidate the entire basic block whenever we change its contents. In fact,
`InstructionPrecedenceTracking` has two internal structures: `OrderedInstructions`,
which needs to be invalidated whenever the contents change, and the map of
first special instructions in each block. This second map does not need an
update if we add/remove a non-special instruction, because that cannot
affect the contents of this map.

This patch changes the API of `InstructionPrecedenceTracking` so that it now
accounts for reasons under which we invalidate blocks. This should lead
to much less recalculations of the map and should save us some compile time
because in practice we don't typically add/remove special instructions.

Differential Revision: https://reviews.llvm.org/D54462
Reviewed By: efriedma

llvm-svn: 350694
2019-01-09 07:28:13 +00:00
Philip Pfaffe
c5802c17b7 [DA][NewPM] Add a printerpass and port the testsuite
The new-pm version of DA is untested. Testing requires a printer, so
add that and use it in the existing DA tests.

Differential Revision: https://reviews.llvm.org/D56386

llvm-svn: 350624
2019-01-08 14:06:58 +00:00
Alina Sbirlea
608c9998e4 [MemorySSA] Add SkipSelfWalker.
Summary: Add implementation of SkipSelfWalker.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D56285

llvm-svn: 350561
2019-01-07 19:38:47 +00:00
Alina Sbirlea
6f9fb7e80b [MemorySSA] Refactor CachingWalker.
Summary:
Refactor caching walker to make creating a walker that skips the
starting access straightforward.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits, jfb

Differential Revision: https://reviews.llvm.org/D55957

llvm-svn: 350558
2019-01-07 19:22:37 +00:00
Alina Sbirlea
6c3810e67a [MemorySSA] Extend the clobber walker with the option to skip the starting access.
Summary:
The option enables loop transformations to hoist accesses that do not
have clobbers in the loop. If the clobber query skips the starting
access, the result may be outside the loop instead of the header Phi.

Adding the walker that uses this option in a separate patch.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D55944

llvm-svn: 350551
2019-01-07 18:40:27 +00:00
Nikita Popov
a26e9aae55 Revert "[DemandedBits] Use SetVector for Worklist"
This reverts commit r350547.

Seeing assertion failures on clang tests.

llvm-svn: 350549
2019-01-07 18:15:11 +00:00
Nikita Popov
cef1cdf524 [DemandedBits] Use SetVector for Worklist
DemandedBits currently uses a simple vector for the worklist, which
means that instructions may be inserted multiple times into it.
Especially in combination with the deep lattice, this may cause
instructions to be recomputed very often. To avoid this, switch
to a SetVector.

Differential Revision: https://reviews.llvm.org/D56362

llvm-svn: 350547
2019-01-07 18:03:36 +00:00
Chandler Carruth
5954d00ef2 [CallSite removal] Port IndirectCallSiteVisitor to use CallBase and
update client code.

Also rename it to use the more generic term `call` instead of something
that could be confused with a particular type.

Differential Revision: https://reviews.llvm.org/D56183

llvm-svn: 350508
2019-01-07 07:15:51 +00:00
Chandler Carruth
cb1f5addb7 [CallSite removal] Migrate all Alias Analysis APIs to use the newly
minted `CallBase` class instead of the `CallSite` wrapper.

This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a minorly more shallow migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.

Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.
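
A hedged sketch of that out-of-tree migration for one AA query (the wrapper function name is illustrative):

  #include "llvm/Analysis/AliasAnalysis.h"
  #include "llvm/IR/CallSite.h"
  #include "llvm/IR/InstrTypes.h"
  using namespace llvm;

  // Old code passed the CallSite straight to the AA API; after this change
  // the AA entry points take a CallBase, so unwrap the underlying instruction.
  ModRefInfo queryCall(AAResults &AA, CallSite CS, const MemoryLocation &Loc) {
    CallBase *Call = cast_or_null<CallBase>(CS.getInstruction());
    return AA.getModRefInfo(Call, Loc);
  }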

Thanks for the review from Saleem!

Differential Revision: https://reviews.llvm.org/D55641

llvm-svn: 350503
2019-01-07 05:42:51 +00:00
Nikita Popov
61d02a0896 [BDCE] Remove dead uses of arguments
In addition to finding dead uses of instructions, also find dead uses
of function arguments, and replace them with zero as well.

I'm changing the way the known bits are computed here to remove the
coupling between the transfer function and the algorithm. It previously
relied on the first op being visited first and computing known bits --
unless the first op is not an instruction, in which case they're computed
on the second op. I could have adjusted this to check for "instruction
or argument", but I think it's better to avoid the repeated calculation
with an explicit flag.

Differential Revision: https://reviews.llvm.org/D56247

llvm-svn: 350435
2019-01-04 21:21:43 +00:00
Florian Hahn
1b9728f62a [ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset
GetPointerBaseWithConstantOffset includes this code, where ByteOffset
and GEPOffset are both of type llvm::APInt:

  ByteOffset += GEPOffset.getSExtValue();

The problem with this line is that getSExtValue() returns an int64_t, but
the += matches an overload taking uint64_t. The consequence is that the resulting
APInt is no longer considered to be signed. That in turn causes assertion
failures later on if the relevant pointer type is > 64 bits in width and
the GEPOffset was negative.

Changing it to

  ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth());

resolves the issue and explicitly performs the sign-extending
or truncation. Additionally, instead of asserting later if the result
is > 64 bits, it breaks out of the loop in that case.
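
Restated as a self-contained sketch (the helper names are made up; ByteOffset may be wider than 64 bits):

  #include "llvm/ADT/APInt.h"
  using namespace llvm;

  // Buggy form: getSExtValue() yields an int64_t, but operator+= picks the
  // uint64_t overload, so a negative GEP offset is effectively zero-extended
  // when ByteOffset is wider than 64 bits.
  APInt addOffsetBuggy(APInt ByteOffset, const APInt &GEPOffset) {
    ByteOffset += GEPOffset.getSExtValue();
    return ByteOffset;
  }

  // Fixed form: explicitly sign-extend (or truncate) to ByteOffset's width.
  APInt addOffsetFixed(APInt ByteOffset, const APInt &GEPOffset) {
    ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth());
    return ByteOffset;
  }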

See also
 https://reviews.llvm.org/D24729
 https://reviews.llvm.org/D24772

This commit must be merged after D38662 in order for the test to pass.

Patch by Michael Ferguson <mpfergu@gmail.com>.

Reviewers: reames, sanjoy, hfinkel

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D38501

llvm-svn: 350395
2019-01-04 14:53:22 +00:00
Simon Pilgrim
6db96180a2 [SLPVectorizer] Flag ADD/SUB SSAT/USAT intrinsics trivially vectorizable (PR40123)
Enables SLP vectorization for the SSE2 PADDS/PADDUS/PSUBS/PSUBUS style intrinsics

llvm-svn: 350300
2019-01-03 12:18:23 +00:00
Hal Finkel
88d49a9216 [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug)
Motivated by the discussion in D38499, this patch updates BasicAA to support
arbitrary pointer sizes by switching most remaining non-APInt calculations to
use APInt. The size of these APInts is set to the maximum pointer size (maximum
over all address spaces described by the data layout string).

Most of this translation is straightforward, but this patch contains a fix for
a bug that revealed itself during this translation process. In order for
test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit
pointers, the intermediate calculations must be performed using 64-bit
integers. This is because, as noted in the patch, when GetLinearExpression
decomposes an expression into C1*V+C2, and we then multiply this by Scale, and
distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even
through C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can
overflow. If this happens, later logic will draw invalid conclusions from the
(base) offset value. Thus, when initially applying the APInt conversion,
because the maximum pointer size in this test is 32 bits, it started failing.
Suspicious, I created a 64-bit version of this test (included here), and that
failed (miscompiled) on trunk for a similar reason (the multiplication can
overflow).

After fixing this overflow bug, the first test case (at least) in
Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was
relying on having 64-bit intermediate values to have BasicAA return an accurate
result. In order to fix this problem, and because I believe that it is not
uncommon to use i64 indexing expressions in 32-bit code (especially portable
code using int64_t), it seems reasonable to always use at least 64-bit
integers. In this way, we won't regress our analysis capabilities (and there's
a command-line option added, so experimenting with this should be easy).

As pointed out by Eli during the review, there are other potential overflow
conditions that this patch does not address. Fixing those is left to follow-up
work.

Patch by me with contributions from Michael Ferguson (mferguson@cray.com).

Differential Revision: https://reviews.llvm.org/D38662

llvm-svn: 350220
2019-01-02 16:28:09 +00:00
Nikita Popov
a3f725f6cd Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
whose user is also dead. This case is checked separately instead.

The previous attempt to land this led to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now, and the added test case demonstrates such an
instance.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 350188
2019-01-01 10:05:26 +00:00
George Burgess IV
11021e590b [MemoryLocation] Use LocationSize instead of ints; NFC
Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.

This one sadly isn't *super* small, but all of the changes here are
either to:
- libfuncs that are passed a constant size (memcpy, memset, ...)
- instructions that store/load a constant size

So they have to be precise.

llvm-svn: 350017
2018-12-23 03:36:44 +00:00
George Burgess IV
5c64c0f1d1 [Loads] Use LocationSize instead of ints; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

This tries to find literal loads/stores of the given type, so this has
to be precise.

llvm-svn: 350016
2018-12-23 03:10:56 +00:00
George Burgess IV
304a7e4519 [Lint] Use LocationSize instead of ints; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350015
2018-12-23 02:50:08 +00:00
George Burgess IV
584e0434bb [AAEval] Use LocationSize instead of ints; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350014
2018-12-23 02:39:58 +00:00
George Burgess IV
283d33e193 [Analysis] More LocationSize cleanup; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350008
2018-12-22 18:23:21 +00:00
Vedant Kumar
0705274c67 [IR] Add Instruction::isLifetimeStartOrEnd, NFC
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.
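
A minimal usage sketch (the helper around it is made up):

  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Instruction.h"
  using namespace llvm;

  // Count instructions in a block while skipping llvm.lifetime.start/end
  // markers, using the new helper instead of matching the two intrinsic IDs.
  static unsigned countNonLifetimeInsts(const BasicBlock &BB) {
    unsigned Count = 0;
    for (const Instruction &I : BB)
      if (!I.isLifetimeStartOrEnd())
        ++Count;
    return Count;
  }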

This was suggested as a cleanup in D55967.

Differential Revision: https://reviews.llvm.org/D56019

llvm-svn: 349964
2018-12-21 21:49:40 +00:00
Reid Kleckner
d851eb745f [BasicAA] Fix AA bug on dynamic allocas and stackrestore
Summary:
BasicAA has special logic for unescaped allocas, which normally applies
equally well to dynamic and static allocas. However, llvm.stackrestore
has the power to end the lifetime of dynamic allocas, without referring
to them directly.

stackrestore is already marked with the most conservative memory
modification attributes, but because the alloca is not escaped, the
normal logic produces incorrect results. I think BasicAA needs a special
case here to teach it about the relationship between dynamic allocas and
stackrestore.

Fixes PR40118

Reviewers: gbiv, efriedma, george.burgess.iv

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55969

llvm-svn: 349945
2018-12-21 19:59:03 +00:00
Florian Hahn
800a38a203 [LAA] Avoid generating RT checks for known deps preventing vectorization.
If we found unsafe dependences other than 'unknown', we already know at
compile time that they are unsafe and the runtime checks should always
fail. So we can avoid generating them in those cases.

This should have no negative impact on performance as the runtime checks
that would be created previously should always fail. As a sanity check,
I measured the test-suite, spec2k and spec2k6 and there were no regressions.

Reviewers: Ayal, anemet, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D55798

llvm-svn: 349794
2018-12-20 18:49:09 +00:00
Clement Courbet
af5c2af0bc [NFC] Fix trailing comma after function.
lib/Analysis/VectorUtils.cpp:482:2: warning: extra ‘;’ [-Wpedantic]

llvm-svn: 349732
2018-12-20 09:20:07 +00:00
Michael Kruse
a9ec994652 Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. A LoopID unfortunately is not a loop identifier. It is
neither unique (there's even a regression test assigning the same LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it is just to mark that a loop should not be processed again, as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on an instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carried by this loop).

This intentionally avoids any kind of "ID". Loops that are cloned or have
their attributes modified retain the llvm.loop.parallel_accesses
attribute. Access instructions that are cloned point to the same access
group. It is not necessary for each access to have its own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.

The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.

Differential Revision: https://reviews.llvm.org/D52116

llvm-svn: 349725
2018-12-20 04:58:07 +00:00
Nikita Popov
045fafaf88 Revert "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This reverts commit r349674. It causes a failure in
test-suite enc-3des.execution_time.

llvm-svn: 349684
2018-12-19 22:09:02 +00:00
Nikita Popov
1da83b4649 [BDCE][DemandedBits] Detect dead uses of undead instructions
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
whose user is also dead. This case is checked separately instead.

The test case has a couple of cases that are not simplified yet. In
particular, we're only looking at uses of instructions right now. I think
it would make sense to also extend this to arguments. Furthermore
DemandedBits doesn't yet know some of the tricks that InstCombine does
for the demanded bits of bitwise or/and/xor in combination with known
bits information.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 349674
2018-12-19 19:56:21 +00:00