1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00
Commit Graph

1336 Commits

Author SHA1 Message Date
Hiroshi Yamauchi
0596ae1e4a Add heuristics for irreducible loop metadata under PGO
Summary:
Add the following heuristics for irreducible loop metadata:

- When an irreducible loop header is missing the loop header weight metadata,
  give it the minimum weight seen among other headers.
- Annotate indirectbr targets with the loop header weight metadata (as they are
  likely to become irreducible loop headers after indirectbr tail duplication.)

These greatly improve the accuracy of the block frequency info of the Python
interpreter loop (eg. from ~3-16x off down to ~40-55% off) and the Python
performance (eg. unpack_sequence from ~50% slower to ~8% faster than GCC) due to
better register allocation under PGO.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39980

llvm-svn: 318693
2017-11-20 21:03:38 +00:00
Mohammed Agabaria
059fc817b2 [LV][X86] Support of AVX2 Gathers code generation and update the LV with this
This patch depends on: https://reviews.llvm.org/D35348

Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen)
Update LoopVectorize to generate gathers for AVX2 processors.

Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb

Reviewed By: delena, RKSimon

Differential Revision: https://reviews.llvm.org/D35772

llvm-svn: 318641
2017-11-20 08:18:12 +00:00
Yaxun Liu
aeeb963599 Let llvm.invariant.group.barrier accepts pointer to any address space
llvm.invariant.group.barrier may accept pointers to arbitrary address space.

This patch let it accept pointers to i8 in any address space and returns
pointer to i8 in the same address space.

Differential Revision: https://reviews.llvm.org/D39973

llvm-svn: 318413
2017-11-16 16:32:16 +00:00
Mohammed Agabaria
aed9eba03d [TTI][X86] update costs of interleaved load\store of i64\double
This patch contains more accurate cost of interelaved load\store of stride 2 for the types int64\double on AVX2.

Reviewers: delena, RKSimon, craig.topper, dorit

Reviewed By: dorit

Differential Revision: https://reviews.llvm.org/D40008

llvm-svn: 318385
2017-11-16 09:38:32 +00:00
Mikael Holmen
37eaf65cb9 [Lint] Don't warn about passing alloca'd value to tail call if using byval
Summary:
This fixes PR35241.

When using byval, the data is effectively copied as part of the call
anyway, so the pointer returned by the alloca will not be leaked to the
callee and thus there is no reason to issue a warning.

Reviewers: rnk

Reviewed By: rnk

Subscribers: Ka-Ka, llvm-commits

Differential Revision: https://reviews.llvm.org/D40009

llvm-svn: 318279
2017-11-15 07:46:48 +00:00
Hiroshi Yamauchi
cd071125c2 Simplify irreducible loop metadata test code.
Summary:
Shorten the irreducible loop metadata test code by removing insignificant
instructions.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40043

llvm-svn: 318182
2017-11-14 19:48:59 +00:00
Jatin Bhateja
b020a5ea70 [SCEV] Handling for ICmp occuring in the evolution chain.
Summary:
 If a compare instruction is same or inverse of the compare in the
 branch of the loop latch, then return a constant evolution node.
 This shall facilitate computations of loop exit counts in cases
 where compare appears in the evolution chain of induction variables.

 Will fix PR 34538

Reviewers: sanjoy, hfinkel, junryoungju

Reviewed By: sanjoy, junryoungju

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D38494

llvm-svn: 318050
2017-11-13 16:43:24 +00:00
Mohammed Agabaria
1921646036 [LV][X86] update the cost of interleaving mem. access of floats
Recommit:
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.
fixed the location of the lit test it works with make check-all.

Differential Revision: https://reviews.llvm.org/D39403

llvm-svn: 317471
2017-11-06 10:56:20 +00:00
Mohammed Agabaria
01b828cb80 [REVERT][LV][X86] update the cost of interleaving mem. access of floats
reverted my changes will be committed later after fixing the failure
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.

Differential Revision: https://reviews.llvm.org/D39403

llvm-svn: 317433
2017-11-05 09:36:54 +00:00
Mohammed Agabaria
58aaaace99 [LV][X86] update the cost of interleaving mem. access of floats
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.

Differential Revision: https://reviews.llvm.org/D39403

llvm-svn: 317432
2017-11-05 09:06:23 +00:00
Hiroshi Yamauchi
9e5d6805dd Irreducible loop metadata for more accurate block frequency under PGO.
Summary:
Currently the block frequency analysis is an approximation for irreducible
loops.

The new irreducible loop metadata is used to annotate the irreducible loop
headers with their header weights based on the PGO profile (currently this is
approximated to be evenly weighted) and to help improve the accuracy of the
block frequency analysis for irreducible loops.

This patch is a basic support for this.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: mehdi_amini, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D39028

llvm-svn: 317278
2017-11-02 22:26:51 +00:00
Yichao Yu
bce23aee65 Allow inaccessiblememonly and inaccessiblemem_or_argmemonly to be overwriten on call site with operand bundle
Summary:
Similar to argmemonly, readonly and readnone.

Fix PR35128

Reviewers: andrew.w.kaylor, chandlerc, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D39434

llvm-svn: 317201
2017-11-02 12:18:33 +00:00
Geoff Berry
126122d5fd [BranchProbabilityInfo] Handle irreducible loops.
Summary:
Compute the strongly connected components of the CFG and fall back to
use these for blocks that are in loops that are not detected by
LoopInfo when computing loop back-edge and exit branch probabilities.

Reviewers: dexonsmith, davidxl

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D39385

llvm-svn: 317094
2017-11-01 15:16:50 +00:00
Sanjoy Das
f1309f7dda [SCEV] Fix an assertion failure in the max backedge taken count
Max backedge taken count is always expected to be a constant; and this is
usually true by construction -- it is a SCEV expression with constant inputs.
However, if the max backedge expression ends up being computed to be a udiv with
a constant zero denominator[0], SCEV does not fold the result to a constant
since there is no constant it can fold it to (SCEV has no representation for
"infinity" or "undef").

However, in computeMaxBECountForLT we already know the denominator is positive,
and thus at least 1; and we can use this fact to avoid dividing by zero.

[0]: We can end up with a constant zero denominator if the signed range of the
stride is more precise than the unsigned range.

llvm-svn: 316615
2017-10-25 21:41:00 +00:00
Bjorn Pettersson
72e765907f [ConstantFolding] Avoid assert when folding ptrtoint of vectorized GEP
Summary:
Got asserts in llvm::CastInst::getCastOpcode saying:
`DestBits == SrcBits && "Illegal cast to vector (wrong type or size)"' failed.

Problem seemed to be that llvm::ConstantFoldCastInstruction did
not handle ptrtoint cast of a getelementptr returning a vector
correctly. I assume such situations are quite rare, since the
GEP needs to be considered as a constant value (base pointer
being null).
The solution used here is to simply avoid the constant fold
of ptrtoint when the value is a vector. It is not supported,
and by bailing out we do not fail on assertions later on.

Reviewers: craig.topper, majnemer, davide, filcab, efriedma

Reviewed By: efriedma

Subscribers: efriedma, filcab, llvm-commits

Differential Revision: https://reviews.llvm.org/D38546

llvm-svn: 316430
2017-10-24 12:08:11 +00:00
Sanjoy Das
d41e33114f Revert "[ScalarEvolution] Handling for ICmp occuring in the evolution chain."
This reverts commit r316054.  There was some confusion over the review process:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171016/495884.html

llvm-svn: 316129
2017-10-18 22:00:57 +00:00
Michael Zuckerman
22995fcf48 [AVX512][AVX2]Cost calculation for interleave load/store patterns {v8i8,v16i8,v32i8,v64i8}
This patch adds accurate instructions cost.
The formula presents two cases(stride 3 and stride 4) and calculates the cost according to the VF and stride.

Reviewers:
1. delena
2. Farhana
3. zvi
4. dorit
5. Ayal

Differential Revision: https://reviews.llvm.org/D38762

Change-Id: If4cfbd4ac0e63694e8144cb78c7fa34850647ff7
llvm-svn: 316072
2017-10-18 11:41:55 +00:00
Jatin Bhateja
02e129e082 [ScalarEvolution] Handling for ICmp occuring in the evolution chain.
Summary:
 If a compare instruction is same or inverse of the compare in the
 branch of the loop latch, then return a constant evolution node.
 Currently scope of evaluation is limited to SCEV computation for
 PHI nodes.

 This shall facilitate computations of loop exit counts in cases
 where compare appears in the evolution chain of induction variables.

 Will fix PR 34538
Reviewers: sanjoy, hfinkel, junryoungju

Reviewed By: junryoungju

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D38494

llvm-svn: 316054
2017-10-18 01:36:16 +00:00
Anna Thomas
b4afc533f7 [SCEV] Teach SCEV to find maxBECount when loop endbound is variant
Summary:
This patch teaches SCEV to calculate the maxBECount when the end bound
of the loop can vary. Note that we cannot calculate the exactBECount.

This will only be done when both conditions are satisfied:
1. the loop termination condition is strictly LT.
2. the IV is proven to not overflow.

This provides more information to users of SCEV and can be used to
improve identification of finite loops.

Reviewers: sanjoy, mkazantsev, silviu.baranga, atrick

Reviewed by: mkazantsev

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38825

llvm-svn: 315683
2017-10-13 14:30:43 +00:00
Daniel Jasper
3863a9d4f9 Revert r314923: "Recommit : Use the basic cost if a GEP is not used as addressing mode"
Significantly reduces performancei (~30%) of gipfeli
(https://github.com/google/gipfeli)

I have not yet managed to reproduce this regression with the open-source
version of the benchmark on github, but will work with others to get a
reproducer to you later today.

llvm-svn: 315680
2017-10-13 14:04:21 +00:00
Sanjay Patel
0ac027152c [ValueTracking] return zero when there's conflict in known bits of a shift (PR34838)
Poison allows us to return a better result than undef.

llvm-svn: 315595
2017-10-12 17:31:46 +00:00
Justin Lebar
c8a4e60a71 Convert an APInt to int64_t properly in TTI::getGEPCost().
Summary:
If the pointer width is 32 bits and the calculated GEP offset is
negative, we call APInt::getLimitedValue(), which does a
*zero*-extension of the offset.  That's wrong -- we should do an sext.

Fixes a bug introduced in rL314362 and found by Evgeny Astigeevich.

Reviewers: efriedma

Subscribers: sanjoy, javed.absar, llvm-commits, eastig

Differential Revision: https://reviews.llvm.org/D38557

llvm-svn: 314935
2017-10-04 20:47:33 +00:00
Guozhi Wei
e2af0b6a83 [TargetTransformInfo] Check if function pointer is valid before calling isLoweredToCall
Function isLoweredToCall can only accept non-null function pointer, but a function pointer can be null for indirect function call. So check it before calling isLoweredToCall from getInstructionLatency.

Differential Revision: https://reviews.llvm.org/D38204

llvm-svn: 314927
2017-10-04 20:14:08 +00:00
Jun Bum Lim
a6292a6c0f Recommit : Use the basic cost if a GEP is not used as addressing mode
Recommitting r314517 with the fix for handling ConstantExpr.

Original commit message:
  Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing
  mode in the target. However, since it doesn't check its actual users, it will
  return FREE even in cases where the GEP cannot be folded away as a part of
  actual addressing mode. For example, if an user of the GEP is a call
  instruction taking the GEP as a parameter, then the GEP may not be folded in
  isel.

llvm-svn: 314923
2017-10-04 18:33:52 +00:00
Mikael Holmen
61a1a08d08 [Lint] Avoid failed assertion by fetching the proper pointer type
Summary:
When checking if a constant expression is a noop cast we fetched the
IntPtrType by doing DL->getIntPtrType(V->getType())). However, there can
be cases where V doesn't return a pointer, and then getIntPtrType()
triggers an assertion.

Now we pass DataLayout to isNoopCast so the method itself can determine
what the IntPtrType is.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D37894

llvm-svn: 314763
2017-10-03 06:03:49 +00:00
Craig Topper
0c11fe123a [X86] Add AVX512 check lines to the cost model truncate test.
llvm-svn: 314758
2017-10-03 03:47:34 +00:00
Davide Italiano
3d730b1c92 [PassManager] Retire cl::opt that have been set for a while. NFCI.
llvm-svn: 314740
2017-10-02 23:39:20 +00:00
Adrian Prantl
6d0ecb4cce Move the stripping of invalid debug info from the Verifier to AutoUpgrade.
This came out of a recent discussion on llvm-dev
(https://reviews.llvm.org/D38042). Currently the Verifier will strip
the debug info metadata from a module if it finds the dbeug info to be
malformed. This feature is very valuable since it allows us to improve
the Verifier by making it stricter without breaking bcompatibility,
but arguable the Verifier pass should not be modifying the IR. This
patch moves the stripping of broken debug info into AutoUpgrade
(UpgradeDebugInfo to be precise), which is a much better location for
this since the stripping of malformed (i.e., produced by older, buggy
versions of Clang) is a (harsh) form of AutoUpgrade.

This change is mostly NFC in nature, the one big difference is the
behavior when LLVM module passes are introducing malformed debug
info. Prior to this patch, a NoAsserts build would have printed a
warning and stripped the debug info, after this patch the Verifier
will report a fatal error. I believe this behavior is actually more
desirable anyway.

Differential Revision: https://reviews.llvm.org/D38184

llvm-svn: 314699
2017-10-02 18:31:29 +00:00
Daniel Jasper
cb351d3503 Revert r314435: "[JumpThreading] Preserve DT and LVI across the pass"
Causes a segfault on a builtbot (and in our internal bootstrapping of
Clang). See Eli's response on the commit thread.

llvm-svn: 314589
2017-09-30 11:57:19 +00:00
Alex Shlyapnikov
d4a97b3c06 Revert "Use the basic cost if a GEP is not used as addressing mode"
This reverts commit r314517.

This commit crashes sanitizer bots, for example:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/4167

Stack snippet:
...
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Support/Casting.h:255:0
llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getGEPCost(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Value const*>)
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:742:0
llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getUserCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>)
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:782:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/lib/Analysis/TargetTransformInfo.cpp:116:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:116:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:343:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:864:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfo.h:285:0
...

llvm-svn: 314560
2017-09-29 22:04:45 +00:00
Jun Bum Lim
dd78ed1cc5 Use the basic cost if a GEP is not used as addressing mode
Summary:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target.
However, since it doesn't check its actual users, it will return FREE even in cases
where the GEP cannot be folded away as a part of actual addressing mode.
For example, if an user of the GEP is a call instruction taking the GEP as a parameter,
then the GEP may not be folded in isel.

Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng

Reviewed By: hfinkel

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D38085

llvm-svn: 314517
2017-09-29 14:50:16 +00:00
Evandro Menezes
aac52b0c56 [JumpThreading] Preserve DT and LVI across the pass
JumpThreading now preserves dominance and lazy value information across the
entire pass.  The pass manager is also informed of this preservation with
the goal of DT and LVI being recalculated fewer times overall during
compilation.

This change prepares JumpThreading for enhanced opportunities; particularly
those across loop boundaries.

Patch by: Brian Rzycki <b.rzycki@samsung.com>,
          Sebastian Pop <s.pop@samsung.com>

Differential revision: https://reviews.llvm.org/D37528

llvm-svn: 314435
2017-09-28 17:24:40 +00:00
Justin Lebar
ae90b4a8da Check for overflows when calculating the offset in GetGEPCost.
Summary:
This avoids C++ UB if the GEP is weird and the calculation overflows
int64_t, and it's also observable in the cost model's results.

Such GEPs are almost surely not valid pointers, but LLVM nonetheless
generates them sometimes.

Reviewers: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38337

llvm-svn: 314362
2017-09-27 23:16:56 +00:00
Guozhi Wei
af71947aaf [TargetTransformInfo] Handle intrinsic call in getInstructionLatency()
Usually an intrinsic is a simple target instruction, it should have a small latency. A real function call has much larger latency. So handle the intrinsic call in function getInstructionLatency().

Differential Revision: https://reviews.llvm.org/D38104

llvm-svn: 314003
2017-09-22 18:25:53 +00:00
Guozhi Wei
24cd70c139 [TargetTransformInfo] Static alloca has 0 cost
Static alloca usually doesn't generate any machine instructions, so it has 0 cost.

Differential Revision: https://reviews.llvm.org/D37879

llvm-svn: 313410
2017-09-15 22:28:12 +00:00
Guozhi Wei
3025ed18e5 [TargetTransformInfo] Detect 0 latency instructions
For instructions that unlikely generate machine instructions, they should also have 0 latency.

Differential Revision: https://reviews.llvm.org/D37833

llvm-svn: 313288
2017-09-14 19:20:02 +00:00
Silviu Baranga
a68e72a0c8 [LAA] Allow more run-time alias checks by coercing pointer expressions to AddRecExprs
Summary:
LAA can only emit run-time alias checks for pointers with affine AddRec
SCEV expressions. However, non-AddRecExprs can be now be converted to
affine AddRecExprs using SCEV predicates.

This change tries to add the minimal set of SCEV predicates in order
to enable run-time alias checking.

Reviewers: anemet, mzolotukhin, mkuper, sanjoy, hfinkel

Reviewed By: hfinkel

Subscribers: mssimpso, Ayal, dorit, roman.shirokiy, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D17080

llvm-svn: 313012
2017-09-12 07:48:22 +00:00
Guozhi Wei
5c5643c0c8 [TargetTransformInfo] Add a new public interface getInstructionCost
Current TargetTransformInfo can support throughput cost model and code size model, but sometimes we also need instruction latency cost model in different optimizations. Hal suggested we need a single public interface to query the different cost of an instruction. So I proposed following interface:

  enum TargetCostKind {
    TCK_RecipThroughput, ///< Reciprocal throughput.
    TCK_Latency,         ///< The latency of instruction.
    TCK_CodeSize         ///< Instruction code size.
  };

  int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const;

All clients should mainly use this function to query the cost of an instruction, parameter <kind> specifies the desired cost model.

This patch also provides a simple default implementation of getInstructionLatency.

The default getInstructionLatency provides latency numbers for only small number of instruction classes, those latency numbers are only reasonable for modern OOO processors. It can be extended in following ways:

   Add more detail into this function.
   Add getXXXLatency function and call it from here.
   Implement target specific getInstructionLatency function.

Differential Revision: https://reviews.llvm.org/D37170

llvm-svn: 312832
2017-09-08 22:29:17 +00:00
Zvi Rackover
bf9b51d3fc X86: Improve AVX512 fptoui lowering
Summary:
Add patterns for
  fptoui <16 x float> to <16 x i8>
  fptoui <16 x float> to <16 x i16>

Reviewers: igorb, delena, craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37505

llvm-svn: 312704
2017-09-07 07:40:34 +00:00
Alexandre Isoard
a49643a387 [SCEV] Add URem support to SCEV
In LLVM IR the following code:

    %r = urem <ty> %t, %b

is equivalent to

    %q = udiv <ty> %t, %b
    %s = mul <ty> nuw %q, %b
    %r = sub <ty> nuw %t, %q ; (t / b) * b + (t % b) = t

As UDiv, Mul and Sub are already supported by SCEV, URem can be implemented
with minimal effort using that relation:

    %r --> (-%b * (%t /u %b)) + %t

We implement two special cases:

  - if %b is 1, the result is always 0
  - if %b is a power-of-two, we produce a zext/trunc based expression instead

That is, the following code:

    %r = urem i32 %t, 65536

Produces:

    %r --> (zext i16 (trunc i32 %a to i16) to i32)

Note that while this helps get a tighter bound on the range analysis and the
known-bits analysis, this exposes some normalization shortcoming of SCEVs:

    %div = udim i32 %a, 65536
    %mul = mul i32 %div, 65536
    %rem = urem i32 %a, 65536
    %add = add i32 %mul, %rem

Will usually not be reduced.

llvm-svn: 312329
2017-09-01 14:59:59 +00:00
Matt Arsenault
11d7542186 AMDGPU: Don't assert in TTI with fp32 denorms enabled
Also refine for f16 and rcp cases.

llvm-svn: 312213
2017-08-31 05:47:00 +00:00
Simon Pilgrim
7b3c093106 [CostModel][X86][XOP] Improve costs for XOP shuffles
VPPERM/VPERMIL2PD/VPERMIL2PS all provide more effective 2-input shuffles than regular AVX instructions

llvm-svn: 311005
2017-08-16 13:50:20 +00:00
Jakub Kuderski
e25b7db874 [Dominators] Include infinite loops in PostDominatorTree
Summary:
This patch teaches PostDominatorTree about infinite loops. It is built on top of D29705 by @dberlin which includes a very detailed motivation for this change.

What's new is that the patch also teaches the incremental updater how to deal with reverse-unreachable regions and how to properly maintain and verify tree roots. Before that, the incremental algorithm sometimes ended up preserving reverse-unreachable regions after updates that wouldn't appear in the tree if it was constructed from scratch on the same CFG.

This patch makes the following assumptions:
- A sequence of updates should produce the same tree as a recalculating it.
- Any sequence of the same updates should lead to the same tree.
- Siblings and roots are unordered.

The last two properties are essential to efficiently perform batch updates in the future.
When it comes to the first one, we can decide later that the consistency between freshly built tree and an updated one doesn't matter match, as there are many correct ways to pick roots in infinite loops, and to relax this assumption. That should enable us to recalculate postdominators less frequently.

This patch is pretty conservative when it comes to incremental updates on reverse-unreachable regions and ends up recalculating the whole tree in many cases. It should be possible to improve the performance in many cases, if we decide that it's important enough.
That being said, my experiments showed that reverse-unreachable are very rare in the IR emitted by clang when bootstrapping  clang. Here are the statistics I collected by analyzing IR between passes and after each removePredecessor call:

```
# functions:  52283
# samples:  337609
# reverse unreachable BBs:  216022
# BBs:  247840796
Percent reverse-unreachable:  0.08716159869015269 %
Max(PercRevUnreachable) in a function:  87.58620689655172 %
# > 25 % samples:  471 ( 0.1395104988314885 % samples )
... in 145 ( 0.27733680163724345 % functions )
```

Most of the reverse-unreachable regions come from invalid IR where it wouldn't be possible to construct a PostDomTree anyway.

I would like to commit this patch in the next week in order to be able to complete the work that depends on it before the end of my internship, so please don't wait long to voice your concerns :).

Reviewers: dberlin, sanjoy, grosser, brzycki, davide, chandlerc, hfinkel

Reviewed By: dberlin

Subscribers: nhaehnle, javed.absar, kparzysz, uabelho, jlebar, hiraditya, llvm-commits, dberlin, david2050

Differential Revision: https://reviews.llvm.org/D35851

llvm-svn: 310940
2017-08-15 18:14:57 +00:00
Hal Finkel
ce5e2d8b37 [ValueTracking] Don't delete assumes of side-effectful instructions
ValueTracking has to strike a balance when attempting to propagate information
backwards from assumes, because if the information is trivially propagated
backwards, it can appear to LLVM that the assumption is known to be true, and
therefore can be removed.

This is sound (because an assumption has no semantic effect except for causing
UB), but prevents the assume from allowing further optimizations.

The isEphemeralValueOf check exists to try and prevent this issue by not
removing the source of an assumption. This tries to make it a little bit more
general to handle the case of side-effectful instructions, such as in

  %0 = call i1 @get_val()
  %1 = xor i1 %0, true
  call void @llvm.assume(i1 %1)

Patch by Ariel Ben-Yehuda, thanks!

Differential Revision: https://reviews.llvm.org/D36590

llvm-svn: 310859
2017-08-14 17:11:43 +00:00
Chandler Carruth
b246fe2de1 [ValueTracking] Revert r310583 which enabled functionality that still is
causing compile time issues.

Moreover, the patch *deleted* the flag in addition to changing the
default, and links to a code review that doesn't even discuss the flag
and just has an update to a Clang test case.

I've followed up on the commit thread to ask for numbers on compile time
at this point, leaving the flag in place until things stabilize, and
pointing at specific code that seems to exhibit excessive compile time
with this patch.

Original commit message for r310583:
"""
[ValueTracking] Enabling ValueTracking patch by default (recommit). Part 2.

The original patch was an improvement to IR ValueTracking on
non-negative integers. It has been checked in to trunk (D18777,
r284022). But was disabled by default due to performance regressions.
Perf impact has improved. The patch would be enabled by default.
""""

llvm-svn: 310816
2017-08-14 07:03:24 +00:00
Simon Pilgrim
a097338133 [CostModel][X86] Add SSE2 two-src shuffle costs
llvm-svn: 310654
2017-08-10 19:32:35 +00:00
Simon Pilgrim
9232f70295 [CostModel][X86] Add avx1 two-src shuffle costs
llvm-svn: 310650
2017-08-10 19:02:51 +00:00
Simon Pilgrim
74c596b9bb [CostModel][X86] Add avx2 two-src shuffle costs
llvm-svn: 310645
2017-08-10 18:29:34 +00:00
Simon Pilgrim
aa4188e1fa [CostModel][X86] Extend two src shuffle cost tests
Cover most 128/256/512/1024-bit cases for vXf64/vXi64, vXf32/vXi32, vXi16 + vXi8

llvm-svn: 310641
2017-08-10 18:02:45 +00:00
Simon Pilgrim
52d93338c3 [CostModel][X86] Add avx512vbmi broadcast/reverse/single-src shuffle cost tests
llvm-svn: 310633
2017-08-10 17:33:25 +00:00