1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00
Commit Graph

5788 Commits

Author SHA1 Message Date
Sanjoy Das
bb484e2b4d [GVN] Make a test case more robust
The singleton !range metadata gets simplified more aggressively after a
later change, so change the !range metadata to contain more than one
element.

While at it, turn some `; CHECK` s to `; CHECK-LABEL:` s.

llvm-svn: 251485
2015-10-28 03:20:05 +00:00
David Majnemer
067f270fea [SimplifyCFG] Don't DCE catchret because the successor is unreachable
CatchReturnInst has side-effects: it runs a destructor.  This destructor
could conceivably run forever/call exit/etc. and should not be removed.

llvm-svn: 251461
2015-10-27 22:43:56 +00:00
Hal Finkel
9cc3c36523 [AliasSetTracker] Use mod/ref information for UnknownInstr
AliasSetTracker does not need to convert the access mode to ModRefAccess if the
new visited UnknownInst has only 'REF' modrefinfo to existing pointers in the
sets.

Patch by Andrew Zhogin!

llvm-svn: 251451
2015-10-27 20:37:04 +00:00
David Majnemer
7b9b63574c [ScalarEvolutionExpander] PHI on a catchpad can be used on both edges
A PHI on a catchpad might be used by both edges out of the catchpad,
feeding back into a loop.  In this case, just use the insertion point.
Anything more clever would require new basic blocks or PHI placement.

llvm-svn: 251442
2015-10-27 19:48:28 +00:00
NAKAMURA Takumi
53299c896e Revert r251291, "Loop Vectorizer - skipping "bitcast" before GEP"
It causes miscompilation of llvm/lib/ExecutionEngine/Interpreter/Execution.cpp.
See also PR25324.

llvm-svn: 251436
2015-10-27 19:02:36 +00:00
Charlie Turner
a9f1af80ab [SLP] Be more aggressive about reduction width selection.
Summary:
This change could be way off-piste, I'm looking for any feedback on whether it's an acceptable approach.

It never seems to be a problem to gobble up as many reduction values as can be found, and then to attempt to reduce the resulting tree. Some of the workloads I'm looking at have been aggressively unrolled by hand, and by selecting reduction widths that are not constrained by a vector register size, it becomes possible to profitably vectorize. My test case shows such an unrolling which SLP was not vectorizing (on neither ARM nor X86) before this patch, but with it does vectorize.

I measure no significant compile time impact of this change when combined with D13949 and D14063. There are also no significant performance regressions on ARM/AArch64 in SPEC or LNT.

The more principled approach I thought of was to generate several candidate tree's and use the cost model to pick the cheapest one. That seemed like quite a big design change (the algorithms seem very much one-shot), and would likely be a costly thing for compile time. This seemed to do the job at very little cost, but I'm worried I've misunderstood something!

Reviewers: nadav, jmolloy

Subscribers: mssimpso, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D14116

llvm-svn: 251428
2015-10-27 17:59:03 +00:00
Charlie Turner
32140fced6 [SLP] Try a bit harder to find reduction PHIs
Summary:
Currently, when the SLP vectorizer considers whether a phi is part of a reduction, it dismisses phi's whose incoming blocks are not the same as the block containing the phi. For the patterns I'm looking at, extending this rule to allow phis whose incoming block is a containing loop latch allows me to vectorize certain workloads.

There is no significant compile-time impact, and combined with D13949, no performance improvement measured in ARM/AArch64 in any of SPEC2000, SPEC2006 or LNT.

Reviewers: jmolloy, mcrosier, nadav

Subscribers: mssimpso, nadav, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D14063

llvm-svn: 251425
2015-10-27 17:54:16 +00:00
Charlie Turner
18cdd84f54 [SLP] Treat SelectInsts as reduction values.
Summary:
Certain workloads, in particular sum-of-absdiff loops, can be vectorized using SLP if it can treat select instructions as reduction values.

The test case is a bit awkward. The AArch64 cost model needs some tuning to not be so pessimistic about selects. I've had to tweak the SLP threshold here.

Reviewers: jmolloy, mzolotukhin, spatel, nadav

Subscribers: nadav, mssimpso, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D13949

llvm-svn: 251424
2015-10-27 17:49:11 +00:00
Diego Novillo
13365b7f1f Fix SamplePGO segfault when debug info is missing.
When emitting a remark for a conditional branch annotation, the remark
uses the line location information of the conditional branch in the
message.  In some cases, that information is unavailable and the
optimization would segfaul. I'm still not sure whether this is a bug or
WAI, but the optimizer should not die because of this.

llvm-svn: 251420
2015-10-27 17:37:00 +00:00
David Majnemer
e6ee317fc1 [ScalarEvolutionExpander] Properly insert no-op casts + EH Pads
We want to insert no-op casts as close as possible to the def.  This is
tricky when the cast is of a PHI node and the BasicBlocks between the
def and the use cannot hold any instructions.  Iteratively walk EH pads
until we hit a non-EH pad.

This fixes PR25326.

llvm-svn: 251393
2015-10-27 07:36:42 +00:00
Igor Laevsky
b3b79f34cd [RS4GC] Strip noalias attribute after statepoint rewrite
We should remove noalias along with dereference and dereference_or_null attributes 
because statepoint could potentially touch the entire heap including noalias objects.

Differential Revision: http://reviews.llvm.org/D14032

llvm-svn: 251333
2015-10-26 19:06:01 +00:00
Diego Novillo
73fa815dcb SamplePGO - Add optimization reports.
This adds a couple of optimization remarks to the SamplePGO
transformation. When it decides to inline a hot function (to mimic the
inline stack and repeat useful inline decisions in the original build).

It will also report branch destinations. For instance, given the code
fragment:

     6      if (i < 1000)
     7        sum -= i;
     8      else
     9        sum += -i * rand();

If the 'else' branch is taken most of the time, building this code with
-Rpass=sample-profile will produce:

a.cc:9:14: remark: most popular destination for conditional branches at small.cc:6:9 [-Rpass=sample-profile]
      sum += -i * rand();
             ^

llvm-svn: 251330
2015-10-26 18:52:53 +00:00
Evgeniy Stepanov
6d86bc6f79 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.

The previous iteration of this change was reverted in r250461. This
version leaves the generic, compiler-rt based implementation in
SafeStack.cpp instead of moving it to TargetLoweringBase in order to
allow testing without a TargetMachine.

llvm-svn: 251324
2015-10-26 18:28:25 +00:00
Diego Novillo
7c45627db9 Cleanup test case debug info. NFC.
llvm-svn: 251320
2015-10-26 18:14:44 +00:00
Elena Demikhovsky
5e7ce7f64e Loop Vectorizer - skipping "bitcast" before GEP
Vectorization of memory instruction (Load/Store) is possible when the pointer is coming from GEP. The GEP analysis allows to estimate the profit.
In some cases we have a "bitcast" between GEP and memory instruction.
I added code that skips the "bitcast".

http://reviews.llvm.org/D13886

llvm-svn: 251291
2015-10-26 13:42:41 +00:00
Silviu Baranga
8fa79f5d2b [InstCombine] Teach instcombine not to create extra PHI nodes when folding GEPs
Summary:
InstCombine tries to transform GEP(PHI(GEP1, GEP2, ..)) into GEP(GEP(PHI(...))
when possible. However, this may leave the old PHI node around. Even if we
do end up folding the GEPs, having an extra PHI node might not be beneficial.

This change makes the transformation more conservative. We now only do this if
the PHI has only one use, and can therefore be removed after the transformation.

Reviewers: jmolloy, majnemer

Subscribers: mcrosier, mssimpso, llvm-commits

Differential Revision: http://reviews.llvm.org/D13887

llvm-svn: 251281
2015-10-26 10:25:05 +00:00
Chen Li
c9e8b188d2 Revert rL251061 [SimplifyCFG] Extend SimplifyResume to handle phi of trivial landing pad.
llvm-svn: 251149
2015-10-23 21:13:01 +00:00
Hal Finkel
1a64f66683 Handle non-constant shifts in computeKnownBits, and use computeKnownBits for constant folding in InstCombine/Simplify
First, the motivation: LLVM currently does not realize that:

  ((2072 >> (L == 0)) >> 7) & 1 == 0

where L is some arbitrary value. Whether you right-shift 2072 by 7 or by 8, the
lowest-order bit is always zero. There are obviously several ways to go about
fixing this, but the generic solution pursued in this patch is to teach
computeKnownBits something about shifts by a non-constant amount. Previously,
we would give up completely on these. Instead, in cases where we know something
about the low-order bits of the shift-amount operand, we can combine (and
together) the associated restrictions for all shift amounts consistent with
that knowledge. As a further generalization, I refactored all of the logic for
all three kinds of shifts to have this capability. This works well in the above
case, for example, because the dynamic shift amount can only be 0 or 1, and
thus we can say a lot about the known bits of the result.

This brings us to the second part of this change: Even when we know all of the
bits of a value via computeKnownBits, nothing used to constant-fold the result.
This introduces the necessary code into InstCombine and InstSimplify. I've
added it into both because:

  1. InstCombine won't automatically pick up the associated logic in
     InstSimplify (InstCombine uses InstSimplify, but not via the API that
     passes in the original instruction).

  2. Putting the logic in InstCombine allows the resulting simplifications to become
     part of the iterative worklist

  3. Putting the logic in InstSimplify allows the resulting simplifications to be
     used by everywhere else that calls SimplifyInstruction (inlining, unrolling,
     and many others).

And this requires a small change to our definition of an ephemeral value so
that we don't break the rest case from r246696 (where the icmp feeding the
@llvm.assume, is also feeding a br). Under the old definition, the icmp would
not be considered ephemeral (because it is used by the br), but this causes the
assume to remove itself (in addition to simplifying the branch structure), and
it seems more-useful to prevent that from happening.

llvm-svn: 251146
2015-10-23 20:37:08 +00:00
Tim Northover
e2a148b97f GVN: don't try to replace instruction with itself.
After some look-ahead PRE was added for GEPs, an instruction could end
up in the table of candidates before it was actually inspected. When
this happened the pass might decide it was the best candidate to
replace itself. This didn't go well.

Should fix PR25291

llvm-svn: 251145
2015-10-23 20:30:02 +00:00
Chen Li
87928bb2ce [SimplifyCFG] Extend SimplifyResume to handle phi of trivial landing pad.
Summary: Currently SimplifyResume can convert an invoke instruction to a call instruction if its landing pad is trivial. In practice we could have several invoke instructions with trivial landing pads and share a common rethrow block, and in the common rethrow block, all the landing pads join to a phi node. The patch extends SimplifyResume to check the phi of landing pad and their incoming blocks. If any of them is trivial, remove it from the phi node and convert the invoke instruction to a call instruction.  

Reviewers: hfinkel, reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13718

llvm-svn: 251061
2015-10-22 20:48:38 +00:00
Sanjoy Das
6701b8778f [SCEV] Remove a test case added in r249168
The test case wasn't testing what it was commented to be testing; and
when I tried to fix the test I noticed that SCEV does not support the
simplification that the test was supposed to test.

This change removes the test case to avoid confusion.

llvm-svn: 251053
2015-10-22 19:57:41 +00:00
Sanjoy Das
592ded635e [SCEV] Opportunistically interpret unsigned constraints as signed
Summary:
An unsigned comparision is equivalent to is corresponding signed version
if both the operands being compared are positive.  Teach SCEV to use
this fact when profitable.

Reviewers: atrick, hfinkel, reames, nlewycky

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13687

llvm-svn: 251051
2015-10-22 19:57:34 +00:00
Sanjoy Das
7eba4a995b [SCEV] Teach SCEV some axioms about non-wrapping arithmetic
Summary:
 - A s<  (A + C)<nsw> if C >  0
 - A s<= (A + C)<nsw> if C >= 0
 - (A + C)<nsw> s<  A if C <  0
 - (A + C)<nsw> s<= A if C <= 0

Right now `C` needs to be a constant, but we can later generalize it to
be a non-constant if needed.

Reviewers: atrick, hfinkel, reames, nlewycky

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13686

llvm-svn: 251050
2015-10-22 19:57:29 +00:00
Sanjoy Das
fd03ccdaff [SCEV] Mark AddExprs as nsw or nuw if legal
Summary:
This uses `ScalarEvolution::getRange` and not potentially control
dependent `nsw` and `nuw` bits on the arithmetic instruction.

Reviewers: atrick, hfinkel, nlewycky

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13613

llvm-svn: 251048
2015-10-22 19:57:19 +00:00
Michael Liao
9a38e95740 [InstCombine] Revise the test case to match full sequene
llvm-svn: 250950
2015-10-21 21:50:58 +00:00
David Majnemer
1956090ed0 [SimplifyCFG] Don't use-after-free an SSA value
SimplifyTerminatorOnSelect didn't consider the possibility that the
condition might be related to one of PHI nodes.

This fixes PR25267.

llvm-svn: 250922
2015-10-21 18:22:24 +00:00
Dehao Chen
de5bbe325e Tolerate negative offset when matching sample profile.
In some cases (as illustrated in the unittest), lineno can be less than the heade_lineno because the function body are included from some other files. In this case, offset will be negative. This patch makes clang still able to match the profile to IR in this situation.

http://reviews.llvm.org/D13914

llvm-svn: 250873
2015-10-21 01:22:27 +00:00
Sanjoy Das
ed101be091 [RS4GC] Re-purpose normalizeForInvokeSafepoint; NFC.
`normalizeForInvokeSafepoint` in RewriteStatepointsForGC.cpp, as it is
written today, deals with `gc.relocate` and `gc.result` uses of a
statepoint equally well.  This change documents this fact and adds a
test case.

There is no functional change here -- only documentation of existing
functionality.

llvm-svn: 250784
2015-10-20 01:06:24 +00:00
Michael Liao
5efcb99302 [InstCombine] Optimize icmp of inc/dec at RHS
Allow LLVM to optimize the sequence like the following:

  %inc = add nsw i32 %i, 1
  %cmp = icmp slt %n, %inc

into:

  %cmp = icmp sle i32 %n, %i

The case is not handled previously due to the complexity of compuation of %n.
Hence, LLVM cannot swap operands of icmp accordingly.

llvm-svn: 250746
2015-10-19 22:08:14 +00:00
Sanjay Patel
659d8f6866 [CGP] transform select instructions into branches and sink expensive operands
This was originally checked in at r250527, but reverted at r250570 because of PR25222.
There were at least 2 problems: 
1. The cost check was checking for an instruction with an exact cost of TCC_Expensive;
that should have been >=.
2. The cause of the clang stage 1 failures was illegally sinking 'call' instructions;
we can't sink instructions that may have side effects / are not safe to execute speculatively.

Fixed those conditions in sinkSelectOperand() and added test cases.

Original commit message:
This is a follow-up to the discussion in D12882.

Ideally, we would like SimplifyCFG to be able to form select instructions even when the operands
are expensive (as defined by the TTI cost model) because that may expose further optimizations.
However, we would then like a later pass like CodeGenPrepare to undo that transformation if the
target would likely benefit from not speculatively executing an expensive op (this patch).

Once we have this safety mechanism in place, we can adjust SimplifyCFG to restore its
select-formation behavior that changed with r248439.

Differential Revision: http://reviews.llvm.org/D13297

llvm-svn: 250743
2015-10-19 21:59:12 +00:00
Jakub Staszak
c9f385dfda Preserve CFG in MergedLoadStoreMotion. This fixes PR24426.
llvm-svn: 250660
2015-10-18 19:34:10 +00:00
Simon Pilgrim
2df3161736 [InstCombine] SSE4A constant folding and conversion to shuffles.
This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:

1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'bit index' and 'length' operands are constant
2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if its useful)
3 - Constant folding support
4 - Add zeroinitializer handling

Differential Revision: http://reviews.llvm.org/D13348

llvm-svn: 250609
2015-10-17 11:40:05 +00:00
Benjamin Kramer
4393ca2076 Revert "This is a follow-up to the discussion in D12882."
Breaks clang selfhost, see PR25222. This reverts commits r250527 and r250528.

llvm-svn: 250570
2015-10-16 23:00:29 +00:00
Diego Novillo
aa097db5de Sample profiles - Re-arrange binary format to emit head samples only on top functions.
The number of samples collected at the head of a function only make
sense for top-level functions (i.e., those actually called as opposed to
being inlined inside another).

Head samples essentially count the time spent inside the function's
prologue.  This clearly doesn't make sense for inlined functions, so we
were always emitting 0 in those.

llvm-svn: 250539
2015-10-16 18:54:35 +00:00
Sanjay Patel
a43083a12e move test case to x86 directory because it specifies an x86 target
llvm-svn: 250528
2015-10-16 17:18:07 +00:00
Sanjay Patel
060480ec5a This is a follow-up to the discussion in D12882.
Ideally, we would like SimplifyCFG to be able to form select instructions even when the operands
are expensive (as defined by the TTI cost model) because that may expose further optimizations. 
However, we would then like a later pass like CodeGenPrepare to undo that transformation if the
target would likely benefit from not speculatively executing an expensive op (this patch).

Once we have this safety mechanism in place, we can adjust SimplifyCFG to restore its 
select-formation behavior that changed with r248439.

Differential Revision: http://reviews.llvm.org/D13297

llvm-svn: 250527
2015-10-16 16:54:30 +00:00
Sanjoy Das
3f75d20781 [RS4GC] Dont' propagate call attrs related to patchable statepoints
The `"statepoint-id"` and `"statepoint-num-patch-bytes"` attributes are
used solely to determine properties of the `gc.statepoint` being
created.  Once the `gc.statepoint` is in place, these should be removed.

llvm-svn: 250491
2015-10-16 02:41:23 +00:00
Sanjoy Das
3c39a5c0ab [RS4GC] Use "deopt" operand bundles
Summary:
This is a step towards using operand bundles to carry deopt state till
RewriteStatepointsForGC.  The change adds a flag to
RewriteStatepointsForGC that teaches it to pick up deopt state from a
`"deopt"` operand bundle attached to the `call` or `invoke` it is
wrapping.

The command line flag added, `-rs4gc-use-deopt-bundles`, will only exist
for a short while.  Once we are able to pipe deopt bundle state through
the full optimization pipeline without problems, we will "constant fold"
`-rs4gc-use-deopt-bundles` to `true`.

Reviewers: swaroop.sridhar, reames

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13372

llvm-svn: 250489
2015-10-16 02:41:00 +00:00
Sanjoy Das
10cf1fcef4 [IndVars] Have cloneArithmeticIVUser guess better
Summary:
`cloneArithmeticIVUser` currently trips over expression like `add %iv,
-1` when `%iv` is being zero extended -- it tries to construct the
widened use as `add %iv.zext, zext(-1)` and (correctly) fails to prove
equivalence to `zext(add %iv, -1)` (here the SCEV for `%iv` is
`{1,+,1}`).

This change teaches `IndVars` to try sign extending the non-IV operand
if that makes the newly constructed IV use equivalent to the widened
narrow IV use.

Reviewers: atrick, hfinkel, reames

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13717

llvm-svn: 250483
2015-10-16 01:00:47 +00:00
Evgeniy Stepanov
17157ff131 Revert "[safestack] Fast access to the unsafe stack pointer on AArch64/Android."
Breaks the hexagon buildbot.

llvm-svn: 250461
2015-10-15 21:26:49 +00:00
Evgeniy Stepanov
f2afc9b765 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.

llvm-svn: 250456
2015-10-15 20:50:16 +00:00
Philip Reames
53f2760fdc Revert 250343 and 250344
Turns out this approach is buggy.  In discussion about follow on work, Sanjoy pointed out that we could be subject to circular logic problems.  

Consider:
 if (i u< L) leave()
 if ((i + 1) u< L) leave()
 print(a[i] + a[i+1]) 

If we know that L is less than UINT_MAX, we could possible prove (in a control dependent way) that i + 1 does not overflow.  This gives us:
 if (i u< L) leave()
 if ((i +nuw 1) u< L) leave()
 print(a[i] + a[i+1]) 

If we now do the transform this patch proposed, we end up with:
 if ((i +nuw 1) u< L) leave_appropriately()
 print(a[i] + a[i+1]) 

That would be a miscompile when i==-1.  The problem here is that the control dependent nuw bits got used to prove something about the first condition.  That's obviously invalid.

This won't happen today, but since I plan to enhance LVI/CVP with exactly that transform at some point in the not too distant future...

llvm-svn: 250430
2015-10-15 16:51:00 +00:00
Manman Ren
8fc1a288a0 Recommit r250345, it was reverted in r250366 to investigate a bot failure.
Our internal bot is still red after r250366.

llvm-svn: 250415
2015-10-15 14:59:40 +00:00
Manman Ren
f2865ff590 Temporarily revert r250345 to sort out bot failure.
With r250345 and r250343, we start to observe the following failure
when bootstrap clang with lto and pgo:
PHI node entries do not match predecessors!
  %.sroa.029.3.i = phi %"class.llvm::SDNode.13298"* [ null, %30953 ], [ null, %31017 ], [ null, %30998 ], [ null, %_ZN4llvm8dyn_castINS_14ConstantSDNodeENS_7SDValueEEENS_10cast_rettyIT_T0_E8ret_typeERS5_.exit.i.1804 ], [ null, %30975 ], [ null, %30991 ], [ null, %_ZNK4llvm3EVT13getScalarTypeEv.exit.i.1812 ], [ %..sroa.029.0.i, %_ZN4llvm11SmallVectorIiLj8EED1Ev.exit.i.1826 ], !dbg !451895
label %30998
label %_ZNK4llvm3EVTeqES0_.exit19.thread.i
LLVM ERROR: Broken function found, compilation aborted!

I will re-commit this if the bot does not recover.

llvm-svn: 250366
2015-10-15 04:58:24 +00:00
Cong Hou
07a3f21d1f Update the branch weight metadata in JumpThreading pass.
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).

This is the third attempt to submit this patch, while the first two led to failures in some FDO tests. After investigation, it is the edge weight normalization that caused those failures. In this patch the edge weight normalization is fixed so that there is no zero weight in the output and the sum of all weights can fit in 32-bit integer. Several unit tests are added.

Differential revision: http://reviews.llvm.org/D10979

llvm-svn: 250345
2015-10-14 23:14:17 +00:00
Philip Reames
b2c152ecb1 Test case which should have been part of 250343
llvm-svn: 250344
2015-10-14 22:47:06 +00:00
Philip Reames
1bb1df8eb7 Tighten known bits for ctpop based on zero input bits
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.

The basic idea here is that input bits that are known zero reduce the maximum count that the intrinsic could return. We know that the number of bits required to represent a particular count is at most log2(N)+1.

Differential Revision: http://reviews.llvm.org/D13253

llvm-svn: 250338
2015-10-14 22:42:12 +00:00
Manman Ren
0a63668758 Revert r250204 and r250240 due to bot failure. We failed to build PGO-ed clang.
llvm-svn: 250264
2015-10-14 03:04:03 +00:00
Kostya Serebryany
b4d1390441 [asan] Disabling speculative loads under asan. Patch by Mike Aizatsky
llvm-svn: 250259
2015-10-14 00:21:05 +00:00
Diego Novillo
c53fb25fe1 Sample profiles - Add a name table to the binary encoding.
Binary encoded profiles used to encode all function names inline at
every reference.  This is clearly suboptimal in terms of space.  This
patch fixes this by adding a name table to the header of the file.

llvm-svn: 250241
2015-10-13 22:48:46 +00:00