1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00
Commit Graph

16383 Commits

Author SHA1 Message Date
Gor Nishanov
f845a3821b [Coroutines] Part15c: Fix coro-split to correctly handle definitions between coro.save and coro.suspend
Summary:
In the case below, %Result.i19 is defined between coro.save and coro.suspend and used after coro.suspend. We need to correctly place such a value into the coroutine frame.

```
  %save = call token @llvm.coro.save(i8* null)
  %Result.i19 = getelementptr inbounds %"struct.lean_future<int>::Awaiter", %"struct.lean_future<int>::Awaiter"* %ref.tmp7, i64 0, i32 0
  %suspend = call i8 @llvm.coro.suspend(token %save, i1 false)
  switch i8 %suspend, label %exit [
    i8 0, label %await.ready
    i8 1, label %exit
  ]
await.ready:
  %val = load i32, i32* %Result.i19

```

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24418

llvm-svn: 282902
2016-09-30 19:24:19 +00:00
Gor Nishanov
04e0faa3b3 [Coroutines] Part15b: Fix dbg information handling in coro-split.
Summary:
Without the fix, if there was a function inlined into the coroutine with debug information, CloneFunctionInto(NewF, &F, VMap, /*ModuleLevelChanges=*/true, Returns); would duplicate all of the debug information including the DICompileUnit.

We know use VMap to indicate that debug metadata for a File, Unit and FunctionType should not be duplicated when we creating clones that will become f.resume, f.destroy and f.cleanup.

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24417

llvm-svn: 282899
2016-09-30 19:05:06 +00:00
Gor Nishanov
9f360633cd [Coroutines] Part 15a: Lower coro.subfn.addr in CoroCleanup
Summary: Not all coro.subfn.addr intrinsics can be eliminated in CoroElide through devirtualization. Those that remain need to be lowered in CoroCleanup.

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24412

llvm-svn: 282897
2016-09-30 18:41:35 +00:00
Dehao Chen
766ad13bec Update loop unroller cost model to make sure debug info does not affect optimization decisions.
Summary: Debug info should *not* affect optimization decisions. This patch updates loop unroller cost model to make it not affected by debug info.

Reviewers: davidxl, mzolotukhin

Subscribers: haicheng, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D25098

llvm-svn: 282894
2016-09-30 18:30:04 +00:00
Etienne Bergeron
a6c64717ba [asan] Support dynamic shadow address instrumentation
Summary:
This patch is adding the support for a shadow memory with
dynamically allocated address range.

The compiler-rt needs to export a symbol containing the shadow
memory range.

This is required to support ASAN on windows 64-bits.

Reviewers: kcc, rnk, vitalybuka

Subscribers: zaks.anna, kubabrecka, dberris, llvm-commits, chrisha

Differential Revision: https://reviews.llvm.org/D23354

llvm-svn: 282881
2016-09-30 17:46:32 +00:00
Artur Pilipenko
af3e8db88d CVP. Turn marking adds as no wrap on by default (was turned off by 279082)
With 282650 in tree extra no wrap on adds doesn't cause regressions anymore. Reenable the optimzation.

llvm-svn: 282872
2016-09-30 16:20:08 +00:00
Matthew Simpson
f6e1d8dedb [LV] Build all scalar steps for non-uniform induction variables
When building the steps for scalar induction variables, we previously attempted
to determine if all the scalar users of the induction variable were uniform. If
they were, we would only emit the step corresponding to vector lane zero. This
optimization was too aggressive. We generally don't know the entire set of
induction variable users that will be scalar. We have
isScalarAfterVectorization, but this is only a conservative estimate of the
instructions that will be scalarized. Thus, an induction variable may have
scalar users that aren't already known to be scalar. To avoid emitting unused
steps, we can only check that the induction variable is uniform. This should
fix PR30542.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30542
llvm-svn: 282863
2016-09-30 15:13:52 +00:00
Adam Nemet
ca8cbca151 [LDist] Port to new streaming API for opt remarks
llvm-svn: 282838
2016-09-30 04:56:25 +00:00
Adam Nemet
718a6b9aef [LoopUnroll] Port to the new streaming interface for opt remarks.
llvm-svn: 282834
2016-09-30 03:44:16 +00:00
Piotr Padlewski
98cb07e1d2 [thinlto] Don't decay threshold for hot callsites
Summary:
We don't want to decay hot callsites to import chains of hot
callsites. The same mechanism is used in LIPO.

Reviewers: tejohnson, eraman, mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24976

llvm-svn: 282833
2016-09-30 03:01:17 +00:00
Adam Nemet
d6eee273b7 [LoopDataPrefetch] Port to new streaming API for opt remarks
llvm-svn: 282826
2016-09-30 00:42:43 +00:00
Adam Nemet
cdbc87b8e9 [LV] Port the remarks in processLoop to the new streaming API
This completes LV.

llvm-svn: 282821
2016-09-30 00:29:30 +00:00
Adam Nemet
907a78cd5a [LV] Port the last opt remark in Hints to the new streaming interface
llvm-svn: 282820
2016-09-30 00:29:25 +00:00
Adam Nemet
abdb6ed3d0 [LAA, LV] Port to new streaming interface for opt remarks. Update LV
(Recommit after making sure IsVerbose gets properly initialized in
DiagnosticInfoOptimizationBase.  See previous commit that takes care of
this.)

OptimizationRemarkAnalysis directly takes the role of the report that is
generated by LAA.

Then we need the magic to be able to turn an LAA remark into an LV
remark.  This is done via a new OptimizationRemark ctor.

llvm-svn: 282813
2016-09-30 00:01:30 +00:00
Sanjay Patel
29dc6bed46 [InstCombine] fix function names; NFC
Also, make foldSelectExtConst() a member of InstCombiner, remove
unnecessary parameters from its interface, and group visitSelectInst
helpers together in the header file.

llvm-svn: 282796
2016-09-29 22:18:30 +00:00
Adam Nemet
3b79489357 Revert "[LAA, LV] Port to new streaming interface for opt remarks. Update LV"
This reverts commit r282758.

There are some clang failures I haven't seen.

llvm-svn: 282759
2016-09-29 20:17:37 +00:00
Adam Nemet
6a1c0c8759 [LAA, LV] Port to new streaming interface for opt remarks. Update LV
OptimizationRemarkAnalysis directly takes the role of the report that is
generated by LAA.

Then we need the magic to be able to turn an LAA remark into an LV
remark.  This is done via a new OptimizationRemark ctor.

llvm-svn: 282758
2016-09-29 20:12:18 +00:00
Adam Nemet
08f283bd6f [LV] Port OptimizationRemarkAnalysisFPCommute and
OptimizationRemarkAnalysisAliasing to new streaming API for opt remarks

llvm-svn: 282742
2016-09-29 18:04:47 +00:00
Adam Nemet
92ef1f4f7d [LV] Convert processLoop to new streaming API for opt remarks
llvm-svn: 282740
2016-09-29 17:55:13 +00:00
Sanjay Patel
0edea01276 fix formatting; NFC
llvm-svn: 282737
2016-09-29 17:48:19 +00:00
Kostya Serebryany
83752c3be3 [sanitizer-coverage/libFuzzer] make the guards for trace-pc 32-bit; create one array of guards per function, instead of one guard per BB. reorganize the code so that trace-pc-guard does not create unneeded globals
llvm-svn: 282735
2016-09-29 17:43:24 +00:00
Piotr Padlewski
85b64e4eaf [thinlto] Add cold-callsite import heuristic
Summary:
Not tunned up heuristic, but with this small heuristic there is about
+0.10% improvement on SPEC 2006

Reviewers: tejohnson, mehdi_amini, eraman

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24940

llvm-svn: 282733
2016-09-29 17:32:07 +00:00
Adam Nemet
0a2f86bc8c [LV] Move static createMissedAnalysis from anonymous to global namespace
This is an attempt to fix a windows bot.

llvm-svn: 282730
2016-09-29 17:25:00 +00:00
Adam Nemet
fcdc0e7b5b [LV] Convert CostModel to use the new streaming opt remark API
Here we can already remove the member function emitAnalysis.

llvm-svn: 282729
2016-09-29 17:15:48 +00:00
Adam Nemet
3c671b0c2d [LV] Split most of createMissedAnalysis into a static function. NFC
This will be shared between Legality and CostModel.

llvm-svn: 282728
2016-09-29 17:05:35 +00:00
Adam Nemet
02a510a330 [LV] Convert all but one opt remark in Legality to new streaming interface
The last one remaining after which emitAnalysis can be removed is when
we convert the LAA's report to a vectorization report.  This requires
converting LAA to the new interface first.

llvm-svn: 282726
2016-09-29 16:49:42 +00:00
Adam Nemet
831398892e [LV] Convert emitRemark to new opt remark streaming interface
Also renamed the function to emitRemarkWithHints to better reflect what
the function actually does.

llvm-svn: 282723
2016-09-29 16:23:12 +00:00
Volkan Keles
286cf968eb Test commit. NFC.
llvm-svn: 282717
2016-09-29 13:04:37 +00:00
Evgeny Stupachenko
837a069ffd Wisely choose sext or zext when widening IV.
Summary:
The patch fixes regression caused by two earlier patches D18777 and D18867.

Reviewers: reames, sanjoy

Differential Revision: http://reviews.llvm.org/D24280

From: Li Huang
llvm-svn: 282650
2016-09-28 23:39:39 +00:00
Dehao Chen
8c4cb5e152 Refactor the ProfileSummaryInfo to use doInitialization and doFinalization to handle Module update.
Summary: This refactors the change in r282616

Reviewers: davidxl, eraman, mehdi_amini

Subscribers: mehdi_amini, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D25041

llvm-svn: 282630
2016-09-28 21:00:58 +00:00
Jonas Paulsson
3c5fa71cd5 [SystemZ] Implementation of getUnrollingPreferences().
This commit enables more unrolling for SystemZ by implementing the
SystemZTargetTransformInfo::getUnrollingPreferences() method.

It has been found that it is better to only unroll moderately, so the
DefaultUnrollRuntimeCount has been moved into UnrollingPreferences in order
to set this to a lower value for SystemZ (4).

Reviewers: Evgeny Stupachenko, Ulrich Weigand.
https://reviews.llvm.org/D24451

llvm-svn: 282570
2016-09-28 09:41:38 +00:00
Adam Nemet
ec2292c80c [Inliner] Port all opt remarks to new streaming API
llvm-svn: 282559
2016-09-27 23:47:03 +00:00
Adam Nemet
c03a73efe2 Shorten DiagnosticInfoOptimizationRemark* to OptimizationRemark*. NFC
With the new streaming interface, these class names need to be typed a
lot and it's way too looong.

llvm-svn: 282544
2016-09-27 22:19:23 +00:00
Adam Nemet
e23cf79174 [Inliner] Fold the analysis remark into the missed remark
There is really no reason for these to be separate.

The vectorizer started this pretty bad tradition that the text of the
missed remarks is pretty meaningless, i.e. vectorization failed.  There,
you have to query analysis to get the full picture.

I think we should just explain the reason for missing the optimization
in the missed remark when possible.  Analysis remarks should provide
information that the pass gathers regardless whether the optimization is
passing or not.

llvm-svn: 282542
2016-09-27 21:58:17 +00:00
Michael Zolotukhin
38f796095c [LoopSimplify] When simplifying phis in loop-simplify, do it only if it preserves LCSSA form.
llvm-svn: 282541
2016-09-27 21:03:45 +00:00
Adam Nemet
f602aa8cdd Output optimization remarks in YAML
(Re-committed after moving the template specialization under the yaml
namespace.  GCC was complaining about this.)

This allows various presentation of this data using an external tool.
This was first recommended here[1].

As an example, consider this module:

  1 int foo();
  2 int bar();
  3
  4 int baz() {
  5   return foo() + bar();
  6 }

The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):

  remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
  remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)

Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:

  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: foo
    - String:  will not be inlined into
    - Caller: baz
  ...
  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: bar
    - String:  will not be inlined into
    - Caller: baz
  ...

This is a summary of the high-level decisions:

* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:

   ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                DEBUG_TYPE, "NotInlined", &I)
            << NV("Callee", Callee) << " will not be inlined into "
            << NV("Caller", CS.getCaller()) << setIsVerbose());

NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.

Subsequent patches will update ORE users to use the new streaming API.

* I am using YAML I/O for writing the YAML file.  YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types.  Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.

On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).

* The YAML stream is stored in the LLVM context.

* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".

* As before hotness is computed in the analysis pass instead of
DiganosticInfo.  This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.

[1] https://reviews.llvm.org/D19678#419445

Differential Revision: https://reviews.llvm.org/D24587

llvm-svn: 282539
2016-09-27 20:55:07 +00:00
Reid Kleckner
dd0ddde1e5 [DebugInfo] Add comments to phi dbg.value tracking code, NFC
LLVM developers might be surprised to learn that there are blocks
without valid insertion points (catchswitch), so it seems worth calling
that out explicitly.  Also add a FIXME about what we should really be
doing if we ever need to make optimized Windows EH code debuggable.

While I'm here, make auto usage more consistent with LLVM standards and
avoid an unecessary call to insertBefore.

llvm-svn: 282521
2016-09-27 18:45:31 +00:00
Adam Nemet
5058aadaf2 Revert "Output optimization remarks in YAML"
This reverts commit r282499.

The GCC bots are failing

llvm-svn: 282503
2016-09-27 16:39:24 +00:00
Adam Nemet
b1d6f940c4 Output optimization remarks in YAML
This allows various presentation of this data using an external tool.
This was first recommended here[1].

As an example, consider this module:

  1 int foo();
  2 int bar();
  3
  4 int baz() {
  5   return foo() + bar();
  6 }

The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):

  remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
  remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)

Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:

  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: foo
    - String:  will not be inlined into
    - Caller: baz
  ...
  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: bar
    - String:  will not be inlined into
    - Caller: baz
  ...

This is a summary of the high-level decisions:

* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:

   ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                DEBUG_TYPE, "NotInlined", &I)
            << NV("Callee", Callee) << " will not be inlined into "
            << NV("Caller", CS.getCaller()) << setIsVerbose());

NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.

Subsequent patches will update ORE users to use the new streaming API.

* I am using YAML I/O for writing the YAML file.  YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types.  Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.

On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).

* The YAML stream is stored in the LLVM context.

* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".

* As before hotness is computed in the analysis pass instead of
DiganosticInfo.  This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.

[1] https://reviews.llvm.org/D19678#419445

Differential Revision: https://reviews.llvm.org/D24587

llvm-svn: 282499
2016-09-27 16:15:16 +00:00
Kostya Serebryany
3271dd3f6e [sanitizer-coverage] fix a bug in trace-gep
llvm-svn: 282467
2016-09-27 01:55:08 +00:00
Kostya Serebryany
6db39b63fe [sanitizer-coverage] don't emit the CTOR function if nothing has been instrumented
llvm-svn: 282465
2016-09-27 01:08:33 +00:00
Ivan Krasin
920c8236df Revert r277556. Add -lowertypetests-bitsets-level to control bitsets generation
Summary:
We don't currently need this facility for CFI. Disabling individual hot methods proved
to be a better strategy in Chrome.

Also, the design of the feature is suboptimal, as pointed out by Peter Collingbourne.

Reviewers: pcc

Subscribers: kcc

Differential Revision: https://reviews.llvm.org/D24948

llvm-svn: 282461
2016-09-27 00:29:53 +00:00
Peter Collingbourne
a29a10b3bd LowerTypeTests: Remove unused variable.
llvm-svn: 282456
2016-09-26 23:56:17 +00:00
Peter Collingbourne
de758baa03 LowerTypeTests: Create LowerTypeTestsModule class and move implementation there. Related simplifications.
llvm-svn: 282455
2016-09-26 23:54:39 +00:00
Piotr Padlewski
3152e057e1 [thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.

I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.

I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.

Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.

Reviewers: eraman, mehdi_amini, tejohnson

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24638

llvm-svn: 282437
2016-09-26 20:37:32 +00:00
Daniel Berlin
1c6e9ba785 Remove pruning of phi nodes in MemorySSA - it makes updating harder
Reviewers: george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24923

llvm-svn: 282419
2016-09-26 17:22:54 +00:00
Matthew Simpson
894b941fee [LV] Scalarize instructions marked scalar after vectorization
This patch ensures that we actually scalarize instructions marked scalar after
vectorization. Previously, such instructions may have been vectorized instead.

Differential Revision: https://reviews.llvm.org/D23889

llvm-svn: 282418
2016-09-26 17:08:37 +00:00
Gor Nishanov
5e19e9e23d [Coroutines] Part14: Handle coroutines with no suspend points.
Summary:
If coroutine has no suspend points, remove heap allocation and turn a coroutine into a normal function.

Also, if a pattern is detected that coroutine resumes or destroys itself prior to coro.suspend call, turn the suspend point into a simple jump to resume or cleanup label. This pattern occurs when coroutines are used to propagate errors in functions that return expected<T>.

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24408

llvm-svn: 282414
2016-09-26 15:49:28 +00:00
Alexey Bataev
a554bff930 [InstCombine] Fixed bug introduced in r282237
The index of the new insertelement instruction was evaluated in the
wrong way, it was considered as the index of the inserted value instead
of index of the position, where the value should be inserted.

llvm-svn: 282401
2016-09-26 13:18:59 +00:00
Andrea Di Biagio
c32aed95bc [InstCombine] Teach the udiv folding logic how to handle constant expressions.
This patch fixes PR30366.

Function foldUDivShl() worked under the assumption that one of the values
in input to the function was always an instance of llvm::Instruction.
However, function visitUDivOperand() (the only user of foldUDivShl) was
clearly violating that precondition; internally, visitUDivOperand() uses pattern
matches to check the operands of a udiv. Pattern matchers for binary operators
know how to handle both Instruction and ConstantExpr values.

This patch fixes the problem in foldUDivShl(). Now we use pattern matchers
instead of explicit casts to Instruction. The reduced test case from PR30366
has been added to test file InstCombine/udiv-simplify.ll.

Differential Revision: https://reviews.llvm.org/D24565

llvm-svn: 282398
2016-09-26 12:07:23 +00:00
Duncan P. N. Exon Smith
315172f4ca ObjCARC: Don't look at users of ConstantData
Stop looking at users of UndefValue and ConstantPointerNull in the
objective C ARC optimizers.  The other users aren't actually
interesting, since they're not pointing at a particular object.  I
imagine these calls could be optimized through -instcombine... maybe
they already are?

These early returns will be required at some point in the future, with a
WIP patch that asserts when someone accesses a use-list on ConstantData.

llvm-svn: 282338
2016-09-24 21:01:20 +00:00
Duncan P. N. Exon Smith
1463b59de9 Scalar: Ignore ConstantData in processAssumption
Assumptions on UndefValue and ConstantPointerNull aren't relevant to
other users.  Ignore them entirely to avoid wasting cycles walking
through their (possibly extremely extensive (cross-module)) use-lists.

It wasn't clear how to add a specific test for this, and it'll be
covered anyway by an eventual patch that asserts when trying to access
the use-list of an instance of ConstantData.

llvm-svn: 282334
2016-09-24 20:00:38 +00:00
Duncan P. N. Exon Smith
f0fc8836d4 GlobalStatus: Don't walk use-lists of ConstantData
Return early from llvm::isSafeToDestroyConstant() whenever the value
`isa<ConstantData>()`.  These constants are shared across the
LLVMContext.  We never really want to delete them here, and walking
their use-lists can be very expensive.

(This is motivated by an eventual goal of removing use-lists entirely
from ConstantData.)

llvm-svn: 282320
2016-09-24 02:30:11 +00:00
Alexey Bataev
5e200a4d72 [InstCombine] Fix for PR29124: reduce insertelements to shufflevector
If inserting more than one constant into a vector:

define <4 x float> @foo(<4 x float> %x) {
  %ins1 = insertelement <4 x float> %x, float 1.0, i32 1
  %ins2 = insertelement <4 x float> %ins1, float 2.0, i32 2
  ret <4 x float> %ins2
}

InstCombine could reduce that to a shufflevector:

define <4 x float> @goo(<4 x float> %x) {
 %shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32><i32 0, i32 5, i32 6, i32 3>
 ret <4 x float> %shuf
}
Also, InstCombine tries to convert shuffle instruction to single insertelement, if one of the vectors is a constant vector and only a single element from this constant should be used in shuffle, i.e.
shufflevector <4 x float> %v, <4 x float> <float undef, float 1.0, float
undef, float undef>, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef> ->
insertelement <4 x float> %v, float 1.0, 1

Differential Revision: https://reviews.llvm.org/D24182

llvm-svn: 282237
2016-09-23 09:14:08 +00:00
Sanjay Patel
3fb981ce63 [InstCombine] fold X urem C -> X < C ? X : X - C when C is big (PR28672)
We already have the udiv variant of this transform, so I think this is ok for 
InstCombine too even though there is an increase in IR instructions. As the 
tests and TODO comments show, the transform can lead to follow-on combines.

This should fix: https://llvm.org/bugs/show_bug.cgi?id=28672

Differential Revision: https://reviews.llvm.org/D24527

llvm-svn: 282209
2016-09-22 22:36:26 +00:00
Hans Wennborg
397edc9070 Revert r282168 "GVN-hoist: fix store past load dependence analysis (PR30216)"
and also the dependent r282175 "GVN-hoist: do not dereference null pointers"

It's causing compiler crashes building Harfbuzz (PR30499).

llvm-svn: 282199
2016-09-22 21:20:53 +00:00
Sebastian Pop
2879352ee0 GVN-hoist: do not dereference null pointers
there may be basic blocks without memory accesses, in which case the
list of accesses is a null pointer.

llvm-svn: 282175
2016-09-22 17:22:58 +00:00
Sebastian Pop
0a87104c7e GVN-hoist: fix store past load dependence analysis (PR30216)
To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.

The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses the alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask for whether load and
store are clobbered: I'm not sure which function to call, so I
used a call to AA->isNoAlias().

Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.

Differential Revision: https://reviews.llvm.org/D24517

llvm-svn: 282168
2016-09-22 15:33:51 +00:00
Sebastian Pop
8b7ceb1130 GVN-hoist: fix typo
llvm-svn: 282165
2016-09-22 15:08:09 +00:00
Etienne Bergeron
f1efb9897d [compiler-rt] fix typo in option description [NFC]
llvm-svn: 282163
2016-09-22 14:57:24 +00:00
Sebastian Pop
628c0b289f GVN-hoist: only hoist relevant scalar instructions
Without this patch, GVN-hoist would think that a branch instruction is a scalar instruction
and would try to value number it. The patch filters out all such kind of irrelevant instructions.

A bit frustrating is that there is no easy way to discard all those very infrequent instructions,
a bit like isa<TerminatorInst> that stands for a large family of instructions. I'm thinking that
checking for those very infrequent other instructions would cost us more in compilation time
than just letting those instructions getting numbered, so I'm still thinking that a simpler check:

  if (isa<TerminatorInst>(I))
    return false;

is better than listing all the other less frequent instructions.

Differential Revision: https://reviews.llvm.org/D23929

llvm-svn: 282160
2016-09-22 14:45:40 +00:00
Keith Walker
e6fccf064e Reapplying r281895 (and follow-up r281964) after fixing pr30468.
The additional fix is:

When adding debug information to a lowered phi node in mem2reg
check that we have a valid insertion point after the phi for adding
the debug information.

This change addresses the issue in pr30468 where a lowered phi was
added before a catchswitch and no debug information should be added
after the phi in this case.

Differential Revision: https://reviews.llvm.org/D24797

llvm-svn: 282155
2016-09-22 14:13:25 +00:00
Anna Thomas
16cf546b7c [RS4GC] Remat in presence of phi and use live value
Summary:

Reviewers:

Subscribers:

llvm-svn: 282150
2016-09-22 13:13:06 +00:00
Sagar Thakur
b374c8619d [EfficiencySanitizer] Using '$' instead of '#' for struct counter name
For MIPS '#' is the start of comment line. Therefore we get assembler errors if # is used in the structure names.

Differential: D24334
Reviewed by: zhaoqin

llvm-svn: 282141
2016-09-22 08:33:06 +00:00
Dorit Nuzman
e1fa95c4f1 Fix revision 281960
llvm-svn: 282139
2016-09-22 07:56:23 +00:00
Chad Rosier
f9ec13f3d1 [LoopInterchange] Track all dependencies, not just anti dependencies.
Currently, we give up on loop interchange if we encounter a flow dependency
anywhere in the loop list. Worse yet, we don't even track output dependencies.

This patch updates the dependency matrix computation to track flow and output
dependencies in the same way we track anti dependencies.

This improves an internal workload by 2.2x.

Note the loop interchange pass is off by default and it can be enabled with
'-mllvm -enable-loopinterchange'

Differential Revision: https://reviews.llvm.org/D24564

llvm-svn: 282101
2016-09-21 19:16:47 +00:00
Nico Weber
0866d9b8d9 revert 281908 because 281909 got reverted
llvm-svn: 282097
2016-09-21 18:25:43 +00:00
Matthew Simpson
90550420f2 [LV] Don't emit unused scalars for uniform instructions
If we identify an instruction as uniform after vectorization, we know that we
should only use the value corresponding to the first vector lane of each unroll
iteration. However, when scalarizing such instructions, we still produce values
for the other vector lanes. This patch prevents us from generating the unused
scalars.

Differential Revision: https://reviews.llvm.org/D24275

llvm-svn: 282087
2016-09-21 16:50:24 +00:00
Dehao Chen
054ed42bc2 Change the basic block weight calculation algorithm to use max instead of voting.
Summary: Now that we have more precise debug info, we should change back to use maximum to get basic block weight.

Reviewers: dnovillo

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D24788

llvm-svn: 282084
2016-09-21 16:26:51 +00:00
Matthew Simpson
cf186bdb44 [LV] Rename "Width" to "Lane" (NFC)
llvm-svn: 282083
2016-09-21 16:09:23 +00:00
Hans Wennborg
0092ffdf91 Revert r281895 "Add @llvm.dbg.value entries for the phi node created by -mem2reg"
(And follow-up r281964.)

It caused PR30468.

llvm-svn: 282077
2016-09-21 15:55:53 +00:00
Arnold Schwaighofer
0f29bae334 DeadArgElim: Don't mark swifterror arguments as unused
Replacing swifterror arguments with undef creates invalid IR.

rdar://28300490

llvm-svn: 282075
2016-09-21 15:29:08 +00:00
Chad Rosier
c75a5ac4c9 [LoopInterchange] Various cleanup. NFC.
llvm-svn: 282071
2016-09-21 13:28:41 +00:00
Xinliang David Li
6f5cb0b776 code cleanup -- commoning IR travsersals
llvm-svn: 282034
2016-09-20 22:39:47 +00:00
Anna Thomas
70699ac5a1 [RS4GC] Refactor code for Rematerializing in presence of phi. NFC
Summary:
This is an NFC refactoring change as a precursor to the actual fix for rematerializing in
presence of phi.
https://reviews.llvm.org/D24399

Pasted from review:
findRematerializableChainToBasePointer changed to return the root of the
chain. instead of true or false.
move the PHI matching logic into the caller by inspecting the root return value.
This includes an assertion that the alternate root is in the liveset for the
call.

Tested with current RS4GC tests.

Reviewers: reames, sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24780

llvm-svn: 282023
2016-09-20 21:36:02 +00:00
Xinliang David Li
c3764cfede [Profile] Do not annotate select insts not covered in profile.
Fixed PR/30466

llvm-svn: 282009
2016-09-20 20:20:01 +00:00
Xinliang David Li
7c7f35dd63 [Profile] code refactoring: make getStep a method in base class
llvm-svn: 282002
2016-09-20 19:07:22 +00:00
Adrian Prantl
a598972487 ASAN: Don't drop debug info attachements for global variables.
This is a follow-up to r281284. Global Variables now can have
!dbg attachements, so ASAN should clone these when generating a
sanitized copy of a global variable.

<rdar://problem/24899262>

llvm-svn: 281994
2016-09-20 18:28:42 +00:00
Keith Walker
02dae108bf Make llvm::ConvertDebugDeclareToDebugValue() be a void function (NFC)
The routines llvm::ConvertDebugDeclareToDebugValue() always returned
a true value which was never checked at the call site; change the
function return type to void.

This NFC cleanup was approved in the review https://reviews.llvm.org/D23715

llvm-svn: 281964
2016-09-20 10:36:17 +00:00
Dorit Nuzman
3f7d392c3a Reverting revision 281960 due to test failures.
llvm-svn: 281961
2016-09-20 08:27:48 +00:00
Dorit Nuzman
ed1f20cca4 [SROA] Preserve llvm.mem.parallel_loop_access metadata.
SROA doesn't preserve the llvm.mem.parallel_loop_access metadata when it
transforms loads/stores. This patch fixes a couple occurences of this
issue.

(Partially addresses PR28981).

Differential Revision: https://reviews.llvm.org/D23549

llvm-svn: 281960
2016-09-20 07:50:49 +00:00
Kostya Serebryany
977e6e2c48 [sanitizer-coverage] add comdat to coverage guards if needed
llvm-svn: 281952
2016-09-20 00:16:54 +00:00
Philip Reames
a858157e46 [LCSSA] Cache LoopExits to avoid wasted work
When looking at the scribus_1.3 example from https://llvm.org/bugs/show_bug.cgi?id=10584, I noticed that we were spending a large amount of time computing loop exits in LCSSA. This code appears to be written with the assumption that LoopExits are stored in the Loop and thus cheap to query. This is not true, so we should cache the result across the potentially long running loop which tends to visit a small handful of Loops.

On the particular example from 10584, this change drops the time spent in LCSSA computation by about 80%.

Differential Revision: https://reviews.llvm.org/D24509

llvm-svn: 281949
2016-09-19 23:30:23 +00:00
David Callahan
ac0cc8c0ba Merge branch 'ADCE5'
llvm-svn: 281947
2016-09-19 23:17:58 +00:00
Dehao Chen
d152981ec9 Handle early inline for hot callsites that reside in the same basic block.
Summary: Callsites in the same basic block should share the same hotness. This patch checks for the hottest callsite in the same basic block, and use the hotness for all callsites in that basic block for early inline decisions. It also fixes the test to add "-S" so theat the "CHECK-NOT" is actually checking the content.

Reviewers: dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24734

llvm-svn: 281927
2016-09-19 18:38:14 +00:00
Dehao Chen
4d399f0ac6 Only set branch weight during sample pgo annotation when max_weight of the branch is non-zero. Otherwise use default static profile to set branch probability.
Summary: It does not make sense to set equal weights for all unkown branches as we have static branch prediction available.

Reviewers: dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24732

llvm-svn: 281912
2016-09-19 16:33:41 +00:00
Dehao Chen
302fe385af Use call target count to derive the call instruction weight
Summary: The call target count profile is directly derived from LBR branch->target data. This is more reliable than instruction frequency profiles that could be moved across basic block boundaries. This patches uses call target count profile to annotate call instructions.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24410

llvm-svn: 281911
2016-09-19 16:06:37 +00:00
Etienne Bergeron
6822e52e5d [asan] Support dynamic shadow address instrumentation
Summary:
This patch is adding the support for a shadow memory with
dynamically allocated address range.

The compiler-rt needs to export a symbol containing the shadow
memory range.

This is required to support ASAN on windows 64-bits.

Reviewers: kcc, rnk, vitalybuka

Subscribers: kubabrecka, dberris, llvm-commits, chrisha

Differential Revision: https://reviews.llvm.org/D23354

llvm-svn: 281908
2016-09-19 15:58:38 +00:00
Keith Walker
64915fd385 Add @llvm.dbg.value entries for the phi node created by -mem2reg
When phi nodes are created in the -mem2reg phase, the @llvm.dbg.declare
entries are converted to @llvm.dbg.value entries at the place where the
store instructions existed. However no entry is created to describe
the resulting value of the phi node.

The effect of this is especially noticeable in for loops which have a
constant for the intial value; the loop control variable's location
would be described as the intial constant value in the loop body once
the -mem2reg optimization phase was run.

This change adds the creation of the @llvm.dbg.value entries to describe
variables whose location is the result of a phi node created in -mem2reg.

Also when the phi node is finally lowered to a machine instruction it
is important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.

Differential Revision: https://reviews.llvm.org/D23715

llvm-svn: 281895
2016-09-19 09:49:30 +00:00
James Molloy
1f17c2f83a [SimplifyCFG] Update (AND) IR flags when CSE'ing instructions
We were updating metadata but not IR flags. Because we pick an arbitrary instruction to be the CSE candidate, it comes down to luck (50% or less chance) if this results in broken codegen or not, which is why PR30373 which is actually not the fault of the commit it was bisected down to.

Fixes PR30373.

llvm-svn: 281889
2016-09-19 08:23:08 +00:00
Dehao Chen
0a9a8c9c72 Handle Invoke during sample profiler annotation: make it inlinable.
Summary: Previously we reline on inst-combine to remove inlinable invoke instructions. This causes trouble because a few extra optimizations are schedule early that could introduce too much CFG change (e.g. simplifycfg removes too much control flow). This patch handles invoke instruction in-place during sample profile annotation, so that we do not rely on instcombine to remove those invoke instructions.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24409

llvm-svn: 281870
2016-09-18 23:11:37 +00:00
Simon Pilgrim
28ff0c8578 Fix covered-switch-default warning
llvm-svn: 281865
2016-09-18 21:08:35 +00:00
Xinliang David Li
ca3730b14d Fix built bot failure
llvm-svn: 281859
2016-09-18 18:52:08 +00:00
Xinliang David Li
3da3a5fee4 [Profile] Implement select instruction instrumentation in IR PGO
Differential Revision: http://reviews.llvm.org/D23727

llvm-svn: 281858
2016-09-18 18:34:07 +00:00
Elena Demikhovsky
fdf2d14b30 [Loop Vectorizer] Consecutive memory access - fixed and simplified
Amended consecutive memory access detection in Loop Vectorizer.
Load/Store were not handled properly without preceding GEP instruction.

Differential Revision: https://reviews.llvm.org/D20789

llvm-svn: 281853
2016-09-18 13:56:08 +00:00
Elena Demikhovsky
6f58841f3b [Loop vectorizer] Simplified GEP cloning. NFC.
Simplified GEP cloning in vectorizeMemoryInstruction().
Added an assertion that checks consecutive GEP, which should have only one loop-variant operand.

Differential Revision: https://reviews.llvm.org/D24557

llvm-svn: 281851
2016-09-18 09:22:54 +00:00
Kostya Serebryany
ad93add26c [libFuzzer] use 'if guard' instead of 'if guard >= 0' with trace-pc; change the guard type to intptr_t; use separate array for 8-bit counters
llvm-svn: 281845
2016-09-18 04:52:23 +00:00
Teresa Johnson
8d9ed1ced3 [ThinLTO] Ensure anonymous globals renamed even at -O0
Summary:
This fixes an issue when files are compiled with -flto=thin
at default -O0. We need to rename anonymous globals before attempting
to write the module summary because all values need names for
the summary. This was happening at -O1 and above, but not before
the early exit when constructing the pipeline for -O0.

Also add an internal -prepare-for-thinlto option to enable this
to be tested via opt.

Fixes PR30419.

Reviewers: mehdi_amini

Subscribers: probinson, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24701

llvm-svn: 281840
2016-09-17 20:40:16 +00:00
Mehdi Amini
3efb00834f Don't create a SymbolTable in Function when the LLVMContext discards value names (NFC)
The ValueSymbolTable is used to detect name conflict and rename
instructions automatically. This is not needed when the value
names are automatically discarded by the LLVMContext.
No functional change intended, just saving a little bit of memory.

This is a recommit of r281806 after fixing the accessor to return
a pointer instead of a reference and updating all the call-sites.

llvm-svn: 281813
2016-09-17 06:00:02 +00:00
Kostya Serebryany
6f62f4753a [sanitizer-coverage] change trace-pc to use 8-byte guards
llvm-svn: 281809
2016-09-17 05:03:05 +00:00