1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 19:52:54 +01:00
Commit Graph

25865 Commits

Author SHA1 Message Date
Evgeniy Brevnov
6aaf92e546 [NARY-REASSOCIATE] Simplify traversal logic by post deleting dead instructions
Currently we delete optimized instructions as we go. That has several negative consequences. First it complicates traversal logic itself. Second if newly generated instruction has been deleted the traversal is repeated from scratch.

But real motivation for the change is upcoming change with support for min/max reassociation. Here we employ SCEV expander to generate code. As a result newly generated instructions may be inserted not right before original instruction (because SCEV may do hoisting) and there is no way to know 'next' instruction.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88285
2020-12-04 16:17:50 +07:00
Kazu Hirata
9e95dea61d [JumpThreading] Call eraseBlock when folding a conditional branch
This patch teaches the jump threading pass to call BPI->eraseBlock
when it folds a conditional branch.

Without this patch, BranchProbabilityInfo could end up with stale edge
probabilities for the basic block containing the conditional branch --
one edge probability with less than 1.0 and the other for a removed
edge.

Differential Revision: https://reviews.llvm.org/D92608
2020-12-03 23:50:17 -08:00
Max Kazantsev
6a8ff5d40d Return "[IndVars] ICmpInst should not prevent IV widening"
This reverts commit 4bd35cdc3ae1874c6d070c5d410b3f591de54ee6.

The patch was reverted during the investigation. The investigation
shown that the patch did not cause any trouble, but just exposed
the existing problem that is addressed by the previous patch
"[IndVars] Quick fix LHS/RHS bug". Returning without changes.
2020-12-04 12:34:43 +07:00
Max Kazantsev
da72560ea4 [IndVars] Quick fix LHS/RHS bug
The code relies on fact that LHS is the NarrowDef but never
really checks it. Adding the conservative restrictive check,
will follow-up with handling of case where RHS is a NarrowDef.
2020-12-04 12:34:42 +07:00
Jianzhou Zhao
37af161d61 [dfsan] Support passing non-i16 shadow values in TLS mode
This is a child diff of D92261.

It extended TLS arg/ret to work with aggregate types.

For a function
  t foo(t1 a1, t2 a2, ... tn an)
Its arguments shadow are saved in TLS args like
  a1_s, a2_s, ..., an_s
TLS ret simply includes r_s. By calculating the type size of each shadow
value, we can get their offset.

This is similar to what MSan does. See __msan_retval_tls and __msan_param_tls
from llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp.

Note that this change does not add test cases for overflowed TLS
arg/ret because this is hard to test w/o supporting aggregate shdow
types. We will be adding them after supporting that.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92440
2020-12-04 02:45:07 +00:00
Philip Reames
17055bcff3 [LoopVec] Support non-instructions as argument to uniform mem ops
The initial step of the uniform-after-vectorization (lane-0 demanded only) analysis was very awkwardly written. It would revisit use list of each pointer operand of a widened load/store. As a result, it was in the worst case O(N^2) where N was the number of instructions in a loop, and had restricted operand Value types to reduce the size of use lists.

This patch replaces the original algorithm with one which is at most O(2N) in the number of instructions in the loop. (The key observation is that each use of a potentially interesting pointer is visited at most twice, once on first scan, once in the use list of *it's* operand. Only instructions within the loop have their uses scanned.)

In the process, we remove a restriction which required the operand of the uniform mem op to itself be an instruction.  This allows detection of uniform mem ops involving global addresses.

Differential Revision: https://reviews.llvm.org/D92056
2020-12-03 14:51:44 -08:00
dfukalov
b944ac9e0a [NFC] Reduce include files dependency.
1. Removed #include "...AliasAnalysis.h" in other headers and modules.
2. Cleaned up includes in AliasAnalysis.h.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92489
2020-12-03 18:25:05 +03:00
Max Kazantsev
78b51b6b79 Revert "[IndVars] ICmpInst should not prevent IV widening"
This reverts commit 0c9c6ddf17bb01ae350a899b3395bb078aa0c62e.

We are seeing some failures with this patch locally. Not clear
if it's causing them or just triggering a problem in another
place. Reverting while investigating.
2020-12-03 18:01:41 +07:00
modimo
5b1e62daa4 [NFC] Fix typo 2020-12-02 22:23:57 -08:00
Jianzhou Zhao
cb92e3d61f [dfsan] Rename ShadowTy/ZeroShadow with prefix Primitive
This is a child diff of D92261.

After supporting field/index-level shadow, the existing shadow with type
i16 works for only primitive types.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92459
2020-12-03 05:31:01 +00:00
Florian Hahn
fad4c5768d [ConstraintElimination] Make sure arguments of std:pow match.
This should fix a build failure on some systems, e.g. solaris11-sparcv9
http://lab.llvm.org:8014/#/builders/22
2020-12-02 22:23:26 +00:00
Hongtao Yu
4cefe8e200 [CSSPGO] Pseudo probes for function calls.
An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work.

One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place , leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution.

With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a calliste probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst.

To avoid collision with the baseline AutoFDO in various places that handles dwarf discriminators where a check against  the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reversed for pseudo probe use.

Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`.

Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91756
2020-12-02 13:45:20 -08:00
Jianzhou Zhao
1191fc6062 [dfsan] Rename CachedCombinedShadow to be CachedShadow
At D92261, this type will be used to cache both combined shadow and
converted shadow values.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92458
2020-12-02 21:39:16 +00:00
jasonliu
60c6f78bef [XCOFF][AIX] Generate LSDA data and compact unwind section on AIX
Summary:
AIX uses the existing EH infrastructure in clang and llvm.
The major differences would be
1. AIX do not have CFI instructions.
2. AIX uses a new personality routine, named __xlcxx_personality_v1.
   It doesn't use the GCC personality rountine, because the
   interoperability is not there yet on AIX.
3. AIX do not use eh_frame sections. Instead, it would use a eh_info
section (compat unwind section) to store the information about
personality routine and LSDA data address.

Reviewed By: daltenty, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D91455
2020-12-02 18:42:44 +00:00
Bardia Mahjour
fbc2c5ae27 [LV] Epilogue Vectorization with Optimal Control Flow (Recommit)
This is yet another attempt at providing support for epilogue
vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none
and reviews D30247 and D88819.

Similar to D88819, this patch achieve epilogue vectorization by
executing a single vplan twice: once on the main loop and a second
time on the epilogue loop (using a different VF). However it's able
to handle more loops, and generates more optimal control flow for
cases where the trip count is too small to execute any code in vector
form.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D89566
2020-12-02 10:09:56 -05:00
Sanjay Patel
1b9dd18234 [SLP] use 'match' for binop/select; NFC
This might be a small improvement in readability, but the
real motivation is to make it easier to adapt the code to
deal with intrinsics like 'maxnum' and/or integer min/max.

There is potentially help in doing that with D92086, but
we might also just add specialized wrappers here to deal
with the expected patterns.
2020-12-02 09:04:08 -05:00
Alex Zinenko
0085eeb3aa [OpenMPIRBuilder] forward arguments as pointers to outlined function
OpenMPIRBuilder::createParallel outlines the body region of the parallel
construct into a new function that accepts any value previously defined outside
the region as a function argument. This function is called back by OpenMP
runtime function __kmpc_fork_call, which expects trailing arguments to be
pointers. If the region uses a value that is not of a pointer type, e.g. a
struct, the produced code would be invalid. In such cases, make createParallel
emit IR that stores the value on stack and pass the pointer to the outlined
function instead. The outlined function then loads the value back and uses as
normal.

Reviewed By: jdoerfert, llitchev

Differential Revision: https://reviews.llvm.org/D92189
2020-12-02 14:59:41 +01:00
David Sherwood
6d7c7dcc2b [SVE] Add support for scalable vectors with vectorize.scalable.enable loop attribute
In this patch I have added support for a new loop hint called
vectorize.scalable.enable that says whether we should enable scalable
vectorization or not. If a user wants to instruct the compiler to
vectorize a loop with scalable vectors they can now do this as
follows:

  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2
  ...
  !2 = !{!2, !3, !4}
  !3 = !{!"llvm.loop.vectorize.width", i32 8}
  !4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

Setting the hint to false simply reverts the behaviour back to the
default, using fixed width vectors.

Differential Revision: https://reviews.llvm.org/D88962
2020-12-02 13:23:43 +00:00
Chen Zheng
15f3086fcf [LSR][NFC] don't collect chains when isNumRegsMajorCostOfLSR is false.
Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D92159
2020-12-01 22:29:33 -05:00
Jianzhou Zhao
063bca0cca [msan] Replace 8 by kShadowTLSAlignment
Reviewed-by: eugenis

Differential Revision: https://reviews.llvm.org/D92275
2020-12-02 01:09:49 +00:00
Fangrui Song
3b69235500 static const char *const foo => const char foo[]
By default, a non-template variable of non-volatile const-qualified type
having namespace-scope has internal linkage, so no need for `static`.
2020-12-01 10:33:18 -08:00
Bardia Mahjour
b7eee47753 Revert "[LV] Epilogue Vectorization with Optimal Control Flow"
This reverts commit 9c5504adceb544d9954ddb8ff3035a414f4b1423.
Reverting to investigate build failure in http://lab.llvm.org:8011/#/builders/98/builds/1461/steps/9
2020-12-01 12:50:36 -05:00
Bardia Mahjour
63b138b338 [LV] Epilogue Vectorization with Optimal Control Flow
This is yet another attempt at providing support for epilogue
vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none
and reviews D30247 and D88819.

Similar to D88819, this patch achieve epilogue vectorization by
executing a single vplan twice: once on the main loop and a second
time on the epilogue loop (using a different VF). However it's able
to handle more loops, and generates more optimal control flow for
cases where the trip count is too small to execute any code in vector
form.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D89566
2020-12-01 12:04:29 -05:00
Nikita Popov
4955e186d9 [MemCpyOpt] Port to MemorySSA
This is a straightforward port of MemCpyOpt to MemorySSA following
the approach of D26739. MemDep queries are replaced with MSSA queries
without changing the overall structure of the pass. Some care has
to be taken to account for differences between these APIs
(MemDep also returns reads, MSSA doesn't).

Differential Revision: https://reviews.llvm.org/D89207
2020-12-01 17:57:41 +01:00
Clement Courbet
c96423a30c [MergeICmps] Fix missing split.
We were not correctly splitting a blocks for chains of length 1.

Before that change, additional instructions for blocks in chains of
length 1 were not split off from the block before removing (this was
done correctly for chains of longer size).
If this first block contained an instruction referenced elsewhere,
deleting the block, would result in invalidation of the produced value.

This caused a miscompile which motivated D92297 (before D17993,
nonnull and dereferenceable attributed were not added so MergeICmps were
not triggered.) The new test gep-references-bb.ll demonstrate the issue.

The regression was introduced in
rG0efadbbcdeb82f5c14f38fbc2826107063ca48b2.

This supersedes D92364.

Test case by MaskRay (Fangrui Song).

Differential Revision: https://reviews.llvm.org/D92375
2020-12-01 16:50:55 +01:00
Sanjay Patel
b8c83b9595 [InstCombine] canonicalize sign-bit-shift of difference to ext(icmp)
icmp is the preferred spelling in IR because icmp analysis is
expected to be better than any other analysis. This should
lead to more follow-on folding potential.

It's difficult to say exactly what we should do in codegen to
compensate. For example on AArch64, which of these is preferred:
	sub	w8, w0, w1
	lsr	w0, w8, #31

vs:
	cmp	w0, w1
	cset	w0, lt

If there are perf regressions, then we should deal with those in
codegen on a case-by-case basis.

A possible motivating example for better optimization is shown in:
https://llvm.org/PR43198 but that will require other transforms
before anything changes there.

Alive proof:
https://rise4fun.com/Alive/o4E

  Name: sign-bit splat
  Pre: C1 == (width(%x) - 1)
  %s = sub nsw %x, %y
  %r = ashr %s, C1
  =>
  %c = icmp slt %x, %y
  %r = sext %c

  Name: sign-bit LSB
  Pre: C1 == (width(%x) - 1)
  %s = sub nsw %x, %y
  %r = lshr %s, C1
  =>
  %c = icmp slt %x, %y
  %r = zext %c
2020-12-01 09:58:11 -05:00
Florian Hahn
758cff690d [ConstraintElimination] Decompose GEP %ptr, ZEXT(SHL()).
Add support to decompose a GEP with a ZEXT(SHL()) operand.
2020-12-01 14:23:21 +00:00
Bhramar Vatsa
2a0264f0ea [InstCombine] Optimize away the unnecessary multi-use sign-extend
C.f. https://bugs.llvm.org/show_bug.cgi?id=47765

Added a case for handling the sign-extend (Shl+AShr) for multiple uses,
to optimize it away for an individual use,
when the demanded bits aren't affected by sign-extend.

https://rise4fun.com/Alive/lgf

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D91343
2020-12-01 16:54:00 +03:00
Roman Lebedev
b2bda36752 [InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold, 2
If the shift amount was undef for some lane, the shift amount in opposite
shift is irrelevant for that lane, and the new shift amount for that lane
can be undef.
2020-12-01 16:54:00 +03:00
Roman Lebedev
bd4ad50d2c Revert "[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold"
It seems i have missed checklines, temporairly reverting,
will reland momentairly..

This reverts commit aa1aa135097ecfab6d9917a435142030eff0a226.
2020-12-01 15:47:04 +03:00
Roman Lebedev
0e49bd7b55 [InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold
If the shift amount was undef for some lane, the shift amount in opposite
shift is irrelevant for that lane, and the new shift amount for that lane
can be undef.
2020-12-01 15:13:08 +03:00
Roman Lebedev
28c20628bd [InstCombine] Evaluate new shift amount for sext(ashr(shl(trunc()))) fold in wide type (PR48343)
It is not correct to compute that new shift amount in it's narrow type
and only then extend it into the wide type:

----------------------------------------
Optimization: PR48343 good
Precondition: (width(%X) == width(%r))
  %o0 = trunc %X
  %o1 = shl %o0, %Y
  %o2 = ashr %o1, %Y
  %r = sext %o2
=>
  %n0 = sext %Y
  %n1 = sub width(%o0), %n0
  %n2 = sub width(%X), %n1
  %n3 = shl %X, %n2
  %r = ashr %n3, %n2

Done: 2016
Optimization is correct!

----------------------------------------
Optimization: PR48343 bad
Precondition: (width(%X) == width(%r))
  %o0 = trunc %X
  %o1 = shl %o0, %Y
  %o2 = ashr %o1, %Y
  %r = sext %o2
=>
  %n0 = sub width(%o0), %Y
  %n1 = sub width(%X), %n0
  %n2 = sext %n1
  %n3 = shl %X, %n2
  %r = ashr %n3, %n2

Done: 1
ERROR: Domain of definedness of Target is smaller than Source's for i9 %r

Example:
%X i9 = 0x000 (0)
%Y i4 = 0x3 (3)
%o0 i4 = 0x0 (0)
%o1 i4 = 0x0 (0)
%o2 i4 = 0x0 (0)
%n0 i4 = 0x1 (1)
%n1 i4 = 0x8 (8, -8)
%n2 i9 = 0x1F8 (504, -8)
%n3 i9 = 0x000 (0)
Source value: 0x000 (0)
Target value: undef


I.e. we should be computing it in the wide type from the beginning.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48343
2020-12-01 15:13:07 +03:00
Roman Lebedev
9cb07703fc [SimplifyCFG] FoldBranchToCommonDest: don't require that cmp of br is last instruction
There is no correctness need for that, and since we allow live-out
uses, this could theoretically happen, because currently nothing
will move the cond to right before the branch in those tests.
But regardless, lifting that restriction even makes the transform
easier to understand.

This makes the transform happen in 81 more cases (+0.55%)
)
2020-12-01 15:13:06 +03:00
Cullen Rhodes
c640adbe73 [LV] Clamp VF hint when unsafe
In the following loop the dependence distance is 2 and can only be
vectorized if the vector length is no larger than this.

  void foo(int *a, int *b, int N) {
    #pragma clang loop vectorize(enable) vectorize_width(4)
    for (int i=0; i<N; ++i) {
      a[i + 2] = a[i] + b[i];
    }
  }

However, when specifying a VF of 4 via a loop hint this loop is
vectorized. According to [1][2], loop hints are ignored if the
optimization is not safe to apply.

This patch introduces a check to bail of vectorization if the user
specified VF is greater than the maximum feasible VF, unless explicitly
forced with '-force-vector-width=X'.

[1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave
[2] https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations

Reviewed By: sdesmalen, fhahn, Meinersbur

Differential Revision: https://reviews.llvm.org/D90687
2020-12-01 11:30:34 +00:00
Caroline Concatto
319a0490a1 [NFC][CostModel]Extend class IntrinsicCostAttributes to use ElementCount Type
This patch replaces the attribute  `unsigned VF`  in the class
IntrinsicCostAttributes by `ElementCount VF`.
This is a non-functional change to help upcoming patches to compute the cost
model for scalable vector inside this class.

Differential Revision: https://reviews.llvm.org/D91532
2020-12-01 11:12:51 +00:00
Florian Hahn
5168f3f070 [ConstraintElimination] Decompose GEP %ptr, SHL().
Add support the decompose a GEP with an SHL operand.
2020-12-01 10:58:36 +00:00
Sjoerd Meijer
c00b31be29 ExtractValue instruction costs
Instruction ExtractValue wasn't handled in
LoopVectorizationCostModel::getInstructionCost(). As a result, it was modeled
as a mul which is not really accurate. Since it is free (most of the times),
this now gets a cost of 0 using getInstructionCost.

This is a follow-up of D92208, that required changing this regression test.
In a follow up I will look at InsertValue which also isn't handled yet.

Differential Revision: https://reviews.llvm.org/D92317
2020-12-01 10:42:23 +00:00
Greg Parker
f76f6a2acc [DSE] Remove a redundant call to getLocForWriteEx()
Differential Revision: https://reviews.llvm.org/D92263
2020-11-30 21:12:24 -08:00
Mircea Trofin
6fefe6148b [llvm][inliner] Reuse the inliner pass to implement 'always inliner'
Enable performing mandatory inlinings upfront, by reusing the same logic
as the full inliner, instead of the AlwaysInliner. This has the
following benefits:
- reduce code duplication - one inliner codebase
- open the opportunity to help the full inliner by performing additional
function passes after the mandatory inlinings, but before th full
inliner. Performing the mandatory inlinings first simplifies the problem
the full inliner needs to solve: less call sites, more contextualization, and,
depending on the additional function optimization passes run between the
2 inliners, higher accuracy of cost models / decision policies.

Note that this patch does not yet enable much in terms of post-always
inline function optimization.

Differential Revision: https://reviews.llvm.org/D91567
2020-11-30 12:03:39 -08:00
Hongtao Yu
9aa70f5aa6 [CSSPGO] Pseudo probe instrumentation pass
This change introduces a pseudo probe instrumentation pass for block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each llvm.pseudoprobe intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86499
2020-11-30 10:16:54 -08:00
Florian Hahn
816af73eeb [VPlan] Use VPUser to manage VPPredInstPHIRecipe operand (NFC).
VPPredInstPHIRecipe is one of the recipes that was missed during the
initial conversion. This patch adjusts the recipe to also manage its
operand using VPUser.
2020-11-30 13:09:58 +00:00
Roman Lebedev
1fcba19f69 [NFC][SimplifyCFG] Add STATISTIC() to the FoldValueComparisonIntoPredecessors() fold 2020-11-30 12:27:16 +03:00
Max Kazantsev
baf1a4d1ab [IndVars] ICmpInst should not prevent IV widening
If we decided to widen IV with zext, then unsigned comparisons
should not prevent widening (same for sext/sign comparisons).
The result of comparison in wider type does not change in this case.

Differential Revision: https://reviews.llvm.org/D92207
Reviewed By: nikic
2020-11-30 10:51:31 +07:00
Fangrui Song
3d01add6b1 [VPlan] Fix -Wunused-variable after a813090072c0527eb6ed51dd2ea4f54cb6bc72a0 2020-11-29 10:38:01 -08:00
Florian Hahn
ffd8b60920 [VPlan] Use VPValue and VPUser ops to print VPReplicateRecipe. 2020-11-29 18:28:27 +00:00
Florian Hahn
10fe977fe3 [VPlan] Manage stored values of interleave groups using VPUser (NFC)
Interleave groups also depend on the values they store. Manage the
stored values as VPUser operands. This is currently a NFC, but is
required to allow VPlan transforms and to manage generated vector values
exclusively in VPTransformState.
2020-11-29 17:24:36 +00:00
Andrew Litteken
29b0588fca Revert "[IRSim][IROutliner] Adding the extraction basics for the IROutliner."
Reverting commit due to address sanitizer errors.

> Extracting the similar regions is the first step in the IROutliner.
> 
> Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
> sort them by how many instructions will be removed.  Each
> IRSimilarityCandidate is used to define an OutlinableRegion.  Each
> region is ordered by their occurrence in the Module and the regions that
> are not compatible with previously outlined regions are discarded.
> 
> Each region is then extracted with the CodeExtractor into its own
> function.
> 
> We test that correctly extract in:
> test/Transforms/IROutliner/extraction.ll
> test/Transforms/IROutliner/address-taken.ll
> test/Transforms/IROutliner/outlining-same-globals.ll
> test/Transforms/IROutliner/outlining-same-constants.ll
> test/Transforms/IROutliner/outlining-different-structure.ll
> 
> Reviewers: paquette, jroelofs, yroux
> 
> Differential Revision: https://reviews.llvm.org/D86975

This reverts commit bf899e891387d07dfd12de195ce2a16f62afd5e0.
2020-11-27 19:55:57 -06:00
Andrew Litteken
4300f34d01 [IRSim][IROutliner] Adding the extraction basics for the IROutliner.
Extracting the similar regions is the first step in the IROutliner.

Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed.  Each
IRSimilarityCandidate is used to define an OutlinableRegion.  Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.

Each region is then extracted with the CodeExtractor into its own
function.

We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Reviewers: paquette, jroelofs, yroux

Differential Revision: https://reviews.llvm.org/D86975
2020-11-27 19:08:29 -06:00
Florian Hahn
f8bb3cf583 [VPlan] Use VPTransformState::set in widenGEP.
This patch updates widenGEP to manage the resulting vector values using
the VPValue of VPWidenGEP recipe.
2020-11-27 17:01:55 +00:00
Francesco Petrogalli
4a2f3f7420 [AllocaInst] Update getAllocationSizeInBits to return TypeSize.
Reviewed By: peterwaller-arm, sdesmalen

Differential Revision: https://reviews.llvm.org/D92020
2020-11-27 16:39:10 +00:00