1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 11:42:57 +01:00
Commit Graph

25902 Commits

Author SHA1 Message Date
Jianzhou Zhao
38add4d4b7 [dfsan] Track field/index-level shadow values in variables
*************
* The problem
*************
See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current
DFSan always uses a 16bit shadow value for a variable with any type by
combining all shadow values of all bytes of the variable. So it cannot
distinguish two fields of a struct: each field's shadow value equals the
combined shadow value of all fields. This introduces an overtaint issue.

Consider a parsing function

   std::pair<char*, int> get_token(char* p);

where p points to a buffer to parse, the returned pair includes the next
token and the pointer to the position in the buffer after the token.

If the token is tainted, then both the returned pointer and int ar
tainted. If the parser keeps on using get_token for the rest parsing,
all the following outputs are tainted because of the tainted pointer.

The CL is the first change to address the issue.

**************************
* The proposed improvement
**************************
Eventually all fields and indices have their own shadow values in
variables and memory.

For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8},
[2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16],
{[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with
primary type still have shadow values i16.

***************************
* An potential implementation plan
***************************

The idea is to adopt the change incrementially.

1) This CL
Support field-level accuracy at variables/args/ret in TLS mode,
load/store/alloca still use combined shadow values.

After the alloca promotion and SSA construction phases (>=-O1), we
assume alloca and memory operations are reduced. So if struct
variables do not relate to memory, their tracking is accurate at
field level.

2) Support field-level accuracy at alloca
3) Support field-level accuracy at load/store

These two should make O0 and real memory access work.

4) Support vector if necessary.
5) Support Args mode if necessary.
6) Support passing more accurate shadow values via custom functions if
necessary.

***************
* About this CL.
***************
The CL did the following

1) extended TLS arg/ret to work with aggregate types. This is similar
to what MSan does.

2) implemented how to map between an original type/value/zero-const to
its shadow type/value/zero-const.

3) extended (insert|extract)value to use field/index-level progagation.

4) for other instructions, propagation rules are combining inputs by or.
The CL converts between aggragate and primary shadow values at the
cases.

5) Custom function interfaces also need such a conversion because
all existing custom functions use i16. It is unclear whether custome
functions need more accurate shadow propagation yet.

6) Added test cases for aggregate type related cases.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92261
2020-12-09 19:38:35 +00:00
Sanjay Patel
d6e6ff92ec [VectorCombine] allow peeking through an extractelt when creating a vector load
This is an enhancement to load vectorization that is motivated by
a pattern in https://llvm.org/PR16739.
Unfortunately, it's still not enough to make a difference there.
We will have to handle multi-use cases in some better way to avoid
creating multiple overlapping loads.

Differential Revision: https://reviews.llvm.org/D92858
2020-12-09 10:36:14 -05:00
Roman Lebedev
ae9bbdd2bb [InstCombine] canonicalizeSaturatedAdd(): last fold is only valid for strict comparison (PR48390)
We could create uadd.sat under incorrect circumstances
if a select with -1 as the false value was canonicalized
by swapping the T/F values. Unlike the other transforms
in the same function, it is not invariant to equality.

Some alive proofs: https://alive2.llvm.org/ce/z/emmKKL

Based on original patch by David Green!

Fixes https://bugs.llvm.org/show_bug.cgi?id=48390

Differential Revision: https://reviews.llvm.org/D92717
2020-12-09 18:19:09 +03:00
Anton Afanasyev
3423fa8d78 [SLP] Use the width of value truncated just before storing
For stores chain vectorization we choose the size of vector
elements to ensure we fit to minimum and maximum vector register
size for the number of elements given. This patch corrects vector
element size choosing the width of value truncated just before
storing instead of the width of value stored.

Fixes PR46983

Differential Revision: https://reviews.llvm.org/D92824
2020-12-09 16:38:45 +03:00
Sander de Smalen
ca602d5346 [LoopVectorizer][SVE] Vectorize a simple loop with with a scalable VF.
* Steps are scaled by `vscale`, a runtime value.
* Changes to circumvent the cost-model for now (temporary)
  so that the cost-model can be implemented separately.

This can vectorize the following loop [1]:

   void loop(int N, double *a, double *b) {
     #pragma clang loop vectorize_width(4, scalable)
     for (int i = 0; i < N; i++) {
       a[i] = b[i] + 1.0;
     }
   }

[1] This source-level example is based on the pragma proposed
separately in D89031. This patch only implements the LLVM part.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91077
2020-12-09 11:25:21 +00:00
Sander de Smalen
05c67c93f8 [LoopVectorizer] NFC: Remove unnecessary asserts that VF cannot be scalable.
This patch removes a number of asserts that VF is not scalable, even though
the code where this assert lives does nothing that prevents VF being scalable.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91060
2020-12-09 11:25:21 +00:00
Joe Ellis
eb525ef991 [SelectionDAG] Add llvm.vector.{extract,insert} intrinsics
This commit adds two new intrinsics.

- llvm.experimental.vector.insert: used to insert a vector into another
  vector starting at a given index.

- llvm.experimental.vector.extract: used to extract a subvector from a
  larger vector starting from a given index.

The codegen work for these intrinsics has already been completed; this
commit is simply exposing the existing ISD nodes to LLVM IR.

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D91362
2020-12-09 11:08:41 +00:00
Philip Reames
88c99ea84d [indvars] Common a bit of code [NFC] 2020-12-08 15:25:48 -08:00
Anna Thomas
ea0d3b5f95 [ScalarizeMaskedMemIntrin] Add new PM support
This patch adds new PM support for the pass and the pass can be now used
during middle-end transforms. The old pass is remamed to
ScalarizeMaskedMemIntrinLegacyPass.

Reviewed-By: skatkov, aeubanks
Differential Revision: https://reviews.llvm.org/D92743
2020-12-08 17:15:22 -05:00
Benjamin Kramer
b7301cf589 Move createScalarizeMaskedMemIntrinPass to Scalar.h 2020-12-08 19:08:09 +01:00
Benjamin Kramer
2418d4c8bd Remove unused include. NFC.
This is also a layering violation.
2020-12-08 19:03:56 +01:00
Anna Thomas
86f8a623c5 [ScalarizeMaskedMemIntrinsic] Move from CodeGen into Transforms
ScalarizeMaskedMemIntrinsic is currently a codeGen level pass. The pass
is actually operating on IR level and does not use any code gen specific
passes.  It is useful to move it into transforms directory so that it
can be more widely used as a mid-level transform as well (apart from
usage in codegen pipeline).
In particular, we have a usecase downstream where we would like to use
this pass in our mid-level pipeline which operates on IR level.

The next change will be to add support for new PM.

Reviewers: craig.topper, apilipenko, skatkov
Reviewed-By: skatkov
Differential Revision: https://reviews.llvm.org/D92407
2020-12-08 12:25:58 -05:00
Xun Li
6b5a85d5d6 [coroutine] should disable inline before calling coro split
This is a rework of D85812, which didn't land.
When callee coroutine function is inlined into caller coroutine function before coro-split pass, llvm will emits "coroutine should have exactly one defining @llvm.coro.begin". It seems that coro-early pass can not handle this quiet well.
So we believe that unsplited coroutine function should not be inlined.
This patch fix such issue by not inlining function if it has attribute "coroutine.presplit" (it means the function has not been splited) to fix this issue
test plan: check-llvm, check-clang

In D85812, there was suggestions on moving the macros to Attributes.td to avoid circular header dependency issue.
I believe it's not worth doing just to be able to use one constant string in one place.
Today, there are already 3 possible attribute values for "coroutine.presplit": c6543cc6b8/llvm/lib/Transforms/Coroutines/CoroInternal.h (L40-L42)
If we move them into Attributes.td, we would be adding 3 new attributes to EnumAttr, just to support this, which I think is an overkill.

Instead, I think the best way to do this is to add an API in Function class that checks whether this function is a coroutine, by checking the attribute by name directly.

Differential Revision: https://reviews.llvm.org/D92706
2020-12-08 08:53:08 -08:00
Teresa Johnson
01add29a48 [ICP] Don't promote when target not defined in module
This guards against cases where the symbol was dead code eliminated in
the binary by ThinLTO, and we have a sample profile collected for one
binary but used to optimize another.

Most of the benefit from ICP comes from inlining the target, which we
can't do with only a declaration anyway. If this is in the pre-ThinLTO
link step (e.g. for instrumentation based PGO), we will attempt the
promotion again in the ThinLTO backend after importing anyway, and we
don't need the early promotion to facilitate that.

Differential Revision: https://reviews.llvm.org/D92804
2020-12-08 07:45:36 -08:00
Sjoerd Meijer
34ffe52cbc [LICM][docs] Document that LICM is also a canonicalization transform. NFC.
This documents that LICM is a canonicalization transform, which we discussed
recently in:

http://lists.llvm.org/pipermail/llvm-dev/2020-December/147184.html

but which was also discused earlier, e.g. in:

http://lists.llvm.org/pipermail/llvm-dev/2019-September/135058.html
2020-12-08 11:56:35 +00:00
Evgeniy Brevnov
816ffc867c [DSE][NFC] Need to be carefull mixing signed and unsigned types
Currently in some places we use signed type to represent size of an access and put explicit casts from unsigned to signed.
For example: int64_t EarlierSize = int64_t(Loc.Size.getValue());

Even though it doesn't loos bits (immidiatly) it may overflow and we end up with negative size. Potentially that cause later code to work incorrectly. A simple expample is a check that size is not negative.

I think it would be safer and clearer if we use unsigned type for the size and handle it appropriately.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D92648
2020-12-08 16:53:37 +07:00
Valentin Churavy
fa8ef60bf6 [VNCoercion] Disallow coercion between different ni addrspaces
I'm not sure if it would be legal by the IR reference to introduce
an addrspacecast here, since the IR reference is a bit vague on
the exact semantics, but at least for our usage of it (and I
suspect for many other's usage) it is not. For us, addrspacecasts
between non-integral address spaces carry frontend information that the
optimizer cannot deduce afterwards in a generic way (though we
have frontend specific passes in our pipline that do propagate
these). In any case, I'm sure nobody is using it this way at
the moment, since it would have introduced inttoptrs, which
are definitely illegal.

Fixes PR38375

Co-authored-by: Keno Fischer <keno@alumni.harvard.edu>

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D50010
2020-12-07 20:19:48 -05:00
Sanjay Patel
98a0dc8d53 [SLP] fix typo in debug string; NFC 2020-12-07 15:09:21 -05:00
Bardia Mahjour
2c6f4bdbb7 [LV] Epilogue Vectorization with Optimal Control Flow - Default Enablement
This patch enables epilogue vectorization by default per reviewer requests.

Differential Revision: https://reviews.llvm.org/D89566
2020-12-07 14:29:36 -05:00
Florian Hahn
4b1d72bba8 [ConstraintElimination] Tweak placement in pipeline.
This patch adds the ConstraintElimination pass to the LTO pipeline and
also runs it after SCCP in the function simplification pipeline.

This increases the number of cases we can elimination. Pending further
tuning.
2020-12-07 19:08:40 +00:00
Simon Pilgrim
cf86ed0017 [IPO] Fix operator precedence warning. NFCI.
Check the entire assertion condition before && with the message.
2020-12-07 18:23:54 +00:00
Alexey Bataev
a5cc9d0a29 [SLP]Merge reorder and reuse shuffles.
It is possible to merge reuse and reorder shuffles and reduce the total
cost of the ivectorization tree/number of final instructions.

Differential Revision: https://reviews.llvm.org/D92668
2020-12-07 07:50:00 -08:00
Jun Ma
253199f455 [Coroutines] Add DW_OP_deref for transformed dbg.value intrinsic.
Differential Revision: https://reviews.llvm.org/D92462
2020-12-07 10:24:44 +08:00
Craig Topper
374bd34552 [LoopIdiomRecognize] Merge a conditional operator with an earlier if and remove an extra temporary variable. NFC
The CountPrev variable was only used to forward a value from
the if statement to the conditional operator under the same
condition.

While there move some variable declarations to their first
assignment.
2020-12-06 15:23:18 -08:00
Fangrui Song
4cd07b6bd6 [Transforms] Delete unused declarations from NewGVN/CoroSplit/ValueMapper 2020-12-06 13:04:01 -08:00
Wenlei He
6ab8756fe0 [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining
This change adds the context-senstive sample PGO infracture described in CSSPGO RFC (https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s). It introduced an abstraction between input profile and profile loader that queries input profile for functions. Specifically, there's now the notion of base profile and context profile, and they are managed by the new SampleContextTracker for adjusting and merging profiles based on inline decisions. It works with top-down profiled guided inliner in profile loader (https://reviews.llvm.org/D70655) for better inlining with specialization and better post-inline profile fidelity. In the future, we can also expose this infrastructure to CGSCC inliner in order for it to take advantage of context-sensitive profile. This change is the consumption part of context-sensitive profile (The generation part is in this stack: https://reviews.llvm.org/D89707). We've seen good results internally in conjunction with Pseudo-probe (https://reviews.llvm.org/D86193). Pacthes for integration with Pseudo-probe coming up soon.

Currently the new infrastructure kick in when input profile contains the new context-sensitive profile; otherwise it's no-op and does not affect existing AutoFDO.

**Interface**

There're two sets of interfaces for query and tracking respectively exposed from SampleContextTracker. For query, now instead of simply getting a profile from input for a function, we can explicitly query base profile or context profile for given call path of a function. For tracking, there're separate APIs for marking context profile as inlined, or promoting and merging not inlined context profile.

- Query base profile (`getBaseSamplesFor`)
Base profile is the merged synthetic profile for function's CFG profile from any outstanding (not inlined) context. We can query base profile by function.

- Query context profile (`getContextSamplesFor`)
Context profile is a function's CFG profile for a given calling context. We can query context profile by context string.

- Track inlined context profile (`markContextSamplesInlined`)
When a function is inlined for given calling context, we need to mark the context profile for that context as inlined. This is to make sure we don't include inlined context profile when synthesizing base profile for that inlined function.

- Track not-inlined context profile (`promoteMergeContextSamplesTree`)
When a function is not inlined for given calling context, we need to promote the context profile tree so the not inlined context becomes top-level context. This preserve the sub-context under that function so later inline decision for that not inlined function will still have context profile for its call tree. Note that profile will be merged if needed when promoting a context profile tree if any of the node already exists at its promoted destination.

**Implementation**

Implementation-wise, `SampleContext` is created as abstraction for context. Currently it's a string for call path, and we can later optimize it to something more efficient, e.g. context id. Each `SampleContext` also has a `ContextState` indicating whether it's raw context profile from input, whether it's inlined or merged, whether it's synthetic profile created by compiler. Each `FunctionSamples` now has a `SampleContext` that tells whether it's base profile or context profile, and for context profile what is the context and state.

On top of the above context representation, a custom trie tree is implemented to track and manager context profiles. Specifically, `SampleContextTracker` is implemented that encapsulates a trie tree with `ContextTireNode` as node. Each node of the trie tree represents a frame in calling context, thus the path from root to a node represents a valid calling context. We also track `FunctionSamples` for each node, so this trie tree can serve efficient query for context profile. Accordingly, context profile tree promotion now becomes moving a subtree to be under the root of entire tree, and merge nodes for subtree if this move encounters existing nodes.

**Integration**

`SampleContextTracker` is now also integrated with AutoFDO, `SampleProfileReader` and `SampleProfileLoader`. When we detected input profile contains context-sensitive profile, `SampleContextTracker` will be used to track profiles, and all profile query will go to `SampleContextTracker` instead of `SampleProfileReader` automatically. Tracking APIs are called automatically for each inline decision from `SampleProfileLoader`.

Differential Revision: https://reviews.llvm.org/D90125
2020-12-06 11:49:18 -08:00
Kazu Hirata
ee8c0f1a72 [InstCombine] Remove replacePointer (NFC)
The declaration was introduced on Feb 10, 2017 in commit
ba01ed00fef32c48d8e2787a6feaf33568a80bfe without a corresponding
definition.
2020-12-06 10:24:08 -08:00
Sanjay Patel
a64549da09 [InstCombine] avoid crash on phi with unreachable incoming block (PR48369) 2020-12-06 09:31:47 -05:00
Fangrui Song
68d910197d [MemProf] Make __memprof_shadow_memory_dynamic_address dso_local in static relocation model
The x86-64 backend currently has a bug which uses a wrong register when for the GOTPCREL reference.
The program will crash without the dso_local specifier.
2020-12-05 21:36:31 -08:00
Florian Hahn
efea313a11 [ConstraintElimination] Handle constraints with all zero var coeffs.
Constraints where all variable coefficients are 0 do not add any useful
information. When checking, we can check if they are always true/false.
2020-12-05 12:06:53 +00:00
Kazu Hirata
68aac7fe9c [IRCE] Remove unused IsSigned and its accessor (NFC)
IsSigned and its accessor, isSigned, were introduced on Oct 25, 2017
in commit 9ac7021a2563d433549a21990f96184d413e69e2.  The last use was
removed on Nov 20, 2017 in commit
268467869b99b15a15f81bf009d31e11536bef39.
2020-12-04 21:26:12 -08:00
Jianzhou Zhao
63a65f2899 [dfsan] Add empty APIs for field-level shadow
This is a child diff of D92261.

This diff adds APIs that return shadow type/value/zero from origin
objects. For the time being these APIs simply returns primitive
shadow type/value/zero. The following diff will be implementing the
conversion.

As D92261 explains, some cases still use primitive shadow during
the incremential changes. The cases include
1) alloca/load/store
2) custom function IO
3) vectors
At the cases this diff does not use the new APIs, but uses primitive
shadow objects explicitly.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92629
2020-12-04 21:42:07 +00:00
Duncan P. N. Exon Smith
ea15c01596 ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC
Prepare to delete `AlignedCharArrayUnion` by migrating its users over to
`std::aligned_union_t`.

I will delete `AlignedCharArrayUnion` and its tests in a follow-up
commit so that it's easier to revert in isolation in case some
downstream wants to keep using it.

Differential Revision: https://reviews.llvm.org/D92516
2020-12-04 12:34:49 -08:00
Duncan P. N. Exon Smith
f1e94c82c9 ADT: Stop peeking inside AlignedCharArrayUnion, NFC
Update all the users of `AlignedCharArrayUnion` to stop peeking inside
(to look at `buffer`) so that a follow-up patch can replace it with an
alias to `std::aligned_union_t`.

This was reviewed as part of https://reviews.llvm.org/D92512, but I'm
splitting this bit out to commit first to reduce churn in case the
change to `AlignedCharArrayUnion` needs to be reverted for some
unexpected reason.
2020-12-04 11:07:42 -08:00
Hiroshi Yamauchi
e814afa1c5 Fix for Bug 48055.
Differential Revision: https://reviews.llvm.org/D92599
2020-12-04 11:05:01 -08:00
Arthur Eubanks
4fce57098e [NewPM] Make pass adaptors less templatey
Currently PassBuilder.cpp is by far the file that takes longest to
compile. This is due to tons of templates being instantiated per pass.

Follow PassManager by using wrappers around passes to avoid making
the adaptors templated on the pass type. This allows us to move various
adaptors' run methods into .cpp files.

This reduces the compile time of PassBuilder.cpp on my machine from 66
to 39 seconds. It also reduces the size of opt from 685M to 676M.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D92616
2020-12-04 08:30:50 -08:00
Evgeniy Brevnov
c94df02b1d [NFC][NARY-REASSOCIATE] Restructure code to aviod isPotentiallyReassociatable
Currently we have to duplicate the same checks in isPotentiallyReassociatable and tryReassociate. With simple pattern like add/mul this may be not a big deal. But the situation gets much worse when I try to add support for min/max. Min/Max may be represented by several instructions and can take different forms. In order reduce complexity for upcoming min/max support we need to restructure the code a bit to avoid mentioned code duplication.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88286
2020-12-04 16:19:43 +07:00
Evgeniy Brevnov
6aaf92e546 [NARY-REASSOCIATE] Simplify traversal logic by post deleting dead instructions
Currently we delete optimized instructions as we go. That has several negative consequences. First it complicates traversal logic itself. Second if newly generated instruction has been deleted the traversal is repeated from scratch.

But real motivation for the change is upcoming change with support for min/max reassociation. Here we employ SCEV expander to generate code. As a result newly generated instructions may be inserted not right before original instruction (because SCEV may do hoisting) and there is no way to know 'next' instruction.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88285
2020-12-04 16:17:50 +07:00
Kazu Hirata
9e95dea61d [JumpThreading] Call eraseBlock when folding a conditional branch
This patch teaches the jump threading pass to call BPI->eraseBlock
when it folds a conditional branch.

Without this patch, BranchProbabilityInfo could end up with stale edge
probabilities for the basic block containing the conditional branch --
one edge probability with less than 1.0 and the other for a removed
edge.

Differential Revision: https://reviews.llvm.org/D92608
2020-12-03 23:50:17 -08:00
Max Kazantsev
6a8ff5d40d Return "[IndVars] ICmpInst should not prevent IV widening"
This reverts commit 4bd35cdc3ae1874c6d070c5d410b3f591de54ee6.

The patch was reverted during the investigation. The investigation
shown that the patch did not cause any trouble, but just exposed
the existing problem that is addressed by the previous patch
"[IndVars] Quick fix LHS/RHS bug". Returning without changes.
2020-12-04 12:34:43 +07:00
Max Kazantsev
da72560ea4 [IndVars] Quick fix LHS/RHS bug
The code relies on fact that LHS is the NarrowDef but never
really checks it. Adding the conservative restrictive check,
will follow-up with handling of case where RHS is a NarrowDef.
2020-12-04 12:34:42 +07:00
Jianzhou Zhao
37af161d61 [dfsan] Support passing non-i16 shadow values in TLS mode
This is a child diff of D92261.

It extended TLS arg/ret to work with aggregate types.

For a function
  t foo(t1 a1, t2 a2, ... tn an)
Its arguments shadow are saved in TLS args like
  a1_s, a2_s, ..., an_s
TLS ret simply includes r_s. By calculating the type size of each shadow
value, we can get their offset.

This is similar to what MSan does. See __msan_retval_tls and __msan_param_tls
from llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp.

Note that this change does not add test cases for overflowed TLS
arg/ret because this is hard to test w/o supporting aggregate shdow
types. We will be adding them after supporting that.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92440
2020-12-04 02:45:07 +00:00
Philip Reames
17055bcff3 [LoopVec] Support non-instructions as argument to uniform mem ops
The initial step of the uniform-after-vectorization (lane-0 demanded only) analysis was very awkwardly written. It would revisit use list of each pointer operand of a widened load/store. As a result, it was in the worst case O(N^2) where N was the number of instructions in a loop, and had restricted operand Value types to reduce the size of use lists.

This patch replaces the original algorithm with one which is at most O(2N) in the number of instructions in the loop. (The key observation is that each use of a potentially interesting pointer is visited at most twice, once on first scan, once in the use list of *it's* operand. Only instructions within the loop have their uses scanned.)

In the process, we remove a restriction which required the operand of the uniform mem op to itself be an instruction.  This allows detection of uniform mem ops involving global addresses.

Differential Revision: https://reviews.llvm.org/D92056
2020-12-03 14:51:44 -08:00
dfukalov
b944ac9e0a [NFC] Reduce include files dependency.
1. Removed #include "...AliasAnalysis.h" in other headers and modules.
2. Cleaned up includes in AliasAnalysis.h.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92489
2020-12-03 18:25:05 +03:00
Max Kazantsev
78b51b6b79 Revert "[IndVars] ICmpInst should not prevent IV widening"
This reverts commit 0c9c6ddf17bb01ae350a899b3395bb078aa0c62e.

We are seeing some failures with this patch locally. Not clear
if it's causing them or just triggering a problem in another
place. Reverting while investigating.
2020-12-03 18:01:41 +07:00
modimo
5b1e62daa4 [NFC] Fix typo 2020-12-02 22:23:57 -08:00
Jianzhou Zhao
cb92e3d61f [dfsan] Rename ShadowTy/ZeroShadow with prefix Primitive
This is a child diff of D92261.

After supporting field/index-level shadow, the existing shadow with type
i16 works for only primitive types.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92459
2020-12-03 05:31:01 +00:00
Florian Hahn
fad4c5768d [ConstraintElimination] Make sure arguments of std:pow match.
This should fix a build failure on some systems, e.g. solaris11-sparcv9
http://lab.llvm.org:8014/#/builders/22
2020-12-02 22:23:26 +00:00
Hongtao Yu
4cefe8e200 [CSSPGO] Pseudo probes for function calls.
An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work.

One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place , leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution.

With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a calliste probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst.

To avoid collision with the baseline AutoFDO in various places that handles dwarf discriminators where a check against  the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reversed for pseudo probe use.

Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`.

Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91756
2020-12-02 13:45:20 -08:00
Jianzhou Zhao
1191fc6062 [dfsan] Rename CachedCombinedShadow to be CachedShadow
At D92261, this type will be used to cache both combined shadow and
converted shadow values.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92458
2020-12-02 21:39:16 +00:00