llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 03:33:20 +01:00

Author	SHA1	Message	Date
Eli Friedman	1e3d2f90ba	[InstCombine] Fix a couple crashes with extractelement on a scalable vector. Differential Revision: https://reviews.llvm.org/D86989	2020-09-02 18:02:07 -07:00
Jordan Rupprecht	e72bcb7267	[NFC] Fix unused var in release build	2020-09-01 13:05:56 -07:00
Florian Hahn	2248875209	[Loads] Add canReplacePointersIfEqual helper. This patch adds an initial, incomplete and unsound implementation of canReplacePointersIfEqual to check if a pointer value A can be replaced by another pointer value B, that are deemed to be equivalent through some means (e.g. information from conditions). Note that it is in general not sound to blindly replace pointers based on equality, for example if they are based on different underlying objects. LLVM's memory model is not completely settled as of now; see https://bugs.llvm.org/show_bug.cgi?id=34548 for a more detailed discussion. The initial version of canReplacePointersIfEqual only rejects a very specific case: replacing a pointer with a constant expression that is not dereferenceable. Such a replacement is problematic and can be restricted relatively easily without impacting most code. Using it to limit replacements in GVN/SCCP/CVP only results in small differences in 7 programs out of MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto. This patch is supposed to be an initial step to improve the current situation and the helper should be made stricter in the future. But this will require careful analysis of the impact on performance. Reviewed By: aqjune Differential Revision: https://reviews.llvm.org/D85524	2020-09-01 20:57:41 +01:00
Alina Sbirlea	deb30c80fe	[MemorySSA] Update phi map with replacement value.	2020-09-01 11:56:40 -07:00
Alina Sbirlea	c9a8d991e6	[MemorySSA] Clean up single value phis. MemoryPhis with a single value are correct, but can lead to errors when updating. Clean up single entry Phis newly added when cloning blocks. Resolves PR46574.	2020-08-31 19:26:08 -07:00
Nikita Popov	b469dc9b48	[InstSimplify] Reduce code duplication in simplifySelectWithICmpCond (NFC) Canonicalize icmp ne to icmp eq and implement all the folds only once.	2020-08-29 22:38:49 +02:00
Nikita Popov	556e0d5173	[InstSimplify] Protect against more poison in SimplifyWithOpReplaced (PR47322) Replace the check for poison-producing instructions in SimplifyWithOpReplaced() with the generic helper canCreatePoison() that properly handles poisonous shifts and thus avoids the problem from PR47322. This additionally fixes a bug in IIQ.UseInstrInfo=false mode, which previously could have caused this code to ignore poison flags. Setting UseInstrInfo=false should reduce the possible optimizations, not increase them. This is not a full solution to the problem, as poison could be introduced more indirectly. This is just a minimal, easy to backport fix. Differential Revision: https://reviews.llvm.org/D86834	2020-08-29 21:59:39 +02:00
Nikita Popov	d37c50d614	[LVI] Remove unnecessary lambda capture (NFC)	2020-08-29 21:33:19 +02:00
Nikita Popov	4e796fd7b6	Reapply [LVI] Normalize pointer behavior This got reverted because a dependency was reverted. It has since been reapplied, so reapply this as well. ----- Related to D69686. As noted there, LVI currently behaves differently for integer and pointer values: For integers, the block value is always valid inside the basic block, while for pointers it is only valid at the end of the basic block. I believe the integer behavior is the correct one, and CVP relies on it via its getConstantRange() uses. The reason for the special pointer behavior is that LVI checks whether a pointer is dereferenced in a given basic block and marks it as non-null in that case. Of course, this information is valid only after the dereferencing instruction, or in conservative approximation, at the end of the block. This patch changes the treatment of dereferenceability: Instead of including it inside the block value, we instead treat it as something similar to an assume (it essentially is a non-nullness assume) and incorporate this information in intersectAssumeOrGuardBlockValueConstantRange() if the context instruction is the terminator of the basic block. This happens either when determining an edge-value internally in LVI, or when a terminator was explicitly passed to getValueAt(). The latter case makes this more powerful than the previous implementation as a side-effect, and this does actually seem beneficial in practice. Of course, we do not want to recompute dereferenceability on each intersectAssume call, so we need a new cache for this. The dereferenceability analysis requires walking the entire basic block and computing underlying objects of all memory operands. This was previously done separately for each queried pointer value. In the new implementation (both because this makes the caching simpler, and because it is faster), I instead only walk the full BB once and cache all the dereferenced pointers. So the traversal is now performed only once per BB, instead of once per queried pointer value. I think the overall model now makes more sense than before, and there will be no more pitfalls due to differing integer/pointer behavior. Differential Revision: https://reviews.llvm.org/D69914	2020-08-29 21:17:03 +02:00
Roman Lebedev	2a46a04b28	[NFC][InstructionSimplify] Add a warning about not simplifying to not def-reachable See https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824235.html and https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824967.html InstSimplify is not allowed to perform simplifications to instructions that are not def-reachable from the original instruction.	2020-08-29 09:58:08 +03:00
Owen Anderson	df34423d50	Revert "[InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block" This reverts commit 6102310d814ad73eab60a88b21dd70874f7a056f. It appears to cause compilation non-determinism and caused stage3 mismatches.	2020-08-28 23:43:42 +00:00
serge-sans-paille	41f672146f	Skip analysis re-computation when no changes are reported This is a follow-up to https://reviews.llvm.org/D80707, generalized to CallGraphSCC, Loop and Region Differential Revision: https://reviews.llvm.org/D86442	2020-08-28 21:41:01 +02:00
David Sherwood	56b8c35591	[SVE] Make ElementCount members private This patch changes ElementCount so that the Min and Scalable members are now private and can only be accessed via the get functions getKnownMinValue() and isScalable(). In addition I've added some other member functions for more commonly used operations. Hopefully this makes the class more useful and will reduce the need for calling getKnownMinValue(). Differential Revision: https://reviews.llvm.org/D86065	2020-08-28 14:43:53 +01:00
Florian Hahn	1e84da9f17	[MemLoc] Support memcmp in MemoryLocation::getForArgument. This patch adds support for memcmp in MemoryLocation::getForArgument. memcmp reads from the first 2 arguments up to the number of bytes of the third argument. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86725	2020-08-28 10:19:54 +01:00
Martin Storsjö	47182191ac	[ValueTracking] Remove a stray semicolon. NFC. This silences warnings when built with GCC at least.	2020-08-28 09:24:10 +03:00
serge-sans-paille	a12b4db565	(Expensive) Check for Loop, SCC and Region pass return status This generalizes the logic introduced in https://reviews.llvm.org/D80916 to other passes. It's needed by https://reviews.llvm.org/D86442 to assert passes correctly report their status. Differential Revision: https://reviews.llvm.org/D86589	2020-08-28 07:56:35 +02:00
Alina Sbirlea	ee5e19fe38	[MemorySSA] Assert defining access is not a MemoryUse.	2020-08-27 18:21:10 -07:00
Vitaly Buka	2df7657efb	[ValueTracking] Replace recursion with Worklist Now findAllocaForValue can handle nontrivial phi cycles.	2020-08-27 14:44:49 -07:00
Vitaly Buka	1490706ac2	[StackSafety] Ignore allocas with partial lifetime markers Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D86672	2020-08-27 13:54:41 -07:00
Vitaly Buka	4f88f9a474	[NFC][ValueTracking] Add OffsetZero into findAllocaForValue For StackLifetime, after finding the alloca we need to check that values point to the beginning of the alloca. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D86692	2020-08-27 13:46:22 -07:00
Roman Lebedev	0c6e84d654	[InstSimplify] SimplifyPHINode(): check that instruction is in basic block first As pointed out in post-commit review, this can legally be called on instructions that are not inserted into basic blocks, so don't blindly assume that there is basic block.	2020-08-27 22:32:03 +03:00
Simon Moll	2afc6e83ea	[sda][nfc] clang-formatting	2020-08-27 18:27:44 +02:00
Roman Lebedev	2088bfe3c4	[InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block Apparently, we don't do this, neither in EarlyCSE, nor in InstSimplify, nor in (old) GVN, but do in NewGVN and SimplifyCFG of all places.. While i could teach EarlyCSE how to hash PHI nodes, we can't really do much (anything?) even if we find two identical PHI nodes in different basic blocks, same-BB case is the interesting one, and if we teach InstSimplify about it (which is what i wanted originally, https://reviews.llvm.org/D86530), we get EarlyCSE support for free. So i would think this is pretty uncontroversial. On vanilla llvm test-suite + RawSpeed, this has the following effects:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instsimplify.NumPHICSE                             | 0         | 23779     |  23779 |    0.00% |    0.00% |
| asm-printer.EmittedInsts                           | 7942328   | 7942392   |     64 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 273069192 | 273084704 |  15512 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18539     |    127 |    0.69% |    0.69% |
| early-cse.NumCSE                                   | 2183283   | 2183227   |    -56 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 542090    |  -8015 |   -1.46% |    1.46% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640264   | 3664769   |  24505 |    0.67% |    0.67% |
| instcombine.NumDeadInst                            | 1778193   | 1783183   |   4990 |    0.28% |    0.28% |
| instcount.NumCallInst                              | 1758401   | 1758799   |    398 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330533    |    -24 |   -0.01% |    0.01% |
| instcount.TotalInsts                               | 8831952   | 8832286   |    334 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019808   | 999607    | -20201 |   -1.98% |    1.98% |
```
I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call` transforms, and counter-intuitively results in more instructions total. That being said, the PHI count doesn't decrease that much, and looking at some examples, it seems at least some of them were previously getting PHI CSE'd in SimplifyCFG of all places.. I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time. As a comment in `InstCombinerImpl::visitPHINode()` already stated, there are no guarantees on the ordering of the operands of a PHI node, so if we just naively compare them, we may false-negatively say that the nodes are not equal when the only difference is operand order, which is especially important since the fold is in InstSimplify, so we can't rely on InstCombine sorting them beforehand. Fixing this for the general case is costly (geomean +0.02%), and does not appear to catch anything in test-suite, but for the same-BB case, it's trivial, so let's fix at least that. As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions this appears to cause geomean +0.03% compile time increase (regression), but geomean -0.01%..-0.04% code size decrease (improvement).	2020-08-27 18:47:04 +03:00
Vitaly Buka	afd0abc1f8	[ValueTracking] Support select in findAllocaForValue	2020-08-27 02:13:52 -07:00
Nikita Popov	d8d374b2b3	[InstSimplify] Fold min/max intrinsic based on icmp of operands This is a reboot of D84655, now performing the inner icmp simplification query without undef folds. It should be possible to handle the current foldMinMaxSharedOp() fold based on this, by moving the logic into icmp of min/max instead, making it more general. We can't drop the folds for constant operands, because those also allow undef, which we exclude here. The tests use assumes for exhaustive coverage, and have a few more examples of misc folds we get based on icmp simplification. Differential Revision: https://reviews.llvm.org/D85929	2020-08-26 22:02:57 +02:00
Arthur Eubanks	4c381d496d	[InstSimplify] Simplify to vector constants when possible InstSimplify should do all transformations that ConstProp does, but one thing that ConstProp does that InstSimplify wouldn't is inline vector instructions that are constants, e.g. into a ret. Previously vector instructions wouldn't be inlined in InstSimplify because llvm::Simplify*Instruction() would return nullptr for specific instructions, such as vector instructions that were actually constants, if it couldn't simplify them. This changes SimplifyInsertElementInst, SimplifyExtractElementInst, and SimplifyShuffleVectorInst to return a vector constant when possible. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85946	2020-08-26 11:40:36 -07:00
Juneyoung Lee	5d4d13a2ae	[IR] Add NoUndef attribute to Intrinsics.td This patch adds NoUndef to Intrinsics.td. The attribute is attached to llvm.assume's operand, because llvm.assume(undef) is UB. It is attached to pointer operands of several memory accessing intrinsics as well. This change makes ValueTracking::getGuaranteedNonPoisonOps' intrinsic check unnecessary, so it is removed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86576	2020-08-27 02:54:48 +09:00
Mircea Trofin	b186c6758c	[MLInliner] Simplify TFUTILS_SUPPORTED_TYPES We only need the C++ type and the corresponding TF Enum. The other parameter was used for the output spec json file, but we can just standardize on the C++ type name there. Differential Revision: https://reviews.llvm.org/D86549	2020-08-25 14:19:39 -07:00
Juneyoung Lee	65c4cb9e7b	[ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands. Instead of special-casing llvm.assume, I think it is also a viable option to add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86477	2020-08-26 04:40:21 +09:00
Nikita Popov	f861b3c934	[MemDep] Use BatchAA when computing pointer dependencies We're not changing IR while running a single MemDep query, so it's safe to cache alias analysis results using BatchAA. This adds BatchAA usage to getSimplePointerDependencyFrom(), which is non-intrusive -- covering larger parts (like a whole processNonLocalLoad query) is also possible, but requires threading BatchAA through a bunch of APIs. For the ThinLTO configuration, this is a 1% geomean improvement on CTMark. Differential Revision: https://reviews.llvm.org/D85583	2020-08-25 21:34:34 +02:00
Ta-Wei Tu	9380a91af9	[LoopNest] False negative of `arePerfectlyNested` with LCSSA loops Summary: The LCSSA pass (required for all loop passes) sometimes adds additional blocks containing LCSSA variables, and checkLoopsStructure may return false even when the loops are perfectly nested in this case. This is because the successor of the exit block of the inner loop now points to the LCSSA block instead of the latch block of the outer loop. Examples are shown in the test nests-with-lcssa.ll. To fix the issue, the successor of the exit block of the inner loop can now point to a block in which all instructions are LCSSA phi nodes (except the terminator), and the sole successor of that block should point to the latch block of the outer loop. Reviewed By: Whitney, etiotto Differential Revision: https://reviews.llvm.org/D86133	2020-08-25 16:20:52 +00:00
Mircea Trofin	9c71d4e1d1	[MLInliner] Support training that doesn't require partial rewards If we use training algorithms that don't need partial rewards, we don't need to worry about an ir2native model. In that case, training logs won't contain a 'delta_size' feature either (since that's the partial reward). Differential Revision: https://reviews.llvm.org/D86481	2020-08-24 17:36:29 -07:00
Sam Parker	0828b21ed0	[SCEV] Still trying to fix windows buildbots	2020-08-24 10:26:48 +01:00
Alina Sbirlea	d3e372ea8f	[DomTree] Extend update API to allow a post CFG view. Extend the `applyUpdates` in DominatorTree to allow a post CFG view, different from the current CFG. This patch implements the functionality of updating an already up to date DT, to the desired PostCFGView. Combining a set of updates towards an up to date DT and a PostCFGView is not yet supported. Differential Revision: https://reviews.llvm.org/D85472	2020-08-21 17:23:08 -07:00
Arthur Eubanks	5e4555a20b	[opt][NewPM] Add basic-aa in legacy PM compatibility mode The legacy PM alias analysis pipeline by default includes basic-aa. When running `opt -foo-pass` under the NPM and -disable-basic-aa is not specified, use basic-aa. This decreases the number of check-llvm failures under NPM from 913 to 752. Reviewed By: ychen, asbirlea Differential Revision: https://reviews.llvm.org/D86167	2020-08-21 14:05:07 -07:00
Roman Lebedev	2893862e92	[NFC] Port InstCount pass to new pass manager	2020-08-21 12:39:42 +03:00
Yevgeny Rouban	1bdf10a116	[NewPM][PassInstrumentation] Add PreservedAnalyses parameter to AfterPass* callbacks Both AfterPass and AfterPassInvalidated pass instrumentation callbacks get additional parameter of type PreservedAnalyses. This patch was created by @fedor.sergeev. I have just slightly changed it. Reviewers: fedor.sergeev Differential Revision: https://reviews.llvm.org/D81555	2020-08-21 16:10:42 +07:00
David Green	53fac1f9ad	[ARM][LV] Add a preferPredicatedReductionSelect target hook As part of D84741, this adds a target hook for the preferPredicatedReductionSelect option and makes use of it under MVE, allowing us to tail predicate most reduction loops. Differential Revision: https://reviews.llvm.org/D85980	2020-08-21 08:48:12 +01:00
Sanjay Patel	042574c236	[ValueTracking] define/use max recursion depth in header There's a potential motivating case to increase this limit in PR47191: http://bugs.llvm.org/PR47191 But first we should make it less hacky. The limit in InstCombine is directly tied to this value because an increase there can cause asserts in the underlying value tracking calls if not changed together. The usage in VectorUtils is independent, but the comment suggests that we should use the same value unless there's a known reason to diverge. There are similar limits in codegen analysis, but I think we should leave those independent in case we intentionally want the optimization power/cost to be different there. Differential Revision: https://reviews.llvm.org/D86113	2020-08-19 16:56:59 -04:00
Mehdi Amini	db235b2187	Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private."" Was reverted because MLIR/Flang builds were broken, these APIs have been fixed in the meantime.	2020-08-19 17:26:36 +00:00
Mehdi Amini	4386b1823a	Revert "[NFC][llvm] Make the contructors of `ElementCount` private." This reverts commit 264afb9e6aebc98c353644dd0700bec808501cab. (and dependent 6b742cc48 and fc53bd610f) MLIR/Flang are broken.	2020-08-19 17:21:37 +00:00
Francesco Petrogalli	d75808bc7f	[NFC][llvm] Make the contructors of `ElementCount` private. Differential Revision: https://reviews.llvm.org/D86120	2020-08-19 16:26:44 +00:00
Mircea Trofin	3359e4e021	[MLInliner] In development mode, obtain the output specs from a file Different training algorithms may produce models that, besides the main policy output (i.e. inline/don't inline), produce additional outputs that are necessary for the next training stage. To facilitate this, in development mode, we require the training policy infrastructure produce a description of the outputs that are interesting to it, in the form of a JSON file. We special-case the first entry in the JSON file as the inlining decision - we care about its value, so we can guide inlining during training - but treat the rest as opaque data that we just copy over to the training log. Differential Revision: https://reviews.llvm.org/D85674	2020-08-17 16:56:47 -07:00
Tyker	dfa36c6cdf	[AssumeBundles] Fix Bug in Assume Queries This bug was causing a miscompile. Now clang can properly self-host with -mllvm --enable-knowledge-retention Reviewed By: jdoerfert, lebedev.ri Differential Revision: https://reviews.llvm.org/D83507	2020-08-17 21:36:53 +02:00
Dávid Bolvanský	26599cbe3f	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit 50c743fa713002fe4e0c76d23043e6c1f9e9fe6f. Patch will be split to smaller ones.	2020-08-17 20:44:33 +02:00
Simon Pilgrim	8366289c89	[DemandedBits] Improve accuracy of Add propagator The current demand propagator for addition will mark all input bits at and right of the alive output bit as alive. But carry won't propagate beyond a bit for which both operands are zero (or one/zero in the case of subtraction) so a more accurate answer is possible given known bits. I derived a propagator by working through truth tables and using a bit-reversed addition to make demand ripple to the right, but I'm not sure how to make a convincing argument for its correctness in the comments yet. Nevertheless, here's a minimal implementation and test to get feedback. This would help in a situation where, for example, four bytes (<128) packed into an int are added with four others SIMD-style but only one of the four results is actually read. Known A: 0_______0_______0_______0_______ Known B: 0_______0_______0_______0_______ AOut: 00000000001000000000000000000000 AB, current: 00000000001111111111111111111111 AB, patch: 00000000001111111000000000000000 Committed on behalf of: @rrika (Erika) Differential Revision: https://reviews.llvm.org/D72423	2020-08-17 12:54:09 +01:00
Cullen Rhodes	ed95f77522	[InlineCost] Fix scalable vectors in visitAlloca Discovered as part of the VLS type work (see D85128). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85848	2020-08-17 10:34:27 +00:00
Vitaly Buka	42145d2b15	[NFC][StackSafety] Move out sort from the loop	2020-08-17 03:30:14 -07:00
Vitaly Buka	6f71d99b21	[StackSafety] Skip ambiguous lifetime analysis If we can't identify the alloca used in a lifetime marker, we need to assume the worst-case scenario. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D84630	2020-08-16 18:05:52 -07:00
Vitaly Buka	7e7fd55416	[StackSafety] Change how callee searched in index Handle other than local linkage types.	2020-08-16 04:37:19 -07:00
Wenlei He	8c3d7a1d09	[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context. A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose. This is a resubmit of https://reviews.llvm.org/D83743	2020-08-15 20:17:21 -07:00
Vitaly Buka	320ac778a0	[StackSafety] Use ValueInfo in ParamAccess::Call This avoid GUID lookup in Index.findSummaryInModule. Follow up for D81242. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D85269	2020-08-14 12:42:44 -07:00
Matt Morehouse	9fdad1bbaa	Revert "[NFC][StackSafety] Move out sort from the loop" This reverts commit 0426e28419799c35cf52fe3d773c5bab9928c699 due to ASan buildbot failure.	2020-08-14 08:17:35 -07:00
Vitaly Buka	9ad15cd174	[NFC][StackSafety] Change map key comparison	2020-08-14 04:23:15 -07:00
Vitaly Buka	aecc9e0fb0	[NFC][StackSafety] Move out sort from the loop	2020-08-14 04:19:10 -07:00
Vitaly Buka	f936ac90a2	[NFC][StackSafety] Dedup callees	2020-08-14 01:14:52 -07:00
Dávid Bolvanský	7129f2d26c	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 19:54:27 +02:00
Simon Pilgrim	f02d625e34	Fix unused variable warning. NFC. Reduce the dyn_cast<> to a isa<> as that's all non-assert builds require, and move the cast<> inside the assert.	2020-08-13 15:43:20 +01:00
Dávid Bolvanský	baa55bd4d6	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit 44587e2f7e732604cd6340061d40ac21e7e188e5. Sanitizer tests need to be updated.	2020-08-13 14:37:40 +02:00
Dávid Bolvanský	f4c1a714d0	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 14:23:58 +02:00
Dávid Bolvanský	aecc53e597	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit 385c9d673f217e176b18e7bf6fe055154ac589c6.	2020-08-13 12:59:15 +02:00
Dávid Bolvanský	b38379d5d6	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 12:45:40 +02:00
Ali Tamur	e7d6dfa5d7	Revert "[SCEV] Look through single value PHIs." This reverts commit e441b7a7a0a72c28daf5a8e594559c667e5b4534. This patch causes a compile error in tensorflow opensource project. The stack trace looks like: Point of crash: llvm/include/llvm/Analysis/LoopInfoImpl.h : line 35 (gdb) ptype this type = const class llvm::LoopBase<llvm::BasicBlock, llvm::Loop> [with BlockT = llvm::BasicBlock, LoopT = llvm::Loop] (gdb) p this $1 = {ParentLoop = 0x0, SubLoops = std::vector of length 0, capacity 0, Blocks = std::vector of length 0, capacity 1, DenseBlockSet = {<llvm::SmallPtrSetImpl<llvm::BasicBlock const>> = {<llvm::SmallPtrSetImplBase> = {<llvm::DebugEpochBase> = {Epoch = 3}, SmallArray = 0x1b2bf6c8, CurArray = 0x1b2bf6c8, CurArraySize = 8, NumNonEmpty = 0, NumTombstones = 0}, <No data fields>}, SmallStorage = {0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}, IsInvalid = true} (gdb) p this->DenseBlockSet->CurArray $2 = (const void *) 0xfffffffffffffffe I will try to get a case from tensorflow or use creduce to get a small case.	2020-08-12 23:13:24 -07:00
Nikita Popov	3cea08454d	[ValueTracking] Add abs intrinsics support to computeConstantRange() Implementation is the same as for SPF_ABS.	2020-08-12 22:28:46 +02:00
Nikita Popov	2a8504a89f	[ValueTracking] Support min/max intrinsics in computeConstantRange() The implementation is the same as for the SPF_* case.	2020-08-12 22:07:29 +02:00
Craig Topper	0308690c50	Recommit "[InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms" and its follow up patches This recommits the following patches now that D85684 has landed 1cf6f210a2e [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison. 469da663f2d [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison 122b0640fc9 [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison ac0af12ed2f [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison 9b1e95329af [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms	2020-08-12 10:45:27 -07:00
Florian Hahn	371c1e57df	[SCEV] Look through single value PHIs. Now that SCEVExpander can preserve LCSSA form, we do not have to worry about LCSSA form when trying to look through PHIs. SCEVExpander will take care of inserting LCSSA PHI nodes as required. This increases precision of the analysis in some cases. Reviewed By: mkazantsev, bmahjour Differential Revision: https://reviews.llvm.org/D71539	2020-08-12 10:03:42 +01:00
Nikita Popov	587bdc1d95	[InstSimplify] Respect CanUseUndef in more places Similar to what we do in IIQ, add an isUndefValue() helper that checks for undef values while respecting CanUseUndef. This makes it much easier to search for places that don't respect the flag yet.	2020-08-11 21:53:33 +02:00
Dávid Bolvanský	ee8d84179f	[BPI] Teach BPI about bcmp function bcmp is similar to memcmp	2020-08-11 20:44:53 +02:00
Nikita Popov	577d874016	[InstSimplify] Forbid undef folds in expandBinOp This is the replacement for D84250 based on D84792. As we recursively fold with the same value twice, we need to disable undef folds, to prevent an undef from being folded to two different values. Reverting rG00f3579aea6e3d4a4b7464c3db47294f71cef9e4 and using the test case from https://reviews.llvm.org/D83360#2145793, it no longer performs the incorrect fold. Differential Revision: https://reviews.llvm.org/D85684	2020-08-11 18:39:24 +02:00
Sanjay Patel	5b7d18ac79	[InstSimplify] fold min/max with matching min/max operands I think this is the last remaining translation of an existing instcombine transform for the corresponding cmp+sel idiom. This interpretation is more general though - we can remove mismatched signed/unsigned combinations in addition to the more obvious cases. min/max(X, Y) must produce X or Y as the result, so this is just another clause in the existing transform that was already matching a min/max of min/max.	2020-08-11 11:23:15 -04:00
Florian Hahn	fc2f262900	[SCEV] If RHS >= Start, simplify (Start smax RHS) to RHS for trip counts. This is the max version of D85046. This change causes binary changes in 44 out of 237 benchmarks (out of MultiSource/SPEC2000/SPEC2006). Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D85189	2020-08-11 13:20:24 +01:00
Juneyoung Lee	6e8034136f	[LazyValueInfo] Let getEdgeValueLocal look into freeze instructions This patch makes getEdgeValueLocal more precise when a freeze instruction is given, by adding support for freeze into constantFoldUser Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84629	2020-08-11 16:39:34 +09:00
Thomas Lively	1da4ba1036	[WebAssembly][ConstantFolding] Fold fp-to-int truncation intrinsics Constant fold both the trapping and saturating versions of the WebAssembly truncation intrinsics. The tests are adapted from the WebAssembly spec tests for the corresponding instructions. Requested in PR46982. Differential Revision: https://reviews.llvm.org/D85392	2020-08-10 12:40:05 -07:00
Mircea Trofin	0ec177f3b4	[NFC][MLInliner] remove curly braces for a few single-line loops	2020-08-10 09:32:21 -07:00
Mircea Trofin	f46e151000	[NFC][MLInliner] Set up the logger outside the development mode advisor This allows us to subsequently configure the logger for the case when we use a model evaluator and want to log additional outputs. Differential Revision: https://reviews.llvm.org/D85577	2020-08-10 09:22:17 -07:00
Vitaly Buka	82a20d8b12	[NFC][StackSafety] Add a couple of early returns	2020-08-09 23:42:09 -07:00
Vitaly Buka	4f3a3f0cc2	[NFC][StackSafety] Count dataflow inputs	2020-08-09 23:32:41 -07:00
Vitaly Buka	55c588271d	[StackSafety] Fix union which produces wrapped sets	2020-08-09 23:20:17 -07:00
Vitaly Buka	6795706f92	[NFC][StackSafety] Avoid assert in getBaseObjec	2020-08-09 23:20:17 -07:00
Vitaly Buka	00ccee5400	[StackSafety] Don't keep FullSet in index Optimization. Missing record is interpreted as FullSet anyway.	2020-08-09 15:01:46 -07:00
Florian Hahn	564f5c4ac7	[InstSimplify/NewGVN] Add option to control the use of undef. Making use of undef is not safe if the simplification result is not used to replace all uses of the result. This leads to problems in NewGVN, which does not replace all uses in the IR directly. See PR33165 for more details. This patch adds an option to SimplifyQuery to disable the use of undef. Note that I've only guarded uses of isa<UndefValue>/m_Undef where SimplifyQuery is currently available. If we agree on the general direction, I'll update the remaining uses. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84792	2020-08-09 19:16:56 +01:00
Vitaly Buka	3e043ca45f	[NFC][StackSafety] Fix statistics	2020-08-07 16:18:52 -07:00
Mircea Trofin	be6dadeffd	[NFC][MLInliner] Refactor logging implementation This prepares it for logging externally-specified outputs. Differential Revision: https://reviews.llvm.org/D85451	2020-08-07 14:56:56 -07:00
Vitaly Buka	6669d78639	Revert "[StackSafety] Skip ambiguous lifetime analysis" This reverts commit 0b2616a8045cb776ea1514c3401d0a8577de1060. Crashes with safe-stack.	2020-08-07 14:02:50 -07:00
Vitaly Buka	ae3ad63414	[StackSafety,NFC] Add Stats counters	2020-08-07 14:02:50 -07:00
Vitaly Buka	f8fc07d99e	[StackSafety,NFC] Fix tests in debug	2020-08-06 20:46:39 -07:00
Vitaly Buka	15a533fc32	[StackSafety,NFC] Add debug counters	2020-08-06 19:24:02 -07:00
Vitaly Buka	4124f20201	[StackSafety,NFC] Use CHECK-EMPTY in tests	2020-08-06 19:19:51 -07:00
Vitaly Buka	3b944733de	[StackSafety] Skip ambiguous lifetime analysis If we can't identify the alloca used in a lifetime marker we need to assume the worst case scenario. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D84630	2020-08-06 19:10:33 -07:00
Vitaly Buka	567e88646c	[LTO,NFC] Skip generateParamAccessSummary when empty addGlobalValueSummary can check newly added FunctionSummary and set HasParamAccess to mark that generateParamAccessSummary is needed. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D85182	2020-08-06 19:01:19 -07:00
Jessica Paquette	e43ddd00f6	[GlobalISel] Fix computing known bits for loads with range metadata In GlobalISel, if you have a load into a small type with a range, you'll hit an assert if you try to compute known bits on it starting at a larger type. e.g. ``` %x:_(s8) = G_LOAD %whatever(p0) :: (load 1 ... !range !n) ... %y:_(s32) = G_SOMETHING %x ``` When we walk through G_SOMETHING and hit the load, the width of our known bits is 32. However, the width of the range is going to be 8. This will cause us to hit an assert. To fix this, make computeKnownBitsFromRangeMetadata zero extend or truncate the range type to match the bitwidth of the known bits we're calculating. Add a testcase in CodeGen/GlobalISel/KnownBitsTest.cpp to reflect that this works now. https://reviews.llvm.org/D85375	2020-08-06 16:47:07 -07:00
Sanjay Patel	01843e6c65	[InstSimplify] avoid crashing by trying to rem-by-zero Bug was noted in the post-commit comments for: rGe8760bb9a8a3	2020-08-06 16:06:31 -04:00
Mircea Trofin	1f12cdca6a	[NFC][MLInliner] Point out the tests' model dependencies	2020-08-06 09:57:26 -07:00
Mircea Trofin	b790a0e2cc	[llvm][MLInliner] Don't log 'mandatory' events We don't want mandatory events in the training log. We do want to handle them, to keep the native size accounting accurate, but that's all. Fixed the code, also expanded the test to capture this. Differential Revision: https://reviews.llvm.org/D85373	2020-08-06 09:04:15 -07:00
David Green	77d21dcd3f	[LoopVectorizer] Inloop vector reductions Arm MVE has multiple instructions such as VMLAVA.s8, which (in this case) can take two 128bit vectors, sign extend the inputs to i32, multiplying them together and sum the result into a 32bit general purpose register. So taking 16 i8's as inputs, they can multiply and accumulate the result into a single i32 without any rounding/truncating along the way. There are also reduction instructions for plain integer add and min/max, and operations that sum into a pair of 32bit registers together treated as a 64bit integer (even though MVE does not have a plain 64bit addition instruction). So giving the vectorizer the ability to use these instructions both enables us to vectorize at higher bitwidths, and to vectorize things we previously could not. In order to do that we need a way to represent that the reduction operation, specified with a llvm.experimental.vector.reduce when vectorizing for Arm, occurs inside the loop not after it like most reductions. This patch attempts to do that, teaching the vectorizer about in-loop reductions. It does this through a vplan recipe representing the reductions that the original chain of reduction operations is replaced by. Cost modelling is currently just done through a prefersInloopReduction TTI hook (which follows in a later patch). Differential Revision: https://reviews.llvm.org/D75069	2020-08-06 10:10:50 +01:00
Sanjay Patel	82aa9a6b56	[InstSimplify] fold icmp with mul nsw and constant operands https://rise4fun.com/Alive/slvl Name: mul nsw with icmp eq Pre: (C2 % C1) != 0 %a = mul nsw i8 %x, C1 %r = icmp eq i8 %a, C2 => %r = false Name: mul nsw with icmp ne Pre: (C2 % C1) != 0 %a = mul nsw i8 %x, C1 %r = icmp ne i8 %a, C2 => %r = true Follow-up to the 'nuw' variation added with: rGf879c9b79621	2020-08-05 14:38:39 -04:00
Sanjay Patel	60815c576a	[InstSimplify] fold icmp with mul nuw and constant operands https://rise4fun.com/Alive/pZEr Name: mul nuw with icmp eq Pre: (C2 %u C1) != 0 %a = mul nuw i8 %x, C1 %r = icmp eq i8 %a, C2 => %r = false Name: mul nuw with icmp ne Pre: (C2 %u C1) != 0 %a = mul nuw i8 %x, C1 %r = icmp ne i8 %a, C2 => %r = true There are potentially several other transforms we need to add based on: D51625 ...but it doesn't look like there was follow-up to that patch.	2020-08-05 14:32:17 -04:00
Jordan Rupprecht	eb9074b6d8	Revert "[LoopVectorizer] Inloop vector reductions" This reverts commit e9761688e41cb979a1fa6a79eb18145a75104933. It breaks the build: ``` ~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:10: error: no viable conversion from returned value of type 'SmallVector<[...], 8>' to function return type 'SmallVector<[...], 4>' return ReductionOperations; ```	2020-08-05 10:24:15 -07:00
Mircea Trofin	3bd1a7f753	[TFUtils] Expose untyped accessor to evaluation result tensors These were implementation detail, but become necessary for generic data copying. Also added const variations to them, and move assignment, since we had a move ctor (and the move assignment helps in a subsequent patch). Differential Revision: https://reviews.llvm.org/D85262	2020-08-05 10:22:45 -07:00
David Green	8e671cc375	[LoopVectorizer] Inloop vector reductions Arm MVE has multiple instructions such as VMLAVA.s8, which (in this case) can take two 128bit vectors, sign extend the inputs to i32, multiplying them together and sum the result into a 32bit general purpose register. So taking 16 i8's as inputs, they can multiply and accumulate the result into a single i32 without any rounding/truncating along the way. There are also reduction instructions for plain integer add and min/max, and operations that sum into a pair of 32bit registers together treated as a 64bit integer (even though MVE does not have a plain 64bit addition instruction). So giving the vectorizer the ability to use these instructions both enables us to vectorize at higher bitwidths, and to vectorize things we previously could not. In order to do that we need a way to represent that the reduction operation, specified with a llvm.experimental.vector.reduce when vectorizing for Arm, occurs inside the loop not after it like most reductions. This patch attempts to do that, teaching the vectorizer about in-loop reductions. It does this through a vplan recipe representing the reductions that the original chain of reduction operations is replaced by. Cost modelling is currently just done through a prefersInloopReduction TTI hook (which follows in a later patch). Differential Revision: https://reviews.llvm.org/D75069	2020-08-05 18:14:05 +01:00
Sanjay Patel	fb7a244b8e	[InstSimplify] reduce code duplication in simplifyICmpWithMinMax(); NFC	2020-08-05 11:39:28 -04:00
Evgeniy Brevnov	08e662f69a	[BPI][NFC] Unify handling of normal and SCC based loops This is one more NFC part extracted from D79485. Normal and SCC based loops have very different representations and have to be handled separately each time we deal with loops. D79485 is going to introduce much more extensive use of loops, which will be problematic without this change. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D84838	2020-08-05 11:19:24 +07:00
Mircea Trofin	96e978c534	[llvm] Expose type and element count-related APIs on TensorSpec Added a mechanism to check the element type, get the total element count, and the size of an element. Differential Revision: https://reviews.llvm.org/D85250	2020-08-04 17:32:16 -07:00
Mircea Trofin	d0d6d07a22	[llvm][NFC] Moved implementation of TrainingLogger outside of its decl Also renamed a method - printTensor - to print; and added comments.	2020-08-04 14:35:35 -07:00
Xavier Denis	fc5d02688b	[InstSimplify] Peephole optimization for icmp (urem X, Y), X This revision adds the following peephole optimization and its negation: %a = urem i64 %x, %y %b = icmp ule i64 %a, %x ====> %b = true With John Regehr's help this optimization was checked with Alive2 which suggests it should be valid. This pattern occurs in the bound checks of Rust code, the program const N: usize = 3; const T = u8; pub fn split_mutiple(slice: &[T]) -> (&[T], &[T]) { let len = slice.len() / N; slice.split_at(len * N) } the method call slice.split_at will check that len * N is within the bounds of slice, this bounds check is after some transformations turned into the urem seen above and then LLVM fails to optimize it any further. Adding this optimization would cause this bounds check to be fully optimized away. ref: https://github.com/rust-lang/rust/issues/74938 Differential Revision: https://reviews.llvm.org/D85092	2020-08-04 20:48:37 +02:00
Sanjay Patel	d960c0f146	[InstSimplify] refactor min/max folds with shared operand; NFC	2020-08-04 12:21:05 -04:00
Sanjay Patel	46f44cd641	[InstSimplify] fold nested min/max intrinsics with constant operands This is based on the existing code for the non-intrinsic idioms in InstCombine. The vector constant constraint is non-obvious: undefs should be ok in the outer call, but they can't propagate safely from the inner call in all cases. Example: https://alive2.llvm.org/ce/z/-2bVbM define <2 x i8> @src(<2 x i8> %x) { %0: %m = umin <2 x i8> %x, { 7, undef } %m2 = umin <2 x i8> { 9, 9 }, %m ret <2 x i8> %m2 } => define <2 x i8> @tgt(<2 x i8> %x) { %0: %m = umin <2 x i8> %x, { 7, undef } ret <2 x i8> %m } Transformation doesn't verify! ERROR: Value mismatch Example: <2 x i8> %x = < undef, undef > Source: <2 x i8> %m = < #x00 (0) [based on undef value], #x00 (0) > <2 x i8> %m2 = < #x00 (0), #x00 (0) > Target: <2 x i8> %m = < #x07 (7), #x10 (16) > Source value: < #x00 (0), #x00 (0) > Target value: < #x07 (7), #x10 (16) >	2020-08-04 08:44:48 -04:00
Sanjay Patel	5cdd72d32f	[InstSimplify] reduce code for min/max analysis; NFC This should probably be moved up to some common area eventually when there's another user.	2020-08-04 08:02:33 -04:00
David Green	9fdafb1a2b	[BasicAA] Enable -basic-aa-recphi by default This option was added a while back, to help improve AA around pointer phi loops. It looks for phi(gep(phi, const), x) loops, checking if x can then prove more precise aliasing info. Differential Revision: https://reviews.llvm.org/D82998	2020-08-04 10:43:42 +01:00
Alina Sbirlea	592b072474	[MemorySSA] Restrict optimizations after a PhiTranslation. Merging alias results from different paths, when a path did phi translation, is not necessarily correct. Conservatively terminate such paths. Aimed to fix PR46156. Differential Revision: https://reviews.llvm.org/D84905	2020-08-03 14:46:41 -07:00
Sanjay Patel	e545f2a08e	[InstSimplify] fold variations of max-of-min with common operand https://alive2.llvm.org/ce/z/ZtxpZ3	2020-08-03 15:02:46 -04:00
Mircea Trofin	1cbf2902fb	[llvm] Add a parser from JSON to TensorSpec A JSON->TensorSpec utility we will use subsequently to specify additional outputs needed for certain training scenarios. Differential Revision: https://reviews.llvm.org/D84976	2020-08-03 09:49:31 -07:00
Florian Hahn	6d82efa764	[SCEV] If Start>=RHS, simplify (Start smin RHS) = RHS for trip counts. In some cases, it seems like we can get rid of unnecessary s/umins by using information from the loop guards (unless I am missing something). One place where this seems to be helpful in practice is when computing loop trip counts. This patch just changes howManyGreaterThans for now. Note that this requires a loop for which we can check 'is guarded'. On SPEC2000/SPEC2006/MultiSource, there are some notable changes for some programs in the number of loops unrolled and trip counts computed. ``` Same hash: 179 (filtered out) Remaining: 58 Metric: scalar-evolution.NumTripCountsComputed Program base patch diff test-suite...langs-C/compiler/compiler.test 25.00 31.00 24.0% test-suite.../Applications/SPASS/SPASS.test 2020.00 2323.00 15.0% test-suite...langs-C/allroots/allroots.test 29.00 32.00 10.3% test-suite.../Prolangs-C/loader/loader.test 17.00 18.00 5.9% test-suite...fice-ispell/office-ispell.test 253.00 265.00 4.7% test-suite...006/450.soplex/450.soplex.test 3552.00 3692.00 3.9% test-suite...chmarks/MallocBench/gs/gs.test 453.00 470.00 3.8% test-suite...ngs-C/assembler/assembler.test 29.00 30.00 3.4% test-suite.../Benchmarks/Ptrdist/bc/bc.test 263.00 270.00 2.7% test-suite...rks/FreeBench/pifft/pifft.test 722.00 741.00 2.6% test-suite...count/automotive-bitcount.test 41.00 42.00 2.4% test-suite...0/253.perlbmk/253.perlbmk.test 1417.00 1451.00 2.4% test-suite...000/197.parser/197.parser.test 387.00 396.00 2.3% test-suite...lications/sqlite3/sqlite3.test 1168.00 1189.00 1.8% test-suite...000/255.vortex/255.vortex.test 173.00 176.00 1.7% Metric: loop-unroll.NumUnrolled Program base patch diff test-suite...langs-C/compiler/compiler.test 1.00 3.00 200.0% test-suite.../Applications/SPASS/SPASS.test 134.00 234.00 74.6% test-suite...count/automotive-bitcount.test 3.00 4.00 33.3% test-suite.../Prolangs-C/loader/loader.test 3.00 4.00 33.3% test-suite...langs-C/allroots/allroots.test 3.00 4.00 33.3% test-suite...Source/Benchmarks/sim/sim.test 10.00 12.00 20.0% test-suite...fice-ispell/office-ispell.test 21.00 25.00 19.0% test-suite.../Benchmarks/Ptrdist/bc/bc.test 32.00 38.00 18.8% test-suite...006/450.soplex/450.soplex.test 300.00 352.00 17.3% test-suite...rks/FreeBench/pifft/pifft.test 60.00 69.00 15.0% test-suite...chmarks/MallocBench/gs/gs.test 57.00 63.00 10.5% test-suite...ngs-C/assembler/assembler.test 10.00 11.00 10.0% test-suite...0/253.perlbmk/253.perlbmk.test 145.00 157.00 8.3% test-suite...000/197.parser/197.parser.test 43.00 46.00 7.0% test-suite...TimberWolfMC/timberwolfmc.test 205.00 214.00 4.4% Geomean difference 7.6% ``` Fixes https://bugs.llvm.org/show_bug.cgi?id=46939 Fixes https://bugs.llvm.org/show_bug.cgi?id=46924 on X86. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D85046	2020-08-03 17:22:42 +01:00
Vitaly Buka	5f096386c6	[StackSafety, NFC] Don't insert empty objects into the map Result should be the same but it makes generateParamAccessSummary 5x faster.	2020-08-02 13:58:56 -07:00
Sanjay Patel	a4319ccb7e	[InstSimplify] fold max (max X, Y), X --> max X, Y https://alive2.llvm.org/ce/z/VGgG3M	2020-08-02 11:50:58 -04:00
Nikita Popov	86f94d1395	[InstSimplify] Reduce code duplication in icmp of binop folds (NFC) For folds where we check for the binop on both the LHS and RHS, extract a function that expects it on the LHS and call it with swapped order.	2020-08-02 15:47:18 +02:00
Kazu Hirata	6e4cee6f1e	Use llvm::is_contained where appropriate (NFC) Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D85083	2020-08-01 21:51:06 -07:00
Craig Topper	183d6fbbe7	[InstSimplify] Fold abs(abs(x)) -> abs(x) It's always safe to pick the earlier abs regardless of the nsw flag. We'll just lose it if it is on the outer abs but not the inner abs. Differential Revision: https://reviews.llvm.org/D85053	2020-08-01 13:25:00 -07:00
Sanjay Patel	afc3df8955	[InstSimplify] simplify abs if operand is known non-negative abs() should be rare enough that using value tracking is not going to be a compile-time cost burden, so use it to reduce a variety of potential patterns. We do this in DAGCombiner too. Differential Revision: https://reviews.llvm.org/D85043	2020-08-01 07:47:06 -04:00
Craig Topper	9eeafe7bcb	[ValueTracking] Improve llvm.abs handling in computeKnownBits. Add the optimizations we have in the SelectionDAG version. Known non-negative copies all known bits. Any known one other than the sign bit makes result non-negative. Differential Revision: https://reviews.llvm.org/D85000	2020-07-31 15:55:03 -07:00
Sanjay Patel	65a64fcb5b	[ConstantFolding] fold abs intrinsic The handling for minimum value is similar to cttz/ctlz with 0 just above this case. Differential Revision: https://reviews.llvm.org/D84942	2020-07-31 14:08:44 -04:00
Craig Topper	2c4eee97f8	[ValueTracking] Add ComputeNumSignBits support for llvm.abs intrinsic If absolute value needs to turn a negative number into a positive number it reduces the number of sign bits by at most 1. Differential Revision: https://reviews.llvm.org/D84971	2020-07-31 10:59:12 -07:00
Vitaly Buka	1bae08d2a5	[NFC] Remove unused GetUnderlyingObject parameter Depends on D84617. Differential Revision: https://reviews.llvm.org/D84621	2020-07-31 02:10:03 -07:00
Vitaly Buka	4ee4573a60	[NFC] GetUnderlyingObject -> getUnderlyingObject I am going to touch them in the next patch anyway	2020-07-30 21:08:24 -07:00
Vitaly Buka	0093612032	[ValueTracking] Remove AllocaForValue parameter findAllocaForValue uses AllocaForValue to cache resolved values. The function is used only to resolve arguments of lifetime intrinsics, which usually are not far from allocas. So result reuse is likely unnoticeable. In followup patches I'd like to replace the function with GetUnderlyingObjects. Depends on D84616. Differential Revision: https://reviews.llvm.org/D84617	2020-07-30 18:48:34 -07:00
Vitaly Buka	fe28af466f	[NFC] Move findAllocaForValue into ValueTracking.h Differential Revision: https://reviews.llvm.org/D84616	2020-07-30 18:22:59 -07:00
Craig Topper	5b76494391	[ValueTracking] Add basic computeKnownBits support for llvm.abs intrinsic This includes basic support for computeKnownBits on abs. I've left FIXMEs for more complicated things we could do. Differential Revision: https://reviews.llvm.org/D84963	2020-07-30 16:26:54 -07:00
Florian Hahn	db60ce547b	[LAA] Avoid adding pointers to the checks if they are not needed. Currently we skip alias sets with only reads or a single write and no reads, but still add the pointers to the list of pointers in RtCheck. This can lead to cases where we try to access a pointer that does not exist when grouping checks. In most cases, the way we access PositionMap masked that, as the value would default to index 0. But in the example in PR46854 it causes a crash. This patch updates the logic to avoid adding pointers for alias sets that do not need any checks. It makes things slightly more verbose, by first checking the numbers of reads/writes and bailing out early if we don't need checks for the alias set. I think this makes the logic a bit simpler to follow. Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D84608	2020-07-30 19:21:14 +01:00
Yuanfang Chen	e1803bebb8	[NewPM][PassInstrument] Add PrintPass callback to StandardInstrumentations Problem: Right now, our "Running pass" is not accurate when passes are wrapped in an adaptor because the adaptor is never skipped and a pass could be skipped. The other problem is that "Running pass" for an adaptor is before any "Running pass" of passes/analyses it depends on. (for example, FunctionToLoopPassAdaptor). So the order of printing is not the actual order. Solution: Doing things like PassManager::Debuglogging is very intrusive because we need to specify Debuglogging whenever an adaptor is created. (Actually, right now we're not specifying Debuglogging for some sub-PassManagers. Check PassBuilder) This patch moves debug logging for passes into a PassInstrument callback. We can be sure that all running passes are logged and in the correct order. This could also be used to implement hierarchy pass logging in legacy PM. We could also move logging of pass manager to this if we want. The test fixes look messy. They include these changes: - Remove PassInstrumentationAnalysis - Remove PassAdaptor - If a PassAdaptor is for a real pass, the pass is added - Pass reorder (to the correct order), related to PassAdaptor - Add missing passes (due to Debuglogging not passed down) Reviewed By: asbirlea, aeubanks Differential Revision: https://reviews.llvm.org/D84774	2020-07-30 10:07:57 -07:00
Mircea Trofin	ff4bf8bfb5	[llvm][NFC] TensorSpec abstraction for ML evaluator Further abstracting the specification of a tensor, to more easily support different types and shapes of tensor, and also to perform initialization up-front, at TFModelEvaluator construction time. Differential Revision: https://reviews.llvm.org/D84685	2020-07-29 16:29:21 -07:00
Sanjay Patel	486acf841a	[InstSimplify] fold min/max intrinsic with undef operand	2020-07-29 17:03:50 -04:00
Sanjay Patel	b787afe9ff	[InstSimplify] fold min/max with opposite of limit value	2020-07-29 17:03:50 -04:00
Nikita Popov	fca04145c2	[ConstantRange] Add API for intrinsics (NFC) This adds a common API for computing constant ranges of intrinsics. The intention here is that a) we can reuse the same code across different passes that handle constant ranges, i.e. this can be reused in SCCP b) we only have to add knowledge about supported intrinsics to ConstantRange, not any consumers. Differential Revision: https://reviews.llvm.org/D84587	2020-07-29 22:16:27 +02:00
Craig Topper	7c95515f0a	[LV] Add abs/smin/smax/umin/umax intrinsics to isTriviallyVectorizable This patch adds support for vectorizing these intrinsics. Differential Revision: https://reviews.llvm.org/D84796	2020-07-29 10:23:07 -07:00
Sanjay Patel	26020de8d8	[InstSimplify] try constant folding intrinsics before general simplifications This matches the behavior of simplify calls for regular opcodes - rely on ConstantFolding before spending time on folds with variables. I am not aware of any diffs from this re-ordering currently, but there was potential for unintended behavior from the min/max intrinsics because that code is implicitly assuming that only 1 of the input operands is constant.	2020-07-29 13:18:40 -04:00
Sanjay Patel	6422893580	[InstSimplify] allow partial undef constants for vector min/max folds	2020-07-29 11:53:41 -04:00
Sanjay Patel	41638d7550	[InstSimplify] fold integer min/max intrinsic with same args	2020-07-29 11:53:41 -04:00
Sanjay Patel	af9b68aef4	[ConstantFolding] fold integer min/max intrinsics If both operands are undef, return undef. If one operand is undef, clamp to limit constant.	2020-07-29 11:01:13 -04:00
David Green	49873f2449	[Analysis] TTI: Add CastContextHint for getCastInstrCost Currently, getCastInstrCost has limited information about the cast it's rating, often just the opcode and types. Sometimes there is a context instruction as well, but it isn't trustworthy: for instance, when the vectorizer is rating a plan, it calls getCastInstrCost with the old instructions when, in fact, it's trying to evaluate the cost of the instruction post-vectorization. Thus, the current system can get the cost of certain casts incorrect as the correct cost can vary greatly based on the context in which it's used. For example, if the vectorizer queries getCastInstrCost to evaluate the cost of a sext(load) with tail predication enabled, getCastInstrCost will think it's free most of the time, but it's not always free. On ARM MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar situations can come up with how masked loads can be extended when being split. To fix that, this patch adds a new parameter to getCastInstrCost to give it a hint about the context of the cast. It adds a CastContextHint enum which contains the type of the load/store being created by the vectorizer - one for each of the types it can produce. Original patch by Pierre van Houtryve Differential Revision: https://reviews.llvm.org/D79162	2020-07-29 13:32:53 +01:00
Sanjay Patel	5fbd1942fd	[InstSimplify] allow undefs in icmp with vector constant folds This is the main icmp simplification shortcoming seen in D84655. Alive2 agrees that the basic examples are correct at least: define <2 x i1> @src(<2 x i8> %x) { %0: %r = icmp sle <2 x i8> { undef, 128 }, %x ret <2 x i1> %r } => define <2 x i1> @tgt(<2 x i8> %x) { %0: ret <2 x i1> { 1, 1 } } Transformation seems to be correct! define <2 x i1> @src(<2 x i32> %X) { %0: %A = or <2 x i32> %X, { 63, 63 } %B = icmp ult <2 x i32> %A, { undef, 50 } ret <2 x i1> %B } => define <2 x i1> @tgt(<2 x i32> %X) { %0: ret <2 x i1> { 0, 0 } } Transformation seems to be correct! https://alive2.llvm.org/ce/z/omt2ee https://alive2.llvm.org/ce/z/GW4nP_ Differential Revision: https://reviews.llvm.org/D84762	2020-07-28 15:13:53 -04:00
Evgeniy Brevnov	7532dd9434	[BPI] Fix memory leak reported by sanitizer bots There is a silly mistake where release() is used instead of reset() to free the resources of a unique pointer. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D84747	2020-07-28 19:53:46 +07:00
Evgeniy Brevnov	06a0c0fb9c	[BPI][NFC] Consolidate code to deal with SCCs under a dedicated data structure. In order to facilitate review of D79485 here is a small NFC change which restructures code around handling of SCCs in BPI. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D84514	2020-07-28 17:42:33 +07:00
Alina Sbirlea	bb6015d9c6	[GraphDiff] Use class method getChildren instead of GraphTraits. Summary: Use getChildren() method in GraphDiff instead of GraphTraits. This simplifies the code and allows for refactorings inside GraphDiff. All usecase need not have a light-weight/copyable range. Clean GraphTraits implementation. Reviewers: dblaikie Subscribers: hiraditya, llvm-commits, george.burgess.iv Tags: #llvm Differential Revision: https://reviews.llvm.org/D84562	2020-07-27 16:12:34 -07:00
Kazu Hirata	0d01902704	Use llvm::is_contained where appropriate (NFC) Summary: This patch replaces std::find with llvm::is_contained where appropriate. Reviewers: efriedma, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr Tags: #llvm Differential Revision: https://reviews.llvm.org/D84489	2020-07-27 10:20:44 -07:00
Sergey Dmitriev	02505ee23a	[CallGraph] Preserve call records vector when replacing call edge Summary: Try not to resize vector of call records in a call graph node when replacing call edge. That would prevent invalidation of iterators stored in the CG SCC pass manager's scc_iterator. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84295	2020-07-27 06:02:55 -07:00
Sanjay Patel	3c7b643ee6	[InstSimplify] fold integer min/max intrinsics with limit constant	2020-07-26 09:41:54 -04:00
Sanjay Patel	c36efeb0c6	[InstSimplify] fold fcmp using isKnownNeverInfinity + isKnownNeverNaN Follow-up to D84035 / rG7393d7574c09. This sidesteps a question of FMF/poison on fcmp raised in PR46077: http://bugs.llvm.org/PR46077 https://alive2.llvm.org/ce/z/TCsyzD define i1 @src(float %x) { %0: %x42 = fadd nnan ninf float %x, 42.000000 %r = fcmp ueq float %x42, inf ret i1 %r } => define i1 @tgt(float %x) { %0: ret i1 0 } Transformation seems to be correct! https://alive2.llvm.org/ce/z/FQaH7a define i1 @src(i8 %x) { %0: %cast = uitofp i8 %x to float %r = fcmp one float inf, %cast ret i1 %r } => define i1 @tgt(i8 %x) { %0: ret i1 1 } Transformation seems to be correct!	2020-07-26 09:04:37 -04:00
Juneyoung Lee	2b46786139	[ConstantFolding] Fold freeze if it is never undef or poison This is a simple patch that adds constant folding for freeze instruction. IIUC, it isn't needed to update ConstantFold.cpp because there is no freeze constexpr. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84597	2020-07-26 21:54:44 +09:00
Juneyoung Lee	e7db975cc1	[ValueTracking] Instruction::isBinaryOp should be used for constexprs This is a simple patch that makes canCreateUndefOrPoison use Instruction::isBinaryOp because BinaryOperator inherits Instruction. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84596	2020-07-26 21:48:51 +09:00

9731 Commits