llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 11:13:28 +01:00

Author	SHA1	Message	Date
Tony	a3714ce03c	[AMDGPU] DWARF proposal changes for expression context - Clarify what context is used in DWARF expression evaluation. - Define location descriptions to fully resolve the context and so include the context in their result. - As a consequence of location descriptions being fully resolved, change address spaces so only a swizzled and unswizzled private address space is defined. The lane is now part of the location description context. - Clarify how call frame information is used to fully resolve expressions that specify registers. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D70523	2020-07-30 01:59:22 +00:00
Matt Arsenault	c9d9ecfba5	GlobalISel: Use result of find rather than rechecking map	2020-07-29 21:26:20 -04:00
Matt Arsenault	71c6e8a505	GlobalISel: Handle assorted no-op intrinsics SelectionDAGBuilder just drops these, so do the same.	2020-07-29 21:26:20 -04:00
Juneyoung Lee	82bb41ae0f	[JumpThreading] Fold br(freeze(undef)) This patch makes JumpThreading fold br(freeze(undef)) if the freeze instruction is only used by the branch. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84818	2020-07-30 09:38:50 +09:00
Matt Arsenault	e2b102c48a	GlobalISel: Handle llvm.roundeven I still think it's highly questionable that we have two intrinsics with identical behavior and only vary by the name of the libcall used if it happens to be lowered that way, but try to reduce the feature delta between SDAG and GlobalISel for recently added intrinsics. I'm not sure which opcode should be considered the canonical one, but lower roundeven back to round.	2020-07-29 20:01:12 -04:00
Mircea Trofin	ff4bf8bfb5	[llvm][NFC] TensorSpec abstraction for ML evaluator Further abstracting the specification of a tensor, to more easily support different types and shapes of tensor, and also to perform initialization up-front, at TFModelEvaluator construction time. Differential Revision: https://reviews.llvm.org/D84685	2020-07-29 16:29:21 -07:00
Hiroshi Yamauchi	ba8d265716	Revert "[PGO] Include the mem ops into the function hash." This reverts commit 120e66b3418b37b95fc1dbbb23e296a602a24fa8. Due to a buildbot failure.	2020-07-29 15:04:57 -07:00
Craig Topper	13dfcf7ce3	[X86] Remove unused argument from HandleAVX512Operand in the assembly parser.	2020-07-29 14:23:01 -07:00
Arthur Eubanks	490b642e95	[opt][NewPM] Fix typo From https://reviews.llvm.org/D84872.	2020-07-29 14:20:01 -07:00
Sanjay Patel	486acf841a	[InstSimplify] fold min/max intrinsic with undef operand	2020-07-29 17:03:50 -04:00
Sanjay Patel	b787afe9ff	[InstSimplify] fold min/max with opposite of limit value	2020-07-29 17:03:50 -04:00
Hiroshi Yamauchi	8659ad4aa0	[PGO] Include the mem ops into the function hash. To avoid hash collisions when the only difference is in mem ops. Differential Revision: https://reviews.llvm.org/D84782	2020-07-29 13:59:40 -07:00
Arthur Eubanks	ea8ba1b79f	[NewPM][opt] Revert to legacy PM when any codegen passes are specified This reduces the number of check-llvm failures by 500. Ideally we'd have a codegen version of PassRegistry.def, or have all the codegen passes ported and put into PassRegistry.def. But since that doesn't exist yet, hardcode the list of codegen IR passes. There are still codegen passes missing from this list, I'll add them later as I stumble upon them. Reviewed By: asbirlea, ychen Differential Revision: https://reviews.llvm.org/D84872	2020-07-29 13:55:11 -07:00
Philip Reames	f22040ec6c	[Statepoint] Enable cross block relocates w/vreg lowering This change is mechanical, it just removes the restriction and updates tests. The key building blocks were submitted in 31342eb and 8fe2abc. Note that this (and preceding changes) entirely subsumes D83965. I did include a couple of its tests. From the codegen changes, an interesting observation: this doesn't actually reduce spilling, it just lets the register allocator do its job. That results in a slightly different overall result which has both pros and cons over the eager spill lowering. (i.e. We'll have some perf tuning to do once this is stable.)	2020-07-29 13:32:51 -07:00
Nikita Popov	fca04145c2	[ConstantRange] Add API for intrinsics (NFC) This adds a common API for computing constant ranges of intrinsics. The intention here is that a) we can reuse the same code across different passes that handle constant ranges, i.e. this can be reused in SCCP b) we only have to add knowledge about supported intrinsics to ConstantRange, not any consumers. Differential Revision: https://reviews.llvm.org/D84587	2020-07-29 22:16:27 +02:00
Victor Huang	827e26bee8	[PowerPC] Support for R_PPC64_REL24_NOTOC calls where the caller has no TOC and the callee is not DSO local This patch supports the situation where the caller does not have a valid TOC and calls using the R_PPC64_REL24_NOTOC relocation and the callee is not DSO local. In this case the call cannot be made directly since the callee may or may not require a valid TOC pointer. As a result this situation requires a PC-relative plt stub to set up r12. Reviewed By: sfertile, MaskRay, stefanp Differential Revision: https://reviews.llvm.org/D83669	2020-07-29 19:49:28 +00:00
Simon Pilgrim	8c1ad1e305	[X86][AVX] isHorizontalBinOp - relax no-lane-crossing limit for AVX1-only targets. Instead of never accepting v8f32/v4f64 FHADD/FHSUB if the input shuffle masks cross lanes, perform the matching and determine if the post shuffle mask simplifies to a 'whole lane shuffle' mask - in which case we are guaranteed to cheaply perform this as a VPERM2F128 shuffle.	2020-07-29 20:49:10 +01:00
Philip Reames	33a681d4a6	[Tests] Split a file for ease of update	2020-07-29 12:45:04 -07:00
Florian Hahn	bf9e3782d5	Reland "[SCEVExpander] Add option to preserve LCSSA directly." This reverts the revert commit dc2867576886247cbe351e7c63618c09ab6af808. It includes a fix for Polly, which uses SCEVExpander on IR that is not in LCSSA form. Set PreserveLCSSA = false in that case, to ensure we do not introduce LCSSA phis where there were none before.	2020-07-29 20:41:53 +01:00
Stanislav Mekhanoshin	d6bf6e55ee	[AMDGPU] Fixed formatting in GCNHazardRecognizer.cpp. NFC.	2020-07-29 12:21:28 -07:00
Stanislav Mekhanoshin	683cf8a2b5	[AMDGPU] prefer non-mfma in post-RA schedule MFMA instructions shall not be scheduled back to back to avoid MAI SIMD stall. Tell post-RA schedule we would prefer some other instruction instead. Differential Revision: https://reviews.llvm.org/D84883	2020-07-29 12:17:50 -07:00
Matt Arsenault	d279d0e1dd	GlobalISel: Fix insert point in CSEMIRBuilder unit test This was using invalid MIR for the test instructions. The test add was the first instruction in the block, before the trunc inputs or copies from physical registers which I assume was not intended.	2020-07-29 15:08:42 -04:00
Baptiste Saleil	8004150937	[PowerPC] Add options to control paired vector memops support Adds frontend and backend options to enable and disable the PowerPC paired vector memory operations added in ISA 3.1. Instructions using these options will be added in subsequent patches. Differential Revision: https://reviews.llvm.org/D83722	2020-07-29 14:00:53 -05:00
Matt Morehouse	05810ce897	[DFSan] Add efficient fast16labels instrumentation mode. Adds the -fast-16-labels flag, which enables efficient instrumentation for DFSan when the user needs <=16 labels. The instrumentation eliminates most branches and most calls to __dfsan_union or __dfsan_union_load. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D84371	2020-07-29 18:58:47 +00:00
Amara Emerson	d7527ffdb2	[GlobalISel] Add G_INTRINSIC_LRINT and translate from llvm.lrint Differential Revision: https://reviews.llvm.org/D84551	2020-07-29 11:51:04 -07:00
Philip Reames	742638c542	[Statepoint] Consolidate relocation type tracking [NFC] Change the way we track how a particular pointer was relocated at a statepoint in selection dag. Previously, we used an optional<location> for the spill lowering, and a block local Register for the newly introduced vreg lowering. Combine all three lowerings (norelocate, spill, and vreg) into a single helper class, and keep a single copy of the information. This is submitted separately as it really does make the code more readable on its own, but the indirect motivation is to move vreg tracking from StatepointLowering to FunctionLoweringInfo. This is the last piece needed to support cross block relocations with vregs; that will follow in a separate (non-NFC) patch.	2020-07-29 11:45:31 -07:00
Amara Emerson	9d49eb2e5c	[AArch64][GlobalISel] Selection support for vector DUP[X]lane instructions. In future, we'd like to use the perfect-shuffle mechanism to deal with these shuffle permutations. For now, this improves performance by avoiding the super-expensive const-pool load + tbl instruction. Differential Revision: https://reviews.llvm.org/D84866	2020-07-29 11:41:37 -07:00
Matt Arsenault	a5d20b4e1d	AMDGPU/GlobalISel: Handle llvm.amdgcn.reloc.constant	2020-07-29 14:24:21 -04:00
Florian Hahn	9db6d6a866	Revert "[SCEVExpander] Add option to preserve LCSSA directly." This reverts commit 99166fd4fb422351f131fb1265cb85d5f6c5b8da, because it breaks the polly builders. polly/test/Isl/CodeGen/invariant_load_escaping_second_scop.ll fails because an apparently unnecessary LCSSA phi node is introduced. Make the bots green again, while I take a closer look.	2020-07-29 19:19:04 +01:00
Matt Arsenault	2428f2b78e	GlobalISel: Implement lower for G_EXTRACT_VECTOR_ELT Use the basic store to stack and reload.	2020-07-29 14:16:28 -04:00
Jessica Paquette	6898e4fc01	[AArch64][GlobalISel] Select XRO addressing mode with wide immediates Port the wide immediate case from AArch64DAGToDAGISel::SelectAddrModeXRO. If we have a wide immediate which can't be represented in an add, we can end up with code like this: ``` mov x0, imm add x1, base, x0 ldr x2, [x1, 0] ``` If we use the [base, xN] addressing mode instead, we can produce this: ``` mov x0, imm ldr x2, [base, x0] ``` This saves 0.4% code size on 7zip at -O3, and gives a geomean code size improvement of 0.1% on CTMark. Differential Revision: https://reviews.llvm.org/D84784	2020-07-29 11:02:10 -07:00
Matt Arsenault	cd9f0064fc	AMDGPU: Relax restriction on folding immediates into physregs I never completed the work on the patches referenced by f8bf7d7f42f28fa18144091022236208e199f331, but this was intended to avoid folding immediate writes into m0 which the coalescer doesn't understand very well. Relax this to allow simple SGPR immediates to fold directly into VGPR copies. This pattern shows up routinely in current GlobalISel code since nothing is smart enough to emit VGPR constants yet.	2020-07-29 14:01:53 -04:00
Matt Arsenault	e28350d5b3	GlobalISel: Remove unreachable condition Fixes bug 46882	2020-07-29 13:42:22 -04:00
LLVM GN Syncbot	36ff7942da	[gn build] Port 276f9e8cfaf	2020-07-29 17:37:10 +00:00
Heejin Ahn	166ce9a607	[WebAssembly] Fix getBottom for loops When it was first created, CFGSort only made sure BBs in each `MachineLoop` are sorted together. After we added exception support, CFGSort now also sorts BBs in each `WebAssemblyException`, which represents a `catch` block, together, and `Region` class was introduced to be a thin wrapper for both `MachineLoop` and `WebAssemblyException`. But how we compute those loops and exceptions is different. `MachineLoopInfo` is constructed using the standard loop computation algorithm in LLVM; the definition of a loop is "a set of BBs that are dominated by a loop header and have a path back to the loop header". So even if some BBs are semantically contained by a loop in the original program, or in other words dominated by a loop header, if they don't have a path back to the loop header, they are not considered a part of the loop. For example, if a BB is dominated by a loop header but contains `call abort()` or `rethrow`, it wouldn't have a path back to the header, so it is not included in the loop. But `WebAssemblyException` is a wasm-specific data structure, and its algorithm is simple: a `WebAssemblyException` consists of an EH pad and all BBs dominated by the EH pad. So this scenario is possible: (This is also the situation in the newly added test in cfg-stackify-eh.ll) ``` Loop L: header, A, ehpad, latch Exception E: ehpad, latch, B ``` (B contains `abort()`, so it does not have a path back to the loop header, so it is not included in L.) And it is sorted in this order: ``` header A ehpad latch B ``` And when CFGStackify places `end_loop` or `end_try` markers, it previously used `WebAssembly::getBottom()`, which returns the latest BB in the sorted order, and placed the marker there. So in this case the marker placements will be like this: ``` loop header try A catch ehpad latch end_loop <-- misplaced! B end_try ``` in which nesting between the loop and the exception is not correct. 
The `end_loop` marker has to be placed after `B`, and also after `end_try`. Maybe the fundamental way to solve this problem is to come up with our own algorithm for computing loop region too, in which we include all BBs dominated by a loop header in a loop. But this takes a lot more effort. The only thing we actually need to fix is `getBottom()`. If we make it return the right BB, which means in case of a loop, the latest BB of the loop itself and all exceptions contained in there, we are good. This renames `Region` and `RegionInfo` to `SortRegion` and `SortRegionInfo` and extracts them into their own file. It also adds `getBottom` to the `SortRegionInfo` class, from which it can access `WebAssemblyExceptionInfo`, so that it can compute a correct bottom block for loops. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D84724	2020-07-29 10:36:32 -07:00
Hiroshi Yamauchi	3182381779	[PGO] Remove insignificant function hash values from some tests. This is to avoid the need to update a bunch of test files when the PGO instrumentation function hashing changes. Split off of D84782. Differential Revision: https://reviews.llvm.org/D84865	2020-07-29 10:23:42 -07:00
Craig Topper	0afe3fbed5	[X86] Add custom lowering for llvm.roundeven with sse4.1. We can use the roundss/sd/ps/pd instructions like we do for ceil/floor/trunc/rint/nearbyint. Differential Revision: https://reviews.llvm.org/D84592	2020-07-29 10:23:08 -07:00
Craig Topper	7c95515f0a	[LV] Add abs/smin/smax/umin/umax intrinsics to isTriviallyVectorizable This patch adds support for vectorizing these intrinsics. Differential Revision: https://reviews.llvm.org/D84796	2020-07-29 10:23:07 -07:00
Arthur Eubanks	6d7e104f99	[DFSan][NewPM] Port DataFlowSanitizer to NewPM Reviewed By: ychen, morehouse Differential Revision: https://reviews.llvm.org/D84707	2020-07-29 10:19:15 -07:00
Sanjay Patel	26020de8d8	[InstSimplify] try constant folding intrinsics before general simplifications This matches the behavior of simplify calls for regular opcodes - rely on ConstantFolding before spending time on folds with variables. I am not aware of any diffs from this re-ordering currently, but there was potential for unintended behavior from the min/max intrinsics because that code is implicitly assuming that only 1 of the input operands is constant.	2020-07-29 13:18:40 -04:00
Simon Pilgrim	220154b160	[DAG][AMDGPU][X86] Add SimplifyMultipleUseDemandedBits handling for SIGN/ZERO_EXTEND + SIGN/ZERO_EXTEND_VECTOR_INREG Peek through multiple use ops like we already do for ANY_EXTEND/ANY_EXTEND_VECTOR_INREG Differential Revision: https://reviews.llvm.org/D84863	2020-07-29 18:10:59 +01:00
Roman Lebedev	4a9109b967	[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHI's, and some of the loops aren't even in a rotated form. After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure. Surprisingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike its friend, common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline. I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary, here currently i've picked the same one as for code sinking, but i suppose we could enable it as soon as right after loop rotation happens. Experimentation shows that this does indeed unsurprisingly help, more loops got rotated, although other issues remain elsewhere. Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run- time performance, codesize. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, some of them will be caught late. 
As per benchmarks i've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure this will expose many more pre-existing missed optimizations, as usual :S llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprisingly) * size impact varies; for ThinLTO it's actually an improvement The largest fallout appears to be in GVN's load partial redundancy elimination, it spends much more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time. If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions It is likely to be sub-optimal when optimizing for code size, so one might want to tune the pipeline by enabling sinking/hoisting when optimizing for size. 
Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D84108	2020-07-29 20:05:30 +03:00
Kang Zhang	19f5254989	[PowerPC] Set v1i128 to expand for SETCC to avoid crash Summary: PPC only supports instruction selection for v16i8, v8i16, v4i32, v2i64, v4f32 and v2f64 for ISD::SETCC and does not support v1i128, so v1i128 for ISD::SETCC will crash. This patch sets v1i128 to expand to avoid the crash. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D84238	2020-07-29 16:39:27 +00:00
Philip Reames	3b1e951b5e	[Statepoint] When using the tied def lowering, unconditionally use vregs [almost NFC] This builds on 3da1a96 on the path towards supporting invokes and cross block relocations. The actual change attempts to be NFC, but does fail in one corner-case explained below. The change itself is fairly mechanical. Rather than remember SDValues - which are inherently block local - immediately produce a virtual register copy and remember that. Once this lands, we'll update the FunctionLoweringInfo::StatepointSpillMap map to allow register based lowerings, delete VirtRegs from StatepointLowering, and drop the restriction against cross block relocations. I deliberately separate the semantic part into its own change for ease of understanding and fault isolation. The corner-case which isn't quite NFC is that the old implementation implicitly CSEd gc.relocates of the same SDValue regardless of type. The new implementation still only relocates once, but it produces distinct vregs for the bitcast and its source, whereas SelectionDAG's generic CSE was able to remove the bitcast in the old implementation. Note that the final assembly doesn't change (at least in the test), as our MI level optimizations catch the duplication. I assert that this is an uninteresting corner-case. It's functionally correct, and if we find a case where this influences performance, we should really be canonicalizing types to i8* at the IR level. Differential Revision: https://reviews.llvm.org/D84692	2020-07-29 09:23:52 -07:00
Arthur Eubanks	00d3b0ba7d	[NewPM][Attributor] Pin tests with -attributor to legacy PM All these tests already explicitly test against both legacy PM and NPM. $ sed -i 's/ -attributor / -attributor -enable-new-pm=0 /g' $(rg --path-separator // -l -- -passes=) $ sed -i 's/ -attributor-cgscc / -attributor-cgscc -enable-new-pm=0 /g' $(rg --path-separator // -l -- -passes=) Now all tests in Transforms/Attributor/ pass under NPM. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D84813	2020-07-29 09:02:30 -07:00
Sanjay Patel	6422893580	[InstSimplify] allow partial undef constants for vector min/max folds	2020-07-29 11:53:41 -04:00
Sanjay Patel	41638d7550	[InstSimplify] fold integer min/max intrinsic with same args	2020-07-29 11:53:41 -04:00
Kang Zhang	bb99550f85	[MachineVerifier] Handle the PHI node for verifyLiveVariables() Summary: When verifying LiveVariables, the MachineVerifier pass calculates the LiveVariables itself and compares the result with the one the livevars pass produced. If they differ, verifyLiveVariables() reports an error. But when we calculate the LiveVariables in MachineVerifier, we don't consider the PHI node, while livevars does. This patch fixes that bug. Reviewed By: bjope Differential Revision: https://reviews.llvm.org/D80274	2020-07-29 15:43:47 +00:00
Matt Arsenault	43dbbcf721	AMDGPU: Account for the size of LDS globals used through constant expressions. Also "fix" the longstanding bug where the computed size depends on the order of the visitation. We could try to predict the allocation order used by legalization, but it would never be 100% perfect. Until we start fixing the addresses somehow (or have a more reliable allocation scheme later), just try to compute the size based on the worst case padding.	2020-07-29 11:40:42 -04:00
David Sherwood	82faee9523	[SVE] Don't consider scalable vector types in SLPVectorizerPass::vectorizeChainsInBlock In vectorizeChainsInBlock we try to collect chains of PHI nodes that have the same element type, but the code is relying upon the implicit conversion from TypeSize -> uint64_t. For now, I have modified the code to ignore PHI nodes with scalable types. Differential Revision: https://reviews.llvm.org/D83542	2020-07-29 16:29:19 +01:00
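
The `[WebAssembly] Fix getBottom for loops` entry above describes computing a loop's bottom block from the loop itself plus the exceptions it contains. The following Python sketch only illustrates that idea on the example from the commit message; it is not the LLVM implementation. The function names, the use of plain sets and lists, and the rule "an exception belongs to the loop when its EH pad is a loop block" are assumptions made for illustration.

```python
def naive_bottom(order, region):
    """Return the block of `region` that appears last in the sorted order."""
    position = {bb: i for i, bb in enumerate(order)}
    return max(region, key=position.__getitem__)


def loop_bottom(order, loop, exceptions):
    """Bottom of a loop, also counting exceptions whose EH pad is in the loop.

    `exceptions` maps each EH pad to the set of blocks it dominates
    (hypothetical representation, chosen for this sketch only).
    """
    covered = set(loop)
    for ehpad, blocks in exceptions.items():
        if ehpad in loop:
            covered |= blocks
    return naive_bottom(order, covered)


# Example taken from the commit message: B is dominated by ehpad but has no
# path back to the loop header, so it is in exception E but not in loop L.
order = ["header", "A", "ehpad", "latch", "B"]
loop_l = {"header", "A", "ehpad", "latch"}
exceptions = {"ehpad": {"ehpad", "latch", "B"}}

old_bottom = naive_bottom(order, loop_l)             # where end_loop used to go
new_bottom = loop_bottom(order, loop_l, exceptions)  # where it should go
print(old_bottom, new_bottom)
```

With the old rule the marker lands after `latch`, inside the `try`; with the extended rule it lands after `B`, so `end_loop` can follow `end_try`.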


201026 Commits
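
For the `[DFSan] Add efficient fast16labels instrumentation mode` entry in the log above, the sketch below shows one way a limit of 16 labels can remove branches and union calls: if each base label occupies one bit of a 16-bit value, combining labels is a single bitwise OR. This is an illustration under that assumption, not the DFSan runtime; all function names here are hypothetical.

```python
NUM_LABELS = 16  # assumed limit, matching the "<=16 labels" wording


def base_label(index):
    """Return the label for base label number `index` (0-based), one bit each."""
    if not 0 <= index < NUM_LABELS:
        raise ValueError("only 16 base labels fit in a 16-bit value")
    return 1 << index


def union(a, b):
    """Combine two label sets without branching or table lookups."""
    return a | b


def has_label(combined, label):
    """Check whether every bit of `label` is present in `combined`."""
    return (combined & label) == label


tainted_input = base_label(0)
tainted_key = base_label(3)
result_label = union(tainted_input, tainted_key)
print(format(result_label, "#06x"))
```

Because OR is idempotent and commutative, repeated unions of the same labels need no deduplication step, which is the property this sketch is meant to show.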