llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00

Author	SHA1	Message	Date
Jonas Paulsson	90d0031df7	[DAGTypeLegalizer] Handle ZERO_EXTEND of promoted type in WidenVecRes_Convert. On SystemZ, a ZERO_EXTEND of an i1 vector handled by WidenVecRes_Convert() always ended up being scalarized, because the type action of the input is promotion which was previously an unhandled case in this method. This fixes https://bugs.llvm.org/show_bug.cgi?id=47132. Differential Revision: https://reviews.llvm.org/D86268 Patch by Eli Friedman. Review: Ulrich Weigand	2020-09-08 16:49:51 +02:00
Nico Weber	e47e8722c3	[gn build] (manually) port 156b127945a8	2020-09-08 10:00:41 -04:00
Florian Hahn	15e96e9e04	[DSE,MemorySSA] Increase walker limit a bit. This slightly bumps the walker limit so that it covers more cases while not increasing compile-time too much: http://llvm-compile-time-tracker.com/compare.php?from=0fc1c2b51ba0cfb9145139af35be638333865251&to=91144a50ea4fa82c0c877e77784f60371640b263&stat=instructions	2020-09-08 14:55:46 +01:00
Sam Parker	551a2d7822	[NFC][ARM] Precommit test	2020-09-08 14:44:28 +01:00
Sanjay Patel	e7dfd86249	[InstCombine] add bitwise logic fold tests for D86395; NFC	2020-09-08 09:17:42 -04:00
Raul Tambre	ccd618fea7	[CMake] Remove dead FindPythonInterp code LLVM has bumped the minimum required CMake version to 3.13.4, so this has become dead code. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D87189	2020-09-08 15:23:23 +03:00
Simon Pilgrim	9fa62857a3	X86CallLowering.cpp - improve auto const/pointer/reference qualifiers. NFCI. Fix clang-tidy warnings by ensuring auto variables are more cleanly qualified, or just avoid auto entirely.	2020-09-08 13:01:23 +01:00
Simon Pilgrim	6acd1b7baf	X86DomainReassignment.cpp - improve auto const/pointer/reference qualifiers. NFCI. Fix clang-tidy warnings by ensuring auto variables are more cleanly qualified, or just avoid auto entirely.	2020-09-08 13:01:23 +01:00
Xing GUO	51d61a7d47	[DWARFYAML] Make the debug_ranges section optional. This patch makes the debug_ranges section optional. When we specify an empty debug_ranges section, yaml2obj only emits the section header. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D87263	2020-09-08 19:55:47 +08:00
Sam Tebbs	6a1491418e	[ARM][LowOverheadLoops] Remove modifications to the correct element count register After my patch at D86087, code that now uses the mov operand rather than the vctp operand will no longer remove modifications to the vctp operand as they should. This patch fixes that by explicitly removing modifications to the vctp operand rather than the register used as the element count.	2020-09-08 10:30:05 +01:00
Qiu Chaofan	ff9f37b205	Revert "[PowerPC] Implement instruction clustering for stores" This reverts commit 3c0b3250230b3847a2a47dfeacfdb794c2285f02, (along with ea795304 and bb39eb9e) since it breaks test with UB sanitizer.	2020-09-08 17:24:08 +08:00
Serge Guelton	602a8c1ce6	Provide anchor for compiler extensions This patch is cherry-picked from 04b0a4e22e3b4549f9d241f8a9f37eebecb62a31, and amended to prevent an undefined reference to `llvm::EnableABIBreakingChecks'	2020-09-08 10:33:38 +02:00
Xing GUO	063c6c1b3c	[obj2yaml] Stop parsing the debug_str section when it encounters a string without the null terminator. When obj2yaml encounters a string without the null terminator, it should stop parsing the debug_str section. This patch addresses comments in [D86867](https://reviews.llvm.org/D86867#inline-803291). Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D87261	2020-09-08 16:09:36 +08:00
Max Kazantsev	19662073e2	[Test] More tests where IndVars fails to eliminate a range check	2020-09-08 14:43:29 +07:00
Qiu Chaofan	d230f91a15	[PowerPC] Fix getMemOperandWithOffsetWidth Commit 3c0b3250 introduced memory cluster under pwr10 target, but a check for operands was unexpectedly removed. This adds it back to avoid regression.	2020-09-08 15:35:25 +08:00
Simon Wallis	cdb9b6e903	[AARCH64][RegisterCoalescer] clang miscompiles zero-extension to long long Implement AArch64 variant of shouldCoalesce() to detect a known failing case and prevent the coalescing of a 32-bit copy into a 64-bit sign-extending load. Do not coalesce in the following case: COPY where source is bottom 32 bits of a 64-bit register, and destination is a 32-bit subregister of a 64-bit register, i.e. it causes the rest of the register to be implicitly set to zero. A MIR test has been added. In the test case, the 32-bit copy implements a 32 to 64 bit zero extension and relies on the upper 32 bits being zeroed. Coalescing to the result of the 64-bit load meant overwriting the upper 32 bits incorrectly when the loaded byte was negative. Reviewed By: john.brawn Differential Revision: https://reviews.llvm.org/D85956	2020-09-08 08:04:52 +01:00
Mikael Holmen	5af75a16f7	[PowerPC] Add parentheses to silence gcc warning Without them, gcc 7.4 warns with ../lib/Target/PowerPC/PPCInstrInfo.cpp:2284:25: warning: suggest parentheses around '&&' within '||' [-Wparentheses] BaseOp1.isFI() && "Only base registers and frame indices are supported.");	2020-09-08 08:39:57 +02:00
Andrew Wei	4e26a597de	[LSR] Canonicalize a formula before insert it into the list In GenerateConstantOffsetsImpl, we may generate a non-canonical Formula if BaseRegs of that Formula is updated and includes a recurrent expr reg related to the current loop while its ScaledReg is not. Patched by: mdchen Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D86939	2020-09-08 13:14:53 +08:00
Johannes Doerfert	56cc4d49d5	[Attributor][FIX] Don't crash on internalizing linkonce_odr hidden functions The CloneFunctionInto has implicit requirements with regards to the linkage and visibility of the function. We now update these after we did the CloneFunctionInto on the copy with the same linkage and visibility as the original.	2020-09-07 23:38:09 -05:00
Johannes Doerfert	7745d5467f	[Attributor][NFC] Cleanup internalize test case One run line was different and probably introduced for the manually added function attribute & name checks. We can do this with the script and a check prefix used for the other run lines as well.	2020-09-07 23:38:09 -05:00
Johannes Doerfert	314c55ed96	[Attributor][NFC] Change variable spelling	2020-09-07 23:38:09 -05:00
Johannes Doerfert	926bc1dd25	[Attributor][NFC] Clang tidy: no else after continue	2020-09-07 23:38:08 -05:00
Johannes Doerfert	ba69ec7fb4	[Attributor][NFC] Expand `auto` types (clang-fix-it)	2020-09-07 23:38:08 -05:00
Johannes Doerfert	ec05659ebd	[Attributor][FIX] Properly return changed if the IR was modified Deleting or replacing anything is certainly a modification. This caused a later assertion in IPSCCP when compiling 400.perlbench with the new PM. I'm not sure how to test this.	2020-09-07 23:38:08 -05:00
Max Kazantsev	b6ff4a7671	[Test] Auto-generated checks for some IndVarSimplify tests	2020-09-08 11:15:40 +07:00
Qiu Chaofan	2bb8ef68b6	[PowerPC] Implement instruction clustering for stores On Power10, it's profitable to schedule some stores with adjacent target address together. This patch implements this feature. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D86754	2020-09-08 11:03:09 +08:00
Alexander Shaposhnikov	60f196bf71	[llvm-objcopy] Consolidate and unify version tests In this diff the tests which verify version printing functionality are refactored. Since they are not specific to a particular format we move them into tool-version.test and slightly unify (similarly to tool-name.test and tool-help-message.test). Test plan: make check-all Differential revision: https://reviews.llvm.org/D87211	2020-09-07 18:44:32 -07:00
Florian Hahn	afd0a35752	[DSE,MemorySSA] Add an early check for read clobbers to traversal. Depending on the benchmark, this early exit can save a substantial amount of compile-time: http://llvm-compile-time-tracker.com/compare.php?from=505f2d817aa8e07ba98e5fd4a8f6ff0666f89df1&to=eb4e441147f9b4b7a5fcbbc57428cadbe9e01f10&stat=instructions	2020-09-07 23:22:10 +01:00
Roman Lebedev	8406429eae	Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline This was reverted in 503deec2183d466dad64b763bab4e15fd8804239 because it caused a gigantic increase (3x) in branch mispredictions in certain benchmarks on certain CPUs, see https://reviews.llvm.org/D84108#2227365. It has since been investigated and here are the results: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html "It's an amazingly severe regression, but it's also all due to branch mispredicts (about 3x without this). The code layout looks ok so there's probably something else to deal with. I'm not sure there's anything we can reasonably do so we'll just have to take the hit for now and wait for another code reorganization to make the branch predictor a bit more happy :) Thanks for giving us some time to investigate and feel free to recommit whenever you'd like. -eric" So let's just reland this. Original commit message: I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHIs, and some of the loops aren't even in a rotated form. After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure. Surprisingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike its friend, common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline. I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary; here I've currently picked the same one as for code sinking, but I suppose we could enable it as soon as right after loop rotation happens. Experimentation shows that this does indeed unsurprisingly help: more loops got rotated, although other issues remain elsewhere. Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run-time performance and code size. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, so some of them will be caught late. As per benchmarks I've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression I saw I fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but I'm sure this will expose many more pre-existing missed optimizations, as usual :S llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprisingly) * size impact varies; for ThinLTO it's actually an improvement The largest fallout appears to be in GVN's load partial redundancy elimination, it spends much more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time. If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions It is likely to be sub-optimal when optimizing for code size, so one might want to tune the pipeline by enabling sinking/hoisting when optimizing for size. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D84108 This reverts commit 503deec2183d466dad64b763bab4e15fd8804239.	2020-09-08 00:24:03 +03:00
Nikita Popov	608aa74b19	[KnownBits] Avoid some copies (NFC) These lambdas don't need copies, use const reference.	2020-09-07 22:19:29 +02:00
Nikita Popov	6a9be09c82	[SCCP] Compute ranges for supported intrinsics For intrinsics supported by ConstantRange, compute the result range based on the argument ranges. We do this independently of whether some or all of the input ranges are full, as we can often still constrain the result in some way. Differential Revision: https://reviews.llvm.org/D87183	2020-09-07 22:16:06 +02:00
Craig Topper	558281ef53	[SelectionDAG][X86][ARM] Teach ExpandIntRes_ABS to use sra+add+xor expansion when ADDCARRY is supported. Rather than using SELECT instructions, use SRA, UADDO/ADDCARRY and XORs to expand ABS. This is the multi-part version of the sequence we use in LegalizeDAG. It's also the same as the Custom sequence uses for i64 on 32-bit and i128 on 64-bit. So we can remove the X86 customization. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D87215	2020-09-07 13:15:26 -07:00
Sanjay Patel	c81c9fc73f	[InstCombine] improve fold of pointer differences This was supposed to be an NFC cleanup, but there's a real logic difference (did not drop 'nsw') visible in some tests in addition to an efficiency improvement. This is because in the case where we have 2 GEPs, the code was always swapping the operands and negating the result. But if we have 2 GEPs, we should never need swapping/negation AFAICT. This is part of improving flags propagation noticed with PR47430.	2020-09-07 15:54:32 -04:00
Sanjay Patel	f8a11985c3	[InstCombine] add ptr difference tests; NFC	2020-09-07 15:54:32 -04:00
Craig Topper	6f4d8f61ba	[X86] Use the same sequence for i128 ISD::ABS on 64-bit targets as we use for i64 on 32-bit targets. Differential Revision: https://reviews.llvm.org/D87214	2020-09-07 11:14:05 -07:00
Craig Topper	05d9a7585d	[X86] Pre-commit new test case for D87214. NFC	2020-09-07 11:14:05 -07:00
Sanjay Patel	367b6584ea	[DAGCombiner] allow more store merging for non-i8 truncated ops This is a follow-up suggested in D86420 - if we have a pair of stores in inverted order for the target endian, we can rotate the source bits into place. The "be_i64_to_i16_order" test shows a limitation of the current function (which might be avoided if we integrate this function with the other cases in mergeConsecutiveStores). In the earlier "be_i64_to_i16" test, we skip the first 2 stores because we do not match the full set as consecutive or rotate-able, but then we reach the last 2 stores and see that they are an inverted pair of 16-bit stores. The "be_i64_to_i16_order" test alters the program order of the stores, so we miss matching the sub-pattern. Differential Revision: https://reviews.llvm.org/D87112	2020-09-07 14:12:36 -04:00
Eric Astor	ffbe9ff668	[ms] [llvm-ml] Allow use of locally-defined variables in expressions MASM allows variables defined by equate statements to be used in expressions. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D86946	2020-09-07 14:00:14 -04:00
Eric Astor	7ae8eb3e82	[ms] [llvm-ml] Fix STRUCT field alignment MASM aligns fields to the _minimum_ of the STRUCT alignment value and the size of the next field. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D86945	2020-09-07 13:58:59 -04:00
Eric Astor	93e5d34daa	[ms] [llvm-ml] Add support for bitwise named operators (AND, NOT, OR) in MASM Add support for expressions of the form '1 or 2', etc. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D86944	2020-09-07 13:57:54 -04:00
Simon Pilgrim	b19ede2649	VPlan.h - remove unnecessary forward declarations. NFCI. Already defined in includes.	2020-09-07 18:35:06 +01:00
Simon Pilgrim	2a0e71d0ce	MipsISelLowering.h - remove CCState/CCValAssign forward declarations. NFCI. These are already defined in the CallingConvLower.h include.	2020-09-07 18:15:26 +01:00
Simon Pilgrim	54e2cf9073	BTFDebug.h - reduce MachineInstr.h include to forward declaration. NFCI.	2020-09-07 17:51:13 +01:00
Simon Pilgrim	38c8707d93	LeonPasses.h - remove unnecessary includes. NFCI. Reduce to forward declarations and move includes to LeonPasses.cpp where necessary.	2020-09-07 17:51:12 +01:00
Simon Pilgrim	6c92862767	LeonPasses.h - remove orphan function declarations. NFCI. The implementations no longer exist.	2020-09-07 17:51:12 +01:00
Sanjay Patel	c688e87b99	[InstCombine] improve folds for icmp with multiply operands (PR47432) Check for no overflow along with an odd constant before we lose information by converting to bitwise logic. https://rise4fun.com/Alive/2Xl Pre: C1 != 0 %mx = mul nsw i8 %x, C1 %my = mul nsw i8 %y, C1 %r = icmp eq i8 %mx, %my => %r = icmp eq i8 %x, %y Name: nuw ne Pre: C1 != 0 %mx = mul nuw i8 %x, C1 %my = mul nuw i8 %y, C1 %r = icmp ne i8 %mx, %my => %r = icmp ne i8 %x, %y Name: odd ne Pre: C1 % 2 != 0 %mx = mul i8 %x, C1 %my = mul i8 %y, C1 %r = icmp ne i8 %mx, %my => %r = icmp ne i8 %x, %y	2020-09-07 12:40:37 -04:00
Sanjay Patel	1fd12b30bd	[InstCombine] move/add tests for icmp with mul operands; NFC	2020-09-07 12:40:37 -04:00
alex-t	c025a25bff	[AMDGPU] SILowerControlFlow::optimizeEndCF should remove empty basic block optimizeEndCF removes EXEC restoring instruction in case this instruction is the only one except the branch to the single successor and that successor contains EXEC mask restoring instruction that was lowered from END_CF belonging to IF_ELSE. As a result of such optimization we get the basic block with the only one instruction that is a branch to the single successor. In case the control flow can reach such an empty block from S_CBRANCH_EXEZ/EXECNZ it might happen that spill/reload instructions that were inserted later by register allocator are placed under exec == 0 condition and never execute. Removing empty block solves the problem. This change requires further work to re-implement LIS updates. Currently, LIS is always nullptr in this pass. To enable it we need another patch to fix many places across the codegen. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D86634	2020-09-07 19:37:27 +03:00
Momchil Velikov	8e2bba4fdc	Reduce the number of memory allocations when displaying a warning about clobbering reserved registers (NFC). Also address some minor inefficiencies and style issues. Differential Revision: https://reviews.llvm.org/D86088	2020-09-07 17:04:00 +01:00
Simon Pilgrim	8ac9bb3910	AntiDepBreaker.h - remove unnecessary ScheduleDAG.h include. NFCI.	2020-09-07 16:39:42 +01:00
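
The three Alive preconditions quoted in commit c688e87b99 above (icmp of two multiplies by the same constant) can be sanity-checked by brute force on i8. The sketch below is not LLVM code. It models `mul` as wrapping 8-bit multiplication and treats an `nsw`/`nuw` overflow as poison by skipping that input. The helper names (`fold_holds`, `nsw_ok`, `nuw_ok`, `always`) are made up for illustration. Both the `eq` and `ne` forms of the fold are valid exactly when the multiply is injective over the non-poison inputs, so that is what is tested.

```python
BITS = 8
MASK = (1 << BITS) - 1


def to_signed(v):
    """Interpret an 8-bit pattern as a two's-complement signed value."""
    v &= MASK
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def mul_wrap(x, c):
    """Wrapping i8 multiply on bit patterns."""
    return (x * c) & MASK


def nsw_ok(x, c):
    """True if the signed product does not overflow (mul nsw is not poison)."""
    p = to_signed(x) * to_signed(c)
    return -(1 << (BITS - 1)) <= p < (1 << (BITS - 1))


def nuw_ok(x, c):
    """True if the unsigned product does not overflow (mul nuw is not poison)."""
    return x * c <= MASK


def always(x, c):
    """Plain mul: every input is well defined."""
    return True


def fold_holds(c, no_overflow):
    """Check that x*c == y*c implies x == y over all non-poison inputs."""
    seen = set()
    for x in range(1 << BITS):
        if not no_overflow(x, c):
            continue  # poison input: the fold may produce anything
        p = mul_wrap(x, c)
        if p in seen:
            return False  # two distinct x values collide after the multiply
        seen.add(p)
    return True


if __name__ == "__main__":
    print(all(fold_holds(c, nsw_ok) for c in range(1, 256)))     # Pre: C1 != 0, nsw
    print(all(fold_holds(c, nuw_ok) for c in range(1, 256)))     # Pre: C1 != 0, nuw
    print(all(fold_holds(c, always) for c in range(1, 256, 2)))  # Pre: C1 % 2 != 0
    print(fold_holds(2, always))  # even constant, no flags: fold is not valid
```

The last check shows why the odd-constant precondition matters when no overflow flag is present: with C1 = 2, the inputs 0 and 128 both multiply to 0 in i8.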


203113 Commits
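
Commit 558281ef53 in the log above describes expanding a wide ABS with SRA, UADDO/ADDCARRY and XOR instead of SELECT. A minimal sketch of that arithmetic, assuming a 64-bit value split into two 32-bit halves, follows. The function names are hypothetical, and the code only models the integer math; it does not reproduce the actual SelectionDAG node construction.

```python
HALF = 32
HMASK = (1 << HALF) - 1


def expand_abs(lo, hi):
    """ABS of a two-part integer given as unsigned 32-bit halves (lo, hi)."""
    # SRA of the high half by HALF-1: all ones if negative, zero otherwise.
    sign = HMASK if (hi >> (HALF - 1)) & 1 else 0
    # UADDO: add the sign mask to the low half and capture the carry-out.
    s = lo + sign
    lo_sum, carry = s & HMASK, s >> HALF
    # ADDCARRY: add the sign mask to the high half plus the incoming carry.
    hi_sum = (hi + sign + carry) & HMASK
    # XOR both halves with the sign mask to finish the conditional negate.
    return lo_sum ^ sign, hi_sum ^ sign


def abs_i64(v):
    """Split a Python int into halves, expand, and rejoin as a 64-bit pattern."""
    lo, hi = v & HMASK, (v >> HALF) & HMASK
    rlo, rhi = expand_abs(lo, hi)
    return (rhi << HALF) | rlo


if __name__ == "__main__":
    for v in (5, -5, 0, -1, -(1 << 32), -(1 << 32) - 1, (1 << 63) - 1):
        print(v, abs_i64(v))
```

Adding the all-ones mask subtracts one from a negative value, and the XOR then flips every bit, which together yield the two's-complement negation. As with any wrapping ABS, the minimum value maps to itself.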