llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 10:42:39 +01:00

Author	SHA1	Message	Date
Mateusz Mikuła	8e6bb7d616	[MinGW] Use lib prefix for libraries In MinGW world, UNIX like lib prefix is preferred for the libraries. This patch adjusts CMake files to do that. Differential Revision: https://reviews.llvm.org/D87517	2020-09-12 22:01:29 +03:00
Craig Topper	c6a7e261b5	[SelectionDAG][X86][ARM][AArch64] Add ISD opcode for __builtin_parity. Expand it to shifts and xors. Clang emits (and (ctpop X), 1) for __builtin_parity. If ctpop isn't natively supported by the target, this leads to poor codegen due to the expansion of ctpop being more complex than what is needed for parity. This adds a DAG combine to convert the pattern to ISD::PARITY before operation legalization. Type legalization is updated to handle Expanding and Promoting this operation. If after type legalization, CTPOP is supported for this type, LegalizeDAG will turn it back into CTPOP+AND. Otherwise LegalizeDAG will emit a series of shifts and xors followed by an AND with 1. I've avoided vectors in this patch to avoid more legalization complexity for this patch. X86 previously had a custom DAG combiner for this. This is now moved to Custom lowering for the new opcode. There is a minor regression in vector-reduce-xor-bool.ll, but a follow up patch can easily fix that. Fixes PR47433 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87209	2020-09-12 11:42:18 -07:00
Florian Hahn	99ce5bbc34	[DSE] Adjust coroutines test after e082dee2b588.	2020-09-12 19:23:13 +01:00
Florian Hahn	ad8e41fb76	[DSE] Bail out on MemoryPhis when deleting stores at end of function. When deleting stores at the end of a function, we have to do PHI translation, otherwise we might miss reads in different iterations of a loop. See multiblock-loop-carried-dependence.ll for details. This fixes a mis-compile and surprisingly also increases the number of eliminated stores from 26047 to 26572 for MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto. This is most likely because we save budget by not exploring through MemoryPhis, which are less likely to result in valid candidates for elimination. The issue was reported post-commit for fb109c42d91c.	2020-09-12 19:05:59 +01:00
Florian Hahn	2fae14a0b6	[DSE] Precommit test case with loop carried dependence.	2020-09-12 18:51:08 +01:00
David Green	c370185b0e	[LV][ARM] Add preferInloopReduction target hook. This allows the backend to tell the vectorizer to produce inloop reductions through a TTI hook. For the moment on ARM under MVE this means allowing integer add reductions of the correct size. In the future this can include integer min/max too, under -Os. Differential Revision: https://reviews.llvm.org/D75512	2020-09-12 17:47:04 +01:00
Paul C. Anagnostopoulos	780978660e	TableGen: change a couple of member names to clarify their use.	2020-09-12 12:21:36 -04:00
Simon Pilgrim	294f9cdd8d	[InstCombine][X86] Convert masked load/stores with (sign extended) bool vector masks to generic intrinsics. As detailed on PR11210, if the mask is known to come from a (sign extended) bool vector (e.g. comparisons) then we can represent with a generic masked load/store without losing anything. We already do something similar for BLENDV -> SELECT conversion.	2020-09-12 15:09:28 +01:00
Evgeny Leviant	3ea8a2208a	[MachineScheduler] Fix operand scheduling for pre/post-increment loads Differential revision: https://reviews.llvm.org/D87557	2020-09-12 16:53:12 +03:00
Tyker	c86946593e	Reland [AssumeBundles] Use operand bundles to encode alignment assumptions NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html Complementary to the assumption outliner prototype in D71692, this patch shows how we could simplify the code emitted for an alignment assumption. The generated code is smaller, less fragile, and it makes it easier to recognize the additional use as an "assumption use". As mentioned in D71692 and on the mailing list, we could adopt this scheme, and similar schemes for other patterns, without adopting the assumption outlining.	2020-09-12 15:36:06 +02:00
Simon Pilgrim	9b2e865f05	[InstCombine][X86] Add tests for masked load/stores with comparisons. As detailed on PR11210, if the mask is known to come from a (sign extended) bool vector (e.g. comparisons) then we can represent with a generic masked load/store without losing anything.	2020-09-12 14:32:27 +01:00
David Green	67898847a1	[ARM] Fixup single source mla reductions. This fixes a complication on top of D87276. If we are sign extending around a mul with the two operands that are the same, instcombine will helpfully convert one of the sext to a zext. Reverse that so that we again generate a reduction. Differential Revision: https://reviews.llvm.org/D87287	2020-09-12 14:31:26 +01:00
Sanjay Patel	2c86671523	[Intrinsics] define semantics for experimental fmax/fmin vector reductions As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html This is hopefully the final remaining showstopper before we can remove the 'experimental' from the reduction intrinsics. No behavior was specified for the FP min/max reductions, so we have a mess of different interpretations. There are a few potential options for the semantics of these max/min ops. I think this is the simplest based on current behavior/implementation: make the reductions inherit from the existing llvm.maxnum/minnum intrinsics. These correspond to libm fmax/fmin, and those are similar to the (now deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing data). So the default expansion creates calls to libm functions. Another option would be to inherit from llvm.maximum/minimum (NaNs propagate), but most targets just crash in codegen when given those nodes because no default expansion was ever implemented AFAICT. We could also just assume 'nnan' semantics by default (we are already assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets (AArch64, PowerPC) support the more defined behavior, so it doesn't make much sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to loosen the semantics. (Note that D67507 was proposed to update the LangRef to acknowledge the more recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do update based on the new standard, the reduction instructions can seamlessly inherit from whatever updates are made to the max/min intrinsics.) x86 sees a regression here on 'nnan' tests because we have underlying, longstanding bugs in FMF creation/propagation. Those need to be fixed apart from this change (for example: https://llvm.org/PR35538). The expansion sequence before this patch may not have been correct. Differential Revision: https://reviews.llvm.org/D87391	2020-09-12 09:10:28 -04:00
Simon Pilgrim	f3983a0856	[InstCombine][X86] getNegativeIsTrueBoolVec - use ConstantExpr evaluators. NFCI. Don't do this manually, we can just use the ConstantExpr evaluators to do it more tidily for us.	2020-09-12 13:58:58 +01:00
David Green	32ca04bf0a	[ARM] Recognize "double extend" reduction patterns We can sometimes get code that does: xe = zext i16 x to i32 ye = zext i16 y to i32 m = mul i32 xe, ye me = zext i32 m to i64 r = vecreduce.add(me) This "double extend" can trip up the reduction identification, but should give identical results. This extends the pattern matching to handle them. Differential Revision: https://reviews.llvm.org/D87276	2020-09-12 13:51:42 +01:00
Nikita Popov	ccb5157d9e	[InstCombine] Fix incorrect SimplifyWithOpReplaced transform (PR47322) This is a followup to D86834, which partially fixed this issue in InstSimplify. However, InstCombine repeats the same transform while dropping poison flags -- which does not cover cases where poison is introduced in some other way. The fix here is a bit more comprehensive, because things are quite entangled, and it's hard to only partially address it without regressing optimization. There are really two changes here: * Export the SimplifyWithOpReplaced API from InstSimplify, with an added AllowRefinement flag. For replacements inside the TrueVal we don't actually care whether refinement occurs or not, the replacement is always legal. This part of the transform is now done in InstSimplify only. (It should be noted that the current AllowRefinement check is not sufficient -- that's an issue we need to address separately.) * Change the InstCombine fold to work by temporarily dropping poison generating flags, running the fold and then restoring the flags if it didn't work out. This will ensure that the InstCombine fold is correct as long as the InstSimplify fold is correct. Differential Revision: https://reviews.llvm.org/D87445	2020-09-12 14:45:06 +02:00
Simon Pilgrim	d537d09fc4	[X86][SSE] lowerShuffleAsDecomposedShuffleBlend - support decomposed unpacks for some vXi8/vXi16 cases Follow up to D86429 to handle the remaining regressions. This patch generalizes lowerShuffleAsDecomposedShuffleBlend to lowerShuffleAsDecomposedShuffleMerge, and attempts to use an UNPCKL shuffle mask instead of a blend for the cases where the inputs are coming from alternating vXi8/vXi16 sources. Technically they don't have to be alternating (just as long as they can fit into a lower lane half for the unpack) but I didn't find as many general cases and it needed a lot more of the function to be altered. For vXi32/vXi64 cases this could still be beneficial but in most cases the existing permute+blend approach was better. Differential Revision: https://reviews.llvm.org/D87405	2020-09-12 13:39:33 +01:00
LLVM GN Syncbot	b0d19df2e5	[gn build] Port 19531a81f1d	2020-09-12 10:08:18 +00:00
Jianzhou Zhao	45a1dbf88c	Add a header file to support ssize_t for windows fixing `0ece51c60c`	2020-09-12 08:50:22 +00:00
Jianzhou Zhao	078b488a52	Add raw_fd_stream_test.cpp into CMakeLists.txt Fixing `0ece51c60c`	2020-09-12 07:48:12 +00:00
Jianzhou Zhao	097ba299ca	Add raw_fd_stream that supports reading/seeking/writing This is used by https://reviews.llvm.org/D86905 to support bitcode writer's incremental flush.	2020-09-12 07:34:19 +00:00
QingShan Zhang	083df4ba20	[Power10] Enable the heuristic for Power10 and switch the sched model with P9 Model Enable the pre-ra and post-ra scheduler strategy for Power10 as we want to customize the heuristic later. And switch the scheduler model with P9 model before P10 Model is available. The NoSchedModel is modelled as in-order cpu and the pre-ra scheduler is not bi-directional which will have big impact on the scheduler. Reviewed By: jji Differential Revision: https://reviews.llvm.org/D86865	2020-09-12 02:49:47 +00:00
QingShan Zhang	65635c700c	[PowerPC] Set the mayRaiseFPException for FCMPUS/FCMPUD From ISA, fcmpu will raise the Floating-Point Invalid Operation Exception (SNaN) if either of the operands is a Signaling NaN by setting the bit VXSNAN. But the instruction description didn't set the mayRaiseFPException which might have impact on the scheduling or some backend optimization. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D83937	2020-09-12 02:42:22 +00:00
LLVM GN Syncbot	48a0edbcbc	[gn build] Port ad99e34c59b	2020-09-12 01:54:23 +00:00
Yuanfang Chen	8df10aca8c	Revert "[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline" This reverts commit 31ecf8d29d81d196374a562c6d2bd2c25a62861e. This reverts commit 3fdaa8602a086a3fca5f0fc8527536ac659079d0. There is a layering violation for Target->CodeGen.	2020-09-11 18:52:32 -07:00
Reid Kleckner	7f311ed45c	[gn] Remove unneeded MC dep from llvm-tblgen Tablegen does not have link time dependencies on MC. Having llvm-tblgen depend on it causes it to be rebuilt in the gn build every time somebody touches any cpp file in llvm/lib/MC* or llvm/lib/DebugInfo/Codeview*. Touching tablegen invalidates most of the rest of the build, and re-running it takes a while. This is annoying for me when swapping between branches that touch CodeView logic. This dep was added to LLVMBuild.txt back in 2018, and presumably it was carried over into the gn build. Differential Revision: https://reviews.llvm.org/D87553	2020-09-11 18:28:49 -07:00
Eli Friedman	2b45dcbc0b	[ConstantFold] Make areGlobalsPotentiallyEqual less aggressive. In particular, we shouldn't make assumptions about globals which are unnamed_addr: we can fold them together with other globals. Also while I'm here, use isInterposable() instead of trying to explicitly name all the different kinds of weak linkage. Fixes https://bugs.llvm.org/show_bug.cgi?id=47090 Differential Revision: https://reviews.llvm.org/D87123	2020-09-11 17:23:08 -07:00
LLVM GN Syncbot	79018c22fb	[gn build] Port 31ecf8d29d8	2020-09-11 23:54:25 +00:00
Yuanfang Chen	90c8e3a008	Fix a typo in 31ecf8d29d81d196374a562c6d2bd2c25a62861e	2020-09-11 16:51:33 -07:00
Eli Friedman	f9df848755	[ConstantFold] Fold binary arithmetic on scalable vector splats. It's a nice simplification, and it confuses instcombine if we don't do it. Differential Revision: https://reviews.llvm.org/D87422	2020-09-11 16:41:58 -07:00
Yuanfang Chen	cfd0162bc3	[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline Following up on D67687. Please refer to the RFC here http://lists.llvm.org/pipermail/llvm-dev/2020-July/143309.html `CodeGenPassBuilder` is the NPM counterpart of `TargetPassConfig` with below differences. - Debugging features (MIR print/verify, disable pass, start/stop-before/after, etc.) living in `TargetPassConfig` are moved to use PassInstrument as much as possible. (Implementation also lives in `TargetPassConfig.cpp`) - `TargetPassConfig` is a polymorphic base (virtual inheritance) to build the target-dependent pipeline whereas `CodeGenPassBuilder` is the CRTP base/helper to implement the target-dependent pipeline. The motivation is flexibility for targets to customize the pipeline, inlining opportunity, and fits the overall NPM value semantics design. - `TargetPassConfig` is a legacy immutable pass to declare hooks for targets to customize some target-independent codegen layer behavior. This is partially ported to TargetMachine::options. The rest, such as `createMachineScheduler/createPostMachineScheduler`, are left out for now. They should be implemented in LLVMTargetMachine in the future. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D83608	2020-09-11 16:41:17 -07:00
Fangrui Song	e2ec712e25	[MC] Allow .org directives in SHT_NOBITS sections This is used by kvm-unit-tests and can be trivially supported.	2020-09-11 15:12:42 -07:00
Matt Arsenault	844327af57	RegAllocFast: Fix typo in comment	2020-09-11 18:06:14 -04:00
Matt Arsenault	85aefa6cd7	CodeGen: Require SSA to run PeepholeOptimizer	2020-09-11 18:03:04 -04:00
Lang Hames	4ec13fba1c	Re-apply "[ORC] Make MaterializationResponsibility immovable..." with fixes. Re-applies c74900ca672 with fixes for the ThinLtoJIT example.	2020-09-11 14:09:05 -07:00
Mircea Trofin	a6aa03251f	[ThinLTO] Make -lto-embed-bitcode an enum The current behavior of -lto-embed-bitcode is not quite the same as that of -fembed-bitcode. While both populate .llvmbc with bitcode, the latter populates it with pre-optimized bitcode(*), while the former with post-optimized. The scenarios driving them are different - the latter's goal is to allow re-compilation, while the former, IIUC, is execution. I plan to add a third mode for thinlto cases, closely-related to -fembed-bitcode's scenario: adding the bitcode pre-optimization, but post-merging. This would allow re-compilation without requiring the other .bc files that were merged (akin to how -fembed-bitcode allows recompilation without all the .h files) The third mode can't co-exist with the current -lto-embed-bitcode mode, because the latter would overwrite it. For clarity, we change -lto-embed-bitcode to be an enum. (*) That's the compiler semantics. The driver splits compilation in 2 phases, so if -fembed-bitcode is given to the driver, the .llvmbc is optimized bitcode; if the option is passed to the compiler (after -cc1), the section is pre-optimized. Differential Revision: https://reviews.llvm.org/D87477	2020-09-11 13:24:54 -07:00
Sam Clegg	4522227204	[WebAssembly] Add assembly syntax for mutable globals This adds an optional ", immutable" to the end of a `.globaltype` declaration. I would have preferred to match the `.wat` syntax where immutable is the default and `mut` is the signifier for mutable globals. Sadly changing the default would break backwards compat with existing assembly in the wild so I think it's best to stick with this approach. Differential Revision: https://reviews.llvm.org/D87515	2020-09-11 11:11:02 -07:00
David Green	24b5eb8f5a	[ARM] Extra MLA reductions tests. NFC	2020-09-11 17:51:15 +01:00
Jonas Devlieghere	3212fcd778	Revert "[examples] Adjust ThinLtoInstrumentationLayer for emit signature change" I raced with Florian and he had already reverted the original patch.	2020-09-11 09:22:42 -07:00
YangZhihui	86cfd2d991	[docs] Fix typos Differential Revision: https://reviews.llvm.org/D87356	2020-09-11 17:58:07 +02:00
Sanjay Patel	0a0854f6fe	[SLP] further limit bailout for load combine candidate (PR47450) The test example based on PR47450 shows that we can match non-byte-sized shifts, but those won't ever be bswap opportunities. This isn't a full fix (we'd still match if the shifts were by 8-bits for example), but this should be enough until there's evidence that we need to do more (this is a borderline case for vectorization in the first place).	2020-09-11 11:56:11 -04:00
Sanjay Patel	5a21be3257	[SLP] add test for missed store vectorization; NFC	2020-09-11 11:56:11 -04:00
Jonas Devlieghere	08b8817ba3	[examples] Adjust ThinLtoInstrumentationLayer for emit signature change Emit now takes a std::unique_ptr<MaterializationResponsibility> instead of a MaterializationResponsibility directly. This should fix: http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake-standalone/	2020-09-11 08:33:37 -07:00
Nico Weber	3f7d83ad04	[gn build] slightly improve libcxx_needs_site_config The write_cmake_config() here still looks busted, but at least the value that's explicitly set is now set correctly.	2020-09-11 11:32:20 -04:00
Krzysztof Parzyszek	fed5c200ff	[DSE] Make sure that DSE+MSSA can handle masked stores Differential Revision: https://reviews.llvm.org/D87414	2020-09-11 10:00:21 -05:00
Sanjay Patel	7d22ac8f46	Revert "[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)" This reverts commit 324a53205a3af979e3de109fdd52f91781816cba. On closer examination of at least one of the test diffs, this does not appear to be correct in all cases. Even the existing 'nsw' creation may be wrong based on this example: https://alive2.llvm.org/ce/z/uL4Hw9 https://alive2.llvm.org/ce/z/fJMKQS	2020-09-11 10:54:48 -04:00
Sanjay Patel	f7e01a7376	[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430) There's no signed wrap if both geps have 'inbounds': https://alive2.llvm.org/ce/z/nZkQTg https://alive2.llvm.org/ce/z/7qFauh	2020-09-11 10:39:09 -04:00
Sanjay Patel	7ae9f7c717	[InstCombine] add/move tests for ptr diff; NFC	2020-09-11 10:39:09 -04:00
Jeremy Morse	4052430556	[LiveDebugValues][NFC] Add additional tests These were supposed to be in 0caeaff1237 and D83054, but a fat-fingered error when git-adding missed them. Ooops.	2020-09-11 15:34:37 +01:00
Simon Pilgrim	7f2aab2015	[NFC] Fix compiler warnings due to integer comparison of different signedness Fix by directly using INT_MAX and INT32_MAX. Patch by: @nullptr.cpp (Yang Fan) Differential Revision: https://reviews.llvm.org/D87347	2020-09-11 15:32:03 +01:00
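
The parity expansion described in c6a7e261b5 above (a series of shifts and xors followed by an AND with 1) can be sketched in C++ as below. This is a minimal illustration for a 32-bit scalar, not the code LegalizeDAG actually emits; the function names `parity_shift_xor` and `parity_ctpop` are hypothetical and exist only for this example.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical helper: parity via xor-folding. Each step xors the upper
// half of the still-relevant bits into the lower half, so after the last
// step bit 0 holds the xor of all 32 input bits.
inline uint32_t parity_shift_xor(uint32_t x) {
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1u;  // final AND with 1
}

// Hypothetical reference: the (and (ctpop X), 1) form that the commit
// message says Clang emits for __builtin_parity.
inline uint32_t parity_ctpop(uint32_t x) {
    return static_cast<uint32_t>(std::bitset<32>(x).count()) & 1u;
}
```

Both functions compute the same value for every input; the shift/xor form needs no population-count support, which is the case the commit message is concerned with.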


203351 Commits