llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 12:43:36 +01:00

Author	SHA1	Message	Date
Anirudh Prasad	f663aad8da	[SystemZ][z/OS] Implement getHostCPUName for z/OS - Currently, the host cpu information is not easily available on z/OS as in other platforms. - This information is stored in the Communications Vector Table (https://www.ibm.com/docs/en/zos/2.2.0?topic=information-cvt-mapping) Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D102793	2021-05-25 11:18:12 -04:00
Joe Nash	75d94cb27c	[AMDGPU] Allow no-modifier operands in cvtDPP NFC, since no instructions have their AsmMatchConverter changed, but prepares for that to happen. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103046 Change-Id: I6afefad899076de7b9a412374d09b95b29e012fa	2021-05-25 10:58:06 -04:00
Simon Pilgrim	414d8672a3	[CostModel][X86] Improve accuracy of vXi64 vector non-uniform shift costs on AVX2+ targets rG1ad4f887bd7692a9e63fb42586f0ece366f2fe01 incorrectly assumed that vXi64 non-uniform shifts were slow like vXi32 were - but llvm-mca (+Agner) both confirm that Haswell/Broadwell are full rate.	2021-05-25 15:58:23 +01:00
Simon Pilgrim	ddf476d680	[X86][SSE] Regenerate vector shift codegen tests. NFCI.	2021-05-25 15:58:22 +01:00
Joe Nash	b28aeaea9c	[AMDGPU] More accurate names for dpp operand types NFC. Renames the variable in the dpp input operand generators from DstRC to OldRC, because that is what it actually sets. Also documents the importance of setting HasModifiers = 0 in the dpp8 asm string. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D103047 Change-Id: Ice69ae38f644de7f228a75ca47c43e88b1f7d9e1	2021-05-25 10:35:25 -04:00
David Goldblatt	ce067f4056	[InstSimplify] Transform X * Y % Y --> 0 simplifyDiv already handles the case X * Y / Y --> X (barring overflow). This adds the equivalent handling to simplifyRem. Correctness: https://alive2.llvm.org/ce/z/J2cUbS https://alive2.llvm.org/ce/z/us9NUM https://alive2.llvm.org/ce/z/AvaDGJ https://alive2.llvm.org/ce/z/kq9ige Extending the situations in which we apply this transform would not be correct: https://alive2.llvm.org/ce/z/Lf9V63 https://alive2.llvm.org/ce/z/6RPQK3 https://alive2.llvm.org/ce/z/p9UdxC https://alive2.llvm.org/ce/z/A2zlhE https://alive2.llvm.org/ce/z/vHTtLw https://alive2.llvm.org/ce/z/lvpH42 Differential Revision: https://reviews.llvm.org/D102864	2021-05-25 10:16:04 -04:00
Florian Hahn	82f075623d	[VectorCombine] Add test that combines load & store scalarization.	2021-05-25 14:28:37 +01:00
Florian Hahn	5a31a2bcff	[VectorCombine] Use constant range info for index scalarization legality. We can only scalarize memory accesses if we know the index is valid. This patch adjusts canScalarizeAcceess to fall back to computeConstantRange to check if the index is known to be valid. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D102476	2021-05-25 13:58:42 +01:00
Sanjay Patel	8343401f13	[InstCombine] canonicalize cast before unary shuffle We could go either direction on this transform. VectorCombine already goes this way for bitcasts (and handles more complicated cases using the cost model), so let's try cast-first. Deferring completely to VectorCombine is another possibility. But the backend should be able to invert this easily when the vectors have the same shape, so it doesn't seem like a transform that we need to avoid. The motivating example from https://llvm.org/PR49081 has an int-to-float sandwiched between 2 shuffles, and the backend currently does not reduce that, so on x86, we get something like: pshufd $249, %xmm0, %xmm0] cvtdq2ps %xmm0, %xmm0 shufps $144, %xmm0, %xmm0 ...instead of just a single conversion instruction. Differential Revision: https://reviews.llvm.org/D103038	2021-05-25 08:43:09 -04:00
Sanjay Patel	e031b0571b	[InstCombine] add tests for cast-of-shuffle; NFC	2021-05-25 08:43:09 -04:00
Chuanqi Xu	1abdf4bef5	[NFC] [Coroutines] Remove unused variable: UnreachableCache	2021-05-25 20:33:46 +08:00
Roman Lebedev	895d1ce31b	[LoopIdiom] Support 'left-shift until zero' idiom This adds support for the "count active bits" pattern, i.e.: ``` int countBits(unsigned val) { int cnt = 0; for( ; (val << cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one: ``` int countBits(unsigned val, int start, int off) { int cnt; for (cnt = start; val << (cnt + off); cnt++) ; return cnt; } ``` alive2 is happy with all the tests there. Note that, again, much like with the right-shift cases, we don't require the `val != 0` guard. This is the last pattern that was supported by `detectShiftUntilZeroIdiom()`, which now becomes obsolete.	2021-05-25 15:26:35 +03:00
Roman Lebedev	36a7fe2d2f	[NFC][LoopIdiom] Add tests for 'left-shift until zero' idiom	2021-05-25 15:26:34 +03:00
Bradley Smith	1d41093ecb	[AArch64][SVE] Add fixed length codegen for FP_TO_{S,U}INT/{S,U}INT_TO_FP Depends on D102607 Differential Revision: https://reviews.llvm.org/D102777	2021-05-25 12:54:55 +01:00
Roman Lebedev	af1a311083	[LoopIdiom] Support 'arithmetic right-shift until zero' idiom This adds support for the "count active bits" pattern, i.e.: ``` int countActiveBits(signed val) { int cnt = 0; for( ; (val >> cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one: ``` int countActiveBits(signed val, int start, int off) { int cnt; for (cnt = start; val >> (cnt + off); cnt++) ; return cnt; } ``` This directly matches the existing 'logical right-shift until zero' idiom. alive2 is happy with all the tests there. Note that, again, much like with the original unsigned case, we don't require the `val != 0` guard. The old `detectShiftUntilZeroIdiom()` already supports this pattern, the idea here is that the `val` must be positive (have at least one leading zero), because otherwise the loop is non-terminating, but since it is not `while(1)`, that would have been UB.	2021-05-25 14:30:49 +03:00
Roman Lebedev	6b3c845322	[NFC][LoopIdiom] Add tests for 'arithmetic right-shift until zero' idiom	2021-05-25 14:30:49 +03:00
Marco Elver	b835b9cf36	[SanitizeCoverage] Add support for NoSanitizeCoverage function attribute We really ought to support no_sanitize("coverage") in line with other sanitizers. This came up again in discussions on the Linux-kernel mailing lists, because we currently do workarounds using objtool to remove coverage instrumentation. Since that support is only on x86, to continue support coverage instrumentation on other architectures, we must support selectively disabling coverage instrumentation via function attributes. Unfortunately, for SanitizeCoverage, it has not been implemented as a sanitizer via fsanitize= and associated options in Sanitizers.def, but rolls its own option fsanitize-coverage. This meant that we never got "automatic" no_sanitize attribute support. Implement no_sanitize attribute support by special-casing the string "coverage" in the NoSanitizeAttr implementation. To keep the feature as unintrusive to existing IR generation as possible, define a new negative function attribute NoSanitizeCoverage to propagate the information through to the instrumentation pass. Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035 Reviewed By: vitalybuka, morehouse Differential Revision: https://reviews.llvm.org/D102772	2021-05-25 12:57:14 +02:00
Simon Pilgrim	91c3ce1475	Fix MSVC "truncation of constant value" warning. NFCI.	2021-05-25 11:35:57 +01:00
Simon Pilgrim	491a1dbd4b	[CostModel][X86] Improve accuracy of vXi8/vXi16 vector non-uniform shift costs on AVX2/AVX512 targets Determined from llvm-mca analysis, AVX2+ capable targets have a higher throughput for VPBLENDVB and VPMOVZX ops, making it cheaper to perform shift+select patterns for vXi8 shifts or extend/shift/truncate for vXi16 shifts. Similarly AVX512BW can perform vXi8 as extend/shift/truncate patterns.	2021-05-25 11:35:57 +01:00
Christudasan Devadasan	50950cea4e	[AMDGPU] Remove dead declaration (NFC).	2021-05-25 16:04:04 +05:30
Florian Hahn	c4658762f3	[AArch64] Add tests for lowering of vector load + single extract. Currently the vector load + extract gets lowered to a single scalar store, not accounting for the fact that the index could be out-of-bounds, which is poison, not UB. See PR50382.	2021-05-25 11:09:15 +01:00
Stanislav Mekhanoshin	c7c6ba1331	[IR] Allow Value::replaceUsesWithIf() to process constants The change is currently NFC, but exploited by the depending D102954. Code to handle constants is borrowed from the general implementation of Value::doRAUW(). Differential Revision: https://reviews.llvm.org/D103051	2021-05-25 02:12:01 -07:00
Roman Lebedev	5d534d8259	[llvm-exegesis] Loop unrolling for loop snippet repetitor mode I really needed this, like, factually, yesterday, when verifying dependency breaking idioms for AMD Zen 3 scheduler model. Consider the following example: ``` $ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o --- mode: inverse_throughput key: instructions: - 'VPXORYrr YMM0 YMM0 YMM0' config: '' register_initial_values: [] cpu_name: znver3 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 1000000 measurements: - { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 } error: '' info: '' assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3 ... ``` What does it tell us? So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle? That doesn't seem right. That's even less than there are pipes supporting this type of op. Now, second example: ``` $ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o --- mode: inverse_throughput key: instructions: - 'VPXORYrr YMM0 YMM0 YMM0' config: '' register_initial_values: [] cpu_name: znver3 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 1000000 measurements: - { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 } error: '' info: '' assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3 ... ``` Now that's just worse. Due to the looping, the throughput completely plummeted, and now we can only do a single instruction/cycle!? That's not great. And final example: ``` $ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000 Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o --- mode: inverse_throughput key: instructions: - 'VPXORYrr YMM0 YMM0 YMM0' config: '' register_initial_values: [] cpu_name: znver3 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 1000000 measurements: - { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 } error: '' info: '' assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3 ... ``` So if we merge the previous two approaches, do duplicate this single-instruction snippet 1000x (loop-body-size/instruction count in snippet), and run a loop with 1000 iterations over that duplicated/unrolled snippet, the measured throughput goes through the roof, up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle! Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D102522	2021-05-25 12:08:27 +03:00
Kristina Bessonova	a85df00b51	[ARM][NEON] Combine base address updates for vld1x intrinsics Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D102855	2021-05-25 11:06:39 +02:00
David Spickett	a4a13012a7	[clang][ARM] Remove non-existent arm9312 CPU I cannot find documentation on this CPU, and it is not supported by the Arm Compiler 5 product either. It was likely a mistake or a different name for the "ep9312", which is an Arm based Cirrus Logic chip. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D103024	2021-05-25 08:58:24 +00:00
David Spickett	523f0589d5	[llvm][ARM] Remove non-existent arm1176j-s CPU This was removed in https://reviews.llvm.org/D52594 for clang. The one test using it has been updated to use the mpcore CPU as the linked clang change does. This is part of fixing https://bugs.llvm.org/show_bug.cgi?id=50454. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D103022	2021-05-25 08:56:55 +00:00
Benjamin Kramer	9d856bcc84	[GlobalISel] Silence unused variable warning in Release builds. NFC.	2021-05-25 10:55:29 +02:00
David Spickett	e073151ba7	[clang][ARM] Remove non-existent arm1136jz-s CPU There is an ARM1136JF-S and an ARM1136J-S but I could find no references to an ARM1136JZ-S. In CPU manuals or the manual for Arm Compiler 5. See: https://developer.arm.com/documentation/ddi0211/latest/ https://developer.arm.com/documentation/dui0472/latest/ Using this CPU you get: $ ./bin/clang --target=arm-linux-gnueabihf -march=armv3m -mcpu=arm1136jz-s -c /tmp/test.c -o /tmp/test.o 'arm1136jz-s' is not a recognized processor for this target (ignoring processor) Since the llvm target does not know what it is. This is part of fixing https://bugs.llvm.org/show_bug.cgi?id=50454. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D103019	2021-05-25 08:54:59 +00:00
Alexey Lapshin	28ca22b845	[TRE] Reland: allow TRE for non-capturing calls. The D82085 "allow TRE for non-capturing calls" caused failure during bootstrap. This patch does the same as D82085 plus fixes bootstrap error. The problem with D82085 is that it does not create copies for byval operands, while replacing function call with a branch. Consider following example: ``` int zoo ( S p1 ); int foo ( int count, S p1 ) { if ( count > 10 ) return zoo(p1); // temporarily variable created for passing byvalue parameter // p1 could be used when zoo(p1) is called(after TRE is done). // lifetime.start p1.byvalue.temp return foo(count+1, p1); // lifetime.end p1.byvalue.temp } ``` After recursive call to foo is replaced with a jump into start of the function, its parameters could be passed to zoo function. i.e. temporarily variable created for byvalue parameter "p1" could be passed to zoo. Finally zoo receives broken operand: ``` int foo ( int count, S p1 ) { :tailrecurse p1_tr = phi p1, p1.byvalue.temp if ( count > 10 ) return zoo(p1_tr); // temporarily variable created for passing byvalue parameter // p1 could be used when zoo(p1) is called(after TRE is done). lifetime.start p1.byvalue.temp memcpy (p1.byvalue.temp, p1_tr) count = count + 1 lifetime.end p1.byvalue.temp br tailrecurse } ``` To prevent using p1.byvalue.temp after its scope finished by lifetime.end marker this patch copies value from p1.byvalue.temp into another temporarily variable and then copies this variable into the input parameter for next iteration. This patch passes bootstrap build and bootstrap build with AddressSanitizer. Differential Revision: https://reviews.llvm.org/D85614	2021-05-25 11:35:48 +03:00
Amara Emerson	afaf301e48	[GlobalISel] Fix MachineIRBuilder not using the DstOp argument for G_SHUFFLE_VECTOR.	2021-05-25 00:43:26 -07:00
Ben Shi	a929002e61	[RISCV] Optimize xor/or with immediate in the zbs extension Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102893	2021-05-25 14:14:09 +08:00
Lang Hames	9ee7d4ffa3	[JITLink] Suppress expect-death test in release mode.	2021-05-24 22:57:10 -07:00
Max Kazantsev	701a7ca441	[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration This patch handles one particular case of one-iteration loops for which SCEV cannot straightforwardly prove BECount = 1. The idea of the optimization is to symbolically execute conditional branches on the 1st iteration, moving in topoligical order, and only visiting blocks that may be reached on the first iteration. If we find out that we never reach header via the latch, then the backedge can be broken. Differential Revision: https://reviews.llvm.org/D102615 Reviewed By: reames	2021-05-25 12:43:31 +07:00
Max Kazantsev	bebc3be883	[Test] Add test for unreachable backedge with duplicating predecessors	2021-05-25 12:43:31 +07:00
Christudasan Devadasan	022be2495f	AMDGPU/GlobalISel: Legalize G_[SU]DIVREM instructions Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D100726	2021-05-25 10:51:07 +05:30
Lang Hames	5ce7249a28	[JITLink] Enable creation and management of mutable block content. This patch introduces new operations on jitlink::Blocks: setMutableContent, getMutableContent and getAlreadyMutableContent. The setMutableContent method will set the block content data and size members and flag the content as mutable. The getMutableContent method will return a mutable copy of the existing content value, auto-allocating and populating a new mutable copy if the existing content is marked immutable. The getAlreadyMutableMethod asserts that the existing content is already mutable and returns it. setMutableContent should be used when updating the block with totally new content backed by mutable memory. It can be used to change the size of the block. The argument value should not be shared with any other block. getMutableContent should be used when clients want to modify the existing content and are unsure whether it is mutable yet. getAlreadyMutableContent should be used when clients want to modify the existing content and know from context that it must already be immutable. These operations reduce copy-modify-update boilerplate and unnecessary copies introduced when clients couldn't me sure whether the existing content was mutable or not.	2021-05-24 22:09:36 -07:00
Arthur Eubanks	05ebdcd268	Making Instrumentation aware of LoopNest Pass Intrumentation callbacks are not made aware of LoopNest passes. From the loop pass manager, we can pass the outermost loop of the LoopNest to instrumentation in case of LoopNest passes. The current patch made the change in two places in StandardInstrumentation.cpp. I will submit a proper patch where the OuterMostLoop is passed from the LoopPassManager to the call backs. That way we will avoid making changes at multiple places in StandardInstrumentation.cpp. A testcase also will be submitted. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D102463	2021-05-24 20:25:52 -07:00
maekawatoshiki	b5c1f57fc4	Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass" This reverts commit d65c32fb41b03a35a2a16330ba1ea15cf6818f04.	2021-05-25 11:39:49 +09:00
David Blaikie	8e3f8bcb4e	Add a range-based wrapper for std::unique(begin, end, binary_predicate)	2021-05-24 17:26:46 -07:00
Vitaly Buka	cebde3ea57	[NFC][OMP] Fix 'unused' warning	2021-05-24 17:14:38 -07:00
Jonas Devlieghere	8e74c099a0	[dsymutil] Emit an error when the Mach-O exceeds the 4GB limit. The Mach-O object file format is limited to 4GB because its used of 32-bit offsets in the header. It is possible for dsymutil to (silently) emit an invalid binary. Instead of having consumers deal with this, emit an error instead.	2021-05-24 16:29:06 -07:00
Jonas Devlieghere	8f5c83a0a1	[dsymutil] Use EXIT_SUCCESS and EXIT_FAILURE (NFC)	2021-05-24 16:29:05 -07:00
Jonas Devlieghere	5e6dceac4f	[dsymutil] Compute the output location once per input file (NFC) Compute the location of the output file just once outside the loop over the different architectures.	2021-05-24 16:29:05 -07:00
Anton Afanasyev	114349058c	[SLP] Fix "gathering" of insertelement instructions For rare exceptional case vector tree node (insertelements for now only) is marked as `NeedToGather`, this case is processed by patch. Follow-up of D98714 to fix bug reported here https://reviews.llvm.org/D98714#2764135. Differential Revision: https://reviews.llvm.org/D102675	2021-05-25 01:35:43 +03:00
Hongtao Yu	68eaf84013	[NFC][CSSPGO]llvm-profge] Fix Build warning dueo to an attrbute usage.	2021-05-24 12:59:02 -07:00
Hongtao Yu	9a044d4e9b	[CSSPGO][llvm-profgen] Report samples for untrackable frames. Fixing an issue where samples collected for an untrackable frame is not reported. An untrackable frame refers to a frame whose caller is untrackable due to missing debug info or pseudo probe. Though the frame is connected to its parent frame through the frame pointer chain at runtime, the compiler cannot build the connection without debug info or pseudo probe. In such case we just need to report the untrackable frame as the base frame and all of its child frames. With more samples reported I'm seeing this improves the performance of an internal benchmark by 2.5%. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D102961	2021-05-24 12:39:12 -07:00
Nick Desaulniers	6d2f5ef7fe	fix up test from D102742 In D102742, I mistakenly put the split file designator above a bunch of CHECK lines, which unintentionally removed the CHECKs from actually being verified. This can be verified by observing: <build dir>/test/CodeGen/X86/Output/stack-protector-3.ll.tmp/main.ll	2021-05-24 12:09:02 -07:00
LLVM GN Syncbot	604164d943	[gn build] Port b510e4cf1b96	2021-05-24 18:48:17 +00:00
Craig Topper	9ed3d57a44	[RISCV] Add a vsetvli insert pass that can be extended to be aware of incoming VL/VTYPE from other basic blocks. This is a replacement for D101938 for inserting vsetvli instructions where needed. This new version changes how we track the information in such a way that we can extend it to be aware of VL/VTYPE changes in other blocks. Given how much it changes the previous patch, I've decided to abandon the previous patch and post this from scratch. For now the pass consists of a single phase that assumes the incoming state from other basic blocks is unknown. A follow up patch will extend this with a phase to collect information about how VL/VTYPE change in each block and a second phase to propagate this information to the entire function. This will be used by a third phase to do the vsetvli insertion. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D102737	2021-05-24 11:47:27 -07:00
LLVM GN Syncbot	90d57fab4e	[gn build] Port a64ebb863727	2021-05-24 18:36:50 +00:00

1 2 3 4 5 ...

216303 Commits