llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 04:32:44 +01:00

Author	SHA1	Message	Date
Mats Larsen	2251d44074	[NewPM] Add C bindings for new pass manager This patch contains the bare minimum to run the new Pass Manager from the LLVM-C APIs. It does not feature PGOOptions, PassPlugins or Debugify in its current state. Bugzilla: PR48499 Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D102136	2021-05-17 10:48:45 -07:00
Roman Lebedev	4892585b46	[LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition. I think i've added exhaustive test coverage, and i have verified that alive2 is happy with all the tests, so in principle i'm fine with landing this without review, but just in case.. This adds support for the "count active bits" pattern, i.e.: ``` int countActiveBits(unsigned val) { int cnt = 0; for( ; (val >> cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one, since that is what i need: ``` int countActiveBits(unsigned val, int start, int off) { int cnt; for (cnt = start; val >> (cnt + off); cnt++) ; return cnt; } ``` I've followed in footstep of 'left-shift until bittest' idiom (D91038), in the sense that iff the `ctlz` intrinsic is cheap, we'll transform, regardless of all other factors. This can have a shocking effect on certain benchmarks: ``` raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm 2021-05-09T01:06:05+03:00 Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench Run on (32 X 3600.24 MHz CPU s) CPU Caches: L1 Data 32 KiB (x16) L1 Instruction 32 KiB (x16) L2 Unified 512 KiB (x16) L3 Unified 32768 KiB (x2) Load Average: 5.26, 6.29, 3.49 Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s p1319978.orf/threads:32/process_time/real_time_mean 145 ms 145 ms 128 0.145319 0.999981 10.1568M 69.8949M 69.8936M 6.88159 6.88146 0.145322 p1319978.orf/threads:32/process_time/real_time_median 145 ms 145 ms 128 0.145317 0.999986 10.1568M 69.8941M 69.8931M 6.88151 6.88141 0.145319 p1319978.orf/threads:32/process_time/real_time_stddev 0.766 ms 0.766 ms 128 766.586u 15.1302u 0 354.167k 354.098k 0.0348699 0.0348631 766.469u RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0 2021-05-09T01:06:24+03:00 Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Run on (32 X 3599.95 MHz CPU s) CPU Caches: L1 Data 32 KiB (x16) L1 Instruction 32 KiB (x16) L2 Unified 512 KiB (x16) L3 Unified 32768 KiB (x2) Load Average: 4.05, 5.95, 3.43 Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s p1319978.orf/threads:32/process_time/real_time_mean 99.8 ms 99.8 ms 128 0.0997758 0.999972 10.1568M 101.797M 101.794M 10.0225 10.0222 0.0997786 p1319978.orf/threads:32/process_time/real_time_median 99.7 ms 99.7 ms 128 0.0997165 0.999985 10.1568M 101.857M 101.854M 10.0284 10.0281 0.0997195 p1319978.orf/threads:32/process_time/real_time_stddev 0.224 ms 0.224 ms 128 224.166u 34.345u 0 226.81k 227.231k 0.0223309 0.0223723 224.586u Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Benchmark Time CPU Time Old Time New CPU Old CPU New p1319978.orf/threads:32/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 128 vs 128 p1319978.orf/threads:32/process_time/real_time_mean -0.3134 -0.3134 145 100 145 100 p1319978.orf/threads:32/process_time/real_time_median -0.3138 -0.3138 145 100 145 100 p1319978.orf/threads:32/process_time/real_time_stddev -0.7073 -0.7078 1 0 1 0 ``` Reviewed By: craig.topper, zhuhan0 Differential Revision: https://reviews.llvm.org/D102116	2021-05-17 20:33:33 +03:00
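The "count active bits" loops quoted in the commit above can be compared against a closed form built on a leading-zero count. The following is only a sketch of the arithmetic identity, not the IR that LoopIdiom actually emits. The helper `ctlz32` and the function names are hypothetical, and the sketch assumes a 32-bit `val` below 2^31 with `0 <= start + off < 32`, so that no shift amount ever reaches the bit width.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical portable stand-in for a count-leading-zeros intrinsic.
// Returns 32 when v == 0.
inline int ctlz32(std::uint32_t v) {
  int n = 0;
  for (std::uint32_t mask = 0x80000000u; mask != 0 && (v & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// The generalized loop from the commit message, on a fixed 32-bit type.
// Assumes val < 2^31 and 0 <= start + off < 32 (shift amounts stay in range).
inline int countActiveBitsLoop(std::uint32_t val, int start, int off) {
  int cnt;
  for (cnt = start; val >> (cnt + off); cnt++)
    ;
  return cnt;
}

// Loop-free equivalent under the same assumptions: the loop runs while
// cnt + off is smaller than the number of active bits, so it stops at
// max(start, activeBits - off).
inline int countActiveBitsClosed(std::uint32_t val, int start, int off) {
  const int activeBits = 32 - ctlz32(val);
  return std::max(start, activeBits - off);
}
```

With `start = 0` and `off = 0` both functions reduce to the simpler first form of the pattern, which returns the position of the highest set bit plus one.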
Steffen Larsen	1e7a7bb573	[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions Adds NVPTX builtins and intrinsics for the CUDA PTX `redux.sync` instructions for `sm_80` architecture or newer. PTX ISA description of `redux.sync`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-redux-sync Authored-by: Steffen Larsen <steffen.larsen@codeplay.com> Differential Revision: https://reviews.llvm.org/D100124	2021-05-17 09:46:59 -07:00
Stuart Adams	4b94b88699	[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions Adds NVPTX builtins and intrinsics for the CUDA PTX `cp.async` instructions for `sm_80` architecture or newer. PTX ISA description of `cp.async`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-asynchronous-copy https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive Authored-by: Stuart Adams <stuart.adams@codeplay.com> Co-Authored-by: Alexander Johnston <alexander@codeplay.com> Differential Revision: https://reviews.llvm.org/D100394	2021-05-17 09:46:59 -07:00
Alex Zinenko	01e701738c	[llvm][doc] fix header for read/write_register intrinsics in LangRef Multi-line headers are not allowed in RST, reformat the header to be a single wide line.	2021-05-17 18:38:16 +02:00
Florian Hahn	f33ce47836	[LoopUnroll] Add multi-exit test which does not exit through latch. This patch adds a new test for loop-unrolling with multiple exiting blocks, where the latch does not exit, but the header does. This can happen when the loop has not been rotated, e.g. due to minsize. Inspired by the following end-to-end test, using -Oz https://godbolt.org/z/fP6sna8qK bool foo(int *ptr, int limit) { #pragma clang loop unroll(full) for (unsigned int i = 0; i < 4; i++) { if (ptr[i] > limit) return false; ptr[i]++; } return true; }	2021-05-17 17:08:15 +01:00
Stanislav Mekhanoshin	ca4e8cd7a6	[AMDGPU] Set unused dst_sel to '?' in the encoding This is to allow disasm with any bits in the unused fields. Differential Revision: https://reviews.llvm.org/D102526	2021-05-17 08:38:52 -07:00
Sanjay Patel	afbea032ea	[x86] update fma test with deprecated intrinsics; NFC All of the CHECK lines should be identical to before, but without any of the x86-specific calls that were replaced with generic FMA long ago. The file still has value because it shows a miscompile as demonstrated in D90901, but we probably need to add tests with FMF to make that explicit without losing coverage.	2021-05-17 11:06:32 -04:00
Simon Pilgrim	a3d5fb0749	[X86] Don't dereference a dyn_cast<> - use a cast<> instead. NFCI. dyn_cast<> can return null if the cast fails, by using cast<> we assert that the cast is correct helping to avoid a potential null dereference.	2021-05-17 15:58:32 +01:00
Jay Foad	257b9bc0d9	[AMDGPU] Tweak VOP3_INTERP16 profile Set the output register class based on the output type, instead of hard-coding VGPR_32. I think this is more correct. It doesn't make any difference at the moment because we use the same class for 16- and 32-bit results, but it might in future if we make more use of true 16-bit register classes. Differential Revision: https://reviews.llvm.org/D102622	2021-05-17 15:28:00 +01:00
Fraser Cormack	6af6b413bf	[RISCV][NFC] Correct alignment in scatter/gather tests This lays the groundwork for changes to alignment in D102493 to be more apparent.	2021-05-17 15:12:55 +01:00
Andy Yankovsky	dc2e44c588	[APInt][NFC] Fix typo vlalue->value Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D102618	2021-05-17 16:18:22 +02:00
Irina Dobrescu	f3133f1d74	[AArch64] Lower bitreverse in ISel Adding lowering support for bitreverse. Previously, lowering bitreverse would expand it into a series of other instructions. This patch makes it so this produces a single rbit instruction instead. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D102397	2021-05-17 13:35:27 +01:00
Benjamin Kramer	3ae6f7dd7e	Put back the trailing commas on TYPED_TEST_SUITE This avoids a -pedantic warning: warning: ISO C++11 requires at least one argument for the "..." in a variadic macro See also https://github.com/google/googletest/issues/2271	2021-05-17 14:14:13 +02:00
Roman Lebedev	faba1577fb	[InstCombine] isFreeToInvert(): constant expressions aren't free to invert (PR50370) This fixes https://bugs.llvm.org/show_bug.cgi?id=50370, which reports yet another endless combine loop, this one regressed from 554b1bced325a8d860ad00bd59020d66d01c95f8, which fixed yet another endless combine loop (PR50308) This code had fallen into the very typical pitfall of forgetting that constant expressions exist, and they aren't free to invert, because the `not` won't be absorbed by the "constant", but will remain a (constant) expression...	2021-05-17 14:58:05 +03:00
Simon Pilgrim	e099d7d19d	[X86] Regenerate cmov.ll tests	2021-05-17 12:50:38 +01:00
James Henderson	b7e39fd3ed	[debuginfo-tests] Fix environment variable used to specify LLDB Currently, if the user specifies the environment variable 'CLANG', tests will attempt to use the value as a path to the clang executable. Previously, lldb could also be specified via the CLANG environment variable, but this was almost certainly a bug, because that meant both clang and lldb would have the same path. This patch changes the environment variable for lldb to 'LLDB'. Reviewed by: thopre, teemperor Differential Revision: https://reviews.llvm.org/D101982	2021-05-17 12:50:10 +01:00
Benjamin Kramer	1e6c6cdc2d	Clean up uses of gmock Invoke in an attempt to make it work with GCC 6.2. NFCI.	2021-05-17 13:48:45 +02:00
Nemanja Ivanovic	8655d76475	[PowerPC] Add patterns for vselect of v1i128 These patterns are missing even though the underlying instruction doesn't really care about the type. Added these patterns to resolve https://bugs.llvm.org/show_bug.cgi?id=50084	2021-05-17 06:37:46 -05:00
Max Kazantsev	4cff0ef1cf	[Test] Auto-generate checks in a test (preparing to update)	2021-05-17 18:26:47 +07:00
Nemanja Ivanovic	5fe52009aa	[PowerPC] Do not emit dssall on AIX This instruction is a nop on all server cores (certainly on all cores that AIX supports) so it is fine to emit a nop instead of it. In fact, that is exactly what XL emits. So we emit a nop on AIX and we leave the codegen as is on other platforms since there may indeed be cores out there for which this actually does some prefetching.	2021-05-17 06:08:06 -05:00
Nico Weber	b25c3747e2	[gn build] reformat all gn files $ git ls-files '*.gn' '*.gni' | xargs llvm/utils/gn/gn.py format	2021-05-17 06:59:43 -04:00
Nico Weber	3e66c8c917	[gn build] Add build file for msan runtime Works for the examples on https://clang.llvm.org/docs/MemorySanitizer.html Differential Revision: https://reviews.llvm.org/D102554	2021-05-17 06:58:10 -04:00
Tim Northover	4f5c40575e	X86: support Swift Async context This adds support to the X86 backend for the newly committed swiftasync function parameter. If such a (pointer) parameter is present it gets stored into an augmented frame record (populated in IR, but generally containing enhanced backtrace for coroutines using lots of tail calls back and forth). The context frame is identical to AArch64 (primarily so that unwinders etc don't get extra complexity). Specifically, the new frame record is [AsyncCtx, %rbp, ReturnAddr], and its presence is signalled by bit 60 of the stored %rbp being set to 1. %rbp still points to the frame pointer in memory for backwards compatibility (only partial on x86, but OTOH the weird AsyncCtx before the rest of the record is because of x86).	2021-05-17 11:56:16 +01:00
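The bit-60 convention described in the commit above can be illustrated with a few bit operations on a saved frame-pointer value. This is a sketch of how a tool reading the stored %rbp might test for and strip the flag. The constant and function names are hypothetical and do not come from LLVM or any unwinder.

```cpp
#include <cstdint>

// Bit 60 of the stored frame pointer signals an extended frame record of the
// form [AsyncCtx, %rbp, ReturnAddr]. The names below are illustrative only.
constexpr std::uint64_t kAsyncFrameBit = std::uint64_t{1} << 60;

// True if the stored frame pointer advertises a Swift async context slot.
constexpr bool hasAsyncContext(std::uint64_t storedFp) {
  return (storedFp & kAsyncFrameBit) != 0;
}

// Recover the plain frame pointer value by clearing the marker bit.
constexpr std::uint64_t stripAsyncBit(std::uint64_t storedFp) {
  return storedFp & ~kAsyncFrameBit;
}

// What a prologue would conceptually store: the frame pointer with bit 60 set.
constexpr std::uint64_t markAsync(std::uint64_t framePointer) {
  return framePointer | kAsyncFrameBit;
}
```

A frame walker that ignores the flag would need to call something like `stripAsyncBit` before following the chain, since the marked value is not itself a valid address.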
Tim Northover	47e1a53df5	AArch64: mark x22 livein if it's an async context that gets stored. This fixes a crash with expensive checks enabled (the verifier was not happy).	2021-05-17 11:56:03 +01:00
Max Kazantsev	a6b214f432	[Test] Fix test to make the transform for which it was added legal %limit in these tests is supposed to be positive.	2021-05-17 17:19:01 +07:00
Simon Pilgrim	633bafaf5f	[TargetLowering] prepareUREMEqFold/prepareSREMEqFold - account for non legal shift types Ensure we tell getShiftAmountTy that we're working with pre-legalized types to prevent cases where the (legalized) shift type can no longer handle the (non-legalized) type width. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34366	2021-05-17 11:03:27 +01:00
Tim Northover	fc5daa6083	IR/AArch64/X86: add "swifttailcc" calling convention. Swift's new concurrency features are going to require guaranteed tail calls so that they don't consume excessive amounts of stack space. This would normally mean "tailcc", but there are also Swift-specific ABI desires that don't naturally go along with "tailcc" so this adds another calling convention that's the combination of "swiftcc" and "tailcc". Support is added for AArch64 and X86 for now.	2021-05-17 10:48:34 +01:00
Jacob Bramley	2320a8cd94	[AArch64] Lower fptoi.sat intrinsics. AArch64's fcvt instructions implement the saturating behaviour that the fpto*i.sat intrinsics require, in cases where the destination width matches the saturation width. Lowering them removes a lot of unnecessary generated code. Only scalar lowerings are supported for now. Differential Revision: https://reviews.llvm.org/D102353	2021-05-17 10:19:19 +01:00
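The saturating behaviour that the commit above refers to can be spelled out in scalar C++ for the signed 32-bit case: NaN maps to zero, out-of-range inputs clamp to the nearest representable integer, and everything else truncates toward zero. The function name `fptosiSat32` is hypothetical. This is a reference sketch of the semantics, not the lowering itself.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Saturating float -> int32 conversion, written out explicitly.
// 2147483648.0f is exactly 2^31, so every float strictly between the two
// bounds below fits in int32_t and the final cast is well defined.
inline std::int32_t fptosiSat32(float x) {
  if (std::isnan(x))
    return 0;                                         // NaN converts to zero
  if (x <= -2147483648.0f)
    return std::numeric_limits<std::int32_t>::min();  // clamp low
  if (x >= 2147483648.0f)
    return std::numeric_limits<std::int32_t>::max();  // clamp high
  return static_cast<std::int32_t>(x);                // truncate toward zero
}
```

A plain `static_cast` on an out-of-range or NaN input is undefined behaviour in C++, which is why the checks come first. On a target whose conversion instruction already saturates, these checks are the "unnecessary generated code" that a direct lowering avoids.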
Fraser Cormack	69fc258fc0	[DAGCombiner] Relax an assertion to an early return The select-of-constants transform was asserting that its constant vector inputs did not implicitly truncate their input without that as an explicit precondition to the function. This patch relaxes that assertion into an early return to skip the optimization. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D102393	2021-05-17 09:15:55 +01:00
Chen Zheng	9ccc0ec15a	[PowerPC] add a testcase for reverse memory op; nfc	2021-05-17 03:29:14 -04:00
Hongtao Yu	4a7561c809	[CSSPGO] Update pseudo probe distribution factor based on inline context. With prelink inlining, pseudo probes with same ID can come from different inline contexts. Such probes should not share samples and their factors should be fixed up separately. I'm seeing 0.3% speedup for SPEC2017 overall. Benchmark 631.deepsjeng_s benefits the most, about 4%. Reviewed By: wenlei, wmi Differential Revision: https://reviews.llvm.org/D102429	2021-05-16 23:11:36 -07:00
Arthur Eubanks	99d811bdc5	Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry" This reverts commit 16748bd2fb1fe10d7d097961f1988327338f3f9f. Causes https://crbug.com/1209013	2021-05-16 22:02:10 -07:00
Arthur Eubanks	630c86e151	Revert "[NFC] Use ArgListEntry indirect types more in ISel lowering" This reverts commit 85af8a8c1b574faa0d5d57d189ae051debdfada8.	2021-05-16 22:00:54 -07:00
Pan, Tao	f9d8052498	[SelectionDAG] Make fast and linearize visible by clang -pre-RA-sched ScheduleDAGFast.cpp is compiled to an object file, but the ScheduleDAGFast object file isn't linked into the clang executable because no symbol in it is referenced from outside. Add calls to the createXxx functions of ScheduleDAGFast.cpp so that the ScheduleDAGFast object file is linked into the clang executable. The static RegisterScheduler will then register the fast and linearize schedulers at clang startup. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D101601	2021-05-17 11:25:15 +08:00
Ben Shi	796d22443b	[RISCV] Optimize or/xor with immediate in the zbs extension Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102398	2021-05-17 10:59:52 +08:00
Ben Shi	f38e4dcf6f	[RISCV][test] Add new tests of or/xor in the zbs extension Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D102396	2021-05-17 09:47:23 +08:00
David Blaikie	2cfbc27b4f	Fix some deprecated copy ops in google mock	2021-05-16 15:59:06 -07:00
Craig Topper	05a02f8da8	[RISCV] Replace AddiPair ComplexPattern with a PatLeaf. NFC The ComplexPattern is looking for an immediate in a certain range that has a single use. This can be handled with a PatLeaf since we aren't matching multiple patterns or checking any complicated relationships between nodes. This shrinks the isel table a little bit since tablegen no longer has to generate patterns with commuted operands. With the PatLeaf, tablegen can see we're matching an immediate which should always be on the right hand side of add. Reviewed By: benshi001 Differential Revision: https://reviews.llvm.org/D102510	2021-05-16 12:17:52 -07:00
Fangrui Song	89806dccc8	[test] Improve CodeGen/*/semantic-interposition-asm.ll	2021-05-16 11:17:09 -07:00
Alessandro Decina	770bc060c5	[BPF] add support for 32 bit registers in inline asm Add "w" constraint type which allows selecting 32 bit registers. 32 bit registers were added in https://reviews.llvm.org/rGca31c3bb3ff149850b664838fbbc7d40ce571879. Differential Revision: https://reviews.llvm.org/D102118	2021-05-16 11:01:47 -07:00
Lang Hames	e4c74b057d	[JITLink] Fix symbol comparator in LinkGraph::dump. The existing implementation did not provide a strict weak ordering.	2021-05-16 10:11:58 -07:00
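The commit above does not say what was wrong with the original comparator, so the following is a generic illustration of the requirement rather than the JITLink code. It contrasts a comparator that is not a strict weak ordering (it returns true for equal elements) with one that is. The `Sym` struct and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical symbol record, used only for illustration.
struct Sym {
  std::uint64_t address;
  std::string name;
};

// NOT a strict weak ordering: brokenLess(a, a) is true, which violates
// irreflexivity. Passing this to std::sort is undefined behaviour.
inline bool brokenLess(const Sym &a, const Sym &b) {
  return a.address <= b.address;
}

// A strict weak ordering: lexicographic comparison on (address, name).
inline bool strictLess(const Sym &a, const Sym &b) {
  return std::tie(a.address, a.name) < std::tie(b.address, b.name);
}

// Sort a copy of the symbols using the well-behaved comparator.
inline std::vector<Sym> sortedCopy(std::vector<Sym> syms) {
  std::sort(syms.begin(), syms.end(), strictLess);
  return syms;
}
```

Comparing tuples built with `std::tie` is a common way to get irreflexivity and transitivity for free when ordering on several fields.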
David Green	99f589780c	[CGP][ARM] Optimize towards branch on zero in codegenprepare This adds a simple fold into codegenprepare that converts comparison of branches towards comparison with zero if possible. For example: %c = icmp ult %x, 8 br %c, bla, blb %tc = lshr %x, 3 becomes %tc = lshr %x, 3 %c = icmp eq %tc, 0 br %c, bla, blb As a first order approximation, this can reduce the number of instructions needed to perform the branch as the shift is (often) needed anyway. At the moment this does not affect very much, as llvm tends to prefer the opposite form. But it can protect against regressions from commits like rG9423f78240a2. Simple cases of Add and Sub are added along with Shift, equally as the comparison to zero can often be folded with cpsr flags. Differential Revision: https://reviews.llvm.org/D101778	2021-05-16 17:54:06 +01:00
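The fold in the commit above rests on a simple identity for unsigned values: `x < 8` holds exactly when `x >> 3` is zero. The sketch below states both sides in C++ so the equivalence can be checked directly. The function names are hypothetical, and the code mirrors only the IR example from the message, not the pass.

```cpp
#include <cstdint>

// Original form: unsigned compare against the constant 8 (icmp ult %x, 8).
constexpr bool belowEightByCompare(std::uint32_t x) {
  return x < 8u;
}

// Folded form: reuse the shifted value and compare it with zero
// (lshr %x, 3 followed by icmp eq %tc, 0).
constexpr bool belowEightByShift(std::uint32_t x) {
  return (x >> 3) == 0u;
}

// The two forms agree at the boundary and at the extremes.
static_assert(belowEightByCompare(7u) == belowEightByShift(7u), "boundary below");
static_assert(belowEightByCompare(8u) == belowEightByShift(8u), "boundary above");
static_assert(belowEightByCompare(0xFFFFFFFFu) == belowEightByShift(0xFFFFFFFFu),
              "maximum value");
```

The same reasoning applies to any power-of-two bound `2^k` paired with a logical right shift by `k`. When the shifted value is needed anyway, the zero test can reuse it.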
Philip Reames	9bcc60e5c1	Revert "Do actual DCE in LoopUnroll (try 2)" This reverts commit 653fa0b46ae34c06495b542414b704b30381cd02. Reported to trigger pr50354. Reverting until investigated.	2021-05-16 09:38:36 -07:00
David Green	010e023bd4	[ARM] Extra branch on zero tests. NFC	2021-05-16 17:22:52 +01:00
Kai Luo	f5ce33b9fe	[Utils] Fix indentation error in utils/wciia.py Running this script gives ``` "llvm-project/llvm/./utils/wciia.py", line 56 if word == "N:": TabError: inconsistent use of tabs and spaces in indentation ``` Under emacs' whitespace-mode, it shows ``` for·line·in·code_owners_file:$ ····for·word·in·line.split():$ » if·word·==·"N:":$ » » name·=·line[2:].strip()$ » » if·code_owner:$ » » » process_code_owner(code_owner)$ » » » code_owner·=·{}$ ``` I use `yapf` to format this script directly and it's running correctly.	2021-05-16 22:34:09 +08:00
Nikita Popov	945b8d0f5d	[CaptureTracking] Simplify reachability check (NFCI) This code was re-implementing the same-BB case of isPotentiallyReachable(). Historically, this was done because CaptureTracking used additional caching for local dominance queries. Now that it is no longer needed, the code is effectively the same as isPotentiallyReachable(). The only differences are extra checks for invoke/phis. These are misleading checks related to dominance in the value availability sense that are not relevant for control reachability. The invoke check was correct but redundant in that invokes are always terminators, so `I` could never come before the invoke. The phi check is a matter of interpretation (should an earlier phi node be considered reachable from a later phi node in the same block?) but ultimately doesn't matter because phis don't capture anyway.	2021-05-16 16:04:10 +02:00
Nikita Popov	88e5c8610b	Reapply [CaptureTracking] Do not check domination Reapply after adjusting the synchronized.m test case, where the TODO is now resolved. The pointer is only captured on the exception handling path. ----- For the CapturesBefore tracker, it is sufficient to check that I can not reach BeforeHere. This does not necessarily require that BeforeHere dominates I, it can also occur if the capture happens on an entirely disjoint path. This change was previously accepted in D90688, but had to be reverted due to large compile-time impact in some cases: It increases the number of reachability queries that are performed. After recent changes, the compile-time impact is largely mitigated, so I'm reapplying this patch. The remaining compile-time impact is largely proportional to changes in code-size.	2021-05-16 15:46:31 +02:00
Florian Hahn	f52bf2cd81	[Matrix] Fix some newpm check lines, which fail on some bots. (2)	2021-05-16 14:11:18 +01:00
Simon Pilgrim	c0cbb64a42	[X86][SSE] Pull out combineToHorizontalAddSub helper from inside (F)ADD/SUB combines (REAPPLIED). NFCI. The intention is to be able to run this from additional locations (such as shuffle combining) in the future. Reapplies rGb95a103808ac (after reversion at rGc012a388a15b), with SSE3/SSSE3 typo fix, test added at rG0afb10de1449.	2021-05-16 13:50:58 +01:00


215897 Commits