llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00

Author	SHA1	Message	Date
David Green	9491485bb0	[ARM] Remove new ARMSelectionDAGTest unittest. This removes the unit test from a968e7b82eac as it reportedly causes some link problems. It can be reinstated once the issues are understood and sorted out.	2021-03-04 10:14:35 +00:00
Max Kazantsev	3f3acbc661	[X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the offset, part 2 This patch enables the case where we do not completely eliminate offset. Supposedly in this case we reduce live range overlap that never harms, but since there are doubts this is true, this goes as a separate change. Differential Revision: https://reviews.llvm.org/D96399 Reviewed By: reames	2021-03-04 16:47:43 +07:00
James Henderson	7aeb2619a9	[llvm-objcopy][test] Improve many-sections object and test case Additionally do some test tidy-ups and improve coverage of symbol section indexes where the logical section index >= SHN_LORESERVE. The symbol and section names in the many-section input object were mostly shared. This patch changes them to be distinct, enabling different operations such as --add-symbol, to be more targeted, when using the object. It also makes the test less confusing and removes some oddness in the symbol table order, presumably caused by the duplicate names. The input object was built from assembly that was of the form: .section s1 sym1: .section s2 sym2: ... with a total of 65536 such occurrences. llvm-objcopy was then used to remove the empty .text section automatically generated by MC, and incidentally to move .strtab to the end of the object. This ensured that the section/symbol indexes matched their name (i.e. section index 1 was s1, section index 2 was s2 etc, and sym1 was in s1, sym2 in s2 etc). Reviewed by: MaskRay Differential Revision: https://reviews.llvm.org/D97660	2021-03-04 09:42:43 +00:00
Fraser Cormack	ebfa93d23f	[RISCV] Fix crash when inserting large fixed-length subvectors This patch addresses a compiler crash resulting from passing a fixed-length type to one that expects scalable vector types. An assertion was added to prevent this regressing in the future. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97868	2021-03-04 09:27:16 +00:00
Fraser Cormack	4fc8390412	[RISCV] Preserve fixed-length VL on insert_vector_elt in more cases This patch fixes up one case where the fixed-length-vector VL was dropped (falling back to VLMAX) when inserting vector elements, as the code would lower via ISD::INSERT_VECTOR_ELT (at index 0) which loses the fixed-length vector information. To this end, a custom node, VMV_S_XF_VL, was introduced to carry the VL operand through to the final instruction. This node wraps the RVV vmv.s.x and vmv.s.f instructions, which were being selected by insert_vector_elt anyway. There should be no observable difference in scalable-vector codegen. There is still one outstanding drop from fixed-length VL to VLMAX, when an i64 element is inserted into a vector on RV32; the splat (which is custom legalized) has no notion of the original fixed-length vector type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97842	2021-03-04 09:21:10 +00:00
Martin Storsjö	ea1b68c099	[ARM] Fix linking of the new unittest from a968e7b82eac	2021-03-04 11:04:17 +02:00
David Green	fd710d5689	[ARM] KnownBits for CSINC/CSNEG/CSINV This adds some simple known bits handling for the three CSINC/NEG/INV instructions. From the operands' known bits we can compute the common bits of the first operand and incremented/negated/inverted second operand. The first, especially CSINC ZR, ZR, comes up a fair amount in the tests. The others are more rare so a unit test for them is added. Differential Revision: https://reviews.llvm.org/D97788	2021-03-04 08:40:20 +00:00
Max Kazantsev	8d6ae1909f	[X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the offset, part 1 While optimizing the memory instruction, we sometimes need to add offset to the value of `IV`. We could avoid doing so if the `IV.next` is already defined at the point of interest. In this case, we may get two possible advantages from this: - If the `IV` step happens to match with the offset, we don't need to add the offset at all; - We reduce overlap of live ranges of `IV` and `IV.next`. They may stop overlapping and it will lead to better register allocation. Even if the overlap is preserved, we are not introducing a new overlap, so it should be a neutral transform (Disabled this patch, will come with follow-up). Currently I've only added support for IVs that get decremented using `usub` intrinsic. We could also support `AddInstr`, however there is some weird interaction with some other transform that may lead to infinite compilation in this case (seems like same transform is done and undone over and over). I need to investigate why it happens, but generally we could do that too. The first part only handles the case where this reuse fully eliminates the offset. Differential Revision: https://reviews.llvm.org/D96399 Reviewed By: reames	2021-03-04 15:22:55 +07:00
Juneyoung Lee	4267f32aaf	[LangRef] remove links to lifetime since use marker intro already has a link	2021-03-04 17:19:23 +09:00
Juneyoung Lee	c2621fae37	[LangRef] fix more undefined label errors	2021-03-04 17:09:03 +09:00
Craig Topper	3fc87467d8	[LegalizeVectorTypes] Remove a tautological compare.	2021-03-03 23:26:00 -08:00
Hongtao Yu	d0f29b816e	[CSSPGO] Deduplicating dangling pseudo probes. Same dangling probes are redundant since they all have the same semantic that is to rely on the counts inference tool to get reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading created tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also result in redundant probes though without an observed impact so far. This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added. An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed. Differential Revision: https://reviews.llvm.org/D97482	2021-03-03 22:44:42 -08:00
Hongtao Yu	f2b87eabed	[CSSPGO] Unblocking optimizations by dangling pseudo probes. This change fixes a couple of places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments. The optimizations being unblocked are: 1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted. 2. Unblocking jump threading from removing empty blocks. Pseudo probes prevent jump threading from removing logically empty blocks that only have one unconditional jump instruction. 3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks. Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D97481	2021-03-03 22:44:42 -08:00
Hongtao Yu	b26882e846	[CSSPGO] Introducing dangling pseudo probes. Dangling probes are the probes associated to an empty block. This usually happens when all real instructions are optimized away from the block. There is a problem with dangling probes during the offline counts processing. The way the sample profiler works is that samples collected on the first physical instruction following a probe will be counted towards the probe. This is logically equivalent to treating the instruction next to a probe as if it is from the same block of the probe. In the dangling probe case, the real instruction following a dangling probe actually starts a new block, and samples collected on the new block may cause issues when counted towards the empty block. To mitigate this issue, we first try to move around a dangling probe inside its owning block. If there are still native instructions preceding the probe in the same block, we can then use them as a place holder to collect samples for the probe. A pass is added to walk each block backwards looking for probes not followed by any real instruction and moving them before the first real instruction. This is done right before the object emission. If we are not lucky enough to find such in-block preceding instructions for a probe, the solution we are taking is to tag such probe as dangling so that the samples reported for them will not be trusted by the compiler. We leave it up to the counts inference algorithm to get such probes a reasonable count. The number `UINT64_MAX` is used to mark sample count as collected for a dangling probe. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95962	2021-03-03 22:44:41 -08:00
Johannes Doerfert	97088b4db9	[Docs] Remove `no-aa` from the alias analysis documentation The `no-aa` pass has been removed with 7b560d40bddf. Differential Revision: https://reviews.llvm.org/D95416	2021-03-04 00:35:52 -06:00
Johannes Doerfert	c5b0326e2f	[Attributor] Make DepClass a required argument We often used a sub-optimal dependence class in the past because we didn't see the argument. Let's make it explicit so we remember to think about it.	2021-03-04 00:35:52 -06:00
Johannes Doerfert	fa902e7024	[Attributor] Fold "TrackDependence" into the DepClassTy enum We don't need a bool and an enum to express the three options we currently have. This makes the interface nicer and much easier to use optional dependencies. Also avoids mistakes where the bool is false and enum ignored.	2021-03-04 00:35:52 -06:00
Johannes Doerfert	45e74bf95c	[Attributor] Avoid work for GEPs and wait till the users are visited	2021-03-04 00:35:52 -06:00
Johannes Doerfert	d5abc69606	[Attributor] Use known alignment as lower bound to avoid work If we know already more than available from a use, we don't need to invest time on it.	2021-03-04 00:35:52 -06:00
Johannes Doerfert	c92d3580fd	[Attributor][NFC] Move some trivial checks up	2021-03-04 00:35:52 -06:00
Johannes Doerfert	a8bee9c76c	[Attributor] Use sensible initialization in AANoCaptureCallSiteReturned	2021-03-04 00:35:51 -06:00
Evgeniy Brevnov	f31243de74	[DSE] Add support for not aligned begin/end This is an attempt to improve handling of partial overlaps in case of unaligned begin\end. Existing implementation just bails out if it encounters such cases. Even when it doesn't I believe existing code checking alignment constraints is not quite correct. It tries to ensure alignment of the "later" start/end offset while should be preserving relative alignment between earlier and later start/end. The idea behind the change is simple. When start/end is not aligned as we wish instead of bailing out let's adjust it as necessary to get desired alignment. I'll update with performance results as measured by the test-suite...it's still running... Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D93530	2021-03-04 12:24:23 +07:00
Serguei Katkov	36bf8f45fb	[InstCombine] Move statepoint intrinsic handling from visitCall to visitCallBase statepoint intrinsic can be used in invoke context, so it should be handled in visitCallBase to cover both call and invoke. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D97833	2021-03-04 11:00:22 +07:00
Wang, Pengfei	2e2e287013	Add Windows ehcont section support (/guard:ehcont). Add option /guard:ehcont Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D96709	2021-03-04 11:47:29 +08:00
Juneyoung Lee	09d28f9368	[LangRef] fix undefined label	2021-03-04 10:12:57 +09:00
Juneyoung Lee	f861c629c1	[LangRef] Make lifetime intrinsic's semantics consistent with StackColoring's comment This patch is an update to LangRef by describing lifetime intrinsics' behavior by following the description of MIR's LIFETIME_START/LIFETIME_END markers at StackColoring.cpp (`eb44682d67/llvm/lib/CodeGen/StackColoring.cpp (L163)`) and the discussion in llvm-dev. In order to explicitly define the meaning of an object lifetime, I added 'Object Lifetime' subsection. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D94002	2021-03-04 09:58:06 +09:00
Fangrui Song	25d19a9795	[IRSymTab] Set FB_used on llvm.compiler.used symbols IR symbol table does not parse inline asm. A symbol only referenced by inline asm is not in the IR symbol table, so LTO does not know that the definition (in another translation unit) is referenced and may internalize it, even if that definition has `__attribute__((used))` (which lowers to `llvm.compiler.used` on ELF targets since D97446). ``` // cabac.c __attribute__((used)) const uint8_t ff_h264_cabac_tables[...] = {...}; // h264_cabac.c asm("lea ff_h264_cabac_tables(%rip), %0" : ...); ``` `__attribute__((used))` is the recommended way to tell the compiler there may be inline asm references, so the usage is perfectly fine. This patch conservatively sets the `FB_used` bit on `llvm.compiler.used` symbols to work around the IR symbol table limitation. Note: before D97446, Clang never emitted symbols in the `llvm.compiler.used` list, so this change does not punish any Clang emitted global object. Without the patch, `ff_h264_cabac_tables` may be assigned to a non-external partition and get internalized. Then we will get a linker error because the `cabac.c` definition is not exposed. Differential Revision: https://reviews.llvm.org/D97755	2021-03-03 16:22:30 -08:00
Xun Li	a7cf9dd738	[LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions See pr46990(https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks which cross coro.suspend intrinsics. This breaks the semantics of the coro.suspend intrinsic, which returns to the caller directly. Also this leads to use-after-free if the coroutine is freed before control returns to the caller in a multithreaded environment. This patch disables promotion by checking whether the loop contains coro.suspend intrinsics. This is a resubmit of D86190. Disabling LICM for loops with coroutine suspension is a better option not only for correctness purpose but also for performance purpose. In most cases LICM sinks memory operations. In the case of coroutine, sinking memory operation out of the loop does not improve performance since coroutine needs to get data from the frame anyway. In fact LICM would hurt coroutine performance since it adds more entries to the frame. Differential Revision: https://reviews.llvm.org/D96928	2021-03-03 15:21:57 -08:00
Fangrui Song	e0a172f86d	[test] Fix profiling.ll `__llvm_prf_nm` is compressed if zlib is available. In addition, its size may not be that stable.	2021-03-03 15:18:44 -08:00
Jin Lin	b34db73957	Add the use of register r for outlined function when register r is live in and defined later. The compiler needs to mark register $x0 as live in for the following case. $x1 = ADDXri $sp, 16, 0 BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def dead $x0 Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D95267	2021-03-03 15:14:11 -08:00
George Balatsouras	a99b153a84	[dfsan] Remove hard-coded shadow width in more tests As a preparation step for fast8 support, we need to update the tests to pass in both modes. That requires generalizing the shadow width and remove any hard coded references that assume it's always 2 bytes. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D97884	2021-03-03 15:05:16 -08:00
Sanjay Patel	f2517ef43e	[Analysis] simplify propagation of FMF in recurrences; NFC This is a mess, but this is hopefully no-functional-change. The 'Prev' descriptor is only used for min/max recurrences or when starting a match from a phi, so it should not be a factor when propagating FMF for fmul/fadd. The API is confusing (and should be reduced in subsequent steps) because the "UnsafeAlgebraInst" appears to actually be a placeholder for a recurrence that does NOT have FMF, but we still want to treat it as reassociative.	2021-03-03 17:28:10 -05:00
Stefan Gränitz	ee07f5b459	[lli] Add JITLink link component after 99a6d003edbe	2021-03-03 23:14:26 +01:00
Florian Hahn	4b2c4a9325	[AArch64] Add implicit uses for operands when expanding BLR_RVMARKER. Make sure we preserve info about passed arguments as implicit uses, to make sure later passes still have access to this information. This fixes a mis-compile where the machine-combiner would pick an incorrect free register.	2021-03-03 21:56:05 +00:00
Stefan Gränitz	4479241a3b	Revert "hack to unbreak check-llvm on win after D97335" in attempt for actual fix This reverts commit 900f076113302e26e1939541b546b0075e3e9721 and attempts an actual fix: All failing tests for llvm-jitlink use the `-noexec` flag. The inputs they operate on are not meant for execution on the host system. Looking e.g. at the MachO_test_harness_harnesss.s test, llvm-mc generates input machine code with "x86_64-apple-macosx10.9". My previous attempt in bbdb4c8c9bcef0e8db751630accc04ad874f54e7 disabled the debug support plugin for Windows targets, but what we would actually want is to disable it on Windows HOSTS. With the new patch here, I don't do exactly that, but instead follow the approach for the EH frame plugin and include the `-noexec` flag in the condition. It should have the desired effect when it comes to the test suite. It appears a little workaround'ish, but should work reliably for now. I will discuss the issue with Lang and see if we can do better. Thanks @thakis again for the temporary fix.	2021-03-03 22:35:36 +01:00
Whitney Tsang	9ec6c6eade	[LoopUnrollRuntime] Add option to assume the non latch exit block to be predictable. Reviewed By: Meinersbur, bmahjour Differential Revision: https://reviews.llvm.org/D97747	2021-03-03 20:43:31 +00:00
Alexey Bataev	73ba017fcc	[Cost]Add tests for boolean and/or reductions, NFC. Tests with the default costs for boolean and/or reductions. Differential Revision: https://reviews.llvm.org/D97793	2021-03-03 12:34:30 -08:00
Philip Reames	db90a04542	Address review comment from D97219 (follow up to 8051156) Probably should have done this before landing, but I forgot. Basic idea is to avoid using the SCEV predicate when it doesn't buy us anything. Also happens to set us up for handling non-add recurrences in the future if desired.	2021-03-03 12:20:27 -08:00
Philip Reames	5c3ba57c9e	Sink routine for replacing an operand bundle to CallBase [NFC] We had equivalent code for both CallInst and InvokeInst, but never cared about the result type.	2021-03-03 12:07:55 -08:00
Philip Reames	d2f05a31bd	[LSR] Unify scheduling of existing and inserted addrecs LSR goes to some lengths to schedule IV increments such that %iv and %iv.next never need to overlap. This is fairly fundamental to LSR's cost model. LSR assumes that an addrec can be represented with a single register. If %iv and %iv.next have to overlap, then that assumption does not hold. The bug - which this patch is fixing - is that LSR only does this scheduling for IVs which it inserts, but its cost model assumes the same for existing IVs that it reuses. It will rewrite existing IV users such that the no-overlap property holds, but will not actually reschedule said IV increment. As you can see from the relative lack of test updates, this doesn't actually impact codegen much. The main reason for doing it is to make a follow up patch series which improves post-increment use and scheduling easier to follow. Differential Revision: https://reviews.llvm.org/D97219	2021-03-03 12:07:55 -08:00
Jonas Paulsson	d57759f053	[SystemZ] Reimplement the i8/i16 compare-and-swap logic. Even though the implementation in emitAtomicCmpSwapW() was correct, it made Valgrind report an error. Instead of using a RISBG on CmpVal, an LL[CH]R can be made on the OldVal, and the problem is avoided. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D97604	2021-03-03 14:04:32 -06:00
Florian Hahn	620203451b	[AArch64] Move CALL_RVMARKER definition after CALL. This is a NFC with respect to the generated code. But it fixes a crash when using -debug, because of the position in the enum CALL_RVMARKER nodes were treated as memops. That caused a crash when printing CALL_RVMARKER nodes.	2021-03-03 19:42:16 +00:00
Fangrui Song	6ef5900ddb	[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF `__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not referenced via relocation in the translation unit. With `-z start-stop-gc` (LLD 13 (D96914); GNU ld 2.37 https://sourceware.org/bugzilla/show_bug.cgi?id=27451), the linker does not let `__start_/__stop_` references retain their sections. Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make them retained by the linker. This patch changes most existing `UsedVars` cases to `CompilerUsedVars` to reflect the ideal state - if the binary format properly supports section based GC (dead stripping), `llvm.compiler.used` should be sufficient. `__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars` since we want them to be unconditionally retained by both compiler and linker. Behaviors on COFF/Mach-O are not affected. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D97649	2021-03-03 11:32:24 -08:00
Fangrui Song	d5a2734874	[test] Improve PGO tests	2021-03-03 11:32:24 -08:00
George Balatsouras	000167f2ef	[dfsan] Remove hardcoded shadow width in abilist_aggregate.ll As a preparation step for fast8 support, we need to update the tests to pass in both modes. That requires generalizing the shadow width and remove any hard coded references that assume it's always 2 bytes. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D97723	2021-03-03 11:12:59 -08:00
Stanislav Mekhanoshin	2b0a0e51ae	[AMDGPU] Exclude always_inline from max bb threshold Honor always_inline attribute when processing -amdgpu-inline-max-bb. It was lost during the ports of the heuristic. There is no reason to honor inline hint, but not always inline. Differential Revision: https://reviews.llvm.org/D97790	2021-03-03 10:21:56 -08:00
Hongtao Yu	6997a9ef05	[CSSPGO][llvm-profgen] Continue disassembling after illegal instruction is seen. Previously we errored out when disassembling illegal instructions and there would be no profile generated. In fact illegal instructions are not uncommon and we'd better skip them and print "unknown" instead of erroring out. This matches the behavior of llvm-objdump (see disassembleObject in llvm-objdump.cpp). Reviewed By: wlei, wenlei Differential Revision: https://reviews.llvm.org/D97776	2021-03-03 10:14:10 -08:00
Choongwoo Han	c78f1b079e	[llvm-cov] Cache file status information Currently, getSourceFile accesses file system to check if two paths are the same file with a thread lock, which is a huge performance bottleneck in some cases. Currently, it's accessing file system size(files) * size(files) times. Thus, cache file status information, which reduces file system access to size(files) times. When I tested it with two binaries and 16 cpu cores, it saved over 70% of time. Binary 1: 56 secs -> 3 secs Binary 2: 17 hours -> 4 hours Differential Revision: https://reviews.llvm.org/D97061	2021-03-03 10:04:07 -08:00
Philip Reames	238c95fc20	Fix a build warning from ea7d208	2021-03-03 09:16:56 -08:00
Philip Reames	05b6860c6e	[basicaa] Rewrite isGEPBaseAtNegativeOffset in terms of index difference [mostly NFC] This is almost purely NFC, it just fits more obviously in the flow of the code now that we've standardized on the index difference approach. The non-NFC bit is that because of canceling the VariableOffsets in the subtract, we can now handle the case where both sides involve a common variable offset. This isn't an "interesting" improvement; it just happens to fall out of the natural code structure. One subtle point - the placement of this above the BaseAlias check is important in the original code as this can return NoAlias even when we can't find a relation between the bases otherwise. Also added some enhancement TODOs noticed while understanding the existing code. Note: This is slightly different than the LGTMed version. I fixed the "inbounds" issue Nikita noticed with the original code in e6e5ef4 and rebased this to include the same fix. Differential Revision: https://reviews.llvm.org/D97520	2021-03-03 09:03:28 -08:00
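
The [llvm-cov] commit c78f1b079e in the log above describes caching file status information so that file system accesses drop from size(files) * size(files) to size(files). The sketch below illustrates that general idea only; it is not the code from D97061. `FileId`, `FileIdCache`, and the injected lookup callback are hypothetical names introduced for illustration, and the callback stands in for whatever expensive file status query is being cached.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a file's identity (for example device + inode).
struct FileId {
  unsigned long long device;
  unsigned long long inode;
  bool operator==(const FileId &other) const {
    return device == other.device && inode == other.inode;
  }
};

// Hypothetical cache: each distinct path is passed to the expensive lookup
// at most once; later queries for the same path are answered from the map.
class FileIdCache {
public:
  using Lookup = std::function<FileId(const std::string &)>;

  explicit FileIdCache(Lookup lookup) : lookup_(std::move(lookup)) {
    assert(lookup_ && "a lookup callback is required");
  }

  // Returns the cached identity, querying the callback only on a miss.
  const FileId &get(const std::string &path) {
    auto it = cache_.find(path);
    if (it == cache_.end())
      it = cache_.emplace(path, lookup_(path)).first;
    return it->second;
  }

  // Compares two paths by identity without touching the file system again
  // once both identities are cached.
  bool isSameFile(const std::string &a, const std::string &b) {
    return get(a) == get(b);
  }

private:
  Lookup lookup_;
  std::unordered_map<std::string, FileId> cache_;
};
```

Comparing every pair among n paths through `isSameFile` then triggers at most n lookups rather than one per comparison. The sketch is single-threaded; the commit message mentions a thread lock, which is left out here.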

212074 Commits