If a global object is listed in `@llvm.used`, place it in a unique section with
the `SHF_GNU_RETAIN` flag. The section is a GC root under `ld --gc-sections`
with LLD>=13 or GNU ld>=2.36.
For front ends which do not expect to see multiple sections of the same name,
consider emitting `@llvm.compiler.used` instead of `@llvm.used`.
SHF_GNU_RETAIN is restricted to ELFOSABI_GNU and ELFOSABI_FREEBSD in
binutils. We do not enforce that restriction; see the rationale in D95749.
The integrated assembler has supported SHF_GNU_RETAIN since D95730.
GNU as>=2.36 supports section flag 'R'.
We don't need to worry about GNU ld support because older GNU ld just ignores
the unknown SHF_GNU_RETAIN.
With this change, `__attribute__((retain))` functions/variables emitted
by clang will get the SHF_GNU_RETAIN flag.
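As a hedged illustration from the source level (the symbol name and inspection
command here are examples, not from this patch):
```
// With a recent Clang targeting ELF, a global carrying the retain attribute is
// added to @llvm.used, placed in its own section, and that section gets the
// SHF_GNU_RETAIN flag, so `ld --gc-sections` treats it as a GC root.
__attribute__((retain, used))
static const char build_tag[] = "build-tag-example";

int main() { return 0; }

// `llvm-readelf -S` on the resulting object should show the section holding
// build_tag with an "R" (SHF_GNU_RETAIN) flag next to "A" (SHF_ALLOC).
```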
Differential Revision: https://reviews.llvm.org/D97448
There are existing patterns for FMOVHi, FMOVSi, and FMOVDi in
AArch64InstrFormats.td.
Importing these allows us to remove the manual selection code for FMOV.
It also allows us to select FMOVHi for non-zero constants when we have full
fp16 support.
Refactor some of the code in AArch64InstrFormats.td so that we can create
equivalent custom renderers in GlobalISel.
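For reference, a GlobalISel custom renderer for such an immediate looks roughly
like the sketch below (shown out of its class context; this mirrors, but is not
necessarily identical to, the renderer wired up from the TableGen patterns):
```
// Convert the APFloat payload of a G_FCONSTANT into the 8-bit encoded
// immediate that FMOVDi expects.
void renderFPImm64(MachineInstrBuilder &MIB, const MachineInstr &MI,
                   int OpIdx) const {
  assert(MI.getOpcode() == TargetOpcode::G_FCONSTANT && OpIdx == -1 &&
         "Expected G_FCONSTANT");
  MIB.addImm(
      AArch64_AM::getFP64Imm(MI.getOperand(1).getFPImm()->getValueAPF()));
}
```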
Differential Revision: https://reviews.llvm.org/D97511
Many optimizers (e.g. GlobalOpt/ConstantMerge) do not respect linker semantics
for comdat and may not discard the sections as a unit.
The interconnected `__llvm_prf_{cnts,data}` sections (in comdat for ELF)
are similar to D97432: `__profd_` is not directly referenced, so
`__profd_` may be discarded while `__profc_` is retained, breaking the
interconnection. We currently conservatively add all such sections to
`llvm.used` and let the linker do GC for ELF.
In D97448, we will change GlobalObjects in the llvm.used list to use SHF_GNU_RETAIN,
causing the metadata sections to be unnecessarily retained (some `check-profile` tests check for GC).
Use `llvm.compiler.used` to retain the current GC behavior.
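For illustration, a pass can pin such a variable against IR-level GC without
forcing the linker to keep it (a minimal sketch; `pinProfileData` is a made-up
helper name):
```
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"
using namespace llvm;

// llvm.compiler.used stops GlobalDCE/GlobalOpt from dropping the variable,
// but unlike llvm.used it does not mark the section SHF_GNU_RETAIN, so the
// linker can still discard the whole comdat group under --gc-sections.
static void pinProfileData(Module &M, GlobalVariable *ProfData) {
  appendToCompilerUsed(M, {ProfData});
}
```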
Differential Revision: https://reviews.llvm.org/D97585
And clarify in the "writing a pass" docs that both the legacy and new
PMs are being used for the codegen/optimization pipelines.
Reviewed By: ychen, asbirlea
Differential Revision: https://reviews.llvm.org/D97515
Previously we would use a bundle to hint the register allocator to not
overwrite the pointers in a sequence of loads to avoid breaking soft
clauses. This bundling was based on a fuzzy register pressure
heuristic, so we could not guarantee that a bundle would not require
more registers than are actually available. This would result in the
register allocator failing on unsatisfiable bundles. Use a kill to
artificially extend the live
ranges, so we can always succeed at register allocation even if it
means extra spills in the worst case.
This seems to capture most of the benefit of the bundle while avoiding
most of the risk presented by the bundle. However the lit tests do
show a handful of regressions. In some cases with sequences of
volatile loads, unused load components end up getting reallocated to
the next load, which forces a wait between them. There are also a few small
scheduling regressions where a hazard used to be avoided, and one
spill torture test which for some reason nearly doubles the stack
usage. There is also a bit of noise from leftover kills (it may make
sense for post-RA pseudos to strip all of these out).
Using ComputeNumSignBits or computeKnownBits we might be able
to determine that overflow is impossible.
This especially helps after type legalization if the type was
promoted from a type with half the bits or more. Type legalization
conservatively creates a promoted smulo/umulo and an overflow
check for the promoted bits. The overflow from the promoted
smulo/umulo is ORed with the result of the promoted bits
overflow check. Proving that the promoted smulo/umulo can never
overflow will leave us with just the promoted bits overflow check.
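As a hedged, source-level illustration of the kind of check this lets us fold
(not taken from the patch's tests):
```
#include <cstdint>

// Both operands are sign-extended from 8 bits, so the 16-bit product is at
// most 128 * 128 = 16384 in magnitude and can never overflow int16_t.  The
// overflow bit of the resulting llvm.smul.with.overflow.i16 is provably
// false and the check can be removed.
bool mul16_never_overflows(int8_t A, int8_t B) {
  int16_t Wide;
  return __builtin_mul_overflow((int16_t)A, (int16_t)B, &Wide);
}
```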
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D97160
This will allow identifying exactly how many shadow bytes were used
during compilation, for when fast8 mode is introduced.
Also, it will provide a consistent matching point for instrumentation
tests so that the exact llvm type used (i8 or i16) for the shadow can
be replaced by a pattern substitution. This is handy for tests with
multiple prefixes.
Reviewed by: stephan.yichao.zhao, morehouse
Differential Revision: https://reviews.llvm.org/D97409
Use `APInt` to convert a 32-bit or 64-bit immediate to an `APFloat` rather than
`bit_cast` to a `float` or `double` to avoid going through host floating-point and
potentially changing the bit pattern of NaNs.
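A minimal sketch of the pattern (the helper name is illustrative):
```
#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APInt.h"
#include <cstdint>
using namespace llvm;

// Build the APFloat directly from the raw bit pattern.  Round-tripping the
// bits through a host double instead could quiet a signaling NaN or
// otherwise alter NaN payloads.
static APFloat decodeFP64Imm(uint64_t ImmBits) {
  return APFloat(APFloat::IEEEdouble(), APInt(64, ImmBits));
}
```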
Differential Revision: https://reviews.llvm.org/D97490
This is a case D97178 tried to solve but missed. D97178 could not handle
the case when multiple consecutive delegates are generated:
- Before:
```
block
br (a)
try
catch
end_try
end_block
<- (a)
```
- After:
```
block
br (a)
try
...
try
try
catch
end_try
<- (a)
delegate
delegate
end_block
<- (b)
```
(The `br` should point to (b) now)
D97178 assumed `end_block` comes two BBs after `end_try`, because
it assumed the order `end_try` BB -> `delegate` BB -> `end_block` BB.
But it turned out there can be multiple `delegate`s in between. This
patch changes the logic so we just search from `end_try` BB until we
find `end_block`.
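A self-contained sketch of the new search (illustrative only; the real code
walks machine basic blocks in the WebAssembly backend):
```
#include <cstddef>
#include <string>
#include <vector>

// BlockStarts[i] names the construct that begins basic block i.  Instead of
// assuming end_block is exactly two blocks after end_try, keep walking until
// the block that starts with end_block is reached, skipping any number of
// intervening delegates.
static std::size_t findEndBlock(const std::vector<std::string> &BlockStarts,
                                std::size_t EndTryIdx) {
  std::size_t I = EndTryIdx + 1;
  while (I < BlockStarts.size() && BlockStarts[I] != "end_block")
    ++I;
  return I;
}
```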
Fixes https://github.com/emscripten-core/emscripten/issues/13515.
(More precisely, fixes
https://github.com/emscripten-core/emscripten/issues/13515#issuecomment-784711318.)
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D97569
If a region was not constrained by high register pressure
and was not rescheduled without clustering, we can skip
rescheduling it in the ClusteredLowOccupancyReschedule stage.
This improves scheduling speed by 25% on some kernels.
Differential Revision: https://reviews.llvm.org/D97506
We attempt rescheduling without load/store clustering if occupancy
limits were not met with clustering. Skip this for regions which do
not have any loads or stores at all.
In a set of kernels I am experimenting with, this improves
scheduling time by ~30%.
Differential Revision: https://reviews.llvm.org/D97342
- This patch introduces a different assembler dialect ("hlasm") for z/OS.
The default dialect has now been given the "att" dialect name. For this,
appropriate changes have been added to SystemZ.td.
- This patch also makes a few changes to SystemZInstrFormats.td which
restrict a few condition code mnemonics to just the "att" dialect
variant (he, le, lh, nhe, nle, nlh). These extended condition code
mnemonics are not available in HLASM.
- A new private function has been introduced in SystemZAsmParser.cpp to
return the assembler dialect set in SystemZMCAsmInfo.cpp. The reason we
couldn't/haven't explicitly queried the overridden getAssemblerDialect
function from AsmParser is outlined in this thread here. This returned
dialect is directly passed onto the relevant matcher functions, which take
in a variant ID, so that the matcher functions can appropriately choose an
instruction based on the variant (a rough sketch follows below).
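A rough sketch of what such a query can look like (the function name here is
illustrative, not necessarily the one added by this patch):
```
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCParser/MCAsmParser.h"
using namespace llvm;

// Fetch the dialect recorded by SystemZMCAsmInfo via the generic MCAsmInfo
// hook; the result is then passed as the variant ID to the generated matcher
// so it only accepts mnemonics from that dialect.
static unsigned getConfiguredAsmDialect(MCAsmParser &Parser) {
  return Parser.getContext().getAsmInfo()->getAssemblerDialect();
}
```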
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D94250
This will allow a FrameIndex to be used as the base address instead of
emitting a separate ADDI from isel. eliminateFrameIndex will likely turn
it back into an ADDI, but this makes things consistent with the
SDPatterns and VLPatterns.
I only tested one case for simplicity. I can test more if reviewers
want.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97221
This clarifies the interface of the matchSimpleRecurrence helper introduced in 8020be0b8 for non-commutative operators. After ebd3aeba, I realized the original way I framed the routine was inconsistent. For shifts, we only matched the LHS form, but for sub we matched both and the caller wanted that information. So, instead, we now consistently match both forms for non-commutative operators and the caller becomes responsible for filtering if needed. I tried to put a clear warning in the header because I suspect the RHS form of e.g. a sub recurrence is non-obvious for most folks. (It was for me.)
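A purely illustrative pair of loops showing the two phrasings for subtraction (nothing here is lifted from the patch):
```
#include <cstdint>

// "LHS form": the recurrence value is the left operand of the sub.
int64_t lhs_form(int64_t Start, int64_t Step, int N) {
  int64_t X = Start;
  for (int I = 0; I < N; ++I)
    X = X - Step;
  return X;
}

// "RHS form": the recurrence value is the right operand of the sub.  This is
// the phrasing that is easy to overlook for non-commutative operators.
int64_t rhs_form(int64_t Start, int64_t Step, int N) {
  int64_t X = Start;
  for (int I = 0; I < N; ++I)
    X = Step - X;
  return X;
}
```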
This is a part of https://reviews.llvm.org/D95835.
Each customized function has two wrappers. The
first one, dfsw, is for the normal shadow propagation. The second one, dfso, is
used when origin tracking is on. It calls the first one and does additional
origin propagation. Which one to use can be decided at instrumentation
time. This is to ensure minimal additional overhead when origin tracking
is off.
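A heavily simplified, hedged sketch of this scheme for a hypothetical custom
function my_custom (the signatures are stand-ins, not the exact DFSan ABI):
```
extern "C" {
typedef unsigned short dfsan_label;   // stand-in width
typedef unsigned int dfsan_origin;    // stand-in width

int my_custom(int X);                 // the function being wrapped

// Shadow-only wrapper: propagates labels.
int __dfsw_my_custom(int X, dfsan_label XLabel, dfsan_label *RetLabel) {
  *RetLabel = XLabel;
  return my_custom(X);
}

// Origin-tracking wrapper: reuses the shadow wrapper, then additionally
// propagates the origin.
int __dfso_my_custom(int X, dfsan_label XLabel, dfsan_label *RetLabel,
                     dfsan_origin XOrigin, dfsan_origin *RetOrigin) {
  *RetOrigin = XOrigin;
  return __dfsw_my_custom(X, XLabel, RetLabel);
}
}
```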
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D97483
`__sancov_pcs` parallels the other metadata section(s). While some optimizers
(e.g. GlobalDCE) respect linker semantics for comdat and retain or discard the
sections as a unit, some (e.g. GlobalOpt/ConstantMerge) do not. So we have to
conservatively retain all unconditionally in the compiler.
When a comdat is used, the COFF/ELF linkers' GC semantics ensure the
associated parallel array elements are retained or discarded together,
so `llvm.compiler.used` is sufficient.
Otherwise (MachO (see rL311955/rL311959), COFF special case where comdat is not
used), we have to use `llvm.used` to conservatively make all sections retained
by the linker. This will fix the Windows problem once internal linkage
GlobalObjects in `llvm.used` are retained via `/INCLUDE:`.
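A hedged sketch of that retention decision (the helper and parameter names are
illustrative):
```
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"
using namespace llvm;

static void retainSanCovSection(Module &M, GlobalVariable *SectionVar,
                                bool UsesComdat) {
  if (UsesComdat)
    // The linker's GC can still drop the whole comdat group as a unit.
    appendToCompilerUsed(M, {SectionVar});
  else
    // MachO / comdat-less COFF: make the linker itself keep the section.
    appendToUsed(M, {SectionVar});
}
```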
Reviewed By: morehouse, vitalybuka
Differential Revision: https://reviews.llvm.org/D97432
This allows GlobalISel to use this instruction where available. I assume
SelectionDAG always selects s_xnor_b32 so it isn't affected by this
change.
Differential Revision: https://reviews.llvm.org/D97560
num_children is "last_index" + 1, thus
num_children + 1 = "last_index" + 2
this worked anyway because the index of `$$dereference$$` would work as long as
it was past the last index.
Add deref support to `llvm::Optional` in `lldbDataFormatters.py`.
This creates a synthetic provider that adds dereference support, but otherwise proxies all access to the underlying value.
With this change, an optional value can be displayed by running `v *someOptional`, and its contents can be accessed with the arrow `operator->`, for example `v someOpt->HasThing`. This matches expressions usable from expression evaluation.
See also D97165 and D97524.
Differential Revision: https://reviews.llvm.org/D97525
collectBitParts uses int8_t for the bit indices, leaving a 128-bit limit.
We already test for this before calling collectBitParts, but rGb94c215592bd added truncate handling which meant we could end up processing wider integers.
Thanks to @manojgupta for the repro.
This is useful for projects that pull in libcxx and libcxxabi and build
them using out-of-tree build files, but don't make them sibling
directories (or don't call the sibling directories libcxx and libcxxabi
for some reason).
Fixes PR49313.
Differential Revision: https://reviews.llvm.org/D97379
Spilling and reloading AMX registers are expensive. We allow PTILEZEROV
and PTILELOADDV to be rematerializable to avoid the register spilling.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D97453
This patch modifies TryToSinkInstruction in the InstCombine pass, to prevent
redundant debug intrinsics from being produced, and also prevent the intrinsics
from being emitted in an incorrect order. It does this by ensuring that when
this pass sinks an instruction and creates clones of the debug intrinsics that
use that instruction, it inserts those debug intrinsics in their original order,
and only inserts the last debug intrinsic for each variable in the Instruction's
block.
Differential revision: https://reviews.llvm.org/D95463
So far we had no way to distinguish between JITLink and RuntimeDyld in lli. Instead, we used implicit knowledge that RuntimeDyld would be used for linking ELF. In order to get D97337 to work with lli though, we have to move on and allow JITLink for ELF. This patch uses extensible RTTI to allow external clients to add their own layers without touching the LLVM sources.
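For context, LLVM's extensible-RTTI mechanism works roughly as in this
self-contained sketch (the class names are made up; they are not the layers
touched here):
```
#include "llvm/Support/Casting.h"
#include "llvm/Support/ExtensibleRTTI.h"
using namespace llvm;

class BaseLayerLike : public RTTIExtends<BaseLayerLike, RTTIRoot> {
public:
  static char ID;
};
class ClientLayerLike : public RTTIExtends<ClientLayerLike, BaseLayerLike> {
public:
  static char ID;
};
char BaseLayerLike::ID = 0;
char ClientLayerLike::ID = 0;

// isa<>/dyn_cast<> work across the hierarchy without built-in C++ RTTI, so
// out-of-tree code can add new subclasses and still have them identified.
static bool isClientLayer(const BaseLayerLike &L) {
  return isa<ClientLayerLike>(L);
}
```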
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97338
As discussed on D97478, the removal of the custom tag causes some changes in the add/sub-overflow expansion as it no longer expands to sat-arith codegen.