llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 02:52:53 +02:00

Author	SHA1	Message	Date
Christopher Tetreault	74267eec45	Revert "Ensure that InstructionCost actually implements a total ordering" This reverts commit b481cd519e07b3ad2bd3e81c89b0dd8efd68d6bc.	2021-02-02 12:10:02 -08:00
Hongtao Yu	3594bb4c1b	[CSSPGO] Introducing distribution factor for pseudo probe. Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations duplicating code well-maintain the branch frequency information (BFI) based on which probe distribution factors are calculated. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count. This change also introduces a pseudo probe verifier that can be run after each IR passes to detect duplicated pseudo probes. A saturated distribution factor stands for 1.0. A pesudo probe will carry a factor with the value ranged from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated to each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead. Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callees’s probe prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, Indirect call promotion results in multiple callisites. The original samples should be distributed across them. This is fixed by adjusting the callisites' distribution factors. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D93264	2021-02-02 11:55:01 -08:00
Christopher Tetreault	21f48fe20a	Ensure that InstructionCost actually implements a total ordering Previously, operator== would consider the actual equality of the pairs (lhs.Value, lhs.State) == (rhs.Value, rhs.State). However, if an invalid cost was involved in a call to operator<, only the state would be compared. Thus, it was not the case that ({2, Invalid} < {3, Invalid} \|\| {2, Invalid} > {3, Invalid} \|\| {2, Invalid} == {3, Invalid}). This patch implements a true total ordering, where cost state is considered first, then value. While it's not really imporant that {2, Invalid} be considered to be less than {3, Invalid}, it's not a problem either. This patch also implements operator== in terms of operator<, so the two definitions will be kept in sync. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D95803	2021-02-02 11:49:14 -08:00
Greg McGary	bc41ebf40b	[lld-macho][NFC] refactor relocation handling Add per-reloc-type attribute bits and migrate code from per-target file into target independent code, driven by reloc attributes. Many cleanups Differential Revision: https://reviews.llvm.org/D95121	2021-02-02 10:54:53 -07:00
Fangrui Song	4be3ad3853	[yaml2obj/obj2yaml/llvm-readobj] Support SHF_GNU_RETAIN In binutils, the flag is defined for ELFOSABI_GNU and ELFOSABI_FREEBSD. It can be used to mark a section as a GC root. In practice, the flag has generic semantics and can be applied to many EI_OSABI values, so we consider it generic. Differential Revision: https://reviews.llvm.org/D95728	2021-02-02 09:19:53 -08:00
Florian Hahn	4314d26857	[ConstraintElimination] Add nicer way to dump constraints (NFC). Use ConstraintSystem::dump(Names) to display the result of decomposing a condition.	2021-02-02 16:36:45 +00:00
Tom Weaver	99a2c6afdf	Revert "[InstrProfiling] Use !associated metadata for counters, data and values" This reverts commit df3e39f60b356ca9dbfc11e96e5fdda30afa7acb. introduced failing test instrprof-gc-sections.c causing build bot to fail: http://lab.llvm.org:8011/#/builders/53/builds/1184	2021-02-02 14:19:31 +00:00
Thomas Symalla	a7e61e92bb	Reverted whitespace changes. Differential Revision: https://reviews.llvm.org/D90968	2021-02-02 09:14:54 +01:00
Thomas Symalla	8955159f2d	Resolve formatting changes.	2021-02-02 09:14:53 +01:00
Thomas Symalla	122a71a9f1	Move Combiner to PreLegalize step	2021-02-02 09:14:53 +01:00
Thomas Symalla	8206639df8	Refactored the pattern matching.	2021-02-02 09:14:52 +01:00
Thomas Symalla	2f722f6a50	Renames	2021-02-02 09:14:52 +01:00
Thomas Symalla	1f11e485f5	Added comments.	2021-02-02 09:14:52 +01:00
Thomas Symalla	ae887237b4	clang-format	2021-02-02 09:14:52 +01:00
Thomas Symalla	1663fb4919	Added clamp i64 to i16 global isel pattern.	2021-02-02 09:14:52 +01:00
Wenlei He	4a2e76b6b1	[CSSPGO] Call site prioritized inlining for sample PGO This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximize the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`). Motivation With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO. - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat. - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately. - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inacurate hotness heuristic. With pseduo-probe and the change above, this is no longer an issue for CSSPGO. New FDO Inliner Putting all the requirement for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here're reasons why each component is needed. - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655. - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check. - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites. - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path. Note that the new inliner avoids repeatedly evaluating same set of call site, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of same call site (TODO). Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO. Tunings and knobs I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO. Results Evaluated with an internal LLVM fork couple months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner show ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it). Differential Revision: https://reviews.llvm.org/D94001	2021-02-01 23:46:34 -08:00
Gil Rapaport	1307269e6f	[SCEV] Apply loop guards to divisibility tests Extend applyLoopGuards() to take into account conditions/assumes proving some value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the loop unroller and the loop vectorizer identify more loops as not requiring remainder loops. Differential Revision: https://reviews.llvm.org/D95521	2021-02-02 08:09:39 +02:00
Nathan Hawes	574038ba44	[VFS] Add support to RedirectingFileSystem for mapping a virtual directory to one in the external FS. Previously file entries in the -ivfsoverlay yaml could map to a file in the external file system, but directories had to list their contents in the form of other file entries or directories. Allowing directory entries to map to a directory in the external file system makes it possible to present an external directory's contents in a different location and (in combination with the 'fallthrough' option) overlay one directory's contents on top of another. rdar://problem/72485443 Differential Revision: https://reviews.llvm.org/D94844	2021-02-02 14:56:17 +10:00
Kazu Hirata	ee92719619	[llvm] Use pop_back_val (NFC)	2021-02-01 20:55:05 -08:00
Rahman Lavaee	3beeb7f456	[obj2yaml, yaml2obj] Use Hex64 for BBAddressMap fields. This patch let the yaml encoding use Hex64 values for NumBlocks, BB AddressOffset, BB Size, and BB Metadata. Additionally, it changes the decoded values in elf2yaml to uint64_t to match DataExtractor::getULEB128 return type. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D95767	2021-02-01 15:37:30 -08:00
Petr Hosek	62cf2a8725	[InstrProfiling] Use !associated metadata for counters, data and values C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics. The !associated metadata may be attached to a global object declaration with a single argument that references another global object, and it gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded by the linker, setting up !associated metadata allows linker to discard counters, data and values associated with that function symbol. Note that !associated metadata is only supported by ELF, it does not have any effect on non-ELF targets. Differential Revision: https://reviews.llvm.org/D76802	2021-02-01 15:01:43 -08:00
Sanjay Patel	a3cb545d77	[LoopVectorize] improve IR fast-math-flags propagation in reductions This is another step (see D95452) towards correcting fast-math-flags bugs in vector reductions. There are multiple bugs visible in the test diffs, and this is still not working as it should. We still use function attributes (rather than FMF) to drive part of the logic, but we are not checking for the correct FP function attributes. Note that FMF may not be propagated optimally on selects (example in https://llvm.org/PR35607 ). That's why I'm proposing to union the FMF of a fcmp+select pair and avoid regressions on existing vectorizer tests. Differential Revision: https://reviews.llvm.org/D95690	2021-02-01 16:21:36 -05:00
Philip Reames	fc9b7d7c42	[Loads] Plumb through TLI argument [NFC] This is a (rather delayed) follow up to commit 0129cd5. This commit is entirely NFC, the semantic change to leverage the new information will be submitted separate with a test case.	2021-02-01 11:45:30 -08:00
Simon Pilgrim	9ee9a3c639	Revert rGce587529ad8b5 - "[APFloat] multiplySignificand - pass IEEEFloat as const reference. NFCI." Breaks on some buildbots	2021-02-01 16:15:23 +00:00
J-Y You	c233f2a4f7	[TableGen] Fix anonymous record self-reference in foreach and multiclass If we instantiate self-referenced anonymous records in foreach and multiclass, the NAME value will point to incorrect record. It's because anonymous name is resolved too early. This patch adds AnonymousNameInit to represent an anonymous record name. When instantiating an anonymous record, it will update the referred name. Differential Revision: https://reviews.llvm.org/D95309	2021-02-01 10:59:07 -05:00
Simon Pilgrim	c54bcb01a6	[APFloat] multiplySignificand - pass IEEEFloat as const reference. NFCI. Avoids unnecessary IEEEFloat copies.	2021-02-01 15:41:50 +00:00
Kerry McLaughlin	267255edd9	[SVE][CodeGen] Remove performMaskedGatherScatterCombine The AArch64 DAG combine added by D90945 & D91433 extends the index of a scalable masked gather or scatter to i32 if necessary. This patch removes the combine and instead adds shouldExtendGSIndex, which is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether the index should be extended before calling getMaskedGather/Scatter. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D94525	2021-02-01 14:10:00 +00:00
Serge Pavlov	ef7f39cab9	[FPEnv] Intrinsic for setting rounding mode To set non-default rounding mode user usually calls function 'fesetround' from standard C library. This way has some disadvantages. * It creates unnecessary dependency on libc. On the other hand, setting rounding mode requires few instructions and could be made by compiler. Sometimes standard C library even is not available, like in the case of GPU or AI cores that execute small kernels. * Compiler could generate more effective code if it knows that a particular call just sets rounding mode. This change introduces new IR intrinsic, namely 'llvm.set.rounding', which sets current rounding mode, similar to 'fesetround'. It however differs from the latter, because it is a lower level facility: * 'llvm.set.rounding' does not return any value, whereas 'fesetround' returns non-zero value in the case of failure. In glibc 'fesetround' reports failure if its argument is invalid or unsupported or if floating point operations are unavailable on the hardware. Compiler usually knows what core it generates code for and it can validate arguments in many cases. * Rounding mode is specified in 'fesetround' using constants like 'FE_TONEAREST', which are target dependent. It is inconvenient to work with such constants at IR level. C standard provides a target-independent way to specify rounding mode, it is used in FLT_ROUNDS, however it does not define standard way to set rounding mode using this encoding. This change implements only IR intrinsic. Lowering it to machine code is target-specific and will be implemented latter. Mapping of 'fesetround' to 'llvm.set.rounding' is also not implemented here. Differential Revision: https://reviews.llvm.org/D74729	2021-02-01 11:28:14 +07:00
Craig Topper	6f831b56f3	[RISCV][LegalizeTypes] Try to expand BSWAP before promoting if the promoted BSWAP would expand anyway. If we're going to end up expanding anyway, we should do it early so we don't create extra operations to handle the bytes added by promotion. This is helfpul on RISCV where we might have to promote i16 all the way to i64. Differential Revision: https://reviews.llvm.org/D95756	2021-01-31 14:33:29 -08:00
Florian Hahn	52eafc5cc1	[LTOCodeGenerator] Use lto::Config for options (NFC). This patch removes some options that have been duplicated in LTOCodeGenerator and instead use lto::Config directly to manage the options. This is a cleanup after 6a59f0560648. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95738	2021-01-31 19:08:07 +00:00
Kazu Hirata	e0a8f45e5f	[llvm] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-01-31 10:23:43 -08:00
Alexey Lapshin	54c6ca7364	[dsymutil][DWARFLinker][NFC] make AddressManager not depending on the order of checks for relocations. Current dsymutil implementation of hasLiveMemoryLocation()/hasLiveAddressRange() and applyValidRelocs() assume that calls should be done in certain order (from first Dies to last). Multi-thread implementation might call these methods in other order(it might process compilation units in order other than they are physically located), so we remove restriction that searching for relocations should be done in ascending order. This change does not introduce noticable performance degradation. The testing results for clang binary: golden-dsymutil/dsymutil 23787992 clang MD5: 5efa8fd9355ebf81b65f24db5375caa2 elapsed time=91sec build-Release/bin/dsymutil 23855616 clang MD5: 5efa8fd9355ebf81b65f24db5375caa2 elapsed time=91sec Differential Revision: https://reviews.llvm.org/D93106	2021-01-31 16:34:10 +03:00
Kazu Hirata	d4906cf2ad	[llvm] Add missing header guards (NFC) Identified with llvm-header-guard.	2021-01-30 09:53:42 -08:00
Florian Hahn	d993d158ec	[LTO] Add option enable NewPM with LTOCodeGenerator. This patch adds an option to enable the new pass manager in LTOCodeGenerator. It also updates a few tests with legacy PM specific tests, which started failing after 6a59f0560648 when LLVM_ENABLE_NEW_PASS_MANAGER=true.	2021-01-30 11:54:20 +00:00
Florian Hahn	da0198959b	[LTO] Use lto::backend for code generation. This patch updates LTOCodeGenerator to use the utilities provided by LTOBackend to run middle-end optimizations and backend code generation. This is a first step towards unifying the code used by libLTO's C API and the newer, C++ interface (see PR41541). The immediate motivation is to allow using the new pass manager when doing LTO using libLTO's C API, which is used on Darwin, among others. With the changes, there are no codegen/stats differences when building MultiSource/SPEC2000/SPEC2006 on Darwin X86 with LTO, compared to without the patch. Reviewed By: steven_wu Differential Revision: https://reviews.llvm.org/D94487	2021-01-30 10:09:55 +00:00
Kazu Hirata	04422f73a1	[llvm] Use append_range (NFC)	2021-01-29 23:23:34 -08:00
Nathan Hawes	3567e63d35	[VFS] Combine VFSFromYamlDirIterImpl and OverlayFSDirIterImpl into a single implementation (NFC) As a fixme notes, both of these directory iterator implementations are conceptually similar and duplicate the functionality of returning and uniquing entries across two or more directories. This patch combines them into a single class 'CombiningDirIterImpl'. This also drops the 'Redirecting' prefix from RedirectingDirEntry and RedirectingFileEntry to save horizontal space. There's no loss of clarity as they already have to be prefixed with 'RedirectingFileSystem::' whenever they're referenced anyway. rdar://problem/72485443 Differential Revision: https://reviews.llvm.org/D94857	2021-01-30 11:10:10 +10:00
Roman Lebedev	e695f6dc77	[ShadowStackGCLowering] Preserve Dominator Tree, if avaliable This doesn't help avoid any Dominator Tree recalculations just yet, there's one more pass to go..	2021-01-30 01:14:51 +03:00
Christopher Tetreault	67f2871ba2	[SVE] delete VectorType::getNumElements() The previously agreed-upon deprecation period for VectorType::getNumElements() has passed. This patch removes this method and completes the refactor proposed in the RFC: https://lists.llvm.org/pipermail/llvm-dev/2020-March/139811.html Reviewed By: david-arm, rjmccall Differential Revision: https://reviews.llvm.org/D95570	2021-01-29 13:46:54 -08:00
Jay Foad	9fb7df77ee	[GlobalISel] Fix modifying a G_OR without notifying the observer Remove the call to setFlags in favour of creating the instruction with the correct flags in the first place, so we don't have to explicitly notify the observer. Differential Revision: https://reviews.llvm.org/D95681	2021-01-29 16:32:24 +00:00
Florian Hahn	f1e4600eb5	[LTO] Update splitCodeGen to take a reference to the module. (NFC) splitCodeGen does not need to take ownership of the module, as it currently clones the original module for each split operation. There is an ~4 year old fixme to change that, but until this is addressed, the function can just take a reference to the module. This makes the transition of LTOCodeGenerator to use LTOBackend a bit easier, because under some circumstances, LTOCodeGenerator needs to write the original module back after codegen. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95222	2021-01-29 11:53:11 +00:00
Kazu Hirata	606fed8845	[llvm] Forward-declare formatted_raw_ostream (NFC) Various TargetStreamer.h need formatted_raw_ostream but rely on a forward declaration of formatted_raw_ostream in MCStreamer.h. This patch adds forward declarations right in TargetStreamer.h. While we are at it, this patch removes the one in MCStreamer.h, where it is unnecessary.	2021-01-28 22:21:13 -08:00
Christudasan Devadasan	e86b7c773c	Support a list of CostPerUse values This patch allows targets to define multiple cost values for each register so that the cost model can be more flexible and better used during the register allocation as per the target requirements. For AMDGPU the VGPR allocation will be more efficient if the register cost can be associated dynamically based on the calling convention. Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D86836	2021-01-29 10:14:52 +05:30
Duncan P. N. Exon Smith	a126d2972b	ADT: Add SFINAE to the generic IntrusiveRefCntPtr constructors Add an `enable_if` to the generic `IntrusiveRefCntPtr` constructors so that std::is_convertible gives an honest answer when the underlying pointers cannot be converted. Added `static_assert`s to the test suite to verify. Also combine generic constructors from `IntrusiveRefCntPtr<X>&&` and `const IntrusiveRefCntPtr<X>&`. At first glance this appears to be an infinite loop, but the real copy/move constructors are spelled out separately above. Added a unit test to verify. Differential Revision: https://reviews.llvm.org/D95498	2021-01-28 15:07:27 -08:00
Jessica Paquette	c1671731c6	Recommit "[GlobalISel] Walk through hints in getDefIgnoringCopies et al" Recommit of 4580acf6752ea3cc884657b5aa3e174bed86fc8c `Opc = DefMI->getOpcode()` was in the wrong place.	2021-01-28 14:43:00 -08:00
Jessica Paquette	2033979607	Revert "[GlobalISel] Walk through hints in getDefIgnoringCopies et al" This reverts commit 4580acf6752ea3cc884657b5aa3e174bed86fc8c. Reverting while looking into some test failures.	2021-01-28 14:37:57 -08:00
Jessica Paquette	fe6b2e148e	[GlobalISel] Walk through hints in getDefIgnoringCopies et al Treat hint instructions like G_ASSERT_ZEXT like COPY instructions in helpers which walk through copies. This ensures that instructions like G_ASSERT_ZEXT won't impact any optimizations that rely on these helpers. Differential Revision: https://reviews.llvm.org/D95577	2021-01-28 14:27:00 -08:00
Cassie Jones	e11c57fcf5	[GlobalISel] Implement widenScalar for carry-in add/sub These are widened to a wider UADDE/USUBE, with the overflow value unused, and with the same synthesis of a new overflow value as for the O operations. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D95326	2021-01-28 17:06:24 -05:00
Jessica Paquette	3fcc23c823	[GlobalISel] Add G_ASSERT_ZEXT This adds a generic opcode which communicates that a type has already been zero-extended from a narrower type. This is intended to be similar to AssertZext in SelectionDAG. For example, ``` %x_was_extended:_(s64) = G_ASSERT_ZEXT %x, 16 ``` Signifies that the top 48 bits of %x are known to be 0. This is useful in cases like this: ``` define i1 @zeroext_param(i8 zeroext %x) { %cmp = icmp ult i8 %x, -20 ret i1 %cmp } ``` In AArch64, `%x` must use a 32-bit register, which is then truncated to a 8-bit value. If we know that `%x` is already zero-ed out in the relevant high bits, we can avoid the truncate. Currently, in GISel, this looks like this: ``` _zeroext_param: and w8, w0, #0xff ; We don't actually need this! cmp w8, #236 cset w0, lo ret ``` While SDAG does not produce the truncation, since it knows that it's unnecessary: ``` _zeroext_param: cmp w0, #236 cset w0, lo ret ``` This patch - Adds G_ASSERT_ZEXT - Adds MIRBuilder support for it - Adds MachineVerifier support for it - Documents it It also puts G_ASSERT_ZEXT into its own class of "hint instruction." (There should be a G_ASSERT_SEXT in the future, maybe a G_ASSERT_ALIGN as well.) This allows us to skip over hints in the legalizer etc. These can then later be selected like COPY instructions or removed. Differential Revision: https://reviews.llvm.org/D95564	2021-01-28 13:58:37 -08:00
Greg Clayton	4dc036b075	Add the ability to extract the unwind rows from DWARF Call Frame Information. This patch adds the ability to evaluate the state machine for CIE and FDE unwind objects and produce a UnwindTable with all UnwindRow objects needed to unwind registers. It will also dump the UnwindTable for each CIE and FDE when dumping DWARF .debug_frame or .eh_frame sections in llvm-dwarfdump or llvm-objdump. This allows users to see what the unwind rows actually look like for a given CIE or FDE instead of just seeing a list of opcodes. This patch adds new classes: UnwindLocation, RegisterLocations, UnwindRow, and UnwindTable. UnwindLocation is a class that describes how to unwind a register or Call Frame Address (CFA). RegisterLocations is a class that tracks registers and their UnwindLocations. It gets populated when parsing the DWARF call frame instruction opcodes for a unwind row. The registers are mapped from their register numbers to the UnwindLocation in a map. UnwindRow contains the result of evaluating a row of DWARF call frame instructions for the CIE, or a row from a FDE. The CIE can produce a set of initial instructions that each FDE that points to that CIE will use as the seed for the state machine when parsing FDE opcodes. A UnwindRow for a CIE will not have a valid address, whille a UnwindRow for a FDE will have a valid address. The UnwindTable is a class that contains a sorted (by address) vector of UnwindRow objects and is the result of parsing all opcodes in a CIE, or FDE. Parsing a CIE should produce a UnwindTable with a single row. Parsing a FDE will produce a UnwindTable with one or more UnwindRow objects where all UnwindRow objects have valid addresses. The rows in the UnwindTable will be sorted from lowest Address to highest after parsing the state machine, or an error will be returned if the table isn't sorted. To parse a UnwindTable clients can use the following methods: static Expected<UnwindTable> UnwindTable::create(const CIE Cie); static Expected<UnwindTable> UnwindTable::create(const FDE Fde); A valid table will be returned if the DWARF call frame instruction opcodes have no encoding errors. There are a few things that can go wrong during the evaluation of the state machine and these create functions will catch and return them. Differential Revision: https://reviews.llvm.org/D89845	2021-01-28 13:39:17 -08:00
Reid Kleckner	365f97bc0d	Revert "[PDB] Defer relocating .debug$S until commit time and parallelize it" This reverts commit 1a9bd5b81328adf0dd5a8b4f3ad5949463e66da3. I suspect that this patch may have caused https://crbug.com/1171438.	2021-01-28 13:17:27 -08:00
Thomas Lively	30175dd5d9	[WebAssembly] Prototype i8x16 to i32x4 widening instructions As proposed in https://github.com/WebAssembly/simd/pull/395 and matching the opcodes used in V8: https://chromium-review.googlesource.com/c/v8/v8/+/2617385/4/src/wasm/wasm-opcodes.h Differential Revision: https://reviews.llvm.org/D95557	2021-01-28 10:59:32 -08:00
David Blaikie	266d73d7f0	DebugInfo: Add a DWARF FORM extension for addrx+offset references to reduce relocations This is an alternative to the use of complex DWARF expressions for addresses - shaving off a few extra bytes of expression overhead.	2021-01-28 10:20:02 -08:00
Simon Pilgrim	0cbd016ac6	[APFloat] Remove orphan ilogb(DoubleAPFloat) declaration. NFCI.	2021-01-28 15:18:25 +00:00
Simon Pilgrim	984ce3745a	[APFloat] scalbn - pass DoubleAPFloat arg as const-ref. NFCI. Avoid unnecessary copy and fix clang-tidy warning.	2021-01-28 15:18:24 +00:00
Stefan Gränitz	2f98bedc10	[Orc] Remove unused header from TPC server The header would include OrcJIT headers in OrcTargetProcess, which is not desired. All common declarations should be in OrcShared. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D95606	2021-01-28 14:16:49 +01:00
Simon Pilgrim	5bf02200da	[DebugInfo] Remove some unused includes. NFCI. Mainly removing a lot of <vector> includes from files that don't explicitly use std::vector	2021-01-28 11:21:35 +00:00
Georgii Rymar	441aa2949d	[yaml2obj] - Allow empty SectionHeaderTable definitions. Currently we don't allow the following definition: ``` Sections: - Type: SectionHeaderTable - Name: .foo Type: SHT_PROGBITS ``` We report an error: "SectionHeaderTable can't be empty. Use 'NoHeaders' key to drop the section header table". It was implemented in this way earlier, when `SectionHeaderTable` was a dedicated key outside of the `Sections` list. And we did not allow to select where the table is written. Currently it makes sense to allow it, because a user might want to place the default section header table at an arbitrary position, e.g. before other sections. In this case it is not convenient and error prone to require specifying all sections: ``` Sections: - Type: SectionHeaderTable Sections: - Name: .foo - Name: .strtab - Name: .shstrtab - Name: .foo Type: SHT_PROGBITS ``` This patch allows empty SectionHeaderTable definitions. Differential revision: https://reviews.llvm.org/D95341	2021-01-28 10:51:52 +03:00
Kazu Hirata	3902d5e888	[DebugInfo] Forward-declare PDBFile (NFC) NativeEnumInjectedSources.h needs PDBFile but relies on a forward declaration of PDBFile in InjectedSourceStream.h. This patch adds a forward declaration right in NativeEnumInjectedSources.h. While we are at it, this patch removes the one in InjectedSourceStream.h, where it is unnecessary.	2021-01-27 23:25:38 -08:00
Hongtao Yu	9b80fe63e4	[CSSPGO] Support of CS profiles in extended binary format. This change brings up support of context-sensitive profiles in the format of extended binary. Existing sample profile reader/writer/merger code is being tweaked to reflect the fact of bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names since the context delimiter `@` can somehow serve as a part of the C++ mangled names. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95547	2021-01-27 21:29:46 -08:00
Fangrui Song	612a586751	[llvm-c] Move LLVMX86_AMXTypeKind & LLVMPoisonValueValueKind to the bottom to avoid value changes compared with LLVM<=11 Fixes PR48905	2021-01-27 16:28:04 -08:00
Teresa Johnson	75870000ad	[LTO] Prevent devirtualization for symbols dynamically exported Identify dynamically exported symbols (--export-dynamic[-symbol=], --dynamic-list=, or definitions needed to preempt shared objects) and prevent their LTO visibility from being upgraded. This helps avoid use of whole program devirtualization when there may be overrides in dynamic libraries. Differential Revision: https://reviews.llvm.org/D91583	2021-01-27 15:54:13 -08:00
James Y Knight	37f683eeb0	Itanium Mangling: Mangle `__alignof__` differently than `alignof`. The two operations have acted differently since Clang 8, but were unfortunately mangled the same. The new mangling uses new "vendor extended expression" syntax proposed in https://github.com/itanium-cxx-abi/cxx-abi/issues/112 GCC had the same mangling problem, https://gcc.gnu.org/PR88115, and will hopefully be switching to the same mangling as implemented here. Additionally, fix the mangling of `__uuidof` to use the new extension syntax, instead of its previous nonstandard special-case. Adjusts the demangler accordingly. Differential Revision: https://reviews.llvm.org/D93922	2021-01-27 16:46:51 -05:00
Varun Gandhi	52a271678f	[Demangle] Support demangling Swift calling convention in MS demangler. Previously, Clang was able to mangle the Swift calling convention but 'MicrosoftDemangle.cpp' was not able to demangle it. Reviewed By: compnerd, rnk Differential Revision: https://reviews.llvm.org/D95053	2021-01-27 13:24:54 -08:00
Sanjay Patel	2ae45edb62	[LoopVectorize] use IR fast-math-flags exclusively (not FP function attributes) I am trying to untangle the fast-math-flags propagation logic in the vectorizers (see a6f022127 for SLP). The loop vectorizer has a mix of checking FP function attributes, IR-level FMF, and just wrong assumptions. I am trying to avoid regressions while fixing this, and I think the IR-level logic is good enough for that, but it's hard to say for sure. This would be the 1st step in the clean-up. The existing test that I changed to include 'fast' actually shows a miscompile: the function only had the equivalent of nnan, but we created new instructions that had fast (all FMF set). This is similar to the example in https://llvm.org/PR35538 Differential Revision: https://reviews.llvm.org/D95452	2021-01-27 14:17:11 -05:00
Fangrui Song	b3b970744d	[ThinLTO] Add Visibility bits to GlobalValueSummary::GVFlags Imported functions and variable get the visibility from the module supplying the definition. However, non-imported definitions do not get the visibility from (ELF) the most constraining visibility among all modules (Mach-O) the visibility of the prevailing definition. This patch * adds visibility bits to GlobalValueSummary::GVFlags * computes the result visibility and propagates it to all definitions Protected/hidden can imply dso_local which can enable some optimizations (this is stronger than GVFlags::DSOLocal because the implied dso_local can be leveraged for ELF -shared while default visibility dso_local has to be cleared for ELF -shared). Note: we don't have summaries for declarations, so for ELF if a declaration has the most constraining visibility, the result visibility may not be that one. Differential Revision: https://reviews.llvm.org/D92900	2021-01-27 10:43:51 -08:00
Craig Topper	96c406a887	[FaultsMaps][llvm-objdump] Move FaultMapParser to Object/. Remove CodeGen dependency from llvm-objdump FaultsMapParser lived in CodeGen and was forcing llvm-objdump to link CodeGen and everything CodeGen depends on. This was previously attempted in r240364 to fix a link failure. The CodeGen dependency was independently added to fix the same link failure, and that ended up being kept. Removing the dependency seems like the correct layering for llvm-objdump. Reviewed By: MaskRay, jhenderson Differential Revision: https://reviews.llvm.org/D95414	2021-01-27 10:39:59 -08:00
Valentin Clement	f7b64e36c0	[flang][openacc] Allow multiple wait clauses kernels loop and enter data had a too restrictive constraint for the wait clause. The wait clause is allowed multiple times and not only once. This patch fix this problem. Reviewed By: SouraVX Differential Revision: https://reviews.llvm.org/D95469	2021-01-27 13:18:46 -05:00
Florian Hahn	5d270f12d2	[LoopUtils] Pass SCEVExpander instead SE to addRuntimeChecks. This gives the user control over which expander to use, which in turn allows the user to decide what to do with the expanded instructions. Used in D75980. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D94295	2021-01-27 17:36:19 +00:00
Valentin Clement	1973b57fb0	[flang][openacc] Fix clause restriction for exit data directive Restriction on clauses for the EXIT DATA directive were not fully correct. This patch fixes the situation. The async, if and finalize clauses are allowed only once. Reviewed By: SouraVX Differential Revision: https://reviews.llvm.org/D95470	2021-01-27 10:07:19 -05:00
Valentin Clement	202f767031	[flang][openacc] Fix clause restriction for host_data directive Restriction on clauses for the HOST_DATA directive were not fully correct. This patch fixes the situation. The if and if_present clauses are allowed only once. Reviewed By: SouraVX Differential Revision: https://reviews.llvm.org/D95473	2021-01-27 10:06:33 -05:00
Kazu Hirata	d493c03566	[AMDGPU] Forward-declare TargetRegisterClass (NFC) AMDGPUInstructionSelector.h needs TargetRegisterClass but relies on a forward declaration of TargetRegisterClass in InstructionSelector.h. This patch adds a forward declaration right in AMDGPUInstructionSelector.h. While we are at it, this patch removes the one in InstructionSelector.h, where it is unnecessary.	2021-01-26 20:00:16 -08:00
Petr Hosek	3cfd18b793	Support for instrumenting only selected files or functions This change implements support for applying profile instrumentation only to selected files or functions. The implementation uses the sanitizer special case list format to select which files and functions to instrument, and relies on the new noprofile IR attribute to exclude functions from instrumentation. Differential Revision: https://reviews.llvm.org/D94820	2021-01-26 17:13:34 -08:00
Duncan P. N. Exon Smith	795af9c7de	Frontend: Simplify handling of non-seeking streams in CompilerInstance, NFC Add a new `raw_pwrite_ostream` variant, `buffer_unique_ostream`, which is like `buffer_ostream` but with unique ownership of the stream it's wrapping. Use this in CompilerInstance to simplify the ownership of non-seeking output streams, avoiding logic sprawled around to deal with them specially. This also simplifies future work to encapsulate output files in a different class. Differential Revision: https://reviews.llvm.org/D93260	2021-01-26 15:20:43 -08:00
Haowei Wu	4cba78c0ca	[llvm-elfabi] Support ELF file that lacks .gnu.hash section Before this change, when reading ELF file, elfabi determines number of entries in .dynsym by reading the .gnu.hash section. This change makes elfabi read section headers directly first. This change allows elfabi works on ELF files which do not have .gnu.hash sections. Differential Revision: https://reviews.llvm.org/D93362	2021-01-26 12:31:52 -08:00
Fangrui Song	71834ae8ab	Add -fbinutils-version= to gate ELF features on the specified binutils version There are two use cases. Assembler We have accrued some code gated on MCAsmInfo::useIntegratedAssembler(). Some features are supported by latest GNU as, but we have to use MCAsmInfo::useIntegratedAs() because the newer versions have not been widely adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26). Linker We want to use features supported only by LLD or very new GNU ld, or don't want to work around older GNU ld. We currently can't represent that "we don't care about old GNU ld". You can find such workarounds in a few other places, e.g. Mips/MipsAsmprinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276), R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727 https://sourceware.org/bugzilla/show_bug.cgi?id=22969) Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001; GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before available). This feature allows to garbage collect some unused sections (e.g. fragmented .gcc_except_table). This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc. It changes one codegen place in SHF_MERGE to demonstrate its usage. `-fbinutils-version=2.35` means the produced object file does not care about GNU ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced assembly can be consumed by GNU as>=2.35, but older versions may not work. `-fbinutils-version=none` means that we can use all ELF features, regardless of GNU as/ld support. Both clang and llc need `parseBinutilsVersion`. Such command line parsing is usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen), however, ClangCodeGen does not depend on LLVMCodeGen. So I add `parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget). Differential Revision: https://reviews.llvm.org/D85474	2021-01-26 12:28:23 -08:00
Petr Hosek	ef4906bd9c	Revert "Support for instrumenting only selected files or functions" This reverts commit 4edf35f11a9e20bd5df3cb47283715f0ff38b751 because the test fails on Windows bots.	2021-01-26 12:25:28 -08:00
Petr Hosek	a8839c989d	Support for instrumenting only selected files or functions This change implements support for applying profile instrumentation only to selected files or functions. The implementation uses the sanitizer special case list format to select which files and functions to instrument, and relies on the new noprofile IR attribute to exclude functions from instrumentation. Differential Revision: https://reviews.llvm.org/D94820	2021-01-26 11:11:39 -08:00
Simon Pilgrim	95a58e9335	[AMDGPU] HSAMD::fromString - replace std::string arg with StringRef. NFCI. Removes an unnecessary chain of StringRef -> std::string -> StringRef conversions	2021-01-26 16:09:39 +00:00
Sander de Smalen	466208abea	[CostModel] Handle CTLZ and CCTZ in getTypeBasedIntrinsicInstrCost Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D95355	2021-01-26 14:37:51 +00:00
Sebastian Neubauer	3bc57e4094	[AMDGPU] Add IntrWillReturn to three intrinsics None of these can terminate a wave or lane. With these, all intrinsic are IntrWillReturn except those that change exec or can terminate the wave. Not marking intrinsics as WillReturn may prevent optimizations in the future: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148047.html Differential Revision: https://reviews.llvm.org/D95436	2021-01-26 15:33:15 +01:00
Florian Hahn	752c4c588e	[Passes] Run peeling as part of simple/full loop unrolling. Loop peeling removes conditions from loop bodies that become invariant after a small number of iterations. When triggered, this leads to fewer compares and possibly PHIs in loop bodies, enabling further optimizations. The current cost-model of loop peeling should be quite conservative/safe, i.e. only peel if a condition in the loop becomes known after peeling. For example, see PR47671, where loop peeling enables vectorization by removing a PHI the vectorizer does not understand. Granted, the loop-vectorizer could also be taught about constant PHIs, but loop peeling is likely to enable other optimizations as well. This has an impact on quite a few benchmarks from MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example Same hash: 186 (filtered out) Remaining: 51 Metric: loop-vectorize.LoopsVectorized Program base patch diff test-suite...ve-susan/automotive-susan.test 8.00 9.00 12.5% test-suite...nal/skidmarks10/skidmarks.test 35.00 31.00 -11.4% test-suite...lications/sqlite3/sqlite3.test 41.00 43.00 4.9% test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test 25.00 26.00 4.0% test-suite...006/450.soplex/450.soplex.test 88.00 89.00 1.1% test-suite...TimberWolfMC/timberwolfmc.test 120.00 119.00 -0.8% test-suite.../CINT2006/403.gcc/403.gcc.test 215.00 216.00 0.5% test-suite...006/447.dealII/447.dealII.test 957.00 958.00 0.1% test-suite...ternal/HMMER/hmmcalibrate.test 75.00 75.00 0.0% Same hash: 186 (filtered out) Remaining: 51 Metric: loop-vectorize.LoopsAnalyzed Program base patch diff test-suite...ks/Prolangs-C/agrep/agrep.test 440.00 434.00 -1.4% test-suite...nal/skidmarks10/skidmarks.test 312.00 308.00 -1.3% test-suite...marks/7zip/7zip-benchmark.test 6399.00 6323.00 -1.2% test-suite...lications/minisat/minisat.test 134.00 135.00 0.7% test-suite...rks/FreeBench/pifft/pifft.test 295.00 297.00 0.7% test-suite...TimberWolfMC/timberwolfmc.test 1879.00 1869.00 -0.5% test-suite...pplications/treecc/treecc.test 689.00 691.00 0.3% test-suite...T2000/300.twolf/300.twolf.test 1593.00 1597.00 0.3% test-suite.../Benchmarks/Bullet/bullet.test 1394.00 1392.00 -0.1% test-suite...ications/JM/ldecod/ldecod.test 1431.00 1429.00 -0.1% test-suite...6/464.h264ref/464.h264ref.test 2229.00 2230.00 0.0% test-suite...lications/sqlite3/sqlite3.test 2590.00 2589.00 -0.0% test-suite...ications/JM/lencod/lencod.test 2732.00 2733.00 0.0% test-suite...006/453.povray/453.povray.test 3395.00 3394.00 -0.0% Note the -11% regression in number of loops vectorized for skidmarks. I suspect this corresponds to the fact that those loops are gone now (see the reduction in number of loops analyzed by LV). Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88471	2021-01-26 13:52:30 +00:00
Georgii Rymar	b39ef35f86	[yaml2obj][obj2yaml] - Improve how we set/dump the sh_entsize field. We already set the `sh_entsize` field in a single place for all non-implicit sections. This patch reorders the logic slightly and with it we finally have the only one place where the `sh_entsize` is set. obj2yaml will not dump the `EntSize` key for `SHT_DYNSYM/SHT_SYMTAB` sections anymore, when the value of `sh_entsize` is equal to `sizeof(Elf_Sym)` Note that this also seems revealed an issue in llvm-objcopy: Previously yaml2obj set the `sh_entsize` for the `.symtab` section to 0x18, now we it sets it for `SHT_SYMTAB` sections, i.e. by type. But the `llvm-objcopy/ELF/only-keep-debug.test` has a `.symtab` section of type `SHT_STRTAB`, and now yaml2obj sets the `sh_entsize` to 0 for it. I had to update the corresponding check lines for `ES`, but the behavior of `llvm-objcopy` should be fixed instead I think. I've added a TODO and a comment. Differential revision: https://reviews.llvm.org/D95364	2021-01-26 13:33:02 +03:00
Georgii Rymar	2900e3238f	[libObject,llvm-readelf/obj] - Don't use @@ when printing versions of undefined symbols. A default version (@@) is only available for defined symbols. Currently we use "@@" for undefined symbols too. This patch fixes the issue and improves our test case. Differential revision: https://reviews.llvm.org/D95219	2021-01-26 12:05:59 +03:00
Jan Svoboda	8d411fdc2d	[clang][cli] Accept strings instead of options in ImpliedByAnyOf To be able to refer to constant keypaths (e.g. `defvar cplusplus = LangOpts<"CPlusPlus">`) inside `ImpliedByAnyOf`, let's accept strings instead of `Option` instances. This somewhat weakens the guarantees that we're referring to an existing (option) record, but we can still use the option.KeyPath syntax to simulate this. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D95344	2021-01-26 09:30:36 +01:00
Hsiangkai Wang	816e1ded77	[RISCV] Implement vlsegff intrinsics. Differential Revision: https://reviews.llvm.org/D95303	2021-01-26 12:02:43 +08:00
Kazu Hirata	803b1cc469	[AMDGPU] Forward-declare MachineIRBuilder (NFC) AMDGPULegalizerInfo.h needs MachineIRBuilder but relies on a forward declaration of MachineIRBuilder in LegalizerInfo.h. This patch adds a forward declaration right in AMDGPULegalizerInfo.h. While we are at it, this patch removes the one in LegalizerInfo.h, where it is unnecessary.	2021-01-25 19:24:01 -08:00
Kazu Hirata	23ef7e2902	[StackSafety] Use ListSeparator (NFC)	2021-01-25 19:23:59 -08:00
Amara Emerson	20267c085f	[GlobalISel][Localizer] Don't localize phi operands which are used more than once in the phi. The current algorithm just tries to localize defs as far as they can go, and in the case of G_PHI operands, it clones the def into the predecessor block for each incoming edge. When multiple edges have the same register value, this can cause unnecessary code bloat, and inhibit later optimizations. This change checks if a given phi operand is unique in the phi, if not the def of that register is not localized to the predecessor. Differential Revision: https://reviews.llvm.org/D95406	2021-01-25 17:48:04 -08:00
Mitch Phillips	587eafbc21	Revert "Revert "[GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method"" This reverts commit 554b3211fefd09b56b64357b9edd66c78ae200b5. Differential Revision: https://reviews.llvm.org/D95035	2021-01-25 16:22:22 -08:00
modimo	46d7a90a64	[InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks This change leverages the work done in D83743 to replay in the SampleProfile inliner to also be used in the CGSCC inliner. NOTE: currently restricted to non-ML advisors only. The added switch `-cgscc-inline-replay=<remarks file>` will replay the inlining decisions in that file where the remarks file is generated via `-Rpass=inline`. The aim here is to make it easier to analyze changes that would modify inlining heuristics to be separated from this behavior. Doing so allows easier examination of assembly and runtime behavior compared to the baseline rather than trying to dig through the large churn caused by inlining. In LTO compilation, since inlining is done twice you can separately specify replay by passing the flag to the FE (`-cgscc-inline-replay=`) and to the linker (`-Wl,cgscc-inline-replay=`) with the remarks generated from their respective places. Testing on mysqld by comparing the inline decisions between base (generates remarks.txt) and diff (replay using identical input/tools with remarks.txt) and examining the inlining sites with `diff` shows 14,000 mismatches out of 247,341 for a ~94% replay accuracy. I believe this gap can be narrowed further though for the general case we may never achieve full accuracy. For my personal use, this is close enough to be representative: I set the baseline as the one generated by the replay on identical input/toolset and compare that to my modified input/toolset using the same replay. Testing: ninja check-llvm newly added test correctly replays CGSCC inlining decisions Reviewed By: mtrofin, wenlei Differential Revision: https://reviews.llvm.org/D94334	2021-01-25 15:38:57 -08:00
Duncan P. N. Exon Smith	d8f7c22241	Support: Remove duplicated code in {File,clang::ModulesDependency}Collector, NFC Refactor the duplicated canonicalize-path logic in `FileCollector` and `ModulesDependencyCollector` into a new utility called `PathCanonicalizer` that's shared. This popped up when tracking down a bug common to both in https://reviews.llvm.org/D95202. As drive-bys, update a few names and comments to better reflect the effect of the code, delay removal of `..`s to avoid an unnecessary extra string copy, and leave behind a couple of FIXMEs for future consideration. Differential Revision: https://reviews.llvm.org/D95279	2021-01-25 15:09:00 -08:00
Richard Smith	b9cfb936a1	Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV" This reverts commit 53176c168061d6f26dcf3ce4fa59288b7d67255e, which introduceed a layering violation. LLVM's IR library can't include headers from Analysis.	2021-01-25 13:53:38 -08:00
Jonas Devlieghere	ba9adaa9dd	[YAML I/O] Fix bug in emission of empty sequence Don't emit an output dash for an empty sequence. Take emitting a vector of strings for example: std::vector<std::string> Strings = {"foo", "bar"}; LLVM_YAML_IS_SEQUENCE_VECTOR(std::string) yout << Strings; This emits the following YAML document. --- - foo - bar ... When the vector is empty, this generates the following result: --- - [] ... Although this is valid YAML, it does not match what we meant to emit. The result is a one-element sequence consisting of an empty list. Indeed, if we were to try to read this again we get an error: YAML:2:4: error: not a mapping - [] The problem is the output dash before the empty list. The correct output would be: --- [] ... This patch fixes that by not emitting the output dash for an empty sequence. Differential revision: https://reviews.llvm.org/D95280	2021-01-25 13:35:36 -08:00
Akira Hatanaka	96195f7442	[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end annotates calls with attribute "clang.arc.rv"="retain" or "clang.arc.rv"="claim", which indicates the call is implicitly followed by a marker instruction and a retainRV/claimRV call that consumes the call result. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the annotated calls in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the annotated calls. It doesn't remove the attribute on the call since the backend needs it to emit the marker instruction. The retainRV/claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes the autoreleaseRV call in the callee that returns the result if nothing in the callee prevents it from being paired up with the calls annotated with "clang.arc.rv"="retain/claim" in the caller. If the call is annotated with "claim", a release call is inserted since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the attributes to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV call returning the callee result, which makes it impossible to pair it up with the retainRV or claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the call is annotated with "retain" and does nothing if it's annotated with "claim". - This patch teaches dead argument elimination pass not to change the return type of a function if any of the calls to the function are annotated with attribute "clang.arc.rv". This is necessary since the pass can incorrectly determine nothing in the IR uses the function return, which can happen since the front-end no longer explicitly emits retainRV/claimRV calls in the IR, and change its return type to 'void'. Future work: - Use the attribute on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the attributes. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-01-25 11:57:08 -08:00
Sander de Smalen	110f0110d8	[InstructionCost] Prevent InstructionCost being created with CostState. For a function that returns InstructionCost, it is very tempting to write: return InstructionCost::Invalid; But that actually returns InstructionCost(1 /* int value of Invalid */)) which has a totally different meaning. By marking this constructor as `delete`, this can no longer happen.	2021-01-25 11:26:56 +00:00
Georgii Rymar	fc47fdd498	[yaml2obj, obj2yaml] - Implement section header table as a special Chunk. This was discussed in D93678 thread. Currently we have one special chunk - Fill. This patch re implements the "SectionHeaderTable" key to become a special chunk too. With that we are able to place the section header table at any location, just like we place sections. Differential revision: https://reviews.llvm.org/D95140	2021-01-25 13:08:08 +03:00
Fangrui Song	7cc3572b85	[XRay] Support DW_TAG_call_site and delete unneeded PATCHABLE_EVENT_CALL/PATCHABLE_TYPED_EVENT_CALL lowering	2021-01-25 00:49:18 -08:00
QingShan Zhang	d5b70bbb38	[NFC] [DAGCombine] Correct the result for sqrt even the iteration is zero For now, we correct the result for sqrt if iteration > 0. This doesn't make sense as they are not strict relative. Reviewed By: dmgreen, spatel, RKSimon Differential Revision: https://reviews.llvm.org/D94480	2021-01-25 04:02:44 +00:00
Chen Zheng	db58448497	[PowerPC] support register pressure reduction in machine combiner. Reassociating some patterns to generate more fma instructions to reduce register pressure. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D92071	2021-01-24 21:28:21 -05:00
Carl Ritson	8b5995e559	[AMDGPU] Fix llvm.amdgcn.init.exec and frame materialization Frame-base materialization may insert vector instructions before EXEC is initialised. Fix this by moving lowering of llvm.amdgcn.init.exec later in backend. Also remove SI_INIT_EXEC_LO pseudo as this is not necessary. Reviewed By: ruiling Differential Revision: https://reviews.llvm.org/D94645	2021-01-25 08:31:17 +09:00
Kazu Hirata	0ec2908ce9	[llvm] Use pop_back_val (NFC)	2021-01-24 12:18:57 -08:00
Kazu Hirata	2fb5578408	[CodeGen] Forward-declare TargetMachine (NFC) InstrEmitter.h needs TargetMachine but relies on a forward declaration of TargetMachine in MachineOperand.h. This patch adds a forward declaration right in InstrEmitter.h. While we are at it, this patch removes the one in MachineOperand.h, where it is unnecessary.	2021-01-24 12:18:54 -08:00
Nikita Popov	710501602d	[Utils] Use NoAliasScopeDeclInst in a few more places (NFC) In the cloning infrastructure, only track an MDNode mapping, without explicitly storing the Metadata mapping, same as is done during inlining. This makes things slightly simpler.	2021-01-24 16:24:11 +01:00
Florian Hahn	91e3095774	[LTO] Move DisableVerify setting to LTOCodeGenerator class (NFC). To simplify the transition to using LTOBackend, move DisableVerify to the LTOCodeGenerator class, like most/all other options. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95223	2021-01-24 14:14:40 +00:00
Jeroen Dobbelaere	2c513aff8c	[LoopRotate] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed Similar to D92887, LoopRotation also needs duplicate the noalias scopes when rotating a `@llvm.experimental.noalias.scope.decl` across a block boundary. This is based on the version from the Full Restrict paches (D68511). The problem it fixes also showed up in Transforms/Coroutines/ex5.ll after D93040 (when enabling strict checking with -verify-noalias-scope-decl-dom). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94306	2021-01-24 13:53:13 +01:00
Jeroen Dobbelaere	76b232f00a	[LoopUnroll] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed This is a fix for https://bugs.llvm.org/show_bug.cgi?id=39282. Compared to D90104, this version is based on part of the full restrict patched (D68484) and uses the `@llvm.experimental.noalias.scope.decl` intrinsic to track the location where !noalias and !alias.scope scopes have been introduced. This allows us to only duplicate the scopes that are really needed. Notes: - it also includes changes and tests from D90104 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D92887	2021-01-24 13:48:20 +01:00
Michael Kruse	d945273b52	[OpenMPIRBuilder] Implement tileLoops. The tileLoops method implements the code generation part of the tile directive introduced in OpenMP 5.1. It takes a list of loops forming a loop nest, tiles it, and returns the CanonicalLoopInfo representing the generated loops. The implementation takes n CanonicalLoopInfos, n tile size Values and returns 2*n new CanonicalLoopInfos. The input CanonicalLoopInfos are invalidated and BBs not reused in the new loop nest removed from the function. In a modified version of D76342, I was able to correctly compile and execute a tiled loop nest. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D92974	2021-01-23 19:39:29 -06:00
Nikita Popov	15041a14bb	[IR] Add NoAliasScopeDeclInst (NFC) Add an intrinsic type class to represent the llvm.experimental.noalias.scope.decl intrinsic, to make code working with it a bit nicer by hiding the metadata extraction from view.	2021-01-23 22:40:32 +01:00
Florian Hahn	5b8c530938	[FuzzMutate] Add mutator to modify instruction flags. This patch adds a new InstModificationIRStrategy to mutate flags/options for instructions. For example, it may add or remove nuw/nsw flags from add, mul, sub, shl instructions or change the predicate for icmp instructions. Subtle changes such as those mentioned above should lead to a more interesting range of inputs. The presence or absence of overflow flags can expose subtle bugs, for example. Reviewed By: bogner Differential Revision: https://reviews.llvm.org/D94905	2021-01-23 19:05:20 +00:00
Kazu Hirata	f4934140cd	[llvm] Forward-declare ICFLoopSafetyInfo (NFC) LoopUtils.h needs ICFLoopSafetyInfo but relies on a forward declaration of ICFLoopSafetyInfo in IVDescriptors.h. This patch adds a forward declaration right in LoopUtils.h. While we are at it, this patch removes the one in IVDescriptors.h, where it is unnecessary.	2021-01-23 10:56:30 -08:00
Florian Hahn	283961f4c8	[Local] Treat calls that may not return as being alive. With the addition of the `willreturn` attribute, functions that may not return (e.g. due to an infinite loop) are well defined, if they are not marked as `willreturn`. This patch updates `wouldInstructionBeTriviallyDead` to not consider calls that may not return as dead. This patch still provides an escape hatch for intrinsics, which are still assumed as willreturn unconditionally. It will be removed once all intrinsics definitions have been reviewed and updated. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94106	2021-01-23 16:05:14 +00:00
Roman Lebedev	cbc12c9267	[SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers If i change it to AssertingVH instead, a number of existing tests fail, which means we don't consistently remove from the set when deleting blocks, which means newly-created blocks may happen to appear in that set if they happen to occupy the same memory chunk as did some block that was in the set originally. There are many places where we delete blocks, and while we could probably consistently delete from LoopHeaders when deleting a block in transforms located in SimplifyCFG.cpp itself, transforms located elsewhere (Local.cpp/BasicBlockUtils.cpp) also may delete blocks, and it doesn't seem good to teach them to deal with it. Since we at most only ever delete from LoopHeaders, let's just delegate to WeakVH to do that automatically. But to be honest, personally, i'm not sure that the idea behind LoopHeaders is sound.	2021-01-23 16:48:35 +03:00
Florian Hahn	438026988c	[LTO] Store target attributes as vector of strings (NFC). The target features are obtained as a list of features/attributes. Instead of storing them in a single string, store the vector. This matches lto::Config's behavior and simplifies the transition to lto::backend(). Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95224	2021-01-23 12:11:58 +00:00
Simon Pilgrim	982a3c3ce9	[Support] TrigramIndex::insert - pass std::String argument by const reference. NFCI. Avoid string copies and fix clang-tidy warning.	2021-01-23 11:04:00 +00:00
Kazu Hirata	67c2d74113	[llvm] Use static_assert instead of assert (NFC) Identified with misc-static-assert.	2021-01-22 23:25:05 -08:00
Kazu Hirata	0069e576c3	[llvm] Use isAlpha/isAlnum (NFC)	2021-01-22 23:25:03 -08:00
Hsiangkai Wang	02776ce1c4	[RISCV] Implement vsoxseg/vsuxseg intrinsics. Define vsoxseg/vsuxseg intrinsics and pseudo instructions. Lower vsoxseg/vsuxseg intrinsics to pseudo instructions in RISCVDAGToDAGISel. Differential Revision: https://reviews.llvm.org/D94940	2021-01-23 08:54:56 +08:00
Hsiangkai Wang	233ef2a3eb	[RISCV] Implement vloxseg/vluxseg intrinsics. Define vloxseg/vluxseg intrinsics and pseudo instructions. Lower vloxseg/vluxseg intrinsics to pseudo instructions in RISCVDAGToDAGISel. Differential Revision: https://reviews.llvm.org/D94903	2021-01-23 08:54:56 +08:00
Duncan P. N. Exon Smith	e675c8ca11	ADT: Use 'using' to inherit assign and append in SmallString Rather than reimplement, use a `using` declaration to bring in `SmallVectorImpl<char>`'s assign and append implementations in `SmallString`. The `SmallString` versions were missing reference invalidation assertions from `SmallVector`. This patch also fixes a bug in `llvm::FileCollector::addFileImpl`, which was a copy/paste from `clang::ModuleDependencyCollector::copyToRoot`, both caught by the no-longer-skipped assertions. As a drive-by, this also sinks the `const SmallVectorImpl&` versions of these methods down into `SmallVectorImpl`, since I imagine they'd be useful elsewhere. Differential Revision: https://reviews.llvm.org/D95202	2021-01-22 16:17:58 -08:00
Stanislav Mekhanoshin	dd3332e2e5	Change materializeFrameBaseRegister() to return register The only caller of this function is in the LocalStackSlotAllocation and it creates base register of class returned by the target's getPointerRegClass(). AMDGPU wants to use a different reg class here so let materializeFrameBaseRegister to just create and return whatever it wants. Differential Revision: https://reviews.llvm.org/D95268	2021-01-22 15:51:06 -08:00
Mitch Phillips	7a51025e46	Revert "[GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method" This reverts commit 2bb92bf451d7eb2c817f3e5403353e7c0c14d350. Dependent patch broke UBSan on Android: 3dedad475da45c05bc4f66cd14e9f44581edf0bc	2021-01-22 14:32:11 -08:00
Jonas Devlieghere	68cbad8a6e	[VFS] Fix inconsistencies between relative paths and fallthrough. This patch addresses inconsistencies in the way fallthrough is handled in the RedirectingFileSystem. Rather than trying to change the working directory of the external filesystem, the RedirectingFileSystem will canonicalize every path before handing it down. This guarantees that relative paths are resolved relative to the RedirectingFileSystem's working directory. This allows us to have a strictly virtual working directory, and still fallthrough for absolute paths, but not for relative paths that would get resolved incorrectly at the lower layer (for example, in case of the RealFileSystem, because the strictly virtual path does not exist). Differential revision: https://reviews.llvm.org/D95188	2021-01-22 14:15:48 -08:00
Cassie Jones	166f6f7864	[GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method The widenScalar implementation for signed and unsigned overflowing operations were very similar: both are checked by truncating the result and then re-sign/zero-extending it and checking that it matches the computed operation. Using a truncate + zero-extend for the unsigned case instead of manually producing the AND instruction like before leads to an extra copy instruction during legalization, but this should be harmless. Differential Revision: https://reviews.llvm.org/D95035	2021-01-22 14:08:46 -08:00
Shimin Cui	50ae94abae	[Analysis] Support AIX vec_malloc routines This is to support the memory routines vec_malloc, vec_calloc, vec_realloc, and vec_free. These routines manage memory that is 16-byte aligned. And they are only available on AIX. Differential Revision: https://reviews.llvm.org/D94710	2021-01-22 16:03:01 -05:00
Yaxun (Sam) Liu	2e16ddcdc8	[HIP] Support __managed__ attribute This patch implements codegen for __managed__ variable attribute for HIP. Diagnostics will be added later. Differential Revision: https://reviews.llvm.org/D94814	2021-01-22 11:43:58 -05:00
Roman Lebedev	1cff9b2a83	[NFC][InstCombine] Extract freelyInvertAllUsersOf() out of canonicalizeICmpPredicate() I'd like to use it in an upcoming fold.	2021-01-22 17:23:53 +03:00
Nikita Popov	8c54b2d6f0	[IR] Optimize adding attribute to AttributeList (NFC) When adding an enum attribute to an AttributeList, avoid going through an AttrBuilder and instead directly add the attribute to the correct set. Going through AttrBuilder is expensive, because it requires all string attributes to be reconstructed. This can be further improved by inserting the attribute at the right position and using the AttributeSetNode::getSorted() API. This recovers the small compile-time regression from D94633.	2021-01-22 11:30:21 +01:00
Jay Foad	eee5207f2a	[LegacyPM] Update InversedLastUser on the fly. NFC. This speeds up setLastUser enough to give a 5% to 10% speed up on trivial invocations of opt and llc, as measured by: perf stat -r 100 opt -S -o /dev/null -O3 /dev/null perf stat -r 100 llc -march=amdgcn /dev/null -filetype null Don't dump last use information unless -debug-pass=Details to avoid printing lots of spam that will break some existing lit tests. Before this patch, dumping last use information was broken anyway, because it used InversedLastUser before it had been populated. Differential Revision: https://reviews.llvm.org/D92309	2021-01-22 09:48:54 +00:00
Sven van Haastregt	22391a6c81	[APSInt][NFC] Clean up doxygen comments Add a Doxygen class comment and clean up other Doxygen comments in this file while we're at it.	2021-01-22 09:23:41 +00:00
Nathan Lanza	c9d2b08298	NFC: Remove simple_ilist comment mentioning ilist/iplist allocating Allocation was removed from ilist in 2016 in the git commit b5da00533510. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D93953	2021-01-22 03:24:54 -05:00
Arthur Eubanks	d815d3c37a	[AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook Having a custom inliner doesn't really fit in with the new PM's pipeline. It's also extra technical debt. amdgpu-inline only does a couple of custom things compared to the normal inliner: 1) It disables inlining if the number of BBs in a function would exceed some limit 2) It increases the threshold if there are pointers to private arrays(?) These can all be handled as TTI inliner hooks. There already exists a hook for backends to multiply the inlining threshold. This way we can remove the custom amdgpu-inline pass. This caused inline-hint.ll to fail, and after some investigation, it looks like getInliningThresholdMultiplier() was previously getting applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it not applying at all, so some later inliner change must have fixed something), so I had to change the threshold in the test. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D94153	2021-01-21 20:29:17 -08:00
Jacques Pienaar	d6b74b6a39	[mlir] Enable passing crash reproducer stream factory method Add factory to create streams for logging the reproducer. Allows for more general logging (beyond file) and logging the configuration/module separately (logged in order, configuration before module). Also enable querying filename of ToolOutputFile. Differential Revision: https://reviews.llvm.org/D94868	2021-01-21 20:03:15 -08:00
Kazu Hirata	69d44af40a	[llvm] Use isDigit (NFC)	2021-01-21 19:59:50 -08:00
ShihPo Hung	a54a750343	[RISCV] Add intrinsics for RVV1.0 VFRSQRTE7 & VFRECE7 Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D95113	2021-01-21 18:38:49 -08:00
ShihPo Hung	1d80f86c9c	[RISCV] Add intrinsics for vector unordered indexed load in RVV 1.0 Add unordered indexed load: vluxei Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D95028	2021-01-21 18:38:49 -08:00
ShihPo Hung	8009ef0a5f	[RISCV] Add intrinsics for RVV 1.0 vrgatherei16 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D95014	2021-01-21 18:38:49 -08:00
Craig Topper	2134974a6f	[RISCV] Add a VL output to vleff intrinsics. The fault-only-first-load instructions can reduce VL if an element other than element 0 triggers a memory fault. This can be used to vectorize loops with data dependent exit conditions like strcmp or strlen. This patch adds a VL output to these intrinsics so that the new VL value can be captured by software. This will be expanded to 'csrr gpr, vl' after the vleff instruction during SelectionDAG. By doing this with one intrinsic we are able to guarantee that the csrr reads the VL value produced by the vleff instruction. Having it as a separate intrinsic would make it impossible to guarantee ordering without making every other vector intrinsic have side effects. The intrinsics are expanded during lowering into two ISD nodes that are glued together. These ISD nodes will go through isel separately, but should maintain the glue so that they get emitted adjacently by InstrEmitter. I've only ran the chain through the vleff instruction, allowing the READ_VL to be deleted if it is unused. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D94286	2021-01-21 17:19:58 -08:00
Chen Zheng	b37b3473aa	[NFC] [TargetRegisterInfo] add another API to get srcreg through copy. Reviewed By: nemanjai, jsji Differential Revision: https://reviews.llvm.org/D92069	2021-01-21 20:10:25 -05:00
Hsiangkai Wang	9cce78baf4	[RISCV] New vector load/store in V extension v1.0 Upgrade RISC-V V extension to v1.0-08a0b46. Indexed load/store have ordered and unordered form. New whole vector load/store. Differential Revision: https://reviews.llvm.org/D93614	2021-01-22 07:30:09 +08:00
David Green	b177d35a9a	[LV][ARM] Inloop reduction cost modelling This adds cost modelling for the inloop vectorization added in 745bf6cf4471. Up until now they have been modelled as the original underlying instruction, usually an add. This happens to works OK for MVE with instructions that are reducing into the same type as they are working on. But MVE's instructions can perform the equivalent of an extended MLA as a single instruction: %sa = sext <16 x i8> A to <16 x i32> %sb = sext <16 x i8> B to <16 x i32> %m = mul <16 x i32> %sa, %sb %r = vecreduce.add(%m) -> R = VMLADAV A, B There are other instructions for performing add reductions of v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64 (VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV). The i64 are particularly interesting as there are no native i64 add/mul instructions, leading to the i64 add and mul naturally getting very high costs. Also worth mentioning, under NEON there is the concept of a sdot/udot instruction which performs a partial reduction from a v16i8 to a v4i32. They extend and mul/sum the first four elements from the inputs into the first element of the output, repeating for each of the four output lanes. They could possibly be represented in the same way as above in llvm, so long as a vecreduce.add could perform a partial reduction. The vectorizer would then produce a combination of in and outer loop reductions to efficiently use the sdot and udot instructions. Although this patch does not do that yet, it does suggest that separating the input reduction type from the produced result type is a useful concept to model. It also shows that a MLA reduction as a single instruction is fairly common. This patch attempt to improve the costmodelling of in-loop reductions by: - Adding some pattern matching in the loop vectorizer cost model to match extended reduction patterns that are optionally extended and/or MLA patterns. This marks the cost of the reduction instruction correctly and the sext/zext/mul leading up to it as free, which is otherwise difficult to tell and may get a very high cost. (In the long run this can hopefully be replaced by vplan producing a single node and costing it correctly, but that is not yet something that vplan can do). - getExtendedAddReductionCost is added to query the cost of these extended reduction patterns. - Expanded the ARM costs to account for these expanded sizes, which is a fairly simple change in itself. - Some minor alterations to allow inloop reduction larger than the highest vector width and i64 MVE reductions. - An extra InLoopReductionImmediateChains map was added to the vectorizer for it to efficiently detect which instructions are reductions in the cost model. - The tests have some updates to show what I believe is optimal vectorization and where we are now. Put together this can greatly improve performance for reduction loop under MVE. Differential Revision: https://reviews.llvm.org/D93476	2021-01-21 21:03:41 +00:00
Duncan P. N. Exon Smith	66bedcb549	ADT: Fix reference invalidation in SmallVector::emplace_back and assign(N,V) This fixes the final (I think?) reference invalidation in `SmallVector` that we need to fix to align with `std::vector`. (There is still some left in the range insert / append / assign, but the standard calls that UB for `std::vector` so I think we don't care?) For POD-like types, reimplement `emplace_back()` in terms of `push_back()`, taking a copy even for large `T` rather than lose the realloc optimization in `grow_pod()`. For other types, split the grow operation in three and construct the new element in the middle. - `mallocForGrow()` calculates the new capacity and returns the result of `safe_malloc()`. We only need a single definition per `SmallVectorBase` so this is defined in SmallVector.cpp to avoid code size bloat. Moving this part of non-POD grow to the source file also allows the logic to be easily shared with `grow_pod`, and `report_size_overflow()` and `report_at_maximum_capacity()` can move there too. - `moveElementsForGrow()` moves elements from the old to the new allocation. - `takeAllocationForGrow()` frees the old allocation and saves the new allocation and capacity . `SmallVector:assign(size_type, const T&)` also uses the split-grow operations for non-POD, but it also has a semantic change when not growing. Previously, assign would start with `clear()`, and so the old elements were destructed and all elements of the new vector were copy-constructed (potentially invalidating references). The new implementation skips destruction and uses copy-assignment for the prefix of the new vector that fits. The new semantics match what libc++ does for `std::vector::assign()`. Note that the following is another possible implementation: ``` void assign(size_type NumElts, ValueParamT Elt) { std::fill_n(this->begin(), std::min(NumElts, this->size()), Elt); this->resize(NumElts, Elt); } ``` The downside of this simpler implementation is that if the vector has to grow there will be `size()` redundant copy operations. (I had planned on splitting this patch up into three for committing (after getting performance numbers / initial review), but I've realized that if this does for some reason need to be reverted we'll probably want to revert the whole package...) Differential Revision: https://reviews.llvm.org/D94739	2021-01-21 12:11:41 -08:00
Matt Arsenault	a964cd0596	AArch64/GlobalISel: Factor out parametersInCSRMatch Make this look more like the DAG handling and move to common code. I also noticed AArch64 seems to not be properly adding the physreg:virtreg mapping to the function live ins.	2021-01-21 10:32:48 -05:00
Joseph Huber	627ae5f2d2	[OpenMP] Add support for mapping names in mapper API Summary: The custom mapper API did not previously support the mapping names added previously. This means they were not present if a user requested debugging information while using the mapper functions. This adds basic support for passing the mapped names to the runtime library. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D94806	2021-01-21 09:26:44 -05:00
Luo, Yuanke	d25938d612	Revert "[X86][AMX] Fix tile config register spill issue." This reverts commit 20013d02f3352a88d0838eed349abc9a2b0e9cc0.	2021-01-21 18:11:43 +08:00
Fangrui Song	260c3e1d18	MCDwarf: Delete uneeded parameter And change signature	2021-01-21 00:55:07 -08:00
Luo, Yuanke	90681ff5c1	[X86][AMX] Fix tile config register spill issue. Previous code build the model that tile config register is the user of each AMX instruction. There is a problem for the tile config register spill. When across function, the ldtilecfg instruction may be inserted on each AMX instruction which use tile config register. This cause all tile data register clobber. To fix this issue, we remove the model of tile config register. We analyze the regmask of call instruction and insert ldtilecfg if there is any tile data register live across the call. Inserting the sttilecfg before the call is unneccessary, because the tile config doesn't change and we can just reload the config. Besides we also need check tile config register interference. Since we don't model the config register we should check interference from the ldtilecfg to each tile data register def. ldtilecfg / \ BB1 BB2 / \ call BB3 / \ %1=tileload %2=tilezero We can start from the instruction of each tile def, and backward to ldtilecfg. If there is any call instruction, and tile data register is not preserved, we should insert ldtilecfg after the call instruction. Differential Revision: https://reviews.llvm.org/D94155	2021-01-21 16:01:50 +08:00
Georgii Rymar	fc516b0eae	[yaml2obj/obj2yaml] - Improve dumping/creating of ELF versioning sections. This makes the following improvements. For `SHT_GNU_versym`: * yaml2obj: set `sh_link` to index of `.dynsym` section automatically. For `SHT_GNU_verdef`: * yaml2obj: set `sh_link` to index of `.dynstr` section automatically. * yaml2obj: set `sh_info` field automatically. * obj2yaml: don't dump the `Info` field when its value matches the number of version definitions. For `SHT_GNU_verneed`: * yaml2obj: set `sh_link` to index of `.dynstr` section automatically. * yaml2obj: set `sh_info` field automatically. * obj2yaml: don't dump the `Info` field when its value matches the number of version dependencies. Also, simplifies few test cases. Differential revision: https://reviews.llvm.org/D94956	2021-01-21 10:36:48 +03:00
Kazu Hirata	cd6de67c94	[llvm] Use hasSingleElement (NFC)	2021-01-20 21:35:55 -08:00
Hsiangkai Wang	ac0a4f4e3c	[RISCV] Implement vssseg intrinsics. Define vlsseg intrinsics and pseudo instructions. Lower vlsseg intrinsics to pseudo instructions in RISCVDAGToDAGISel. Differential Revision: https://reviews.llvm.org/D94863	2021-01-21 11:51:35 +08:00
Hsiangkai Wang	0051771904	[RISCV] Implement vlsseg intrinsics. Define vlsseg intrinsics and pseudo instructions. Lower vlsseg intrinsics to pseudo instructions in RISCVDAGToDAGISel. Differential Revision: https://reviews.llvm.org/D94763	2021-01-21 11:51:35 +08:00
Hsiangkai Wang	00fc860b80	[RISCV] Implement vsseg intrinsics. Define vsseg intrinsics and pseudo instructions. Lower vsseg intrinsics to pseudo instructions in RISCVDAGToDAGISel. Differential Revision: https://reviews.llvm.org/D94688	2021-01-21 11:51:35 +08:00
Varun Gandhi	d38b7dd07c	[NFC] Minor cleanup for ValueHandle code. Based on feedback in https://reviews.llvm.org/D93433. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D94238	2021-01-20 16:27:55 -08:00
Mircea Trofin	62a8a8cc0d	Reland "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit d97f776be5f8cd3cd446fe73827cd355f6bab4e1. The original problem was due to build failures in shared lib builds. D95079 moved ImportedFunctionsInliningStatistics under Analysis, unblocking this.	2021-01-20 13:33:43 -08:00
Mircea Trofin	88d4cb48b4	[NFC] Move ImportedFunctionsInliningStatistics to Analysis This is related to D94982. We want to call these APIs from the Analysis component, so we can't leave them under Transforms. Differential Revision: https://reviews.llvm.org/D95079	2021-01-20 13:18:03 -08:00
Reid Kleckner	ea4ad8388b	Reland "[PDB] Defer relocating .debug$S until commit time and parallelize it" This reverts commit 5b7aef6eb4b2930971029b984cb2360f7682e5a5 and relands 6529d7c5a45b1b9588e512013b02f891d71bc134. The ASan error was debugged and determined to be the fault of an invalid object file input in our test suite, which was fixed by my last change. LLD's project policy is that it assumes input objects are valid, so I have added a comment about this assumption to the relocation bounds check.	2021-01-20 11:53:43 -08:00
Thomas Lively	652dd89674	[WebAssembly] Prototype new f64x2 conversions As proposed in https://github.com/WebAssembly/simd/pull/383. Differential Revision: https://reviews.llvm.org/D95012	2021-01-20 11:28:06 -08:00
Jez Ng	248ae4d5bc	[lld-macho] Run ObjCContractPass during LTO Run the ObjCARCContractPass during LTO. The legacy LTO backend (under LTO/ThinLTOCodeGenerator.cpp) already does this; this diff just adds that behavior to the new LTO backend. Without that pass, the objc.clang.arc.use intrinsic will get passed to the instruction selector, which doesn't know how to handle it. In order to test both the new and old pass managers, I've also added support for the `--[no-]lto-legacy-pass-manager` flags. P.S. Not sure if the ordering of the pass within the pipeline matters... Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94547	2021-01-20 14:21:32 -05:00
Mircea Trofin	21cfd88ff2	Revert "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit e8aec763a57e211420dfceb2a8dc6b88574924f3.	2021-01-20 11:19:34 -08:00
Mircea Trofin	615713638e	[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor When using 2 InlinePass instances in the same CGSCC - one for other mandatory inlinings, the other for the heuristic-driven ones - the order in which the ImportedFunctionStats would be output-ed would depend on the destruction order of the inline passes, which is not deterministic. This patch moves the ImportedFunctionStats responsibility to the InlineAdvisor to address this problem. Differential Revision: https://reviews.llvm.org/D94982	2021-01-20 11:07:36 -08:00
Paul C. Anagnostopoulos	96e4bd87bc	Revert "[TableGen] Improve algorithm for inheriting class template args and fields" This reverts commit c056f824340ff0189f3ef7870b83e3730de401d1. That commit causes build failures.	2021-01-20 09:47:13 -05:00
Paul C. Anagnostopoulos	2ae5745b4b	[TableGen] Improve algorithm for inheriting class template args and fields Differential Revision: https://reviews.llvm.org/D94822	2021-01-20 09:31:43 -05:00
Amanieu d'Antras	bea54db865	[AArch64] Add support for the GNU ILP32 ABI Add the aarch64[_be]-*-gnu_ilp32 targets to support the GNU ILP32 ABI for AArch64. The needed codegen changes were mostly already implemented in D61259, which added support for the watchOS ILP32 ABI. The main changes are: - Wiring up the new target to enable ILP32 codegen and MC. - ILP32 va_list support. - ILP32 TLSDESC relocation support. There was existing MC support for ELF ILP32 relocations from D25159 which could be enabled by passing "-target-abi ilp32" to llvm-mc. This was changed to check for "gnu_ilp32" in the target triple instead. This shouldn't cause any issues since the existing support was slightly broken: it was generating ELF64 objects instead of the ELF32 object files expected by the GNU ILP32 toolchain. This target has been tested by running the full rustc testsuite on a big-endian ILP32 system based on the GCC ILP32 toolchain. Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D94143	2021-01-20 13:34:47 +00:00
Bjorn Pettersson	be875842f9	[PM] Avoid duplicates in the Used/Preserved/Required sets The pass analysis uses "sets" implemented using a SmallVector type to keep track of Used, Preserved, Required and RequiredTransitive passes. When having nested analyses we could end up with duplicates in those sets, as there was no checks to see if a pass already existed in the "set" before pushing to the vectors. This idea with this patch is to avoid such duplicates by avoiding pushing elements that already is contained when adding elements to those sets. To align with the above PMDataManager::collectRequiredAndUsedAnalyses is changed to skip adding both the Required and RequiredTransitive passes to its result vectors (since RequiredTransitive always is a subset of Required we ended up with duplicates when traversing both sets). Main goal with this is to avoid spending time verifying the same analysis mulitple times in PMDataManager::verifyPreservedAnalysis when iterating over the Preserved "set". It is assumed that removing duplicates from a "set" shouldn't have any other negative impact (I have not seen any problems so far). If this ends up causing problems one could do some uniqueness filtering of the vector being traversed in verifyPreservedAnalysis instead. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D94416	2021-01-20 13:55:18 +01:00
Mirko Brkusanin	a421260042	[AMDGPU][GlobalISel] Avoid selecting S_PACK with constants If constants are hidden behind G_ANYEXT we can treat them same way as G_SEXT. For that purpose we extend getConstantVRegValWithLookThrough with option to handle G_ANYEXT same way as G_SEXT. Differential Revision: https://reviews.llvm.org/D92219	2021-01-20 11:54:53 +01:00
Gabriel Hjort Åkerlund	0fd4b778e9	[GlobalISel] Add missing operand update when copy is required When constraining an operand register using constrainOperandRegClass(), the function may emit a COPY in case the provided register class does not match the current operand register class. However, the operand itself is not updated to make use of the COPY, thereby resulting in incorrect code. This patch fixes that bug by updating the machine operand accordingly. Reviewed By: dsanders Differential Revision: https://reviews.llvm.org/D91244	2021-01-20 10:32:52 +01:00
David Sherwood	0c549cead8	[NFC][InstructionCost] Use InstructionCost in lib/Transforms/IPO/IROutliner.cpp In places where we call a TTI.getXXCost() function I have changed the code to use InstructionCost instead of unsigned. This is in preparation for later on when we will change the TTI interfaces to return InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94427	2021-01-20 08:33:59 +00:00
Hsiangkai Wang	73f4381c97	[RISCV] Implement vlseg intrinsics. For Zvlsseg, we need continuous vector registers for the values. We need to define new register classes for the different combinations of (number of fields and LMUL). For example, when the number of fields(NF) = 3, LMUL = 2, the values will be assigned to (V0M2, V2M2, V4M2), (V2M2, V4M2, V6M2), (V4M2, V6M2, V8M2), ... We define the vlseg intrinsics with multiple outputs. There is no way to describe the codegen patterns with multiple outputs in the tablegen files. We do the codegen in RISCVISelDAGToDAG and use EXTRACT_SUBREG to extract the values of output. The multiple scalable vector values will be put into a struct. This patch is depended on the support for scalable vector struct. Differential Revision: https://reviews.llvm.org/D94229	2021-01-20 14:26:04 +08:00
Juneyoung Lee	c70101b156	Allow nonnull/align attribute to accept poison Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison. To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this makes the semantics of `nonnull` to accept poison, and return poison if the input pointer isn't null. This makes many transformations like below legal: ``` %p = gep inbounds %x, 1 ; % p is non-null pointer or poison call void @f(%p) ; instcombine converts this to call void @f(nonnull %p) ``` Instead, this semantics makes propagation of `nonnull` to caller illegal. The reason is that, passing poison to `nonnull` does not immediately raise UB anymore, so such program is still well defined, if the callee does not use the argument. Having `noundef` attribute there re-allows this. ``` define void @f(i8* %p) { ; functionattr cannot mark %p nonnull here anymore call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p. ret void } ``` Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90529	2021-01-20 11:31:23 +09:00
Wei Mi	6e11fd80e6	Fix Wmissing-field-initializers warnings.	2021-01-19 15:26:52 -08:00
Wei Mi	c2ff32c8b0	[SampleFDO] Add the support to split the function profiles with context into separate sections. For ThinLTO, all the function profiles without context has been annotated to outline functions if possible in prelink phase. In postlink phase, profile annotation in postlink phase is only meaningful for function profile with context. If the profile is large, it is better to split the profile into two parts, one with context and one without, so the profile reading in postlink phase only has to read the part with context. To have the profile splitting, we extend the ExtBinary format to support different section arrangement. It will be flexible to add other section layout in the future without the need to create new class inheriting from ExtBinary class. Differential Revision: https://reviews.llvm.org/D94435	2021-01-19 15:16:19 -08:00
Mitch Phillips	c784193edd	Revert "[PDB] Defer relocating .debug$S until commit time and parallelize it" This reverts commit 6529d7c5a45b1b9588e512013b02f891d71bc134. Reason: Broke the ASan buildbots. http://lab.llvm.org:8011/#/builders/99/builds/1567	2021-01-19 11:45:48 -08:00
Craig Topper	14aa520218	[RISCV] Add DAG combine to turn (setcc X, 1, setne) -> (setcc X, 0, seteq) if we can prove X is 0/1. If we are able to compare with 0 instead of 1, we might be able to fold the setcc into a beqz/bnez. Often these setccs start life as an xor that gets converted to a setcc by DAG combiner's rebuildSetcc. I looked into a detecting (xor X, 1) and converting to (seteq X, 0) based on boolean contents being 0/1 in rebuildSetcc instead of using computeKnownBits. It was very perturbing to AMDGPU tests which I didn't look closely at. It had a few changes on a couple other targets, but didn't seem to be much if any improvement. Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D94730	2021-01-19 11:21:48 -08:00
Jeroen Dobbelaere	116cd71f2c	[noalias.decl] Look through llvm.experimental.noalias.scope.decl Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93042	2021-01-19 20:09:42 +01:00
Jessica Paquette	c4d2d8a4de	[GlobalISel] Combine (a[0]) \| (a[1] << k1) \| ...\| (a[m] << kn) into a wide load This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`. (See D27861) This tries to recognize patterns like below (assuming a little-endian target): ``` s8* x = ... s32 val = a[0] \| (a[1] << 8) \| (a[2] << 16) \| (a[3] << 24) -> s32 val = ((i32)a) s8 x = ... s32 val = a[3] \| (a[2] << 8) \| (a[1] << 16) \| (a[0] << 24) -> s32 val = BSWAP(*((s32)a)) ``` (This patch also handles the big-endian target case as well, in which the first example above has a BSWAP, and the second example above does not.) To recognize the pattern, this searches from the last G_OR in the expression tree. E.g. ``` Reg Reg \ / OR_1 Reg \ / OR_2 \ Reg .. / Root ``` Each non-OR register in the tree is put in a list. Each register in the list is then checked to see if it's an appropriate load + shift logic. If every register is a load + potentially a shift, the combine checks if those loads + shifts, when OR'd together, are equivalent to a wide load (possibly with a BSWAP.) To simplify things, this patch (1) Only handles G_ZEXTLOADs (which appear to be the common case) (2) Only works in a single MachineBasicBlock (3) Only handles G_SHL as the bit twiddling to stick the small load into a specific location An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from test/CodeGen/AArch64/load-combine.ll) At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3, and a 0.4% improvement for CTMark/7zip-benchmark. Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was the first instruction in the block. Differential Revision: https://reviews.llvm.org/D94350	2021-01-19 10:24:27 -08:00
Valentin Clement	715d713b9e	[flang][directive] Get rid of flangClassValue in TableGen The TableGen emitter for directives has two slots for flangClass information and this was mainly to be able to keep up with the legacy openmp parser at the time. Now that all clauses are encapsulated in AccClause or OmpClause, these two strings are not necessary anymore and were the the source of couple of problem while working with the generic structure checker for OpenMP. This patch remove the flangClassValue string from DirectiveBase.td and use the string flangClass as the placeholder for the encapsulated class. Reviewed By: sameeranjoshi Differential Revision: https://reviews.llvm.org/D94821	2021-01-19 10:28:46 -05:00
Tim Northover	ed1f4159c7	AArch64: add apple-a14 as a CPU This CPU supports all v8.5a features except BTI, and so identifies as v8.5a to Clang. A bit weird, but the best way for things like xnu to detect the new features it cares about.	2021-01-19 14:04:53 +00:00
Med Ismail Bennani	3496691e63	[llvm/Orc] Fix ExecutionEngine module build breakage This patch updates the llvm module map to reflect changes made in `24672ddea3c97fd1eca3e905b23c0116d7759ab8` and fixes the module builds (`-DLLVM_ENABLE_MODULES=On`). Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>	2021-01-19 14:39:06 +01:00
Caroline Concatto	90b2e7be56	[AArch64][SVE]Add cost model for vector reduce for scalable vector This patch computes the cost for vector.reduce<operand> for scalable vectors. The cost is split into two parts: the legalization cost and the horizontal reduction. Differential Revision: https://reviews.llvm.org/D93639	2021-01-19 11:54:16 +00:00
Florian Hahn	d596025713	[LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands. D84108 exposed a bad interaction between inlining and loop-rotation during regular LTO, which is causing notable regressions in at least CINT2006/473.astar. The problem boils down to: we now rotate a loop just before the vectorizer which requires duplicating a function call in the preheader when compiling the individual files ('prepare for LTO'). But this then prevents further inlining of the function during LTO. This patch tries to resolve this issue by making LoopRotate more conservative with respect to rotating loops that have inline-able calls during the 'prepare for LTO' stage. I think this change intuitively improves the current situation in general. Loop-rotate tries hard to avoid creating headers that are 'too big'. At the moment, it assumes all inlining already happened and the cost of duplicating a call is equal to just doing the call. But with LTO, inlining also happens during full LTO and it is possible that a previously duplicated call is actually a huge function which gets inlined during LTO. From the perspective of LV, not much should change overall. Most loops calling user-provided functions won't get vectorized to start with (unless we can infer that the function does not touch memory, has no other side effects). If we do not inline the 'inline-able' call during the LTO stage, we merely delayed loop-rotation & vectorization. If we inline during LTO, chances should be very high that the inlined code is itself vectorizable or the user call was not vectorizable to start with. There could of course be scenarios where we inline a sufficiently large function with code not profitable to vectorize, which would have be vectorized earlier (by scalarzing the call). But even in that case, there probably is no big performance impact, because it should be mostly down to the cost-model to reject vectorization in that case. And then the version with scalarized calls should also not be beneficial. In a way, LV should have strictly more information after inlining and make more accurate decisions (barring cost-model issues). There is of course plenty of room for things to go wrong unexpectedly, so we need to keep a close look at actual performance and address any follow-up issues. I took a look at the impact on statistics for MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer loops rotated, but no change to the number of loops vectorized. Reviewed By: sanwou01 Differential Revision: https://reviews.llvm.org/D94232	2021-01-19 10:15:29 +00:00
Lang Hames	3c8ce89cc8	[ORC] Move LookupRequest from OrcShared to Orc. It depends on Orc types (SymbolLookupSet), so can't be part of OrcShared.	2021-01-19 20:23:47 +11:00
Andy Wingo	b4f7ebdf5e	[WebAssembly] Change prefix on data segment flags to WASM_DATA_SEGMENT Element sections will also need flags, so we shouldn't squat the WASM_SEGMENT namespace. Depends on D90948. Differential Revision: https://reviews.llvm.org/D92315	2021-01-19 09:40:42 +01:00
ShihPo Hung	2b1fcde239	[RISCV] Add intrinsics for vector AMO operations Add vamoswap, vamoadd, vamoxor, vamoand, vamoor, vamomin, vamomax, vamominu, vamomaxu intrinsics. Reviewed By: craig.topper, khchen Differential Revision: https://reviews.llvm.org/D94589	2021-01-18 23:11:10 -08:00
Lang Hames	af4971c5fa	[ORC] Move OrcError.h to include/llvm/ExecutionEngine/Orc/Shared. OrcShared is the correct home for this header since Orc was split in 1d0676b54c4. (It should have been moved in that commit, but was overlooked).	2021-01-19 16:18:00 +11:00
Chen Zheng	7478f56eb6	Revert "[NFC] [TargetRegisterInfo] add one use check to lookThruCopyLike." This reverts commit 3bdf4507b66348ad78df4655a8e4f36c3fc10f3c. Post commit comments need to be addressed first.	2021-01-18 21:33:31 -05:00
Kazu Hirata	57cbbdd64e	[LoopInfo] Fix a typo in compareLoops The code here is checking to see if two sets are identical. OtherBlocksSet should point to OtherL->getBlocksSet() instead. Differential Revision: https://reviews.llvm.org/D94926	2021-01-18 14:53:22 -08:00
Kazu Hirata	32a3ef3ebc	[STLExtras] Add a default value to drop_begin This patch adds the default value of 1 to drop_begin. In the llvm codebase, 70% of calls to drop_begin have 1 as the second argument. The interface similar to with std::next should improve readability. This patch converts a couple of calls to drop_begin as examples. Differential Revision: https://reviews.llvm.org/D94858	2021-01-18 10:16:34 -08:00
Florian Hahn	a6f855d31d	[AArch64] Revert back to Intrinsic<> for TME instructions. This patch reverts back to Intrinsic for the instructions for the transactional memory extension, so nosync is not included.	2021-01-18 18:03:58 +00:00
Florian Hahn	3773c85b51	[AArch64] Make target intrinsics DefaultAttrIntrinsics. DefaultAttrIntrinsics was introduced to add very common attributes to a large set of intrinsics. Currently the added attributes include: nofree nosync nounwind willreturn I think those should hold for most AArch64 target intrinsics, but there are too many to check manually. This patch makes most AArch64 target intrinsics DefaultAttrsIntrinsics. Some notable exceptions I think are exclusive loads and stores as well as the memory barrier intrinsics, for which nosync does not apply I think. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D94687	2021-01-18 17:32:15 +00:00
Florian Hahn	ffd7bee534	[VectorUtils] Do not try to add indices matching tombstone/empty values. Keys matching the tombstone/empty special values cannot be inserted in a DenseMap. Under some circumstances, LV tries to add members to an interleave group that match the special values. Skip adding such members. This is unlikely to have any impact in practice, because interleave groups with such indices are very likely to not be vectorized, due to gaps. This issue has been surfaced by fuzzing, see https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11638	2021-01-18 11:18:28 +00:00
Tres Popp	3fdf369051	Revert "[PowerPC] support register pressure reduction in machine combiner." This reverts commit 26a396c4ef481cb159bba631982841736a125a9c. See https://reviews.llvm.org/D92071 for a description of the issue.	2021-01-18 12:01:57 +01:00
Georgii Rymar	43999a3d98	[Object, llvm-readelf] - Move the API for retrieving symbol versions to ELF.h `ELFDumper.cpp` implements the functionality that allows to get symbol versions. It is used for dumping versioned symbols. This helps to implement https://bugs.llvm.org/show_bug.cgi?id=48670 ("make llvm-nm -D print version names"): we can move out and reuse the code from `ELFDumper.cpp`. This is what this patch do: it moves the related functionality to `ELFFile<ELFT>`. Differential revision: https://reviews.llvm.org/D94771	2021-01-18 12:50:29 +03:00
Craig Topper	4097ff94d8	[IR] Allow scalable vectors in structs to support intrinsics returning multiple values. RISC-V would like to use a struct of scalable vectors to return multiple values from intrinsics. This woud also be needed for target independent intrinsics like llvm.sadd.overflow. This patch removes the existing restriction for this. I've modified StructType::isSized to consider a struct containing scalable vectors as unsized so the verifier won't allow loads/stores/allocas of these structs. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D94142	2021-01-17 23:29:51 -08:00
Chen Zheng	6f751d75c9	[PowerPC] support register pressure reduction in machine combiner. Reassociating some patterns to generate more fma instructions to reduce register pressure. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D92071	2021-01-17 23:56:13 -05:00
Chen Zheng	2bf67e3d19	[NFC] [TargetRegisterInfo] add one use check to lookThruCopyLike. add one use check to lookThruCopyLike. The root node is safe to be deleted if we are sure that every definition in the copy chain only has one use. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D92069	2021-01-17 19:56:42 -05:00
Nikita Popov	39611f134f	Reapply [BasicAA] Handle recursive queries more efficiently There are no changes relative to the original commit. However, an issue this exposed in BasicAA assumption tracking has been fixed in the previous commit. ----- An alias query currently works out roughly like this: * Look up location pair in cache. * Perform BasicAA logic (including cache lookup and insertion...) * Perform a recursive query using BestAAResults. * Look up location pair in cache (and thus do not recurse into BasicAA) * Query all the other AA providers. * Query all the other AA providers. This is a lot of unnecessary work, all ultimately caused by the BestAAResults query at the end of aliasCheck(). The reason we perform it, is that aliasCheck() is getting called recursively, and we of course want those recursive queries to also make use of other AA providers, not just BasicAA. We can solve this by making the recursive queries directly use BestAAResults (which will check both BasicAA and other providers), rather than recursing into aliasCheck(). There are some tradeoffs: * We can no longer pass through the precomputed underlying object to aliasCheck(). This is not a major concern, because nowadays getUnderlyingObject() is quite cheap. * Results from other AA providers are no longer cached inside BasicAA. The way this worked was already a bit iffy, in that a result could be cached, but if it was MayAlias, we'd still end up re-querying other providers anyway. If we want to cache non-BasicAA results, we should do that in a more principled manner. In any case, despite those tradeoffs, this works out to be a decent compile-time improvment. I think it also simplifies the mental model of how BasicAA works. It took me quite a while to fully understand how these things interact. Differential Revision: https://reviews.llvm.org/D90094	2021-01-17 10:34:35 +01:00
Nikita Popov	3c72247d85	[BasicAA] Move assumption tracking into AAQI D91936 placed the tracking for the assumptions into BasicAA. However, when recursing over phis, we may use fresh AAQI instances. In this case AssumptionBasedResults from an inner AAQI can reesult in a removal of an element from the outer AAQI. To avoid this, move the tracking into AAQI. This generally makes more sense, as the NoAlias assumptions themselves are also stored in AAQI. The test case only produces an assertion failure with D90094 reapplied. I think the issue exists independently of that change as well, but I wasn't able to come up with a reproducer.	2021-01-17 10:34:35 +01:00
Kazu Hirata	8b192f274e	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Kazu Hirata	d123714733	[StringExtras] Fix comment typos (NFC)	2021-01-16 09:40:51 -08:00
Florian Hahn	c8b9f98090	[LTO] Remove options to disable inlining, vectorization & GVNLoadPRE. This patch removes some ancient options as a clean-up before moving code-gen to use LTOBackend in D94487. I think it would preferable to remove those ancient options, because 1. There are no corresponding options in LTOBackend based tools, 2. There are no unit tests for them, 3. They are not passed through by Clang, 4. At least for GNVLoadPRE, users could just use GVN's `enable-load-pre`. Alternatively we could add support for those options to lto::Config & co, but I think it would be better to remove them, unless they are actually used in practice. Reviewed By: steven_wu, tejohnson Differential Revision: https://reviews.llvm.org/D94783	2021-01-16 16:29:15 +00:00

... 2 3 4 5 6 ...

44078 Commits