llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 03:33:20 +01:00

Author	SHA1	Message	Date
Hongtao Yu	91873af129	[CSSPGO] Pseudo probe encoding and emission. This change implements pseudo probe encoding and emission for CSSPGO. Please see the RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections do not need to be loaded into memory at execution time, thus they do not incur a runtime overhead.
ELF object emission
The binary data to emit is organized as two ELF sections, i.e., the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each of which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as module-level metadata during compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed of too. In contrast, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.
The format of the `.pseudo_probe_desc` section looks like:
```
.section .pseudo_probe_desc,"",@progbits
.quad 6309742469962978389   // Func GUID
.quad 4294967295            // Func Hash
.byte 9                     // Length of func name
.ascii "_Z5funcAi"          // Func name
.quad 7102633082150537521
.quad 138828622701
.byte 12
.ascii "_Z8funcLeafi"
.quad 446061515086924981
.quad 4294967295
.byte 9
.ascii "_Z5funcBi"
.quad -2016976694713209516
.quad 72617220756
.byte 7
.ascii "_Z3fibi"
```
For each `.pseudo_probe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e., a function with a code entry in the `.text` section). A function record has the following format:
```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the
        inlined callees. Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```
To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree, in the form of a trie, based on the debug metadata for each outlined function. A tree root is the outlined function.
Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node, and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node are emitted together with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variable-sized integer encoding, aka LEB128, is used for the address delta and the probe index.
Assembling
Pseudo probes can alternatively be printed as assembly directives. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, which is especially helpful for diff time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudo_probe` section in the object file.
An example assembly looks like:
```
foo2: # @foo2
# %bb.0: # %bb0
    pushq %rax
    testl %edi, %edi
    .pseudoprobe 837061429793323041 1 0 0
    je .LBB1_1
# %bb.2: # %bb2
    .pseudoprobe 837061429793323041 6 2 0
    callq foo
    .pseudoprobe 837061429793323041 3 0 0
    .pseudoprobe 837061429793323041 4 0 0
    popq %rax
    retq
.LBB1_1: # %bb1
    .pseudoprobe 837061429793323041 5 1 0
    callq %rsi
    .pseudoprobe 837061429793323041 2 0 0
    .pseudoprobe 837061429793323041 4 0 0
    popq %rax
    retq
# -- End function
    .section .pseudo_probe_desc,"",@progbits
    .quad 6699318081062747564
    .quad 72617220756
    .byte 3
    .ascii "foo"
    .quad 837061429793323041
    .quad 281547593931412
    .byte 4
    .ascii "foo2"
```
With inlining turned on, the assembly may look different around %bb2 with an inlined probe:
```
# %bb.2: # %bb2
    .pseudoprobe 837061429793323041 3 0
    .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6
    .pseudoprobe 837061429793323041 4 0
    popq %rax
    retq
```
Disassembling
We have a disassembling tool (llvm-profgen) that can display disassembly alongside pseudo probes. So far it only supports ELF executable files. An example disassembly looks like:
```
00000000002011a0 <foo2>:
  2011a0: 50                push rax
  2011a1: 85 ff             test edi,edi
  [Probe]: FUNC: foo2 Index: 1 Type: Block
  2011a3: 74 02             je 2011a7 <foo2+0x7>
  [Probe]: FUNC: foo2 Index: 3 Type: Block
  [Probe]: FUNC: foo2 Index: 4 Type: Block
  [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6
  2011a5: 58                pop rax
  2011a6: c3                ret
  [Probe]: FUNC: foo2 Index: 2 Type: Block
  2011a7: bf 01 00 00 00    mov edi,0x1
  [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall
  2011ac: ff d6             call rsi
  [Probe]: FUNC: foo2 Index: 4 Type: Block
  2011ae: 58                pop rax
  2011af: c3                ret
```
Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878	2020-12-10 09:50:08 -08:00
Arthur Eubanks	efbcfa65ec	[test] Fix scev-expander-preserve-lcssa.ll under NPM The NPM runs loop passes over loops in forward program order, rather than the legacy loop PM's reverse program order. This seems to produce better results as shown here. I verified that changing the loop order to reverse program order results in the same IR with the NPM. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D92817	2020-12-10 09:46:08 -08:00
Craig Topper	55c03d9d7b	[RISCV][LegalizeDAG] Expand SETO and SETUO comparisons. Teach LegalizeDAG to expand SETUO expansion when UNE isn't legal. If SETUNE isn't legal, UO can use the NOT of the SETO expansion. Removes some complex isel patterns. Most of the test changes are from using XORI instead of SEQZ. Differential Revision: https://reviews.llvm.org/D92008	2020-12-10 09:15:52 -08:00
Florian Hahn	11dfe26f5c	[CallBase] Add hasRetAttr version that takes StringRef. This makes it slightly easier to deal with custom attributes and CallBase already provides hasFnAttr versions that support both AttrKind and StringRef arguments in a similar fashion. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D92567	2020-12-10 17:00:16 +00:00
Irina Dobrescu	1f92697f11	[flang]Add Parser Support for Allocate Directive Differential Revision: https://reviews.llvm.org/D89562	2020-12-10 16:21:19 +00:00
clementval	a38956acf7	Revert "[openmp] Remove clause from OMPKinds.def and use OMP.td info" This reverts commit a7b2847216b4f7a84ef75461fd47a5adfbb63e27. failing buildbot on warnings	2020-12-10 10:34:59 -05:00
Nuno Lopes	3db4186d3d	AA: make AliasAnalysis.h compatible with C++20 (NFC) can't mix arithmetic with different enums	2020-12-10 15:32:11 +00:00
Nico Weber	eeec860f46	[gn build] fix build after a7b2847216b4f7 Ports 6e42a417bacb since it's now needed, and undo an accidental deletion from d69762c404ded while here (this part is not needed to fix the build, it's just in the vicinity).	2020-12-10 10:28:48 -05:00
Valentin Clement	643cb4d428	[openmp] Remove clause from OMPKinds.def and use OMP.td info Remove the OpenMP clause information from the OMPKinds.def file and use the information from the new OMP.td file. There is now a single source of truth for the directives and clauses. To avoid generating lots of small, specific code from tablegen, the macros previously used in OMPKinds.def are generated almost identically. This can be polished and possibly removed in a further patch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D92955	2020-12-10 10:19:09 -05:00
Krzysztof Parzyszek	eb2eae25ad	[Hexagon] Fix gcc6 compilation issue	2020-12-10 08:17:07 -06:00
Kerry McLaughlin	1ca5a57655	[SVE][CodeGen] Extend index of masked gathers This patch changes performMSCATTERCombine to also promote the indices of masked gathers where the element type is i8 or i16, and adds various tests for gathers with illegal types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91433	2020-12-10 13:54:45 +00:00
Haojian Wu	9fb48763b2	Fix a -Wunused-variable warning in release build.	2020-12-10 14:52:45 +01:00
Kazushi (Jam) Marukawa	e3d737a117	[VE] Add vector reduce intrinsic instructions Add vrmax, vrmin, vfrmax, vfrmin, vrand, vror, and vrxor intrinsic instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D92941	2020-12-10 22:21:17 +09:00
Sjoerd Meijer	1a124afc04	[AArch64] Cortex-R82: remove crypto Remove the crypto target feature for Cortex-R82, because it doesn't have any, and add LSE, which was missing, while we are at it. This also removes crypto from the v8-R architecture description because that aligns better with GCC and so far none of the R-cores have implemented crypto, so it is probably a more sensible default. Differential Revision: https://reviews.llvm.org/D91994	2020-12-10 12:54:51 +00:00
David Green	67f2592469	[ARM][RegAlloc] Add t2LoopEndDec We currently have problems with the way that low overhead loops are specified, with LR being spilled between the t2LoopDec and the t2LoopEnd forcing the entire loop to be reverted late in the backend. As they will eventually become a single instruction, this patch introduces a t2LoopEndDec which is the combination of the two, combined before register allocation to make sure this does not fail. Unfortunately this instruction is a terminator that produces a value (and also branches - it only produces the value around the branching edge). So this needs some adjustment to phi elimination and the register allocator to make sure that we do not spill this LR def around the loop (needing to put a spill after the terminator). We treat the loop very carefully, making sure that there is nothing else like calls that would break its ability to use LR. For that, this adds an isUnspillableTerminator to opt in to the new behaviour. There is a chance that this could cause problems, and so I have added an escape option just in case. But I have not seen any problems in the testing that I've tried, and not reverting low overhead loops is important for our performance. If this does work then we can hopefully do the same for t2WhileLoopStart and t2DoLoopStart instructions. This patch also contains the code needed to convert or revert the t2LoopEndDec in the backend (which just needs a subs; bne) and the code pre-ra to create them. Differential Revision: https://reviews.llvm.org/D91358	2020-12-10 12:14:23 +00:00
Martin Storsjö	102c71a022	[llvm-rc] Handle driveless absolute windows paths when loading external files When llvm-rc loads an external file, it looks for it relative to a number of include directories and the current working directory. If the path is considered absolute, llvm-rc tries to open the filename as such, and doesn't try to open it relative to other paths. On Windows, a path name like "\dir\file" isn't considered absolute as it lacks the drive name, but by appending it on top of the search dirs, it's not found. LLVM's sys::path::append just appends such a path (same with a properly absolute posix path) after the paths it's supposed to be relative to. This fix doesn't handle the case if the resource script and the external file are on a different drive than the current working directory; to fix that, we'd have to make LLVM's sys::path::append handle appending fully absolute and partially absolute paths (ones lacking a drive prefix but containing a root directory), or switch to C++17's std::filesystem. Differential Revision: https://reviews.llvm.org/D92558	2020-12-10 14:11:06 +02:00
Alexey Lapshin	8c9a1f9e87	[dsymutil][DWARFLinker][NFC] Make interface of AddressMap more general. The current interface of AddressMap assumes that relocations exist. That is correct for a not-linked object file but not for a linked executable. This patch changes the interface in such a way that AddressMap can be used not only with not-linked object files: hasValidRelocationAt() is replaced with hasLiveMemoryLocation() and hasLiveAddressRange(). Differential Revision: https://reviews.llvm.org/D87723	2020-12-10 14:57:08 +03:00
Mirko Brkusanin	e195dd75ce	[AMDGPU] Resolve issues when picking between ds_read/write and ds_read2/write2 Both ds_read_b128 and ds_read2_b64 are valid for 128-bit 16-byte aligned loads, but the one that will be selected is determined either by the order in tablegen or by the AddedComplexity attribute. Currently ds_read_b128 has priority. While ds_read2_b64 has lower alignment requirements, we cannot always restrict ds_read_b128 to 16-byte alignment because of the unaligned-access-mode option. This was causing ds_read_b128 to be selected for 8-byte aligned loads regardless of the chosen access mode. To resolve this we use two patterns for selecting ds_read_b128. One requires 16-byte alignment and the other requires the unaligned-access-mode option. The same goes for ds_write2_b64 and ds_write_b128. Differential Revision: https://reviews.llvm.org/D92767	2020-12-10 12:40:49 +01:00
David Green	b4f282c77f	[ARM] Additional test for Min loop. NFC	2020-12-10 10:49:00 +00:00
David Green	04038723cc	[ARM] Remove copies from low overhead phi inductions. The phi created in a low overhead loop gets created with a default register class, it seems. Copies are then inserted between the low overhead loop pseudo instructions (which produce/consume GPRlr instructions) and the phi holding the induction. This patch removes those as a step towards attempting to make t2LoopDec and t2LoopEnd a single instruction, and appears useful in its own right as shown in the tests. Differential Revision: https://reviews.llvm.org/D91267	2020-12-10 10:30:31 +00:00
Jun Ma	0d59bcd1c0	[TruncInstCombine] Remove scalable vector restriction Differential Revision: https://reviews.llvm.org/D92819	2020-12-10 18:00:19 +08:00
Benjamin Kramer	5449a6bbe1	Remove Shapet assignment operator that's identical to the default. NFC.	2020-12-10 10:58:41 +01:00
Benjamin Kramer	94cf8f1d41	[Hexagon] Fold single-use variables into assert. NFCI. Silences unused variable warnings in Release builds.	2020-12-10 10:53:56 +01:00
David Green	683e29b9a4	[ARM] MVE vcreate tests, for dual lane moves. NFC	2020-12-10 09:17:34 +00:00
LLVM GN Syncbot	dce1606292	[gn build] Port f80b29878b0	2020-12-10 09:13:09 +00:00
Luo, Yuanke	4a2765406d	[X86] AMX programming model. This patch implements the AMX programming model that was discussed on llvm-dev (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html). Thanks to Hal for the good suggestion on the RA. The fast RA is not in the patch yet. This patch implements 7 components. 1. The C interface for the end user. 2. The AMX intrinsics in LLVM IR. 3. Transform load/store <256 x i32> to AMX intrinsics or split the type into two <128 x i32>. 4. The lowering from AMX intrinsics to AMX pseudo instructions. 5. Insert pseudo ldtilecfg and build the def-use between ldtilecfg and AMX instructions. 6. The register allocation for tile registers. 7. Morph AMX pseudo instructions to AMX real instructions. Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0 Differential Revision: https://reviews.llvm.org/D87981	2020-12-10 17:01:54 +08:00
Lang Hames	bc304940bf	[JITLink][ELF] Reformat/add debug logging in ELF_x86_64.cpp. Moves symbol name to the end of the output and makes other columns fixed width so that they line up.	2020-12-10 18:46:44 +11:00
Kazu Hirata	9793163151	[Tablegen] Use llvm::is_contained (NFC)	2020-12-09 23:34:07 -08:00
Sergey Dmitriev	9302af918f	[llvm-link][NFC] Minor cleanup llvm::Linker::linkModules() is a static member, so there is no need to pass reference to llvm::Linker instance to loadArFile() function. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D92918	2020-12-09 23:16:13 -08:00
Kazushi (Jam) Marukawa	4bf07b5b90	[VE][NFC] Disable VP tests The recently added VP tests don't work in Release mode. They work in Debug mode, so I disable them in Release mode to keep the tests passing.	2020-12-10 15:13:05 +09:00
Arthur Eubanks	07235ffbaf	[test] Fix coro-retcon.ll under NPM The full aa-pipeline is required to remove the extra store.	2020-12-09 22:04:59 -08:00
Alina Sbirlea	1574bc6938	[MemorySSA/docs] Extend MemorySSA documentation.	2020-12-09 18:00:16 -08:00
Arthur Eubanks	3c001d0408	[LTO][NPM] Default to using NPM under ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER This affects users of LTO that don't explicitly set UseNewPM. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D92894	2020-12-09 17:48:47 -08:00
Fangrui Song	ead0e3209d	Rename -plugin-opt=no-new-pass-manager to -plugin-opt=legacy-pass-manager	2020-12-09 16:43:30 -08:00
Stanislav Mekhanoshin	7b785e39c8	[AMDGPU] Fix expansion of 192 bit spills in PEI Differential Revision: https://reviews.llvm.org/D92979	2020-12-09 16:36:29 -08:00
Krzysztof Parzyszek	071e5ba62e	[Hexagon] Silence warnings about unused objects	2020-12-09 17:54:10 -06:00
Krzysztof Parzyszek	1a325ba866	[Hexagon] Fix build: move template specialization into namespace scope	2020-12-09 17:40:15 -06:00
Scott Linder	6218f09b03	[MC] Fix ICE with non-newline terminated input There is an explicit option for the lexer to support this, but we crash when `-preserve-comments` is enabled because it checks for `getTok().getString().empty()` to detect the case. This doesn't work currently because the lexer reports this case as a string of length 1, containing a null byte. Change the lexer to instead report this case via an empty string, as the null terminator isn't logically a part of the textual input, and the check for `.empty()` seems natural and obvious in the calling code. Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D92681	2020-12-09 23:39:32 +00:00
LLVM GN Syncbot	89ea615411	[gn build] Port f5d07a05bbd	2020-12-09 23:12:27 +00:00
Krzysztof Parzyszek	deb082d99d	[Hexagon] Realign HVX vectors wherever possible Introduce HexagonVectorCombine as a helper class for vector-related optimizations.	2020-12-09 17:11:25 -06:00
Saleem Abdulrasool	4ae2c1c200	X86: use a data driven configuration of Windows x86 libcalls (NFC) Rather than creating a series of associated calls and ensuring that everything is lined up, use a table driven approach that ensures that the two always stay in sync.	2020-12-09 22:49:11 +00:00
Scott Linder	19b5d1fffc	[MC][AMDGPU] Consume EndOfStatement in asm parser Avoids spurious newlines showing up in the output when emitting assembly via MC. Reviewed By: MaskRay, arsenm Differential Revision: https://reviews.llvm.org/D92690	2020-12-09 21:45:55 +00:00
Craig Topper	067e0b2781	[X86] Use APInt::isSignedIntN instead of isIntN for 64-bit ANDs in X86DAGToDAGISel::IsProfitableToFold Pretty sure we meant to be checking signed 32 immediates here rather than unsigned 32 bit. I suspect I messed this up because in MathExtras.h we have isIntN and isUIntN so isIntN differs in signedness depending on whether you're using APInt or plain integers. This fixes a case where we didn't fold a constant created by shrinkAndImmediate. Since shrinkAndImmediate doesn't topologically sort constants it creates, we can fail to convert the Constant to a TargetConstant. This leads to very strange behavior later. Fixes PR48458.	2020-12-09 13:39:07 -08:00
Fangrui Song	7f2a5362d1	[LLD][gold] Add -plugin-opt=no-new-pass-manager -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=on configured LLD and LLVMgold.so will use the new pass manager by default. Add an option to use the legacy pass manager. This will also be used by the Clang driver when -fno-new-pass-manager (D92915) / -fno-experimental-new-pass-manager is set. Reviewed By: aeubanks, tejohnson Differential Revision: https://reviews.llvm.org/D92916	2020-12-09 13:31:03 -08:00
Yuanfang Chen	bbd8d2f9e4	[NFCI] Add missing triple to several LTO tests Also remove the module triple of clang/test/CodeGenObjC/arc.ll; the command-line triple is all it needs.	2020-12-09 13:13:58 -08:00
Scott Linder	ad95bab280	[AMDGPU][MC] Restore old error position for "too few operands" Revert part of https://reviews.llvm.org/D92084 to make it simpler to start consuming the EndOfStatement token within AMDGPU's ParseInstruction in a future patch. This also brings us back to what every other target currently does. A future change to move the position back to the end of the statement would likely need to audit all of the AMDGPUOperand SMLoc ranges, and determine the SMLoc for the last character of the last operand. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D92960	2020-12-09 21:09:47 +00:00
Sam Clegg	cee0963ebc	[WebAssembly] Add support for named data sections in wasm binaries Followup to https://reviews.llvm.org/D91769 which added support for named globals. Differential Revision: https://reviews.llvm.org/D92909	2020-12-09 12:57:07 -08:00
Mircea Trofin	99ada595aa	[NFC] Removed unused prefixes in llvm/test/CodeGen/AArch64 Differential Revision: https://reviews.llvm.org/D92943	2020-12-09 12:47:51 -08:00
Florian Hahn	2e708d6d56	[AArch64] Add aarch64_neon_vcmla{_rot{90,180,270}} intrinsics. Add builtins required to implement vcmla and rotated variants from the ACLE Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D92929	2020-12-09 19:46:49 +00:00
Michael Munday	0e3bafc4e2	[RISCV][NFC] Regenerate RISCV CodeGen tests Regenerated using: ./llvm/utils/update_llc_test_checks.py -u llvm/test/CodeGen/RISCV/*.ll This has added comments to spill-related instructions and added @plt to some symbols. Differential Revision: https://reviews.llvm.org/D92841	2020-12-09 19:42:49 +00:00
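
Note on commit 91873af129 ([CSSPGO] Pseudo probe encoding and emission): the message says probe addresses are stored as a full address for the first probe and as LEB128-encoded deltas afterwards. The following is a minimal Python sketch of that idea only, not the LLVM implementation. The helper names `encode_probe_addresses` and `decode_probe_addresses` are hypothetical, and the little-endian byte order and the assumption that addresses never decrease (so unsigned deltas suffice) are mine, not stated in the commit. The INDEX, TYPE, ATTRIBUTE and ADDRESS_TYPE fields of a probe record are deliberately left out.

```python
import struct
from typing import List, Tuple


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128 (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset) for the ULEB128 integer at data[offset:]."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def encode_probe_addresses(addresses: List[int]) -> bytes:
    """Hypothetical helper: first address as uint64, the rest as ULEB128 deltas.

    Assumes little-endian layout and non-decreasing addresses.
    """
    if not addresses:
        return b""
    out = bytearray(struct.pack("<Q", addresses[0]))
    for prev, cur in zip(addresses, addresses[1:]):
        out += encode_uleb128(cur - prev)
    return bytes(out)


def decode_probe_addresses(data: bytes, count: int) -> List[int]:
    """Inverse of encode_probe_addresses for `count` addresses."""
    if count == 0:
        return []
    (first,) = struct.unpack_from("<Q", data, 0)
    addresses = [first]
    offset = 8
    for _ in range(count - 1):
        delta, offset = decode_uleb128(data, offset)
        addresses.append(addresses[-1] + delta)
    return addresses
```

With addresses borrowed from the disassembly example in that commit (0x2011a3, 0x2011a5, 0x2011a7, 0x2011ac, 0x2011ae), this sketch needs 8 bytes for the first address and one byte per following delta, instead of 8 bytes per probe.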

208151 Commits