llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 12:43:36 +01:00

Author	SHA1	Message	Date
Louis Dionne	e8ebe9cd4f	[lit] Allow passing extra commands to executeShTest This allows creating custom test formats on top of `executeShTest` that inject commands at the beginning of the file being parsed, without requiring these commands to physically appear in the test file itself. For example, one could define a test format that prints out additional debug information at the beginning of each test. More realistically, this has been used to define custom test formats like one that supports compilation failure tests (e.g. with the extension `compile.fail.cpp`) by injecting a command that calls the compiler on the file itself and expects it to fail. Without this change, the only alternative is to create a temporary file with the same content as the original test, then prepend the desired `// RUN:` lines to that file, and call `executeShTest` on that file instead. This is both slow and cumbersome to do. Differential Revision: https://reviews.llvm.org/D76290	2020-03-24 15:02:37 -04:00
Vedant Kumar	baf8348499	[DWARF] Emit DW_AT_call_pc for tail calls Record the address of a tail-calling branch instruction within its call site entry using DW_AT_call_pc. This allows a debugger to determine the address to use when creating artificial frames. This creates an extra attribute + relocation at tail call sites, which constitute 3-5% of all call sites in xnu/clang respectively. rdar://60307600 Differential Revision: https://reviews.llvm.org/D76336	2020-03-24 12:01:55 -07:00
Louis Dionne	0bf724260d	NFC: Fix typos in TestingGuide documentation	2020-03-24 14:54:55 -04:00
Louis Dionne	eaf303b7d3	[lit] NFC: Document missing result codes These result codes already exist, but they were not documented. I assume this is an oversight when adding these result codes.	2020-03-24 14:46:54 -04:00
Juneyoung Lee	49bbd5d17a	[DivRemPairs] Freeze operands if they can be undef values Summary: DivRemPairs is unsound with respect to undef values. ``` // bb1: // %rem = srem %x, %y // bb2: // %div = sdiv %x, %y // --> // bb1: // %div = sdiv %x, %y // %mul = mul %div, %y // %rem = sub %x, %mul ``` If X can be undef, X should be frozen first. For example, let's assume that Y = 1 & X = undef: ``` %div = sdiv undef, 1 // %div = undef %rem = srem undef, 1 // %rem = 0 => %div = sdiv undef, 1 // %div = undef %mul = mul %div, 1 // %mul = undef %rem = sub %x, %mul // %rem = undef - undef = undef ``` http://volta.cs.utah.edu:8080/z/m7Xrx5 Same for Y. If X = 1 and Y = (undef \| 1), %rem in src is either 1 or 0, but %rem in tgt can be one of many integer values. This resolves https://bugs.llvm.org/show_bug.cgi?id=42619 . This miscompilation disappears if undef value is removed, but it may take a while. DivRemPair happens pretty late during the optimization pipeline, so this optimization seemed as a good candidate to fix without major regression using freeze than other broken optimizations. Reviewers: spatel, lebedev.ri, george.burgess.iv Reviewed By: spatel Subscribers: wuzish, regehr, nlopes, nemanjai, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76483	2020-03-25 03:46:14 +09:00
Benjamin Kramer	43c419e4aa	[SelectionDAG] Don't crash when freezing illegal float types	2020-03-24 19:45:19 +01:00
Simon Pilgrim	7433341c72	[X86][AVX] Add some v32i16 to v32i8 style truncation shuffle tests	2020-03-24 18:38:13 +00:00
Matt Arsenault	962631ded7	AMDGPU/GlobalISel: Add more tests for add3 folding Forgot to squash into 2ea46051055b37faf95c58daad57608bb7610f58	2020-03-24 14:30:24 -04:00
Matt Arsenault	7ae688c9cc	AMDGPU/GlobalISel: Add some more tests for add3 folding These currently fail to form add3 due to the pointer type, but they should be handled.	2020-03-24 14:26:23 -04:00
Matt Arsenault	66c5ce183c	AMDGPU/GlobalISel: Fix smrd loads of v4i64	2020-03-24 13:44:41 -04:00
Sanjay Patel	446f29b2c2	[ValueTracking] improve undef/poison analysis for constant vectors Differential Revision: https://reviews.llvm.org/D76702	2020-03-24 13:35:47 -04:00
LLVM GN Syncbot	caa4525525	[gn build] Port b91905a2637	2020-03-24 16:46:53 +00:00
Hiroshi Yamauchi	2be2675a03	Revert "Include static prof data when collecting loop BBs" This reverts commit 129c911efaa492790c251b3eb18e4db36b55cbc5. Due to an internal benchmark regression.	2020-03-24 09:41:16 -07:00
Nico Weber	b5e7c63121	[gn build] (manually) port 8140f6bcde4 better	2020-03-24 12:39:49 -04:00
Nico Weber	45c217b8bb	[gn build] (manually) port 8140f6bcde4	2020-03-24 12:38:25 -04:00
Nico Weber	8375fac139	[gn build] Port 49e5a97ec36	2020-03-24 12:36:08 -04:00
David Green	b8d11aabd8	[ARM] Fold VMOVrh VLDR to LDRH This adds a simple fold to combine VMOVrh load to an integer load. Similar to what is already performed for BITCAST, but needs to account for the types being of different sizes, creating a zero-extending load. Differential Revision: https://reviews.llvm.org/D76485	2020-03-24 15:51:03 +00:00
Sanjay Patel	f8f875fc08	[InstSimplify] add tests for freeze(constexpr); NFC	2020-03-24 11:39:19 -04:00
Lama	5cc4cf3bbd	[MachinePipeliner] Fix a bug in Output Dependency chains The current implementation collects all Preds/Succs of a Dep of kind Output, creating a long chain and subsequently a schedule with an unnecessarily large II. Was this done on purpose for a reason I'm missing? Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D75424	2020-03-24 14:37:50 +00:00
Simon Pilgrim	eca2dede42	[X86][SSE1] Add support for logic+movmsk patterns (PR42870) rL368506 handled the basic case, but we need to account for boolean logic patterns as well.	2020-03-24 14:28:40 +00:00
Pavel Labath	58fd4ef93a	[DWARF] Fix v5 debug_line parsing of prologues with many files Summary: The directory_count and file_name_count fields are (section 6.2.4 of DWARF5 spec) supposed to be uleb128s, not bytes. This bug meant that it was not possible to correctly parse headers with more than 128 files or directories. I've found this bug by code inspection, though the limit is so small someone would have run into it for real sooner or later. I've verified that the producer side handles many files correctly, and that we are able to parse such files after this fix. Reviewers: dblaikie, jhenderson Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76498	2020-03-24 15:11:54 +01:00
Juneyoung Lee	c23deb9eda	[SelDag] Add FREEZE Summary: - Add FREEZE node to SelDag - Lower FreezeInst (in IR) to FREEZE node - Add Legalization for FREEZE node Reviewers: qcolombet, bogner, efriedma, lebedev.ri, nlopes, craig.topper, arsenm Reviewed By: lebedev.ri Subscribers: wdng, xbolva00, Petar.Avramovic, liuz, lkail, dylanmckay, hiraditya, Jim, arsenm, craig.topper, RKSimon, spatel, lebedev.ri, regehr, trentxintong, nlopes, mkuper, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D29014	2020-03-24 23:04:58 +09:00
Sanjay Patel	be3b79c242	[InstSimplify] add more tests for freeze(constant); NFC These should really be moved over to a ConstantFolding test file, but since this may overlap with the in-progress D76010 and similar tests already exist here, we can do that as a later cleanup.	2020-03-24 09:53:49 -04:00
Simon Pilgrim	6b9010d4c5	[X86][SSE1] Add additional logic+movmsk patterns that scalarize (PR42870) rL368506 handled the basic case, but we need to account for boolean logic patterns as well.	2020-03-24 13:20:41 +00:00
Florian Hahn	295234e5e3	[ConstantRange] Add initial support for binaryXor. The initial implementation just delegates to APInt's implementation of XOR for single element ranges and conservatively returns the full set otherwise. Reviewers: nikic, spatel, lebedev.ri Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D76453	2020-03-24 12:59:50 +00:00
Benjamin Kramer	57eaed442a	Make helpers static. NFC.	2020-03-24 13:43:00 +01:00
Simon Tatham	9307a8bf6e	[ReleaseNotes,ARM] MVE intrinsics are all implemented! Summary: The next release of LLVM will support the full ACLE spec for MVE intrinsics, so it's worth saying so in the release notes. Reviewers: kristof.beyls Reviewed By: kristof.beyls Subscribers: cfe-commits, hans, dmgreen, llvm-commits Tags: #llvm, #clang Differential Revision: https://reviews.llvm.org/D76513	2020-03-24 11:42:25 +00:00
Sam Parker	4bca9e9146	[NFC][ARM] Add missing tests	2020-03-24 11:08:01 +00:00
Simon Pilgrim	bb10a9825b	[UpdateTestChecks] Use common ir function name matcher and extend to accept periods in names (PR37586) Remove the local versions of the IR_FUNCTION_RE matcher (they weren't doing anything different), and ensure all the function name matchers accept '.' and '-'. We don't need to use '\.' inside python regex sets either, or '\-' as long as that's at the end of the set.	2020-03-24 10:59:30 +00:00
David Green	3d211b57a5	[ARM] Don't split trunc stores that can be better handled as VMOVN We deliberately split stores of the form store(truncate(larger-than-legal-type)) into two stores, allowing each store to perform part of the truncate for free. There are times however where it makes more sense to use VMOVN to de-interlace the results back into a single vector, and store that in one go. This adds a check for that situation, not splitting the store if it looks like a VMOVN can be more useful. Differential Revision: https://reviews.llvm.org/D76511	2020-03-24 08:48:52 +00:00
Sam Parker	52158b1a9d	[ARM][LowOverheadLoops] Add checks for narrowing Modify ValidateLiveOuts to track 'FalseLaneZeros' more precisely, including checks on specific operations that can generate non-zeros from zero values, e.g VMVN. We can then check that any instructions that retain some information in their output register (all narrowing instructions) that they only use and def registers that always have zeros in their falsely predicated bytes, whether or not tail predication happens. Most of the logic remains the same, just the names of the data structures and helpers have been renamed to reflect the change in logic. The key change, apart from the opcode checkers, is that the FalseZeros set now strictly contains only instructions which will always generate zeros, and not instructions that could also have their false bytes masked away later. Differential Revision: https://reviews.llvm.org/D76235	2020-03-24 08:41:48 +00:00
Sam Parker	392ec3c3a9	[ARM][MVE] Add target flag for narrowing insts Add a flag, 'RetainsPreviousHalfElement', for operations that operate on top/bottom halves of their input and only write to half of their destination, leaving the other half to retain its previous value. Differential Revision: https://reviews.llvm.org/D76608	2020-03-24 08:36:44 +00:00
Chen Zheng	3c07f2a94c	[PowerPC] fix a typo in commit 3f85134d710c Implement target hook isProfitableToHoist - typo fix.	2020-03-24 01:56:15 -04:00
Douglas Yung	a5f05f1ca4	Fix another instance where a variable was renamed in the generated LLVM IR. [NFC]	2020-03-23 22:53:29 -07:00
Jun Ma	801dde651c	[Coroutines] Also check lifetime intrinsics for local variables when building the coroutine frame Currently we move all allocas into the frame when building the coroutine frame in the CoroSplit pass. However, this can be relaxed. Since the CoroSplit pass runs after the Inline pass, we can use lifetime intrinsics for this analysis: if the scope of a lifetime intrinsic does not cross any suspend point, then rather than moving the allocas to the frame, we can just move them to the entry bb of the corresponding function. This reduces the frame size. More importantly, it also avoids a data race in multithreaded environments. Consider a function inlined into a coroutine: it starts a thread which accesses local variables, while after inlining, the movement of allocas to the frame also accesses them, causing a data race. Differential Revision: https://reviews.llvm.org/D75664	2020-03-24 13:41:55 +08:00
Vedant Kumar	0a33dc9cac	[GlobalOpt] Treat null-check of loaded value as use of global (PR35760) PR35760 shows an example program which, when compiled with `clang -O0` or gcc at any optimization level, prints '0'. However, llvm transforms the program in a way that causes it to print '1'. Fix the issue by having `AllUsesOfValueWillTrapIfNull` return false when analyzing a load from a global which is used by an `icmp`. This special case was untested [0] so this is just deleting dead code. An alternative fix might be to change the GlobalStatus analysis for the global to report "Stored" instead of "StoredOnce". However, "StoredOnce" is appropriate when only one value other than the initializer is stored to the global. [0] http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/Transforms/IPO/GlobalOpt.cpp.html#L662 Differential Revision: https://reviews.llvm.org/D76645	2020-03-23 22:36:09 -07:00
Douglas Yung	21c50d3795	Make test more flexible for when the variable is renamed in the generated LLVM IR. [NFC]	2020-03-23 22:03:21 -07:00
Jinsong Ji	a67a65a968	[NFC][RUIP] Small debug output refinement Add a new line, so that we always print MI on a new line, before and after UpdateRegMask, for easier checking.	2020-03-24 03:29:45 +00:00
John McCall	93f2c3b449	Add an algorithm for performing "optimal" layout of a struct. The algorithm supports both assigning a fixed offset to a field prior to layout and allowing fields to have sizes that aren't multiples of their required alignments. This means that the well-known algorithm of sorting by decreasing alignment isn't always good enough. Still, we start with that, and only if that leaves padding around do we fall back on a greedy padding-minimizing algorithm. There is no known efficient algorithm for producing a guaranteed-minimal layout in all cases. In fact, allowing arbitrary fixed-offset fields means there's a straightforward reduction from bin-packing, making this NP-hard. But as usual with such problems, we can still efficiently produce adequate solutions to the cases that matter most to us. I intend to use this in coroutine frame layout, where the retcon lowerings very badly want to minimize total space usage, and where the switch lowering can indeed produce a header with interior padding if the promise field is highly-aligned. But it may be useful in a much wider variety of situations.	2020-03-23 23:24:48 -04:00
Jonas Devlieghere	f1c756f116	[VirtualFileSystem] Add unit test for vfs::YAMLVFSWriter Add a unit test for vfs::YAMLVFSWriter. This patch exposes an issue in the writer: when we call addFileMapping with a directory, the VFS writer will emit it as a regular file, causing any of the nested files or directories to not be found.	2020-03-23 18:49:06 -07:00
Peter Collingbourne	24071f551a	scudo: Create a public include directory. NFCI. For MTE error reporting we will need to expose interfaces for crash handlers to use to interpret scudo headers and metadata. The intent is that these interfaces will live in scudo/interface.h. Move the existing interface.h into an include/scudo directory and make it independent of the internal headers, so that we will be able to add the interfaces there. Differential Revision: https://reviews.llvm.org/D76648	2020-03-23 18:23:29 -07:00
Johannes Doerfert	019f4fb9d9	[OpenMPOpt] Initialize value to avoid use of uninitialized memory This should fix the issue reported here: https://reviews.llvm.org/D76058#1937554	2020-03-23 19:17:19 -05:00
Jessica Paquette	6ed3fad872	[GlobalISel] Combine G_SELECTs of the form (cond ? x : x) into x When we find something like this: ``` %a:_(s32) = G_SOMETHING ... ... %select:_(s32) = G_SELECT %cond(s1), %a, %a ``` We can remove the select and just replace it entirely with `%a` because it's always going to result in `%a`. Same if we have ``` %select:_(s32) = G_SELECT %cond(s1), %a, %b ``` where we can deduce that `%a == %b`. This implements the following cases: - `%select:_(s32) = G_SELECT %cond(s1), %a, %a` -> `%a` - `%select:_(s32) = G_SELECT %cond(s1), %a, %some_copy_from_a` -> `%a` - `%select:_(s32) = G_SELECT %cond(s1), %a, %b` -> `%a` when `%a` and `%b` are defined by identical instructions This gives a few minor code size improvements on CTMark at -O3 for AArch64. Differential Revision: https://reviews.llvm.org/D76523	2020-03-23 16:46:03 -07:00
Nemanja Ivanovic	86f2aa5d7c	[PowerPC] Improve handling of some BUILD_VECTOR nodes An analysis of real world code turned up a number of patterns with BUILD_VECTOR of nodes resulting from operations on extracted vector elements for which we produce poor code. This addresses those cases. No attempt is made for completeness as that would entail a large amount of work for something that there is no evidence of in real code. Differential revision: https://reviews.llvm.org/D72660	2020-03-23 17:34:29 -05:00
Stephen Neuendorffer	5f9c6088ec	[examples] Fixes for BUILD_SHARED_LIBS=on	2020-03-23 15:21:45 -07:00
Justin Hibbits	6457993f9d	[PowerPC]: e500 target can't use lwsync, use msync instead The e500 core has a silicon bug that triggers an illegal instruction program trap on any sync other than msync. Other cores will typically ignore illegal sync types, and the documentation even implies that the 'illegal' bits are ignored. Address this hardware deficiency by only using msync, like the PPC440. Differential Revision: https://reviews.llvm.org/D76614	2020-03-23 17:15:27 -05:00
Ladd Van Tol	d31a976026	Improve module.pcm lock file performance on machines with high core counts Summary: When building a large Xcode project with multiple module dependencies, and mixed Objective-C & Swift, I observed a large number of clang processes stalling at zero CPU for 30+ seconds throughout the build. This was especially prevalent on my 18-core iMac Pro. After some sampling, the major cause appears to be the lock file implementation for precompiled modules in the module cache. When the lock is heavily contended by multiple clang processes, the exponential backoff runs in lockstep, with some of the processes sleeping for 30+ seconds in order to acquire the file lock. In the attached patch, I implemented a more aggressive polling mechanism that limits the sleep interval to a max of 500ms, and randomizes the wait time. I preserved a limited form of exponential backoff. I also updated the code to use cross-platform timing, thread sleep, and random number capabilities available in C++11. On iMac Pro (2.3 GHz Intel Xeon W, 18 core): Xcode 11.1 bundled clang: 502.2 seconds (average of 5 runs) Custom clang build with LockFileManager patch applied: 276.6 seconds (average of 5 runs) This is a 1.82x speedup for this use case. On MacBook Pro (4 core 3.1GHz Intel i7): Xcode 11.1 bundled clang: 539.4 seconds (average of 2 runs) Custom clang build with LockFileManager patch applied: 509.5 seconds (average of 2 runs) As expected, machines with fewer cores benefit less from this change. 
``` Call graph: 2992 Thread_393602 DispatchQueue_1: com.apple.main-thread (serial) 2992 start (in libdyld.dylib) + 1 [0x7fff6a1683d5] 2992 main (in clang) + 297 [0x1097a1059] 2992 driver_main(int, char const*) (in clang) + 2803 [0x1097a5513] 2992 cc1_main(llvm::ArrayRef<char const>, char const, void) (in clang) + 1608 [0x1097a7cc8] 2992 clang::ExecuteCompilerInvocation(clang::CompilerInstance) (in clang) + 3299 [0x1097dace3] 2992 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (in clang) + 509 [0x1097dcc1d] 2992 clang::FrontendAction::Execute() (in clang) + 42 [0x109818b3a] 2992 clang::ParseAST(clang::Sema&, bool, bool) (in clang) + 185 [0x10981b369] 2992 clang::Parser::ParseFirstTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&) (in clang) + 37 [0x10983e9b5] 2992 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&) (in clang) + 141 [0x10983ecfd] 2992 clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec) (in clang) + 695 [0x10983f3b7] 2992 clang::Parser::ParseObjCAtDirectives(clang::Parser::ParsedAttributesWithRange&) (in clang) + 637 [0x10a9be9bd] 2992 clang::Parser::ParseModuleImport(clang::SourceLocation) (in clang) + 170 [0x10c4841ba] 2992 clang::Parser::ParseModuleName(clang::SourceLocation, llvm::SmallVectorImpl<std::__1::pair<clang::IdentifierInfo, clang::SourceLocation> >&, bool) (in clang) + 503 [0x10c485267] 2992 clang::Preprocessor::Lex(clang::Token&) (in clang) + 316 [0x1098285cc] 2992 clang::Preprocessor::LexAfterModuleImport(clang::Token&) (in clang) + 690 [0x10cc7af62] 2992 clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<std::__1::pair<clang::IdentifierInfo, clang::SourceLocation> >, clang::Module::NameVisibilityKind, bool) (in clang) + 7989 [0x10bba6535] 2992 compileAndLoadModule(clang::CompilerInstance&, clang::SourceLocation, clang::SourceLocation, clang::Module*, llvm::StringRef) (in clang) + 296 [0x10bba8318] 2992 
llvm::LockFileManager::waitForUnlock() (in clang) + 91 [0x10b6953ab] 2992 nanosleep (in libsystem_c.dylib) + 199 [0x7fff6a22c914] 2992 __semwait_signal (in libsystem_kernel.dylib) + 10 [0x7fff6a2a0f32] ``` Differential Revision: https://reviews.llvm.org/D69575	2020-03-23 14:59:39 -07:00
LLVM GN Syncbot	1e282655bf	[gn build] Port 7bf871c39f7	2020-03-23 21:05:55 +00:00
Matt Arsenault	3655fd0fd3	AMDGPU: Allow vectorization of round intrinsic There seems to be a small benefit to the legalized sequence for v2f16 round with packed instructions, so allow vectorizing it by reducing the cost. An unintended side effect is vectorization of f32 round also happens. The current FMA logic seems off to me, and isn't checking for packed instructions.	2020-03-23 17:00:41 -04:00
Matt Arsenault	e19af960e3	AMDGPU: Emit llvm.fshr for __builtin_amdgcn_alignbit These are equivalent. The generic rotate builtins do not directly map to the fshr intrinsic.	2020-03-23 16:51:25 -04:00
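
Commit 58fd4ef93a above turns on the difference between reading a count as a single byte and reading it as a ULEB128 value. The following is a minimal sketch of ULEB128 encoding and decoding, not the LLVM parser itself; the function names `encode_uleb128` and `decode_uleb128` are hypothetical and chosen for illustration only.

```python
from typing import Tuple


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128 (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_uleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one ULEB128 value; return (value, offset of the next field)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated ULEB128 value")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, offset
        shift += 7


# A count of 300 needs two bytes; a reader that takes only the first
# byte sees 0xAC (172) and then misparses everything that follows.
encoded = encode_uleb128(300)
value, next_offset = decode_uleb128(encoded)
```

Any value of 128 or more occupies more than one byte, which is why a byte-sized read breaks once a header lists that many files or directories.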
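
Commit bb10a9825b above notes that '.' needs no escape inside a Python regex set, and neither does '-' when it is the last character of the set. A small sketch of that point follows; the two patterns are illustrative stand-ins, not the actual `IR_FUNCTION_RE` from UpdateTestChecks.

```python
import re

# Escaped and unescaped spellings of the same character set.
# Both patterns are hypothetical examples, not the real matcher.
escaped = re.compile(r'[\w\.\-]+')
plain = re.compile(r'[\w.-]+')  # '.' is literal in a set; trailing '-' is literal

names = ["foo", "foo.bar", "foo-bar", "_Z3foov.cold-1", "has space", "semi;colon"]

# Record, for each name, whether each pattern matches the whole string.
results = {
    name: (bool(escaped.fullmatch(name)), bool(plain.fullmatch(name)))
    for name in names
}
```

Both spellings accept and reject exactly the same strings, so the shorter one is the easier to read.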
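
Commit 93f2c3b449 above starts from the well-known approach of sorting fields by decreasing alignment. The sketch below shows only that baseline, under stated assumptions: it does not model fixed-offset fields or the greedy padding-minimizing fallback the commit describes, it rounds the total size up to the largest alignment (the usual C convention, assumed here), and the `layout` and `align_up` helpers are hypothetical names.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(fields, sort_by_alignment=True):
    """Place fields one after another at suitably aligned offsets.

    fields is a list of (name, size, alignment) tuples.
    Returns (offsets by name, total size including tail padding).
    """
    if sort_by_alignment:
        # sorted() is stable, so equal-alignment fields keep their order.
        order = sorted(fields, key=lambda f: -f[2])
    else:
        order = list(fields)
    offsets = {}
    end = 0
    max_align = 1
    for name, size, alignment in order:
        start = align_up(end, alignment)
        offsets[name] = start
        end = start + size
        max_align = max(max_align, alignment)
    return offsets, align_up(end, max_align)


fields = [("a", 1, 1), ("b", 8, 8), ("c", 2, 2), ("d", 4, 4)]
declared_offsets, declared_size = layout(fields, sort_by_alignment=False)
sorted_offsets, sorted_size = layout(fields)
```

With these four fields, declaration order wastes space on padding before `b` and `d`, while the sorted order packs them with padding only at the tail. As the commit message explains, this baseline stops being sufficient once field sizes are not multiples of their alignments or some offsets are fixed in advance.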
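
Commit d31a976026 above replaces lockstep exponential backoff with a randomized wait capped at 500ms. The following is a rough sketch of that general technique, not the LockFileManager code: the 500ms cap comes from the commit message, while the 10ms starting ceiling, the doubling rule, the attempt limit, and the names `backoff_delays` and `wait_for_unlock` are assumptions made for illustration.

```python
import random
import time


def backoff_delays(initial_ms=10.0, max_ms=500.0, rng=None):
    """Yield randomized sleep intervals (ms) whose ceiling doubles up to max_ms."""
    rng = rng or random.Random()
    ceiling = initial_ms
    while True:
        # Randomizing within the ceiling keeps waiters from waking in lockstep.
        yield rng.uniform(1.0, ceiling)
        ceiling = min(ceiling * 2, max_ms)


def wait_for_unlock(is_unlocked, max_attempts=50, sleep=time.sleep, rng=None):
    """Poll is_unlocked() with capped, randomized backoff.

    Returns True once the lock is observed free, False after max_attempts.
    The sleep callable is injectable so the loop can be exercised without waiting.
    """
    for _, delay_ms in zip(range(max_attempts), backoff_delays(rng=rng)):
        if is_unlocked():
            return True
        sleep(delay_ms / 1000.0)
    return is_unlocked()


# Usage: a fake lock that becomes free on the third poll, with a fake sleep.
polls = {"count": 0}
slept = []


def fake_is_unlocked():
    polls["count"] += 1
    return polls["count"] >= 3


acquired = wait_for_unlock(fake_is_unlocked, sleep=slept.append, rng=random.Random(0))
```

Because every process draws its own random interval, contended waiters spread out instead of retrying at the same moments, and the cap bounds how long any one of them sleeps after the lock is released.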
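
Commit e19af960e3 above relies on the funnel-shift-right operation. The sketch below models the 32-bit case in plain Python, assuming the semantics documented for `llvm.fshr`: concatenate the two operands, shift right by the amount modulo the bit width, and keep the low half. The helper names `fshr32` and `rotr32` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def fshr32(hi, lo, shift):
    """Model of a 32-bit funnel shift right.

    Concatenates hi:lo into 64 bits, shifts right by shift mod 32,
    and returns the low 32 bits of the result.
    """
    shift %= 32
    wide = ((hi & MASK32) << 32) | (lo & MASK32)
    return (wide >> shift) & MASK32


def rotr32(x, shift):
    """A rotate right is a funnel shift with both operands equal."""
    return fshr32(x, x, shift)


aligned = fshr32(0x12345678, 0x9ABCDEF0, 8)
rotated = rotr32(0x12345678, 8)
```

A rotate is the special case where both inputs are the same value, whereas the builtin in the commit takes two independent sources.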


193797 Commits