llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 04:32:44 +01:00

Author	SHA1	Message	Date
Tyker	6526abbd2a	Ignore/Drop droppable uses for code-sinking in InstCombine Summary: This patch allows code-sinking in InstCombine to be performed when instructions have uses in llvm.assume. Uses are considered droppable when it is preferable to modify the User such that the use disappears rather than to prevent a transformation because of the use. For now, uses are considered droppable if they are in an llvm.assume. Reviewers: jdoerfert, nikic, spatel, lebedev.ri, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73832	2020-03-25 20:42:52 +01:00
Alina Sbirlea	a5d6fe1986	[CFG/BasicBlock] Rename succ_const to const_succ. [NFC] Summary: Rename `succ_const_iterator` to `const_succ_iterator` and `succ_const_range` to `const_succ_range` for consistency with the predecessor iterators, and the corresponding iterators in MachineBasicBlock. Reviewers: nicholas, dblaikie, nlewycky Subscribers: hiraditya, bmahjour, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75952	2020-03-25 12:40:55 -07:00
Heejin Ahn	58a2f8e25d	[WebAssembly] Move event section before global section Summary: https://github.com/WebAssembly/exception-handling/issues/98 Also this moves many parts of code to make code align with the section order, even if they don't affect the output. Reviewers: tlively Subscribers: dschuff, sbc100, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76752	2020-03-25 11:49:03 -07:00
Nico Weber	47138b2f83	Suppress a few -Wunreachable-code warnings. No behavior change. Also fix a comment to match reality.	2020-03-25 13:55:42 -04:00
Simon Pilgrim	18ad2e6d54	[X86][AVX] Combine shuffles to TRUNCATE/VTRUNC patterns Add support for combining shuffles to AVX512 truncate instructions - another step toward fixing D56387/D66004. It also fixes SKX code on PR31443. We could probably extend this further to handle non-VLX truncation cases.	2020-03-25 17:41:51 +00:00
Gil Rapaport	41cef53616	[LV] Replace stored value with a VPValue (NFCI) InnerLoopVectorizer's code called during VPlan execution still relies on original IR's def-use relations to decide which vector code to generate, limiting VPlan transformations ability to modify def-use relations and still have ILV generate the vector code. This commit introduces a VPValue for VPWidenMemoryInstructionRecipe to use as the stored value. The recipe is generated with a VPValue wrapping the stored value of the scalar store. This reduces ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. Differential Revision: https://reviews.llvm.org/D76373	2020-03-25 19:36:55 +02:00
Tyker	e193e13b1b	[NFC] Rename function to match Coding Convention and fix typo in KnowledgeRetention	2020-03-25 18:31:13 +01:00
Mikhail Maltsev	3301f6f1d2	[ARM,CDE] Implement predicated Q-register CDE intrinsics Summary: This patch implements the following CDE intrinsics: T __arm_vcx1q_m(int coproc, T inactive, uint32_t imm, mve_pred_t p); T __arm_vcx2q_m(int coproc, T inactive, U n, uint32_t imm, mve_pred_t p); T __arm_vcx3q_m(int coproc, T inactive, U n, V m, uint32_t imm, mve_pred_t p); T __arm_vcx1qa_m(int coproc, T acc, uint32_t imm, mve_pred_t p); T __arm_vcx2qa_m(int coproc, T acc, U n, uint32_t imm, mve_pred_t p); T __arm_vcx3qa_m(int coproc, T acc, U n, V m, uint32_t imm, mve_pred_t p); The intrinsics are not part of the released ACLE spec, but internally at Arm we have reached consensus to add them to the next ACLE release. Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen Reviewed By: simon_tatham Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76610	2020-03-25 17:08:19 +00:00
Yvan Roux	c9838f5f9f	[ARM] Move ConstantIsland and LowOverheadLoops Passes. Move ARM ConstantIsland and LowOverheadLoops passes later in the pipeline such that they will be run after the upcoming Machine Outlining pass. Differential Revision: https://reviews.llvm.org/D76065	2020-03-25 16:49:21 +01:00
cdevadas	63a2b80308	[AMDGPU] Add SIPreEmitPeephole pass. This pass can handle all the optimization opportunities found just before code emission. Presently it includes the handling of vcc branch optimization that was handled earlier in SIInsertSkips. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D76712	2020-03-25 15:35:35 +00:00
Jonas Paulsson	8975f60913	[SystemZ] Improve foldMemoryOperandImpl() A spilled load of an immediate can use MVHI/MVGHI instead. A compare of a spilled register against an immediate can use CHSI/CGHSI. A logical compare can use CLFHSI/CLGHSI. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D76055	2020-03-25 16:21:08 +01:00
Sean Fertile	f0b5c97abf	[PowerPC][AIX] ByVal formal arguments in a single register. Adds support for passing ByVal formal arguments as long as they fit in a single register. Differential Revision: https://reviews.llvm.org/D76401	2020-03-25 11:09:40 -04:00
Kerry McLaughlin	1b630dbb38	[AArch64][SVE] Add SVE intrinsics for masked loads & stores Summary: Implements the following intrinsics for contiguous loads & stores: - @llvm.aarch64.sve.ld1 - @llvm.aarch64.sve.st1 Reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, dancgr, rengolin Reviewed By: cameron.mcinally Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76688	2020-03-25 11:48:40 +00:00
Sam Parker	6f6eed9671	[ARM][MVE] Add HorizontalReduction flag Add a target flag for instructions that reduce into one, or more, scalar reg(s), including variants of: - VADDV - VABAV - VMINV/VMAXV - VMLADAV Differential Revision: https://reviews.llvm.org/D76683	2020-03-25 11:12:03 +00:00
Kazushi (Jam) Marukawa	69765715bb	[VE] Change name of enum to CondCode Summary: Change enum name for condition codes from CondCodes to CondCode. Reviewers: arsenm, simoll, k-ishizaka Reviewed By: arsenm Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm, #ve Differential Revision: https://reviews.llvm.org/D76747	2020-03-25 09:20:05 +01:00
Juneyoung Lee	1f40e952e7	Minor fixes to a comment in CodeGenPrepare	2020-03-25 16:34:43 +09:00
Adrian Prantl	49996d074b	Add an -object-path-prefix option to dsymutil to remap object file paths (but no source paths) before processing. This is meant to be used for Clang objects where the module cache location was remapped using ``-fdebug-prefix-map``; to help dsymutil find the Clang module cache. <rdar://problem/55685132> Differential Revision: https://reviews.llvm.org/D76391	2020-03-24 17:13:42 -07:00
Matt Arsenault	c9dcead077	GlobalISel: Introduce bitcast legalize action For some operations, the type is unimportant and only the number of bits matters. For example I don't want to treat <4 x s8> as a legal type, but I also don't want to decompose loads of this into smaller pieces to get legal register types. On AMDGPU in SelectionDAG, we legalize a number of operations (most notably load and store) by coercing all types to vectors of i32. For GlobalISel, I'm trying very hard to avoid doing this for every type, but I don't think this strategy can be completely avoided. I'm trying to avoid bitcasts for any legitimately legal type we can operate on, since the intervening bitcasts have proven to be a hassle. For loads, I think I can get away without ever casting the result type, and handling any arbitrary bitwidth during selection (I will eventually want new tablegen support to help with this, rather than having to add every possible type as legal). The unmerge required to do anything with the value should expand to the expected shifts. This is trickier for stores, since it would now require handling a wide array of truncates during selection which I don't want. Future potentially interesting cases are for vector indexing, where sub-dword types should be indexed in s32 pieces.	2020-03-24 19:33:33 -04:00
Nikita Popov	4edeaf3fd8	[LVI] Convert some checks to assertions; NFC solveBlockValue() should only be called if the value isn't cached yet. Similarly, it does not make sense to "solve" a constant.	2020-03-24 23:11:13 +01:00
Amara Emerson	9cd6b80807	[AArch64][GlobalISel] Don't localize TLS G_GLOBAL_VALUEs on Darwin. On Darwin these need to be selected into a function call for the TLS address lookup. As a result, they can't be moved below a physreg write, which happens in call sequences. In the long term, we should have some mechanism in the localizer to prevent localizing into target-specific atomic instruction sequences. rdar://60056248 Differential Revision: https://reviews.llvm.org/D76652	2020-03-24 13:35:50 -07:00
Johannes Doerfert	31d276a1c1	[Attributor] Use knowledge retained in llvm.assume (operand bundles) This patch integrates operand bundle llvm.assumes [0] with the Attributor. Most IRAttributes will now look at uses of the associated value and if there are llvm.assume operand bundle uses with the right tag we will check if they are in the must-be-executed-context (around the context instruction). Droppable users, currently only llvm::assume, are handled specially in some places now as well. [0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D74888	2020-03-24 15:33:40 -05:00
Craig Topper	7db5b91467	[X86] Disable autoupgrade support for avx512.mask.broadcasti32x2.* and avx512.mask.broadcastf32x2.*. These intrinsics take a v4i32/v4f32 input and are supposed to broadcast elements 0 and 1. Instead the autoupgrade code was broadcasting elements 0, 1, 2, and 3. I could fix the autoupgrade, but since it's been broken for years it seemed better just to steer anyone still trying to use it away completely.	2020-03-24 12:35:24 -07:00
Reid Kleckner	a7424b53fb	Re-land "Avoid emitting unreachable SP adjustments after `throw`" This reverts commit 4e0fe038f438ae1679eae9e156e1f248595b2373. Re-lands 65b21282c710afe9c275778820c6e3c1cf46734b. After landing 5ff5ddd0adc89f8827b345577bbb3e7eb74fc644 to add int3 into trailing unreachable blocks, we can now remove these extra stack adjustments without confusing the Win64 unwinder. See https://llvm.org/45064#c4 or X86AvoidTrailingCall.cpp for a full explanation. Fixes PR45064.	2020-03-24 12:04:43 -07:00
Vedant Kumar	baf8348499	[DWARF] Emit DW_AT_call_pc for tail calls Record the address of a tail-calling branch instruction within its call site entry using DW_AT_call_pc. This allows a debugger to determine the address to use when creating artificial frames. This creates an extra attribute + relocation at tail call sites, which constitute 3-5% of all call sites in xnu/clang respectively. rdar://60307600 Differential Revision: https://reviews.llvm.org/D76336	2020-03-24 12:01:55 -07:00
Juneyoung Lee	49bbd5d17a	[DivRemPairs] Freeze operands if they can be undef values Summary: DivRemPairs is unsound with respect to undef values. ``` // bb1: // %rem = srem %x, %y // bb2: // %div = sdiv %x, %y // --> // bb1: // %div = sdiv %x, %y // %mul = mul %div, %y // %rem = sub %x, %mul ``` If X can be undef, X should be frozen first. For example, let's assume that Y = 1 & X = undef: ``` %div = sdiv undef, 1 // %div = undef %rem = srem undef, 1 // %rem = 0 => %div = sdiv undef, 1 // %div = undef %mul = mul %div, 1 // %mul = undef %rem = sub %x, %mul // %rem = undef - undef = undef ``` http://volta.cs.utah.edu:8080/z/m7Xrx5 Same for Y. If X = 1 and Y = (undef | 1), %rem in src is either 1 or 0, but %rem in tgt can be one of many integer values. This resolves https://bugs.llvm.org/show_bug.cgi?id=42619 . This miscompilation disappears if undef value is removed, but it may take a while. DivRemPair happens pretty late during the optimization pipeline, so this optimization seemed like a better candidate to fix using freeze, without major regression, than other broken optimizations. Reviewers: spatel, lebedev.ri, george.burgess.iv Reviewed By: spatel Subscribers: wuzish, regehr, nlopes, nemanjai, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76483	2020-03-25 03:46:14 +09:00
Benjamin Kramer	43c419e4aa	[SelectionDAG] Don't crash when freezing illegal float types	2020-03-24 19:45:19 +01:00
Matt Arsenault	66c5ce183c	AMDGPU/GlobalISel: Fix smrd loads of v4i64	2020-03-24 13:44:41 -04:00
Sanjay Patel	446f29b2c2	[ValueTracking] improve undef/poison analysis for constant vectors Differential Revision: https://reviews.llvm.org/D76702	2020-03-24 13:35:47 -04:00
Hiroshi Yamauchi	2be2675a03	Revert "Include static prof data when collecting loop BBs" This reverts commit 129c911efaa492790c251b3eb18e4db36b55cbc5. Due to an internal benchmark regression.	2020-03-24 09:41:16 -07:00
David Green	b8d11aabd8	[ARM] Fold VMOVrh VLDR to LDRH This adds a simple fold to combine a VMOVrh of a load into an integer load. Similar to what is already performed for BITCAST, but needs to account for the types being of different sizes, creating a zero-extending load. Differential Revision: https://reviews.llvm.org/D76485	2020-03-24 15:51:03 +00:00
Lama	5cc4cf3bbd	[MachinePipeliner] Fix a bug in Output Dependency chains The current implementation collects all Preds/Succs of a Dep of kind Output, creating a long chain and subsequently a schedule with an unnecessarily large II. Was this done on purpose for a reason I'm missing? Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D75424	2020-03-24 14:37:50 +00:00
Simon Pilgrim	eca2dede42	[X86][SSE1] Add support for logic+movmsk patterns (PR42870) rL368506 handled the basic case, but we need to account for boolean logic patterns as well.	2020-03-24 14:28:40 +00:00
Pavel Labath	58fd4ef93a	[DWARF] Fix v5 debug_line parsing of prologues with many files Summary: The directory_count and file_name_count fields are (section 6.2.4 of DWARF5 spec) supposed to be uleb128s, not bytes. This bug meant that it was not possible to correctly parse headers with more than 128 files or directories. I've found this bug by code inspection, though the limit is so small someone would have run into it for real sooner or later. I've verified that the producer side handles many files correctly, and that we are able to parse such files after this fix. Reviewers: dblaikie, jhenderson Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76498	2020-03-24 15:11:54 +01:00
Juneyoung Lee	c23deb9eda	[SelDag] Add FREEZE Summary: - Add FREEZE node to SelDag - Lower FreezeInst (in IR) to FREEZE node - Add Legalization for FREEZE node Reviewers: qcolombet, bogner, efriedma, lebedev.ri, nlopes, craig.topper, arsenm Reviewed By: lebedev.ri Subscribers: wdng, xbolva00, Petar.Avramovic, liuz, lkail, dylanmckay, hiraditya, Jim, arsenm, craig.topper, RKSimon, spatel, lebedev.ri, regehr, trentxintong, nlopes, mkuper, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D29014	2020-03-24 23:04:58 +09:00
Florian Hahn	295234e5e3	[ConstantRange] Add initial support for binaryXor. The initial implementation just delegates to APInt's implementation of XOR for single element ranges and conservatively returns the full set otherwise. Reviewers: nikic, spatel, lebedev.ri Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D76453	2020-03-24 12:59:50 +00:00
Benjamin Kramer	57eaed442a	Make helpers static. NFC.	2020-03-24 13:43:00 +01:00
David Green	3d211b57a5	[ARM] Don't split trunc stores that can be better handled as VMOVN We deliberately split stores of the form store(truncate(larger-than-legal-type)) into two stores, allowing each store to perform part of the truncate for free. There are times however where it makes more sense to use VMOVN to de-interlace the results back into a single vector, and store that in one go. This adds a check for that situation, not splitting the store if it looks like a VMOVN can be more useful. Differential Revision: https://reviews.llvm.org/D76511	2020-03-24 08:48:52 +00:00
Sam Parker	52158b1a9d	[ARM][LowOverheadLoops] Add checks for narrowing Modify ValidateLiveOuts to track 'FalseLaneZeros' more precisely, including checks on specific operations that can generate non-zeros from zero values, e.g. VMVN. We can then check that any instructions that retain some information in their output register (all narrowing instructions) only use and def registers that always have zeros in their falsely predicated bytes, whether or not tail predication happens. Most of the logic remains the same; only the data structures and helpers have been renamed to reflect the change in logic. The key change, apart from the opcode checkers, is that the FalseZeros set now strictly contains only instructions which will always generate zeros, and not instructions that could also have their false bytes masked away later. Differential Revision: https://reviews.llvm.org/D76235	2020-03-24 08:41:48 +00:00
Sam Parker	392ec3c3a9	[ARM][MVE] Add target flag for narrowing insts Add a flag, 'RetainsPreviousHalfElement', for operations that operate on top/bottom halves of their input and only write to half of their destination, leaving the other half to retain its previous value. Differential Revision: https://reviews.llvm.org/D76608	2020-03-24 08:36:44 +00:00
Chen Zheng	3c07f2a94c	[PowerPC] fix a typo in commit 3f85134d710c Implement target hook isProfitableToHoist - typo fix.	2020-03-24 01:56:15 -04:00
Jun Ma	801dde651c	[Coroutines] Also check lifetime intrinsic for local variable when build coroutine frame Currently we move all allocas into the frame when building the coroutine frame in the CoroSplit pass. However, this can be relaxed. Since the CoroSplit pass runs after the Inline pass, we can use lifetime intrinsics for this analysis: if the scope of a lifetime intrinsic does not cross any suspend point, then rather than moving the allocas to the frame, we can just move them to the entry bb of the corresponding function. This reduces the frame size. More importantly, this also avoids a data race in a multithreaded environment. Consider a function inlined into a coroutine: it starts a thread which accesses local variables, while after inlining, the movement of the allocas to the frame also accesses them, causing a data race. Differential Revision: https://reviews.llvm.org/D75664	2020-03-24 13:41:55 +08:00
Vedant Kumar	0a33dc9cac	[GlobalOpt] Treat null-check of loaded value as use of global (PR35760) PR35760 shows an example program which, when compiled with `clang -O0` or gcc at any optimization level, prints '0'. However, llvm transforms the program in a way that causes it to print '1'. Fix the issue by having `AllUsesOfValueWillTrapIfNull` return false when analyzing a load from a global which is used by an `icmp`. This special case was untested [0] so this is just deleting dead code. An alternative fix might be to change the GlobalStatus analysis for the global to report "Stored" instead of "StoredOnce". However, "StoredOnce" is appropriate when only one value other than the initializer is stored to the global. [0] http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/Transforms/IPO/GlobalOpt.cpp.html#L662 Differential Revision: https://reviews.llvm.org/D76645	2020-03-23 22:36:09 -07:00
Jinsong Ji	a67a65a968	[NFC][RUIP] Small debug output refine Add a new line, so that we always print MI on a new line, before and after UpdateRegMask, for easier checking.	2020-03-24 03:29:45 +00:00
John McCall	93f2c3b449	Add an algorithm for performing "optimal" layout of a struct. The algorithm supports both assigning a fixed offset to a field prior to layout and allowing fields to have sizes that aren't multiples of their required alignments. This means that the well-known algorithm of sorting by decreasing alignment isn't always good enough. Still, we start with that, and only if that leaves padding around do we fall back on a greedy padding-minimizing algorithm. There is no known efficient algorithm for producing a guaranteed-minimal layout in all cases. In fact, allowing arbitrary fixed-offset fields means there's a straightforward reduction from bin-packing, making this NP-hard. But as usual with such problems, we can still efficiently produce adequate solutions to the cases that matter most to us. I intend to use this in coroutine frame layout, where the retcon lowerings very badly want to minimize total space usage, and where the switch lowering can indeed produce a header with interior padding if the promise field is highly-aligned. But it may be useful in a much wider variety of situations.	2020-03-23 23:24:48 -04:00
Johannes Doerfert	019f4fb9d9	[OpenMPOpt] Initialize value to avoid use of uninitialized memory This should fix the issue reported here: https://reviews.llvm.org/D76058#1937554	2020-03-23 19:17:19 -05:00
Jessica Paquette	6ed3fad872	[GlobalISel] Combine G_SELECTs of the form (cond ? x : x) into x When we find something like this: ``` %a:_(s32) = G_SOMETHING ... ... %select:_(s32) = G_SELECT %cond(s1), %a, %a ``` We can remove the select and just replace it entirely with `%a` because it's always going to result in `%a`. Same if we have ``` %select:_(s32) = G_SELECT %cond(s1), %a, %b ``` where we can deduce that `%a == %b`. This implements the following cases: - `%select:_(s32) = G_SELECT %cond(s1), %a, %a` -> `%a` - `%select:_(s32) = G_SELECT %cond(s1), %a, %some_copy_from_a` -> `%a` - `%select:_(s32) = G_SELECT %cond(s1), %a, %b` -> `%a` when `%a` and `%b` are defined by identical instructions This gives a few minor code size improvements on CTMark at -O3 for AArch64. Differential Revision: https://reviews.llvm.org/D76523	2020-03-23 16:46:03 -07:00
Nemanja Ivanovic	86f2aa5d7c	[PowerPC] Improve handling of some BUILD_VECTOR nodes An analysis of real world code turned up a number of patterns with BUILD_VECTOR of nodes resulting from operations on extracted vector elements for which we produce poor code. This addresses those cases. No attempt is made for completeness as that would entail a large amount of work for something that there is no evidence of in real code. Differential revision: https://reviews.llvm.org/D72660	2020-03-23 17:34:29 -05:00
Justin Hibbits	6457993f9d	[PowerPC]: e500 target can't use lwsync, use msync instead The e500 core has a silicon bug that triggers an illegal instruction program trap on any sync other than msync. Other cores will typically ignore illegal sync types, and the documentation even implies that the 'illegal' bits are ignored. Address this hardware deficiency by only using msync, like the PPC440. Differential Revision: https://reviews.llvm.org/D76614	2020-03-23 17:15:27 -05:00
Ladd Van Tol	d31a976026	Improve module.pcm lock file performance on machines with high core counts Summary: When building a large Xcode project with multiple module dependencies, and mixed Objective-C & Swift, I observed a large number of clang processes stalling at zero CPU for 30+ seconds throughout the build. This was especially prevalent on my 18-core iMac Pro. After some sampling, the major cause appears to be the lock file implementation for precompiled modules in the module cache. When the lock is heavily contended by multiple clang processes, the exponential backoff runs in lockstep, with some of the processes sleeping for 30+ seconds in order to acquire the file lock. In the attached patch, I implemented a more aggressive polling mechanism that limits the sleep interval to a max of 500ms, and randomizes the wait time. I preserved a limited form of exponential backoff. I also updated the code to use cross-platform timing, thread sleep, and random number capabilities available in C++11. On iMac Pro (2.3 GHz Intel Xeon W, 18 core): Xcode 11.1 bundled clang: 502.2 seconds (average of 5 runs) Custom clang build with LockFileManager patch applied: 276.6 seconds (average of 5 runs) This is a 1.82x speedup for this use case. On MacBook Pro (4 core 3.1GHz Intel i7): Xcode 11.1 bundled clang: 539.4 seconds (average of 2 runs) Custom clang build with LockFileManager patch applied: 509.5 seconds (average of 2 runs) As expected, machines with fewer cores benefit less from this change. 
``` Call graph: 2992 Thread_393602 DispatchQueue_1: com.apple.main-thread (serial) 2992 start (in libdyld.dylib) + 1 [0x7fff6a1683d5] 2992 main (in clang) + 297 [0x1097a1059] 2992 driver_main(int, char const*) (in clang) + 2803 [0x1097a5513] 2992 cc1_main(llvm::ArrayRef<char const>, char const, void) (in clang) + 1608 [0x1097a7cc8] 2992 clang::ExecuteCompilerInvocation(clang::CompilerInstance) (in clang) + 3299 [0x1097dace3] 2992 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (in clang) + 509 [0x1097dcc1d] 2992 clang::FrontendAction::Execute() (in clang) + 42 [0x109818b3a] 2992 clang::ParseAST(clang::Sema&, bool, bool) (in clang) + 185 [0x10981b369] 2992 clang::Parser::ParseFirstTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&) (in clang) + 37 [0x10983e9b5] 2992 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&) (in clang) + 141 [0x10983ecfd] 2992 clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec) (in clang) + 695 [0x10983f3b7] 2992 clang::Parser::ParseObjCAtDirectives(clang::Parser::ParsedAttributesWithRange&) (in clang) + 637 [0x10a9be9bd] 2992 clang::Parser::ParseModuleImport(clang::SourceLocation) (in clang) + 170 [0x10c4841ba] 2992 clang::Parser::ParseModuleName(clang::SourceLocation, llvm::SmallVectorImpl<std::__1::pair<clang::IdentifierInfo, clang::SourceLocation> >&, bool) (in clang) + 503 [0x10c485267] 2992 clang::Preprocessor::Lex(clang::Token&) (in clang) + 316 [0x1098285cc] 2992 clang::Preprocessor::LexAfterModuleImport(clang::Token&) (in clang) + 690 [0x10cc7af62] 2992 clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<std::__1::pair<clang::IdentifierInfo, clang::SourceLocation> >, clang::Module::NameVisibilityKind, bool) (in clang) + 7989 [0x10bba6535] 2992 compileAndLoadModule(clang::CompilerInstance&, clang::SourceLocation, clang::SourceLocation, clang::Module*, llvm::StringRef) (in clang) + 296 [0x10bba8318] 2992 
llvm::LockFileManager::waitForUnlock() (in clang) + 91 [0x10b6953ab] 2992 nanosleep (in libsystem_c.dylib) + 199 [0x7fff6a22c914] 2992 __semwait_signal (in libsystem_kernel.dylib) + 10 [0x7fff6a2a0f32] ``` Differential Revision: https://reviews.llvm.org/D69575	2020-03-23 14:59:39 -07:00
Matt Arsenault	3655fd0fd3	AMDGPU: Allow vectorization of round intrinsic There seems to be a small benefit to the legalized sequence for v2f16 round with packed instructions, so allow vectorizing it by reducing the cost. An unintended side effect is vectorization of f32 round also happens. The current FMA logic seems off to me, and isn't checking for packed instructions.	2020-03-23 17:00:41 -04:00
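
The lock-file commit listed above (d31a976026) describes capping the sleep interval at 500 ms, randomizing the wait time, and keeping a limited form of exponential backoff. The following is a minimal Python sketch of that general idea, not the LLVM `LockFileManager` code; the names `backoff_intervals` and `wait_for`, the 10 ms starting interval, and the 5 s timeout are assumptions made for illustration.

```python
import random
import time

MAX_INTERVAL_S = 0.5  # the 500 ms cap mentioned in the commit message


def backoff_intervals(initial_s=0.01, max_s=MAX_INTERVAL_S, rng=None):
    """Yield random sleep intervals whose upper bound doubles up to max_s."""
    rng = rng or random.Random()
    upper = initial_s  # hypothetical starting bound, not taken from the commit
    while True:
        # Randomizing the wait keeps contending processes out of lockstep.
        yield rng.uniform(0.0, upper)
        upper = min(upper * 2.0, max_s)


def wait_for(predicate, timeout_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() with jittered backoff; return True once it holds."""
    deadline = clock() + timeout_s
    for interval in backoff_intervals():
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller might pass a predicate such as `lambda: not os.path.exists(lock_path)` (with a hypothetical `lock_path`). Because each waiter draws its own random interval, several processes that start waiting at the same moment are unlikely to wake at the same moment.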


132534 Commits
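
The DWARF commit listed above (58fd4ef93a) notes that `directory_count` and `file_name_count` are ULEB128 values rather than single bytes. As a rough illustration of why a single byte is not enough, here is a small Python sketch of ULEB128 encoding and decoding; it is not the LLVM parser, and the function names are hypothetical.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 bytes."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits carry the payload
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data, offset=0):
    """Decode one ULEB128 value; return (value, offset after the value)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated ULEB128 value")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, offset
        shift += 7
```

Values up to 127 fit in one byte, while 128 already needs two (`0x80 0x01`), so a reader that consumes only one byte misreads any larger count.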