llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 11:13:28 +01:00

Author	SHA1	Message	Date
LLVM GN Syncbot	594960eb50	[gn build] Port e69e551e0e5	2020-12-18 13:00:09 +00:00
Kerry McLaughlin	a9d5f92f0d	[SVE][CodeGen] Vector + immediate addressing mode for masked gather/scatter This patch extends LowerMGATHER/MSCATTER to make use of the vector + reg/immediate addressing modes for scalable masked gathers & scatters. selectGatherScatterAddrMode checks if the base pointer is null, in which case we can swap the base pointer and the index, e.g. getelementptr nullptr, <vscale x N x T> (splat(%offset)) + %indices) -> getelementptr %offset, <vscale x N x T> %indices Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D93132	2020-12-18 11:56:36 +00:00
Simon Pilgrim	28cf5e1f81	[X86][AVX] Replace extract_subvector(broadcast(), 0) folds with generic SimplifyDemandedVectorEltsForTargetNode handling. Simplifies a few more cases, notably shuffle demanded elts cases.	2020-12-18 11:51:10 +00:00
Carl Ritson	b988d0d807	[AMDGPU][NFC] Remove unused Hi16Elt definition	2020-12-18 20:38:54 +09:00
Lucas Prates	bf2fbafd5f	[AArch64] Add support for the SPE-EEF feature This is an addition to the existing Statistical Profiling extension, which introduces an extra system register that is enabled by the new 'spe-eef' subtarget feature. Patch written by Simon Tatham. Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D92391	2020-12-18 11:11:56 +00:00
Lucas Prates	60c2a88e72	[AArch64] Add support for the Branch Record Buffer extension This introduces asm support for the Branch Record Buffer extension, through the new 'brbe' subtarget feature. It consists of a new set of system registers that enable the handling of branch records. Patch written by Simon Tatham. Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D92389	2020-12-18 11:11:06 +00:00
Carl Ritson	d466c5273e	[AMDGPU][NFC] Document high parameter of f16 interp intrinsics	2020-12-18 19:59:13 +09:00
Cullen Rhodes	b434082b2a	[TTI] Add supportsScalableVectors target hook This is split off from D91718 and adds a new target hook supportsScalableVectors that can be queried to check if scalable vectors are supported by the backend. For AArch64 this returns true if SVE is enabled. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D93060	2020-12-18 10:37:01 +00:00
Bjorn Pettersson	3774e2781f	Add intrinsics for saturating float to int casts This patch adds support for the fptoui.sat and fptosi.sat intrinsics, which provide basically the same functionality as the existing fptoui and fptosi instructions, but will saturate (or return 0 for NaN) on values unrepresentable in the target type, instead of returning poison. Related mailing list discussion can be found at: https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ The intrinsics have overloaded source and result type and support vector operands: i32 @llvm.fptoui.sat.i32.f32(float %f) i100 @llvm.fptoui.sat.i100.f64(double %f) <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f) // etc On the SelectionDAG layer two new ISD opcodes are added, FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands and one result. The second operand is an integer constant specifying the scalar saturation width. The idea here is that initially the second operand and the scalar width of the result type are the same, but they may change during type legalization. For example: i19 @llvm.fptosi.sat.i19.f32(float %f) // builds i19 fp_to_sint_sat f, 19 // type legalizes (through integer result promotion) i32 fp_to_sint_sat f, 19 I went for this approach, because saturated conversion does not compose well. There is no good way of "adjusting" a saturating conversion to i32 into one to i19 short of saturating twice. Specifying the saturation width separately allows directly saturating to the correct width. There are two baseline expansions for the fp_to_xint_sat opcodes. If the integer bounds can be exactly represented in the float type and fminnum/fmaxnum are legal, we can expand to something like: f = fmaxnum f, FP(MIN) f = fminnum f, FP(MAX) i = fptoxi f i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN If the bounds cannot be exactly represented, we expand to something like this instead: i = fptoxi f i = select f ult FP(MIN), MIN, i i = select f ogt FP(MAX), MAX, i i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN It should be noted that this expansion assumes a non-trapping fptoxi. Initial tests are for AArch64, x86_64 and ARM. This exercises all of the scalar and vector legalization. ARM is included to test float softening. Original patch by @nikic and @ebevhan (based on D54696). Differential Revision: https://reviews.llvm.org/D54749	2020-12-18 11:09:41 +01:00
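The saturating semantics described in the entry above (clamp to the integer range, return 0 for NaN, with the saturation width given separately from the result type) can be modelled in a few lines. This is only an illustrative sketch in Python, not LLVM code; the function name `fp_to_int_sat` and its parameters are hypothetical.

```python
import math

def fp_to_int_sat(f: float, width: int, signed: bool) -> int:
    """Model of a saturating float-to-int conversion to `width` bits.

    Hypothetical helper for illustration; it mirrors the behaviour described
    in the commit message (saturate out-of-range values, map NaN to 0).
    """
    if math.isnan(f):
        return 0  # NaN maps to 0 rather than poison
    if signed:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    else:
        lo, hi = 0, (1 << width) - 1
    if f <= lo:
        return lo  # saturate at the minimum (also covers -inf)
    if f >= hi:
        return hi  # saturate at the maximum (also covers +inf)
    return math.trunc(f)  # in range: round toward zero, like fptoui/fptosi

# Saturating directly to 19 bits gives the i19 maximum. Saturating to 32 bits
# first and then keeping the low 19 bits would not, which is why the width is
# carried as a separate operand.
direct = fp_to_int_sat(1e9, 19, signed=True)
wrapped = fp_to_int_sat(1e9, 32, signed=True) & ((1 << 19) - 1)
```

Here `direct` is the largest signed 19-bit value, while `wrapped` is an unrelated bit pattern, matching the commit's remark that saturating conversions do not compose.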
Yevgeny Rouban	80a4f7c734	[IndVars] A test for adding trunc instructions to unwind blocks Differential Revision: https://reviews.llvm.org/D93521 Reviewed By: skatkov	2020-12-18 17:08:26 +07:00
Kazu Hirata	e0c0bfdedc	[InlineCost] Implement cost-benefit-based inliner This patch adds an alternative cost metric for the inliner that takes both the cost (i.e. size) and cycle count savings into account. Without this patch, we decide to inline a given call site if the size of inlining the call site is below the threshold that is computed according to the hotness of the call site. This patch adds a new cost metric, turned off by default, to take over the handling of hot call sites. Specifically, with the new cost metric, we decide to inline a given call site if the ratio of cycle savings to size exceeds a threshold. The cycle savings are computed from call site costs, parameter propagation, folded conditional branches, etc, all weighted by their respective profile counts. The size is primarily the callee size, but we subtract call site costs and the size of basic blocks that are never executed. The new cost metric implicitly takes advantage of the machine function splitter recently introduced by Snehasish Kumar, which dramatically reduces the cost of duplicating (e.g. inlining) cold basic blocks by placing cold basic blocks of hot functions in the .text.split section. We evaluated the new cost metric on clang bootstrap and SPECInt 2017. For clang bootstrap, we observe 0.69% runtime improvement. For SPECInt we report the change in IntRate for the C/C++ benchmarks. All benchmarks apart from perlbench and omnetpp improve, on average by 0.21% with the max for mcf at 1.96%. Benchmark % Change 500.perlbench_r -0.45 502.gcc_r 0.13 505.mcf_r 1.96 520.omnetpp_r -0.28 523.xalancbmk_r 0.49 525.x264_r 0.00 531.deepsjeng_r 0.00 541.leela_r 0.35 557.xz_r 0.21 Differential Revision: https://reviews.llvm.org/D92780	2020-12-18 00:37:24 -08:00
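The inlining rule described in the entry above (inline when the ratio of profile-weighted cycle savings to size exceeds a threshold) can be sketched as follows. This is a toy model under stated assumptions, not the InlineCost implementation: the function names, the input shapes, and the treatment of a non-positive size are all hypothetical.

```python
def cycle_savings(items):
    """Sum per-item cycle savings, each weighted by its profile count.

    `items` is a list of (cycles_saved, profile_count) pairs (assumed shape).
    """
    return sum(cycles * count for cycles, count in items)

def effective_size(callee_size, call_site_cost, cold_block_sizes):
    """Callee size minus the call site cost and never-executed blocks."""
    return callee_size - call_site_cost - sum(cold_block_sizes)

def should_inline(savings, size, threshold):
    """Inline when savings per unit of size exceeds the threshold."""
    if size <= 0:
        # Assumption for this sketch only: inlining that does not grow
        # the code is always accepted.
        return True
    return savings / size > threshold

savings = cycle_savings([(2, 100), (5, 10)])
size = effective_size(120, 25, [30, 15])
decision = should_inline(savings, size, threshold=4.0)
```

With these made-up numbers the savings are 250 cycles against an effective size of 50, so the ratio of 5 clears a threshold of 4.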
Jan Svoboda	da37def107	[clang][cli] Convert Analyzer option string based options to new option parsing system Depends on D84185 Reviewed By: dexonsmith Original patch by Daniel Grumberg. Differential Revision: https://reviews.llvm.org/D84186	2020-12-18 08:56:06 +01:00
QingShan Zhang	4c07a9f4d6	[PowerPC] Select the D-Form load if we know its offset meets the requirement LD/STD and similar instructions are selected only when the alignment of the load/store is >= 4, to deal with the case where the offset might not be known (i.e. relocations). That means we have to select the X-Form load for %0 = load i64, i64* %arrayidx, align 2 In fact, we can still select the D-Form load if the offset is known. So, we only query the load/store alignment when we don't know if the offset is a multiple of 4. Reviewed By: jji, Nemanjai Differential Revision: https://reviews.llvm.org/D93099	2020-12-18 07:27:26 +00:00
Mircea Trofin	a76e040629	[NFC][utils] Factor remaining APIs under FunctionTestBuilder Finishing the refactoring started in D93413. Differential Revision: https://reviews.llvm.org/D93506	2020-12-17 22:13:14 -08:00
Yevgeny Rouban	011d4cc599	[IndVars] Fix adding trunc instructions to unwind blocks Truncate instruction must not be inserted before landing pads. The insertion point is fixed.	2020-12-18 12:52:23 +07:00
Kazu Hirata	e527e2e6d7	[IVDescriptors] Remove getConsecutiveDirection (NFC) The last use of the function was removed on Sep 18, 2016 in commit 5f8cc0c3469ba3a7aa440b43aaababa3a6274213. The function was later moved to llvm/lib/Analysis/IVDescriptors.cpp on Sep 12, 2018 in commit 7e98d69847aefb1028aaa7131b508f4b4e9896ae.	2020-12-17 20:19:15 -08:00
Kazu Hirata	4088b88d51	[Transforms] Use llvm::erase_if (NFC)	2020-12-17 19:53:10 -08:00
Hsiangkai Wang	d53504a97c	[RISCV] Remove NoVReg to avoid compile warning messages.	2020-12-18 11:37:47 +08:00
Rong Xu	be7baa3c43	Fix clang-ppc64le-rhel buildbot build error Fix buildbot build error due to commit 3733463d: [IR][PGO] Add hot func attribute and use hot/cold attribute in func section	2020-12-17 19:14:43 -08:00
Rong Xu	92b9137fcb	[IR][PGO] Add hot func attribute and use hot/cold attribute in func section Clang FE currently has hot/cold function attributes, but we only have the cold function attribute in LLVM IR. This patch adds support for the hot function attribute to LLVM IR. This attribute will be used in setting function section prefix/suffix. Currently the .hot and .unlikely suffixes are only added in PGO (Sample PGO) compilation (through isFunctionHotInCallGraph and isFunctionColdInCallGraph). This patch changes the behavior. The new behavior is: (1) If the user annotates a function as hot or isFunctionHotInCallGraph is true, this function will be marked as hot. Otherwise, (2) If the user annotates a function as cold or isFunctionColdInCallGraph is true, this function will be marked as cold. The changes are: (1) user annotated function attribute will be used in setting function section prefix/suffix. (2) hot attribute overwrites profile count based hotness. (3) profile count based hotness overwrites user annotated cold attribute. The intention for these changes is to provide the user a way to mark certain functions as hot in cases where training input is hard to cover all the hot functions. Differential Revision: https://reviews.llvm.org/D92493	2020-12-17 18:41:12 -08:00
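The precedence rules listed in the entry above can be written as a small decision function. This is a sketch of the described behaviour only; the function name and boolean parameters are hypothetical stand-ins for the user attributes and the isFunctionHotInCallGraph / isFunctionColdInCallGraph queries.

```python
def section_hotness(annotated_hot, annotated_cold, profile_hot, profile_cold):
    """Return "hot", "cold", or None following the commit's two rules.

    Rule (1): a hot annotation or a hot profile marks the function hot.
    Rule (2): otherwise, a cold annotation or a cold profile marks it cold.
    """
    if annotated_hot or profile_hot:
        return "hot"
    if annotated_cold or profile_cold:
        return "cold"
    return None  # neither rule applies: no hot/cold marking

# A hot annotation wins over a cold profile, and a hot profile wins over
# a cold annotation, as stated in changes (2) and (3) of the commit.
hot_over_cold_profile = section_hotness(True, False, False, True)
hot_profile_over_cold_attr = section_hotness(False, True, True, False)
```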
Monk Chiang	f958745fcd	[RISCV] Define vsadd/vsaddu/vssub/vssubu intrinsics. We worked with @rogfer01 from BSC to come up with this patch. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com> Co-Authored-by: Monk Chiang <monk.chiang@sifive.com> Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D93366	2020-12-18 10:24:24 +08:00
Layton Kifer	787313e887	[DAGCombiner] Improve shift by select of constant Clean up a TODO, to support folding a shift of a constant by a select of constants, on targets with different shift operand sizes. Reviewed By: RKSimon, lebedev.ri Differential Revision: https://reviews.llvm.org/D90349	2020-12-18 02:21:42 +00:00
Andrew Litteken	603148b130	[IRSim][IROutliner] Adding InstVisitor to disallow certain operations. This adds a custom InstVisitor to return false on instructions that should not be allowed to be outlined. These match the illegal instructions in the IRInstructionMapper with exception of the addition of the llvm.assume intrinsic. Tests all the tests marked: illegal-*-.ll with a test for each kind of instruction that has been marked as illegal. Reviewers: jroelofs, paquette Differential Revisions: https://reviews.llvm.org/D86976	2020-12-17 19:33:57 -06:00
Zakk Chen	a97267c772	[RISCV] Define vlse/vsse intrinsics. Define vlse/vsse intrinsics and lower to V instructions. We worked with @rogfer01 from BSC to come up with this patch. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Zakk Chen <zakk.chen@sifive.com> Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D93445	2020-12-17 17:00:01 -08:00
Nikita Popov	b8f0d648f9	[DSE] Add test for potential caching bug (NFC) This one would miscompile if read-clobber checks switched to using the EarlierAccess location, but the read cache was retained.	2020-12-17 23:35:01 +01:00
Sanjay Patel	55eacfb858	[VectorCombine] add tests for gep load with cast; NFC	2020-12-17 16:40:55 -05:00
Roman Lebedev	a354d00a71	[SimplifyCFG] Teach simplifyUnreachable() to preserve DomTree Pretty boring, removeUnwindEdge() already knows how to update DomTree, so if we are to call it, we must first flush our own pending updates; otherwise, we just stop predecessors from branching to us, and for certain predecessors, stop their predecessors from branching to them also.	2020-12-18 00:37:22 +03:00
Roman Lebedev	db36cde4ed	[SimplifyCFG] ConstantFoldTerminator() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a number of tests, all of which are marked as such so that they do not regress.	2020-12-18 00:37:22 +03:00
Roman Lebedev	2f6c6c6ca4	[SimplifyCFG] DeleteDeadBlock() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-18 00:37:21 +03:00
Bangtian Liu	33b4e1043e	Revert "Ensure SplitEdge to return the new block between the two given blocks" This reverts commit d20e0c3444ad9ada550d9d6d1d56fd72948ae444.	2020-12-17 21:00:37 +00:00
Nico Weber	3aa1f027dc	[gn build] Link with -Wl,--gdb-index when linking with LLD For full-debug-info (is_debug=true / symbol_level=2 builds), this makes linking 15% slower, but gdb startup 1500% faster (for lld: link time 3.9s->4.4s, gdb load time >30s->2s). For link time, I ran bench.py -o {noindex,index}.txt \ sh -c 'rm out/gn/bin/lld && ninja -C out/gn lld' and then `ministat noindex.txt index.txt`: ``` x noindex.txt + index.txt N Min Max Median Avg Stddev x 5 3.784461 4.0200169 3.8452811 3.8754988 0.089902595 + 5 4.32496 4.6058481 4.3361208 4.4141198 0.12288267 Difference at 95.0% confidence 0.538621 +/- 0.15702 13.8981% +/- 4.05161% (Student's t, pooled s = 0.107663) ``` For gdb load time I loaded the crash in PR48392 with gdb -ex r --args ../out/gn/bin/ld64.lld.darwinnew @response.txt and just stopped the time until the crash got displayed with a stopwatch a few times. So the speedup there is less precise, but it's so pronounced that that's ok (loads ~instantly with the patch, takes a very long time without it). Only doing this for LLD because I haven't tried it with other linkers. Differential Revision: https://reviews.llvm.org/D92844	2020-12-17 15:39:00 -05:00
Johannes Doerfert	7bcfe2157e	[OpenMP][NFC] Provide a new remark and documentation If a GPU function is externally reachable we give up trying to find the (unique) kernel it is called from. This can hinder optimizations. Emit a remark and explain mitigation strategies. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D93439	2020-12-17 14:38:26 -06:00
Nico Weber	5a3cb1f76f	[gn build] (manually) merge f4c8b8031800	2020-12-17 15:09:51 -05:00
Nikita Popov	2810fecbf2	[DSE] Add more tests for read clobber location (NFC)	2020-12-17 21:03:00 +01:00
Arthur Eubanks	517fe7c42b	[test] Factor out creation of copy of SCC Nodes into function Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D93434	2020-12-17 11:39:34 -08:00
Tony	062d45d53c	[NFC][AMDGPU] Reorganize description of scratch handling Differential Revision: https://reviews.llvm.org/D93440	2020-12-17 19:33:14 +00:00
Valentin Clement	328670a855	[openmp] Remove clause from OMPKinds.def and use OMP.td info Remove the OpenMP clause information from the OMPKinds.def file and use the information from the new OMP.td file. There is now a single source of truth for the directives and clauses. To avoid generating lots of specific small code from tablegen, the macros previously used in OMPKinds.def are generated almost identically. This can be polished and possibly removed in a further patch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D92955	2020-12-17 14:08:12 -05:00
Baptiste Saleil	5cbcf5677b	[PowerPC] Rename the vector pair intrinsics and builtins to replace the _mma_ prefix by _vsx_ On PPC, the vector pair instructions are independent from MMA. This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix by _vsx_ in their names. We also move the vector pair type/intrinsic/builtin tests to their own files. Differential Revision: https://reviews.llvm.org/D91974	2020-12-17 13:19:27 -05:00
LLVM GN Syncbot	13f5c4b392	[gn build] Port dae34463e3e	2020-12-17 17:28:45 +00:00
Andrew Litteken	cdd15bf157	[IRSim][IROutliner] Adding the extraction basics for the IROutliner. Extracting the similar regions is the first step in the IROutliner. Using the IRSimilarityIdentifier, we collect the SimilarityGroups and sort them by how many instructions will be removed. Each IRSimilarityCandidate is used to define an OutlinableRegion. Each region is ordered by their occurrence in the Module and the regions that are not compatible with previously outlined regions are discarded. Each region is then extracted with the CodeExtractor into its own function. We test that we correctly extract in: test/Transforms/IROutliner/extraction.ll test/Transforms/IROutliner/address-taken.ll test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Recommit of bf899e891387d07dfd12de195ce2a16f62afd5e0 fixing memory leaks. Reviewers: paquette, jroelofs, yroux Differential Revision: https://reviews.llvm.org/D86975	2020-12-17 11:27:26 -06:00
Arthur Eubanks	3dc6698f98	[gn build] Add symbol_level to adjust debug info level is_debug by default makes symbol_level = 2 and !is_debug means by default symbol_level = 0. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D92958	2020-12-17 09:20:53 -08:00
Fangrui Song	803a9c9668	[LangRef] Update new ssp/sspstrong/sspreq semantics after D91816 Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D93422	2020-12-17 09:16:37 -08:00
Valentin Clement	d6763a4688	[flang][openacc] Enforce restriction on routine directive and clauses This patch adds some checks for the restrictions on the routine directive and fixes several issues at the same time. Validity tests have been added in a separate file from acc-clause-validity.f90 since this one became quite large. I plan to split the larger file once ongoing reviews are done. Reviewed By: sameeranjoshi Differential Revision: https://reviews.llvm.org/D92672	2020-12-17 11:33:34 -05:00
Nabeel Omer	4ce8799006	[DebugInfo] Avoid re-ordering assignments in LCSSA The LCSSA pass makes use of a function insertDebugValuesForPHIs() to propagate dbg.value() intrinsics to newly inserted PHI instructions. Faulty behaviour occurs when the parent PHI of a newly inserted PHI is not the most recent assignment to a source variable. insertDebugValuesForPHIs ends up propagating a value that isn't the most recent assignment. This change removes the call to insertDebugValuesForPHIs() from LCSSA, preventing incorrect dbg.value intrinsics from being propagated. Propagating variable locations between blocks will occur later, during LiveDebugValues. Differential Revision: https://reviews.llvm.org/D92576	2020-12-17 16:17:32 +00:00
Jinsong Ji	2af4f68200	[PowerPC][NFC] Cleanup PPCCTRLoopsVerify pass The PPCCTRLoop pass has been moved to HardwareLoops, so the comments and some useless code are deprecated now. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D93336	2020-12-17 11:16:33 -05:00
Jon Chesterfield	0309056e6e	[amdgpu] Default to code object v3 v4 is not yet readily available, and doesn't appear to be implemented in the back end Reviewed By: t-tye, yaxunl Differential Revision: https://reviews.llvm.org/D93258	2020-12-17 16:09:33 +00:00
Bangtian Liu	a2ec1d8ec2	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-17 16:00:15 +00:00
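The difference between the two splitting strategies in the entry above can be reproduced on a toy control-flow graph. This is a simplified model, not the LLVM API: blocks are dictionary keys, `succs` maps a block to its successor list, `insts` maps a block to its non-terminator instructions, and both helper names are hypothetical. It assumes the split point is the block's first instruction, as in the commit's example.

```python
def split_block(succs, insts, bb, new):
    """Split `bb` at its first instruction; `new` is placed after `bb`."""
    succs[new] = succs[bb]   # the new block inherits the outgoing edges
    insts[new] = insts[bb]   # and all of the instructions
    succs[bb] = [new]        # the old block just branches to the new one
    insts[bb] = []

def split_block_before(succs, insts, bb, new):
    """Split `bb` at its first instruction; `new` is placed before `bb`."""
    for block in succs:
        # Redirect every edge that targeted `bb` to the new block.
        succs[block] = [new if t == bb else t for t in succs[block]]
    succs[new] = [bb]        # the new, empty block branches to `bb`
    insts[new] = []

def make_cfg():
    succs = {"bb0": ["bb1"], "bb1": ["bb2"], "bb2": []}
    insts = {"bb0": [], "bb1": ["%0 = mul i32 1, 2"], "bb2": []}
    return succs, insts

after_succs, after_insts = make_cfg()
split_block(after_succs, after_insts, "bb1", "bb1.split")

before_succs, before_insts = make_cfg()
split_block_before(before_succs, before_insts, "bb1", "bb1.split")
```

In the first result the new block sits after `bb1` (bb0 -> bb1 -> bb1.split -> bb2); in the second it sits on the edge from `bb0` (bb0 -> bb1.split -> bb1 -> bb2), which is the shape SplitEdge needs.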
Amy Huang	ec802f1356	[llvm-symbolizer][Windows] Add start line when searching in line table sections. Fixes issue where if a line section doesn't start with a line number then the addresses at the beginning of the section don't have line numbers. For example, for a line section like this ``` 0001:00000010-00000014, line/column/addr entries = 1 7 00000013 ! ``` a line number wouldn't be found for addresses from 10 to 12. This matches behavior when using the DIA SDK. Differential Revision: https://reviews.llvm.org/D93306	2020-12-17 07:57:36 -08:00
Simon Pilgrim	1c482de06b	[SampleFDO] Fix uninitialized field warnings. NFCI. Seems to have been caused by D93254 which added the SecHdrTableEntry::LayoutIndex field.	2020-12-17 15:51:26 +00:00
Valentin Clement	03ce68ffad	[flang][openacc] Update serial construct clauses for OpenACC 3.1 Update the allowed clauses for the SERIAL construct for the new OpenACC 3.1 specification. Reviewed By: sameeranjoshi Differential Revision: https://reviews.llvm.org/D92123	2020-12-17 10:50:47 -05:00


208470 Commits