mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 02:52:53 +02:00
Commit Graph

207569 Commits

Author SHA1 Message Date
Bardia Mahjour
b7eee47753 Revert "[LV] Epilogue Vectorization with Optimal Control Flow"
This reverts commit 9c5504adceb544d9954ddb8ff3035a414f4b1423.
Reverting to investigate build failure in http://lab.llvm.org:8011/#/builders/98/builds/1461/steps/9
2020-12-01 12:50:36 -05:00
Rahman Lavaee
1916d7b3c4 Let .llvm_bb_addr_map section use the same unique id as its associated .text section.
Currently, `llvm_bb_addr_map` sections are generated per section name because we use
the `LinkedToSymbol` argument of getELFSection. This causes the address map tables of functions
to be grouped into the same section when `-function-sections=true -unique-section-names=false`, which is not
the intended behaviour. This patch lets the unique id of every `.text` section propagate to the associated
`.llvm_bb_addr_map` section.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D92113
2020-12-01 09:21:00 -08:00
Nikita Popov
f105ee91d5 [BasicAA] Add test for suboptimal result with unknown sizes (NFC) 2020-12-01 18:20:34 +01:00
Bardia Mahjour
63b138b338 [LV] Epilogue Vectorization with Optimal Control Flow
This is yet another attempt at providing support for epilogue
vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none
and reviews D30247 and D88819.

Similar to D88819, this patch achieves epilogue vectorization by
executing a single VPlan twice: once on the main loop and a second
time on the epilogue loop (using a different VF). However, it is able
to handle more loops and generates better control flow for cases
where the trip count is too small to execute any code in vector
form.
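
For illustration, a minimal candidate loop in LLVM IR (a sketch; the
function and value names are hypothetical), where the vectorizer can now
emit e.g. a VF=4 main vector loop followed by a VF=2 vectorized epilogue
instead of a purely scalar remainder:

  define void @vadd(float* %a, float* %b, i64 %n) {
  entry:                      ; assumes %n > 0
    br label %loop

  loop:                       ; vectorized twice: once at the main VF, once at the epilogue VF
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %pa = getelementptr inbounds float, float* %a, i64 %i
    %pb = getelementptr inbounds float, float* %b, i64 %i
    %va = load float, float* %pa
    %vb = load float, float* %pb
    %sum = fadd float %va, %vb
    store float %sum, float* %pa
    %i.next = add nuw nsw i64 %i, 1
    %done = icmp eq i64 %i.next, %n
    br i1 %done, label %exit, label %loop

  exit:
    ret void
  }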

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D89566
2020-12-01 12:04:29 -05:00
Nikita Popov
4955e186d9 [MemCpyOpt] Port to MemorySSA
This is a straightforward port of MemCpyOpt to MemorySSA following
the approach of D26739. MemDep queries are replaced with MSSA queries
without changing the overall structure of the pass. Some care has
to be taken to account for differences between these APIs
(MemDep also returns reads, MSSA doesn't).

Differential Revision: https://reviews.llvm.org/D89207
2020-12-01 17:57:41 +01:00
Fangrui Song
214c071c8d [X86] Support modifier @PLTOFF for R_X86_64_PLTOFF64
`gcc -mcmodel=large` can emit @PLTOFF.

Reviewed By: grimar

Differential Revision: https://reviews.llvm.org/D92294
2020-12-01 08:39:01 -08:00
Juneyoung Lee
37d6671aa8 [InstSimplify] Add tests that fold instructions with poison operands (NFC) 2020-12-02 01:01:59 +09:00
Clement Courbet
c96423a30c [MergeICmps] Fix missing split.
We were not correctly splitting blocks for chains of length 1.

Before this change, additional instructions in blocks belonging to chains of
length 1 were not split off from the block before its removal (this was
done correctly for longer chains).
If the first block contained an instruction referenced elsewhere,
deleting the block would invalidate the value it produced.

This caused a miscompile which motivated D92297 (before D17993, the
nonnull and dereferenceable attributes were not added, so MergeICmps was
not triggered). The new test gep-references-bb.ll demonstrates the issue.

The regression was introduced in
rG0efadbbcdeb82f5c14f38fbc2826107063ca48b2.

This supersedes D92364.

Test case by MaskRay (Fangrui Song).

Differential Revision: https://reviews.llvm.org/D92375
2020-12-01 16:50:55 +01:00
Sanjay Patel
47c90fc42e [x86] adjust cost model values for minnum/maxnum with fast-math-flags
Without FMF, we lower these intrinsics into something like this:

vmaxsd	%xmm0, %xmm1, %xmm2
vcmpunordsd	%xmm0, %xmm0, %xmm0
vblendvpd	%xmm0, %xmm1, %xmm2, %xmm0

But if we can ignore NaNs, the single min/max instruction is enough
because there is no need to fix up the x86 logic that corresponds to
X > Y ? X : Y.
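
For example (an illustrative sketch), with an nnan flag on the call the
lowering can be the single max instruction:

  define double @max_nnan(double %x, double %y) {
    %r = call nnan double @llvm.maxnum.f64(double %x, double %y)
    ret double %r
  }
  declare double @llvm.maxnum.f64(double, double)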

We probably want to make other adjustments for FP intrinsics with FMF
to account for specialized codegen (for example, FSQRT).

Differential Revision: https://reviews.llvm.org/D92337
2020-12-01 10:45:53 -05:00
Benjamin Kramer
465a434844 [DAG] Remove unused variable. NFC. 2020-12-01 16:29:02 +01:00
David Green
072f29e65f [ARM] Mark select and selectcc of MVE vector operations as expand.
We already expand select and select_cc in codegenprepare, but they can
still be generated in some situations, leading to a failure to select
the nodes. Explicitly mark them as expand to ensure they are not
produced.

Differential Revision: https://reviews.llvm.org/D92373
2020-12-01 15:05:55 +00:00
Sanjay Patel
b8c83b9595 [InstCombine] canonicalize sign-bit-shift of difference to ext(icmp)
icmp is the preferred spelling in IR because icmp analysis is
expected to be better than any other analysis. This should
lead to more follow-on folding potential.

It's difficult to say exactly what we should do in codegen to
compensate. For example on AArch64, which of these is preferred:
	sub	w8, w0, w1
	lsr	w0, w8, #31

vs:
	cmp	w0, w1
	cset	w0, lt

If there are perf regressions, then we should deal with those in
codegen on a case-by-case basis.

A possible motivating example for better optimization is shown in
https://llvm.org/PR43198, but that will require other transforms
before anything changes there.

Alive proof:
https://rise4fun.com/Alive/o4E

  Name: sign-bit splat
  Pre: C1 == (width(%x) - 1)
  %s = sub nsw %x, %y
  %r = ashr %s, C1
  =>
  %c = icmp slt %x, %y
  %r = sext %c

  Name: sign-bit LSB
  Pre: C1 == (width(%x) - 1)
  %s = sub nsw %x, %y
  %r = lshr %s, C1
  =>
  %c = icmp slt %x, %y
  %r = zext %c
2020-12-01 09:58:11 -05:00
Simon Pilgrim
285c17bd3f [DAG] Move vselect(icmp_ult, 0, sub(x,y)) -> usubsat(x,y) to DAGCombine (PR40111)
Move the X86 VSELECT->USUBSAT fold to DAGCombiner - there's nothing target-specific about these folds.
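
A sketch of the generic pattern in IR (names hypothetical):

  define <8 x i16> @usubsat(<8 x i16> %x, <8 x i16> %y) {
    %cmp = icmp ult <8 x i16> %x, %y
    %sub = sub <8 x i16> %x, %y
    ; when %x u< %y the subtraction would wrap, so clamp to 0
    %r = select <8 x i1> %cmp, <8 x i16> zeroinitializer, <8 x i16> %sub
    ret <8 x i16> %r
  }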
2020-12-01 14:25:29 +00:00
Florian Hahn
758cff690d [ConstraintElimination] Decompose GEP %ptr, ZEXT(SHL()).
Add support to decompose a GEP with a ZEXT(SHL()) operand.
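
A sketch of the GEP shape this covers (types are illustrative):

  define i32* @gep_zext_shl(i32* %ptr, i8 %idx) {
    %shl = shl nuw i8 %idx, 1                  ; index scaled in the narrow type
    %ext = zext i8 %shl to i64
    %gep = getelementptr inbounds i32, i32* %ptr, i64 %ext
    ret i32* %gep
  }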
2020-12-01 14:23:21 +00:00
Kazushi (Jam) Marukawa
bfd3c64d85 [VE] Add vmul and vdiv intrinsic instructions
Add vmul and vdiv intrinsic instructions and regression tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D92377
2020-12-01 23:03:49 +09:00
Simon Pilgrim
307dd79679 [X86] Add PR48223 usubsat test case 2020-12-01 13:57:08 +00:00
Bhramar Vatsa
2a0264f0ea [InstCombine] Optimize away the unnecessary multi-use sign-extend
C.f. https://bugs.llvm.org/show_bug.cgi?id=47765

Added a case for handling a multi-use sign-extend (shl+ashr pair),
optimizing it away for an individual use
whose demanded bits aren't affected by the sign-extend.
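
A sketch of the situation (hypothetical names; @use stands for any other user):

  define i32 @src(i32 %x) {
    %shl = shl i32 %x, 16
    %sext = ashr i32 %shl, 16        ; sign-extend of the low 16 bits of %x
    call void @use(i32 %sext)        ; extra use keeps the shl+ashr pair alive
    %low = and i32 %sext, 255        ; this use demands only low bits
    ret i32 %low                     ; ... so it can read %x directly: and i32 %x, 255
  }
  declare void @use(i32)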

https://rise4fun.com/Alive/lgf

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D91343
2020-12-01 16:54:00 +03:00
Roman Lebedev
b2bda36752 [InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold, 2
If the shift amount was undef for some lane, the shift amount of the
opposite shift is irrelevant for that lane, and the new shift amount
for that lane can be undef.
2020-12-01 16:54:00 +03:00
Sanjay Patel
e356a6cdf3 [InstCombine] add tests for sign-bit-shift-of-sub; NFC 2020-12-01 08:01:00 -05:00
Hans Wennborg
678f3b2a4a Remove rm -f cortex-a57-misched-mla.s; hopefully the bots have all cycled past it now 2020-12-01 13:50:49 +01:00
Roman Lebedev
bd4ad50d2c Revert "[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold"
It seems I have missed some check lines; temporarily reverting,
will reland momentarily.

This reverts commit aa1aa135097ecfab6d9917a435142030eff0a226.
2020-12-01 15:47:04 +03:00
Roman Lebedev
11a8c6ce51 [NFC][InstCombine] sext.ll: @test9: avoid only differently-cased names for values and block names 2020-12-01 15:33:12 +03:00
Roman Lebedev
0e49bd7b55 [InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold
If the shift amount was undef for some lane, the shift amount of the
opposite shift is irrelevant for that lane, and the new shift amount
for that lane can be undef.
2020-12-01 15:13:08 +03:00
Roman Lebedev
c8f6a30ae8 [NFC][InstCombine] Improve vector undef test coverage for sext(ashr(shl(trunc()))) fold 2020-12-01 15:13:07 +03:00
Roman Lebedev
28c20628bd [InstCombine] Evaluate new shift amount for sext(ashr(shl(trunc()))) fold in wide type (PR48343)
It is not correct to compute that new shift amount in its narrow type
and only then extend it into the wide type:

----------------------------------------
Optimization: PR48343 good
Precondition: (width(%X) == width(%r))
  %o0 = trunc %X
  %o1 = shl %o0, %Y
  %o2 = ashr %o1, %Y
  %r = sext %o2
=>
  %n0 = sext %Y
  %n1 = sub width(%o0), %n0
  %n2 = sub width(%X), %n1
  %n3 = shl %X, %n2
  %r = ashr %n3, %n2

Done: 2016
Optimization is correct!

----------------------------------------
Optimization: PR48343 bad
Precondition: (width(%X) == width(%r))
  %o0 = trunc %X
  %o1 = shl %o0, %Y
  %o2 = ashr %o1, %Y
  %r = sext %o2
=>
  %n0 = sub width(%o0), %Y
  %n1 = sub width(%X), %n0
  %n2 = sext %n1
  %n3 = shl %X, %n2
  %r = ashr %n3, %n2

Done: 1
ERROR: Domain of definedness of Target is smaller than Source's for i9 %r

Example:
%X i9 = 0x000 (0)
%Y i4 = 0x3 (3)
%o0 i4 = 0x0 (0)
%o1 i4 = 0x0 (0)
%o2 i4 = 0x0 (0)
%n0 i4 = 0x1 (1)
%n1 i4 = 0x8 (8, -8)
%n2 i9 = 0x1F8 (504, -8)
%n3 i9 = 0x000 (0)
Source value: 0x000 (0)
Target value: undef


I.e. we should be computing it in the wide type from the beginning.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48343
2020-12-01 15:13:07 +03:00
Roman Lebedev
fe22e8de9e [NFC][InstCombine] Add PR48343 miscompiled testcase 2020-12-01 15:13:07 +03:00
Roman Lebedev
9af6393c0a [NFC][InstCombine] Autogenerate sext.ll test checklines 2020-12-01 15:13:06 +03:00
Roman Lebedev
9cb07703fc [SimplifyCFG] FoldBranchToCommonDest: don't require that cmp of br is last instruction
There is no correctness need for that, and since we allow live-out
uses, this could theoretically happen, because currently nothing
will move the cond to right before the branch in those tests.
Regardless, lifting that restriction makes the transform
easier to understand.

This makes the transform happen in 81 more cases (+0.55%).
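
A sketch of a newly-handled shape (hypothetical IR), with an unrelated
instruction between the cmp and the branch:

  define i32 @sketch(i32 %a, i32 %b) {
  entry:
    %c1 = icmp eq i32 %a, 0
    br i1 %c1, label %common, label %bb
  bb:
    %c2 = icmp eq i32 %b, 0
    %v = add i32 %a, %b                  ; %c2 is not the last instruction before the br
    br i1 %c2, label %common, label %exit
  common:
    ret i32 0
  exit:
    ret i32 %v                           ; live-out use of %v
  }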
2020-12-01 15:13:06 +03:00
Roman Lebedev
aa4c647bc8 [NFC][SimplifyCFG] fold-branch-to-common-dest: add tests with cond of br not being the last op 2020-12-01 15:13:05 +03:00
Simon Pilgrim
2867c54817 [DAG] Move vselect(icmp_ult, -1, add(x,y)) -> uaddsat(x,y) to DAGCombine (PR40111)
Move the X86 VSELECT->UADDSAT fold to DAGCombiner - there's nothing target-specific about these folds.

The SSE42 test diffs are relatively benign - it's avoiding an extra constant load in exchange for an extra xor operation - but there are extra register moves, which is annoying, as all those operations should commute them away.
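
An IR sketch of the pattern (names hypothetical):

  define <8 x i16> @uaddsat(<8 x i16> %x, <8 x i16> %y) {
    %add = add <8 x i16> %x, %y
    %ov = icmp ult <8 x i16> %add, %x    ; unsigned wrap check
    %r = select <8 x i1> %ov, <8 x i16> <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>, <8 x i16> %add
    ret <8 x i16> %r
  }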

Differential Revision: https://reviews.llvm.org/D91876
2020-12-01 11:56:26 +00:00
Cullen Rhodes
c640adbe73 [LV] Clamp VF hint when unsafe
In the following loop the dependence distance is 2, so it can only be
vectorized if the vector length is no larger than that.

  void foo(int *a, int *b, int N) {
    #pragma clang loop vectorize(enable) vectorize_width(4)
    for (int i=0; i<N; ++i) {
      a[i + 2] = a[i] + b[i];
    }
  }

However, when a VF of 4 is specified via the loop hint, this loop is
vectorized anyway. According to [1][2], loop hints are ignored if the
optimization is not safe to apply.

This patch introduces a check to bail out of vectorization if the
user-specified VF is greater than the maximum feasible VF, unless
explicitly forced with '-force-vector-width=X'.

[1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave
[2] https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations

Reviewed By: sdesmalen, fhahn, Meinersbur

Differential Revision: https://reviews.llvm.org/D90687
2020-12-01 11:30:34 +00:00
Simon Pilgrim
32a49915a1 [InstCombine][X86] Fold addsub intrinsic to fadd/fsub depending on demanded elts (PR46277) 2020-12-01 11:27:40 +00:00
Caroline Concatto
319a0490a1 [NFC][CostModel] Extend class IntrinsicCostAttributes to use ElementCount Type
This patch replaces the attribute `unsigned VF` in the class
IntrinsicCostAttributes with `ElementCount VF`.
This is a non-functional change to help upcoming patches compute the cost
model for scalable vectors inside this class.

Differential Revision: https://reviews.llvm.org/D91532
2020-12-01 11:12:51 +00:00
Florian Hahn
5168f3f070 [ConstraintElimination] Decompose GEP %ptr, SHL().
Add support to decompose a GEP with an SHL operand.
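
Similarly, a sketch of the shape (types are illustrative):

  define i32* @gep_shl(i32* %ptr, i64 %idx) {
    %shl = shl nuw i64 %idx, 2                 ; i.e. %idx * 4
    %gep = getelementptr inbounds i32, i32* %ptr, i64 %shl
    ret i32* %gep
  }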
2020-12-01 10:58:36 +00:00
Kazushi (Jam) Marukawa
60cc012b65 [VE] Add vadd and vsub intrinsic instructions
Add vadd and vsub intrinsic instructions and regression tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D92332
2020-12-01 19:57:22 +09:00
Simon Pilgrim
e202dff3a4 [InstCombine][X86] Add test coverage showing failure to simplify addsub intrinsics to fadd/fsub
If we only use odd/even lanes then we just need fadd/fsub ops
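
For example (a sketch using the SSE3 addsub intrinsic):

  define <4 x float> @only_even(<4 x float> %a, <4 x float> %b) {
    %ab = call <4 x float> @llvm.x86.sse3.addsub.ps(<4 x float> %a, <4 x float> %b)
    ; only even lanes (the subtract lanes) are demanded...
    %even = shufflevector <4 x float> %ab, <4 x float> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
    ; ...so a plain fsub of %a and %b would suffice
    ret <4 x float> %even
  }
  declare <4 x float> @llvm.x86.sse3.addsub.ps(<4 x float>, <4 x float>)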
2020-12-01 10:49:43 +00:00
Sjoerd Meijer
c00b31be29 ExtractValue instruction costs
The ExtractValue instruction wasn't handled in
LoopVectorizationCostModel::getInstructionCost(). As a result, it was modeled
as a mul, which is not really accurate. Since it is free (most of the time),
it now gets a cost of 0 using getInstructionCost.
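
For instance (a sketch), the extract below typically expands to no code,
as the aggregate already lives in registers:

  define i32 @first({ i32, i32 } %agg) {
    %v = extractvalue { i32, i32 } %agg, 0     ; usually free: just picks a register
    ret i32 %v
  }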

This is a follow-up to D92208, which required changing this regression test.
In a follow-up I will look at InsertValue, which also isn't handled yet.

Differential Revision: https://reviews.llvm.org/D92317
2020-12-01 10:42:23 +00:00
David Green
de6fd49a53 [AArch64] Update pass pipeline test. NFC 2020-12-01 10:40:04 +00:00
David Green
a65007720f [ARM] PREDICATE_CAST demanded bits
The PREDICATE_CAST node is used to model moves between MVE predicate
registers and GPRs, and eventually becomes a VMSR p0, rn. When moving to
a predicate, only the bottom 16 bits of the source register are
demanded. This adds a simple fold for that, allowing it to potentially
remove instructions like uxth.

Differential Revision: https://reviews.llvm.org/D92213
2020-12-01 10:32:24 +00:00
Jay Foad
54cad475e3 [AMDGPU] Simplify some generation checks. NFC. 2020-12-01 10:15:32 +00:00
Hans Wennborg
2f9caad35e [gn build] Manually merge 40659cd 2020-12-01 11:15:05 +01:00
Georgii Rymar
59f342a322 [obj2yaml] - Teach tool to emit the "SectionHeaderTable" key and sort sections by file offset.
Currently, when we dump sections, we dump them in the order
specified in the section header table.

With that, the order in the output might not match the order in the file.
This patch starts sorting them by file offset when dumping.

When the order in the section header table doesn't match the order
in the file, we should emit the "SectionHeaderTable" key. This patch does that.

Differential revision: https://reviews.llvm.org/D91249
2020-12-01 12:59:15 +03:00
Georgii Rymar
d41ff4ca02 [llvm-readobj][test] - Merge 2 test cases together.
This merges `invalid-attr-section-size.test` and `invalid-attr-version.test`
into `invalid-attributes-sec.test`.

This gives us a single place where other related test cases can be added.

Differential revision: https://reviews.llvm.org/D92316
2020-12-01 12:51:07 +03:00
Georgii Rymar
f4dc2c0d44 [llvm-readobj] - Introduce ObjDumper::reportUniqueWarning(const Twine &Msg).
This introduces an overload of `reportUniqueWarning` that allows us
to avoid using `createError` in many places.

Differential revision: https://reviews.llvm.org/D92371
2020-12-01 12:36:44 +03:00
Jan Svoboda
63e7cdb281 [clang][cli] Port DependencyOutput option flags to new option parsing system
Depends on D91861.

Reviewed By: dexonsmith

Original patch by Daniel Grumberg.

Differential Revision: https://reviews.llvm.org/D83694
2020-12-01 10:36:12 +01:00
Jan Svoboda
8ff798c583 [clang][cli] Port Frontend option flags to new option parsing system
Depends on D91861.

Reviewed By: dexonsmith

Original patch by Daniel Grumberg.

Differential Revision: https://reviews.llvm.org/D83697
2020-12-01 10:02:08 +01:00
Jan Svoboda
5bc4c8d4e4 [clang][cli] Split DefaultAnyOf into a default value and ImpliedByAnyOf
This makes the options API composable, allows boolean flags to imply non-boolean values and makes the code more logical (IMO).

Differential Revision: https://reviews.llvm.org/D91861
2020-12-01 09:50:11 +01:00
Kristof Beyls
27280e2609 collect_and_build_with_pgo.py: adapt to monorepo
Differential Revision: https://reviews.llvm.org/D92328
2020-12-01 09:16:12 +01:00
Georgii Rymar
950e7e96bb [llvm-readelf] - Switch from reportWarning to reportUniqueWarning in DynRegionInfo.
This is part of the previously discussed plan to convert all calls to
`reportUniqueWarning` and then rename it to just `reportWarning`.

I was a bit unsure about this particular change at first, because it doesn't add
new functionality: it seems impossible to trigger a warning duplication currently.

At the same time, I find the idea of the plan very reasonable.
With it, we can be sure that `DynRegionInfo` can't report duplicate
warnings, which looks like a nice property for possible refactorings and further tool development.

Differential revision: https://reviews.llvm.org/D92224
2020-12-01 11:09:30 +03:00
Georgii Rymar
37259d46e0 [llvm-readelf/obj] - Move unique warning handling logic to the ObjDumper.
This moves the `reportUniqueWarning` method to the base class.

My motivation is the following:
I've experimented with replacing `reportWarning` calls with `reportUniqueWarning`
in the ELF dumper. I found that, for example, to remove them from the `DynRegionInfo`
helper class, it is worth passing a dumper instance to it (to be able to call
dumper()->reportUniqueWarning()). The problem was that `ELFDumper<ELFT>` is a template
class: I would have had to make `DynRegionInfo` templated too and make lots of minor
changes everywhere, which did not look reasonable/nice.

At the same time, I guess one day other dumpers like COFF/MachO/Wasm etc. might want to
start using the `reportUniqueWarning` API too. Then it seems reasonable to move the
logic to the base class.

With that, the problem of passing the dumper instance goes away.

Differential revision: https://reviews.llvm.org/D92218
2020-12-01 10:53:00 +03:00