llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 10:42:39 +01:00

Author	SHA1	Message	Date
Philipp Krones	d7917544a3	[Inliner] Make the CallPenalty configurable Tests with multiple benchmarks, like Embench [1], showed that the CallPenalty magic number has the most influence on inlining decisions when optimizing for size. On the other hand, there was no good default value for this parameter. Some benchmarks profited strongly from a reduced call penalty. One example is the picojpeg benchmark compiled for RISC-V, which got 6% smaller with a CallPenalty of 10 instead of 12. Other benchmarks increased in size, like matmult. This commit makes the compromise of turning the magic number constant of CallPenalty into a configurable value. This introduces the flag `--inline-call-penalty`. With that flag users can fine-tune the inliner to their needs. The CallPenalty constant was also used for loops. This commit replaces the CallPenalty constant with a new LoopPenalty constant that is now used instead. This is a slimmed down version of https://reviews.llvm.org/D30899 [1]: https://github.com/embench/embench-iot Differential Revision: https://reviews.llvm.org/D105976	2021-07-26 12:07:49 +01:00
Florian Hahn	a0b07d2e54	[VPlan] Use stored value from recipes for interleave groups. Instead of getting the VPValue for the stored IR values through the current plan, use the stored value of the recipes directly. This way, the correct VPValues are used if the store recipes have been modified in the VPlan and the IR value is not correct any longer. This can happen, e.g. due to D105008.	2021-07-26 12:05:23 +01:00
Dylan Fleming	6f6b3d4f7a	[SVE] Add support for folding for select + masked loads Add folds to instcombine to support the removal of select instruction when the masked_load is guaranteed to zero the same lanes, i.e. select(mask, mload(,,mask,0), 0) -> mload(,,mask,0). Patch originally authored by @paulwalker-arm Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106376	2021-07-26 11:58:41 +01:00
Caroline Concatto	9af238dfc2	[SVE][AArch64] Improve code generation for vector_splice for Imm > 0 This patch implements vector_splice in tablegen for all cases when the Immediate is positive and lower than the known minimum value of a scalable vector. Vector_splice can be implemented using SVE instruction EXT. For instance : @llvm.experimental.vector.splice(Vector_1, Vector_2, Imm) @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> EXT Vector_1, Vector_2, Imm // Vector_1 = B, C, D + Vector_2 = E Depends on D105633 Differential Revision: https://reviews.llvm.org/D106273	2021-07-26 11:45:46 +01:00
David Sherwood	fd6a38b569	Fix test failures caused by 0aff1798b5721d5f95d16f465b99d357012bb8d1	2021-07-26 11:40:26 +01:00
Caroline Concatto	d9b910d9d2	[AArch64][SVE] Improve code generation for vector_splice for Imm == -1 This patch implements vector_splice in tablegen for: a) when the immediate is equal to -1 (Imm == -1) and uses: INSR + LASTB For instance : @llvm.experimental.vector.splice(Vector_1, Vector_2, -1) @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -1) ==> <D, E, F, G> LASTB RegLast, Vector_1 // RegLast = D INSR Res, (Vector_1 >> 1), RegLast // Res = D + E, F, G Differential Revision: https://reviews.llvm.org/D105633	2021-07-26 11:25:01 +01:00
Simon Pilgrim	96fce2fe63	[X86][AVX] Prefer vinsertf128 to vperm2f128 on AVX1 targets Splatting the lower xmm with vinsertf128 is at least as quick as vperm2f128, and a lot faster on some AMD targets. First step towards PR50053	2021-07-26 11:11:56 +01:00
Simon Pilgrim	f76b3abd62	[X86][SSE] Don't scrub address math from interleaved shuffle tests	2021-07-26 11:03:31 +01:00
Cullen Rhodes	f81ad3ab04	[AArch64][AsmParser] NFC: Parser.getTok().getLoc() -> getLoc() Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D106635	2021-07-26 09:36:34 +00:00
David Sherwood	2d2e4a1b17	[Analysis] Add simple cost model for strict (in-order) reductions I have added a new FastMathFlags parameter to getArithmeticReductionCost to indicate what type of reduction we are performing: 1. Tree-wise. This is the typical fast-math reduction that involves continually splitting a vector up into halves and adding each half together until we get a scalar result. This is the default behaviour for integers, whereas for floating point we only do this if reassociation is allowed. 2. Ordered. This now allows us to estimate the cost of performing a strict vector reduction by treating it as a series of scalar operations in lane order. This is the case when FP reassociation is not permitted. For scalable vectors this is more difficult because at compile time we do not know how many lanes there are, and so we use the worst case maximum vscale value. I have also fixed getTypeBasedIntrinsicInstrCost to pass in the FastMathFlags, which meant fixing up some X86 tests where we always assumed the vector.reduce.fadd/mul intrinsics were 'fast'. New tests have been added here: Analysis/CostModel/AArch64/reduce-fadd.ll Analysis/CostModel/AArch64/sve-intrinsics.ll Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D105432	2021-07-26 10:26:06 +01:00
Fraser Cormack	2df597c58a	[SelectionDAG] Support scalable-vector splats in yet more cases This patch extends support for (scalable-vector) splats in the DAGCombiner via the `ISD::matchBinaryPredicate` function, which enables a variety of simple combines of constants. Users of this function may now have to distinguish between `BUILD_VECTOR` and `SPLAT_VECTOR` vector operands. The way of dealing with this in-tree follows the approach added for `ISD::matchUnaryPredicate` implemented in D94501. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106575	2021-07-26 10:15:08 +01:00
Lang Hames	480ddba43e	[ORC][ORC-RT] Add initial Objective-C and Swift support to MachOPlatform. This allows ORC to execute code containing Objective-C and Swift classes and methods (provided that the language runtime is loaded into the executor).	2021-07-26 18:02:01 +10:00
Yuanfang Chen	0b2bb4e657	[Object] make SourceMgr available to MCContext during inline asm symbols collection Fixes PR51210.	2021-07-25 21:23:03 -07:00
Esme-Yi	b278eba445	[Debug-Info][llvm-dwarfdump] Don't use DW_FORM_data4/8 to encode the constants for DW_AT_data_member_location. Summary: In DWARF v3, DW_FORM_data4/8 in DW_AT_data_member_location are interpreted as location list pointers. Interpreting constants as pointers is not expected, so we use DW_FORM_udata to encode the constants. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D105687	2021-07-26 03:47:02 +00:00
Mehdi Amini	f99a775b8b	Revert "Build libSupport with -Werror=global-constructors (NFC)" This reverts commit 579cc9ad2e2db6c3f1670b9f42c2cfe67bc5722c. This breaks on Windows.	2021-07-26 03:08:26 +00:00
Mehdi Amini	77f87f1745	Build libSupport with -Werror=global-constructors (NFC) Ensure that libSupport does not carry any static global initializer. libSupport can be embedded in use cases where we don't want to load all cl::opt unless we want to parse the command line. ManagedStatic can be used to enable lazy-initialization of globals.	2021-07-26 03:04:31 +00:00
Esme-Yi	d0cd0162ac	[yaml2obj] Do not write the string table if there is no string entry. Summary: yaml2obj shouldn't create the string table that isn't needed - doing so wastes time and disk space. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D106420	2021-07-26 02:37:49 +00:00
Mehdi Amini	a60a1f2f04	Revert "Build libSupport with -Werror=global-constructors (NFC)" This reverts commit 5eb2e9aa64b7be7cd8ed7f36de19c2c9bdf1977c. This broke MacOS builds, needs to have a safer check guarding the flag addition.	2021-07-26 00:55:36 +00:00
Mehdi Amini	61041e8ca3	Build libSupport with -Werror=global-constructors (NFC) Ensure that libSupport does not carry any static global initializer. libSupport can be embedded in use cases where we don't want to load all cl::opt unless we want to parse the command line. ManagedStatic can be used to enable lazy-initialization of globals.	2021-07-26 00:21:09 +00:00
Mehdi Amini	ecdc653aaa	Remove the NotUnderValgrind caching flag The motivation for this caching wasn't clear, remove it in an effort to simplify the code and make libSupport free of global dynamic constructor. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D106206	2021-07-26 00:21:09 +00:00
Roman Lebedev	aaa4bd0e18	[SimplifyCFG] Fold branch to common dest: if branch is unpredictable, prefer to speculate This is consistent with the two other usages of prof md in this pass.	2021-07-26 02:57:19 +03:00
Roman Lebedev	519c4b9ca7	[SimplifyCFG] Don't speculatively execute BB[s] if they are predictably not taken Same as D106650, but for `FoldTwoEntryPHINode()` Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D106717	2021-07-26 02:55:15 +03:00
Roman Lebedev	cc6967667d	[SimplifyCFG] Don't speculatively execute BB if it's predictably not taken If the branch isn't `unpredictable`, and it is predicted to not branch to the block we are considering speculatively executing, then it seems counter-productive to execute the code that is predicted not to be executed. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D106650	2021-07-26 02:55:14 +03:00
Roman Lebedev	ddc2eed16c	[NFC][SimplifyCFG] Add more negative tests for profmd-induced speculation avoidance	2021-07-26 02:55:08 +03:00
Stefan Gränitz	c0784ed26d	[docs] Update release notes to mention lli JIT engine switch	2021-07-25 23:58:43 +02:00
Nico Weber	ef8d46554d	Revert "[VPlan] Add recipe for first-order rec phis, make splicing explicit." Makes clang crash: https://reviews.llvm.org/D105008#2903350 This reverts commit d2a73fb44ea0b8c981e4b923f811f18793fc4770. Also revert a minor formatting follow-up: This reverts commit 82834a673246f27a541ffcc57e0eb65b008102ef.	2021-07-25 17:39:28 -04:00
Fangrui Song	56c255e563	[LangRef] Reorder two paragraphs for comdat so that IMAGE_COMDAT_SELECT_LARGEST refers to the correct example.	2021-07-25 12:53:14 -07:00
Simon Pilgrim	a499e7c2d0	[X86][AVX] Add getBROADCAST_LOAD helper function. NFCI. Begin replacing individual getMemIntrinsicNode calls and setup (for X86ISD::VBROADCAST_LOAD + X86ISD::SUBV_BROADCAST_LOAD opcodes) with this getBROADCAST_LOAD helper.	2021-07-25 20:37:58 +01:00
Joseph Huber	daea2dd14a	[OpenMP] Introduce RAII to protect certain RTL calls from DCE This patch introduces a new RAII struct that will temporarily make an OpenMP RTL function have external linkage. This is done before the attributor is invoked to prevent it from incorrectly removing some function definitions that we will use later. For example, if we determine all calls to one function are dead, because it has internal linkage it can safely be removed. Later when we try to get an instance to that function to modify the source using `getOrCreateRuntimeFunction` we will then get an empty declaration for that function that won't be defined anywhere. This patch prevents this from occurring. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106707	2021-07-25 14:15:47 -04:00
Roman Lebedev	549bfa55a4	[NFC][Codegen][X86] Improve test coverage for insertions into XMM vector	2021-07-25 21:08:03 +03:00
Kyungwoo Lee	e13913351b	[AArch64] Fix Local Deallocation for Homogeneous Prolog/Epilog The stack adjustment for local deallocation was incorrectly ported. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D106760	2021-07-25 10:51:11 -07:00
Simon Pilgrim	4b99bcdef6	[X86][SSE] LowerRotate - perform modulo on the amount splat source directly. If the rotation amount is a known splat, perform the modulo on the splat source, and then perform the splat. That way the amount-extension performed later by LowerScalarVariableShift can fold the splats away without any multiple-use issues. Fixes one of the concerns raised on D104156	2021-07-25 17:30:32 +01:00
Nikita Popov	ed5a269ff1	[Attributes] Clean up handling of UB implying attributes (NFC) Rather than adding methods for dropping these attributes in various places, add a function that returns an AttrBuilder with these attributes, which can then be used with existing methods for dropping attributes. This is with an eye on D104641, which also needs to drop them from returns, not just parameters. Also be more explicit about the semantics of the method in the documentation. Refer to UB rather than Undef, which is what this is actually about.	2021-07-25 18:21:13 +02:00
Nikita Popov	adc5107b73	[Attributes] Remove nonnull from UB-implying attributes From LangRef: > if the parameter or return pointer is null, poison value is > returned or passed instead. The nonnull attribute should be > combined with the noundef attribute to ensure a pointer is not > null or otherwise the behavior is undefined. Dropping noundef is sufficient to prevent UB. Including nonnull in this method just muddies the semantics.	2021-07-25 18:07:31 +02:00
Simon Pilgrim	4c407a3b35	Revert rG939291041bb35b8088e3b61be2b8b3bc950f64a7 "[AMDGPU] Regenerate wave32.ll test checks" This still breaks buildbots	2021-07-25 15:59:26 +01:00
Nico Weber	e2e1559a48	[JITLink][RISCV] Run new test from 0ad562b48 only if the RISCV backend is enabled	2021-07-25 10:47:26 -04:00
Krishna Kariya	6d0cc981cb	[InstCombine] Fix PR47960 - Incorrect transformation of fabs with nnan flag Bug Fix for PR: https://llvm.org/PR47960 This patch makes sure that the fast math flag used in the 'select' instruction is the same as the 'fabs' instruction after the transformation. Differential Revision: https://reviews.llvm.org/D101727	2021-07-25 10:43:33 -04:00
Roman Lebedev	5c9c7cf653	[NFC][Codegen][X86] Improve test coverage for repeated insertions of the same scalar into different elements	2021-07-25 17:37:04 +03:00
Simon Pilgrim	bbcc64cd75	[AMDGPU] Regenerate wave32.ll test checks To simplify diff in future patch	2021-07-25 15:13:09 +01:00
Simon Pilgrim	60858d3f0b	[AMDGPU] Regenerate mul24 test checks To simplify diffs in future patch	2021-07-25 15:13:09 +01:00
Sanjay Patel	f3223b51e0	[x86] improve CMOV codegen by pushing add into operands, part 2 This is a minimum extension of D106607 to allow folding for 2 non-zero constants that can be materialized as immediates. In the reduced test examples, we save 1 instruction by rolling the constants into LEA/ADD. In the motivating test from the bullet benchmark, we absorb both of the constant moves into add ops via LEA magic, so we reduce by 2 instructions. Differential Revision: https://reviews.llvm.org/D106684	2021-07-25 10:05:41 -04:00
Kazu Hirata	697f5408f6	[GlobalISel] Remove FlagsOp (NFC) The class was introduced without a use on Dec 11, 2018 in commit cef44a234219e38e1c28c902ff24586150eef682.	2021-07-25 07:05:07 -07:00
Kazu Hirata	3e06079a92	[Inline] Fix a warning by removing an explicit copy constructor This patch fixes the warning: llvm/include/llvm/Analysis/InlineCost.h:62:3: error: definition of implicit copy assignment operator for 'CostBenefitPair' is deprecated because it has a user-declared copy constructor [-Werror,-Wdeprecated-copy] by removing the explicit copy constructor.	2021-07-25 06:56:47 -07:00
Simon Pilgrim	2aaf1231f6	[X86][AVX] Adjust AllowBWIVPERMV3 tolerance to account for VariableCrossLaneShuffleDepth As noticed on D105390 - we were hardwiring the depth limit for combining to VPERMI2W/VPERMI2B instructions. Not only had we made the limit too low, we hadn't accounted for slow/fast shuffles via the VariableCrossLaneShuffleDepth control	2021-07-25 14:05:11 +01:00
Simon Pilgrim	fcfcb73f87	[AMDGPU] Regenerate global-load-saddr-to-vaddr test checks To simplify diff in future patch	2021-07-25 14:05:10 +01:00
Simon Pilgrim	2561fe6c14	[AMDGPU] Regenerate ctpop16 test checks To simplify diff in future patch	2021-07-25 14:05:09 +01:00
Simon Pilgrim	a47cd122f1	[AMDGPU] Regenerate half test checks To simplify diff in future patch	2021-07-25 14:05:08 +01:00
Simon Pilgrim	324c7c98e4	[AMDGPU] Regenerate anyext test checks To simplify diff in future patch	2021-07-25 14:05:08 +01:00
Liqiang Tao	4952863892	[llvm][Inline] Add interface to return cost-benefit stuff Return cost-benefit stuff which is computed by cost-benefit analysis. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D105349	2021-07-25 20:18:19 +08:00
Amara Emerson	bd7a7d5ff0	[AArch64][GlobalISel] Widen non-pow-2 types for shifts before clamping. For types like s96, we don't want to clamp to s64, we want to first widen to s128 and then narrow it. Otherwise we end up with impossible to legalize types.	2021-07-24 15:50:43 -07:00
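
The two vector_splice commits above (9af238dfc2 and d9b910d9d2) give worked examples for a positive and a negative immediate. As a rough illustration of those semantics only, here is a minimal Python sketch that models fixed-length vectors as lists. The function name `vector_splice` and the list-based model are assumptions made for this example; it does not reflect how the AArch64 backend selects EXT or INSR + LASTB, and it ignores scalable vectors entirely.

```python
def vector_splice(vec1, vec2, imm):
    """Model of the splice semantics on plain Python lists (illustrative only).

    The two inputs are concatenated and a window of len(vec1) elements is
    taken. A non-negative imm is the start index into the concatenation;
    a negative imm takes the last -imm elements of vec1 followed by the
    leading elements of vec2.
    """
    n = len(vec1)
    if len(vec2) != n:
        raise ValueError("both vectors must have the same length")
    if not -n <= imm < n:
        raise ValueError("immediate must satisfy -len <= imm < len")

    concat = vec1 + vec2
    # For a negative immediate the window starts -imm elements before
    # the end of vec1.
    start = imm if imm >= 0 else n + imm
    return concat[start:start + n]


if __name__ == "__main__":
    v1 = ["A", "B", "C", "D"]
    v2 = ["E", "F", "G", "H"]
    print(vector_splice(v1, v2, 1))   # the Imm > 0 example
    print(vector_splice(v1, v2, -1))  # the Imm == -1 example
```

With the inputs from the commit messages, an immediate of 1 yields the window beginning at B, and an immediate of -1 yields the window beginning at D, which matches the `<B, C, D, E>` and `<D, E, F, G>` results quoted above.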

219356 Commits