llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 02:52:53 +02:00

Author	SHA1	Message	Date
Stanislav Mekhanoshin	be34997c89	Move implementation of isAssumeLikeIntrinsic into IntrinsicInst This is to remove the dependency on ValueTracking in a future patch. Differential Revision: https://reviews.llvm.org/D96079	2021-02-11 11:41:34 -08:00
Snehasish Kumar	4a5bac2f05	[CodeGen] Split out cold exception handling pads. Support for splitting exception handling pads was added in D73739. This change updates the code to split out exception handling pads if profile information indicates that they are cold. For a given function with multiple landing pads, if one of them is hot they are all retained as part of the hot code section. Differential Revision: https://reviews.llvm.org/D96372	2021-02-11 11:23:43 -08:00
Adrian Prantl	b8b6f85b4f	llvm-dwarfdump: fix the counting when printing DW_OP_entry_value The block size is in bytes, and not number of operands. Differential Revision: https://reviews.llvm.org/D96472	2021-02-11 11:17:04 -08:00
Snehasish Kumar	7744abeec1	[CodeGen] Basic block sections should take precedence over splitting. The use of basic block sections should take precedence over the machine function splitting pass. Since they use the same underlying mechanism they are kept exclusive. Updated the tests to check that split machine functions is overridden by all flavours of basic block sections. Differential Revision: https://reviews.llvm.org/D96392	2021-02-11 11:14:10 -08:00
Matt Arsenault	0ac1411820	AMDGPU: Restrict soft clause bundling at half of the available regs Fixes a testcase that was overcommitting large register tuples to a bundle, which the register allocator could not possibly satisfy. This was producing a bundle which used nearly all of the available SGPRs with a series of 16-dword loads (not all of which are freely available to use). This is a quick hack for some deeper issues with how the clause bundler tracks register pressure. Overall the pressure tracking used here doesn't make sense and is too imprecise for what it needs to avoid the allocator failing. The pressure estimate does not account for the alignment requirements of large SGPR tuples, so this was really underestimating the pressure impact. This also ignores the impact of the extended live range of the use registers after the bundle is introduced. Additionally, it didn't account for some wide tuples not being available due to reserved registers. This regresses a few cases. These end up introducing more spilling. This is also a function of the global pressure being used in the decision to bundle, not the local pressure impact of the bundle itself.	2021-02-11 14:08:59 -05:00
Philip Reames	e71408afd4	[tests] Precommit tests for D96440	2021-02-11 10:48:07 -08:00
Sanjay Patel	ad68f10c06	[InstCombine] add tests for disguised mul ops; NFC	2021-02-11 13:39:52 -05:00
Yonghong Song	18184b22d0	BPF: Add LLVMAnalysis in CMakefile LINK_COMPONENTS buildbot reported a build error like below: BPFTargetMachine.cpp:(.text._ZN4llvm19TargetTransformInfo5ModelINS_10BPFTTIImplEED2Ev [_ZN4llvm19TargetTransformInfo5ModelINS_10BPFTTIImplEED2Ev]+0x14): undefined reference to `llvm::TargetTransformInfo::Concept::~Concept()' lib/Target/BPF/CMakeFiles/LLVMBPFCodeGen.dir/BPFTargetMachine.cpp.o: In function `llvm::TargetTransformInfo::Model<llvm::BPFTTIImpl>::~Model()': Commit a260ae716030 ("BPF: Implement TTI.IntImmCost() properly") added TargetTransformInfo to BPF, which requires LLVMAnalysis dependence. In certain cmake configurations, lacking explicit LLVMAnalysis dependency may cause compilation error. Similar to other targets, this patch added LLVMAnalysis in CMakefile LINK_COMPONENTS explicitly.	2021-02-11 10:24:22 -08:00
Michael Kruse	4625d98371	Revert "[AssumptionCache] Avoid dangling llvm.assume calls in the cache" This reverts commit b7d870eae7fdadcf10d0f177faa7409c2e37d776 and the subsequent fix "[Polly] Fix build after AssumptionCache change (D96168)" (commit e6810cab09fcbc87b6e5e4d226de0810e2f2ea38). It caused indeterminism in the output, such that e.g. the polly-x86_64-linux buildbot failed occasionally.	2021-02-11 12:17:38 -06:00
David Green	0f9807e238	[ARM] Single source vmovnt tests. NFC	2021-02-11 17:50:11 +00:00
Jay Foad	2f91cc65b2	[AMDGPU] Better selection of base offset when merging DS reads/writes When merging a pair of DS reads or writes needs to materialize the base offset in a vgpr, choose a value that is aligned to as high a power of two as possible. This maximises the chance that different pairs can use the same base offset, in which case the base offset registers can be commoned up by MachineCSE. Differential Revision: https://reviews.llvm.org/D96421	2021-02-11 17:46:09 +00:00
Craig Topper	32e1c9e6be	[TargetLowering][RISCV][AArch64][PowerPC] Enable BuildUDIV/BuildSDIV on illegal types before type legalization if we can find a larger legal type that supports MUL. If we wait until the type is legalized, we'll lose information about the original type and need to use larger magic constants. This gets especially bad on RISCV64 where i64 is the only legal type. I've limited this to simple scalar types so it only works for i8/i16/i32 which are most likely to occur. For more odd types we might want to do a small promotion to a type where MULH is legal instead. Unfortunately, this does prevent some urem/srem+seteq matching since that still requires legal types. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96210	2021-02-11 09:43:13 -08:00
Nico Weber	a74bdbb4a4	[gn build] port ed98676fa483	2021-02-11 12:41:29 -05:00
Stella Stamenova	51ae9734a2	Support multi-configuration generators correctly in several config files Multi-configuration generators (such as Visual Studio and Xcode) allow the specification of a build flavor at build time instead of config time, so the lit configuration files need to support that - and they do for the most part. There are several places that had one of two issues (or both!): 1) Paths had %(build_mode)s set up, but then not configured, resulting in values that would not work correctly e.g. D:/llvm-build/%(build_mode)s/bin/dsymutil.exe 2) Paths did not have %(build_mode)s set up, but instead contained $(Configuration) (which is the value for Visual Studio at configuration time, for Xcode they would have had the equivalent) e.g. "D:/llvm-build/$(Configuration)/lib". This seems to indicate that we still have a lot of fragility in the configurations, but also that a number of these paths are never used (at least on Windows) since the errors appear to have been there a while. This patch fixes the configurations and it has been tested with Ninja and Visual Studio to generate the correct paths. We should consider removing some of these settings altogether. Reviewed By: JDevlieghere, mehdi_amini Differential Revision: https://reviews.llvm.org/D96427	2021-02-11 09:32:20 -08:00
Florian Hahn	fee71778a7	[LV] Add tests showing suboptimal vectorization for narrow types. This patch adds additional test cases showing missing/sub-optimal vectorization for loops which contain small and wider memory ops on AArch64.	2021-02-11 17:24:28 +00:00
Craig Topper	5d204c33c0	[RISCV] Add support for loads, stores, and splats of vXi1 fixed vectors. This refines how we determine which mask types are legal and adds support for loads, stores, and all ones/zeros splats. I left a fixme in store handling where I think we need to zero extra bits if the type isn't a multiple of a byte. If I remember right from X86 there was some case we could have a store of a 1, 2, or 4 bit mask and have a scalar zextload that then expected the bits to be 0. It's tricky to zero the bits with RVV. We need to do something like round VL up, zero a register, lower the VL back down, then do a tail undisturbed move into the zero register. Another option might be to generate a mask of 1/2/4 bits set with a VL of 8 and use that to mask off the bits. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96468	2021-02-11 09:13:16 -08:00
Simon Pilgrim	992503511c	[DAG] foldLogicOfSetCCs - Generalize and/or (setcc X, CMax, ne), (setcc X, CMin, ne/eq) fold. NFCI. Prep work to add support for non-uniform vectors - replace APInt values with using the SDValue ops directly.	2021-02-11 17:09:01 +00:00
Nico Weber	741d1ffc50	[gn build] Port 7e3b9aba609f	2021-02-11 11:54:51 -05:00
Yonghong Song	d55275bb4b	BPF: Implement TTI.IntImmCost() properly This patch implemented TTI.IntImmCost() properly. Each BPF insn has 32bit immediate space, so for any immediate which can be represented as 32bit signed int, the cost is technically free. If an int cannot be represented as a 32bit signed int, a ld_imm64 instruction is needed and a TCC_Basic is returned. This change was motivated by our observation that several bpf selftests failed with latest llvm trunk, e.g., #10/16 strobemeta.o:FAIL #10/17 strobemeta_nounroll1.o:FAIL #10/18 strobemeta_nounroll2.o:FAIL #10/19 strobemeta_subprogs.o:FAIL #96 snprintf_btf:FAIL The failures occur because SpeculateAroundPHIsPass did an aggressive transformation which alters control flow in a way the verifier currently cannot handle well. In llvm12, SpeculateAroundPHIsPass is not called. SpeculateAroundPHIsPass relied on TTI.getIntImmCost() and TTI.getIntImmCostInst() for profitability analysis. This patch implemented TTI.getIntImmCost() properly for BPF backend which also prevented the transformation which caused the above test failures. Differential Revision: https://reviews.llvm.org/D96448	2021-02-11 08:35:25 -08:00
Alex Hoppen	8769fef07f	[Timer] On macOS count number of executed instructions In addition to wall time etc. this should allow us to get less noisy values for time measurements. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D96049	2021-02-11 17:26:37 +01:00
David Green	91764313a1	[ARM] Add CostKind to getMVEVectorCostFactor. This adds the CostKind to getMVEVectorCostFactor, so that it can automatically account for CodeSize costs, where it returns a cost of 1 not the MVEFactor used for Throughput/Latency. This helps simplify the caller code and allows us to get the codesize cost more correct in more cases.	2021-02-11 15:33:59 +00:00
Thomas Preud'homme	1e8cc1428d	Improve STRICT_FSETCC codegen in the absence of NaN As for SETCC, use a less expensive condition code when generating STRICT_FSETCC if the node is known not to have NaN. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D91972	2021-02-11 14:19:43 +00:00
Joe Ellis	266e9878b3	[DebugInfo] Only perform TypeSize -> unsigned cast when necessary This commit moves a line in SelectionDAGBuilder::handleDebugValue to avoid implicitly casting a TypeSize object to an unsigned earlier than necessary. It was possible that we bail out of the loop before the value is ever used, which means we could create a superfluous TypeSize warning. Reviewed By: DavidTruby Differential Revision: https://reviews.llvm.org/D96423	2021-02-11 13:54:09 +00:00
Simon Tatham	84aee28dc2	[ARM] Copy-paste error in ARMv87a architecture definition. In the tablegen architecture definition, the Name field for the ARMv87a record read "ARMv86a". All the other records contain their own names. Corrected it to "ARMv87a", and added the necessary value in ARMArchEnum for that to refer to. Reviewed By: pratlucas Differential Revision: https://reviews.llvm.org/D96493	2021-02-11 13:35:56 +00:00
Max Kazantsev	72c2b2ad9e	Return "[Codegenprepare][X86] Use usub with overflow opt for IV increment" The patch did not account for one corner case where cmp does not dominate the loop latch. This patch adds this check, hopefully it's cheap because the CFG does not change during the transform, so DT queries should be executed quickly. If you see compile time slowness from this, please revert. Differential Revision: https://reviews.llvm.org/D96119	2021-02-11 19:49:23 +07:00
Nico Weber	e16edde8ce	[gn build] Port b4993cf54d7f	2021-02-11 07:20:21 -05:00
Max Kazantsev	f390ae2d0f	[Test] Add test that exposed failure on reverted patch in codegen	2021-02-11 19:16:55 +07:00
David Green	71e1b7feea	[ARM] Change getScalarizationOverhead overload used in gather costs. NFC This changes which of the getScalarizationOverhead overloads is used in the gather/scatter cost to use the base variant directly, not relying on the version using heuristics on the number of args with no args provided. It should still produce the same costs for scalarized gathers/scatters.	2021-02-11 11:58:55 +00:00
Carl Ritson	fb4b457dbd	[AMDGPU] Move kill lowering to WQM pass and add live mask tracking Move implementation of kill intrinsics to WQM pass. Add live lane tracking by updating a stored exec mask when lanes are killed. Use live lane tracking to enable early termination of shader at any point in control flow. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D94746	2021-02-11 20:31:29 +09:00
Sander de Smalen	17c9b82086	NFC: Migrate CodeMetrics to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96030	2021-02-11 11:08:41 +00:00
Max Kazantsev	d0e934ee3a	Revert "[Codegenprepare][X86] Use usub with overflow opt for IV increment" This reverts commit 3d15b7e7dfc3e2cefc47791d1e8d95909e937842. We've found an internal failure, need to analyze.	2021-02-11 17:52:11 +07:00
David Green	c7af7bf359	[ARM] Remove dead mov's in preheader of tail predicated loops With t2DoLoopDec we can be left with some extra MOV's in the preheaders of tail predicated loops. This removes them, in the same way we remove other dead variables. Differential Revision: https://reviews.llvm.org/D91857	2021-02-11 10:48:20 +00:00
Sander de Smalen	e6c099942e	[TTI] Change TargetTransformInfo::getMinimumVF to return ElementCount This will be needed in the loop-vectorizer where the minimum VF requested may be a scalable VF. getMinimumVF now takes an additional operand 'IsScalableVF' that indicates whether a scalable VF is required. Reviewed By: kparzysz, rampitec Differential Revision: https://reviews.llvm.org/D96020	2021-02-11 09:08:48 +00:00
Markus Lavin	87a9aa8569	Expand masked mem intrinsics correctly wrt big-endian Need to take endianness into account when doing vector to scalar casts such as %bc = bitcast <8 x i1> %v to i8 Companion commit for https://reviews.llvm.org/D94867 Upload in response to https://lists.llvm.org/pipermail/llvm-dev/2021-January/147862.html Attempting to document the actual memory layout rules for vectors in https://reviews.llvm.org/D94964 Differential Revision: https://reviews.llvm.org/D94765	2021-02-11 08:59:52 +00:00
David Green	a86aaa927d	[ARM] Make a BE predicate bitcast consistent with the rest of llvm We were storing predicate registers, such as a <8 x i1>, in the opposite order to how the rest of llvm expects. This actually turns out to be correct for the one place that usually uses it - the ScalarizeMaskedMemIntrin pass, but only because the pass was incorrect itself. This fixes the order so that bits are stored in the opposite order and bitcasts work as expected. This allows the Scalarization pass to be fixed, as in https://reviews.llvm.org/D94765. Differential Revision: https://reviews.llvm.org/D94867	2021-02-11 08:59:52 +00:00
Sander de Smalen	d43cdc983c	[LoopVectorize] NFC: Change selectVectorizationFactor to work on ElementCount. This patch is NFC and changes occurrences of `unsigned Width` and `unsigned i` to work on type ElementCount instead. This patch is a preparatory patch with the ultimate goal of making `computeMaxVF()` return both a max fixed VF and a max scalable VF, so that `selectVectorizationFactor()` can pick the most cost-effective vectorization factor. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96019	2021-02-11 08:47:59 +00:00
Sander de Smalen	dec9975f46	[AArch64][SVE] Asm: Fix supported immediates for DUP/CPY This patch fixes an issue in the implementation of DUP/CPY where certain immediates were not accepted. Immediates should be interpreted as a two's complement encoding of a value that fits the number of bits of the element type. mov z0.b, p0/z, #127 <=> mov z0.b, p0/z, #-129 <=> mov z0.b, p0/z, #0xffffffffffffff7f This behaviour is in line with the GNU assembler. Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D94776	2021-02-11 08:14:15 +00:00
Arthur Eubanks	a1e9fa0c43	[NFC] Don't pass redundant arguments Some parameters were already part of the Config passed in.	2021-02-10 22:06:49 -08:00
Max Kazantsev	956eb929fb	[Codegenprepare][X86] Use usub with overflow opt for IV increment Function `replaceMathCmpWithIntrinsic` artificially limits the scope of the optimization, requiring the two instructions to be in the same block, for two reasons: - usage of DT for a more general check is costly in terms of compile time; - risk of creating a new value that lives through multiple blocks. Because of this, two semantically equivalent tests may or may not be subject to this opt depending on where the binary operation is located. See `test/CodeGen/X86/usub_inc_iv.ll` for motivation. There is one important particular case where this limitation is too strict: it is when the binary operation is the increment of the induction variable. As a result, the application of this opt becomes fragile and highly reliant on where other passes decide to place the IV increment. In most cases, they place it at the end of the latch block, killing the opt opportunity (when in fact it does not matter where to insert the actual instruction). This patch handles this particular case separately. - The detector does not use dom tree and has constant cost; - The value of IV or IV.next lives through the whole loop in any case, so this should not create a new unexpected long-living value. As a result, the transform becomes more robust. It also seems to lead to better code generation in some cases (see `test/CodeGen/X86/lsr-loop-exit-cond.ll`). Differential Revision: https://reviews.llvm.org/D96119 Reviewed By: spatel, reames	2021-02-11 11:59:45 +07:00
Max Kazantsev	a8eb60430b	[Test] Add negative tests where usub optimization should not apply	2021-02-11 11:59:44 +07:00
Carl Ritson	84e151045c	[AMDGPU] Refactor MIMG tables to better handle hardware variants Add mimgopc object to represent the opcode allowing different opcodes for different hardware variants. This enables image_atomic_fcmpswap, image_atomic_fmin, and image_atomic_fmax on GFX10 Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D96309	2021-02-11 13:22:41 +09:00
Kazu Hirata	296ee17661	[AsmPrinter] Use range-based for loops (NFC)	2021-02-10 20:01:22 -08:00
Kazu Hirata	be876e8742	[TableGen] Use ListSeparator (NFC)	2021-02-10 20:01:20 -08:00
Kazu Hirata	32c2f14548	[GCOV] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-02-10 20:01:18 -08:00
Craig Topper	c0c25b3ec4	[X86] Simplify patterns for avx512 vpcmp. NFC This removes the commuted PatFrags that only existed to carry an SDNodeXForm in its OperandTransform field. We know all the places that need to use the commuted SDNodeXForm and there is one transform shared by signed and unsigned compares. So just hardcode the SDNodeXForm where it is needed and use the non commuted PatFrag in the pattern. I think when I wrote this I thought the SDNodeXForm name had to match what is in the PatFrag that is being used. But that's not true. The OperandTransform is only used when the PatFrag is used in an instruction pattern and not a separate Pat pattern. All the commuted cases are Pat patterns.	2021-02-10 19:24:27 -08:00
Jessica Clarke	0547508f43	[RISCV] More whitespace and comment typo fixes in RISCVInstrInfoC.td	2021-02-11 02:32:36 +00:00
Jessica Clarke	de77b30b92	[RISCV] Fix whitespace in RISCVInstrInfoC.td	2021-02-11 02:23:09 +00:00
Craig Topper	c885998d0a	[RISCV] Use OperandTransform field of ImmLeaf to slightly simplify a couple bitmanip patterns. NFC This binds the SDNodeXForm to the ImmLeaf so we only need to mention the ImmLeaf in both the input and output pattern.	2021-02-10 17:52:07 -08:00
xgupta	0a80a6fb48	[Draft] [examples] Move llvm/examples/OCaml-Kaleidoscope/ to llvm-archive	2021-02-11 06:52:24 +05:30
Duncan P. N. Exon Smith	65e9e80474	ValueMapper: Rename RF_MoveDistinctMDs => RF_ReuseAndMutateDistinctMDs, NFC Rename the `RF_MoveDistinctMDs` flag passed into `MapValue` and `MapMetadata` to `RF_ReuseAndMutateDistinctMDs` in order to more precisely describe its effect and clarify the header documentation. Found this while helping to investigate PR48841, which pointed out an unsound use of the flag in `CloneModule()`. For now I've just added a FIXME there, but I'm hopeful that the new (more precise) name will prevent other similar errors.	2021-02-10 16:53:21 -08:00
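
The immediate equivalence quoted in commit dec9975f46 above (`#127 <=> #-129 <=> #0xffffffffffffff7f` for a `.b` element) can be checked with a little arithmetic. The following is only a minimal sketch, not the assembler's actual acceptance logic: the helper name `low_bits` is hypothetical, and it shows nothing more than that the three spellings share the same low 8 bits when reduced to a byte-sized element.

```python
def low_bits(value: int, bits: int) -> int:
    """Keep only the low `bits` bits of `value` (two's complement wraparound).

    Hypothetical helper for illustration; not part of LLVM.
    """
    return value & ((1 << bits) - 1)


# The three spellings of the immediate listed in the commit message.
spellings = [127, -129, 0xFFFFFFFFFFFFFF7F]

# A `.b` element is 8 bits wide, so reduce each spelling to 8 bits.
encoded = [low_bits(v, 8) for v in spellings]

print([hex(e) for e in encoded])  # prints ['0x7f', '0x7f', '0x7f']
```

All three values reduce to the same byte, `0x7f`, which is consistent with the commit's statement that the immediates are interchangeable for an 8-bit element.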

211060 Commits