llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 02:33:06 +01:00

Author	SHA1	Message	Date
David Blaikie	a9632b4cd8	PR51018: Remove explicit conversions from SmallString to StringRef to future-proof against C++23 C++23 will make these conversions ambiguous - so fix them to make the codebase forward-compatible with C++23 (& a follow-up change I've made will make this ambiguous/invalid even in <C++23 so we don't regress this & it generally improves the code anyway)	2021-07-08 13:37:57 -07:00
Vitaly Buka	dec7ba68ad	[msan] Handle funnel shifts Fixes https://bugs.llvm.org/show_bug.cgi?id=50840 Differential Revision: https://reviews.llvm.org/D105387	2021-07-08 12:49:49 -07:00
Vitaly Buka	4395169d3e	[msan] Add funnel shift tests For https://bugs.llvm.org/show_bug.cgi?id=50840	2021-07-08 12:49:49 -07:00
Alexey Bataev	684123ead3	[SLP]Improve vectorization of stores. Patch tries to improve the vectorization of stores. Originally, we just check the type and the base pointer of the store. Patch adds some extra checks to avoid non-profitable vectorization cases. It includes analysis of the scalar values to be stored and triggers the vectorization attempt only if the scalar values have same/alt opcode and are from same basic block, i.e. we don't end up immediately with the gather node, which is not profitable. This also improves compile time by filtering out non-profitable cases. Part of D57059. Differential Revision: https://reviews.llvm.org/D104122	2021-07-08 12:35:39 -07:00
Nikita Popov	d73135cade	[AMDGPU] Simplify GEP construction (NFC) Noticed while making a related change. This code was doing something really peculiar: Creating an APInt by parsing a string. And then creating a SmallVector with one element to create the GEP. Instead create the APInt from integers and directly pass the single index to GetElementPtrInst::Create().	2021-07-08 21:21:43 +02:00
Nikita Popov	b93e2e6eba	[AMDGPU] Pass explicit GEP type in printf transform (NFC) This code is working on an i8*. Avoid nullptr element type in preparation for removing support.	2021-07-08 21:21:43 +02:00
Nikita Popov	a0e0d56f9a	[NVPTX] Pass explicit GEP type (NFC) Use source element type of original GEP, as we're just changing the address space.	2021-07-08 21:21:43 +02:00
Alexey Bataev	9770438876	[SLP][COST][X86]Improve cost model for masked gather. Revived D101297 in its original form + added some changes in X86 legalization checking for masked gathers. This solution is the most stable and the most correct one. We have to check the legality before trying to build the masked gather in SLP. Without this check we have incorrect cost (for SLP) if the masked gather is not legal/slower than the gather. And we're missing some vectorization opportunities. This can be fixed in the cost model, but in this case we need to add special checks for the cost of GEPs for ScatterVectorize node, add special check for small trees, etc., i.e. there are a lot of corner cases here and there, which increase the code base and make it harder to maintain the code. > Can't we rely on cost model to deal with this? This can be profitable for further vectorization, when we can start from such gather loads as seed. The question from D101297. Actually, no, it can't. Actually, simple gather may give us better result, especially after we started vectorization of insertelements. Plus, like I said before, the cost for non-legal masked gathers leads to missed vectorization opportunities. Differential Revision: https://reviews.llvm.org/D105042	2021-07-08 11:53:30 -07:00
Craig Topper	c852776479	[ARM] Use matchSimpleRecurrence to simplify some code in MVEGatherScatterLowering. NFCI Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105262	2021-07-08 11:42:56 -07:00
Alexey Bataev	ffb982ed74	[X86][NFC]Add run lines for AVX512VL for masked gather test, NFC.	2021-07-08 11:30:31 -07:00
Stanislav Mekhanoshin	08c7e1d8d3	[AMDGPU] Fix more indentation in llc-pipeline test. NFC.	2021-07-08 11:20:00 -07:00
Michael Liao	bee0b38da8	[Metadata] Decorate methods with 'const'. NFC. - Minor coding style fix.	2021-07-08 14:11:14 -04:00
Stanislav Mekhanoshin	011e4165ea	[AMDGPU] Fix indentation in llc-pipeline test. NFC.	2021-07-08 11:08:25 -07:00
Matt Arsenault	4bfece2838	Mips/GlobalISel: Remove custom splitToValueTypes	2021-07-08 13:39:06 -04:00
Matt Arsenault	fc47c36984	GlobalISel: Track original argument index in ArgInfo SelectionDAG's equivalents in ISD::InputArg/OutputArg track the original argument index. Mips relies on this, and it's currently reinventing its own parallel CallLowering infrastructure which tracks these indexes on the side. Add this to help move towards deleting the custom mips handling.	2021-07-08 13:39:02 -04:00
Matt Arsenault	f41a104b55	Mips/GlobalISel: Use correct callee calling convention This was using the convention from the calling function.	2021-07-08 13:38:57 -04:00
Fangrui Song	3e6125b247	[LangRef] Fix typo about SHF_LINK_ORDER	2021-07-08 10:29:43 -07:00
Eli Friedman	915fc454ff	[ScalarEvolution] Fix overflow in computeBECount. There are two issues with the current implementation of computeBECount: 1. It doesn't account for the possibility that adding "Stride - 1" to Delta might overflow. For almost all loops, it doesn't, but it's not actually proven anywhere. 2. It doesn't account for the possibility that Stride is zero. If Delta is zero, the backedge is never taken; the value of Stride isn't relevant. To handle this, we have to make sure that the expression returned by computeBECount evaluates to zero. To deal with this, add two new checks: 1. Use a variety of tricks to try to prove that the addition doesn't overflow. If the proof is impossible, use an alternate sequence which never overflows. 2. Use umax(Stride, 1) to handle the possibility that Stride is zero. Differential Revision: https://reviews.llvm.org/D105216	2021-07-08 10:09:55 -07:00
Simon Pilgrim	5d69b0490b	[CostModel][X86] Account for older SSE targets with slow fp->int conversions Both the conversion cost and the xmm->gpr transfer cost tend to be a lot higher on early SSE targets	2021-07-08 18:08:24 +01:00
Fangrui Song	6f7d20de03	[LangRef] Clarify !associated Notably, a global variable with the metadata should generally not be referenced by a function. E.g. -fstack-size-section usage is fine, but -fsanitize-coverage= used to have a linker GC problem (fixed by D97430). Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D104933	2021-07-08 10:07:10 -07:00
Stanislav Mekhanoshin	ea5d414af6	[AMDGPU] Set LoopInfo as preserved by SIAnnotateControlFlow The pass does not change loops, it just adds calls. Differential Revision: https://reviews.llvm.org/D105583	2021-07-08 09:34:43 -07:00
Nikita Popov	4f2df3c6f8	[IR] Restore vector support for deprecated CreateGEP methods As pointed out in post-commit review on rG8e22539067d9, it's necessary to call getScalarType() to support GEPs with a vector base. Dropping that call was an oversight on my side.	2021-07-08 18:15:56 +02:00
Jeremy Morse	d7cf7abb78	[DebugInfo][InstrRef][4/4] Support DBG_INSTR_REF through all backend passes This is a cleanup patch -- we're now able to support all flavours of variable location in instruction referencing mode. This patch updates various tests for debug instructions to be broader: numerous code paths try to ignore debug instructions, and they now have to ignore the additional DBG_PHI and DBG_INSTR_REFs that we can generate. A small amount of rework happens for LiveDebugVariables: as we don't need to track live intervals through regalloc any more, we can get away with unlinking debug instructions before regalloc, then re-inserting them after. Note that this isn't (yet) true of DBG_VALUE_LISTs, they still have to go through live interval tracking. In SelectionDAG, add a helper lambda that emits half-formed DBG_INSTR_REFs for arguments in instr-ref mode, DBG_VALUE otherwise. This is one of the final locations where DBG_VALUEs are emitted for vreg arguments. X86InstrInfo now un-sets the debug instr number on SUB instructions that get mutated into CMP instructions. As the instruction no longer computes a subtraction, we can't use it for variable locations. Differential Revision: https://reviews.llvm.org/D88898	2021-07-08 16:42:24 +01:00
LLVM GN Syncbot	a3b5c3bb94	[gn build] Port 321c2ea91cb1	2021-07-08 15:35:54 +00:00
Tim Northover	7c89253a7a	Recommit: Support: add llvm::thread class that supports specifying stack size. This adds a new llvm::thread class with the same interface as std::thread except there is an extra constructor that allows us to set the new thread's stack size. On Darwin even the default size is boosted to 8MB to match the main thread. It also switches all users of the older C-style `llvm_execute_on_thread` API family over to `llvm::thread` followed by either a `detach` or `join` call and removes the old API. Moved definition of DefaultStackSize into the .cpp file to hopefully fix the build on some (GCC-6?) machines.	2021-07-08 16:22:26 +01:00
Alexey Bataev	e3e067e99e	[Instcombine]Transform reduction+(sext/zext(<n x i1>) to <n x im>) to [-]zext/trunc(ctpop(bitcast <n x i1> to in)) to im. Some of the SPEC tests end up with reduction+(sext/zext(<n x i1>) to <n x im>) pattern, which can be transformed to [-]zext/trunc(ctpop(bitcast <n x i1> to in)) to im. Also, reduction+(<n x i1>) can be transformed to ctpop(bitcast <n x i1> to in) & 1 != 0. Differential Revision: https://reviews.llvm.org/D105587	2021-07-08 07:56:41 -07:00
Michael Liao	36ed673592	[Internalize] Preserve variables externally initialized. - ``externally_initialized`` variables would be initialized or modified elsewhere. Particularly, CUDA or HIP may have host code to initialize or modify ``externally_initialized`` device variables, which may not be explicitly referenced on the device side but may still be used through the host side interfaces. Not preserving them triggers the elimination of them in the GlobalDCE and breaks the user code. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D105135	2021-07-08 10:48:19 -04:00
Alexey Bataev	9296dfb97c	[Instcombine][NFC]Add a test for reduce+([sext/zext](<n x i1>)) case, NFC.	2021-07-08 07:38:11 -07:00
Michael Liao	4ed4e3be71	[amdgpu] Remove the GlobalDCE pass prior to the internalization pass. - In [D98783](https://reviews.llvm.org/D98783), an extra GlobalDCE pass is inserted before the internalization pass to ensure a global variable without users could be internalized even if there are dead users. Instead of inserting a dedicated optimization pass, the dead user checking, i.e. 'use_empty()', should be preceded by constant dead user removal to ensure an accurate result. Differential Revision: https://reviews.llvm.org/D105590	2021-07-08 10:25:58 -04:00
Tim Northover	1b885b1ce7	Revert "Support: add llvm::thread class that supports specifying stack size." It's causing build failures because DefaultStackSize isn't defined everywhere it should be and I need time to investigate.	2021-07-08 14:59:47 +01:00
Tim Northover	43bfac999c	Support: add llvm::thread class that supports specifying stack size. This adds a new llvm::thread class with the same interface as std::thread except there is an extra constructor that allows us to set the new thread's stack size. On Darwin even the default size is boosted to 8MB to match the main thread. It also switches all users of the older C-style `llvm_execute_on_thread` API family over to `llvm::thread` followed by either a `detach` or `join` call and removes the old API.	2021-07-08 14:51:53 +01:00
xndcn	4bda00e90e	[NFC] Mark Expected<T>::assertIsChecked() as const Some const methods of Expected<T> invoke assertIsChecked(), so we should mark it as const too. Differential Revision: https://reviews.llvm.org/D105292	2021-07-08 21:30:23 +08:00
Bradley Smith	7f15962ed8	[AArch64][SVE] Add ISel patterns for floating point compare with zero instructions Additionally, lower the floating point compare SVE intrinsics to SETCC_MERGE_ZERO ISD nodes to avoid duplicating ISel patterns. Differential Revision: https://reviews.llvm.org/D105486	2021-07-08 10:46:12 +00:00
Max Kazantsev	c4087cbb99	[Test] Add loop deletion switch tests Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D105543	2021-07-08 17:28:08 +07:00
Moritz Sichert	2f6870edd6	[IR] Added operator delete to subclasses of User to avoid UB Several subclasses of User override operator new without also overriding operator delete. This means that delete expressions fall back to using operator delete of the base class, which would be User. However, this is only allowed if the base class has a virtual destructor which is not the case for User, so this is UB. See also [expr.delete] (3) for the exact wording. This is actually detected in some cases by GCC 11's -Wmismatched-new-delete now which is how I found this error. Differential Revision: https://reviews.llvm.org/D103143	2021-07-08 11:59:22 +02:00
Sebastian Neubauer	18736d1b57	[AMDGPU] Fix typo	2021-07-08 10:07:33 +02:00
Lang Hames	2d682bd2a2	[ORC] Introduce ExecutorAddress type, fix broken LLDB bot. ExecutorAddressRange depended on JITTargetAddress, but JITTargetAddress is defined in ExecutionEngine, which OrcShared should not depend on. This seems like as good a time as any to introduce a new ExecutorAddress type to eventually replace JITTargetAddress. For now it's just another uint64_t alias, but it will soon be changed to a class type to provide greater type safety.	2021-07-08 16:31:59 +10:00
Lang Hames	bee25fbe59	[ORC] Improve computeLocalDeps / computeNamedSymbolDependencies performance. The computeNamedSymbolDependencies and computeLocalDeps methods on ObjectLinkingLayerJITLinkContext are responsible for computing, for each symbol in the current MaterializationResponsibility, the set of non-locally-scoped symbols that are depended on. To calculate this we have to consider the effect of chains of dependence through locally scoped symbols in the LinkGraph. E.g. .text .globl foo foo: callq bar ## foo depends on external 'bar' movq Ltmp1(%rip), %rcx ## foo depends on locally scoped 'Ltmp1' addl (%rcx), %eax retq .data Ltmp1: .quad x ## Ltmp1 depends on external 'x' In this example symbol 'foo' depends directly on 'bar', and indirectly on 'x' via 'Ltmp1', which is locally scoped. Performance of the existing implementations appears to have been mediocre: Based on flame graphs posted by @drmeister (in #jit on the LLVM discord server) the computeLocalDeps function was taking up a substantial amount of time when starting up Clasp (https://github.com/clasp-developers/clasp). This commit attempts to address the performance problems in three ways: 1. Using jitlink::Blocks instead of jitlink::Symbols as the nodes of the dependencies-introduced-by-locally-scoped-symbols graph. Using either Blocks or Symbols as nodes provides the same information, but since there may be more than one locally scoped symbol per block the block-based version of the dependence graph should always be a subgraph of the Symbol-based version, and so faster to operate on. 2. Improved worklist management. The older version of computeLocalDeps used a fixed worklist containing all nodes, and iterated over this list propagating dependencies until no further changes were required. The worklist was not sorted into a useful order before the loop started. The new version uses a variable work-stack, visiting nodes in DFS order and only adding nodes when there is meaningful work to do on them. Compared to the old version the new version avoids revisiting nodes which haven't changed, and I suspect it converges more quickly (due to the DFS ordering). 3. Laziness and caching. Mappings of... jitlink::Symbol* -> Interned Name (as SymbolStringPtr) jitlink::Block* -> Immediate dependencies (as SymbolNameSet) jitlink::Block* -> Transitive dependencies (as SymbolNameSet) are all built lazily and cached while running computeNamedSymbolDependencies. According to @drmeister these changes reduced Clasp startup time in his test setup (averaged over a handful of starts) from 4.8 to 2.8 seconds (with ORC/JITLink linking ~11,000 object files in that time), which seems like enough to justify switching to the new algorithm in the absence of any other perf numbers.	2021-07-08 16:31:59 +10:00
Thomas Lively	d99937df5b	[WebAssembly] Optimize out shift masks WebAssembly's shift instructions implicitly mask the shift count, so optimize out redundant explicit masks of the shift count. For vector shifts, this currently only works if the mask is applied before splatting the shift count, but this should be addressed in a future commit. Resolves PR49655. Differential Revision: https://reviews.llvm.org/D105600	2021-07-07 23:14:31 -07:00
Lang Hames	760f860c3a	[ORC] Replace MachOJITDylibInitializers::SectionExtent with ExecutorAddressRange MachOJITDylibInitializers::SectionExtent represented the address range of a section as an (address, size) pair. The new ExecutorAddressRange type generalizes this to an address range (for any object, not necessarily a section) represented as a (start-address, end-address) pair. The aim is to express more of ORC (and the ORC runtime) in terms of simple types that can be serialized/deserialized via SPS. This will simplify SPS-based RPC involving arguments/return-values of these types.	2021-07-08 14:15:44 +10:00
Lang Hames	4c6599a274	[ORC] Fix file comments.	2021-07-08 14:15:44 +10:00
Patrick Holland	36584cd187	Revert "[MCA] [AMDGPU] Adding an implementation to AMDGPUCustomBehaviour for handling s_waitcnt instructions." Build failures when building with shared libraries. Reverting until I can fix. Differential Revision: https://reviews.llvm.org/D104730	2021-07-07 20:48:42 -07:00
Qiu Chaofan	6e1952e313	[PowerPC] Fix i64 to vector lowering on big endian Lowering for scalar to vector would skip if current subtarget is big endian and the scalar is larger than or equal to 64 bits. However, there's an issue in the implementation: SToVRHS may refer to SToVLHS's scalar size if SToVLHS is present, which leads to a crash. Reviewed By: nemanjai, shchenz Differential Revision: https://reviews.llvm.org/D105094	2021-07-08 11:05:09 +08:00
Nico Weber	11944c0204	[gn build] (manually) port ef16c8eaa5cd5679759 (MCACustomBehaviorAMDGPU)	2021-07-07 21:59:07 -04:00
Nico Weber	cde6b35842	[gn build] (semi-manually) port 966386514bec	2021-07-07 19:27:19 -04:00
Stanislav Mekhanoshin	dc43bb3409	[AMDGPU] Disable garbage collection passes Differential Revision: https://reviews.llvm.org/D105593	2021-07-07 15:47:57 -07:00
Fangrui Song	385913f9ce	[llvm-nm][test] Fix just-symbols.test	2021-07-07 15:04:18 -07:00
Arthur Eubanks	266a9a84be	[OpaquePtr] Use ArgListEntry::IndirectType for lowering ABI attributes Consolidate PreallocatedType and ByValType into IndirectType, and use that for inalloca.	2021-07-07 14:58:38 -07:00
Jinsong Ji	8dfa5bc073	[PowerPC] Add P7 RUN line for load and splat test	2021-07-07 21:43:46 +00:00
Arthur Eubanks	b3ffc2a93b	[OpaquePtr] Remove checking pointee type for byval/preallocated type These currently always require a type parameter. The bitcode reader already upgrades old bitcode without the type parameter to use the pointee type. In cases where the caller does not have byval but the callee does, we need to follow CallBase::paramHasAttr() and also look at the callee for the byval type so that CallBase::isByValArgument() and CallBase::getParamByValType() are in sync. Do the same for preallocated. While we're here add a corresponding version for inalloca since we'll need it soon. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104663	2021-07-07 14:28:55 -07:00
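
Note on 915fc454ff ([ScalarEvolution] Fix overflow in computeBECount): the message names two hazards, the `Delta + (Stride - 1)` addition wrapping and `Stride` being zero. The sketch below illustrates both in Python on an assumed 8-bit unsigned width. The function names are hypothetical, and the non-overflowing sequence shown is one well-known formulation chosen for illustration; it is not claimed to be the exact sequence the patch emits.

```python
BITS = 8                  # assumed width, small enough to enumerate every case
MASK = (1 << BITS) - 1


def naive_be_count(delta, stride):
    """ceil(delta / stride) computed as (delta + stride - 1) /u stride.

    The addition is truncated to BITS bits, so it can wrap around.
    Requires stride != 0.
    """
    return ((delta + stride - 1) & MASK) // stride


def safe_be_count(delta, stride):
    """ceil(delta / stride) without any intermediate that can wrap.

    umax(stride, 1) guards against a zero stride; a zero delta yields zero
    regardless of stride.
    """
    stride = max(stride, 1)      # umax(Stride, 1)
    nonzero = min(delta, 1)      # 1 when delta != 0, else 0
    return nonzero + (delta - nonzero) // stride
```

With `delta = 250` and `stride = 10`, the naive form wraps (250 + 9 exceeds 255) and returns 0, while the safe form returns 25.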


218189 Commits
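
Note on e3e067e99e ([Instcombine] reduction of sext/zext of `<n x i1>`): the identities in that message can be checked with a small Python model. This is a sketch under stated assumptions: vectors are modelled as lists of 0/1 lanes, the result width `bits` is an arbitrary choice, and all helper names are hypothetical and do not correspond to LLVM APIs.

```python
def reduce_add(lanes, bits):
    """Wrapping add-reduction of integer lanes, truncated to `bits` bits."""
    return sum(lanes) & ((1 << bits) - 1)


def ctpop_of_mask(mask_lanes):
    """Population count of an <n x i1> vector viewed as one n-bit integer."""
    packed = 0
    for i, bit in enumerate(mask_lanes):
        packed |= (bit & 1) << i      # models bitcast <n x i1> to iN
    return bin(packed).count("1")


def zext_lanes(mask_lanes):
    """zext each i1 lane: true becomes 1."""
    return [bit & 1 for bit in mask_lanes]


def sext_lanes(mask_lanes, bits):
    """sext each i1 lane: true becomes all-ones (-1) at width `bits`."""
    all_ones = (1 << bits) - 1
    return [all_ones if bit else 0 for bit in mask_lanes]
```

Under this model, the add-reduction of the zero-extended lanes equals the population count, the sign-extended case equals its negation modulo 2^bits, and reducing the i1 lanes directly equals the low bit of the population count.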