llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 03:02:36 +01:00

Author	SHA1	Message	Date
Carl Ritson	84e151045c	[AMDGPU] Refactor MIMG tables to better handle hardware variants Add mimgopc object to represent the opcode allowing different opcodes for different hardware variants. This enables image_atomic_fcmpswap, image_atomic_fmin, and image_atomic_fmax on GFX10 Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D96309	2021-02-11 13:22:41 +09:00
Kazu Hirata	296ee17661	[AsmPrinter] Use range-based for loops (NFC)	2021-02-10 20:01:22 -08:00
Kazu Hirata	be876e8742	[TableGen] Use ListSeparator (NFC)	2021-02-10 20:01:20 -08:00
Kazu Hirata	32c2f14548	[GCOV] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-02-10 20:01:18 -08:00
Craig Topper	c0c25b3ec4	[X86] Simplify patterns for avx512 vpcmp. NFC This removes the commuted PatFrags that only existed to carry an SDNodeXForm in its OperandTransform field. We know all the places that need to use the commuted SDNodeXForm and there is one transform shared by signed and unsigned compares. So just hardcode the SDNodeXForm where it is needed and use the non commuted PatFrag in the pattern. I think when I wrote this I thought the SDNodeXForm name had to match what is in the PatFrag that is being used. But that's not true. The OperandTransform is only used when the PatFrag is used in an instruction pattern and not a separate Pat pattern. All the commuted cases are Pat patterns.	2021-02-10 19:24:27 -08:00
Jessica Clarke	0547508f43	[RISCV] More whitespace and comment typo fixes in RISCVInstrInfoC.td	2021-02-11 02:32:36 +00:00
Jessica Clarke	de77b30b92	[RISCV] Fix whitespace in RISCVInstrInfoC.td	2021-02-11 02:23:09 +00:00
Craig Topper	c885998d0a	[RISCV] Use OperandTransform field of ImmLeaf to slightly simplify a couple bitmanip patterns. NFC This binds the SDNodeXForm to the ImmLeaf so we only need to mention the ImmLeaf in both the input and output pattern.	2021-02-10 17:52:07 -08:00
xgupta	0a80a6fb48	[Draft] [examples] Move llvm/examples/OCaml-Kaleidoscope/ to llvm-archive	2021-02-11 06:52:24 +05:30
Duncan P. N. Exon Smith	65e9e80474	ValueMapper: Rename RF_MoveDistinctMDs => RF_ReuseAndMutateDistinctMDs, NFC Rename the `RF_MoveDistinctMDs` flag passed into `MapValue` and `MapMetadata` to `RF_ReuseAndMutateDistinctMDs` in order to more precisely describe its effect and clarify the header documentation. Found this while helping to investigate PR48841, which pointed out an unsound use of the flag in `CloneModule()`. For now I've just added a FIXME there, but I'm hopeful that the new (more precise) name will prevent other similar errors.	2021-02-10 16:53:21 -08:00
Jessica Paquette	026e93e7b0	[AArch64][GlobalISel] Don't perform the mul const combine with G_PTR_ADD A G_MUL + G_PTR_ADD can also be folded into a madd. So, conservatively, we shouldn't combine when the G_MUL is used by a G_PTR_ADD either. Differential Revision: https://reviews.llvm.org/D96457	2021-02-10 15:30:45 -08:00
Arthur Eubanks	edd40c02bf	[docs] Make clearer in WritingAnLLVMPass that the legacy PM isn't the default Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D96452	2021-02-10 15:26:25 -08:00
Jessica Paquette	f217f0c3d4	[AArch64][GlobalISel] Perform load/store extended reg folding with optsize GlobalISel was only doing this with minsize. SDAG does this with optsize. (See: `SelectionDAG::shouldOptForSize()`) This is a 0.3% code size improvement for CTMark at -Os. (Best: 1.1% improvements on lencod + pairlocalalign) Differential Revision: https://reviews.llvm.org/D96451	2021-02-10 14:42:25 -08:00
Hongtao Yu	31388aa1ee	[CSSPGO] Restrict pseudo probe tests to x86_64 only.	2021-02-10 14:41:10 -08:00
Benjamin Kramer	540bd737f8	[SampleFDO] Silence -Wnon-virtual-dtor warning There's no polymorphic deletion happening here.	2021-02-10 23:37:15 +01:00
Arthur Eubanks	b21a1c4671	[opt] Add helpful alternatives for -analyze under new PM Reviewed By: reames Differential Revision: https://reviews.llvm.org/D96449	2021-02-10 14:09:17 -08:00
Jacques Pienaar	abf5d9af78	Revert "Make gCrashRecoveryEnabled thread local" This reverts commit 5e77ea04f214c7a18bd5c782c8b8a7b7c828ad7a. Causes a breakage on Windows buildbot.	2021-02-10 13:36:56 -08:00
Rong Xu	95894decb1	[SampleFDO][NFC] Refactor SampleProfileLoader to reuse in CodeGen Break SampleProfileLoader into a base and a derived class. Base class (SampleProfileLoaderBaseImpl) includes the common code for IR and MachineIR (CodeGen) sample loader. It will be templatized in the later patch. Inline and Probe related code will remain in the derived class of SampleProfileLoader and stay in SampleProfile.cpp. We need to refactor some functions: (1) getInstWeight() to enable the code sharing -- put the core into getInstWeightImpl(). (2) emitAnnotation() and propagateWeights() to carve out the code specific to SampleProfileLoader. (3) make getInstWeight() and findFunctionSamples() virtual and override in SampleProfileLoader as they need to access the fields in the derived class. Differential Revision: https://reviews.llvm.org/D95832	2021-02-10 13:29:15 -08:00
Jessica Paquette	984ed5a2c0	[AArch64][GlobalISel] Fold G_ADD into the cset for G_ICMP When we have a G_ADD which is fed by a G_ICMP on one side, we can fold it into the cset for the G_ICMP. e.g. Given ``` %cmp = G_ICMP ... %x, %y %add = G_ADD %cmp, %z ``` We would normally emit a cmp, cset, and add. However, `%add` is either `%z` or `%z + 1`. So, we can just use `%z` as the source of the cset rather than wzr, saving an instruction. This would probably be cleaner in AArch64PostLegalizerLowering, but we'd need to change the way we represent G_ICMP to do that, I think. For now, it's easiest to implement in selection. This is a 0.1% code size improvement on CTMark/pairlocalalign at -Os. Example: https://godbolt.org/z/7KdrP8 Differential Revision: https://reviews.llvm.org/D96388	2021-02-10 13:28:01 -08:00
Jacques Pienaar	ee4f11f288	Make gCrashRecoveryEnabled thread local If context is enabled/disabled and queried concurrently then this results in a data race/TSAN failure with RunSafely (where boolean variable was not locked). There doesn't seem to be a reasonable way to enable threads that enable and disable recovery in parallel (without also keeping gCrashRecoveryEnabled's lock held during Fn execution which seems undesirable). This makes the enabled check thread local and consistent with other thread local usage of crash context here. Differential Revision: https://reviews.llvm.org/D93907	2021-02-10 12:44:18 -08:00
Hongtao Yu	20454adfae	[CSSPGO] Unblock optimizations with pseudo probe instrumentation. The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking is intentional (such as blocking code merge) for better counts quality while the rest is accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include: 1. IR InstCombine, sinking load operation to shorten lifetimes. 2. MIR LiveRangeShrink, similar to #1 3. MIR TwoAddressInstructionPass, i.e., opeq transform 4. MIR function argument copy elision 5. IR stack protection (not perf-critical, but nice to have). Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95982	2021-02-10 12:43:17 -08:00
Adrian Prantl	b39bde1400	Add missing nullptr check. salvageDebugInfoImpl() may fail and return a nullptr.	2021-02-10 12:15:24 -08:00
Philip Reames	c0b69d1bf9	[SCEV] Add a missing AssumptionCache parameter The AssumptionCache mechanism is used to feed assumes into known bits computations. Most places in SCEV passed it in, but one place appears to have been missed. Spotted via inspection, don't have a test case which actually exercises this, but it seemed like an obvious fixit.	2021-02-10 12:08:55 -08:00
Sanjay Patel	eb0c4391c2	[InstCombine] fold lshr(mul X, SplatC), C2 This is a special-case multiply that replicates bits of the source operand. We need this fold to avoid regression if we make canonicalization to `mul` more aggressive for shl+or patterns. I did not see a way to make Alive generalize the bit width condition for even-number-of-bits only, but an example of the proof is: Name: i32 Pre: isPowerOf2(C1 - 1) && log2(C1) == C2 && (C2 * 2 == width(C2)) %m = mul nuw i32 %x, C1 %t = lshr i32 %m, C2 => %t = and i32 %x, C1 - 2 Name: i14 %m = mul nuw i14 %x, 129 %t = lshr i14 %m, 7 => %t = and i14 %x, 127 https://rise4fun.com/Alive/e52	2021-02-10 15:02:31 -05:00
Sanjay Patel	8d38c5205a	[InstCombine] add tests for lshr with mul; NFC	2021-02-10 15:02:31 -05:00
Jameson Nash	adecc4fab7	Renovate CMake files in the `llvm-exegesis` tool. This attempts to move all tools over to using `add_llvm_library` for better consistency. After doing this, I noticed it ended up as nearly a reimplementation of https://reviews.llvm.org/rL342148, which later got reverted in r342336 (b09a8c9bd9b819741b38071a7ccd95042ef2643a). With ccache and ninja on a large core machine (40), I haven't run into build errors, so I'm hopeful it's better now, though it doesn't seem to be any different / new. Reviewed By: stephenneuendorffer Differential Revision: https://reviews.llvm.org/D90970	2021-02-10 14:22:55 -05:00
Arthur Eubanks	31acdea7b2	[opt][NewPM] Add a --print-passes flag to print all available passes It seems nicer to list passes given a flag rather than displaying all passes in opt --help. This is awkwardly structured because a PassBuilder is required, but reusing the PassBuilder in runPassPipeline() doesn't work because we read the input IR before getting to runPassPipeline(). So printing the list of passes needs to happen before reading the input IR. If we remove the legacy PM code in main() and move everything from NewPMDriver.cpp into opt.cpp, we can create the PassBuilder before reading IR and check if we should print the list of passes and exit. But until then this hack seems fine. Compared to the legacy PM, the new PM passes are lacking descriptions. We'll need to figure out a way to add descriptions if we think this is important. Also, this only works for passes specified in PassRegistry.def. If we want to print other custom registered passes, we'll need a different mechanism. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D96101	2021-02-10 11:22:12 -08:00
Craig Topper	ca72db68cf	[RISCV] Remove superfluous semicolon. NFC	2021-02-10 11:20:29 -08:00
Nick Desaulniers	b7078d6c0f	[Thumb2] support `movs pc, lr` alias for `subs pc, lr, #0`/`eret` This is used by the Linux kernel built with CONFIG_THUMB2_KERNEL. Because different operands are not permitted to `movs`, the diagnostics now provide multiple suggestions along the lines of using a non-pc destination operand or lr source operand. Forked from D95586. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96304	2021-02-10 11:00:42 -08:00
Arthur Eubanks	be513286ab	Specify that some flags are legacy PM-specific Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D96100	2021-02-10 10:53:04 -08:00
Craig Topper	02464ec9ee	[RISCV] Add support for matching .vf forms of fadd/fsub/fmul/fdiv/fma for fixed vectors. fma+neg will come in a different patch since I haven't done it for .vv yet either. Differential Revision: https://reviews.llvm.org/D96375	2021-02-10 10:16:27 -08:00
Tom Stellard	3495902fa7	[CMake] Remove some dead code in llvm_install_library_symlink() Reviewed By: smeenai Differential Revision: https://reviews.llvm.org/D95666	2021-02-10 10:13:04 -08:00
Craig Topper	4df7502e0a	[RISCV] Add support for selecting vrgather.vx/vi for fixed vector splat shuffles. The test cases extract a fixed element from a vector and splat it into a vector. This gets DAG combined into a splat shuffle. I've used some very wide vectors in the test to make sure we have at least a couple tests where the element doesn't fit into the uimm5 immediate of vrgather.vi so we fall back to vrgather.vx. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96186	2021-02-10 10:01:56 -08:00
Fangrui Song	16779dce78	DebugInfo/Symbolize: Retrieve filename from the preceding STT_FILE for .symtab symbolization The ELF spec says: > STT_FILE: Conventionally, the symbol's name gives the name of the source file associated with the object file. A file symbol has STB_LOCAL binding, its section index is SHN_ABS, and it precedes the other STB_LOCAL symbols for the file, if it is present. For a local symbol, the preceding STT_FILE symbol is almost always in the same file[1]. GNU addr2line uses this heuristic to retrieve the filename associated with a local symbol (e.g. internal linkage functions in C/C++). GNU addr2line can assign STT_FILE filename to a non-local symbol, too, but the trick only works if no regular symbol precedes STT_FILE. This patch does not implement this corner case (not useful for most executables which have more than one file). In case of filename mismatch between .debug_line & .symtab, arbitrarily make .debug_line win. [1]: LLD does not synthesize STT_FILE symbols (https://bugs.llvm.org/show_bug.cgi?id=48023 see also https://sourceware.org/bugzilla/show_bug.cgi?id=26822). An assembly file without `.file` directives can cause mis-attribution. This is an edge case. Differential Revision: https://reviews.llvm.org/D95927	2021-02-10 09:47:10 -08:00
Fangrui Song	7ed6695b16	[llvm-cfi-verify] Set UseSymbolTable to false parseSectionContents expects to skip regions not described by DWARF. With my pending DebugInfo/Symbolize change, the filename can be recovered and there will be more IndirectInstructions entries.	2021-02-10 09:44:13 -08:00
Jeremy Morse	6523da3123	Reland [DWARF] Location-less inlined variables should not have DW_TAG_variable Originally landed in ddc2f1e3fb4 and reverted in d32deaab4d because of a Generic test objecting. That was fixed up in 013613964fd9. Original landing commit message follows: [DWARF] Location-less inlined variables should not have DW_TAG_variable Discussed in this thread: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html DwarfDebug::collectEntityInfo accidentally distinguishes between variable locations that never have a location specified, and variable locations that have an empty location specified. The latter leads to the creation of an empty variable referring to the abstract origin. Fix this by seeking a non-empty location before producing a concrete entity, to guarantee a DW_AT_location will be produced. Other loops in collectEntityInfo and endFunctionImpl take care of examining the retainedNodes collection and ensuring optimised-out variables are created. Differential Revision: https://reviews.llvm.org/D95617	2021-02-10 15:40:47 +00:00
Jay Foad	bba4fc0f2b	[AMDGPU] Add another test case for combining DS reads	2021-02-10 14:59:49 +00:00
Jay Foad	8f88ddb181	[AMDGPU] Fix comments in SILoadStoreOptimizer::offsetsCanBeCombined	2021-02-10 14:49:33 +00:00
Luís Marques	bca21a9503	[DAGCombiner] Don't fold FCOPYSIGN vector sign operand casts Avoid doing the following combine for vector types: ``` copysign(x, fp_extend(y)) -> copysign(x, y) copysign(x, fp_round(y)) -> copysign(x, y) ``` That combine seemed to impede the selection of vector instruction and cause a mess in some circumstances. Differential Revision: https://reviews.llvm.org/D96037	2021-02-10 14:25:24 +00:00
Nico Weber	84c12c061a	[gn build] (manually) port e89fcbfad6a3	2021-02-10 08:59:07 -05:00
Daniel Cederman	d254f97c03	[Sparc] Support relocatable expressions in the assembler Allow assembler expressions to start with an identifier. This allows for expressions such as ``` b symbol + 4 ``` and ``` mov symEnd - symStart, %g1 ``` The patch builds upon https://reviews.llvm.org/D47136. Reviewed By: joerg Differential Revision: https://reviews.llvm.org/D47458	2021-02-10 14:52:44 +01:00
Fraser Cormack	d37d68a83e	[RISCV] Add support for selecting vid.v from build_vector This patch optimizes a build_vector "index sequence" and lowers it to the existing custom RISCVISD::VID node. This pattern is common in autovectorized code. The custom node was updated to allow it to be used by both scalable and fixed-length vectors, thus avoiding pattern duplication. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96332	2021-02-10 10:58:40 +00:00
Jeremy Morse	43a3653e2e	Reapply [DebugInfo] Re-engineer a test to be stricter, add XFails Was e05c10380ce, reverted in d7d0b17de77, see D95617 for details. I've added "arm64" to the XFail list (as well as aarch64), will follow up on the mailing list about whether there's anything else to be done.	2021-02-10 10:46:58 +00:00
Simon Pilgrim	d02df5329d	Revert rGe1172959226689a "[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - merge VPERMILPD ops with different low/high masks." Revert this while I investigate a downstream breakage report.	2021-02-10 10:26:44 +00:00
Sander de Smalen	6613c13273	[LoopVectorize] NFC: Change computeFeasibleMaxVF to operate on ElementCount. This patch is NFC and changes occurrences of `unsigned MaxVectorSize` to work on type ElementCount. This patch is a preparatory patch with the ultimate goal of making `computeMaxVF()` return both a max fixed VF and a max scalable VF, so that `selectVectorizationFactor()` can pick the most cost-effective vectorization factor. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D96018	2021-02-10 08:52:10 +00:00
Sander de Smalen	fcaf7fc621	[ValueTypes] Add MVT for nxv1bf16. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96249	2021-02-10 08:50:41 +00:00
Sam Parker	ae7d261ac5	[WebAssembly] Enable loop unrolling Enable partial and runtime unrolling with a threshold of 30, which was derived from a large number of kernels running on node and wasmtime for amd64 and aarch64. Unrolling is enabled by default at -O2 and -O3 and is disabled at -Oz and -Os. Compiling with -Os is recommended if the wasm binary size is the most important factor. Differential Revision: https://reviews.llvm.org/D95125	2021-02-10 08:25:46 +00:00
Jessica Paquette	38cf82bcec	[AArch64][GlobalISel] Fold selects fed by G_PTR_ADD Similar to the case for G_ADD. There was a function in CTMark/pairlocalalign which was missing this case, causing GlobalISel to emit an add + csel when a csinc is all that is necessary. https://godbolt.org/z/ax69E9 Minor code size improvements on CTMark at -Os. Differential Revision: https://reviews.llvm.org/D96390	2021-02-10 00:03:13 -08:00
Kazu Hirata	4c2d405ebe	[SelectionDAG] Use range-based for loops (NFC)	2021-02-09 22:14:30 -08:00
Kazu Hirata	f3dfaa1d5d	[TableGen] Drop unnecessary const from return types (NFC)	2021-02-09 22:14:28 -08:00
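
The `[InstCombine] fold lshr(mul X, SplatC), C2` entry in the list above gives an Alive proof only for specific widths (i32 and i14). As a rough sanity check, not part of the commit, the same identity can be brute-forced in Python for a few small even bit widths. The helper name `fold_holds` is hypothetical, and the sketch assumes that inputs violating `nuw` (where the product overflows the width) are poison and can be skipped.

```python
def fold_holds(width: int) -> bool:
    """Check lshr(mul nuw x, C1), C2 == and x, C1 - 2 for every x of an even bit width.

    Assumes C2 = width / 2 and C1 = 2**C2 + 1, matching the precondition
    in the commit message. Products that overflow violate nuw and are skipped.
    """
    c2 = width // 2
    c1 = (1 << c2) + 1
    limit = 1 << width
    for x in range(limit):
        product = x * c1
        if product >= limit:
            continue  # nuw violated: result is poison, nothing to compare
        if (product >> c2) != (x & (c1 - 2)):
            return False
    return True


# Exhaustive over 8, 14, and 16 bits (at most 65536 values each).
results = {w: fold_holds(w) for w in (8, 14, 16)}
print(results)
```

For i14 this uses C1 = 129 and C2 = 7, the constants from the commit message. Wider types such as i32 would need a solver like Alive rather than enumeration.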

211170 Commits