llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 12:41:49 +01:00

Author	SHA1	Message	Date
Sanjay Patel	eb4a7d1736	[InstCombine] allow undef elements when comparing vector constants for min/max bailout This is a hacky, but low-risk fix to avoid the infinite loop in PR46271: https://bugs.llvm.org/show_bug.cgi?id=46271 As discussed there, the problem is that FoldOpIntoSelect() can get into a conflict with a transform that wants to pull a 'not' op through min/max via SimplifyDemandedVectorElts(). We need to relax our matching of min/max to include undefined elements in vector constants to avoid that. Alternatively, we could improve or cripple the demanded elements analysis, but that could create even more problems. The likely better, safer alternative will be to create min/max intrinsics, so we can remove all of the hacks related to min/max matching in instcombine. Differential Revision: https://reviews.llvm.org/D81698	2020-06-14 09:02:47 -04:00
Simon Pilgrim	226765afee	[X86][SSE] LowerVectorAllZeroTest - add support for pre-SSE41 targets Even without PTEST, we can still efficiently perform an OR reduction as PMOVMSKB(PCMPEQB(X,0)) == 0, avoiding xmm->gpr extractions.	2020-06-14 13:41:56 +01:00
Simon Pilgrim	7d91c0c67b	[X86][SSE] Add non-SSE41 target PTEST tests Ensure codegen is still reasonable - ideally we'd make use of MOVMSK for this.	2020-06-14 12:23:10 +01:00
Xing GUO	e16c61e2d3	[NFC] mv llvm/test/tools/obj2yaml/macho-DWARF-debug-ranges.yaml llvm/test/ObjectYAML/MachO/DWARF-debug_ranges.yaml	2020-06-14 16:39:15 +08:00
Xing GUO	6a0fa3e985	[ObjectYAML][DWARF] Let the target address size be inferred from FileHeader. This patch adds a new field `bool Is64bit` in `DWARFYAML::Data` to indicate the address size of target. It's helpful for inferring the `AddrSize` in some DWARF sections. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D81709	2020-06-14 12:42:20 +08:00
Fangrui Song	7106336a4c	[IteratedDominanceFrontier] Decrease number of SmallPtrSet::insert and delete unneeded SmallVector::clear Also, fix the argument name to be consistent with the declaration.	2020-06-13 19:48:50 -07:00
Craig Topper	12f0d14d7c	[X86] Add mayLoad flag to FARCALL*m/FARJMP memory instructions. Add 'm' to the end of FARJMP64/FARCALL64 instruction names. We never codegen them so this doesn't matter in practice. But sometimes someone comes along and tries to use these flags for something else. Like the Load Value Injection inline assembly handling.	2020-06-13 15:40:51 -07:00
Craig Topper	0f96f788ae	[X86] Automatically harden inline assembly RET instructions against Load Value Injection (LVI) Previously, the X86AsmParser would issue a warning whenever a ret instruction is encountered. This patch changes the behavior to automatically transform each ret instruction in an inline assembly stream into: shlq $0, (%rsp) lfence ret which is secure, according to https://software.intel.com/security-software-guidance/insights/deep-dive-load-value-injection#specialinstructions. Patch by Scott Constable with some minor changes by Craig Topper.	2020-06-13 15:16:05 -07:00
Craig Topper	1b3f3aebe1	[X86] Teach combineBitcastvxi1 to prefer movmsk on avx512 in more cases If the input to the bitcast is a sign bit test, it makes sense to directly use vpmovmskb or vmovmskps/pd. This removes the need to copy the sign bits to a k-register and then to a GPR. Fixes PR46200. Differential Revision: https://reviews.llvm.org/D81327	2020-06-13 14:50:13 -07:00
Craig Topper	e6c4a0d6e9	[X86] Move -x86-use-vzeroupper command line flag into runOnMachineFunction for the pass itself rather than the pass pipeline construction This pass has no dependencies on other passes so conditionally including it in the pipeline doesn't do much. Just move it into the pass itself to keep it isolated.	2020-06-13 14:42:41 -07:00
Roman Lebedev	6de1cdd580	[NFCI][AggressiveInstCombiner] Add `STATISTIC()`s for transforms	2020-06-13 23:53:16 +03:00
Florian Hahn	c23c96f75f	[DSE,MSSA] Fix location order in isOverwrite call. isOverwrite expects the later location as first argument and the earlier result later. The adjusted call is intended to check whether CC overwrites DefLoc.	2020-06-13 20:39:00 +01:00
Craig Topper	12bd6c2a43	[X86] Enable the EVEX->VEX compression pass at -O0. A lot of what EVEX->VEX does is equivalent to what the prioritization in the assembly parser does. When an AVX mnemonic is used without any EVEX features or XMM16-31, the parser will pick the VEX encoding. Since codegen doesn't go through the parser, we should also use VEX instructions when we can so that the code coming out of integrated assembler matches what you'd get from outputting an assembly listing and parsing it. The pass early outs if AVX isn't enabled and uses TSFlags to check for EVEX instructions before doing the more costly table lookups. Hopefully that's enough to keep this from impacting -O0 compile times.	2020-06-13 12:29:04 -07:00
Craig Topper	f7e2b5eebe	[X86] Separate imm from relocImm handling. relocImm was a complexPattern that handled both ConstantSDNode and X86Wrapper. But it was only applied selectively because using it would cause patterns to be not importable into FastISel or GlobalISel. So it only got applied to flag setting instructions, stores, RMW arithmetic instructions, and rotates. Most of the test changes are a result of making patterns available to GlobalISel or FastISel. The absolute-cmp.ll change is due to this fixing a pattern ordering issue to make an absolute symbol match to an 8-bit immediate before trying a 32-bit immediate. I tried to use PatFrags to reduce the repetition, but I was getting errors from TableGen.	2020-06-13 11:29:28 -07:00
Amanieu d'Antras	867f35bc0b	Fix FastISel dropping srcloc metadata from InlineAsm Summary: Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=46060 I've also added the Extra_IsConvergent flag which was missing from FastISel. Reviewers: echristo Reviewed By: echristo Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80759	2020-06-13 16:52:37 +01:00
Xing GUO	deb304b59f	Recommit "[DWARFYAML][debug_line] Replace `InitialLength` with `Format` and `Length`." This recommits fcc0c186e9cea0af644581069058f0e00469d20e	2020-06-13 23:39:11 +08:00
Xing GUO	370f4bd1aa	Revert "[DWARFYAML][debug_line] Replace `InitialLength` with `Format` and `Length`." This reverts commit fcc0c186e9cea0af644581069058f0e00469d20e.	2020-06-13 17:57:02 +08:00
Xing GUO	8aff593f62	[DWARFYAML][debug_line] Replace `InitialLength` with `Format` and `Length`.	2020-06-13 17:47:06 +08:00
Nikita Popov	f4f19b701b	Reapply [LVI] Restructure caching to fix non-determinism This was reverted due to a reported memory usage increase. However, a test case was never provided, and I wasn't able to reproduce it myself. Relative to the original patch, I have moved the block cache structure behind a unique_ptr, to avoid storing a huge structure inside a DenseMap. --- Variant on D70103 to fix https://bugs.llvm.org/show_bug.cgi?id=43909. The caching is switched to always use a BB to cache entry map, which then contains per-value caches. A separate set contains value handles with a deletion callback. This allows us to properly invalidate overdefined values. A possible alternative would be to always cache by value first and have per-BB maps/sets in the each cache entry. In that case we could use a ValueMap and would avoid the separate value handle set. I went with the BB indexing at the top level to make it easier to integrate D69914, but possibly that's not the right choice. Differential Revision: https://reviews.llvm.org/D70376	2020-06-13 11:31:40 +02:00
Craig Topper	232abcb694	[X86] Remove brand_id check from getHostCPUName. Brand index was a feature of some Pentium III and Pentium 4 CPUs. It provided an index into a software lookup table to provide a brand name for the CPU. This is separate from the family/model. It's unclear to me why this index being non-zero was used to block checking family/model. I think the effect of this is that -march=native was not working correctly on the CPUs that have a non-zero brand index. They are all about 20 years old so this probably hasn't affected many users.	2020-06-12 20:38:30 -07:00
Mehdi Amini	5f9fe3c1d0	Fix GCC5 build by renaming variable used in 'auto' deduction (NFC) GCC5 errors out with: llvm/lib/Analysis/StackSafetyAnalysis.cpp:935:21: error: use of 'KV' before deduction of 'auto' for (auto &KV : KV.second.Params) { ^	2020-06-13 03:08:56 +00:00
Dan Gohman	a6e436091a	[WebAssembly] WebAssembly doesn't support "protected" visibility Implement the `hasProtectedVisibility()` hook to indicate that, like Darwin, WebAssembly doesn't support "protected" visibility. On ELF, "protected" visibility is intended to be an optimization, however in practice it often [isn't], and ELF documentation generally ranges from [not mentioning it at all] to [strongly discouraging its use]. [isn't]: https://www.airs.com/blog/archives/307 [not mentioning it at all]: https://gcc.gnu.org/wiki/Visibility [strongly discouraging its use]: https://www.akkadia.org/drepper/dsohowto.pdf While here, also mention the new Reactor support in the release notes.	2020-06-12 19:52:35 -07:00
Craig Topper	fb25648f7b	[X86] Combine the three feature variables in getHostCPUName into an array and pass it around as an array reference. This makes the setting and clearing of bits simpler.	2020-06-12 18:30:41 -07:00
Vitaly Buka	5eed995b68	[StackSafety] Run ThinLTO Summary: ThinLTO linking runs dataflow processing on collected function parameters. Then StackSafetyGlobalInfoWrapperPass in ThinLTO backend will run as usual looking up to external symbol in the summary if needed. Depends on D80985. Reviewers: eugenis, pcc Reviewed By: eugenis Subscribers: inglorion, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D81242	2020-06-12 18:11:29 -07:00
Vitaly Buka	ee47bd7b36	[StackSafety,NFC] Extract addOverflowNever	2020-06-12 17:42:32 -07:00
Eric Christopher	c7d09d1011	Temporarily revert "[MemCpyOptimizer] Simplify API of processStore and processMem* functions" as it seems to be causing some internal crashes in AA after email with the author. This reverts commit f79e6a8847aa330cac6837168d02f6b319024858.	2020-06-12 14:01:27 -07:00
Roman Lebedev	132f6b3168	[NFCI][MachineCopyPropagation] invalidateRegister(): use SmallSet<8> instead of DenseSet. This decreases the time consumed by the pass [during RawSpeed unity build] by 25% (0.0586 s -> 0.04388 s). While that isn't really impressive overall, that wasn't the goal here. The memory results here are noticeable. The baseline results are: ``` total runtime: 55.65s. calls to allocation functions: 19754254 (354960/s) temporary memory allocations: 4951609 (88974/s) peak heap memory consumption: 239.13MB peak RSS (including heaptrack overhead): 463.79MB total memory leaked: 198.01MB ``` While with this patch the results are: ``` total runtime: 55.37s. calls to allocation functions: 19068237 (344403/s) # -3.47 % temporary memory allocations: 4261772 (76974/s) # -13.93 % (!!!) peak heap memory consumption: 239.13MB peak RSS (including heaptrack overhead): 463.73MB total memory leaked: 198.01MB ``` So we get rid of a lot of temporary allocations. Using `SmallSet<8>` makes sense to me because at least here for x86 BdVer2, the size of that set is never more than 3, over all of llvm test-suite + RawSpeed. The story might be different on other targets, not sure if it will ever justify whole DenseSet, but if it does SmallDenseSet might be a compromise.	2020-06-12 23:10:54 +03:00
Roman Lebedev	d7bb9dc47c	[NFCI] VectorCombine: add statistic for bitcast(shuf()) -> shuf(bitcast()) xform	2020-06-12 23:10:53 +03:00
Roman Lebedev	1a4a95dfd5	[NFC] OpenMPOpt: add a statistic for num of parallel regions deleted	2020-06-12 23:10:53 +03:00
Ronak Chauhan	94e53ef1f1	[MC] Changes to help improve target specific symbol disassembly Summary: This commit slightly modifies the MCDisassembler, and llvm-objdump to allow targets to also decode entire symbols. WebAssembly uses the onSymbolStart hook to decode preludes. WebAssembly partially disassembles the symbol in its target specific way; and then falls back to the normal flow of llvm-objdump. AMDGPU needs it to decode kernel descriptors entirely, and move to the next symbol. This commit is to split the above task into 2. - Changes to llvm-objdump and MC-layer without breaking WebAssembly code [ this commit ] - AMDGPU's implementation of onSymbolStart that decodes kernel descriptors. [ https://reviews.llvm.org/D80713 ] Reviewers: scott.linder, t-tye, sunfish, arsenm, jhenderson, MaskRay, aardappel Reviewed By: scott.linder, jhenderson, aardappel Subscribers: bcain, dschuff, wdng, tpr, sbc100, jgravelle-google, hiraditya, aheejin, MaskRay, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80512	2020-06-12 15:51:37 -04:00
Christopher Tetreault	3be3719b08	[SVE] Break dependency of Type.h on DerivedTypes.h Summary: Inline functions in Type.h depended upon inline functions isVectorTy and getScalarType defined in DerivedTypes.h. Reimplement these functions in Type.h in terms of Type Reviewers: rengolin, efriedma, echristo, c-rhodes, david-arm Reviewed By: echristo Subscribers: tschuett, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81684	2020-06-12 12:43:33 -07:00
David Blaikie	60f7d5f8b7	llvm-dwarfdump: Include unit count in DWP index header dumping And add comma separators (to be consistent with recent changes/improvements to the dumping of other section headers) while I'm here.	2020-06-12 12:40:02 -07:00
Michael Liao	7a293f5b3c	[amdgpu] Skip OR combining on 64-bit integer before legalizing ops. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81710	2020-06-12 15:22:38 -04:00
Erich Keane	3960ffe231	Update Kaleidoscope tutorial inline code Reported on IRC, the tutorial code at the bottom of the page correctly namespaces the FunctionPassManager, but the as-you-go code does not. This patch adds the namespace to those.	2020-06-12 12:02:35 -07:00
Amara Emerson	d1a0fd3633	[AArch64][GlobalISel] Legalize vector G_PTR_ADD and enable selection. Differential Revision: https://reviews.llvm.org/D81419	2020-06-12 11:25:17 -07:00
David Green	a70a741e88	[ARM] Always use reductions intrinsics under MVE Similar to a recent change to the X86 backend, this changes things so that we always produce a reduction intrinsics for all reduction types, not just the legal ones. This gives a better chance in the backend to custom lower them to something more suitable for MVE. Especially for something like fadd the in-order reduction produced during DAG lowering is already better than the shuffles produced in the midend, and we can do even better with a bit of custom lowering. Differential Revision: https://reviews.llvm.org/D81398	2020-06-12 19:21:17 +01:00
Daniel Grumberg	4e65793a85	[TableGen] Make behavior of getValueAsListOfStrings consistent with getValueAsString	2020-06-12 19:16:48 +01:00
Louis Dionne	db01bed9bb	[Lit] Pass through SSH_AUTH_SOCK from the surrounding environment This allows running Lit tests that run ssh without having to manually enter a password (which is inconvenient), by just having ssh-agent setup properly when running the test suite.	2020-06-12 13:59:29 -04:00
Michael Liao	2c215416cc	[DAGCombine] Generalize the case (add (or x, c1), c2) -> (add x, (c1 + c2)) Reviewers: arsenm Subscribers: sdardis, wdng, hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, ecnelises, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81708	2020-06-12 13:53:08 -04:00
Jessica Paquette	3f385bc0de	[AArch64][GlobalISel] Allow G_DUP for elements smaller than 32 B. We select all of these via patterns now, so there's no reason to disallow this. Update select-dup.mir to show that we correctly select the smaller types. Differential Revision: https://reviews.llvm.org/D81322	2020-06-12 09:40:34 -07:00
Jessica Paquette	8a20f4e977	[AArch64][GlobalISel] Set hasSideEffects = 0 on custom shuffle opcodes This was making it so that the instructions weren't eliminated in select-rev.mir and select-trn.mir despite not being used. Update the tests accordingly. Differential Revision: https://reviews.llvm.org/D81492	2020-06-12 09:39:46 -07:00
Huihui Zhang	5e72cbc6bb	[NFC] Silence compiler warning [-Wmissing-braces]. llvm/lib/Target/AArch64/AArch64SLSHardening.cpp:146:5: warning: suggest braces around initialization of subobject [-Wmissing-braces] "__llvm_slsblr_thunk_x0", "__llvm_slsblr_thunk_x1", ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ { llvm/lib/Target/AArch64/AArch64SLSHardening.cpp:168:5: warning: suggest braces around initialization of subobject [-Wmissing-braces] AArch64::X0, AArch64::X1, AArch64::X2, AArch64::X3, AArch64::X4, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {	2020-06-12 08:55:03 -07:00
Matt Arsenault	a4ac51f83e	GlobalISel: Fix not erasing old instruction in sitofp/uitofp lowering	2020-06-12 10:33:23 -04:00
Masoud Ataei	603bc25832	DAGCombiner optimization for pow(x,0.75) and pow(x,0.25) on double and single precision even in case massv function is asked Here, I am proposing to add a special case for the massv powf4/powd2 functions (SIMD counterparts of the powf/pow functions in the MASSV library) in the MASSV pass to get later optimizations like conversion from pow(x,0.75) and pow(x,0.25) for double and single precision to a sequence of sqrt's in the DAGCombiner in the vector float case. My reason for doing this is: the optimized pow(x,0.75) and pow(x,0.25) for double and single precision to a sequence of sqrt's is faster than powf4/powd2 on P8 and P9. In case a MASSV function is called, and if the exponent of pow is 0.75 or 0.25, we will get the sequence of sqrt's and if the exponent is not 0.75 or 0.25 we will get the appropriate MASSV function. Reviewed By: steven.zhang Tags: #LLVM #PowerPC Differential Revision: https://reviews.llvm.org/D80744	2020-06-12 10:02:16 -04:00
Simon Pilgrim	558c76c79d	[DAG] foldAddSubOfSignBit - add support for non-uniform vector constants	2020-06-12 14:58:15 +01:00
Simon Pilgrim	3daa1659ab	[X86] Add non-uniform vector signbit test cases	2020-06-12 14:58:15 +01:00
Joel E. Denny	962e0146a4	[lit] Fix handling of various keyword parse errors In TestRunner.py, D78589 extracts a `_parseKeywords` function from `parseIntegratedTestScript`, which then expects `_parseKeywords` to always return a list of keyword/value pairs. However, the extracted code sometimes returns an unresolved `lit.Test.Result` on a keyword parsing error, which then produces a stack dump instead of the expected diagnostic. This patch fixes that, makes the style of those diagnostics more consistent, and extends the lit test suite to cover them. Reviewed By: ldionne Differential Revision: https://reviews.llvm.org/D81665	2020-06-12 09:37:40 -04:00
Marco Elver	634f94db04	[ASan][NFC] Refactor redzone size calculation Refactor redzone size calculation. This will simplify changing the redzone size calculation in future. Note that AddressSanitizer.cpp violates the latest LLVM style guide in various ways due to capitalized function names. Only code related to the change here was changed to adhere to the style guide. No functional change intended. Reviewed By: andreyknvl Tags: #llvm Differential Revision: https://reviews.llvm.org/D81367	2020-06-12 15:33:00 +02:00
Xing GUO	6780e06d24	[ObjectYAML][DWARF] Add one helper function `writeInitialLength()`. NFC.	2020-06-12 21:10:58 +08:00
Simon Pilgrim	7956ed7a6d	[X86][SSE] combineX86ShuffleChain - combine INSERT_VECTOR_ELT patterns to INSERTPS Noticed while trying to cleanup D66004 - if a shuffle operand came from a scalar, we're better off using INSERTPS vs UNPCKLPS as this is more likely to load fold later on. It also matches our existing BUILD_VECTOR lowering. We can extend this to other PINSRB/D/Q/W cases in the future as the need arises.	2020-06-12 11:59:01 +01:00
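
As an illustration of the pow(x,0.75) / pow(x,0.25) rewrite mentioned in commit 603bc25832 above, here is a minimal scalar sketch of the underlying identity. It is not the DAGCombiner code: the function names `pow_075` and `pow_025` are hypothetical, the sketch assumes x >= 0, and the compiler's own legality conditions (for example around floating-point flags) are not modeled.

```python
import math


def pow_075(x: float) -> float:
    """Approximate x**0.75 as sqrt(x) * sqrt(sqrt(x)); assumes x >= 0."""
    root = math.sqrt(x)            # x**0.5
    return root * math.sqrt(root)  # x**0.5 * x**0.25


def pow_025(x: float) -> float:
    """Approximate x**0.25 as sqrt(sqrt(x)); assumes x >= 0."""
    return math.sqrt(math.sqrt(x))


if __name__ == "__main__":
    # Compare against the library pow for a few non-negative inputs.
    for value in (0.0, 1.0, 2.0, 16.0, 1e6):
        assert math.isclose(pow_075(value), math.pow(value, 0.75), rel_tol=1e-12)
        assert math.isclose(pow_025(value), math.pow(value, 0.25), rel_tol=1e-12)
```

The square-root form can differ from a correctly rounded pow in the last bits, which is why the comparison uses a tolerance instead of exact equality.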

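
The fold named in commit 2c215416cc above, (add (or x, c1), c2) -> (add x, (c1 + c2)), can be checked on plain integers. This is a sketch under an assumption the commit title does not spell out: an `or` behaves like an `add` only when x and c1 share no set bits. The helper names and the 32-bit width are hypothetical choices for the illustration, not LLVM APIs.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit wrap-around arithmetic


def add_of_or(x: int, c1: int, c2: int) -> int:
    """Original form: (x | c1) + c2, truncated to 32 bits."""
    return ((x | c1) + c2) & MASK32


def add_folded(x: int, c1: int, c2: int) -> int:
    """Folded form: x + (c1 + c2), truncated to 32 bits."""
    return (x + ((c1 + c2) & MASK32)) & MASK32


def no_common_bits(x: int, c1: int) -> bool:
    """Assumed precondition: x and c1 have disjoint set bits."""
    return (x & c1) == 0


if __name__ == "__main__":
    # With disjoint bits, both forms agree.
    assert no_common_bits(0xF0, 0x0F)
    assert add_of_or(0xF0, 0x0F, 1) == add_folded(0xF0, 0x0F, 1)
    # With overlapping bits, the fold would change the result.
    assert not no_common_bits(1, 1)
    assert add_of_or(1, 1, 1) != add_folded(1, 1, 1)
```

The second pair of assertions shows why the precondition matters: when the bits overlap, `x | c1` is smaller than `x + c1`, so the two forms diverge.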

198435 Commits