llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00

Author	SHA1	Message	Date
Ahmed Bougacha	95630b78aa	[AArch64] Add apple-m1 CPU, and default to it for macOS. apple-m1 has the same level of ISA support as apple-a14, so this is a straightforward mechanical change. However, that also means this inherits apple-a14's v8.5a+nobti quirkiness. rdar://68287159	2021-04-20 08:41:04 -07:00
LLVM GN Syncbot	bc0540ed2d	[gn build] Port 120fa8293e22	2021-04-20 15:33:43 +00:00
Matt Arsenault	e229e7cbd1	GlobalISel: Check for powers of 2 for inverse funnel shift lowering This doesn't make a practical difference since it would only be broken if a target actually had a legal non-power-of-2 inverse shift.	2021-04-20 11:30:22 -04:00
Alexey Bataev	86ad0ff420	Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." This reverts commit daf6e18c55c2ac56bbf0f9de233fb2a1150ee331 to fix the compiler crash.	2021-04-20 08:29:32 -07:00
David Green	0f24d11c47	[ARM] Limit PerformExtractEltToVMOVRRD to when f64 is legal. The generic SoftFloatVectorExtract.ll test was failing when run on arm machines, as it tries to create a f64 under soft float. Limit the transform to when f64 is legal. Also add a missing override, as reported in D100244.	2021-04-20 16:24:36 +01:00
Matt Arsenault	51148e14d7	AMDGPU/GlobalISel: Fix uitofp/sitofp with non-power-of-2 integers	2021-04-20 11:13:29 -04:00
Matt Arsenault	ad2346a20c	GlobalISel: Restrict narrow scalar for fptoui/fptosi results This practically only works for the f16 case AMDGPU uses, not wider types. Fixes bug 49710 by failing legalization.	2021-04-20 10:54:40 -04:00
Matt Arsenault	73b19968f2	MachineVerifier: Continue reporting errors for copies This was skipping verification of later copies, but generally the verifier tries to report as many things wrong as possible in the function.	2021-04-20 10:54:40 -04:00
Alexey Bataev	746009e1c5	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 07:46:49 -07:00
Nico Weber	14ab5acbae	[gn build] reformat all gn files $ git ls-files '.gn' '.gni' | xargs llvm/utils/gn/gn.py format (and manually wrap two comments)	2021-04-20 10:34:08 -04:00
Bradley Smith	935e65cc31	[AArch64][SVE] Lower MULHU/MULHS nodes to umulh/smulh instructions Mark MULHS/MULHU nodes as legal for both scalable and fixed SVE types, and lower them to the appropriate SVE instructions. Additionally now that the MULH nodes are legal, integer divides can be expanded into a more performant code sequence. Differential Revision: https://reviews.llvm.org/D100487	2021-04-20 15:18:06 +01:00
Alexey Bataev	7d693d1c6d	Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." This reverts commit b232771acad6225574a2eaf9f860a0fed7ef0804 to fix buildbots.	2021-04-20 07:16:11 -07:00
David Green	17e932c916	[ARM] Create VMOVRRD from adjacent vector extracts This adds a combine for extract(x, n); extract(x, n+1) -> VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the same time in a single instruction, and thanks to the other VMOVRRD folds we have added recently can help reduce the amount of executed instructions. Floating point types are very similar, but will include a bitcast to an integer type. This also adds a shouldRewriteCopySrc, to prevent copy propagation from DPR to SPR, which can break as not all DPR regs can be extracted from directly. Otherwise the machine verifier is unhappy. Differential Revision: https://reviews.llvm.org/D100244	2021-04-20 15:15:43 +01:00
Alexey Bataev	4b716fe7c5	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 06:55:55 -07:00
Cullen Rhodes	a845a72a5b	[AArch64][AsmParser] NFC: Remove unused ExtendOp struct Left over from 2625a993f926 when extend and shift were merged.	2021-04-20 13:45:09 +00:00
Thomas Preud'homme	2c0468b550	Fix PR46880: Fail CHECK-NOT with undefined variable Currently a CHECK-NOT directive succeeds whenever the corresponding match fails. However match can fail due to an error rather than a lack of match, for instance if a variable is undefined. This commit makes match error a failure for CHECK-NOT. Reviewed By: jdenny Differential Revision: https://reviews.llvm.org/D86222	2021-04-20 14:42:46 +01:00
Sebastian Neubauer	01dd9602ed	[AMDGPU] Add TransVALU to gfx10 Instructions on the transcendental unit are executed in parallel to the normal VALU, so add this as an extra resource. This doesn't seem to have any effect, but it should be more correct. Differential Revision: https://reviews.llvm.org/D100123	2021-04-20 15:34:43 +02:00
Fraser Cormack	67e5cf2db6	[RISCV][NFC] Add tests for scalable-vector DAGCombiner improvements These will all be improved by future patches.	2021-04-20 14:26:26 +01:00
Jay Foad	c58ec16acd	[AMDGPU] Use if instead of foreach in a few places. NFC.	2021-04-20 14:20:30 +01:00
Andrea Di Biagio	f4f953260a	[MCA][LSUnit] Fix a potential use after free in the logic that updates memory groups. Make sure that the `CriticalMemoryInstruction` of a memory group is invalidated if it references an already executed instruction. This avoids a potential use-after-free if the critical memory info becomes stale, and the value is read after the instruction has executed.	2021-04-20 13:30:45 +01:00
Nemanja Ivanovic	9a29810062	[PowerPC] Canonicalize shuffles on big endian targets as well Extend shuffle canonicalization and conversion of shuffles fed by vectorized scalars to big endian subtargets. For big endian subtargets, loads and direct moves of scalars into vector registers put the data in the correct element for SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is narrower, the value still ends up in the wrong place - although a different wrong place than on little endian targets. This patch extends the combine that keeps values where they are if they feed a shuffle to big endian targets. Differential revision: https://reviews.llvm.org/D100478	2021-04-20 07:29:47 -05:00
Nico Weber	74a701ab05	[llvm-objdump] Add an llvm-otool tool This implements an LLVM tool that's flag- and output-compatible with macOS's `otool` -- except for bugs, but from testing with both `otool` and `xcrun otool-classic`, llvm-otool matches vanilla otool's behavior very well already. It's not 100% perfect, but it's a very solid start. This uses the same approach as llvm-objcopy: llvm-objdump uses a different OptTable when it's invoked as llvm-otool. This is possible thanks to D100433. Differential Revision: https://reviews.llvm.org/D100583	2021-04-20 08:24:58 -04:00
Cullen Rhodes	2e96104338	[ValueTypes] Fix sizes of v256i32 and v256f32 (8182 -> 8192)	2021-04-20 12:10:02 +00:00
Jay Foad	b4774cf0e1	[AMDGPU] Use simpler alternatives to !foldl. NFC.	2021-04-20 12:59:04 +01:00
Simon Pilgrim	a482e5c5af	[DAG] SelectionDAG.cpp - breakup if-else chains where each block returns. NFCI. Match style guide that requests that if+return blocks are separate.	2021-04-20 12:37:00 +01:00
Thomas Preud'homme	23f47cc464	[lit, test] Fix test cancellation feature detection A lit feature guards tests for the lit timeout functionality because on most systems it depends on the availability of the psutil Python module. However, that feature is defined based on the ability of the testing lit to cancel tests, which does not necessarily apply to the ability of the tested lit. In particular, RUN commands have a cleared PYTHONPATH and user site packages are disabled. In the case where psutil is found by the testing lit from one of those two sources of Python path, the tested lit would not be able to find it, causing timeout tests to fail. This commit fixes the issue by testing the ability to cancel tests in the RUN command environment. Reviewed By: yln Differential Revision: https://reviews.llvm.org/D99728	2021-04-20 12:09:30 +01:00
hsmahesha	bdd9b3f551	[AMDGPU] Re-arrange ds_read/ds_write ISel pattern for better readability. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D100773	2021-04-20 16:17:15 +05:30
Dávid Bolvanský	5cbb47ede4	[MemoryBuiltins] Added support for memalign memalign is older aligned_alloc.	2021-04-20 12:39:54 +02:00
Simon Pilgrim	a2304db8c0	[Support] APInt.h - remove <algorithm> include. NFCI. Replace std::min use which should allow us to avoid including the <algorithm> header in every include of APInt.h.	2021-04-20 11:21:39 +01:00
Simon Pilgrim	246fbb1f2f	[CodeGen] CodeGenPassBuilder.h - remove unnecessary <string> include. NFCI. We only use StringRef so include that.	2021-04-20 11:21:39 +01:00
Ben Shi	9a8c38e0c8	[RISCV] Refactor an optimization of addition with immediate Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100769	2021-04-20 18:04:25 +08:00
Joe Ellis	a248bdd4af	[AArch64] Constant fold sve_convert_from_svbool(zero) to zero Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D100463	2021-04-20 10:02:49 +00:00
Joe Ellis	c0b6134f2d	[AArch64][SVE][InstCombine] Replace last{a,b} intrinsics with extracts... when the predicate used by last{a,b} specifies a known vector length. For example: aarch64_sve_lasta(VL1, D) -> extractelement(D, #1) aarch64_sve_lastb(VL1, D) -> extractelement(D, #0) Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D100476	2021-04-20 10:01:33 +00:00
David Green	e05ba7b8c6	[ARM] Regenerate a couple of tests. NFC	2021-04-20 10:54:41 +01:00
Simon Pilgrim	386c069a3e	[Support] BinaryStreamReader.h - remove unnecessary <string> include. NFCI. We only use StringRef so include that.	2021-04-20 10:31:12 +01:00
Serguei Katkov	58a3e2d34c	Re-land [GreedyRA ORE] Add Cost of spill locations into remark Re-land the patch with a fix of clang test. Cost of spill location is computed based on the relative branch frequency where the corresponding spill/reload/copy are located. While the number itself highly depends on the incoming IR, the total cost can be used when making changes in RA. Revert "Revert "[GreedyRA ORE] Add Cost of spill locations into remark"" This reverts commit 680f3d6de79f7dd75ee0cda256a541d18e504a22.	2021-04-20 16:21:07 +07:00
Fraser Cormack	0bb73d1f7a	[RISCV] Fix missing emergency slots for scalable stack offsets This patch adds an additional emergency spill slot to RVV code. This is required as RVV stack offsets may require an additional register to compute. This patch includes an optimization by @HsiangKai <kai.wang@sifive.com> to reduce the number of registers required for the computation of stack offsets from 3 to 2. Otherwise we'd need two additional emergency spill slots. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D100574	2021-04-20 09:59:41 +01:00
Sander de Smalen	df925eac48	[LV] Let selectVectorizationFactor reason directly on VectorizationFactor. Rather than maintaining two separate values, a `float` for the per-lane cost and a Width for the VF, maintain a single VectorizationFactor which comprises the two and also removes the need for converting an integer value to float. This simplifies the query when asking if one VF is more profitable than another when we want to extend this for scalable vectors (which may require additional options to determine if e.g. a scalable VF of the same cost, is more profitable than a fixed VF of the same cost). The patch isn't entirely NFC because it also fixes an issue in selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs no longer truncates the floating-point cost from `float` to `unsigned` to then perform the calculation on the truncated cost. It now does a cost comparison with the correct precision. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100121	2021-04-20 09:54:45 +01:00
Qiu Chaofan	ae1dd01644	[PowerPC] Use mtvsrdd to put callee-saved GPR into VSR This patch exploits mtvsrdd instruction (available in ISA3.0+) to save two callee-saved GPR registers into a single VSR, making it more efficient. Reviewed By: jsji, nemanjai Differential Revision: https://reviews.llvm.org/D62565	2021-04-20 16:43:24 +08:00
Jun Ma	12a1ce706d	[DAGCombiner] Support fold zero scalar vector. This patch changes ISD::isBuildVectorAllZeros to ISD::isConstantSplatVectorAllZeros which handles zero scalar vector. TestPlan: check-llvm Differential Revision: https://reviews.llvm.org/D100813	2021-04-20 16:28:43 +08:00
Jay Foad	c7f91a7bf2	[AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used Don't shrink VOP3 instructions if there are any uses of a carry-out operand, because the shrunken form of the instruction would write the carry-out to vcc instead of to a virtual register. Differential Revision: https://reviews.llvm.org/D100760	2021-04-20 09:17:52 +01:00
Luo, Yuanke	eed19ff5a2	[X86][AMX] Verify illegal types or instructions for x86_amx. This patch is related to https://reviews.llvm.org/D100032 which define some illegal types or operations for x86_amx. There are no arguments, arrays, pointers, vectors or constants of x86_amx. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D100472	2021-04-20 16:14:22 +08:00
Arthur Eubanks	8d6572654c	Explicitly pass type to cast load constant folding result Previously we would use the type of the pointee to determine what to cast the result of constant folding a load. To aid with opaque pointer types, we should explicitly pass the type of the load rather than looking at pointee types. ConstantFoldLoadThroughBitcast() converts the const prop'd value to the proper load type (e.g. [1 x i32] -> i32). Instead of calling this in every intermediate step like bitcasts, we only call this when we actually see the global initializer value. In some existing uses of this API, we don't know the exact type we're loading from immediately (e.g. first we visit a bitcast, then we visit the load using the bitcast). In those cases we have to manually call ConstantFoldLoadThroughBitcast() when simplifying the load to make sure that we cast to the proper type. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D100718	2021-04-20 00:53:21 -07:00
Qiu Chaofan	d26cbdc561	[PowerPC] Support f128 under VSX This patch is the last one in backend to support fp128 type in pre-POWER9 subtargets with VSX, removing temporary option and updating remaining tests. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92374	2021-04-20 15:49:52 +08:00
Fraser Cormack	d36c3da976	[SelectionDAG] Relax constraints on STEP_VECTOR step operand This patch relaxes the requirement that the STEP_VECTOR step constant must be of a type at least as large as the vector element type. This does not permit its use on targets which have legal vector element types larger than the largest legal scalar type, such as i64 vectors on RV32. As such, the requirement has been loosened so that the step operand must be any scalar type so long as the constant immediate is non-negative and the value fits inside the vector element type. This limits combining optimizations in certain circumstances but in practice it's unlikely to be a hindrance. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D100660	2021-04-20 08:41:42 +01:00
Zi Xuan Wu	ca9ceef593	[CSKY 6/n] Add support branch and symbol series instruction This patch adds basic CSKY branch instructions and symbol address series instructions. These two kinds of instructions are related to each other, and supporting them involves substantial work on fixups. For now, basic instructions are enabled except for disassembler support. Basic codegen asm generation will be supported first, with disassembler work deferred. Differential Revision: https://reviews.llvm.org/D95029	2021-04-20 15:36:49 +08:00
Zi Xuan Wu	2b6d3d4a3f	[CSKY 5/n] Add support for all CSKY basic integer instructions except for branch series This patch adds basic CSKY integer instructions except for branch series such as bsr, br. It mainly includes basic ALU, load & store, compare and data move instructions. Branch series instructions need to handle complex symbol operands and will follow in a later patch. Differential Revision: https://reviews.llvm.org/D94007	2021-04-20 15:36:49 +08:00
Zi Xuan Wu	cc37252d6a	[CSKY 4/n] Add basic CSKYAsmParser and CSKYInstPrinter This basic parser will handle basic instructions with register or immediate operands. With the addition of CSKYInstPrinter, we can now make use of lit tests. Differential Revision: https://reviews.llvm.org/D93798	2021-04-20 15:36:49 +08:00
Max Kazantsev	135bc13710	[NFC] Restructure code to make it possible to insert other GCs	2021-04-20 14:24:38 +07:00
Luo, Yuanke	c35d11aa02	[X86][AMX] Add description of x86_amx to LangRef. Differential Revision: https://reviews.llvm.org/D100032	2021-04-20 14:29:17 +08:00
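
The precision issue described in commit df925eac48 (comparing costs after truncating a floating-point cost to an unsigned integer) can be illustrated with a small sketch. This is not LLVM's code: the `VectorizationFactor` class below, its `cost` field (assumed here to be the total cost at that width), and both comparison helpers are hypothetical stand-ins chosen only to show why truncation can hide a real difference in per-lane cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorizationFactor:
    """Hypothetical stand-in: a vector width paired with its cost."""
    width: int
    cost: float  # assumed: total cost of the loop body at this width


def more_profitable(a: VectorizationFactor, b: VectorizationFactor) -> bool:
    # Compare per-lane costs (cost / width) by cross-multiplying,
    # keeping full floating-point precision.
    return a.cost * b.width < b.cost * a.width


def more_profitable_truncated(a: VectorizationFactor,
                              b: VectorizationFactor) -> bool:
    # Same comparison, but each cost is truncated to an integer first,
    # mimicking a float -> unsigned conversion before the calculation.
    return int(a.cost) * b.width < int(b.cost) * a.width


vf4 = VectorizationFactor(width=4, cost=10.25)   # 2.5625 per lane
vf8 = VectorizationFactor(width=8, cost=20.75)   # 2.59375 per lane

print(more_profitable(vf4, vf8))            # full precision: vf4 is cheaper per lane
print(more_profitable_truncated(vf4, vf8))  # truncated: the difference disappears
```

With these made-up numbers the truncated comparison sees 10 * 8 against 20 * 4 and reports a tie, while the full-precision comparison still separates the two factors.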


214433 Commits
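
The behavior change described in commit 2c0468b550 (a CHECK-NOT directive should fail when the match attempt errors out, for instance on an undefined variable) can be sketched as a three-way outcome. This is a toy model and not FileCheck's implementation: `Outcome`, `try_match`, `check_not_old`, and `check_not_new` are hypothetical names, and the matcher only handles literal text plus `[[NAME]]` substitutions.

```python
import re
from enum import Enum


class Outcome(Enum):
    MATCH = "match"
    NO_MATCH = "no match"
    ERROR = "error"


def try_match(pattern: str, text: str, variables: dict) -> Outcome:
    """Expand [[NAME]] uses, then look for the literal result in text."""
    try:
        # A KeyError raised inside the callback signals an undefined variable.
        expanded = re.sub(r"\[\[(\w+)\]\]",
                          lambda m: variables[m.group(1)], pattern)
    except KeyError:
        return Outcome.ERROR
    return Outcome.MATCH if expanded in text else Outcome.NO_MATCH


def check_not_old(outcome: Outcome) -> bool:
    # Assumed old behavior: anything other than a match counts as success.
    return outcome is not Outcome.MATCH


def check_not_new(outcome: Outcome) -> bool:
    # Assumed new behavior: only a clean "no match" counts as success.
    return outcome is Outcome.NO_MATCH


result = try_match("add [[REG]]", "mul r1, r2", variables={})
print(result, check_not_old(result), check_not_new(result))
```

Separating "no match" from "error" is the point of the sketch: under the old rule the undefined `REG` silently passes the negative check, while under the new rule it is reported as a failure.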