llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 10:42:39 +01:00

Author	SHA1	Message	Date
Arthur Eubanks	b57fa43169	[test][TSan] Fix tests under NPM Under NPM, the TSan passes are split into a module and function pass. A couple tests were testing for inserted module constructors, which is only part of the module pass.	2020-09-18 11:37:55 -07:00
Huihui Zhang	3c26b4961e	[InstCombine][SVE] Skip scalable type for InstCombiner::getFlippedStrictnessPredicateAndConstant. We cannot iterate on scalable vector, the number of elements is unknown at compile-time. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87918	2020-09-18 11:26:36 -07:00
David Blaikie	4972772ef0	Linewrap & remove some dead typedefs from previous commit Cleanup for 51a505340dfdfdfd9ab32c7267a74db3cdeefa56	2020-09-18 11:22:37 -07:00
David Blaikie	c717659738	DebugInfo: Simplify line table parsing to take all the units together, rather than CUs and TUs separately	2020-09-18 11:18:23 -07:00
James Y Knight	3c2a4cc1bc	PR47468: Fix findPHICopyInsertPoint, so that copies aren't incorrectly inserted after an INLINEASM_BR. findPHICopyInsertPoint special cases placement in a block with a callbr or invoke in it. In that case, we must ensure that the copy is placed before the INLINEASM_BR or call instruction, if the register is defined prior to that instruction, because it may jump out of the block. Previously, the code placed it immediately after the last def _or use_. This is wrong, if the use is the instruction which may jump. We could correctly place it immediately after the last def (ignoring uses), but that is non-optimal for register pressure. Instead, place the copy after the last def, or before the call/inlineasm_br, whichever is later. Differential Revision: https://reviews.llvm.org/D87865	2020-09-18 14:14:04 -04:00
Simon Pilgrim	4b7fc5190e	[X86][AVX] Add missing non AVX512VL broadcastm test coverage	2020-09-18 19:11:29 +01:00
Matt Arsenault	c40596b921	CodeGen: Move split block utility to MachineBasicBlock AMDGPU needs this in several places, so consolidate them here.	2020-09-18 14:05:18 -04:00
Matt Arsenault	b5d406c600	RegAllocFast: Rewrite and improve This rewrites big parts of the fast register allocator. The basic strategy of doing block-local allocation hasn't changed but I tweaked several details: Track register state on register units instead of physical registers. This simplifies and speeds up handling of register aliases. Process basic blocks in reverse order: Definitions are known to end register lifetimes when walking backwards (by contrast, when walking forward, uses may or may not be a kill, so we need heuristics). Check register mask operands (calls) instead of conservatively assuming everything is clobbered. Enhance heuristics to detect killing uses: In case of a small number of defs/uses check if they are all in the same basic block and if so the last one is a killing use. Enhance heuristic for copy-coalescing through hinting: We check the first k defs of a register for COPYs rather than relying on there just being a single definition. When testing this on the full llvm test-suite including SPEC externals I measured: average 5.1% reduction in code size for X86, 4.9% reduction in code size on aarch64 (ranging between 0% and 20% depending on the test); 0.5% faster compile time (some analysis suggests the pass is slightly slower than before, but we more than make up for it because later passes are faster with the reduced instruction count). Also adds a few testcases that were broken without this patch, in particular bug 47278. Patch mostly by Matthias Braun	2020-09-18 14:05:18 -04:00
Matt Arsenault	a428a9d8a7	Reapply "RegAllocFast: Record internal state based on register units" The regressions this caused should be fixed when https://reviews.llvm.org/D52010 is applied. This reverts commit a21387c65470417c58021f8d3194a4510bb64f46.	2020-09-18 14:05:18 -04:00
Zequan Wu	59b9ad16d6	[CodeGen] emit CG profile for COFF object file I forgot to add emission of CG profile for COFF object file, when adding the support (https://reviews.llvm.org/D81775) Differential Revision: https://reviews.llvm.org/D87811	2020-09-18 10:57:54 -07:00
Arthur Eubanks	df42712ac8	[test][HWAsan] Fix kernel-inline.ll under NPM	2020-09-18 10:56:08 -07:00
David Blaikie	2dc313f897	DebugInfo: Tidy up initializing multi-section contributions in DWARFContext	2020-09-18 10:54:43 -07:00
Arthur Eubanks	145867e5d6	[ASan][NewPM] Fix byref-args.ll under NPM	2020-09-18 10:50:53 -07:00
Matt Arsenault	22a074ddad	AMDGPU: Don't sometimes allow instructions before lowered si_end_cf Since 6524a7a2b9ca072bd7f7b4355d1230e70c679d2f, this would sometimes not emit the or to exec at the beginning of the block, where it really has to be. If there is an instruction that defines one of the source operands, split the block and turn the si_end_cf into a terminator. This avoids regressions when regalloc fast is switched to inserting reloads at the beginning of the block, instead of spills at the end of the block. In a future change, this should always split the block.	2020-09-18 13:43:01 -04:00
Amara Emerson	09bd06838e	[AArch64][GlobalISel] Make <8 x s8> of G_BUILD_VECTOR legal.	2020-09-18 10:32:33 -07:00
Francis Visoiu Mistrih	1be91f6af0	[NFC][ScheduleDAG] Remove unused EntrySU SUnit EntrySU doesn't seem to be used at all when building the ScheduleDAG. Differential Revision: https://reviews.llvm.org/D87867	2020-09-18 09:50:47 -07:00
Jianzhou Zhao	1e77ba8067	Use one more byte to silence a warning from Visual C++	2020-09-18 16:42:38 +00:00
Simon Pilgrim	39e440fde2	[X86][AVX] lowerBuildVectorAsBroadcast - improve i64 BROADCASTM lowering on 32-bit targets We already handle the cases where we have a 'zero extended splat' build vector (a, 0, 0, 0, a, 0, 0, 0, ...) but were missing the case where the 'a' scalar was zero-extended as well - such as i64 -> vXi64 splat cases on 32-bit targets.	2020-09-18 16:59:57 +01:00
Simon Pilgrim	975db31f6e	[X86][AVX] Add missing i686 broadcastm test coverage	2020-09-18 16:10:24 +01:00
Simon Pilgrim	99d78e28b2	[DAG] BuildVectorSDNode::getSplatValue - pull out repeated getNumOperands() calls. NFCI.	2020-09-18 16:10:23 +01:00
David Tenty	655fbd5c46	[AIX] Enable large code model when building with clang	2020-09-18 11:03:22 -04:00
Sanjay Patel	0432f77854	[InstSimplify] fix fmin/fmax miscompile for partial undef vectors (PR47567) It would also be correct to return the variable operand in these cases, but eliminating a variable use is probably better for optimization.	2020-09-18 10:05:44 -04:00
Matt Arsenault	8bd8036a64	IR: Move denormal mode parsing from MachineFunction to Function This was just inspecting the IR to begin with, and is useful to check in some places in the IR.	2020-09-18 09:55:47 -04:00
Matt Arsenault	15d1f58f1b	emacs: Add nofree and willreturn to list of attributes	2020-09-18 09:48:33 -04:00
Matt Arsenault	c124c9a532	Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel." This reverts commit c3492a1aa1b98c8d81b0969d52cea7681f0624c2. I think this is the wrong strategy and wrong place to do this transform anyway. Also reverts follow up commit 7d593d0d6905b55ca1124fca5e4d1ebb17203138.	2020-09-18 09:48:33 -04:00
Alexey Bataev	50c5b40f69	[SLP] Allow reordering of vectorization trees with reused instructions. If some leaves have the same instructions to be vectorized, we may incorrectly evaluate the best order for the root node (it is built for the vector of instructions without repeated instructions and, thus, has fewer elements than the root node). In this case we simply cannot reorder the tree, and we may calculate the wrong number of nodes that require the same reordering. For example, if the root node is <a+b, a+c, a+d, f+e>, then the leaves are <a, a, a, f> and <b, c, d, e>. When we try to vectorize the first leaf, it will be shrunk to <a, b>. If instructions in this leaf should be reordered, the best order will be <1, 0>. We need to extend this order for the root node. For the root node this order should look like <3, 0, 1, 2>. This patch allows extension of the orders of the nodes with the reused instructions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D45263	2020-09-18 09:34:59 -04:00
Mirko Brkusanin	46431af84c	[AMDGPU] Set DS alignment requirements to be more strict Alignment requirements for ds_read/write_b96/b128 for gfx9 and onward are now the same as for other GCN subtargets. This way we can avoid any unintentional use of these instructions on systems that do not support dword alignment and instead require natural alignment. This also makes 'SH_MEM_CONFIG.alignment_mode == STRICT' the default. Differential Revision: https://reviews.llvm.org/D87821	2020-09-18 15:26:24 +02:00
Sanjay Patel	88960bf64d	[InstSimplify] add another test for NaN propagation; NFC	2020-09-18 09:20:26 -04:00
Xing GUO	3c2809fd0d	[DWARFYAML] Make the include_directories, file_names and opcodes fields of the line table optional. This patch makes the include_directories, file_names and opcodes fields of the line table optional. This helps us simplify some tests. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D87878	2020-09-18 20:21:11 +08:00
Xing GUO	2a69cdbaed	[DWARFYAML][test] Use 'CHECK-NEXT:' to make checkers stricter. NFC. This patch makes checkers stricter so that we are able to avoid some potential problems earlier. Reviewed By: jhenderson, MaskRay Differential Revision: https://reviews.llvm.org/D87876	2020-09-18 20:19:33 +08:00
David Greene	6628767888	[UpdateCCTestChecks] Include generated functions if asked Add the --include-generated-funcs option to update_cc_test_checks.py so that any functions created by the compiler that don't exist in the source will also be checked. We need to maintain the output order of generated function checks so that CHECK-LABEL works properly. To do so, maintain a list of functions output for each prefix in the order they are output. Use this list to output checks for generated functions in the proper order. Differential Revision: https://reviews.llvm.org/D83004	2020-09-18 06:34:59 -05:00
Max Kazantsev	6be1d3bebc	[Test] Missing range check removal opportunity	2020-09-18 17:55:23 +07:00
Florian Hahn	0312420155	Recommit "[DSE] Switch to MemorySSA-backed DSE by default." This switches to using DSE + MemorySSA by default again, after fixing the issues reported after the first commit. Notable fixes fc8200633122, a0017c2bc258. This reverts commit 3a59628f3cc26eb085acfc9cbdc97243ef71a6c5.	2020-09-18 11:05:00 +01:00
Florian Hahn	447cd8eb56	[SCEV] Generalize SCEVParameterRewriter to accept SCEV expression as target. This patch extends SCEVParameterRewriter to support rewriting unknown expressions to arbitrary SCEV expressions. It will be used by further patches. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D67176	2020-09-18 10:05:02 +01:00
Gabriel Hjort Åkerlund	7fb62030d9	[TableGen][GlobalISel] Fix handling of zero_reg When generating matching tables for GlobalISel, TableGen would output "::zero_reg" whenever encountering the zero_reg, which in turn would result in compilation error. This patch fixes that by instead outputting NoRegister (== 0), which is the same result that TableGen produces when generating matching tables for ISelDAG. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D86215	2020-09-18 11:01:11 +02:00
Tim Northover	bc600a0484	AArch64: make sure jump table entries can reach entire image This turns all jump table entries into deltas within the target function because in the small memory model all code & static data must be in a 4GB block somewhere in memory. When the entries were a delta between the table location and a basic block, the 32-bit signed entries are not enough to guarantee reachability. https://reviews.llvm.org/D87286	2020-09-18 09:50:40 +01:00
Nikita Popov	07e4739f3c	Revert "[InstCombine] Canonicalize SPF_ABS to abs intrinc" This reverts commit 05d4c4ebc2fb006b8a2bd05b24c6aba10dd2eef8. mstorsjo reports a miscompile after this change in https://reviews.llvm.org/D87188#2281093. Reverting until I can investigate this.	2020-09-18 09:38:26 +02:00
Andrew Wei	f8a859443e	[AArch64] Add tests for zext pattern match with AssertZext/AssertSext operand, NFC	2020-09-18 15:02:43 +08:00
Serge Pavlov	35714dfbb7	[FPEnv] Use typed accessors in FPOptions Previously methods `FPOptions::get*` returned unsigned value even if the corresponding property was represented by specific enumeration type. With this change such methods return actual type of the property. It also allows printing value of a property as text rather than integer code. Differential Revision: https://reviews.llvm.org/D87812	2020-09-18 14:16:43 +07:00
Craig Topper	c7c58df71f	[X86] Add some demanded bits test cases for PDEP with constant mask The number of ones in the mask for the PDEP determines how many bits of the other operand are used. If the mask is constant we can use this to build a mask for SimplifyDemandedBits. This can be used to replace the extends in the test with anyextend.	2020-09-17 22:48:19 -07:00
Andrew Wei	c2b264b734	[AArch64] Emit zext move when the source of the zext is AssertZext or AssertSext When the source of the zext is AssertZext or AssertSext, it is hard to know any information about the upper 32 bits, so we should insert a zext move before emitting SUBREG_TO_REG to define the lower 32 bits. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87771	2020-09-18 12:48:41 +08:00
Amara Emerson	680fe78a3d	[AArch64][GlobalISel] Make G_STORE <8 x s8> legal.	2020-09-17 16:42:18 -07:00
Amara Emerson	c09ce4f3bc	[AArch64][GlobalISel] clang-format AArch64LegalizerInfo.cpp. NFC.	2020-09-17 16:41:10 -07:00
Amy Kwan	e85d496ffd	[PowerPC] Add Set Boolean Condition Instruction Definitions and MC Tests This patch adds the instruction definitions and assembly/disassembly tests for the set boolean condition instructions. This also includes the negative, and reverse variants of the instruction. Differential Revision: https://reviews.llvm.org/D86252	2020-09-17 18:20:54 -05:00
Amy Kwan	00f4e38665	[PowerPC] Implement Vector Count Mask Bits builtins in LLVM/Clang This patch implements the vec_cntm function prototypes in altivec.h in order to utilize the vector count mask bits instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D82726	2020-09-17 18:20:53 -05:00
Philip Reames	eab425518c	[MemorySSA] Fix an unused variable warning [NFC]	2020-09-17 16:07:59 -07:00
Zhaoshi Zheng	7d4e6e8ff5	[RISCV] Support Shadow Call Stack Currently assume x18 is used as pointer to shadow call stack. User shall pass flags: "-fsanitize=shadow-call-stack -ffixed-x18" Runtime support is needed to set up x18. If SCS is desired, all parts of the program should be built with -ffixed-x18 to maintain interoperability. There's no particular reason that we must use x18 as SCS pointer. Any register may be used, as long as it does not have designated purpose already, like RA or passing call arguments. Differential Revision: https://reviews.llvm.org/D84414	2020-09-17 16:02:35 -07:00
Philip Reames	f9eea5b8c6	[AArch64] Enable implicit null check transformation This change enables the generic implicit null transformation for the AArch64 target. As background for those unfamiliar with our implicit null check support: An implicit null check is the use of a signal handler to catch a null pointer dereference and redirect to a handler. Specifically, it's replacing an explicit conditional branch with such a redirect. This is only done for very cold branches under frontend control w/appropriate metadata. FAULTING_OP is used to wrap the faulting instruction. It is modelled as being a conditional branch to reflect the fact it can transfer control in the CFG. FAULTING_OP does not need to be an analyzable branch to achieve its purpose. (Or at least, that's the x86 model. I find this slightly questionable.) When lowering to MC, we convert the FAULTING_OP back into the actual instruction, record the labels, and lower the original instruction. As can be seen in the test changes, currently the AArch64 backend does not eliminate the unconditional branch to the fallthrough block. I've tried two approaches, neither of which worked. I plan to return to this in a separate change set once I've wrapped my head around the interactions a bit better. (X86 handles this via AllowModify on analyzeBranch, but adding the obvious code causes BranchFolding to crash. I haven't yet figured out if it's a latent bug in BranchFolding, or something I'm doing wrong.) Differential Revision: https://reviews.llvm.org/D87851	2020-09-17 16:00:19 -07:00
Arthur Eubanks	9ae2247c05	[test] Fix FullUnroll.ll I believe the intention of this test added in https://reviews.llvm.org/D71687 was to test LoopFullUnrollPass with clang's -fno-unroll-loops, not its interaction with optnone. Loop unrolling passes don't run under optnone/-O0. Also added back unintentionally removed -disable-loop-unrolling from https://reviews.llvm.org/D85578. Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D86485	2020-09-17 15:56:13 -07:00
Quentin Colombet	f17e1f0936	[TargetRegisterInfo] Add a couple of target hooks for the greedy register allocator Before this patch, the last chance recoloring and deferred spilling techniques were solely controlled by command line options. This patch adds target hooks for these two techniques so that it is easier for backend writers to override the default behavior. The default behavior of the hooks preserves the default values of the related command line options. NFC	2020-09-17 15:23:15 -07:00
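
Note on c7c58df71f (PDEP demanded bits): the commit message says the number of ones in a constant PDEP mask determines how many bits of the other operand are used. The following is a minimal Python sketch of that reasoning, not the LLVM implementation. The helper names `pdep` and `demanded_source_bits` and the 32-bit default width are assumptions made for illustration only.

```python
def pdep(src: int, mask: int, width: int = 32) -> int:
    """Software model of a parallel bit deposit (hypothetical helper).

    Consecutive low bits of `src` are deposited, in order, into the
    positions where `mask` has a set bit; all other result bits are 0.
    """
    result = 0
    next_src_bit = 0  # index of the next low bit of src to consume
    for pos in range(width):
        if (mask >> pos) & 1:
            if (src >> next_src_bit) & 1:
                result |= 1 << pos
            next_src_bit += 1
    return result


def demanded_source_bits(mask: int, width: int = 32) -> int:
    """Bits of the source operand that can affect the result.

    Only the low popcount(mask) bits of the source are ever read.
    """
    ones = bin(mask & ((1 << width) - 1)).count("1")
    return (1 << ones) - 1


mask = 0xAA                              # four set bits
demanded = demanded_source_bits(mask)    # low four bits of the source
value = 0x12345675

# Clearing the non-demanded high bits of the source leaves the result unchanged.
assert pdep(value, mask) == pdep(value & demanded, mask)
```

Because the high bits of the source never reach the result, an extension feeding them can be relaxed, which is the opportunity the commit's test cases describe.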


203833 Commits