mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

62371 Commits

Author SHA1 Message Date
Fangrui Song
921cf42d37 [lit] Delete empty lines at the end of lit.local.cfg NFC
llvm-svn: 363538
2019-06-17 09:51:07 +00:00
Roman Lebedev
8c37b784cb [NFC][Codegen] Standalone tests for icmp eq/ne (urem %x, C), 0 -> icmp eq/ne %x, 0 fold (D63390)
llvm-svn: 363537
2019-06-17 09:50:50 +00:00
Sander de Smalen
520ee96f7f Describe stack-id as an enum
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.

This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.

Reviewers: arsenm, thegameg, qcolombet

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D60137

llvm-svn: 363533
2019-06-17 09:13:29 +00:00
Hans Wennborg
6b843448d0 Re-commit r357452 (take 3): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)"
Third time's the charm.

This was reverted in r363220 due to being suspected of an internal benchmark
regression and a test failure, none of which turned out to be caused by this.

llvm-svn: 363529
2019-06-17 07:47:28 +00:00
Yevgeny Rouban
0b9f7752c8 [SimplifyCFG] Fix prof branch_weights MD while removing unreachable switch cases
SimplifyCFG has a bug that results in inconsistent prof branch_weights metadata
if unreachable switch cases are removed. This patch fixes this bug by making use
of the newly introduced SwitchInstProfUpdateWrapper class (see patch D62122).
A new test is created.
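
As a rough sketch (hypothetical IR, not the test added by this patch), the situation looks like this: the case leading to an unreachable block is deleted by SimplifyCFG, so the matching entry in the branch_weights metadata has to be dropped as well to keep the profile data consistent:

    define i32 @example(i32 %x) {
    entry:
      switch i32 %x, label %default [
        i32 0, label %a
        i32 1, label %dead      ; this case is removed because %dead is unreachable
      ], !prof !0

    a:
      ret i32 1

    dead:
      unreachable

    default:
      ret i32 0
    }

    ; Weights are: default, case 0, case 1 -- the last weight must be removed
    ; together with the dead case, which is what SwitchInstProfUpdateWrapper handles.
    !0 = !{!"branch_weights", i32 10, i32 60, i32 30}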

Differential Revision: https://reviews.llvm.org/D62186

llvm-svn: 363527
2019-06-17 05:55:12 +00:00
Justin Hibbits
2bad53a983 PowerPC: Optimize SPE double parameter calling setup
Summary:
SPE passes doubles the same as soft-float, in register pairs as i32
types.  This is all handled by the target-independent layer.  However,
this is not optimal when splitting or reforming the doubles, as it
stores the value to the stack and loads it back on either side.

For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:

    evstdd      5, X(1)
    lwz         3, X(1)
    lwz         4, X+4(1)

Likewise, to form a double into r5 from args in r3 and r4:

    stw         3, X(1)
    stw         4, X+4(1)
    evldd       5, X(1)

This patch optimizes away that stack round-trip by using SPE instructions.  Now, to pass a double
to a function:

    mr          4, 5
    evmergehi   3, 5, 5

And to form a double into r5 from args in r3 and r4:

    evmergelo   5, 3, 4

This is comparable to the way that gcc generates the double splits.

This also fixes a bug with expanding builtins to libcalls, where the
LowerCallTo() code path was generating intermediate illegal type nodes.

Reviewers: nemanjai, hfinkel, joerg

Subscribers: kbarton, jfb, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D54583

llvm-svn: 363526
2019-06-17 03:15:23 +00:00
Seiya Nuta
942fd33165 [yaml2obj][MachO] Don't fill dummy data for virtual sections
Summary:
Currently, MachOWriter::writeSectionData writes dummy data (0xdeadbeef) to fill section data areas in the file even if the section is a virtual one. Since virtual sections don't occupy any space in the file, writing dummy data could result in the "OS.tell() - fileStart <= Sec.offset" assertion failure.

This patch fixes the bug by simply not writing any dummy data for virtual sections.

Reviewers: beanz, jhenderson, rupprecht, alexshap

Reviewed By: alexshap

Subscribers: compnerd, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62991

llvm-svn: 363525
2019-06-17 02:07:20 +00:00
Seiya Nuta
d44769e77f [llvm-objcopy] Add elf32-sparc and elf32-sparcel target
Summary:
The "sparc"/"sparcel" architectures appears in ArchMap (used by -B option) but not in OutputFormatMap (used by -I/-O option). Add their targets into OutputFormatMap for consistency.

Note that, AFAIK, there are no targets for 32-bit little-endian SPARC ("elf32-sparcel") in GNU binutils.

Reviewers: espindola, alexshap, rupprecht, jhenderson, compnerd, jakehehrlich

Reviewed By: jhenderson, compnerd, jakehehrlich

Subscribers: jyknight, emaste, arichardson, fedor.sergeev, jakehehrlich, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63238

llvm-svn: 363524
2019-06-17 02:03:45 +00:00
Roman Lebedev
760a88fd91 [InstSimplify] Fix addo/subo undef folds (PR42209)
Fix folds of addo and subo with an undef operand:

`@llvm.{u,s}{add,sub}.with.overflow` all fold to `{ undef, false }`,
as per LLVM undef rules. The same applies to the commuted variants.

Based on the original version of the patch by @nikic.

Fixes PR42209: https://bugs.llvm.org/show_bug.cgi?id=42209
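
As a rough illustration (hypothetical IR, not taken from the patch or its tests), the fold now applies to calls like this:

    declare { i32, i1 } @llvm.sadd.with.overflow.i32(i32, i32)

    define { i32, i1 } @fold_sadd_undef(i32 %x) {
      ; With one operand undef, the whole call simplifies to { undef, false },
      ; matching the LLVM undef rules mentioned above.
      %r = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 undef)
      ret { i32, i1 } %r
    }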

Differential Revision: https://reviews.llvm.org/D63065

llvm-svn: 363522
2019-06-16 20:39:45 +00:00
Nicolai Haehnle
7e12ec07a2 AMDGPU: Prepare for explicit absolute relocations in code generation
Summary:
We will use absolute relocations for LDS symbols.

Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61492

llvm-svn: 363517
2019-06-16 17:43:37 +00:00
Nicolai Haehnle
fa39305b4a AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0
Summary:
Instead of encoding a high-word of 0 using a fake TargetGlobalAddress,
just use a literal target constant. This simplifies some subsequent changes.

The generated assembly is now more explicit about the kind of relocation
that is to be used.

Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61491

llvm-svn: 363516
2019-06-16 17:32:01 +00:00
Nicolai Haehnle
c02be57950 AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsic
Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62486

Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0
llvm-svn: 363514
2019-06-16 17:14:12 +00:00
Stanislav Mekhanoshin
cefbe0fc94 [AMDGPU] gfx10 conditional registers handling
This is the C++ source part of wave32 support, excluding the overridden
getRegClass().

Differential Revision: https://reviews.llvm.org/D63351

llvm-svn: 363513
2019-06-16 17:13:09 +00:00
Sanjay Patel
7019c260cd [CodeGenPrepare][x86] shift both sides of a vector select when profitable
This is based on the example/discussion in PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428

Proper vector shift instructions don't appear until AVX2, so we may generate several
extra instructions within a loop trying to compensate for that. It's difficult to
recover from that shift expansion later than this, so use the existing TLI hook and
splat analysis to enable better codegen.

This extends CGP functionality introduced with:
rL201655
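
A hedged sketch of the kind of pattern this targets (my own example, not from the patch): the shift amount is a select between two splat constants, and after the transform each arm of the select is shifted separately, so the backend only ever sees shifts by a uniform amount:

    define <4 x i32> @before(<4 x i32> %x, i1 %c) {
      %amt = select i1 %c, <4 x i32> <i32 1, i32 1, i32 1, i32 1>, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
      %r = shl <4 x i32> %x, %amt
      ret <4 x i32> %r
    }

    define <4 x i32> @after(<4 x i32> %x, i1 %c) {
      ; Each side is now a shift by a splat amount, which pre-AVX2 targets handle well.
      %s1 = shl <4 x i32> %x, <i32 1, i32 1, i32 1, i32 1>
      %s2 = shl <4 x i32> %x, <i32 2, i32 2, i32 2, i32 2>
      %r = select i1 %c, <4 x i32> %s1, <4 x i32> %s2
      ret <4 x i32> %r
    }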

Differential Revision: https://reviews.llvm.org/D63233

llvm-svn: 363511
2019-06-16 15:29:03 +00:00
Sanjay Patel
df1008af37 [x86] split 256-bit vector selects if operands are vector concats
This is similar logic/motivation to the select splitting in D62969.

In D63233, the pattern changes so that we no longer have an extract_subvector of vselect,
but the operands of the select are still being concatenated.

The closest case is represented in either the first or last test diffs here - we have an
extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions.
I think that's the right trade-off for most AVX1 targets.

In the example based on PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx).

Differential Revision: https://reviews.llvm.org/D63364

llvm-svn: 363508
2019-06-16 14:04:49 +00:00
Simon Pilgrim
5aad3bee75 [X86] CombineShuffleWithExtract - handle cases with different vector extract sources
Insert the shorter vector source into an undef vector of the longer vector source's type.

llvm-svn: 363507
2019-06-16 08:00:41 +00:00
Simon Pilgrim
f90ae92229 [X86][AVX] Handle lane-crossing shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles
Pull out the existing (non)lane-crossing fold into a helper lambda and use it for lane-crossing unary shuffles as well.

Fixes PR34380

llvm-svn: 363500
2019-06-15 18:30:43 +00:00
Simon Pilgrim
e165d25b9b [X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3)
This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0).

llvm-svn: 363499
2019-06-15 17:05:24 +00:00
Roman Lebedev
eca9899a89 [NFC][MCA][X86] Add one more 'clear super register' pattern - movss/movsd load clears high XMM bits
llvm-svn: 363498
2019-06-15 16:12:13 +00:00
Roman Lebedev
f3a76c9591 [NFC][MCA][X86] Add baseline test coverage for AMD Barcelona (aka K10, fam10h)
Looking into sched model for that CPU ...

llvm-svn: 363497
2019-06-15 16:12:05 +00:00
Kang Zhang
b6bd9f2f91 [PowerPC] Set the innermost hot loop to align 32 bytes
Summary:
If the nested loop is an innermost loop, prefer a 32-byte alignment, so that
we can decrease cache misses and branch-prediction misses. The actual alignment
of the loop will depend on the hotness check and other logic in alignBlocks.

The old code only aligned a hot loop to 32 bytes when the LoopSize was larger
than 16 bytes and smaller than 32 bytes; this patch aligns the innermost hot
loop to 32 bytes regardless of whether its size falls in that 16-32 byte range.

Reviewed By: steven.zhang, jsji

Differential Revision: https://reviews.llvm.org/D61228

llvm-svn: 363495
2019-06-15 15:10:24 +00:00
Nikita Popov
c334a9e7fa [SCEV] Use unsigned/signed intersection type in SCEV
Based on D59959, this switches SCEV to use unsigned/signed range
intersection based on the sign hint. This will prefer non-wrapping
ranges in the relevant domain. I've left the one intersection in
getRangeForAffineAR() to use the smallest intersection heuristic,
as there doesn't seem to be any obvious preference there.

Differential Revision: https://reviews.llvm.org/D60035

llvm-svn: 363490
2019-06-15 09:15:52 +00:00
Nikita Popov
e851c6c59e [SimplifyIndVar] Simplify non-overflowing saturating add/sub
If we can detect that saturating math that depends on an IV cannot
overflow, replace it with simple math. This is similar to the CVP
optimization from D62703, just based on a different underlying
analysis (SCEV vs LVI) that catches different cases.
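
A small hypothetical example of the pattern (made-up names, not from the patch's tests): inside a loop whose trip count SCEV understands, a saturating add on the induction variable can never actually saturate, so it can be replaced with ordinary math:

    declare i32 @llvm.uadd.sat.i32(i32, i32)
    declare void @use(i32)

    define void @count() {
    entry:
      br label %loop

    loop:
      %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
      ; SCEV can prove %iv is in [0, 99] here, so %iv + 1 never saturates and
      ; the call below can be rewritten as a plain  add i32 %iv, 1.
      %sat = call i32 @llvm.uadd.sat.i32(i32 %iv, i32 1)
      call void @use(i32 %sat)
      %iv.next = add nuw nsw i32 %iv, 1
      %cmp = icmp ult i32 %iv.next, 100
      br i1 %cmp, label %loop, label %exit

    exit:
      ret void
    }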

Differential Revision: https://reviews.llvm.org/D62792

llvm-svn: 363489
2019-06-15 08:48:52 +00:00
Fangrui Song
1aca9a529e [RISCV] Regenerate remat.ll and atomic-rmw.ll after D43256
llvm-svn: 363487
2019-06-15 07:49:14 +00:00
Alex Brachet
1c3240a461 [objcopy] Error when --preserve-dates is specified with standard streams
Summary: llvm-objcopy and llvm-strip now report an error when -p (--preserve-dates) is specified while reading from stdin or writing to stdout.

Reviewers: jhenderson, rupprecht, espindola, alexshap

Reviewed By: jhenderson, rupprecht

Subscribers: emaste, arichardson, jakehehrlich, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63090

llvm-svn: 363485
2019-06-15 05:32:23 +00:00
Michael Berg
70f373fb8d adding more fmf propagation for selects plus updated tests
llvm-svn: 363484
2019-06-15 04:53:51 +00:00
Fangrui Song
81b56b45f1 Revert "adding more fmf propagation for selects plus tests"
This reverts rL363474. -debug-only=isel was added to some tests that
don't specify `REQUIRES: asserts`. This causes failures on
-DLLVM_ENABLE_ASSERTIONS=off builds.

I chose to revert instead of fixing the tests because I'm not sure
whether we should add `REQUIRES: asserts` to more tests.

llvm-svn: 363482
2019-06-15 03:51:08 +00:00
Huihui Zhang
ed311665bd [InstCombine] Add tests to show missing fold opportunity for "icmp and shift" (nfc).
Summary:
For icmp pred (and (sh X, Y), C), 0

  When C is signbit, expect to fold (X << Y) & signbit ==/!= 0 into (X << Y) >=/< 0,
  rather than (X & (signbit >> Y)) != 0.

  When C+1 is power of 2, expect to fold (X << Y) & ~C ==/!= 0 into (X << Y) </>= C+1,
  rather than (X & (~C >> Y)) == 0.

For icmp pred (and X, (sh signbit, Y)), 0

  Expect to fold (X & (signbit l>> Y)) ==/!= 0 into (X << Y) >=/< 0
  Expect to fold (X & (signbit << Y)) ==/!= 0 into (X l>> Y) >=/< 0
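
For instance, the first of these folds would apply to IR shaped like the following hypothetical function (the sign bit of an i32 being 0x80000000):

    define i1 @shl_and_signbit_eq0(i32 %x, i32 %y) {
      ; Current form: (X << Y) & signbit == 0
      %shl = shl i32 %x, %y
      %and = and i32 %shl, -2147483648    ; 0x80000000, the sign bit
      %cmp = icmp eq i32 %and, 0
      ; Desired folded form:  icmp sgt i32 %shl, -1   i.e. (X << Y) >= 0
      ret i1 %cmp
    }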

  Reviewers: lebedev.ri, efriedma, spatel, craig.topper

  Reviewed By: lebedev.ri

  Subscribers: llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D63025

llvm-svn: 363479
2019-06-15 00:33:41 +00:00
Matt Arsenault
e24b975d54 Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect"
This reapplies r363410, avoiding null dereference if there is no
AltRegBank.

llvm-svn: 363478
2019-06-15 00:33:26 +00:00
Mitch Phillips
c40bb0e320 Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect"
This patch breaks UBSan build bots. See
https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for
a guide as to how to reproduce the error.

This reverts commit c2864c0de07efb5451d32d27a7d4ff2984830929.
This reverts rL363410.

llvm-svn: 363476
2019-06-14 23:45:34 +00:00
Michael Berg
19763b0404 adding more fmf propagation for selects plus tests
llvm-svn: 363474
2019-06-14 23:30:52 +00:00
Guozhi Wei
6a55f3d913 [MBP] Move a latch block with conditional exit and multi predecessors to top of loop
Currently findBestLoopTop can find and move one kind of block to the top: a latch block that has one successor. Another common case is:

    * a latch block
    * it has two successors, one is loop header, another is exit
    * it has more than one predecessors

If it is placed below one of its predecessors P, only P can fall through to it; all other predecessors need a jump to it followed by another conditional jump to the loop header. If it is instead moved before the loop header, all its predecessors jump to it and then fall through to the loop header, so every predecessor except P saves one taken branch.

Differential Revision: https://reviews.llvm.org/D43256

llvm-svn: 363471
2019-06-14 23:08:59 +00:00
Akira Hatanaka
b788377f2b [ObjC][ARC] Delete ObjC runtime calls on global variables annotated
with 'objc_arc_inert'

Those calls are no-ops, so they can be safely deleted.

rdar://problem/49839633

Differential Revision: https://reviews.llvm.org/D62433

llvm-svn: 363468
2019-06-14 22:06:32 +00:00
Matt Arsenault
d3eb4b0274 AMDGPU: Avoid most waitcnts before calls
Currently you get extra waits, because waits are inserted for the
register dependencies of the call, and the function prolog waits on
everything.

Currently waits are still inserted on returns. It may make sense to
not do this, and wait in the caller instead.

llvm-svn: 363465
2019-06-14 21:52:26 +00:00
Francis Visoiu Mistrih
9be6e61346 [Remarks][NFC] Improve testing and documentation of -foptimization-record-passes
This adds:

* documentation to the user manual
* nicer error message
* test for the error case
* test for the gold plugin

llvm-svn: 363463
2019-06-14 21:38:57 +00:00
Matt Arsenault
731346ec27 SROA: Allow eliminating addrspacecasted allocas
There is a circular dependency between SROA and InferAddressSpaces
today that requires running both multiple times in order to be able to
eliminate all simple allocas and addrspacecasts. InferAddressSpaces
can't remove an addrspacecast whose result is written to memory, and SROA helps
move pointers out of memory.

This should avoid inserting new commuting addrspacecasts with GEPs,
since there are unresolved questions about pointer wrapping between
different address spaces.

For now, don't replace volatile operations that don't match the alloca
addrspace, as it would change the address space of the access. It may
still be OK to insert an addrspacecast from the new alloca, but be
more conservative for now.
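
A hedged sketch of the kind of alloca this now handles (the address-space numbers are illustrative AMDGPU-style choices, not taken from the patch):

    define i32 @promote_through_cast(i32 %v) {
      ; An alloca in the private address space ...
      %a = alloca i32, addrspace(5)
      ; ... accessed only through an addrspacecast to the flat address space.
      %flat = addrspacecast i32 addrspace(5)* %a to i32*
      store i32 %v, i32* %flat
      %r = load i32, i32* %flat
      ; SROA can now look through the cast and promote %a to an SSA value,
      ; reducing this function to a plain  ret i32 %v.
      ret i32 %r
    }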

llvm-svn: 363462
2019-06-14 21:38:31 +00:00
Matt Arsenault
e8819fc12c SROA: Add baseline test for addrspacecast changes
llvm-svn: 363460
2019-06-14 21:22:26 +00:00
Matt Arsenault
5f185d6ecc AMDGPU: Fix capitalized register names in asm constraints
This was a workaround a long time ago, but the canonical lower case
names work now.

llvm-svn: 363459
2019-06-14 21:16:06 +00:00
Matt Arsenault
1897dcd325 AMDGPU: Fix dropping memref for ds append/consume
The way SelectionDAG treats memory operands is very frustrating, and
by default drops them unless a property is set on the pattern. There
is no pattern for manually selected instructions, so this requires
manually setting them.

llvm-svn: 363455
2019-06-14 21:01:24 +00:00
Matt Arsenault
d469e3d76e AMDGPU: Add baseline test for call waitcnt insertion
llvm-svn: 363453
2019-06-14 21:01:23 +00:00
Sanjay Patel
607142d387 [x86] add test for 256-bit blendv with AVX targets; NFC
This is a reduction of the pattern seen in D63233.

llvm-svn: 363448
2019-06-14 20:03:42 +00:00
Amara Emerson
a21f959530 [GlobalISel] Add a G_BRJT opcode.
This is a branch opcode that takes a jump table pointer, jump table index and an
index into the table to do an indirect branch.

We pass both the table pointer and JTI to allow targets like ARM64 to more
easily use the existing jump table compression optimization without having to
walk up the block to find a paired G_JUMP_TABLE.

Differential Revision: https://reviews.llvm.org/D63159

llvm-svn: 363434
2019-06-14 17:55:48 +00:00
Florian Hahn
9a5851ba0e Revert Fix a bug w/inbounds invalidation in LFTR
Reverting because it breaks a green dragon build:
    http://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/18208

This reverts r363289 (git commit eb88badff96dacef8fce3f003dec34c2ef6900bf)

llvm-svn: 363427
2019-06-14 17:23:09 +00:00
Valery Pykhtin
8ac3784ab2 [AMDGPU] Don't constrain callees with inlinehint from inlining on MaxBB check
Summary: Function bodies marked inline in an OpenCL source are eliminated, but the MaxBB check may prevent inlining them, leaving undefined references.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63337

llvm-svn: 363418
2019-06-14 16:37:33 +00:00
Kevin P. Neal
3226a47280 [FPEnv] Lower STRICT_FP_EXTEND and STRICT_FP_ROUND nodes in preprocess phase of ISelLowering to mirror non-strict nodes on x86.
I recently discovered a bug on the x86 platform: the fp80 type was not handled well by x86
for constrained floating-point nodes, as their regular counterparts are replaced by extending
loads and truncating stores during the preprocess phase. Normally, platforms don't have this
issue, as they don't typically attempt to perform such legalizations during instruction
selection preprocessing. Before this change, strict_fp nodes survived until they were mutated
to normal nodes, which happened shortly after preprocessing on other platforms. This
modification lowers these nodes at the same phase while properly utilizing the chain.

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Craig Topper, Kevin P. Neal
Approved by:	Craig Topper
Differential Revision:	https://reviews.llvm.org/D63271

llvm-svn: 363417
2019-06-14 16:28:55 +00:00
Sanjay Patel
1485e66f09 [x86] move vector shift tests for PR37428; NFC
As suggested in the post-commit thread for rL363392 - it's
wasteful to have so many runs for larger tests. AVX1/AVX2
is what shows the diff and probably what matters most going
forward.

llvm-svn: 363411
2019-06-14 15:23:09 +00:00
Matt Arsenault
49e58954a8 GlobalISel: Avoid producing Illegal copies in RegBankSelect
Avoid producing illegal register bank copies for reg_sequence and
phi. The default implementation assumes it is possible to pick any
operand's bank and use that for the result, introducing a copy for
operands with a different bank. This does not check for illegal
copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR
operand requires the result to be a VGPR.

The changes in getInstrMappingImpl aren't strictly necessary, since
AMDGPU now just bypasses this for reg_sequence/phi. This could be
replaced with an assert in case other targets run into this. It is
currently responsible for producing the error for unsatisfiable
copies, but this will be better served with a verifier check.

For phis, for now assume any undetermined operands must be
VGPRs. Eventually, this needs to be able to defer mapping these
operations. This also does not yet have a way to check for whether the
block is in a divergent region.

llvm-svn: 363410
2019-06-14 15:22:25 +00:00
Sanjay Patel
2df6f39bd3 [CodeGenPrepare] propagate debuginfo when copying a shuffle
llvm-svn: 363409
2019-06-14 15:05:35 +00:00
Matt Arsenault
009f0f6078 AMDGPU: Fold readlane intrinsics of constants
I'm not 100% sure about this, since I'm worried about IR transforms
that might end up introducing divergence downstream once replaced with
a constant, but I haven't come up with an example yet.
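
For illustration (a hypothetical example, not from the patch's tests), the fold applies to calls like this:

    declare i32 @llvm.amdgcn.readlane(i32, i32)

    define i32 @fold_readlane_constant(i32 %lane) {
      ; Reading a uniform constant from any lane just yields the constant,
      ; so this call can be folded to  ret i32 7.
      %r = call i32 @llvm.amdgcn.readlane(i32 7, i32 %lane)
      ret i32 %r
    }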

llvm-svn: 363406
2019-06-14 14:51:26 +00:00
Mikhail Maltsev
ec507e9c0b [ARM] Add MVE horizontal accumulation instructions
This is the family of vector instructions that combine all the lanes
in their input vector(s), and output a value in one or two GPRs.

Differential Revision: https://reviews.llvm.org/D62670

llvm-svn: 363403
2019-06-14 14:31:13 +00:00