llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00

Author	SHA1	Message	Date
Artur Pilipenko	8d5f1040c3	[InstCombine] Allow InstCombine to remove one of adjacent guards if they are equivalent This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine Reviewed By: majnemer, apilipenko Differential Revision: https://reviews.llvm.org/D29071 Patch by Maxim Kazantsev. llvm-svn: 293056	2017-01-25 14:12:12 +00:00
Alexey Bataev	cb8becd477	[SLP] Improve horizontal vectorization for non-power-of-2 number of instructions. If the number of instructions in a horizontal reduction list is not a power of 2, then only the last PowerOf2Floor(NumberOfInstructions) elements are actually vectorized; the other instructions remain scalar. The patch tries to vectorize the remaining elements as well. Differential Revision: https://reviews.llvm.org/D28959 llvm-svn: 293042	2017-01-25 09:54:38 +00:00
whitequark	e14626baaf	Mark @llvm.powi.* as safe to speculatively execute. Floating point intrinsics in LLVM are generally not speculatively executed, since most of them are defined to behave the same as libm functions, which set errno. However, the @llvm.powi.* intrinsics do not correspond to any libm function, and lack any defined error handling semantics in LangRef. They most certainly do not alter errno. llvm-svn: 293041	2017-01-25 09:32:30 +00:00
Mohammed Agabaria	ee333a2999	[X86] enable memory interleaving for X86\SLM arch. Differential Revision: https://reviews.llvm.org/D28547 llvm-svn: 293040	2017-01-25 09:14:48 +00:00
Artur Pilipenko	4d180f5ce3	Fix buildbot failures introduced by 293036 Fix unused variable, specify types explicitly to make VC compiler happy. llvm-svn: 293039	2017-01-25 09:10:07 +00:00
Artur Pilipenko	0e6418640a	[DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2. The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm. http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows stopping the traversal earlier if we know that the result won't be used. From the original commit: Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the target supports it. Assuming little endian target: i8 *a = ... i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24) => i32 val = *((i32)a) i8 *a = ... i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3] => i32 val = BSWAP(*((i32)a)) This optimization was discussed on llvm-dev some time ago in the "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in the presence of atomic loads, load widening is an irreversible transformation and it might hinder other optimizations. Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part: i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24) Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses. The general scheme is to match OR expressions by recursively calculating the origin of individual bytes which constitute the resulting OR value. If all the OR bytes come from memory, verify that they are adjacent and match the little or big endian encoding of a wider value. If so, and the load of the wider type (and bswap if needed) is allowed by the target, generate a load and a bswap if needed. Reviewed By: RKSimon, filcab, chandlerc Differential Revision: https://reviews.llvm.org/D27861 llvm-svn: 293036	2017-01-25 08:53:31 +00:00
Diana Picus	3e1c91b8c7	[ARM] GlobalISel: Support i1 add and ABI extensions Add support for: * i1 add * i1 function arguments, if passed through registers * i1 returns, with ABI signext/zeroext Differential Revision: https://reviews.llvm.org/D27706 llvm-svn: 293035	2017-01-25 08:47:40 +00:00
Diana Picus	216544ff2f	[ARM] GlobalISel: Support i8/i16 ABI extensions At the moment, this means supporting the signext/zeroext attribute on the return type of the function. For function arguments, signext/zeroext should be handled by the caller, so there's nothing for us to do until we start lowering calls. Note that this does not include support for other extensions (i8 to i16), those will be added later. Differential Revision: https://reviews.llvm.org/D27705 llvm-svn: 293034	2017-01-25 08:10:40 +00:00
Serge Pavlov	8bed9fa904	Do not verify dominator tree if it has no roots If dominator tree has no roots, the pass that calculates it is likely to be skipped. It occurs, for instance, in the case of entities with linkage available_externally. Do not run tree verification in such a case. Differential Revision: https://reviews.llvm.org/D28767 llvm-svn: 293033	2017-01-25 07:58:10 +00:00
Dean Michael Berris	ace789b028	Implemented color coding and Vertex labels in XRay Graph Summary: A patch to enable the llvm-xray graph subcommand to color edges and vertices based on statistics and to annotate vertices with statistics. Depends on D27243 Reviewers: dblaikie, dberris Reviewed By: dberris Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D28225 llvm-svn: 293031	2017-01-25 07:14:43 +00:00
Coby Tayree	00c1f58b67	[X86]Enable the use of 'mov' with a 64bit GPR and a large immediate Enable the next form (intel style): "mov <reg64>, <largeImm>" which should be available, where <largeImm> stands for immediates which exceed the range of a signed 32bit integer Differential Revision: https://reviews.llvm.org/D28988 llvm-svn: 293030	2017-01-25 07:09:42 +00:00
Diana Picus	a678e82c94	[ARM] GlobalISel: Bail out on Thumb. NFC Thumb is not supported yet, so bail out early. llvm-svn: 293029	2017-01-25 07:08:53 +00:00
Matt Arsenault	9f4074601e	AMDGPU: Check nsz instead of unsafe math llvm-svn: 293028	2017-01-25 06:27:02 +00:00
Akira Hatanaka	68f4d270cd	[SimplifyCFG] Do not sink and merge inline-asm instructions. Conservatively disable sinking and merging inline-asm instructions as doing so can potentially create arguments that cannot satisfy the inline-asm constraints. For example, SimplifyCFG used to do the following transformation: (before) if.then: %0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 8) br label %if.end if.else: %1 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 6) br label %if.end (after) %.sink = select i1 %tobool, i32 6, i32 8 %0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 %.sink) This would result in a crash in the backend since only immediate integer operands are permitted for constraint "n". rdar://problem/30110806 Differential Revision: https://reviews.llvm.org/D29111 llvm-svn: 293025	2017-01-25 06:21:51 +00:00
Matt Arsenault	093401c700	DAG: Recognize no-signed-zeros-fp-math attribute clang already emits this with -cl-no-signed-zeros, but codegen doesn't do anything with it. Treat it like the other fast math attributes, and change one place to use it. llvm-svn: 293024	2017-01-25 06:08:42 +00:00
Justin Bogner	a115225f55	GlobalISel: Fix typo in error message llvm-svn: 293023	2017-01-25 06:02:10 +00:00
NAKAMURA Takumi	1de33b6dae	Ignore llvm/test/tools/llvm-symbolizer/coff-exports.test on mingw. FIXME: Demangler could behave along not host but target. For example, assume host=mingw, target=msc. llvm-svn: 293021	2017-01-25 05:26:23 +00:00
Matt Arsenault	b570b62964	DAGCombiner: Allow negating ConstantFP after legalize llvm-svn: 293019	2017-01-25 04:54:34 +00:00
Gerolf Hoflehner	a64ca53972	[InstCombine] Added regression test to narrow-switch.ll llvm-svn: 293018	2017-01-25 04:34:59 +00:00
NAKAMURA Takumi	b8624b1643	Rewind instantiations of OuterAnalysisManagerProxy in r289317, r291651, and r291662. I found that the root class should be instantiated for a variadic template to instantiate the static member explicitly. This will fix failures in the mingw DLL build. llvm-svn: 293017	2017-01-25 04:26:29 +00:00
Matt Arsenault	ec49368879	AMDGPU: Implement early ifcvt target hooks. Leave early ifcvt disabled for now since there are some shader-db regressions. This causes some immediate improvements, but could be better. The cost checking that the pass does is based on critical path length for out of order CPUs which we do not want so it skips out on many cases we want. llvm-svn: 293016	2017-01-25 04:25:02 +00:00
Peter Collingbourne	a6e1ed1de7	gold-plugin: Add the file path to the file open error diagnostic. llvm-svn: 293013	2017-01-25 03:35:28 +00:00
Ahmed Bougacha	9401d2bc09	Try to prevent build breakage by touching a CMakeLists.txt. Looks like our cmake goop for handling .inc->td dependencies doesn't track the .td files. This manifests as cmake complaining about missing files since r293009. Force a rerun to avoid that. llvm-svn: 293012	2017-01-25 02:55:24 +00:00
Chandler Carruth	c3aa937b25	[PM] Teach LoopUnroll to update the LPM infrastructure as it unrolls loops. We do this by reconstructing the newly added loops after the unroll completes to avoid threading pass manager details through all the mess of the unrolling infrastructure. I've enabled some extra assertions in the LPM to try and catch issues here and enabled a bunch of unroller tests to try and make sure this is sane. Currently, I'm manually running loop-simplify when needed. That should go away once it is folded into the LPM infrastructure. Differential Revision: https://reviews.llvm.org/D28848 llvm-svn: 293011	2017-01-25 02:49:01 +00:00
Ahmed Bougacha	365c1158a8	[GlobalISel] Generate selector for more integer binop patterns. This surprisingly isn't NFC because there are patterns to select GPR sub to SUBSWrr (rather than SUBWrr/rs); SUBS is later optimized to SUB if NZCV is dead. From ISel's perspective, both are fine. llvm-svn: 293010	2017-01-25 02:41:38 +00:00
Ahmed Bougacha	b9bb47b7f8	[GlobalISel] Rename TargetGlobalISel.td to GISel/SelectionDAGCompat.td llvm-svn: 293009	2017-01-25 02:41:26 +00:00
Greg Parker	efc0b2cc29	Reinstate "r292904 - [lit] Allow boolean expressions in REQUIRES and XFAIL and UNSUPPORTED" This reverts the revert in r292942. llvm-svn: 293007	2017-01-25 02:26:03 +00:00
Gor Nishanov	750a36491c	[coroutines] Spill the result of the invoke instruction correctly Summary: When we decide that the result of the invoke instruction need to be spilled, we need to insert the spill into a block that is on the normal edge coming out of the invoke instruction. (Prior to this change the code would insert the spill immediately after the invoke instruction, which breaks the IR, since invoke is a terminator instruction). In the following example, we will split the edge going into %cont and insert the spill there. ``` %r = invoke double @print(double 0.0) to label %cont unwind label %pad cont: %0 = call i8 @llvm.coro.suspend(token none, i1 false) switch i8 %0, label %suspend [i8 0, label %resume i8 1, label %cleanup] resume: call double @print(double %r) ``` Reviewers: majnemer Reviewed By: majnemer Subscribers: mehdi_amini, llvm-commits, EricWF Differential Revision: https://reviews.llvm.org/D29102 llvm-svn: 293006	2017-01-25 02:25:54 +00:00
Tom Stellard	bbf29e433b	AMDGPU add support for spilling to a user sgpr pointed buffers Summary: This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1]. Patch By: Dave Airlie Reviewers: nhaehnle, arsenm, tstellarAMD Reviewed By: arsenm Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D25428 llvm-svn: 293000	2017-01-25 01:25:13 +00:00
Eugene Zelenko	7efc5e5021	[AArch64] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 292996	2017-01-25 00:29:26 +00:00
Justin Bogner	1aa5ce17e7	GlobalISel: Use the correct types when translating landingpad instructions There was a bug here where we were using p0 instead of s32 for the selector type in the landingpad. Instead of hardcoding these types we should get the types from the landingpad instruction directly. Note that we replicate an assert from SDAG here to only support two-valued landingpads. llvm-svn: 292995	2017-01-25 00:16:53 +00:00
Kevin Enderby	8bf2878f72	Fix llvm-objdump so it picks a good default CPU for Mach-O files with CPU_SUBTYPE_ARM_V7S and CPU_SUBTYPE_ARM_V7K. For these two CPU subtypes it should default to a cortex-a7 CPU to give proper disassembly without a -mcpu= flag. rdar://27431703 llvm-svn: 292993	2017-01-24 23:41:04 +00:00
Eugene Zelenko	4dcbc300ec	[XCore] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 292988	2017-01-24 23:02:48 +00:00
Matt Arsenault	8060661ae3	AMDGPU: Remove spurious out branches after a kill The sequence like this: v_cmpx_le_f32_e32 vcc, 0, v0 s_branch BB0_30 s_cbranch_execnz BB0_30 ; BB#29: exp null off, off, off, off done vm s_endpgm BB0_30: ; %endif110 is likely wrong. The s_branch instruction will unconditionally jump to BB0_30 and the skip block (exp done + endpgm) inserted for performing the kill instruction will never be executed. This results in a GPU hang with Star Ruler 2. The s_branch instruction is added during the "Control Flow Optimizer" pass which seems to re-organize the basic blocks, and we assume that SI_KILL_TERMINATOR is always the last instruction inside a basic block. Thus, after inserting a skip block we just go to the next BB without looking at the subsequent instructions after the kill, and the s_branch op is never removed. Instead, we should remove the unconditional out branches and let skip the two instructions if the exec mask is non-zero. This patch fixes the GPU hang and doesn't introduce any regressions with "make check". Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019 Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 292985	2017-01-24 22:18:39 +00:00
Wei Mi	ca272c5437	Revert rL292621. Caused some internal build bot failures in apple. llvm-svn: 292984	2017-01-24 22:15:06 +00:00
Eugene Zelenko	54635cc5dd	[SystemZ] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 292983	2017-01-24 22:10:43 +00:00
Matt Arsenault	81a9bfe915	Enable FeatureFlatForGlobal on Volcanic Islands This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 292982	2017-01-24 22:02:15 +00:00
Dehao Chen	abd6950761	Explicitly promote indirect calls before sample profile annotation. Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation. Reviewers: xur, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29040 llvm-svn: 292979	2017-01-24 21:05:51 +00:00
Saleem Abdulrasool	8d12fdc869	Demangle: correct demangling for CV-qualified functions When demangling a CV-qualified function type with a final reference type parameter, we would treat the reference type parameter as a r-value ref accidentally. This would result in the improper decoration of the function type itself. Resolves PR31741! llvm-svn: 292976	2017-01-24 20:04:58 +00:00
Saleem Abdulrasool	14ea071037	Demangle: use named values for CV qualifiers Rather than hard-coding magic values of 1, 2, 4 (bit-field), use an enum to name the values. NFC. llvm-svn: 292975	2017-01-24 20:04:56 +00:00
Ivan Krasin	4c52eb7431	Revert [AMDGPU][mc][tests][NFC] Add coverage/smoke tests for Gfx7 and Gfx8. Reason: broke ASAN bots with a global buffer overflow. http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/2291 Each test contains 20-30K test cases but takes only several (from 4 to 10) seconds to complete on average machine. The tests cover the majority of AMDGPU Gfx7/Gfx8 instructions, including many dark corners, and intended to quickly find out if something is broken. llvm-svn: 292974	2017-01-24 19:58:59 +00:00
Daniel Berlin	cfc48bb77f	Remove the load hoisting code of MLSM, it is completely subsumed by GVNHoist Summary: GVNHoist performs all the optimizations that MLSM does to loads, in a more general way, and in a faster time bound (MLSM is N^3 in most cases, N^4 in a few edge cases). This disables the load portion. Note that the way ld_hoist_st_sink.ll is written makes one think that the loads should be moved to the while.preheader block, but 1. Neither MLSM nor GVNHoist do it (they both move them to identical places). 2. MLSM couldn't possibly do it anyway, as the while.preheader block is not the head of the diamond, while.body is. (GVNHoist could do it if it was legal). 3. At a glance, it's not legal anyway because the in-loop load conflict with the in-loop store, so the loads must stay in-loop. I am happy to update the test to use update_test_checks so that checking is tighter, just was going to do it as a followup. Note that i can find no particular benefit to the store portion on any real testcase/benchmark i have (even size-wise). If we really still want it, i am happy to commit to writing a targeted store sinker, just taking the code from the MemorySSA port of MergedLoadStoreMotion (which is N^2 worst case, and N most of the time). We can do what it does in a much better time bound. We also should be both hoisting and sinking stores, not just sinking them, anyway, since whether we should hoist or sink to merge depends basically on luck of the draw of where the blockers are placed. Nonetheless, i have left it alone for now. Reviewers: chandlerc, davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29079 llvm-svn: 292971	2017-01-24 19:55:36 +00:00
Changpeng Fang	69ca91bb54	AMDGPU/SI: Give up in promote alloca when a pointer may be captured. Differential Revision: http://reviews.llvm.org/D28970 Reviewer: Matt llvm-svn: 292966	2017-01-24 19:06:28 +00:00
Saleem Abdulrasool	cebc2ebe41	Demangle: avoid butchering parameter type When demangling a CV-qualified function type with a final parameter with a reference type, we would insert the CV qualification on the parameter rather than the function, and in the process adjust the insertion point by one extra, splitting the type name. This avoids doing so, even though the attribution is still incorrect. llvm-svn: 292965	2017-01-24 18:52:19 +00:00
Chad Rosier	4dd0c4bc13	[AArch64] Fix typo. NFC. llvm-svn: 292959	2017-01-24 18:08:10 +00:00
Amaury Sechet	2dc6021f94	Use InstCombine's builder in foldSelectCttzCtlz instead of creating a new one. Summary: As per title. This will add the instructions we are interested in to the worklist. Reviewers: mehdi_amini, majnemer, andreadb Differential Revision: https://reviews.llvm.org/D29081 llvm-svn: 292957	2017-01-24 17:48:25 +00:00
Stanislav Mekhanoshin	324d0de803	[AMDGPU] Add VGPR copies post regalloc fix pass Regalloc creates COPY instructions which do not formally use VALU. That results in v_mov instructions being displaced after exec mask modification. One pass which does this is SIOptimizeExecMasking, but potentially it can be done by other passes too. This patch adds a pass immediately after regalloc to add an implicit exec use operand to all VGPR copy instructions. Differential Revision: https://reviews.llvm.org/D28874 llvm-svn: 292956	2017-01-24 17:46:17 +00:00
Evandro Menezes	43707e2175	[AArch64] Rename 'no-quad-ldst-pairs' to 'slow-paired-128' In order to follow the pattern of the existing 'slow-misaligned-128store' option, rename the option 'no-quad-ldst-pairs' to 'slow-paired-128'. llvm-svn: 292954	2017-01-24 17:34:31 +00:00
Chris Bieneman	c4b940b616	[Lanai] Rename LanaiInstPrinter library to LanaiAsmPrinter Summary: This is in keeping with LLVM convention. The classes are InstPrinters, but the library is ${target}AsmPrinter. This patch is in response to bryant pointing out to me that Lanai was the only backend deviating from convention here. Thanks! Reviewers: jpienaar, bryant Subscribers: mgorny, jgosnell, llvm-commits Differential Revision: https://reviews.llvm.org/D29043 llvm-svn: 292953	2017-01-24 17:27:01 +00:00
Sanjay Patel	7b87306d0f	[InstSimplify] try to eliminate icmp Pred (add nsw X, C1), C2 I was surprised to see that we're missing icmp folds based on 'add nsw' in InstCombine, but we should handle the InstSimplify cases first because that could make the InstCombine code simpler. Here are Alive-based proofs for the logic: Name: add_neg_constant Pre: C1 < 0 && (C2 > ((1<<(width(C1)-1)) + C1)) %a = add nsw i7 %x, C1 %b = icmp sgt %a, C2 => %b = false Name: add_pos_constant Pre: C1 > 0 && (C2 < ((1<<(width(C1)-1)) + C1 - 1)) %a = add nsw i6 %x, C1 %b = icmp slt %a, C2 => %b = false Name: nuw Pre: C1 u>= C2 %a = add nuw i11 %x, C1 %b = icmp ult %a, C2 => %b = false Differential Revision: https://reviews.llvm.org/D29053 llvm-svn: 292952	2017-01-24 17:03:24 +00:00
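
The InstSimplify folds in 7b87306d0f follow from the value range of a non-overflowing add. Below is a minimal brute-force sketch in Python, not LLVM code: the function names are hypothetical, and the preconditions are range-derived bounds (assumed here for illustration) rather than the exact Alive preconditions quoted in the commit message. It checks the 'add nsw ... icmp sgt' and 'add nuw ... icmp ult' cases exhaustively at a small bit width.

```python
def signed_range(width):
    """Return (SMIN, SMAX) for a two's-complement integer of the given width."""
    return -(1 << (width - 1)), (1 << (width - 1)) - 1


def check_add_nsw_sgt(width):
    """Check: for C1 < 0 and C2 >= SMAX + C1, 'icmp sgt (add nsw x, C1), C2' is false.

    A non-overflowing signed add of a negative C1 can be at most SMAX + C1.
    """
    smin, smax = signed_range(width)
    for c1 in range(smin, 0):
        for c2 in range(smax + c1, smax + 1):
            for x in range(smin, smax + 1):
                total = x + c1
                if total < smin or total > smax:
                    continue  # nsw violated: result is poison, nothing to check
                if total > c2:
                    return False  # counterexample: compare would be true
    return True


def check_add_nuw_ult(width):
    """Check: for C1 u>= C2, 'icmp ult (add nuw x, C1), C2' is false.

    A non-overflowing unsigned add of C1 is at least C1, hence at least C2.
    """
    umax = (1 << width) - 1
    for c1 in range(umax + 1):
        for c2 in range(c1 + 1):
            for x in range(umax + 1):
                total = x + c1
                if total > umax:
                    continue  # nuw violated: result is poison, nothing to check
                if total < c2:
                    return False  # counterexample: compare would be true
    return True


if __name__ == "__main__":
    print(check_add_nsw_sgt(5), check_add_nuw_ult(5))
```

Exhaustive checking is only practical at small widths; the Alive proofs cited in the commit are what establish the folds for arbitrary widths.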


143739 Commits
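
The load-by-bytes idiom described in 0e6418640a can be illustrated outside LLVM. The following is a small Python sketch, assuming a little-endian target and using hypothetical helper names. It shows that the shift-and-or pattern over four adjacent bytes equals a single 32-bit load, and that the reversed pattern equals the same load followed by a byte swap.

```python
import struct


def load_le_by_bytes(a, i=0):
    """a[i] | (a[i+1] << 8) | (a[i+2] << 16) | (a[i+3] << 24)"""
    return a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)


def load_be_by_bytes(a, i=0):
    """(a[i] << 24) | (a[i+1] << 16) | (a[i+2] << 8) | a[i+3]"""
    return (a[i] << 24) | (a[i + 1] << 16) | (a[i + 2] << 8) | a[i + 3]


def wide_load_le(a, i=0):
    """Single 32-bit load as a little-endian target would perform it."""
    return struct.unpack_from("<I", a, i)[0]


def bswap32(v):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(v.to_bytes(4, "little"), "big")


if __name__ == "__main__":
    data = bytes([0x11, 0x22, 0x33, 0x44, 0x55])
    # Little-endian pattern folds to a plain wide load.
    assert load_le_by_bytes(data) == wide_load_le(data)
    # Big-endian pattern folds to a wide load plus bswap.
    assert load_be_by_bytes(data) == bswap32(wide_load_le(data))
    # The same holds at a non-zero offset, as in the a[i], a[i + 1], ... form.
    assert load_le_by_bytes(data, 1) == wide_load_le(data, 1)
```

This only demonstrates the value equivalence; the commit's actual work is proving, per byte of the OR result, that the narrow loads are adjacent in memory and that the target allows the wider load.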