llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-25 14:02:52 +02:00

Author	SHA1	Message	Date
Justin Lebar	4d3b101eb3	[CUDA] Rework "optimizations" and "publication" section in CompileCudaWithLLVM.rst. llvm-svn: 280869	2016-09-07 21:46:53 +00:00
Justin Lebar	f70f068f9f	[CUDA] Clarify that -l and -L only need to be passed when linking, in CompileCudaWithLLVM.rst. llvm-svn: 280868	2016-09-07 21:46:49 +00:00
Justin Lebar	aa89f5c0db	[CUDA] Further reformat "invoking clang" section of CompileCudaWithLLVM.rst. llvm-svn: 280867	2016-09-07 21:46:21 +00:00
Hal Finkel	8fd0d52d70	[SimplifyCFG] Don't try to create metadata-valued PHIs We can't create metadata-valued PHIs; don't try to do so when sinking. I created a test case for this using the @llvm.type.test intrinsic, because it takes a metadata parameter and does not have severe side effects (thus SimplifyCFG is willing to otherwise sink it). Previously, running the test case would crash with: Invalid use of metadata! %.sink = select i1 %flag, metadata <...>, metadata <0x4e45dc0> LLVM ERROR: Broken function found, compilation aborted! llvm-svn: 280866	2016-09-07 21:38:22 +00:00
Haicheng Wu	062f7f349a	[LoopUnroll] Correct a debug message. NFC. Differential Revision: https://reviews.llvm.org/D24299 llvm-svn: 280865	2016-09-07 21:30:16 +00:00
Elena Demikhovsky	0c7260ec4b	Shift-left (ISD::SHL) operation crashes on "DAG Legalization" phase. https://llvm.org/bugs/show_bug.cgi?id=29058. During node legalization we tried to legalize its operands. If an operand node is replaced during legalization the user node may be destroyed. Differential Revision: https://reviews.llvm.org/D24244 llvm-svn: 280862	2016-09-07 20:54:33 +00:00
Sanjay Patel	02d7d70623	[InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors This is a revert of r280676 which was a revert of r280637; ie, this is r280637 again. It was speculatively reverted to help debug buildbot failures. llvm-svn: 280861	2016-09-07 20:50:44 +00:00
Justin Lebar	fed4170db7	[CUDA] Fix typo in link in CompileCudaWithLLVM. llvm-svn: 280859	2016-09-07 20:42:24 +00:00
Justin Lebar	c1a06c1111	[CUDA] Move AXPY example into gist. No need to have a long inline code snippet in this doc. Also move "flags that control numerical code" underneath the "invoking clang" section, and reformat things a bit. llvm-svn: 280857	2016-09-07 20:37:41 +00:00
Krzysztof Parzyszek	43979022ef	[RDF] Fix liveness analysis for phi nodes with shadow uses Shadow uses need to be analyzed together, since each individual shadow will only have a partial reaching def. All shadows together may cover a given register ref, while each individual shadow may not. llvm-svn: 280855	2016-09-07 20:37:05 +00:00
Michael Kuperstein	0e86379a25	Don't reuse a variable name in a nested scope. NFC. llvm-svn: 280853	2016-09-07 20:29:49 +00:00
Krzysztof Parzyszek	dc938edd03	[RDF] Introduce "undef" flag for ref nodes llvm-svn: 280851	2016-09-07 20:10:56 +00:00
Justin Lebar	176391aa2c	[CUDA] Simplify build/install instructions in CompileCudaWithLLVM.rst. llvm-svn: 280850	2016-09-07 20:09:53 +00:00
Justin Lebar	3b1ad18de8	[CUDA] Call it "CUDA", not "CUDA C/C++" in our docs. CUDA is an extension to C++ -- there is no such thing as "CUDA C". But also, the language is much more commonly called "CUDA" than "CUDA C++". llvm-svn: 280849	2016-09-07 20:09:50 +00:00
Justin Lebar	1ecf02eb9f	[CUDA] Expand upon --cuda-gpu-arch flag in CompileCudaWithLLVM doc. llvm-svn: 280848	2016-09-07 20:09:46 +00:00
Wei Mi	d3da1aad61	Rename test pr30298.ll to shrink_vmul_sse.ll, to make the name more meaningful, NFC. Add PR number and comment in pr30298.ll to explain what is testing. llvm-svn: 280843	2016-09-07 18:46:15 +00:00
Yaxun Liu	c1a7b8a5f8	AMDGPU: Remove a useless variable which caused build failure for lld. llvm-svn: 280841	2016-09-07 18:31:11 +00:00
Wei Mi	811ea02523	Don't reduce the width of vector mul if the target doesn't support SSE2. The patch is to fix PR30298, which is caused by rL272694. The solution is to bail out if the target has no SSE2. Differential Revision: https://reviews.llvm.org/D24288 llvm-svn: 280837	2016-09-07 18:22:17 +00:00
Hans Wennborg	1a5c9c1117	Add more triple to conditional-tailcall.ll test llvm-svn: 280835	2016-09-07 18:19:31 +00:00
Chad Rosier	c44ce6174f	Typo. NFC. llvm-svn: 280834	2016-09-07 18:15:12 +00:00
Saleem Abdulrasool	00c4a26e5c	CodeGen: ensure that libcalls are always AAPCS CC The original commit was too aggressive about marking LibCalls as AAPCS. The libcalls contain libc/libm/libunwind calls which are not AAPCS, but C. llvm-svn: 280833	2016-09-07 17:56:09 +00:00
Hans Wennborg	aa75671d5c	X86: Fold tail calls into conditional branches where possible (PR26302) When branching to a block that immediately tail calls, it is possible to fold the call directly into the branch if the call is direct and there is no stack adjustment, saving one byte. Example: define void @f(i32 %x, i32 %y) { entry: %p = icmp eq i32 %x, %y br i1 %p, label %bb1, label %bb2 bb1: tail call void @foo() ret void bb2: tail call void @bar() ret void } before: f: movl 4(%esp), %eax cmpl 8(%esp), %eax jne .LBB0_2 jmp foo .LBB0_2: jmp bar after: f: movl 4(%esp), %eax cmpl 8(%esp), %eax jne bar .LBB0_1: jmp foo I don't expect any significant size savings from this (on a Clang bootstrap I saw 288 bytes), but it does make the code a little tighter. This patch only does 32-bit, but 64-bit would work similarly. Differential Revision: https://reviews.llvm.org/D24108 llvm-svn: 280832	2016-09-07 17:52:14 +00:00
Davide Italiano	9cc7f6f6fa	[lib/LTO] Add a way to run a custom pipeline Differential Revision: https://reviews.llvm.org/D24095 llvm-svn: 280830	2016-09-07 17:46:16 +00:00
Yaxun Liu	e97f88d72f	AMDGPU: Add hidden kernel arguments to runtime metadata OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden arguments should be included in the runtime metadata. Also updated kernel argument kind metadata. Differential Revision: https://reviews.llvm.org/D23424 llvm-svn: 280829	2016-09-07 17:44:00 +00:00
Reid Kleckner	bdfdb24819	[codeview] Add new directives to record inlined call site line info Summary: Previously we were trying to represent this with the "contains" list of the .cv_inline_linetable directive, which was not enough information. Now we directly represent the chain of inlined call sites, so we know what location to emit when we encounter a .cv_loc directive of an inner inlined call site while emitting the line table of an outer function or inlined call site. Fixes PR29146. Also fixes PR29147, where we would crash when .cv_loc directives crossed sections. Now we write down the section of the first .cv_loc directive, and emit an error if any other .cv_loc directive for that function is in a different section. Also fixes issues with discontiguous inlined source locations, like in this example: volatile int unlikely_cond = 0; extern void __declspec(noreturn) abort(); __forceinline void f() { if (!unlikely_cond) abort(); } int main() { unlikely_cond = 0; f(); unlikely_cond = 0; } Previously our tables gave bad location information for the 'abort' call, and the debugger wouldn't show the inlined stack frame for 'f'. It is important to emit good line tables for this code pattern, because it comes up whenever an asan bug occurs in an inlined function. The __asan_report* stubs are generally placed after the normal function epilogue, leading to discontiguous regions of inlined code. Reviewers: majnemer, amccarth Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24014 llvm-svn: 280822	2016-09-07 16:15:31 +00:00
Chad Rosier	2f60815d8e	[LoopInterchange] Improve debug output. NFC. llvm-svn: 280820	2016-09-07 16:07:17 +00:00
Chad Rosier	6b482ffada	[LoopInterchange] Improve debug output. NFC. llvm-svn: 280819	2016-09-07 15:56:59 +00:00
Justin Lebar	902676611e	[LSV] Use the original loads' names for the extractelement instructions. Summary: LSV replaces multiple adjacent loads with one vectorized load and a bunch of extractelement instructions. This patch makes the extractelement instructions' names match those of the original loads, for (hopefully) improved readability. Reviewers: asbirlea, tstellarAMD Subscribers: arsenm, mzolotukhin Differential Revision: https://reviews.llvm.org/D23748 llvm-svn: 280818	2016-09-07 15:49:48 +00:00
Sanjay Patel	25aacf2ea5	[x86] move combines of 'select of 2 constants' to its own function; NFC There are missing folds here and possibly folds that could be made generic. llvm-svn: 280817	2016-09-07 15:47:34 +00:00
Simon Pilgrim	648199b413	Fix typo in test - it should be masking bits0-15 not bit16 llvm-svn: 280816	2016-09-07 15:19:07 +00:00
Andrea Di Biagio	a7ae65eca5	Regenerate vector bitcast folding tests using update_test_checks.py. Two tests have been merged together, regenerated and then moved to a more appropriate directory. No functional change. llvm-svn: 280814	2016-09-07 14:50:07 +00:00
Simon Pilgrim	44db51e5ab	[X86][SSE] Added or combine tests for known bits of vectors Part of the yak shaving for D24253 llvm-svn: 280813	2016-09-07 14:49:50 +00:00
Simon Pilgrim	c9ab30118b	[X86][SSE] Added and+or+zext combine tests for known bits of vectors Part of the yak shaving for D24253 llvm-svn: 280810	2016-09-07 14:00:52 +00:00
Simon Pilgrim	4abaac135b	[X86][SSE] Added and+or combine tests currently failing with vectors (and (or x, C), D) -> D if (C & D) == D Part of the yak shaving for D24253 llvm-svn: 280809	2016-09-07 13:40:03 +00:00
Pablo Barrio	4572c42d8d	[ARM] Lower UDIV+UREM to UDIV+MLS (and the same for SREM) Summary: This saves a library call to __aeabi_uidivmod. However, the processor must feature hardware division in order to benefit from the transformation. Reviewers: scott-0, jmolloy, compnerd, rengolin Subscribers: t.p.northover, compnerd, aemerson, rengolin, samparker, llvm-commits Differential Revision: https://reviews.llvm.org/D24133 llvm-svn: 280808	2016-09-07 12:49:15 +00:00
Andrea Di Biagio	143c56ed9b	[InstCombine][SSE4a] Fix assertion failure in the insertq/insertqi combining logic. This fixes a similar issue to the one already fixed by r280804 (reviewed in D24256). Revision 280804 fixed the problem with unsafe dyn_casts in the extrq/extrqi combining logic. However, it turns out that even the insertq/insertqi logic was affected by the same problem. llvm-svn: 280807	2016-09-07 12:47:53 +00:00
Andrea Di Biagio	b15baf693c	[InstCombine][SSE4a] Fix assertion failure caused by unsafe dyn_casts on the operands of extrq/extrqi intrinsic calls. This patch fixes an assertion failure caused by unsafe dynamic casts on the constant operands of sse4a intrinsic calls to extrq/extrqi. The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently checks if the input operands are constants. Internally, that logic relies on dyn_casts of values returned by calls to method Constant::getAggregateElement. However, method getAggregateElement may return nullptr if the constant element cannot be retrieved. So, all the dyn_casts can potentially fail. This is what happens for example if a constexpr value is passed in input to an extrq/extrqi intrinsic call. This patch fixes the problem by using a dyn_cast_or_null (instead of a simple dyn_cast) on the result of each call to Constant::getAggregateElement. Added reproducible test cases to x86-sse4a.ll. Differential Revision: https://reviews.llvm.org/D24256 llvm-svn: 280804	2016-09-07 12:03:03 +00:00
Renato Golin	82e5a400af	Revert "[EfficiencySanitizer] Adds shadow memory parameters for 40-bit virtual memory address." This reverts commit r280796, as it broke the AArch64 bots for no reason. The tests were passing and we should try to keep them passing, so a proper review should make that happen. llvm-svn: 280802	2016-09-07 10:54:42 +00:00
Vasileios Kalintiris	f08d514b9b	[mips] Disable the TImode shift libcalls for 32-bit targets. Summary: The o32 ABI doesn't support the TImode helpers. For the time being, disable just the shift libcalls as they break recursive builds on MIPS. Reviewers: sdardis Subscribers: llvm-commits, sdardis Differential Revision: https://reviews.llvm.org/D24259 llvm-svn: 280798	2016-09-07 10:01:18 +00:00
Sagar Thakur	3efce374b0	[EfficiencySanitizer] Adds shadow memory parameters for 40-bit virtual memory address. Adding 40-bit shadow memory parameters because MIPS64 uses 40-bit virtual memory addresses. Reviewed by bruening Differential: D23801 llvm-svn: 280796	2016-09-07 09:45:37 +00:00
James Molloy	4ddb9ff571	[SimplifyCFG] Followup fix to r280790 In failure cases it's not guaranteed that the PHI we're inspecting is actually in the successor block! In this case we need to bail out early, and never query getIncomingValueForBlock() as that will cause an assert. llvm-svn: 280794	2016-09-07 09:01:22 +00:00
James Molloy	278080462f	[SimplifyCFG] Update workaround for PR30188 to also include loads I should have realised this the first time around, but if we're avoiding sinking stores where the operands come from allocas so they don't create selects, we also have to do the same for loads because SROA will be just as defective looking at loads of selected addresses as stores. Fixes PR30188 (again). llvm-svn: 280792	2016-09-07 08:40:20 +00:00
Diana Picus	6ffe0abe96	[CMake] Use CMake's default RPATH for the unit tests In the top-level CMakeLists.txt, we set CMAKE_BUILD_WITH_INSTALL_RPATH to ON, and then for the unit tests we set it to <test>/../../lib. This works for tests that live in unittest/<whatever>, but not for those that live in subdirectories e.g. unittest/Transforms/IPO or unittest/ExecutionEngine/Orc. When building with BUILD_SHARED_LIBRARIES, such tests don't manage to find their libraries. Since the tests are run from the build directory, it makes sense to set their RPATH for the build tree, rather than the install tree. This is the default in CMake since 2.6, so all we have to do is set CMAKE_BUILD_WITH_INSTALL_RPATH to OFF for the unit tests. llvm-svn: 280791	2016-09-07 08:37:15 +00:00
James Molloy	c501708bef	[SimplifyCFG] Check PHI uses more accurately PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG. Fixes PR30292. llvm-svn: 280790	2016-09-07 08:15:54 +00:00
Hal Finkel	d7f40f9afa	[PowerPC] Fix address-offset folding for plain addi When folding an addi into a memory access that can take an immediate offset, we were implicitly assuming that the existing offset was zero. This was incorrect. If we're dealing with an addi with a plain constant, we can add it to the existing offset (assuming that doesn't overflow the immediate, etc.), but if we have anything else (i.e. something that will become a relocation expression), we'll go back to requiring the existing immediate offset to be zero (because we don't know what the requirements on that relocation expression might be - e.g. maybe it is paired with some addis in some relevant way). On the other hand, when dealing with a plain addi with a regular constant immediate, the alignment restrictions (from the TOC base pointer, etc.) are irrelevant. I've added the test case from PR30280, which demonstrated the bug, but also demonstrates a missed optimization opportunity (i.e. we don't need the memory accesses at all). Fixes PR30280. llvm-svn: 280789	2016-09-07 07:36:11 +00:00
Elena Demikhovsky	dedd445ace	AVX512F: FMA intrinsic + FNEG - sequence optimization The previous commit (r280368 - https://reviews.llvm.org/D23313) does not cover AVX-512F, KNL set. FNEG(x) operation is lowered to (bitcast (vpxor (bitcast x), (bitcast constfp(0x80000000))). It happens because FP XOR is not supported for 512-bit data types on KNL and we use integer XOR instead. I added pattern match for integer XOR. Differential Revision: https://reviews.llvm.org/D24221 llvm-svn: 280785	2016-09-07 06:54:28 +00:00
Matt Arsenault	7369913db8	AMDGPU: Make some scalar instructions commutable llvm-svn: 280784	2016-09-07 06:25:55 +00:00
Matt Arsenault	4042e05485	Remove unnecessary call to getAllocatableRegClass This reapplies r252565 and r252674, effectively reverting r252956. This allows VS_32/VS_64 to be unallocatable like they should be. llvm-svn: 280783	2016-09-07 06:16:45 +00:00
Craig Topper	6b7eb97ac7	[X86] Add hasSideEffects=0 to some instructions. llvm-svn: 280782	2016-09-07 04:46:15 +00:00
Craig Topper	a231c68058	[AVX-512] Add support for commuting masked instructions in findCommutedOpIndices. The default implementation doesn't skip the mask input or the preserved input. llvm-svn: 280781	2016-09-07 04:46:11 +00:00
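
Two of the commits above describe small arithmetic identities: r280809 quotes the fold (and (or x, C), D) -> D if (C & D) == D, and r280808 lowers UDIV+UREM to UDIV+MLS. The sketch below checks both identities on plain 32-bit integers. It is not LLVM code: the helper names andOfOr, foldApplies and uremViaDivMls are hypothetical, and the reading of MLS as "n - q * d" is an assumption based on the commit title.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): evaluates (x | C) & D directly.
constexpr std::uint32_t andOfOr(std::uint32_t x, std::uint32_t c,
                                std::uint32_t d) {
  return (x | c) & d;
}

// Precondition quoted in r280809: every bit set in D is also set in C.
// When it holds, (x | C) & D equals D for any x.
constexpr bool foldApplies(std::uint32_t c, std::uint32_t d) {
  return (c & d) == d;
}

// Remainder written as n - (n / d) * d, i.e. one division followed by a
// multiply-subtract (assumed to correspond to UDIV + MLS in r280808).
// The caller must ensure d != 0.
constexpr std::uint32_t uremViaDivMls(std::uint32_t n, std::uint32_t d) {
  return n - (n / d) * d;
}

// Compile-time spot checks of both identities.
static_assert(foldApplies(0xFFu, 0x0Fu), "D must be a subset of C");
static_assert(andOfOr(0x1234u, 0xFFu, 0x0Fu) == 0x0Fu, "fold yields D");
static_assert(uremViaDivMls(17u, 5u) == 17u % 5u, "matches operator%");
```

The checks run at compile time, so building the file is enough to confirm them; the same functions can also be called at run time with other operands.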


137921 Commits