llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 03:23:01 +02:00

Author	SHA1	Message	Date
Chandler Carruth	465908b70b	[x86] Rip out some broken test cases for avx512 i1 store support. It isn't reasonable to test storing things using undef pointers -- storing through those is at best "good luck" and really should be transformed to "unreachable". Random changes in the combiner can randomly break these tests for no good reason. I'm following up on the original commit regarding the right long-term strategy here. llvm-svn: 213810	2014-07-23 22:29:19 +00:00
Juergen Ributzka	dd1be32354	[RuntimeDyld][AArch64] Update relocation tests and also add a simple GOT test. llvm-svn: 213807	2014-07-23 22:23:17 +00:00
David Blaikie	eaf4b8c44f	ArgPromo+DebugInfo: Handle updating debug info over multiple applications of argument promotion. While the subprogram map cache used by Dead Argument Elimination works there, I made a mistake when reusing it for Argument Promotion in r212128 because ArgPromo may transform functions more than once whereas DAE transforms each function only once, removing all the dead arguments in one go. To address this, ensure that the map is updated after each argument promotion. In retrospect it might be a little wasteful to create a map of all subprograms when only handling a single CGSCC, but the alternative is walking the debug info for each function in the CGSCC that gets updated. It's not clear to me what the right tradeoff is there, but since the current tradeoff seems to be working OK (and the code to keep things updated is very cheap), let's stick with that for now. llvm-svn: 213805	2014-07-23 22:09:29 +00:00
David Blaikie	ab62d52eec	Test debug info in arg promotion with an actual promotion case, rather than a degenerate arg promotion that's actually DAE performed by ArgPromo Also the debug location I had here was bogus, describing the location of the call site as in the callee - and unnecessary, so just drop it. llvm-svn: 213803	2014-07-23 21:30:59 +00:00
Jim Grosbach	98597dd715	Use an explicit triple in testcase. Make the test work better on non-darwin hosts. Hopefully. llvm-svn: 213801	2014-07-23 20:46:32 +00:00
Jim Grosbach	f6143714d5	[X86,AArch64] Extend vcmp w/ unary op combine to work w/ more constants. The transform to constant fold unary operations with an AND across a vector comparison applies when the constant is not a splat of a scalar as well. llvm-svn: 213800	2014-07-23 20:41:43 +00:00
Jim Grosbach	5e39096957	X86: restrict combine to when type sizes are safe. The folding of unary operations through a vector compare and mask operation is only safe if the unary operation result is of the same size as its input. For example, it's not safe for [su]itofp from v4i32 to v4f64. llvm-svn: 213799	2014-07-23 20:41:38 +00:00
Jim Grosbach	ae8284ac32	DAG: fp->int conversion for non-splat constants. Constant fold the lanes of the input constant build_vector individually so we correctly handle when the vector elements are not all the same constant value. PR20394 llvm-svn: 213798	2014-07-23 20:41:31 +00:00
Justin Holewinski	d4b153f954	[NVPTX] Add some extra tests for mul.wide to test non-power-of-two source types llvm-svn: 213794	2014-07-23 20:23:49 +00:00
Mark Heffernan	6e2086acbe	Do not add unroll disable metadata after unrolling pass for loops with #pragma clang loop unroll(full). llvm-svn: 213789	2014-07-23 20:05:44 +00:00
Juergen Ributzka	2a176cc855	[FastISel][AArch64] Fix return type in FastLowerCall. I used the wrong method to obtain the return type inside FinishCall. This fix simply uses the return type from FastLowerCall, which we already determined to be a valid type. Reduced test case from Chad. Thanks. llvm-svn: 213788	2014-07-23 20:03:13 +00:00
Justin Holewinski	851431280b	[NVPTX] mul.wide generation works for any smaller integer source types, not just the next smaller power of two llvm-svn: 213784	2014-07-23 18:46:03 +00:00
Robert Khasanov	4f172d9c81	[SKX] Added missed test files for rev 213757 llvm-svn: 213780	2014-07-23 18:17:49 +00:00
Saleem Abdulrasool	ef57f88f77	AsmParser: remove deprecated LLIR support linker_private and linker_private_weak were deprecated in 3.5. Remove support for them now that the 3.5 branch has been created. llvm-svn: 213777	2014-07-23 18:09:31 +00:00
Robert Khasanov	4e33d5c3f9	[SKX] Fix lowercase "error:" in rev 213757 llvm-svn: 213774	2014-07-23 17:42:13 +00:00
Justin Holewinski	ece54a0498	[NVPTX] Make sure we do not generate MULWIDE ISD nodes when optimizations are disabled With optimizations disabled, we disable the isel patterns for mul.wide; but we were still generating MULWIDE ISD nodes. Now, we only try to generate MULWIDE ISD nodes in DAGCombine if the optimization level is not zero. llvm-svn: 213773	2014-07-23 17:40:45 +00:00
Mark Heffernan	cf39d19c7f	In unroll pragma syntax and loop hint metadata, change "enable" forms to a new form using the string "full". llvm-svn: 213772	2014-07-23 17:31:37 +00:00
Chad Rosier	2cb7fd5dae	[AArch64] Lower sdiv x, pow2 using add + select + shift. The target-independent DAGcombiner will generate: asr w1, X, #31 w1 = splat sign bit. add X, X, w1, lsr #28 X = X + 0 or pow2-1 asr w0, X, asr #4 w0 = X/pow2 However, the add + shifts is expensive, so generate: add w0, X, 15 w0 = X + pow2-1 cmp X, wzr X - 0 csel X, w0, X, lt X = (X < 0) ? X + pow2-1 : X; asr w0, X, asr 4 w0 = X/pow2 llvm-svn: 213758	2014-07-23 14:57:52 +00:00
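The add + select + shift sequence described in r213758 can be modelled in a few lines. This is only an illustrative sketch in Python, not the AArch64 lowering code; the function names `sdiv_pow2_select` and `sdiv_trunc` are hypothetical, and unbounded Python integers stand in for machine registers.

```python
def sdiv_pow2_select(x: int, k: int) -> int:
    """Signed divide by 2**k, rounding toward zero, via add + select + shift."""
    bias = (1 << k) - 1                # pow2 - 1
    biased = x + bias                  # models: add  w0, X, pow2-1
    chosen = biased if x < 0 else x    # models: cmp X, wzr ; csel ..., lt
    return chosen >> k                 # models: asr (Python's >> is arithmetic)


def sdiv_trunc(x: int, d: int) -> int:
    """Reference result: C-style signed division that truncates toward zero."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q
```

The bias is added only for negative inputs because an arithmetic right shift rounds toward negative infinity, while `sdiv` must round toward zero.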
Robert Khasanov	cfc9aa43e1	[SKX] Enabling mask instructions: encoding, lowering KMOVB, KMOVW, KMOVD, KMOVQ, KNOTB, KNOTW, KNOTD, KNOTQ Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 213757	2014-07-23 14:49:42 +00:00
Tim Northover	869fa46eae	ARM: spot SBFX-compatible code expressed with sign_extend_inreg We were assuming all SBFX-like operations would have the shl/asr form, but often when the field being extracted is an i8 or i16, we end up with a SIGN_EXTEND_INREG acting on a shift instead. Simple enough to check for though. llvm-svn: 213754	2014-07-23 13:59:12 +00:00
Tim Northover	68aeddde08	ARM: add patterns for [su]xta[bh] from just a shift. Although the final shifter operand is a rotate, this actually only matters for the half-word extends when the amount == 24. Otherwise folding a shift in is just as good. llvm-svn: 213753	2014-07-23 13:59:07 +00:00
Tilmann Scheller	189ad507d0	[ARM] Make the assembler reject unpredictable pre/post-indexed ARM STRB instructions. The ARM ARM prohibits STRB instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STRB instructions with unpredictable behavior. llvm-svn: 213750	2014-07-23 13:03:47 +00:00
Tim Northover	c357579164	AArch64: remove "arm64_be" support in favour of "aarch64_be". There really is no arm64_be: it was a useful fiction to test big-endian support while both backends existed in parallel, but now the only platform that uses the name (iOS) doesn't have a big-endian variant, let alone one called "arm64_be". llvm-svn: 213748	2014-07-23 12:58:11 +00:00
Tilmann Scheller	ea661e77fd	[ARM] Make the assembler reject unpredictable pre/post-indexed ARM STR instructions. The ARM ARM prohibits STR instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STR instructions with unpredictable behavior. llvm-svn: 213745	2014-07-23 12:38:17 +00:00
Andrea Di Biagio	e373d3d06d	Revert r211771. It was: "[X86] Improve the selection of SSE3/AVX addsub instructions". This change fully reverts r211771. That revision added a canonicalization rule which has the potential to cause a combine-cycle in the target-independent canonicalizing DAG combine. The plan is to move the logic that forms target specific addsub nodes as part of the lowering of shuffles. llvm-svn: 213736	2014-07-23 11:20:24 +00:00
Chandler Carruth	d673be21b7	[x86] Clean up a test case to use check labels and spell out the exact instruction sequences with CHECK-NEXT for these test cases. This notably exposes how absolutely horrible the generated code is for several of these test cases, and will make any future updates to the test as our vector instruction selection gets better. llvm-svn: 213732	2014-07-23 09:11:48 +00:00
Tilmann Scheller	92abecadc2	[ARM] Add regression test for the earlyclobber constraint of ARM STRB. The constraint was added in r213369. llvm-svn: 213730	2014-07-23 08:39:50 +00:00
Tilmann Scheller	e0c6a75a6a	[ARM] Add earlyclobber constraint to pre/post-indexed ARM STRH instructions. The post-indexed instructions were missing the constraint, causing unpredictable STRH instructions to be emitted. The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined, however it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically and then the constraint is definitely needed. llvm-svn: 213729	2014-07-23 08:12:51 +00:00
Chandler Carruth	908e62868c	[SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate insertions. The old behavior could cause arbitrarily bad memory usage in the DAG combiner if there was heavy traffic of adding nodes already on the worklist to it. This commit switches the DAG combine worklist to work the same way as the instcombine worklist where we null-out removed entries and only add new entries to the worklist. My measurements of codegen time show a slight improvement. The memory utilization is unsurprisingly dominated by other factors (the IR and DAG itself I suspect). This change results in subtle, frustrating churn in the particular order in which DAG combines are applied which causes a number of minor regressions where we fail to match a pattern previously matched by accident. AFAICT, all of these should be using AddToWorklist directly or should be written in a less brittle way. None of the changes seem drastically bad, and a few of the changes seem distinctly better. A major change required to make this work is to significantly harden the way in which the DAG combiner handles nodes which become dead (zero-uses). Previously, we relied on the ability to "priority-bump" them on the combine worklist to achieve recursive deletion of these nodes and ensure that the frontier of remaining live nodes all were added to the worklist. Instead, I've introduced a routine to just implement that precise logic with no indirection. It is a significantly simpler operation than that of the combiner worklist proper. I suspect this will also fix some other problems with the combiner. I think the x86 changes are really minor and uninteresting, but the avx512 change at least is hiding a "regression" (despite the test case being just noise, not testing some performance invariant) that might be looked into. Not sure if any of the others impact specific "important" code paths, but they didn't look terribly interesting to me, or the changes were really minor. 
The consensus in review is to fix any regressions that show up after the fact here. Thanks to the other reviewers for checking the output on other architectures. There is a specific regression on ARM that Tim already has a fix prepped to commit. Differential Revision: http://reviews.llvm.org/D4616 llvm-svn: 213727	2014-07-23 07:08:53 +00:00
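The worklist behaviour described in r213727 (skip nodes that are already queued, and null out removed entries instead of erasing them) can be sketched as follows. This is a minimal Python model under assumed names (`Worklist`, `add`, `remove`, `pop`); it is not the DAGCombiner's actual data structure.

```python
class Worklist:
    """LIFO worklist that never holds a node twice."""

    def __init__(self):
        self._items = []   # queued nodes; removed entries become None
        self._index = {}   # node -> position in _items

    def add(self, node) -> bool:
        """Queue a node unless it is already present; report whether it was added."""
        if node in self._index:
            return False
        self._index[node] = len(self._items)
        self._items.append(node)
        return True

    def remove(self, node) -> None:
        """Null out the node's slot instead of shifting the list."""
        pos = self._index.pop(node, None)
        if pos is not None:
            self._items[pos] = None

    def pop(self):
        """Return the most recently added live node, or None when empty."""
        while self._items:
            node = self._items.pop()
            if node is not None:
                del self._index[node]
                return node
        return None
```

Because duplicate insertions are rejected, the backing list grows only with the number of distinct live nodes, which is the memory property the commit message is after.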
Nick Lewycky	08dae2c274	We may visit a call that uses an alloca multiple times in callUsesLocalStack, sometimes with IsNocapture true and sometimes with IsNocapture false. We accidentally skipped work we needed to do in the IsNocapture=false case if we were called with IsNocapture=true the first time. Fixes PR20405! llvm-svn: 213726	2014-07-23 06:24:49 +00:00
NAKAMURA Takumi	413d897e7d	Rework to let RuntimeDyld/X86/MachO_x86-64_PIC_relocations.s pass on win32. FIXME: "llvm-rtdyld -verify -check" is still sensitive to path separator. Fix searching StubMap to be tolerant of both '/' and '\\' on Win32. llvm-svn: 213723	2014-07-23 04:32:21 +00:00
NAKAMURA Takumi	4555cd0e60	Suppress a test on win32 for now, llvm/test/ExecutionEngine/RuntimeDyld/X86/MachO_x86-64_PIC_relocations.s. FIXME: Fix searching StubMap with '/' and '\\' on Win32. llvm-svn: 213721	2014-07-23 04:05:58 +00:00
NAKAMURA Takumi	a6795992fa	RuntimeDyld/X86/MachO_x86-64_PIC_relocations.s: Use %/T here, or sed(1) would be confused with dos path. llvm-svn: 213720	2014-07-23 04:05:46 +00:00
Juergen Ributzka	00d4bb61a8	XFAIL the test on MIPS Not sure how to debug this one without a MIPS machine. Any takers? llvm-svn: 213705	2014-07-22 23:15:01 +00:00
Juergen Ributzka	c2d9ee45f3	[FastIsel][AArch64] Add support for the FastLowerCall and FastLowerIntrinsicCall target-hooks. This commit modifies the existing call lowering functions to be used as the FastLowerCall and FastLowerIntrinsicCall target-hooks instead. This enables patchpoint intrinsic lowering for AArch64. This fixes <rdar://problem/17733076> llvm-svn: 213704	2014-07-22 23:14:58 +00:00
Juergen Ributzka	49aab6445a	[AArch64] Use CHECK-LABEL in ARM64 ABI unit tests. llvm-svn: 213703	2014-07-22 23:14:54 +00:00
Lang Hames	59246bdf0c	[MCJIT] Refactor and add stub inspection to the RuntimeDyldChecker framework. This patch introduces a 'stub_addr' builtin that can be used to find the address of the stub for a given (<file>, <section>, <symbol>) tuple. This address can be used both to verify the contents of stubs (by loading from the returned address) and to verify references to stubs (by comparing against the returned address). Example (1) - Verifying stub contents: Load 8 bytes (assuming a 64-bit target) from the stub for 'x' in the __text section of f.o, and compare that value against the address of 'x'. # rtdyld-check: *{8}(stub_addr(f.o, __text, x)) = x Example (2) - Verifying references to stubs: Decode the immediate of the instruction at label 'l', and verify that it's equal to the offset from the next instruction's PC to the stub for 'y' in the __text section of f.o (i.e. it's the correct PC-rel difference). # rtdyld-check: decode_operand(l, 4) = stub_addr(f.o, __text, y) - next_pc(l) l: movq y@GOTPCREL(%rip), %rax Since stub inspection requires cooperation with RuntimeDyldImpl this patch pimpl-ifies RuntimeDyldChecker. Its implementation is moved in to a new class, RuntimeDyldCheckerImpl, that has access to the definition of RuntimeDyldImpl. llvm-svn: 213698	2014-07-22 22:47:39 +00:00
Juergen Ributzka	b056ac8df3	[RuntimeDyld][MachO][AArch64] Add a helper function for encoding addends in instructions. Factor out the addend encoding into a helper function and simplify the processRelocationRef. Also add a few simple rtdyld tests. More tests to come once GOTs can be tested too. Related to <rdar://problem/17768539> llvm-svn: 213689	2014-07-22 21:42:55 +00:00
Suyog Sarda	959fecbe70	This patch implements optimization as mentioned in PR19753: Optimize comparisons with "ashr/lshr exact" of a constant. It handles the errors which were seen in PR19958 where wrong code was being emitted due to an earlier patch. Added code for lshr as well as non-exact right shifts. It implements: (icmp eq/ne (ashr/lshr const2, A), const1) -> (icmp eq/ne A, Log2(const2/const1)) -> (icmp eq/ne A, Log2(const2) - Log2(const1)) Differential Revision: http://reviews.llvm.org/D4068 llvm-svn: 213678	2014-07-22 19:19:36 +00:00
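The fold in r213678 is easiest to see in the simplest case, where both constants are positive powers of two and the shift is a logical right shift. The Python sketch below covers only that case; `fold_shift_compare` is a hypothetical helper, and the ashr, negative-constant and non-exact cases handled by the real patch are not modelled.

```python
def fold_shift_compare(const2: int, const1: int) -> int:
    """Return the one shift amount A for which (const2 >> A) == const1.

    Assumes const1 and const2 are positive powers of two and const1 <= const2.
    """
    log2_c2 = const2.bit_length() - 1
    log2_c1 = const1.bit_length() - 1
    return log2_c2 - log2_c1          # Log2(const2) - Log2(const1)


def compare_unfolded(const2: int, const1: int, a: int) -> bool:
    """The original form: icmp eq (lshr const2, A), const1."""
    return (const2 >> a) == const1
```

For `const2 = 64` and `const1 = 8` the comparison holds only for `A = 3`, so the shift and compare collapse to `icmp eq A, 3`; the `ne` form is the negation.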
Suyog Sarda	2092947078	Added InstCombine transform for pattern "(A & B) ^ (A ^ B) -> (A | B)" Patch idea by Ankit Jain! Differential Revision: http://reviews.llvm.org/D4618 llvm-svn: 213677	2014-07-22 18:30:54 +00:00
Suyog Sarda	65dba610e3	Added InstCombine Transform for patterns: "((~A & B) | A) -> (A | B)" and "((A & B) | ~A) -> (~A | B)" Original Patch credit to Ankit Jain! Differential Revision: http://reviews.llvm.org/D4591 llvm-svn: 213676	2014-07-22 18:09:41 +00:00
Hal Finkel	3c4b506191	Make use of the align parameter attribute for all pointer arguments We previously supported the align attribute on all (pointer) parameters, but we only used it for byval parameters. However, it is completely consistent at the IR level to treat 'align n' on all pointer parameters as an alignment assumption on the pointer, and now we will. Specifically, this causes computeKnownBits to use the align attribute on all pointer parameters, not just byval parameters. I've also added an explicit parameter attribute test for this to test/Bitcode/attributes.ll. And I've updated the LangRef to document the align parameter attribute (as it turns out, it was not documented at all previously, although the byval documentation mentioned that it could be used). There are (at least) two benefits to doing this: - It allows enhancing alignment based on the pointer alignment after inlining callees. - It allows simplification of pointer arithmetic. llvm-svn: 213670	2014-07-22 16:58:55 +00:00
Tim Northover	832d000766	X86: drop relocations on __eh_frame sections globally. Without this, we produce non-extern relocations when targeting older OS X versions that ld64 can't cope with in the particular context of __eh_frame sections (who'd want generic relocation-processing anyway?). This means that an updated linker (ld64 from Xcode 3.2.6 or later) may be needed when targeting such platforms with a modern version of LLVM, but this is probably the case anyway and a reasonable requirement. PR20212, rdar://problem/17544795 llvm-svn: 213665	2014-07-22 15:47:09 +00:00
Suyog Sarda	7289a7b99e	This patch implements transform for pattern "(A | B) ^ (~A) -> (A | ~B)". Patch Credit to Ankit Jain! Differential Revision: http://reviews.llvm.org/D4588 llvm-svn: 213662	2014-07-22 15:37:39 +00:00
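The four InstCombine bitwise folds listed in r213662, r213676 and r213677 can be checked by brute force. The Python sketch below tests each one over all pairs of 8-bit values; the table name `IDENTITIES` and the helpers `inv` and `holds` are illustrative only and have nothing to do with LLVM's own test suite.

```python
MASK = 0xFF  # model 8-bit integers


def inv(v: int) -> int:
    """Bitwise NOT restricted to 8 bits."""
    return ~v & MASK


# pattern text -> (original expression, simplified expression)
IDENTITIES = {
    "(A & B) ^ (A ^ B) -> (A | B)": (lambda a, b: (a & b) ^ (a ^ b), lambda a, b: a | b),
    "(~A & B) | A -> (A | B)": (lambda a, b: (inv(a) & b) | a, lambda a, b: a | b),
    "(A & B) | ~A -> (~A | B)": (lambda a, b: (a & b) | inv(a), lambda a, b: inv(a) | b),
    "(A | B) ^ ~A -> (A | ~B)": (lambda a, b: (a | b) ^ inv(a), lambda a, b: a | inv(b)),
}


def holds(lhs, rhs) -> bool:
    """True if both expressions agree on every pair of 8-bit inputs."""
    return all(lhs(a, b) == rhs(a, b) for a in range(256) for b in range(256))
```

Since every operator involved works bit by bit, agreement on all 8-bit inputs implies agreement at any width.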
Peter Zotov	f79b84b24e	[OCaml] Don't truncate constants over 32 bits in Llvm.const_int. llvm-svn: 213655	2014-07-22 13:55:20 +00:00
Sasa Stankovic	6c6f1ac7c2	[mips] Fix two patterns that select i32's (for MIPS32r6) / i64's (for MIPS64r6) from setne comparison with an i32. The patterns that are fixed: * (select (i32 (setne i32, immZExt16)), i32, i32) (for MIPS32r6) * (select (i32 (setne i32, immZExt16)), i64, i64) (for MIPS64r6) llvm-svn: 213653	2014-07-22 13:36:02 +00:00
Elena Demikhovsky	50a62c2883	AVX-512: Fixed intrinsic of VSQRTPS/PD instructions. I set number and types of parameters according to GCC intrinsics. llvm-svn: 213640	2014-07-22 11:07:31 +00:00
Richard Smith	f8a40b80fc	Revert of r213521. This change introduced a non-hermetic test (depending on a file not in the test/ area). Backing out now so that this test isn't part of the 3.5 branch. Original commit message: "TableGen: Allow AddedComplexity values to be negative [...]" llvm-svn: 213596	2014-07-22 02:32:12 +00:00
Mark Heffernan	2ae2a57274	Rename metadata llvm.loop.vectorize.unroll to llvm.loop.vectorize.interleave. llvm-svn: 213588	2014-07-21 23:11:03 +00:00
Eli Bendersky	640c27ce2b	Add some tests for NVPTX lowering of cmpxchg llvm-svn: 213586	2014-07-21 22:54:44 +00:00
David Blaikie	faea669705	Revert "Recommit r212203: Don't try to construct debug LexicalScopes hierarchy for functions that do not have top level debug information." This reverts commit r212649 while I investigate/reduce/etc PR20367. llvm-svn: 213581	2014-07-21 20:45:59 +00:00
Logan Chien	b6d535b47e	Replace the result usages while legalizing cmpxchg. We should update the usages to all of the results; otherwise, we might get assertion failure or SEGV during the type legalization of ATOMIC_CMP_SWAP_WITH_SUCCESS with two or more illegal types. For example, in the following sequence, both i8 and i1 might be illegal in some target, e.g. armv5, mipsel, mips64el, %0 = cmpxchg i8* %ptr, i8 %desire, i8 %new monotonic monotonic %1 = extractvalue { i8, i1 } %0, 1 Since both i8 and i1 should be legalized, the corresponding ATOMIC_CMP_SWAP_WITH_SUCCESS dag will be checked/replaced/updated twice. If we don't update the usage to ALL of the results in the first round, the DAG for extractvalue might be processed earlier. The GetPromotedInteger() will result in assertion failure, because its operand (i.e. the success bit of cmpxchg) is not promoted beforehand. llvm-svn: 213569	2014-07-21 17:33:44 +00:00
Tom Stellard	7c4b3a94b6	R600/SI: Add instruction shrinking pass This pass converts 64-bit instructions to 32-bit when possible. llvm-svn: 213561	2014-07-21 16:55:33 +00:00
Tom Stellard	08b253cba1	R600/SI: Clean up some of the unused REGISTER_{LOAD,STORE} code There are a few more cleanups to do, but I ran into some problems with ext loads and trunc stores, when I tried to change some of the vector loads and stores from custom to legal, so I wasn't able to get rid of everything. llvm-svn: 213552	2014-07-21 15:45:06 +00:00
Tom Stellard	ed0ccca70d	R600/SI: Use scratch memory for large private arrays llvm-svn: 213551	2014-07-21 15:45:01 +00:00
Daniel Sanders	f60f0527d7	[mips] Do not emit '.module fp=...' unless we really need to. We now emit this value when we need to contradict the default value. This restores support for binutils 2.24. When a suitable binutils has been released we can resume unconditionally emitting .module directives. This is preferable to omitting the .module directives since the .module directives protect against, for example, accidentally assembling FP32 code with -mfp64 and producing an unusable object. llvm-svn: 213548	2014-07-21 15:25:24 +00:00
Tom Stellard	5bfbb25d6b	R600/SI: Store constant initializer data in constant memory This implements a solution for constant initializers suggested by Vadim Girlin, where we store the data after the shader code and then use the S_GETPC instruction to compute its address. This saves us the trouble of creating a new buffer for constant data and then having to pass the pointer to the kernel via user SGPRs or the input buffer. llvm-svn: 213530	2014-07-21 14:01:14 +00:00
Tom Stellard	7f35eb40ab	R600/SI: Use VALU for i1 XOR llvm-svn: 213528	2014-07-21 14:01:10 +00:00
Daniel Sanders	8bfb511a18	[mips] Add MipsOptionRecord abstraction and use it to implement .reginfo/.MIPS.options This abstraction allows us to support the various records that can be placed in the .MIPS.options section in the future. We currently use it to record register usage information (the ODK_REGINFO record in our ELF64 spec). Each .MIPS.options record should subclass MipsOptionRecord and provide an implementation of EmitMipsOptionRecord. Patch by Matheus Almeida and Toma Tabacu llvm-svn: 213522	2014-07-21 13:30:55 +00:00
Tom Stellard	c386c1b7f3	TableGen: Allow AddedComplexity values to be negative This is useful for cases when stand-alone patterns are preferred to the patterns included in the instruction definitions. Instead of requiring that stand-alone patterns set a larger AddedComplexity value, which can be confusing to new developers, this allows us to reduce the complexity of the included patterns to achieve the same result. llvm-svn: 213521	2014-07-21 13:28:54 +00:00
Daniel Sanders	6aa5a305b2	[mips] Do not emit '.module [no]oddspreg' unless we really need to. We now emit this directive when we need to contradict the default value (e.g. -mno-odd-spreg is given) or an option changed the default value (e.g. -mfpxx is given). This restores support for the currently available head of binutils. However, at this point binutils 2.24 is still not sufficient since it does not support '.module fp=...'. llvm-svn: 213511	2014-07-21 10:45:47 +00:00
Chandler Carruth	1025f29d1e	FileCheck-ize a test. llvm-svn: 213508	2014-07-21 09:23:21 +00:00
Tim Northover	86e216695b	CodeGen: emit IR-level f16 conversion intrinsics as fptrunc/fpext This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc, and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation path is now uniform, regardless of the input IR: fptrunc -> FP_TO_FP16 (if f16 illegal) -> libcall fpext -> FP16_TO_FP (if f16 illegal) -> libcall Each target should be able to select the version that best matches its operations and not be required to duplicate patterns for both fptrunc and FP_TO_FP16 (for example). As a result we can remove some redundant AArch64 patterns. llvm-svn: 213507	2014-07-21 09:13:56 +00:00
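The semantics that r213507 assigns to the two f16 intrinsics (a truncation to half precision one way, an extension back the other way) can be illustrated with Python's `struct` module, whose `e` format code is IEEE 754 binary16. The function names below are hypothetical stand-ins for `@llvm.convert.to.fp16` and `@llvm.convert.from.fp16`; nothing here models the legalisation path or the libcalls.

```python
import struct


def convert_to_fp16(x: float) -> int:
    """Round a float to half precision and return its 16-bit pattern.

    Models fptrunc to half followed by a bitcast to i16.
    """
    return struct.unpack("<H", struct.pack("<e", x))[0]


def convert_from_fp16(bits: int) -> float:
    """Interpret a 16-bit pattern as a half and widen it.

    Models a bitcast from i16 followed by fpext.
    """
    return struct.unpack("<e", struct.pack("<H", bits))[0]
```

Widening is always exact, while narrowing rounds, so a round trip through half precision is lossless only for values that binary16 can represent.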
Andrea Di Biagio	626e5271d3	[DAGCombiner] Improve the shuffle-vector folding logic. Canonicalize shuffles according to rules: * shuffle(A, shuffle(A, B)) -> shuffle(shuffle(A,B), A) * shuffle(B, shuffle(A, B)) -> shuffle(shuffle(A,B), B) * shuffle(B, shuffle(A, Undef)) -> shuffle(shuffle(A, Undef), B) This patch helps identify more shuffle pairs that could be combined reusing the already existing rules in the DAGCombiner. Added new test 'combine-vec-shuffle-5.ll' to verify that the canonicalized shuffles are now folded into a single shuffle node by the DAGCombiner. Added more test cases to 'combine-vec-shuffle-4.ll'. llvm-svn: 213504	2014-07-21 07:30:54 +00:00
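Each canonicalization rule in r213504 swaps the two operands of the outer shuffle, which only preserves the result if the mask is rewritten to match. The Python sketch below shows that mask rewrite under a simple list-based model; `shuffle` and `commute_mask` are hypothetical helpers, with negative mask entries standing for undef lanes.

```python
def shuffle(a, b, mask):
    """Two-operand shuffle: index i < n picks a[i], otherwise b[i - n].

    A negative mask entry stands for an undef lane and yields None.
    """
    src = list(a) + list(b)
    return [None if m < 0 else src[m] for m in mask]


def commute_mask(mask, n):
    """Rewrite a mask so that it is valid after swapping the two operands."""
    return [m if m < 0 else (m + n if m < n else m - n) for m in mask]
```

With this helper, `shuffle(A, inner, mask)` and `shuffle(inner, A, commute_mask(mask, n))` produce the same lanes, which is what lets the nested shuffle be moved to the first operand position.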
Ulrich Weigand	eb914f2256	[PowerPC] ELFv2 aggregate passing support This patch adds infrastructure support for passing array types directly. These can be used by the front-end to pass aggregate types (coerced to an appropriate array type). The details of the array type being used inform the back-end about ABI-relevant properties. Specifically, the array element type encodes: - whether the parameter should be passed in FPRs, VRs, or just GPRs/stack slots (for float / vector / integer element types, respectively) - what the alignment requirements of the parameter are when passed in GPRs/stack slots (8 for float / 16 for vector / the element type size for integer element types) -- this corresponds to the "byval align" field Using the infrastructure provided by this patch, a companion patch to clang will enable two features: - In the ELFv2 ABI, pass (and return) "homogeneous" floating-point or vector aggregates in FPRs and VRs (this is similar to the ARM homogeneous aggregate ABI) - As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates that fit fully in registers without using the "byval" mechanism The patch uses the functionArgumentNeedsConsecutiveRegisters callback to encode that special treatment is required for all directly-passed array types. The isInConsecutiveRegs / isInConsecutiveRegsLast bits set as a result are then used to implement the required size and alignment rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc. As a related change, the ABI routines have to be modified to support passing floating-point types in GPRs. This is necessary because with homogeneous aggregates of 4-byte float type we can now run out of FPRs before we run out of the 64-byte argument save area that is shadowed by GPRs. Any extra floating-point arguments that no longer fit in FPRs must now be passed in GPRs until we run out of those too. 
Note that there was already code to pass floating-point arguments in GPRs used with vararg parameters, which was done by writing the argument out to the argument save area first and then reloading into GPRs. The patch re-implements this, however, in favor of code packing float arguments directly via extension/truncation, BITCAST, and BUILD_PAIR operations. This is required to support the ELFv2 ABI, since we cannot unconditionally write to the argument save area (which the caller might not have allocated). The change does, however, affect ELFv1 varargs routines too; but even here the overall effect should be advantageous: Instead of loading the argument into the FPR, then storing the argument to the stack slot, and finally reloading the argument from the stack slot into a GPR, the new code now just loads the argument into the FPR, and subsequently loads the argument into the GPR (via BITCAST). That BITCAST might imply a save/reload from a stack temporary (in which case we're no worse than before); but it might be implemented more efficiently in some cases. The final part of the patch enables up to 8 FPRs and VRs for argument return in PPCCallingConv.td; this is required to support returning ELFv2 homogeneous aggregates. (Note that this doesn't affect other ABIs since LLVM will only look for which register to use if the parameter is marked as "direct" return anyway.) Reviewed by Hal Finkel. llvm-svn: 213493	2014-07-21 00:13:26 +00:00
Ulrich Weigand	9fcc5caf2d	[PowerPC] ELFv2 explicit CFI for CR fields This is a minor improvement in the ELFv2 ABI. In ELFv1, DWARF CFI would represent a saved CR word (holding CR fields CR2, CR3, and CR4) using just a single CFI record referring to CR2. In ELFv2 instead, each of the CR fields is represented by its own CFI record. The advantage is that the compiler can now choose to save just a single (or two) CR fields instead of all of them, if those are the only ones that actually need saving. That can lead to more efficient code using mf(o)crf instead of the (slow) mfcr instruction. Note that this patch does not (yet) implement this more efficient code generation, but it does implement the part that is required to be ABI compliant: creating multiple CFI records if multiple CR fields are saved. Reviewed by Hal Finkel. llvm-svn: 213492	2014-07-21 00:03:18 +00:00
Ulrich Weigand	fb90fdfb31	[PowerPC] ELFv2 stack space reduction The ELFv2 ABI reduces the amount of stack required to implement an ABI-compliant function call in two ways: * the "linkage area" is reduced from 48 bytes to 32 bytes by eliminating two unused doublewords * the 64-byte "parameter save area" is now optional and need not be present in certain cases (it remains mandatory in functions with variable arguments, and functions that have any parameter that is passed on the stack) The following patch implements these required changes: - reducing the linkage area, and associated relocation of the TOC save slot, in getLinkageSize / getTOCSaveOffset (this requires updating all callers of these routines to pass in the isELFv2ABI flag). - (partially) handling the case where the parameter save area is optional This latter part requires some extra explanation: Currently, we still always allocate the parameter save area when calling a function. That is certainly always compliant with the ABI, but may cause code to allocate stack unnecessarily. This can be addressed by a follow-on optimization patch. On the callee side, in LowerFormalArguments, we must track correctly whether the ABI guarantees that the caller has allocated the parameter save area for our use, and the patch does so. However, there is one complication: the code that handles incoming "byval" arguments will currently always write to the parameter save area, because it has to force incoming register arguments to the stack since it must return an address to implement the byval semantics. To fix this, the patch changes the LowerFormalArguments code to write arguments to a freshly allocated stack slot on the function's own stack frame instead of the argument save area in those cases where that area is not present. Reviewed by Hal Finkel. llvm-svn: 213490	2014-07-20 23:43:15 +00:00
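The stack-space arithmetic in r213490 can be written out directly from the sizes quoted in the message (48-byte versus 32-byte linkage area, 64-byte parameter save area). The Python sketch below computes the minimum the ABI requires, not what the patch emits, since the caller side still always allocates the save area. The function names are hypothetical; alignment, locals and arguments beyond the save area are ignored.

```python
PARAM_SAVE_AREA = 64  # bytes, as quoted in the commit message


def linkage_size(is_elfv2: bool) -> int:
    """Linkage area size: 48 bytes under ELFv1, 32 bytes under ELFv2."""
    return 32 if is_elfv2 else 48


def param_save_area_required(is_elfv2: bool, is_vararg: bool, has_stack_params: bool) -> bool:
    """ELFv1 always needs the area; ELFv2 needs it only for varargs or stack-passed params."""
    if not is_elfv2:
        return True
    return is_vararg or has_stack_params


def min_call_overhead(is_elfv2: bool, is_vararg: bool = False, has_stack_params: bool = False) -> int:
    """Minimum ABI-mandated bytes for a call: linkage area plus optional save area."""
    size = linkage_size(is_elfv2)
    if param_save_area_required(is_elfv2, is_vararg, has_stack_params):
        size += PARAM_SAVE_AREA
    return size
```

Under these assumptions a simple non-vararg call needs 112 bytes in ELFv1 and can need as little as 32 bytes in ELFv2.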
Ulrich Weigand	41e116ee77	[PowerPC] ELFv2 function call changes This patch builds upon the two preceding MC changes to implement the basic ELFv2 function call convention. In the ELFv1 ABI, a "function descriptor" was associated with every function, pointing to both the entry address and the related TOC base (and a static chain pointer for nested functions). Function pointers would actually refer to that descriptor, and the indirect call sequence needed to load up both entry address and TOC base. In the ELFv2 ABI, there are no more function descriptors, and function pointers simply refer to the (global) entry point of the function code. Indirect function calls simply branch to that address, after loading it up into r12 (as required by the ABI rules for a global entry point). Direct function calls continue to just do a "bl" to the target symbol; this will be resolved by the linker to the local entry point of the target function if it is local, and to a PLT stub if it is global. That PLT stub would then load the (global) entry point address of the final target into r12 and branch to it. Note that when performing a local function call, r2 must be set up to point to the current TOC base: if the target ends up local, the ABI requires that its local entry point is called with r2 set up; if the target ends up global, the PLT stub requires that r2 is set up. This patch implements all LLVM changes to implement that scheme: - No longer create a function descriptor when emitting a function definition (in EmitFunctionEntryLabel) - Emit two entry points if the function needs the TOC base (r2) anywhere (this is done in EmitFunctionBodyStart; note that this cannot be done in EmitFunctionEntryLabel because the global entry point prologue code must be part of the function as covered by debug info). - In order to make use tracking of r2 (as needed above) work correctly, mark direct function calls as implicitly using r2. 
- Implement the ELFv2 indirect function call sequence (no function descriptors; load target address into r12). - When creating an ELFv2 object file, emit the .abiversion 2 directive to tell the linker to create the appropriate version of PLT stubs. Reviewed by Hal Finkel. llvm-svn: 213489	2014-07-20 23:31:44 +00:00
Ulrich Weigand	28c7be6585	[MC] Pass MCSymbolData to needsRelocateWithSymbol As discussed in a previous checkin to support the .localentry directive on PowerPC, we need to inspect the actual target symbol in needsRelocateWithSymbol to make the appropriate decision based on that symbol's st_other bits. Currently, needsRelocateWithSymbol does not get the target symbol. However, it is directly available to its sole caller. This patch therefore simply extends needsRelocateWithSymbol with a new parameter "const MCSymbolData &SD", passes in the target symbol, and updates all derived implementations. In particular, in the PowerPC implementation, this patch removes the FIXME added by the previous checkin. llvm-svn: 213487	2014-07-20 23:15:06 +00:00
Hal Finkel	c0bfaefffa	[LoopVectorize] Use AA to partition potential dependency checks Prior to this change, the loop vectorizer did not make use of the alias analysis infrastructure. Instead, it performed memory dependence analysis using ScalarEvolution-based linear dependence checks within equivalence classes derived from the results of ValueTracking's GetUnderlyingObjects. Unfortunately, this meant that: 1. The loop vectorizer had logic that essentially duplicated that in BasicAA for aliasing based on identified objects. 2. The loop vectorizer could not partition the space of dependency checks based on information only easily available from within AA (TBAA metadata is currently the prime example). This means, for example, regardless of whether -fno-strict-aliasing was provided, the vectorizer would only vectorize this loop with a runtime memory-overlap check: void foo(int *a, float *b) { for (int i = 0; i < 1600; ++i) a[i] = b[i]; } This is suboptimal because the TBAA metadata already provides the information necessary to show that this check is unnecessary. Of course, the vectorizer has a limit on the number of such checks it will insert, so in practice, ignoring TBAA means not vectorizing more-complicated loops that we should. This change causes the vectorizer to use an AliasSetTracker to keep track of the pointers in the loop. The resulting alias sets are then used to partition the space of dependency checks, and potential runtime checks; this results in more-efficient vectorizations. When pointer locations are added to the AliasSetTracker, two things are done: 1. The location size is set to UnknownSize (otherwise you'd not catch inter-iteration dependencies) 2. For instructions in blocks that would need to be predicated, TBAA is removed (because the metadata might have a control dependency on the condition being speculated). For non-predicated blocks, you can leave the TBAA metadata.
This is safe because you can't have an iteration dependency on the TBAA metadata (if you did, and you unrolled sufficiently, you'd end up with the same pointer value used by two accesses that TBAA says should not alias, and that would yield undefined behavior). llvm-svn: 213486	2014-07-20 23:07:52 +00:00
Ulrich Weigand	1376d19372	[PowerPC] ELFv2 MC support for .localentry directive A second binutils feature needed to support ELFv2 is the .localentry directive. In the ELFv2 ABI, functions may have two entry points: one for calling the routine locally via "bl", and one for calling the function via function pointer (either at the source level, or implicitly via a PLT stub for global calls). The two entry points share a single ELF symbol, where the ELF symbol address identifies the global entry point address, while the local entry point is found by adding a delta offset to the symbol address. That offset is encoded into three platform-specific bits of the ELF symbol st_other field. The .localentry directive instructs the assembler to set those fields to encode a particular offset. This is typically used by a function prologue sequence like this: func: addis r2, r12, (.TOC.-func)@ha addi r2, r2, (.TOC.-func)@l .localentry func, .-func Note that according to the ABI, when calling the global entry point, r12 must be set to point to the global entry point address itself; while when calling the local entry point, r2 must be set to point to the TOC base. The two instructions between the global and local entry point in the above example translate the first requirement into the second. This patch implements support in the PowerPC MC streamers to emit the .localentry directive (both into assembler and ELF object output), as well as support in the assembler parser to parse that directive. In addition, there is another change required in MC fixup/relocation handling to properly deal with relocations targeting function symbols with two entry points: When the target function is known local, the MC layer would immediately handle the fixup by inserting the target address -- this is wrong, since the call may need to go to the local entry point instead.
The GNU assembler handles this case by not directly resolving fixups targeting functions with two entry points, but always emits the relocation and relies on the linker to handle this case correctly. This patch changes LLVM MC to do the same (this is done via the processFixupValue routine). Similarly, there are cases where the assembler would normally emit a relocation, but "simplify" it to a relocation targeting a section instead of the actual symbol. For the same reason as above, this may be wrong when the target symbol has two entry points. The GNU assembler again handles this case by not performing this simplification in that case, but leaving the relocation targeting the full symbol, which is then resolved by the linker. This patch changes LLVM MC to do the same (via the needsRelocateWithSymbol routine). NOTE: The method used in this patch is overly pessimistic, since the needsRelocateWithSymbol routine currently does not have access to the actual target symbol, and thus must always assume that it might have two entry points. This will be improved upon by a follow-on patch that modifies common code to pass the target symbol when calling needsRelocateWithSymbol. Reviewed by Hal Finkel. llvm-svn: 213485	2014-07-20 23:06:03 +00:00
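The .localentry commit above says the local entry offset is packed into three bits of `st_other`. A minimal Python sketch of one such encoding follows. The commit message does not spell out the bit layout, so the details here are assumptions: the field is taken to sit in bits 5 to 7 and to hold a log2-style value, with the constant names modeled on those used by GNU binutils. The helper names `encode_local_entry` and `decode_local_entry` are hypothetical.

```python
STO_PPC64_LOCAL_BIT = 5      # assumed position of the 3-bit field
STO_PPC64_LOCAL_MASK = 0xE0  # assumed mask: bits 5..7 of st_other


def encode_local_entry(offset: int) -> int:
    """Return st_other bits for a local entry point `offset` bytes past the symbol."""
    if offset == 0:
        return 0  # local and global entry points coincide
    if offset < 4 or offset & (offset - 1):
        raise ValueError("offset must be 0 or a power of two >= 4")
    field = offset.bit_length() - 1  # log2(offset)
    if field > 6:
        raise ValueError("offset does not fit in the 3-bit field")
    return field << STO_PPC64_LOCAL_BIT


def decode_local_entry(st_other: int) -> int:
    """Recover the byte offset of the local entry point from st_other."""
    field = (st_other & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT
    return ((1 << field) >> 2) << 2  # field values 0 and 1 both decode to 0


# The two-instruction prologue in the commit message is 8 bytes long.
bits = encode_local_entry(8)
print(hex(bits), decode_local_entry(bits))
```

Under these assumptions, the `addis`/`addi` prologue shown in the commit message (two 4-byte instructions) yields an offset of 8 bytes between the global and local entry points.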
Ulrich Weigand	22549677c8	[PowerPC] ELFv2 MC support for .abiversion directive ELFv2 binaries are marked by a bit in the ELF header e_flags field. A new assembler directive .abiversion can be used to set that flag. This patch implements support in the PowerPC MC streamers to emit the .abiversion directive (both into assembler and ELF binary output), as well as support in the assembler parser to parse the .abiversion directive. Reviewed by Hal Finkel. llvm-svn: 213484	2014-07-20 22:56:57 +00:00
Ulrich Weigand	c274d8aae7	[PowerPC] Fix FrameIndex handling in SelectAddressRegImm The PPCTargetLowering::SelectAddressRegImm routine needs to handle FrameIndex nodes in a special manner, by translating them into a TargetFrameIndex node. This was done in most cases, but seems to have been neglected in one path: when the input tree has an OR of the FrameIndex with an immediate. This can happen if the FrameIndex can be proven to be sufficiently aligned that an OR of that immediate is equivalent to an ADD. The missing handling of FrameIndex in that case caused the SelectionDAG instruction selection to miss opportunities to merge the OR back into the FrameIndex node, leading to superfluous addi/ori instructions in the final assembler output. llvm-svn: 213482	2014-07-20 22:26:40 +00:00
Matt Arsenault	a8f2abe691	R600: Add missing test for concat_vectors llvm-svn: 213473	2014-07-20 07:13:17 +00:00
Matt Arsenault	2d097d5e02	R600/SI: Remove dead code and add missing tests. This probably was killed by some generic DAGCombiner improvements in checking the TargetBooleanContents instead of just 1. llvm-svn: 213471	2014-07-20 06:11:02 +00:00
Matt Arsenault	840d57e330	R600/SI: implement range reduction for sin/cos These instructions can only take a limited input range, and return the constant value 1 out of range. We should do range reduction to be able to process arbitrary values. Use a FRACT instruction after normalization to achieve this. Also add a test for constant folding with the lowered code with unsafe-fp-math enabled. v2: use DAG lowering instead of intrinsic, adapt test v3: calculate constant, fold pattern into instruction definition v4: misc style fixes, add sin-fold testcase, cosmetics Patch by Grigori Goronzy llvm-svn: 213458	2014-07-19 18:44:39 +00:00
Hal Finkel	eb099b2a30	[LoopVectorize] Propagate known metadata to vectorized instructions There are some kinds of metadata that are safe to propagate from the scalar instructions to the vector instructions (fpmath and tbaa currently). Regarding TBAA, one might worry about propagating it on if-converted loads and stores, because the metadata might have had a control dependency on the condition, and thus actually aliased with some other non-speculated memory access when the condition was false. However, this would be caught by the runtime overlap checks. llvm-svn: 213452	2014-07-19 13:33:16 +00:00
Andrea Di Biagio	124fd0b7c5	[x86] Fix wrong shuffle mask in test 'combine-vec-shuffle-3.ll'. No functional change. Function @test3c should check that the DAGCombiner is able to fold a pair of shuffles into a new shuffle with a permute mask of <6,7,2,3>. However, one of the shuffles in @test3c had a wrong permute mask; this prevented the DAGCombiner from folding the shuffles into the expected result. Now that the shuffle mask is fixed, the backend correctly folds the two shuffles in function @test3c into a single movhlps instruction. llvm-svn: 213451	2014-07-19 07:52:58 +00:00
Hal Finkel	bf21903aff	Make Value::isDereferenceablePointer handle offsets to pointer types with dereferenceable attributes When we have a parameter (or call site return) with a dereferenceable attribute, it can specify the size of an array pointed to by that parameter. If we have a value for which we can accumulate a constant offset to such a parameter, then we can use that offset in a direct comparison with the size specified by the dereferenceable attribute. This enables us to handle cases like this: int foo(int a[static 3]) { return a[2]; /* this is always dereferenceable */ } llvm-svn: 213447	2014-07-19 03:25:16 +00:00
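The offset comparison described in the isDereferenceablePointer commit above amounts to a bounds check. The following Python sketch illustrates it under stated assumptions: `is_dereferenceable` is a hypothetical helper, not the LLVM API, and `int` is assumed to be 4 bytes so that `int a[static 3]` corresponds to 12 dereferenceable bytes.

```python
def is_dereferenceable(deref_bytes: int, offset: int, access_size: int) -> bool:
    """True if [offset, offset + access_size) lies inside the dereferenceable range."""
    return offset >= 0 and offset + access_size <= deref_bytes


INT_SIZE = 4                # assumed sizeof(int)
DEREF_BYTES = 3 * INT_SIZE  # int a[static 3]

print(is_dereferenceable(DEREF_BYTES, 2 * INT_SIZE, INT_SIZE))  # a[2]
print(is_dereferenceable(DEREF_BYTES, 3 * INT_SIZE, INT_SIZE))  # a[3]
```

The first call models the `a[2]` access from the commit message and lies within the range; the second models a hypothetical `a[3]`, which falls outside it.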
Saleem Abdulrasool	f8a4e76c32	ARM: correct WoA __builtin_alloca handling on O0 When performing a dynamic stack adjustment without optimisations, we would mark SP as def and R4 as kill. This occurred as part of the expansion of a WIN__CHKSTK SDNode which indicated the proper handling of SP and R4. The result would be that we would double define SP as part of an operation, which is obviously incorrect. Furthermore, the VTList for the chain had an incorrect parameter type of i32 instead of Other. Correct these to permit proper lowering of __builtin_alloca at -O0. llvm-svn: 213442	2014-07-19 01:29:51 +00:00
Hal Finkel	006e1d44a6	[PowerPC] 32-bit ELF PIC support This adds initial support for PPC32 ELF PIC (Position Independent Code; the -fPIC variety), thus rectifying a long-standing deficiency in the PowerPC backend. Patch by Justin Hibbits! llvm-svn: 213427	2014-07-18 23:29:49 +00:00
Mark Heffernan	5a81c3219c	Remove unroll pragma metadata after it is used. llvm-svn: 213412	2014-07-18 21:04:33 +00:00
Eli Bendersky	71f651043f	Add tests for atomic adds on floats. llvm-svn: 213406	2014-07-18 20:11:26 +00:00
Eli Bendersky	c9ff12e4fa	Use CHECK-LABEL where appropriate in this test. llvm-svn: 213398	2014-07-18 19:32:09 +00:00
Gerolf Hoflehner	5fa7774dfd	MergedLoadStoreMotion pass Merges equivalent loads on both sides of a hammock/diamond and hoists them into the header. Merges equivalent stores on both sides of a hammock/diamond and sinks them to the footer. Can enable if conversion and better tolerate load misses and store operand latencies. llvm-svn: 213396	2014-07-18 19:13:09 +00:00
David Peixotto	569d73691e	MC: support different sized constants in constant pools On AArch64 the pseudo instruction ldr <reg>, =... supports both 32-bit and 64-bit constants. Add support for 64 bit constants for the pools to support the pseudo instruction fully. Changes the AArch64 ldr-pseudo tests to use 32-bit registers and adds tests with 64-bit registers. Patch by Janne Grunau! Differential Revision: http://reviews.llvm.org/D4279 llvm-svn: 213387	2014-07-18 16:05:14 +00:00
Hal Finkel	000be1bc2f	Add a dereferenceable attribute This attribute indicates that the parameter or return pointer is dereferenceable. Practically speaking, loads from such a pointer within the associated byte range are safe to speculatively execute. Such pointer parameters are common in source languages (C++ references, for example). llvm-svn: 213385	2014-07-18 15:51:28 +00:00
Tim Northover	be8b73df94	AArch64: implement efficient f16 bitcasts Because i16 is illegal, there's no native DAG method to represent a bitcast to or from an f16 type. This meant LLVM was inserting a stack store/load pair which is really not ideal. llvm-svn: 213378	2014-07-18 13:07:05 +00:00
Tim Northover	0380ef9c00	NVPTX: support fpext/fptrunc to and from f16. llvm-svn: 213377	2014-07-18 13:01:43 +00:00
Tim Northover	854fe649af	R600: support fpext/fptrunc operations to and from f16. llvm-svn: 213376	2014-07-18 13:01:37 +00:00
Tim Northover	1b266803fb	AArch64: support f16 extend/trunc operations. llvm-svn: 213375	2014-07-18 13:01:31 +00:00
Tim Northover	07cb1b71c9	X86: support fpext/fptrunc operations to and from 16-bit floats. llvm-svn: 213374	2014-07-18 13:01:25 +00:00
Tim Northover	25c770b7c4	ARM: support legalisation of "fptrunc ... to half" operations. llvm-svn: 213373	2014-07-18 13:01:19 +00:00
Tim Northover	e4c93c0798	CodeGen: soften f16 type by default instead of marking legal. Actual support for softening f16 operations is still limited, and can be added when it's needed. But Soften is much closer to being a useful thing to try than keeping it Legal when no registers can actually hold such values. Longer term, we probably want something between Soften and Promote semantics for most targets, it'll be more efficient to promote the 4 basic operations to f32 than libcall them. llvm-svn: 213372	2014-07-18 12:41:46 +00:00
Tilmann Scheller	d04d5fde2e	[ARM] Add earlyclobber constraint to pre/post-indexed ARM STR instructions. The post-indexed instructions were missing the constraint, causing unpredictable STR instructions to be emitted. The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined, however it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically and then the constraint is definitely needed. This fixes PR20323. llvm-svn: 213369	2014-07-18 12:05:49 +00:00
Tim Northover	62ad6904d9	R600: rename misleading fp16 test. This test is actually going in the opposite direction to what the filename and function name suggested. llvm-svn: 213358	2014-07-18 08:43:30 +00:00
Tim Northover	de7867151d	R600: support f16 -> f64 conversion intrinsic. Unfortunately, we don't seem to have a direct truncation, but the extension can be legally split into two operations so we should support that. llvm-svn: 213357	2014-07-18 08:43:24 +00:00
Tim Northover	86458323c0	NVPTX: support direct f16 <-> f64 conversions via intrinsics. Clang may well start emitting these soon, and while it may not be directly relevant for OpenCL or GLSL, the instructions were just sitting there waiting to be used. llvm-svn: 213356	2014-07-18 08:30:10 +00:00
Matt Arsenault	b71d4aab48	R600: Implement TTI:getPopcntSupport The test is just copied from X86, and I don't know of a better way to test it. llvm-svn: 213351	2014-07-18 06:07:13 +00:00
Jim Grosbach	7a17678ea4	X86: Constant fold converting vector setcc results to float. Since the result of a SETCC for X86 is 0 or -1 in each lane, we can move unary operations, in this case [su]int_to_fp through the mask operation and constant fold the operation away. Generally speaking: UNARYOP(AND(VECTOR_CMP(x,y), constant)) --> AND(VECTOR_CMP(x,y), constant2) where constant2 is UNARYOP(constant). This implements the transform where UNARYOP is [su]int_to_fp. For example, consider the simple function: define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind { %cmp = fcmp oeq <4 x float> %val, %test %ext = zext <4 x i1> %cmp to <4 x i32> %result = sitofp <4 x i32> %ext to <4 x float> ret <4 x float> %result } Before this change, the SSE code is generated as: LCPI0_0: .long 1 ## 0x1 .long 1 ## 0x1 .long 1 ## 0x1 .long 1 ## 0x1 .section __TEXT,__text,regular,pure_instructions .globl _foo .align 4, 0x90 _foo: ## @foo cmpeqps %xmm1, %xmm0 andps LCPI0_0(%rip), %xmm0 cvtdq2ps %xmm0, %xmm0 retq After, the code is improved to: LCPI0_0: .long 1065353216 ## float 1.000000e+00 .long 1065353216 ## float 1.000000e+00 .long 1065353216 ## float 1.000000e+00 .long 1065353216 ## float 1.000000e+00 .section __TEXT,__text,regular,pure_instructions .globl _foo .align 4, 0x90 _foo: ## @foo cmpeqps %xmm1, %xmm0 andps LCPI0_0(%rip), %xmm0 retq The cvtdq2ps has been constant folded away and the floating point 1.0f vector lanes are materialized directly via the ModRM operand of andps. llvm-svn: 213342	2014-07-18 00:40:56 +00:00
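The identity behind the X86 fold above, UNARYOP(AND(VECTOR_CMP(x,y), constant)) --> AND(VECTOR_CMP(x,y), constant2), can be checked per lane with a small Python sketch. It models a single 32-bit lane only; the helper names are hypothetical, and the fold is shown for UNARYOP = sint_to_fp with constant = 1, as in the commit message.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x as an IEEE-754 single, as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def sitofp_bits(i: int) -> int:
    """Model of sint_to_fp on one lane, returning the resulting f32 bit pattern."""
    return f32_bits(float(i))


CONSTANT = 1
CONSTANT2 = sitofp_bits(CONSTANT)  # UNARYOP(constant), computed at compile time


def lane_before(mask: int) -> int:
    """cmp; and with integer 1; convert to float (the unfolded sequence)."""
    return sitofp_bits(mask & CONSTANT)


def lane_after(mask: int) -> int:
    """cmp; and with the bit pattern of 1.0f (the folded sequence)."""
    return mask & CONSTANT2


# A vector compare lane is either all zeros or all ones.
for mask in (0x00000000, 0xFFFFFFFF):
    assert lane_before(mask) == lane_after(mask)

print(CONSTANT2)  # matches the ".long 1065353216" in the improved code
```

The fold relies on two facts: the mask lane is exactly 0 or -1, and converting integer 0 gives the all-zero bit pattern, so masking after the conversion yields the same bits as converting after the mask.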
Jim Grosbach	2f665f1cd7	AArch64: Constant fold converting vector setcc results to float. Since the result of a SETCC for AArch64 is 0 or -1 in each lane, we can move unary operations, in this case [su]int_to_fp through the mask operation and constant fold the operation away. Generally speaking: UNARYOP(AND(VECTOR_CMP(x,y), constant)) --> AND(VECTOR_CMP(x,y), constant2) where constant2 is UNARYOP(constant). This implements the transform where UNARYOP is [su]int_to_fp. For example, consider the simple function: define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind { %cmp = fcmp oeq <4 x float> %val, %test %ext = zext <4 x i1> %cmp to <4 x i32> %result = sitofp <4 x i32> %ext to <4 x float> ret <4 x float> %result } Before this change, the code is generated as: fcmeq.4s v0, v0, v1 movi.4s v1, #0x1 // Integer splat value. and.16b v0, v0, v1 // Mask lanes based on the comparison. scvtf.4s v0, v0 // Convert each lane to f32. ret After, the code is improved to: fcmeq.4s v0, v0, v1 fmov.4s v1, #1.00000000 // f32 splat value. and.16b v0, v0, v1 // Mask lanes based on the comparison. ret The scvtf.4s has been constant folded away and the floating point 1.0f vector lanes are materialized directly via fmov.4s. Rather than do the folding manually in the target code, teach getNode() in the generic SelectionDAG to handle folding constant operands of vector [su]int_to_fp nodes. It is reasonable (as noted in a FIXME) to do additional constant folding there as well, but I don't have test cases for those operations, so leaving them for another time when it becomes appropriate. rdar://17693791 llvm-svn: 213341	2014-07-18 00:40:52 +00:00
Michael J. Spencer	ff351b2bdb	Revert "[x86] Fold extract_vector_elt of a load into the Load's address computation." There's a bug where this can create cycles in the DAG. It will take a bit to fix, so I'm backing it out for now. llvm-svn: 213339	2014-07-18 00:15:50 +00:00
Kevin Enderby	abaf0d20c2	Add printing of Mach-O stabs in llvm-nm. llvm-svn: 213327	2014-07-17 22:47:16 +00:00
Alexey Samsonov	7fb0be9dd0	[ASan] Don't instrument load/stores with !nosanitize metadata. This is used to avoid instrumentation of instructions added by UBSan in Clang frontend (see r213291). This fixes PR20085. Reviewed in http://reviews.llvm.org/D4544. llvm-svn: 213292	2014-07-17 18:48:12 +00:00
Justin Holewinski	3021ef095b	[NVPTX] Improve handling of FP fusion We now consider the FPOpFusion flag when determining whether to fuse ops. We also explicitly emit add.rn when fusion is disabled to prevent ptxas from fusing the operations on its own. llvm-svn: 213287	2014-07-17 18:10:09 +00:00
Zinovy Nis	c10a269e5f	[BUG] Due to a typo introduced in r199933 and r200027 two tests for FMA were never even started. llvm-svn: 213283	2014-07-17 17:14:35 +00:00
Adam Nemet	09fcf8939c	[X86] AVX512: Add disassembler support for compressed displacement There are two parts here. First is to modify tablegen to adjust the encoding type ENCODING_RM with the scaling factor. The second is to use the new encoding types to compute the correct displacement in the decoder. Fixes <rdar://problem/17608489> llvm-svn: 213281	2014-07-17 17:04:56 +00:00
Adam Nemet	3bb0a6b076	[TableGen] Allow shift operators to take bits<n> Convert the operand to int if possible, i.e. if the value is properly initialized. (I suppose there is further room for improvement here to also perform the shift if the uninitialized bits are shifted out.) With this little change we can now compute the scaling factor for compressed displacement with pure tablegen code in the X86 backend. This is useful because both the X86-disassembler-specific part of tablegen and the assembler need this and TD is the natural sharing place. The patch also adds the missing documentation for the shift and add operator. llvm-svn: 213277	2014-07-17 17:04:27 +00:00
Justin Holewinski	60265475a1	[NVPTX] Add missing .v4 qualifier on vector store instruction llvm-svn: 213276	2014-07-17 16:58:56 +00:00
Saleem Abdulrasool	86c034e0ff	MC: correct DWARF header for PE/COFF assembly input The header contains an offset to the DWARF abbreviations for the CU. The offset must be section relative for COFF and absolute for others. The non-assembly code path for the DWARF header generation already had the correct emission for the headers. This corrects just the assembly path. Due to the invalid relocation, processing of the debug information would halt previously on the first assembly input as the associated abbreviations would be out of range as they would have the location increased by image base and the section offset. This addresses PR20332. llvm-svn: 213275	2014-07-17 16:27:44 +00:00
Saleem Abdulrasool	b6a416296a	MC: fix MCAsmInfo usage for windows-itanium Windows itanium uses the GNUCOFF assembly format, not ELF. llvm-svn: 213274	2014-07-17 16:27:40 +00:00
Justin Holewinski	5248ed4d97	[NVPTX] Flag surface/texture query instructions with IsTexSurfQuery Also, add some tests to make sure we can handle surface/texture queries on both Fermi and Kepler+. llvm-svn: 213268	2014-07-17 14:51:33 +00:00
Justin Holewinski	9c3e284e16	[NVPTX] Add more surface/texture intrinsics, including CUDA unified texture fetch This also uses TSFlags to mark machine instructions that are surface/texture accesses, as well as the vector width for surface operations. This is used to simplify some of the switch statements that need to detect surface/texture instructions llvm-svn: 213256	2014-07-17 11:59:04 +00:00
Tim Northover	48ae22e14a	ARM: support direct f16 <-> f64 conversions ARMv8 has instructions to handle it, otherwise a libcall is needed. llvm-svn: 213254	2014-07-17 11:27:04 +00:00
Justin Holewinski	a1eab159d8	[TABLEGEN] Do not crash on intrinsics with names longer than 40 characters Differential Revision: http://reviews.llvm.org/D4537 llvm-svn: 213253	2014-07-17 11:23:29 +00:00
Tim Northover	21a41cb9a1	CodeGen: generate single libcall for fptrunc -> f16 operations. Previously we asserted on this code. Currently compiler-rt doesn't actually implement any of these new libcalls, but external help is pretty much the only viable option for LLVM. I've followed the much more generic "__truncST2" naming, as opposed to the odd name for f32 -> f16 truncation. This can obviously be changed later, or overridden by any targets that need to. llvm-svn: 213252	2014-07-17 11:12:12 +00:00
Tim Northover	81da81acc1	X86: support double extension of f16 type. x86 has no native ability to extend an f16 to f64, but the same result is obtained if we expand it into two separate extensions: f16 -> f32 -> f64. Unfortunately the same is not true for truncate, so that still results in a compilation failure. llvm-svn: 213251	2014-07-17 11:04:04 +00:00
Tim Northover	eae1f1c8cc	CodeGen: extend f16 conversions to permit types > float. This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too. During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here. Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here. llvm-svn: 213248	2014-07-17 10:51:23 +00:00
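The double-rounding problem mentioned in the commit above (truncating f64 to f16 through an intermediate f32) can be reproduced with a short Python sketch. The input value is chosen for illustration and is not taken from the commit; the sketch relies on the `struct` module's `"e"` (half) and `"f"` (single) formats, which round to nearest with ties to even.

```python
import struct


def to_f32(x: float) -> float:
    """Round a double to the nearest IEEE-754 single, returned as a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f16(x: float) -> float:
    """Round a double to the nearest IEEE-754 half, returned as a double."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Slightly above the midpoint between the f16 neighbours 1.0 and 1.0 + 2**-10.
x = 1.0 + 2.0**-11 + 2.0**-30

direct = to_f16(x)            # single rounding: above the midpoint, rounds up
two_step = to_f16(to_f32(x))  # f32 drops the 2**-30 term, leaving an exact tie

print(direct)    # 1.0009765625
print(two_step)  # 1.0
```

The first rounding to f32 discards the small term that breaks the tie, so the second rounding resolves to the even neighbour, 1.0. That is why an f64 to f16 truncate cannot simply be expanded into two truncates, whereas the f16 to f32 to f64 extension is exact at each step.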
Yi Kong	9b1652c5d0	Port memory barriers intrinsics to AArch64 Memory barrier __builtin_arm_[dmb, dsb, isb] intrinsics are required to implement their corresponding ACLE and MSVC intrinsics. This patch ports ARM dmb, dsb, isb intrinsic to AArch64. Differential Revision: http://reviews.llvm.org/D4520 llvm-svn: 213247	2014-07-17 10:50:20 +00:00
Daniel Sanders	f49e5efb1a	[mips] .reginfo is 8 byte aligned on N32. Differential Revision: http://reviews.llvm.org/D4540 llvm-svn: 213246	2014-07-17 10:10:04 +00:00
Daniel Sanders	e88052c0c5	[mips] Correct ELF e_flags for the N32 ABI when using a mips-* triple rather than a mips64-* triple Summary: Generally speaking, mips-* vs mips64-* should not be used to make decisions about the content or format of the ELF. This should be based on the ABI and CPU in use. For example, `mips-linux-gnu-clang -mips64r2 -mabi=64` should produce an ELF64 as should `mips64-linux-gnu-clang -mabi=64`. Conversely, `mips64-linux-gnu-clang -mabi=n32` should produce an ELF32 as should `mips-linux-gnu-clang -mips64r2 -mabi=n32`. This patch fixes the e_flags but leaves the ELF32 vs ELF64 issue for now since there is no apparent way to base this decision on the ABI and CPU. Differential Revision: http://reviews.llvm.org/D4539 llvm-svn: 213244	2014-07-17 10:02:08 +00:00
Daniel Sanders	7121fa671b	[mips] Correct .MIPS.abiflags for -mfpxx on MIPS32r6 Summary: The cpr1_size field describes the minimum register width to run the program rather than the size of the registers on the target. MIPS32r6 was acting as if -mfp64 had been given because it starts off with 64-bit FPU registers. Differential Revision: http://reviews.llvm.org/D4538 llvm-svn: 213243	2014-07-17 09:57:23 +00:00
Daniel Sanders	eea41061d0	[mips] Fix ELF e_flags related to -mabicalls and -mplt. Summary: These options are not implemented yet but we act as if they are always given. The integrated assembler is driven by the clang driver so the e_flag test cases should match the e_flags emitted by GCC+GAS rather than GAS by itself. Differential Revision: http://reviews.llvm.org/D4536 llvm-svn: 213242	2014-07-17 09:52:56 +00:00
Evgeniy Stepanov	5b945c9d7e	[msan] Avoid redundant origin stores. Origin is meaningless for fully initialized values. Avoid storing origin for function arguments that are known to be always initialized (i.e. shadow is a compile-time null constant). This is not about correctness, but purely an optimization. Seems to affect compilation time of blacklisted functions significantly. llvm-svn: 213239	2014-07-17 09:10:37 +00:00
Suyog Sarda	e62b39fcd0	Move ashr optimization from InstCombineShift to InstSimplify. Refactor code, no functionality change, test case moved from instcombine to instsimplify. Differential Revision: http://reviews.llvm.org/D4102 llvm-svn: 213231	2014-07-17 06:28:15 +00:00
Hal Finkel	9779454568	Improve BasicAA CS-CS queries (redux) This reverts, "r213024 - Revert r212572 "improve BasicAA CS-CS queries", it causes PR20303." with a fix for the bug in pr20303. As it turned out, the relevant code was both wrong and over-conservative (because, as with the code it replaced, it would return the overall ModRef mask even if just Ref had been implied by the argument aliasing results). Hopefully, this correctly fixes both problems. Thanks to Nick Lewycky for reducing the test case for pr20303 (which I've cleaned up a little and added in DSE's test directory). The BasicAA test has also been updated to check for this error. Original commit message: BasicAA contains knowledge of certain intrinsics, such as memcpy and memset, and uses that information to form more-accurate answers to CallSite vs. Loc ModRef queries. Unfortunately, it did not use this information when answering CallSite vs. CallSite queries. Generically, when an intrinsic takes one or more pointers and the intrinsic is marked only to read/write from its arguments, the offset/size is unknown. As a result, the generic code that answers CallSite vs. CallSite (and CallSite vs. Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's arguments. While BasicAA's CallSite vs. Loc override could use more-accurate size information for some intrinsics, it did not do the same for CallSite vs. CallSite queries. This change refactors the intrinsic-specific logic in BasicAA into a generic AA query function: getArgLocation, which is overridden by BasicAA to supply the intrinsic-specific knowledge, and used by AA's generic implementation. This allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and CallSite vs. CallSite queries, and simplifies the BasicAA implementation. Currently, only one function, Mac's memset_pattern16, is handled by BasicAA (all the rest are intrinsics). 
As a side-effect of this refactoring, BasicAA's getModRefBehavior override now also returns OnlyAccessesArgumentPointees for this function (which is an improvement). llvm-svn: 213219	2014-07-17 01:28:25 +00:00
Jingyue Wu	7c4bea3e99	Partially revert r210444 due to performance regression Summary: Converting outermost zext(a) to sext(a) causes worse code when the computation of zext(a) could be reused. For example, after converting ... = array[zext(a)] ... = array[zext(a) + 1] to ... = array[sext(a)] ... = array[zext(a) + 1], the program computes sext(a), which is actually unnecessary. I added one test in split-gep-and-gvn.ll to illustrate this scenario. Also, with r211281 and r211084, we annotate more "nuw" tags to computation involving CUDA intrinsics such as threadIdx.x. These annotations help with splitting GEP a lot, rendering the benefit we get from this reverted optimization only marginal. Test Plan: make check-all Reviewers: eliben, meheff Reviewed By: meheff Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D4542 llvm-svn: 213209	2014-07-16 23:25:00 +00:00
Sanjay Patel	5537765676	Fixed formatting, removed bug reference, renamed testcase Thanks to Duncan Exon Smith for reviewing and cleanup suggestions. llvm-svn: 213205	2014-07-16 22:40:28 +00:00
Juergen Ributzka	63d2af65d4	[FastISel] Local values shouldn't be alive across an inline asm call with side effects. This fixes an issue where a local value is defined before and used after an inline asm call with side effects. This fix simply flushes the local value map, which updates the insertion point for the inline asm call to be above any previously defined local values. This fixes <rdar://problem/17694203> llvm-svn: 213203	2014-07-16 22:20:51 +00:00
Sanjay Patel	8befe236c4	trivial fix for PR20314 Make sure that the AddrInst is an Instruction. llvm-svn: 213197	2014-07-16 21:08:10 +00:00
Justin Holewinski	35f9408e7f	[NVPTX] Honor alignment on vector loads/stores We were not considering the stated alignment on vector loads/stores, leading us to generate vector instructions even when we do not have sufficient alignment. Now, for IR like: %1 = load <4 x float>, <4 x float>* %ptr, align 4 we will generate correct, conservative PTX like: ld.f32 ... [%ptr] ld.f32 ... [%ptr+4] ld.f32 ... [%ptr+8] ld.f32 ... [%ptr+12] Or if we have an alignment of 8 (for example), we can generate code like: ld.v2.f32 ... [%ptr] ld.v2.f32 ... [%ptr+8] llvm-svn: 213186	2014-07-16 19:45:35 +00:00
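The alignment-driven splitting described in the NVPTX commit above can be sketched in Python. This is an illustrative model only: the function `split_vector_load` and its halving policy are assumptions, not the backend's actual logic, and it assumes a power-of-two element count. It does reproduce the two cases given in the commit message.

```python
def split_vector_load(num_elts: int, elt_bytes: int, align: int) -> list:
    """Return (vector_width, byte_offset) pairs for loads that respect `align`."""
    width = num_elts
    # Halve the vector width until one access is no larger than the alignment.
    while width > 1 and width * elt_bytes > align:
        width //= 2
    step = width * elt_bytes
    return [(width, off) for off in range(0, num_elts * elt_bytes, step)]


def format_ptx(loads: list) -> list:
    """Render the pairs in the abbreviated style used by the commit message."""
    lines = []
    for width, off in loads:
        opcode = "ld.f32" if width == 1 else f"ld.v{width}.f32"
        addr = "[%ptr]" if off == 0 else f"[%ptr+{off}]"
        lines.append(f"{opcode} ... {addr}")
    return lines


for align in (4, 8, 16):
    print(align, format_ptx(split_vector_load(4, 4, align)))
```

With `align 4` the <4 x float> load becomes four scalar loads at offsets 0, 4, 8 and 12; with an alignment of 8 it becomes two `ld.v2.f32` loads at offsets 0 and 8.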
Alexey Samsonov	d7eaaec8e8	CHECK-LABEL-ize one test llvm-svn: 213177	2014-07-16 18:11:31 +00:00
Kevin Enderby	d2a97a67de	Add the "-x" flag to llvm-nm for Mach-O files that prints the fields of a symbol in hex. (generally used for debugging the tools). This is the same functionality as darwin’s nm(1) "-x" flag. llvm-svn: 213176	2014-07-16 17:38:26 +00:00
Justin Holewinski	84f0bca9c1	[NVPTX] Rename registers %fl -> %fd and %rl -> %rd This matches the internal behavior of NVIDIA tools like libnvvm. llvm-svn: 213168	2014-07-16 16:26:58 +00:00
Tim Northover	8d2acb0c42	Convert test to CHECK-LABEL llvm-svn: 213161	2014-07-16 15:37:08 +00:00
Andrea Di Biagio	a161db3638	[X86] Add a check for 'isMOVHLPSMask' within method 'isShuffleMaskLegal'. Before this change, method 'isShuffleMaskLegal' didn't know that shuffles implementing a 'movhlps' operation were perfectly legal for SSE targets. This patch adds the missing check for 'isMOVHLPSMask' inside method 'isShuffleMaskLegal' to fix the problem. The reason why it is important to do this is because the DAGCombiner conservatively avoids combining a pair of shuffles if the resulting shuffle node has an illegal mask. Before this patch, shuffles with a MOVHLPS mask were wrongly considered not to be legal. This was the root cause of some poor-code generation bugs. llvm-svn: 213137	2014-07-16 11:29:39 +00:00
Reid Kleckner	d5cc38a11b	Roundtrip the inalloca bit on allocas through bitcode This was an oversight in the original support. As it is, I stuffed this bit into the alignment. The alignment is stored in log2 form, so it doesn't need more than 5 bits, given that Value::MaximumAlignment is 1 << 29. Reviewers: nicholas Differential Revision: http://reviews.llvm.org/D3943 llvm-svn: 213118	2014-07-16 01:34:27 +00:00
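The bit-packing argument in the inalloca commit above can be checked with a small sketch. The exact layout is an assumption made for illustration: the alignment is taken to be stored as log2(align) + 1 (0 meaning "unspecified") in the low 5 bits, with the inalloca flag in the next bit up. `encode_alloca` and `decode_alloca` are hypothetical helpers, not LLVM functions.

```python
MAX_ALIGN = 1 << 29        # Value::MaximumAlignment, per the commit message
ALIGN_MASK = (1 << 5) - 1  # low 5 bits hold the log2-form alignment
INALLOCA_BIT = 1 << 5      # assumed position: first bit above the field


def encode_alloca(align, inalloca):
    """Pack a power-of-two alignment and the inalloca flag into one integer."""
    assert align == 0 or (align & (align - 1)) == 0, "alignment must be a power of two"
    assert align <= MAX_ALIGN
    # For a power of two, bit_length() equals log2(align) + 1; 0 stays 0.
    field = align.bit_length()
    assert field <= ALIGN_MASK  # the largest value is 30, which fits in 5 bits
    return field | (INALLOCA_BIT if inalloca else 0)


def decode_alloca(record):
    """Inverse of encode_alloca: recover (alignment, inalloca)."""
    field = record & ALIGN_MASK
    align = (1 << (field - 1)) if field else 0
    return align, bool(record & INALLOCA_BIT)
```

Because the largest encoded alignment value is 30, the flag bit never collides with the alignment field.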
Tyler Nowicki	16db81fdfa	Emit warnings if vectorization is forced and fails. This patch modifies the existing DiagnosticInfo system to create a generic base class that is inherited to produce diagnostic-based warnings. This is used by the loop vectorizer to trigger a warning when vectorization is forced and fails. Several tests have been added to verify this behavior. Reviewed by: Arnold Schwaighofer llvm-svn: 213110	2014-07-16 00:36:00 +00:00
Matt Arsenault	15eb0d54b0	R600/SI: Allow using f32 rcp / rsq when denormals not handled. These are precise enough to use for OpenCL unless denormals are handled. llvm-svn: 213107	2014-07-15 23:50:10 +00:00
Peter Collingbourne	81db7497a2	[dfsan] Introduce further optimization to reduce the number of union queries. Specifically, do not compute a union if it is statically known that one shadow set subsumes the other. llvm-svn: 213100	2014-07-15 22:13:19 +00:00
Matt Arsenault	c093eee935	R600/SI: Fix select on i1 llvm-svn: 213096	2014-07-15 21:44:37 +00:00
David Blaikie	1a77466a94	Try out FileCheck's new (in r212810) -implicit-check-not in a DebugInfo test. Just tried this on a few tests and this was the only one that was easily ported to use the new feature, so we'll go with that for now. Hopefully can act as inspiration/reminder for other tests. Not all debug info tests need to check for every DW_TAG or NULL child terminator, but perhaps they should (just to ensure they don't accidentally end up with tags nested inside other tags without the test failing, for example) llvm-svn: 213092	2014-07-15 21:06:37 +00:00
Matt Arsenault	1ceb5e82c1	R600/SI: Implement less wrong f32 fdiv Assuming single precision denormals and accurate sqrt/div are not reported, this passes the OpenCL conformance test. llvm-svn: 213089	2014-07-15 20:18:31 +00:00
Chris Bieneman	d1b660f0a6	[RegisterCoalescer] Add new subtarget hook allowing targets to opt-out of coalescing. The coalescer is very aggressive at propagating constraints on the register classes, and the register allocator doesn’t know how to split sub-registers later to recover. This patch provides an escape valve for targets that encounter this problem to limit coalescing. This patch also implements such for ARM to lower register pressure when using lots of large register classes. This works around PR18825. llvm-svn: 213078	2014-07-15 17:18:41 +00:00
Tilmann Scheller	eec6d84fe4	[AArch64] Add negative tests for the SIMD & FP LDP instructions. LDP is unpredictable if the registers in the pair are identical, these tests check that we don't assemble instructions like that and error out instead. llvm-svn: 213074	2014-07-15 16:33:24 +00:00
Cameron McInally	e9e4e99ecf	Revert r213070. It's breaking the build in MCELFStreamer::EmitInstToData(...). llvm-svn: 213073	2014-07-15 16:24:24 +00:00
Jan Vesely	aa9875787e	R600: Implement zero undef variants of ctlz/cttz v2: use ffbh/l if available v3: Rebase on top of Matt's SI patches Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 213072	2014-07-15 15:51:09 +00:00
Daniel Sanders	2972f614a7	[mips] Correct .MIPS.abiflags fp_abi field for -mfpxx and without .module Summary: Previously all the test cases set it after initialization with '.module fp=xx'. Differential Revision: http://reviews.llvm.org/D4489 llvm-svn: 213071	2014-07-15 15:31:39 +00:00
Cameron McInally	6eb8b83c5d	Add x86 patterns to match a specific add-with-carry. llvm-svn: 213070	2014-07-15 15:03:32 +00:00
Andrea Di Biagio	454620d57b	[DAGCombiner] Add more rules to fold shuffles. This patch adds two new rules to the DAGCombiner: 1. shuffle (shuffle A, Undef, M0), B, M1 -> shuffle A, B, M2 2. shuffle (shuffle A, Undef, M0), A, M1 -> shuffle A, Undef, M2 We only do this if the combined shuffle is legal for the target. Example: ;; define <4 x float> @test(<4 x float> %a, <4 x float> %b) { %1 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32><i32 6, i32 0, i32 1, i32 7> %2 = shufflevector <4 x float> %1, <4 x float> %b, <4 x i32><i32 1, i32 2, i32 4, i32 5> ret <4 x float> %2 } ;; (using llc -mcpu=corei7 -march=x86-64) Before, the x86 backend generated: pshufd $120, %xmm0, %xmm0 shufps $-108, %xmm0, %xmm1 movaps %xmm1, %xmm0 Now the x86 backend generates: movsd %xmm1, %xmm0 llvm-svn: 213069	2014-07-15 13:26:28 +00:00
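Rule 1 in the commit above amounts to composing two shuffle masks. The sketch below is an illustrative model only: `combine_masks` and the `UNDEF` sentinel are hypothetical names, and the target legality check that the commit requires is deliberately left out.

```python
UNDEF = -1  # sentinel for an undefined lane (illustrative choice)


def combine_masks(m0, m1):
    """Fold shuffle(shuffle(A, undef, m0), B, m1) into one mask over (A, B).

    Indices 0..n-1 select from the first operand and n..2n-1 from the second.
    """
    n = len(m0)
    combined = []
    for idx in m1:
        if idx == UNDEF or idx >= n:
            # Undef stays undef; lanes taken from B keep their index.
            combined.append(idx)
        else:
            # Lane comes from the inner shuffle: look through m0.
            inner = m0[idx]
            # The inner shuffle's second operand is undef, so any index
            # pointing into it produces an undefined lane.
            combined.append(inner if 0 <= inner < n else UNDEF)
    return combined


# Masks from the example in the commit message.
print(combine_masks([6, 0, 1, 7], [1, 2, 4, 5]))  # [0, 1, 4, 5]
```

For the example in the message, the folded mask takes the low two lanes of %a followed by the low two lanes of %b, which is why a single instruction suffices.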
Stepan Dyatkovskiy	d2c35e9525	MergeFunc patch from Björn Steinbrink. Phabricator ticket: D4246, Don't merge functions with different range metadata on call/invoke. Thanks! llvm-svn: 213060	2014-07-15 10:46:51 +00:00
Tim Northover	93f17f8aee	AArch64: fall back to generic code for out of range extract/insert. rdar://problem/17624784 llvm-svn: 213059	2014-07-15 10:00:26 +00:00
Peter Collingbourne	2e568801f1	[dfsan] Introduce an optimization to reduce the number of union queries. Specifically, when building a union query, if we are dominated by an identical query then use the result of that query instead. llvm-svn: 213047	2014-07-15 04:41:17 +00:00
David Majnemer	8a37c2d08e	Some targets don't prefix private symbols with dot llvm-svn: 213042	2014-07-15 03:00:41 +00:00
David Majnemer	68d2e3557f	Specify a more specific triple for constant-pool-remat-0.ll Instead of specifying 32-bit x86, specify 32-bit x86 linux. This test is testing a very specific behavior which changed with WinCOFF's constant pools. llvm-svn: 213041	2014-07-15 03:00:39 +00:00
David Majnemer	7021d13d69	Relax tests expecting to see CPI symbols WinCOFF doesn't use CPI symbols, it has a different scheme for naming constant pool entries. Update tests to handle either appearing. llvm-svn: 213039	2014-07-15 02:44:49 +00:00
David Majnemer	7f4b6696c1	CodeGen: Handle ConstantVector and undef in WinCOFF constant pools The constant pool entry code for WinCOFF assumed that vector constants would be formed using ConstantDataVector, it did not expect to see a ConstantVector. Furthermore, it did not expect undef as one of the elements of the vector. ConstantVectors should be handled like ConstantDataVectors, treat Undef as zero. llvm-svn: 213038	2014-07-15 02:34:12 +00:00
Matt Arsenault	211ccabffb	R600: Add dag combine for copy of an illegal type. This helps avoid redundant instructions to unpack, and repack the vectors. Ideally we could recognize that pattern and eliminate it. Currently v4i8 and other small element type vectors are scalarized, so this has the added bonus of avoiding that. llvm-svn: 213031	2014-07-15 02:06:31 +00:00
Matt Arsenault	d769765bf9	Teach computeKnownBits to look through addrspacecast. This fixes inferring alignment through an addrspacecast. llvm-svn: 213030	2014-07-15 01:55:03 +00:00
Andrea Di Biagio	b94410d6d3	Improve test 'CodeGen/X86/combine-vec-shuffle-3.ll'. Now functions 'test4', 'test9', 'test14' and 'test19' correctly perform a move of two packed values from the high quadword of vector %b to the low quadword of vector %a (movhlps idiom). No functional change intended. llvm-svn: 213029	2014-07-15 01:29:27 +00:00
Matt Arsenault	95ee145d10	Teach GetUnderlyingObject / BasicAA about addrspacecast llvm-svn: 213025	2014-07-15 00:56:40 +00:00
Nick Lewycky	91e41155de	Revert r212572 "improve BasicAA CS-CS queries", it causes PR20303. llvm-svn: 213024	2014-07-15 00:53:38 +00:00
Matt Arsenault	504e69a213	Convert test to FileCheck. Check the individual test functions for more useful failure errors. llvm-svn: 213021	2014-07-15 00:07:27 +00:00
Andrea Di Biagio	167e00fc99	[DAGCombiner] Avoid calling method 'isShuffleMaskLegal' on illegal vector types. This patch fixes a crasher in method 'DAGCombiner::visitOR' due to an invalid call to method 'isShuffleMaskLegal'. On x86, method 'isShuffleMaskLegal' always expects a legal vector value type in input. With this patch, we immediately check if the input OR dag node has a legal vector type; we only try to fold a OR dag node into a single shufflevector if we know that the resulting shuffle will have a legal type. This is to avoid calling method 'isShuffleMaskLegal' on a potentially illegal vector value type. Added a new test-case to file 'CodeGen/X86/combine-or.ll' to verify that DAGCombiner doesn't crash in the attempt to check/combine an OR between shuffles with illegal types. llvm-svn: 213020	2014-07-15 00:02:32 +00:00
Matt Arsenault	24911cb984	R600: Add denormal handling subtarget features. llvm-svn: 213018	2014-07-14 23:40:49 +00:00
Matt Arsenault	62262e12fa	R600/SI: Default to no single precision denormals. llvm-svn: 213017	2014-07-14 23:40:43 +00:00
David Majnemer	94c981273e	CodeGen: Stick constant pool entries in COMDAT sections for WinCOFF COFF lacks a feature that other object file formats support: mergeable sections. To work around this, MSVC sticks constant pool entries in special COMDAT sections so that each constant is in its own section. This permits unused constants to be dropped and it also allows duplicate constants in different translation units to get merged together. This fixes PR20262. Differential Revision: http://reviews.llvm.org/D4482 llvm-svn: 213006	2014-07-14 22:57:27 +00:00
Andrea Di Biagio	1b83284869	[DAGCombiner] Add more rules to combine shuffle vector dag nodes. This patch teaches the DAGCombiner how to fold a pair of shuffles according to rules: 1. shuffle(shuffle A, B, M0), B, M1) -> shuffle(A, B, M2) 2. shuffle(shuffle A, B, M0), A, M1) -> shuffle(A, B, M3) The new rules would only trigger if the resulting shuffle has legal type and legal mask. Added test 'combine-vec-shuffle-3.ll' to verify that DAGCombiner correctly folds shuffles on x86 when the resulting mask is legal. Also added some negative cases to verify that we avoid introducing illegal shuffles. llvm-svn: 213001	2014-07-14 22:46:26 +00:00
Matt Arsenault	da8f2f7d36	Look through addrspacecast in IsConstantOffsetFromGlobal llvm-svn: 213000	2014-07-14 22:39:26 +00:00
Matt Arsenault	65202e7cee	Look through addrspacecast in GetPointerBaseWithConstantOffset llvm-svn: 212999	2014-07-14 22:39:22 +00:00
Matt Arsenault	b5624180ce	Convert test to FileCheck llvm-svn: 212992	2014-07-14 21:59:26 +00:00
David Majnemer	f9bbd5bf26	Fix a test broken in r212981 @icmp_sdiv_neg1 should have referred to %a instead of %call, it was renamed at the last second. llvm-svn: 212983	2014-07-14 20:46:04 +00:00
David Majnemer	6e615bab35	InstSimplify: Correct sdiv x / -1 Determining the bounds of x / -1 would start off with us dividing it by INT_MIN. Suffice to say, this would not work very well. Instead, handle it upfront by checking for -1 and mapping it to the range: [INT_MIN + 1, INT_MAX]. This means that the result of our division can be any value other than INT_MIN. llvm-svn: 212981	2014-07-14 20:38:45 +00:00
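The range claimed in the commit above can be confirmed by brute force on a narrow integer type. This is a minimal sketch using 8-bit values as a stand-in for the real width; the helper name is hypothetical, and the dividend INT_MIN is skipped because INT_MIN / -1 overflows.

```python
BITS = 8  # small width chosen only so the enumeration is cheap
INT_MIN = -(1 << (BITS - 1))
INT_MAX = (1 << (BITS - 1)) - 1


def sdiv_by_neg1_results():
    """All results of x / -1 for dividends where the division does not overflow."""
    return {-x for x in range(INT_MIN + 1, INT_MAX + 1)}


results = sdiv_by_neg1_results()
print(min(results) == INT_MIN + 1, max(results) == INT_MAX)  # True True
print(INT_MIN in results)                                    # False
```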
David Majnemer	a39248360a	InstSimplify: The upper bound of X / C was missing a rounding step Summary: When calculating the upper bound of X / -8589934592, we would perform the following calculation: Floor[INT_MAX / 8589934592] However, flooring the result would make us wrongly come to the conclusion that 1073741824 was not in the set of possible values. Instead, use the ceiling of the result. Reviewers: nicholas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4502 llvm-svn: 212976	2014-07-14 19:49:57 +00:00
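The arithmetic behind the rounding fix above can be reproduced directly. This sketch assumes a 64-bit signed type, which is what the constant 8589934592 (2**33) in the message implies; the variable names are illustrative.

```python
INT_MIN = -(1 << 63)
INT_MAX = (1 << 63) - 1
C = 8589934592  # 2**33, the magnitude of the divisor in the commit message

floor_bound = INT_MAX // C    # rounds down
ceil_bound = -(-INT_MAX // C) # rounds up
reachable = INT_MIN // -C     # exact division, so no rounding is involved

print(floor_bound)  # 1073741823
print(ceil_bound)   # 1073741824
print(reachable)    # 1073741824
```

The quotient INT_MIN / -8589934592 is exactly 1073741824, so a bound computed with the floor excludes a value that the division can actually produce, while the ceiling includes it.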
Matt Arsenault	fe26dce073	Look through addrspacecast when checking isDereferenceablePointer llvm-svn: 212971	2014-07-14 18:54:12 +00:00
Nick Lewycky	91bbe4f758	Don't eliminate memcpy's when the address of the pointer may itself be relevant. Fixes PR18304. Patch by David Wiberg! llvm-svn: 212970	2014-07-14 18:52:02 +00:00
Bill Wendling	1a86df990e	Unify the lowering of arguments during SjLj prepare. The 'select true, %arg, undef' instruction can be used for both aggregate and non-aggregate arguments. llvm-svn: 212967	2014-07-14 18:21:11 +00:00
Saleem Abdulrasool	6f4137a9fc	X86: correct 64-bit atomics on 32-bit We would emit a libcall for a 64-bit atomic on x86 after SVN r212119. This was due to the misuse of hasCmpxchg16 to indicate if cmpxchg8b was supported on a 32-bit target. They were added at different times and would result in the border condition being mishandled. This fixes the border case to emit the cmpxchg8b instruction for 64-bit atomic operations on x86 at the cost of restoring a long-standing bug in the codegen. We emit a cmpxchg8b on all x86 targets even where the CPU does not support this instruction (pre-Pentium CPUs). Although this bug should be fixed, this was present prior to SVN r212119 and this change, so this is not really introducing a regression. llvm-svn: 212956	2014-07-14 16:28:13 +00:00
David Majnemer	fa3efc30b3	llvm-objdump: Handle BSS sections larger than the object file The size of the uninitialized sections, like BSS, can exceed the size of the object file. Do not attempt to grab the contents of such sections. llvm-svn: 212953	2014-07-14 16:20:14 +00:00
Tim Northover	4ef7afc394	X86: remove temporary atomicrmw used during lowering. We construct a temporary "atomicrmw xchg" instruction when lowering atomic stores for widths that aren't supported natively. This isn't on the top-level worklist though, so it won't be removed automatically and we have to do it ourselves once that itself has been lowered. Thanks Saleem for pointing this out! llvm-svn: 212948	2014-07-14 15:31:13 +00:00
Daniel Sanders	cff61819ea	Re-commit: [mips] Correct section alignments and EntrySizes for .bss, .text, .data, .reginfo, .MIPS.options, and .MIPS.abiflags The lld tests will temporarily fail again but Simon Atanasyan will commit a fix for those shortly. llvm-svn: 212946	2014-07-14 15:05:51 +00:00
Daniel Sanders	a57dbb6f1d	Revert: [mips] Correct section alignments and EntrySizes for .bss, .text, .data, .reginfo, .MIPS.options, and .MIPS.abiflags This commit causes multiple lld tests to fail. Reverting while I investigate the issue. llvm-svn: 212945	2014-07-14 14:43:45 +00:00
Daniel Sanders	16ca05d23a	[mips] Correct section alignments and EntrySizes for .bss, .text, .data, .reginfo, .MIPS.options, and .MIPS.abiflags Summary: .bss, .text, and .data are at least 16-byte aligned. .reginfo is 4-byte aligned and has a 24-byte EntrySize. .MIPS.abiflags has a 24-byte EntrySize. .MIPS.options is 8-byte aligned and has a 1-byte EntrySize. Using a 1-byte EntrySize for .MIPS.options seems strange because the records are neither 1-byte long nor fixed-length but this matches the value that GAS emits. Differential Revision: http://reviews.llvm.org/D4487 llvm-svn: 212939	2014-07-14 14:02:14 +00:00
Daniel Sanders	3af4859657	[mips] For the FP64A ABI, odd-numbered double-precision moves must not use mtc1/mfc1. Summary: This is because, under FP64A, the hardware will redirect 32-bit reads/writes from/to odd-numbered registers to the upper 32-bits of the corresponding even register, in effect simulating FR=0 mode when FR=0 mode is not available. Unfortunately, we have to make the decision to avoid mfc1/mtc1 before register allocation so we currently do this for even registers too. FPXX has a similar requirement on 32-bit architectures that lack mfhc1/mthc1 so this patch also handles the affected moves from the FPU for FPXX too. Moves to the FPU were supported by an earlier commit. Differential Revision: http://reviews.llvm.org/D4484 llvm-svn: 212938	2014-07-14 13:08:14 +00:00
Daniel Sanders	fc1f878e70	[mips] Use MFHC1 when it is available (MIPS32r2 and later) for both FP32 and FP64 moves Summary: This is similar to r210771 which did the same thing for MTHC1. Also corrected MTHC1_D32 and MTHC1_D64 which used AFGR64 and FGR64 on the wrong definitions. Differential Revision: http://reviews.llvm.org/D4483 llvm-svn: 212936	2014-07-14 12:41:31 +00:00
Tim Northover	4ac35c9d7b	AArch64: remove unnecessary pseudo-instruction. Sufficiently twisted use of TableGen lets us write patterns directly for f16 (as an i16 promoted to i32) -> f32 conversion. llvm-svn: 212933	2014-07-14 11:16:02 +00:00
Daniel Sanders	8ca7923a7c	[mips] Correct the AFL_FLAGS1_ODDSPREG flag in .MIPS.abiflags when no '.module oddspreg' is used Differential Revision: http://reviews.llvm.org/D4486 llvm-svn: 212932	2014-07-14 10:26:15 +00:00
Sasa Stankovic	d8104abd36	[mips] Expand BuildPairF64 to a spill and reload when the O32 FPXX ABI is enabled and mthc1 and dmtc1 are not available (e.g. on MIPS32r1) This prevents the upper 32-bits of a double precision value from being moved to the FPU with mtc1 to an odd-numbered FPU register. This is necessary to ensure that the code generated executes correctly regardless of the current FPU mode. MIPS32r2 and above continues to use mtc1/mthc1, while MIPS-IV and above continue to use dmtc1. Differential Revision: http://reviews.llvm.org/D4465 llvm-svn: 212930	2014-07-14 09:40:29 +00:00
Bill Wendling	93c9860cb7	Support lowering of empty aggregates. This crash was pretty common while compiling Rust for iOS (armv7). Reason - SjLj preparation step was lowering aggregate arguments as ExtractValue + InsertValue. ExtractValue has assertion which checks that there is some data in value, which is not true in case of empty (no fields) structures. Rust uses them quite extensively so this patch uses a 'select true, %val, undef' instruction to lower the argument. Patch by Valerii Hiora. llvm-svn: 212922	2014-07-14 06:22:36 +00:00
Andrea Di Biagio	4242d12adc	[DAGCombiner] Fix a crash caused by a missing check for legal type when trying to fold shuffles. Verify that DAGCombiner does not crash when trying to fold a pair of shuffles according to rule (added at r212539): (shuffle (shuffle A, Undef, M0), Undef, M1) -> (shuffle A, Undef, M2) The DAGCombiner avoids folding shuffles if the resulting shuffle dag node is not legal for the target. That means, the resulting shuffle must have legal type and legal mask. Before, the DAGCombiner only called method 'TargetLowering::isShuffleMaskLegal' to check if it was "safe" to fold according to the above-mentioned rule. However, this caused a crash in the x86 backend since method 'isShuffleMaskLegal' always expects to be called on a legal vector type. llvm-svn: 212915	2014-07-13 21:02:14 +00:00
Simon Atanasyan	a3c42e92aa	[Mips] Support SHT_MIPS_ABIFLAGS section type flag in the llvm-readobj, obj2yaml and yaml2obj tools. llvm-svn: 212908	2014-07-13 15:28:54 +00:00
David Majnemer	ea4ae2d8d2	IR: Allow comdats to be applied to globals with internal linkage Our verifier check for checking if a global has local linkage was too strict. Forbid private linkage but permit local linkage. Object file formats permit this and forbidding it prevents elimination of unused, internal, vftables under the MSVC ABI. llvm-svn: 212900	2014-07-13 04:56:11 +00:00
David Majnemer	494d8a9d0a	MC: Let non-temporary COFF aliases be in symtab MC was aping a binutils bug where aliases would default their linkage to private instead of internal. I've sent a patch to the binutils maintainers and they've recently applied it to the GNU assembler sources. This fixes PR20152. Differential Revision: http://reviews.llvm.org/D4395 llvm-svn: 212899	2014-07-13 04:31:19 +00:00
Matt Arsenault	f6ce1be587	R600: Run more tests with promote alloca disabled. Re-run tests changed in r211110 to test both paths. Also fix broken check line. llvm-svn: 212895	2014-07-13 02:46:17 +00:00
Matt Arsenault	fd221a07bc	R600: Run private-memory test with and without alloca promote The unpromoted path still needs to be tested since we can't always promote to using LDS. llvm-svn: 212894	2014-07-13 02:18:06 +00:00
Saleem Abdulrasool	00c379d824	AArch64: add support for llvm.aarch64.hint intrinsic This adds a llvm.aarch64.hint intrinsic to mirror the llvm.arm.hint in order to support the various hint intrinsic functions in the ACLE. Add an optional pattern field that permits the subclass to specify the pattern that matches the selection. The intrinsic pattern is set as mayLoad, mayStore, so overload the value for the definition of the hint instruction. llvm-svn: 212883	2014-07-12 21:20:49 +00:00
Simon Atanasyan	149290dd5a	[ELFYAML] Group ELF section type flags to target specific blocks. Recognize only flags which correspond to the current target. llvm-svn: 212880	2014-07-12 18:25:08 +00:00
Alexey Samsonov	72f517efca	[ASan] Collect unmangled names of global variables in Clang to print them in error reports. Currently ASan instrumentation pass creates a string with global name for each instrumented global (to include global names in the error report). Global name is already mangled at this point, and we may not be able to demangle it at runtime (e.g. there is no __cxa_demangle on Android). Instead, create a string with fully qualified global name in Clang, and pass it to ASan instrumentation pass in llvm.asan.globals metadata. If there is no metadata for some global, ASan will use the original algorithm. This fixes https://code.google.com/p/address-sanitizer/issues/detail?id=264. llvm-svn: 212872	2014-07-12 00:42:52 +00:00
Matt Arsenault	31e0179e8a	R600: Add missing tests for some intrinsics llvm-svn: 212870	2014-07-12 00:36:19 +00:00
Aditya Nandakumar	2b32e5e74c	When we sink an instruction, this can open up opportunity for the operands to be sunk - add them to the worklist llvm-svn: 212847	2014-07-11 21:49:39 +00:00

25403 Commits