llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 06:22:56 +02:00

Author	SHA1	Message	Date
Quentin Colombet	108e19f763	Add a test for the MachineCopyPropagation change landed in r238518. llvm-svn: 238537	2015-05-29 01:40:00 +00:00
Chandler Carruth	3939505508	[x86] Move the vector popcount tests into non-ISA files, and instead organize them by the width of vector. This makes it a lot easier to see that we're covering all of the vector types but not doing so excessively. This also adds tests across the spectrum of SSE versions in addition to the AVX versions. If you're really tired of seeing the massive sprawl of scalarized code for this, don't worry, I'm just about to land Bruno's patch that dramatically improve the situation for SSSE3 and newer. llvm-svn: 238520	2015-05-28 22:46:48 +00:00
Alex Lorenz	8269eba38f	MIR Serialization: print and parse machine function names. This commit introduces a serializable structure called 'llvm::yaml::MachineFunction' that stores the machine function's name. This structure will mirror the machine function's state in the future. This commit prints machine functions as YAML documents containing a YAML mapping that stores the state of a machine function. This commit also parses the YAML documents that contain the machine functions. Reviewers: Duncan P. N. Exon Smith Differential Revision: http://reviews.llvm.org/D9841 llvm-svn: 238519	2015-05-28 22:41:12 +00:00
David Majnemer	9ea8945a54	Add testcase for r238503. llvm-svn: 238515	2015-05-28 22:12:27 +00:00
Reid Kleckner	711f4eb0a1	[WinEH] Start inserting state number stores for C++ EH This moves all the state numbering code for C++ EH to WinEHPrepare so that we can call it from the X86 state numbering IR pass that runs before isel. Now we just call the same state numbering machinery and insert a bunch of stores. It also populates MachineModuleInfo with information about the current function. llvm-svn: 238514	2015-05-28 22:00:24 +00:00
Reid Kleckner	2c78d5e9c7	Disable x86 tail call optimizations that jump through GOT For x86 targets, do not do sibling call optimization when materializing the callee's address would require a GOT relocation. We can still do tail calls to internal functions, hidden functions, and protected functions, because they do not require this kind of relocation. It is still possible to get GOT relocations when the user explicitly asks for it with musttail or -tailcallopt, both of which are supposed to guarantee TCO. Based on a patch by Chih-hung Hsieh. Reviewers: srhines, timmurray, danalbert, enh, void, nadav, rnk Subscribers: joerg, davidxl, llvm-commits Differential Revision: http://reviews.llvm.org/D9799 llvm-svn: 238487	2015-05-28 20:44:28 +00:00
Daniel Sanders	82a2ed16ac	Revert r238427 - [mips] Make TTypeEncoding indirect to allow .eh_frame to be read-only. It caused a smaller number of failures than the previous attempt at committing but still caused a couple on the llvm-linux-mips builder. Reverting while I investigate the remainder. llvm-svn: 238483	2015-05-28 20:30:32 +00:00
Peter Collingbourne	688547e6fc	Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM. We were previously codegen'ing these as regular load/store operations and hoping that the register allocator would allocate registers in ascending order so that we could apply an LDM/STM combine after register allocation. According to the commit that first introduced this code (r37179), we planned to teach the register allocator to allocate the registers in ascending order. This never got implemented, and up to now we've been stuck with very poor codegen. A much simpler approach for achiveing better codegen is to create LDM/STM instructions with identical sets of virtual registers, let the register allocator pick arbitrary registers and order register lists when printing an MCInst. This approach also avoids the need to repeatedly calculate offsets which ultimately ought to be eliminated pre-RA in order to decrease register pressure. This is implemented by lowering the memcpy intrinsic to a series of SD-only MCOPY pseudo-instructions which performs a memory copy using a given number of registers. During SD->MI lowering, we lower MCOPY to LDM/STM. This is a little unusual, but it avoids the need to encode register lists in the SD, and we can take advantage of SD use lists to decide whether to use the _UPD variant of the instructions. Fixes PR9199. Differential Revision: http://reviews.llvm.org/D9508 llvm-svn: 238473	2015-05-28 20:02:45 +00:00
Daniel Sanders	bb2be43a35	[mips] Make TTypeEncoding indirect to allow .eh_frame to be read-only. Summary: Following on from r209907 which made personality encodings indirect, do the same for TType encodings. This fixes the case where a try/catch block needs to generate references to, for example, std::exception in the .gcc_except_table. Reviewers: petarj Reviewed By: petarj Subscribers: srhines, joerg, tberghammer, llvm-commits Differential Revision: http://reviews.llvm.org/D9669 llvm-svn: 238427	2015-05-28 14:52:15 +00:00
Chandler Carruth	521fdf1acb	[x86] Refactor the tests for popcnt. Extracted from the D6531 patch by Bruno Cardoso Lopes, and re-generated to reflect the current state of the world. This should let Bruno's D6531 actually show the delta between the approaches by running the x86 test case update script after re-building. llvm-svn: 238391	2015-05-28 02:40:15 +00:00
Alex Lorenz	b5ebbcd330	Resubmit r237954 (MIR Serialization: print and parse LLVM IR using MIR format). This commit a 3rd attempt at comitting the initial MIR serialization patch. The first commit (r237708) was reverted in 237730. Then the second commit (r237954) was reverted in r238007, as the MIR library under CodeGen caused a circular dependency where the CodeGen library depended on MIR and MIR library depended on CodeGen. This commit has fixed the dependencies between CodeGen and MIR by reorganizing the MIR serialization code - the code that prints out MIR has been moved to CodeGen, and the MIR library has been renamed to MIRParser. Now the CodeGen library doesn't depend on the MIRParser library, thus the circular dependency no longer exists. --Original Commit Message-- MIR Serialization: print and parse LLVM IR using MIR format. This commit is the initial commit for the MIR serialization project. It creates a new library under CodeGen called 'MIR'. This new library adds a new machine function pass that prints out the LLVM IR using the MIR format. This pass is then added as a last pass when a 'stop-after' option is used in llc. The new library adds the initial functionality for parsing of MIR files as well. This commit also extends the llc tool so that it can recognize and parse MIR input files. Reviewers: Duncan P. N. Exon Smith, Matthias Braun, Philip Reames Differential Revision: http://reviews.llvm.org/D9616 llvm-svn: 238341	2015-05-27 18:02:19 +00:00
Elena Demikhovsky	fbea14f66c	AVX-512: Fixed a bug in extracting subvector from v64i1 By Igor Breger (igor.breger@intel.com) llvm-svn: 238322	2015-05-27 14:09:33 +00:00
Daniel Sanders	4dc3da5c0b	Revert r238190 and r238197: [mips] Make TTypeEncoding indirect to allow .eh_frame to be read-only. This broke the llvm-mips-linux builder and several of our out-of-tree builders. Initial investigations show that the commit probably isn't the problem but reverting anyway while I investigate. llvm-svn: 238302	2015-05-27 08:44:01 +00:00
Elena Demikhovsky	51096c6536	AVX-512: Implemented all forms of sign-extend and zero-extend instructions for KNL and SKX Implemented DAG lowering for all these forms. Added tests for DAG lowering and encoding. By Igor Breger (igor.breger@intel.com) llvm-svn: 238301	2015-05-27 08:15:19 +00:00
Quentin Colombet	54045b0ecb	[X86] Implement the support for shrink-wrapping. With this patch the x86 backend is now shrink-wrapping capable and this functionality can be tested by using the -enable-shrink-wrap switch. The next step is to make more test and enable shrink-wrapping by default for x86. Related to <rdar://problem/20821487> llvm-svn: 238293	2015-05-27 06:28:41 +00:00
Rafael Espindola	d0b9369094	Print "lock \t foo" instead of "lock \n foo". This gets gas and llc -filetype=obj to agree on the order of prefixes. For llvm-mc we need to fix the asm parser to know that it makes a difference on which line the "lock" is in. Part of pr23594. llvm-svn: 238232	2015-05-26 18:35:10 +00:00
Jan Vesely	856e4fb817	R600: Use SIGN_EXTEND_INREG for SEXT loads Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> llvm-svn: 238229	2015-05-26 18:07:22 +00:00
Diego Novillo	4ef8ed56e4	Revert "Re-commit changes in r237579 with fix for bug breaking windows builds." This reverts commit r238201 to fix linking problems in x86 Linux http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150525/278413.html llvm-svn: 238223	2015-05-26 17:45:38 +00:00
Luke Cheeseman	e0014ad127	Re-commit changes in r237579 with fix for bug breaking windows builds. llvm-svn: 238201	2015-05-26 13:40:31 +00:00
Elena Demikhovsky	5d72ee6887	AVX-512: fixed a bug in lowering VSELECT for 512-bit vector https://llvm.org/bugs/show_bug.cgi?id=23634 llvm-svn: 238195	2015-05-26 11:32:39 +00:00
Daniel Sanders	adc4fb19a2	[mips] Make TTypeEncoding indirect to allow .eh_frame to be read-only. Summary: Following on from r209907 which made personality encodings indirect, do the same for TType encodings. This fixes the case where a try/catch block needs to generate references to, for example, std::exception in the .gcc_except_table. This commit uses DW_EH_PE_sdata8 for N64 as far as is possible at the moment. However, it is possible to end up with DW_EH_PE_sdata4 when a TargetMachine is not available. There's no risk of issues with inconsistency here since the tables are self describing but it does mean there is a small chance of the PC-relative offset being out of range for particularly large programs. Reviewers: petarj Reviewed By: petarj Subscribers: srhines, joerg, tberghammer, llvm-commits Differential Revision: http://reviews.llvm.org/D9669 llvm-svn: 238190	2015-05-26 10:19:18 +00:00
Simon Pilgrim	9bb9f0f392	[X86][AVX2] Vectorized i16 shift operators Part of D9474, this patch extends AVX2 v16i16 types to 2 x 8i32 vectors and uses i32 shift variable shifts before packing back to i16. Adds AVX2 tests for v8i16 and v16i16 llvm-svn: 238149	2015-05-25 17:49:13 +00:00
Tom Stellard	f32ac61b59	R600/SI: Fix bug with v_interp_p1_f32 instructions on 16 bank lds chips The src and dst register cannot be the same on chips with 16 lds banks. llvm-svn: 238147	2015-05-25 16:15:54 +00:00
Kit Barton	eb147c8fbd	This patch adds support for the vector quadword add/sub instructions introduced in POWER8: vadduqm vaddeuqm vaddcuq vaddecuq vsubuqm vsubeuqm vsubcuq vsubecuq In addition to adding the instructions themselves, it also adds support for the v1i128 type for intrinsics (Intrinsics.td, Function.cpp, and IntrinsicEmitter.cpp). http://reviews.llvm.org/D9081 llvm-svn: 238144	2015-05-25 15:49:26 +00:00
Michael Kuperstein	4559e5e720	[X86] When pattern-matching scalar FMA3 intrinsics, don't re-arrange the first and second operands. The semantics of the scalar FMA intrinsics are that the high vector elements are copied from the first source. The existing pattern switches src1 and src2 around, to match the "213" order, which ends up tying the original src2 to the dest. Since the actual scalar fma3 instructions copy the high elements from the dest register, the wrong values are copied. This modifies the pattern to leave src1 and src2 in their original order. Differential Revision: http://reviews.llvm.org/D9908 llvm-svn: 238131	2015-05-25 12:35:25 +00:00
Elena Demikhovsky	b0a0668b7b	Added promotion to EXTRACT_SUBVECTOR operand. I encountered with this case in one of KNL tests for i1 vectors. v16i1 = EXTRACT_SUBVECTOR v32i1, x llvm-svn: 238130	2015-05-25 11:33:13 +00:00
Matt Arsenault	78ebf53d3b	Add target hook to allow merging stores of nonzero constants On GPU targets, materializing constants is cheap and stores are expensive, so only doing this for zero vectors was silly. Most of the new testcases aren't optimally merged, and are for later improvements. llvm-svn: 238108	2015-05-24 00:51:27 +00:00
Hal Finkel	ef674a05e8	[PowerPC] Fix fast-isel when compare is split from branch When the compare feeding a branch was in a different BB from the branch, we'd try to "regenerate" the compare in the block with the branch, possibly trying to make use of values not available there. Copy a page from AArch64's play book here to fix the problem (at least in terms of correctness). Fixes PR23640. llvm-svn: 238097	2015-05-23 12:18:10 +00:00
Akira Hatanaka	626316f0c3	Stop resetting NoFramePointerElim in TargetMachine::resetTargetOptions. This is part of the work to remove TargetMachine::resetTargetOptions. In this patch, instead of updating global variable NoFramePointerElim in resetTargetOptions, its use in DisableFramePointerElim is replaced with a call to TargetFrameLowering::noFramePointerElim. This function determines on a per-function basis if frame pointer elimination should be disabled. There is no change in functionality except that cl:opt option "disable-fp-elim" can now override function attribute "no-frame-pointer-elim". llvm-svn: 238080	2015-05-23 01:14:08 +00:00
Akira Hatanaka	528279df1c	Remove unnecessary command line option "-disable-fp-elim". This option currently has no effect as function attribute "no-frame-pointer-elim=false" overrides it. llvm-svn: 238077	2015-05-23 00:31:56 +00:00
Rafael Espindola	693b117a0a	Revert "make reciprocal estimate code generation more flexible by adding command-line options" This reverts commit r238051. It broke some bots: http://lab.llvm.org:8011/builders/llvm-ppc64-linux1/builds/18190 llvm-svn: 238075	2015-05-23 00:22:44 +00:00
Ahmed Bougacha	59e6ea80d5	[AArch64][CGP] Sink zext feeding stxr/stlxr into the same block. The usual CodeGenPrepare trickery, on a target-specific intrinsic. Without this, the expansion of atomics will usually have the zext be hoisted out of the loop, defeating the various patterns we have to catch this precise case. Differential Revision: http://reviews.llvm.org/D9930 llvm-svn: 238054	2015-05-22 21:37:17 +00:00
Ahmed Bougacha	85c2eacc35	[AArch64] Robustize atomic cmpxchg test a little more. NFC. We changed the test to test non-constant values in r238049. We can also use CHECK-NEXT to be a little stricter. llvm-svn: 238052	2015-05-22 21:35:14 +00:00
Sanjay Patel	d16f201696	make reciprocal estimate code generation more flexible by adding command-line options This patch adds a class for processing many recip codegen possibilities. The TargetRecip class is intended to handle both command-line options to llc as well as options passed in from a front-end such as clang with the -mrecip option. The x86 backend is updated to use the new functionality. Only -mcpu=btver2 with -ffast-math should see a functional change from this patch. All other CPUs continue to not use reciprocal estimates by default with -ffast-math. Differential Revision: http://reviews.llvm.org/D8982 llvm-svn: 238051	2015-05-22 21:10:06 +00:00
Ahmed Bougacha	f07154b309	[AArch64] Robustize atomic cmpxchg test. NFC. Constants are easy to get right the wrong way. llvm-svn: 238049	2015-05-22 21:08:15 +00:00
Quentin Colombet	cc216023c0	Reapply r238011 with a fix for the trap instruction. The problem was that I slipped a change required for shrink-wrapping, namely I used getFirstTerminator instead of the getLastNonDebugInstr that was here before the refactoring, whereas the surrounding code is not yet patched for that. Original message: [X86] Refactor the prologue emission to prepare for shrink-wrapping. - Add a late pass to expand pseudo instructions (tail call and EH returns). Instead of doing it in the prologue emission. - Factor some static methods in X86FrameLowering to ease code sharing. NFC. Related to <rdar://problem/20821487> llvm-svn: 238035	2015-05-22 18:10:47 +00:00
NAKAMURA Takumi	4a7fccdf3b	Revert r237954, "Resubmit r237708 (MIR Serialization: print and parse LLVM IR using MIR format)." It brought cyclic dependencies between LLVMCodeGen and LLVMMIR. llvm-svn: 238007	2015-05-22 07:17:07 +00:00
Peter Collingbourne	0964ae6e51	Revert r237590, "ARM: allow jump tables to be placed as constant islands." Caused a miscompile of the Android port of Chromium, details forthcoming. llvm-svn: 237972	2015-05-21 23:20:55 +00:00
Chad Rosier	42c30bc92c	[AArch64] Enhance the load/store optimizer with target-specific alias analysis. Phabricator: http://reviews.llvm.org/D9863 llvm-svn: 237963	2015-05-21 21:36:46 +00:00
Alex Lorenz	4c932ccc6c	Resubmit r237708 (MIR Serialization: print and parse LLVM IR using MIR format). This commit is a 2nd attempt at committing the initial MIR serialization patch. The first commit (r237708) made the incremental buildbots unstable and was reverted in r237730. The original commit didn't add a terminating null character to the LLVM IR source which was passed to LLParser, and this sometimes caused the test 'llvmIR.mir' to fail with a parsing error because the LLVM IR source didn't have a null character immediately after the end and thus LLLexer encountered some garbage characters that ultimately caused the error. This commit also includes the other test fixes I committed in r237712 (llc path fix) and r237723 (remove target triple) which also got reverted in r237730. --Original Commit Message-- MIR Serialization: print and parse LLVM IR using MIR format. This commit is the initial commit for the MIR serialization project. It creates a new library under CodeGen called 'MIR'. This new library adds a new machine function pass that prints out the LLVM IR using the MIR format. This pass is then added as a last pass when a 'stop-after' option is used in llc. The new library adds the initial functionality for parsing of MIR files as well. This commit also extends the llc tool so that it can recognize and parse MIR input files. Reviewers: Duncan P. N. Exon Smith, Matthias Braun, Philip Reames Differential Revision: http://reviews.llvm.org/D9616 llvm-svn: 237954	2015-05-21 20:54:45 +00:00
Bill Schmidt	39c88f95dd	[PPC64] Handle vpkudum mask pattern correctly when vpkudum isn't available My recent patch to add support for ISA 2.07 vector pack/unpack instructions didn't properly check for availability of the vpkudum instruction when recognizing it as a special vector shuffle case. This causes us to leave the vector shuffle in place (rather than converting it to a vector permute) so that it can be recognized later as a vpkudum, but that pattern is invalid for processors prior to POWER8. Thus LLVM crashes with an "unable to select" message. We observed this since one of our buildbots is configured to generate code for a POWER7. This patch fixes the problem by checking for availability of the vpkudum instruction during custom lowering of vector shuffles. I've added a test case variant for the vpkudum pattern when the instruction isn't available. llvm-svn: 237952	2015-05-21 20:48:49 +00:00
Nemanja Ivanovic	78592ebe3a	Add support for VSX scalar single-precision arithmetic in the PPC target http://reviews.llvm.org/D9891 Following up on the VSX single precision loads and stores added earlier, this adds support for elementary arithmetic operations on single precision values in VSX registers. These instructions utilize the new VSSRC register class. Instructions added: xsaddsp xsdivsp xsmulsp xsresp xsrsqrtesp xssqrtsp xssubsp llvm-svn: 237937	2015-05-21 19:32:49 +00:00
Elena Demikhovsky	119b37ddd0	AVX-512: Enabled SSE intrinsics on AVX-512. Predicate UseAVX depricates pattern selection on AVX-512. This predicate is necessary for DAG selection to select EVEX form. But mapping SSE intrinsics to AVX-512 instructions is not ready yet. So I replaced UseAVX with HasAVX for intrinsics patterns. llvm-svn: 237903	2015-05-21 14:01:32 +00:00
Simon Pilgrim	ab5200dfe9	[X86][SSE] Improve support for 128-bit vector sign extension This patch improves support for sign extension of the lower lanes of vectors of integers by making use of the SSE41 pmovsx* sign extension instructions where possible, and optimizing the sign extension by shifts on pre-SSE41 targets (avoiding the use of i64 arithmetic shifts which require scalarization). It converts SIGN_EXTEND nodes to SIGN_EXTEND_VECTOR_INREG where necessary, that more closely matches the pmovsx* instruction than the default approach of using SIGN_EXTEND_INREG which splits the operation (into an ANY_EXTEND lowered to a shuffle followed by shifts) making instruction matching difficult during lowering. Necessary support for SIGN_EXTEND_VECTOR_INREG has been added to the DAGCombiner. Differential Revision: http://reviews.llvm.org/D9848 llvm-svn: 237885	2015-05-21 10:05:03 +00:00
Andrew Kaylor	7532e8ed04	[WinEH] C++ EH state numbering fixes Differential Revision: http://reviews.llvm.org/D9787 llvm-svn: 237854	2015-05-20 23:22:24 +00:00
Reid Kleckner	45341c98d0	[WinEH] Store pointers to the LSDA in the exception registration object We aren't yet emitting the LSDA yet, so this will still fail to assemble. llvm-svn: 237852	2015-05-20 23:08:04 +00:00
Hans Wennborg	4dd4acc9c0	Revert r237828 "[X86] Remove unused node after morphing it from shr to and." This caused assertions during DAG combine: PR23601. llvm-svn: 237843	2015-05-20 22:31:55 +00:00
Davide Italiano	4ae19d2fd4	[Target/ARM] Only enable OptimizeBarrierPass at -O1 and above. Ideally this is going to be and LLVM IR pass (shared, among others with AArch64), but for the time being just enable it if consumers ask us for optimization and not unconditionally. Discussed with Tim Northover on IRC. llvm-svn: 237837	2015-05-20 21:40:38 +00:00
Benjamin Kramer	a4110e490c	[X86] Remove unused node after morphing it from shr to and. In some cases it won't get cleaned up properly leading to crashes downstream. PR23353. Based on a patch by Davide Italiano. llvm-svn: 237828	2015-05-20 20:10:26 +00:00
Pawel Bylica	9935e8953f	Fix icmp lowering Summary: During icmp lowering it can happen that a constant value can be larger than expected (see the code around the change). APInt::getMinSignedBits() must be checked again as the shift before can change the constant sign to positive. I'm not sure it is the best fix possible though. Test Plan: Regression test included. Reviewers: resistor, chandlerc, spatel, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D9147 llvm-svn: 237812	2015-05-20 17:21:09 +00:00
Elena Demikhovsky	a5a6fe72e1	AVX-512: fixed algorithm of building vectors of i1 elements fixed extract-insert i1 element, load i1, zextload i1 should be with "and $1, %reg" to prevent loading garbage. added a bunch of new tests. llvm-svn: 237793	2015-05-20 14:32:03 +00:00
Daniel Sanders	cf42b6a699	Revert r237789 - [mips] The naming convention for private labels is ABI dependant. It works, but I've noticed that I missed several callers of createMCAsmInfo() and many don't have a TargetMachine to provide. llvm-svn: 237792	2015-05-20 14:18:59 +00:00
Daniel Sanders	4f3e25881b	[mips] Fix ehframe-indirect.ll test. Summary: -check-prefix replaces the default CHECK prefix rather than adding to it and must be explicitly re-added. Also added the N32 cases. Reviewers: petarj Reviewed By: petarj Subscribers: tberghammer, llvm-commits Differential Revision: http://reviews.llvm.org/D9668 llvm-svn: 237790	2015-05-20 13:19:19 +00:00
Daniel Sanders	571a176b3d	[mips] The naming convention for private labels is ABI dependant. Summary: For N32/N64, private labels begin with '.L' but for O32 they begin with '$'. MCAsmInfo now has an initializer function which can be used to provide information from the TargetMachine to control the assembly syntax. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: jfb, sandeep, llvm-commits, rafael Differential Revision: http://reviews.llvm.org/D9821 llvm-svn: 237789	2015-05-20 13:16:42 +00:00
Igor Laevsky	277d7cfcc2	[StatepointLowering] Support of the gc.relocates for invoke statepoints. This change implements support for lowering of the gc.relocates tied to the invoke statepoint. This is acomplished by storing frame indices of the lowered values in "StatepointRelocatedValues" map inside FunctionLoweringInfo instead of storing them in per-basic block structure StatepointLowering. After this change StatepointLowering is used only during "LowerStatepoint" call and it is not necessary to store it as a field in SelectionDAGBuilder anymore. Differential Revision: http://reviews.llvm.org/D7798 llvm-svn: 237786	2015-05-20 11:37:25 +00:00
David Majnemer	263bcdd3a9	[X86] Implement the local-exec TLS model for Windows targets We know that _tls_index is zero for local-exec TLS variables because they are always defined in the executable. llvm-svn: 237772	2015-05-20 04:45:26 +00:00
Alex Lorenz	740bc929d2	Revert r237708 (MIR serialization) - incremental buildbots became unstable. The incremental buildbots entered a pass-fail cycle where during the fail cycle one of the tests from this commit fails for an unknown reason. I have reverted this commit and will investigate the cause of this problem. llvm-svn: 237730	2015-05-19 21:41:28 +00:00
Alex Lorenz	2409ccd4e0	Fix MIR testcase committed in r237708 - remove target triple. Remove the target specific triple and datalayout from the llvm IR in the MIR testcase file. llvm-svn: 237723	2015-05-19 20:51:48 +00:00
Alex Lorenz	94e9820764	Fix llc path in MIR testcases committed in r237708. I've committed testcases with local llc path by mistake. llvm-svn: 237712	2015-05-19 18:45:41 +00:00
Alex Lorenz	cb5500c145	MIR Serialization: print and parse LLVM IR using MIR format. This commit is the initial commit for the MIR serialization project. It creates a new library under CodeGen called 'MIR'. This new library adds a new machine function pass that prints out the LLVM IR using the MIR format. This pass is then added as a last pass when a 'stop-after' option is used in llc. The new library adds the initial functionality for parsing of MIR files as well. This commit also extends the llc tool so that it can recognize and parse MIR input files. Reviewers: Duncan P. N. Exon Smith, Matthias Braun, Philip Reames Differential Revision: http://reviews.llvm.org/D9616 llvm-svn: 237708	2015-05-19 18:17:39 +00:00
Daniel Sanders	b7e612b638	[mips] Correct and improve special-case shuffle instructions. Summary: The documentation writes vectors highest-index first whereas LLVM-IR writes them lowest-index first. As a result, instructions defined in terms of left_half() and right_half() had the halves reversed. In addition to correcting them, they have been improved to allow shuffles that use the same operand twice or in reverse order. For example, ilvev used to accept masks of the form: <0, n, 2, n+2, 4, n+4, ...> but now accepts: <0, 0, 2, 2, 4, 4, ...> <n, n, n+2, n+2, n+4, n+4, ...> <0, n, 2, n+2, 4, n+4, ...> <n, 0, n+2, 2, n+4, 4, ...> One further improvement is that splati.[bhwd] is now the preferred instruction for splat-like operations. The other special shuffles are no longer used for splats. This lead to the discovery that <0, 0, ...> would not cause splati.[hwd] to be selected and this has also been fixed. This fixes the enc-3des test from the test-suite on Mips64r6 with MSA. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9660 llvm-svn: 237689	2015-05-19 12:24:52 +00:00
Michael Kuperstein	c9165a5a41	[X86] ABI change for x86-32: pass 3 vector arguments in-register instead of 4, except on Darwin. This changes the ABI used on 32-bit x86 for passing vector arguments. Historically, clang passes the first 4 vector arguments in-register, and additional vector arguments on the stack, regardless of platform. That is different from the behavior of gcc, icc, and msvc, all of which pass only the first 3 arguments in-register. The 3-register convention is documented, unofficially, in Agner's calling convention guide, and, officially, in the recently released version 1.0 of the i386 psABI. Darwin is kept as is because the OS X ABI Function Call Guide explicitly documents the current (4-register) behavior. This fixes PR21510 Differential revision: http://reviews.llvm.org/D9644 llvm-svn: 237682	2015-05-19 11:06:56 +00:00
Reid Kleckner	6d90c1f8f3	Re-land r237175: [X86] Always return the sret parameter in eax/rax ... This reverts commit r237210. Also fix X86/complex-fca.ll to match the code that we used to generate on win32 and now generate everwhere to conform to SysV. llvm-svn: 237639	2015-05-18 23:35:09 +00:00
Tim Northover	42f554c12a	ARM: allow jump tables to be placed as constant islands. Previously, they were forced to immediately follow the actual branch instruction. This was usually OK (the LEAs actually accessing them got emitted nearby, and weren't usually separated much afterwards). Unfortunately, a sufficiently nasty phi elimination dumps many instructions right before the basic block terminator, and this can increase the range too much. This patch frees them up to be placed as usual by the constant islands pass, and consequently has to slightly modify the form of TBB/TBH tables to refer to a PC-relative label at the final jump. The other jump table formats were already position-independent. rdar://20813304 llvm-svn: 237590	2015-05-18 17:10:40 +00:00
Oliver Stannard	d562deac4b	Revert r237579, as it broke windows buildbots llvm-svn: 237583	2015-05-18 16:39:16 +00:00
James Y Knight	494ba9c43a	Add support for the Sparc implementation-defined "ASR" registers. (Note that register "Y" is essentially just ASR0). Also added some test cases for divide and multiply, which had none before. Differential Revision: http://reviews.llvm.org/D8670 llvm-svn: 237580	2015-05-18 16:29:48 +00:00
Oliver Stannard	8fe6b2aa15	[LLVM - ARM/AArch64] Add ACLE special register intrinsics This patch implements LLVM support for the ACLE special register intrinsics in section 10.1, __arm_{w,r}sr{,p,64}. This patch is intended to lower the read/write_register instrinsics, used to implement the special register intrinsics in the clang patch for special register intrinsics (see http://reviews.llvm.org/D9697), to ARM specific instructions MRC,MCR,MSR etc. to allow reading an writing of coprocessor registers in AArch32 and AArch64. This is done by inspecting the register string passed to the intrinsic and then lowering to the appropriate instruction. Patch by Luke Cheeseman. Differential Revision: http://reviews.llvm.org/D9699 llvm-svn: 237579	2015-05-18 16:23:33 +00:00
Elena Demikhovsky	0b01514dcb	AVX-512: Added intrinsics for ADDSS/D, MULSS/D, SUBSS/D, DIVSS/D instructions. These intrinsics are comming with rounding mode. Added intrinsics for MAXSS/D, MINSS/D - with and without sae. By Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 237560	2015-05-18 07:24:19 +00:00
Elena Demikhovsky	a8c3a107bb	AVX-512: Added patterns for scalar-to-vector broadcast llvm-svn: 237558	2015-05-18 07:06:23 +00:00
Hal Finkel	d380588a20	[PowerPC] Add extra r2 read deps on @toc@l relocations If some commits are happy, and some commits are sad, this is a sad commit. It is sad because it restricts instruction scheduling to work around a binutils linker bug, and moreover, one that may never be fixed. On 2012-05-21, GCC was updated not to produce code triggering this bug, and now we'll do the same... When resolving an address using the ELF ABI TOC pointer, two relocations are generally required: one for the high part and one for the low part. Only the high part generally explicitly depends on r2 (the TOC pointer). And, so, we might produce code like this: .Ltmp526: addis 3, 2, .LC12@toc@ha .Ltmp1628: std 2, 40(1) ld 5, 0(27) ld 2, 8(27) ld 11, 16(27) ld 3, .LC12@toc@l(3) rldicl 4, 4, 0, 32 mtctr 5 bctrl ld 2, 40(1) And there is nothing wrong with this code, as such, but there is a linker bug in binutils (https://sourceware.org/bugzilla/show_bug.cgi?id=18414) that will misoptimize this code sequence to this: nop std r2,40(r1) ld r5,0(r27) ld r2,8(r27) ld r11,16(r27) ld r3,-32472(r2) clrldi r4,r4,32 mtctr r5 bctrl ld r2,40(r1) because the linker does not know (and does not check) that the value in r2 changed in between the instruction using the .LC12@toc@ha (TOC-relative) relocation and the instruction using the .LC12@toc@l(3) relocation. Because it finds these instructions using the relocations (and not by scanning the instructions), it has been asserted that there is no good way to detect the change of r2 in between. As a result, this bug may never be fixed (i.e. it may become part of the definition of the ABI). GCC was updated to add extra dependencies on r2 to instructions using the @toc@l relocations to avoid this problem, and we'll do the same here. This is done as a separate pass because: 1. These extra r2 dependencies are not really properties of the instructions, but rather due to a linker bug, and maybe one day we'll be able to get rid of them when targeting linkers without this bug (and, thus, keeping the logic centralized here will make that straightforward). 2. There are ISel-level peephole optimizations that propagate the @toc@l relocations to some user instructions, and so the exta dependencies do not apply only to a fixed set of instructions (without undesirable definition replication). The test case was reduced with the help of bugpoint, with minimal cleaning. I'm looking forward to our upcoming MI serialization support, and with that, much better tests can be created. llvm-svn: 237556	2015-05-18 06:25:59 +00:00
Elena Demikhovsky	e802d572b1	AVX-512: fixed extended load to 512-bit register llvm-svn: 237537	2015-05-17 08:08:06 +00:00
Elena Demikhovsky	1e28397b5a	AVX-512: fixed a bug in mask operations - (i1 1) pattern Filling k-reg with all-ones value was wrong, (i1 1) should switch on only one bit in mask register llvm-svn: 237536	2015-05-17 07:28:51 +00:00
Bill Schmidt	d115a77d30	[PPC64] Add vector pack/unpack support from ISA 2.07 This patch adds support for the following new instructions in the Power ISA 2.07: vpksdss vpksdus vpkudus vpkudum vupkhsw vupklsw These instructions are available through the vec_packs, vec_packsu, vec_unpackh, and vec_unpackl built-in interfaces. These are lane-sensitive instructions, so the built-ins have different implementations for big- and little-endian, and the instructions must be marked as killing the vector swap optimization for now. The first three instructions perform saturating pack operations. The fourth performs a modulo pack operation, which means it can be represented with a vector shuffle, and conversely the appropriate vector shuffles may cause this instruction to be generated. The other instructions are only generated via built-in support for now. Appropriate tests have been added. There is a companion patch to clang for the rest of this support. llvm-svn: 237499	2015-05-16 01:02:12 +00:00
James Molloy	a6f56a768b	Mark SMIN/SMAX/UMIN/UMAX nodes as legal and add patterns for them. The new [SU]{MIN,MAX} SDNodes can be lowered directly to instructions for most NEON datatypes - the big exclusion being v2i64. llvm-svn: 237455	2015-05-15 16:15:57 +00:00
Brendon Cahoon	7b7b1f4d51	[Hexagon] Generate hardware loop for a vectorized loop The induction variable in the vectorized loop wasn't recognized properly, so a hardware loop wasn't generated. Differential Revision: http://reviews.llvm.org/D9722 llvm-svn: 237388	2015-05-14 20:36:19 +00:00
Brendon Cahoon	92c444f2aa	[Hexagon] Remove dead constant assignment in hardware loop pass After converting a loop to a hardware loop, the pass should remove any unnecessary instructions from the old compare-and-branch code. This patch removes a dead constant assignment that was used in the compare instruction. Differential Revision: http://reviews.llvm.org/D9720 llvm-svn: 237373	2015-05-14 17:31:40 +00:00
Brendon Cahoon	c76b3878ca	[Hexagon] Check for underflow/wrap in hardware loop pass If the loop trip count may underflow or wrap, the compiler should not generate a hardware loop since the trip count will be incorrect. llvm-svn: 237365	2015-05-14 14:15:08 +00:00
Vasileios Kalintiris	fdfa4b2c5d	[mips] Do not place users of $ra in the delay slot of call instructions. Summary: When we are trying to fill the delay slot of a call instruction, we must avoid filler instructions that use the $ra register. This fixes the test MultiSource/Applications/JM/lencod when we enable the forward delay slot filler. Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9670 llvm-svn: 237362	2015-05-14 13:17:56 +00:00
Artyom Skrobov	b1d8d6d276	Re-apply r237247 - [AArch64] Codegen VMAX/VMIN for safe math cases No longer breaks SPEC2000/2006 llvm-svn: 237361	2015-05-14 12:59:46 +00:00
Elena Demikhovsky	b6e5772812	AVX-512: Added i1 type handling for calling conventions. i1 type is a legal type on AVX-512 and can be passed as parameter or return value. i1 is promoted to i8 on return and to i32 for call arguments (i8 is also promoted to i32 here). The result code is similar to the previous X86 targets, where i1 is allways promoted to i8. llvm-svn: 237350	2015-05-14 09:04:45 +00:00
Ahmed Bougacha	e34f179706	[CodeGen] Use standard -not gnueabi- naming for f16 libcalls on Darwin. Other targets probably should as well. Since r237161, compiler-rt has both, but I don't see why anything other than gnueabi would use a gnueabi naming scheme. llvm-svn: 237324	2015-05-14 01:00:51 +00:00
Brendon Cahoon	72f21b863a	[Hexagon] Generate loop1 instruction for nested loops loop1 is for the outer loop and loop0 is for the inner loop. Differential Revision: http://reviews.llvm.org/D9680 llvm-svn: 237266	2015-05-13 17:56:03 +00:00
Brendon Cahoon	1b5bcf570f	[Hexagon] Generate hardware loop when loop has a critical edge The hardware loop pass should try to generate a hardware loop instruction when the original loop has a critical edge. Differential Revision: http://reviews.llvm.org/D9678 llvm-svn: 237258	2015-05-13 14:54:24 +00:00
Silviu Baranga	452d76e681	Revert r237247 - [AArch64] Codegen VMAX/VMIN.. as it is causing failures in SPEC2000/2006 llvm-svn: 237256	2015-05-13 14:03:18 +00:00
Artyom Skrobov	b27364c475	[AArch64] Codegen VMAX/VMIN for safe math cases llvm-svn: 237247	2015-05-13 12:01:09 +00:00
Sanjoy Das	6d67db8c09	[Statepoints] Support for "patchable" statepoints. Summary: This change adds two new parameters to the statepoint intrinsic, `i64 id` and `i32 num_patch_bytes`. `id` gets propagated to the ID field in the generated StackMap section. If the `num_patch_bytes` is non-zero then the statepoint is lowered to `num_patch_bytes` bytes of nops instead of a call (the spill and reload code remains unchanged). A non-zero `num_patch_bytes` is useful in situations where a language runtime requires complete control over how a call is lowered. This change brings statepoints one step closer to patchpoints. With some additional work (that is not part of this patch) it should be possible to get rid of `TargetOpcode::STATEPOINT` altogether. PlaceSafepoints generates `statepoint` wrappers with `id` set to `0xABCDEF00` (the old default value for the ID reported in the stackmap) and `num_patch_bytes` set to `0`. This can be made more sophisticated later. Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9546 llvm-svn: 237214	2015-05-12 23:52:24 +00:00
Saleem Abdulrasool	fa770f8269	CodeGen: ignore DEBUG_VALUE nodes in KILL tagging DEBUG_VALUE nodes do not take part in code generation. Ignore them when performing KILL updates. Addresses PR23486. llvm-svn: 237211	2015-05-12 23:36:18 +00:00
Chandler Carruth	a9a05bbdfe	Revert r237175: [X86] Always return the sret parameter in eax/rax ... This commit broke an x86 test and the bots have been broken for well over an hour now so I'm just reverting. llvm-svn: 237210	2015-05-12 23:34:27 +00:00
Reid Kleckner	170b30ec78	[X86] Always return the sret parameter in eax/rax, even on 32-bit Summary: This rule was always in the old SysV i386 ABI docs and the new ones that H.J. Lu has put together, but we never noticed: EAX scratch register; also used to return integer and pointer values from functions; also stores the address of a returned struct or union Fixes PR23491. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9715 llvm-svn: 237175	2015-05-12 20:56:32 +00:00
Sundeep Kushwaha	033e3e3b61	[PATCH] [HEXAGON] Add a test program to verify calling convention for large struct return by value. Differential Revision: http://reviews.llvm.org/D9709 llvm-svn: 237170	2015-05-12 20:13:10 +00:00
Pat Gavlin	f7cb0841d2	[Statepoints] Split the calling convention and statepoint flags operand to STATEPOINT into two separate operands. Differential Revision: http://reviews.llvm.org/D9623 llvm-svn: 237166	2015-05-12 19:50:19 +00:00
Petar Jovanovic	bd0b851f88	[Mips] Return false for isFPCloseToIncomingSP() On Mips, frame pointer points to the same side of the frame as the stack pointer. This function is used to decide where to put register scavenging spill slot. So far, it was put on the wrong side of the frame, and thus it was too far away from $fp when frame was larger than 2^15 bytes. Patch by Vladimir Radosavljevic. http://reviews.llvm.org/D8895 llvm-svn: 237153	2015-05-12 17:14:05 +00:00
Tom Stellard	db0893e43a	R600/SI: add pass to mark CF live ranges as non-spillable Spilling can insert instructions almost anywhere, and this can mess up control flow lowering in a multitude of ways, due to instruction reordering. Let's sort this out the easy way: never spill registers involved with control flow, i.e. saved EXEC masks. Unfortunately, this does not work at all with optimizations disabled, as the register allocator ignores spill weights. This should be addressed in a future commit. The test was reduced from the "stacks" shader of [1]. Some issues trigger the machine verifier while another one is checked manually. [1] http://madebyevan.com/webgl-path-tracing/ v2: only insert pass with optimizations enabled, merge test runs. Patch by: Grigori Goronzy llvm-svn: 237152	2015-05-12 17:13:02 +00:00
Sunil Srivastava	e4b73132fd	Changed renaming of local symbols by inserting a dot vefore the numeric suffix. One code change and several test changes to match that details in http://reviews.llvm.org/D9481 llvm-svn: 237150	2015-05-12 16:47:30 +00:00
Tom Stellard	d2e3c8c77e	R600/SI: Remove M0Reg register class It is no longer used. llvm-svn: 237142	2015-05-12 15:00:52 +00:00
Tom Stellard	368195b84d	R600/SI: Remove explicit m0 operand from DS instructions Instead add m0 as an implicit operand. This helps avoid spills of the m0 register in some cases. llvm-svn: 237141	2015-05-12 15:00:49 +00:00
Tom Stellard	b85064af8f	R600/SI: Make sendmsg test more strict We want to make sure that the m0 copies are being cse'd. llvm-svn: 237134	2015-05-12 14:18:16 +00:00
Elena Demikhovsky	bd877a1b65	AVX-512, X86: Added lowering for shift operations for SKX. The other changes in the LowerShift() are not functional, just to make the code more convenient. So, the functional changes for SKX only. llvm-svn: 237129	2015-05-12 13:25:46 +00:00
John Brawn	c74bcc2521	[ARM] Use AEABI aligned function variants AEABI defines aligned variants of memcpy etc. that can be faster than the default version due to not having to do alignment checks. When emitting target code for these functions make use of these aligned variants if possible. Also convert memset to memclr if possible. Differential Revision: http://reviews.llvm.org/D8060 llvm-svn: 237127	2015-05-12 13:13:38 +00:00
Igor Laevsky	d8e7227125	Reverse ordering of base and derived pointer during safepoint lowering. According to the documentation in StackMap section for the safepoint we should have: "The first Location in each pair describes the base pointer for the object. The second is the derived pointer actually being relocated." But before this change we emitted them in reverse order - derived pointer first, base pointer second. llvm-svn: 237126	2015-05-12 13:12:14 +00:00
Vasileios Kalintiris	2ff1217d69	[mips][FastISel] Handle calls with non legal types i8 and i16. Summary: Allow calls with non legal integer types based on i8 and i16 to be processed by mips fast-isel. Based on a patch by Reed Kotler. Test Plan: "Make check" test forthcoming. Test-suite passes at O0/O2 and with mips32 r1/r2 Reviewers: rkotler, dsanders Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D6770 llvm-svn: 237121	2015-05-12 12:29:17 +00:00
Vasileios Kalintiris	adab1a88f7	[mips][FastISel] Simplify callabi.ll by using multiple check prefixes. Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9635 llvm-svn: 237119	2015-05-12 12:17:11 +00:00
Vasileios Kalintiris	23d90511ce	[mips][FastISel] Allow computation of addresses from constant expressions. Summary: Try to compute addresses when the offset from a memory location is a constant expression. Based on a patch by Reed Kotler. Test Plan: Passes test-suite for -O0/O2 and mips 32 r1/r2 Reviewers: rkotler, dsanders Subscribers: llvm-commits, aemerson, rfuhler Differential Revision: http://reviews.llvm.org/D6767 llvm-svn: 237117	2015-05-12 12:08:31 +00:00
Elena Demikhovsky	a975085c80	AVX-512: select operation for i1 vectors like: select i1 %cond, <16 x i1> %a, <16 x i1> %b. I added pseudo-CMOV patterns to resolve the "select". Added tests for KNL and SKX. llvm-svn: 237106	2015-05-12 09:36:52 +00:00
Michael Kuperstein	59d6eb954a	[X86] DAGCombine should not assume arbitrary vector types are simple The X86-specific DAGCombine for stores should not assume vector types are always simple. This fixes PR23476. Differential Revision: http://reviews.llvm.org/D9659 llvm-svn: 237097	2015-05-12 07:33:07 +00:00
Eric Christopher	2ba04d1116	Migrate existing backends that care about software floating point to use the information in the module rather than TargetOptions. We've had and clang has used the use-soft-float attribute for some time now so have the backends set a subtarget feature based on a particular function now that subtargets are created based on functions and function attributes. For the one middle end soft float check go ahead and create an overloadable TargetLowering::useSoftFloat function that just checks the TargetSubtargetInfo in all cases. Also remove the command line option that hard codes whether or not soft-float is set by using the attribute for all of the target specific test cases - for the generic just go ahead and add the attribute in the one case that showed up. llvm-svn: 237079	2015-05-12 01:26:05 +00:00
Andrew Kaylor	c29dc428f8	[WinEH] Handle nested landing pads that return directly to the parent function. Differential Revision: http://reviews.llvm.org/D9684 llvm-svn: 237063	2015-05-11 23:06:02 +00:00
Andrew Kaylor	295660694e	[WinEH] Update exception numbering to give handlers their own base state. Differential Revision: http://reviews.llvm.org/D9512 llvm-svn: 237014	2015-05-11 19:41:19 +00:00
Pirama Arumuga Nainar	e8329e15ec	[X86] Updates to X86 backend for f16 promotion Summary: r235215 adds support for f16 to be considered as a load/store type and promote f16 operations to f32. This patch has miscellaneous fixes for the X86 backend so all f16 operations are handled: 1. Set loadextaction for f16 vectors to expand. 2. Handle FP_EXTEND in a switch statement when handling v2f32 3. Do not fold (FP_TO_SINT (load f16)) into FP_TO_INT*_IN_MEM or (store (SINT_TO_FP )) to a FILD. Tests included. Reviewers: ab, srhines, delena Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9092 llvm-svn: 237004	2015-05-11 17:14:39 +00:00
Elena Demikhovsky	8c817abca9	AVX-512: Changed CC parameter in "cmp" intrinsic from i8 to i32 according to the Intel Spec by Igor Breger (igor.breger@intel.com) llvm-svn: 236979	2015-05-11 09:03:14 +00:00
Elena Demikhovsky	f25b492812	AVX-512: Added SKX instructions and intrinsics: {add/sub/mul/div/} x {ps/pd} x {128/256} 2. max/min with sae By Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236971	2015-05-11 06:05:05 +00:00
Elena Demikhovsky	1a04d86baa	AVX-512: fixed UINT_TO_FP operation for 512-bit types. llvm-svn: 236955	2015-05-10 14:23:52 +00:00
Elena Demikhovsky	1ed7ba869f	AVX-512: fixed a bug in i1 vectors lowering llvm-svn: 236947	2015-05-10 10:33:32 +00:00
NAKAMURA Takumi	8e0d983ad2	llvm/test/CodeGen/AArch64/tailcall_misched_graph.ll: s/REQUIRE/REQUIRES/ llvm-svn: 236928	2015-05-09 05:59:00 +00:00
James Y Knight	51fd02be55	Fix MergeConsecutiveStore for non-byte-sized memory accesses. The bug showed up as a compile-time assertion failure: Assertion `NumBits >= MIN_INT_BITS && "bitwidth too small"' failed when building msan tests on x86-64. Prior to r236850, this bug was masked due to a bogus alignment check, which also accidentally rejected non-byte-sized accesses. Afterwards, an invalid ElementSizeBytes == 0 got further into the function, and triggered the assertion failure. It would probably be a good idea to allow it to handle merging stores of unusual widths as well, but for now, to un-break it, I'm just making the minimal fix. Differential Revision: http://reviews.llvm.org/D9626 llvm-svn: 236927	2015-05-09 03:13:37 +00:00
Pete Cooper	278c5147ca	[Fast-ISel] Don't mark the first use of a remat constant as killed. When emitting something like 'add x, 1000' if we remat the 1000 then we should be able to mark the vreg containing 1000 as killed. Given that we go bottom up in fast-isel, a later use of 1000 will be higher up in the BB and won't kill it, or be impacted by the lower kill. However, rematerialised constant expressions aren't generated bottom up. The local value save area grows downwards. This means that if you remat 2 constant expressions which both use 1000 then the first will kill it, then the second, which is lower in the BB will read a killed register. This is the case in the attached test where the 2 GEPs both need to generate 'add x, 6680' for the constant offset. Note that this commit only makes kill flag generation conservative. There's nothing else obviously wrong with the local value save area growing downwards, and in fact it needs to for handling arbitrarily complex constant expressions. However, it would be nice if there was a solution which would let us generate more accurate kill flags, or just kill flags completely. llvm-svn: 236922	2015-05-09 00:51:03 +00:00
Arnold Schwaighofer	d6f4926afa	ScheduleDAGInstrs: In functions with tail calls PseudoSourceValues are not non-aliasing distinct objects The code that builds the dependence graph assumes that two PseudoSourceValues don't alias. In a tail calling function two FixedStackObjects might refer to the same location. Worse 'immutable' fixed stack objects like function arguments are not immutable and will be clobbered. Change this so that a load from a FixedStackObject is not invariant in a tail calling function and don't return a PseudoSourceValue for an instruction in tail calling functions when building the dependence graph so that we handle function arguments conservatively. Fix for PR23459. rdar://20740035 llvm-svn: 236916	2015-05-08 23:52:00 +00:00
Hans Wennborg	c4f9165e69	Switch lowering: cluster adjacent fall-through cases even at -O0 It's cheap to do, and codegen is much faster if cases can be merged into clusters. llvm-svn: 236905	2015-05-08 21:23:39 +00:00
Pete Cooper	ecc08669bb	[Fast-ISel] Clear kill flags on registers replaced by updateValueMap. When selecting an extract instruction, we don't actually generate code but instead work out which register we are reading, and rewrite uses of the extract def to the source register. This is done via updateValueMap,. However, its possible that the source register we are rewriting to to also have uses. If those uses are after a kill of the value we are rewriting from then we have uses after a kill and the verifier fails. This code checks for the case where the to register is also used, and if so it clears all kill on the from register. This is conservative, but better that always clearing kills on the from register. llvm-svn: 236897	2015-05-08 20:46:54 +00:00
Brendon Cahoon	90c3ea5b75	[Hexagon] Generate more hardware loops Refactored parts of the hardware loop pass to generate more. Also, added more tests. Differential Revision: http://reviews.llvm.org/D9568 llvm-svn: 236896	2015-05-08 20:18:21 +00:00
Pete Cooper	c8837e431b	[X86] Fast-ISel was incorrectly always killing the source of a truncate. A trunc from i32 to i1 on x86_64 generates an instruction such as %vreg19<def> = COPY %vreg9:sub_8bit<kill>; GR8:%vreg19 GR32:%vreg9 However, the copy here should only have the kill flag on the 32-bit path, not the 64-bit one. Otherwise, we are killing the source of the truncate which could be used later in the program. llvm-svn: 236890	2015-05-08 18:29:42 +00:00
Pat Gavlin	c022b8d288	Extend the statepoint intrinsic to allow statepoints to be marked as transitions from GC-aware code to code that is not GC-aware. This changes the shape of the statepoint intrinsic from: @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args) to: @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args) This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back. In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation. Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops. Differential Revision: http://reviews.llvm.org/D9501 llvm-svn: 236888	2015-05-08 18:07:42 +00:00
Pete Cooper	3575f5c66f	Clear kill flags on all used registers when sinking instructions. The test here was sinking the AND here to a lower BB: %vreg7<def> = ANDWri %vreg8, 0; GPR32common:%vreg7,%vreg8 TBNZW %vreg8<kill>, 0, <BB#1>; GPR32common:%vreg8 which meant that vreg8 was read after it was killed. This commit changes the code from clearing kill flags on the AND to clearing flags on all registers used by the AND. llvm-svn: 236886	2015-05-08 17:54:32 +00:00
Brendon Cahoon	44a2180995	[Hexagon] Update AnalyzeBranch, etc target hooks Improved the AnalyzeBranch, InsertBranch, and RemoveBranch functions in order to handle more of our branch instructions. This requires changes to analyzeCompare and PredicateInstructions. Specifically, we've added support for new value compare jumps, improved handling of endloop, added more compare instructions, and improved support for predicate instructions. Differential Revision: http://reviews.llvm.org/D9559 llvm-svn: 236876	2015-05-08 16:16:29 +00:00
Andrea Di Biagio	6f502af8bb	[X86] Teach 'getTargetShuffleMask' how to look through ISD::WrapperRIP when decoding a PSHUFB mask. The function 'getTargetShuffleMask' already knows how to deal with PSHUFB nodes where the mask node is a load from constant pool, and the constant pool node is wrapped by a X86ISD::Wrapper node. This patch extends that logic by teaching it how to also look through X86ISD::WrapperRIP. This helps function combineX86ShufflesRecusively to combine more shuffle sequences containing PSHUFB nodes if we are in RIPRel PIC mode. Before this change, llc (with -relocation-model=pic -march=x86-64) was unable to decode a pshufb where the mask was loaded from a constant pool. For example, the no-op shuffle from test 'x86-fold-pshufb.ll' was not folded into its operand, so instead of generating a single 'movaps' the backend always generated a sub-optimal 'movdqa + pshufb' sequence. Added test x86-fold-pshufb.ll. llvm-svn: 236863	2015-05-08 15:11:07 +00:00
James Y Knight	f2154471fd	Fix test added in r236850 for OSX builders. Need to specify triple so that llvm emits the asm syntax that the test expected. llvm-svn: 236855	2015-05-08 14:04:54 +00:00
James Y Knight	7d10114335	Fix alignment checks in MergeConsecutiveStores. 1) check whether the alignment of the memory is sufficient for the merged store or load to be efficient. Not doing so can result in some ridiculously poor code generation, if merging creates a vector operation which must be aligned but isn't. 2) DON'T check that the alignment of each load/store is equal. If you're merging 2 4-byte stores, the first might have 8-byte alignment, but the second certainly will have 4-byte alignment. We do want to allow those to be merged. llvm-svn: 236850	2015-05-08 13:47:01 +00:00
Vasileios Kalintiris	0737141bbb	[mips] Emit the .insn directive for empty basic blocks. Summary: In microMIPS, labels need to know whether they are on code or data. This is indicated with STO_MIPS_MICROMIPS and can be inferred by being followed by instructions. For empty basic blocks, we can ensure this by emitting the .insn directive after the label. Also, this fixes some failures in our out-of-tree microMIPS buildbots, for the exception handling regression tests under: SingleSource/Regression/C++/EH Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9530 llvm-svn: 236815	2015-05-08 09:10:15 +00:00
Eric Christopher	23f40c9241	Now that we have a soft-float attribute, use it instead of the hard coded command line option for the Mips soft float tests. llvm-svn: 236801	2015-05-08 00:57:22 +00:00
Pete Cooper	8151546947	Clear kill flags in tail duplication. If we duplicate an instruction then we must also clear kill flags on any uses we rewrite. Otherwise we might be killing a register which was used in other BBs. For example, here the entry BB ended up with these instructions, the ADD having been tail duplicated. %vreg24<def> = t2ADDri %vreg10<kill>, 1, pred:14, pred:%noreg, opt:%noreg; GPRnopc:%vreg24 rGPR:%vreg10 %vreg22<def> = COPY %vreg10; GPR:%vreg22 rGPR:%vreg10 The copy here is inserted after the add and so needs vreg10 to be live. llvm-svn: 236782	2015-05-07 21:48:26 +00:00
Pete Cooper	6bd48d879a	[AArch64] Fix sext/zext folding in address arithmetic. We were accidentally folding a sign/zero extend in to address arithmetic in a different BB when the extend wasn't available there. Cross BB fast-isel isn't safe, so restrict this to only when the extend is in the same BB as the use. llvm-svn: 236764	2015-05-07 19:21:36 +00:00
Nemanja Ivanovic	5c7f6e9cdc	Add VSX Scalar loads and stores to the PPC back end This patch corresponds to review: http://reviews.llvm.org/D9440 It adds a new register class to the PPC back end to contain single precision values in VSX registers. Additionally, it adds scalar loads and stores for VSX registers. llvm-svn: 236755	2015-05-07 18:24:05 +00:00
Diego Novillo	62bfc25308	Fix information loss in branch probability computation. Summary: This addresses PR 22718. When branch weights are too large, they were being clamped to the range [1, MaxWeightForBB]. But this clamping is only applied to edges that go outside the range, so it distorts the relative branch probabilities. This patch changes the weight calculation to scale every branch so the relative probabilities are preserved. The scaling is done differently now. First, all the branch weights are added up, and if the sum exceeds 32 bits, it computes an integer scale to bring all the weights within the range. The patch fixes an existing test that had slightly wrong branch probabilities due to the previous clamping. It now gets branch weights scaled accordingly. Reviewers: dexonsmith Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9442 llvm-svn: 236750	2015-05-07 17:22:06 +00:00
Sanjay Patel	d786f12843	[x86] eliminate unnecessary shuffling/moves with unary scalar math ops (PR21507) Finish the job that was abandoned in D6958 following the refactoring in http://reviews.llvm.org/rL230221: 1. Uncomment the intrinsic def for the AVX r_Int instruction. 2. Add missing r_Int entries to the load folding tables; there are already tests that check these in "test/Codegen/X86/fold-load-unops.ll", so I haven't added any more in this patch. 3. Add patterns to solve PR21507 ( https://llvm.org/bugs/show_bug.cgi?id=21507 ). So instead of this: movaps %xmm0, %xmm1 rcpss %xmm1, %xmm1 movss %xmm1, %xmm0 We should now get: rcpss %xmm0, %xmm0 And instead of this: vsqrtss %xmm0, %xmm0, %xmm1 vblendps $1, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm1[0],xmm0[1,2,3] We should now get: vsqrtss %xmm0, %xmm0, %xmm0 Differential Revision: http://reviews.llvm.org/D9504 llvm-svn: 236740	2015-05-07 15:48:53 +00:00
Elena Demikhovsky	28f6bb84a5	AVX-512: Added all forms of FP compare instructions for KNL and SKX. Added intrinsics for the instructions. CC parameter of the intrinsics was changed from i8 to i32 according to the spec. By Igor Breger (igor.breger@intel.com) llvm-svn: 236714	2015-05-07 11:24:42 +00:00
NAKAMURA Takumi	5c71f88400	llvm/test/CodeGen/X86/llc-override-mcpu-mattr.ll: Tweak not to be affected by x64 Calling Convention. llvm-svn: 236710	2015-05-07 10:18:28 +00:00
Akira Hatanaka	ae9350177e	Let llc and opt override "-target-cpu" and "-target-features" via command line options. This commit fixes a bug in llc and opt where "-mcpu" and "-mattr" wouldn't override function attributes "-target-cpu" and "-target-features" in the IR. Differential Revision: http://reviews.llvm.org/D9537 llvm-svn: 236677	2015-05-06 23:54:14 +00:00
Pete Cooper	f60fb37d5f	Handle dead defs in the if converter. We had code such as this: r2 = ... t2Bcc label1: ldr ... r2 label2; return r2<dead, def> The if converter was transforming this to r2<def> = ... return [pred] r2<dead,def> ldr <r2, kill> return which fails the machine verifier because the ldr now reads from a dead def. The fix here detects dead defs in stepForward and passes them back to the caller in the clobbers list. The caller then clears the dead flag from the def is the value is live. llvm-svn: 236660	2015-05-06 22:51:04 +00:00
Pete Cooper	b282068f5f	Fix incorrect kill flags in fastisel. If called twice in the same BB on the same constant, FastISel::fastEmit_ri_ was marking the materialized vreg as killed on each use, instead of only the last use. Change this to only mark the last use as killed by making earlier uses check if the vreg is already used elsewhere. llvm-svn: 236650	2015-05-06 22:09:29 +00:00
Pete Cooper	f7321a38fb	[x86] Fix register class of folded load index reg. When folding a load in to another instruction, we need to fix the class of the index register Otherwise, it could be something like GR64 not GR64_NOSP and would fail the machine verifier. llvm-svn: 236644	2015-05-06 21:37:19 +00:00
Tim Northover	efb75367dd	CodeGen: move over-zealous assert into actual if statement. It's quite possible to encounter an insertvalue instruction that's more deeply nested than the value we're looking for, but when that happens we really mustn't compare beyond the end of the index array. Since I couldn't see any guarantees about what comparisons std::equal makes, we probably need to directly check the size beforehand. In practice, I suspect most std::equal implementations would probably bail early, which would be OK. But just in case... rdar://20834485 llvm-svn: 236635	2015-05-06 20:07:38 +00:00
Duncan P. N. Exon Smith	d5953a5da9	DwarfDebug: Emit number of bytes in .debug_loc entry directly Emit the number of bytes in a `.debug_loc` entry directly. The old code created temp labels (expensive), emitted the difference between them, and then emitted one on each side of the relevant bytes. (I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc` (the optimized version of ld64's `-save-temps` when linking the `verify-uselistorder` executable in an LTO bootstrap). I've hacked `MCContext::Allocate()` to just call `malloc()` instead of using the `BumpPtrAllocator` so that the heap profile is easier to read. As far as peak memory is concerned, `MCContext::Allocate()` is equivalent to a leak, since it only gets freed at process teardown. In my heap profile, this patch drops memory usage of `DwarfDebug::emitDebugLoc()` from 132.56 MB (11.4%) down to 29.86 MB (2.7%) at peak memory. Some of that must be noise from `SmallVector` (or other) allocations -- peak memory only dropped from 1160 MB down to 1100 MB -- but this nevertheless shaves 5% off the top.) llvm-svn: 236629	2015-05-06 19:11:20 +00:00
Pawel Bylica	2923e93e53	Readd the regression test from r236584. Calling convention fixed to linux. llvm-svn: 236610	2015-05-06 16:43:21 +00:00
Pete Cooper	a188cd1607	[ARM] Fast-Isel was incorrectly selecting <2 x double> adds. With neon enabled, we reach SelectBinaryFPOp and are able to get registers for a <2 x double> add. However, we shouldn't actually attempt arithmetic on it as ARMIselLowering says "v2f64 is legal so that QR subregs can be extracted as f64 elements, but neither Neon nor VFP support any arithmetic operations on it." This commit disables SelectBinaryFPOp for any vector types. There's already a FIXME to try handle neon. Doing so would require fixing this conditional which isn't safe for vectors 'VT == MVT::f64 \|\| VT == MVT::i64' llvm-svn: 236609	2015-05-06 16:39:17 +00:00
Bill Schmidt	1b44da961a	[PPC64LE] Adjust vector splats during VSX swap optimization The initial code drop for VSX swap optimization permitted the optimization only when all operations in a web of related computation are lane-insensitive. For some lane-sensitive operations, we can still permit the optimization provided that we make adjustments to those operations. This patch adds special handling for vector splats so that their presence doesn't kill the optimization. Vector splats are lane-sensitive since they identify by number a vector element to be used as the source of a splat. When swap optimizations take place, the desired vector element will move to the opposite doubleword of the quadword vector. We thus replace the index I by (I + N/2) % N, where N is the number of elements in the vector. A new test case is added to test that swap optimization succeeds when vector splats are present, and that the proper input element is used as the source of the splat. An ancillary change removes SH_BUILDVEC as one of the kinds of special handling that may be required by VSX swap optimization. From experience with GCC, I had expected to need some modifications for vector build operations, but I did not find that to be the case. llvm-svn: 236606	2015-05-06 15:40:46 +00:00
Artyom Skrobov	fbdc6e53f2	[ARM] generate VMAXNM/VMINNM for a compare followed by a select, in safe math mode too llvm-svn: 236590	2015-05-06 11:44:10 +00:00
Pawel Bylica	4162ffc23c	Revert regression test from r236584. Temporary remove a regression test added in r236584. It fails on Windows. llvm-svn: 236586	2015-05-06 10:41:46 +00:00
Pawel Bylica	a6995220b3	SelectionDAG: Handle out-of-bounds index in extract vector element Summary: This patch correctly handles undef case of EXTRACT_VECTOR_ELT node where the element index is constant and not less than vector size. Test Plan: CodeGen for X86 test included. Also one incorrect regression test fixed. Reviewers: qcolombet, chandlerc, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D9250 llvm-svn: 236584	2015-05-06 10:19:14 +00:00
Ahmed Bougacha	49254b5c2e	[ARM][FastISel] Use TST #1 instead of CMP #0 for select. Since r234249, i1 are sext instead of zext; because of that, doing "CMP rN, #0; IT EQ/NE" isn't correct anymore. "TST #1" is the conservatively correct alternative - the tradeoff being that it doesn't have a 16-bit encoding -, so use that instead. llvm-svn: 236569	2015-05-06 04:14:02 +00:00
Sanjoy Das	16d90a3c80	[Statepoints] Remove broken test case. statepoint-indirect-return.ll breaks on linux systems. Delete the test case to make the bots green while I figure out what the right fix is. llvm-svn: 236568	2015-05-06 02:51:46 +00:00
Sanjoy Das	95c979b9ce	[StatepointLowering] Don't create temporary instructions. NFCI. Summary: Instead of creating a temporary call instruction and lowering that, use SelectionDAGBuilder::lowerCallOperands. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9480 llvm-svn: 236563	2015-05-06 02:36:20 +00:00
Pete Cooper	5b7d6ac920	[X86 fast-isel] Constrain the index reg class to not include SP. The index reg on instructions with complex address modes is a GPR64_NOSP. Constrain it to appease the machine verifier. llvm-svn: 236557	2015-05-05 23:41:53 +00:00
Pete Cooper	e0b9a38dbb	Fix IfConverter to handle regmask machine operands. Note, this is a recommit of r236515 after fixing an error in r236514. The buildbot ran fast enough that it picked up r236514 prior to r236515 and threw an error. r236515 itself ran 'make check' without errors. Original commit message follows: A regmask (typically seen on a call) clobbers the set of registers it lists. The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks. These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier. Otherwise, uses after the if converted call could think they are reading an undefined register. Reviewed by Matthias Braun and Quentin Colombet. llvm-svn: 236550	2015-05-05 22:09:41 +00:00
Peter Collingbourne	549025e337	Thumb2SizeReduction: Check the correct set of registers for LDMIA. The register set for LDMIA begins at offset 3, not 4. We were previously missing the short encoding of this instruction in the case where the base register was the first register in the register set. Also clean up some dead code: - The isARMLowRegister check is redundant with what VerifyLowRegs does; replace with an assert. - Remove handling of LDMDB instruction, which has no short encoding (and does not appear in ReduceTable). Differential Revision: http://reviews.llvm.org/D9485 llvm-svn: 236535	2015-05-05 20:07:10 +00:00
Ulrich Weigand	353f5c25c8	[DAGCombiner] Account for getVectorIdxTy() when narrowing vector load This patch makes ReplaceExtractVectorEltOfLoadWithNarrowedLoad convert the element number from getVectorIdxTy() to PtrTy before doing pointer arithmetic on it. This is needed on z, where element numbers are i32 but pointers are i64. Original patch by Richard Sandiford. llvm-svn: 236530	2015-05-05 19:34:10 +00:00
Ulrich Weigand	8229aae81f	[DAGCombiner] Fix ReplaceExtractVectorEltOfLoadWithNarrowedLoad for BE For little-endian, the function would convert (extract_vector_elt (load X), Y) to X + Ysizeof(elt). For big-endian it would instead use X + sizeof(vec) - Ysizeof(elt). The big-endian case wasn't right since vector index order always follows memory/array order, even for big-endian. (Note that the current handling has to be wrong for Y==0 since it would access beyond the end of the vector.) Original patch by Richard Sandiford. llvm-svn: 236529	2015-05-05 19:33:37 +00:00
Ulrich Weigand	aab0c45488	[LegalizeVectorTypes] Allow single loads and stores for more short vectors When lowering a load or store for TypeWidenVector, the type legalizer would use a single load or store if the associated integer type was legal. E.g. it would load a v4i8 as an i32 if i32 was legal. This patch extends that behavior to promoted integers as well as legal ones. If the integer type for the full vector width is TypePromoteInteger, the element type is going to be TypePromoteInteger too, and it's still better to use a single promoting load or truncating store rather than N individual promoting loads or truncating stores. E.g. if you have a v2i8 on a target where i16 is promoted to i32, it's better to load the v2i8 as an i16 rather than load both i8s individually. Original patch by Richard Sandiford. llvm-svn: 236528	2015-05-05 19:32:57 +00:00
Ulrich Weigand	cd1cd8d125	[SystemZ] Add vector intrinsics This adds intrinsics to allow access to all of the z13 vector instructions. Note that instructions whose semantics can be described by standard LLVM IR do not get any intrinsics. For each instructions whose semantics cannot (fully) be described, we define an LLVM IR target-specific intrinsic that directly maps to this instruction. For instructions that also set the condition code, the LLVM IR intrinsic returns the post-instruction CC value as a second result. Instruction selection will attempt to detect code that compares that CC value against constants and use the condition code directly instead. Based on a patch by Richard Sandiford. llvm-svn: 236527	2015-05-05 19:31:09 +00:00
Ulrich Weigand	f07c2265fd	[SystemZ] Mark v1i128 and v1f128 as unsupported The ABI specifies that <1 x i128> and <1 x fp128> are supposed to be passed in vector registers. We do not yet support those types, and some infrastructure is missing before we can do so. In order to prevent accidentally generating code violating the ABI, this patch adds checks to detect those types and error out if user code attempts to use them. llvm-svn: 236526	2015-05-05 19:30:05 +00:00
Ulrich Weigand	096deccae0	[SystemZ] Handle sub-128 vectors The ABI allows sub-128 vectors to be passed and returned in registers, with the vector occupying the upper part of a register. We therefore want to legalize those types by widening the vector rather than promoting the elements. The patch includes some simple tests for sub-128 vectors and also tests that we can recognize various pack sequences, some of which use sub-128 vectors as temporary results. One of these forms is based on the pack sequences generated by llvmpipe when no intrinsics are used. Signed unpacks are recognized as BUILD_VECTORs whose elements are individually sign-extended. Unsigned unpacks can have the equivalent form with zero extension, but they also occur as shuffles in which some elements are zero. Based on a patch by Richard Sandiford. llvm-svn: 236525	2015-05-05 19:29:21 +00:00
Ulrich Weigand	73b770d782	[SystemZ] Add CodeGen support for scalar f64 ops in vector registers The z13 vector facility includes some instructions that operate only on the high f64 in a v2f64, effectively extending the FP register set from 16 to 32 registers. It's still better to use the old instructions if the operands happen to fit though, since the older instructions have a shorter encoding. Based on a patch by Richard Sandiford. llvm-svn: 236524	2015-05-05 19:28:34 +00:00
Ulrich Weigand	119ff7dab3	[SystemZ] Add CodeGen support for v4f32 The architecture doesn't really have any native v4f32 operations except v4f32->v2f64 and v2f64->v4f32 conversions, with only half of the v4f32 elements being used. Even so, using vector registers for <4 x float> and scalarising individual operations is much better than generating completely scalar code, since there's much less register pressure. It's also more efficient to do v4f32 comparisons by extending to 2 v2f64s, comparing those, then packing the result. This particularly helps with llvmpipe. Based on a patch by Richard Sandiford. llvm-svn: 236523	2015-05-05 19:27:45 +00:00
Ulrich Weigand	2413036cdd	[SystemZ] Add CodeGen support for v2f64 This adds ABI and CodeGen support for the v2f64 type, which is natively supported by z13 instructions. Based on a patch by Richard Sandiford. llvm-svn: 236522	2015-05-05 19:26:48 +00:00
Ulrich Weigand	938ff50eda	[SystemZ] Add CodeGen support for integer vector types This the first of a series of patches to add CodeGen support exploiting the instructions of the z13 vector facility. This patch adds support for the native integer vector types (v16i8, v8i16, v4i32, v2i64). When the vector facility is present, we default to the new vector ABI. This is characterized by two major differences: - Vector types are passed/returned in vector registers (except for unnamed arguments of a variable-argument list function). - Vector types are at most 8-byte aligned. The reason for the choice of 8-byte vector alignment is that the hardware is able to efficiently load vectors at 8-byte alignment, and the ABI only guarantees 8-byte alignment of the stack pointer, so requiring any higher alignment for vectors would require dynamic stack re-alignment code. However, for compatibility with old code that may use vector types, when not using the vector facility, the old alignment rules (vector types are naturally aligned) remain in use. These alignment rules are not only implemented at the C language level (implemented in clang), but also at the LLVM IR level. This is done by selecting a different DataLayout string depending on whether the vector ABI is in effect or not. Based on a patch by Richard Sandiford. llvm-svn: 236521	2015-05-05 19:25:42 +00:00
Pete Cooper	d33c29a169	Revert "Fix IfConverter to handle regmask machine operands." This reverts commit b27413cbfd78d959c18e713bfa271fb69e6b3303 (ie r236515). This is to get the bots green while i investigate the failures. llvm-svn: 236517	2015-05-05 18:49:05 +00:00
Pete Cooper	14239b5817	Fix IfConverter to handle regmask machine operands. A regmask (typically seen on a call) clobbers the set of registers it lists. The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks. These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier. Otherwise, uses after the if converted call could think they are reading an undefined register. Reviewed by Matthias Braun and Quentin Colombet. llvm-svn: 236515	2015-05-05 18:31:36 +00:00
Reid Kleckner	feb0da7d82	Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86" This reverts commit r236360. This change exposed a bug in WinEHPrepare by opting win32 code into EH preparation. We already knew that WinEHPrepare has bugs, and is the status quo for x64, so I don't think that's a reason to hold off on this change. I disabled exceptions in the sanitizer tests in r236505 and an earlier revision. llvm-svn: 236508	2015-05-05 17:44:16 +00:00
Quentin Colombet	c82cc9dc57	[ShrinkWrap] Add (a simplified version) of shrink-wrapping. This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find safe points that are cheaper than the entry and exits blocks. As an example and to avoid regressions to be introduce, this patch also implements the required bits to enable the shrink-wrapping pass for AArch64. Context Currently we insert the prologue and epilogue of the method/function in the entry and exits blocks. Although this is correct, we can do a better job when those are not immediately required and insert them at less frequently executed places. The job of the shrink-wrapping pass is to identify such places. Motivating example Let us consider the following function that perform a call only in one branch of a if: define i32 @f(i32 %a, i32 %b) { %tmp = alloca i32, align 4 %tmp2 = icmp slt i32 %a, %b br i1 %tmp2, label %true, label %false true: store i32 %a, i32* %tmp, align 4 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp) br label %false false: %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ] ret i32 %tmp.0 } On AArch64 this code generates (removing the cfi directives to ease readabilities): _f: ; @f ; BB#0: stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething LBB0_2: ; %false mov sp, x29 ldp x29, x30, [sp], #16 ret With shrink-wrapping we could generate: _f: ; @f ; BB#0: cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething add sp, x29, #16 ; =16 ldp x29, x30, [sp], #16 LBB0_2: ; %false ret Therefore, we would pay the overhead of setting up/destroying the frame only if we actually do the call. Proposed Solution This patch introduces a new machine pass that perform the shrink-wrapping analysis (See the comments at the beginning of ShrinkWrap.cpp for more details). It then stores the safe save and restore point into the MachineFrameInfo attached to the MachineFunction. This information is then used by the PrologEpilogInserter (PEI) to place the related code at the right place. This pass runs right before the PEI. Unlike the original paper of Chow from PLDI’88, this implementation of shrink-wrapping does not use expensive data-flow analysis and does not need hack to properly avoid frequently executed point. Instead, it relies on dominance and loop properties. The pass is off by default and each target can opt-in by setting the EnableShrinkWrap boolean to true in their derived class of TargetPassConfig. This setting can also be overwritten on the command line by using -enable-shrink-wrap. Before you try out the pass for your target, make sure you properly fix your emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not necessarily the entry block. Design Decisions 1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but for debugging and clarity I thought it was best to have its own file. 2. Right now, we only support one save point and one restore point. At some point we can expand this to several save point and restore point, the impacted component would then be: - The pass itself: New algorithm needed. - MachineFrameInfo: Hold a list or set of Save/Restore point instead of one pointer. - PEI: Should loop over the save point and restore point. Anyhow, at least for this first iteration, I do not believe this is interesting to support the complex cases. We should revisit that when we motivating examples. Differential Revision: http://reviews.llvm.org/D9210 <rdar://problem/3201744> llvm-svn: 236507	2015-05-05 17:38:16 +00:00
Kit Barton	4fec877662	This patch adds ABI support for v1i128 data type. It adds v1i128 to the appropriate register classes and checks parameter passing and return values. This is related to http://reviews.llvm.org/D9081, which will add instructions that exploit the v1i128 datatype. Phabricator review: http://reviews.llvm.org/D9475 llvm-svn: 236503	2015-05-05 16:10:44 +00:00
Daniel Sanders	b35b5a2e1c	[mips] Generate code for insert/extract operations when using the N64 ABI and MSA. Summary: When using the N64 ABI, element-indices use the i64 type instead of i32. In many cases, we can use iPTR to account for this but additional patterns and pseudo's are also required. This fixes most (but not quite all) failures in the test-suite when using N64 and MSA together. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9342 llvm-svn: 236494	2015-05-05 10:32:24 +00:00
Daniel Sanders	ba800c7a2f	[mips][msa] Test basic operations for the N32 ABI too. Summary: This required adding instruction aliases for dneg. N64 will be enabled shortly but requires additional bugfixes. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9341 llvm-svn: 236489	2015-05-05 08:48:35 +00:00
Reid Kleckner	92f8523946	[X86] Fix assertion while DAG combining offsets and ExternalSymbols ExternalSymbol nodes do not contain offsets, unlike GlobalValue nodes. llvm-svn: 236471	2015-05-04 23:22:36 +00:00
Sanjay Patel	8a09ce2621	zap windows line endings; NFC llvm-svn: 236460	2015-05-04 21:27:27 +00:00
Tim Northover	47387fb490	CodeGen: match up correct insertvalue indices when assessing tail calls. When deciding whether a value comes from the aggregate or inserted value of an insertvalue instruction, we compare the indices against those of the location we're interested in. One of the lists needs reversing because the input data is backwards (so that modifications take place at the end of the SmallVector), but we were reversing both before leading to incorrect results. Should fix PR23408 llvm-svn: 236457	2015-05-04 20:41:51 +00:00
Elena Demikhovsky	9652f6e23f	AVX-512: added a test for encoding by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236421	2015-05-04 12:59:15 +00:00
Elena Demikhovsky	16b6cc68cf	AVX-512: added calling convention for i1 vectors in 32-bit mode. Fixed some bugs in extend/truncate for AVX-512 target. Removed VBROADCASTM (masked broadcast) node, since it is not used any more. llvm-svn: 236420	2015-05-04 12:40:50 +00:00
Elena Demikhovsky	40362f45c8	AVX-512: added integer "add" and "sub" instructions with saturation for SKX with intrinsics and tests by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236418	2015-05-04 12:35:55 +00:00
Elena Demikhovsky	5b00c277f4	AVX-512: Added VPACK* instructions forms for KNL and SKX and their intrinsics by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236414	2015-05-04 09:14:02 +00:00
Elena Demikhovsky	adaab7f0c7	Masked gather and scatter intrinsics - enabled codegen for KNL. llvm-svn: 236394	2015-05-03 07:12:25 +00:00
Simon Pilgrim	63255c2b3d	[DAGCombiner] Enabled vector float/double -> int constant folding llvm-svn: 236387	2015-05-02 13:04:07 +00:00
Simon Pilgrim	8d912adea6	Line ending fix llvm-svn: 236386	2015-05-02 11:50:47 +00:00
Simon Pilgrim	ad482c67e6	[SSE] Added vector int (i32 and i64) -> float/double conversion tests llvm-svn: 236385	2015-05-02 11:42:47 +00:00
Simon Pilgrim	ee5f3b3e16	[SSE] Added vector float/double -> i32 and i64 conversion tests llvm-svn: 236384	2015-05-02 11:18:47 +00:00
Eric Christopher	c218c7cfb5	Rework test to use FileCheck by making sure we have no xmm registers with numbers. llvm-svn: 236373	2015-05-02 01:06:17 +00:00
Reid Kleckner	8a0256ee3c	Revert "[WinEH] Add an EH registration and state insertion pass for 32-bit x86" This reverts commit r236359. Things are still broken despite testing. :( llvm-svn: 236360	2015-05-01 22:50:14 +00:00
Reid Kleckner	c7b4ae0218	Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86" This reverts commit r236340. llvm-svn: 236359	2015-05-01 22:40:25 +00:00
Colin LeMahieu	d791ea0b5f	[Hexagon] r236351 fix does not work on builder configurations yet. llvm-svn: 236358	2015-05-01 22:39:20 +00:00
Quentin Colombet	00a9a98ba4	[AArch64][FastISel] Variant of the logical instructions that use two input registers cannot write on SP. rdar://problem/20748715 llvm-svn: 236352	2015-05-01 21:34:57 +00:00
Colin LeMahieu	58a246ec36	[Hexagon] Adding expression MC emission and removing XFAIL from test that hits this code path. llvm-svn: 236348	2015-05-01 21:14:21 +00:00
Quentin Colombet	e70bd6613a	[AArch64][FastISel] Fix the setting of kill flags for MUL -> UMULH sequences. rdar://problem/20748715 llvm-svn: 236346	2015-05-01 20:57:11 +00:00
Reid Kleckner	a471390f42	Revert "[WinEH] Add an EH registration and state insertion pass for 32-bit x86" This reverts commit r236339, it breaks the win32 clang-cl self-host. llvm-svn: 236340	2015-05-01 20:14:04 +00:00
Reid Kleckner	ec21431851	[WinEH] Add an EH registration and state insertion pass for 32-bit x86 This pass is responsible for constructing the EH registration object that gets linked into fs:00, which is all it does in this change. In the future, it will also insert stores to update the EH state number. I considered keeping this functionality in WinEHPrepare, but it's pretty separable and X86 specific. It has conceptually very little to do with the task of WinEHPrepare, which is currently outlining. WinEHPrepare is also in theory useful on ARM, but this logic is pretty x86 specific. Reviewers: andrew.w.kaylor, majnemer Differential Revision: http://reviews.llvm.org/D9422 llvm-svn: 236339	2015-05-01 20:04:54 +00:00
Peter Collingbourne	daa528e28b	ARM: Align functions containing Thumb-2 jump tables to 4 bytes. Functions with jump tables need an alignment of 4 because they use the ADR instruction, which aligns the PC to 4 bytes before adding an offset. Differential Revision: http://reviews.llvm.org/D9424 llvm-svn: 236327	2015-05-01 18:05:59 +00:00
Simon Pilgrim	eb80927c6c	[SelectionDAG] Unary vector constant folding integer legality fixes This patch fixes issues with vector constant folding not correctly handling scalar input operands if they require implicit truncation - this was tested with llvm-stress as recommended by Patrik H Hagglund. The patch ensures that integer input scalars from a build vector are correctly truncated before folding, and that constant integer scalar results are promoted to a legal type before inclusion in the new folded build vector. I have added another crash test case and also a test for UINT_TO_FP / SINT_TO_FP using an non-truncated scalar input, which was failing before this patch. Differential Revision: http://reviews.llvm.org/D9282 llvm-svn: 236308	2015-05-01 08:20:04 +00:00
Tom Stellard	b37aaa4862	R600/SI: Add VCC as an implict def of SI_KILL When SI_KILL has a register operand, its lowered form writes to vcc. llvm-svn: 236307	2015-05-01 03:44:09 +00:00
Tom Stellard	6da1425028	R600/SI: Fix verifier errors from the SIAnnotateControlFlow pass This pass was generating 'Instruction does not dominate all uses!' errors for programs which had loops with a condition variable that depended on the result of a phi instruction from outside of the loop. The pass was inserting new phi nodes outside of the loop which used values defined inside the loop. http://bugs.freedesktop.org/show_bug.cgi?id=90056 llvm-svn: 236306	2015-05-01 03:44:08 +00:00
Quentin Colombet	a179ebf75a	[ARM][TEST] Strengthen test against smarter reg alloc. Follow-up of r236247. rdar://problem/20770899 llvm-svn: 236296	2015-05-01 00:45:55 +00:00
Pete Cooper	9851cd43b2	[ARM] optimizeSelect should clear kill flags. If we move an instruction from one block down to a MOVC and predicate it, then the original instruction could be moved in to a loop. In this case, its invalid for any kill flags to remain on there. Fails with -verfy-machineinstrs. rdar://problem/20752113 llvm-svn: 236290	2015-04-30 23:57:47 +00:00
Pete Cooper	2c871b35a8	Commute the internal flag on MachineOperands. When commuting a thumb instruction in the size reduction pass, thumb instructions are represented as a bundle and so some operands may be marked as internal. The internal flag has to move with the operand when commuting. This test is sensitive to register allocation so can't specifically check that this error was happening, but so long as it continues to pass with -verify then hopefully its still ok. rdar://problem/20752113 llvm-svn: 236282	2015-04-30 23:14:14 +00:00
Quentin Colombet	43702a31a2	[AArch64] Fix bad register class constraint in fast-isel for TST instruction. rdar://problem/20748715 llvm-svn: 236273	2015-04-30 22:27:20 +00:00
Pete Cooper	90ff562042	Don't always apply kill flag in thumb2 ABS pseudo expansion. The expansion for t2ABS was always setting the kill flag on the rsb instruction. It should instead only be set on rsb if it was set on the original ABS instruction. rdar://problem/20752113 llvm-svn: 236272	2015-04-30 22:15:59 +00:00
Reid Kleckner	6e67433faf	[X86] Use 4 byte preferred aggregate alignment on Win32 This helps reduce the frequency of stack realignment prologues in 32-bit X86 Windows code. Before this change and the corresponding clang change, we would take the max of the type preferred alignment and the explicit alignment on the alloca. If you don't override aggregate alignment in datalayout, you get a default of 8. This dates back to 2007 / r34356, and changing it seems prohibitively difficult at this point. llvm-svn: 236270	2015-04-30 22:11:59 +00:00
Andrea Di Biagio	50f551d703	Fix comment in test. NFC. llvm-svn: 236262	2015-04-30 21:22:28 +00:00
Andrea Di Biagio	2fddd75212	Fix for PR23103. Correctly propagate the 'IsUndef' flag to the register operands of a commuted instruction. Revision 220239 exposed a latent bug in method 'TargetInstrInfo::commuteInstruction'. When commuting the operands of a machine instruction, method 'commuteInstruction' didn't correctly propagate the 'IsUndef' flag to the register operands of the new (commuted) instruction. Before this patch, the following instruction: %vreg4<def> = VADDSDrr %vreg14, %vreg5<undef>; FR64:%vreg4,%vreg14,%vreg5 was wrongly converted by method 'commuteInstruction' into: %vreg4<def> = VADDSDrr %vreg5, %vreg14<undef>; FR64:%vreg4,%vreg5,%vreg14 The correct instruction should have been: %vreg4<def> = VADDSDrr %vreg5<undef>, %vreg14; FR64:%vreg4,%vreg5,%vreg14 This patch fixes the problem in method 'TargetInstrInfo::commuteInstruction'. When swapping the operands of a machine instruction, we now make sure that 'IsUndef' flags are correctly set. Added test case 'pr23103.ll'. Differential Revision: http://reviews.llvm.org/D9406 llvm-svn: 236258	2015-04-30 21:03:29 +00:00
Pete Cooper	54ecccfd00	Don't rewrite jumps to empty BBs to landing pads. In the test case here, the 'unreachable' BB was removed by BranchFolding because its empty. It then rewrote the jump from 'entry' to jump to its fallthrough, which was a landing pad. This results in 'entry' jumping to 2 different landing pads, which fails the machine verifier. rdar://problem/20750162 llvm-svn: 236248	2015-04-30 18:58:23 +00:00
Quentin Colombet	bc9c5a5af1	[ARM] Do not generate invalid encoding for stack adjust, even if this is just temporary. Because of that: 1. The machine verifier was complaining on such code. 2. The generate code worked just because the thumb reduction size pass fixed the opcode. rdar://problem/20749824 llvm-svn: 236247	2015-04-30 18:52:49 +00:00
Jan Vesely	72f354c922	Reinstate revisions r234755, r234759, r234760 changes: Don't apply on hexagon and NVPTX since they no longer claim to support UADDO/USUBO Add location to getConstant Drop comment about the ops being turned into expand llvm-svn: 236240	2015-04-30 17:15:56 +00:00
Daniel Sanders	0bbd3e3f2c	[mips][msa] Rename main check prefix to 'ALL' in basic operations tests. NFC Summary: The majority of the checks are subtarget independent. The few that aren't will be corrected shortly. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9340 llvm-svn: 236220	2015-04-30 09:57:37 +00:00
Daniel Sanders	95fe30da62	[mips][msa] Use CHECK-LABEL where missing, and remove checks matching the .size directive. NFC. Summary: Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9339 llvm-svn: 236219	2015-04-30 09:56:30 +00:00
Daniel Sanders	32a98df5fb	[mips] Add missing signext attributes to MSA basic operations tests. NFC. Summary: This doesn't make much difference to MIPS32, but it will simplify a MIPS64r6 bugfix which will follow shortly by removing unnecessary sign-extension of parameters. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9338 llvm-svn: 236216	2015-04-30 09:24:09 +00:00
Simon Pilgrim	07a58b66cf	[SSE] Fix for MUL v16i8 on pre-SSE41 targets (PR23369). Sign extension of i8 to i16 was placing the unpacked bytes in the lower byte instead of the upper byte. llvm-svn: 236209	2015-04-30 08:23:16 +00:00
Owen Anderson	da14e7665d	Semantically revert r236031, which is not a good idea for in-order targets. At the least it should be guarded by some kind of target hook. It also introduced catastrophic compile time and code quality regressions on some out of tree targets (test case still being reduced/sanitized). Sanjay agreed with reverting this patch until these issues can be resolved. llvm-svn: 236199	2015-04-30 04:06:32 +00:00
Hans Wennborg	847f84eef8	XFAIL test/CodeGen/Generic/MachineBranchProb.ll on Hexagon (PR23377) llvm-svn: 236196	2015-04-30 01:59:04 +00:00
Hans Wennborg	1476a3b8b7	Switch lowering: use profile info to build weight-balanced binary search trees This will cause hot nodes to appear closer to the root. The literature says building the tree like this makes it a near-optimal (in terms of search time given key frequencies) binary search tree. In LLVM's case, we can do up to 3 comparisons in each leaf node, so it might be better to opt for lower tree height in some cases; that's something to look into in the future. Differential Revision: http://reviews.llvm.org/D9318 llvm-svn: 236192	2015-04-30 00:57:37 +00:00
Ahmed Bougacha	bfe392760e	Flip r236172 testcase RUN option ordering for BSD sed(1). NFC. llvm-svn: 236186	2015-04-30 00:07:34 +00:00
Pete Cooper	00f179e7a1	Change x86 CMOVE_F to read it source, not write it. This was breaking sqlite with the machine verifier because operand 0 was a def according to tablegen, but didn't have the 'isDef' flag set. Looking at the ISA, its clear that this operand is a source as writing to st(0) is implicit. So move the operand to the correct place in the td file. rdar://problem/20751584 llvm-svn: 236183	2015-04-29 23:51:33 +00:00
Reid Kleckner	3691294c25	[WinEH] Start EH preparation for 32-bit x86, it uses no arguments 32-bit x86 MSVC-style exceptions are functionaly similar to 64-bit, but they take no arguments. Instead, they implicitly use the value of EBP passed in by the caller as a pointer to the parent's frame. In LLVM, we can represent this as llvm.frameaddress(1), and feed that into all of our calls to llvm.framerecover. The next steps are: - Add an alloca to the fs:00 linked list of handlers - Add something like llvm.sjlj.lsda or generalize it to store in the alloca - Move state number calculation to WinEHPrepare, arrange for FunctionLoweringInfo to call it - Use the state numbers to insert explicit loads and stores in the IR llvm-svn: 236172	2015-04-29 22:49:54 +00:00
Reid Kleckner	97ba8a8b4c	[X86] Avoid mangling frameescape labels x86 Windows uses the '_' prefix for all global symbols, and this was mistakenly being applied to frameescape labels, which are not externally visible global symbols. They use the private global prefix 'L'. The right way to fix this is probably to stop masquerading this label as an ExternalSymbol and create a new SDNode type. These labels are not "external", and we know they will be resolved by assembly time. Having a custom SDNode type would allow us to do better X86 address mode matching, so it's probably worth doing eventually. llvm-svn: 236123	2015-04-29 16:46:01 +00:00
Duncan P. N. Exon Smith	09b5c9c24d	IR: Give 'DI' prefix to debug info metadata Finish off PR23080 by renaming the debug info IR constructs from `MD` to `DI`. The last of the `DIDescriptor` classes were deleted in r235356, and the last of the related typedefs removed in r235413, so this has all baked for about a week. Note: If you have out-of-tree code (like a frontend), I recommend that you get everything compiling and tests passing with the previous commit before updating to this one. It'll be easier to keep track of what code is using the `DIDescriptor` hierarchy and what you've already updated, and I think you're extremely unlikely to insert bugs. YMMV of course. Back to this commit: I did this using the rename-md-di-nodes.sh upgrade script I've attached to PR23080 (both code and testcases) and filtered through clang-format-diff.py. I edited the tests for test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns were off-by-three. It should work on your out-of-tree testcases (and code, if you've followed the advice in the previous paragraph). Some of the tests are in badly named files now (e.g., test/Assembler/invalid-mdcompositetype-missing-tag.ll should be 'dicompositetype'); I'll come back and move the files in a follow-up commit. llvm-svn: 236120	2015-04-29 16:38:44 +00:00
Vasileios Kalintiris	6f47a197aa	Mips fast-isel - handle functions which return i8 or i6 . Summary: Allow Mips fast-isel to handle functions which return i8/i16 signed/unsigned. Test Plan: Make check tests are forthcoming. Already passes test-suite at O0/O2 for Mips 32 r1/r2 Reviewers: dsanders, rkotler Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D6765 llvm-svn: 236103	2015-04-29 14:17:14 +00:00
Daniel Sanders	d40d7b9594	[mips] Correct 128-bit shifts on 64-bit targets. Summary: The existing code was correct for 32-bit GPR's but not 64-bit GPR's. It now accounts for both cases. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits, mohit.bhakkad, sagar Differential Revision: http://reviews.llvm.org/D9337 llvm-svn: 236099	2015-04-29 12:28:58 +00:00
Tim Northover	5bf87deee6	ARM: fix peephole optimisation of TST We were trying to look through COPY instructions, but only to the next instruction in a BB and incorrectly anyway. The cases where that would actually be a good idea are rare enough (and not even tested!) that it's not worth trying to get right. rdar://20721342 llvm-svn: 236050	2015-04-28 22:03:55 +00:00
Andrew Kaylor	9f3b999556	[WinEH] Split blocks at calls to llvm.eh.begincatch Differential Revision: http://reviews.llvm.org/D9311 llvm-svn: 236046	2015-04-28 21:54:14 +00:00
Sanjay Patel	9b05b26bc6	transform fadd chains to increase parallelism This is a compromise: with this simple patch, we should always handle a chain of exactly 3 operations optimally, but we're not generating the optimal balanced binary tree for a longer sequence. In general, this transform will reduce the dependency chain for a sequence of instructions using N operands from a worst case N-1 dependent operations to N/2 dependent operations. The optimal balanced binary tree would reduce the chain to log2(N). The trade-off for not dealing with longer sequences is: (1) we have less complexity in the compiler, (2) we avoid unknown compile-time blowup calculating a balanced tree, and (3) we don't need to worry about the increased register pressure required to parallelize longer sequences. It also seems unlikely that we would ever encounter really long strings of dependent ops like that in the wild, but I'm not sure how to verify that speculation. FWIW, I see no perf difference for test-suite running on btver2 (x86-64) with -ffast-math and this patch. We can extend this patch to cover other associative operations such as fmul, fmax, fmin, integer add, integer mul. This is a partial fix for: https://llvm.org/bugs/show_bug.cgi?id=17305 and if extended: https://llvm.org/bugs/show_bug.cgi?id=21768 https://llvm.org/bugs/show_bug.cgi?id=23116 The issue also came up in: http://reviews.llvm.org/D8941 Differential Revision: http://reviews.llvm.org/D9232 llvm-svn: 236031	2015-04-28 21:03:22 +00:00
Tom Stellard	347b82f6fc	R600: Fix up for AsmPrinter's OutStreamer being a unique_ptr Fixes a crash with basically any OpenGL application using the radeonsi driver. Patch by: Michel Dänzer Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90176 Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 236004	2015-04-28 17:37:03 +00:00
Justin Holewinski	41800386f1	[NVPTX] Handle addrspacecast constant expressions in aggregate initializers We need to track if an AddrSpaceCast expression was seen when generating an MCExpr for a ConstantExpr. This change introduces a custom lowerConstant method to the NVPTX asm printer that will create NVPTXGenericMCSymbolRefExpr nodes at the appropriate places to encode the information that a given symbol needs to be casted to a generic address. llvm-svn: 236000	2015-04-28 17:18:30 +00:00
Elena Demikhovsky	224807ff06	Fixed crash of variable shift inst on AVX2 https://llvm.org/bugs/show_bug.cgi?id=22955 llvm-svn: 235993	2015-04-28 14:46:35 +00:00
Elena Demikhovsky	23e33119e3	AVX-512: Added "pandn" intrinsics set by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 235971	2015-04-28 08:12:42 +00:00
Hans Wennborg	cc333a9a05	Switch lowering: Take branch weight into account when ordering for fall-through Previously, the code would try to put a fall-through case last, even if that meant moving a case with much higher branch weight further down the chain. Ordering by branch weight is most important, putting a fall-through block last is secondary. llvm-svn: 235942	2015-04-27 23:35:22 +00:00
Ahmed Bougacha	54c4fb4a3f	[AArch64] Also combine vector selects fed by non-i1 SETCCs. After legalization, scalar SETCC has an i32 result type on AArch64. The i1 requirement seems too conservative, replace it with an assert. This also means that we now can run after legalization. That should also be fine, since the ops legalizer runs again after each combine, and all types created all have the same sizes as the (legal) inputs. Exposed by r235917; while there, robustize its tests (bsl also uses the register it defines). llvm-svn: 235922	2015-04-27 21:43:12 +00:00
Ahmed Bougacha	5f0f3e8528	[AArch64] Don't assert when combining (v3f32 select (setcc f64)). When the setcc has f64 operands, we can't build a vector setcc mask to feed a vselect, because f64 doesn't divide v3f32 evenly. Just bail out when that happens. llvm-svn: 235917	2015-04-27 21:01:20 +00:00
Hans Wennborg	7103f56991	Switch lowering: order bit tests by branch weight. llvm-svn: 235912	2015-04-27 20:21:17 +00:00
Bill Schmidt	6661e2ddb2	[PPC64LE] Remove unnecessary swaps from lane-insensitive vector computations This patch adds a new SSA MI pass that runs on little-endian PPC64 code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors without alignment constraints are accomplished for little-endian using lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional xxswapd instructions hurts performance in comparison with big-endian code, but they are necessary in the general case to support correct semantics. However, the general case does not apply to most vector code. Many vector instructions are lane-insensitive; they do not "care" which lanes the parallel computations are performed within, provided that the resulting data is stored into the correct locations. Thus this pass looks for computations that perform only lane-insensitive operations, and remove the unnecessary swaps from loads and stores in such computations. Future improvements will allow computations using certain lane-sensitive operations to also be optimized in this manner, by modifying the lane-sensitive operations to account for the permuted order of the lanes. However, this patch only adds the infrastructure to permit this; no lane-sensitive operations are optimized at this time. This code is heavily exercised by the various vectorizing applications in the projects/test-suite tree. For the time being, I have only added one simple test case to demonstrate what the pass is doing. Although it is quite simple, it provides coverage for much of the code, including the special case handling of copies and subreg-to-reg operations feeding the swaps. I plan to add additional tests in the future as I fill in more of the "special handling" code. Two existing tests were affected, because they expected the swaps to be present, but they are now removed. llvm-svn: 235910	2015-04-27 19:57:34 +00:00
Elena Demikhovsky	489127abd6	AVX-512: added calling conventions for i1 vectors. Fixed bug: https://llvm.org/bugs/show_bug.cgi?id=20724 llvm-svn: 235889	2015-04-27 15:11:19 +00:00
Brendon Cahoon	37b8b0d293	[Hexagon] Use constant extenders to fix up hardware loops Use a loop instruction with a constant extender for a hardware loop instruction that is too far away from the start of the loop. This is cheaper than changing the SA register value. Differential Revision: http://reviews.llvm.org/D9262 llvm-svn: 235882	2015-04-27 14:16:43 +00:00
Vasileios Kalintiris	50a170aec4	Reapply "[mips][FastISel] Implement shift ops for Mips fast-isel."" This reapplies r235194, which was reverted in r235495 because it was causing a failure in our out-of-tree buildbots for MIPS. With the sign-extension patch in r235718, this patch doesn't cause any problem any more. llvm-svn: 235878	2015-04-27 13:28:05 +00:00
Elena Demikhovsky	3485573818	AVX-512: Extend/Truncate operations for SKX, SETCC for bit-vectors llvm-svn: 235875	2015-04-27 12:57:59 +00:00
Simon Pilgrim	8174379b38	[X86][SSE] Add v16i8/v32i8 multiplication support Patch to allow int8 vectors to be multiplied on the SSE unit instead of being scalarized. The patch sign extends the i8 lanes to i16, uses the SSE2 pmullw multiplication instruction, then packs the lower byte from each result. Differential Revision: http://reviews.llvm.org/D9115 llvm-svn: 235837	2015-04-27 07:55:46 +00:00
Matt Arsenault	363e46ec74	R600: Remove / merge redundant testcases llvm-svn: 235813	2015-04-26 00:53:33 +00:00
Sanjay Patel	6a762d07bf	add SSE run to check non-AVX codegen llvm-svn: 235809	2015-04-25 20:41:51 +00:00
Simon Pilgrim	85f3a113c7	line endings fix llvm-svn: 235800	2015-04-25 12:12:43 +00:00
Reid Kleckner	3a59b4e3a9	[SEH] Implement GetExceptionCode in __except blocks This introduces an intrinsic called llvm.eh.exceptioncode. It is lowered by copying the EAX value live into whatever basic block it is called from. Obviously, this only works if you insert it late during codegen, because otherwise mid-level passes might reschedule it. llvm-svn: 235768	2015-04-24 20:25:05 +00:00
David Blaikie	2fcc0180e4	[opaque pointer type] Add textual IR support for explicit type parameter to the invoke instruction Same as r235145 for the call instruction - the justification, tradeoffs, etc are all the same. The conversion script worked the same without any false negatives (after replacing 'call' with 'invoke'). llvm-svn: 235755	2015-04-24 19:32:54 +00:00
Sundeep Kushwaha	17d5168fea	[PATCH] [Hexagon] Adding a test case for calling convention. http://reviews.llvm.org/D9241 llvm-svn: 235754	2015-04-24 19:22:02 +00:00
Yaron Keren	b412c387f7	Teach AArch64\lit.local.cfg the new triple names windows-gnu and windows-msvc. Tests were failing when built with -DLLVM_DEFAULT_TARGET_TRIPLE=i686-pc-windows-gnu. llvm-svn: 235733	2015-04-24 17:14:16 +00:00
Hans Wennborg	2223cdaf41	Switch lowering: fix APInt overflow causing infinite loop / OOM llvm-svn: 235729	2015-04-24 16:53:55 +00:00
Reid Kleckner	a6adb4cb4e	[WinEH] Split the landingpad BB instead of cloning it This means we don't have to RAUW the landingpad instruction and landingpad BB, which is a nice win. llvm-svn: 235725	2015-04-24 16:22:19 +00:00
Jingyue Wu	b822e43272	[NVPTX] Emits "generic()" depending on the original address space Summary: Fixes a bug in the NVPTX codegen. The code used to miss necessary "generic()" on aggregates of addrspacecasts. Test Plan: addrspacecast-gvar.ll Reviewers: eliben, jholewinski Reviewed By: jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D9130 llvm-svn: 235689	2015-04-24 02:57:30 +00:00
Matt Arsenault	5c45e4a835	R600/SI: Fix verifier error when producing v_madmk_f32 Copy the kill flags when swapping the operands. llvm-svn: 235687	2015-04-24 01:57:58 +00:00
Matthias Braun	3cae2dc2b2	R600/RegisterCoalescer: Enable more rematerialization/add missing testcase This enables the rematerialization of some R600 MOV instructions in the RegisterCoalescer and adds a testcase for r235668. llvm-svn: 235675	2015-04-24 00:25:50 +00:00

... 3 4 5 6 7 ...

12990 Commits