llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 04:52:54 +02:00

Author	SHA1	Message	Date
Akira Hatanaka	ff17fbeebc	[mips] Implement the following optimizations using dominance information to make PIC calls a little more efficient: 1. Remove instructions setting up $gp if it is known that a function has been called at least once. 2. Save the address of a called function in a register instead of loading it from the GOT at every call site. llvm-svn: 195892	2013-11-27 23:38:42 +00:00
Tom Stellard	95624c101d	R600: Expand vector FABS NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195881	2013-11-27 21:23:39 +00:00
Tom Stellard	eac3acc854	R600/SI: Implement spilling of SGPRs v5 SGPRs are spilled into VGPRs using the {READ,WRITE}LANE_B32 instructions. v2: - Fix encoding of Lane Mask - Use correct register flags, so we don't overwrite the low dword when restoring multi-dword registers. v3: - Register spilling seems to hang the GPU, so replace all shaders that need spilling with a dummy shader. v4: - Fix *LANE definitions - Change destination reg class for 32-bit SMRD instructions v5: - Remove small optimization that was crashing Serious Sam 3. https://bugs.freedesktop.org/show_bug.cgi?id=68224 https://bugs.freedesktop.org/show_bug.cgi?id=71285 NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195880	2013-11-27 21:23:35 +00:00
Tom Stellard	d386cdf4d0	R600/SI: Use SGPR_32 register class for 32-bit SMRD outputs Writing to the M0 register from an SMRD instruction hangs the GPU, so we need to use the SGPR_32 register class, which does not include M0. NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195879	2013-11-27 21:23:29 +00:00
Tom Stellard	0a14ce13e1	R600: Add support for ISD::FROUND NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195878	2013-11-27 21:23:20 +00:00
Rafael Espindola	c377a725ca	Use FileCheck and expand the test a bit. In particular, check the name of the symbol we are putting in the constant pool. llvm-svn: 195865	2013-11-27 19:22:14 +00:00
Rafael Espindola	c099bf8035	Use the same tls section name as msvc. We currently error in clang with: "error: thread-local storage is unsupported for the current target", but we can start to get the llvm level ready. When compiling template<typename T> struct foo { static __declspec(thread) int bar; }; template<typename T> __declspec(thread) int foo<T>::bar; template struct foo<int>; msvc produces SECTION HEADER #3 .tls$ name 0 physical address 0 virtual address 4 size of raw data 12F file pointer to raw data (0000012F to 00000132) 0 file pointer to relocation table 0 file pointer to line numbers 0 number of relocations 0 number of line numbers C0301040 flags Initialized Data COMDAT; sym= "public: static int foo<int>::bar" (?bar@?$foo@H@@2HA) 4 byte align Read Write gcc produces a ".data$__emutls_v.<symbol>" for the testcase with __declspec(thread) replaced with thread_local. llvm-svn: 195849	2013-11-27 15:52:11 +00:00
Jiangning Liu	d9270b7a51	Fix the AArch64 NEON bug exposed by checking constant integer argument range of ACLE intrinsics. llvm-svn: 195843	2013-11-27 14:02:25 +00:00
Rafael Espindola	22b6ec4d69	Cleanup and test X86AsmPrinter::printPCRelImm. It is only used for asm printing. On X86 we put basic block addresses on register before passing them to inline asm, so the MO_MachineBasicBlock case was dead. MO_ExternalSymbol was dead since any symbol being passed to inline asm is represented as MO_GlobalAddress. The MO_GlobalAddress and MO_Register cases were not tested. llvm-svn: 195824	2013-11-27 06:53:13 +00:00
Chad Rosier	ca062e81db	[AArch64] Add support for NEON scalar floating-point absolute difference. llvm-svn: 195803	2013-11-27 01:45:58 +00:00
Rafael Espindola	3ceb67b21b	Use simple section names for COMDAT sections on COFF. With this patch we use simple names for COMDAT sections (like .text or .bss). This matches the MSVC behavior. When merging it is the COMDAT symbol that is used to decide if two sections should be merged, so there is no point in building a fancy name. This survived a bootstrap on mingw32. llvm-svn: 195798	2013-11-27 01:18:37 +00:00
Nadav Rotem	dc01e91cf5	PR1860 - We can't save a list of ExtractElement instructions to CSE because some of these instructions may be removed and optimized in future iterations. Instead we save a list of basic blocks that we need to CSE. llvm-svn: 195791	2013-11-26 22:24:25 +00:00
Chad Rosier	1337fcc721	[AArch64] Add support for NEON scalar floating-point to integer convert instructions. llvm-svn: 195788	2013-11-26 22:17:37 +00:00
Arnold Schwaighofer	d0c05d2c84	LoopVectorizer: Truncate i64 trip counts of i32 phis if necessary In signed arithmetic we could end up with an i64 trip count for an i32 phi. Because it is signed arithmetic we know that this is only defined if the i32 does not wrap. It is therefore safe to truncate the i64 trip count to an i32 value. Fixes PR18049. llvm-svn: 195787	2013-11-26 22:11:23 +00:00
Reed Kotler	06b47695fb	Fix a bug related to constant islands for Mips16 and mips16/32 dual mode. The determination of when we are doing constant pools was being made too early in the asm printer. llvm-svn: 195781	2013-11-26 20:38:40 +00:00
Michael Liao	8c702e1a18	Fix PR18054 - Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG lowering where we need to check whether x is a vector type (in-reg type) of i8, i16 or i32; otherwise, that optimization is not valid. llvm-svn: 195779	2013-11-26 20:31:31 +00:00
David Blaikie	bbf2455d59	DwarfDebug: Include type units in accelerator tables. Since type units aren't in the CUMap, use the DwarfUnits list to iterate over units for tasks such as accelerator table building. llvm-svn: 195776	2013-11-26 19:14:34 +00:00
Nadav Rotem	643eb4c26e	PR18060 - When we RAUW values with ExtractElement instructions in some cases we generate PHI nodes with multiple entries from the same basic block but with different values. Enabling CSE on ExtractElement instructions make sure that all of the RAUWed instructions are the same. llvm-svn: 195773	2013-11-26 17:29:19 +00:00
Stepan Dyatkovskiy	83455f2b60	PR17925 bugfix. Short description: this issue is about treating pointers as integers. We treat pointers as different if they reference different address spaces. At the same time, we treat pointers as equal to integers (of machine address width). This was a source of false positives. Consider the following case on a 32-bit machine: void foo0(i32 addrspace(1)* %p) void foo1(i32 addrspace(2)* %p) void foo2(i32 %p) foo0 != foo1, while foo1 == foo2 and foo0 == foo2. As you can see, this breaks transitivity, which means the result depends on the order in which functions appear in the module. The following order causes foo0 and foo1 to be merged: foo2, foo0, foo1. First foo0 is merged with foo2 and foo0 is erased. Then foo1 is merged with foo2. Depending on the order, functions we don't expect to be merged could be merged. The fix: forbid treating any pointer as an integer, except for pointers that belong to address space 0. llvm-svn: 195769	2013-11-26 16:11:03 +00:00
Tim Northover	f0a2ff9091	Darwin-ARM: use movw/movt for static relocations llvm-svn: 195759	2013-11-26 12:45:05 +00:00
Richard Sandiford	b3250399ac	[SystemZ] Fix incorrect use of RISBG for a zero-extended right shift We would wrongly transform the testcase into the equivalent of an AND with 1. The problem was that, when testing whether the shifted-in bits of the right shift were significant, we used the width of the final zero-extended result rather than the width of the shifted value. llvm-svn: 195731	2013-11-26 10:53:16 +00:00
Kevin Qin	1370a1e1ee	Refactored the implementation of AArch64 NEON instruction ZIP, UZP and TRN. Fix a bug with mixed use of vget_high_u8() and vuzp_u8(). llvm-svn: 195716	2013-11-26 03:26:47 +00:00
Kevin Qin	95c8b28223	[AArch64]Implement 128 bit register copy with NEON. llvm-svn: 195713	2013-11-26 02:33:42 +00:00
Andrew Trick	95afafe3fa	StackMap: Implement support for DirectMemRefOp. A Direct stack map location records the address of a frame index. This address is itself the value that the runtime requested. This differs from IndirectMemRefOp locations, which refer to stack locations from which the requested values must be loaded. Direct locations can directly communicate the address of an alloca, while IndirectMemRefOp locations handle register spills. For example: entry: %a = alloca i64... llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a) Since both the alloca and stackmap intrinsic are in the entry block, and the intrinsic takes the address of the alloca, the runtime can assume that LLVM will not substitute alloca with any intervening value. This must be verified by the runtime by checking that the stack map's location is a Direct location type. The runtime can then determine the alloca's relative location on the stack immediately after compilation, or at any time thereafter. This differs from Register and Indirect locations, because the runtime can only read the values in those locations when execution reaches the instruction address of the stack map. llvm-svn: 195712	2013-11-26 02:03:25 +00:00
David Blaikie	aeec78b126	DebugInfo: Update test case due to dumper improvements in r195698 The dumper was only dumping one pubtypes set and it was /always/ dumping one pubtypes set even when there were zero sets. Now that the dumper correctly dumps zero, one, or many sets, we can update this test case to test for the absolute absence of a set rather than a bogus/accidental zero-valued set. llvm-svn: 195706	2013-11-26 01:11:02 +00:00
David Blaikie	98277f8277	DebugInfo: Avoid emitting pubtype entries for type DIEs that just indirect to a type unit. llvm-svn: 195698	2013-11-26 00:22:37 +00:00
Cameron McInally	2ff051483c	Add an intrinsic for the SSE2 PAUSE instruction. llvm-svn: 195697	2013-11-26 00:20:43 +00:00
Chandler Carruth	497a42d1b9	Add the test case that I missed when committing r195528. Doh! llvm-svn: 195691	2013-11-25 22:24:27 +00:00
Rafael Espindola	ae17ac667e	Use -triple to fix the test on non-ELF hosts. llvm-svn: 195682	2013-11-25 20:46:18 +00:00
Rafael Espindola	fa5cbd5557	Don't use nopl in cpus that don't support it. Patch by Mikulas Patocka. I added the test. I checked that for cpu names that gas knows about, it also doesn't generate nopl. The modified cpus: i686 - there are i686-class CPUs that don't have nopl: Via c3, Transmeta Crusoe, Microsoft VirtualBox - see https://bbs.archlinux.org/viewtopic.php?pid=775414 k6, k6-2, k6-3, winchip-c6, winchip2 - these are 586-class CPUs via c3 c3-2 - see https://bugs.archlinux.org/task/19733 as a proof that Via c3 and c3-Nehemiah don't have nopl llvm-svn: 195679	2013-11-25 20:15:14 +00:00
David Peixotto	647697e4ae	ARM integrated assembler generates incorrect nop opcode This patch fixes a bug in the assembler that was causing bad code to be emitted. When switching modes in an assembly file (e.g. arm to thumb mode) we would always emit the opcode from the original mode. Consider this small example: $ cat align.s .code 16 foo: add r0, r0 .align 3 add r0, r0 $ llvm-mc -triple armv7-none-linux align.s -filetype=obj -o t.o $ llvm-objdump -triple thumbv7 -d t.o Disassembly of section .text: foo: 0: 00 44 add r0, r0 2: 00 f0 20 e3 blx #4195904 6: 00 00 movs r0, r0 8: 00 44 add r0, r0 This shows that we have actually emitted an arm nop (e320f000) instead of a thumb nop. Unfortunately, this encodes to a thumb branch which causes bad things to happen when compiling assembly code with align directives. The fix is to notify the ARMAsmBackend when we switch mode. The MCMachOStreamer was already doing this correctly. This patch makes the same change for the MCElfStreamer. There is still a bug in the way nops are emitted for alignment because the MCAlignment fragment does not store the correct mode. The ARMAsmBackend will emit nops for the last mode it knew about. In the example above, we still generate an arm nop if we add a `.code 32` to the end of the file. PR18019 llvm-svn: 195677	2013-11-25 19:11:13 +00:00
Bill Wendling	0fe82ef0aa	Unrevert r195599 with testcase fix. I'm not sure how it was checking for the wrong values... PR18023. llvm-svn: 195670	2013-11-25 18:05:22 +00:00
Rafael Espindola	a355ffef1b	Fix .comm and .lcomm on COFF. These should not use COMDATs. GNU as uses .bss for .lcomm and section 0 for .comm. Given static int a; int b; MSVC puts both in .bss. This patch then puts both .comm and .lcomm on .bss. With this change we agree with gas on .lcomm, are much closer on .comm and clang-cl matches msvc on the above example. llvm-svn: 195654	2013-11-25 16:06:04 +00:00
Amara Emerson	368f3c89e8	[ARM] Enable FeatureMP for Cortex-A5 by default. Patch by Oliver Stannard. llvm-svn: 195640	2013-11-25 13:17:15 +00:00
Amara Emerson	dfecbfdfc2	Revert r195599 as it broke the builds. llvm-svn: 195636	2013-11-25 11:24:18 +00:00
Daniel Sanders	054e9e0703	Fixed tryFoldToZero() for vector types that need expansion. Summary: Moved the requirement for SelectionDAG::getConstant() to return legally typed nodes slightly earlier. There were two optional DAGCombine passes that were missed out and were required to produce type-legal DAGs. Simplified a code-path in tryFoldToZero() to use SelectionDAG::getConstant(). This provides support for both promoted and expanded vector types whereas the previous code only supported promoted vector types. Fixes a "Type for zero vector elements is not legal" assertion detected by an llvm-stress generated test. Reviewers: resistor CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2251 llvm-svn: 195635	2013-11-25 11:14:43 +00:00
Bill Wendling	f4bc87d59d	Don't look past volatile loads. A volatile load should block us from trying to coalesce stores. PR18023 llvm-svn: 195599	2013-11-25 05:01:21 +00:00
Hao Liu	66ab312f94	Fixed a bug about disassembling AArch64 post-index load/store single element instructions. i.e. echo "0x00 0x04 0x80 0x0d" | ../bin/llvm-mc -triple=aarch64 -mattr=+neon -disassemble echo "0x00 0x00 0x80 0x0d" | ../bin/llvm-mc -triple=aarch64 -mattr=+neon -disassemble will be disassembled into the same instruction st1 {v0b}[0], [x0], x0. llvm-svn: 195591	2013-11-25 01:53:26 +00:00
Venkatraman Govindaraju	d0d03fae95	[Sparc] Emit large negative adjustments to SP/FP with sethi+xor instead of sethi+or. This generates correct code for both sparc32 and sparc64. llvm-svn: 195576	2013-11-24 20:23:25 +00:00
Venkatraman Govindaraju	73dd53211d	[SparcV9]: Do not emit .register directives for global registers that are clobbered by calls but not used in the function itself. llvm-svn: 195574	2013-11-24 18:41:49 +00:00
Venkatraman Govindaraju	0c27a5ac2c	[SparcV9] Enable custom lowering of DYNAMIC_STACKALLOC in sparc64. llvm-svn: 195573	2013-11-24 17:41:41 +00:00
Reed Kotler	6088c0e228	Make sure that, when C++ code emits LwConstant32 pseudos, the result corresponds to what is needed for constant islands. The prescan method for Mips16 constant islands will eventually go away. It is only temporary and should be done earlier when the instructions are first created or from the DAG. If we keep it here we need to better handle the situation where constant islands is called multiple times, since we don't want to prescan more than once. llvm-svn: 195569	2013-11-24 06:18:50 +00:00
Reed Kotler	6a8a859a63	Update older test cases for latest patch. llvm-svn: 195566	2013-11-24 03:37:56 +00:00
Reed Kotler	eb75f46c95	Fix a funny bug I introduced during conversion of ARM constant islands to Mips. I had to move some code and I moved a declaration forward past its first use in the function, but by nutty coincidence there was another variable of the same name and type, with a completely unrelated function, that was declared globally in the class, so no compilation error ensued. It required some unusual conditions for it to even matter. Caused test case casts.c in test-suite to fail during compilation with a duplicate symbol error. I would have noticed it during final code review for this port. llvm-svn: 195565	2013-11-24 02:53:09 +00:00
Manman Ren	e53617a3e6	Debug Info: update testing cases to specify the debug info version number. We are going to drop debug info without a version number or with a different version number, to make sure we don't crash when we see bitcode files with different debug info metadata format. Make tests more robust by removing hard-coded metadata numbers in CHECK lines. llvm-svn: 195535	2013-11-23 01:16:29 +00:00
Tom Stellard	5da7926d0a	R600/SI: Fixing handling of condition codes We were ignoring the ordered/unordered bits and also the signed/unsigned bits of condition codes when lowering the DAG to MachineInstrs. NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195514	2013-11-22 23:07:58 +00:00
Manman Ren	f0d5143ea6	Debug Info: update testing cases to specify the debug info version number. We are going to drop debug info without a version number or with a different version number, to make sure we don't crash when we see bitcode files with different debug info metadata format. llvm-svn: 195504	2013-11-22 21:49:45 +00:00
Jim Grosbach	02f7297367	X86: Perform integer comparisons at i32 or larger. Utilizing the 8 and 16 bit comparison instructions, even when an input can be folded into the comparison instruction itself, is typically not worth it. There are too many partial register stalls as a result, leading to significant slowdowns. By always performing comparisons on at least 32-bit registers, performance of the calculation chain leading to the comparison improves. Continue to use the smaller comparisons when minimizing size, as that allows better folding of loads into the comparison instructions. rdar://15386341 llvm-svn: 195496	2013-11-22 19:57:47 +00:00
Matt Arsenault	35acaad8c7	StructurizeCFG: Fix verification failure with some loops. If the beginning of the loop was also the entry block of the function, branches were inserted to the entry block which isn't allowed. If this occurs, create a new dummy function entry block that branches to the start of the loop. llvm-svn: 195493	2013-11-22 19:24:39 +00:00
Matt Arsenault	9afcbf3562	StructurizeCFG: Fix inverting a branch on an argument llvm-svn: 195492	2013-11-22 19:24:37 +00:00
Paul Robinson	0ef8735f0a	Teach ISel not to optimize 'optnone' functions (revised). Improvements over r195317: - Set/restore EnableFastISel flag instead of just running FastISel within SelectAllBasicBlocks; the flag is checked in various places, and FastISel won't run properly if those places don't do the right thing. - Test looks for normal ISel versus FastISel behavior, and not something more subtle that doesn't work everywhere. Based on work by Andrea Di Biagio. llvm-svn: 195491	2013-11-22 19:11:24 +00:00
Andrew Trick	0167d0293a	patchpoint: factor SD builder code for live vars. Plain stackmap also optimizes Constant values now. llvm-svn: 195488	2013-11-22 19:07:36 +00:00
Rafael Espindola	524e7f35de	Add a fixed version of r195470 back. The fix is simply to use CurI instead of I when handling aliases to avoid accessing an invalid iterator. original message: Convert linkonce* to weak* instead of strong. Also refactor the logic into a helper function. This is an important improvement on mingw where the linker complains about mixed weak and strong symbols. Converting to weak ensures that the symbol is not dropped, but keeps in a comdat, making the linker happy. llvm-svn: 195477	2013-11-22 17:58:12 +00:00
Michael Liao	0f7c6dee5e	Fix PR18014 - When simplifying the mask generation for BLEND, check whether that mask is also consumed by other non-BLEND insns. If true, skip that simplification. llvm-svn: 195476	2013-11-22 17:56:57 +00:00
Richard Sandiford	d5298a3795	[SystemZ] Fix TMHH and TMHL usage for z10 with -O0 I've no idea why I decided to handle TMxx differently from all the other high/low logic operations, but it was a stupid thing to do. The high registers aren't available as separate 32-bit registers on z10, so subreg_h32 can't be used on a GR64 there. I've normally been testing with z196 and with -O3 and so hadn't noticed this until now. llvm-svn: 195473	2013-11-22 17:28:28 +00:00
Rafael Espindola	749aa1e00d	Revert "Convert linkonce* to weak* instead of strong." This reverts commit r195470. Debugging failure in some bots. llvm-svn: 195472	2013-11-22 17:09:34 +00:00
Richard Sandiford	82ac8f6b68	Add a Scalarizer pass. llvm-svn: 195471	2013-11-22 16:58:05 +00:00
Rafael Espindola	2ac1404bee	Convert linkonce* to weak* instead of strong. Also refactor the logic into a helper function. This is an important improvement on mingw where the linker complains about mixed weak and strong symbols. Converting to weak ensures that the symbol is not dropped, but keeps in a comdat, making the linker happy. llvm-svn: 195470	2013-11-22 16:14:30 +00:00
Daniel Sanders	f04c74ae00	[mips][msa] Add test case that should have been added in r195456. llvm-svn: 195469	2013-11-22 15:47:18 +00:00
Rafael Espindola	7660b5bf36	Don't produce tail calls when the caller is x86_thiscallcc. The callee will not pop the stack for us. llvm-svn: 195467	2013-11-22 15:18:28 +00:00
Tim Northover	65386891a6	ARM: use CHECK-LABEL on a test. llvm-svn: 195457	2013-11-22 13:25:07 +00:00
Richard Barton	98aacc3f23	Add support for Cortex-A12. Patch by Oliver Stannard! llvm-svn: 195448	2013-11-22 11:53:16 +00:00
Daniel Sanders	d301ede02d	[mips][msa] Float vector constants cannot use ldi.[wd] directly. Bitcast from the appropriate integer vector type. Fixes an instruction selection failure detected by llvm-stress. llvm-svn: 195444	2013-11-22 11:24:50 +00:00
Kostya Serebryany	3c8539795c	Revert r195318 as it causes miscompilation (PR18029) llvm-svn: 195439	2013-11-22 10:30:39 +00:00
Hao Liu	4c6cc894d2	Fix the bugs about AArch64 Load/Store vector types and bitcast between i64 and vector types. e.g. "%tmp = load <2 x i64>* %ptr" can't be selected. "%tmp = bitcast i64 %in to <2 x i32>" can't be selected. llvm-svn: 195424	2013-11-22 08:47:22 +00:00
Jiangning Liu	a50f9e81f3	For AArch64 back-end instruction selection, lower Neon_Lowxxx with EXTRCT_SUBREG. llvm-svn: 195408	2013-11-22 02:45:13 +00:00
NAKAMURA Takumi	b244bea94b	Tweak 3 tests in llvm/test/CodeGen/X86 to add -mcpu=generic since r195383. They failed on bdver2 buildslave. FIXME: FileCheck-ize them. llvm-svn: 195407	2013-11-22 02:28:04 +00:00
Yi Jiang	74286d427a	SLP Vectorizer: Extract cost will only be added once even if the scalar has multiple external uses. llvm-svn: 195406	2013-11-22 01:57:02 +00:00
Tom Stellard	c2f05239d7	SelectionDAG: Optimize expansion of vec_type = BITCAST scalar_type The legalizer can now do this type of expansion for more type combinations without loading and storing to and from the stack. NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195398	2013-11-22 00:41:05 +00:00
Eric Christopher	8b3b8ae8cc	In Dwarf 3 (and Dwarf 2) attributes whose values are offsets into a section use the form DW_FORM_data4 whilst in Dwarf 4 and later they use the form DW_FORM_sec_offset. This patch updates the places where such attributes are generated to use the appropriate form depending on the Dwarf version. The DIE entries affected have the following tags: DW_AT_stmt_list, DW_AT_ranges, DW_AT_location, DW_AT_GNU_pubnames, DW_AT_GNU_pubtypes, DW_AT_GNU_addr_base, DW_AT_GNU_ranges_base It also adds a hidden command line option "--dwarf-version=<uint>" to llc which allows the version of Dwarf to be generated to override what is specified in the metadata; this makes it possible to update existing tests to check the debugging information generated for both Dwarf 4 (the default) and Dwarf 3 using the same metadata. Patch (slightly modified) by Keith Walker! llvm-svn: 195391	2013-11-21 23:46:41 +00:00
Ekaterina Romanova	eda4e2e4a7	SHLD/SHRD are VectorPath (microcode) instructions known to have poor latency on certain architectures. While generating SHLD/SHRD instructions is acceptable when optimizing for size, optimizing for speed on these platforms should be implemented using alternative sequences of instructions composed of add, adc, shr, shl, or and lea which are directPath instructions. These alternative instructions not only have a lower latency but they also increase the decode bandwidth by allowing simultaneous decoding of a third directPath instruction. AMD processor families K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD processors, I disabled folding (or (x << c) | (y >> (64 - c))) when we are not optimizing for size. It might be beneficial to disable this folding for some Intel processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel processors, I haven't disabled this peephole for Intel. llvm-svn: 195383	2013-11-21 23:21:26 +00:00
Peter Collingbourne	89b5505b6b	Introduce two command-line flags for the instrumentation pass to control whether the labels of pointers should be ignored in load and store instructions The new command line flags are -dfsan-ignore-pointer-label-on-store and -dfsan-ignore-pointer-label-on-load. Their default value matches the current labelling scheme. Additionally, the function __dfsan_union_load is marked as readonly. Patch by Lorenzo Martignoni! Differential Revision: http://llvm-reviews.chandlerc.com/D2187 llvm-svn: 195382	2013-11-21 23:20:54 +00:00
Artyom Skrobov	b6c8a8a69f	[ARM] add the overlooked tests for Cortex-A7 build attributes llvm-svn: 195365	2013-11-21 16:22:39 +00:00
Daniel Sanders	0e60951a47	[mips][msa] Fix a corner case in performORCombine() when combining nodes into VSELECT. Mask == ~InvMask asserts if the width of Mask and InvMask differ. The combine isn't valid (with two exceptions, see below) if the widths differ so test for this before testing Mask == ~InvMask. In the specific cases of Mask=~0 and InvMask=0, as well as Mask=0 and InvMask=~0, the combine is still valid. However, there are more appropriate combines that could be used in these cases such as folding x & 0 to 0, or x & ~0 to x. llvm-svn: 195364	2013-11-21 16:11:31 +00:00
Daniel Sanders	a556d0abd7	Add support for legalizing SETNE/SETEQ by inverting the condition code and the result of the comparison. Summary: LegalizeSetCCCondCode can now legalize SETEQ and SETNE by returning the inverse condition and requesting that the caller invert the result of the condition. The caller of LegalizeSetCCCondCode must handle the inverted CC, and they do so as follows: SETCC, BR_CC: Invert the result of the SETCC with SelectionDAG::getNOT() SELECT_CC: Swap the true/false operands. This is necessary for MSA which lacks an integer SETNE instruction. Reviewers: resistor CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2229 llvm-svn: 195355	2013-11-21 13:24:49 +00:00
Evgeniy Stepanov	405470ce24	[msan] Propagate condition origin in select instruction. llvm-svn: 195349	2013-11-21 12:00:24 +00:00
Daniel Sanders	5e17920764	[mips][msa/dsp] Only do DSP combines if DSP is enabled. Fixes a crash (null pointer dereferenced) when MSA is enabled. llvm-svn: 195343	2013-11-21 11:40:14 +00:00
Evgeniy Stepanov	0ee8c16502	Use multiple filecheck prefixes in msan instrumentation tests. llvm-svn: 195342	2013-11-21 11:37:16 +00:00
NAKAMURA Takumi	add94e0548	Revert r195317 (and r195333), "Teach ISel not to optimize 'optnone' functions." It broke, at least, i686 target. It is reproducible with "llc -mtriple=i686-unknown". FYI, it didn't appear to add either "-O0" or "-fast-isel". llvm-svn: 195339	2013-11-21 10:55:15 +00:00
Kostya Serebryany	4b3f2e0afc	add 'REQUIRES: asserts' to a test that uses 'llc -debug'; this fixes the no-asserts build llvm-svn: 195333	2013-11-21 09:28:16 +00:00
Ana Pazos	86d72bbede	Implemented Neon scalar vdup_lane intrinsics. Fixed scalar dup alias and added test case. llvm-svn: 195330	2013-11-21 08:16:15 +00:00
Ana Pazos	5ddc31e426	Implemented Neon scalar by element intrinsics. Intrinsics implemented: vqdmull_lane, vqdmulh_lane, vqrdmulh_lane, vqdmlal_lane, vqdmlsl_lane scalar Neon intrinsics. llvm-svn: 195327	2013-11-21 07:37:04 +00:00
Kostya Serebryany	1513e9969b	Don't speculate loads under ThreadSanitizer Summary: Don't speculate loads under ThreadSanitizer. This fixes https://code.google.com/p/thread-sanitizer/issues/detail?id=40 Also discussed here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-November/067929.html Reviewers: chandlerc Reviewed By: chandlerc CC: llvm-commits, dvyukov Differential Revision: http://llvm-reviews.chandlerc.com/D2227 llvm-svn: 195324	2013-11-21 07:29:28 +00:00
Bill Wendling	07a5510fa2	The basic problem is that some mainstream programs cannot deal with the way clang optimizes tail calls, as in this example: int foo(void); int bar(void) { return foo(); } where the call is transformed to: calll .L0$pb .L0$pb: popl %eax .Ltmp0: addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax movl foo@GOT(%eax), %eax popl %ebp jmpl *%eax # TAILCALL However, the GOT references must all be resolved at dlopen() time, and so this approach cannot be used with lazy dynamic linking (e.g. using RTLD_LAZY), which usually populates the PLT with stubs that perform the actual resolving. This patch changes X86TargetLowering::LowerCall() to skip tail call optimization, if the called function is a global or external symbol. Patch by Dimitry Andric! PR15086 llvm-svn: 195318	2013-11-21 07:04:30 +00:00
Paul Robinson	eba6ab82dd	Teach ISel not to optimize 'optnone' functions. Based on work by Andrea Di Biagio. llvm-svn: 195317	2013-11-21 06:33:32 +00:00
Reed Kotler	caba86b795	Add, to constant islands, long jumps similar to ARM far branch. llvm-svn: 195312	2013-11-21 05:13:23 +00:00
Hal Finkel	fb82ed6bb5	PPC popcnt[dw] do not have record forms The instruction definitions incorrectly specified that popcntd and popcntw have record forms; they do not. This mistake was causing invalid code generation. llvm-svn: 195272	2013-11-20 20:54:55 +00:00
Benjamin Kramer	40f6475264	MachineBlockPlacement: Strengthen the source order bias when picking an exit block. We now only allow breaking source order if the exit block frequency is significantly higher than the other exit block. The actual bias is currently under a flag so the best cut-off can be found; the flag defaults to the old behavior. The idea is to get some benchmark coverage over different values for the flag and pick the best one. When we require the new frequency to be at least 20% higher than the old frequency I see a 5% speedup on zlib's deflate when compressing a random file on x86_64/westmere. Hal reported a small speedup on Fhourstones on a BG/Q and no regressions in the test suite. The test case is the full long_match function from zlib's deflate. I was reluctant to add it for previous tweaks to branch probabilities because it's large and potentially fragile, but changed my mind since it's an important use case and more likely to break with all the current work going into the PGO infrastructure. Differential Revision: http://llvm-reviews.chandlerc.com/D2202 llvm-svn: 195265	2013-11-20 19:08:44 +00:00
Daniel Sanders	29b990c693	FileCheck: fix a bug with multiple --check-prefix options. Similar to r194565 Summary: Directives are being ignored, when they occur between a partial-word false match and any match on another prefix. For example, with FOO and BAR prefixes: _FOO FOO: foo BAR: bar FileCheck incorrectly matches: fog bar This happens because FOO falsely matched as a partial word at '_FOO' and was ignored while BAR matched at 'BAR:'. The match of BAR is incorrectly returned as the 'first match' causing the FOO directive to be discarded. Fixed this the same way as r194565 (D2166) did for a similar test case. The partial-word false match should be counted as a match for the purposes of finding the first match of a prefix, but should be returned as a false match using CheckTy::CheckNone so that it isn't treated as a directive. Fixes PR17995 Reviewers: samsonov, arsenm Reviewed By: samsonov CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2228 llvm-svn: 195248	2013-11-20 13:25:05 +00:00
Elena Demikhovsky	692524f3bd	AVX-512: Concat 4 128-bit vectors in one 512-bit vector. llvm-svn: 195229	2013-11-20 09:10:40 +00:00
Yuchen Wu	734fa40b2a	llvm-cov: Added file checksum to gcno and gcda files. Instead of permanently outputting "MVLL" as the file checksum, clang will create gcno and gcda checksums by hashing the destination block numbers of every arc. This allows for llvm-cov to check if the two gcov files are synchronized. Regenerated the test files so they contain the checksum. Also added negative test to ensure error when the checksums don't match. llvm-svn: 195191	2013-11-20 04:15:05 +00:00
Hal Finkel	d1fc028d62	PPC: Optimize rldicl generation for masked shifts Masking operations (where only some number of the low bits are being kept) are selected to rldicl(x, 0, mb). If x is a logical right shift (which would become rldicl(y, 64-n, n)), we might be able to fold the two instructions together: rldicl(rldicl(x, 64-n, n), 0, mb) -> rldicl(x, 64-n, mb) for n <= mb The right shift is really a left rotate followed by a mask, and if the explicit mask is a more-restrictive sub-mask of the mask implied by the shift, only one rldicl is needed. llvm-svn: 195185	2013-11-20 01:10:15 +00:00
David Blaikie	e40c1e850f	DebugInfo: Partial implementation of DWARF type units. Emit DW_TAG_type_units into the debug_info section using compile unit headers. This is bogus/unusable by debuggers, but testable and provides more isolated review. Subsequent patches will include support for type unit headers and emission into the debug_types section, as well as comdat grouping the types based on their hash. Also the CompileUnit type will be renamed 'Unit' and relevant portions pulled out into respective CompileUnit and TypeUnit types. llvm-svn: 195166	2013-11-19 23:08:21 +00:00
Arnold Schwaighofer	242935ec8c	SLPVectorizer: Fix stale for Value pointer array We are slicing an array of Value pointers and process those slices in a loop. The problem is that we might invalidate a later slice by vectorizing a former slice. Use a WeakVH to track the pointer. If the pointer is deleted or RAUW'ed we can tell. The test case will only fail when running with libgmalloc. radar://15498655 llvm-svn: 195162	2013-11-19 22:20:20 +00:00
Petar Jovanovic	ddac1ebfb9	[mips] Resolve relocation for the stubs in MCJIT when load address is known Instead of processing relocation for branch to stubs right away, emit a modified relocation and add it to queue to be resolved later when final load address is known. This resolves seven MIPS MCJIT issues that were caused by missing relocation fixups at the end. llvm-svn: 195157	2013-11-19 21:56:00 +00:00
Rafael Espindola	4833910f66	Make it explicit that nulls are not allowed in names. The object files we support use null terminated strings, so there is no way to support these. This patch adds an assert to catch bad API use and an error check in the .ll parser. llvm-svn: 195155	2013-11-19 21:12:39 +00:00
Jack Carter	6943b6e5c6	reverts 195057 per request llvm-svn: 195152	2013-11-19 20:53:28 +00:00
Rafael Espindola	5d21406399	Support multiple COFF sections with the same name but different COMDAT. This is the first step to fix pr17918. It extends the .section directive a bit, inspired by what the ELF one looks like. The problem with using linkonce is that given .section foo .linkonce.... .section foo .linkonce we would already have switched sections when getting to .linkonce. The cleanest solution seems to be to add the comdat information in the .section itself. llvm-svn: 195148	2013-11-19 19:52:52 +00:00
Cameron McInally	9232b52359	Fix assembly operands for the SSE2 cvtsd2ss instruction. llvm-svn: 195129	2013-11-19 14:36:00 +00:00
Simon Atanasyan	226923909e	[Mips] Adjust float ABI settings in case of MIPS16 mode. Hard float for mips16 means essentially to compile as soft float but to use a runtime library for soft float that is written with native mips32 floating point instructions (those runtime routines run in mips32 hard float mode). The patch reviewed by Reed Kotler. llvm-svn: 195123	2013-11-19 12:20:17 +00:00
Chandler Carruth	4d8e469cd3	Fix an issue where SROA computed different results based on the relative order of slices of the alloca which have exactly the same size and other properties. This was found by a perniciously unstable sort implementation used to flush out buggy uses of the algorithm. The fundamental idea is that findCommonType should return the best common type it can find across all of the slices in the range. There were two bugs here previously: 1) We would accept an integer type smaller than a byte-width multiple, and if there were different bit-width integer types, we would accept the first one. This caused an actual failure in the testcase updated here when the sort order changed. 2) If we found a bad combination of types or a non-load, non-store use before an integer typed load or store we would bail, but if we found the integer typed load or store, we would use it. The correct behavior is to always use an integer typed operation which covers the partition if one exists. While a clever debugging sort algorithm found problem #1 in our existing test cases, I have no useful test case ideas for #2. I spotted it by inspection when looking at this code. llvm-svn: 195118	2013-11-19 09:03:18 +00:00
Daniel Jasper	9d3984a876	Add .clang-format without column limit to subdirectory tests/. A column limit in the test folder can lead to trouble as the RUN, CHECK, etc. comments can potentially be broken over multiple lines changing their meaning. Without column limit, clang-format will simply keep the test author's line breaks. llvm-svn: 195100	2013-11-19 04:26:05 +00:00
Andrew Trick	9f7d826e8a	Use symbolic operands in the patchpoint folding routine and fix a spilling bug. Fixes <rdar://15487687> [JS] AnyRegCC argument ends up being spilled llvm-svn: 195094	2013-11-19 03:29:59 +00:00
Hao Liu	b26dfe0306	Implement AArch64 neon instructions class SIMD lsone and SIMD lone-post. llvm-svn: 195078	2013-11-19 02:17:05 +00:00
Jiangning Liu	42b7a215f4	Implement AArch64 SISD intrinsics for vget_high and vget_low. llvm-svn: 195074	2013-11-19 01:46:48 +00:00
Kevin Qin	7b74269765	implement MC layer of AArch64 neon instruction PMULL and PMULL2 with 128 bit integer. llvm-svn: 195072	2013-11-19 01:40:25 +00:00
Jiangning Liu	7c858f236d	Add predicate for AArch64 crypto instructions. llvm-svn: 195071	2013-11-19 01:38:31 +00:00
Jack Carter	8bb31d387d	[Mips] Support for MicroMips STO refactoring. No true functional changes. Change the "hack" name of emitMipsHackSTOCG to emitSymSTO. Remove demonstration code in AsmParser for emitMipsHackSTOCG and emitMipsHackELFFlags. The STO field is in an ELF symbol and is not an explicit directive. That said, we are missing the complement call in AsmParser and that will need to be addressed soon. XFAIL dummy tests for emitMipsHackELFFlags and emitMipsHackELFFlags. These will be built out with following patches. llvm-svn: 195067	2013-11-19 01:25:18 +00:00
David Blaikie	97d9c49ba1	llvm-dwarfdump: support for emitting only the debug_types section using -debug-dump llvm-svn: 195063	2013-11-19 00:29:42 +00:00
Reid Kleckner	552118c34a	Revert "COFF: Emit all MCSymbols rather than filtering out some of them" This reverts commit r190888, to fix PR17967. The original change wasn't the right way to get @feat.00 into the object file. The right fix is to make @feat.00 be a global symbol. llvm-svn: 195053	2013-11-18 23:08:12 +00:00
Adrian Prantl	afeac86924	Debug info: Let LowerDbgDeclare perform the dbg.declare -> dbg.value lowering only for load/stores to scalar allocas. The resulting values confuse the backend and don't add anything because we can describe array-allocas with a dbg.declare intrinsic just fine. rdar://problem/15464571 llvm-svn: 195052	2013-11-18 23:04:38 +00:00
Paul Robinson	27ef03dc70	The 'optnone' attribute means don't inline anything into this function (except functions marked always_inline). Functions with 'optnone' must also have 'noinline' so they don't get inlined into any other function. Based on work by Andrea Di Biagio. llvm-svn: 195046	2013-11-18 21:44:03 +00:00
Matt Arsenault	be108f1643	R600/SI: Fix moveToVALU when the first operand is VSrc. Moving into a VSrc doesn't always work, since it could be replaced with an SGPR later. llvm-svn: 195042	2013-11-18 20:09:55 +00:00
Matt Arsenault	cdea5c8fe0	R600/SI: Fix multiple SGPR reads when using VCC. No other SGPR operands are allowed, so if VCC is used, move the other to a VGPR. llvm-svn: 195041	2013-11-18 20:09:50 +00:00
Matt Arsenault	485f69c9cf	R600/SI: Implement add i64, but do not yet enable. Test doesn't actually check the output. I need to fix add i64 being matched for the addressing calculations. llvm-svn: 195040	2013-11-18 20:09:47 +00:00
Matt Arsenault	62a8d8b89a	R600/SI: Move patterns to match add / sub to scalar instructions llvm-svn: 195034	2013-11-18 20:09:29 +00:00
Tom Stellard	84bb236e61	R600: Enable the IR structurizer by default llvm-svn: 195031	2013-11-18 19:43:44 +00:00
Tom Stellard	f1b1fa4727	R600: Fix a crash in the AMDILCFGStructurizer The ifPatternMatch() function was not correctly reporting the number of matches in some cases. llvm-svn: 195030	2013-11-18 19:43:38 +00:00
Tom Stellard	d0cdc72805	R600/SI: Fix illegal VGPR->SGPR copy inside of loop llvm-svn: 195026	2013-11-18 18:50:20 +00:00
Tom Stellard	47634da2de	R600/SI: Fix another case of illegal VGPR->SGPR copy llvm-svn: 195025	2013-11-18 18:50:15 +00:00
Alexey Samsonov	cbf7462c74	[ASan] Fix PR17867 - make sure ASan doesn't crash if use-after-scope and use-after-return are combined. llvm-svn: 195014	2013-11-18 14:53:55 +00:00
NAKAMURA Takumi	f5722be30d	[PR17978] Mark two ARM/fast-isel tests as XFAIL:vg_leak due to GV. llvm-svn: 195010	2013-11-18 13:50:19 +00:00
Arnold Schwaighofer	e4280ec4dd	LoopVectorizer: Extend the induction variable to a larger type In some cases the loop exit count computation can overflow. Extend the type to prevent most of those cases. The problem is loops like: int main () { int a = 1; char b = 0; lbl: a &= 4; b--; if (b) goto lbl; return a; } The backedge count is 255. The induction variable type is i8. If we add one to 255 to get the exit count we overflow to zero. To work around this issue we extend the type of the induction variable to i32 in the case of i8 and i16. PR17532 llvm-svn: 195008	2013-11-18 13:14:32 +00:00
Daniel Sanders	52b1c62a95	[mips] Fix 'ran out of registers' in MIPS32 with FP64 when generating code for (ConstantFP 0.0) Fixed an inappropriate use of BuildPairF64 when compiling for MIPS32 with FP64 which resulted in an impossible constraint on the register allocation. It now uses BuildPairF64_64. llvm-svn: 195007	2013-11-18 13:12:43 +00:00
Matheus Almeida	a941fd6ccd	[mips][msa] Update encoding of bnz.v (typo). Note that there's no hardware yet that relies on that encoding. llvm-svn: 195006	2013-11-18 13:09:54 +00:00
Matheus Almeida	f3405464c6	[mips][msa] Fix immediate value of LSA instruction as it was being wrongly encoded. The immediate field should be encoded as "imm - 1" as the CPU always adds one to that field. llvm-svn: 195004	2013-11-18 12:32:49 +00:00
Kevin Qin	eb2e892703	[AArch64 NEON]Add mov alias for simd copy instructions. Set some unspecified bits of INS/DUP to zero as ARMARM requested. llvm-svn: 194996	2013-11-18 09:20:32 +00:00
Hao Liu	fcc294f3dd	Implement the newly added ACLE functions for ld1/st1 with 2/3/4 vectors. The functions are like: vst1_s8_x2 ... llvm-svn: 194990	2013-11-18 06:31:53 +00:00
Bill Wendling	d1f1ad97d3	Testcase for PR17964 llvm-svn: 194961	2013-11-17 10:53:19 +00:00
Benjamin Kramer	61051e1fa4	DAGCombiner: Partially revert r192795, getNOT was fixed not to create illegal constants. llvm-svn: 194959	2013-11-17 10:40:03 +00:00
Hal Finkel	f09058ab9e	Add the cold attribute to error-reporting call sites Generally speaking, control flow paths with error reporting calls are cold. So far, error reporting calls are calls to perror and calls to fprintf, fwrite, etc. with stderr as the stream. This can be extended in the future. The primary motivation is to improve block placement (the cold attribute affects the static branch prediction heuristics). llvm-svn: 194943	2013-11-17 02:06:35 +00:00
Andrew Trick	bd486c29f4	Added a size field to the stack map record to handle subregister spills. Implementing this on bigendian platforms could get strange. I added a target hook, getStackSlotRange, per Jakob's recommendation to make this as explicit as possible. llvm-svn: 194942	2013-11-17 01:36:23 +00:00
Matt Arsenault	ae406d5aa1	Use right address space pointer size llvm-svn: 194940	2013-11-17 00:06:39 +00:00
Hal Finkel	cc70e01f05	Add a loop rerolling pass This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The transformation aims to take loops like this: for (int i = 0; i < 3200; i += 5) { a[i] += alpha * b[i]; a[i + 1] += alpha * b[i + 1]; a[i + 2] += alpha * b[i + 2]; a[i + 3] += alpha * b[i + 3]; a[i + 4] += alpha * b[i + 4]; } and turn them into this: for (int i = 0; i < 3200; ++i) { a[i] += alpha * b[i]; } and loops like this: for (int i = 0; i < 500; ++i) { x[3*i] = foo(0); x[3*i+1] = foo(0); x[3*i+2] = foo(0); } and turn them into this: for (int i = 0; i < 1500; ++i) { x[i] = foo(0); } There are two motivations for this transformation: 1. Code-size reduction (especially relevant, obviously, when compiling for code size). 2. Providing greater choice to the loop vectorizer (and generic unroller) to choose the unrolling factor (and a better ability to vectorize). The loop vectorizer can take vector lengths and register pressure into account when choosing an unrolling factor, for example, and a pre-unrolled loop limits that choice. This is especially problematic if the manual unrolling was optimized for a machine different from the current target. The current implementation is limited to single basic-block loops only. The rerolling recognition should work regardless of how the loop iterations are intermixed within the loop body (subject to dependency and side-effect constraints), but the significant restriction is that the order of the instructions in each iteration must be identical. This seems sufficient to capture all current use cases. This pass is not currently enabled by default at any optimization level. llvm-svn: 194939	2013-11-16 23:59:05 +00:00
Hal Finkel	79b1387151	Apply the InstCombine fptrunc sqrt optimization to llvm.sqrt InstCombine, in visitFPTrunc, applies the following optimization to sqrt calls: (fptrunc (sqrt (fpext x))) -> (sqrtf x) but does not apply the same optimization to llvm.sqrt. This is a problem because, to enable vectorization, Clang generates llvm.sqrt instead of sqrt in fast-math mode, and because this optimization is being applied to sqrt and not applied to llvm.sqrt, sometimes the fast-math code is slower. This change makes InstCombine apply this optimization to llvm.sqrt as well. This fixes the specific problem in PR17758, although the same underlying issue (optimizations applied to libcalls are not applied to intrinsics) exists for other optimizations in SimplifyLibCalls. llvm-svn: 194935	2013-11-16 21:29:08 +00:00
Matt Arsenault	3f72b0ae69	Fix assert on unaligned access to global with different address space size. llvm-svn: 194934	2013-11-16 20:50:54 +00:00
Matt Arsenault	82257ae18e	Fix codegen for null different sized pointer. llvm-svn: 194932	2013-11-16 20:24:41 +00:00
Vincent Lejeune	2a45033d9c	R600: Make dot_4 instructions predicable llvm-svn: 194927	2013-11-16 16:24:41 +00:00
Benjamin Kramer	0519e29d1b	InstCombine: fold (A >> C) == (B >> C) --> (A^B) < (1 << C) for constant Cs. This is common in bitfield code. llvm-svn: 194925	2013-11-16 16:00:48 +00:00
Matt Arsenault	4b9d0ada44	Use correct size for address space in BasicAA. The tests just hit this with a different sized address space since I haven't figured out how to use this to break it. I thought I committed this a long time ago, and I'm not sure why missing this hasn't caused any problems. llvm-svn: 194903	2013-11-16 00:36:43 +00:00
Eric Christopher	61a58988fa	For dwarf4 use the correct form for referencing debug_loc locations, and update test cases accordingly. This doesn't affect the output dumped using llvm-dwarfdump, but readelf does now dump the debug_loc section. llvm-svn: 194898	2013-11-16 00:18:40 +00:00
Ana Pazos	b1568fd504	Implemented aarch64 Neon scalar vmulx_lane intrinsics Implemented aarch64 Neon scalar vfma_lane intrinsics Implemented aarch64 Neon scalar vfms_lane intrinsics Implemented legacy vmul_n_f64, vmul_lane_f64, vmul_laneq_f64 intrinsics (v1f64 parameter type) using Neon scalar instructions. Implemented legacy vfma_lane_f64, vfms_lane_f64, vfma_laneq_f64, vfms_laneq_f64 intrinsics (v1f64 parameter type) using Neon scalar instructions. llvm-svn: 194888	2013-11-15 23:32:10 +00:00
Arnold Schwaighofer	01b6f1cc9a	LoopVectorizer: Use abi alignment for accesses with no alignment When we vectorize a scalar access with no alignment specified, we have to set the target's abi alignment of the scalar access on the vectorized access. Using the same alignment of zero would be wrong because most targets will have a bigger abi alignment for vector types. This probably fixes PR17878. llvm-svn: 194876	2013-11-15 23:09:33 +00:00
Chad Rosier	6b1d577e71	[AArch64] Fix the scalar NEON ACLE functions so that they return float/double rather than the vector equivalent. llvm-svn: 194853	2013-11-15 21:28:10 +00:00
Rui Ueyama	30dec160ae	Path: Recognize COFF import library file magic. Summary: Make identify_magic to recognize COFF import file. Reviewers: Bigcheese CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2165 llvm-svn: 194852	2013-11-15 21:22:02 +00:00
Manman Ren	2189fb16b9	ArgumentPromotion: correctly transfer TBAA tags and alignments. We used to use std::map<IndicesVector, LoadInst*> for OriginalLoads, and when we try to promote two arguments, they will both write to OriginalLoads causing created loads for the two arguments to have the same original load. And the same tbaa tag and alignment will be put to the created loads for the two arguments. The fix is to use std::map<std::pair<Argument*, IndicesVector>, LoadInst*> for OriginalLoads, so each Argument will write to different parts of the map. PR17906 llvm-svn: 194846	2013-11-15 20:41:15 +00:00
Bob Wilson	d433cf7463	Avoid illegal integer promotion in fastisel Stop folding constant adds into GEP when the type size doesn't match. Otherwise, the adds' operands are effectively being promoted, changing the conditions of an overflow. Results are different when: sext(a) + sext(b) != sext(a + b) Problem originally found on x86-64, but also fixed issues with ARM and PPC, which used similar code. <rdar://problem/15292280> Patch by Duncan Exon Smith! llvm-svn: 194840	2013-11-15 19:09:27 +00:00
Tom Stellard	01fa6ad95f	R600/SI: Add VReg_96 register class to SIRegisterInfo::hasVGPRs() This fixes a crash with GNOME settings manager. llvm-svn: 194836	2013-11-15 18:26:45 +00:00
Daniel Sanders	862d22db33	[mips][msa] Merge basic_operations_little.ll into basic_operations.ll. Now that FileCheck supports multiple check prefixes, we don't need to keep the little and big endian versions of this test separate anymore. Merge them back together. llvm-svn: 194826	2013-11-15 17:24:41 +00:00
Cameron McInally	cae8bdeb82	Add AVX512 unmasked FMA intrinsics and support. llvm-svn: 194824	2013-11-15 17:01:14 +00:00

21891 Commits