llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00

Author	SHA1	Message	Date
Matt Arsenault	9be7600d7c	R600/SI: Allow commuting with source modifiers llvm-svn: 220066	2014-10-17 18:00:48 +00:00
Matt Arsenault	2d0382bc47	R600/SI: Simplify code with hasModifiersSet llvm-svn: 220065	2014-10-17 18:00:45 +00:00
Matt Arsenault	9c6d84b515	R600/SI: Fix general commuting breaking src mods The generic code trying to use findCommutedOpIndices won't understand that it needs to swap the modifier operands also, so it should fail if they are set. llvm-svn: 220064	2014-10-17 18:00:43 +00:00
Matt Arsenault	c43bde1e40	R600/SI: Cleanup code with ChangeToFPImmediate llvm-svn: 220063	2014-10-17 18:00:41 +00:00
Matt Arsenault	69078d03ff	R600/SI: Allow commuting fp immediates llvm-svn: 220062	2014-10-17 18:00:39 +00:00
Matt Arsenault	082244cff4	R600/SI: Use early return instead of checking condition twice Any commutable instruction will have at least src1. llvm-svn: 220061	2014-10-17 18:00:37 +00:00
Matt Arsenault	0b41e1680b	R600/SI: Use complex pattern for MUBUF load patterns. This eliminates a use of the SI_ADDR64_RSRC pseudo llvm-svn: 220057	2014-10-17 17:43:00 +00:00
Matt Arsenault	3ce8aba254	R600/SI: Remove SI_BUFFER_RSRC pseudo Just use REG_SEQUENCE directly, so there are fewer instructions to deal with later. llvm-svn: 220056	2014-10-17 17:42:56 +00:00
Andrea Di Biagio	7d3fa43e58	[X86] Fix missed selection of non-temporal store of zero vector. When the input to a store instruction was a zero vector, the backend always selected a normal vector store regardless of the non-temporal hint. This is fixed by this patch. This fixes PR19370. llvm-svn: 220054	2014-10-17 17:27:06 +00:00
James Molloy	b06648acfb	[AArch64] Fix a silent codegen fault in BUILD_VECTOR lowering. We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same. While we're at it, avoid writing to std::vector::end()... Bug found with random testing and a lot of coffee. llvm-svn: 220051	2014-10-17 17:06:31 +00:00
Bill Schmidt	d3f8b7e4eb	[PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64 types, but does not yet use lxvw4x and stxvw4x for 4x32 types. This patch adds that support. As with lxvd2x/stxvd2x, this involves straightforward overriding of the patterns normally recognized for lvx/stvx, with preference given to the VSX patterns when VSX is enabled. In addition, the logic for permitting misaligned memory accesses is modified so that v4f32 and v4i32 are treated the same as v2f64 and v2i64 when VSX is enabled. Finally, the DAG generation for unaligned loads is changed to just use a normal LOAD (which will become lxvw4x) on P8 and later hardware, where unaligned loads are preferred over lvsl/lvx/lvx/vperm. A number of tests now generate the VSX loads/stores instead of lvx/stvx, so this patch adds VSX variants to those tests. I've also added <4 x float> tests to the vsx.ll test case, and created a vsx-p8.ll test case to be used for testing code generation for the P8Vector feature. For now, that simply tests the unaligned load/store behavior. This has been tested along with a temporary patch to enable the VSX and P8Vector features, with no new regressions encountered with or without the temporary patch applied. llvm-svn: 220047	2014-10-17 15:13:38 +00:00
Jan Vesely	1949cffd79	Mips: Only set divrem i64 to custom on 64bit Reviewed-by: Daniel Sanders <daniel.sanders@imgtec.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 220046	2014-10-17 14:45:28 +00:00
Vasileios Kalintiris	86aabae168	[mips] Add support for COP1's Branch-On-Cond-Likely instructions Summary: Depends on D5782 Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5802 llvm-svn: 220042	2014-10-17 14:08:28 +00:00
Vasileios Kalintiris	f0e37a4687	[mips] Add support for COP0's Branch-On-Cond-Likely instructions Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5782 llvm-svn: 220036	2014-10-17 12:38:35 +00:00
Akira Hatanaka	1c4225a09b	ARM: Fix a bug which was causing convergence failure in constant-island pass. The bug is in ARMConstantIslands::createNewWater where the upper bound of the new water split point is computed: // This could point off the end of the block if we've already got constant // pool entries following this block; only the last one is in the water list. // Back past any possible branches (allow for a conditional and a maximally // long unconditional). if (BaseInsertOffset + 8 >= UserBBI.postOffset()) { BaseInsertOffset = UserBBI.postOffset() - UPad - 8; DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset)); } The split point is supposed to be somewhere between the machine instruction that loads from the constant pool entry and the end of the basic block, before branch instructions. The code above is fine if the basic block is large enough and there are a sufficient number of instructions following the machine instruction. However, if the machine instruction is near the end of the basic block, BaseInsertOffset can point to the machine instruction or another instruction that precedes it, and this can lead to convergence failure. This commit fixes this bug by ensuring BaseInsertOffset is larger than the offset of the instruction following the constant-loading instruction. rdar://problem/18581150 llvm-svn: 220015	2014-10-17 01:31:47 +00:00
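A minimal sketch of the clamp described above, written as a standalone C++ function rather than the actual ARMConstantIslands code. The function name `chooseSplitOffset` and its parameters are hypothetical: `NextInstrOffset` stands for the offset of the instruction following the constant-loading instruction, and the `+ 1` encodes "larger than" that offset. Offsets are signed here so that the subtraction cannot wrap in a very short block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of the split-point computation in createNewWater.
// BaseInsertOffset: proposed split point; BlockPostOffset: end of the block;
// UPad: padding reserved for alignment; NextInstrOffset: offset of the
// instruction after the one that loads from the constant pool.
int64_t chooseSplitOffset(int64_t BaseInsertOffset, int64_t BlockPostOffset,
                          int64_t UPad, int64_t NextInstrOffset) {
  assert(NextInstrOffset <= BlockPostOffset && "user must be inside the block");
  if (BaseInsertOffset + 8 >= BlockPostOffset) {
    // Back past a conditional and a maximally long unconditional branch,
    // but never to or before the constant-loading instruction itself.
    BaseInsertOffset =
        std::max(BlockPostOffset - UPad - 8, NextInstrOffset + 1);
  }
  return BaseInsertOffset;
}
```

For example, with a 4-byte user instruction at offset 40 in a block ending at 48 (so `NextInstrOffset` is 44), the unclamped expression `BlockPostOffset - UPad - 8` yields 40, the user instruction itself, whereas the clamped version returns 45.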
Matt Arsenault	cb1558abf3	R600/SI: Simplify debug printing llvm-svn: 219999	2014-10-17 00:36:20 +00:00
Matt Arsenault	f84bb3f382	R600/SI: Remove another VALU pattern llvm-svn: 219988	2014-10-16 23:33:37 +00:00
Robin Morisset	8dc41d55aa	Erase fence insertion from SelectionDAGBuilder.cpp (NFC) Summary: Backends can use setInsertFencesForAtomic to signal to the middle-end that monotonic is the only memory ordering they can accept for stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger ordering to fences + monotonic accesses is currently living in SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it for several reasons: - There is lots of redundancy to avoid: extremely similar logic already exists in AtomicExpand. - The current code in SelectionDAGBuilder does not use any target hooks; it does the same transformation for every backend that requires it. - As a result it is plain unsound, as it was apparently designed for ARM. It happens to mostly work for the other targets because they are extremely conservative, but Power for example had to switch to AtomicExpand to be able to use lwsync safely (see r218331). - Because it produces IR-level fences, it cannot be made sound! This is noted in the C++11 standard (section 29.3, page 1140): ``` Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering semantics. ``` It can also be seen by the following example (called IRIW in the literature): ``` atomic<int> x = y = 0; int r1, r2, r3, r4; Thread 0: x.store(1); Thread 1: y.store(1); Thread 2: r1 = x.load(); r2 = y.load(); Thread 3: r3 = y.load(); r4 = x.load(); ``` r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst. But if they are lowered to monotonic accesses, no amount of fences can prevent it. This patch does three things (I could cut it into parts, but then some of them would not be tested/testable, please tell me if you would prefer that): - it provides a default implementation for emitLeadingFence/emitTrailingFence in terms of IR-level fences that mimics the original logic of SelectionDAGBuilder. As we saw above, this is unsound, but the best that can be done without knowing the targets well (and there is a comment warning about this risk). - it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default implementation (that exactly replicates the logic of SelectionDAGBuilder, so no functional change) - it finally erases this logic from SelectionDAGBuilder as it is dead code. Ideally, each target would define its own override for emitLeading/TrailingFence using target-specific fences, but I do not know the Sparc/Mips/XCore memory model well enough to do this, and they appear to be dealing fine with the ARM-inspired default expansion for now (probably because they are overly conservative, as Power was). If anyone wants to compile fences more aggressively on these platforms, the long comment should make it clear why they should first override emitLeading/TrailingFence. Test Plan: make check-all, no functional change Reviewers: jfb, t.p.northover Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5474 llvm-svn: 219957	2014-10-16 20:34:57 +00:00
Matt Arsenault	ecbe5b08b8	R600/SI: Remove unnecessary VALU patterns These haven't been necessary since allowing selecting SALU instructions in non-entry blocks was enabled. llvm-svn: 219956	2014-10-16 20:31:50 +00:00
Matt Arsenault	75125bd463	R600: Fix nonsensical implementation of computeKnownBits for BFE This was resulting in invalid simplifications of sdiv llvm-svn: 219953	2014-10-16 20:07:40 +00:00
Rafael Espindola	253081a9b6	Delete -std-compile-opts. These days -std-compile-opts was just a silly alias for -O3. llvm-svn: 219951	2014-10-16 20:00:02 +00:00
Juergen Ributzka	3743a4c0d2	[AArch64] Fix miscompile of sdiv-by-power-of-2. When the constant divisor was larger than 32 bits, the AArch64 backend would emit wrong code, because the shift was defined as a shift of a 32-bit constant '(1<<Lg2(divisor))' and we would lose the upper 32 bits. This fixes rdar://problem/18678801. llvm-svn: 219934	2014-10-16 16:41:15 +00:00
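The failure mode can be reproduced in plain C++. The sketch below uses hypothetical helper names (`pow2Divisor`, `sdivPow2`) and is not the AArch64 backend code: it builds the divisor with a 64-bit shift, then performs the usual round-toward-zero lowering of signed division by 2^Lg2 (bias negative dividends by 2^Lg2 - 1, then shift right arithmetically). Writing the shift as `1 << Lg2` on a 32-bit `int` is the mistake the commit describes; for `Lg2 >= 32` it is undefined behaviour in C++.

```cpp
#include <cassert>
#include <cstdint>

// 2^Lg2 as a 64-bit value. Widening the literal before shifting is the point:
// `1 << Lg2` shifts a 32-bit int, which is undefined for Lg2 >= 32 and in
// practice drops the upper 32 bits of the divisor.
uint64_t pow2Divisor(unsigned Lg2) {
  assert(Lg2 < 64 && "shift amount must be below 64");
  return uint64_t{1} << Lg2;
}

// Signed division by 2^Lg2 with round-toward-zero semantics (as sdiv): add a
// bias of 2^Lg2 - 1 to negative dividends, then shift right arithmetically.
// Assumes >> on a negative int64_t is arithmetic (guaranteed since C++20).
int64_t sdivPow2(int64_t X, unsigned Lg2) {
  assert(Lg2 < 63 && "divisor must be a positive int64_t power of two");
  const int64_t Bias = X < 0 ? static_cast<int64_t>(pow2Divisor(Lg2) - 1) : 0;
  return (X + Bias) >> Lg2;
}
```

Without the bias, `-7 >> 1` would give -4 (rounding toward negative infinity) instead of the -3 that signed division requires.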
Vasileios Kalintiris	5822b72dd9	[mips] Account for endianness when expanding BuildPairF64/ExtractElementF64 nodes. Summary: In order to support big endian targets for the BuildPairF64 nodes we just need to swap the low/high pair registers. Additionally, for the ExtractElementF64 nodes we have to calculate the correct stack offset with respect to the node's register/operand that we want to extract. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5753 llvm-svn: 219931	2014-10-16 15:41:51 +00:00
Vasileios Kalintiris	8d39330886	[mips] Marked the DI/EI instruction aliases as MIPS32r2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5751 llvm-svn: 219927	2014-10-16 15:23:52 +00:00
Vasileios Kalintiris	b62fa9afef	Test commit access: remove extra new line at the end of file llvm-svn: 219925	2014-10-16 14:37:00 +00:00
Matt Arsenault	c79fc2137a	R600: Remove dead function llvm-svn: 219879	2014-10-16 00:08:09 +00:00
Adam Nemet	e7c4f25494	[AVX512] Add DQ subvector inserts In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4 respectively. These are matched by "Alt" Pat<>'s (Alt stands for alternative VTs). Since DQ has native support for these instructions, I peeled off the non-"Alt" part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are derived from this multiclass. The "Alt" Pat<>'s are disabled with DQ. Fixes <rdar://problem/18426089> llvm-svn: 219874	2014-10-15 23:42:17 +00:00
Adam Nemet	a3b1e47840	[AVX512] Two new attributes in X86VectorVTInfo for subvector insert The new attributes are NumElts and the CD8TupleForm. This prepares the code to enable x8 and x2 inserts. NFC, no change in X86.td.expanded except for the new attributes. llvm-svn: 219871	2014-10-15 23:42:09 +00:00
Adam Nemet	ea53faaf7d	[AVX512] Rename arg from Opcode32/64 to Opcode128/256 in vinsert_for_size It's the W bit that selects between 32 or 64 elt type and not the opcode. The opcode selects between the width of the insert (128 or 256). llvm-svn: 219870	2014-10-15 23:42:04 +00:00
Matt Arsenault	302179d41a	R600: Remove unnecessary part of computeKnownBitsForTargetNode Zero-width BFEs are combined away already, so there's no point in handling them. llvm-svn: 219868	2014-10-15 23:37:49 +00:00
Matt Arsenault	03564ece92	Move variable down to use llvm-svn: 219867	2014-10-15 23:37:42 +00:00
Tom Stellard	3a7f91e430	R600/SI: Fix bug where immediates were being used in DS addr operands The SelectDS1Addr1Offset complex pattern always tries to store constant lds pointers in the offset operand and store a zero value in the addr operand. Since the addr operand does not accept immediates, the zero value needs to first be copied to a register. This newly created zero value will not go through normal instruction selection, so we need to manually insert a V_MOV_B32_e32 in the complex pattern. This bug was hidden by the fact that if there was another zero value in the DAG that had not been selected yet, then the CSE done by the DAG would use the unselected node for the addr operand rather than the one that was just created. This would lead to the zero value being selected and the DAG automatically inserting a V_MOV_B32_e32 instruction. llvm-svn: 219848	2014-10-15 21:08:59 +00:00
Sid Manning	527b1df834	Wrong attribute. LLVM_ATTRIBUTE_UNUSED not LLVM_ATTRIBUTE_USED The original fix for the build break was correct. LLVM_ATTRIBUTE_USED removes the warning message because it keeps the function in the object file. LLVM_ATTRIBUTE_UNUSED indicates that it may or may not be used depending on build settings. llvm-svn: 219846	2014-10-15 20:41:17 +00:00
Sid Manning	31df72e266	Wrong attribute. LLVM_ATTRIBUTE_USED not LLVM_ATTRIBUTE_UNUSED llvm-svn: 219837	2014-10-15 19:32:52 +00:00
Sid Manning	1dc01a7664	Add LLVM_ATTRIBUTE_UNUSED to function currently just used in an assert Fixes break when -Wunused-function is used. llvm-svn: 219833	2014-10-15 19:24:14 +00:00
Juergen Ributzka	d372bea426	Reapply "[FastISel][AArch64] Add custom lowering for GEPs." This is mostly a copy of the existing FastISel GEP code, but we have to duplicate it for AArch64, because otherwise we would bail out even for simple cases. This is because the standard fastEmit functions don't cover MUL at all and ADD is lowered very inefficiently. The original commit had a bug in the add emit logic, which has been fixed. llvm-svn: 219831	2014-10-15 18:58:07 +00:00
Juergen Ributzka	e743ba566b	[FastISel][AArch64] Factor out add with immediate emission into a helper function. NFC. Simplify add with immediate emission by factoring it out into a helper function. llvm-svn: 219830	2014-10-15 18:58:02 +00:00
Sid Manning	477d8386ee	Enable the instruction printer in HexagonMCTargetDesc This adds the MCInstPrinter to the LLVMHexagonDesc library and removes the dependency LLVMHexagonAsmPrinter had on LLVMHexagonDesc. This is a prerequisite needed by the disassembler. Phabricator Revision: http://reviews.llvm.org/D5734 llvm-svn: 219826	2014-10-15 18:27:40 +00:00
Matt Arsenault	9c459727fb	R600/SI: Also try to use 0 base for misaligned 8-byte DS loads. llvm-svn: 219823	2014-10-15 18:06:43 +00:00
Matt Arsenault	1d906ecdac	R600: Fix miscompiles when BFE has multiple uses SimplifyDemandedBits would break the other uses of the operand. llvm-svn: 219819	2014-10-15 17:58:34 +00:00
Rafael Espindola	1dba93c519	Simplify handling of --noexecstack by using getNonexecutableStackSection. llvm-svn: 219799	2014-10-15 16:12:52 +00:00
Rafael Espindola	8c36da38ee	Move getNonexecutableStackSection up to the base ELF class. The .note.GNU-stack section is not SystemZ/X86 specific. llvm-svn: 219796	2014-10-15 15:44:16 +00:00
Matt Arsenault	cb725dcde9	R600: Use existing variable llvm-svn: 219778	2014-10-15 05:07:00 +00:00
Matt Arsenault	04b9e0240c	R600: Remove outdated comment llvm-svn: 219777	2014-10-15 05:06:57 +00:00
Juergen Ributzka	a0f67257e6	Revert "[FastISel][AArch64] Add custom lowering for GEPs." This breaks our internal build bots. Reverting it to get the bots green again. llvm-svn: 219776	2014-10-15 04:55:48 +00:00
Tim Northover	da6d757afb	ARM: drop check for triple that's no longer used. Early attempts to support AAPCS bare metal MachO targets based the decision on the CPU being compiled for. This was not a particularly great idea and we've got a better option now, but this check remained. No functional change for any target we care about. llvm-svn: 219767	2014-10-15 01:05:01 +00:00
Eric Christopher	6ae3b9e3df	Remove unused variable. llvm-svn: 219750	2014-10-15 00:09:07 +00:00
Gerolf Hoflehner	93b11ca136	[AArch64] Wrong CC access in CSINC-conditional branch sequence This is a follow up to commit r219742. It removes the CCInMI variable and accesses the CC in CSINC directly. In the case of a conditional branch accessing the CC with CCInMI was wrong. llvm-svn: 219748	2014-10-14 23:55:00 +00:00
Gerolf Hoflehner	fbd25ba142	[AArch64] Optimize CSINC-branch sequence Peephole optimization that generates a single conditional branch for csinc-branch sequences like in the examples below. This is possible when the csinc sets or clears a register based on a condition code and the branch checks that register. Also the condition code may not be modified between the csinc and the original branch. Examples: 1. Convert csinc w9, wzr, wzr, <CC>;tbnz w9, #0, 0x44 to b.<invCC> 2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44 to b.<CC> rdar://problem/18506500 llvm-svn: 219742	2014-10-14 23:07:53 +00:00
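The equivalence behind this peephole can be checked with a small C++ model of the instructions. The function names are hypothetical; the semantics assumed are the AArch64 definitions: `csinc Wd, Wn, Wm, cond` yields `cond ? Wn : Wm + 1`, `tbnz` branches when the tested bit is 1, and `tbz` when it is 0. With both sources `wzr`, the csinc result is 0 when `<CC>` holds and 1 otherwise, so testing bit 0 reduces to testing `<CC>` directly.

```cpp
#include <cassert>
#include <cstdint>

// csinc Wd, Wn, Wm, cond: Wd = cond ? Wn : Wm + 1.
uint32_t csinc(uint32_t Wn, uint32_t Wm, bool Cond) { return Cond ? Wn : Wm + 1; }

// tbnz Wt, #Bit branches if the bit is set; tbz branches if it is clear.
bool tbnzTaken(uint32_t Wt, unsigned Bit) { return ((Wt >> Bit) & 1u) != 0; }
bool tbzTaken(uint32_t Wt, unsigned Bit) { return !tbnzTaken(Wt, Bit); }

// Original sequence: csinc w9, wzr, wzr, <CC>; tb(n)z w9, #0, target.
bool originalTaken(bool IsTbnz, bool Cond) {
  const uint32_t WZR = 0;
  const uint32_t W9 = csinc(WZR, WZR, Cond);
  return IsTbnz ? tbnzTaken(W9, 0) : tbzTaken(W9, 0);
}

// Folded form: tbnz becomes b.<invCC>, tbz becomes b.<CC>.
bool foldedTaken(bool IsTbnz, bool Cond) { return IsTbnz ? !Cond : Cond; }
```

The model only covers the branch decision; the real peephole must also prove, as the commit notes, that the flags are not modified between the csinc and the branch.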
Simon Pilgrim	6a516dfd91	[X86][SSE] pslldq/psrldq shuffle mask decodes Patch to provide shuffle decodes and asm comments for the sse pslldq/psrldq SSE2/AVX2 byte shift instructions. Differential Revision: http://reviews.llvm.org/D5598 llvm-svn: 219738	2014-10-14 22:31:34 +00:00
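A standalone sketch of the shuffle masks such a decode produces, assuming the usual semantics: PSLLDQ/PSRLDQ shift each 128-bit lane independently by `Imm` bytes and fill with zeros, and an `Imm` above 15 clears the lane. The function names and the `kZero` sentinel are hypothetical; LLVM's own decoder functions have their own signatures and sentinel values.

```cpp
#include <cassert>
#include <vector>

// Sentinel for "this result byte is zero" (a choice made for this sketch).
constexpr int kZero = -1;

// Byte shuffle mask for PSLLDQ $Imm on a vector of NumBytes bytes
// (16 for the SSE2 form, 32 for the AVX2 form). Each 128-bit lane shifts
// independently; bytes shifted in from below are zero.
std::vector<int> decodePSLLDQ(unsigned NumBytes, unsigned Imm) {
  assert(NumBytes % 16 == 0 && "whole 128-bit lanes only");
  std::vector<int> Mask;
  for (unsigned Lane = 0; Lane < NumBytes; Lane += 16)
    for (unsigned I = 0; I < 16; ++I)
      Mask.push_back(I >= Imm ? static_cast<int>(Lane + I - Imm) : kZero);
  return Mask;
}

// Byte shuffle mask for PSRLDQ $Imm: bytes move toward index 0 within each
// lane and zeros are shifted in from the top.
std::vector<int> decodePSRLDQ(unsigned NumBytes, unsigned Imm) {
  assert(NumBytes % 16 == 0 && "whole 128-bit lanes only");
  std::vector<int> Mask;
  for (unsigned Lane = 0; Lane < NumBytes; Lane += 16)
    for (unsigned I = 0; I < 16; ++I)
      Mask.push_back(I + Imm < 16 ? static_cast<int>(Lane + I + Imm) : kZero);
  return Mask;
}
```

For `pslldq $3` on an XMM register, the mask starts with three `kZero` entries followed by source bytes 0 through 12, which is the information an asm comment can render in readable form.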
Tim Northover	a7b041da1f	ARM: remove ARM/Thumb distinction for preferred alignment. Thumb1 has legitimate reasons for preferring 32-bit alignment of types i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be a multiple of 4. However, this is a trade-off between code size and RAM usage; the DataLayout string is not the best place to represent it even if desired. So this patch removes the extra Thumb requirements, hopefully making ARM and Thumb completely compatible in this respect. llvm-svn: 219734	2014-10-14 22:12:17 +00:00
Tim Northover	c5e92d5ca6	ARM: allow misaligned local variables in Thumb1 mode. There's no hard requirement on LLVM to align local variables to 32 bits, so the Thumb1 frame handling needs to be able to deal with variables that are only naturally aligned without falling over. llvm-svn: 219733	2014-10-14 22:12:14 +00:00
Juergen Ributzka	d3bf2383f1	[FastISel][AArch64] Add custom lowering for GEPs. This is mostly a copy of the existing FastISel GEP code, but on AArch64 we bail out even for simple cases, because the standard fastEmit functions don't cover MUL and ADD is lowered inefficiently. llvm-svn: 219726	2014-10-14 21:41:23 +00:00
Hans Wennborg	71be459fd4	[x86 asm] allow fwait alias in both At&t and Intel modes (PR21208) Differential Revision: http://reviews.llvm.org/D5741 llvm-svn: 219725	2014-10-14 21:41:17 +00:00
Tim Northover	519637b1b7	ARM: set preferred aggregate alignment to 32 universally. Before, ARM and Thumb mode code had different preferred alignments, which could lead to some rather unexpected results. There's justification for reducing it from the default 64-bits (wasted space), but I don't think there is for going below 32-bits. There's no actual ABI change here, just to reassure people. llvm-svn: 219719	2014-10-14 20:57:26 +00:00
Juergen Ributzka	3aad57b5cd	[FastISel][AArch64] Fix sign-/zero-extend folding when SelectionDAG is involved. Sign-/zero-extend folding depended on both the load and the integer extend being selected by FastISel. This cannot always be guaranteed and SelectionDAG might interfere. This commit adds additional checks to load and integer extend lowering to catch this. Related to rdar://problem/18495928. llvm-svn: 219716	2014-10-14 20:36:02 +00:00
Jan Vesely	a3b17a8d8b	Reapply "R600: Add new intrinsic to read work dimensions" This effectively reverts revert 219707. After fixing the test to work with new function name format and renamed intrinsic. Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 219710	2014-10-14 20:05:26 +00:00
Rafael Espindola	48d2b0a972	Revert "R600: Add new intrinsic to read work dimensions" This reverts commit r219705. CodeGen/R600/work-item-intrinsics.ll was failing on linux. llvm-svn: 219707	2014-10-14 18:58:04 +00:00
Jan Vesely	c97f76d270	R600: Add new intrinsic to read work dimensions v2: Add SI lowering Add test v3: Place work dimensions after the kernel arguments. v4: Calculate offset while lowering arguments v5: rebase v6: change prefix to AMDGPU Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 219705	2014-10-14 18:52:07 +00:00
Jan Vesely	a1d9fe15f6	R600: FMA is VecALU only instruction Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 219704	2014-10-14 18:52:04 +00:00
Reed Kotler	5d3770d1fc	Finish getting Mips fast-isel to match up with AArch64 fast-isel Summary: In order to facilitate use of common code, checking by reviewers of other fast-isel ports, and hopefully to eventually move most of Mips and other fast-isel ports into target independent code, I've tried to get the two implementations to line up. There is no functional code change. Just methods moved in the file to be in the same order as in AArch64. Test Plan: No functional change. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, aemerson, rfuhler Differential Revision: http://reviews.llvm.org/D5692 llvm-svn: 219703	2014-10-14 18:27:58 +00:00
Matt Arsenault	5c68d89c00	R600/SI: Use DS offsets for constant addresses Use 0 as the base address for a constant address, so if we have a constant address we can save moves and form read2/write2s. llvm-svn: 219698	2014-10-14 17:21:19 +00:00
Robert Khasanov	1896253278	[AVX512] Extended avx512_binop_rm to DQ/VL subsets. Added encoding tests. llvm-svn: 219686	2014-10-14 15:13:56 +00:00
Robert Khasanov	518a523445	[AVX512] Extended avx512_binop_rm to BW/VL subsets. Added encoding tests. llvm-svn: 219685	2014-10-14 14:36:19 +00:00
Bradley Smith	bb04f108aa	[AArch64] Fix crash with empty/pseudo-only blocks in A53 erratum (835769) workaround llvm-svn: 219684	2014-10-14 14:02:41 +00:00
Eric Christopher	de9f191394	Grab the subtarget info off of the MachineFunction rather than indirecting through the TargetMachine. llvm-svn: 219674	2014-10-14 08:44:19 +00:00
Eric Christopher	c65e60f7c8	Use the triple to figure out if this is a darwin target, not the subtarget. llvm-svn: 219673	2014-10-14 08:25:26 +00:00
Hao Liu	6c7f917b92	[AArch64]Select wide immediate offset into [Base+XReg] addressing mode e.g Currently we'll generate following instructions if the immediate is too wide: MOV X0, WideImmediate ADD X1, BaseReg, X0 LDR X2, [X1, 0] Using [Base+XReg] addressing mode can save one ADD as following: MOV X0, WideImmediate LDR X2, [BaseReg, X0] Differential Revision: http://reviews.llvm.org/D5477 llvm-svn: 219665	2014-10-14 06:50:36 +00:00
Eric Christopher	cd8518b9e9	Include map into the A15SDOptimizer rather than pick it up transitively from the DFAPacketizer via TargetInstrInfo.h. llvm-svn: 219652	2014-10-14 01:13:51 +00:00
Eric Christopher	15c10d51e5	Remove the TargetMachine from DFAPacketizer since it was only being used to grab subtarget specific things that we can grab from the MachineFunction anyhow. llvm-svn: 219650	2014-10-14 01:03:16 +00:00
Reed Kotler	c9b7242391	Make first of several changes to bring up to AArch64 fast-isel style Summary: Make Mips fast-isel track the form of AArch64 where practical. This makes it easier for people to review the code, to borrow similar code, and to see how to eventually move a lot of this target code for fast-isels into target independent code. These are just cosmetic changes. Should be no functional difference. Test Plan: make check test-suite for 4 flavors mips32 r1/r2 , -O0/-O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: aemerson, llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D5595 llvm-svn: 219633	2014-10-13 21:46:41 +00:00
Filipe Cabecinhas	2e4a5e341a	Fix a broadcast related regression on the vector shuffle lowering. Summary: Test by Robert Lougher! Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5745 llvm-svn: 219617	2014-10-13 16:16:16 +00:00
Matt Arsenault	df1a3c9ec9	R600/SI: Minor cleanup of function llvm-svn: 219616	2014-10-13 15:47:59 +00:00
Yuri Gorshenin	a2d8cc5558	[asan-asm-instrumentation] Follow-up fixes to r219602: asserts are moved into function. llvm-svn: 219610	2014-10-13 11:44:06 +00:00
Renato Golin	c17a0bcd0e	Adds support for the Cortex-A17 to the ARM backend Patch by Matthew Wahab. llvm-svn: 219606	2014-10-13 10:22:19 +00:00
Bradley Smith	aa602e5e4d	[AArch64] Add workaround for Cortex-A53 erratum (835769) Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation. The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions and the simplest way to do this is to insert a NOP. This patch implements such work-around in the backend, enabled via the option -aarch64-fix-cortex-a53-835769. The work-around code generation is not enabled by default. llvm-svn: 219603	2014-10-13 10:12:35 +00:00
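At its core the work-around is a scan for one adjacent pair. The sketch below models a straight-line instruction sequence with a hypothetical `Kind` enum and inserts a NOP whenever a 64-bit multiply-accumulate directly follows a load, store, or prefetch. It deliberately ignores the cross-branch cases the commit mentions and is not the pass's actual code; an empty block is simply returned unchanged.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction classes for this sketch.
enum class Kind { Load, Store, Prefetch, MulAcc64, Other, Nop };

bool isMemoryOp(Kind K) {
  return K == Kind::Load || K == Kind::Store || K == Kind::Prefetch;
}

// Returns a copy of Block with a NOP between every memory instruction and an
// immediately following 64-bit multiply-accumulate.
std::vector<Kind> insertA53Nops(const std::vector<Kind> &Block) {
  std::vector<Kind> Out;
  Out.reserve(Block.size());
  for (Kind K : Block) {
    if (K == Kind::MulAcc64 && !Out.empty() && isMemoryOp(Out.back()))
      Out.push_back(Kind::Nop);
    Out.push_back(K);
  }
  return Out;
}
```

Checking `Out.back()` rather than the previous input element means an already-inserted NOP is never mistaken for a memory operation.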
Yuri Gorshenin	85aae05168	[asan-asm-instrumentation] Fixed memory references that include %rsp as a base or an index register. Summary: [asan-asm-instrumentation] Fixed memory references that include %rsp as a base or an index register. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5599 llvm-svn: 219602	2014-10-13 09:37:47 +00:00
NAKAMURA Takumi	82b729d656	Revert r219584, "[X86] Memory folding for commutative instructions." It broke i686 selfhosting. llvm-svn: 219595	2014-10-13 04:17:34 +00:00
Simon Pilgrim	3b8f17ae65	[X86] Memory folding for commutative instructions. This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning. This mainly helps the stack inliner better fold reloads of 3 (or more) operand instructions (VEX encoded SSE etc.) but by performing this in the lowest foldMemoryOperandImpl implementation it also replaces the X86InstrInfo::optimizeLoadInstr version and is now used by FastISel too. Differential Revision: http://reviews.llvm.org/D5701 llvm-svn: 219584	2014-10-12 10:52:55 +00:00
Simon Pilgrim	4d32502295	Test commit access (email fix) Indentation tidyup. llvm-svn: 219577	2014-10-11 20:28:56 +00:00
Benjamin Kramer	8a11a14550	MC: Bit pack MCSymbolData. On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member variables private and clean up uses to go through the existing accessors. NFC. llvm-svn: 219573	2014-10-11 15:07:21 +00:00
Simon Pilgrim	c0e0dff487	Test commit access Fix comment typo + spelling. llvm-svn: 219572	2014-10-11 14:23:36 +00:00
Reed Kotler	e380f63f36	Add basic conditional branches in mips fast-isel Summary: Implement the most basic form of conditional branches in Mips fast-isel. Test Plan: br1.ll run 4 flavors of test-suite. mips32 r1/r2 and at -O0/O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D5583 llvm-svn: 219556	2014-10-11 00:55:18 +00:00
Matt Arsenault	72c32a25ce	R600/SI: Change how DS offsets are printed Match SC by using offset/offset0/offset1 and printing in decimal. llvm-svn: 219537	2014-10-10 22:16:07 +00:00
Matt Arsenault	e0605cf39b	R600/SI: Match read2/write2 stride 64 versions llvm-svn: 219536	2014-10-10 22:12:32 +00:00
Matt Arsenault	bb7fdce69c	R600/SI: Add load / store machine optimizer pass. Currently this only functions to match simple cases where ds_read2_* / ds_write2_* instructions can be used. In the future it might match some of the other weird load patterns, such as direct to LDS loads. Currently enabled only with a subtarget feature to enable easier testing. llvm-svn: 219533	2014-10-10 22:01:59 +00:00
Chandler Carruth	35cad92130	[mips] Actually mark that the default case is unreachable as this switch is over a subset of condition codes. This fixes the -Werror build which warns about use of uninitialized variables in the default case. llvm-svn: 219531	2014-10-10 21:07:03 +00:00
Reed Kotler	3fd5f752d3	Implement floating point compare for mips fast-isel Summary: Expand SelectCmp to handle floating point compare Test Plan: fpcmpa.ll run 4 flavors of test-suite, mips32 r1/r2 O0/O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D5567 llvm-svn: 219530	2014-10-10 20:46:28 +00:00
Matt Arsenault	e374d07100	R600/SI: Disable copying of SCC llvm-svn: 219519	2014-10-10 17:44:47 +00:00
Reed Kotler	766ffcd31c	implement integer compare in mips fast-isel Summary: implement SelectCmp (integer compare ) in mips fast-isel Test Plan: icmpa.ll also ran 4 test-suite flavors mips32 r1/r2 O0/O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, rfuhler, mcrosier Differential Revision: http://reviews.llvm.org/D5566 llvm-svn: 219518	2014-10-10 17:39:51 +00:00
Bill Schmidt	87ba7a67bb	[PowerPC] Reduce names from Power8Vector to P8Vector Per Hal Finkel's review, improving typability of some variable names. llvm-svn: 219514	2014-10-10 17:21:15 +00:00
Reed Kotler	7c9f20a7a1	Implement floating point to integer conversion in mips fast-isel Summary: Add the ability to convert 64 or 32 bit floating point values to integer in mips fast-isel Test Plan: fpintconv.ll ran 4 flavors of test-suite with no errors, mips32 r1/r2 O0/O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, rfuhler, mcrosier Differential Revision: http://reviews.llvm.org/D5562 llvm-svn: 219511	2014-10-10 17:00:46 +00:00
Benjamin Kramer	f35a067b43	Reduce double set lookups. NFC. llvm-svn: 219505	2014-10-10 15:32:50 +00:00
Bill Schmidt	581893751d	[PowerPC] Add feature for Power8 vector extensions The current VSX feature for PowerPC specifies availability of the VSX instructions added with the 2.06 architecture version. With 2.07, the architecture adds new instructions to both the Category:Vector and Category:VSX instruction sets. Additionally, unaligned vector storage operations have improved performance. This patch adds a feature to provide access to the new instructions and performance capabilities of Power8. For compatibility with GCC, the feature is controlled via a new -mpower8-vector switch, and the feature causes the __POWER8_VECTOR__ builtin define to be generated by the preprocessor. There is a companion patch for cfe being committed at the same time. llvm-svn: 219501	2014-10-10 15:09:28 +00:00
Zoran Jovanovic	d8488142ad	[mips][microMIPS] Implement ADDIUSP instruction Differential Revision: http://reviews.llvm.org/D5084 llvm-svn: 219500	2014-10-10 14:37:30 +00:00
Zoran Jovanovic	437a433c43	[mips][microMIPS] Implement JR16 instruction Differential Revision: http://reviews.llvm.org/D5062 llvm-svn: 219498	2014-10-10 14:02:44 +00:00
Zoran Jovanovic	454a101d3d	[mips][microMIPS] Implement ADDIUS5 instruction Differential Revision: http://reviews.llvm.org/D5049 llvm-svn: 219495	2014-10-10 13:45:34 +00:00
Zoran Jovanovic	d436e06cb4	[mips][microMIPS] Implement JRC instruction Differential Revision: http://reviews.llvm.org/D5045 llvm-svn: 219494	2014-10-10 13:31:18 +00:00
Zoran Jovanovic	b5ed47d1e5	[mips][microMIPS] Implement JALRS16 instruction Differential Revision: http://reviews.llvm.org/D5027 llvm-svn: 219493	2014-10-10 13:22:28 +00:00
Chandler Carruth	3f4f07b016	Don't use an unqualified 'abs' function call with a builtin type. This is dangerous for numerous reasons. The primary risk here is with floating point or double types where if the wrong header files are included in a strange order this can implicitly convert to integers and then call the C abs function on the integers. There is a secondary risk that even impacts integers where if the namespace the code is written in ever defines an abs overload for types within that namespace the global abs will be hidden. The correct form is to call std::abs or write 'using std::abs' for builtin types (and only the latter is correct in any generic context). I've also added the requisite header to be a bit more explicit here. llvm-svn: 219484	2014-10-10 08:27:19 +00:00
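A minimal C++ sketch of the two forms the commit recommends. The names `magnitude`, `distance`, and `geom::Offset` are hypothetical and exist only to illustrate the point: call `std::abs` directly for builtin types, and write `using std::abs;` in generic code so that argument-dependent lookup can still find an `abs` overload declared in a user namespace.

```cpp
#include <cassert>
#include <cmath>    // std::abs overloads for floating-point types
#include <cstdlib>  // std::abs overloads for integer types

namespace geom {
// Hypothetical user type with its own abs() overload.
struct Offset {
  int dx;
};
inline Offset abs(Offset o) { return Offset{o.dx < 0 ? -o.dx : o.dx}; }
}  // namespace geom

// Generic context: the using-declaration keeps the std::abs overloads
// visible, while argument-dependent lookup still finds geom::abs for
// geom::Offset arguments.
template <typename T>
T magnitude(T value) {
  using std::abs;
  return abs(value);
}

// Non-generic code with a builtin type: call std::abs explicitly, which
// selects the double overload instead of risking a conversion to int.
inline double distance(double a, double b) { return std::abs(a - b); }
```

For example, `magnitude(-2.5)` yields `2.5`, and `magnitude(geom::Offset{-4}).dx` yields `4` through the namespace-specific overload.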
Samuel Antao	83b3411742	Fix bug in GPR to FPR moves in PPC64LE. The current implementation of GPR->FPR register moves uses a stack slot. This mechanism writes a double word and reads a word. In big-endian the load address must be displaced by 4-bytes in order to get the right value. In little endian this is no longer required. This patch fixes the issue and adds LE regression tests to fast-isel-conversion which currently expose this problem. llvm-svn: 219441	2014-10-09 20:42:56 +00:00
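A portable C++ sketch of why the load offset depends on byte order. It models the stack slot as an explicit byte array instead of relying on the host's endianness. `storeDoubleword`, `loadWord`, and `lowWordOffset` are hypothetical helpers, not PPC backend code. The low-order 32-bit word of a doubleword sits at byte offset 4 in big-endian order and at offset 0 in little-endian order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Serialize a 64-bit value into an 8-byte slot using the given byte order.
std::array<std::uint8_t, 8> storeDoubleword(std::uint64_t v, bool bigEndian) {
  std::array<std::uint8_t, 8> slot{};
  for (std::size_t i = 0; i < 8; ++i) {
    std::size_t shift = bigEndian ? (7 - i) * 8 : i * 8;
    slot[i] = static_cast<std::uint8_t>(v >> shift);
  }
  return slot;
}

// Read a 32-bit word starting at `offset` using the given byte order.
std::uint32_t loadWord(const std::array<std::uint8_t, 8>& slot,
                       std::size_t offset, bool bigEndian) {
  std::uint32_t w = 0;
  for (std::size_t i = 0; i < 4; ++i) {
    std::size_t shift = bigEndian ? (3 - i) * 8 : i * 8;
    w |= static_cast<std::uint32_t>(slot[offset + i]) << shift;
  }
  return w;
}

// Byte offset of the low-order word inside the doubleword slot.
std::size_t lowWordOffset(bool bigEndian) { return bigEndian ? 4 : 0; }
```

Reading at the big-endian offset of 4 on a little-endian slot returns the high word instead, which is the kind of wrong value the commit describes.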
Benjamin Kramer	4edebdbee3	Remove a compiler bug workaround from 2007. The affected versions of gcc are long gone. NFC. llvm-svn: 219433	2014-10-09 19:50:39 +00:00
Matt Arsenault	15f502932e	Fix typo llvm-svn: 219429	2014-10-09 19:15:15 +00:00
Tom Stellard	7496c3d0fc	R600/SI: Legalize CopyToReg during instruction selection The instruction emitter will crash if it encounters a CopyToReg node with a non-register operand like FrameIndex. llvm-svn: 219428	2014-10-09 19:06:00 +00:00
Lang Hames	e0ce993042	[PBQP] Add missing headers from r219421. llvm-svn: 219425	2014-10-09 18:36:59 +00:00
Lang Hames	1ac2927c37	[PBQP] Replace PBQPBuilder with composable constraints (PBQPRAConstraint). This patch removes the PBQPBuilder class and its subclasses and replaces them with a composable constraints class: PBQPRAConstraint. This allows constraints that are only required for optimisation (e.g. coalescing, soft pairing) to be mixed and matched. This patch also introduces support for target writers to supply custom constraints for their targets by overriding a TargetSubtargetInfo method: std::unique_ptr<PBQPRAConstraints> getCustomPBQPConstraints() const; This patch should have no effect on allocations. llvm-svn: 219421	2014-10-09 18:20:51 +00:00
Tom Stellard	ffb5a36502	R600/SI: Legalize INSERT_SUBREG instructions during PostISelFolding LLVM assumes INSERT_SUBREG will always have register operands, so we need to legalize non-register operands, like FrameIndexes, to avoid random assertion failures. llvm-svn: 219420	2014-10-09 18:09:15 +00:00
Bill Schmidt	ddf2d00a6b	[PPC64] VSX indexed-form loads use wrong instruction format The VSX instruction definitions for lxsdx, lxvd2x, lxvdsx, and lxvw4x incorrectly use the XForm_1 instruction format, rather than the XX1Form instruction format. This is likely a pasto when creating these instructions, which were based on lvx and so forth. This patch uses the correct format. The existing reformatting test (test/MC/PowerPC/vsx.s) missed this because the two formats differ only in that XX1Form has an extension to the target register field in bit 31. The tests for these instructions used a target register of 7, so the default of 0 in bit 31 for XForm_1 didn't expose a problem. For register numbers 32-63 this would be noticeable. I've changed the test to use higher register numbers to verify my change is effective. llvm-svn: 219416	2014-10-09 17:51:35 +00:00
Kevin Qin	33566b5879	[AArch64] Enable partial & runtime unrolling on cortex-a57. llvm-svn: 219401	2014-10-09 10:13:27 +00:00
Robert Khasanov	625ba0e53e	[AVX512] Extended avx512_binop_rm for AVX512VL subsets. Added avx512_binop_rm_vl multiclass for VL subset Added encoding tests llvm-svn: 219390	2014-10-09 08:38:48 +00:00
Bob Wilson	c80c9c8124	Use triple's isiOS() and isOSDarwin() methods. These methods are already used in lots of places. This makes things more consistent. NFC. llvm-svn: 219386	2014-10-09 05:43:30 +00:00
Eric Christopher	f9e1101078	Remove unused argument to CreateTargetScheduleState and change the TargetMachine to a TargetSubtargetInfo since everything we wanted is off of that. llvm-svn: 219382	2014-10-09 01:59:35 +00:00
Adam Nemet	248c8e9281	[AVX512] Rename AVX512_masking* to AVX512_maskable* No functional change. This is the current AVX512_maskable multiclass hierarchy: maskable_custom is the root, with children maskable_common and maskable_in_asm; maskable_common in turn has children maskable and maskable_3src. llvm-svn: 219363	2014-10-08 23:25:39 +00:00
Adam Nemet	9fae2c02a0	[AVX512] Intrinsics for vextract*x4 This adds the Pat<>'s for the intrinsics. These are necessary because we don't lower these intrinsics to SDNodes but match them directly. See the rationale in the previous commit. llvm-svn: 219362	2014-10-08 23:25:37 +00:00
Adam Nemet	671fc00888	[AVX512] Add asm-only support for vextractx4 masking variants These derive from the new asm-only masking definitions. Unfortunately I wasn't able to find an ISel pattern that we could legally generate for the masking variants. The problem is that since the destination is v4 we would need VK4 register classes and v4i1 value types to express the masking. These are however not legal types/classes in AVX512f but only in VL, so things get complicated pretty quickly. We can revisit this question later if we have a more pressing need to express something like this. So the ISel patterns are empty for the masking instructions and the next patch will add Pat<>s instead to match the intrinsics calls with instructions. llvm-svn: 219361	2014-10-08 23:25:33 +00:00
Adam Nemet	701fdeb2f8	[AVX512] Move DAG for all-zero node to X86VectorVTInfo No functional change. No change in X86.td.expanded except for the appearance of the new attributes. The new attributes will be used in the subsequent patch. llvm-svn: 219360	2014-10-08 23:25:31 +00:00
Adam Nemet	81fdcfd475	[AVX512] Peel off an asm-only class from AVX512_masking_common. No functional change. This enables the generation of masking instructions that don't provide an ISel pattern. llvm-svn: 219358	2014-10-08 23:25:23 +00:00
Robin Morisset	c53597d7c5	[X86] Don't transform atomic-load-add into an inc/dec when inc/dec is slow llvm-svn: 219357	2014-10-08 23:16:23 +00:00
Robin Morisset	2f2a857e2b	[X86] Avoid generating inc/dec when slow for x.atomic_store(1 + x.atomic_load()) Summary: I had forgotten to check for NotSlowIncDec in the patterns that can generate inc/dec for the above pattern (added in D4796). This currently applies to Atom Silvermont, KNL and SKX. Test Plan: New checks on atomic_mi.ll Reviewers: jfb, nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5677 llvm-svn: 219336	2014-10-08 19:38:18 +00:00
Robert Khasanov	6a848d2b84	[AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VPCMP/VPCMPU{BWDQ} Added CMP_MASK_CC intrinsic type. Added tests for intrinsics. Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com> llvm-svn: 219316	2014-10-08 15:49:26 +00:00
Robert Khasanov	643ca10b43	[AVX512] Refactoring of avx512_binop_rm multiclass through AVX512_masking. Added new argument for AVX512_masking: InstrItinClass and bit isCommutable. No functional change. llvm-svn: 219310	2014-10-08 14:37:45 +00:00
Renato Golin	23c290356f	Emit unaligned access build attribute for ARM Patch by Charlie Turner. llvm-svn: 219301	2014-10-08 12:26:22 +00:00
Renato Golin	89819e0935	Refactor isThumb1Only() && isMClass() into a predicate called isV6M() This must be enforced for all v6M cores, not just the cortex-m0, regardless of the user-specified alignment. Patch by Charlie Turner. llvm-svn: 219300	2014-10-08 12:26:16 +00:00
Renato Golin	3465effae4	Simplify switch statement in ARM subtarget align access This switch can be reduced to a simpler if/else statement. Patch by Charlie Turner. llvm-svn: 219299	2014-10-08 12:26:13 +00:00
Eric Christopher	ce3f63df4d	Cache TargetLowering on SelectionDAGISel and update previous calls to getTargetLowering() with the cached variable. llvm-svn: 219284	2014-10-08 07:32:17 +00:00
Chad Rosier	75e17097bb	[AArch64] Generate vector signed/unsigned mul and mla/mls long. Phabricator Revision: http://reviews.llvm.org/D5589 Patch by Balaram Makam <bmakam@codeaurora.org>!! llvm-svn: 219276	2014-10-08 02:31:24 +00:00
Robin Morisset	a701fac7bc	[X86] Fix a bug with fetch_add(INT32_MIN) Summary: Fix pr21099 The pseudocode of what we were doing (spread through two functions) was: if (operand.doesNotFitIn32Bits()) Opc.initializeWithFoo(); if (operand < 0) operand = -operand; if (operand.doesFitIn8Bits()) Opc.initializeWithBar(); else if (operand.doesFitIn32Bits()) Opc.initializeWithBlah(); doStuff(Opc); So for operand == INT32_MIN, Opc was never initialized because the operand changes from fitting in 32 bits to not fitting, causing the various bugs/error messages noted by pr21099. This patch adds an extra test at the beginning for this case, and an llvm_unreachable to give a better error message if the operand ends up not fitting in 32-bits at the end. Test Plan: new test + make check Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5655 llvm-svn: 219257	2014-10-07 23:53:57 +00:00
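The pseudocode in the fetch_add(INT32_MIN) commit can be restated as a small C++ sketch. The names `ImmForm`, `Selection`, and `selectImmediate` are hypothetical, not the X86 backend's code, and the guard shown (negate only when the negated value still fits in 32 bits) is one way to keep every path initialized; it is not claimed to be the exact check the patch added.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical immediate encodings; not LLVM's actual opcode enums.
enum class ImmForm { Imm8, Imm32 };

struct Selection {
  bool negated;  // true if the add was rewritten to use -operand
  ImmForm form;
};

static bool fitsIn8(std::int64_t v) { return v >= INT8_MIN && v <= INT8_MAX; }
static bool fitsIn32(std::int64_t v) { return v >= INT32_MIN && v <= INT32_MAX; }

// Returns std::nullopt when the operand needs a 64-bit form.
std::optional<Selection> selectImmediate(std::int64_t operand) {
  if (!fitsIn32(operand)) return std::nullopt;
  bool negated = false;
  // Negating INT32_MIN gives 2^31, which no longer fits in 32 bits, so
  // only negate when the result is still encodable.
  if (operand < 0 && fitsIn32(-operand)) {
    operand = -operand;
    negated = true;
  }
  // Every path reaching this point assigns a form.
  ImmForm form = fitsIn8(operand) ? ImmForm::Imm8 : ImmForm::Imm32;
  return Selection{negated, form};
}
```

With this guard, `selectImmediate(INT32_MIN)` keeps the operand as-is and picks the 32-bit form rather than falling through with nothing selected.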
Tom Stellard	27b55923bb	R600/SI: Refactor VOP3 instruction defs llvm-svn: 219256	2014-10-07 23:51:41 +00:00
Tom Stellard	55377634e4	R600/SI: Refactor VOPC instruction defs llvm-svn: 219255	2014-10-07 23:51:39 +00:00
Tom Stellard	3b3e07ea49	R600/SI: Refactor VOP2 instruction defs llvm-svn: 219254	2014-10-07 23:51:38 +00:00
Tom Stellard	7f424810c6	R600/SI: Refactor VOP1 instruction defs llvm-svn: 219253	2014-10-07 23:51:34 +00:00
Matt Arsenault	78b9619d35	R600: Remove dead code llvm-svn: 219242	2014-10-07 21:29:56 +00:00
Tom Stellard	d5474640ef	R600: Remove some redundant initializations from AMDGPUMCAsmInfo llvm-svn: 219238	2014-10-07 21:09:25 +00:00
Tom Stellard	465af285b8	R600: Use MCAsmInfoELF as AMDGPUMCAsmInfo base class The main reason for this is that the MCAsmInfo class, which we were previously using as the base class, sets PrivateGlobalPrefix to "L", which causes all global functions that start with L to be treated as local symbols. MCAsmInfoELF sets PrivateGlobalPrefix to ".L", which is what we want, and it is probably a good idea to use this as the base class anyway, since we are emitting ELF binaries. llvm-svn: 219237	2014-10-07 21:09:23 +00:00
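To illustrate why a bare "L" private prefix misbehaves, here is a name-prefix check in C++. `isPrivateByPrefix` is a hypothetical helper, not LLVM's MCAsmInfo code; it only models classifying a symbol as private based on whether its name starts with the configured prefix.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: a symbol is treated as private (local) when its
// name starts with the target's private-global prefix.
bool isPrivateByPrefix(const std::string& symbol,
                       const std::string& privatePrefix) {
  // rfind(prefix, 0) == 0 is a pre-C++20 "starts with" check.
  return !privatePrefix.empty() && symbol.rfind(privatePrefix, 0) == 0;
}
```

With the prefix "L", a global function such as `LoadKernel` is misclassified as local. With the ELF prefix ".L", only compiler-generated labels such as `.LBB0_1` match.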
Tom Stellard	733db75950	R600/SI: Remove assertion in SIInstrInfo::areLoadsFromSameBasePtr() Added a FIXME comment instead; we need to handle the case where the two DS instructions being compared have different numbers of operands. llvm-svn: 219236	2014-10-07 21:09:20 +00:00
Yuri Gorshenin	e2f4949ea2	[asan-asm-instrumentation] CFI directives are generated for .S files. Summary: CFI directives are generated for .S files. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5520 llvm-svn: 219199	2014-10-07 11:03:09 +00:00
Daniel Sanders	1dd7a136d4	[mips] Return {f128} correctly for N32/N64. Summary: According to the ABI documentation, f128 and {f128} should both be returned in $f0 and $f2. However, this doesn't match GCC's behaviour which is to return f128 in $f0 and $f2, but {f128} in $f0 and $f1. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5578 llvm-svn: 219196	2014-10-07 09:29:59 +00:00
Craig Topper	645553b9fd	[X86] Fix a bug where the disassembler was ignoring the VEX.W bit in 32-bit mode for certain instructions it shouldn't. Unfortunately, this isn't easy to fix since there's no simple way to figure out from the disassembler tables whether the W-bit is being used to select a 64-bit GPR or if it's a required part of the opcode. The fix implemented here just looks for "64" in the instruction name and ignores the W-bit in 32-bit mode if it's present. Fixes PR21169. llvm-svn: 219194	2014-10-07 07:29:50 +00:00
Craig Topper	0d4a8e6f6c	Formatting fixes. Most putting 'else' on the same line as the preceding curly brace. llvm-svn: 219193	2014-10-07 07:29:48 +00:00
Craig Topper	114c41d35c	Fix filename in header and use C++ version of the C header files. llvm-svn: 219192	2014-10-07 07:29:46 +00:00
Juergen Ributzka	06c32d25cd	[FastISel][AArch64] Teach the address computation code to also fold sign-/zero-extends. The code already folds sign-/zero-extends, but only if they are arguments to mul and shift instructions. This extends the code to also fold them when they are direct inputs. llvm-svn: 219187	2014-10-07 03:40:06 +00:00
Juergen Ributzka	6619d4d4b3	[FastISel][AArch64] Teach the address computation to also fold sub instructions. Tiny enhancement to the address computation code to also fold sub instructions if the rhs is constant and can be folded into the offset. llvm-svn: 219186	2014-10-07 03:40:03 +00:00
Juergen Ributzka	d396480465	[FastISel][AArch64] Fix "Fold sign-/zero-extends into the load instruction." This commit fixes an issue with sign-/zero-extending loads that was discovered by Richard Barton. We now use the correct load instructions for sign-extending loads to 64bit. Also updated and added more unit tests. llvm-svn: 219185	2014-10-07 03:39:59 +00:00
NAKAMURA Takumi	0500a87f46	ARMInstPrinter.cpp: Suppress a warning for -Asserts. [-Wunused-variable] llvm-svn: 219172	2014-10-06 23:48:04 +00:00
Tim Northover	8dbc8e49a9	ARM: silence unused variable warning llvm-svn: 219128	2014-10-06 17:26:36 +00:00
Tim Northover	57191702fc	ARM: remove dead InstPrinting code This instruction form is handled by different AsmOperands now, so the code is completely dead (and wrong anyway). llvm-svn: 219127	2014-10-06 17:10:13 +00:00
Benjamin Kramer	55428105eb	X86: Drop the isConvertibleTo3Addr bit from shufps/shufpd now that we don't convert them anymore. llvm-svn: 219112	2014-10-06 09:56:40 +00:00
Eric Christopher	faca264c55	Add subtarget caches to aarch64, arm, ppc, and x86. These will make it easier to test further changes to the code generation and optimization pipelines as those are moved to subtargets initialized with target feature and target cpu. llvm-svn: 219106	2014-10-06 06:45:36 +00:00
Chandler Carruth	88628181c0	[x86] Remove the 2-addr-to-3-addr "optimization" from shufps to pshufd. This trades a (register-renamer-friendly) movaps for a floating point / integer domain cross. That is a very bad trade, even on architectures where domain crossing is relatively fast. On any chip where there is even a cycle stall, this is a Very Bad Idea. It doesn't even seem likely to cause a spill to be introduced because the reason for the copy is to destructively shuffle in place. Thanks to Ben Kramer for fixing a bug in this code that my new shuffle lowering exposed and highlighting that perhaps it should just go away. =] llvm-svn: 219090	2014-10-05 22:57:31 +00:00
Benjamin Kramer	fac673d6bc	X86: Don't drop half of the mask when converting 2-address shufps into 3-address pshufd. It's debatable whether this transform is useful at all, but for now make sure we don't generate invalid asm. llvm-svn: 219084	2014-10-05 16:14:29 +00:00
Elena Demikhovsky	0be5f4deeb	AVX-512-SKX: Added instruction VPMOVM2B/W/D/Q. This instruction allows broadcasting a mask vector to a data vector. llvm-svn: 219083	2014-10-05 14:11:08 +00:00
Chandler Carruth	aa7f8c811b	[x86] Fix PR21139, one of the last remaining regressions found in the new vector shuffle lowering. This is loosely based on a patch by Marius Wachtler to the PR (thanks!). I refactored it a bit to use std::count_if and a mutable array ref but the core idea was exactly right. I also added some direct testing of this case. I believe PR21137 is now the only remaining regression. llvm-svn: 219081	2014-10-05 12:07:34 +00:00
Chandler Carruth	b8978f2ab2	[x86] Teach the new vector shuffle lowering how to lower 128-bit shuffles using AVX and AVX2 instructions. This fixes PR21138, one of the few remaining regressions impacting benchmarks from the new vector shuffle lowering. You may note that it "regresses" many of the vperm2x128 test cases -- these were actually "improved" by the naive lowering that the new shuffle lowering previously did. This regression gave me fits. I had this patch ready-to-go about an hour after flipping the switch but wasn't sure how to have the best of both worlds here and thought the correct solution might be a completely different approach to lowering these vector shuffles. I'm now convinced this is the correct lowering and the missed optimizations shown in vperm2x128 are actually due to missing target-independent DAG combines. I've even written most of the needed DAG combine and will submit it shortly, but this part is ready and should help some real-world benchmarks out. llvm-svn: 219079	2014-10-05 11:41:36 +00:00
NAKAMURA Takumi	242f8cc95b	HexagonMCCodeEmitter.cpp: Prune 2nd redundant \brief. [-Wdocumentation] llvm-svn: 219073	2014-10-05 04:54:54 +00:00
NAKAMURA Takumi	ad3b39eff9	HexagonDesc: Update LLVMBuild.txt. llvm-svn: 219071	2014-10-05 04:54:29 +00:00
Benjamin Kramer	a961a36aad	[SystemZ] Make operator bool explicit. NFC. llvm-svn: 219069	2014-10-04 22:44:35 +00:00
Benjamin Kramer	860521c88b	Make AAMDNodes ctor and operator bool (!!!) explicit, mop up bugs and weirdness exposed by it. llvm-svn: 219068	2014-10-04 22:44:29 +00:00
Benjamin Kramer	7db3ef45b9	Remove unnecessary copying or replace it with moves in a bunch of places. NFC. llvm-svn: 219061	2014-10-04 16:55:56 +00:00
Chandler Carruth	5063f25595	[x86] Enable the new vector shuffle lowering by default. Update the entire regression test suite for the new shuffles. Remove most of the old testing which was devoted to the old shuffle lowering path and is no longer relevant really. Also remove a few other random tests that only really exercised shuffles and only incidentally or without any interesting aspects to them. Benchmarking that I have done shows a few small regressions with this on LNT, zero measurable regressions on real, large applications, and for several benchmarks where the loop vectorizer fires in the hot path it shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy Bridge machines. Running on AMD machines shows even more dramatic improvements. When using newer ISA vector extensions the gains are much more modest, but the code is still better on the whole. There are a few regressions being tracked (PR21137, PR21138, PR21139) but by and large this is expected to be a win for x86 generated code performance. It is also more correct than the code it replaces. I have fuzz tested this extensively with ISA extensions up through AVX2 and found no crashes or miscompiles (yet...). The old lowering had a few miscompiles and crashers after a somewhat smaller amount of fuzz testing. There is one significant area where the new code path lags behind and that is in AVX-512 support. However, there was extremely little support for that already and so this isn't a significant step backwards and the new framework will probably make it easier to implement lowering that uses the full power of AVX-512's table-based shuffle+blend (IMO). Many thanks to Quentin, Andrea, Robert, and others for benchmarking assistance. Thanks to Adam and others for help with AVX-512. Thanks to Hal, Eric, and many others for answering my incessant questions about how the backend actually works. =] I will leave the old code path in the tree until the 3 PRs above are at least resolved to folks' satisfaction. Then I will rip it (and 1000s of lines of code) out. =] I don't expect this flag to stay around for very long. It may not survive next week. llvm-svn: 219046	2014-10-04 03:52:55 +00:00
Jingyue Wu	4a186967a9	Add fake use to suppress defined-but-unused warnings llvm-svn: 219045	2014-10-04 03:50:10 +00:00
Chandler Carruth	7001fb9ace	[x86] Fix a bug in the VZEXT DAG combine that I just made more powerful. It turns out this combine was always somewhat flawed -- there are cases where nested VZEXT nodes can't be combined: if their types have a mismatch that can be observed in the result. While none of these show up currently, once I switch to the new vector shuffle lowering a few test cases actually form such nested VZEXT nodes. I've not come up with any IR pattern that I can sensibly write to exercise this, but it will be covered by tests once I flip the switch. llvm-svn: 219044	2014-10-04 02:51:03 +00:00
Chandler Carruth	b73b4f12a1	[x86] Sink a generic combine of VZEXT nodes from the lowering to VZEXT nodes to the DAG combining of them. This will allow the combine to fire on both old vector shuffle lowering and the new vector shuffle lowering and generally seems like a cleaner design. I've trimmed down the code a bit and tried to make it and the surrounding combine fairly clean while moving it around. llvm-svn: 219042	2014-10-04 01:05:48 +00:00
Matt Arsenault	c421684bad	R600/SI: Custom lower f64 -> i64 conversions llvm-svn: 219038	2014-10-03 23:54:56 +00:00
Matt Arsenault	7b24655980	R600: Custom lower [s|u]int_to_fp for i64 -> f64 llvm-svn: 219037	2014-10-03 23:54:41 +00:00
Matt Arsenault	2456242394	R600/SI: Fix ftrunc f64 conformance failures. Re-add the tests since they were deleted at some point llvm-svn: 219036	2014-10-03 23:54:27 +00:00
Chandler Carruth	74c4b81b56	[x86] Add a really preposterous number of patterns for matching all of the various ways in which blends can be used to do vector element insertion for lowering with the scalar math instruction forms that effectively re-blend with the high elements after performing the operation. This then allows me to bail on the element insertion lowering path when we have SSE4.1 and are going to be doing a normal blend, which in turn restores the last of the blends lost from the new vector shuffle lowering when I got it to prioritize insertion in other cases (for example when we don't have a blend instruction). Without the patterns, using blends here would have regressed sse-scalar-fp-arith.ll completely with the new vector shuffle lowering. For completeness, I've added RUN-lines with the new lowering here. This is somewhat superfluous as I'm about to flip the default, but hey, it shows that this actually significantly changed behavior. The patterns I've added are just ridiculously repetitive. Suggestions on making them better very much welcome. In particular, handling the commuted form of the v2f64 patterns is somewhat obnoxious. llvm-svn: 219033	2014-10-03 22:43:17 +00:00
Chandler Carruth	13d884e744	[x86] Adjust the patterns for lowering X86vzmovl nodes which don't perform a load to use blendps rather than movss when it is available. For non-loads, blendps is much faster. It can execute on two ports in Sandy Bridge and Ivy Bridge, and three ports on Haswell. This fixes one of the "regressions" from aggressively taking the "insertion" path in the new vector shuffle lowering. This does highlight one problem with blendps -- it isn't commuted as heavily as it should be. That's future work though. llvm-svn: 219022	2014-10-03 21:38:49 +00:00
Richard Smith	e3953b4126	PR21145: Teach LLVM about C++14 sized deallocation functions. C++14 adds new builtin signatures for 'operator delete'. This change allows new/delete pairs to be removed in C++14 onwards, as they were in C++11 and before. llvm-svn: 219014	2014-10-03 20:17:06 +00:00
Adam Nemet	3fe531df7b	[ISel] Keep matching state consistent when folding during X86 address match In the X86 backend, matching an address is initiated by the 'addr' complex pattern and its friends. During this process we may reassociate and-of-shift into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the shift into the scale of the address. However as demonstrated by the testcase, this can trigger CSE of not only the shift and the AND which the code is prepared for but also the underlying load node. In the testcase this node is sitting in the RecordedNode and MatchScope data structures of the matcher and becomes a deleted node upon CSE. Returning from the complex pattern function, we try to access it again hitting an assert because the node is no longer a load even though this was checked before. Now obviously changing the DAG this late is bending the rules but I think it makes sense somewhat. Outside of addresses we prefer and-of-shift because it may lead to smaller immediates (FoldMaskAndShiftToScale is an even better example because it creates a non-canonical node). We currently don't recognize addresses during DAGCombiner where arguably this canonicalization should be performed. On the other hand, having this in the matcher allows us to cover all the cases where an address can be used in an instruction. I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible for initiating the recursive CSE on users (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it is not strictly necessary since the shift is hooked into the visited user. Of course it's safer to keep the DAG consistent at all times (e.g. for accurate number of uses, etc.). So rather than changing the fundamentals, I've decided to continue along the previous patches and detect the CSE. 
This patch installs a very targeted DAGUpdateListener for the duration of a complex-pattern match and updates the matching state accordingly. (Previous patches used HandleSDNode to detect the CSE but that's not practical here). The listener is only installed on X86. I tested that there is no measurable overhead due to this while running through the spec2k BC files with llc. The only thing we pay for is the creation of the listener. The callback never ever triggers in spec2k since this is a corner case. Fixes rdar://problem/18206171 llvm-svn: 219009	2014-10-03 20:00:34 +00:00
Tom Stellard	e64393bd73	R600: Align functions to 256 bytes llvm-svn: 219002	2014-10-03 19:02:02 +00:00
Benjamin Kramer	4c9fb3d669	Eliminate some deep std::vector copies. NFC. llvm-svn: 218999	2014-10-03 18:33:16 +00:00
Robin Morisset	95772cea0c	[Power] Use lwsync for non-seq_cst fences Summary: hwsync is only required for seq_cst fences; acquire and release fences can use the cheaper lwsync. Test Plan: Added some cases to atomics.ll + make check-all Reviewers: jfb, wschmidt Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5317 llvm-svn: 218995	2014-10-03 18:04:36 +00:00
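The rule in the lwsync commit can be summarized as a mapping from fence ordering to barrier. The C++ sketch below is hypothetical: `powerFenceFor` is not the PPC backend's lowering code, and it uses `std::memory_order` as a stand-in for LLVM's atomic orderings. It returns the POWER mnemonic: `sync` (hwsync) only for seq_cst, and `lwsync` for the weaker acquire/release-style orderings.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical mapping from a fence ordering to the POWER barrier to emit.
std::string powerFenceFor(std::memory_order order) {
  switch (order) {
    case std::memory_order_relaxed:
      return "";  // a relaxed fence imposes no ordering, so nothing is emitted
    case std::memory_order_seq_cst:
      return "sync";  // hwsync: full barrier, needed only for seq_cst
    default:  // consume, acquire, release, acq_rel
      return "lwsync";
  }
}
```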
Hans Wennborg	56b634e6ac	MipsAsmParser.cpp: fix VS2012 build llvm-svn: 218991	2014-10-03 17:16:24 +00:00
Hans Wennborg	52ee146fd6	HexagonMCCodeEmitter.h: deleted member functions are not supported in VS2012 llvm-svn: 218990	2014-10-03 17:02:28 +00:00
Daniel Sanders	208d0fa4ef	[mips] Print warning when using register names not available in N32/64 Summary: The register names t4-t7 are not available in the N32 and N64 ABIs. This patch prints a warning, when those names are used in N32/64, along with a fix-it with the correct register names. Patch by Vasileios Kalintiris Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5272 llvm-svn: 218989	2014-10-03 15:37:37 +00:00
Sid Manning	d00c41c965	Fix build break on Hexagon Differential Revision: http://reviews.llvm.org/D5600 llvm-svn: 218987	2014-10-03 13:59:01 +00:00
Sid Manning	4435e352ae	Adding skeleton for unit testing Hexagon Code Emission Adding and modifying CMakeLists.txt files to run unit tests under unittests/Target/* if the directory exists. Adding basic unit test to check that code emitter object can be retrieved. Differential Revision: http://reviews.llvm.org/D5523 Change by: Colin LeMahieu llvm-svn: 218986	2014-10-03 13:18:11 +00:00
Chandler Carruth	72b6e493f2	[x86] Teach the new vector shuffle lowering to aggressively form MOVSS and MOVSD nodes for single element vector inserts. This is particularly important because a number of patterns in the backend detect these patterns and leverage them to simplify things. It also fixes quite a few of the insertion bad code examples. However, it regresses a specific area: when available, blendps and blendpd are dramatically faster than movss and movsd respectively. But it doesn't really work to form the blend logic first because the blends aren't as crazy efficient when the data is coming from memory anyways, and thus will have a movss or movsd regardless. Also, doing that would block a bunch of the patterns that this is designed to hit. So my plan is to go into the patterns for lowering MOVSS and MOVSD and lower them via blends when available. However that's a pretty invasive restructuring so it will need to be a follow-up patch. I have already gone into the patterns to lower MOVSS and MOVSD from memory using MOVLPD, etc. Without that, several of the test cases I already have regress. llvm-svn: 218985	2014-10-03 13:11:13 +00:00
Renato Golin	aea9ec6761	Revert 202433 - Provide a target override for the latest regalloc heuristic That commit was introduced in order to help investigate a problem in ARM codegen breaking from commit 202304 (Add a limit to the heuristic that register allocates instructions in local order). Recent analysis indicated that the problem no longer exists, so I'm reverting this change. See PR18996. llvm-svn: 218981	2014-10-03 12:20:53 +00:00
Chandler Carruth	8fe3dae595	[x86] Refactor the element insertion logic in the new vector shuffle lowering to handle the potential mirroring of 2-element vectors (because we can't reliably sort them one way) in the caller rather than in the insertion logic. This will simplify things considerably as more ways to fail to match the insertion are added because now we have a nice try and retry point. llvm-svn: 218980	2014-10-03 12:01:55 +00:00
Chandler Carruth	4c00e4aeb0	[x86] Significantly improve the ability of the new vector shuffle lowering to match VZEXT_MOVL patterns. I hadn't realized that these had sufficient pattern smarts in the backend to lower zext-ing from the low element of a vector without it being a scalar_to_vector node. They do, and this is how to match a bunch of patterns for movq, movss, etc. There is a weird propensity to end up using pshufd to place the element afterward even though it means domain crossing (or rather, to use xorps+movss to zext the element rather than movq) but that's an orthogonal problem with VZEXT_MOVL that someone should probably look at. llvm-svn: 218977	2014-10-03 11:25:58 +00:00
Chandler Carruth	9f743353a1	[x86] Unbreak SSE1 with the new vector shuffle lowering. We can't widen element types to form illegal vector types. I've added a special SSE1 test case here that makes sure we don't break this going forward. llvm-svn: 218974	2014-10-03 10:11:39 +00:00
Eric Christopher	47c04156aa	constify TargetMachine parameter. llvm-svn: 218934	2014-10-03 00:42:41 +00:00
Eric Christopher	2cffcde1ca	constify TargetMachine argument. llvm-svn: 218930	2014-10-03 00:17:59 +00:00
Eric Christopher	f48003a13a	We can grab the options struct from the TargetMachine, no need to pass it down in the constructor. llvm-svn: 218929	2014-10-03 00:10:03 +00:00
Adam Nemet	9a91f09952	[AVX512] Pull pattern for subvector insert into the instruction definition No functional change intended. Very similar to the change I made for subvector extract in r218480. test/CodeGen/X86/avx512-insert-extract.ll covers this. llvm-svn: 218928	2014-10-02 23:18:30 +00:00
Adam Nemet	bcff277351	[AVX512] Refactor subvector inserts No functional change. Very similar to the extract refactoring I did in r218478. Compared X86.td.expanded before and after. llvm-svn: 218927	2014-10-02 23:18:28 +00:00
Adam Nemet	36d7986f4b	[AVX512] Fix i256mem->f256mem typo in VINSERTF64x4rm Just like in the case of extracts, the refactoring is uncovering some typos in the code. llvm-svn: 218926	2014-10-02 23:18:26 +00:00
Hal Finkel	2093a3cb26	[PowerPC] Modern Book-E cores support sync Older Book-E cores, such as the PPC 440, support only msync (which has the same encoding as sync 0), but not any of the other sync forms. Newer Book-E cores, however, do support sync, and for performance reasons we should allow the use of the more-general form. This refactors msync use into its own feature group so that it applies by default only to older Book-E cores (of the relevant cores, we only have definitions for the PPC440/450 currently). llvm-svn: 218923	2014-10-02 22:34:22 +00:00
Robin Morisset	8895df3e75	[Power] Improve the expansion of atomic loads/stores Summary: Atomic loads and stores of up to the native size (32 bits, or 64 for PPC64) can be lowered to a simple load or store instruction (as the synchronization is already handled by AtomicExpand, and the atomicity is guaranteed thanks to the alignment requirements of atomic accesses). This is exactly what this patch does. Previously, these were implemented by complex load-linked/store-conditional loops, an obvious performance problem. For example, this patch turns ``` define void @store_i8_unordered(i8* %mem) { store atomic i8 42, i8* %mem unordered, align 1 ret void } ``` from ``` _store_i8_unordered: ; @store_i8_unordered ; BB#0: rlwinm r2, r3, 3, 27, 28 li r4, 42 xori r5, r2, 24 rlwinm r2, r3, 0, 0, 29 li r3, 255 slw r4, r4, r5 slw r3, r3, r5 and r4, r4, r3 LBB4_1: ; =>This Inner Loop Header: Depth=1 lwarx r5, 0, r2 andc r5, r5, r3 or r5, r4, r5 stwcx. r5, 0, r2 bne cr0, LBB4_1 ; BB#2: blr ``` into ``` _store_i8_unordered: ; @store_i8_unordered ; BB#0: li r2, 42 stb r2, 0(r3) blr ``` which looks like a pretty clear win to me. Test Plan: fixed the tests + new test for indexed accesses + make check-all Reviewers: jfb, wschmidt, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5587 llvm-svn: 218922	2014-10-02 22:27:07 +00:00
Juergen Ributzka	2eab1ef77e	[Stackmaps] Make the frame pointer required for stackmaps. Do not eliminate the frame pointer if there is a stackmap or patchpoint in the function. All stackmap references should be FP-relative. This fixes PR21107. llvm-svn: 218920	2014-10-02 22:21:49 +00:00
Chandler Carruth	960ea9802e	[x86] Teach the new vector shuffle lowering to widen floating point elements as well as integer elements in order to form simpler shuffle patterns. This is the primary reason why we were failing to match some of the 2-and-2 floating point shuffles such as PR21140. Even after fixing this we need to support some extra patterns in the backend in order to match the resulting X86ISD::UNPCKL nodes into the correct instructions. This commit should fix PR21140 and includes more comprehensive testing of insertion patterns in v4 shuffles. Not all of the added tests are beautiful. For example, we don't have clever instructions to insert-via-load in the integer domain. There are also some places where we aren't sufficiently cunning with our use of movq and movd, but that's future work. llvm-svn: 218911	2014-10-02 21:37:14 +00:00
Tilmann Scheller	7617d101f0	[NVPTX] Remove dead code. Found by the Clang static analyzer. llvm-svn: 218874	2014-10-02 15:12:48 +00:00
Joerg Sonnenberger	ba894344cc	Support padding unaligned data in .text. llvm-svn: 218870	2014-10-02 13:41:42 +00:00
Chandler Carruth	3f933250d5	[x86] Improve and correct how the new vector shuffle lowering was matching and lowering 64-bit insertions. The first problem was that we weren't looking through bitcasts to discover that we could lower as insertions. Once fixed, we in turn weren't looking through bitcasts to discover that we could fold a load into the lowering. Once fixed, we weren't forming a SCALAR_TO_VECTOR node around the inserted element and instead were passing a scalar to a DAG node that expected a vector. It turns out there are some patterns that will "lower" this into the correct asm, but the rest of the X86 backend is very unhappy with such antics. This should fix a few more edge case regressions I've spotted going through the regression test suite to enable the new vector shuffle lowering. llvm-svn: 218839	2014-10-01 23:14:28 +00:00
Eric Christopher	3eb7c19a39	constify the TargetMachine argument used in the subtarget and lowering constructors. llvm-svn: 218832	2014-10-01 21:36:28 +00:00
Sanjay Patel	ae0d32c510	Lower FNEG ( FABS (x) ) -> FNABS (x) [X86 codegen] PR20578 Negative FABS of either a scalar or vector should be handled the same way on x86 with SSE/AVX: a single OR instruction of the FP operand with a constant to light up the sign bit(s). http://llvm.org/bugs/show_bug.cgi?id=20578 Differential Revision: http://reviews.llvm.org/D5201 llvm-svn: 218822	2014-10-01 21:20:06 +00:00
Eric Christopher	bce38d60f8	Now that the optimization level is adjusting the feature string before we hit the subtarget, remove the constructor parameter. llvm-svn: 218817	2014-10-01 21:05:35 +00:00
Eric Christopher	4ce55b7a5c	Rework the PPC TargetMachine so that the non-function specific overrides happen at TargetMachine creation and not on every subtarget creation. llvm-svn: 218805	2014-10-01 20:38:26 +00:00
Eric Christopher	5245670b4d	constify TargetMachine parameter for X86TargetLowering. llvm-svn: 218804	2014-10-01 20:38:22 +00:00


30587 Commits