llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 19:12:56 +02:00

Author	SHA1	Message	Date
Krzysztof Parzyszek	97022d6a05	Revert r277372, it is causing buildbot failures llvm-svn: 277374	2016-08-01 20:00:33 +00:00
Krzysztof Parzyszek	b9313bbdd2	[Hexagon] Tidy up some code, NFC llvm-svn: 277372	2016-08-01 19:46:21 +00:00
Ron Lieberman	84b9473461	[Hexagon] Generate vector printing instructions llvm-svn: 277370	2016-08-01 19:36:39 +00:00
Evandro Menezes	fad8ba7058	[AArch64] Add support for Samsung Exynos M2 (NFC). llvm-svn: 277364	2016-08-01 18:39:45 +00:00
Krzysztof Parzyszek	09fa692bab	Replace MachineInstr* with MachineInstr& in TargetInstrInfo, NFC There were a few cases introduced with the modulo scheduler. llvm-svn: 277358	2016-08-01 17:55:48 +00:00
Krzysztof Parzyszek	3495701817	[Hexagon] Check for offset overflow when reserving scavenging slots Scavenging slots were only reserved when pseudo-instruction expansion in frame lowering created new virtual registers. It is possible to still need a scavenging slot even if no virtual registers were created, in cases where the stack is large enough to overflow instruction offsets. llvm-svn: 277355	2016-08-01 17:15:30 +00:00
Daniel Sanders	7f483c4eac	[mips][fastisel] Correct argument lowering for (f64, f64, i32) and similar. Summary: Allocating an AFGR64 shadows two GPR32's instead of just one. This fixes an LNT regression detected by our internal buildbots. Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: https://reviews.llvm.org/D23012 llvm-svn: 277348	2016-08-01 15:32:51 +00:00
Valery Pykhtin	448b27ff70	[AMDGPU] refactor DS instruction definitions. NFC. Differential revision: https://reviews.llvm.org/D22522 llvm-svn: 277344	2016-08-01 14:21:30 +00:00
Simon Pilgrim	cb0190856f	[X86] Use implicit masking of SHLD/SHRD shift double instructions Similar to the regular shift instructions, SHLD/SHRD only use the bottom bits of the shift value llvm-svn: 277341	2016-08-01 12:11:43 +00:00
Simon Pilgrim	1bdbfa9b79	Fixed MSVC out of range shift warning llvm-svn: 277333	2016-08-01 09:40:38 +00:00
Diana Picus	04ff51966a	[AArch64] Return the correct size for TLSDESC_CALLSEQ The branch relaxation pass is computing the wrong offsets because it assumes TLSDESC_CALLSEQ eats up 4 bytes, when in fact it is lowered to an instruction sequence taking up 16 bytes. This can become a problem in huge files with lots of TLS accesses, as it may slowly move branch targets out of the range computed by the branch relaxation pass. Fixes PR24234 https://llvm.org/bugs/show_bug.cgi?id=24234 Differential Revision: https://reviews.llvm.org/D22870 llvm-svn: 277331	2016-08-01 08:38:49 +00:00
Craig Topper	07c52fed4f	[AVX-512] Fix duplicate column in AVX512 execution dependency table that was preventing VMOVDQU32/VMOVDQA32 from being recognized. Fix a bug in the code that stops execution dependency fix from turning operations on 32-bit integer element types into operations on 64-bit integer element types. llvm-svn: 277327	2016-08-01 07:55:33 +00:00
Hrvoje Varga	c3ccf9f835	[mips] Clang generates unaligned offset for MSA instruction st.d Differential Revision: https://reviews.llvm.org/D19475 llvm-svn: 277323	2016-08-01 06:46:20 +00:00
Diana Picus	5efe040582	[AArch64] Register passes so they can be run by llc Initialize all AArch64-specific passes in the TargetMachine so they can be run by llc. This can lead to conflicts in opt with some command line options that share the same name as the pass, so I took this opportunity to do some cleanups: * rename all relevant command line options from "aarch64-blah" to "aarch64-enable-blah" and update the tests accordingly * run clang-format on their declarations * move all these declarations to a common place (the TargetMachine) as opposed to having them scattered around (AArch64BranchRelaxation and AArch64AddressTypePromotion were the only offenders) llvm-svn: 277322	2016-08-01 05:56:57 +00:00
Craig Topper	bda76c3945	[AVX-512] Teach X86InstrInfo::getLargestLegalSuperClass to inflate to FR32X/FR64X if AVX512 is supported and VR128X/VR256X if VLX is supported. Had to update a stack folding test to clobber the other 16 registers since this now made them get used instead of spilling. llvm-svn: 277321	2016-08-01 05:31:50 +00:00
Craig Topper	c63b152f5c	[AVX-512] Use FR32X/FR64X/VR128X/VR256X register classes in addRegisterClass if AVX512(for FR32X/FR64) or VLX(for VR128X/VR256) is supported. This is a minimal requirement to be able to allocate all 32 registers. llvm-svn: 277319	2016-08-01 04:29:13 +00:00
Craig Topper	446695a637	[X86] Move mask register handling into the main switch of getLoadStoreRegOpcode. No functional change intended. llvm-svn: 277318	2016-08-01 04:29:11 +00:00
Craig Topper	3af1cffebb	[X86] Simplify code for determining GR or FR reg classes by querying for super classes instead of manually listing individual classes. llvm-svn: 277306	2016-07-31 20:20:08 +00:00
Craig Topper	d31b93d0ed	[AVX512] Always use EVEX encodings for 128/256-bit move instructions in getLoadStoreRegOpcode if VLX is supported. llvm-svn: 277305	2016-07-31 20:20:05 +00:00
Craig Topper	841911ae72	[AVX512] Add VLX packed move instructions to the execution dependency fix pass and update tests. llvm-svn: 277304	2016-07-31 20:20:01 +00:00
Craig Topper	aa2754c787	[AVX512] Move FR32X/FR64X handling in getLoadStoreRegOpcode into the main switch. No functional change intended. llvm-svn: 277303	2016-07-31 20:19:55 +00:00
Craig Topper	ee8cfa497f	[AVX512] Stop treating VR512 specially in getLoadStoreRegOpcode and use the regular switch which already tried to handle it, but was unreachable. This has the added benefit of enabling aligned loads/stores if the stack is aligned. llvm-svn: 277302	2016-07-31 20:19:53 +00:00
Craig Topper	0e4bbd6b7f	[AVX512] Add X86::VR512RegClassID to X86RegisterInfo::getLargestLegalSuperClass. llvm-svn: 277301	2016-07-31 20:19:50 +00:00
Simon Pilgrim	6d81f830ac	[X86] Improve 64-bit shifts on 32-bit targets (PR14593) As discussed on PR14593, this patch adds support for lowering to SHLD/SHRD from the patterns generated by DAGTypeLegalizer::ExpandShiftWithKnownAmountBit. Differential Revision: https://reviews.llvm.org/D23000 llvm-svn: 277299	2016-07-31 19:50:45 +00:00
Craig Topper	01f8dc5886	[AVX-512] Don't let ExeDependencyFix pass convert VPANDD/Q to VPANDPS/PD unless DQI instructions are supported. Same for ANDN, OR, and XOR. Thanks to Igor Breger for pointing out my mistake. llvm-svn: 277292	2016-07-31 17:15:07 +00:00
Elena Demikhovsky	2a8fe9f6a8	AVX-512: Removed AssertZext node before TRUNCATE Removed AssertZext node, which was inserted between X86ISD::SETCC and "truncate to i1". Differential Revision: https://reviews.llvm.org/D22850 llvm-svn: 277289	2016-07-31 06:48:01 +00:00
Davide Italiano	dde6d4cdb8	[HexagonConstPropagation] Remove dead code. llvm-svn: 277285	2016-07-30 22:07:21 +00:00
Davide Italiano	f7847d4aa2	[HexagonBitSimplify] Remove dead code. llvm-svn: 277284	2016-07-30 22:07:18 +00:00
Davide Italiano	27137e1e36	[ARMConstantIslandPass] Remove dead code. llvm-svn: 277283	2016-07-30 22:07:15 +00:00
Simon Pilgrim	e14241934c	Strip trailing whitespace llvm-svn: 277280	2016-07-30 20:53:21 +00:00
Simon Pilgrim	00061d74dc	[X86] Use peekThroughOneUseBitcasts helper function llvm-svn: 277279	2016-07-30 20:51:26 +00:00
Simon Pilgrim	ac21df5892	[X86][SSE] Let 64-bit targets use the fast 2i32-2f32 UINT_TO_FP conversion as well as 32-bit The 2i32-2i64 legalization means that we can use the slightly quicker double bits + fptrunc approach for the same results llvm-svn: 277271	2016-07-30 14:06:59 +00:00
Benjamin Kramer	a3bbbe39f0	[Hexagon] Perform bit arithmetic on unsigned to avoid accidentally shifting negative values. Found by ubsan. llvm-svn: 277268	2016-07-30 13:25:37 +00:00
Benjamin Kramer	92669bed6f	[X86] Fix lifetime of SMRange temporaries. Found by asan -fsanitize-address-use-after-scope. llvm-svn: 277266	2016-07-30 11:31:24 +00:00
Benjamin Kramer	8392a2e47e	[AMDGPU] Fix lifetime of SmallVector temporaries. Found by asan -fsanitize-address-use-after-scope. llvm-svn: 277265	2016-07-30 11:31:16 +00:00
Matt Arsenault	0ab6a487e0	AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior This should really be true for any immediate, not just inline ones. llvm-svn: 277260	2016-07-30 01:40:36 +00:00
Matt Arsenault	fc3cf0b620	AMDGPU: Set s_setpc_b64 as a terminator llvm-svn: 277259	2016-07-30 01:40:34 +00:00
Matt Arsenault	b7e78f5026	AMDGPU: Remove unused pattern llvm-svn: 277258	2016-07-30 01:40:30 +00:00
Tim Northover	f236d3a8f9	GlobalISel: support translation of intrinsic calls. These come in two variants for now: G_INTRINSIC and G_INTRINSIC_W_SIDE_EFFECTS. We may decide to split the latter up with finer-grained restrictions later, if necessary. llvm-svn: 277224	2016-07-29 22:32:36 +00:00
Krzysztof Parzyszek	846bd7b26a	[Hexagon] Referencify MachineInstr in HexagonInstrInfo, NFC llvm-svn: 277220	2016-07-29 21:49:42 +00:00
Michael Kuperstein	20f3abeefd	[X86] Match PSADBW in straight-line code Up until now, we only had code to match PSADBW patterns that look like what comes out of the loop vectorizer - a partial reduction inside the loop body that gets fed into a horizontal operation in a different basic block. This adds support for straight-line patterns, like those generated by the SLP vectorizer. Differential Revision: https://reviews.llvm.org/D22889 llvm-svn: 277219	2016-07-29 21:45:51 +00:00
Simon Pilgrim	23592c87e8	[X86][AVX] Fix VBROADCASTF128 selection bug (PR28770) Support for lowering to VBROADCASTF128 etc. in D22460 was not correctly ensuring that the only users of the 128-bit vector load were the insertions of the vector into the lower/upper subvectors. llvm-svn: 277214	2016-07-29 21:05:10 +00:00
Tim Northover	dda86274a2	CodeGen: add new "intrinsic" MachineOperand kind. This will be used during GlobalISel, where we need a more robust and readable way to write tests than a simple immediate ID. llvm-svn: 277209	2016-07-29 20:32:59 +00:00
Simon Pilgrim	39c59ab4a4	Fixed MSVC out of range shift warning llvm-svn: 277195	2016-07-29 18:43:59 +00:00
Krzysztof Parzyszek	3a5bc2df22	Revert r277178, the actual change had already been applied Will submit another patch with the testcase only. llvm-svn: 277180	2016-07-29 17:50:47 +00:00
Krzysztof Parzyszek	f5f51e9c74	[Hexagon] Misaligned loads and stores are not fast The DAG combiner tries to merge stores to adjacent vector wide memory locations by creating stores which are integral multiples of the vector width. Discourage this by informing it that this is slow. This should not affect legalization passes, because all of them ignore the "Fast" argument. Patch by Pranav Bhandarkar. llvm-svn: 277178	2016-07-29 17:45:16 +00:00
Tim Northover	2f025adfb6	CodeGen: improve MachineInstrBuilder & MachineIRBuilder interface For MachineInstrBuilder, having to manually use RegState::Define is ugly and makes register definitions clunkier than they need to be, so this adds two convenience functions: addDef and addUse. For MachineIRBuilder, we want to avoid BuildMI's first-reg-is-def rule because it's hidden away and causes bugs. So this patch switches buildInstr to returning a MachineInstrBuilder and adding all operands via addDef/addUse. NFC. llvm-svn: 277176	2016-07-29 17:43:52 +00:00
Ahmed Bougacha	999a09ced0	[AArch64][GlobalISel] Select G_XOR. llvm-svn: 277173	2016-07-29 16:56:25 +00:00
Ahmed Bougacha	bedaf830ba	[AArch64][GlobalISel] Select G_LOAD/G_STORE. Mostly straightforward as we ignore addressing modes and just use the base + unsigned immediate offset (always 0) variants. This currently fails to select extloads because we have yet to agree on a representation. llvm-svn: 277171	2016-07-29 16:56:16 +00:00
Brendon Cahoon	e37295579e	MachinePipeliner pass that implements Swing Modulo Scheduling Software pipelining is an optimization for improving ILP by overlapping loop iterations. Swing Modulo Scheduling (SMS) is an implementation of software pipelining that attempts to reduce register pressure and generate efficient pipelines with a low compile-time cost. This implementation of SMS is a target-independent back-end pass. When enabled, the pass should run just prior to the register allocation pass, while the machine IR is in SSA form. If the pass is successful, then the original loop is replaced by the optimized loop. The optimized loop contains one or more prolog blocks, the pipelined kernel, and one or more epilog blocks. This pass is enabled for Hexagon only. To enable for other targets, a couple of target specific hooks must be implemented, and the pass needs to be called from the target's TargetMachine implementation. Differential Review: http://reviews.llvm.org/D16829 llvm-svn: 277169	2016-07-29 16:44:44 +00:00
Krzysztof Parzyszek	ce9680792b	[Hexagon] Custom lower VECTOR_SHUFFLE and EXTRACT_SUBVECTOR for HVX If the mask of a vector shuffle has alternating odd or even numbers starting with 1 or 0 respectively up to the largest possible index for the given type in the given HVX mode (single or double) we can generate vpacko or vpacke instruction respectively. E.g. %42 = shufflevector <32 x i16> %37, <32 x i16> %41, <32 x i32> <i32 1, i32 3, ..., i32 63> is %42.h = vpacko(%41.w, %37.w) Patch by Pranav Bhandarkar. llvm-svn: 277168	2016-07-29 16:44:27 +00:00
Krzysztof Parzyszek	4ca53a9c57	[Hexagon] Improve balancing of address calculation Rebalances address calculation trees and applies Hexagon-specific optimizations to the trees to improve instruction selection. Patch by Tobias Edler von Koch. llvm-svn: 277151	2016-07-29 15:15:35 +00:00
David L Kreitzer	717b5d713d	Avoid unnecessary 32-bit to 64-bit zero extensions following 32-bit CMOV instructions on x86_64. The 32-bit CMOV implicitly zero extends. Differential Revision: https://reviews.llvm.org/D22941 llvm-svn: 277148	2016-07-29 15:09:54 +00:00
Krzysztof Parzyszek	f6c7e61c5f	Fix license information in the file header llvm-svn: 277145	2016-07-29 14:04:17 +00:00
Krzysztof Parzyszek	6ac252b6e2	Add missing files to r277143 llvm-svn: 277144	2016-07-29 13:59:55 +00:00
Krzysztof Parzyszek	ef4e9bde37	[Hexagon] Implement DFA based hazard recognizer The post register allocator scheduler can generate poor schedules because the scoreboard hazard recognizer is unable to identify hazards for Hexagon precisely. Instead, Hexagon should use a DFA based hazard recognizer. Patch by Brendon Cahoon. llvm-svn: 277143	2016-07-29 13:59:09 +00:00
Daniel Sanders	92f45597fa	Re-commit: [mips][fastisel] Handle 0-4 arguments without SelectionDAG. Summary: Implements fastLowerArguments() to avoid the need to fall back on SelectionDAG for 0-4 argument functions that don't do tricky things like passing double in a pair of i32's. This allows us to move all except one test to -fast-isel-abort=3. The remaining one has function prototypes of the form 'i32 (i32, double, double)' which requires floats to be passed in GPR's. The previous commit had an uninitialized variable that caused the incoming argument region to have undefined size. This has been fixed. Reviewers: sdardis Subscribers: dsanders, llvm-commits, sdardis Differential Revision: https://reviews.llvm.org/D22680 llvm-svn: 277136	2016-07-29 12:27:28 +00:00
Simon Pilgrim	511d5dc0be	[X86][SSE] Optimize the truncation of vector comparison results with PACKSS We currently default to using either generic shuffles or MASK+PACKUS/PACKSS to truncate all integer vectors. For vector comparisons, we know that the result will be either all or zero bits in every element, which can be efficiently truncated by directly using PACKSS to repeatedly halve the size of each element. Due to the limited input values (-1 or 0) we don't need to account for vector element size, so for simplicity we just use the PACKSS(vXi16,vXi16) implementation in all cases. Additionally for AVX2 PACKSS of 256bit data we must perform a PERMQ shuffle to reorder the data into the correct order. I did investigate performing a single shuffle after all the PACKSS calls but the need to cross 128bit lanes makes this difficult to achieve efficiently. We avoid performing this on AVX512 as it should have better alternative truncation instructions. Differential Revision: https://reviews.llvm.org/D22814 llvm-svn: 277132	2016-07-29 10:23:10 +00:00
Simon Pilgrim	8fa33ce6ce	Fixed MSVC out of range shift warning llvm-svn: 277130	2016-07-29 10:03:39 +00:00
Sjoerd Meijer	fd7049c574	Fix for commit rL277126 that broke a build. llvm-svn: 277129	2016-07-29 09:57:37 +00:00
Prakhar Bahuguna	01b47c1da5	[Thumb] Emit Thumb move in both Thumb modes for struct_byval predicates Summary: The MOV/MOVT instructions being chosen for struct_byval predicates was conditional only on Thumb2, resulting in an ARM MOV/MOVT instruction being incorrectly emitted in Thumb1 mode. This is especially apparent with v8-m.base targets. This patch ensures that Thumb instructions are emitted in both Thumb modes. Reviewers: rengolin, t.p.northover Subscribers: llvm-commits, aemerson, rengolin Differential Revision: https://reviews.llvm.org/D22865 llvm-svn: 277128	2016-07-29 09:16:46 +00:00
Jacques Pienaar	08d08634c4	[lanai] Update for Target API (TargetRegistry::RegisterMCAsmBackend) change llvm-svn: 277127	2016-07-29 08:50:23 +00:00
Sjoerd Meijer	f6deb69730	TargetInstrInfo: add virtual function getInstSizeInBytes This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of subclasses already implement. Differential Revision: https://reviews.llvm.org/D22885 llvm-svn: 277126	2016-07-29 08:16:16 +00:00
Craig Topper	72bf22eca3	[AVX512] Mark EVEX VMOVSSrm and VMOVSDrm as canFoldAsLoad and isReMaterializable. llvm-svn: 277120	2016-07-29 06:06:04 +00:00
Craig Topper	e3f3eaac43	[AVX512] Copy the patterns that recognize scalar arithmetic operations inserting into the lower element of a packed vector from AVX/SSE so that we can use EVEX encoded instructions. llvm-svn: 277119	2016-07-29 06:06:00 +00:00
David Majnemer	93c48d55ce	[ConstantFolding] Teach the folder how to fold ConstantVector A ConstantVector can have ConstantExpr operands and vice versa. However, the folder had no ability to fold ConstantVectors which, in some cases, was an optimization barrier. Instead, rephrase the folder in terms of Constants instead of ConstantExprs and teach callers how to deal with failure. llvm-svn: 277099	2016-07-29 03:27:26 +00:00
Craig Topper	c6646be4b6	[AVX512] Remove the intrinsic forms of VMOVSS/VMOVSD. We don't need two different forms of 'rr' and 'rm'. This matches SSE/AVX. I'm not convinced the patterns for the rm_Int were correct anyway. It had a tied source that shouldn't exist for the unmasked version. The load form of MOVSS always zeros the most significant bits. I've left the patterns off the masked load instructions as I'm not sure what the correct pattern should be and we don't have any tests currently. Nor do we implement masked scalar load intrinsics in clang currently. llvm-svn: 277098	2016-07-29 02:49:08 +00:00
Changpeng Fang	cefc6be193	AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB. Differential Revision: http://reviews.llvm.org/D22021 Reviewed by: arsenm llvm-svn: 277073	2016-07-28 23:01:45 +00:00
Krzysztof Parzyszek	4a69e8723d	Fix build breaks after r277028 llvm-svn: 277031	2016-07-28 20:25:21 +00:00
Krzysztof Parzyszek	fe4b956a14	[Hexagon] Implement MI-level constant propagation llvm-svn: 277028	2016-07-28 20:01:59 +00:00
Krzysztof Parzyszek	7e077bcfe5	[Hexagon] Insert CFI instructions before throwing calls Normally, CFI instructions should be inserted after allocframe, but if allocframe is in the same packet with a call, the CFI instructions should be inserted before that packet. llvm-svn: 277020	2016-07-28 19:13:46 +00:00
Matthias Braun	91722d430e	MachineFunction: Return reference for getFrameInfo(); NFC getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017	2016-07-28 18:40:00 +00:00
Ahmed Bougacha	4873a7eca3	[AArch64][GlobalISel] Select G_BR. This is the first unsized instruction we support; move down the 'sized' check to binops. llvm-svn: 277007	2016-07-28 17:15:15 +00:00
Ahmed Bougacha	49c737bbe8	[AArch64][GlobalISel] Select GPR G_SUB. llvm-svn: 277003	2016-07-28 16:58:35 +00:00
Ahmed Bougacha	4c64e893fb	[AArch64][GlobalISel] Select GPR G_AND. llvm-svn: 277002	2016-07-28 16:58:31 +00:00
Ahmed Bougacha	64129e5a43	[GlobalISel] Remove types on selected insts instead of using LLT(). LLT() has a particular meaning: it's one invalid type. But we really want selected instructions to have no type whatsoever. Also verify that types don't linger after ISel, and enable the verifier on the AArch64 select test. llvm-svn: 277001	2016-07-28 16:58:27 +00:00
Wei Ding	ce5d57c9f9	AMDGPU : Add intrinsics for compare with the full wavefront result Differential Revision: http://reviews.llvm.org/D22482 llvm-svn: 276998	2016-07-28 16:42:13 +00:00
Sjoerd Meijer	ac97b58407	TargetInstrInfo: rename GetInstSizeInBytes to getInstSizeInBytes. NFC Differential Revision: https://reviews.llvm.org/D22925 llvm-svn: 276997	2016-07-28 16:32:22 +00:00
Daniel Sanders	8748011a34	[mips] Fix a warning that occurs on some gcc 4.9.2's but not all of them. llvm-svn: 276993	2016-07-28 15:59:06 +00:00
Daniel Sanders	0548b674d7	Revert r276982 and r276984: [mips][fastisel] Handle 0-4 arguments without SelectionDAG It seems that the stack offset in callabi.ll varies between machines. I'll look into it. llvm-svn: 276989	2016-07-28 15:37:42 +00:00
Craig Topper	9332f50e72	[X86] Remove CustomInserter for FMA3 instructions. Looks like since we got full commuting support for FMAs after this was added, the coalescer can now get this right on its own. Differential Revision: https://reviews.llvm.org/D22799 llvm-svn: 276987	2016-07-28 15:28:56 +00:00
Daniel Sanders	e4df16d417	[mips] Reword debug message as should have been done before committing r276982 llvm-svn: 276984	2016-07-28 15:13:23 +00:00
Daniel Sanders	78d6f98568	[mips][fastisel] Handle 0-4 arguments without SelectionDAG. Summary: Implements fastLowerArguments() to avoid the need to fall back on SelectionDAG for 0-4 argument functions that don't do tricky things like passing double in a pair of i32's. This allows us to move all except one test to -fast-isel-abort=3. The remaining one has function prototypes of the form 'i32 (i32, double, double)' which requires floats to be passed in GPR's. Reviewers: sdardis Subscribers: dsanders, llvm-commits, sdardis Differential Revision: https://reviews.llvm.org/D22680 llvm-svn: 276982	2016-07-28 14:55:28 +00:00
Tom Stellard	b70225cf83	AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling Summary: We were using reserved VGPRs for SGPR spilling and this was causing some programs with a workgroup size of 1024 to use more than 64 registers, which is illegal. Reviewers: arsenm, mareko, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22032 llvm-svn: 276980	2016-07-28 14:30:43 +00:00
Nicolai Haehnle	ca079d3c9f	AMDGPU: add execfix flag to SI_ELSE Summary: SI_ELSE is lowered into two parts: s_or_saveexec_b64 dst, src (at the start of the basic block) s_xor_b64 exec, exec, dst (at the end of the basic block) The idea is that dst contains the exec mask of the preceding IF block. It can happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside the basic block that contains SI_ELSE, in which case it introduces an instruction s_and_b64 exec, exec, s[...] which masks out bits that can correspond to both the IF and the ELSE paths. So the resulting sequence must be: s_or_saveexec_b64 dst, src s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode s_and_b64 dst, dst, exec <-- added by SILowerControlFlow s_xor_b64 exec, exec, dst Whether to add the additional s_and_b64 dst, dst, exec is currently determined via the ExecModified tracking. With this change, it is instead determined by an additional flag on SI_ELSE which is set by SIWholeQuadMode. Finally: It also occurred to me that an alternative approach for the long run is for SILowerControlFlow to unconditionally emit s_or_saveexec_b64 dst, src ... s_and_b64 dst, dst, exec s_xor_b64 exec, exec, dst and have a pass that detects and cleans up the "redundant AND with exec" pattern where possible. This could be useful anyway, because we also add instructions s_and_b64 vcc, exec, vcc before s_cbranch_scc (in moveToALU), and those are often redundant. I have some pending changes to how KILL is lowered that could also benefit from such a cleanup pass. In any case, this current patch could help in the short term with the whole ExecModified business. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22846 llvm-svn: 276972	2016-07-28 11:39:24 +00:00
Zijiao Ma	b0f00268cf	Add unittests to {ARM \| AArch64}TargetParser. Add unittests to {ARM \| AArch64}TargetParser, and along the way correct the problems below: 1. Correct an incorrect indexing problem in AArch64TargetParser. The architecture enumeration is shared across ARM and AArch64 in the original implementation. But in the code, I just used the index which was offset by the ARM, and this would index into the array incorrectly. Give AArch64 its own arch enum, or we will do a lot of slow iterating. 2. Correct a spelling error in the parameter of llvm::AArch64::getArchExtName. 3. Correct a writing mistake in llvm::ARM::parseArchISA. Differential Revision: https://reviews.llvm.org/D21785 llvm-svn: 276957	2016-07-28 06:11:18 +00:00
Matt Arsenault	cb2032b588	AMDGPU: Turn dead checks into asserts llvm-svn: 276946	2016-07-28 00:32:05 +00:00
Matt Arsenault	6e95fdbb3e	AMDGPU: Remove analyzeImmediate This no longer uses the more complicated classification of constants. llvm-svn: 276945	2016-07-28 00:32:02 +00:00
Krzysztof Parzyszek	ca2db7e8ff	[Hexagon] Find speculative loop preheader in hardware loop generation Before adding a new preheader block, check if there is a candidate block where the loop setup could be placed speculatively. This will be off by default. llvm-svn: 276919	2016-07-27 21:20:54 +00:00
Michael Kuperstein	079b49ff59	[X86] Factor out another piece of the SAD combine. NFCI. llvm-svn: 276918	2016-07-27 20:59:51 +00:00
Krzysztof Parzyszek	cf1ee6fb72	[Hexagon] Add option to bisect spill slot optimization llvm-svn: 276917	2016-07-27 20:58:43 +00:00
Krzysztof Parzyszek	8aa44efbb6	[Hexagon] Do not optimize volatile stack spill slots llvm-svn: 276916	2016-07-27 20:50:42 +00:00
Krzysztof Parzyszek	359322393a	[Hexagon] Handle extended versions of restore routines llvm-svn: 276903	2016-07-27 18:47:25 +00:00
Duncan P. N. Exon Smith	4f68a7939a	XCore: Avoid implicit iterator conversions, NFC Avoid implicit conversions from MachineInstrBundleIterator to MachineInstr*, mainly by preferring MachineInstr& over MachineInstr*. llvm-svn: 276899	2016-07-27 18:14:38 +00:00
Nirav Dave	b6cc023169	[MC][X86] Fix Intel Operand assembly parsing for .set ids Fix intel syntax special case identifier operands that refer to a constant (e.g. .set <ID> n) to be interpreted as immediate not memory in parsing. Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22585 llvm-svn: 276895	2016-07-27 17:39:41 +00:00
Krzysztof Parzyszek	b179d77b11	[Hexagon] Add saved callee-saved registers as live-in in non-wrapped blocks The callee-saved registers that are saved in a function are not pristine, and so they can be defined and used. In case of shrink-wrapping though, there are blocks that are outside of the save/restore range, and in those blocks the saved registers must be treated as pristine. To avoid any uses of these registers, add them as live-in in all those blocks. This was already done for blocks reaching function exits after restore, add code that does the same for blocks reached from the function entry before save. llvm-svn: 276886	2016-07-27 16:26:39 +00:00
Reid Kleckner	d444e6bbd6	Remove MCAsmInfo.h include from TargetOptions.h TargetOptions wants the ExceptionHandling enum. Move that to MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in clang. Now you can add a DWARF tag without a full rebuild of clang semantic analysis. llvm-svn: 276883	2016-07-27 16:03:57 +00:00
Diana Picus	82a9864745	Typo fix. NFC llvm-svn: 276879	2016-07-27 15:13:25 +00:00
Ahmed Bougacha	fdc59ed6fb	[GlobalISel] Introduce an instruction selector. And implement it for AArch64, supporting x/w ADD/OR. Differential Revision: https://reviews.llvm.org/D22373 llvm-svn: 276875	2016-07-27 14:31:55 +00:00
Ahmed Bougacha	4599d4c26b	[AArch64] Mark various *Info classes as 'final'. NFC. llvm-svn: 276874	2016-07-27 14:31:46 +00:00
Ahmed Bougacha	58eca032c9	[AArch64] Define AArch64RegisterInfo as a class, not a struct. NFC. llvm-svn: 276873	2016-07-27 14:31:40 +00:00
Daniel Sanders	1eaf2533cd	[mips][ias] Check '$rs = $rd' constraints when both registers are in AsmText. Summary: This is one possible solution to the problem of ignoring constraints that Simon raised in D21473 but it's a bit of a hack. The integrated assembler currently ignores violations of the tied register constraints when the operands involved in a tie are both present in the AsmText. For example, 'dati $rs, $rt, $imm' with the '$rs = $rt' will silently replace $rt with $rs. So 'dati $2, $3, 1' is processed as if the user provided 'dati $2, $2, 1' without any diagnostic being emitted. This is difficult to solve properly because there are multiple parts of the matcher that are silently forcing these constraints to be met. Tied operands are rendered to instructions by cloning previously rendered operands but this is unnecessary because the matcher was already instructed to render the operand it would have cloned. This is also unnecessary because earlier code has already replaced the MCParsedOperand with the one it was tied to (so the parsed input is matched as if it were 'dati <RegIdx 2>, <RegIdx 2>, <Imm 1>'). As a result, it looks like fixing this properly amounts to a rewrite of the tied operand handling which affects all targets. This patch however, merely inserts a checking hook just before the substitution of MCParsedOperands and the Mips target overrides it. It's not possible to accurately check the registers are the same this early (because numeric registers haven't been bound to a register class yet) so it cheats a bit and checks that the tokens that produced the operand are lexically identical. This works because tied registers need to have the same register class but it does have a flaw. It will reject 'dati $4, $a0, 1' for violating the constraint even though $a0 ends up as the same register as $4. Reviewers: sdardis Subscribers: dsanders, llvm-commits, sdardis Differential Revision: https://reviews.llvm.org/D21994 llvm-svn: 276867	2016-07-27 13:49:44 +00:00
Nemanja Ivanovic	5b3b0e70fd	[PowerPC] Fix typo in PPCHazardRecognizers.cpp Fixes PR28731. llvm-svn: 276865	2016-07-27 13:24:54 +00:00
Duncan P. N. Exon Smith	6b7344a728	PowerPC: Avoid implicit iterator conversions, NFC Avoid implicit conversions from MachineInstrBundleIterator to MachineInstr* in the PowerPC backend, mainly by preferring MachineInstr& over MachineInstr* when a pointer isn't nullable and using range-based for loops. There was one piece of questionable code in PPCInstrInfo::AnalyzeBranch, where a condition checked a pointer converted from an iterator for nullptr. Since this case is impossible (moreover, the code above guarantees that the iterator is valid), I removed the check when I changed the pointer to a reference. Despite that case, there should be no functionality change here. llvm-svn: 276864	2016-07-27 13:24:16 +00:00
Renato Golin	c15ac98803	[ARM] Set a non-conflicting comment character for assembly in MSVC mode Currently, for ARMCOFFMCAsmInfoMicrosoft, no comment character is set, thus the default, '#', is used. The hash character doesn't work as comment character in ARM assembly, since '#' is used for immediate values. The comment character is set to ';', which is the comment character used by MS armasm.exe. (The microsoft armasm.exe uses a different directive syntax than what LLVM currently supports though, similar to ARM's armasm.) This allows inline assembly with immediate constants to be built (and brings the assembly output from clang -S closer to being possible to assemble). A test is added that verifies that ';' is correctly interpreted as comments in this mode, and verifies that assembling code that includes literal constants with a '#' works. Patch by Martin Storsjö. llvm-svn: 276859	2016-07-27 12:31:58 +00:00
Matt Arsenault	7c41fae233	AMDGPU: Use rcp for fdiv 1, x with fpmath metadata Using rcp should be OK for safe math usually, so this should not be replacing the original fdiv. llvm-svn: 276823	2016-07-26 23:25:44 +00:00
Matt Arsenault	3d9fa5a776	AMDGPU: Use implicit_def for selecting anyext llvm-svn: 276819	2016-07-26 23:06:33 +00:00
Matt Arsenault	bef5ca03b3	AMDGPU/R600: Remove dead custom inserters The intrinsics for these were removed, so this is dead. llvm-svn: 276805	2016-07-26 21:03:38 +00:00
Matt Arsenault	23a632a892	AMDGPU: Minor AsmPrinter cleanups llvm-svn: 276804	2016-07-26 21:03:36 +00:00
Krzysztof Parzyszek	e6e3bb2045	[Hexagon] Post-increment loads/stores enhancements - Generate vector post-increment stores more aggressively. - Predicate post-increment and vector stores in early if-conversion. llvm-svn: 276800	2016-07-26 20:30:30 +00:00
Michael Kuperstein	05ba3d25fa	[X86] Split out absdiff detection from SAD combine. NFC. Preparation for supporting PSADBW emission for straight-line code. llvm-svn: 276798	2016-07-26 20:01:29 +00:00
Krzysztof Parzyszek	0984ca425e	[Hexagon] Gracefully handle reg class mismatch in HexagonLoopReschedule llvm-svn: 276793	2016-07-26 19:17:13 +00:00
Krzysztof Parzyszek	97d3aa900c	[Hexagon] Rerun bit tracker on new instructions in RIE Consider this case: vreg1 = A2_zxth vreg0 (1) ... vreg2 = A2_zxth vreg1 (2) Redundant instruction elimination could delete the instruction (1) because the user (2) only cares about the low 16 bits. Then it could delete (2) because the input is already zero-extended. The problem is that the properties allowing each individual instruction to be deleted depend on the existence of the other instruction, so either one can be deleted, but not both. The existing check for this situation in RIE was insufficient. The fix is to update all dependent cells when an instruction is removed (replaced via COPY) in RIE. llvm-svn: 276792	2016-07-26 19:08:45 +00:00
Krzysztof Parzyszek	6d72583592	[Hexagon] Bitwise operations for insert/extract word not simplified Change the bit simplifier to generate REG_SEQUENCE instructions in addition to COPY, which will handle cases of word insert/extract. llvm-svn: 276787	2016-07-26 18:30:11 +00:00
Krzysztof Parzyszek	587906f308	[Hexagon] Add support for proper handling of H and L constraints H -> High part of reg pair. L -> Low part of reg pair. Patch by Sundeep Kushwaha. llvm-svn: 276773	2016-07-26 17:31:02 +00:00
Matt Arsenault	1c48278fe0	AMDGPU: Make AMDGPUMachineFunction fields private ABIArgOffset is a problem because properly setting the KernArgSize requires that the reserved area before the real kernel arguments be correctly aligned, which requires fixing clover. llvm-svn: 276766	2016-07-26 16:45:58 +00:00
Matt Arsenault	d60a4f902f	AMDGPU: Add fp legacy instruction intrinsics This could use some additional optimization work to use mad/mac legacy. llvm-svn: 276764	2016-07-26 16:45:45 +00:00
Daniel Sanders	4aa9b69215	[mips] Fix typos in spelling of lowerRETURNADDR. The first letter was mistakenly capitalized. llvm-svn: 276753	2016-07-26 14:46:11 +00:00
Krzysztof Parzyszek	63073c87c3	[Hexagon] Update store offset when not packetizing it with allocframe When the packetizer wants to put a store to a stack slot in the same packet with an allocframe, it updates the store offset to reflect the value of SP before it is updated by allocframe. If the store cannot be packetized with the allocframe after all, the offset needs to be updated back to the previous value. llvm-svn: 276749	2016-07-26 14:24:46 +00:00
Oliver Stannard	2003c0b073	[ARM] Improve error messages for .arch_extension directive - More informative message when extension name is not an identifier token. - Stop parsing directive if extension is unknown (avoid duplicate error messages). - Report unsupported extensions with a source location, rather than report_fatal_error. Differential Revision: https://reviews.llvm.org/D22806 llvm-svn: 276748	2016-07-26 14:24:43 +00:00
Oliver Stannard	150d7b2d23	[ARM] Implement -mimplicit-it assembler option This option, compatible with gas's -mimplicit-it, controls the generation/checking of implicit IT blocks in ARM/Thumb assembly. This option allows two behaviours that were not possible before: - When in ARM mode, emit a warning when assembling a conditional instruction that is not in an IT block. This is enabled with -mimplicit-it=never and -mimplicit-it=thumb. - When in Thumb mode, automatically generate IT instructions when an instruction with a condition code appears outside of an IT block. This is enabled with -mimplicit-it=thumb and -mimplicit-it=always. The default option is -mimplicit-it=arm, which matches the existing behaviour (allow conditional ARM instructions outside IT blocks without warning, and error if a conditional Thumb instruction is outside an IT block). The general strategy for generating IT blocks in Thumb mode is to keep a small list of instructions which should be in the IT block, and only emit them when we encounter something in the input which means we cannot continue the block. This could be caused by: - A non-predicable instruction - An instruction with a condition not compatible with the IT block - The IT block already contains 4 instructions - A branch-like instruction (including ALU instructions with the PC as the destination), which cannot appear in the middle of an IT block - A label (branching into an IT block is not legal) - A change of section, architecture, ISA, etc - The end of the assembly file. Some of these, such as change of section and end of file, are parsed outside of the ARM asm parser, so I've added a new virtual function to AsmParser to ensure any previously-parsed instructions have been emitted. The ARM implementation of this flushes the currently pending IT block. 
We now have to try instruction matching up to 3 times, because we cannot know if the current IT block is valid before matching, and instruction matching changes depending on the IT block state (due to the 16-bit ALU instructions, which set the flags iff not in an IT block). In the common case of not having an open implicit IT block and the instruction being matched not needing one, we still only have to run the matcher once. I've removed the ITState.FirstCond variable, because it does not store any information that isn't already represented by CurPosition. I've also updated the comment on CurPosition to accurately describe its meaning (which this patch doesn't change). Differential Revision: https://reviews.llvm.org/D22760 llvm-svn: 276747	2016-07-26 14:19:47 +00:00
Simon Pilgrim	583bb759ad	[X86][SSE] Fixed issue with memory folding of (v)cvtsd2ss intrinsics Fixed typo in the intrinsic definitions of (v)cvtsd2ss with memory folding. This was only unearthed when rL276102 started using the intrinsic again..... llvm-svn: 276740	2016-07-26 10:41:28 +00:00
Simon Dardis	f4e81c479c	[mips] MIPS64R6 compact branch support MIPS64R6 compact branch support. As the MIPS LLVM backend uses distinct MachineInstrs for certain 32 and 64 bit instructions (e.g. BEQ & BEQ64) that map to the same instruction, extend compact branch support for the corresponding 64bit branches. Reviewers: dsanders Differential Revision: https://reviews.llvm.org/D20164 llvm-svn: 276739	2016-07-26 10:25:07 +00:00
Simon Pilgrim	5565482780	Fixed spelling in comment llvm-svn: 276738	2016-07-26 09:55:31 +00:00
Simon Dardis	0fc0478ed5	[mips] sgtu, s[rl]l, sra, dnegu, neg instruction aliases Add the instruction alias sgtu (register form only), two operand forms of s[rl]l and sra, and missing single/two operand forms of dnegu/neg. Reviewers: dsanders Differential Revision: https://reviews.llvm.org/D22752 llvm-svn: 276736	2016-07-26 09:13:46 +00:00
Craig Topper	955f9414b8	[X86] Remove isCommutable=1 from instructions that also load. Commuting such instructions isn't useful as it would unfold the load. The exception being FMA3 instructions. llvm-svn: 276733	2016-07-26 08:06:18 +00:00
Craig Topper	f0c4de855b	[AVX512] Don't mark ADDSSZr_Int or MULSSZr_Int as commutable. The intrinsics have one of their arguments indicated as passing through the high bits and we can't commute that. llvm-svn: 276732	2016-07-26 08:06:14 +00:00
Renato Golin	57cdb5383f	[ARM] Saturation instructions are DSP-only The saturation instructions appeared in v6T2, with DSP extensions, but they were being accepted / generated on any core, with the new introduction of the saturation detection in the back-end. This commit restricts the usage to DSP-enabled cores only. Fixes PR28607. llvm-svn: 276701	2016-07-25 22:25:25 +00:00
David Blaikie	36a632d19c	[WebAssembly] Update for Target API (TargetRegistry::RegisterMCAsmBackend) change llvm-svn: 276694	2016-07-25 21:41:42 +00:00
Tim Northover	2cf541095a	GlobalISel[AArch64]: support pointer types in argument lowering. They're basically i64 for AArch64, but we'll leave them intact for stranger targets. Also add some tests for the (very few) other cases we can handle right now. llvm-svn: 276689	2016-07-25 21:01:17 +00:00
Jan Vesely	4192bbafb1	AMDGPU: Remove read_workdim intrinsic Differential revision: https://reviews.llvm.org/D22732 llvm-svn: 276682	2016-07-25 20:17:02 +00:00
Matt Arsenault	d97bd5a03b	AMDGPU: Make skip threshold an option llvm-svn: 276680	2016-07-25 19:48:29 +00:00
Matt Arsenault	842a619c4a	AMDGPU: Delete dead code llvm-svn: 276675	2016-07-25 19:06:25 +00:00
Joel Jones	b127877693	[MC] Provide an MCTargetOptions to implementors of MCAsmBackendCtorTy, NFC Some targets, notably AArch64 for ILP32, have different relocation encodings based upon the ABI. This is an enabling change, so a future patch can use the ABIName from MCTargetOptions to choose which relocations to use. Tested using check-llvm. The corresponding change to clang is in: http://reviews.llvm.org/D16538 Patch by: Joel Jones Differential Revision: https://reviews.llvm.org/D16213 llvm-svn: 276654	2016-07-25 17:18:28 +00:00
Elena Demikhovsky	c1d11ed3e5	AVX-512: Fixed [US]INT_TO_FP selection for i1 vectors. It failed with assertion before this patch. Differential Revision: https://reviews.llvm.org/D22735 llvm-svn: 276648	2016-07-25 16:51:00 +00:00
Krzysztof Parzyszek	9b5d973061	[Hexagon] Add target feature to generate long calls llvm-svn: 276638	2016-07-25 14:42:11 +00:00
Sam Parker	00a189f236	[ARM] Improve longMAC codegen test Added thumb targets and dataflow checks to the longMAC test. Differential Revision: https://reviews.llvm.org/D22684 llvm-svn: 276629	2016-07-25 10:11:00 +00:00
Simon Dardis	ec8cfdd35b	[mips] Optimize materialization of i64 constants Avoid MipsAnalyzeImmediate usage if the constant fits in a 32-bit integer. This allows us to generate the same instructions for the materialization of the same constants regardless of the width of their type. Patch by: Vasileios Kalintiris Contributions by: Simon Dardis Reviewers: Daniel Sanders Differential Review: https://reviews.llvm.org/D21689 llvm-svn: 276628	2016-07-25 09:57:28 +00:00
Sam Parker	4b055f7546	[ARM] Small refactor of Thumb2 SMLA insts Follow up to r276624. Changes bits 22-20 to be parameters to instruction class. Differential Revision: https://reviews.llvm.org/D22562 llvm-svn: 276626	2016-07-25 09:29:24 +00:00
Sam Parker	4b5db242de	[ARM] Enable ISel of SMMLS for ARM and Thumb2 Use ISelDAGToDAG to recognise the SMMLS instruction pattern. Differential Revision: https://reviews.llvm.org/D22562 llvm-svn: 276624	2016-07-25 09:20:20 +00:00
Craig Topper	7dbacc292f	[AVX512] Add load folding support for the unmasked forms of the FMA instructions. llvm-svn: 276615	2016-07-25 07:20:35 +00:00
Craig Topper	70f4ea4f4f	[AVX512] Add some additional patterns so that we can fold broadcast loads in the first argument of an FMADD/FMSUB/FNMADD/FNMSUB/FMADDSUB/FMSUBADD node. Also add patterns to support all combinations of the broadcast input and the preserved input for masked versions. llvm-svn: 276614	2016-07-25 07:20:31 +00:00
Craig Topper	f835e69f53	[AVX512] Cleanup FMA operand order in patterns to match the VEX versions and to really be 213, 231, and 132. llvm-svn: 276613	2016-07-25 07:20:28 +00:00
Simon Pilgrim	d864569c65	[X86] Add 'FeatureSlowSHLD' to cpu 'bdver4' As with all AMD CPUs, excavator has poor SHLD/SHRD performance. Also added bdver3 to the test as it was missing. llvm-svn: 276569	2016-07-24 16:00:53 +00:00
Craig Topper	39f037759a	[X86] Make the FMA3 instruction names consistent between VEX and EVEX encoded versions. This places the 132/213/231 form number in front of the SS/SD/PS/PD. Move the Y for 256-bit versions to be after the PS/PD. Change the AVX512 scalar forms to include a Z in their name. This new format should be consistent with the general naming of instructions. llvm-svn: 276559	2016-07-24 08:26:38 +00:00
Craig Topper	444f6fe7c2	[X86] Replace CodeGenOnly VPSRAVW/D/Q_Int instructions with patterns since the operand types exactly match the normal VPSRAVW/D/Q instructions. llvm-svn: 276555	2016-07-24 07:32:45 +00:00
Craig Topper	f933b29548	[X86] Fix typo in comment. llvm-svn: 276528	2016-07-23 16:44:08 +00:00
Chandler Carruth	d2c904150f	Fix a GCC error due to this member name also being a type name. This should fix the build with GCC 4.9 at least. Not sure if this is the right name or fix, but I've followed up on the original commit. llvm-svn: 276522	2016-07-23 07:50:05 +00:00
Craig Topper	2b1e6a9706	[AVX512] Implement commuting support for EVEX encoded FMA3 instructions. llvm-svn: 276521	2016-07-23 07:16:56 +00:00
Craig Topper	fdc847eeff	[X86] Make one of the FMA3 commuting methods static. Remove a call to isFMA3 just to get the IsIntrinsic flag, instead get it during the first call and pass it along. NFC llvm-svn: 276520	2016-07-23 07:16:53 +00:00


38767 Commits