Commit Graph

11301 Commits

Sanjay Patel
019be40c8c Enable FeatureFastUAMem for btver2
Allow unaligned 16-byte memop codegen for btver2. No functional changes for any other subtargets.
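As a rough sketch (not from the commit; function and value names are hypothetical), a small memcpy with no alignment information, like the one below, can now be lowered on btver2 as a single unaligned 16-byte load/store pair (e.g. movups):

  declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1)

  define void @copy16(i8* %dst, i8* %src) {
    ; 16 bytes, align 1: previously split into smaller accesses on btver2
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 16, i32 1, i1 false)
    ret void
  }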

Replace the existing supposed small memcpy test with an actual test of a small memcpy. 
The previous test wasn't using FileCheck either.

This patch should allow us to close PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6360

llvm-svn: 222925
2014-11-28 18:40:18 +00:00
Tim Northover
e8b34aaff0 AArch64: treat [N x Ty] as a block during procedure calls.
The AAPCS treats small structs and homogeneous floating (or vector) aggregates
specially, and guarantees they either get passed as a contiguous block of
registers, or prevent any future use of those registers and get passed on the
stack.

This concept can fit quite neatly into LLVM's own type system, mapping an HFA
to [N x float] and so on, and small structs to [N x i64]. Doing so allows
front-ends to emit AAPCS compliant code without having to duplicate the
register counting logic.
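For example (an illustrative sketch, not taken from the commit), a front-end could pass an HFA of four floats and a small two-word struct as:

  declare void @takes_hfa([4 x float])
  declare void @takes_small_struct([2 x i64])

Each [N x Ty] argument is then guaranteed to land either entirely in a contiguous block of registers or entirely on the stack.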

llvm-svn: 222903
2014-11-27 21:02:42 +00:00
Charlie Turner
e4a6c7fd89 Stop uppercasing build attribute data.
The string data for string-valued build attributes were being unconditionally
uppercased. There is no mention in the ARM ABI addenda about case conventions,
so it's technically implementation defined as to whether the data are
capitalised in some way or not. However, there are good reasons not to
capitalise the data.

  * It's less work.
  * Some vendors may legitimately have case-sensitive checks for these
    attributes which would fail on LLVM generated object files.
  * There could be locale issues with uppercasing.

The original reasons for uppercasing appear to have stemmed from an
old CodeSourcery toolchain behaviour, see

http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/87133

With this patch the emitted object file no longer capitalises string
data; the data are encoded as seen in the assembly source.

Change-Id: Ibe20dd6e60d2773d57ff72a78470839033aa5538
llvm-svn: 222882
2014-11-27 12:13:56 +00:00
Will Newton
da953f13b9 Update AArch64 ELF relocations to ABI 1.0
This mostly entails adding relocations, however there are a couple of
changes to existing relocations:

1. R_AARCH64_NONE is defined to be zero rather than 256

R_AARCH64_NONE has been defined to be zero for a long time elsewhere
e.g. binutils and glibc since the submission of the AArch64 port in
2012 so this is required for compatibility.

2. R_AARCH64_TLSDESC_ADR_PAGE renamed to R_AARCH64_TLSDESC_ADR_PAGE21

I don't think there is any way for relocation names to leak out of LLVM
so this should not break anything.

Tested with check-all with no regressions.

llvm-svn: 222821
2014-11-26 10:49:18 +00:00
Elena Demikhovsky
868b76ae69 AVX-512: Scalar ERI intrinsics
including SAE mode and memory operand.
Added the AVX512_maskable_scalar template, which should cover all scalar instructions in the future.

The main difference between AVX512_maskable_scalar<> and AVX512_maskable<> is using X86select instead of vselect. I need it because I can't create a vselect node for an MVT::i1 mask for a scalar instruction.

http://reviews.llvm.org/D6378

llvm-svn: 222820
2014-11-26 10:46:49 +00:00
Simon Pilgrim
0e1c44b939 [X86][SSE] Improvements to byte shift shuffle matching
Since the (v)pslldq / (v)psrldq instructions resolve to a single input argument, it is useful to match them much earlier than we currently do - this prevents more complicated shuffles (notably insertion into a zero vector) from matching before them.
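As a sketch of the pattern being matched (illustrative IR, not from the patch), a shuffle that shifts bytes down and fills the tail with zeros corresponds directly to a single psrldq:

  ; equivalent to psrldq $4 on %a: shift right by 4 bytes, zero-fill the top
  %s = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer,
         <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11,
                     i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19>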

Differential Revision: http://reviews.llvm.org/D6409

llvm-svn: 222796
2014-11-25 22:34:59 +00:00
Cameron McInally
c32dadfa69 [AVX512] Add 512b integer shift by variable intrinsics and patterns.
llvm-svn: 222786
2014-11-25 20:41:51 +00:00
Zoran Jovanovic
c3664f6f8a [mips][micromips] Use call instructions with short delay slots
Differential Revision: http://reviews.llvm.org/D6338

llvm-svn: 222752
2014-11-25 10:50:00 +00:00
Juergen Ributzka
c90ddb75a2 [FastISel][AArch64] Fix and extend the tbz/tbnz pattern matching.
The pattern matching failed to recognize all instances of "-1", because when
comparing against "-1" we didn't use an APInt of the same bitwidth.

This commit fixes this and also adds inverse versions of the condition to catch
more cases.

llvm-svn: 222722
2014-11-25 04:16:15 +00:00
Hal Finkel
515f6e50f5 [PowerPC] Implement combineRepeatedFPDivisors
This does not matter on newer cores (where we can use reciprocal estimates in
fast-math mode anyway), but for older cores this allows us to generate better
fast-math code where we have multiple FDIVs with a common divisor.

llvm-svn: 222710
2014-11-24 23:45:21 +00:00
Andrea Di Biagio
3646b17160 [X86] Improved target specific combine on VSELECT dag nodes.
This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to
convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1.
On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd.

Also, removed a target specific combine that performed a premature lowering of
VSELECT nodes to target specific MOVSS/MOVSD nodes.

llvm-svn: 222647
2014-11-24 12:23:15 +00:00
Michael Kuperstein
b12b19a24a [X86] Fixes bug in build_vector v4x32 lowering
r222375 made some improvements to build_vector lowering of v4i32 and v4f32 into an insertps, but it missed a case where:

1. A single extracted element is used twice.
2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask.

This caused a crash, since the source value for the insertps ends up uninitialized.
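A sketch of the problematic shape (illustrative IR, names hypothetical): a single extracted element inserted into two lanes, with the remaining lanes zero:

  %e = extractelement <4 x float> %x, i32 1
  %v0 = insertelement <4 x float> undef, float %e, i32 0
  %v1 = insertelement <4 x float> %v0, float %e, i32 2
  %v2 = insertelement <4 x float> %v1, float 0.0, i32 1
  %v3 = insertelement <4 x float> %v2, float 0.0, i32 3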

Differential Revision: http://reviews.llvm.org/D6377

llvm-svn: 222635
2014-11-23 13:09:06 +00:00
Elena Demikhovsky
36a2243ab7 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
declare <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
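A usage sketch following the shapes declared above (the mask computation is hypothetical):

  %mask = icmp sgt <16 x i32> %trigger, zeroinitializer
  %val = call <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4, <16 x i1> %mask)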

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 222632
2014-11-23 08:07:43 +00:00
Matt Arsenault
1b03538afe R600: Fix extloads of i1 on R600/Evergreen
llvm-svn: 222631
2014-11-23 02:57:54 +00:00
Matt Arsenault
5edd328299 R600/SI: Add additional tests for i1 loads
llvm-svn: 222629
2014-11-23 02:57:50 +00:00
Matt Arsenault
1d9ec67967 R600/SI: Fix broken check lines and modernize prefixes
Use -LABEL and remove -CHECK

llvm-svn: 222628
2014-11-23 02:57:49 +00:00
Matt Arsenault
a30225b261 R600/SI: Fix missing -verify-machineinstrs on a test
llvm-svn: 222627
2014-11-23 02:57:47 +00:00
Chandler Carruth
4c0a2a8001 [x86] Add some tests for a common unpack pattern of vector shuffle that
has a remarkably unique and efficient lowering.

While we get this some of the time already, we miss a few cases and
there wasn't a principled reason we got it. We should at least test
this. v8 already has tests for this pattern.

llvm-svn: 222607
2014-11-22 05:44:43 +00:00
Tom Stellard
3929e69978 R600/SI: Add a failing test case for offset order in ds_read2 instructions
llvm-svn: 222585
2014-11-21 22:31:47 +00:00
Tom Stellard
a112fe4e40 R600/SI: Emit s_mov_b32 m0, -1 before every DS instruction
This s_mov_b32 will write to a virtual register from the M0Reg
class, and all the DS instructions now take an extra explicit M0Reg
argument.

This change is necessary to prevent issues with the scheduler
mixing together instructions that expect different values in the m0
registers.

llvm-svn: 222583
2014-11-21 22:31:44 +00:00
Tom Stellard
484f10138e R600/SI: Add SIFoldOperands pass
This pass attempts to fold the source operands of mov and copy
instructions into their uses.

llvm-svn: 222581
2014-11-21 22:06:37 +00:00
Jozef Kolek
52fa965cf8 [mips][microMIPS] This patch implements functionality in the MIPS delay
slot filler such that if the filler would have to put a NOP instruction into the
delay slot of a microMIPS BEQ or BNE instruction that uses the register $0,
then instead of emitting a NOP the instruction is replaced by the corresponding
microMIPS compact branch instruction, i.e. BEQZC or BNEZC.

Differential Revision: http://reviews.llvm.org/D3566

llvm-svn: 222580
2014-11-21 22:04:35 +00:00
Tom Stellard
267a21d6d7 R600/SI: Use hex notation for constant in test
llvm-svn: 222578
2014-11-21 22:00:13 +00:00
Sanjay Patel
776e5485fb Add a feature flag for slow 32-byte unaligned memory accesses [x86].
This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for 
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.

Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6355

llvm-svn: 222544
2014-11-21 17:40:04 +00:00
Chandler Carruth
5e598c0342 [x86] Restructure the checking patterns for v16 and v32 avx2 vector
shuffle lowering to allow much better blend matching.

Specifically, with the new structure the code seems clearer to me and we
correctly can hit the cases where merging two 128-bit lanes is a clear
win and can be shuffled cheaply afterward.

llvm-svn: 222539
2014-11-21 14:53:03 +00:00
Chandler Carruth
7491f1f32f [x86] Make the previous logic significantly less conservative and get
a bunch more improvements.

Non-lane-crossing is fine, the key is that lane merging only makes sense
for single-input shuffles. Not sure why I got so turned around here. The
code all works, I was just using the wrong model for it.

This only updates v4 and v8 lowering. The v16 and v32 lowering requires
restructuring the entire check sequence.

llvm-svn: 222537
2014-11-21 14:33:24 +00:00
Andrea Di Biagio
0a8cf1ad5a [DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero.
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.

This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.
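For instance (a sketch, not from the commit), a build_vector of the form <a[0], 0.0, a[2], 0.0> can be expressed at the IR level as a shuffle with a zero vector, provided the mask is legal for the target:

  ; lanes 0 and 2 come from %a; lanes 1 and 3 select a zero element
  %r = shufflevector <4 x float> %a, <4 x float> zeroinitializer,
         <4 x i32> <i32 0, i32 4, i32 2, i32 4>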

llvm-svn: 222536
2014-11-21 14:32:06 +00:00
Chandler Carruth
8387bec088 [x86] Teach the x86 vector shuffle lowering to detect mergable 128-bit
lanes.

By special casing these we can often either reduce the total number of
shuffles significantly or reduce the number of (high latency on Haswell)
AVX2 shuffles that potentially cross 128-bit lanes. Even when these
don't actually cross lanes, they have much higher latency to support
that. Doing two of them and a blend is worse than doing a single insert
across the 128-bit lanes to blend and then doing a single interleaved
shuffle.

While this seems like a narrow case, it kept cropping up on me and the
difference is *huge* as you can see in many of the test cases. I first
hit this trying to perfectly fix the interleaving shuffle patterns used
by Halide for AVX2.

llvm-svn: 222533
2014-11-21 13:56:05 +00:00
Chandler Carruth
5646862a2e [x86] Remove more windows line endings that slipped into this file...
llvm-svn: 222528
2014-11-21 12:33:46 +00:00
Chandler Carruth
2db7c4cf32 [x86] Add a bunch of test cases to 256-bit shuffles that exercise
merging 128-bit subvectors and also shuffling all the elements of those
subvectors. Currently we generate pretty bad code for many of these, but
I'm testing a patch that should dramatically improve this in addition to
making the shuffle lowering robust to other changes.

llvm-svn: 222525
2014-11-21 12:17:50 +00:00
Alexey Volkov
235268b4ed [X86] For Silvermont CPU use 16-bit division instead of 64-bit for small positive numbers
Differential Revision: http://reviews.llvm.org/D5938

llvm-svn: 222521
2014-11-21 11:19:34 +00:00
Hao Liu
9cb82be410 DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor into FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)

A hook is added to allow the target to control whether it needs to do such a combine.
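In IR terms the combine looks roughly like this (a sketch; the fast-math flag is required):

  ; before
  %x = fdiv fast float %a, %D
  %y = fdiv fast float %b, %D
  ; after
  %recip = fdiv fast float 1.0, %D
  %x2 = fmul fast float %a, %recip
  %y2 = fmul fast float %b, %recip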

Reviewed in http://reviews.llvm.org/D6334

llvm-svn: 222510
2014-11-21 06:39:58 +00:00
Hal Finkel
ac26448a5c [PPC] Use SeparateConstOffsetFromGEP
This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in
the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on
SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM,
there is a store moved out of the inner loop) and a potential speedup on
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it
makes some code look cleaner, and synchronizing the backends in this regard
seems like a generally good thing.

llvm-svn: 222504
2014-11-21 04:35:51 +00:00
Quentin Colombet
8fea50c066 [X86] Do not custom lower UINT_TO_FP when the target type does not
match the custom lowering.

<rdar://problem/19026326>

llvm-svn: 222489
2014-11-21 00:47:19 +00:00
Saleem Abdulrasool
0de13e90eb X86: use the correct alloca symbol for Windows Itanium
Windows itanium targets the MSVCRT, and the stack probe symbol is provided by
MSVCRT.  This corrects the emission of stack probes on i686-windows-itanium.

llvm-svn: 222439
2014-11-20 18:01:26 +00:00
Andrea Di Biagio
b770dd344e [X86] Improved lowering of v4x32 build_vector dag nodes.
This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes
that are known to have at least two non-zero elements.

With this patch, a build_vector that performs a blend with zero is 
converted into a shuffle. This is done to let the shuffle legalizer expand
the dag node in an optimal way. For example, if we know that a build_vector
performs a blend with zero, we can try to lower it as a movq/blend instead of
always selecting an insertps.

This patch also improves the logic that lowers a build_vector into an insertps
with zero masking. See, for example, the extra test cases added to the sse41.ll test.
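A sketch of the blend-with-zero case (illustrative IR, not from the patch): the build_vector <%a, %b, 0, 0> fills only the low 64 bits and zeros the rest, so it can lower as a movq instead of a chain of insertps:

  %v0 = insertelement <4 x i32> undef, i32 %a, i32 0
  %v1 = insertelement <4 x i32> %v0, i32 %b, i32 1
  %v2 = insertelement <4 x i32> %v1, i32 0, i32 2
  %v3 = insertelement <4 x i32> %v2, i32 0, i32 3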

Differential Revision: http://reviews.llvm.org/D6311

llvm-svn: 222375
2014-11-19 19:34:29 +00:00
Tom Stellard
271c0a936e R600/SI: Make SIInstrInfo::isOperandLegal() more strict
A register operand that has a common sub-class with its instruction's
defined register class is not always legal.  For example,
SReg_32 and M0Reg both have a common sub-class, but we can't
use an SReg_32 in instructions that expect a M0Reg.

This prevents the llvm.SI.sendmsg.ll test from failing when the fold
operand pass is added.

llvm-svn: 222368
2014-11-19 16:58:49 +00:00
Jozef Kolek
d19675f448 [mips][microMIPS] Implement CodeGen support for 16-bit instruction ADDIUR2.
Differential Revision: http://reviews.llvm.org/D5800

llvm-svn: 222352
2014-11-19 13:23:58 +00:00
Jozef Kolek
9fbf00198c [mips][microMIPS] Implement CodeGen support for ADDIUS5 instruction.
Differential Revision: http://reviews.llvm.org/D5799

llvm-svn: 222351
2014-11-19 13:11:09 +00:00
Jozef Kolek
e466cd5b54 [mips][microMIPS] Implement SDBBP and RDHWR instructions.
Differential Revision: http://reviews.llvm.org/D5240

llvm-svn: 222347
2014-11-19 11:25:50 +00:00
Simon Pilgrim
e5f972f1c1 [X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2
This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.

I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.

Differential Revision: http://reviews.llvm.org/D5699

llvm-svn: 222340
2014-11-19 10:06:49 +00:00
Hao Liu
00d285aca3 [AArch64] Enable SeparateConstOffsetFromGEP, EarlyCSE and LICM passes on AArch64 backend.
SeparateConstOffsetFromGEP can give more optimization opportunities related to GEPs, which benefits EarlyCSE
and LICM. By enabling these passes we can have better address calculations and generate better addressing
modes. Some SPEC 2006 benchmarks (astar, gobmk, namd) show obvious improvements on Cortex-A57.
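Roughly, the pass splits a constant offset out of a GEP index so it can fold into the addressing mode (a sketch, using the typed-pointer GEP syntax of this era):

  ; before: the constant 4 is buried in the index computation
  %idx = add i64 %i, 4
  %p = getelementptr inbounds float* %base, i64 %idx
  ; after: the constant becomes its own GEP
  %p0 = getelementptr inbounds float* %base, i64 %i
  %p = getelementptr inbounds float* %p0, i64 4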

Reviewed in http://reviews.llvm.org/D5864.

llvm-svn: 222331
2014-11-19 06:39:53 +00:00
Weiming Zhao
c7ce2ee93f [Aarch64] Custom lowering of CTPOP to SIMD should check for NEON availability
llvm-svn: 222292
2014-11-19 00:29:14 +00:00
Matt Arsenault
73f4bd8758 R600/SI: Implement areMemAccessesTriviallyDisjoint
This partially makes up for not having address spaces
used for alias analysis in some simple cases.

This is not yet enabled by default so shouldn't change anything yet.

llvm-svn: 222286
2014-11-19 00:01:31 +00:00
Simon Pilgrim
daabed160f [X86][AVX] 256-bit vector stack unaligned load/stores identification
Under many circumstances the stack is not 32-byte aligned, resulting in the use of the vmovups/vmovupd/vmovdqu instructions when inserting ymm reloads/spills.

This minor patch adds these instructions to the isFrameLoadOpcode/isFrameStoreOpcode helpers so that they can be correctly identified and not be treated as folded reloads/spills.

This has also been noticed by http://llvm.org/bugs/show_bug.cgi?id=18846 where it was causing redundant spills - I've added a reduced test case at test/CodeGen/X86/pr18846.ll

Differential Revision: http://reviews.llvm.org/D6252

llvm-svn: 222281
2014-11-18 23:38:19 +00:00
Chad Rosier
ccf41a5c21 [FastISel][AArch64] Also allow folding of sign-/zero-extend and arithmetic
shift-right for booleans (i1).

Arithmetic shift-right immediate with sign-/zero-extensions also works for
boolean values.  Update the assert and the test cases to reflect that fact.

llvm-svn: 222272
2014-11-18 22:41:49 +00:00
Chad Rosier
7153154a79 [FastISel][AArch64] Also allow folding of sign-/zero-extend and logical
shift-right for booleans (i1).

Logical shift-right immediate with sign-/zero-extensions also works for boolean
values.  Update the assert and the test cases to reflect that fact.

llvm-svn: 222270
2014-11-18 22:38:42 +00:00
Juergen Ributzka
b3791ee3a7 [FastISel][AArch64] Follow-up fix for "Fix shift-immediate emission for "zero" shifts."
Shifts also perform sign-/zero-extends to larger types, which requires us to emit
an integer extend instead of a simple COPY.

Related to PR21594.

llvm-svn: 222257
2014-11-18 21:20:17 +00:00
Matt Arsenault
84d2214a94 R600/SI: Move SIFixSGPRCopies to inst selector passes
This should expose more of the actually used VALU
instructions to the machine optimization passes.

This also should help getting i1 handling into a better state.
For reasons that are not entirely understood, this fixes the split-scalar-i64-add.ll
test, where a 64-bit add would only partially be moved to the VALU,
resulting in use of an undefined VCC.

llvm-svn: 222256
2014-11-18 21:06:58 +00:00
Tom Stellard
962ccd7f85 R600/SI: Make sure resource descriptors are always stored in SGPRs
llvm-svn: 222253
2014-11-18 20:39:39 +00:00