llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00

Author	SHA1	Message	Date
Michael Zuckerman	306643b672	[Clang][AVX512][intrinsics] Fix rcp and sqrt intrinsics. Differential Revision: http://reviews.llvm.org/D20438 llvm-svn: 270322	2016-05-21 14:44:18 +00:00
Michael Zuckerman	3d906b3ef7	[Clang][AVX512][intrinsics] Fix vscalef intrinsics. Differential Revision: http://reviews.llvm.org/D20324 llvm-svn: 270321	2016-05-21 11:09:53 +00:00
Craig Topper	79cd5a6b41	[AVX512] Add patterns for VEXTRACT v16i16->v8i16 and v32i8->v16i8. Disable AVX2 versions of vector extract when AVX512VL is enabled. llvm-svn: 270318	2016-05-21 07:08:56 +00:00
Craig Topper	f3e023e70e	[AVX512] Disable AVX2 VPERMD, VPERMQ, VPERMPS, and VPERMPD patterns when AVX512VL is enabled. Also add shuffle comment printing for AVX512VL VPERMPD/VPERMQ to keep some tests that now use these instructions instead of the AVX2 ones. llvm-svn: 270317	2016-05-21 06:07:18 +00:00
Craig Topper	30a8fe51db	[AVX512] Disable AVX/AVX2 VBROADCASTSS/VBROADCASTSD patterns when AVX512VL is enabled. llvm-svn: 270316	2016-05-21 05:47:25 +00:00
Craig Topper	12c223c367	[AVX512] Use update_llc_test_checks to update some tests so we can see all the instruction encodings and ensure everything is with EVEX. llvm-svn: 270315	2016-05-21 05:46:58 +00:00
Craig Topper	e4f5166f4f	[AVX512] Fix test cases I missed in r270311. llvm-svn: 270313	2016-05-21 03:59:55 +00:00
Matt Arsenault	c4ee204f5c	AMDGPU: Define priorities for register classes Allocating larger register classes first should give better allocation results (and more importantly for myself, make the lit tests more stable with respect to scheduler changes). Patch by Matthias Braun llvm-svn: 270312	2016-05-21 03:55:07 +00:00
Matt Arsenault	1eaf7c8b10	AMDGPU: Cleanup lowering actions These are kind of a mess and hard to follow, particularly for loads and stores. Fix various redundant, unnecessary and dead settings. llvm-svn: 270307	2016-05-21 02:27:49 +00:00
Matt Arsenault	44230570f6	AMDGPU: Fix high bits after division optimization This is essentially doing a 24-bit signed division with FP. We need to truncate to the N bit result. llvm-svn: 270305	2016-05-21 01:53:33 +00:00
Matt Arsenault	06895e862f	AMDGPU: Fix verifier error when spilling SGPRs The current SGPR spilling test does not stress this because it is using s_buffer_load instructions to increase SGPR pressure and spill, but their output operands have the same SReg_32_XM0 constraint. This fixes an error when the SReg_32 output from most instructions is spilled. llvm-svn: 270301	2016-05-21 00:53:42 +00:00
Matt Arsenault	c34a7d2258	AMDGPU: Handle cbranch vccz/vccnz llvm-svn: 270297	2016-05-21 00:29:40 +00:00
Matt Arsenault	5438a4669d	AMDGPU: Implement ReverseBranchCondition llvm-svn: 270296	2016-05-21 00:29:34 +00:00
Matt Arsenault	a197a65904	AMDGPU: Implement AnalyzeBranch Original patch by Tom Stellard llvm-svn: 270295	2016-05-21 00:29:27 +00:00
Dan Gohman	920c7d7490	[WebAssembly] Optimize away return instructions using fallthroughs. This saves a small amount of code size, and is a first small step toward passing values on the stack across block boundaries. Differential Review: http://reviews.llvm.org/D20450 llvm-svn: 270294	2016-05-21 00:21:56 +00:00
Matthias Braun	13037577f3	LiveIntervalAnalysis: Rework constructMainRangeFromSubranges() We now use LiveRangeCalc::extendToUses() instead of a specially designed algorithm in constructMainRangeFromSubranges(): - The original motivation for constructMainRangeFromSubranges() was differences between the main liverange and subranges because of hidden dead definitions. This case however cannot happen anymore with the DetectDeadLaneMasks pass in place. - It simplifies the code. - This fixes a longstanding bug where we did not properly create new SSA values on merging control flow (the MachineVerifier missed most of these cases). - Move constructMainRangeFromSubranges() to LiveIntervalAnalysis and LiveRangeCalc to better match the implementation/available helper functions. This re-applies r269016. The fixes from r270290 and r270259 should avoid the machine verifier problems this time. llvm-svn: 270291	2016-05-20 23:14:56 +00:00
Matthias Braun	327a7f0867	MachineVerifier: subregs do not require defs/valnos on every path It is fine for subregister ranges to be undefined on some CFG paths as we may have a "vregX:other_subreg<read-undef> =" def on that path. We do not (and should not) have live segments for the subregister ranges. The MachineVerifier should not complain about this. This is a slight variant of http://llvm.org/PR27705 llvm-svn: 270290	2016-05-20 23:02:13 +00:00
Tim Shen	a4d5b2aaf2	[PowerPC] Add a testcase for TCO on string rvo function Differential Revision: http://reviews.llvm.org/D20311 llvm-svn: 270287	2016-05-20 22:42:01 +00:00
Jacques Pienaar	4813cd5255	[lanai] Change reloc to use PIC_ by default and cleanup. * Change reloc to PIC_; * Cleanup (clang-format & modify test); llvm-svn: 270282	2016-05-20 21:41:53 +00:00
Matthias Braun	03d346febe	LiveIntervalAnalysis: Fix missing defs in renameDisconnectedComponents(). Fix renameDisconnectedComponents() creating vreg uses that can be reached from function begin without having a definition (or explicit live-in). Fix this by inserting IMPLICIT_DEF instructions before control-flow joins as necessary. Removes an assert from MachineScheduler because we may now get additional IMPLICIT_DEF when preparing the scheduling policy. This fixes the underlying problem of http://llvm.org/PR27705 llvm-svn: 270259	2016-05-20 19:46:13 +00:00
Jun Bum Lim	3a259859b7	[AArch64] Disable narrow load merge by default Summary: As this optimization converts two loads into one load with two shift instructions, it could potentially hurt performance if a loop is arithmetic operation intensive. Reviewers: t.p.northover, mcrosier, jmolloy Subscribers: evandro, jmolloy, aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D20172 llvm-svn: 270251	2016-05-20 18:45:49 +00:00
Simon Pilgrim	95ba516d50	[X86][AVX] Generalized matching for target shuffle combines This patch is a first step towards a more extendible method of matching combined target shuffle masks. Initially this just pulls out the existing basic mask matches and adds support for some 256/512 bit equivalents. Future patterns will require a number of features to be added but I wanted to keep this patch simple. I hope we can avoid duplication between shuffle lowering and combining and share more complex pattern match functions in future commits. Differential Revision: http://reviews.llvm.org/D19198 llvm-svn: 270230	2016-05-20 16:19:30 +00:00
Simon Pilgrim	a938465a18	[X86][AVX] Sync with clang/test/CodeGen/avx-builtins.c llvm-svn: 270229	2016-05-20 16:05:55 +00:00
Rafael Espindola	3acc1df4cd	Refactor X86 symbol access classification. This refactors the logic in X86 to avoid code duplication. It also splits it in two steps: it first decides if a symbol is local to the DSO and then uses that information to decide how to access it. The first part is implemented by shouldAssumeDSOLocal. It is not in any way specific to X86. In a followup patch I intend to move it to somewhere common and reuse it in other backends. llvm-svn: 270209	2016-05-20 12:20:10 +00:00
Rafael Espindola	8b4b8109e9	Simplify handling of hidden stubs on PowerPC. We now handle them just like non hidden ones. This was already the case on x86 (r207518) and arm (r207517). llvm-svn: 270205	2016-05-20 12:00:52 +00:00
Chris Dewhurst	08a87b67ef	[Sparc] Enable more inline assembly constraints. Note: This is specifically to allow GCC's test pr44707 to pass. Trivial change, not put for differential revision. Test included. llvm-svn: 270192	2016-05-20 09:03:01 +00:00
Craig Topper	3d9d65f12a	[X86] Run the AVX/AVX2 intrinsic tests in AVX512VL mode too just to make sure we don't break any older intrinsics. llvm-svn: 270183	2016-05-20 05:10:32 +00:00
Craig Topper	4b288a1e5a	Revert accidental commit of a test command line addition. llvm-svn: 270175	2016-05-20 02:01:51 +00:00
Craig Topper	195c9b10ae	[X86] Fix some AVX patterns to only be disabled if VLX and BWI are supported. Without this we get isel failures on the avx-intrinsics-x86.ll test in AVX512VL. llvm-svn: 270174	2016-05-20 02:00:08 +00:00
Matthew Simpson	f1715d1306	[ARM, AArch64] Match additional patterns to ldN instructions When matching an interleaved load to an ldN pattern, the interleaved access pass checks that all users of the load are shuffles. If the load is used by an instruction other than a shuffle, the pass gives up and an ldN is not generated. This patch considers users of the load that are extractelement instructions. It attempts to modify the extracts to use one of the available shuffles rather than the load. After the transformation, the load is only used by shuffles and will then be matched with an ldN pattern. Differential Revision: http://reviews.llvm.org/D20250 llvm-svn: 270142	2016-05-19 21:39:00 +00:00
Hans Wennborg	ff73dabfca	X86: Don't reset the stack after calls that don't return (PR27117) Since the calls don't return, the instruction afterwards will never run, and is just taking up unnecessary space in the binary. Differential Revision: http://reviews.llvm.org/D20406 llvm-svn: 270109	2016-05-19 20:15:33 +00:00
Sanjay Patel	8e951ee358	[x86] add tests for urem lowering llvm-svn: 270096	2016-05-19 18:57:54 +00:00
Simon Pilgrim	0c1a0a9e9d	[X86][SSE] Added fast-isel tests to sync with clang/test/CodeGen/sse-builtins.c llvm-svn: 270081	2016-05-19 16:55:52 +00:00
Simon Pilgrim	1da08bcea1	[X86][SSE2] Fixed shuffle of results in _mm_cmpnge_sd/_mm_cmpngt_sd tests llvm-svn: 270080	2016-05-19 16:49:53 +00:00
Chad Rosier	d705b5562f	[AArch64] Generate a BFXIL from 'or (and X, Mask0Imm),(and Y, Mask1Imm)'. Mask0Imm and ~Mask1Imm must be equivalent and one of the MaskImms is a shifted mask (e.g., 0x000ffff0). Both 'and's must have a single use. This changes code like: and w8, w0, #0xffff000f and w9, w1, #0x0000fff0 orr w0, w9, w8 into lsr w8, w1, #4 bfi w0, w8, #4, #12 llvm-svn: 270063	2016-05-19 14:19:47 +00:00
Ranjeet Singh	5c670ba3e8	[ARM] Add cdp intrinsic tests. - Renamed intrinsics.ll to intrinsics-coprocessor.ll as all the tests were testing coprocessor instructions, also made the test checks match the full instruction. Differential Revision: http://reviews.llvm.org/D20393 llvm-svn: 270057	2016-05-19 12:59:17 +00:00
Simon Pilgrim	52b8aeda48	[X86][SSE2] Added _mm_move_* tests llvm-svn: 270046	2016-05-19 11:59:57 +00:00
Simon Pilgrim	7db2a58619	[X86][SSE2] Added _mm_cast* and _mm_set* tests llvm-svn: 270041	2016-05-19 10:58:54 +00:00
Daniel Sanders	7b472cd465	[mips][mips16] Fix ZERO is not a CPU16Regs register error from the machine verifier. Summary: Partially fixes PR27458 Reviewers: sdardis Subscribers: dsanders, llvm-commits, sdardis Differential Revision: http://reviews.llvm.org/D20330 llvm-svn: 270037	2016-05-19 10:42:14 +00:00
Andrey Turetskiy	ccc62bdbb9	[X86] Enable RRL part of the LEA optimization pass for -O2. Enable "Remove Redundant LEAs" part of the LEA optimization pass for -O2. This gives 6.4% performance improvement on Broadwell on nnet benchmark from Coremark-pro. There is no significant effect on other benchmarks (Geekbench, Spec2000, Spec2006). Differential Revision: http://reviews.llvm.org/D19659 llvm-svn: 270036	2016-05-19 10:18:29 +00:00
Dan Gohman	65295f5464	[WebAssembly] Make several CHECK lines less fragile using regexes and CHECK-DAG. llvm-svn: 270011	2016-05-19 01:52:56 +00:00
Matt Arsenault	efc0ca4c19	AMDGPU: Fix promote alloca for pointer loads If the load has a pointer type, we don't want to change its type. llvm-svn: 270000	2016-05-18 23:20:24 +00:00
Rafael Espindola	22e87bbb08	Delete Reloc::Default. Having an enum member named Default is quite confusing: Is it distinct from the others? This patch removes that member and instead uses Optional<Reloc> in places where we have a user input that still hasn't been mapped to the default value, which now clearly has to be one of the remaining 3 options. llvm-svn: 269988	2016-05-18 22:04:49 +00:00
Krzysztof Parzyszek	72f4c82ed6	When looking for a spill slot in reg scavenger, find one that matches RC When looking for an available spill slot, the register scavenger would stop after finding the first one with no register assigned to it. That slot may have size and alignment that do not meet the requirements of the register that is to be spilled. Instead, find an available slot that is the closest in size and alignment to one that is needed to spill a register from RC. Differential Revision: http://reviews.llvm.org/D20295 llvm-svn: 269969	2016-05-18 18:16:00 +00:00
Simon Pilgrim	06fff37e6e	[X86][SSE2] Added fast-isel tests to sync with clang/test/CodeGen/sse2-builtins.c llvm-svn: 269966	2016-05-18 18:00:43 +00:00
Matt Arsenault	41311e20a0	AMDGPU: Other sizes of popcnt are fast We can chain bcnt instructions together, so any width popcnt is pretty fast. llvm-svn: 269950	2016-05-18 16:10:19 +00:00
Hans Wennborg	5b89989aa5	Re-commit r269828 "X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions" with an additional fix to make RegAllocFast ignore undef physreg uses. It would previously get confused about the "push %eax" instruction's use of eax. That method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate as well, but since that runs after register-allocation, we didn't run into the RegAllocFast issue before. llvm-svn: 269949	2016-05-18 16:10:17 +00:00
Matt Arsenault	174af82fd1	AMDGPU: Fix assert when erroring on a call For some reason an assert is now hit when a valid chain is not returned, so return the entry chain. llvm-svn: 269948	2016-05-18 16:10:11 +00:00
Matt Arsenault	c1825f766d	AMDGPU: Handle alloca promoting with null operands If the second pointer in a multi-pointer instruction is a constant, we can replace the type. llvm-svn: 269945	2016-05-18 15:57:21 +00:00
Matt Arsenault	b21e3597b5	AMDGPU: Fix a few slightly broken tests Fix minor bugs and uses of undef which break when pointer related optimization passes are run. llvm-svn: 269944	2016-05-18 15:48:44 +00:00
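
The rewrite described in commit d705b5562f can be checked with a small model. The following is only a sketch, not code from the LLVM tree: `maskedOr`, `bfi`, and `viaBitfieldInsert` are hypothetical names chosen for illustration, and `bfi` is a simplified model of the 32-bit bitfield-insert behaviour, assumed here to copy the low `width` bits of the source into the destination starting at bit `lsb`. It uses the masks and shift amounts quoted in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the original form from the commit message,
//   and w8, w0, #0xffff000f ; and w9, w1, #0x0000fff0 ; orr w0, w9, w8
// The two masks are bitwise complements of each other.
inline std::uint32_t maskedOr(std::uint32_t x, std::uint32_t y) {
  return (x & 0xffff000fu) | (y & 0x0000fff0u);
}

// Hypothetical, simplified model of "bfi Wd, Wn, #lsb, #width":
// copy the low `width` bits of src into dst at bit position `lsb`,
// leaving all other bits of dst unchanged.
inline std::uint32_t bfi(std::uint32_t dst, std::uint32_t src,
                         unsigned lsb, unsigned width) {
  assert(width >= 1 && lsb + width <= 32 && "field must fit in 32 bits");
  const std::uint32_t ones = (width == 32) ? 0xffffffffu : (1u << width) - 1u;
  const std::uint32_t field = ones << lsb;  // bits overwritten in dst
  return (dst & ~field) | ((src << lsb) & field);
}

// The rewritten form from the commit message:
//   lsr w8, w1, #4 ; bfi w0, w8, #4, #12
inline std::uint32_t viaBitfieldInsert(std::uint32_t x, std::uint32_t y) {
  return bfi(x, y >> 4, 4, 12);
}
```

For any pair of inputs the two functions should agree, which is why the three-instruction and/and/orr sequence can be replaced by a shift plus a single bitfield insert.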


15969 Commits