llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 20:12:56 +02:00

Author	SHA1	Message	Date
Pete Cooper	b753649d63	Revert "Change memcpy/memset/memmove to have dest and source alignments." This reverts commit r253511. This likely broke the bots in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202 http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787 llvm-svn: 253543	2015-11-19 05:56:52 +00:00
Pete Cooper	aca4c5cdc6	Change memcpy/memset/memmove to have dest and source alignments. Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html These intrinsics currently have an explicit alignment argument which is required to be a constant integer. It represents the alignment of the source and dest, and so must be the minimum of those. This change allows source and dest to each have their own alignments by using the alignment attribute on their arguments. The alignment argument itself is removed. There are a few places in the code for which the code needs to be checked by an expert as to whether using only src/dest alignment is safe. For those places, they currently take the minimum of src/dest alignments which matches the current behaviour. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false) will now read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false) For out of tree owners, I was able to strip alignment from calls using sed by replacing: (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\) with: $1i1 false) and similarly for memmove and memcpy. I then added back in alignment to test cases which needed it. A similar commit will be made to clang which actually has many differences in alignment as now IRBuilder can generate different source/dest alignments on calls. In IRBuilder itself, a new argument was added. Instead of calling: CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false) you now call CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false) There is a temporary class (IntegerAlignment) which takes the source alignment and rejects implicit conversion from bool. This is to prevent isVolatile here from passing its default parameter to the source alignment. Note, changes in future can now be made to codegen. I didn't change anything here, but this change should enable better memcpy code sequences. Reviewed by Hal Finkel. llvm-svn: 253511	2015-11-18 22:17:24 +00:00
Dan Gohman	3ee3e60f77	Use TargetRegisterInfo for printing MachineOperand register comments Several places in AsmPrinter.cpp print comments describing MachineOperand registers using MCRegisterInfo, which uses MCOperand-oriented names. This doesn't work for targets that use virtual registers exclusively, as WebAssembly does, since virtual registers are represented and printed differently. This patch preserves what seems to be the spirit of r229978, avoiding the use of TM.getSubtargetImpl(), while still using MachineOperand-oriented printing for MachineOperands. Differential Revision: http://reviews.llvm.org/D14709 llvm-svn: 253338	2015-11-17 16:01:28 +00:00
Tom Stellard	375168229e	Revert "Remove unnecessary call to getAllocatableRegClass" This reverts commit r252565. This also includes the revert of the commit mentioned below in order to avoid breaking tests in AMDGPU: Revert "AMDGPU: Set isAllocatable = 0 on VS_32/VS_64" This reverts commit r252674. llvm-svn: 252956	2015-11-12 21:43:25 +00:00
Matt Arsenault	01b2f20bdc	AMDGPU: Set isAllocatable = 0 on VS_32/VS_64 llvm-svn: 252674	2015-11-11 00:01:32 +00:00
Tom Stellard	a2b56630c6	DAGCombiner: Check shouldReduceLoadWidth before combining (and (load), x) -> extload Reviewers: resistor, arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13805 llvm-svn: 252349	2015-11-06 21:58:37 +00:00
Matt Arsenault	c445aba1ed	AMDGPU: Create emergency stack slots during frame lowering Test has a bogus verifier error which will be fixed by later commits. llvm-svn: 252327	2015-11-06 18:17:45 +00:00
Matt Arsenault	275274c0fe	AMDGPU: Add pass to detect used kernel features Mark kernels that use certain features that require user SGPRs to support with kernel attributes. We need to know before instruction selection begins because it impacts the kernel calling convention lowering. For now this only detects the workitem intrinsics. llvm-svn: 252323	2015-11-06 18:01:57 +00:00
Matt Arsenault	368c08a19e	AMDGPU: Hack for VS_32 register pressure For some reason VS_32 ends up factoring into the pressure heuristics even though we should never see a virtual register with this class. When SGPRs are reserved for register spilling, this for some reason triggers reg-crit scheduling. Setting isAllocatable = 0 may help with this since that seems to remove it from the default implementation's generated table. llvm-svn: 252321	2015-11-06 17:54:43 +00:00
Tom Stellard	0113590b91	AMDGPU/SI: Emit HSA kernels with symbol type STT_AMDGPU_HSA_KERNEL Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13804 llvm-svn: 252291	2015-11-06 11:45:14 +00:00
Peter Collingbourne	5b721561aa	DI: Reverse direction of subprogram -> function edge. Previously, subprograms contained a metadata reference to the function they described. Because most clients need to get or set a subprogram for a given function rather than the other way around, this created unneeded inefficiency. For example, many passes needed to call the function llvm::makeSubprogramMap() to build a mapping from functions to subprograms, and the IR linker needed to fix up function references in a way that caused quadratic complexity in the IR linking phase of LTO. This change reverses the direction of the edge by storing the subprogram as function-level metadata and removing DISubprogram's function field. Since this is an IR change, a bitcode upgrade has been provided. Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is attached to the PR. Differential Revision: http://reviews.llvm.org/D14265 llvm-svn: 252219	2015-11-05 22:03:56 +00:00
Matt Arsenault	b6bed238b4	AMDGPU: Fix assert when legalizing atomic operands The operand layout is slightly different for the atomic opcodes from the usual MUBUF loads and stores. This should only fix it on SI/CI. VI is still broken because it still emits the addr64 replacement. llvm-svn: 252140	2015-11-05 02:46:56 +00:00
Matt Arsenault	dd215b7857	AMDGPU: Add missing v2f64 fadd tests llvm-svn: 252117	2015-11-05 01:03:11 +00:00
Matt Arsenault	456805768c	AMDGPU: Stop assuming vreg for build_vector This was causing a variety of test failures when v2i64 is added as a legal type. SIFixSGPRCopies should correctly handle the case of vector inputs to a scalar reg_sequence, so this isn't necessary anymore. This was hiding some deficiencies in how reg_sequence is handled later, but this shouldn't be a problem anymore since the register class copy of a reg_sequence is now done before the reg_sequence. llvm-svn: 251860	2015-11-02 23:30:48 +00:00
Matt Arsenault	4d9ca98b18	AMDGPU: Error on graphics shaders with HSA I've found myself pointlessly debugging problems from running graphics tests with an HSA triple a few times, so stop this from happening again. llvm-svn: 251858	2015-11-02 23:23:02 +00:00
Matt Arsenault	6d6f62b066	AMDGPU: Un XFAIL a test This should probably be merged with one of the other private memory tests, but it fails on r600. llvm-svn: 251856	2015-11-02 23:15:46 +00:00
Matt Arsenault	6d010fa207	AMDGPU: Distribute SGPR->VGPR copies of REG_SEQUENCE Make the REG_SEQUENCE be a VGPR, and do the register class copy first. llvm-svn: 251855	2015-11-02 23:15:42 +00:00
Marek Olsak	d46640b0e5	AMDGPU/SI: handle undef for llvm.SI.packf16 llvm-svn: 251632	2015-10-29 15:29:09 +00:00
Marek Olsak	7916819368	AMDGPU/SI: use S_OR for fneg (fabs f32) llvm-svn: 251631	2015-10-29 15:29:05 +00:00
Marek Olsak	8b37a7065a	AMDGPU/SI: use S_AND for i1 trunc llvm-svn: 251630	2015-10-29 15:05:03 +00:00
Matt Arsenault	34e6b29b92	AMDGPU: Fix verifier error in SIFoldOperands There may be other use operands that also need their kill flags cleared. This happens in a few tests when SIFoldOperands is moved after PeepholeOptimizer. PeepholeOptimizer rewrites cases that look like: %vreg0 = ... %vreg1 = COPY %vreg0 use %vreg1<kill> %vreg2 = COPY %vreg0 use %vreg2<kill> to use the earlier source to %vreg0 = ... use %vreg0 use %vreg0 Currently SIFoldOperands sees the copied registers, so there is only one use. So far I haven't managed to come up with a test that currently has multiple uses of a foldable VGPR -> VGPR copy. llvm-svn: 250960	2015-10-21 22:37:50 +00:00
Matt Arsenault	6e5d4b912c	AMDGPU: Stop reserving v[254:255] This wasn't doing anything useful. They weren't explicitly used anywhere, and the RegScavenger ignores reserved registers. This for some reason caused a random scheduling change in the test. Getting the check lines to pass is too frustrating, and there's probably not too much value in checking the vector case's operands N times. llvm-svn: 250794	2015-10-20 03:59:58 +00:00
Matt Arsenault	12118bb735	DAGCombiner: Don't stop finding better chain on 2 aliases The comment says this was stopped because it was unlikely to be profitable. This is not true if you want to combine vector loads with multiple components. For a simple case that looks like t0 = load t0 ... t1 = load t0 ... t2 = load t0 ... t3 = load t0 ... t4 = store t0:1, t0:1 t5 = store t4, t1:0 t6 = store t5, t2:0 t7 = store t6, t3:0 We want to get all of these stores onto a chain that is a TokenFactor of these N loads. This mostly solves the AMDGPU merge-stores.ll regressions with -combiner-alias-analysis for merging vector stores of vector loads. llvm-svn: 250138	2015-10-13 00:49:00 +00:00
Matt Arsenault	85dd075020	DAGCombiner: Combine extract_vector_elt from build_vector This basic combine was surprisingly missing. AMDGPU legalizes many operations in terms of 32-bit vector components, so not doing this results in many extra copies and subregister extracts that need to be cleaned up later. InstCombine already does this for the hasOneUse case. The target hook is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn from a vector materialize repeated immediate instruction to a constant vector load with more scalar copies from it. llvm-svn: 250129	2015-10-12 23:59:50 +00:00
Matt Arsenault	28c28a361a	AMDGPU: Use explicit register size indirect pseudos This stops using an unknown reg class operand. Currently build_vector selection has a broken looking check where it tries to use a VGPR reg class and an SGPR one if it sees an SGPR use. When the source operand has an explicit VGPR class, illegal copies will be inserted that SIFixSGPRCopies will take care of normally later, which will allow removing the weird check of build_vector users. Without this, when removed v_movrels_b32 would still be emitted even though all of the values were only stored in SGPRs. llvm-svn: 249494	2015-10-07 00:42:51 +00:00
Tom Stellard	6c21c7bcdf	AMDGPU/SI: Remove calling convention assertion from LowerFormalArguments() Summary: We currently ignore the calling convention, so there is no real reason to assert on the calling convention of functions. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13367 llvm-svn: 249468	2015-10-06 21:16:34 +00:00
Tom Stellard	597d1f5f9b	AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering pass Summary: Instead of asserting when the kernel metadata is different than we expect, we should just skip lowering that function. This fixes assertion failures with OpenCL argument metadata from older LLVM releases. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13356 llvm-svn: 249073	2015-10-01 21:16:05 +00:00
Tom Stellard	e835f56682	AMDGPU: Add MEM_RAT STORE_TYPED. v2: Add test (Matt). Fix capitalization of isEOP (Matt). Move pattern to class parameter (Matt). Make the instruction available to Cayman (Matt). Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED. Patch by: Zoltan Gilian llvm-svn: 249042	2015-10-01 17:51:34 +00:00
Matt Arsenault	0376c2dc85	AMDGPU: Fix splitting x16 SMRD loads When used recursively, this would set the kill flag on the intermediate step from first splitting x16 to x8. llvm-svn: 248741	2015-09-28 20:54:52 +00:00
Matt Arsenault	f3f42b5b21	AMDGPU: Fix moving SMRD loads with literal offsets on CI llvm-svn: 248740	2015-09-28 20:54:46 +00:00
Matt Arsenault	5a88f554bc	AMDGPU: Add testcases Make sure we are testing moving users of the moved and split SMRD loads. llvm-svn: 248738	2015-09-28 20:54:38 +00:00
Matt Arsenault	c8915793b6	AMDGPU: Cleanup test Run instnamer on it, and rename check prefix. This is in preparation for adding new testcases to cover bugs on other subtargets. llvm-svn: 248737	2015-09-28 20:54:32 +00:00
Matt Arsenault	8a568ce423	AMDGPU: Fix sched model for VOP2b instructions Trying to use the version with the explicit output operand would complain because of the missing WriteSALU. I'm not sure why it doesn't complain about this with the implicit VCC def. llvm-svn: 248646	2015-09-26 02:25:45 +00:00
Tom Stellard	c6bc4ec163	AMDGPU/SI: Use .hsatext section instead of .text for HSA Reviewers: arsenm, grosbach, rafael Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12424 llvm-svn: 248619	2015-09-25 21:41:28 +00:00
Matt Arsenault	8a987c4789	PeepholeOptimizer: Remove redundant copies If a virtual register is copied and another copy was already seen, replace with the previous copy. This only handles the simplest cases for now. This pattern shows up from various operand restrictions AMDGPU has which require inserting copies depending on the register class of the operands. llvm-svn: 248611	2015-09-25 20:22:12 +00:00
Matt Arsenault	5b7567e45f	AMDGPU: Add some more tests for literal operands llvm-svn: 248600	2015-09-25 18:21:47 +00:00
Matt Arsenault	a9d7b4e305	AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG This fixes a select error when the i64 source was also bitcasted to v2i32 in the original source. Instead of awkwardly trying to select the modified source value and the store, replace before isel begins. Uses a worklist to avoid possible problems from mutating the DAG, although it seems to work OK without it. llvm-svn: 248589	2015-09-25 17:27:08 +00:00
Matt Arsenault	7377cbeef9	AMDGPU: Improve accuracy of instruction rates for VOPC These were all using the default 32-bit VALU write class, but the i64/f64 compares are half rate. I'm not sure this is really correct, because they are still using the write to VALU write class, even though they really write to the SALU. llvm-svn: 248582	2015-09-25 16:58:25 +00:00
Matt Arsenault	4f500cff99	AMDGPU: Add s_dcache_* instructions llvm-svn: 248533	2015-09-24 19:52:27 +00:00
Matt Arsenault	304779755d	AMDGPU: Add cache invalidation instructions. These are necessary for implementing mem_fence for OpenCL 2.0. The VI assembler tests are disabled since it seems to be using the wrong encoding or opcode. llvm-svn: 248532	2015-09-24 19:52:21 +00:00
Matt Arsenault	099f4c5254	Introduce target hook for optimizing register copies Allow a target to do something other than search for copies that will avoid cross register bank copies. Implement for SI by only rewriting the most basic copies, so it should look through anything like a subregister extract. I'm not entirely satisfied with this because it seems like eliminating a reg_sequence that isn't fully used should work generically for all targets without them having to override something. However, it seems to be tricky to have a simple implementation of this without rewriting to invalid kinds of subregister copies on some targets. I'm not sure if there is currently a generic way to easily check if a subregister index would be valid for the current use. The current set of TargetRegisterInfo::get*Class functions don't quite behave like I would expect (e.g. getSubClassWithSubReg returns the maximal register class rather than the minimal), so I'm not sure how to make the generic test keep searching if SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making the default implementation to check for simple copies breaks a variety of ARM and x86 tests by producing illegal subregister uses. The ARM tests are not actually changed since it should still be using the same sharesSameRegisterFile implementation, this just relaxes them to not check for specific registers. llvm-svn: 248478	2015-09-24 08:36:14 +00:00
Matt Arsenault	944f6bbc7c	AMDGPU: Fix printing trailing whitespace for mubuf atomics llvm-svn: 248472	2015-09-24 07:51:17 +00:00
Matt Arsenault	3b9edaf5a4	AMDGPU: Reduce number of copies emitted Instead of always inserting a copy in case the super register is itself a subregister, only extract to the super reg class if this is actually the case. This shouldn't really change codegen, but makes looking at the output of SIFixSGPRCopies easier to read. llvm-svn: 248467	2015-09-24 07:16:37 +00:00
Matthias Braun	401d9e7fdb	LiveIntervalAnalysis: Avoid multiple connected liveness components We may have subregister defs which are unused but not discovered and cleaned up prior to liveness analysis. This creates multiple connected components in the resulting live range which are forbidden in the MachineVerifier because they would unnecessarily constrain the register allocator. Rewrite those dead definitions to define a newly created virtual register. Differential Revision: http://reviews.llvm.org/D13035 llvm-svn: 248335	2015-09-22 22:37:44 +00:00
Simon Pilgrim	c8b097b7ef	[DAGCombiner] Improve FMA support for interpolation patterns This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents. This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t)))) Differential Revision: http://reviews.llvm.org/D13003 llvm-svn: 248210	2015-09-21 20:32:48 +00:00
Matt Arsenault	70e6ce5a40	DAGCombiner: Replace store of FP constant after attempting store merges If storing multiple FP constants, some subset of the stores would be replaced with integers due to visit order, so MergeConsecutiveStores would only partially merge these. llvm-svn: 248169	2015-09-21 15:59:46 +00:00
Matt Arsenault	bd7c6a697f	AMDGPU: Add failing testcase for live interval construction llvm-svn: 248067	2015-09-19 00:03:56 +00:00
Tom Stellard	687b1fc846	AMDGPU/SI: Fold operands through REG_SEQUENCE instructions Summary: This helps mostly when we use add instructions for address calculations that contain immediates. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12256 llvm-svn: 247157	2015-09-09 15:43:26 +00:00
Matt Arsenault	a2aa311bd3	SelectionDAG: Support Expand of f16 extloads Currently this hits an assert that extload should always be supported, which assumes integer extloads. This moves a hack out of SI's argument lowering and is covered by existing tests. llvm-svn: 247113	2015-09-09 01:12:27 +00:00
Matt Arsenault	6a103a7cb5	AMDGPU: Handle sub of constant for DS offset folding sub C, x -> add (sub 0, x), C for DS offsets. This is mostly to fix regressions that show up when SeparateConstOffsetFromGEP is enabled. llvm-svn: 247054	2015-09-08 19:34:22 +00:00
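
The sed substitution described in commit aca4c5cdc6 (stripping the explicit alignment argument from memcpy/memset/memmove calls in textual IR) can be sketched in Python as follows. This is an illustration only, not the script the author used: the pattern is taken from the commit message on the assumption that it reads `(call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)`, and folding the three intrinsics into a single alternation, as well as the helper name `strip_alignment`, are hypothetical choices made here. Like the original substitution, it only removes the alignment argument; it does not add `align` attributes to the pointer arguments.

```python
import re

# Assumed reconstruction of the commit's sed pattern, with memset generalized
# to all three intrinsics (the commit says memmove and memcpy were handled
# "similarly"). Group 1 captures everything before the alignment argument.
ALIGN_ARG = re.compile(
    r"(call.*llvm\.(?:memcpy|memmove|memset).*)i32 [0-9]*, i1 false\)"
)


def strip_alignment(ir_line: str) -> str:
    """Drop the trailing 'i32 <align>' argument from a mem intrinsic call.

    Hypothetical helper; lines that do not match are returned unchanged.
    """
    return ALIGN_ARG.sub(r"\1i1 false)", ir_line)


# The old-style call quoted in the commit message.
old = ("call void @llvm.memcpy.p0i8.p0i8.i32"
       "(i8* %dest, i8* %src, i32 500, i32 8, i1 false)")
print(strip_alignment(old))
```

As in the commit, only non-volatile calls (`i1 false`) are rewritten, and any alignment a test depends on would still have to be added back by hand as `align N` attributes on the pointer arguments.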


96 Commits