llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00

Author	SHA1	Message	Date
Justin Holewinski	3021ef095b	[NVPTX] Improve handling of FP fusion We now consider the FPOpFusion flag when determining whether to fuse ops. We also explicitly emit add.rn when fusion is disabled to prevent ptxas from fusing the operations on its own. llvm-svn: 213287	2014-07-17 18:10:09 +00:00
Matt Arsenault	0968995f30	Fix typos llvm-svn: 213285	2014-07-17 17:50:22 +00:00
Zinovy Nis	c10a269e5f	[BUG] Due to a typo introduced in r199933 and r200027 two tests for FMA were never even started. llvm-svn: 213283	2014-07-17 17:14:35 +00:00
Adam Nemet	09fcf8939c	[X86] AVX512: Add disassembler support for compressed displacement There are two parts here. First is to modify tablegen to adjust the encoding type ENCODING_RM with the scaling factor. The second is to use the new encoding types to compute the correct displacement in the decoder. Fixes <rdar://problem/17608489> llvm-svn: 213281	2014-07-17 17:04:56 +00:00
Adam Nemet	e394093056	[X86] AVX512: Rename EVEX_CD8V to CD8_Form This is to match the naming of CD8_EltSize, CD8_Scale, etc. No functional change. llvm-svn: 213280	2014-07-17 17:04:52 +00:00
Adam Nemet	7032b7fc18	[X86] AVX512: Use the TD version of CD8_Scale in the assembler Passes the computed scaling factor in TSFlags rather than the old attributes. Also removes the C++ version of computing the scaling factor (MemObjSize) along with the asserts added by the previous patch. No functional change. llvm-svn: 213279	2014-07-17 17:04:50 +00:00
Adam Nemet	76494ac79d	[X86] AVX512: Move compressed displacement logic to TD This does not actually move the logic yet but reimplements it in the Tablegen language. Then asserts that the new implementation results in the same value. The next patch will remove the assert and the temporary use of the TSFlags and remove the C++ implementation. The formula requires a limited form of the logical left and right operators. I implemented these with the bit-extract/insert operator (i.e. blah{bits}). No functional change. llvm-svn: 213278	2014-07-17 17:04:34 +00:00
Adam Nemet	3bb0a6b076	[TableGen] Allow shift operators to take bits<n> Convert the operand to int if possible, i.e. if the value is properly initialized. (I suppose there is further room for improvement here to also perform the shift if the uninitialized bits are shifted out.) With this little change we can now compute the scaling factor for compressed displacement with pure tablegen code in the X86 backend. This is useful because both the X86-disassembler-specific part of tablegen and the assembler need this and TD is the natural sharing place. The patch also adds the missing documentation for the shift and add operator. llvm-svn: 213277	2014-07-17 17:04:27 +00:00
Justin Holewinski	60265475a1	[NVPTX] Add missing .v4 qualifier on vector store instruction llvm-svn: 213276	2014-07-17 16:58:56 +00:00
Saleem Abdulrasool	86c034e0ff	MC: correct DWARF header for PE/COFF assembly input The header contains an offset to the DWARF abbreviations for the CU. The offset must be section relative for COFF and absolute for others. The non-assembly code path for the DWARF header generation already had the correct emission for the headers. This corrects just the assembly path. Due to the invalid relocation, processing of the debug information would halt previously on the first assembly input as the associated abbreviations would be out of range as they would have the location increased by image base and the section offset. This addresses PR20332. llvm-svn: 213275	2014-07-17 16:27:44 +00:00
Saleem Abdulrasool	b6a416296a	MC: fix MCAsmInfo usage for windows-itanium Windows itanium uses the GNUCOFF assembly format, not ELF. llvm-svn: 213274	2014-07-17 16:27:40 +00:00
Saleem Abdulrasool	e1094fea63	MC: collapse emission of producer Rather than use three EmitBytes, concatenate the string at compile time, constructing a single StringRef and emitting the data in one shot. This also creates nicer assembly output. NFC. llvm-svn: 213273	2014-07-17 16:27:35 +00:00
Justin Holewinski	5248ed4d97	[NVPTX] Flag surface/texture query instructions with IsTexSurfQuery Also, add some tests to make sure we can handle surface/texture queries on both Fermi and Kepler+. llvm-svn: 213268	2014-07-17 14:51:33 +00:00
Justin Holewinski	9c3e284e16	[NVPTX] Add more surface/texture intrinsics, including CUDA unified texture fetch This also uses TSFlags to mark machine instructions that are surface/texture accesses, as well as the vector width for surface operations. This is used to simplify some of the switch statements that need to detect surface/texture instructions llvm-svn: 213256	2014-07-17 11:59:04 +00:00
Tim Northover	48ae22e14a	ARM: support direct f16 <-> f64 conversions ARMv8 has instructions to handle it, otherwise a libcall is needed. llvm-svn: 213254	2014-07-17 11:27:04 +00:00
Justin Holewinski	a1eab159d8	[TABLEGEN] Do not crash on intrinsics with names longer than 40 characters Differential Revision: http://reviews.llvm.org/D4537 llvm-svn: 213253	2014-07-17 11:23:29 +00:00
Tim Northover	21a41cb9a1	CodeGen: generate single libcall for fptrunc -> f16 operations. Previously we asserted on this code. Currently compiler-rt doesn't actually implement any of these new libcalls, but external help is pretty much the only viable option for LLVM. I've followed the much more generic "__truncST2" naming, as opposed to the odd name for f32 -> f16 truncation. This can obviously be changed later, or overridden by any targets that need to. llvm-svn: 213252	2014-07-17 11:12:12 +00:00
Tim Northover	81da81acc1	X86: support double extension of f16 type. x86 has no native ability to extend an f16 to f64, but the same result is obtained if we expand it into two separate extensions: f16 -> f32 -> f64. Unfortunately the same is not true for truncate, so that still results in a compilation failure. llvm-svn: 213251	2014-07-17 11:04:04 +00:00
Tim Northover	eae1f1c8cc	CodeGen: extend f16 conversions to permit types > float. This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too. During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here. Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here. llvm-svn: 213248	2014-07-17 10:51:23 +00:00
Yi Kong	9b1652c5d0	Port memory barriers intrinsics to AArch64 Memory barrier __builtin_arm_[dmb, dsb, isb] intrinsics are required to implement their corresponding ACLE and MSVC intrinsics. This patch ports ARM dmb, dsb, isb intrinsic to AArch64. Differential Revision: http://reviews.llvm.org/D4520 llvm-svn: 213247	2014-07-17 10:50:20 +00:00
Daniel Sanders	f49e5efb1a	[mips] .reginfo is 8 byte aligned on N32. Differential Revision: http://reviews.llvm.org/D4540 llvm-svn: 213246	2014-07-17 10:10:04 +00:00
Daniel Sanders	e88052c0c5	[mips] Correct ELF e_flags for the N32 ABI when using a mips-* triple rather than a mips64-* triple Summary: Generally speaking, mips-* vs mips64-* should not be used to make decisions about the content or format of the ELF. This should be based on the ABI and CPU in use. For example, `mips-linux-gnu-clang -mips64r2 -mabi=64` should produce an ELF64 as should `mips64-linux-gnu-clang -mabi=64`. Conversely, `mips64-linux-gnu-clang -mabi=n32` should produce an ELF32 as should `mips-linux-gnu-clang -mips64r2 -mabi=n32`. This patch fixes the e_flags but leaves the ELF32 vs ELF64 issue for now since there is no apparent way to base this decision on the ABI and CPU. Differential Revision: http://reviews.llvm.org/D4539 llvm-svn: 213244	2014-07-17 10:02:08 +00:00
Daniel Sanders	7121fa671b	[mips] Correct .MIPS.abiflags for -mfpxx on MIPS32r6 Summary: The cpr1_size field describes the minimum register width to run the program rather than the size of the registers on the target. MIPS32r6 was acting as if -mfp64 has been given because it starts off with 64-bit FPU registers. Differential Revision: http://reviews.llvm.org/D4538 llvm-svn: 213243	2014-07-17 09:57:23 +00:00
Daniel Sanders	eea41061d0	[mips] Fix ELF e_flags related to -mabicalls and -mplt. Summary: These options are not implemented yet but we act as if they are always given. The integrated assembler is driven by the clang driver so the e_flag test cases should match the e_flags emitted by GCC+GAS rather than GAS by itself. Differential Revision: http://reviews.llvm.org/D4536 llvm-svn: 213242	2014-07-17 09:52:56 +00:00
Yi Kong	d5d6470070	Fix the prefix for arm64 triple Triple.cpp still returns "arm64" as prefix for arm64 triple, so Clang is unable to select the correct GCCBuiltin IR. This patch changes the value to the correct prefix "aarch64". Regression test will be added in the coming patch. Differential Revision: http://reviews.llvm.org/D4516 llvm-svn: 213240	2014-07-17 09:43:27 +00:00
Evgeniy Stepanov	5b945c9d7e	[msan] Avoid redundant origin stores. Origin is meaningless for fully initialized values. Avoid storing origin for function arguments that are known to be always initialized (i.e. shadow is a compile-time null constant). This is not about correctness, but purely an optimization. Seems to affect compilation time of blacklisted functions significantly. llvm-svn: 213239	2014-07-17 09:10:37 +00:00
Suyog Sarda	e62b39fcd0	Move ashr optimization from InstCombineShift to InstSimplify. Refactor code, no functionality change, test case moved from instcombine to instsimplify. Differential Revision: http://reviews.llvm.org/D4102 llvm-svn: 213231	2014-07-17 06:28:15 +00:00
Matt Arsenault	45d9529fe9	Use range for llvm-svn: 213230	2014-07-17 06:19:06 +00:00
Matt Arsenault	54393bb30e	R600: Short circuit alloca check if address space isn't private. Skip calling GetUnderlyingObject in cases where it obviously isn't from an alloca. This should only be a compile time improvement. llvm-svn: 213229	2014-07-17 06:13:41 +00:00
Suyog Sarda	fe6fdd5295	Fix Typo (first commit to test commit access) llvm-svn: 213228	2014-07-17 06:09:34 +00:00
Eric Fiselier	95d51f6cee	[lit] Add --show-unsupported flag to LIT llvm-svn: 213227	2014-07-17 05:53:00 +00:00
Saleem Abdulrasool	b2cdb8f33d	MC: make WinEH opcode an opaque value This makes the opcode an opaque value (unsigned int) rather than the enumeration. This permits the use of target specific operands. Split out the generic type into a MCWinEH header and add a supporting MCWin64EH::Instruction to abstract out the selection of the opcode and construction of the actual instruction. llvm-svn: 213221	2014-07-17 03:08:50 +00:00
Hal Finkel	9779454568	Improve BasicAA CS-CS queries (redux) This reverts, "r213024 - Revert r212572 "improve BasicAA CS-CS queries", it causes PR20303." with a fix for the bug in pr20303. As it turned out, the relevant code was both wrong and over-conservative (because, as with the code it replaced, it would return the overall ModRef mask even if just Ref had been implied by the argument aliasing results). Hopefully, this correctly fixes both problems. Thanks to Nick Lewycky for reducing the test case for pr20303 (which I've cleaned up a little and added in DSE's test directory). The BasicAA test has also been updated to check for this error. Original commit message: BasicAA contains knowledge of certain intrinsics, such as memcpy and memset, and uses that information to form more-accurate answers to CallSite vs. Loc ModRef queries. Unfortunately, it did not use this information when answering CallSite vs. CallSite queries. Generically, when an intrinsic takes one or more pointers and the intrinsic is marked only to read/write from its arguments, the offset/size is unknown. As a result, the generic code that answers CallSite vs. CallSite (and CallSite vs. Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's arguments. While BasicAA's CallSite vs. Loc override could use more-accurate size information for some intrinsics, it did not do the same for CallSite vs. CallSite queries. This change refactors the intrinsic-specific logic in BasicAA into a generic AA query function: getArgLocation, which is overridden by BasicAA to supply the intrinsic-specific knowledge, and used by AA's generic implementation. This allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and CallSite vs. CallSite queries, and simplifies the BasicAA implementation. Currently, only one function, Mac's memset_pattern16, is handled by BasicAA (all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's getModRefBehavior override now also returns OnlyAccessesArgumentPointees for this function (which is an improvement). llvm-svn: 213219	2014-07-17 01:28:25 +00:00
Jingyue Wu	7c4bea3e99	Partially revert r210444 due to performance regression Summary: Converting outermost zext(a) to sext(a) causes worse code when the computation of zext(a) could be reused. For example, after converting ... = array[zext(a)] ... = array[zext(a) + 1] to ... = array[sext(a)] ... = array[zext(a) + 1], the program computes sext(a), which is actually unnecessary. I added one test in split-gep-and-gvn.ll to illustrate this scenario. Also, with r211281 and r211084, we annotate more "nuw" tags to computation involving CUDA intrinsics such as threadIdx.x. These annotations help with splitting GEP a lot, rendering the benefit we get from this reverted optimization only marginal. Test Plan: make check-all Reviewers: eliben, meheff Reviewed By: meheff Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D4542 llvm-svn: 213209	2014-07-16 23:25:00 +00:00
Sanjay Patel	5537765676	Fixed formatting, removed bug reference, renamed testcase Thanks to Duncan Exon Smith for reviewing and cleanup suggestions. llvm-svn: 213205	2014-07-16 22:40:28 +00:00
Juergen Ributzka	63d2af65d4	[FastISel] Local values shouldn't be alive across an inline asm call with side effects. This fixes an issue where a local value is defined before and used after an inline asm call with side effects. This fix simply flushes the local value map, which updates the insertion point for the inline asm call to be above any previously defined local values. This fixes <rdar://problem/17694203> llvm-svn: 213203	2014-07-16 22:20:51 +00:00
Lang Hames	ddc836290a	[MCJIT] Improve a RuntimeDyldChecker diagnostic. When a RuntimeDyldChecker test requests an invalid operand for an instruction, print the decoded instruction to aid diagnosis. llvm-svn: 213202	2014-07-16 22:02:20 +00:00
Hal Finkel	d8b9bfa0c2	Fix a typo in the inalloca description llvm-svn: 213200	2014-07-16 21:22:46 +00:00
Sanjay Patel	8befe236c4	trivial fix for PR20314 Make sure that the AddrInst is an Instruction. llvm-svn: 213197	2014-07-16 21:08:10 +00:00
Sanjay Patel	b3fb7dc171	Remove Atom references in description. Any CPU can run this pass. llvm-svn: 213190	2014-07-16 20:18:49 +00:00
Manuel Jacob	e41c2e7cde	Utilize CastInst::CreatePointerBitCastOrAddrSpaceCast here. llvm-svn: 213189	2014-07-16 20:13:45 +00:00
Chris Bieneman	8124ab9a24	[RegisterCoalescer] Moving the RegisterCoalescer subtarget hook onto the TargetRegisterInfo instead of the TargetSubtargetInfo. llvm-svn: 213188	2014-07-16 20:13:31 +00:00
Justin Holewinski	35f9408e7f	[NVPTX] Honor alignment on vector loads/stores We were not considering the stated alignment on vector loads/stores, leading us to generate vector instructions even when we do not have sufficient alignment. Now, for IR like: %1 = load <4 x float>, <4 x float>* %ptr, align 4 we will generate correct, conservative PTX like: ld.f32 ... [%ptr] ld.f32 ... [%ptr+4] ld.f32 ... [%ptr+8] ld.f32 ... [%ptr+12] Or if we have an alignment of 8 (for example), we can generate code like: ld.v2.f32 ... [%ptr] ld.v2.f32 ... [%ptr+8] llvm-svn: 213186	2014-07-16 19:45:35 +00:00
Alexey Samsonov	d7eaaec8e8	CHECK-LABEL-ize one test llvm-svn: 213177	2014-07-16 18:11:31 +00:00
Kevin Enderby	d2a97a67de	Add the "-x" flag to llvm-nm for Mach-O files that prints the fields of a symbol in hex. (generally used for debugging the tools). This is the same functionality as darwin’s nm(1) "-x" flag. llvm-svn: 213176	2014-07-16 17:38:26 +00:00
David Blaikie	3569fd24ba	Remove unnecessary/redundant std::move (run returns unique_ptr by value already) llvm-svn: 213174	2014-07-16 17:09:21 +00:00
Alp Toker	78faffcebb	Track clang r213171 The clang rewriter is now a core facility. llvm-svn: 213173	2014-07-16 16:50:34 +00:00
Chris Bieneman	aa19c0dccd	Added documentation for SizeMultiplier in the ARM subtarget hook for register coalescing. Also fixed some 80 col violations. No functional code changes. llvm-svn: 213169	2014-07-16 16:27:31 +00:00
Justin Holewinski	84f0bca9c1	[NVPTX] Rename registers %fl -> %fd and %rl -> %rd This matches the internal behavior of NVIDIA tools like libnvvm. llvm-svn: 213168	2014-07-16 16:26:58 +00:00
Tim Northover	c6c02a43ba	CodeGen: don't form illegal EXTLOAD operations. It turns out that in most cases (the main exception being i1-related types) once these operations are formed we cannot separate them and the targets end up having to deal with them whether they want to or not. This is not a good situation, and a more reasonable default can be formed by acknowledging this and having targets leave them as Legal. Only x86 seems to be affected (other targets don't even try marking the operation Expand). Mostly there's no visible change here yet, but it will be useful to have truly expanded EXTLOADS for MVT::f16 softening support. llvm-svn: 213162	2014-07-16 15:37:24 +00:00
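
Two of the f16 commits listed above (llvm-svn 213248 and 213251) rely on the point that extending f16 -> f32 -> f64 gives the same result as a direct extension, while truncating f64 -> f32 -> f16 can round twice. The following is a minimal sketch of that effect in Python, not LLVM code: it uses the standard struct module's half ('e') and single ('f') formats to stand in for the conversions, and the helper names to_f32 and to_f16 and the sample value are illustrative assumptions.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_f16(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Hypothetical sample value: slightly above the midpoint between the
# adjacent half-precision values 1.0 and 1.0 + 2**-10.
x = 1.0 + 2.0**-11 + 2.0**-30

direct = to_f16(x)            # one rounding step: f64 -> f16
two_step = to_f16(to_f32(x))  # two rounding steps: f64 -> f32 -> f16

print(direct, two_step)
```

With this input the direct conversion rounds up to 1.0009765625, whereas the two-step path first lands exactly on the midpoint in f32 and then ties to even at 1.0. Extension does not have this problem because every f16 value is exactly representable in f32 and f64.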

105610 Commits