llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 04:52:54 +02:00

Author	SHA1	Message	Date
David Majnemer	c8b1f095a3	Move the personality function from LandingPadInst to Function The personality routine currently lives in the LandingPadInst. This isn't desirable because: - All LandingPadInsts in the same function must have the same personality routine. This means that each LandingPadInst beyond the first has an operand which produces no additional information. - There is ongoing work to introduce EH IR constructs other than LandingPadInst. Moving the personality routine off of any one particular Instruction and onto the parent function seems a lot better than having N different places a personality function can sneak onto an exceptional function. Differential Revision: http://reviews.llvm.org/D10429 llvm-svn: 239940	2015-06-17 20:52:32 +00:00
Kit Barton	12e4595b58	Properly handle the mftb instruction. The mftb instruction was incorrectly marked as deprecated in the PPC Backend. Instead, it should not be treated as deprecated, but rather be implemented using the mfspr instruction. A similar patch was put into GCC last year. Details can be found at: https://sourceware.org/ml/binutils/2014-11/msg00383.html. This change will replace instances of the mftb instruction with the mfspr instruction for all CPUs except 601 and pwr3. This will also be the default behaviour. Additional details can be found in: https://llvm.org/bugs/show_bug.cgi?id=23680 Phabricator review: http://reviews.llvm.org/D10419 llvm-svn: 239827	2015-06-16 16:01:15 +00:00
Nemanja Ivanovic	d737ff7af7	LLVM support for vector quad bit permute and gather instructions through builtins This patch corresponds to review: http://reviews.llvm.org/D10096 This is the back end portion of the patch related to D10095. The patch adds the instructions and back end intrinsics for: vbpermq vgbbd llvm-svn: 239505	2015-06-11 06:21:25 +00:00
Nemanja Ivanovic	d9fdd9c5e6	Add support for VSX FMA single-precision instructions to the PPC back end This patch corresponds to review: http://reviews.llvm.org/D9941 It adds the various FMA instructions introduced in the version 2.07 of the ISA along with the testing for them. These are operations on single precision scalar values in VSX registers. llvm-svn: 238578	2015-05-29 17:13:25 +00:00
Kit Barton	eb147c8fbd	This patch adds support for the vector quadword add/sub instructions introduced in POWER8: vadduqm vaddeuqm vaddcuq vaddecuq vsubuqm vsubeuqm vsubcuq vsubecuq In addition to adding the instructions themselves, it also adds support for the v1i128 type for intrinsics (Intrinsics.td, Function.cpp, and IntrinsicEmitter.cpp). http://reviews.llvm.org/D9081 llvm-svn: 238144	2015-05-25 15:49:26 +00:00
Hal Finkel	ef674a05e8	[PowerPC] Fix fast-isel when compare is split from branch When the compare feeding a branch was in a different BB from the branch, we'd try to "regenerate" the compare in the block with the branch, possibly trying to make use of values not available there. Copy a page from AArch64's play book here to fix the problem (at least in terms of correctness). Fixes PR23640. llvm-svn: 238097	2015-05-23 12:18:10 +00:00
Bill Schmidt	39c88f95dd	[PPC64] Handle vpkudum mask pattern correctly when vpkudum isn't available My recent patch to add support for ISA 2.07 vector pack/unpack instructions didn't properly check for availability of the vpkudum instruction when recognizing it as a special vector shuffle case. This causes us to leave the vector shuffle in place (rather than converting it to a vector permute) so that it can be recognized later as a vpkudum, but that pattern is invalid for processors prior to POWER8. Thus LLVM crashes with an "unable to select" message. We observed this since one of our buildbots is configured to generate code for a POWER7. This patch fixes the problem by checking for availability of the vpkudum instruction during custom lowering of vector shuffles. I've added a test case variant for the vpkudum pattern when the instruction isn't available. llvm-svn: 237952	2015-05-21 20:48:49 +00:00
Nemanja Ivanovic	78592ebe3a	Add support for VSX scalar single-precision arithmetic in the PPC target http://reviews.llvm.org/D9891 Following up on the VSX single precision loads and stores added earlier, this adds support for elementary arithmetic operations on single precision values in VSX registers. These instructions utilize the new VSSRC register class. Instructions added: xsaddsp xsdivsp xsmulsp xsresp xsrsqrtesp xssqrtsp xssubsp llvm-svn: 237937	2015-05-21 19:32:49 +00:00
Hal Finkel	d380588a20	[PowerPC] Add extra r2 read deps on @toc@l relocations If some commits are happy, and some commits are sad, this is a sad commit. It is sad because it restricts instruction scheduling to work around a binutils linker bug, and moreover, one that may never be fixed. On 2012-05-21, GCC was updated not to produce code triggering this bug, and now we'll do the same... When resolving an address using the ELF ABI TOC pointer, two relocations are generally required: one for the high part and one for the low part. Only the high part generally explicitly depends on r2 (the TOC pointer). And, so, we might produce code like this: .Ltmp526: addis 3, 2, .LC12@toc@ha .Ltmp1628: std 2, 40(1) ld 5, 0(27) ld 2, 8(27) ld 11, 16(27) ld 3, .LC12@toc@l(3) rldicl 4, 4, 0, 32 mtctr 5 bctrl ld 2, 40(1) And there is nothing wrong with this code, as such, but there is a linker bug in binutils (https://sourceware.org/bugzilla/show_bug.cgi?id=18414) that will misoptimize this code sequence to this: nop std r2,40(r1) ld r5,0(r27) ld r2,8(r27) ld r11,16(r27) ld r3,-32472(r2) clrldi r4,r4,32 mtctr r5 bctrl ld r2,40(r1) because the linker does not know (and does not check) that the value in r2 changed in between the instruction using the .LC12@toc@ha (TOC-relative) relocation and the instruction using the .LC12@toc@l(3) relocation. Because it finds these instructions using the relocations (and not by scanning the instructions), it has been asserted that there is no good way to detect the change of r2 in between. As a result, this bug may never be fixed (i.e. it may become part of the definition of the ABI). GCC was updated to add extra dependencies on r2 to instructions using the @toc@l relocations to avoid this problem, and we'll do the same here. This is done as a separate pass because: 1. 
These extra r2 dependencies are not really properties of the instructions, but rather due to a linker bug, and maybe one day we'll be able to get rid of them when targeting linkers without this bug (and, thus, keeping the logic centralized here will make that straightforward). 2. There are ISel-level peephole optimizations that propagate the @toc@l relocations to some user instructions, and so the extra dependencies do not apply only to a fixed set of instructions (without undesirable definition replication). The test case was reduced with the help of bugpoint, with minimal cleaning. I'm looking forward to our upcoming MI serialization support, and with that, much better tests can be created. llvm-svn: 237556	2015-05-18 06:25:59 +00:00
Bill Schmidt	d115a77d30	[PPC64] Add vector pack/unpack support from ISA 2.07 This patch adds support for the following new instructions in the Power ISA 2.07: vpksdss vpksdus vpkudus vpkudum vupkhsw vupklsw These instructions are available through the vec_packs, vec_packsu, vec_unpackh, and vec_unpackl built-in interfaces. These are lane-sensitive instructions, so the built-ins have different implementations for big- and little-endian, and the instructions must be marked as killing the vector swap optimization for now. The first three instructions perform saturating pack operations. The fourth performs a modulo pack operation, which means it can be represented with a vector shuffle, and conversely the appropriate vector shuffles may cause this instruction to be generated. The other instructions are only generated via built-in support for now. Appropriate tests have been added. There is a companion patch to clang for the rest of this support. llvm-svn: 237499	2015-05-16 01:02:12 +00:00
James Y Knight	f2154471fd	Fix test added in r236850 for OSX builders. Need to specify triple so that llvm emits the asm syntax that the test expected. llvm-svn: 236855	2015-05-08 14:04:54 +00:00
James Y Knight	7d10114335	Fix alignment checks in MergeConsecutiveStores. 1) check whether the alignment of the memory is sufficient for the merged store or load to be efficient. Not doing so can result in some ridiculously poor code generation, if merging creates a vector operation which must be aligned but isn't. 2) DON'T check that the alignment of each load/store is equal. If you're merging 2 4-byte stores, the first might have 8-byte alignment, but the second certainly will have 4-byte alignment. We do want to allow those to be merged. llvm-svn: 236850	2015-05-08 13:47:01 +00:00
Nemanja Ivanovic	5c7f6e9cdc	Add VSX Scalar loads and stores to the PPC back end This patch corresponds to review: http://reviews.llvm.org/D9440 It adds a new register class to the PPC back end to contain single precision values in VSX registers. Additionally, it adds scalar loads and stores for VSX registers. llvm-svn: 236755	2015-05-07 18:24:05 +00:00
Bill Schmidt	1b44da961a	[PPC64LE] Adjust vector splats during VSX swap optimization The initial code drop for VSX swap optimization permitted the optimization only when all operations in a web of related computation are lane-insensitive. For some lane-sensitive operations, we can still permit the optimization provided that we make adjustments to those operations. This patch adds special handling for vector splats so that their presence doesn't kill the optimization. Vector splats are lane-sensitive since they identify by number a vector element to be used as the source of a splat. When swap optimizations take place, the desired vector element will move to the opposite doubleword of the quadword vector. We thus replace the index I by (I + N/2) % N, where N is the number of elements in the vector. A new test case is added to test that swap optimization succeeds when vector splats are present, and that the proper input element is used as the source of the splat. An ancillary change removes SH_BUILDVEC as one of the kinds of special handling that may be required by VSX swap optimization. From experience with GCC, I had expected to need some modifications for vector build operations, but I did not find that to be the case. llvm-svn: 236606	2015-05-06 15:40:46 +00:00
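The index adjustment described in the entry above, replacing I by (I + N/2) % N, can be checked with a small model. This is only an illustrative sketch: vectors are modeled as Python lists, and the helper names `swap_doublewords`, `adjusted_splat_index`, and `splat` are hypothetical and do not come from the LLVM sources.

```python
def swap_doublewords(vec):
    """Model the swap: exchange the two halves (doublewords) of a vector."""
    half = len(vec) // 2
    return vec[half:] + vec[:half]


def adjusted_splat_index(i, n):
    """Index of the original element i after the doublewords are swapped."""
    return (i + n // 2) % n


def splat(vec, i):
    """Replicate element i of vec across all lanes."""
    return [vec[i]] * len(vec)


if __name__ == "__main__":
    v = [10, 20, 30, 40]
    swapped = swap_doublewords(v)
    for i in range(len(v)):
        j = adjusted_splat_index(i, len(v))
        print(i, j, splat(v, i) == splat(swapped, j))
```

Under this model, a splat of element I from the unswapped vector and a splat of element (I + N/2) % N from the swapped vector select the same value, which is why the splat no longer has to block the optimization.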
Kit Barton	4fec877662	This patch adds ABI support for v1i128 data type. It adds v1i128 to the appropriate register classes and checks parameter passing and return values. This is related to http://reviews.llvm.org/D9081, which will add instructions that exploit the v1i128 datatype. Phabricator review: http://reviews.llvm.org/D9475 llvm-svn: 236503	2015-05-05 16:10:44 +00:00
Duncan P. N. Exon Smith	09b5c9c24d	IR: Give 'DI' prefix to debug info metadata Finish off PR23080 by renaming the debug info IR constructs from `MD` to `DI`. The last of the `DIDescriptor` classes were deleted in r235356, and the last of the related typedefs removed in r235413, so this has all baked for about a week. Note: If you have out-of-tree code (like a frontend), I recommend that you get everything compiling and tests passing with the previous commit before updating to this one. It'll be easier to keep track of what code is using the `DIDescriptor` hierarchy and what you've already updated, and I think you're extremely unlikely to insert bugs. YMMV of course. Back to this commit: I did this using the rename-md-di-nodes.sh upgrade script I've attached to PR23080 (both code and testcases) and filtered through clang-format-diff.py. I edited the tests for test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns were off-by-three. It should work on your out-of-tree testcases (and code, if you've followed the advice in the previous paragraph). Some of the tests are in badly named files now (e.g., test/Assembler/invalid-mdcompositetype-missing-tag.ll should be 'dicompositetype'); I'll come back and move the files in a follow-up commit. llvm-svn: 236120	2015-04-29 16:38:44 +00:00
Bill Schmidt	6661e2ddb2	[PPC64LE] Remove unnecessary swaps from lane-insensitive vector computations This patch adds a new SSA MI pass that runs on little-endian PPC64 code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors without alignment constraints are accomplished for little-endian using lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional xxswapd instructions hurts performance in comparison with big-endian code, but they are necessary in the general case to support correct semantics. However, the general case does not apply to most vector code. Many vector instructions are lane-insensitive; they do not "care" which lanes the parallel computations are performed within, provided that the resulting data is stored into the correct locations. Thus this pass looks for computations that perform only lane-insensitive operations, and remove the unnecessary swaps from loads and stores in such computations. Future improvements will allow computations using certain lane-sensitive operations to also be optimized in this manner, by modifying the lane-sensitive operations to account for the permuted order of the lanes. However, this patch only adds the infrastructure to permit this; no lane-sensitive operations are optimized at this time. This code is heavily exercised by the various vectorizing applications in the projects/test-suite tree. For the time being, I have only added one simple test case to demonstrate what the pass is doing. Although it is quite simple, it provides coverage for much of the code, including the special case handling of copies and subreg-to-reg operations feeding the swaps. I plan to add additional tests in the future as I fill in more of the "special handling" code. Two existing tests were affected, because they expected the swaps to be present, but they are now removed. llvm-svn: 235910	2015-04-27 19:57:34 +00:00
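Why the swaps can be dropped for lane-insensitive computations can be seen in a toy model. The sketch below assumes, as a simplification, that on little-endian the load and store each behave like a doubleword swap of a list-modeled vector; the function names are hypothetical stand-ins named after the instructions, not LLVM APIs.

```python
def swap_doublewords(vec):
    """Model xxswapd: exchange the two halves of the vector."""
    half = len(vec) // 2
    return vec[half:] + vec[:half]


def lxvd2x(mem):
    """Simplified model: the load delivers the doublewords swapped."""
    return swap_doublewords(mem)


def stxvd2x(reg):
    """Simplified model: the store writes the doublewords swapped."""
    return swap_doublewords(reg)


def vec_add(a, b):
    """A lane-insensitive operation: element-wise addition."""
    return [x + y for x, y in zip(a, b)]


def add_with_swaps(a, b):
    """General case: every load and store is paired with an explicit swap."""
    ra = swap_doublewords(lxvd2x(a))
    rb = swap_doublewords(lxvd2x(b))
    return stxvd2x(swap_doublewords(vec_add(ra, rb)))


def add_without_swaps(a, b):
    """Optimized case: the swaps are removed; lanes stay permuted in registers."""
    return stxvd2x(vec_add(lxvd2x(a), lxvd2x(b)))


if __name__ == "__main__":
    a, b = [1, 2, 3, 4], [10, 20, 30, 40]
    print(add_with_swaps(a, b), add_without_swaps(a, b))
```

Because element-wise addition does not depend on which lane a value occupies, both sequences store the same result, and the second one does so without any explicit swap.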
Hal Finkel	acf4a0f1ca	[PowerPC] Use sync inst alias when printing So long as the choice between printing msync and sync is not ambiguous, we can print 'sync 0' as just 'sync'. llvm-svn: 235663	2015-04-23 23:05:08 +00:00
Hal Finkel	4e2def1840	[PowerPC] Enable printing instructions using aliases TableGen had been nicely generating code to print a number of instructions using shorter aliases (and PowerPC has plenty of short mnemonics), but we were not calling it. For some of the aliases we support in the parser, TableGen can't infer the "inverse" alias relationship, so there is still more to do. Thus, after some hours of updating test cases... llvm-svn: 235616	2015-04-23 18:30:38 +00:00
Hans Wennborg	8823c80ce0	Re-commit r235560: Switch lowering: extract jump tables and bit tests before building binary tree (PR22262) Third time's the charm. The previous commit was reverted as a reverse for-loop in SelectionDAGBuilder::lowerWorkItem did 'I--' on an iterator at the beginning of a vector, causing asserts when using debugging iterators. This commit fixes that. llvm-svn: 235608	2015-04-23 16:45:24 +00:00
Aaron Ballman	be6ee771e3	Revert r235560; this commit was causing several failed assertions in Debug builds using MSVC's STL. The iterator is being used outside of its valid range. llvm-svn: 235597	2015-04-23 13:41:59 +00:00
Hans Wennborg	d4bc2d86b6	Switch lowering: extract jump tables and bit tests before building binary tree (PR22262) This is a re-commit of r235101, which also fixes the problems with the previous patch: - Switches with only a default case and non-fallthrough were handled incorrectly - The previous patch tickled a bug in PowerPC Early-Return Creation which is fixed here. > This is a major rewrite of the SelectionDAG switch lowering. The previous code > would lower switches as a binary tree, discovering clusters of cases > suitable for lowering by jump tables or bit tests as it went along. To increase > the likelihood of finding jump tables, the binary tree pivot was selected to > maximize case density on both sides of the pivot. > > By not selecting the pivot in the middle, the binary trees would not always > be balanced, leading to performance problems in the generated code. > > This patch rewrites the lowering to search for clusters of cases > suitable for jump tables or bit tests first, and then builds the binary > tree around those clusters. This way, the binary tree will always be balanced. > > This has the added benefit of decoupling the different aspects of the lowering: > tree building and jump table or bit tests finding are now easier to tweak > separately. > > For example, this will enable us to balance the tree based on profile info > in the future. > > The algorithm for finding jump tables is quadratic, whereas the previous algorithm > was O(n log n) for common cases, and quadratic only in the worst-case. This > doesn't seem to be a major problem in practice, e.g. compiling a file consisting > of a 10k-case switch was only 30% slower, and such large switches should be rare > in practice. Compiling e.g. gcc.c showed no compile-time difference. If this > does turn out to be a problem, we could limit the search space of the algorithm. > > This commit also disables all optimizations during switch lowering in -O0. 
> > Differential Revision: http://reviews.llvm.org/D8649 llvm-svn: 235560	2015-04-22 23:14:56 +00:00
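The "clusters first, tree second" idea in the entry above can be sketched as follows. This is a simplified illustration and not the SelectionDAG code: the density threshold of 0.4, the dynamic-programming formulation, and the names `is_dense`, `partition_into_clusters`, and `build_tree` are all assumptions made for the example, and case values are assumed to be distinct integers.

```python
def is_dense(cases, i, j, min_density=0.4):
    """True if cases[i..j] fill enough of their value range for a jump table."""
    span = cases[j] - cases[i] + 1
    return (j - i + 1) / span >= min_density


def partition_into_clusters(cases, min_density=0.4):
    """Split sorted case values into the fewest dense clusters (quadratic search)."""
    cases = sorted(cases)
    n = len(cases)
    min_parts = [0] * (n + 1)  # min_parts[i]: fewest clusters covering cases[i:]
    last = [0] * n             # last[i]: final index of the cluster starting at i
    for i in range(n - 1, -1, -1):
        min_parts[i] = 1 + min_parts[i + 1]  # fallback: a single-case cluster
        last[i] = i
        for j in range(n - 1, i, -1):
            parts = 1 + min_parts[j + 1]
            if is_dense(cases, i, j, min_density) and parts < min_parts[i]:
                min_parts[i] = parts
                last[i] = j
    clusters, i = [], 0
    while i < n:
        clusters.append(cases[i:last[i] + 1])
        i = last[i] + 1
    return clusters


def build_tree(clusters):
    """Build a balanced binary tree by always pivoting in the middle."""
    if len(clusters) == 1:
        return clusters[0]
    mid = len(clusters) // 2
    return (build_tree(clusters[:mid]), build_tree(clusters[mid:]))


if __name__ == "__main__":
    found = partition_into_clusters([1, 2, 3, 4, 100, 200, 201, 202, 203])
    print(found)
    print(build_tree(found))
```

Because the pivot is chosen over the list of clusters rather than by case density, the tree stays balanced regardless of where the dense ranges fall, and the nested loops make the quadratic cost mentioned in the commit message visible.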
Hal Finkel	df195a3bea	[DAGCombine] Disable select(c, load,load) for indexed loads This turned up after r235333, but was a pre-existing bug. The optimization which transforms select(c, load, load) into a load of a select of the addresses does not handle indexed loads (pre/post inc/dec). However, it did not check for them either, leading to a crash if it tried to transform one of them. llvm-svn: 235497	2015-04-22 11:32:25 +00:00
Olivier Sallenave	f748efd9fd	Refactoring and enhancement to FMA combine. llvm-svn: 235344	2015-04-20 20:29:40 +00:00
Hal Finkel	40b204ea4b	[InlineAsm] Remove EarlyClobber on registers that are also inputs When an inline asm call has an output register marked as early-clobber, but that same register is also an input operand, what should we do? GCC accepts this, and is documented to accept this for read/write operands saying, "Furthermore, if the earlyclobber operand is also a read/write operand, then that operand is written only after it's used." For write-only operands, the situation seems less clear, but I have at least one existing codebase that assumes this will work, in part because it has syscall macros like this: ({ \ register uint64_t r0 __asm__ ("r0") = (__NR_ ## name); \ register uint64_t r3 __asm__ ("r3") = ((uint64_t) (arg0)); \ register uint64_t r4 __asm__ ("r4") = ((uint64_t) (arg1)); \ register uint64_t r5 __asm__ ("r5") = ((uint64_t) (arg2)); \ __asm__ __volatile__ \ ("sc" \ : "=&r"(r0),"=&r"(r3),"=&r"(r4),"=&r"(r5) \ : "0"(r0), "1"(r3), "2"(r4), "3"(r5) \ : "r6","r7","r8","r9","r10","r11","r12","cr0","memory"); \ r3; \ }) Furthermore, with register aliases and subregister relationships that only the backend knows about, rejecting this in the frontend seems like a difficult proposition (if we wanted to do so). However, keeping the early-clobber flag on the INLINEASM MI does not work for us, because it will cause the register's live interval to end too soon (so it will not appear defined to be used as an input). Fortunately, fixing this does not seem hard: When forming the INLINEASM MI, check to see if any of the early-clobber outputs are also inputs, and if so, remove the early-clobber flag. llvm-svn: 235283	2015-04-20 00:01:30 +00:00
David Blaikie	dfadb4e9ee	[opaque pointer type] Add textual IR support for explicit type parameter to the call instruction See r230786 and r230794 for similar changes to gep and load respectively. Call is a bit different because it often doesn't have a single explicit type - usually the type is deduced from the arguments, and just the return type is explicit. In those cases there's no need to change the IR. When that's not the case, the IR usually contains the pointer type of the first operand - but since typed pointers are going away, that representation is insufficient so I'm just stripping the "pointerness" of the explicit type away. This does make the IR a bit weird - it /sort of/ reads like the type of the first operand: "call void () %x(" but %x is actually of type "void ()" and will eventually be just of type "ptr". But this seems not too bad and I don't think it would benefit from repeating the type ("void (), void () %x(" and then eventually "void (), ptr %x(") as has been done with gep and load. This also has a side benefit: since the explicit type is no longer a pointer, there's no ambiguity between an explicit type and a function that returns a function pointer. Previously this case needed an explicit type (eg: a function returning a void() function was written as "call void () () * @x(" rather than "call void () * @x(" because of the ambiguity between a function returning a pointer to a void() function and a function returning void). No ambiguity means even function pointer return types can just be written alone, without writing the whole function's type. This leaves /only/ the varargs case where the explicit type is required. Given the special type syntax in call instructions, the regex-fu used for migration was a bit more involved in its own unique way (as every one of these is) so here it is. Use it in conjunction with the apply.sh script and associated find/xargs commands I've provided in r230786 to migrate your out of tree tests. 
Do let me know if any of this doesn't cover your cases & we can iterate on a more general script/regexes to help others with out of tree tests. About 9 test cases couldn't be automatically migrated - half of those were functions returning function pointers, where I just had to manually delete the function argument types now that we didn't need an explicit function type there. The other half were typedefs of function types used in calls - just had to manually drop the * from those. import fileinput import sys import re pat = re.compile(r'((?:=\|:\|^\|\s)call\s(?:[^@]?))(\s$\|\s(?:(?:\[\[[a-zA-Z0-9_]+\]\]\|[@%](?:(")?[\\\?@a-zA-Z0-9_.]?(?(3)"\|)\|{{.}}))(?:$\|$)\|undef\|inttoptr\|bitcast\|null\|asm).$)') addrspace_end = re.compile(r"addrspace\(\d+$\s\$") func_end = re.compile("(?:void.\|\)\s)\$") def conv(match, line): if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)): return line return line[:match.start()] + match.group(1)[:match.group(1).rfind('')].rstrip() + match.group(2) + line[match.end():] for line in sys.stdin: sys.stdout.write(conv(re.search(pat, line), line)) llvm-svn: 235145	2015-04-16 23:24:18 +00:00
Hans Wennborg	ff837f8fc0	Revert the switch lowering change (r235101, r235103, r235106) Looks like it broke the sanitizer-ppc64-linux1 build. Reverting for now. llvm-svn: 235108	2015-04-16 15:43:26 +00:00
Hans Wennborg	bc33cd14d7	Switch lowering: extract jump tables and bit tests before building binary tree (PR22262) This is a major rewrite of the SelectionDAG switch lowering. The previous code would lower switches as a binary tree, discovering clusters of cases suitable for lowering by jump tables or bit tests as it went along. To increase the likelihood of finding jump tables, the binary tree pivot was selected to maximize case density on both sides of the pivot. By not selecting the pivot in the middle, the binary trees would not always be balanced, leading to performance problems in the generated code. This patch rewrites the lowering to search for clusters of cases suitable for jump tables or bit tests first, and then builds the binary tree around those clusters. This way, the binary tree will always be balanced. This has the added benefit of decoupling the different aspects of the lowering: tree building and jump table or bit tests finding are now easier to tweak separately. For example, this will enable us to balance the tree based on profile info in the future. The algorithm for finding jump tables is O(n^2), whereas the previous algorithm was O(n log n) for common cases, and quadratic only in the worst-case. This doesn't seem to be a major problem in practice, e.g. compiling a file consisting of a 10k-case switch was only 30% slower, and such large switches should be rare in practice. Compiling e.g. gcc.c showed no compile-time difference. If this does turn out to be a problem, we could limit the search space of the algorithm. This commit also disables all optimizations during switch lowering in -O0. Differential Revision: http://reviews.llvm.org/D8649 llvm-svn: 235101	2015-04-16 14:49:23 +00:00
Rafael Espindola	518baef93e	Update tests to not be as dependent on section numbers. Many of these predate llvm-readobj. With elf-dump we had to match a relocation to symbol number and symbol number to symbol name or section number. llvm-svn: 235015	2015-04-15 15:59:37 +00:00
Hal Finkel	41bf2cfa3a	[PowerPC] Really iterate over all loops in PPCLoopDataPrefetch/PPCLoopPreIncPrep When I fixed these a couple of days ago to iterate over all loops, not just depth == 1 loops, I inadvertently made it such that we'd only look at the first top-level loop. Make sure that we really look at all of them. llvm-svn: 234705	2015-04-12 17:18:56 +00:00
Hal Finkel	cf806a6531	[PowerPC] Disable part-word atomics on the P7 As it turns out, even though these are part of ISA 2.06, the P7 does not support them (or, at least, not any P7s we've tested so far). llvm-svn: 234686	2015-04-11 13:40:36 +00:00
Nemanja Ivanovic	9d5491e285	Add direct moves to/from VSR and exploit them for FP/INT conversions This patch corresponds to review: http://reviews.llvm.org/D8928 It adds direct move instructions to/from VSX registers to GPR's. These are exploited for FP <-> INT conversions. llvm-svn: 234682	2015-04-11 10:40:42 +00:00
Hal Finkel	47533e9dd7	[PowerPC] Fix PPCLoopPreIncPrep for depth > 1 loops This pass had the same problem as the data-prefetching pass: it was only checking for depth == 1 loops in practice. Fix that, add some debugging statements, and make sure that, when we grab an AddRec, it is for the loop we expect. llvm-svn: 234670	2015-04-11 00:33:08 +00:00
Hal Finkel	bc9c7308a0	[PowerPC] Prefetching should also consider depth > 1 loops Iterating over loops from the LoopInfo instance only provides top-level loops. We need to search the whole tree of loops to find the inner ones. llvm-svn: 234603	2015-04-10 15:05:02 +00:00
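The point made in the entry above, that the top-level loop list does not include nested loops, can be illustrated with a small tree walk. The `Loop` class and `all_loops` generator below are hypothetical stand-ins for illustration only; they are not the LLVM LoopInfo API.

```python
class Loop:
    """Hypothetical loop node: a name plus the loops nested directly inside it."""

    def __init__(self, name, subloops=()):
        self.name = name
        self.subloops = list(subloops)


def all_loops(top_level):
    """Yield every loop in the forest, not just the top-level ones (pre-order)."""
    stack = list(reversed(top_level))
    while stack:
        loop = stack.pop()
        yield loop
        # Push the sub-loops so that depth > 1 loops are visited too.
        stack.extend(reversed(loop.subloops))


if __name__ == "__main__":
    inner = Loop("inner")
    outer = Loop("outer", [inner])
    other = Loop("other")
    print([loop.name for loop in all_loops([outer, other])])
```

Iterating over `[outer, other]` directly would never reach `inner`; walking the whole tree visits every top-level loop and every loop nested beneath it.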
Hal Finkel	79f36597b9	[PowerPC] Don't crash on PPC32 i64 fp_to_uint on modern cores When we have an instruction for this (and, thus, don't generate a runtime call), we need to custom type legalize this (in a trivial way, just as we do for fp_to_sint). Fixes PR23173. llvm-svn: 234561	2015-04-10 03:39:00 +00:00
Nemanja Ivanovic	5c0e16778c	Add LLVM support for remaining integer divide and permute instructions from ISA 2.06 This is the patch corresponding to review: http://reviews.llvm.org/D8406 It adds some missing instructions from ISA 2.06 to the PPC back end. llvm-svn: 234546	2015-04-09 23:54:37 +00:00
Rafael Espindola	2d125b4495	Revert "Refactoring and enhancement to FMA combine." This reverts commit r234513. It was failing on the bots. llvm-svn: 234518	2015-04-09 18:29:32 +00:00
Olivier Sallenave	e19fcb1120	Refactoring and enhancement to FMA combine. llvm-svn: 234513	2015-04-09 17:55:26 +00:00
Eric Christopher	97beb00b59	Strip trailing whitespace and reword explanatory comment. llvm-svn: 234078	2015-04-04 02:26:47 +00:00
Bill Schmidt	3f570f996a	[PowerPC] Enable splat generation for BUILD_VECTOR with little endian When enabling PPC64LE, I disabled some optimizations of BUILD_VECTOR nodes for little endian because wrong results were produced. I've subsequently investigated and found this is due to a call to BuildVectorSDNode::isConstantSplat that was always specifying big-endian. With this changed to correctly identify the target endianness, the optimizations work as expected. I found another case of a call to the same method with big-endian hardcoded, in PPC::isAllNegativeZeroVector(). I discovered this was an orphaned method with no callers, so I've just removed it. The existing test/CodeGen/PowerPC/vec_constants.ll checks these optimizations, so for testing I've just added a variant for little endian. llvm-svn: 234011	2015-04-03 13:48:24 +00:00
Simon Pilgrim	335a565d46	[DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR This patch attempts to fold the shuffling of 'scalar source' inputs - BUILD_VECTOR and SCALAR_TO_VECTOR nodes - if the shuffle node is the only user. This folds away a lot of unnecessary shuffle nodes, and allows quite a bit of constant folding that was being missed. Differential Revision: http://reviews.llvm.org/D8516 llvm-svn: 234004	2015-04-03 10:02:21 +00:00
Hal Finkel	7bfde8133e	[PowerPC] FastISel can't handle i1 return values when using CR bits Under normal circumstances, use of CR bits is disabled when running at -O0, but it is enabled by default otherwise, and if you have optnone functions, they'll still generally be generated with crbits turned on (because nothing else turns them off). FastISel can't handle most things dealing with i1 values when using CR bits, and checks for that, but was not checking the return type on functions; we can't fast-isel function calls with i1 return values either when using CR bits for boolean values. Fixes PR22664. llvm-svn: 233775	2015-04-01 00:40:48 +00:00
Hal Finkel	fcde972bd3	[PowerPC] Don't use a vector preferred memory type at -O0 Even at -O0, we fall back to SDAG when we hit intrinsics, and if the intrinsic is a memset/memcpy/etc. we might normally use vector types. At -O0, this is probably not a good idea (because, if there is a bug in the lowering code, there would be no good way to turn it off). At -O0, only use scalar preferred types. Related to PR22754. llvm-svn: 233755	2015-03-31 20:56:09 +00:00
Hal Finkel	60a07f273f	[SDAG] Handle non-integer preferred memset types for non-constant values The existing code in getMemsetValue only handled integer-preferred types when the fill value was not a constant. Make this more robust in two ways: 1. If the preferred type is a floating-point value, do the mul-splat trick on the corresponding integer type and then bitcast. 2. If the preferred type is a vector, do the mul-splat trick on one vector element, and then build a vector out of them. Fixes PR22754 (although, we should also turn off use of vector types at -O0). llvm-svn: 233749	2015-03-31 20:35:26 +00:00
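The "mul-splat trick" mentioned in the entry above replicates a byte across a wider integer by multiplying it by a constant whose bytes are all 0x01. The sketch below models the three cases the message lists (integer, floating-point via bitcast, vector) in plain Python; the function names are hypothetical and this is not the getMemsetValue code.

```python
import struct


def splat_byte(value, num_bytes):
    """Replicate the low byte of value across num_bytes bytes by multiplication."""
    ones = int.from_bytes(b"\x01" * num_bytes, "big")  # e.g. 0x01010101
    return (value & 0xFF) * ones


def splat_as_float(value):
    """Float case: splat on the 32-bit integer type, then reinterpret the bits."""
    return struct.unpack("<f", struct.pack("<I", splat_byte(value, 4)))[0]


def splat_as_vector(value, elem_bytes, num_elems):
    """Vector case: splat one element, then build a vector of copies of it."""
    return [splat_byte(value, elem_bytes)] * num_elems


if __name__ == "__main__":
    print(hex(splat_byte(0xAB, 4)))
    print(splat_as_float(0x3F))
    print([hex(e) for e in splat_as_vector(0xAB, 2, 4)])
```

The multiplication never carries between byte positions because each partial product is the same single byte, so the result is the fill byte repeated in every position.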
Duncan P. N. Exon Smith	4d29d309de	DebugInfo: Fix bad debug info for compile units and types Fix debug info in these tests, which started failing with a WIP patch to verify compile units and types. The problems look like they were all caused by bitrot. They fell into these categories: - Using `!{i32 0}` instead of `!{}`. - Using `!{null}` instead of `!{}`. - Using `!MDExpression()` instead of `!{}`. - Using `!8` instead of `!{!8}`. - `file:` references that pointed at `MDCompileUnit`s instead of the same `MDFile` as the compile unit. - `file:` references that were numerically off-by-one or (off-by-ten). llvm-svn: 233415	2015-03-27 20:46:33 +00:00
Andrew Trick	75ca61e404	Complete the MachineScheduler fix made way back in r210390. "Fix the MachineScheduler's logic for updating ready times for in-order. Now the scheduler updates a node's ready time as soon as it is scheduled, before releasing dependent nodes." This fix was only made in one variant of the ScheduleDAGMI driver. Francois de Ferriere reported the issue in the other bit of code where it was also needed. I never got around to coming up with a test case, but it's an obvious fix that shouldn't be delayed any longer. I'll try to refactor this code a little better. I did verify performance on a wide variety of targets and saw no negative impact with this fix. llvm-svn: 233366	2015-03-27 06:10:13 +00:00
Eric Christopher	020f333161	Testcase for r233239. llvm-svn: 233240	2015-03-26 00:57:33 +00:00
Kit Barton	d0dd6e5750	Add Hardware Transactional Memory (HTM) Support This patch adds the Hardware Transactional Memory (HTM) support provided by ISA 2.07 (POWER8). The intrinsic support is based on GCC's [1], but currently only the 'PowerPC HTM Low Level Built-in Functions' are implemented. The HTM instructions follow the RC ones, and the transaction initiation result is set in CR0 (with the exception of tcheck). The current approach is to create a register copy from CR0 to a GPR and compare it. Although this is suboptimal, since the branch could be taken directly by comparing the CR0 value, it generates correct code both when the result is tested and branched on and when it is just returned. A possible future optimization would be to eliminate the MFCR instruction and branch directly. HTM usage requires a recent kernel with PPC HTM enabled. Tested on powerpc64 and powerpc64le. This is sent along with a clang patch to enable the builtins and option switch. [1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html Phabricator Review: http://reviews.llvm.org/D8247 llvm-svn: 233204	2015-03-25 19:36:23 +00:00
Hal Finkel	70188cc8ca	[SDAG] Don't widen VSETCC during type legalization for split operands Because the operands of a vector SETCC node can be of a different type from the result (and often are), it can happen that even if we'd prefer to widen the result type of the SETCC, the operands have been split instead. In this case, the SETCC result also must be split. This mirrors what is done in WidenVecRes_SELECT, and should be NFC elsewhere because if the operands are not widened the following calls to GetWidenedVector will assert (which is what was happening in the test case). llvm-svn: 232935	2015-03-23 08:22:43 +00:00
Eric Christopher	496632fcd2	Remove the bare getSubtargetImpl call from the PPC port. As part of this add a test that shows we can generate code for functions that differ by subtarget feature. llvm-svn: 232882	2015-03-21 03:36:02 +00:00

1033 Commits