llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 13:02:52 +02:00

Author	SHA1	Message	Date
Adrian Prantl	ebdf2406b7	Refactor DebugLocDWARFExpression so it doesn't require access to the TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes of the pre-calculated DWARF expression. Ought to be NFC, but it does slightly alter the output format of the textual assembly. This reapplies 230930 without the assertion in DebugLocEntry::finalize() because not all Machine registers can be lowered into DWARF register numbers and floating point constants cannot be expressed. llvm-svn: 231023	2015-03-02 22:02:33 +00:00
Adrian Prantl	d8089cbc89	Revert "Refactor DebugLocDWARFExpression so it doesn't require access to the" This reverts commit 230975 to investigate buildbot breakage. llvm-svn: 231004	2015-03-02 20:01:54 +00:00
David Blaikie	214504554b	Change SystemZ large tests to use the existing long_tests property (this is already used in Clang for a couple of tests) Reviewers: uweigand Differential Revision: http://reviews.llvm.org/D7965 llvm-svn: 230998	2015-03-02 19:34:11 +00:00
Adrian Prantl	e8d05839e3	Refactor DebugLocDWARFExpression so it doesn't require access to the TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes of the pre-calculated DWARF expression. Ought to be NFC, but it does slightly alter the output format of the textual assembly. This reapplies 230930 with a relaxed assertion in DebugLocEntry::finalize() that allows for empty DWARF expressions for constant FP values. llvm-svn: 230975	2015-03-02 17:21:06 +00:00
Vasileios Kalintiris	f1d98b8fc1	[mips] Optimize conditional moves where RHS is zero. Summary: When the RHS of a conditional move node is zero, we can utilize the $zero register by inverting the conditional move instruction and by swapping the order of its True/False operands. Reviewers: dsanders Differential Revision: http://reviews.llvm.org/D7945 llvm-svn: 230956	2015-03-02 12:47:32 +00:00
Nico Weber	e4e94b4dca	Revert r230930, it caused PR22747. llvm-svn: 230932	2015-03-02 04:37:11 +00:00
Adrian Prantl	5e06e85b02	Refactor DebugLocDWARFExpression so it doesn't require access to the TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes of the pre-calculated DWARF expression. Ought to be NFC, but it does slightly alter the output format of the textual assembly. llvm-svn: 230930	2015-03-02 02:38:18 +00:00
Elena Demikhovsky	e032aa37b6	AVX-512: Added mask and rounding mode for scalar arithmetics Added more tests for scalar instructions to distinguish between AVX and AVX-512 forms. llvm-svn: 230891	2015-03-01 07:44:04 +00:00
Sanjay Patel	46d370c549	avoid infinite looping when folding vector multiplies of constants (PR22698) We were missing a check for the following fold in DAGCombiner: // fold (fmul (fmul x, c1), c2) -> (fmul x, (fmul c1, c2)) If 'x' is also a constant, then we shouldn't do anything. Otherwise, we could end up swapping the operands back and forth forever. This should fix: http://llvm.org/bugs/show_bug.cgi?id=22698 Differential Revision: http://reviews.llvm.org/D7917 llvm-svn: 230884	2015-03-01 00:09:35 +00:00
Sanjay Patel	0f499d9c03	fixed to test only the feature, not the feature and a CPU llvm-svn: 230883	2015-03-01 00:02:03 +00:00
Sanjay Patel	cd857fab16	make the tested feature (SSE2) explicit llvm-svn: 230881	2015-02-28 23:55:24 +00:00
Duncan P. N. Exon Smith	d6d88d0f42	DebugInfo: Fix invalid file reference in CodeGen/X86/unknown-location.ll There are two types of files in the old (current) debug info schema. !0 = !{!"some/filename", !"/path/to/dir"} !1 = !{!"0x29", !0} ; [ DW_TAG_file_type ] !1 has a wrapper class called `DIFile` which inherits from `DIScope` and is referenced in 'scope' fields. !0 is called a "file node", and debug info nodes with a 'file' field point at one of these directly -- although they're built in `DIBuilder` by sending in a `DIFile` and reaching into it. In the new hierarchy, I unified these nodes as `MDFile` (which `DIFile` is a lightweight wrapper for) in r230057. Moving the new hierarchy into place (and upgrading testcases) caused CodeGen/X86/unknown-location.ll to start failing -- apparently "0x29" was previously showing up in the linetable as a filename, causing: .loc 2 4 3 (where 2 points at filename "0x29") instead of: .loc 1 4 3 (where 1 points at the actual filename). Change the testcase to use the old schema correctly. llvm-svn: 230880	2015-02-28 23:52:24 +00:00
Sanjay Patel	5a17442330	fixed to test only the feature, not the feature and a CPU llvm-svn: 230878	2015-02-28 23:47:09 +00:00
Craig Topper	c3d656cc84	[X86] Remove the blendpd/blendps/pblendw/pblendd intrinsics. They can be represented by shuffle_vector instructions. llvm-svn: 230860	2015-02-28 19:33:17 +00:00
Bill Schmidt	06671b00a3	Regenerated test case from r230801 for change in LLVM IR syntax llvm-svn: 230811	2015-02-27 23:29:57 +00:00
David Blaikie	a412edd409	Update SystemZ/Large test generators to handle new gep IR syntax llvm-svn: 230810	2015-02-27 23:29:39 +00:00
David Blaikie	74e5055c90	Update SystemZ/Large test generators to handle new load IR syntax llvm-svn: 230809	2015-02-27 23:29:33 +00:00
Bill Schmidt	57e433433f	Revert test case until it can be fixed llvm-svn: 230803	2015-02-27 22:31:14 +00:00
Bill Schmidt	1d4d434e1c	[PowerPC] Fix PR22711 - Misaligned .toc section Straightforward patch to emit an alignment directive when emitting a TOC entry. The test case was generated from the test in PR22711 that demonstrated a misaligned .toc section. The object code is run through llvm-readobj to verify that the correct alignment has been applied to the .toc section. Thanks to Ulrich Weigand for running down where the fix was needed. llvm-svn: 230801	2015-02-27 22:14:10 +00:00
David Blaikie	ab043ff680	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794	2015-02-27 21:17:42 +00:00
Charles Davis	5c83517500	Target/X86: Never use the redzone for Win64 ABI functions. Summary: Until now, we did this (among other things) based on whether or not the target was Windows. This is clearly wrong, not just for Win64 ABI functions on non-Windows, but for System V ABI functions on Windows, too. In this change, we make this decision based on the ABI the calling convention specifies instead. Reviewers: rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7953 llvm-svn: 230793	2015-02-27 21:11:16 +00:00
Hal Finkel	979cca3be8	[PowerPC] Use vector types for memcpy and friends (sometimes) When using Altivec, we can use vector loads and stores for aligned memcpy and friends. Starting with the P7 and VSX, we have reasonable unaligned vector stores. Starting with the P8, we have fast unaligned loads too. For QPX, we use vector loads and stores, but only for aligned memory accesses. llvm-svn: 230788	2015-02-27 19:58:28 +00:00
David Blaikie	0d99339102	[opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type. This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions. * This doesn't modify gep operators, only instructions (operators will be handled separately) * Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes. * geps of vectors are transformed as: getelementptr <4 x float> %x, ... -> getelementptr float, <4 x float> %x, ... Then, once the opaque pointer type is introduced, this will ultimately look like: getelementptr float, <4 x ptr> %x with the unambiguous interpretation that it is a vector of pointers to float. * address spaces remain on the pointer, not the type: getelementptr float addrspace(1)* %x -> getelementptr float, float addrspace(1)* %x Then, eventually: getelementptr float, ptr addrspace(1) %x Importantly, the massive amount of test case churn has been automated by some crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files.
update.py: import fileinput import sys import re ibrep = re.compile(r"(^.?[^%\w]getelementptr inbounds )(((?:<\d x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") normrep = re.compile( r"(^.?[^%\w]getelementptr )(((?:<\d* x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") def conv(match, line): if not match: return line line = match.groups()[0] if len(match.groups()[5]) == 0: line += match.groups()[2] line += match.groups()[3] line += ", " line += match.groups()[1] line += "\n" return line for line in sys.stdin: if line.find("getelementptr ") == line.find("getelementptr inbounds"): if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("): line = conv(re.match(ibrep, line), line) elif line.find("getelementptr ") != line.find("getelementptr ("): line = conv(re.match(normrep, line), line) sys.stdout.write(line) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name .ll \| xargs ./apply.sh From llvm/src/tools/clang: find test/ -name .mm -o -name .m -o -name .cpp -o -name .c \| xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name *.ll \| xargs ./apply.sh After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out). The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases. Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7636 llvm-svn: 230786	2015-02-27 19:29:02 +00:00
Eric Christopher	7c8f775d46	Remove the Forward Control Flow Integrity pass and its dependencies. This work is currently being rethought along different lines and if this work is needed it can be resurrected out of svn. Remove it for now as no current work is ongoing on it and it's unused. Verified with the authors before removal. llvm-svn: 230780	2015-02-27 19:03:38 +00:00
Mehdi Amini	32875af6e3	Change the fast-isel-abort option from bool to int to enable "levels" Summary: Currently fast-isel-abort will only abort for regular instructions, and just warn for function calls, terminators, function arguments. There is already fast-isel-abort-args but nothing for calls and terminators. This change turns the fast-isel-abort options into an integer option, so that multiple levels of strictness can be defined. This will help avoid being surprised when the "abort" option indeed does not abort, and enables the possibility to write tests that verify that no intrinsics are forgotten by fast-isel. Reviewers: resistor, echristo Subscribers: jfb, llvm-commits Differential Revision: http://reviews.llvm.org/D7941 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 230775	2015-02-27 18:32:11 +00:00
Rafael Espindola	95e3cfd5ed	Centralize handling of the eh_begin and eh_end labels. This removes a bit of duplicated code and more importantly, remembers the labels so that they don't need to be looked up by name. This in turn allows for any name to be used and avoids a crash if the name we wanted was already taken. llvm-svn: 230772	2015-02-27 18:18:39 +00:00
Renato Golin	aec6373e10	Like NetBSD, Bitrig/ARM uses the Itanium ABI. Patch by Patrick Wildt. llvm-svn: 230762	2015-02-27 16:35:27 +00:00
Zoran Jovanovic	35fa249416	[mips][microMIPS] Change register class for GP register Differential Revision: http://reviews.llvm.org/D7934 llvm-svn: 230760	2015-02-27 15:03:50 +00:00
Petar Jovanovic	46254c7473	Pass correct -mtriple for krait-cpu-div-attribute.ll Not passing mtriple for one of the tests caused a regression failure on MIPS buildbot. The issue was introduced by r230651. Differential Revision: http://reviews.llvm.org/D7938 llvm-svn: 230756	2015-02-27 14:46:41 +00:00
Chandler Carruth	9471b89b1d	[x86] Run most of the rest of the shuffle combining over non-128-bit vectors. This lets us fix the rest of the v16 lowering problems when pshufb is clearly better. We might still be able to improve some of the lowerings by enabling the other combine-based rewriting to fire for non-128-bit vectors, but this at least should remove any regressions from using the fancy v16i16 lowering strategy. llvm-svn: 230753	2015-02-27 12:13:14 +00:00
Chandler Carruth	36698dfd52	[x86] Teach a bunch of the x86-specific shuffle combining to work with 256-bit vectors as well as 128-bit vectors. Fixes some of the redundant shuffles for v16i16. llvm-svn: 230752	2015-02-27 11:45:13 +00:00
Chandler Carruth	a1c6bfd527	[x86] Make the v8i16 clever single-input shuffle lowering usable for repeated 128-bit lane shuffles of wider vector types and use it to lower 256-bit v16i16 vector shuffles where applicable. This should let us perfectly lower the pattern of pshuflw and pshufhw even for AVX2 256-bit patterns. I've not added AVX-512 support, but it should be trivial for someone working on that to wire up. Note that currently this generates bad, long shuffle chains because we don't combine 256-bit target shuffles. The subsequent patches will fix that. llvm-svn: 230751	2015-02-27 11:33:46 +00:00
Chandler Carruth	c1e4fdbb66	[x86] Add a bunch more tests for v16i16 shuffles. All of these are taken by mirroring v8i16 test cases across both 128-bit lanes. This should highlight problems where we aren't correctly using 128-bit shuffles to implement things. llvm-svn: 230750	2015-02-27 11:25:10 +00:00
Vasileios Kalintiris	38b77a1ef8	[mips] Account for constant-zero operands in ADDE nodes. Summary: We identify the cases where the operand to an ADDE node is a constant zero. In such cases, we can avoid generating an extra ADDu instruction disguised as an identity move alias (ie. addu $r, $r, 0 --> move $r, $r). Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7906 llvm-svn: 230742	2015-02-27 09:01:39 +00:00
Charles Davis	6a532329fd	Target/X86: Save Win64 non-volatile registers in a Win64 ABI function. Summary: This change causes us to actually save non-volatile registers in a Win64 ABI function that calls a System V ABI function, and vice-versa. Reviewers: rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7919 llvm-svn: 230714	2015-02-27 00:57:01 +00:00
Rafael Espindola	efb97e5733	Put jump tables in distinct sections if -ffunction-sections is used. A small regression in r230411 was that we were basing the decision on -fdata-sections. llvm-svn: 230707	2015-02-26 23:55:11 +00:00
Chandler Carruth	38a7e2c9a6	[x86] Fix PR22706 where we would incorrectly try to lower a v32i8 dynamic blend as legal. We made the same mistake in two different places. Whenever we are custom lowering a v32i8 blend we need to check whether we are custom lowering it only for constant conditions that can be shuffled, or whether we actually have AVX2 and full dynamic blending support on bytes. Both are fixed, with comments added to make it clear what is going on and a new test case. llvm-svn: 230695	2015-02-26 22:15:34 +00:00
Reid Kleckner	d2d2baefa1	Don't sibcall between SysV and Win64 convention functions The shadow stack space expectations won't match. Fixes PR22709. llvm-svn: 230667	2015-02-26 19:43:20 +00:00
Paul Robinson	b0132db1ee	When the source has a series of assignments, users reasonably want to have the debugger step through each one individually. Turn off the combine for adjacent stores at -O0 so we get this behavior. Possibly, DAGCombine shouldn't run at all at -O0, but that's for another day; see PR22346. Differential Revision: http://reviews.llvm.org/D7181 llvm-svn: 230659	2015-02-26 18:47:57 +00:00
Petar Jovanovic	40f9f8b625	Fix justify error for small structures in varargs for MIPS64BE There was a problem when passing structures as variable arguments. The structures smaller than 64 bits were not left-justified on MIPS64 big endian. This is now fixed by shifting the value to make it left-justified when appropriate. This fixes the bug http://llvm.org/bugs/show_bug.cgi?id=21608 Patch by Aleksandar Beserminji. Differential Revision: http://reviews.llvm.org/D7881 llvm-svn: 230657	2015-02-26 18:35:15 +00:00
Sumanth Gundapaneni	105aa6d4e2	Use ".arch_extension" ARM directive to support hwdiv on krait In case of "krait" CPU, asm printer doesn't emit any ".cpu" so the feature bits are not computed. This patch lets the asm printer emit ".cpu cortex-a9" directive for krait and the hwdiv feature is enabled through ".arch_extension". In short, krait is treated as "cortex-a9" with hwdiv. We cannot emit ".krait" as CPU since it is not supported by GNU GAS yet. llvm-svn: 230651	2015-02-26 18:08:41 +00:00
Tom Stellard	ab0488f5cd	R600/SI: Remove M0 from DS assembly strings This matches the assembly syntax for the proprietary compiler. llvm-svn: 230645	2015-02-26 17:08:43 +00:00
Bruno Cardoso Lopes	a831838ec1	[X86][MMX] Fix a typo in a couple of tests llvm-svn: 230638	2015-02-26 15:16:09 +00:00
Bruno Cardoso Lopes	7282dad6b1	[X86][MMX] Remove widening experimental flag from MMX tests. Turns out that after the past MMX commits, we don't need to rely on this flag to get better codegen for MMX. Also update the tests to become triple neutral. llvm-svn: 230637	2015-02-26 15:10:38 +00:00
Vladimir Medic	c5ca3a9948	Replace obsolete -mattr=n64 command line option with -target-abi=n64. No functional changes. llvm-svn: 230628	2015-02-26 12:29:48 +00:00
Hal Finkel	49a12f79c1	[PowerPC] Make LDtocL and friends invariant loads LDtocL, and other loads that roughly correspond to the TOC_ENTRY SDAG node, represent loads from the TOC, which is invariant. As a result, these loads can be hoisted out of loops, etc. In order to do this, we need to generate GOT-style MMOs for TOC_ENTRY, which requires treating it as a legitimate memory intrinsic node type. Once this is done, the MMO transfer is automatically handled for TableGen-driven instruction selection, and for nodes generated directly in PPCISelDAGToDAG, we need to transfer the MMOs manually. Also, we were not transferring MMOs associated with pre-increment loads, so do that too. Lastly, this fixes an exposed bug where R30 was not added as a defined operand of UpdateGBR. This problem was highlighted by an example (used to generate the test case) posted to llvmdev by Francois Pichet. llvm-svn: 230553	2015-02-25 21:36:59 +00:00
David Majnemer	0e99f5bfb9	X86, Win64: Allow 'mov' to restore the stack pointer if we have a FP The Win64 epilogue structure is very restrictive, it permits a very small number of opcodes and none of them are 'mov'. This means that given: mov %rbp, %rsp pop %rbp The mov isn't the epilogue, only the pop is. This is problematic unless a frame pointer is present in which case we are free to do whatever we'd like in the "body" of the function. If a frame pointer is present, unwinding will undo the prologue operations in reverse order regardless of the fact that we are at an instruction which is resetting the stack pointer. llvm-svn: 230543	2015-02-25 21:13:37 +00:00
Sanjoy Das	60e1014097	Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap (The change was landed in r230280 and caused the regression PR22674. This version contains a fix and a test-case for PR22674). When emitting the increment operation, SCEVExpander marks the operation as nuw or nsw based on the flags on the preincrement SCEV. This is incorrect because, for instance, it is possible that {-6,+,1} is <nuw> while {-6,+,1}+1 = {-5,+,1} is not. This change teaches SCEV to mark the increment as nuw/nsw only if it can explicitly prove that the increment operation won't overflow. Apart from the attached test case, another (more realistic) manifestation of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll. Differential Revision: http://reviews.llvm.org/D7778 llvm-svn: 230533	2015-02-25 20:02:59 +00:00
Vladimir Medic	66d30602b2	[MIPS] Multiply and add instructions for Mips are currently available in mips32r2/mips64r2 and later but should also be available in mips4, mips5, and mips64. This patch fixes the requested features and updates the corresponding test files. llvm-svn: 230500	2015-02-25 15:24:37 +00:00
Bruno Cardoso Lopes	5570fdd2bb	[X86][MMX] Reapply: Add MMX instructions to foldable tables Reapply r230248. Teach the peephole optimizer to work with MMX instructions by adding entries into the foldable tables. This covers folding opportunities not handled during isel. llvm-svn: 230499	2015-02-25 15:14:02 +00:00
Renato Golin	e3109d3bbd	Improve handling of stack accesses in Thumb-1 Thumb-1 only allows SP-based LDR and STR to be word-sized, and SP-based LDR, STR, and ADD only allow offsets that are a multiple of 4. Make some changes to better make use of these instructions: * Use word loads for anyext byte and halfword loads from the stack. * Enforce 4-byte alignment on objects accessed in this way, to ensure that the offset is valid. * Do the same for objects whose frame index is used, in order to avoid having to use more than one ADD to generate the frame index. * Correct how many bits of offset we think AddrModeT1_s has. Patch by John Brawn. llvm-svn: 230496	2015-02-25 14:41:06 +00:00
Vladimir Medic	84f0d33461	Replace obsolete -mattr=n64 command line option with -target-abi=n64. No functional changes. llvm-svn: 230482	2015-02-25 11:43:01 +00:00
Hal Finkel	157f3b2eaa	[PowerPC] Add triples to QPX tests Some of these tests fail on Darwin systems because of a lack of a triple; fix that. llvm-svn: 230421	2015-02-25 01:26:59 +00:00
Hal Finkel	67b5b15e9e	[PowerPC] Add support for the QPX vector instruction set This adds support for the QPX vector instruction set, which is used by the enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bits wide, holding 4 double-precision floating-point values. Boolean values, modeled here as <4 x i1> are actually also represented as floating-point values (essentially { -1, 1 } for { false, true }). QPX shares many features with Altivec and VSX, but is distinct from both of them. One major difference is that, instead of adding completely-separate vector registers, QPX vector registers are extensions of the scalar floating-point registers (lane 0 is the corresponding scalar floating-point value). The operations supported on QPX vectors mirror those supported on the scalar floating-point values (with some additional ones for permutations and logical/comparison operations). I've been maintaining this support out-of-tree, as part of the bgclang project, for several years. This is not the entire bgclang patch set, but is most of the subset that can be cleanly integrated into LLVM proper at this time. Adding this to the LLVM backend is part of my efforts to rebase bgclang to the current LLVM trunk, but is independently useful (especially for codes that use LLVM as a JIT in library form). The assembler/disassembler test coverage is complete. The CodeGen test coverage is not, but I've included some tests, and more will be added as follow-up work. llvm-svn: 230413	2015-02-25 01:06:45 +00:00
Rafael Espindola	565d0485ca	Support SHF_MERGE sections in COMDATs. This patch unifies the comdat and non-comdat code paths. By doing this it add missing features to the comdat side and removes the fixed section assumptions from the non-comdat side. In ELF there is no one true section for "4 byte mergeable" constants. We are better off computing the required properties of the section and asking the context for it. llvm-svn: 230411	2015-02-25 00:52:15 +00:00
Eric Christopher	ceeffadd1e	Make this test even more OS and register allocation neutral. llvm-svn: 230404	2015-02-25 00:12:11 +00:00
Eric Christopher	e992fbba6a	Make this test not dependent upon the triple. All that was needed was some flexibility in the check line for the comment basic block. llvm-svn: 230400	2015-02-24 23:43:26 +00:00
Simon Pilgrim	50fa51790b	Reapplied D7816 & rL230177 & rL230278 - with an additional fix to ensure that the smallest build vector input scalar type is always used. Additional (crash) test cases already committed. llvm-svn: 230388	2015-02-24 22:08:56 +00:00
Simon Pilgrim	ce1717761f	Added test case for PR22678 (check CONCAT_VECTORS DAG combiner pass doesn't introduce illegal types) llvm-svn: 230386	2015-02-24 21:46:23 +00:00
Andrew Kaylor	914fdbf483	Fixing eol-style llvm-svn: 230378	2015-02-24 20:49:35 +00:00
Eric Christopher	db420cc1dd	Revert: Author: Simon Pilgrim <llvm-dev@redking.me.uk> Date: Mon Feb 23 23:04:28 2015 +0000 Fix based on post-commit comment on D7816 & rL230177 - BUILD_VECTOR operand truncation was using the BV's output scalar type instead of the input type. and Author: Simon Pilgrim <llvm-dev@redking.me.uk> Date: Sun Feb 22 18:17:28 2015 +0000 [DagCombiner] Generalized BuildVector Vector Concatenation The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node. This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well. This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and a vec128 UNDEF upper. Differential Revision: http://reviews.llvm.org/D7816 as the root cause of PR22678 which is causing an assertion inside the DAG combiner. I'll follow up to the main thread as well. llvm-svn: 230358	2015-02-24 19:11:00 +00:00
Matthias Braun	45ead6c770	AArch64: Relax assert about large shift sizes. The reason why these large shift sizes happen is because OpaqueConstants currently inhibit a lot of DAG combining, but that has to be addressed in another commit (like the proposal in D6946). Differential Revision: http://reviews.llvm.org/D6940 llvm-svn: 230355	2015-02-24 18:52:04 +00:00
Tom Stellard	266ed1f528	R600/SI: Remove isel mubuf legalization We legalize mubuf instructions post-instruction selection, so this code is no longer needed. llvm-svn: 230352	2015-02-24 17:59:19 +00:00
Tim Northover	afcf85da25	ARM: treat [N x i32] and [N x i64] as AAPCS composite types The logic is almost there already, with our special homogeneous aggregate handling. Tweaking it like this allows front-ends to emit AAPCS compliant code without ever having to count registers or add discarded padding arguments. Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to apply the logic to all integer arrays for more consistency. llvm-svn: 230348	2015-02-24 17:22:34 +00:00
Hans Wennborg	fb5af71543	Revert r230280: "Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap" This caused PR22674, failing this assert: Instructions.h:2281: llvm::Value* llvm::PHINode::getOperand(unsigned int) const: Assertion `i_nocapture < OperandTraits<PHINode>::operands(this) && "getOperand() out of range!"' failed. llvm-svn: 230341	2015-02-24 16:19:29 +00:00
Michael Kuperstein	0bcb785609	[x32] x32 should use ebx as the base pointer. This fixes the original issue in PR22655, but not the secondary one. llvm-svn: 230334	2015-02-24 15:27:13 +00:00
Reed Kotler	05834965d0	Beginning of alloca implementation for Mips fast-isel Summary: Begin to add various address modes; including alloca. Test Plan: Make sure there are no regressions in test-suite at O0/O2 in mips32r1/r2 Reviewers: dsanders Reviewed By: dsanders Subscribers: echristo, rfuhler, llvm-commits Differential Revision: http://reviews.llvm.org/D6426 llvm-svn: 230300	2015-02-24 02:36:45 +00:00
David Majnemer	f29778c502	X86: Only use 'lea' in Win64 epilogues if a frame pointer exists We can only use 'add' in epilogues, 'lea' is not permitted unless we've established a frame pointer in the prologue. llvm-svn: 230286	2015-02-24 00:11:32 +00:00
Sanjoy Das	9dddcf7f33	Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap When emitting the increment operation, SCEVExpander marks the operation as nuw or nsw based on the flags on the preincrement SCEV. This is incorrect because, for instance, it is possible that {-6,+,1} is <nuw> while {-6,+,1}+1 = {-5,+,1} is not. This change teaches SCEV to mark the increment as nuw/nsw only if it can explicitly prove that the increment operation won't overflow. Apart from the attached test case, another (more realistic) manifestation of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll. NOTE: this change was landed with an incorrect commit message in rL230275 and was reverted for that reason in rL230279. This commit message is the correct one. Differential Revision: http://reviews.llvm.org/D7778 llvm-svn: 230280	2015-02-23 23:22:58 +00:00
Sanjoy Das	e5ca05754e	Revert 230275. 230275 got committed with an incorrect commit message due to a mixup on my side. Will re-land in a few moments with the correct commit message. llvm-svn: 230279	2015-02-23 23:13:22 +00:00
Andrea Di Biagio	fc5bcff188	[X86] Teach how to custom lower double-to-half conversions under fast-math. This patch teaches the backend how to expand a double-half conversion into a double-float conversion immediately followed by a float-half conversion. We do this only under fast-math, and if float-half conversions are legal for the target. Added test CodeGen/X86/fastmath-float-half-conversion.ll Differential Revision: http://reviews.llvm.org/D7832 llvm-svn: 230276	2015-02-23 22:59:02 +00:00
Sanjoy Das	421baaa45f	Fix bug 22641 The bug was a result of getPreStartForExtend interpreting nsw/nuw flags on an add recurrence more strongly than is legal. {S,+,X}<nsw> implies S+X is nsw only if the backedge of the loop is taken at least once. Differential Revision: http://reviews.llvm.org/D7808 llvm-svn: 230275	2015-02-23 22:55:13 +00:00
David Majnemer	b6ac6360c4	X86: Use a smaller 'mov' instruction for stack probe calls Prologue emission, in some cases, requires calls to a stack probe helper function. The amount of stack to probe is passed as a register argument in the Win64 ABI but the instruction sequence used is pessimistic: it assumes that the number of bytes to probe is greater than 4 GB. Instead, select a more appropriate opcode depending on the number of bytes we are going to probe. llvm-svn: 230270	2015-02-23 21:50:30 +00:00
David Majnemer	5bffee412d	X86: Use 'mov' instead of 'lea' in Win64 SEH prologues when possible 'mov' and 'lea' are equivalent when the displacement applied with 'lea' is zero. However, 'mov' should encode smaller. llvm-svn: 230269	2015-02-23 21:50:27 +00:00
Bruno Cardoso Lopes	a8c6f58e91	[X86][MMX] Fix test to reflect current codegen This test failed in several buildbots, a bit unclear how that happened since this was the previous behavior before r230248. llvm-svn: 230258	2015-02-23 20:57:46 +00:00
Andrew Kaylor	a496164a47	Adding test for Windows EH frame variable remapping. llvm-svn: 230250	2015-02-23 20:04:51 +00:00
Andrew Kaylor	4e424eb088	Remap frame variables for native Windows exception handling. Differential Revision: http://reviews.llvm.org/D7770 llvm-svn: 230249	2015-02-23 20:01:56 +00:00
Bruno Cardoso Lopes	390bb4572c	Revert "[X86][MMX] Add MMX instructions to foldable tables" This reverts commit r230226 since it breaks win buildbots. llvm-svn: 230248	2015-02-23 19:53:37 +00:00
Daniel Sanders	0eca0162a3	[mips] Honour -mno-odd-spreg for vector insert/extract when MSA is enabled. Summary: -mno-odd-spreg prohibits the use of odd-numbered single-precision floating point registers. However, vector insert/extract was still using them when manipulating the subregisters of an MSA register. Fixed this by ensuring that insertion/extraction is only performed on even-numbered vector registers when -mno-odd-spreg is given. Reviewers: vmedic, sstankovic Reviewed By: sstankovic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7672 llvm-svn: 230235	2015-02-23 17:22:16 +00:00
Bruno Cardoso Lopes	01a01f5b4a	[X86] Add specific mtriple in order to appease buildbots llvm-svn: 230229	2015-02-23 15:33:40 +00:00
Bruno Cardoso Lopes	ae9a9b4601	[X86][MMX] Add MMX instructions to foldable tables Teach the peephole optimizer to work with MMX instructions by adding entries into the foldable tables. This covers folding opportunities not handled during isel. llvm-svn: 230226	2015-02-23 15:23:22 +00:00
Bruno Cardoso Lopes	8fe8621194	[X86][MMX] Support folding loads in psll, psrl and psra intrinsics llvm-svn: 230225	2015-02-23 15:23:14 +00:00
Bruno Cardoso Lopes	8b65bbca30	[X86][MMX] Add tests for pslli, psrli and psrai intrinsics Add tests to cover the RR form of the pslli, psrli and psrai intrinsics. In the next commit, the loads are going to be folded and the instructions use the RM form. llvm-svn: 230224	2015-02-23 15:23:06 +00:00
Elena Demikhovsky	b15d81ba19	AVX-512: recommitted 229837 + bugfix + test llvm-svn: 230223	2015-02-23 15:12:31 +00:00
Simon Pilgrim	fb2ad0f430	[DagCombiner] Generalized BuildVector Vector Concatenation The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node. This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well. This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and a vec128 UNDEF upper. Differential Revision: http://reviews.llvm.org/D7816 llvm-svn: 230177	2015-02-22 18:17:28 +00:00
Matt Arsenault	18256b1e8b	R600/SI: Use v_madmk_f32 llvm-svn: 230149	2015-02-21 21:29:10 +00:00
Matt Arsenault	b126670f9a	R600/SI: Try to use v_madak_f32 This is a code size optimization when the constant only has one use. llvm-svn: 230148	2015-02-21 21:29:07 +00:00
Simon Pilgrim	1794ce916b	[X86][SSE] Added shuffle based integer zero extension tests. llvm-svn: 230145	2015-02-21 21:25:16 +00:00
David Majnemer	15414a7819	Win64: Stack alignment constraints aren't applied during SET_FPREG Stack realignment occurs after the prolog, not during, for Win64. Because of this, don't factor in the maximum stack alignment when establishing a frame pointer. This fixes PR22572. llvm-svn: 230113	2015-02-21 01:04:47 +00:00
Rafael Espindola	2661d3d69a	Use short names for jumptable sections. Also refactor code to remove some duplication. llvm-svn: 230087	2015-02-20 23:28:28 +00:00
Matt Arsenault	d59f9a3d0d	R600/SI: Remove v_sub_f64 pseudo The expansion code does the same thing. Since the operands were not defined with the correct types, this has the side effect of fixing operand folding since the expanded pseudo would never use SGPRs or inline immediates. llvm-svn: 230072	2015-02-20 22:10:45 +00:00
Matt Arsenault	82f523f0fa	R600: Use new fmad node. This enables a few useful combines that used to only use fma. Also since v_mad_f32 apparently does not support denormals, disable the existing cases that are custom handled if they are requested. llvm-svn: 230071	2015-02-20 22:10:41 +00:00
Jozef Kolek	6455b0cb7f	Reversed revision 229706. The reason is regression, which is caused by the usage of instruction ADDU16 by CodeGen. For this instruction an improper register is allocated, i.e. the register that is not from register set defined for the instruction. llvm-svn: 230053	2015-02-20 20:26:52 +00:00
Andrea Di Biagio	bd0813ded2	[X86][FastIsel] Teach how to select float-half conversion intrinsics. This patch teaches X86FastISel how to select intrinsic 'convert_from_fp16' and intrinsic 'convert_to_fp16'. If the target has F16C, we can select VCVTPS2PHrr for a float-half conversion, and VCVTPH2PSrr for a half-float conversion. Differential Revision: http://reviews.llvm.org/D7673 llvm-svn: 230043	2015-02-20 19:37:14 +00:00
Kit Barton	e41ef2390a	I incorrectly marked the VORC instruction as isCommutable when I added it. This fix removes the VORC instruction definition from the isCommutable block. Phabricator review: http://reviews.llvm.org/D7772 llvm-svn: 230020	2015-02-20 15:54:58 +00:00
Hal Finkel	03acdd5b32	[PowerPC] Loop Data Prefetching for the BG/Q The IBM BG/Q supercomputer's A2 cores have a hardware prefetching unit, the L1P, but it does not prefetch directly into the A2's L1 cache. Instead, it prefetches into its own L1P buffer, and the latency to access that buffer is significantly higher than that to the L1 cache (although smaller than the latency to the L2 cache). As a result, especially when multiple hardware threads are not actively busy, explicitly prefetching data into the L1 cache is advantageous. I've been using this pass out-of-tree for data prefetching on the BG/Q for well over a year, and it has worked quite well. It is enabled by default only for the BG/Q, but can be enabled for other cores as well via a command-line option. Eventually, we might want to add some TTI interfaces and move this into Transforms/Scalar (there is nothing particularly target dependent about it, although only machines like the BG/Q will benefit from its simplistic strategy). llvm-svn: 229966	2015-02-20 05:08:21 +00:00
Chandler Carruth	ef5f7bdec3	[x86] Remove the old vector shuffle lowering code and its flag. The new shuffle lowering has been the default for some time. I've enabled the new legality testing by default with no really blocking regressions. I've fuzz tested this very heavily (many millions of fuzz test cases have passed at this point). And this cleans up a ton of code. =] Thanks again to the many folks that helped with this transition. There was a lot of work by others that went into the new shuffle lowering to make it really excellent. In case you aren't using a diff algorithm that can handle this: X86ISelLowering.cpp: 22 insertions(+), 2940 deletions(-) llvm-svn: 229964	2015-02-20 04:25:04 +00:00
Chandler Carruth	55eca885d4	[x86] Now that the new vector shuffle legality is enabled and everything is going well, remove the flag and the code for the old legality tests. This is the first step toward removing the entire old vector shuffle lowering. Much more code to delete coming up next. llvm-svn: 229963	2015-02-20 03:59:35 +00:00
Chandler Carruth	6b218ec380	[x86] Make the new vector shuffle legality test on by default, which reflects the fact that the x86 backend can in fact lower any shuffle you want it to with reasonably high code quality. My recent work on the new vector shuffle has made this regress very little. The diff in the test cases makes me very, very happy. llvm-svn: 229958	2015-02-20 03:05:47 +00:00
Chandler Carruth	74f60d7a7d	[x86] Clean up a couple of test cases with the new update script. Split one test case that is only partially tested in 32-bits into two test cases so that the script doesn't generate massive spews of tests for the cases we don't care about. llvm-svn: 229955	2015-02-20 02:44:13 +00:00
Chandler Carruth	4c2aad4a26	Revert r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation This doesn't pass 'ninja check-llvm' for me. Lots of tests, including the ones updated, fail with crashes and other explosions. llvm-svn: 229952	2015-02-20 02:15:36 +00:00
Reid Kleckner	13eabcdf32	EH: Prune unreachable resume instructions during Dwarf EH preparation Today a simple function that only catches exceptions and doesn't run destructor cleanups ends up containing a dead call to _Unwind_Resume (PR20300). We can't remove these dead resume instructions during normal optimization because inlining might introduce additional landingpads that do have cleanups to run. Instead we can do this during EH preparation, which is guaranteed to run after inlining. Fixes PR20300. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D7744 llvm-svn: 229944	2015-02-20 01:00:19 +00:00
Eric Christopher	c93875565e	Revert "AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics." The instructions were being generated on architectures that don't support avx512. This reverts commit r229837. llvm-svn: 229942	2015-02-20 00:45:28 +00:00
Ahmed Bougacha	5f490e6f09	[ARM] Re-re-apply VLD1/VST1 base-update combine. This re-applies r223862, r224198, r224203, and r224754, which were reverted in r228129 because they exposed Clang misalignment problems when self-hosting. The combine caused the crashes because we turned ISD::LOAD/STORE nodes to ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we were very lax for the former, and only emitted the alignment operand (as in "[r1:128]") when it was larger than the standard alignment of the memory type. However, for ARMISD nodes, we just used the MMO alignment, no matter what. In our case, we turned ISD nodes to ARMISD nodes, and this caused the alignment operands to start being emitted. And that's how we exposed alignment problems that were ignored before (but I believe would have been caught with SCTRL.A==1?). To fix this, we can just mirror the hack done for ISD nodes: only take into account the MMO alignment when the access is overaligned. Original commit message: We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). rdar://19717869, rdar://14062261. llvm-svn: 229932	2015-02-19 23:52:41 +00:00
Sanjay Patel	3da4617a80	add X86 load folding tests for unary math ops X86 load folding is fragile; eg, the tests here don't work without AVX even though they should. This is because we have a mix of tablegen patterns that have been added over time, and we have a load folding table used by the peephole optimizer that has to be kept in sync with the ever-changing ISA and tablegen defs. llvm-svn: 229870	2015-02-19 16:59:11 +00:00
Chandler Carruth	6d618f4a18	[x86] Delete still more piles of complex code now that we have a good systematic lowering of v8i16. This required a slight strategy shift to prefer unpack lowerings in more places. While this isn't a cut-and-dry win in every case, it is in the overwhelming majority. There are only a few places where the old lowering would probably be a touch faster, and then only by a small margin. In some cases, this is yet another significant improvement. llvm-svn: 229859	2015-02-19 15:21:57 +00:00
Chandler Carruth	fad94c932a	[x86] Teach the unpack lowering how to lower with an initial unpack in addition to lowering to trees rooted in an unpack. This saves shuffles and/or registers in many various ways, lets us handle another class of v4i32 shuffles pre SSE4.1 without domain crosses, etc. llvm-svn: 229856	2015-02-19 15:06:13 +00:00
Chandler Carruth	6f2050671d	[x86] Dramatically improve v8i16 shuffle lowering by not using its terribly complex partial blend logic. This code path was one of the more complex and bug prone when it first went in and it hasn't fared much better. Ultimately, with the simpler basis for unpack lowering and support bit-math blending, this is completely obsolete. In the worst case without this we generate different but equivalent instructions. However, in many cases we generate much better code. This is especially true when blends or pshufb is available. This does expose one (minor) weakness of the unpack lowering that I'll try to address. In case you were wondering, this is actually a big part of what I've been trying to pull off in the recent string of commits. llvm-svn: 229853	2015-02-19 14:08:24 +00:00
Chandler Carruth	73087f5ce9	[x86] Remove the final fallback in the v8i16 lowering that isn't really needed, and significantly improve the SSSE3 path. This makes the new strategy much more clear. If we can blend, we just go with that. If we can't blend, we try to permute into an unpack so that we handle cases where the unpack doing the blend also simplifies the shuffle. If that fails and we've got SSSE3, we now call into factored-out pshufb lowering code so that we leverage the fact that pshufb can set up a blend for us while shuffling. This generates great code, especially because we know we don't have a fast blend at this point. Finally, we fall back on decomposing into permutes and blends because we do at least have a bit-math-based blend if we need to use that. This pretty significantly improves some of the v8i16 code paths. We never need to form pshufb for the single-input shuffles because we have effective target-specific combines to form it there, but we were missing its effectiveness in the blends. llvm-svn: 229851	2015-02-19 13:56:49 +00:00
Chandler Carruth	ba91b52308	[x86] Simplify the pre-SSSE3 v16i8 lowering significantly by decomposing them into permutes and a blend with the generic decomposition logic. This works really well in almost every case and lets the code only manage the expansion of a single input into two v8i16 vectors to perform the actual shuffle. The blend-based merging is often much nicer than the pack based merging that this replaces. The only place where it isn't we end up blending between two packs when we could do a single pack. To handle that case, just teach the v2i64 lowering to handle these blends by digging out the operands. With this we're down to only really random permutations that cause an explosion of instructions. llvm-svn: 229849	2015-02-19 13:15:12 +00:00
Chandler Carruth	b0373058d2	[x86] Remove the insanely over-aggressive unpack lowering strategy for v16i8 shuffles, and replace it with new facilities. This uses precise patterns to match exact unpacks, and the new generalized unpack lowering only when we detect a case where we will have to shuffle both inputs anyways and they terminate in exactly a blend. This fixes all of the blend horrors that I uncovered by always lowering blends through the vector shuffle lowering. It also removes sooooo much of the crazy instruction sequences required for v16i8 lowering previously. Much cleaner now. The only "meh" aspect is that we sometimes use pshufb+pshufb+unpck when it would be marginally nicer to use pshufb+pshufb+por. However, the difference there is tiny. In many cases it's a win because we re-use the pshufb mask. In others, we get to avoid the pshufb entirely. I've left a FIXME, but I'm dubious we can really do better than this. I'm actually pretty happy with this lowering now. For SSE2 this exposes some horrors that were really already there. Those will have to be fixed by changing a different path through the v16i8 lowering. llvm-svn: 229846	2015-02-19 12:10:37 +00:00
Jozef Kolek	2a4c7551a1	[mips][microMIPS] Make usage of AND16, OR16 and XOR16 by code generator Differential Revision: http://reviews.llvm.org/D7611 llvm-svn: 229845	2015-02-19 11:51:32 +00:00
Elena Demikhovsky	41438d50e6	AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics. llvm-svn: 229837	2015-02-19 10:48:04 +00:00
Chandler Carruth	72b437995e	[x86] Add support for bit-wise blending and use it in the v8 and v16 lowering paths. I'm going to be leveraging this to simplify a lot of the overly complex lowering of v8 and v16 shuffles in pre-SSSE3 modes. Sadly, this isn't profitable on v4i32 and v2i64. There, the float and double blending instructions for pre-SSE4.1 are actually pretty good, and we can't beat them with bit math. And once SSE4.1 comes around we have direct blending support and this ceases to be relevant. Also, some of the test cases look odd because the domain fixer canonicalizes these to floating point domain. That's OK, it'll use the integer domain when it matters and some day I may be able to update enough of LLVM to canonicalize the other way. This restores almost all of the regressions from teaching x86's vselect lowering to always use vector shuffle lowering for blends. The remaining problems are because the v16 lowering path is still doing crazy things. I'll be re-arranging that strategy in more detail in subsequent commits to finish recovering the performance here. llvm-svn: 229836	2015-02-19 10:46:52 +00:00
Chandler Carruth	105d2fa5e8	[x86,sdag] Two interrelated changes to the x86 and sdag code. First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true) but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization. However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing. Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as legal so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, i know, this is confusing. but that's how the patterns are written). This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the hilarious deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win! There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct if horribly naive cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own. Note the removal of 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise. llvm-svn: 229835	2015-02-19 10:36:19 +00:00
Peter Collingbourne	258bbf91ac	llvm-mc: Use Target::createNullStreamer to fix crashes on target-specific asm directives. llvm-svn: 229798	2015-02-19 00:45:04 +00:00
Chandler Carruth	4ca147df5a	[x86] Merge checks for a recently added test case that is the same on all SSE variants and AVX variants. llvm-svn: 229770	2015-02-18 23:20:49 +00:00
Reid Kleckner	53bc7b1858	Add an IR-to-IR test for dwarf EH preparation using opt This tests the simple resume instruction elimination logic that we have before making some changes to it. llvm-svn: 229768	2015-02-18 23:17:41 +00:00
Reid Kleckner	c8eb2119c4	dos2unix the WinEH file and tests llvm-svn: 229735	2015-02-18 19:52:46 +00:00
Andrew Kaylor	eca1819627	Adding implementation to outline C++ catch handlers for native Windows 64 exception handling. Differential Revision: http://reviews.llvm.org/D7363 llvm-svn: 229715	2015-02-18 18:31:51 +00:00
Jozef Kolek	6b4e19ed7b	[mips][microMIPS] Make usage of ADDU16 and SUBU16 by code generator Differential Revision: http://reviews.llvm.org/D7609 llvm-svn: 229706	2015-02-18 17:33:56 +00:00
Daniel Sanders	1701d245d4	[mips] Add backend support for Mips32r[35] and Mips64r[35]. Summary: These ISAs didn't add any instructions so they are almost identical to Mips32r2 and Mips64r2. Even the ELF e_flags are the same. However, the ISA revision in .MIPS.abiflags is 3 or 5 respectively instead of 2. Reviewers: vmedic Reviewed By: vmedic Subscribers: tomatabacu, llvm-commits, atanasyan Differential Revision: http://reviews.llvm.org/D7381 llvm-svn: 229695	2015-02-18 16:24:50 +00:00
Kit Barton	96c9271be4	This patch adds the VSX logical instructions introduced in the Power ISA 2.07. It also removes the added complexity that favors VMX versions of the three instructions. Phabricator review: http://reviews.llvm.org/D7616 Committing on Nemanja's behalf. llvm-svn: 229694	2015-02-18 16:21:46 +00:00
Vasileios Kalintiris	89ab39ae46	[mips] Avoid redundant sign extension of the result of binary bitwise instructions. Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7581 llvm-svn: 229675	2015-02-18 14:57:05 +00:00
Bradley Smith	427ac5088a	[ARM] Add missing M/R class CPUs Add some of the missing M and R class Cortex CPUs, namely: Cortex-M0+ (called Cortex-M0plus for GCC compatibility) Cortex-M1 SC000 SC300 Cortex-R5 llvm-svn: 229660	2015-02-18 10:33:30 +00:00
Michael Kuperstein	8617e481c7	Fixes two issues in SimplifyDemandedBits of sext_in_reg: 1) We should not try to simplify if the sext has multiple uses 2) There is no need to simplify if the source value is already sign-extended. Patch by Gil Rapaport <gil.rapaport@intel.com> Differential Revision: http://reviews.llvm.org/D6949 llvm-svn: 229659	2015-02-18 09:43:40 +00:00
Chandler Carruth	fbca0e7b75	[x86] Refactor the bit shift code the same as I just did the byte shift code. While this didn't have the miscompile (it used MatchLeft consistently) it missed some cases where it could use right shifts. I've added a test case Craig Topper came up with to exercise the right shift matching. This code is really identical between the two. I'm going to merge them next so that we don't keep two copies of all of this logic. llvm-svn: 229655	2015-02-18 09:19:58 +00:00
Ulrich Weigand	d35c322d54	[SystemZ] Support all TLS access models - CodeGen part The current SystemZ back-end only supports the local-exec TLS access model. This patch adds all required CodeGen support for the other TLS models, which means in particular: - Expand initial-exec TLS accesses by loading TLS offsets from the GOT using @indntpoff relocations. - Expand general-dynamic and local-dynamic accesses by generating the appropriate calls to __tls_get_offset. Note that this routine has a non-standard ABI and requires loading the GOT pointer into %r12, so the patch also adds support for the GLOBAL_OFFSET_TABLE ISD node. - Add a new platform-specific optimization pass to remove redundant __tls_get_offset calls in the local-dynamic model (modeled after the corresponding X86 pass). - Add test cases verifying all access models and optimizations. llvm-svn: 229654	2015-02-18 09:13:27 +00:00
Daniel Jasper	3f53d83cd0	Remove experimental options to control machine block placement. This reverts r226034. Benchmarking with those flags has not revealed anything interesting. llvm-svn: 229648	2015-02-18 08:18:07 +00:00
Elena Demikhovsky	2ae2229fab	AVX-512: Added support for FP instructions with embedded rounding mode. By Asaf Badouh <asaf.badouh@intel.com> llvm-svn: 229645	2015-02-18 07:59:20 +00:00
Craig Topper	347250558c	[X86] Add another test case for the bug fixed in r229642. With the bug a vpsrldq was emitted instead of pslldq. llvm-svn: 229643	2015-02-18 07:45:43 +00:00
Chandler Carruth	e6fc612338	[x86] Rewrite the byte shift detection to not use boolean variables to track state. I didn't like this in the code review because the pattern tends to be error prone, but I didn't see a clear way to rewrite it. Turns out that there were bugs here, I found them when fuzz testing our shuffle lowering for correctness on x86. The core of the problem is that we need to consistently test all our preconditions for the same directionality of shift and the same input vector. Instead, formulate this as two predicates (one doesn't depend on the input in any way), pass things like the directionality and input vector as inputs, and loop over the alternatives. This fixes a pattern of very rare miscompiles coming out of this code. Turned up roughly 4 out of every 1 million v8 shuffles in my fuzz testing. The new code is over half a million test runs with no failures yet. I've also fuzzed every other function in the lowering code with over 3.5 million test cases and not discovered any other miscompiles. llvm-svn: 229642	2015-02-18 07:13:48 +00:00
Craig Topper	398dc737fa	[X86] Remove AVX2 and SSE2 pslldq and psrldq intrinsics. We can represent them in IR with vector shuffles now. All their uses have been removed from clang in favor of shuffles. llvm-svn: 229640	2015-02-18 06:24:44 +00:00
Matt Arsenault	670643e5d9	R600/SI: Add missing offset operand to buffer bothen llvm-svn: 229605	2015-02-18 02:04:38 +00:00
Matt Arsenault	3d3d606dd0	R600/SI: Add missing soffset operand to global atomics llvm-svn: 229604	2015-02-18 02:04:35 +00:00
Andrea Di Biagio	016c12ee8d	[X86][FastIsel] Teach how to select scalar integer to float/double conversions. This patch teaches fast-isel how to select a (V)CVTSI2SSrr for an integer to float conversion, and how to select a (V)CVTSI2SDrr for an integer to double conversion. Added test 'fast-isel-int-float-conversion.ll'. Differential Revision: http://reviews.llvm.org/D7698 llvm-svn: 229589	2015-02-17 23:40:58 +00:00
Rafael Espindola	be6855ec2a	Add r228939 back with a fix. The problem in the original patch was not switching back to .text after printing an eh table. Original message: On ELF, put PIC jump tables in a non executable section. Fixes PR22558. llvm-svn: 229586	2015-02-17 23:34:51 +00:00
Rafael Espindola	686783175f	Add a test showing the problem in r228939. If an EH table is printed in between the function and the jump table we would fail to switch back to the text section to print the jump table. llvm-svn: 229580	2015-02-17 23:21:46 +00:00
Simon Pilgrim	6da8e17d64	[X86][SSE] Generalised unpckl/unpckh shuffle matching Added commuted unpckl/unpckh shuffle matching patterns as many cases containing undefined lanes fail to commute by themselves. Differential Revision: http://reviews.llvm.org/D7564 llvm-svn: 229571	2015-02-17 22:24:32 +00:00
Sanjay Patel	f2c498a0c4	use a triple instead of a cpu; less buildbot sadness llvm-svn: 229563	2015-02-17 21:59:54 +00:00
Rafael Espindola	6d441224de	Add testcases I missed in r229541. llvm-svn: 229542	2015-02-17 20:50:39 +00:00
Sanjay Patel	ec471315ab	make basic block label matching more flexible for less sad buildbots llvm-svn: 229535	2015-02-17 20:29:31 +00:00
Tom Stellard	225ebd9101	R600/SI: Fix asan errors in SIFoldOperands We were trying to fold into implicit uses, which led to out of bounds access of the MCInstrDesc::OpInfo array. llvm-svn: 229533	2015-02-17 20:11:54 +00:00
Sanjay Patel	5abe0e38c5	prevent folding a scalar FP load into a packed logical FP instruction (PR22371) Change the memory operands in sse12_fp_packed_scalar_logical_alias from scalars to vectors. That's what the hardware packed logical FP instructions define: 128-bit memory operands. There are no scalar versions of these instructions...because this is x86. Generating the wrong code (folding a scalar load into a 128-bit load) is still possible using the peephole optimization pass and the load folding tables. We won't completely solve this bug until we either fix the lowering in fabs/fneg/fcopysign and any other places where scalar FP logic is created or fix the load folding in foldMemoryOperandImpl() to make sure it isn't changing the size of the load. Differential Revision: http://reviews.llvm.org/D7474 llvm-svn: 229531	2015-02-17 20:08:21 +00:00
Sanjay Patel	0117b06b3f	Canonicalize splats as build_vectors (PR22283) This is a follow-on patch to: http://reviews.llvm.org/D7093 That patch canonicalized constant splats as build_vectors, and this patch removes the constant check so we can canonicalize all splats as build_vectors. This fixes the 2nd test case in PR22283: http://llvm.org/bugs/show_bug.cgi?id=22283 The unfortunate code duplication between SelectionDAG and DAGCombiner is discussed in the earlier patch review. At least this patch is just removing code... This improves an existing x86 AVX test and changes codegen in an ARM test. Differential Revision: http://reviews.llvm.org/D7389 llvm-svn: 229511	2015-02-17 16:54:32 +00:00
Tom Stellard	521960e294	R600/SI: Extend private extload pattern to include zext loads llvm-svn: 229507	2015-02-17 16:36:00 +00:00
Andrea Di Biagio	c38ea9f435	[X86][FastISel] Add missing flag -fast-isel-abort to run lines in test fast-isel-fptrunc-fpext.ll. Flag -fast-isel-abort is required in order to verify that X86FastISel never fails to select FPExt (float-to-double) and FPTrunc (double-to-float). No Functional change intended. llvm-svn: 229489	2015-02-17 12:25:49 +00:00
Elena Demikhovsky	30ee20b16b	AVX-512: changes in intel_ocl_bi calling conventions - added mask types v8i1 and v16i1 to possible function parameters - enabled passing 512-bit vectors in standard CC - added a test for KNL intel_ocl_bi conventions llvm-svn: 229482	2015-02-17 09:20:12 +00:00
Michael Kuperstein	812c46b9de	[X86] Combine vector anyext + and into a vector zext Vector zext tends to get legalized into a vector anyext, represented as a vector shuffle with an undef vector + a bitcast, that gets ANDed with a mask that zeroes the undef elements. Combine this into an explicit shuffle with a zero vector instead. This allows shuffle lowering to match it as a zext, instead of matching it as an anyext and emitting an explicit AND. This combine only covers a subset of the cases, but it's a start. Differential Revision: http://reviews.llvm.org/D7666 llvm-svn: 229480	2015-02-17 08:22:51 +00:00
Eric Christopher	49ad15fa29	Move ABI handling and 64-bitness to the PowerPC target machine. This required changing how the computation of the ABI is handled and how some of the checks for ABI/target are done. llvm-svn: 229471	2015-02-17 06:45:15 +00:00

12198 Commits