llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 06:22:56 +02:00

Author	SHA1	Message	Date
Pete Cooper	9f89f00988	DAG legalisation can now handle illegal fma vector types by scalarisation llvm-svn: 159092	2012-06-24 00:05:44 +00:00
Craig Topper	e824497e82	Remove intrinsic specific instructions for (V)CVTDQ2PS. Use a Pat instead. llvm-svn: 159090	2012-06-23 22:33:14 +00:00
Craig Topper	59fcd68657	Make CVTDQ2PS instruction use SSE2 predicate instead of SSE1. No functional change because there are no patterns in the instructions. Also fix a typo in a comment. llvm-svn: 159087	2012-06-23 20:52:45 +00:00
Craig Topper	caf5a8e7aa	Move CVTPD2DQ to use SSE2 predicate instead of SSE3. Move DQ2PD and PD2DQ to the SSE2 section of the file. llvm-svn: 159086	2012-06-23 20:15:42 +00:00
Craig Topper	7067c92fbb	Use correct memory types for (V)CVTDQ2PD instructions. llvm-svn: 159075	2012-06-23 08:30:27 +00:00
Craig Topper	0c7acb290b	Compress flags in X86 op folding to reduce space in static tables. llvm-svn: 159073	2012-06-23 08:01:18 +00:00
Craig Topper	0dc1a9bf89	Make helper method static since it doesn't use anything in the class. llvm-svn: 159071	2012-06-23 04:58:41 +00:00
Craig Topper	b05bd41aed	Remove intrinsic specific instructions for 128-bit (V)CVTDQ2PD. Replace with intrinsic patterns. Mem forms omitted because the load size is only 64-bits. llvm-svn: 159070	2012-06-23 04:23:36 +00:00
Rafael Espindola	048a927ab5	Handle aliases to tls variables in all architectures, not just x86. llvm-svn: 159058	2012-06-23 00:30:03 +00:00
Chad Rosier	25837f2c81	Whitespace. llvm-svn: 159035	2012-06-22 22:07:19 +00:00
Jakob Stoklund Olesen	3efab18404	Functions calling __builtin_eh_return must have a frame pointer. The code in X86TargetLowering::LowerEH_RETURN() assumes that a frame pointer exists, but the frame pointer was forced by the presence of llvm.eh.unwind.init which isn't guaranteed. If llvm.eh.unwind.init is actually required in functions calling eh.return (is it?), we should diagnose that instead of emitting bad machine code. This should fix the dragonegg-x86_64-linux-gcc-4.6-test bot. llvm-svn: 158961	2012-06-22 03:04:27 +00:00
Chandler Carruth	6f8cc37074	Remove 'static' from inline functions defined in header files. There is a pretty staggering amount of this in LLVM's header files, this is not all of the instances I'm afraid. These include all of the functions that (in my build) are used by a non-static inline (or external) function. Specifically, these issues were caught by the new '-Winternal-linkage-in-inline' warning. I'll try to just clean up the remainder of the clearly redundant "static inline" cases on functions (not methods!) defined within headers if I can do so in a reliable way. There were even several cases of a missing 'inline' altogether, or my personal favorite "static bool inline". Go figure. ;] llvm-svn: 158800	2012-06-20 08:39:33 +00:00
Craig Topper	f19d6cef51	Add predicate check around some patterns. llvm-svn: 158797	2012-06-20 07:30:23 +00:00
Craig Topper	54d8fe551b	Add predicate check around some patterns. llvm-svn: 158795	2012-06-20 07:01:11 +00:00
Craig Topper	d63e429d68	Don't insert 128-bit UNDEF into 256-bit vectors. Just keep the 256-bit vector. Original patch by Elena Demikhovsky. Tweaked by me to allow possibility of covering more cases. llvm-svn: 158792	2012-06-20 05:39:26 +00:00
Rafael Espindola	38c45a939d	Move the support for using .init_array from ARM to the generic TargetLoweringObjectFileELF. Use this to support it on X86. Unlike ARM, on X86 it is not easy to find out if .init_array should be used or not, so the decision is made via TargetOptions and defaults to off. Add a command line option to llc that enables it. llvm-svn: 158692	2012-06-19 00:48:28 +00:00
Chandler Carruth	d2716ae111	Temporarily revert r158087. This patch causes problems when both dynamic stack realignment and dynamic allocas combine in the same function. With this patch, we no longer build the epilog correctly, and silently restore registers from the wrong position in the stack. Thanks to Matt for tracking this down, and getting at least an initial test case to Chad. I'm going to try to check a variation of that test case in so we can easily track the fixes required. llvm-svn: 158654	2012-06-18 07:03:12 +00:00
Kay Tiong Khoo	7247ab8114	*no need to pollute Intel syntax with bonus mnemonics; operand size is explicitly specified llvm-svn: 158603	2012-06-16 17:19:49 +00:00
Kay Tiong Khoo	a419828b83	*fixed to separate mnemonic from operands with tab llvm-svn: 158543	2012-06-15 21:04:21 +00:00
Craig Topper	19cfb998fd	Move AVX version of convert instructions that write to GPRs to the Op1 table. llvm-svn: 158497	2012-06-15 07:02:58 +00:00
Pete Cooper	e1c5e7bf9f	Move X86::VCVTTSD2SIrr from the 2 operand to 1 operand MemRegOp table. Can someone with more knowledge of this please look at other entries to see if others need to be moved? llvm-svn: 158474	2012-06-14 22:12:58 +00:00
Craig Topper	b2299168d3	Fix intrinsics for XOP frczss/sd instructions. These instructions only take one source register and zero the upper bits of the destination rather than preserving them. llvm-svn: 158396	2012-06-13 07:18:53 +00:00
Craig Topper	b355582afd	Add intrinsics for immediate form of XOP vprot instructions. Use i128mem instead of f128mem for integer XOP instructions. llvm-svn: 158291	2012-06-10 07:31:56 +00:00
Craig Topper	633e88fd15	Use XOP vpcom intrinsics in patterns instead of a target specific SDNode type. Remove the custom lowering code that selected the SDNode type. llvm-svn: 158279	2012-06-09 17:02:24 +00:00
Craig Topper	ad5e38e410	Replace XOP vpcom intrinsics with fewer intrinsics that take the immediate as an argument. llvm-svn: 158278	2012-06-09 16:46:13 +00:00
Manman Ren	186346ff90	Enable optimization for integer ABS on X86 if Subtarget has CMOV. llvm-svn: 158220	2012-06-08 18:58:26 +00:00
Manman Ren	f51a6d5fae	X86: optimize generated code for integer ABS This patch will generate the following for integer ABS: movl %edi, %eax negl %eax cmovll %edi, %eax INSTEAD OF movl %edi, %ecx sarl $31, %ecx leal (%rdi,%rcx), %eax xorl %ecx, %eax There exists a target-independent DAG combine for integer ABS, which converts integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov. This is implemented in PerformXorCombine. rdar://10695237 llvm-svn: 158175	2012-06-07 22:39:10 +00:00
Nadav Rotem	7d996eafba	Do not optimize the used bits of the x86 vselect condition operand, when the condition operand is a vector of 1-bit predicates. This may happen on MIC devices. llvm-svn: 158168	2012-06-07 20:53:48 +00:00
Manman Ren	c8e46bcf47	PR13046: we can't replace usage of SUB with CMP in the lowering phase. It will cause assertion failure later on. llvm-svn: 158160	2012-06-07 19:27:33 +00:00
Rafael Espindola	4c9d611360	Use a base register instead of an index register with the local dynamic model. Fixes pr13048. llvm-svn: 158158	2012-06-07 18:39:19 +00:00
Manman Ren	1d91fc3342	X86: replace SUB with CMP if possible This patch will optimize the following movq %rdi, %rax subq %rsi, %rax cmovsq %rsi, %rdi movq %rdi, %rax to cmpq %rsi, %rdi cmovsq %rsi, %rdi movq %rdi, %rax Perform this optimization if the actual result of SUB is not used. rdar: 11540023 llvm-svn: 158126	2012-06-07 00:42:47 +00:00
Manman Ren	f591de61da	Revert r157755. The commit is intended to fix rdar://11540023. It is implemented as part of peephole optimization. We can actually implement this in the SelectionDAG lowering phase. llvm-svn: 158122	2012-06-06 23:53:03 +00:00
Benjamin Kramer	d93c18846c	Remove unused private fields found by clang's new -Wunused-private-field. There are some that I didn't remove this round because they looked like obvious stubs. There are dead variables in gtest too, they should be fixed upstream. llvm-svn: 158090	2012-06-06 18:25:08 +00:00
Chad Rosier	5a354cd5e8	Add support for dynamic stack realignment in the presence of dynamic allocas on X86. rdar://11496434 llvm-svn: 158087	2012-06-06 17:37:40 +00:00
Craig Topper	8d23b98818	Mark several instructions SSE2 instead of SSE3 as they should be. llvm-svn: 158049	2012-06-06 06:45:27 +00:00
Andrew Trick	6d6fa07808	X86 itinerary properties. llvm-svn: 157981	2012-06-05 03:44:46 +00:00
Andrew Trick	8b333df134	whitespace llvm-svn: 157976	2012-06-05 03:44:29 +00:00
Hans Wennborg	60ee5bbc4f	Better comments for TLS-related X86 MachineOperand flags. llvm-svn: 157920	2012-06-04 09:55:36 +00:00
Craig Topper	52bf0cfb27	Add intrinsic forms for FMA instructions to opcode folding tables. llvm-svn: 157917	2012-06-04 07:46:16 +00:00
Craig Topper	eb2d859f52	Add VFMADDSUB and VFMSUBADD FMA instructions to folding tables. Also add 213 forms of scalar FMA instructions. llvm-svn: 157914	2012-06-04 07:08:21 +00:00
Craig Topper	5837bcfc02	Rename FMA3 feature flag to just FMA to match gcc so it can be added to clang. llvm-svn: 157903	2012-06-03 18:58:46 +00:00
Craig Topper	8d3031fa46	Rename fma4 intrinsics to just fma since they are now used for both FMA4 and FMA3. Autoupgrade support coming in a separate commit. llvm-svn: 157898	2012-06-03 07:26:46 +00:00
Manman Ren	c3a6de9953	Revert r157831 llvm-svn: 157896	2012-06-03 03:14:24 +00:00
Craig Topper	685b86b007	Use sse_load_f32/64 for scalar FMA3 intrinsic patterns instead of 128-bit loads to match instruction behavior. llvm-svn: 157895	2012-06-03 01:40:43 +00:00
Craig Topper	e783584ea7	Add neverHasSideEffects and mayLoad to FMA3 instructions. llvm-svn: 157894	2012-06-03 00:30:49 +00:00
Benjamin Kramer	bb30e1face	Fix typos found by http://github.com/lyda/misspell-check llvm-svn: 157885	2012-06-02 10:20:22 +00:00
Jakob Stoklund Olesen	be0b8939c0	Switch all register list clients to the new MC*Iterator interface. No functional change intended. Sorry for the churn. The iterator classes are supposed to help avoid giant commits like this one in the future. The TableGen-produced register lists are getting quite large, and it may be necessary to change the table representation. This makes it possible to do so without changing all clients (again). llvm-svn: 157854	2012-06-01 23:28:30 +00:00
Manman Ren	74ccc117d5	X86: peephole optimization to remove cmp instruction This patch will optimize the following: sub r1, r3 cmp r3, r1 or cmp r1, r3 bge L1 TO sub r1, r3 bge L1 or ble L1 If the branch instruction can use flag from "sub", then we can eliminate the "cmp" instruction. llvm-svn: 157831	2012-06-01 19:49:33 +00:00
Hans Wennborg	4344ad4a86	Implement the local-dynamic TLS model for x86 (PR3985) This implements codegen support for accesses to thread-local variables using the local-dynamic model, and adds a clean-up pass so that the base address for the TLS block can be re-used between local-dynamic access on an execution path. llvm-svn: 157818	2012-06-01 16:27:21 +00:00
Craig Topper	04dc566bb2	Enable automatic detection of FMA3 support to allow intrinsics to be used. llvm-svn: 157805	2012-06-01 06:10:14 +00:00
Craig Topper	02e5a00a70	Remove fadd(fmul) patterns for FMA3. This needs to be implemented by paying attention to FP_CONTRACT and matching @llvm.fma which is not available yet. This will allow us to enable intrinsic use at least though. llvm-svn: 157804	2012-06-01 06:07:48 +00:00
Craig Topper	07982d8752	Add VFNSUB* instructions to folding table. llvm-svn: 157802	2012-06-01 05:48:39 +00:00
Craig Topper	4b884d6fc3	Remove a trailing space and fix a comment. llvm-svn: 157801	2012-06-01 05:34:01 +00:00
Craig Topper	6343cc814e	Tidy up. Remove trailing spaces and fix the worst of the 80 column violations. llvm-svn: 157799	2012-06-01 05:24:29 +00:00
Chad Rosier	901c912b26	Put the shiny new MCSubRegIterator to work. llvm-svn: 157783	2012-06-01 00:02:08 +00:00
Jakob Stoklund Olesen	a77ddf1405	Add support for return value promotion in X86 calling conventions. Patch by Yiannis Tsiouris! llvm-svn: 157757	2012-05-31 17:28:20 +00:00
Manman Ren	82e2c9debf	X86: replace SUB with CMP if possible This patch will optimize the following movq %rdi, %rax subq %rsi, %rax cmovsq %rsi, %rdi movq %rdi, %rax to cmpq %rsi, %rdi cmovsq %rsi, %rdi movq %rdi, %rax Perform this optimization if the actual result of SUB is not used. rdar: 11540023 llvm-svn: 157755	2012-05-31 17:20:29 +00:00
Benjamin Kramer	cb686400fb	X86: Rename the CLMUL target feature to PCLMUL. It was renamed in gcc/gas a while ago and causes all kinds of confusion because it was named differently in llvm and clang. llvm-svn: 157745	2012-05-31 14:34:17 +00:00
Elena Demikhovsky	194da7364d	Added FMA3 Intel instructions. I disabled FMA3 autodetection, since the result may differ from expected for some benchmarks. I added tests for CodeGen and intrinsics. I did not change llvm.fma.f32/64 - it may be done later. llvm-svn: 157737	2012-05-31 09:20:20 +00:00
Craig Topper	c6bd90e646	Add intrinsic for pclmulqdq instruction. llvm-svn: 157731	2012-05-31 04:37:40 +00:00
Chris Lattner	1d39403e7b	it's pointed out that R11 can be used for magic things, and doing things just for 64-bit registers is silly. Just optimize 3 more. llvm-svn: 157699	2012-05-30 18:08:02 +00:00
Chris Lattner	505bcf70a0	Extend the (abi-irrelevant) return convention to be able to return more than two values in integer registers. This is already supported by the fastcc convention, but it doesn't hurt to support it in the standard conventions as well. In cases where we can cheat at the calling convention, this allows us to avoid returning things through memory in more cases. llvm-svn: 157698	2012-05-30 17:50:14 +00:00
Benjamin Kramer	11fa8a91e7	Port support for SSE4a extrq/insertq to the old jit code emitter. llvm-svn: 157685	2012-05-30 09:13:55 +00:00
Benjamin Kramer	0c823ae0ed	Add intrinsics, code gen, assembler and disassembler support for the SSE4a extrq and insertq instructions. This required light surgery on the assembler and disassembler because the instructions use an uncommon encoding. They are the only two instructions in x86 that use register operands and two immediates. llvm-svn: 157634	2012-05-29 19:05:25 +00:00
Justin Holewinski	77c4679dae	Change interface for TargetLowering::LowerCallTo and TargetLowering::LowerCall to pass around a struct instead of a large set of individual values. This cleans up the interface and allows more information to be added to the struct for future targets without requiring changes to each and every target. NV_CONTRIB llvm-svn: 157479	2012-05-25 16:35:28 +00:00
Eli Friedman	d89582030a	Simplify code for calling a function where CanLowerReturn fails, fixing a small bug in the process. llvm-svn: 157446	2012-05-25 00:09:29 +00:00
Craig Topper	c5bd0cba8f	Use uint16_t to store register number in static tables to match other tables. llvm-svn: 157374	2012-05-24 05:55:47 +00:00
Chad Rosier	b11b7f8b69	Tidy up naming for consistency and other cleanup. No functional change intended. llvm-svn: 157358	2012-05-23 23:45:10 +00:00
Craig Topper	49c52dde2b	Tidy up spacing. llvm-svn: 157313	2012-05-23 05:44:51 +00:00
Craig Topper	5161134794	Fix indentation of wrapped line for readability. No functional change. llvm-svn: 157309	2012-05-23 03:59:53 +00:00
Craig Topper	0fe3f1297b	Fix constant used for pshufb mask when lowering v16i8 shuffles. Bug introduced in r157043. Fixes PR12908. llvm-svn: 157236	2012-05-22 06:09:38 +00:00
Craig Topper	57d54b1831	Allow 256-bit shuffles to still be split even if only half of the shuffle comes from two 128-bit pieces. llvm-svn: 157175	2012-05-21 06:40:16 +00:00
Jakob Stoklund Olesen	e9ba1b5df0	Make the global base reg GR32_NOSP. It can sometimes be used in addressing modes that don't support %ESP. llvm-svn: 157165	2012-05-20 18:43:00 +00:00
Nadav Rotem	a46c75d9a7	On Haswell, prefer storing YMM registers using a single instruction. llvm-svn: 157129	2012-05-19 20:30:08 +00:00
Nadav Rotem	441318ee29	Add support for additional in-reg vbroadcast patterns llvm-svn: 157127	2012-05-19 19:57:37 +00:00
Craig Topper	55b5aa4042	Tidy up some spacing and inconsistent use of pre/post increment. No functional change intended. llvm-svn: 157122	2012-05-19 19:14:18 +00:00
Craig Topper	3f21aba382	Copy some AVX support from MCJIT to JIT. Maybe will fix PR12748. llvm-svn: 157109	2012-05-19 08:28:17 +00:00
Jim Grosbach	343a996ca5	Refactor data-in-code annotations. Use a dedicated MachO load command to annotate data-in-code regions. This is the same format the linker produces for final executable images, allowing consistency of representation and use of introspection tools for both object and executable files. Data-in-code regions are annotated via ".data_region"/".end_data_region" directive pairs, with an optional region type. data_region_directive := ".data_region" { region_type } region_type := "jt8" \| "jt16" \| "jt32" \| "jta32" end_data_region_directive := ".end_data_region" The previous handling of ARM-style "$d.*" labels was broken and has been removed. Specifically, it didn't handle ARM vs. Thumb mode when marking the end of the section. rdar://11459456 llvm-svn: 157062	2012-05-18 19:12:01 +00:00
Craig Topper	45a709baab	Simplify code a bit. No functional change intended. llvm-svn: 157044	2012-05-18 07:07:36 +00:00
Craig Topper	d86e4c0088	Simplify handling of v16i8 shuffles and fix a missed optimization. llvm-svn: 157043	2012-05-18 06:42:06 +00:00
Evan Cheng	9a33fc17be	Avoid creating a cycle when folding load / op with flag / store. PR11451474. rdar://11451474 llvm-svn: 156896	2012-05-16 01:54:27 +00:00
Jim Grosbach	2e62e2f664	Allow MCCodeEmitter access to the target MCRegisterInfo. Add the MCRegisterInfo to the factories and constructors. Patch by Tom Stellard <Tom.Stellard@amd.com>. llvm-svn: 156828	2012-05-15 17:35:52 +00:00
Dan Gohman	cc1f60a86c	Rename @llvm.debugger to @llvm.debugtrap. llvm-svn: 156774	2012-05-14 18:58:10 +00:00
Chad Rosier	4141fa486f	Typo. llvm-svn: 156633	2012-05-11 19:43:29 +00:00
Preston Gurd	691d5f1eb6	Added X86 Atom latencies to X86InstrMMX.td. llvm-svn: 156615	2012-05-11 14:27:12 +00:00
Hans Wennborg	a5a417fcd3	Implement initial-exec TLS model for 32-bit PIC x86 This fixes a TODO from 2007 :) Previously, LLVM would emit the wrong code here (see the update to test/CodeGen/X86/tls-pie.ll). llvm-svn: 156611	2012-05-11 10:11:01 +00:00
Dan Gohman	ed475ad173	Define a new intrinsic, @llvm.debugger. It will be similar to __builtin_trap(), but it generates int3 on x86 instead of ud2. llvm-svn: 156593	2012-05-11 00:19:32 +00:00
Preston Gurd	236873fb5d	Added X86 Atom latencies for instructions in X86InstrInfo.td. llvm-svn: 156579	2012-05-10 21:58:35 +00:00
Nadav Rotem	05a2f42f29	Fix merge-typo and cleanup llvm-svn: 156541	2012-05-10 12:50:02 +00:00
Nadav Rotem	157be301c5	AVX2: Add an additional broadcast idiom. llvm-svn: 156540	2012-05-10 12:39:13 +00:00
Nadav Rotem	64319ce27c	Generate AVX/AVX2 shuffles even when there is a memory op somewhere else in the program. Starting r155461 we are able to select patterns for vbroadcast even when the load op is used by other users. Fix PR11900. llvm-svn: 156539	2012-05-10 12:22:05 +00:00
Jakob Stoklund Olesen	88cf278739	Use ptr_rc_tailcall instead of GR32_TC. The getPointerRegClass() hook will return GR32_TC, or whatever is appropriate for the current function. Patch by Yiannis Tsiouris! llvm-svn: 156459	2012-05-09 01:50:09 +00:00
Jakob Stoklund Olesen	989c6b112d	s/CSR_Ghc/CSR_NoRegs/ Share the CalleeSavedRegs defs between all calling conventions having no callee-saved registers. Patch by Yiannis Tsiouris! llvm-svn: 156382	2012-05-08 15:07:29 +00:00
Craig Topper	77b1a4cee5	Remove 256-bit AVX non-temporal store intrinsics. Similar was previously done for 128-bit. llvm-svn: 156375	2012-05-08 06:58:15 +00:00
Jakob Stoklund Olesen	cc0cf22b98	Add an MF argument to TRI::getPointerRegClass() and TII::getRegClass(). The getPointerRegClass() hook can return register classes that depend on the calling convention of the current function (ptr_rc_tailcall). So far, we have been able to infer the calling convention from the subtarget alone, but as we add support for multiple calling conventions per target, that no longer works. Patch by Yiannis Tsiouris! llvm-svn: 156328	2012-05-07 22:10:26 +00:00
Chad Rosier	3e284d8bd6	Fix a regression from r147481. This combine should only happen if there is a single use. rdar://11360370 llvm-svn: 156316	2012-05-07 18:47:44 +00:00
Manman Ren	6fde9f74b4	X86: optimization for -(x != 0) This patch will optimize -(x != 0) on X86 FROM cmpl $0x01,%edi sbbl %eax,%eax notl %eax TO negl %edi sbbl %eax,%eax In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td: def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>; rdar: 10961709 llvm-svn: 156312	2012-05-07 18:06:23 +00:00
Craig Topper	02644ca6b7	Fix some issues in the f16c instructions. llvm-svn: 156287	2012-05-07 06:00:15 +00:00
Craig Topper	c6d0bc2afc	Add SSE4A MOVNTSS/MOVNTSD instructions. llvm-svn: 156281	2012-05-07 05:36:19 +00:00
Craig Topper	4246b08208	Use MVT instead of EVT as the argument to all the shuffle decode functions. Simplify some of the decode functions. llvm-svn: 156268	2012-05-06 19:46:21 +00:00
Craig Topper	b3b4c9476d	Add VPERMQ/VPERMPD to the list of target specific shuffles that can be looked through for DAG combine purposes. llvm-svn: 156266	2012-05-06 18:54:26 +00:00
Craig Topper	b95ee6cfc1	Add shuffle decode support for VPERMQ/VPERMPD. llvm-svn: 156265	2012-05-06 18:44:02 +00:00
Jim Grosbach	f7461026c2	Nuke a few dead remnants of the CBE. llvm-svn: 156241	2012-05-05 17:45:12 +00:00
Benjamin Kramer	7a9528b540	Add a new target hook "predictableSelectIsExpensive". This will be used to determine whether it's profitable to turn a select into a branch when the branch is likely to be predicted. Currently enabled for everything but Atom on X86 and Cortex-A9 devices on ARM. I'm not entirely happy with the name of this flag, suggestions welcome ;) llvm-svn: 156233	2012-05-05 12:49:14 +00:00
Preston Gurd	8de39bd4f6	Adds Intel Atom scheduling latencies to X86InstrSystem.td. llvm-svn: 156194	2012-05-04 19:26:37 +00:00
Craig Topper	88bf1f4404	Fix some loops to match coding standards. No functional change intended. llvm-svn: 156159	2012-05-04 06:39:13 +00:00
Craig Topper	3845ea5b9e	Fix up some spacing. No functional change. llvm-svn: 156158	2012-05-04 06:18:33 +00:00
Craig Topper	71aab70d71	Simplify broadcast lowering code. No functional change intended. llvm-svn: 156157	2012-05-04 05:49:51 +00:00
Craig Topper	6881f1067c	Allow v16i16 and v32i8 shuffles to be rewritten as narrower shuffles. llvm-svn: 156156	2012-05-04 04:44:49 +00:00
Craig Topper	f7516089b7	Simplify shuffle narrowing code a bit. No functional change intended. llvm-svn: 156154	2012-05-04 04:08:44 +00:00
Jakob Stoklund Olesen	7bdae32bfd	Remove the SubRegClasses field from RegisterClass descriptions. This information is now computed by TableGen. llvm-svn: 156152	2012-05-04 03:30:34 +00:00
Craig Topper	9bdd3bb279	Use 'unsigned' instead of 'int' in a few places dealing with counts of vector elements. llvm-svn: 156060	2012-05-03 07:26:59 +00:00
Craig Topper	52869bf5bf	Fix 256-bit vpshuflw and vpshufhw immediate encoding to handle undefs in the lower half correctly. Missed in r155982. llvm-svn: 156059	2012-05-03 07:12:59 +00:00
Preston Gurd	047af997f6	For Intel Atom, use ILP scheduling always, instead of ILP for 64 bit and Hybrid for 32 bit, since benchmarks show ILP scheduling is better most of the time. llvm-svn: 156028	2012-05-02 22:02:02 +00:00
Preston Gurd	24f13ffba6	Change the Intel Atom detection code to recognize Lincroft and Medfield. llvm-svn: 156025	2012-05-02 21:38:46 +00:00
Preston Gurd	29e60325bf	This patch continues the work of adding instruction latencies for X86 Atom, by providing the latencies for the instructions in X86InstrFPStack.td. llvm-svn: 155996	2012-05-02 16:03:35 +00:00
Manman Ren	0bdd46e32e	Revert r155853 The commit is intended to fix rdar://10961709. But it is the root cause of PR12720. Revert it for now. llvm-svn: 155992	2012-05-02 15:24:32 +00:00
Craig Topper	00ccecdc84	Add support for selecting AVX2 vpshuflw and vpshufhw. Add decoding support for AsmPrinter. llvm-svn: 155982	2012-05-02 08:03:44 +00:00
Jakub Staszak	5a4bcd5559	Remove unneeded break. llvm-svn: 155959	2012-05-01 23:08:16 +00:00
Jakub Staszak	56c14bb368	Remove trailing spaces. llvm-svn: 155956	2012-05-01 23:04:38 +00:00
Preston Gurd	bee1603263	This patch marks the X86 floating point stack registers ST0-ST7 as reserved in order to avoid assertion failures in the register scavenger. The assertion failures were “Bad machine code: Using an undefined physical register” and “Bad machine code: MBB exits via unconditional fall-through but its successor differs from its CFG successor!”. llvm-svn: 155930	2012-05-01 19:50:22 +00:00
Manman Ren	2a032bd8f9	X86: optimization for max-like struct This patch will optimize the following cases on X86 (a > b) ? (a-b) : 0 (a >= b) ? (a-b) : 0 (b < a) ? (a-b) : 0 (b <= a) ? (a-b) : 0 FROM movl %edi, %ecx subl %esi, %ecx cmpl %edi, %esi movl $0, %eax cmovll %ecx, %eax TO xorl %eax, %eax subl %esi, %edi cmovll %eax, %edi movl %edi, %eax rdar: 10734411 llvm-svn: 155919	2012-05-01 17:16:15 +00:00
Alexey Samsonov	246af5318a	X86: Use StackRegister instead of FrameRegister in getFrameIndexReference (to generate debug info for local variables) if stack needs realignment llvm-svn: 155917	2012-05-01 15:16:06 +00:00
Bill Wendling	003b1bf46c	Change the PassManager from a reference to a pointer. The TargetPassManager's default constructor wants to initialize the PassManager to 'null'. But it's illegal to bind a null reference to a null l-value. Make the ivar a pointer instead. PR12468 llvm-svn: 155902	2012-05-01 08:27:43 +00:00
Craig Topper	1624fb0549	Allow BMI, AES, F16C, POPCNT, FMA3, and CLMUL to be detected on AMD processors. llvm-svn: 155899	2012-05-01 07:10:32 +00:00
Craig Topper	405f995b07	Make XOP and FMA4 require SSE4A to match GCC behavior. Use this to simplify Bulldozer feature list. llvm-svn: 155897	2012-05-01 06:54:48 +00:00
Craig Topper	0272669dd1	Attempt to handle MRMInitReg in emitVEXOpcodePrefix. Hopefully fixes PR12711. llvm-svn: 155896	2012-05-01 06:34:01 +00:00
Craig Topper	d4974e4713	Make XOP imply AVX as it's needed to legalize the register types. llvm-svn: 155891	2012-05-01 05:41:41 +00:00
Craig Topper	9fa14ed244	Remove HasSSE2 from AES and CLMUL predicates. It's now implied by the HasAES and HasCLMUL predicates. llvm-svn: 155890	2012-05-01 05:35:02 +00:00
Craig Topper	50be3b60a4	Make CLMUL and AES imply SSE2 since it's needed to legalize the type. llvm-svn: 155888	2012-05-01 05:28:32 +00:00
Craig Topper	cfc6060070	Enable AVX and FMA4 for AMD Bulldozer processors. llvm-svn: 155885	2012-05-01 05:18:13 +00:00
Manman Ren	0a8b8b491f	X86: optimization for -(x != 0) This patch will optimize -(x != 0) on X86 FROM cmpl $0x01,%edi sbbl %eax,%eax notl %eax TO negl %edi sbbl %eax,%eax llvm-svn: 155853	2012-04-30 22:51:25 +00:00
Chad Rosier	0092397f80	Tidy up. No functional change intended. llvm-svn: 155832	2012-04-30 17:47:15 +00:00
Derek Schuff	85abcc8498	Fix fastcc structure return with fast-isel on x86-32 On x86-32, structure return via sret lets the callee pop the hidden pointer argument off the stack, which the caller then re-pushes. However if the calling convention is fastcc, then a register is used instead, and the caller should not adjust the stack. This is implemented with a check of IsTailCallConvention in X86TargetLowering::LowerCall but is now checked properly in X86FastISel::DoSelectCall. (this time, actually commit what was reviewed!) llvm-svn: 155825	2012-04-30 16:57:15 +00:00
Craig Topper	78a563fd27	No need to normalize index before calling Extract128BitVector llvm-svn: 155811	2012-04-30 05:17:10 +00:00
Pete Cooper	584ad8ab86	Copied all the VEX prefix encoding code from X86MCCodeEmitter to the x86 JIT emitter. Needs some major refactoring as these two code emitters are almost identical llvm-svn: 155810	2012-04-30 03:56:44 +00:00
Jakub Staszak	f526e691cf	Remove unneeded casts. No functionality change. llvm-svn: 155800	2012-04-29 20:52:53 +00:00
Craig Topper	ce1e652483	Simplify code a bit. No functional change intended. llvm-svn: 155798	2012-04-29 20:22:05 +00:00
Derek Schuff	7fe1fbbe81	Revert r155745 llvm-svn: 155746	2012-04-27 23:37:41 +00:00
Derek Schuff	80bd01f406	Fix fastcc structure return with fast-isel on x86-32 On x86-32, structure return via sret lets the callee pop the hidden pointer argument off the stack, which the caller then re-pushes. However if the calling convention is fastcc, then a register is used instead, and the caller should not adjust the stack. This is implemented with a check of IsTailCallConvention in X86TargetLowering::LowerCall but is now checked properly in X86FastISel::DoSelectCall. llvm-svn: 155745	2012-04-27 23:27:17 +00:00
Craig Topper	5270dd7a71	Use 'unsigned' instead of 'int' in several places when retrieving number of vector elements. llvm-svn: 155742	2012-04-27 22:54:43 +00:00
Chad Rosier	d627fcbf2a	Add x86-specific DAG combine to simplify: x == -y --> x+y == 0 x != -y --> x+y != 0 On x86, the generated code goes from negl %esi cmpl %esi, %edi je .LBB0_2 to addl %esi, %edi je .L4 This case is correctly handled for ARM with "cmn". Patch by Manman Ren. rdar://11245199 PR12545 llvm-svn: 155739	2012-04-27 22:33:25 +00:00
Craig Topper	b06100424e	Tidy up spacing. llvm-svn: 155733	2012-04-27 21:05:09 +00:00
Benjamin Kramer	1380494168	X86: Don't emit conditional floating point moves when targeting pre-pentiumpro architectures. * Model FPSW (the FPU status word) as a register. * Add ISel patterns for the FUCOM, FNSTSW and SAHF instructions. During Legalize/Lowering, build a node sequence to transfer the comparison result from FPSW into EFLAGS. If you're wondering about the right-shift: That's an implicit sub-register extraction (%ax -> %ah) which is handled later on by the instruction selector. Fixes PR6679. Patch by Christoph Erhardt! llvm-svn: 155704	2012-04-27 12:07:43 +00:00
Preston Gurd	fb1760744d	Trivial change to set UseLeaForSP flag in addition to toggling the FeatureLeaForSP feature bit when llvm auto detects Intel Atom. Patch by Andy Zhang llvm-svn: 155655	2012-04-26 19:52:27 +00:00
Craig Topper	f883096ff7	Enable detection of AVX and AVX2 support through CPUID. Add AVX/AVX2 to corei7-avx, core-avx-i, and core-avx2 cpu names. llvm-svn: 155618	2012-04-26 06:40:15 +00:00
Craig Topper	1a016fd95d	Use vector_shuffles instead of target specific unpack nodes for AVX ZERO_EXTEND/ANY_EXTEND combine. These will be converted to target specific nodes during lowering. This is more consistent with other code. llvm-svn: 155537	2012-04-25 06:39:39 +00:00
Nadav Rotem	021d75713c	AVX: Add additional vbroadcast replacement sequences for integers. Remove the v2f64 patterns because it does not match any vbroadcast instruction. llvm-svn: 155461	2012-04-24 18:09:59 +00:00
Nadav Rotem	f2756d7e7f	AVX2: The BLENDPW instruction selects between vectors of v16i16 using an i8 immediate. We can't use it here because the shuffle code does not check that the lower part of the word is identical to the upper part. llvm-svn: 155440	2012-04-24 11:27:53 +00:00
Nadav Rotem	d060c25823	AVX: We lower VECTOR_SHUFFLE and BUILD_VECTOR nodes into vbroadcast instructions using the pattern (vbroadcast (i32load src)). In some cases, after we generate this pattern new users are added to the load node, which prevent the selection of the blend pattern. This commit provides fallback patterns which perform in-vector broadcast (using in-vector vbroadcast in AVX2 and pshufd on AVX1). llvm-svn: 155437	2012-04-24 11:07:03 +00:00
Craig Topper	dae7196823	Remove dangling spaces. Fix some other formatting. llvm-svn: 155429	2012-04-24 06:36:35 +00:00
Craig Topper	61065e271e	Simplify code a bit and make it compile better. Remove unused parameters. llvm-svn: 155428	2012-04-24 06:02:29 +00:00
Nadav Rotem	c60ef21760	Optimize the vector UINT_TO_FP, SINT_TO_FP and FP_TO_SINT operations where the integer type is i8 (commonly used in graphics). llvm-svn: 155397	2012-04-23 21:53:37 +00:00
Preston Gurd	0a730de3c3	This patch fixes a problem which arose when using the Post-RA scheduler on X86 Atom. Some of our tests failed because the tail merging part of the BranchFolding pass was creating new basic blocks which did not contain live-in information. When the anti-dependency code in the Post-RA scheduler ran, it would sometimes rename the register containing the function return value because the fact that the return value was live-in to the subsequent block had been lost. To fix this, it is necessary to run the RegisterScavenging code in the BranchFolding pass. This patch makes sure that the register scavenging code is invoked in the X86 subtarget only when post-RA scheduling is being done. Post RA scheduling in the X86 subtarget is only done for Atom. This patch adds a new function to the TargetRegisterClass to control whether or not live-ins should be preserved during branch folding. This is necessary in order for the anti-dependency optimizations done during the PostRASchedulerList pass to work properly when doing Post-RA scheduling for the X86 in general and for the Intel Atom in particular. The patch adds and invokes the new function trackLivenessAfterRegAlloc() instead of using the existing requiresRegisterScavenging(). It changes BranchFolding.cpp to call trackLivenessAfterRegAlloc() instead of requiresRegisterScavenging(). It changes all the targets that implemented requiresRegisterScavenging() to also implement trackLivenessAfterRegAlloc(). It adds an assertion in the Post RA scheduler to make sure that post RA liveness information is available when it is needed. It changes the X86 break-anti-dependencies test to use -mcpu=atom, in order to avoid running into the added assertion. Finally, this patch restores the use of anti-dependency checking (which was turned off temporarily for the 3.1 release) for Intel Atom in the Post RA scheduler. Patch by Andy Zhang! Thanks to Jakob and Anton for their reviews. llvm-svn: 155395	2012-04-23 21:39:35 +00:00
Craig Topper	95fa5a8765	Use MVT instead of EVT through all of LowerVECTOR_SHUFFLEtoBlend and not just the switch. Saves a little bit of binary size. llvm-svn: 155339	2012-04-23 07:36:33 +00:00
Craig Topper	4e6deec5d8	Make getZeroVector and getOnesVector more alike as far as how they detect 128-bit versus 256-bit vectors. Be explicit about both sizes and use llvm_unreachable. Similar changes to getLegalSplat. llvm-svn: 155337	2012-04-23 07:24:41 +00:00
Craig Topper	f9811e8f28	Tidy up by removing some 'else' after 'return' llvm-svn: 155336	2012-04-23 06:57:04 +00:00
Craig Topper	c315e7b6db	Tidy up spacing in LowerVECTOR_SHUFFLEtoBlend. Remove code that checks if shuffle operand has a different type than the shuffle result since it can never happen. llvm-svn: 155333	2012-04-23 06:38:28 +00:00
Craig Topper	f27c3223f7	Add a couple llvm_unreachables. llvm-svn: 155332	2012-04-23 03:42:40 +00:00
Craig Topper	6c6ee67efe	Remove some tab characters. llvm-svn: 155331	2012-04-23 03:28:34 +00:00
Craig Topper	16829bb004	Remove some 'else' after 'return'. No functional change. llvm-svn: 155330	2012-04-23 03:26:18 +00:00
Craig Topper	2dedfa7805	Make Extract128BitVector and Insert128BitVector take an unsigned instead of a ConstantNode SDValue. getConstant was almost always called just before only to have the functions take it apart and build a new ConstantSDNode. llvm-svn: 155325	2012-04-22 20:55:18 +00:00
Craig Topper	5669044c57	Convert getNode(UNDEF) to getUNDEF. llvm-svn: 155321	2012-04-22 19:29:34 +00:00
Craig Topper	a9994377f2	Make calls to getVectorShuffle more consistent. Use shuffle VT for calls to getUNDEF instead of requerying. Use &Mask[0] instead of Mask.data(). llvm-svn: 155320	2012-04-22 19:17:57 +00:00
Craig Topper	5c4c8b1f81	Tidy up. 80 columns and argument alignment. llvm-svn: 155319	2012-04-22 18:51:37 +00:00
Craig Topper	58aeb7b7c3	Simplify code by converting multiple places that were manually concatenating 128-bit vectors to use either CONCAT_VECTORS or a helper function. CONCAT_VECTORS will itself be lowered to the same pattern as before. The helper function is needed for concats of BUILD_VECTORs since getNode(CONCAT_VECTORS) will just return a large BUILD_VECTOR and we may be trying to lower large BUILD_VECTORS when this occurs. llvm-svn: 155318	2012-04-22 18:15:59 +00:00
Elena Demikhovsky	35721fc4f8	ZERO_EXTEND/SIGN_EXTEND/TRUNCATE optimization for AVX2 llvm-svn: 155309	2012-04-22 09:39:03 +00:00
Craig Topper	96407e19f5	Make some fixed arrays const. Use array_lengthof in a couple places instead of a hardcoded number. llvm-svn: 155294	2012-04-21 18:58:38 +00:00
Craig Topper	2a70ca9377	Tidy up. 80 columns and some other spacing issues. llvm-svn: 155291	2012-04-21 18:13:35 +00:00
Craig Topper	a0bf6c3af3	Convert some uses of XXXRegisterClass to &XXXRegClass. No functional change since they are equivalent. llvm-svn: 155186	2012-04-20 06:31:50 +00:00
Kevin Enderby	7d41dd85c3	Fixed the llvm-mc X86 disassembler so the 'C' API gets jumps properly symbolicated. These have an operand type of TYPE_RELv which was not handled as isBranch in translateImmediate() in X86Disassembler.cpp. rdar://11268426 llvm-svn: 155074	2012-04-18 23:12:11 +00:00
Craig Topper	7c784d86eb	Remove AVX vpermil intrinsics. I removed their uses from clang headers and builtins a while back. llvm-svn: 154985	2012-04-18 05:24:00 +00:00
Craig Topper	ada065b23b	Don't decode vperm2i128 or vperm2f128 into a shuffle if bit 3 or 7 of the immediate is set. llvm-svn: 154907	2012-04-17 05:54:54 +00:00
Preston Gurd	01328a277e	Temporarily turn off anti-dependency checking during Post RA scheduling in X86, until the X86 target is changed to properly set up post RA liveness. llvm-svn: 154874	2012-04-16 22:52:28 +00:00
Richard Smith	971d090cbb	Fix incorrect atomics codegen introduced in r154705, and extend test to catch it. llvm-svn: 154845	2012-04-16 18:43:53 +00:00
Craig Topper	db4fcf7088	Replace vpermd/vpermps intrinsic patterns with custom lowering to target specific nodes. llvm-svn: 154801	2012-04-16 07:13:00 +00:00
Craig Topper	a986fc78e2	Change type profile for vpermv back to using operand type for the mask argument to match intrinsic behavior. Add a bitcast to the lowering code to convert mask from v8i32 to v8f32 for vpermps. llvm-svn: 154798	2012-04-16 06:43:40 +00:00
Craig Topper	129dccdc84	Flip the arguments when converting vpermd/vpermps intrinsics into instructions. The intrinsic has the mask as the last operand, but the instruction has it as the second. llvm-svn: 154797	2012-04-16 06:26:15 +00:00
Craig Topper	1b15347812	Merge vpermps/vpermd and vpermpd/vpermq SD nodes. llvm-svn: 154782	2012-04-16 00:41:45 +00:00
Craig Topper	c217784dc3	Fix SDTypeProfile for vpermps. The mask operand should be v8i32. llvm-svn: 154781	2012-04-16 00:12:20 +00:00
Craig Topper	e274a2cc61	Spacing fixes and 80 column fixes. Use 0 instead of 0x80 for undef indices in vpermps/vpermd. Hardware only looks at lower 3-bits. llvm-svn: 154780	2012-04-15 23:48:57 +00:00
Craig Topper	788250eec1	Remove AVX2 vpermq and vpermpd intrinsics. These can now be handled with normal shuffle vectors. llvm-svn: 154778	2012-04-15 22:43:31 +00:00
Nadav Rotem	2a4e2ef10c	Fix PR12529. The Vxx family of instructions are only supported by AVX. Use non-vex instructions for SSE4. llvm-svn: 154770	2012-04-15 19:36:44 +00:00
Elena Demikhovsky	92fb3e613e	Added VPERM optimization for AVX2 shuffles llvm-svn: 154761	2012-04-15 11:18:59 +00:00
Richard Smith	d5004a79d9	Fix X86 codegen for 'atomicrmw nand' to generate x = ~(x & y), not x = ~x & y. llvm-svn: 154705	2012-04-13 22:47:00 +00:00
Evan Cheng	d9958dcd91	Generalize r153635 to deal with TokenFactor chains; also clean up the logic and fix the tests. rdar://11069732, rdar://11236106 llvm-svn: 154604	2012-04-12 19:14:21 +00:00
Craig Topper	448790d566	Fix 128-bit ptest intrinsics to take v2i64 instead of v4f32 since these are integer instructions. llvm-svn: 154580	2012-04-12 07:23:00 +00:00
Nadav Rotem	210f92b306	remove unused argument llvm-svn: 154494	2012-04-11 11:05:21 +00:00
Nadav Rotem	c922b4f2a3	Reapply 154396 after fixing a test. Original message: Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendV uses a register for the selection while Vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154483	2012-04-11 06:40:27 +00:00
Charles Davis	a5e1970cd0	Add retw and lretw instructions. Also, fix Intel syntax parsing for all ret instructions. llvm-svn: 154468	2012-04-11 01:10:53 +00:00
Chad Rosier	b2ebb93f3c	Whitespace. llvm-svn: 154427	2012-04-10 19:42:07 +00:00
Chad Rosier	f3b2588ea8	Revert r154396, which looks to be the real culprit behind the bot failures. llvm-svn: 154426	2012-04-10 19:39:18 +00:00
Eric Christopher	f8886e8f48	Temporarily revert this patch to see if it brings the buildbots back. llvm-svn: 154425	2012-04-10 19:33:16 +00:00
David Blaikie	cf463882c0	Remove unused variable. llvm-svn: 154398	2012-04-10 15:23:13 +00:00
Nadav Rotem	74f87a6bd8	Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendv uses a register for the selection while vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154396	2012-04-10 14:33:13 +00:00
Evan Cheng	5825e9dbf5	Fix a long standing tail call optimization bug. When a libcall is emitted, the legalizer always uses the DAG entry node. This is wrong when the libcall is emitted as a tail call since it effectively folds the return node. If the return node's input chain is not the entry (i.e. call, load, or store) use that as the tail call input chain. PR12419 rdar://9770785 rdar://11195178 llvm-svn: 154370	2012-04-10 01:51:00 +00:00
Preston Gurd	c758aebf45	This patch adds X86 instruction itineraries, which were missed by the original patch to add itineraries, to X86InstrArithmetic.td. llvm-svn: 154320	2012-04-09 15:32:22 +00:00
Nadav Rotem	9f7f17826e	Lower some x86 shuffle sequences to the vblend family of instructions. llvm-svn: 154313	2012-04-09 08:33:21 +00:00
Nadav Rotem	4499fb1d50	Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering. llvm-svn: 154310	2012-04-09 07:45:58 +00:00
Chandler Carruth	bb1db0e66a	Cleanup and relax a restriction on the matching of global offsets into x86 addressing modes. This allows PIE-based TLS offsets to fit directly into an addressing mode immediate offset, which is the last remaining code quality issue from PR12380. With this patch, that PR is completely fixed. To understand why this patch is correct to match these offsets into addressing mode immediates, break it down by cases: 1) 32-bit is trivially correct, and unmodified here. 2) 64-bit non-small mode is unchanged and never matches. 3) 64-bit small PIC code which is RIP-relative is handled specially in the match to try to fit RIP into the base register. If it fails, it now early exits. This behavior is unchanged by the patch. 4) 64-bit small non-PIC code which is not RIP-relative continues to work as it did before. The reason these immediates are safe is because the ABI ensures they fit in small mode. This behavior is unchanged. 5) 64-bit small PIC code which is not using RIP-relative addressing. This is the only case changed by the patch, and the primary place you see it is in TLS, either the win64 section offset TLS or Linux local-exec TLS model in a PIC compilation. Here the ABI again ensures that the immediates fit because we are in small mode, and any other operations required due to the PIC relocation model have been handled externally to the Wrapper node (extra loads etc are made around the wrapper node in ISelLowering). I've tested this as much as I can comparing it with GCC's output, and everything appears safe. I discussed this with Anton and it made sense to him at least at face value. That said, if there are issues with PIC code after this patch, yell and we can revert it. llvm-svn: 154304	2012-04-09 02:13:06 +00:00


8496 Commits