llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 06:22:56 +02:00

Author	SHA1	Message	Date
Michael Liao	3accf514d6	Revert r166049 - In general, this transformation is unsafe. llvm-svn: 166135	2012-10-17 22:41:15 +00:00
Reed Kotler	9a7c7a004d	Add conditional branch instructions and their patterns. llvm-svn: 166134	2012-10-17 22:29:54 +00:00
Michael Liao	b168cd6995	Teach DAG combine to fold (extract_subvec (concat v1, ..) i) to v_i - If the extracted vector has the same type as all vectors being concatenated together, it should be simplified directly into v_i, where i is the index of the element being extracted. llvm-svn: 166125	2012-10-17 20:48:33 +00:00
Anton Korobeynikov	302a70ec1e	Fix fallout from RegInfo => FrameLowering refactoring on MSP430. Patch by Job Noorman! llvm-svn: 166108	2012-10-17 17:37:11 +00:00
Michael Liao	db8bc2e5dc	Fix setjmp with a non-Small code model or a non-Static relocation model - The MBB address is only valid as an immediate value in the Small code model with the Static relocation model. In other models, LEA is needed to load the IP address of the restore MBB. - A minor fix to MBB handling in MC lowering is added as well, so that the target relocation flag is propagated into MC. llvm-svn: 166084	2012-10-17 02:22:27 +00:00
Jakob Stoklund Olesen	96ecdfd8dd	Avoid rematerializing a redef immediately after the old def. PR14098 contains an example where we would rematerialize a MOV8ri immediately after the original instruction: %vreg7:sub_8bit<def> = MOV8ri 9; GR32_ABCD:%vreg7 %vreg22:sub_8bit<def> = MOV8ri 9; GR32_ABCD:%vreg7 Besides being pointless, it is also wrong since the original instruction only redefines part of the register, and the value read by the new instruction is wrong. The problem was the LiveRangeEdit::allUsesAvailableAt() didn't special-case OrigIdx == UseIdx and found the wrong SSA value. llvm-svn: 166068	2012-10-16 22:51:58 +00:00
Jakob Stoklund Olesen	1cfbe5c549	Revert r166046 "Switch back to the old coalescer for now to fix the 32 bit bit" A fix for PR14098, including the test case is in the next commit. llvm-svn: 166067	2012-10-16 22:51:55 +00:00
Michael Liao	ee2ce36cda	Teach DAG combine to fold (trunc (fptoXi x)) to (fptoXi x) llvm-svn: 166049	2012-10-16 19:38:35 +00:00
Rafael Espindola	2f08719190	Switch back to the old coalescer for now to fix the 32 bit bit llvm+clang+compiler-rt bootstrap. llvm-svn: 166046	2012-10-16 19:34:06 +00:00
Bill Schmidt	ad04de0c32	This patch addresses PR13949. For the PowerPC 64-bit ELF Linux ABI, aggregates of size less than 8 bytes are to be passed in the low-order bits ("right-adjusted") of the doubleword register or memory slot assigned to them. A previous patch addressed this for aggregates passed in registers. However, small aggregates passed in the overflow portion of the parameter save area are still being passed left-adjusted. The fix is made in PPCTargetLowering::LowerCall_Darwin_Or_64SVR4 on the caller side, and in PPCTargetLowering::LowerFormalArguments_64SVR4 on the callee side. The main fix on the callee side simply extends existing logic for 1- and 2-byte objects to 1- through 7-byte objects, and corrects a constant left over from 32-bit code. There is also a fix to a bogus calculation of the offset to the following argument in the parameter save area. On the caller side, again a constant left over from 32-bit code is fixed. Additionally, some code for 1, 2, and 4-byte objects is duplicated to handle the 3, 5, 6, and 7-byte objects for SVR4 only. The LowerCall_Darwin_Or_64SVR4 logic is getting fairly convoluted trying to handle both ABIs, and I propose to separate this into two functions in a future patch, at which time the duplication can be removed. The patch adds a new test (structsinmem.ll) to demonstrate correct passing of structures of all seven sizes. Eight dummy parameters are used to force these structures to be in the overflow portion of the parameter save area. As a side effect, this corrects the case when aggregates passed in registers are saved into the first eight doublewords of the parameter save area: Previously they were stored left-justified, and now are properly stored right-justified. This requires changing the expected output of the existing test case structsinregs.ll. llvm-svn: 166022	2012-10-16 13:30:53 +00:00
Stepan Dyatkovskiy	09c6b0a273	Issue: The stack is formed improperly for long structures passed as byval arguments in EABI mode. The AAPCS contains the following statements: A: "If the argument requires double-word alignment (8-byte), the NCRN (Next Core Register Number) is rounded up to the next even register number." (5.5 Parameter Passing, Stage C, C.3). B: "The alignment of an aggregate shall be the alignment of its most-aligned component." (4.3 Composite Types, 4.3.1 Aggregates). So if we have a structure of doubles (9 double fields) and 3 unused core registers (r1, r2, r3), the caller should use only r2 and r3. Currently the set r1, r2, r3 is used, which is invalid. The callee VA routine should also use only r2 and r3, and it already does: this behaviour is achieved by rounding up the SP address with ADD+BFC operations. Fix: The main fix is in ARMTargetLowering::HandleByVal. If we detect AAPCS mode and 8-byte alignment, we skip (waste) the odd register. P.S.: I also improved the LDRB_POST_IMM regression test, since the current regression test no longer generates an ldrb instruction after this patch. llvm-svn: 166018	2012-10-16 07:16:47 +00:00
NAKAMURA Takumi	83458d4d01	Reapply r165661, Patch by Shuxin Yang <shuxin.llvm@gmail.com>. Original message: Attached is the fix for radar://11663049. The optimization can be outlined by the following rules: (select (x != c), e, c) -> (select (x != c), e, x), (select (x == c), c, e) -> (select (x == c), x, e) where <c> is an integer constant. The reason for this change is that on x86, conditional-move-from-constant needs two instructions; however, conditional-move-from-register needs only one instruction. While LowerSELECT() sounds like the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscures some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to the last instruction-combining phase. The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource". Original message since r165661: My previous change had a bug: I negated the condition code of a CMOV, and went ahead and created a new CMOV using the ORIGINAL condition code. llvm-svn: 166017	2012-10-16 06:28:34 +00:00
Rafael Espindola	af4181923d	Fix the cpu name and add -verify-machineinstrs. llvm-svn: 166003	2012-10-16 01:13:06 +00:00
Andrew Trick	af9fb59623	misched: Added handleMove support for updating all kill flags, not just for allocatable regs. This is a medium term workaround until we have a more robust solution in the form of a register liveness utility for postRA passes. llvm-svn: 166001	2012-10-16 00:22:51 +00:00
Michael Liao	a7e5913fde	Add __builtin_setjmp/_longjmp support in X86 backend - Besides being used in SjLj exception handling, __builtin_setjmp/__longjmp is also used as a light-weight replacement for setjmp/longjmp, which are used to implement continuations, user-level threading, etc. The support added in this patch ONLY addresses this usage and is NOT intended to support SjLj exception handling, as zero-cost DWARF exception handling is used by default in X86. llvm-svn: 165989	2012-10-15 22:39:43 +00:00
Jim Grosbach	8df1c73056	ARM: v1i64 and v2i64 VBSL intrinsic support. rdar://12502028 llvm-svn: 165981	2012-10-15 21:23:40 +00:00
Andrew Trick	a3ebac4349	Check output of the misched unit tests llvm-svn: 165959	2012-10-15 20:33:14 +00:00
Rafael Espindola	e152eedf59	Add a cpu to try to fix the atom builder. llvm-svn: 165956	2012-10-15 19:25:43 +00:00
Rafael Espindola	e554bb55b6	Add testcase for pr14088. llvm-svn: 165954	2012-10-15 19:00:10 +00:00
Andrew Trick	252355ac40	misched tests: add a triple to speculatively fix windows builders. llvm-svn: 165952	2012-10-15 18:21:08 +00:00
Andrew Trick	a5e2aeb12b	misched: ILP scheduler for experimental heuristics. llvm-svn: 165950	2012-10-15 18:02:27 +00:00
Silviu Baranga	e3e0e84559	Fixed PR13938: the ARM backend was crashing because it couldn't select a VDUPLANE node with the vector input size different from the output size. This was because the BUILD_VECTOR lowering code didn't check that the size of the input vector was correct for using VDUPLANE. llvm-svn: 165929	2012-10-15 09:41:32 +00:00
Jakob Stoklund Olesen	4aac24404c	Drop <def,dead> flags when merging into an unused lane. The new coalescer can merge a dead def into an unused lane of an otherwise live vector register. Clear the <dead> flag when that happens since the flag refers to the full virtual register which is still live after the partial dead def. This fixes PR14079. llvm-svn: 165877	2012-10-13 17:26:47 +00:00
Jakob Stoklund Olesen	533711462c	Allow for loops in LiveIntervals::pruneValue(). It is possible that the live range of the value being pruned loops back into the kill MBB where the search started. When that happens, make sure that the beginning of KillMBB is also pruned. Instead of starting a DFS at KillMBB and skipping the root of the search, start a DFS at each KillMBB successor, and allow the search to loop back to KillMBB. This fixes PR14078. llvm-svn: 165872	2012-10-13 16:15:31 +00:00
Benjamin Kramer	35d6fcd24a	X86: Fix accidentally swapped operands. llvm-svn: 165871	2012-10-13 12:50:19 +00:00
Benjamin Kramer	a348e5aa65	X86: Promote i8 cmov when both operands are coming from truncates of the same width. X86 doesn't have i8 cmovs so isel would emit a branch. Emitting branches at this level is often not a good idea because it's too late for many optimizations to kick in. This solution doesn't add any extensions (truncs are free) and tries to avoid introducing partial register stalls by filtering direct copyfromregs. I'm seeing a ~10% speedup on reading a random .png file with libpng15 via graphicsmagick on x86_64/westmere, but YMMV depending on the microarchitecture. llvm-svn: 165868	2012-10-13 10:39:49 +00:00
Manman Ren	6d1b15f406	ARM: a tail call inside a function where part of a byval argument is on the caller's local frame causes a problem. For example: void f(StructToPass s) { g(&s, sizeof(s)); } will cause a problem with tail calls since part of s is passed via registers and saved in f's local frame. When g tries to access s, part of s may be corrupted since f's local frame is popped before the tail call. The current fix is to disable tail calls if getVarArgsRegSaveSize is not 0 for the caller. This is a conservative approach; if we can prove that the address of s or part of s is not taken and passed to g, it should be okay to perform the tail call. rdar://12442472 llvm-svn: 165853	2012-10-12 23:39:43 +00:00
Jakob Stoklund Olesen	a83648deef	Fix buildbots: -misched=shuffle is only available in +Asserts builds. llvm-svn: 165846	2012-10-12 23:01:33 +00:00
Jim Grosbach	f931c78724	ARM: Mark VSELECT as 'expand'. The backend already pattern matches to form VBSL when it can. We may want to teach it to use the vbsl intrinsics at some point to prevent machine licm from mucking with this, but using the Expand is completely correct. http://llvm.org/bugs/show_bug.cgi?id=13831 http://llvm.org/bugs/show_bug.cgi?id=13961 Patch by Peter Couperus <peter.couperus@st.com>. llvm-svn: 165845	2012-10-12 22:59:21 +00:00
Jakob Stoklund Olesen	74661b859d	Use a transposed algorithm for handleMove(). Completely update one interval at a time instead of collecting live range fragments to be updated. This avoids building data structures, except for a single SmallPtrSet of updated intervals. Also share code between handleMove() and handleMoveIntoBundle(). Add support for moving dead defs across other live values in the interval. The MI scheduler can do that. llvm-svn: 165824	2012-10-12 21:31:57 +00:00
Jakob Stoklund Olesen	8a4d65228b	Fix coalescing with IMPLICIT_DEF values. PHIElimination inserts IMPLICIT_DEF instructions to guarantee that all PHI predecessors have a live-out value. These IMPLICIT_DEF values are not considered to be real interference when coalescing virtual registers: %vreg1 = IMPLICIT_DEF %vreg2 = MOV32r0 When joining %vreg1 and %vreg2, the IMPLICIT_DEF instruction and its value number should simply be erased since the %vreg2 value number now provides a live-out value for the PHI predecessor block. llvm-svn: 165813	2012-10-12 18:03:04 +00:00
NAKAMURA Takumi	1a69f3cfb5	llvm/test/CodeGen/PowerPC/2012-10-12-bitcast.ll: Try to fix failure on non-ppc hosts, to add -mattr=+altivec. llvm-svn: 165803	2012-10-12 16:01:08 +00:00
Ulrich Weigand	dd9a6100a0	Fix big-endian codegen bug in DAGTypeLegalizer::ExpandRes_BITCAST On PowerPC, a bitcast of <16 x i8> to i128 may run through a code path in ExpandRes_BITCAST that attempts to do an intermediate bitcast to a <4 x i32> vector, and then construct the Hi and Lo parts of the resulting i128 by pairing up two of those i32 vector elements each. The code already recognizes that on a big-endian system, the first two vector elements form the Hi part, and the final two vector elements form the Lo part (vice-versa from the little-endian situation). However, we also need to take endianness into account when forming each of those separate pairs: on a big-endian system, vector element 0 is the high part of the pair making up the Hi part of the result, and vector element 1 is the low part of the pair. The code currently always uses vector element 0 as the low part and vector element 1 as the high part, as is appropriate for little-endian platforms only. This patch fixes this by swapping the vector elements as they are paired up as appropriate. llvm-svn: 165802	2012-10-12 15:42:58 +00:00
Reed Kotler	99d975c61d	Div, Rem int/unsigned int llvm-svn: 165783	2012-10-12 02:01:09 +00:00
Evan Cheng	76b896ec91	The legalizer optimizes a pair of div / mod into a call to the divrem libcall if they are not legal. However, it should use a div instruction + mul + sub if divide is legal. The rem legalization code was missing a check and incorrectly used a divrem libcall even when div was legal. rdar://12481395 llvm-svn: 165778	2012-10-12 01:15:47 +00:00
Jakob Stoklund Olesen	a7a8e389d2	Pass an explicit operand number to addLiveIns. Not all instructions define a virtual register in their first operand. Specifically, INLINEASM has a different format. <rdar://problem/12472811> llvm-svn: 165721	2012-10-11 16:46:07 +00:00
Bill Schmidt	3b8ee801af	This patch addresses PR13947. For function calls on the 64-bit PowerPC SVR4 target, each parameter is mapped to as many doublewords in the parameter save area as necessary to hold the parameter. The first 13 non-varargs floating-point values are passed in registers; any additional floating-point parameters are passed in the parameter save area. A single-precision floating-point parameter (32 bits) must be mapped to the second (rightmost, low-order) word of its assigned doubleword slot. Currently LLVM violates this ABI requirement by mapping such a parameter to the first (leftmost, high-order) word of its assigned doubleword slot. This is internally self-consistent but will not interoperate correctly with libraries compiled with an ABI-compliant compiler. This patch corrects the problem by adjusting the parameter addressing on both sides of the calling convention. llvm-svn: 165714	2012-10-11 15:38:20 +00:00
NAKAMURA Takumi	1a341a8de1	Revert r165661, "Patch by Shuxin Yang <shuxin.llvm@gmail.com>." It broke stage2 clang and test-suite/MultiSource/Benchmarks/mediabench/g721/g721encode. llvm-svn: 165692	2012-10-11 02:02:05 +00:00
Evan Cheng	72074df318	Add isel patterns for v2f32 / v4f32 neon.vbsl intrinsics. rdar://12471808 llvm-svn: 165673	2012-10-10 23:06:34 +00:00
Bill Schmidt	6cac0197ea	Add -mattr=+altivec and remove XFAIL. llvm-svn: 165666	2012-10-10 22:25:11 +00:00
Bill Schmidt	227cfed3b5	XFAIL for all targets pending investigation llvm-svn: 165664	2012-10-10 21:52:10 +00:00
Nadav Rotem	303a9b2e50	Patch by Shuxin Yang <shuxin.llvm@gmail.com>. Original message: Attached is the fix for radar://11663049. The optimization can be outlined by the following rules: (select (x != c), e, c) -> (select (x != c), e, x), (select (x == c), c, e) -> (select (x == c), x, e) where <c> is an integer constant. The reason for this change is that on x86, conditional-move-from-constant needs two instructions; however, conditional-move-from-register needs only one instruction. While LowerSELECT() sounds like the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscures some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to the last instruction-combining phase. The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource". llvm-svn: 165661	2012-10-10 21:31:55 +00:00
Bill Schmidt	57e3d38632	When generating spill and reload code for vector registers on PowerPC, the compiler makes use of GPR0. However, there are two flavors of GPR0 defined by the target: the 32-bit GPR0 (R0) and the 64-bit GPR0 (X0). The spill/reload code makes use of R0 regardless of whether we are generating 32- or 64-bit code. This patch corrects the problem in the obvious manner, using X0 and ADDI8 for 64-bit and R0 and ADDI for 32-bit. llvm-svn: 165658	2012-10-10 21:25:01 +00:00
Bill Schmidt	5f0844eeb4	The PowerPC VRSAVE register has been somewhat of an odd beast since the Altivec extensions were introduced. Its use is optional, and allows the compiler to communicate to the operating system which vector registers should be saved and restored during a context switch. In practice, this information is ignored by the various operating systems using the SVR4 ABI; the kernel saves and restores the entire register state. Setting the VRSAVE register is no longer performed by the AIX XL compilers, the IBM i compilers, or by GCC on Power Linux systems. It seems best to avoid this logic within LLVM as well. This patch avoids generating code to update and restore VRSAVE for the PowerPC SVR4 ABIs (32- and 64-bit). The code remains in place for the Darwin ABI. llvm-svn: 165656	2012-10-10 20:54:15 +00:00
Michael Liao	29bbad4a7b	Specify CPU model to avoid breaking ATOM builds llvm-svn: 165638	2012-10-10 18:04:52 +00:00
Michael Liao	6a09ff62ba	Add support for FP_ROUND from v2f64 to v2f32 - Due to the current matching vector elements constraints in ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from v2f32) is scalarized. Add a customized v2f32 widening to convert it into a target-specific X86ISD::VFPROUND to work around these constraints. llvm-svn: 165631	2012-10-10 16:53:28 +00:00
Stepan Dyatkovskiy	06c2fdd18f	Fix for LDRB instruction: the SDNode for LDRB_POST_IMM is invalid: the number of registers added to the SDNode is fewer than described in the .td. 7 ops are needed, but an SDNode with only 6 is created. In more detail: In ARMInstrInfo.td, in multiclass AI2_ldridx, in definition _POST_IMM, the offset operand is defined as am2offset_imm. am2offset_imm is a complex parameter type, and it actually consists of a dummy register and the imm itself. As I understand it, the trick with the dummy reg was made for the AsmParser. In ARMISelLowering.cpp, this dummy register was not added to the SDNode, and it caused a crash in the Peephole Optimizer pass. The problem is fixed by setting up an additional dummy reg when emitting the LDRB_POST_IMM instruction. llvm-svn: 165617	2012-10-10 11:43:40 +00:00
Stepan Dyatkovskiy	5182bb8695	Issue description: SchedulerDAGInstrs::buildSchedGraph ignores dependencies between FixedStack objects and byval parameters. So loads of byval parameters from the stack may be inserted before they are stored, since these operations are treated as independent. Fix: Currently ARMTargetLowering::LowerFormalArguments saves byval registers with FixedStack MachinePointerInfo. To fix the problem we need to store byval registers with MachinePointerInfo referencing the first "byval" parameter. The commit also adds two new fields to the InputArg structure: the Function's argument index and the InputArg's part offset in bytes relative to the start position of the Function's argument. E.g.: If a function's argument is 128 bits wide and it was split into 32-bit regs, then we get 4 InputArg structs with the same arg index, but different offset values. llvm-svn: 165616	2012-10-10 11:37:36 +00:00
Akira Hatanaka	6fbe997f0d	Implement MipsTargetLowering::CanLowerReturn. Patch by Sasa Stankovic. llvm-svn: 165585	2012-10-10 01:27:09 +00:00
Evan Cheng	7aa64222c5	When expanding atomic load arith instructions, do not lose target flags. rdar://12453106 llvm-svn: 165568	2012-10-09 23:48:33 +00:00
Jakob Stoklund Olesen	28526a6248	Don't crash on extra evil irreducible control flow. When the CFG contains a loop with multiple entry blocks, the traces computed by MachineTraceMetrics don't always have the same nice properties. Loop back-edges are normally excluded from traces, but MachineLoopInfo doesn't recognize loops with multiple entry blocks, so those back-edges may be included. Avoid asserting when that happens by adding an isEarlierInSameTrace() function that accurately determines if a dominating block is part of the same trace AND is above the current block in the trace. llvm-svn: 165434	2012-10-08 22:06:44 +00:00
Adhemerval Zanella	91fa3a3479	PR12716: PPC crashes on vector compare. Vector compares using altivec 'vcmpxxx' instructions have a vector register as their third argument instead of a CR one, unlike integer and floating-point compares. This leads to a failure in code generation, where 'SelectSETCC' expects a DAG with a CR register and gets a vector register instead. This patch changes the behavior by just returning a DAG with the vector compare instruction based on the type. The patch also adds a testcase for all vector types llvm defines. It also includes a fix for printing signed 5-bit predicates, where signed values were not handled correctly as signed (chars are unsigned by default for PowerPC). This generated 'vspltisw' (vector splat) instructions with SIM out of range. llvm-svn: 165419	2012-10-08 18:59:53 +00:00
Adhemerval Zanella	490909f7d1	Add floating-point to and from integer conversion. This patch adds altivec support for v4i32 to v4f32 and for v4f32 to v4i32 vector rounding conversion. llvm-svn: 165409	2012-10-08 17:27:24 +00:00
Benjamin Kramer	ea6041a09a	X86: fcmov doesn't handle all possible EFLAGS, fall back to a branch for the others. Otherwise it will try to use SSE patterns and fail horribly if sse is disabled. Fixes PR14035. llvm-svn: 165377	2012-10-07 15:34:27 +00:00
Reed Kotler	edee8fd3c4	Patch for integer multiply, signed/unsigned, long/long long. llvm-svn: 165322	2012-10-05 18:27:54 +00:00
Rafael Espindola	2ebde0e0fb	Convert to unix line endings. llvm-svn: 165308	2012-10-05 13:32:38 +00:00
Evan Cheng	eb205858ec	Follow up to r165072. Try a different approach: only move the load when it's going to be folded into the call. rdar://12437604 llvm-svn: 165287	2012-10-05 01:48:22 +00:00
Nadav Rotem	2f513f5a29	When merging consecutive stores, use vectors to store the constant zero. llvm-svn: 165267	2012-10-04 22:35:15 +00:00
Jim Grosbach	3237e667f1	ARM: locate user-defined text sections next to default text. Make sure functions located in user specified text sections (via the section attribute) are located together with the default text sections. Otherwise, for large object files, the relocations for call instructions are more likely to be out of range. This becomes even more likely in the presence of LTO. rdar://12402636 llvm-svn: 165254	2012-10-04 21:33:24 +00:00
Chad Rosier	711da11038	[ms-inline asm] Add support in the X86AsmPrinter for printing memory references in the Intel syntax. The MC layer supports emitting in the Intel syntax, but this would require the inline assembly MachineInstr to be lowered to an MCInst before emission. This is potential future work, but for now emitting directly from the MachineInstr suffices. llvm-svn: 165173	2012-10-03 22:06:44 +00:00
Nadav Rotem	3ffafc2d33	Fix a cycle in the DAG. In this code we replace multiple loads with a single load and multiple stores with a single store. We create the wide loads and stores (and their chains) before we remove the scalar loads and stores and fix the DAG chain. We attempted to merge loads with a different chain. When that happened, the assumption that it is safe to RAUW broke and a cycle was introduced. llvm-svn: 165148	2012-10-03 19:30:31 +00:00
Nadav Rotem	b537146d7f	A DAGCombine optimization for merging consecutive stores to memory. The optimization is not profitable in many cases because modern processors perform multiple stores in parallel and merging stores prior to merging requires extra work. We handle two main cases: 1. Store of multiple consecutive constants: q->a = 3; q->b = 5; In this case we store a single legal wide integer. 2. Store of multiple consecutive loads: int a = p->a; int b = p->b; q->a = a; q->b = b; In this case we load/store either illegal vector registers or legal wide integer registers. llvm-svn: 165125	2012-10-03 16:11:15 +00:00
Silviu Baranga	c4986e5454	Fixed a bug in the ExecutionDependencyFix pass that caused dependencies to not propagate through implicit defs. llvm-svn: 165102	2012-10-03 08:29:36 +00:00
Jakob Stoklund Olesen	3c17bbe30d	The early if conversion pass is ready to be used as an opt-in. Enable the pass by default for targets that request it, and change the -enable-early-ifcvt to the opposite -disable-early-ifcvt. There are still some x86 regressions when enabling early if-conversion because of the missing machine models. Disable the pass for x86 until machine models are added. llvm-svn: 165075	2012-10-03 00:51:32 +00:00
Evan Cheng	19255e0c70	Fix a serious X86 instruction selection bug. In X86DAGToDAGISel::PreprocessISelDAG(), isel is moving load inside callseq_start / callseq_end so it can be folded into a call. This can create a cycle in the DAG when the call is glued to a copytoreg. We have been lucky this hasn't caused too many issues because the pre-ra scheduler has special handling of call sequences. However, it has caused a crash in a specific tailcall case. rdar://12393897 llvm-svn: 165072	2012-10-02 23:49:13 +00:00
Nick Lewycky	d90da5977b	Make sure to put our sret argument into %rax on x86-64. Fixes PR13563! llvm-svn: 165063	2012-10-02 22:45:06 +00:00
Jakob Stoklund Olesen	f4d8b0432e	Make sure the whole live range is covered when values are pruned twice. JoinVals::pruneValues() calls LIS->pruneValue() to avoid conflicts when overlapping two different values. This produces a set of live range end points that are used to reconstruct the live range (with SSA update) after joining the two registers. When a value is pruned twice, the set of end points was insufficient: v1 = DEF v1 = REPLACE1 v1 = REPLACE2 KILL v1 The end point at KILL would only reconstruct the live range from REPLACE2 to KILL, leaving the range REPLACE1-REPLACE2 dead. Add REPLACE2 as an end point in this case so the full live range is reconstructed. This fixes PR13999. llvm-svn: 165056	2012-10-02 21:46:39 +00:00
Benjamin Kramer	aa07e96212	Fix broken tests. llvm-svn: 165019	2012-10-02 15:49:34 +00:00
Duncan Sands	199cd37f83	Fix PR13991: legalizing an overflowing multiplication operation is harder than the add/sub case since in the case of multiplication you also have to check that the operation in the larger type did not overflow. llvm-svn: 165017	2012-10-02 15:03:49 +00:00
NAKAMURA Takumi	6e0f1055bb	test/CodeGen/X86/red-zone2.ll: Add -mtriple=x86_64-linux, and FileCheck-ize. llvm-svn: 164975	2012-10-01 22:48:07 +00:00
Reed Kotler	28ed4620db	Checking in test case for r164811. It was an omission not to check this in. This was already approved. llvm-svn: 164972	2012-10-01 21:35:06 +00:00
Michael Liao	f2905fc76e	Fix PR13899 - Update maximal stack alignment when stack arguments are prepared before a call. - Test cases are enhanced to show it's not a Win32 specific issue but a generic one. llvm-svn: 164946	2012-10-01 16:44:04 +00:00
Nadav Rotem	972fdd96a1	Revert r164910 because it causes failures to several phase2 builds. llvm-svn: 164911	2012-09-30 07:17:56 +00:00
Nadav Rotem	f7ea233f5c	A DAGCombine optimization for merging consecutive stores. This optimization is not profitable in many cases because modern processors can store multiple values in parallel, and preparing the consecutive store requires some work. We only handle these cases: 1. Consecutive stores where the values are consecutive loads. For example: int a = p->a; int b = p->b; q->a = a; q->b = b; 2. Consecutive stores where the values are constants. For example: q->a = 4; q->b = 5; llvm-svn: 164910	2012-09-30 06:24:14 +00:00
Bob Wilson	ee6a40c517	Add LLVM support for Swift. llvm-svn: 164899	2012-09-29 21:43:49 +00:00
Bob Wilson	fdb7fc6060	Whitespace. llvm-svn: 164898	2012-09-29 21:27:31 +00:00
Duncan Sands	a29a3a24a7	Speculatively revert commit 164885 (nadav) in the hope of resurrecting a pile of buildbots. Original commit message: A DAGCombine optimization for merging consecutive stores. This optimization is not profitable in many cases because modern processors can store multiple values in parallel, and preparing the consecutive store requires some work. We only handle these cases: 1. Consecutive stores where the values are consecutive loads. For example: int a = p->a; int b = p->b; q->a = a; q->b = b; 2. Consecutive stores where the values are constants. For example: q->a = 4; q->b = 5; llvm-svn: 164890	2012-09-29 10:25:35 +00:00
Nadav Rotem	37dcc09044	A DAGCombine optimization for merging consecutive stores. This optimization is not profitable in many cases because modern processors can store multiple values in parallel, and preparing the consecutive store requires some work. We only handle these cases: 1. Consecutive stores where the values are consecutive loads. For example: int a = p->a; int b = p->b; q->a = a; q->b = b; 2. Consecutive stores where the values are constants. For example: q->a = 4; q->b = 5; llvm-svn: 164885	2012-09-29 06:33:25 +00:00
Evan Cheng	baf11248e4	Do not delete BBs if their addresses are taken. rdar://12396696 llvm-svn: 164866	2012-09-28 23:58:57 +00:00
Akira Hatanaka	30420f3184	MIPS DSP: add operands to make sure instruction strings are being matched. llvm-svn: 164849	2012-09-28 21:23:16 +00:00
Akira Hatanaka	8c74c52ca4	MIPS DSP: other miscellaneous instructions. llvm-svn: 164845	2012-09-28 20:50:31 +00:00
Manman Ren	d4cf759f2e	Testcase for r164835 llvm-svn: 164842	2012-09-28 20:26:33 +00:00
Akira Hatanaka	ca62e0c897	MIPS DSP: ADDUH.QB instruction sub-class. llvm-svn: 164840	2012-09-28 20:16:04 +00:00
Jakob Stoklund Olesen	6f2b596e57	Enable the new coalescer algorithm by default. The new coalescer is better at merging values into unused vector lanes, improving NEON code. llvm-svn: 164794	2012-09-27 21:06:02 +00:00
Akira Hatanaka	4282b930ad	MIPS DSP: ABSQ_S.PH instruction sub-class. llvm-svn: 164787	2012-09-27 19:09:21 +00:00
Akira Hatanaka	1b01d7ee93	MIPS DSP: SHLL.QB instruction sub-class. llvm-svn: 164786	2012-09-27 19:05:08 +00:00
Jakob Stoklund Olesen	ffe0e379b9	Avoid dereferencing a NULL pointer. Fixes PR13943. llvm-svn: 164778	2012-09-27 16:34:19 +00:00
Jush Lu	ff46f6b0c6	[arm-fast-isel] Add support for ELF PIC. This is a preliminary step towards ELF support; currently ARMFastISel hasn't been used for ELF object files yet. llvm-svn: 164759	2012-09-27 05:21:41 +00:00
Akira Hatanaka	573895e81e	Test case for r164755 and 164756. llvm-svn: 164757	2012-09-27 04:12:30 +00:00
Akira Hatanaka	8e6fa1d3a5	MIPS DSP: ADDU.QB instruction sub-class. llvm-svn: 164754	2012-09-27 03:13:59 +00:00
Akira Hatanaka	5a5e58b7ab	MIPS DSP: Branch on Greater Than or Equal To Value 32 in DSPControl Pos Field instruction. llvm-svn: 164751	2012-09-27 02:15:57 +00:00
Akira Hatanaka	b33fd0ad28	MIPS DSP: all the remaining instructions which read or write accumulators. llvm-svn: 164750	2012-09-27 02:11:20 +00:00
Akira Hatanaka	b2ac1bfabe	MIPS DSP: add support for extract-word instructions. llvm-svn: 164749	2012-09-27 02:05:42 +00:00
Akira Hatanaka	804a9036c3	MIPS DSP: add vector load/store patterns. llvm-svn: 164744	2012-09-27 01:50:59 +00:00
NAKAMURA Takumi	f8c0be6df4	ARM/atomicrmw_minmax.ll: Fix RUN line. llvm-svn: 164687	2012-09-26 10:12:20 +00:00
James Molloy	220547e625	Fix ordering of operands on lowering of atomicrmw min/max nodes on ARM. llvm-svn: 164685	2012-09-26 09:48:32 +00:00
NAKAMURA Takumi	af3b653c3e	llvm/test/CodeGen/X86/mulx*.ll: Fix copypasto. llvm-svn: 164681	2012-09-26 09:24:12 +00:00
Michael Liao	4b9634175f	Add SARX/SHRX/SHLX code generation support llvm-svn: 164675	2012-09-26 08:26:25 +00:00
Michael Liao	1a80ec900d	Add RORX code generation support llvm-svn: 164674	2012-09-26 08:24:51 +00:00
Michael Liao	6d6afbf548	Add MULX code generation support llvm-svn: 164673	2012-09-26 08:22:37 +00:00
Bill Wendling	28f1f0139e	Generate an error message instead of asserting or segfaulting when we have a scalar-to-vector conversion that we cannot handle. For instance, when an invalid constraint is used in an inline asm statement. <rdar://problem/12284092> llvm-svn: 164662	2012-09-26 06:16:18 +00:00
Bill Wendling	9a7fa167f9	Generate an error message instead of asserting or segfaulting when we have a scalar-to-vector conversion that we cannot handle. For instance, when an invalid constraint is used in an inline asm statement. <rdar://problem/12284092> llvm-svn: 164657	2012-09-26 04:04:19 +00:00
Michael Liao	b50e89ddce	Add missing i64 max/min/umax/umin on 32-bit target - Turn on atomic6432.ll and add specific test case as well llvm-svn: 164616	2012-09-25 18:08:13 +00:00
Evan Cheng	a5b2ee52c9	Fix an illegal tailcall opt where the callee returns a double via xmm while caller returns x86_fp80 via st0. rdar://12229511 llvm-svn: 164588	2012-09-25 05:32:34 +00:00
Jim Grosbach	484960af64	Mark jump tables in code sections with DataRegion directives. Even out-of-line jump tables can be in the code section, so mark them as data-regions for those targets which support the directives. rdar://12362871&12362974 llvm-svn: 164571	2012-09-24 23:06:27 +00:00
Roman Divacky	87f9c41b1c	Specify MachinePointerInfo as referring to the argument value and offset of the store when handling byval arguments, thus preventing the post-RA scheduler from reordering the store with a load. llvm-svn: 164553	2012-09-24 20:47:19 +00:00
Michael Liao	4b489dd743	Revise test to avoid using 'grep' llvm-svn: 164472	2012-09-23 02:41:47 +00:00
Michael Liao	240d90ec9b	Enhance test case of atomic16 to verify inst encoding fixed in r164453. llvm-svn: 164465	2012-09-22 21:07:59 +00:00
Chad Rosier	a58913fc00	[fast-isel] Fallback to SelectionDAG isel if we require strict alignment for non-aligned i32 loads/stores. rdar://12304911 llvm-svn: 164381	2012-09-21 16:58:35 +00:00
NAKAMURA Takumi	8c966675e5	llvm/test/CodeGen/X86/pr5145.ll: Tweak expressions to match for darwin target. .LBB0_1: # Linux LBB0_1: # Darwin llvm-svn: 164362	2012-09-21 05:19:19 +00:00
Michael Liao	2197b133f8	Add missing i8 max/min/umax/umin support - Fix PR5145 and turn on test 8-bit atomic ops llvm-svn: 164358	2012-09-21 03:18:52 +00:00
NAKAMURA Takumi	a7a5a7cd7d	llvm/test/CodeGen/ARM/fast-isel.ll: Fix possible typos, s/@unaligned_i16_store/@unaligned_i16_load/g. I guess this had apparently passed in +Asserts, possibly due to verbosity. llvm-svn: 164350	2012-09-21 01:15:05 +00:00
Chad Rosier	d80fb0b13d	Testcase does not need to be this strict. llvm-svn: 164347	2012-09-21 00:47:08 +00:00
Chad Rosier	c15c6508e0	Add newline. llvm-svn: 164346	2012-09-21 00:43:18 +00:00
Chad Rosier	8a1b0217f6	[fast-isel] Fallback to SelectionDAG isel if we require strict alignment for non-halfword-aligned i16 loads/stores. rdar://12304911 llvm-svn: 164345	2012-09-21 00:41:42 +00:00
Jim Grosbach	135898ebe3	ARM: Use a dedicated intrinsic for vector bitwise select. The expression based expansion too often results in IR level optimizations splitting the intermediate values into separate basic blocks, preventing the formation of the VBSL instruction as the code author intended. In particular, LICM would often hoist part of the computation out of a loop. rdar://11011471 llvm-svn: 164340	2012-09-21 00:18:20 +00:00
Jakob Stoklund Olesen	801e92ce89	Ignore PHI-defs for -new-coalescer interference checks. A PHI can't create interference on its own. If two live ranges interfere at a PHI, they must also interfere when leaving one of the PHI predecessors. llvm-svn: 164330	2012-09-20 23:08:42 +00:00
Evan Cheng	959ad65636	Try to make these tests more portable. llvm-svn: 164320	2012-09-20 21:35:21 +00:00
Benjamin Kramer	83afbbdcc4	Fix broken check lines. llvm-svn: 164317	2012-09-20 19:54:13 +00:00
Roman Divacky	3f44c24bfd	Specify cpu to get the correct instruction ordering. Remove XFAIL. llvm-svn: 164306	2012-09-20 14:59:42 +00:00
Michael Liao	b269a424a6	Specify CPU to prevent failure on ATOM due to different code scheduling llvm-svn: 164283	2012-09-20 03:34:04 +00:00
Michael Liao	34658dca78	Re-work X86 code generation of atomic ops with spin-loop - Rewrite/merge pseudo-atomic instruction emitters to address the following issue: * Reduce one unnecessary load in spin-loop. Previously the spin-loop looked like: thisMBB: newMBB: ld t1 = [bitinstr.addr] op t2 = t1, [bitinstr.val] not t3 = t2 (if Invert) mov EAX = t1 lcs dest = [bitinstr.addr], t3 [EAX is implicit] bz newMBB fallthrough -->nextMBB The 'ld' at the beginning of newMBB should be lifted out of the loop, as lcs (or CMPXCHG on x86) will load the current memory value into EAX. This loop is refined as: thisMBB: EAX = LOAD [MI.addr] mainMBB: t1 = OP [MI.val], EAX LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined] JNE mainMBB sinkMBB: * Remove immopc as, so far, all pseudo-atomic instructions have an all-register form only; there is no immediate operand. * Remove unnecessary attributes/modifiers in pseudo-atomic instruction td * Fix issues in PR13458 - Add comprehensive tests on atomic ops on various data types. NOTE: Some of them are turned off due to missing functionality. - Revise tests due to the new spin-loop generated. llvm-svn: 164281	2012-09-20 03:06:15 +00:00
Jakob Stoklund Olesen	557b4e64be	Resolve conflicts involving dead vector lanes for -new-coalescer. A common coalescing conflict in vector code is lane insertion: %dst = FOO %src = BAR %dst:ssub0 = COPY %src The live range of %src interferes with the ssub0 lane of %dst, but that lane is never read after %src would have clobbered it. That makes it safe to merge the live ranges and eliminate the COPY: %dst = FOO %dst:ssub0 = BAR This patch teaches the new coalescer to resolve conflicts where dead vector lanes would be clobbered, at least as long as the clobbered vector lanes don't escape the basic block. llvm-svn: 164250	2012-09-19 21:29:18 +00:00
Michael Liao	2730b7865e	Unify the logic in SelectAtomicLoadAdd and SelectAtomicLoadArith - Merge the processing of LOAD_ADD with other atomic load-arith operations - Separate the logic getting target constant for atomic-load-op and add an optimization for atomic-load-add on i16 with negative value - Optimize a minor case for atomic-fetch-add i16 with negative operand. Test case is revised. llvm-svn: 164243	2012-09-19 19:36:58 +00:00
Jordan Rose	8051e46a49	Really XFAIL test/CodeGen/PowerPC/structsinregs.ll. XFAIL needs a trailing colon. Hopefully this will get the buildbots happy again while Bill works on getting it passing. llvm-svn: 164237	2012-09-19 17:03:11 +00:00
Bill Schmidt	10c15bfd85	XFAIL test/CodeGen/PowerPC/structsinregs.ll llvm-svn: 164233	2012-09-19 16:18:23 +00:00
Bill Schmidt	4e7e64ff70	Small structs for PPC64 SVR4 must be passed right-justified in registers. lib/Target/PowerPC/PPCISelLowering.{h,cpp} Rename LowerFormalArguments_Darwin to LowerFormalArguments_Darwin_Or_64SVR4. Rename LowerFormalArguments_SVR4 to LowerFormalArguments_32SVR4. Receive small structs right-justified in LowerFormalArguments_Darwin_Or_64SVR4. Rename LowerCall_Darwin to LowerCall_Darwin_Or_64SVR4. Rename LowerCall_SVR4 to LowerCall_32SVR4. Pass small structs right-justified in LowerCall_Darwin_Or_64SVR4. test/CodeGen/PowerPC/structsinregs.ll New test. llvm-svn: 164228	2012-09-19 15:42:13 +00:00
Hans Wennborg	ae53a22006	Move load_to_switch.ll to test/CodeGen/SPARC/ Because the test invokes llc -march=sparc, it needs to be in a directory which is only run when the sparc target is built. llvm-svn: 164211	2012-09-19 09:25:03 +00:00
Evan Cheng	1a3416521f	MOVi16 (movw) is only legal on cpus with V6T2 support. rdar://12300648 llvm-svn: 164169	2012-09-18 21:24:16 +00:00
Roman Divacky	1cc7e2c795	Add test for r164155 and remove two tests superseded by ppc64-calls.ll. llvm-svn: 164162	2012-09-18 19:51:44 +00:00
Roman Divacky	bb7740900c	Avoid symbol name clash when filling TOC. Patch by Adhemerval Zanella. llvm-svn: 164141	2012-09-18 17:10:37 +00:00
Roman Divacky	377f342a56	On PPC64 emit the environment pointer. Patch by Adhemerval Zanella. llvm-svn: 164139	2012-09-18 16:55:29 +00:00
Roman Divacky	953cd43dfa	Optimize local func calls to not emit nop for TOC restoration. Patch by Adhemerval Zanella. llvm-svn: 164138	2012-09-18 16:47:58 +00:00
James Molloy	4cb3751b3e	More domain conversion; convert VFP VMOVS to NEON instructions in more cases - when we may clobber the other S-lane by converting an S to a D instruction, make an effort to work out if the S lane is clobberable or not. llvm-svn: 164114	2012-09-18 08:31:15 +00:00
Evan Cheng	82c85585f9	Use vld1 / vst2 for unaligned v2f64 load / store. e.g. Use vld1.16 for 2-byte aligned address. Based on patch by David Peixotto. Also use vld1.64 / vst1.64 with 128-bit alignment to take advantage of alignment hints. rdar://12090772, rdar://12238782 llvm-svn: 164089	2012-09-18 01:42:45 +00:00
Jakob Stoklund Olesen	b761721812	Merge into undefined lanes under -new-coalescer. Add LIS::pruneValue() and extendToIndices(). These two functions are used by the register coalescer when merging two live ranges requires more than a trivial value mapping as supported by LiveInterval::join(). The pruneValue() function can remove the part of a value number that is going to conflict in join(). Afterwards, extendToIndices can restore the live range, using any new dominating value numbers and updating the SSA form. Use this complex value mapping to support merging a register into a vector lane that has a conflicting value, but the clobbered lane is undef. llvm-svn: 164074	2012-09-17 23:03:25 +00:00
Jan Wen Voung	bd8575d1d7	Add some cases to x86 OptimizeCompare to handle DEC and INC, too. While we are setting the earlier def to true, also make it live. llvm-svn: 164056	2012-09-17 22:04:23 +00:00
Michael Liao	b8b653397e	Fix PR13859 - Preserve the original NOutVT during casting from vector to integer by extracting vector elements. llvm-svn: 164042	2012-09-17 18:05:20 +00:00
Silviu Baranga	aa267976b5	Removed the VMLxForwarding feature for the Cortex-A15 target. llvm-svn: 164030	2012-09-17 14:10:54 +00:00
Nadav Rotem	eb2f820871	Fix the testcase to work on all platforms. llvm-svn: 163997	2012-09-16 07:58:47 +00:00
Nadav Rotem	c790bc0984	The PMOVZXWD family of functions had patterns for extending narrow vector types to wide vector types. It had patterns for zext-loading and extending. This commit adds patterns for loading a wide type, performing a bitcast, and extending. This is an odd pattern, but it is commonly used when writing code with intrinsics. rdar://11897677 llvm-svn: 163995	2012-09-16 07:39:07 +00:00
Benjamin Kramer	3be8d89f89	X86: Emitting x87 fsin/fcos for sinf/cosf is not safe without unsafe fp math. This was only an issue if sse is disabled. llvm-svn: 163967	2012-09-15 12:44:27 +00:00
Akira Hatanaka	5540fca519	Handle unaligned loads/stores properly in Mips16. Patch by Reed Kotler. llvm-svn: 163956	2012-09-15 01:02:03 +00:00
Eric Christopher	db1e1b33f0	Fix both the test for zero and what we do if we have a zero for umulo legalization. Fixes PR13839 llvm-svn: 163856	2012-09-13 23:24:02 +00:00
Michael Liao	0c0da113c5	Add wider vector/integer support for PR12312 - Enhance the fix to PR12312 to support wider integers, such as 256-bit integers. If more than one fully evaluated vector is found, POR them first, followed by the final PTEST. llvm-svn: 163832	2012-09-13 20:24:54 +00:00
Michael Liao	7c620b0d5f	Enhance type legalization on bitcast from vector to integer - Find a legal vector type before casting and extracting element from it. - As the new vector type may have more than 2 elements, build the final hi/lo pair by BFS pairing them from bottom to top. llvm-svn: 163830	2012-09-13 19:58:21 +00:00
Jakob Stoklund Olesen	163785928c	Fix test case to avoid PIC magic. llvm-svn: 163827	2012-09-13 19:47:45 +00:00
Jakob Stoklund Olesen	72138019a9	Fix the TCRETURNmi64 bug differently. Add a PatFrag to match X86tcret using 6 fixed registers or less. This avoids folding loads into TCRETURNmi64 using 7 or more volatile registers. <rdar://problem/12282281> llvm-svn: 163819	2012-09-13 18:31:27 +00:00
Jakob Stoklund Olesen	eae8fc91cf	Revert r163761 "Don't fold indexed loads into TCRETURNmi64." The patch caused "Wrong topological sorting" assertions. llvm-svn: 163810	2012-09-13 16:52:17 +00:00
Silviu Baranga	11ff2a551d	This patch introduces A15 as a target in LLVM. llvm-svn: 163803	2012-09-13 15:05:10 +00:00
Nadav Rotem	490052b3d2	Fix a dagcombine optimization. The optimization attempts to optimize a bitcast of fneg to integers by xoring the high-bit. This fails if the source operand is a vector because we need to negate each of the elements in the vector. Fix rdar://12281066 PR13813. llvm-svn: 163802	2012-09-13 14:54:28 +00:00
Nadav Rotem	9d75120c92	Stack Coloring: We have code that checks that all of the uses of allocas are within the lifetime zone. Sometimes legitimate usages of allocas are hoisted outside of the lifetime zone. For example, GEPs may calculate the address of a member of an allocated struct. This commit makes sure that we only check (abort regions or assert) for instructions that read and write memory using stack frames directly. Notice that by allowing legitimate usages outside the lifetime zone we also stop checking for instructions which use derivatives of allocas. We will catch fewer bugs in user code and in the compiler itself. llvm-svn: 163791	2012-09-13 12:38:37 +00:00
Jakob Stoklund Olesen	b15912aafd	Don't fold indexed loads into TCRETURNmi64. We don't have enough GR64_TC registers when calling a varargs function with 6 arguments. Since %al holds the number of vector registers used, only %r11 is available as a scratch register. This means that addressing modes using both base and index registers can't be folded into TCRETURNmi64. <rdar://problem/12282281> llvm-svn: 163761	2012-09-13 00:25:00 +00:00
Michael Liao	e600a8a616	Fix PR11985 - BlockAddress has no support of BA + offset form and there is no way to propagate that offset into machine operand; - Add BA + offset support and a new interface 'getTargetBlockAddress' to simplify target block address forming; - All targets are modified to use new interface and X86 backend is enhanced to support BA + offset addressing. llvm-svn: 163743	2012-09-12 21:43:09 +00:00
Roman Divacky	3d302860e6	This patch corrects logic in PPCFrameLowering for save and restore of nonvolatile condition register fields across calls under the SVR4 ABIs. * With the 64-bit ABI, the save location is at a fixed offset of 8 from the stack pointer. The frame pointer cannot be used to access this portion of the stack frame since the distance from the frame pointer may change with alloca calls. * With the 32-bit ABI, the save location is just below the general register save area, and is accessed via the frame pointer like the rest of the save areas. This is an optional slot, so it must only be created if any of CR2, CR3, and CR4 were modified. * For both ABIs, save/restore logic is generated only if one of the nonvolatile CR fields were modified. I also took this opportunity to clean up an extra FIXME in PPCFrameLowering.h. Save area offsets for 32-bit GPRs are meaningless for the 64-bit ABI, so I removed them for correctness and efficiency. Fixes PR13708 and partially also PR13623. It lets us enable exception handling on PPC64. Patch by William J. Schmidt! llvm-svn: 163713	2012-09-12 14:47:47 +00:00
Kristof Beyls	76d939497b	Fix constant folding through bitcasts by no longer relying on undefined behaviour (converting NaN values between float and double). SelectionDAG::getConstantFP(double Val, EVT VT, bool isTarget); should not be used when Val is not a simple constant (as the comment in SelectionDAG.h indicates). This patch avoids using this function when folding an unknown constant through a bitcast, where it cannot be guaranteed that Val will be a simple constant. llvm-svn: 163703	2012-09-12 11:25:02 +00:00
Nadav Rotem	64c3cf5b29	Stack coloring: remove lifetime intervals which contain escaped allocas. The input program may contain instructions which are not inside lifetime markers. This can happen due to a bug in the compiler or due to a bug in user code (for example, returning a reference to a local variable). This commit adds checks on all of the instructions in the function and invalidates lifetime ranges which do not contain all of the instructions. llvm-svn: 163678	2012-09-12 04:57:37 +00:00
Chad Rosier	2853061544	[ms-inline asm] Split the parsing of IR asm strings into GCC and MS variants. Add support in the EmitMSInlineAsmStr() function for handling integer consts. llvm-svn: 163645	2012-09-11 19:09:56 +00:00
Chad Rosier	9d8aeab807	Formatting. No functional change intended. llvm-svn: 163627	2012-09-11 16:33:10 +00:00
Nadav Rotem	54b95cb654	Stack Coloring: Don't crash on dbg values which use stack frames. llvm-svn: 163616	2012-09-11 12:34:27 +00:00
NAKAMURA Takumi	c2e5cf8e3f	test/CodeGen/X86/ms-inline-asm.ll: Relax for non-darwin x86 targets. '##InlineAsm' could not be seen in other hosts. llvm-svn: 163554	2012-09-10 22:04:54 +00:00
Chad Rosier	1b83624c78	[ms-inline asm] Properly emit the asm directives when the AsmPrinterVariant and InlineAsmVariant don't match. llvm-svn: 163550	2012-09-10 21:36:05 +00:00
Chad Rosier	8311e77359	Update test case for Release builds. llvm-svn: 163549	2012-09-10 21:31:43 +00:00
Chad Rosier	054e489dd3	[ms-inline asm] Pass the correct AsmVariant to the PrintAsmOperand() function and update the printOperand() function accordingly. llvm-svn: 163544	2012-09-10 21:10:49 +00:00
Jakob Stoklund Olesen	ab77839866	Don't attempt to use flags from predicated instructions. The ARM backend can eliminate cmp instructions by reusing flags from a nearby sub instruction with similar arguments. Don't do that if the sub is predicated - the flags are not written unconditionally. <rdar://problem/12263428> llvm-svn: 163535	2012-09-10 19:17:25 +00:00
Nadav Rotem	cab746695d	Stack Coloring: Handle the case where END markers come before BEGIN markers properly. llvm-svn: 163530	2012-09-10 18:51:09 +00:00
Michael Liao	7dfa5e2092	Enhance PR11334 fix to support extload from v2f32/v4f32 - Fix a remaining issue of PR11674 as well llvm-svn: 163528	2012-09-10 18:33:51 +00:00
Michael Liao	2791a08d7e	Add boolean simplification support from CMOV - If a boolean value is generated from CMOV and tested as a boolean value, simplify the use of the test result by referencing the original condition. The RDRAND intrinsic is one such case. llvm-svn: 163516	2012-09-10 16:36:16 +00:00
James Molloy	fe38f1d2b0	Fix an assertion failure when optimising a shufflevector incorrectly into concat_vectors, and a followup bug with SelectionDAG::getNode() creating nodes with invalid types. llvm-svn: 163511	2012-09-10 14:01:21 +00:00
Nadav Rotem	d5b75d5eec	Stack Coloring: Add support for multiple regions of the same slot, within a single basic block. llvm-svn: 163507	2012-09-10 12:39:35 +00:00
Elena Demikhovsky	56cdc6a59a	The VPSHUFB 256-bit instruction may be generated when one of the input vectors is undefined or zeroinitializer. I've added the "zeroinitializer" case in this patch. llvm-svn: 163506	2012-09-10 12:13:11 +00:00
Nadav Rotem	8442a2ec90	Teach the DAGBuilder about lifetime markers which are generated from PHINodes. llvm-svn: 163494	2012-09-10 08:43:23 +00:00
Craig Topper	fb97f05d3c	Teach DAG combiner to constant fold fneg of a BUILD_VECTOR of constants. llvm-svn: 163483	2012-09-09 22:58:45 +00:00
Craig Topper	a91d731898	Add instruction selection for ffloor of vectors when SSE4.1 or AVX is enabled. llvm-svn: 163473	2012-09-08 17:42:27 +00:00
Craig Topper	53ec08b4fc	Add support for lowering FABS of vector types. llvm-svn: 163461	2012-09-08 07:31:51 +00:00
Craig Topper	eb1db45675	Set operation action for FFLOOR to Expand for all vector types for X86. Set FFLOOR of v4f32 to Expand for ARM. v2f64 was already correct. llvm-svn: 163458	2012-09-08 04:58:43 +00:00
Jakob Stoklund Olesen	c89c722370	Allow overlaps between virtreg and physreg live ranges. The RegisterCoalescer understands overlapping live ranges where one register is defined as a copy of the other. With this change, register allocators using LiveRegMatrix can do the same, at least for copies between physical and virtual registers. When a physreg is defined by a copy from a virtreg, allow those live ranges to overlap: %CL<def> = COPY %vreg11:sub_8bit; GR32_ABCD:%vreg11 %vreg13<def,tied1> = SAR32rCL %vreg13<tied0>, %CL<imp-use,kill> We can assign %vreg11 to %ECX, overlapping the live range of %CL. llvm-svn: 163336	2012-09-06 18:15:23 +00:00
Nadav Rotem	8f64175704	Disable stack coloring by default in order to resolve the i386 failures. llvm-svn: 163316	2012-09-06 14:27:06 +00:00
Elena Demikhovsky	9339eef307	AVX2 optimization. Added generation of the VPSHUFB instruction for <32 x i8> vector shuffles when possible. llvm-svn: 163312	2012-09-06 12:42:01 +00:00
Nadav Rotem	129396262a	Fix the test by specifying an exact cpu model. llvm-svn: 163307	2012-09-06 10:33:33 +00:00
James Molloy	791ec0aa52	Improve codegen for BUILD_VECTORs on ARM. If we have a BUILD_VECTOR that is mostly a constant splat, it is often better to splat that constant then insertelement the non-constant lanes instead of insertelementing every lane from an undef base. llvm-svn: 163304	2012-09-06 09:55:02 +00:00
Nadav Rotem	f25e382cd2	Add a new optimization pass: Stack Coloring, that merges disjoint static allocations (allocas). Allocas are known to be disjoint if they are marked by disjoint lifetime markers (@llvm.lifetime.XXX intrinsics). llvm-svn: 163299	2012-09-06 09:17:37 +00:00
James Molloy	90179e600b	Optimize codegen for VSETLNi{8,16,32} operating on Q registers. Degenerate to a VSETLN on D registers, instead of an (INSERT_SUBREG (VSETLN (EXTRACT_SUBREG ))) sequence to help the register coalescer. llvm-svn: 163298	2012-09-06 09:16:01 +00:00
Craig Topper	b2bad42f00	Add patterns for converting stores of subvector_extracts of lower 128-bits of a 256-bit vector to VMOVAPSmr/VMOVUPSmr. llvm-svn: 163292	2012-09-06 05:15:01 +00:00
Jakob Stoklund Olesen	0324528c8c	Use predication instead of pseudo-opcodes when folding into MOVCC. Now that it is possible to dynamically tie MachineInstr operands, predicated instructions are possible in SSA form: %vreg3<def> = SUBri %vreg1, -2147483647, pred:14, pred:%noreg, %opt:%noreg %vreg4<def,tied1> = MOVCCr %vreg3<tied0>, %vreg1, %pred:12, pred:%CPSR Becomes a predicated SUBri with a tied imp-use: SUBri %vreg1, -2147483647, pred:13, pred:%CPSR, opt:%noreg, %vreg1<imp-use,tied0> This means that any instruction that is safe to move can be folded into a MOVCC, and the *CC pseudo-instructions are no longer needed. The test case changes reflect that Thumb2SizeReduce recognizes the predicated instructions. It didn't understand the pseudos. llvm-svn: 163274	2012-09-05 23:58:02 +00:00
Tim Northover	4e03b89c79	Strip old MachineInstrs after we know we can put them back. Previous patch accidentally decided it couldn't convert a VFP to a NEON instruction after it had already destroyed the old one. Not a good move. llvm-svn: 163230	2012-09-05 18:37:53 +00:00
Pranav Bhandarkar	876ff208b6	LLVM Bug Fix 13709: Remove needless lsr(Rp, #32 ) instruction access the subreg_hireg of register pair Rp. * lib/Target/Hexagon/HexagonPeephole.cpp(PeepholeDoubleRegsMap): New DenseMap similar to PeepholeMap that additionally records subreg info too. (runOnMachineFunction): Record information in PeepholeDoubleRegsMap and copy propagate the high sub-reg of Rp0 in Rp1 = lsr(Rp0, #32) to the instruction Rx = COPY Rp1:logreg_subreg. * test/CodeGen/Hexagon/remove_lsr.ll: New test. llvm-svn: 163214	2012-09-05 16:01:40 +00:00
Silviu Baranga	6f46bb1705	Fixed the DAG combiner to better handle the folding of AND nodes for vector types. The previous code was making the assumption that the length of the bitmask returned by isConstantSplat was equal to the size of the vector type. Now we first make sure that the splat value has at least the length of the vector lane type, then we only use as many fields as we have available in the splat value. llvm-svn: 163203	2012-09-05 08:57:21 +00:00
Logan Chien	a15abb3d65	Fix UseInitArray option for MIPS target. llvm-svn: 163193	2012-09-05 06:17:17 +00:00
Jakob Stoklund Olesen	ef5dcf47b8	Move tie checks into MachineVerifier::visitMachineOperand. llvm-svn: 163152	2012-09-04 18:38:28 +00:00
Preston Gurd	c80dc7d214	Generic Bypass Slow Div - CodeGenPrepare pass for identifying div/rem ops - Backend specifies the type mapping using addBypassSlowDivType - Enabled only for Intel Atom with O2 32-bit -> 8-bit - Replace IDIV with instructions which test its value and use DIVB if the value is positive and less than 256. - In the case when the quotient and remainder of a divide are used, a DIV and a REM instruction will be present in the IR. In the non-Atom case they are both lowered to IDIVs and CSE removes the redundant IDIV instruction, using the quotient and remainder from the first IDIV. However, due to this optimization CSE is not able to eliminate redundant IDIV instructions because they are located in different basic blocks. This is overcome by calculating both the quotient (DIV) and remainder (REM) in each basic block that is inserted by the optimization and reusing the result values when a subsequent DIV or REM instruction uses the same operands. - Test cases check for the presence of the optimization when calculating either the quotient, remainder, or both. Patch by Tyler Nowicki! llvm-svn: 163150	2012-09-04 18:22:17 +00:00
Sergei Larin	905bc1964f	Porting Hexagon MI Scheduler to the new API. Change current Hexagon MI scheduler to use new converging scheduler. Integrates DFA resource model into it. llvm-svn: 163137	2012-09-04 14:49:56 +00:00
Arnold Schwaighofer	d606c6fcdf	Patch to implement UMLAL/SMLAL instructions for the ARM architecture This patch corrects the definition of umlal/smlal instructions and adds support for matching them to the ARM dag combiner. Bug 12213 Patch by Yin Ma! llvm-svn: 163136	2012-09-04 14:37:49 +00:00
Elena Demikhovsky	61924c155d	This patch optimizes a shuffle instruction: it generates 2 instructions instead of 4. Since this specific shuffle is widely used in many workloads, we get ~10% better performance on them. shufflevector <8 x float> %A, <8 x float> %B, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14> Before: vmovaps (%rdx), %ymm0 vshufps $8, %ymm0, %ymm0, %ymm0 vmovaps (%rcx), %ymm1 vshufps $8, %ymm0, %ymm1, %ymm1 vunpcklps %ymm0, %ymm1, %ymm0 After: vmovaps (%rcx), %ymm0 vmovsldup (%rdx), %ymm1 vblendps $85, %ymm0, %ymm1, %ymm0 llvm-svn: 163134	2012-09-04 12:49:02 +00:00
Nadav Rotem	d1815a0763	Not all targets have efficient ISel code generation for select instructions. For example, the ARM target does not have efficient ISel handling for vector selects with scalar conditions. This patch adds a TLI hook which allows the different targets to report which selects are supported well and which selects should be converted to CF during codegen prepare. llvm-svn: 163093	2012-09-02 12:10:19 +00:00
Nadav Rotem	3425b10f0f	Generate better select code by allowing the target to use scalar select, and not sign-extend. llvm-svn: 163086	2012-09-02 08:20:07 +00:00
Pete Cooper	78e01afae1	Revert "Take account of boolean vector contents when promoting a build vector from i1 to some other type. rdar://problem/12210060" This reverts commit 5dd9e214fb92847e947f9edab170f9b4e52b908f. Thanks to Duncan for explaining how this should have been done. Conflicts: test/CodeGen/X86/vec_select.ll llvm-svn: 163064	2012-09-01 17:37:55 +00:00
Logan Chien	b022dbf7dc	Fix Thumb2 fixup kind in the integrated-as. llvm-svn: 163063	2012-09-01 15:06:36 +00:00
Owen Anderson	27ba45c764	Teach DAG combine a number of tricks to simplify FMA expressions in fast-math mode. llvm-svn: 163051	2012-09-01 06:04:27 +00:00
NAKAMURA Takumi	528a7eb3d4	llvm/test/CodeGen/X86/fp-fast.ll: Suppress FMA4 on AMD Bulldozer host, corresponding to r162999. llvm-svn: 163041	2012-09-01 00:26:28 +00:00

6647 Commits