llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 22:42:46 +02:00

Author	SHA1	Message	Date
Quentin Colombet	c82cc9dc57	[ShrinkWrap] Add (a simplified version) of shrink-wrapping. This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find safe points that are cheaper than the entry and exits blocks. As an example and to avoid regressions to be introduce, this patch also implements the required bits to enable the shrink-wrapping pass for AArch64. Context Currently we insert the prologue and epilogue of the method/function in the entry and exits blocks. Although this is correct, we can do a better job when those are not immediately required and insert them at less frequently executed places. The job of the shrink-wrapping pass is to identify such places. Motivating example Let us consider the following function that perform a call only in one branch of a if: define i32 @f(i32 %a, i32 %b) { %tmp = alloca i32, align 4 %tmp2 = icmp slt i32 %a, %b br i1 %tmp2, label %true, label %false true: store i32 %a, i32* %tmp, align 4 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp) br label %false false: %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ] ret i32 %tmp.0 } On AArch64 this code generates (removing the cfi directives to ease readabilities): _f: ; @f ; BB#0: stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething LBB0_2: ; %false mov sp, x29 ldp x29, x30, [sp], #16 ret With shrink-wrapping we could generate: _f: ; @f ; BB#0: cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething add sp, x29, #16 ; =16 ldp x29, x30, [sp], #16 LBB0_2: ; %false ret Therefore, we would pay the overhead of setting up/destroying the frame only if we actually do the call. Proposed Solution This patch introduces a new machine pass that perform the shrink-wrapping analysis (See the comments at the beginning of ShrinkWrap.cpp for more details). It then stores the safe save and restore point into the MachineFrameInfo attached to the MachineFunction. This information is then used by the PrologEpilogInserter (PEI) to place the related code at the right place. This pass runs right before the PEI. Unlike the original paper of Chow from PLDI’88, this implementation of shrink-wrapping does not use expensive data-flow analysis and does not need hack to properly avoid frequently executed point. Instead, it relies on dominance and loop properties. The pass is off by default and each target can opt-in by setting the EnableShrinkWrap boolean to true in their derived class of TargetPassConfig. This setting can also be overwritten on the command line by using -enable-shrink-wrap. Before you try out the pass for your target, make sure you properly fix your emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not necessarily the entry block. Design Decisions 1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but for debugging and clarity I thought it was best to have its own file. 2. Right now, we only support one save point and one restore point. At some point we can expand this to several save point and restore point, the impacted component would then be: - The pass itself: New algorithm needed. - MachineFrameInfo: Hold a list or set of Save/Restore point instead of one pointer. - PEI: Should loop over the save point and restore point. Anyhow, at least for this first iteration, I do not believe this is interesting to support the complex cases. We should revisit that when we motivating examples. Differential Revision: http://reviews.llvm.org/D9210 <rdar://problem/3201744> llvm-svn: 236507	2015-05-05 17:38:16 +00:00
Eric Christopher	70a4ec2213	In preparation for moving ARM's TargetRegisterInfo to the TargetMachine merge Thumb1RegisterInfo and Thumb2RegisterInfo. This will enable us to match the TargetMachine for our TargetRegisterInfo classes. llvm-svn: 232117	2015-03-12 22:48:50 +00:00
Eric Christopher	32ae945f51	Have getCalleeSavedRegs take a non-null MachineFunction all the time. The target independent code was passing in one all the time and targets weren't checking validity before using. Update a few calls to pass in a MachineFunction where necessary. llvm-svn: 231970	2015-03-11 21:41:28 +00:00
Tim Northover	5e9ed07177	ARM: simplify and extend byval handling The main issue being fixed here is that APCS targets handling a "byval align N" parameter with N > 4 were miscounting what objects were where on the stack, leading to FrameLowering setting the frame pointer incorrectly and clobbering the stack. But byval handling had grown over many years, and had multiple layers of cruft trying to compensate for each other and calculate padding correctly. This only really needs to be done once, in the HandleByVal function. Elsewhere should just do what it's told by that call. I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits byvals with the correct C ABI alignment), which simplified HandleByVal. rdar://20095672 llvm-svn: 231959	2015-03-11 18:54:22 +00:00
Eric Christopher	1e7b535a5c	Migrate ARM except for TTI, AsmPrinter, and frame lowering away from getSubtargetImpl. llvm-svn: 227399	2015-01-29 00:19:33 +00:00
Adrian Prantl	566772c5d4	Thumb1 frame lowering: Mark CFI instructions with the FrameSetup flag. Followup to r224294: ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions. Debug info marks the first instruction without the FrameSetup flag as being the end of the function prologue. Any CFI instructions in the middle of the function prologue would cause debug info to end the prologue too early and worse, attach the line number of the CFI instruction, which incidentally is often 0. llvm-svn: 224743	2014-12-22 23:09:14 +00:00
Jonathan Roelofs	7bcdffb32c	Re-apply r214881: Fix return sequence on armv4 thumb This reverts r214893, re-applying r214881 with the test case relaxed a bit to satiate the build bots. POP on armv4t cannot be used to change thumb state (unilke later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead: POP {r3} ADD sp, #offset BX r3 This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead: MOV ip, r3 POP {r3} ADD sp, #offset MOV lr, r3 MOV r3, ip BX lr http://reviews.llvm.org/D4748 llvm-svn: 214928	2014-08-05 21:32:21 +00:00
Jonathan Roelofs	5721e98d43	Revert r214881 because it broke lots of build-bots llvm-svn: 214893	2014-08-05 17:36:05 +00:00
Jonathan Roelofs	32bc0b3143	Fix return sequence on armv4 thumb POP on armv4t cannot be used to change thumb state (unilke later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead: POP {r3} ADD sp, #offset BX r3 This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead: MOV ip, r3 POP {r3} ADD sp, #offset MOV lr, r3 MOV r3, ip BX lr http://reviews.llvm.org/D4748 llvm-svn: 214881	2014-08-05 17:13:17 +00:00
Eric Christopher	67c04e77e5	Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs. llvm-svn: 214838	2014-08-05 02:39:49 +00:00
Eric Christopher	99307e99a2	Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change. llvm-svn: 214781	2014-08-04 21:25:23 +00:00
Eric Christopher	1272c4e1c4	Move the frame lowering constructors out of line to avoid circular includes. llvm-svn: 211798	2014-06-26 19:29:59 +00:00
Craig Topper	694437e2ef	Make consistent use of MCPhysReg instead of uint16_t throughout the tree. llvm-svn: 205610	2014-04-04 05:16:06 +00:00
Rafael Espindola	cb9ca86245	Replace PROLOG_LABEL with a new CFI_INSTRUCTION. The old system was fairly convoluted: * A temporary label was created. * A single PROLOG_LABEL was created with it. * A few MCCFIInstructions were created with the same label. The semantics were that the cfi instructions were mapped to the PROLOG_LABEL via the temporary label. The output position was that of the PROLOG_LABEL. The temporary label itself was used only for doing the mapping. The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to one by holding an index into the CFI instructions of this function. I did consider removing MMI.getFrameInstructions completelly and having CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non trivial constructors and destructors and are somewhat big, so the this setup is probably better. The net result is that we don't create temporary labels that are never used. llvm-svn: 203204	2014-03-07 06:08:31 +00:00
David Blaikie	c4f7318790	Fix clang -Werror build break due to mismatched sign comparison. Originally committed in r202985. llvm-svn: 202992	2014-03-05 18:53:36 +00:00
Oliver Stannard	6aa486598a	ARM: Correctly align arguments after a byval struct is passed on the stack llvm-svn: 202985	2014-03-05 15:25:27 +00:00
Benjamin Kramer	e4eb1b495f	[C++11] Replace llvm::next and llvm::prior with std::next and std::prev. Remove the old functions. llvm-svn: 202636	2014-03-02 12:27:27 +00:00
Artyom Skrobov	10498a713c	Generate the DWARF stack frame decode operations in the function prologue for ARM/Thumb functions. Patch by Keith Walker! llvm-svn: 201423	2014-02-14 17:19:07 +00:00
Tim Northover	fae3207c66	ARM: correctly determine final tBX_LR in Thumb1 functions The changes caused by folding an sp-adjustment into a "pop" previously disrupted the forward search for the final real instruction in a terminating block. This switches to a backward search (skipping debug instrs). This fixes PR18399. Patch by Zhaoshi. llvm-svn: 199266	2014-01-14 22:53:28 +00:00
Tim Northover	e0e3fee19b	ARM MachO: sort out isTargetDarwin/isTargetIOS/... checks. The ARM backend has been using most of the MachO related subtarget checks almost interchangeably, and since the only target it's had to run on has been IOS (which is all three of MachO, Darwin and IOS) it's worked out OK so far. But we'd like to support embedded targets under the "--none-macho" triple, which means everything starts falling apart and inconsistent behaviours emerge. This patch should pick a reasonably sensible set of behaviours for the new triple (and any others that come along, with luck). Some choices were debatable (notably FP == r7 or r11), but we can revisit those later when deficiencies become apparent. llvm-svn: 198617	2014-01-06 14:28:05 +00:00
Tim Northover	c144b1204e	ARM: decide whether to use movw/movt based on "minsize" attribute. llvm-svn: 196102	2013-12-02 14:46:26 +00:00
Tim Northover	bcd72d7348	ARM: fix bug in -Oz stack adjustment folding Previously, we clobbered callee-saved registers when folding an "add sp, #N" into a "pop {rD, ...}" instruction. This change checks whether a register we're going to add to the "pop" could actually be live outside the function before doing so and should fix the issue. This should fix PR18081. llvm-svn: 196046	2013-12-01 14:16:24 +00:00
Tim Northover	e68673eeb6	ARM: fold prologue/epilogue sp updates into push/pop for code size ARM prologues usually look like: push {r7, lr} sub sp, sp, #4 If code size is extremely important, this can be optimised to the single instruction: push {r6, r7, lr} where we don't actually care about the contents of r6, but pushing it subtracts 4 from sp as a side effect. This should implement such a conversion, predicated on the "minsize" function attribute (-Oz) since I've yet to find any code it actually makes faster. llvm-svn: 194264	2013-11-08 17:18:07 +00:00
Tim Northover	96044852e0	ARM: remove unnecessary state-tracking during frame lowering. ResolveFrameIndex had what appeared to be a very nasty hack for when the frame-index referred to a callee-saved register. In this case it "adjusted" the offset so that the address was correct if (and only if) the MachineInstr immediately followed the respective push. This "worked" for all forms of GPR & DPR but was only ever used to set the frame pointer itself, and once this was put in a more sensible location the entire state-tracking machinery it relied on became redundant. So I stripped it. The only wrinkle is that "add r7, sp, #0" might theoretically be slower (need an actual ALU slot) compared to "mov r7, sp" so I added a micro-optimisation that also makes emitARMRegUpdate and emitT2RegUpdate also work when NumBytes == 0. No test changes since there shouldn't be any functionality change. llvm-svn: 194025	2013-11-04 23:04:15 +00:00
Stepan Dyatkovskiy	adb91e7ed9	PR15868 fix. Introduction: In case when stack alignment is 8 and GPRs parameter part size is not N8: we add padding to GPRs part, so part's last byte must be recovered at address K8-1. We need to do it, since remained (stack) part of parameter starts from address K8, and we need to "attach" "GPRs head" without gaps to it: Stack: \|---- 8 bytes block ----\| \|---- 8 bytes block ----\| \|---- 8 bytes... [ [padding] [GPRs head] ] [ ------ Tail passed via stack ------ ... FIX: Note, once we added padding we need to correct all* Arg offsets that are going after padded one. That's why we need this fix: Arg offsets were never corrected before this patch. See new test-cases included in patch. We also don't need to insert padding for byval parameters that are stored in GPRs only. We need pad only last byval parameter and only in case it outsides GPRs and stack alignment = 8. Though, stack area, allocated for recovered byval params, must satisfy "Size mod 8 = 0" restriction. This patch reduces stack usage for some cases: We can reduce ArgRegsSaveArea since inner N*4 bytes sized byval params my be "packed" with alignment 4 in some cases. llvm-svn: 182237	2013-05-20 08:01:34 +00:00
Stepan Dyatkovskiy	fd08e5fdd4	Refactoring patch. 1. VarArgStyleRegisters: functionality that emits "store" instructions for byval regs moved out into separated method "StoreByValRegs". Before this patch VarArgStyleRegisters had confused use-cases. It was used for both variadic functions and for regular functions with byval parameters. In last case it created new stack-frame and registered it as VarArg frame, that is wrong. This patch replaces VarArgsStyleRegisters usage for byval parameters with StoreByValRegs method. 2. In ARMMachineFunctionInfo, "get/setVarArgsRegSaveSize" was renamed to "get/setArgRegsSaveSize". By the same reason. Sometimes it was used for variadic functions, and sometimes for byval parameters in regular functions. Actually, this property means the size of registers, that keeps arguments, and thats why it was renamed. 3. In ARMISelLowering.cpp, ARMTargetLowering class, in methods computeRegArea and StoreByValRegs, VARegXXXXXX was renamed to ArgRegsXXXXXX still by the same reasons. llvm-svn: 180774	2013-04-30 07:19:58 +00:00
Eli Bendersky	37f247b8d8	Move the eliminateCallFramePseudoInstr method from TargetRegisterInfo to TargetFrameLowering, where it belongs. Incidentally, this allows us to delete some duplicated (and slightly different!) code in TRI. There are potentially other layering problems that can be cleaned up as a result, or in a similar manner. The refactoring was OK'd by Anton Korobeynikov on llvmdev. Note: this touches the target interfaces, so out-of-tree targets may be affected. llvm-svn: 175788	2013-02-21 20:05:00 +00:00
Logan Chien	740a4514e2	Fix thumbv5e frame lowering assertion failure. It is possible that frame pointer is not found in the callee saved info, thus FramePtrSpillFI may be incorrect if we don't check the result of hasFP(MF). Besides, if we enable the stack coloring algorithm, there will be an assertion to ensure the slot is live. But in the test case, %var1 is not live in the prologue of the function, and we will get the assertion failure. Note: There is similar code in ARMFrameLowering.cpp. llvm-svn: 175616	2013-02-20 12:21:33 +00:00
Jakob Stoklund Olesen	c81d04b28d	Add an MF argument to MI::copyImplicitOps(). This function is often used to decorate dangling instructions, so a context reference is required to allocate memory for the operands. Also add a corresponding MachineInstrBuilder method. llvm-svn: 170797	2012-12-20 22:54:02 +00:00
Craig Topper	0534d071b7	Reorder includes to match coding standards. Fix an issue or two exposed by that. llvm-svn: 152978	2012-03-17 07:33:42 +00:00
Craig Topper	585b4225c3	Use uint16_t to store registers in callee saved register tables to reduce size of static data. llvm-svn: 151996	2012-03-04 03:33:22 +00:00
Jia Liu	b077b6085d	Emacs-tag and some comment fix for all ARM, CellSPU, Hexagon, MBlaze, MSP430, PPC, PTX, Sparc, X86, XCore. llvm-svn: 150878	2012-02-18 12:03:15 +00:00
Evan Cheng	6a08916ce2	Don't forget to transfer implicit uses of return instruction. llvm-svn: 147752	2012-01-08 20:41:16 +00:00
Evan Cheng	4009ad2e33	Copy implicit defs (e.g. r0) when changing tBX_RET to tPOP_RET. This bug is exposed with an upcoming change will would delete the copy to return register because there is no use! It's amazing anything works. llvm-svn: 147715	2012-01-07 02:55:54 +00:00
Evan Cheng	4e60b65bc6	Fix more places which should be checking for iOS, not darwin. llvm-svn: 147513	2012-01-04 01:55:04 +00:00
Chad Rosier	38661ab3ce	Revert 142337. Thumb1 still doesn't support dynamic stack realignment. :( llvm-svn: 142557	2011-10-20 00:07:12 +00:00
Chad Rosier	eb469f466b	Add support for dynamic stack realignment when in thumb1 mode. rdar://10288916 llvm-svn: 142337	2011-10-18 05:28:00 +00:00
Chad Rosier	42a60003a1	Thumb1 does not support dynamic stack realignment. rdar://10288916 is tracking this fix. In the past, instcombine and other passes were promoting alloca alignment past the natural alignment, resulting in dynamic stack realignment. Lang's work now prevents this from happening (LLVM commit r141599). Now that this really shouldn't happen report a fatal error rather than silently generate bad code. llvm-svn: 142028	2011-10-15 00:28:24 +00:00
Chad Rosier	2cb0d1eddf	Revert r140924 "Attempt to fix dynamic stack realignment for thumb1 functions." to appease nightly testers. Not quite there yet. llvm-svn: 140953	2011-10-01 19:30:36 +00:00
Chad Rosier	ff29430882	Attempt to fix dynamic stack realignment for thumb1 functions. It is in fact useful if an optimization assumes the stack has been realigned. Credit to Eli for his assistance. rdar://10043857 llvm-svn: 140924	2011-10-01 02:03:18 +00:00
Jim Grosbach	74f96e7f3c	Tidy up a few 80 column violations. llvm-svn: 139636	2011-09-13 20:30:37 +00:00
Jim Grosbach	b33129ebad	Thumb1 ADD/SUB SP instructions are predicable in Thumb2 mode. Add the predicate operand to the instructions. Update the back end accordingly where the instructions are used. Restrict the SP operands to actually only be SP, as otherwise these break assembly parsing for the normal instruction variants. llvm-svn: 138445	2011-08-24 17:46:13 +00:00
Jim Grosbach	2b8103505a	Make tBX_RET and tBX_RET_vararg predicable. The normal tBX instruction is predicable, so there's no reason the pseudos for using it as a return shouldn't be. Gives us some nice code-gen improvements as can be seen by the test changes. In particular, several tests now have to disable if-conversion because it works too well and defeats the test. llvm-svn: 134746	2011-07-08 21:50:04 +00:00
Jim Grosbach	351dcca2cb	Refact ARM Thumb1 tMOVr instruction family. Merge the tMOVr, tMOVgpr2tgpr, tMOVtgpr2gpr, and tMOVgpr2gpr instructions into tMOVr. There's no need to keep them separate. Giving the tMOVr instruction the proper GPR register class for its operands is sufficient to give the register allocator enough information to do the right thing directly. llvm-svn: 134204	2011-06-30 23:38:17 +00:00
Jim Grosbach	32d3b2625b	Thumb1 register to register MOV instruction is predicable. Fix a FIXME and allow predication (in Thumb2) for the T1 register to register MOV instructions. This allows some better codegen with if-conversion (as seen in the test updates), plus it lays the groundwork for pseudo-izing the tMOVCC instructions. llvm-svn: 134197	2011-06-30 22:10:46 +00:00
Jim Grosbach	6dd5433c5c	Refactor away tSpill and tRestore pseudos in ARM backend. The tSpill and tRestore instructions are just copies of the tSTRspi and tLDRspi instructions, respectively. Just use those directly instead. llvm-svn: 134092	2011-06-29 20:26:39 +00:00
Jim Grosbach	3713c9ddfa	Fix coordination for using R4 in Thumb1 as a scratch for SP restore. The logic for reserving R4 for use as a scratch needs to match that for actually using it. Also, it's not necessary for immediate <=508, so adjust the value checked. llvm-svn: 132934	2011-06-13 21:18:25 +00:00
Anton Korobeynikov	d8873d31a8	Implement frame unwinding information emission for Thumb1. Not finished yet because there is no way given the constpool index to examine the actual entry: the reason is clones inserted by constant island pass, which are not tracked at all! The only connection is done during asmprinting time via magic label names which is really gross and needs to be eventually fixed. llvm-svn: 127104	2011-03-05 18:43:50 +00:00
Anton Korobeynikov	917ca94111	Preliminary support for ARM frame save directives emission via MI flags. This is just very first approximation how the stuff should be done (e.g. ARM-only for now). More to follow. llvm-svn: 127101	2011-03-05 18:43:32 +00:00
Jakob Stoklund Olesen	0f2b9d9dc4	Teach frame lowering to ignore debug values after the terminators. llvm-svn: 123399	2011-01-13 21:28:52 +00:00

1 2

52 Commits