1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 22:42:46 +02:00
Commit Graph

52 Commits

Author SHA1 Message Date
Quentin Colombet
c82cc9dc57 [ShrinkWrap] Add (a simplified version) of shrink-wrapping.
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.

As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.


** Context **

Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.


** Motivating example **

Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b)  {
 %tmp = alloca i32, align 4
 %tmp2 = icmp slt i32 %a, %b
 br i1 %tmp2, label %true, label %false

true:
 store i32 %a, i32* %tmp, align 4
 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
 br label %false

false:
 %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
 ret i32 %tmp.0
}

On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f:                                     ; @f
; BB#0:
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
LBB0_2:                                 ; %false
  mov  sp, x29
  ldp x29, x30, [sp], #16
  ret

With shrink-wrapping we could generate:
_f:                                     ; @f
; BB#0:
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
  add sp, x29, #16            ; =16
  ldp x29, x30, [sp], #16
LBB0_2:                                 ; %false
  ret

Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.


** Proposed Solution **

This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.

Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.

The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.

Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.


** Design Decisions **

1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
  pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.

Differential Revision: http://reviews.llvm.org/D9210

<rdar://problem/3201744>

llvm-svn: 236507
2015-05-05 17:38:16 +00:00
Eric Christopher
70a4ec2213 In preparation for moving ARM's TargetRegisterInfo to the TargetMachine
merge Thumb1RegisterInfo and Thumb2RegisterInfo. This will enable
us to match the TargetMachine for our TargetRegisterInfo classes.

llvm-svn: 232117
2015-03-12 22:48:50 +00:00
Eric Christopher
32ae945f51 Have getCalleeSavedRegs take a non-null MachineFunction all the
time. The target independent code was passing in one all the
time and targets weren't checking validity before using. Update
a few calls to pass in a MachineFunction where necessary.

llvm-svn: 231970
2015-03-11 21:41:28 +00:00
Tim Northover
5e9ed07177 ARM: simplify and extend byval handling
The main issue being fixed here is that APCS targets handling a "byval align N"
parameter with N > 4 were miscounting what objects were where on the stack,
leading to FrameLowering setting the frame pointer incorrectly and clobbering
the stack.

But byval handling had grown over many years, and had multiple layers of cruft
trying to compensate for each other and calculate padding correctly. This only
really needs to be done once, in the HandleByVal function. Elsewhere should
just do what it's told by that call.

I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits
byvals with the correct C ABI alignment), which simplified HandleByVal.

rdar://20095672

llvm-svn: 231959
2015-03-11 18:54:22 +00:00
Eric Christopher
1e7b535a5c Migrate ARM except for TTI, AsmPrinter, and frame lowering
away from getSubtargetImpl.

llvm-svn: 227399
2015-01-29 00:19:33 +00:00
Adrian Prantl
566772c5d4 Thumb1 frame lowering: Mark CFI instructions with the FrameSetup flag.
Followup to r224294:

ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.

llvm-svn: 224743
2014-12-22 23:09:14 +00:00
Jonathan Roelofs
7bcdffb32c Re-apply r214881: Fix return sequence on armv4 thumb
This reverts r214893, re-applying r214881 with the test case relaxed a bit to
satiate the build bots.

POP on armv4t cannot be used to change thumb state (unilke later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:

  POP {r3}
  ADD sp, #offset
  BX r3

This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:

  MOV ip, r3
  POP {r3}
  ADD sp, #offset
  MOV lr, r3
  MOV r3, ip
  BX lr

http://reviews.llvm.org/D4748

llvm-svn: 214928
2014-08-05 21:32:21 +00:00
Jonathan Roelofs
5721e98d43 Revert r214881 because it broke lots of build-bots
llvm-svn: 214893
2014-08-05 17:36:05 +00:00
Jonathan Roelofs
32bc0b3143 Fix return sequence on armv4 thumb
POP on armv4t cannot be used to change thumb state (unilke later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:

  POP {r3}
  ADD sp, #offset
  BX r3

This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:

  MOV ip, r3
  POP {r3}
  ADD sp, #offset
  MOV lr, r3
  MOV r3, ip
  BX lr

http://reviews.llvm.org/D4748

llvm-svn: 214881
2014-08-05 17:13:17 +00:00
Eric Christopher
67c04e77e5 Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.

llvm-svn: 214838
2014-08-05 02:39:49 +00:00
Eric Christopher
99307e99a2 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

llvm-svn: 214781
2014-08-04 21:25:23 +00:00
Eric Christopher
1272c4e1c4 Move the frame lowering constructors out of line to avoid circular
includes.

llvm-svn: 211798
2014-06-26 19:29:59 +00:00
Craig Topper
694437e2ef Make consistent use of MCPhysReg instead of uint16_t throughout the tree.
llvm-svn: 205610
2014-04-04 05:16:06 +00:00
Rafael Espindola
cb9ca86245 Replace PROLOG_LABEL with a new CFI_INSTRUCTION.
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.

The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.

The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.

I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.

The net result is that we don't create temporary labels that are never used.

llvm-svn: 203204
2014-03-07 06:08:31 +00:00
David Blaikie
c4f7318790 Fix clang -Werror build break due to mismatched sign comparison.
Originally committed in r202985.

llvm-svn: 202992
2014-03-05 18:53:36 +00:00
Oliver Stannard
6aa486598a ARM: Correctly align arguments after a byval struct is passed on the stack
llvm-svn: 202985
2014-03-05 15:25:27 +00:00
Benjamin Kramer
e4eb1b495f [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.
Remove the old functions.

llvm-svn: 202636
2014-03-02 12:27:27 +00:00
Artyom Skrobov
10498a713c Generate the DWARF stack frame decode operations in the function prologue for ARM/Thumb functions.
Patch by Keith Walker!

llvm-svn: 201423
2014-02-14 17:19:07 +00:00
Tim Northover
fae3207c66 ARM: correctly determine final tBX_LR in Thumb1 functions
The changes caused by folding an sp-adjustment into a "pop" previously
disrupted the forward search for the final real instruction in a
terminating block. This switches to a backward search (skipping debug
instrs).

This fixes PR18399.

Patch by Zhaoshi.

llvm-svn: 199266
2014-01-14 22:53:28 +00:00
Tim Northover
e0e3fee19b ARM MachO: sort out isTargetDarwin/isTargetIOS/... checks.
The ARM backend has been using most of the MachO related subtarget
checks almost interchangeably, and since the only target it's had to
run on has been IOS (which is all three of MachO, Darwin and IOS) it's
worked out OK so far.

But we'd like to support embedded targets under the "*-*-none-macho"
triple, which means everything starts falling apart and inconsistent
behaviours emerge.

This patch should pick a reasonably sensible set of behaviours for the
new triple (and any others that come along, with luck). Some choices
were debatable (notably FP == r7 or r11), but we can revisit those
later when deficiencies become apparent.

llvm-svn: 198617
2014-01-06 14:28:05 +00:00
Tim Northover
c144b1204e ARM: decide whether to use movw/movt based on "minsize" attribute.
llvm-svn: 196102
2013-12-02 14:46:26 +00:00
Tim Northover
bcd72d7348 ARM: fix bug in -Oz stack adjustment folding
Previously, we clobbered callee-saved registers when folding an "add
sp, #N" into a "pop {rD, ...}" instruction. This change checks whether
a register we're going to add to the "pop" could actually be live
outside the function before doing so and should fix the issue.

This should fix PR18081.

llvm-svn: 196046
2013-12-01 14:16:24 +00:00
Tim Northover
e68673eeb6 ARM: fold prologue/epilogue sp updates into push/pop for code size
ARM prologues usually look like:
    push {r7, lr}
    sub sp, sp, #4

If code size is extremely important, this can be optimised to the single
instruction:
    push {r6, r7, lr}

where we don't actually care about the contents of r6, but pushing it subtracts
4 from sp as a side effect.

This should implement such a conversion, predicated on the "minsize" function
attribute (-Oz) since I've yet to find any code it actually makes faster.

llvm-svn: 194264
2013-11-08 17:18:07 +00:00
Tim Northover
96044852e0 ARM: remove unnecessary state-tracking during frame lowering.
ResolveFrameIndex had what appeared to be a very nasty hack for when the
frame-index referred to a callee-saved register. In this case it "adjusted" the
offset so that the address was correct if (and only if) the MachineInstr
immediately followed the respective push.

This "worked" for all forms of GPR & DPR but was only ever used to set the
frame pointer itself, and once this was put in a more sensible location the
entire state-tracking machinery it relied on became redundant. So I stripped
it.

The only wrinkle is that "add r7, sp, #0" might theoretically be slower (need
an actual ALU slot) compared to "mov r7, sp" so I added a micro-optimisation
that also makes emitARMRegUpdate and emitT2RegUpdate also work when NumBytes ==
0.

No test changes since there shouldn't be any functionality change.

llvm-svn: 194025
2013-11-04 23:04:15 +00:00
Stepan Dyatkovskiy
adb91e7ed9 PR15868 fix.
Introduction:
In case when stack alignment is 8 and GPRs parameter part size is not N*8:
we add padding to GPRs part, so part's last byte must be recovered at
address K*8-1.
We need to do it, since remained (stack) part of parameter starts from
address K*8, and we need to "attach" "GPRs head" without gaps to it:

Stack:
|---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8 bytes...
[ [padding] [GPRs head] ] [ ------ Tail passed via stack  ------ ...

FIX:
Note, once we added padding we need to correct *all* Arg offsets that are going
after padded one. That's why we need this fix: Arg offsets were never corrected
before this patch. See new test-cases included in patch.

We also don't need to insert padding for byval parameters that are stored in GPRs
only. We need pad only last byval parameter and only in case it outsides GPRs
and stack alignment = 8.
Though, stack area, allocated for recovered byval params, must satisfy
"Size mod 8 = 0" restriction.

This patch reduces stack usage for some cases:
We can reduce ArgRegsSaveArea since inner N*4 bytes sized byval params my be
"packed" with alignment 4 in some cases.

llvm-svn: 182237
2013-05-20 08:01:34 +00:00
Stepan Dyatkovskiy
fd08e5fdd4 Refactoring patch.
1. VarArgStyleRegisters: functionality that emits "store" instructions for byval regs moved out into separated method "StoreByValRegs". Before this patch VarArgStyleRegisters had confused use-cases. It was used for both variadic functions and for regular functions with byval parameters. In last case it created new stack-frame and registered it as VarArg frame, that is wrong.

This patch replaces VarArgsStyleRegisters usage for byval parameters with StoreByValRegs method.

2. In ARMMachineFunctionInfo, "get/setVarArgsRegSaveSize" was renamed to "get/setArgRegsSaveSize". By the same reason. Sometimes it was used for variadic functions, and sometimes for byval parameters in regular functions. Actually, this property means the size of registers, that keeps arguments, and thats why it was renamed.

3. In ARMISelLowering.cpp, ARMTargetLowering class, in methods computeRegArea and StoreByValRegs, VARegXXXXXX was renamed to ArgRegsXXXXXX still by the same reasons.

llvm-svn: 180774
2013-04-30 07:19:58 +00:00
Eli Bendersky
37f247b8d8 Move the eliminateCallFramePseudoInstr method from TargetRegisterInfo
to TargetFrameLowering, where it belongs. Incidentally, this allows us
to delete some duplicated (and slightly different!) code in TRI.

There are potentially other layering problems that can be cleaned up
as a result, or in a similar manner.

The refactoring was OK'd by Anton Korobeynikov on llvmdev.

Note: this touches the target interfaces, so out-of-tree targets may
be affected.

llvm-svn: 175788
2013-02-21 20:05:00 +00:00
Logan Chien
740a4514e2 Fix thumbv5e frame lowering assertion failure.
It is possible that frame pointer is not found in the
callee saved info, thus FramePtrSpillFI may be incorrect
if we don't check the result of hasFP(MF).

Besides, if we enable the stack coloring algorithm, there
will be an assertion to ensure the slot is live.  But in
the test case, %var1 is not live in the prologue of the
function, and we will get the assertion failure.

Note: There is similar code in ARMFrameLowering.cpp.
llvm-svn: 175616
2013-02-20 12:21:33 +00:00
Jakob Stoklund Olesen
c81d04b28d Add an MF argument to MI::copyImplicitOps().
This function is often used to decorate dangling instructions, so a
context reference is required to allocate memory for the operands.

Also add a corresponding MachineInstrBuilder method.

llvm-svn: 170797
2012-12-20 22:54:02 +00:00
Craig Topper
0534d071b7 Reorder includes to match coding standards. Fix an issue or two exposed by that.
llvm-svn: 152978
2012-03-17 07:33:42 +00:00
Craig Topper
585b4225c3 Use uint16_t to store registers in callee saved register tables to reduce size of static data.
llvm-svn: 151996
2012-03-04 03:33:22 +00:00
Jia Liu
b077b6085d Emacs-tag and some comment fix for all ARM, CellSPU, Hexagon, MBlaze, MSP430, PPC, PTX, Sparc, X86, XCore.
llvm-svn: 150878
2012-02-18 12:03:15 +00:00
Evan Cheng
6a08916ce2 Don't forget to transfer implicit uses of return instruction.
llvm-svn: 147752
2012-01-08 20:41:16 +00:00
Evan Cheng
4009ad2e33 Copy implicit defs (e.g. r0) when changing tBX_RET to tPOP_RET. This bug is
exposed with an upcoming change will would delete the copy to return register
because there is no use! It's amazing anything works.

llvm-svn: 147715
2012-01-07 02:55:54 +00:00
Evan Cheng
4e60b65bc6 Fix more places which should be checking for iOS, not darwin.
llvm-svn: 147513
2012-01-04 01:55:04 +00:00
Chad Rosier
38661ab3ce Revert 142337. Thumb1 still doesn't support dynamic stack realignment. :(
llvm-svn: 142557
2011-10-20 00:07:12 +00:00
Chad Rosier
eb469f466b Add support for dynamic stack realignment when in thumb1 mode.
rdar://10288916

llvm-svn: 142337
2011-10-18 05:28:00 +00:00
Chad Rosier
42a60003a1 Thumb1 does not support dynamic stack realignment.
rdar://10288916 is tracking this fix.

In the past, instcombine and other passes were promoting alloca alignment past
the natural alignment, resulting in dynamic stack realignment.  Lang's work now
prevents this from happening (LLVM commit r141599).  Now that this really 
shouldn't happen report a fatal error rather than silently generate bad code.

llvm-svn: 142028
2011-10-15 00:28:24 +00:00
Chad Rosier
2cb0d1eddf Revert r140924 "Attempt to fix dynamic stack realignment for thumb1 functions."
to appease nightly testers.  Not quite there yet.

llvm-svn: 140953
2011-10-01 19:30:36 +00:00
Chad Rosier
ff29430882 Attempt to fix dynamic stack realignment for thumb1 functions. It is in fact
useful if an optimization assumes the stack has been realigned.  Credit to
Eli for his assistance.
rdar://10043857

llvm-svn: 140924
2011-10-01 02:03:18 +00:00
Jim Grosbach
74f96e7f3c Tidy up a few 80 column violations.
llvm-svn: 139636
2011-09-13 20:30:37 +00:00
Jim Grosbach
b33129ebad Thumb1 ADD/SUB SP instructions are predicable in Thumb2 mode.
Add the predicate operand to the instructions. Update the back end
accordingly where the instructions are used. Restrict the SP operands
to actually only be SP, as otherwise these break assembly parsing for the
normal instruction variants.

llvm-svn: 138445
2011-08-24 17:46:13 +00:00
Jim Grosbach
2b8103505a Make tBX_RET and tBX_RET_vararg predicable.
The normal tBX instruction is predicable, so there's no reason the
pseudos for using it as a return shouldn't be. Gives us some nice code-gen
improvements as can be seen by the test changes. In particular, several
tests now have to disable if-conversion because it works too well and defeats
the test.

llvm-svn: 134746
2011-07-08 21:50:04 +00:00
Jim Grosbach
351dcca2cb Refact ARM Thumb1 tMOVr instruction family.
Merge the tMOVr, tMOVgpr2tgpr, tMOVtgpr2gpr, and tMOVgpr2gpr instructions
into tMOVr. There's no need to keep them separate. Giving the tMOVr
instruction the proper GPR register class for its operands is sufficient
to give the register allocator enough information to do the right thing
directly.

llvm-svn: 134204
2011-06-30 23:38:17 +00:00
Jim Grosbach
32d3b2625b Thumb1 register to register MOV instruction is predicable.
Fix a FIXME and allow predication (in Thumb2) for the T1 register to
register MOV instructions. This allows some better codegen with
if-conversion (as seen in the test updates), plus it lays the groundwork
for pseudo-izing the tMOVCC instructions.

llvm-svn: 134197
2011-06-30 22:10:46 +00:00
Jim Grosbach
6dd5433c5c Refactor away tSpill and tRestore pseudos in ARM backend.
The tSpill and tRestore instructions are just copies of the tSTRspi and
tLDRspi instructions, respectively. Just use those directly instead.

llvm-svn: 134092
2011-06-29 20:26:39 +00:00
Jim Grosbach
3713c9ddfa Fix coordination for using R4 in Thumb1 as a scratch for SP restore.
The logic for reserving R4 for use as a scratch needs to match that for
actually using it. Also, it's not necessary for immediate <=508, so adjust
the value checked.

llvm-svn: 132934
2011-06-13 21:18:25 +00:00
Anton Korobeynikov
d8873d31a8 Implement frame unwinding information emission for Thumb1. Not finished yet because there is no way given the constpool index to examine the actual entry: the reason is clones inserted by constant island pass, which are not tracked at all! The only connection is done during asmprinting time via magic label names which is really gross and needs to be eventually fixed.
llvm-svn: 127104
2011-03-05 18:43:50 +00:00
Anton Korobeynikov
917ca94111 Preliminary support for ARM frame save directives emission via MI flags.
This is just very first approximation how the stuff should be done
(e.g. ARM-only for now). More to follow.

llvm-svn: 127101
2011-03-05 18:43:32 +00:00
Jakob Stoklund Olesen
0f2b9d9dc4 Teach frame lowering to ignore debug values after the terminators.
llvm-svn: 123399
2011-01-13 21:28:52 +00:00