This exposes a method in MachineFrameInfo that calculates
MaxCallFrameSize and calls it after instruction selection in the ARM
target.
This avoids
ARMBaseRegisterInfo::canRealignStack()/ARMFrameLowering::hasReservedCallFrame()
giving different answers in early/late phases of codegen.
The testcase shows a particular nasty example result of that where we
would fail to properly align an alloca.
Differential Revision: https://reviews.llvm.org/D32622
llvm-svn: 302303
This tracks whether MaxCallFrameSize is computed yet. Ideally we would
assert and fail when the value is queried before it is computed, however
this fails various targets that need to be fixed first.
Differential Revision: https://reviews.llvm.org/D32570
llvm-svn: 301851
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped. This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.
This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.
Supersedes D28388
Fixes PR26328
Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D29668
llvm-svn: 296683
MachineFrameInfo does not need to be able to distinguish between the
user asking us not to realign the stack and the target telling us it
doesn't support stack realignment. Either way, fixed stack objects have
their alignment clamped.
llvm-svn: 265971
Summary:
After we make the adjustment, we can assume that for local allocas, but
not for stack parameters, the return address, or any other fixed stack
object (which has a negative offset and therefore lies prior to the
adjusted SP).
Fixes PR26662.
Reviewers: hfinkel, qcolombet, rnk
Subscribers: rnk, llvm-commits
Differential Revision: http://reviews.llvm.org/D18471
llvm-svn: 265886
In the code below on 32-bit targets, x would previously get forwarded to g()
without sign-extension to 32 bits as required by the parameter attribute.
void g(signed short);
void f(unsigned short x) {
g(x);
}
llvm-svn: 262352
We rely on HasOpaqueSPAdjustment not changing after we've calculated
things based on it. Things like whether or not we can use 'rep;movs' to
copy bytes around, that sort of thing. If it changes, invariants in the
backend will quietly break. This situation arose when we had a call to
memcpy *and* a COPY of the FLAGS register where we would attempt to
reference local variables using %esi, a register that was clobbered by
the 'rep;movs'.
This fixes PR26124.
llvm-svn: 257730
Teach the statepoint lowering code to emit Indirect stackmap entries for spill inserted by StatepointLowering (i.e. SelectionDAG), but Direct stackmap entries for in-IR allocas which represent manual stack slots. This is what the docs call for (http://llvm.org/docs/StackMaps.html#stack-map-format), but we've been emitting both as Direct. This was pointed out recently on the mailing list as a bug. It also blocks http://reviews.llvm.org/D15632 which extends the lowering to handle vector-of-pointers since only Indirect references can encode a variable sized slot.
To implement this, I introduced a new flag on the StackObject class used to maintian information about stack slots. I original considered (and prototyped in http://reviews.llvm.org/D15632), the idea of using the existing isSpillSlot flag, but end up deciding that was a bit too risky and that the cost of adding a new flag was low. Having the new flag will also allow us - in the future - to emit better comments in verbose assembly which indicate where a particular stack spill around a call comes from. (deopt, gc, regalloc).
Differential Revision: http://reviews.llvm.org/D15759
llvm-svn: 256352
This patch implements dynamic realignment of stack objects for targets
with a non-realigned stack pointer. Behaviour in FunctionLoweringInfo
is changed so that for a target that has StackRealignable set to
false, over-aligned static allocas are considered to be variable-sized
objects and are handled with DYNAMIC_STACKALLOC nodes.
It would be good to group aligned allocas into a single big alloca as
an optimization, but this is yet todo.
SystemZ benefits from this, due to its stack frame layout.
New tests SystemZ/alloca-03.ll for aligned allocas, and
SystemZ/alloca-04.ll for "no-realign-stack" attribute on functions.
Review and help from Ulrich Weigand and Hal Finkel.
llvm-svn: 254227
function.
This was the same as getFrameIndexReference, but without the FrameReg
output.
Differential Revision: http://reviews.llvm.org/D12042
llvm-svn: 245148
The incoming EBP value points to the end of a local stack allocation, so
we can use that to restore ESI, the base pointer. Once we do that, we
can use local stack allocations. If we know we need stack realignment,
spill the original frame pointer in the prologue and reload it after
restoring ESI.
llvm-svn: 241648
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137
About pristine regsiters:
Pristine registers "hold a value that is useless to the current
function, but that must be preserved - they are callee saved registers
that have not been saved." This concept saves compile time as it frees
the prologue/epilogue inserter from adding every such register to every
basic blocks live-in list.
However the current code in getPristineRegs is formulated in a
complicated way: Inside the function prologue and epilogue all callee
saves are considered pristine, while in the rest of the code only the
non-saved ones are considered pristine. This requires logic to
differentiate between prologue/epilogue and the rest and in the presence
of shrink-wrapping this even becomes complicated/expensive. It's also
unnecessary because the prologue epilogue inserters already mark
callee-save registers that are saved/restores properly in the respective
blocks in the prologue/epilogue (see updateLiveness() in
PrologueEpilogueInserter.cpp). So only declaring non-saved/restored
callee saved registers as pristine just works.
Differential Revision: http://reviews.llvm.org/D10101
llvm-svn: 238524
The code that builds the dependence graph assumes that two PseudoSourceValues
don't alias. In a tail calling function two FixedStackObjects might refer to the
same location. Worse 'immutable' fixed stack objects like function arguments are
not immutable and will be clobbered.
Change this so that a load from a FixedStackObject is not invariant in a tail
calling function and don't return a PseudoSourceValue for an instruction in tail
calling functions when building the dependence graph so that we handle function
arguments conservatively.
Fix for PR23459.
rdar://20740035
llvm-svn: 236916
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.
As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.
** Context **
Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.
** Motivating example **
Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b) {
%tmp = alloca i32, align 4
%tmp2 = icmp slt i32 %a, %b
br i1 %tmp2, label %true, label %false
true:
store i32 %a, i32* %tmp, align 4
%tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
br label %false
false:
%tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
ret i32 %tmp.0
}
On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f: ; @f
; BB#0:
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
LBB0_2: ; %false
mov sp, x29
ldp x29, x30, [sp], #16
ret
With shrink-wrapping we could generate:
_f: ; @f
; BB#0:
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
add sp, x29, #16 ; =16
ldp x29, x30, [sp], #16
LBB0_2: ; %false
ret
Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.
** Proposed Solution **
This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.
Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.
The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.
Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.
** Design Decisions **
1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.
Differential Revision: http://reviews.llvm.org/D9210
<rdar://problem/3201744>
llvm-svn: 236507
We don't need to represent UnwindHelp in IR. Instead, we can use the
knowledge that we are emitting the parent function to decide if we
should create the UnwindHelp stack object.
llvm-svn: 234061
These intrinsics allow multiple functions to share a single stack
allocation from one function's call frame. The function with the
allocation may only perform one allocation, and it must be in the entry
block.
Functions accessing the allocation call llvm.recoverframeallocation with
the function whose frame they are accessing and a frame pointer from an
active call frame of that function.
These intrinsics are very difficult to inline correctly, so the
intention is that they be introduced rarely, or at least very late
during EH preparation.
Reviewers: echristo, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D6493
llvm-svn: 225746
Summary:
If a variadic function body contains a musttail call, then we copy all
of the remaining register parameters into virtual registers in the
function prologue. We track the virtual registers through the function
body, and add them as additional registers to pass to the call. Because
this is all done in virtual registers, the register allocator usually
gives us good code. If the function does a call, however, it will have
to spill and reload all argument registers (ew).
Forwarding regparms on x86_32 is not implemented because most compilers
don't support varargs in 32-bit with regparms.
Reviewers: majnemer
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5060
llvm-svn: 216780
There's no need to do this if the user doesn't call va_start. In the
future, we're going to have thunks that forward these register
parameters with musttail calls, and they won't need these spills for
handling va_start.
Most of the test suite changes are adding va_start calls to existing
tests to keep things working.
llvm-svn: 216294
We used to assume that any fixed-offset stack object was not aliased. This
meant that no IR value could point to the memory contained in such an object.
This is a reasonable default, but is not a universally-correct
target-independent fact. For example, on PowerPC (both Darwin and non-Darwin),
some byval arguments are allocated at fixed offsets by the ABI. These, however,
certainly can be pointed to by IR values. This change moves the 'isAliased'
logic out of FixedStackPseudoSourceValue and into MFI, and allows the isAliased
property to be overridden for fixed-offset objects.
This will be used by an upcoming commit to the PowerPC backend to fix PR20280.
No functionality change intended (the behavior of
FixedStackPseudoSourceValue::isAliased has been made more conservative for
callers that don't pass an MFI object, but I don't see any in-tree callers that
do that).
llvm-svn: 215794
--
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI. It handles all corner cases (I hope), including stack
realignment.
Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.
Patch by Vadim Chugunov!
Reviewed By: rnk
Differential Revision: http://reviews.llvm.org/D4081
llvm-svn: 211691
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI. It handles all corner cases (I hope), including stack
realignment.
Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.
Patch by Vadim Chugunov!
Reviewed By: rnk
Differential Revision: http://reviews.llvm.org/D4081
llvm-svn: 211399
selection dag (PR19012)
In X86SelectionDagInfo::EmitTargetCodeForMemcpy we check with MachineFrameInfo
to make sure that ESI isn't used as a base pointer register before we choose to
emit rep movs (which clobbers esi).
The problem is that MachineFrameInfo wouldn't know about dynamic allocas or
inline asm that clobbers the stack pointer until SelectionDAGBuilder has
encountered them.
This patch fixes the problem by checking for such things when building the
FunctionLoweringInfo.
Differential Revision: http://llvm-reviews.chandlerc.com/D2954
llvm-svn: 202930
This changes the MachineFrameInfo API to use the new SSPLayoutKind information
produced by the StackProtector pass (instead of a boolean flag) and updates a
few pass dependencies (to preserve the SSP analysis).
The stack layout follows the same approach used prior to this change - i.e.,
only LargeArray stack objects will be placed near the canary and everything
else will be laid out normally. After this change, structures containing large
arrays will also be placed near the canary - a case previously missed by the
old implementation.
Out of tree targets will need to update their usage of
MachineFrameInfo::CreateStackObject to remove the MayNeedSP argument.
The next patch will implement the rules for sspstrong and sspreq. The end goal
is to support ssp-strong stack layout rules.
WIP.
Differential Revision: http://llvm-reviews.chandlerc.com/D2158
llvm-svn: 197653
This optional register liveness analysis pass can be enabled with either
-enable-stackmap-liveness, -enable-patchpoint-liveness, or both. The pass
traverses each basic block in a machine function. For each basic block the
instructions are processed in reversed order and if a patchpoint or stackmap
instruction is encountered the current live-out register set is encoded as a
register mask and attached to the instruction.
Later on during stackmap generation the live-out register mask is processed and
also emitted as part of the stackmap.
This information is optional and intended for optimization purposes only. This
will enable a client of the stackmap to reason about the registers it can use
and which registers need to be preserved.
Reviewed by Andy
llvm-svn: 197317
This reverts commit r197254.
This was an accidental merge of Juergen's patch. It will be checked in
shortly, but wasn't meant to go in quite yet.
Conflicts:
include/llvm/CodeGen/StackMaps.h
lib/CodeGen/StackMaps.cpp
test/CodeGen/X86/stackmap-liveness.ll
llvm-svn: 197260
This re-lands commit r196876, which was reverted in r196879.
The tests have been fixed to pass on platforms with a stack alignment
larger than 4.
Update to clang side tests will land shortly.
llvm-svn: 196939
For stack frames requiring realignment, three pointers may be needed:
- ebp to address incoming arguments
- esi (could be any callee-saved register) to address locals
- esp to address outgoing arguments
We would use esi unconditionally without verifying that it did not
conflict with inline assembly.
This change doesn't do the verification, it simply emits a fatal error
on functions that use stack realignment, dynamic SP adjustments, and
inline assembly.
Because stack realignment is common on Windows, we also no longer assume
that MS inline assembly clobbers esp. Instead, we analyze the inline
instructions for implicit definitions and check if esp is there. If so,
we require the use of a base pointer and consider it in the condition
above.
Mostly fixes PR16830, but we could try harder to find a non-conflicting
base pointer.
Reviewers: sunfish
Differential Revision: http://llvm-reviews.chandlerc.com/D1317
llvm-svn: 196876
This is a generic function (derived from PEI); moving it into
MachineFrameInfo eliminates a current redundancy between the ARM and AArch64
backends, and will allow it to be used by the PowerPC target code.
No functionality change intended.
llvm-svn: 177111
Aside from the question of whether we report a warning or an error when we
can't satisfy a requested stack object alignment, the current implementation
of this is not good. We're not providing any source location in the diagnostics
and the current warning is not connected to any warning group so you can't
control it. We could improve the source location somewhat, but we can do a
much better job if this check is implemented in the front-end, so let's do that
instead. <rdar://problem/13127907>
llvm-svn: 174741
requirement when creating stack objects in MachineFrameInfo.
Add CreateStackObjectWithMinAlign to throw error when the minimal alignment
can't be achieved and to clamp the alignment when the preferred alignment
can't be achieved. Same is true for CreateVariableSizedObject.
Will not emit error in CreateSpillStackObject or CreateStackObject.
As long as callers of CreateStackObject do not assume the object will be
aligned at the requested alignment, we should not have miscompile since
later optimizations which look at the object's alignment will have the correct
information.
rdar://12713765
llvm-svn: 172027
the alignment is clamped to TargetFrameLowering.getStackAlignment if the target
does not support stack realignment or the option "realign-stack" is off.
This will cause miscompile if the address is treated as aligned and add is
replaced with or in DAGCombine.
Added a bool StackRealignable to TargetFrameLowering to check whether stack
realignment is implemented for the target. Also added a bool RealignOption
to MachineFrameInfo to check whether the option "realign-stack" is on.
rdar://12713765
llvm-svn: 169197