Instead of awkwardly encoding calling-convention information with ISD::CALL,
ISD::FORMAL_ARGUMENTS, ISD::RET, and ISD::ARG_FLAGS nodes, TargetLowering
provides three virtual functions for targets to override:
LowerFormalArguments, LowerCall, and LowerRet, which replace the custom
lowering done on the special nodes. They provide the same information, but
in a more immediately usable format.
This also reworks much of the target-independent tail call logic. The
decision of whether or not to perform a tail call is now cleanly split
between target-independent portions, and the target dependent portion
in IsEligibleForTailCallOptimization.
This also synchronizes all in-tree targets, to help enable future
refactoring and feature work.
llvm-svn: 78142
When LowerExtract eliminates an EXTRACT_SUBREG with a kill flag, it moves the
kill flag to the place where the sub-register is killed. This can accidentally
overlap with the use of a sibling sub-register, and we have trouble.
In the test case we have this code:
Live Ins: %R0 %R1 %R2
%R2L<def> = EXTRACT_SUBREG %R2<kill>, 1
%R2H<def> = LOAD16fi <fi#-1>, 0, Mem:LD(2,4) [FixedStack-1 + 0]
%R1L<def> = EXTRACT_SUBREG %R1<kill>, 1
%R0L<def> = EXTRACT_SUBREG %R0<kill>, 1
%R0H<def> = ADD16 %R2H<kill>, %R2L<kill>, %AZ<imp-def>, %AN<imp-def>, %AC0<imp-def>, %V<imp-def>, %VS<imp-def>
subreg: CONVERTING: %R2L<def> = EXTRACT_SUBREG %R2<kill>, 1
subreg: eliminated!
subreg: killed here: %R0H<def> = ADD16 %R2H, %R2L, %R2<imp-use,kill>, %AZ<imp-def>, %AN<imp-def>, %AC0<imp-def>, %V<imp-def>, %VS<imp-def>
The kill flag on %R2 is moved to the last instruction, and the live range overlaps with the definition of %R2H:
*** Bad machine code: Redefining a live physical register ***
- function: f
- basic block: 0x18358c0 (#0)
- instruction: %R2H<def> = LOAD16fi <fi#-1>, 0, Mem:LD(2,4) [FixedStack-1 + 0]
Register R2H was defined but already live.
The fix is to replace EXTRACT_SUBREG with IMPLICIT_DEF instead of eliminating
it completely:
subreg: CONVERTING: %R2L<def> = EXTRACT_SUBREG %R2<kill>, 1
subreg: replace by: %R2L<def> = IMPLICIT_DEF %R2<kill>
Note that these IMPLICIT_DEF instructions survive to the asm output. It is
necessary to fix the stack-color-with-reg test case because of that.
llvm-svn: 78093
killed by another operand.
There is probably a better fix. Either 1) scavenger can look at other operands, or
2) livevariables can be smarter about kill markers. Patches welcome.
llvm-svn: 78072
When LowerSubregsInstructionPass::LowerInsert eliminates an INSERT_SUBREG
instriction because it is an identity copy, make sure that the same registers
are alive before and after the elimination.
When the super-register is marked <undef> this requires inserting an
IMPLICIT_DEF instruction to make sure the super register is live.
Fix a related bug where a kill flag on the inserted sub-register was not transferred properly.
Finally, clear the undef flag in MachineInstr::addRegisterKilled. Undef implies dead and kill implies live, so they cant both be valid.
llvm-svn: 77989
This is not just a matter of passing in the target triple from the module;
currently backends are making decisions based on the build and host
architecture. The goal is to migrate to making these decisions based off of the
triple (in conjunction with the feature string). Thus most clients pass in the
target triple, or the host triple if that is empty.
This has one important change in the way behavior of the JIT and llc.
For the JIT, it was previously selecting the Target based on the host
(naturally), but it was setting the target machine features based on the triple
from the module. Now it is setting the target machine features based on the
triple of the host.
For LLC, -march was previously only used to select the target, the target
machine features were initialized from the module's triple (which may have been
empty). Now the target triple is taken from the module, or the host's triple is
used if that is empty. Then the triple is adjusted to match -march.
The take away is that -march for llc is now used in conjunction with the host
triple to initialize the subtarget. If users want more deterministic behavior
from llc, they should use -mtriple, or set the triple in the input module.
llvm-svn: 77946
__builtin_bfin_ones does the same as ctpop, so it can be implemented in the front-end.
__builtin_bfin_loadbytes loads from an unaligned pointer with the disalignexcpt instruction. It does the same as loading from a pointer with the low bits masked. It is better if the front-end creates a masked load. We can always instruction select the masked to disalignexcpt+load.
We keep csync/ssync/idle. These intrinsics represent instructions that need workarounds for some silicon revisions. We may even want to convert inline assembler to intrinsics to enable the workarounds.
llvm-svn: 77917
Allow imp-def and imp-use of anything in the scavenger asserts, just like the machine code verifier.
Allow redefinition of a sub-register of a live register.
llvm-svn: 77904
to:
.quad X
even on a 32-bit system, where X is not 64-bits. There isn't much that
we can do here, so we just print:
.quad ((X) & 4294967295)
instead.
llvm-svn: 77818
myself because I'm getting tired of seeing the red buildbots, which have
been red since 5:30PM PDT last night.
Proposed supplement to developer policy: committers should make sure to
be around to watch for buildbot failures after committing.
llvm-svn: 77785
instructions for calls since BL and BLX are always 32-bit long and BX is always
16-bit long.
Also, we should be using BLX to call external function stubs.
llvm-svn: 77756
padding is disabled, tabs get replaced by spaces except in the case of
the first operand, where the tab is output to line up the operands after
the mnemonics.
Add some better comments and eliminate redundant code.
Fix some testcases to not assume tabs.
llvm-svn: 77740
into the mergable section if it is one of our special cases. This could
obviously be improved, but this is the minimal fix and restores us to the
previous behavior.
llvm-svn: 77679
When the return value is not used (i.e. only care about the value in the memory), x86 does not have to use add to implement these. Instead, it can use add, sub, inc, dec instructions with the "lock" prefix.
This is currently implemented using a bit of instruction selection trick. The issue is the target independent pattern produces one output and a chain and we want to map it into one that just output a chain. The current trick is to select it into a merge_values with the first definition being an implicit_def. The proper solution is to add new ISD opcodes for the no-output variant. DAG combiner can then transform the node before it gets to target node selection.
Problem #2 is we are adding a whole bunch of x86 atomic instructions when in fact these instructions are identical to the non-lock versions. We need a way to add target specific information to target nodes and have this information carried over to machine instructions. Asm printer (or JIT) can use this information to add the "lock" prefix.
llvm-svn: 77582
wide vectors. Likewise, change VSTn intrinsics to take separate arguments
for each vector in a multi-vector struct. Adjust tests accordingly.
llvm-svn: 77468
- This change also makes it possible to switch between ARM / Thumb on a
per-function basis.
- Fixed thumb2 routine which expand reg + arbitrary immediate. It was using
using ARM so_imm logic.
- Use movw and movt to do reg + imm when profitable.
- Other code clean ups and minor optimizations.
llvm-svn: 77300
and make it more aggressive, we now put:
const int G2 __attribute__((weak)) = 42;
into the text (readonly) segment like gcc, previously we put
it into the data (readwrite) segment.
llvm-svn: 77104
Before:
adr r12, #LJTI3_0_0
ldr pc, [r12, +r0, lsl #2]
LJTI3_0_0:
.long LBB3_24
.long LBB3_30
.long LBB3_31
.long LBB3_32
After:
adr r12, #LJTI3_0_0
add pc, r12, +r0, lsl #2
LJTI3_0_0:
b.w LBB3_24
b.w LBB3_30
b.w LBB3_31
b.w LBB3_32
This has several advantages.
1. This will make it easier to optimize this to a TBB / TBH instruction +
(smaller) table.
2. This eliminate the need for ugly asm printer hack to force the address
into thumb addresses (bit 0 is one).
3. Same codegen for pic and non-pic.
4. This eliminate the need to align the table so constantpool island pass
won't have to over-estimate the size.
Based on my calculation, the later is probably slightly faster as well since
ldr pc with shifter address is very slow. That is, it should be a win as long
as the HW implementation can do a reasonable job of branch predict the second
branch.
llvm-svn: 77024