algorithm when assigning EnumValues to the synthesized registers.
The current algorithm, LessRecord, uses the StringRef compare_numeric
function. This function compares strings, while handling embedded numbers.
For example, the R600 backend registers are sorted as follows:
T1
T1_W
T1_X
T1_XYZW
T1_Y
T1_Z
T2
T2_W
T2_X
T2_XYZW
T2_Y
T2_Z
In this example, the 'scaling factor' is dEnum/dN = 6 because T0, T1, T2
have an EnumValue offset of 6 from one another. However, in other parts
of the register bank, the scaling factors are different:
dEnum/dN = 5:
KC0_128_W
KC0_128_X
KC0_128_XYZW
KC0_128_Y
KC0_128_Z
KC0_129_W
KC0_129_X
KC0_129_XYZW
KC0_129_Y
KC0_129_Z
The diff lists do not work correctly because different kinds of registers have
different 'scaling factors'. This new algorithm, LessRecordRegister, tries to
enforce a scaling factor of 1. For example, the registers are now sorted as
follows:
T1
T2
T3
...
T0_W
T1_W
T2_W
...
T0_X
T1_X
T2_X
...
KC0_128_W
KC0_129_W
KC0_130_W
...
For the Mips and R600 I see a 19% and 6% reduction in size, respectively. I
did see a few small regressions, but the differences were on the order of a
few bytes (e.g., AArch64 was 16 bytes). I suspect there will be even
greater wins for targets with larger register files.
Patch reviewed by Jakob.
rdar://14006013
llvm-svn: 185094
NOTE: If this broke your out-of-tree backend, in *RegisterInfo.td, change
the instances of SubRegIndex that have a comps template arg to use the
ComposedSubRegIndex class instead.
In TableGen land, this adds Size and Offset attributes to SubRegIndex,
and the ComposedSubRegIndex class, for which the Size and Offset are
computed by TableGen. This also adds an accessor in MCRegisterInfo, and
Size/Offsets for the X86 and ARM subreg indices.
llvm-svn: 183020
The size reduction in the RegDiffLists are rather dramatic. Here are a few
size differences for MCTargetDesc.o files (before and after) in bytes:
R600 - 36160B - 11184B - 69% reduction
ARM - 28480B - 8368B - 71% reduction
Mips - 816B - 576B - 29% reduction
One side effect of dynamically computing the aliases is that the iterator does
not guarantee that the entries are ordered or that duplicates have been removed.
The documentation implies this is a safe assumption and I found no clients that
requires these attributes (i.e., strict ordering and uniqueness).
My local LNT tester results showed no execution-time failures or significant
compile-time regressions (i.e., beyond what I would consider noise) for -O0g,
-O2 and -O3 runs on x86_64 and i386 configurations.
rdar://12906217
llvm-svn: 182783
This lane mask provides information about which register lanes
completely cover super-registers. See the block comment before
getCoveringLanes().
llvm-svn: 182034
At build-time register pressure was always computed in terms of
register units. But the compile-time API was expressed in terms of
register classes because it was intended for virtual registers (and
physical register units weren't yet used anywhere in codegen).
Now that the codegen uses physreg units consistently, prepare for
tracking register pressure also in terms of live units, not live
registers.
llvm-svn: 169360
Most places can use PrintFatalError as the unwinding mechanism was not
used for anything other than printing the error. The single exception
was CodeGenDAGPatterns.cpp, where intermediate errors during type
resolution were ignored to simplify incremental platform development.
This use is replaced by an error flag in TreePattern and bailout earlier
in various places if it is set.
llvm-svn: 166712
Sub-register lane masks are bitmasks that can be used to determine if
two sub-registers of a virtual register will overlap. For example, ARM's
ssub0 and ssub1 sub-register indices don't overlap each other, but both
overlap dsub0 and qsub0.
The lane masks will be accurate on most targets, but on targets that use
sub-register indexes in an irregular way, the masks may conservatively
report that two sub-register indices overlap when the eventually
allocated physregs don't.
Irregular register banks also mean that the bits in a lane mask can't be
mapped onto register units, but the concept is similar.
llvm-svn: 163630
Preserve the Composites map in the CodeGenSubRegIndex class so it can be
used to determine which sub-register indices can actually be composed.
llvm-svn: 163629
When reporting an error for a defm, we would previously only report the
location of the outer defm, which is not always where the error is.
Now we also print the location of the expanded multiclass defs:
lib/Target/X86/X86InstrSSE.td:2902:12: error: foo
defm ADD : basic_sse12_fp_binop_s<0x58, "add", fadd, SSE_ALU_ITINS_S>,
^
lib/Target/X86/X86InstrSSE.td:2801:11: note: instantiated from multiclass
defm PD : sse12_fp_packed<opc, !strconcat(OpcodeStr, "pd"), OpNode, VR128,
^
lib/Target/X86/X86InstrSSE.td:194:5: note: instantiated from multiclass
def rm : PI<opc, MRMSrcMem, (outs RC:$dst), (ins RC:$src1, x86memop:$src2),
^
llvm-svn: 162409
TableGen sometimes synthesizes missing sub-register indexes. Emit these
indexes as enumerators in the target namespace along with the
user-defined ones.
Also take this opportunity to stop creating new Record objects for
synthetic indexes.
llvm-svn: 161964
Now that the weird X86 sub_ss and sub_sd sub-register indexes are gone,
there is no longer a need for the CompositeIndices construct in .td
files. Sub-register index composition can be specified on the
SubRegIndex itself using the ComposedOf field.
Also enforce unique names for sub-registers in TableGen. The same
sub-register cannot be available with multiple sub-register indexes.
llvm-svn: 160842
Register units are already used internally in TableGen to compute
register pressure sets and overlapping registers. This patch makes them
available to the code generators.
The register unit lists are differentially encoded so they can be reused
for many related registers. This keeps the total size of the lists below
200 bytes for most targets. ARM has the largest table at 560 bytes.
Add an MCRegUnitIterator for traversing the register unit lists. It
provides an abstract interface so the representation can be changed in
the future without changing all clients.
llvm-svn: 157650
TableGen already computes register units as the basic unit of
interference. We can use that to compute the set of overlapping
registers.
This means that we can easily compute overlap sets for one register at a
time. There is no benefit to computing all registers at once.
llvm-svn: 156960
Besides the weight, we also want to store up to two root registers per
unit. Most units will have a single root, the leaf register they
represent. Units created for ad hoc aliasing get two roots: The two
aliasing registers.
The root registers can be used to compute the set of overlapping
registers.
llvm-svn: 156792
Register units can be used to compute if two registers overlap:
A overlaps B iff units(A) intersects units(B).
With this change, the above holds true even on targets that use ad hoc
aliasing (currently only ARM). This means that register units can be
used to implement regsOverlap() more efficiently, and the register
allocator can use the concept to model interference.
When there is no ad hoc aliasing, the register units correspond to the
maximal cliques in the register overlap graph. This is optimal, no other
register unit assignment can have fewer units.
With ad hoc aliasing, weird things are possible, and we don't try too
hard to compute the maximal cliques. The current approach is always
correct, and it works very well (probably optimally) as long as the ad
hoc aliasing doesn't have cliques larger than pairs. It seems unlikely
that any target would need more.
llvm-svn: 156763
The ad hoc aliasing specified in the 'Aliases' list in .td files is
currently only used by computeOverlaps(). It will soon be needed to
build accurate register units as well, so build the undirected graph in
CodeGenRegister::buildObjectGraph() instead.
Aliasing is a symmetric relationship with only one direction specified
in the .td files. Make sure both directions are represented in
getExplicitAliases().
llvm-svn: 156762
TableGen creates new register classes and sub-register indices based on
the sub-register structure present in the register bank. So far, it has
been doing that on a per-register basis, but that is not very efficient.
This patch teaches TableGen to compute topological signatures for
registers, and use that to reduce the amount of redundant computation.
Registers get the same TopoSig if they have identical sub-register
structure.
TopoSigs are not currently exposed outside TableGen.
llvm-svn: 156761
Don't compute the SuperRegs list until the sub-register graph is
completely finished. This guarantees that the list of super-registers is
properly topologically ordered, and has no duplicates.
llvm-svn: 156629
The sub-registers explicitly listed in SubRegs in the .td files form a
tree. In a complicated register bank, it is possible to have
sub-register relationships across sub-trees. For example, the ARM NEON
double vector Q0_Q1 is a tree:
Q0_Q1 = [Q0, Q1], Q0 = [D0, D1], Q1 = [D2, D3]
But we also define the DPair register D1_D2 = [D1, D2] which is fully
contained in Q0_Q1.
This patch teaches TableGen to find such sub-register relationships, and
assign sub-register indices to them. In the example, TableGen will
create a dsub_1_dsub_2 sub-register index, and add D1_D2 as a
sub-register of Q0_Q1.
This will eventually enable the coalescer to handle copies of skewed
sub-registers.
llvm-svn: 156587
The .td files specify a tree of sub-registers. Store that tree as
ExplicitSubRegs lists in CodeGenRegister instead of extracting it from
the Record when needed.
llvm-svn: 156555
This mapping is for internal use by TableGen. It will not be exposed in
the generated files.
Unfortunately, the mapping is not completely well-defined. The X86 xmm
registers appear with multiple sub-register indices in the ymm
registers. This is because of the odd idempotent sub_sd and sub_ss
sub-register indices. I hope to be able to eliminate them entirely, so
we can require the sub-registers to form a tree.
For now, just place the canonical sub_xmm index in the mapping, and
ignore the idempotents.
llvm-svn: 156519
This is still a topological ordering such that every register class gets
a smaller enum value than its sub-classes.
Placing the smaller spill sizes first makes a difference for the
super-register class bit masks. When looking for a super-register class,
we usually want the smallest possible kind of super-register. That is
now available as the first bit set in the bit mask.
llvm-svn: 156222
This manually enumerated list of super-register classes has been
superceeded by the automatically computed super-register class masks
available through SuperRegClassIterator.
llvm-svn: 156151
This is a new algorithm that finds sets of register units that can be
used to model registers pressure. This handles arbitrary, overlapping
register classes. Each register class is associated with a (small)
list of pressure sets. These are the dimensions of pressure affected
by the register class's liveness.
llvm-svn: 154374
This is a new algorithm that associates registers with weighted
register units to accuretely model their effect on register
pressure. This handles registers with multiple overlapping
subregisters. It is possible, but almost inconceivable that the
algorithm fails to find an exact solution for a target description. If
an exact solution cannot be found, an inexact, but reasonable solution
will be chosen.
llvm-svn: 154373