1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00
Commit Graph

739 Commits

Author SHA1 Message Date
Krzysztof Parzyszek
9876d68904 [Hexagon] Map ISD::TRAP to J2_trap0(#0)
llvm-svn: 339365
2018-08-09 18:03:45 +00:00
Krzysztof Parzyszek
7a1c2358ef [Hexagon] Diagnose misaligned absolute loads and stores
Differential Revision: https://reviews.llvm.org/D50405

llvm-svn: 339272
2018-08-08 17:00:09 +00:00
Krzysztof Parzyszek
b11835aa7c [Hexagon] Allow use of gather intrinsics even with no-packets
Vgather requires must be in a packet with a store, which contradicts
the no-packets feature. As a consequence, gather/scatter could not be
used with no-packets. Relax this, and allow gather packets as exceptions
to the no-packets requirements.

llvm-svn: 339177
2018-08-07 20:33:47 +00:00
Krzysztof Parzyszek
db3c93e416 [Hexagon] Simplify CFG after atomic expansion
This will remove suboptimal branching from the generated ll/sc loops.
The extra simplification pass affects a lot of testcases, which have
been modified to accommodate this change: either by modifying the
test to become immune to the CFG simplification, or (less preferablt)
by adding option -hexagon-initial-cfg-clenaup=0.

llvm-svn: 338774
2018-08-02 22:17:53 +00:00
Krzysztof Parzyszek
0ac6160f34 [Hexagon] Simplify A4_rcmp[n]eqi R, 0
Consider cases when register R is known to be zero/non-zero, or when it
is defined by a C2_muxii instruction.

llvm-svn: 338251
2018-07-30 14:28:02 +00:00
Krzysztof Parzyszek
ff3fd55327 [Hexagon] Properly scale bit index when extracting elements from vNi1
For example v = <2 x i1> is represented as bbbbaaaa in a predicate register,
where b = v[1], a = v[0]. Extracting v[1] is equivalent to extracting bit 4
from the predicate register.

llvm-svn: 337934
2018-07-25 16:20:59 +00:00
Krzysztof Parzyszek
3552e1c7b3 [Hexagon] Handle unnamed globals in HexagonConstExpr
Instead of comparing names, compare positions in the parent module.

llvm-svn: 337723
2018-07-23 18:30:17 +00:00
Krzysztof Parzyszek
e9a8cec406 [Hexagon] Disable packets in test to avoid ordering issues in checks
llvm-svn: 337624
2018-07-20 21:55:55 +00:00
Krzysztof Parzyszek
d58b08d50c [Hexagon] Avoid introducing calls into coalesced range of HVX vector pairs
If an HVX vector register is to be coalesced into a vector pair, make
sure that the vector pair will not have a function call in its live range,
unless it already had one. All HVX vector registers are volatile, so
any vector register live across a function call will have to be spilled.

If a vector needs to be spilled, and it's coalesced into a vector pair
then the whole pair will need to be spilled (even if only a part of it is
live), taking extra stack space.

llvm-svn: 337073
2018-07-13 23:42:29 +00:00
Krzysztof Parzyszek
51bc3f9232 [Hexagon] Change .mir testcase to make sure function is not in SSA form
If a machine function satisfies SSA, the IsSSA property is assumed even
if the pass to be executed runs after existing from SSA. If the pass
output then does not conform to SSA, a verifier error will be flagged
(with expensive checks enabled).

llvm-svn: 336682
2018-07-10 14:49:54 +00:00
Krzysztof Parzyszek
dcb259429e [Hexagon] Add implicit uses even when untied explicit uses are present
An explicit untied use is not sufficient to maintain liveness of a
register redefined in a predicated instruction. For example
  %1 = COPY %0
  ...
  %1 = A2_paddif %2, %1, 1
could become
  $r1 = COPY $r0
  ...
  $r1 = A2_paddif $p0, $r1, 1
and later
  $r1 = COPY $r0                ;; this is not really dead!
  ...
  $r1 = A2_paddif $p0, $r0, 1

llvm-svn: 336662
2018-07-10 12:57:49 +00:00
Simon Pilgrim
dff7071951 [DAGCombiner] Ensure we use the correct CC result type in visitSDIV (REAPPLIED)
We could get away with it for constant folded cases, but not for rL335719.

Thanks to Krzysztof Parzyszek for noticing.

Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884.

llvm-svn: 335886
2018-06-28 17:33:41 +00:00
Haojian Wu
2ef4777db5 Revert "[DAGCombiner] Ensure we use the correct CC result type in visitSDIV"
This reverts commit r335821.

This crashes the webassembly test, run "ninja check-llvm-codegen-webassembly" to reproduce.

llvm-svn: 335871
2018-06-28 16:25:57 +00:00
Simon Pilgrim
c232e23554 [DAGCombiner] Ensure we use the correct CC result type in visitSDIV
We could get away with it for constant folded cases, but not for rL335719.

Thanks to Krzysztof Parzyszek for noticing.

llvm-svn: 335821
2018-06-28 09:54:28 +00:00
Brendon Cahoon
a4831b8a52 [Hexagon] Add a "generic" cpu
Add the generic processor for Hexagon so that it can be used
with 3rd party programs that create a back-end with the
"generic" CPU. This patch also enables the JIT for Hexagon.

Differential Revision: https://reviews.llvm.org/D48571

llvm-svn: 335641
2018-06-26 18:44:05 +00:00
Krzysztof Parzyszek
5f37d804cb [Hexagon] Replace .ll test for expanding post-ra pesudos with .mir
llvm-svn: 335158
2018-06-20 19:22:27 +00:00
Krzysztof Parzyszek
968d948456 [Hexagon] Enforce restrictions on packetizing cache instructions
llvm-svn: 335061
2018-06-19 17:26:20 +00:00
Krzysztof Parzyszek
7b567e5d2a Remove <undef> from rematerialized full register
When coalescing a small register into a subregister of a larger register,
if the larger register is rematerialized, the function updateRegDefUses
can add an <undef> flag to the rematerialized definition (since it's
treating it as only definining the coalesced subregister). While with that
assumption doing so is not incorrect, make sure to remove the flag later
on after the call to updateRegDefUses.

llvm-svn: 334845
2018-06-15 16:58:22 +00:00
Krzysztof Parzyszek
90f9f3ddd4 [DAGCombiner] Recognize more patterns for ABS
Differential Revision: https://reviews.llvm.org/D47831

llvm-svn: 334553
2018-06-12 21:51:49 +00:00
Krzysztof Parzyszek
5570d6d39d [Hexagon] Make floating point operations expensive for vectorization
llvm-svn: 334508
2018-06-12 15:12:50 +00:00
Krzysztof Parzyszek
cca7086924 [SelectionDAG] Provide default expansion for rotates
Implement default legalization of rotates: either in terms of the rotation
in the opposite direction (if legal), or in terms of shifts and ors.

Implement generating of rotate instructions for Hexagon. Hexagon only
supports rotates by an immediate value, so implement custom lowering of
ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion.

Differential Revision: https://reviews.llvm.org/D47725

llvm-svn: 334497
2018-06-12 12:49:36 +00:00
Krzysztof Parzyszek
e497360890 [Hexagon] Late predicate producers cannot be used as dot-new sources
llvm-svn: 334426
2018-06-11 18:45:52 +00:00
Krzysztof Parzyszek
32e5a2dff1 [Hexagon] Implement vector-pair zero as V6_vsubw_dv
llvm-svn: 334123
2018-06-06 19:34:40 +00:00
Krzysztof Parzyszek
b5a700721d [Hexagon] Split CTPOP of vector pairs
llvm-svn: 334109
2018-06-06 18:03:29 +00:00
Krzysztof Parzyszek
6c9d726977 [Hexagon] Add pattern to generate 64-bit neg instruction
llvm-svn: 334043
2018-06-05 19:52:39 +00:00
Krzysztof Parzyszek
4a42c82327 [Hexagon] Add more patterns for generating abs/absp instructions
llvm-svn: 334038
2018-06-05 19:00:50 +00:00
Krzysztof Parzyszek
d849921d4a [Hexagon] Select HVX code for vector CTPOP, CTLZ, and CTTZ
llvm-svn: 333760
2018-06-01 14:52:58 +00:00
Krzysztof Parzyszek
0a4ed009a4 [SelectionDAG] Expand UADDO/USUBO into ADD/SUBCARRY if legal for target
Additionally, implement handling of ADD/SUBCARRY on Hexagon, utilizing
the UADDO/USUBO expansion.

Differential Revision: https://reviews.llvm.org/D47559

llvm-svn: 333751
2018-06-01 14:00:32 +00:00
Krzysztof Parzyszek
8c44cdee95 [Hexagon] Use vector align-left when shift amount fits in 3 bits
This saves an instruction because for align-right the shift amount
would need to be put in a register first.

llvm-svn: 333543
2018-05-30 13:45:34 +00:00
Krzysztof Parzyszek
977520b723 [Hexagon] Fix packing source vectors in shufflevector selection
When the shuffle mask selected a subvector of the second input vector,
and aligning of the source was performed, the shuffle mask was updated
incorrectly, resulting in an ICE further in the selection process.

llvm-svn: 333279
2018-05-25 14:53:14 +00:00
Krzysztof Parzyszek
309f570ea8 [Hexagon] Add patterns for accumulating HVX compares
llvm-svn: 333009
2018-05-22 18:27:02 +00:00
Brendon Cahoon
ee2880e215 [Hexagon] Generate post-increment for floating point types
The code that generates post-increments for Hexagon considered
integer values only. This patch adds support to generate them for
floating point values, f32 and f64.

Differential Revision: https://reviews.llvm.org/D47036

llvm-svn: 332748
2018-05-18 18:14:44 +00:00
Sanjay Patel
f61cc3256e [Hexagon] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

llvm-svn: 332550
2018-05-16 22:49:08 +00:00
Krzysztof Parzyszek
bf9bd5a28e [Hexagon] Mark HVX vector predicate bitwise ops as legal, add patterns
llvm-svn: 332525
2018-05-16 21:00:24 +00:00
Krzysztof Parzyszek
1f14705c5a [Hexagon] Remove unused flag from subtarget and (non)corresponding test
llvm-svn: 332365
2018-05-15 16:13:52 +00:00
Krzysztof Parzyszek
a4467d89bc [Hexagon] Add a target feature for memop generation
llvm-svn: 332285
2018-05-14 20:09:07 +00:00
Sid Manning
7e2ce4c73e Hexagon: Put relocations after instructions not packets.
Change relocation output so that relocation information follows
individual instructions rather than clustering them at the end
of packets.

This change required shifting block of code but the actual change
is in HexagonPrettyPrinter's PrintInst.

Differential Revision: https://reviews.llvm.org/D46728

llvm-svn: 332283
2018-05-14 19:46:08 +00:00
Krzysztof Parzyszek
fbff7667ba [Hexagon] Avoid predicate copies to integer registers from store-locked
llvm-svn: 332260
2018-05-14 16:41:40 +00:00
Krzysztof Parzyszek
fa4a514b8e [Hexagon] Add patterns for vector shift-and-accumulate
llvm-svn: 331918
2018-05-09 21:10:41 +00:00
Shiva Chen
a2029fa58e [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.
In order to set breakpoints on labels and list source code around
labels, we need collect debug information for labels, i.e., label
name, the function label belong, line number in the file, and the
address label located. In order to keep these information in LLVM
IR and to allow backend to generate debug information correctly.
We create a new kind of metadata for labels, DILabel. The format
of DILabel is

!DILabel(scope: !1, name: "foo", file: !2, line: 3)

We hope to keep debug information as much as possible even the
code is optimized. So, we create a new kind of intrinsic for label
metadata to avoid the metadata is eliminated with basic block.
The intrinsic will keep existing if we keep it from optimized out.
The format of the intrinsic is

llvm.dbg.label(metadata !1)

It has only one argument, that is the DILabel metadata. The
intrinsic will follow the label immediately. Backend could get the
label metadata through the intrinsic's parameter.

We also create DIBuilder API for labels to be used by Frontend.
Frontend could use createLabel() to allocate DILabel objects, and use
insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR.

Differential Revision: https://reviews.llvm.org/D45024

Patch by Hsiangkai Wang.

llvm-svn: 331841
2018-05-09 02:40:45 +00:00
Krzysztof Parzyszek
fffa5b3e2d [Hexagon] Handle non-immediate constants in HexagonSplitDouble
llvm-svn: 331527
2018-05-04 15:04:48 +00:00
Krzysztof Parzyszek
2091dc141f [Hexagon] Skip reserved physical registers when updating liveness
llvm-svn: 331518
2018-05-04 13:59:05 +00:00
Krzysztof Parzyszek
4a7333a0ad [LivePhysRegs] Remove registers clobbered by regmasks from the live set
Dead defs were being removed from the live set (in stepForward), but
registers clobbered by regmasks weren't (more specifically, they were
actually removed by removeRegsInMask, but then they were added back in).

llvm-svn: 331219
2018-04-30 19:38:47 +00:00
Krzysztof Parzyszek
9c83713b22 [Hexagon] Improve HVX instruction selection (bitcast, vsplat)
There was some unfortunate interaction between VSPLAT and BITCAST
related to the selection of constant vectors (coming from selecting
shuffles). Introduce VSPLATW that always splats a 32-bit word, and
can have arbitrary result type (to avoid BITCASTs of VSPLAT).
Clean up the previous selection of BITCAST/VSPLAT.

llvm-svn: 330471
2018-04-20 19:38:37 +00:00
Krzysztof Parzyszek
d46b725fba [Hexagon] Skip fixed-stack indexes in HexagonConstExtenders
Fixed slots have negative values, and TRI::stackSlot2Index and
TRI::index2StackSlot do not handle negative numbers.

llvm-svn: 330468
2018-04-20 19:06:46 +00:00
Krzysztof Parzyszek
785430b011 [if-converter] Handle BBs that terminate in ret during diamond conversion
This fixes https://llvm.org/PR36825.

Original patch by Valentin Churavy (D45218).

Differential Revision: https://reviews.llvm.org/D45731

llvm-svn: 330345
2018-04-19 17:26:46 +00:00
Krzysztof Parzyszek
5f7d7b9d09 [Hexagon] Use legal types when lowering CONCAT_VECTORS via BUILD_VECTOR
llvm-svn: 330344
2018-04-19 17:11:58 +00:00
Krzysztof Parzyszek
56ad41cdd2 [Hexagon] Generate code for vector bswap intrinsics
llvm-svn: 330333
2018-04-19 14:46:44 +00:00
Krzysztof Parzyszek
d11cd46c0c [Hexagon] Add/fix patterns for 32/64-bit vector compares and logical ops
llvm-svn: 330330
2018-04-19 14:24:31 +00:00
Krzysztof Parzyszek
713a9c7a51 [Hexagon] Do not merge initializers for stack and non-stack expressions
Stack addressing needs addressing modes that provide an offset field
immediately following the frame index. An initializer from a non-stack
addressing could force the stack address to use a form that does not
provide an offset field.

llvm-svn: 330191
2018-04-17 15:23:09 +00:00
Galina Kistanova
8342105f58 Disable flaky tests till they get fixed.
llvm-svn: 329763
2018-04-10 22:07:29 +00:00
Krzysztof Parzyszek
971a8e292d [Hexagon] Handle subregisters when calculating iteration count in HW loops
llvm-svn: 329434
2018-04-06 17:51:57 +00:00
Krzysztof Parzyszek
d3874c8694 [Hexagon] Remove unneeded attributes from lit test
llvm-svn: 329078
2018-04-03 16:05:20 +00:00
Krzysztof Parzyszek
df9bb5dabc [Hexagon] Fix testcase
llvm-svn: 328899
2018-03-30 19:46:28 +00:00
Krzysztof Parzyszek
8e14c7cb4e [Hexagon] Avoid creating invalid offsets in packetizer
Two memory instructions with a dependency only on the address register
between the two (the first one of them being post-incrememnt) can be
packetized together after the offset on the second was updated to the
incremement value. Make sure that the new offset is valid for the
instruction.

llvm-svn: 328897
2018-03-30 19:28:37 +00:00
Krzysztof Parzyszek
62f3aefde3 [Hexagon] Fix printing :mem_noshuf on compiler-generated packets
llvm-svn: 328869
2018-03-30 15:09:05 +00:00
Krzysztof Parzyszek
e739d778c4 [Hexagon] Add support to handle bit-reverse load intrinsics
Patch by Sumanth Gundapaneni.

llvm-svn: 328774
2018-03-29 13:52:46 +00:00
Krzysztof Parzyszek
ff0c489bc6 [Hexagon] Add support for "new" circular buffer intrinsics
These instructions have been around for a long time, but we
haven't supported intrinsics for them. The "new" versions use
the CSx register for the start of the buffer instead of the K
field in the Mx register.

We need to use pseudo instructions for these instructions until
after register allocation. The problem is that these instructions
allocate a M0/CS0 or M1/CS1 pair. But, we can't generate code for
the CSx set-up until after register allocation when the Mx
register has been fixed for the instruction.

There is a related clang patch.

Patch by Brendon Cahoon.

llvm-svn: 328724
2018-03-28 19:38:29 +00:00
Krzysztof Parzyszek
d0aedbc0aa [Hexagon] Implement TTI::shouldMaximizeVectorBandwidth
llvm-svn: 328648
2018-03-27 18:10:47 +00:00
Krzysztof Parzyszek
314e3010b6 [Hexagon] Add more lit tests
llvm-svn: 328561
2018-03-26 17:53:48 +00:00
Krzysztof Parzyszek
d764226f07 [Pipeliner] Add missing loop carried dependences
The pipeliner is not adding a dependence edge for a loop carried
dependence, and ends up scheduling a load from iteration n prior
to an aliased store in iteration n-1.

The code that adds the loop carried dependences in the pipeliner
doesn't check if the memory objects for loads and stores are
"identified" (i.e., distinct) objects. If they are not, then the
code that adds the dependences needs to be conservative. The
objects can be used to check dependences only when they are
distinct objects.

The code that checks for loop carried dependences has been updated
to classify loads and stores that are not identified as "unknown"
values. A store with an "unknown" value can potentially create
a loop carried dependence with any pending load.

Patch by Brendon Cahoon.

llvm-svn: 328547
2018-03-26 16:50:11 +00:00
Krzysztof Parzyszek
c575ffd7a5 [Pipeliner] Use latency to compute RecMII
The patch contains severals changes needed to pipeline an example
that was transformed so that a Phi with a subreg is converted to
copies.

The pipeliner wasn't working for a couple of reasons.
- The RecMII was 3 instead of 2 due to the extra copies.
- Copy instructions contained a latency of 1.
- The node order algorithm was not choosing the best "bottom"
node, which caused an instruction to be scheduled that had a 
predecessor and successor already scheduled.
- Updated the Hexagon Machine Scheduler to check if the node is
latency bound when adding the cost for a 0-latency dependence.

The RecMII was 3 because the computation looks at the number of
nodes in the recurrence. The extra copy is an extra node but
it shouldn't increase the latency. The new RecMII computation
looks at the latency of the instructions in the recurrence. We
changed the latency of the dependence of a copy to 0. The latency
computation for the copy also checks the use of the copy (similar
to a reg_sequence).

The node order algorithm was not choosing the last instruction
in the recurrence for a bottom up traversal. This was when the
last instruction is a copy. A check was added when choosing the
instruction to check for NodeNum if the maxASAP is the same. This
means that the scheduler will not end up with another node in
the recurrence that has both a predecessor and successor already
scheduled.

The cost computation in Hexagon Machine Scheduler adds cost when
an instruction can be packetized with a zero-latency instruction.
We should only do this if the schedule is latency bound. 

Patch by Brendon Cahoon.

llvm-svn: 328542
2018-03-26 16:33:16 +00:00
Krzysztof Parzyszek
ef6658d2e0 [Pipeliner] Fix assert caused by pipeliner serialization
The pipeliner is asserting because the serialization step that 
occurs at the end is deleting an instruction.  The assert
occurs later on because there is a use without a definition.  

The problem occurs when an instruction defines a value used 
by a REQ_SEQUENCE and that value is used by a COPY instruction.
The latencies between these instructions are zero, so they are
put in to the same packet.  The serialization code is unable to
handle this correctly, and ends up putting the REG_SEQUENCE
before its definition.

There is special code in the serialization step that attempts
to handle zero-cost instructions (phis, copy, reg_sequence)
differently than regular instructions. Unfortunately, this means
the order does not come out correct.

This patch simplifies the code by changing the seperate steps for
handling zero-cost and regular instructions. Only phis are
handled separate now, since they should occurs first. Then, this
patch adds checks to make use the MoveUse is set to the smallest
value if there are multiple uses in a cycle.

Patch by Brendon Cahoon.

llvm-svn: 328540
2018-03-26 16:23:29 +00:00
Krzysztof Parzyszek
dac945e988 [Pipeliner] Fix check for order dependences when finalizing instructions
The code in orderDepdences that looks at the order dependences between
instructions was processing all the successor and predecessor order
dependences. However, we really only want to check for an order dependence
for instructions scheduled in the same cycle.

Also, fixed how the pipeliner handles output dependences. An output
dependence is also a potential loop carried dependence. The pipeliner
didn't handle this case properly so an invalid schedule could be created
that allowed an output dependence to be scheduled in the next iteration
at the same cycle.

Patch by Brendon Cahoon.

llvm-svn: 328516
2018-03-26 16:05:55 +00:00
Krzysztof Parzyszek
72c5d3ff50 [Pipeliner] Fix in the pipeliner phi reuse code
When the definition of a phi is used by a phi in the next iteration,
the pipeliner was assuming that the definition is processed first.
Because of the assumption, an incorrect phi name was used. This patch
has a check to see if the phi definition has been processed already.

Patch by Brendon Cahoon.

llvm-svn: 328510
2018-03-26 15:58:16 +00:00
Krzysztof Parzyszek
b1cd0ac92d [Pipeliner] Correctly update memoperands in the epilog
The pipeliner needs to be conservative when updating the memoperands
of instructions in the epilog. Previously, the pipeliner was changing
the offset of the memoperand based upon the scheduling stage. However,
that is incorrect when control flow branches around the kernel code.
The bug enabled a load and store to the same stack offset to be swapped.

This patch fixes the bug by updating the size of the memoperands to be
UINT_MAX. This conservative value means that dependences will be created
between other loads and stores.

Patch by Brendon Cahoon.

llvm-svn: 328508
2018-03-26 15:45:55 +00:00
Krzysztof Parzyszek
0ef1defd7c [Hexagon] Give priority to post-incremementing memory accesses in LSR
llvm-svn: 328506
2018-03-26 15:32:03 +00:00
Krzysztof Parzyszek
403e147450 [Hexagon] Boost profit for word-mask immediates, reduce for others
This avoids unnecessary splitting due to uninteresting immediates.

llvm-svn: 328364
2018-03-23 20:11:00 +00:00
Krzysztof Parzyszek
8bc2fe948c [Hexagon] Fold offset in base+immediate loads/stores
Optimize Ry = add(Rx,#n); memw(Ry+#0) = Rz  =>  memw(Rx,#n) = Rz.

Patch by Jyotsna Verma.

llvm-svn: 328355
2018-03-23 19:30:34 +00:00
Krzysztof Parzyszek
14957cb237 [Hexagon] Always generate mux out of predicated transfers if possible
HexagonGenMux would collapse pairs of predicated transfers if it assumed
that the predicated .new forms cannot be created. Turns out that generating
mux is preferable in almost all cases.
Introduce an option -hexagon-gen-mux-threshold that controls the minimum
distance between the instruction defining the predicate and the later of
the two transfers. If the distance is closer than the threshold, mux will
not be generated. Set the threshold to 0 by default.

llvm-svn: 328346
2018-03-23 18:43:09 +00:00
Krzysztof Parzyszek
d0cbb4efd7 [Hexagon] Avoid early if-conversion for one sided branches
Patch by Anand Kodnani.

llvm-svn: 328344
2018-03-23 18:00:18 +00:00
Krzysztof Parzyszek
693acd58bf [Hexagon] Two fixes in early if-conversion
- Fix checking for vector predicate registers.
- Avoid speculating llvm.lifetime.end intrinsic.

Patch by Harsha Jagasia and Brendon Cahoon.

llvm-svn: 328339
2018-03-23 17:46:09 +00:00
Jun Bum Lim
834ee28da9 [CodeGen] Add a new pass for PostRA sink
Summary:
This pass sinks COPY instructions into a successor block, if the COPY is not
used in the current block and the COPY is live-in to a single successor
(i.e., doesn't require the COPY to be duplicated).  This avoids executing the
the copy on paths where their results aren't needed.  This also exposes
additional opportunites for dead copy elimination and shrink wrapping.

These copies were either not handled by or are inserted after the MachineSink
pass. As an example of the former case, the MachineSink pass cannot sink
COPY instructions with allocatable source registers; for AArch64 these type
of copy instructions are frequently used to move function parameters (PhyReg)
into virtual registers in the entry block..

For the machine IR below, this pass will sink %w19 in the entry into its
successor (%bb.1) because %w19 is only live-in in %bb.1.

```
   %bb.0:
      %wzr = SUBSWri %w1, 1
      %w19 = COPY %w0
      Bcc 11, %bb.2
    %bb.1:
      Live Ins: %w19
      BL @fun
      %w0 = ADDWrr %w0, %w19
      RET %w0
    %bb.2:
      %w0 = COPY %wzr
      RET %w0
```
As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be
able to see %bb.0 as a candidate.

With this change I observed 12% more shrink-wrapping candidate and 13% more dead copies deleted  in spec2000/2006/2017 on AArch64.

Reviewers: qcolombet, MatzeB, thegameg, mcrosier, gberry, hfinkel, john.brawn, twoh, RKSimon, sebpop, kparzysz

Reviewed By: sebpop

Subscribers: evandro, sebpop, sfertile, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41463

llvm-svn: 328237
2018-03-22 20:06:47 +00:00
Krzysztof Parzyszek
e6c7dc68db [Hexagon] Eliminate subregisters from PHI nodes before pipelining
The pipeliner needs to remove instructions from the SlotIndexes
structure when they are deleted. Otherwise, the SlotIndexes map
has stale data, and an assert will occur when adding new
instructions.

This patch also changes the pipeliner to make the back-edge of
a loop carried dependence 1 cycle. The 1 cycle latency is added
to the anti-dependence that represents the back-edge. This
changes eliminates a couple of hacks added to the pipeliner to
handle the latency of the back-edge. It is needed to correctly
pipeline the test case for the sub-register elimination pass.

llvm-svn: 328113
2018-03-21 16:39:11 +00:00
Krzysztof Parzyszek
a0de9e901c [Hexagon] Add a few more lit tests, NFC
llvm-svn: 328023
2018-03-20 19:35:09 +00:00
Krzysztof Parzyszek
9bbcab4539 [Hexagon] Add heuristic to exclude critical path cost for scheduling
Patch by Brendon Cahoon.

llvm-svn: 328022
2018-03-20 19:26:27 +00:00
Krzysztof Parzyszek
2c676c8a0b [Hexagon] Correct the computation of TopReadyCycle and BotReadyCycle of SU
TopReadyCycle and BotReadyCycle were off by one cycle when an SU is either
the first instruction or the last instruction in a packet.

Patch by Ikhlas Ajbar.

llvm-svn: 328000
2018-03-20 17:03:27 +00:00
Krzysztof Parzyszek
8d2b1269ef [Hexagon] Improve scheduling based on register pressure
Patch by Brendon Cahoon.

llvm-svn: 327975
2018-03-20 12:28:43 +00:00
Krzysztof Parzyszek
862ceda9fb [Hexagon] Add REQUIRES: asserts to test/CodeGen/Hexagon/v6vec_inc1.ll
llvm-svn: 327907
2018-03-19 21:05:21 +00:00
Krzysztof Parzyszek
4128f44745 [Hexagon] Add a few more lit tests
llvm-svn: 327884
2018-03-19 19:03:18 +00:00
Krzysztof Parzyszek
f03e30a0ba [Hexagon] Avoid bank conflicts in post-RA scheduler
Avoid scheduling two loads in such a way that they would end up in the
same packet. If there is a load in a packet, try to schedule a non-load
next.

Patch by Brendon Cahoon.

llvm-svn: 327742
2018-03-16 20:55:49 +00:00
Krzysztof Parzyszek
a69e19ba70 [Hexagon] Add lit testcases for atomic intrinsics
Patch by Ben Craig.

llvm-svn: 327737
2018-03-16 20:21:43 +00:00
Krzysztof Parzyszek
071a8678db [Hexagon] Fix zero-extending non-HVX bool vectors
llvm-svn: 327712
2018-03-16 15:03:37 +00:00
Francis Visoiu Mistrih
ef0d5db760 [CodeGen] Use MIR syntax for MachineMemOperand printing
Get rid of the "; mem:" suffix and use the one we use in MIR: ":: (load 2)".

rdar://38163529

Differential Revision: https://reviews.llvm.org/D42377

llvm-svn: 327580
2018-03-14 21:52:13 +00:00
Krzysztof Parzyszek
c8f258d8ec [Hexagon] Fix typo in testcase
llvm-svn: 327310
2018-03-12 18:29:47 +00:00
Krzysztof Parzyszek
d6a132b493 [Hexagon] Counting leading/trailing bits is cheap
llvm-svn: 327308
2018-03-12 18:18:23 +00:00
Krzysztof Parzyszek
030af396d9 [Hexagon] Subtarget feature to emit one instruction per packet
This adds two features: "packets", and "nvj".

Enabling "packets" allows the compiler to generate instruction packets,
while disabling it will prevent it and disable all optimizations that
generate them. This feature is enabled by default on all subtargets.
The feature "nvj" allows the compiler to generate new-value jumps and it
implies "packets". It is enabled on all subtargets.

The exception is made for packets with endloop instructions, since they
require a certain minimum number of instructions in the packets to which
they apply. Disabling "packets" will not prevent hardware loops from
being generated.

llvm-svn: 327302
2018-03-12 17:47:46 +00:00
Krzysztof Parzyszek
e2578f42ed [Hexagon] Add REQUIRES: asserts to testcases that use -stats
llvm-svn: 327281
2018-03-12 15:20:36 +00:00
Krzysztof Parzyszek
40443bbb0a [Hexagon] Add REQUIRES: asserts to testcases that use -debug-only
llvm-svn: 327279
2018-03-12 15:11:16 +00:00
Krzysztof Parzyszek
66abdd815e [Hexagon] Add more lit tests
llvm-svn: 327271
2018-03-12 14:01:28 +00:00
Krzysztof Parzyszek
6f361c131d [Hexagon] Ignore indexed loads when handling unaligned loads
llvm-svn: 327037
2018-03-08 18:15:13 +00:00
Roorda, Jan-Willem
b03f33eb41 [Pipeliner] Fixed node order issue related to zero latency edges
Summary:
A desired property of the node order in Swing Modulo Scheduling is
that for nodes outside circuits the following holds: none of them is
scheduled after both a successor and a predecessor. We call
node orders that meet this property valid.

Although invalid node orders do not lead to the generation of incorrect
code, they can cause the pipeliner not being able to find a pipelined schedule
for arbitrary II. The reason is that after scheduling the successor and the
predecessor of a node, no room may be left to schedule the node itself.

For data flow graphs with 0-latency edges, the node ordering algorithm
of Swing Modulo Scheduling can generate such undesired invalid node orders.
This patch fixes that.

In the remainder of this commit message, I will give an example
demonstrating the issue, explain the fix, and explain how the the fix is tested.

Consider, as an example, the following data flow graph with all
edge latencies 0 and all edges pointing downward.

```
   n0
  /  \
n1    n3
  \  /
   n2
    |
   n4
```

Consider the implemented node order algorithm in top-down mode. In that mode,
the algorithm orders the nodes based on greatest Height and in case of equal
Height on lowest Movability. Finally, in case of equal Height and
Movability, given two nodes with an edge between them, the algorithm prefers
the source-node.

In the graph, for every node, the Height and Movability are equal to 0.
As will be explained below, the algorithm can generate the order n0, n1, n2, n3, n4.
So, node n3 is scheduled after its predecessor n0 and after its successor n2.

The reason that the algorithm can put node n2 in the order before node n3,
even though they have an edge between them in which node n3 is the source,
is the following: Suppose the algorithm has constructed the partial node
order n0, n1. Then, the nodes left to be ordered are nodes n2, n3, and n4. Suppose
that the while-loop in the implemented algorithm considers the nodes in
the order n4, n3, n2. The algorithm will start with node n4, and look for
more preferable nodes. First, node n4 will be compared with node n3. As the nodes
have equal Height and Movability and have no edge between them, the algorithm
will stick with node n4. Then node n4 is compared with node n2. Again the
Height and Movability are equal. But, this time, there is an edge between
the two nodes, and the algorithm will prefer the source node n2.
As there are no nodes left to compare, the algorithm will add node n2 to
the node order, yielding the partial node order n0, n1, n2. In this way node n2
arrives in the node-order before node n3.

To solve this, this patch introduces the ZeroLatencyHeight (ZLH) property
for nodes. It is defined as the maximum unweighted length of a path from the
given node to an arbitrary node in which each edge has latency 0.
So, ZLH(n0)=3, ZLH(n1)=ZLH(n3)=2, ZLH(n2)=1, and ZLH(n4)=0

In this patch, the preference for a greater ZeroLatencyHeight
is added in the top-down mode of the node ordering algorithm, after the
preference for a greater Height, and before the preference for a
lower Movability.

Therefore, the two allowed node-orders are n0, n1, n3, n2, n4 and n0, n3, n1, n2, n4.
Both of them are valid node orders.

In the same way, the bottom-up mode of the node ordering algorithm is adapted
by introducing the ZeroLatencyDepth property for nodes.

The patch is tested by adding extra checks to the following existing
lit-tests:
test/CodeGen/Hexagon/SUnit-boundary-prob.ll
test/CodeGen/Hexagon/frame-offset-overflow.ll
test/CodeGen/Hexagon/vect/vect-shuffle.ll

Before this patch, the pipeliner failed to pipeline the loops in these tests
due to invalid node-orders. After the patch, the pipeliner successfully
pipelines all these loops.

Reviewers: bcahoon

Reviewed By: bcahoon

Subscribers: Ayal, mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D43620

llvm-svn: 326925
2018-03-07 18:53:36 +00:00
Krzysztof Parzyszek
71647c0d9d [Hexagon] Rewrite non-HVX unaligned loads as pairs of aligned ones
This is a follow-up to r325169, this time for all types, not just HVX
vector types.

Disable this by default, since it's not always safe. 

llvm-svn: 326915
2018-03-07 17:27:18 +00:00
Krzysztof Parzyszek
c16e97ea3c [Hexagon] Update more testcases
llvm-svn: 326830
2018-03-06 19:15:58 +00:00
Krzysztof Parzyszek
2790a1e6e3 [Hexagon] Remove {{ *}} from testcases
The spaces in the instructions are now consistent.

llvm-svn: 326829
2018-03-06 19:07:21 +00:00
Krzysztof Parzyszek
b3966a5583 [Hexagon] Generate valignb for shifting shuffles (instead of vdelta)
llvm-svn: 326627
2018-03-02 22:22:19 +00:00
Krzysztof Parzyszek
a3ffcf4b0f [Hexagon] Handle VACOPY in isel lowering
llvm-svn: 326599
2018-03-02 18:35:57 +00:00
Krzysztof Parzyszek
cbd1ba2fe2 [Hexagon] Recognize more sign-extensions as inputs to 32x32-bit multiply
llvm-svn: 326263
2018-02-27 22:44:41 +00:00
Krzysztof Parzyszek
d2c45ee389 [Pipeliner] Drop memrefs instead of creating ones with size UINT64_MAX
Absence of memory operands is treated as "aliasing everything", so
dropping them is sufficient.

Recommit r326256 with a fixed testcase.

llvm-svn: 326262
2018-02-27 22:40:52 +00:00
Krzysztof Parzyszek
32fce8da35 [Hexagon] Add patterns for compares of i1 values
llvm-svn: 326220
2018-02-27 18:31:46 +00:00