1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00
Commit Graph

642 Commits

Author SHA1 Message Date
Krzysztof Parzyszek
56ad41cdd2 [Hexagon] Generate code for vector bswap intrinsics
llvm-svn: 330333
2018-04-19 14:46:44 +00:00
Krzysztof Parzyszek
d11cd46c0c [Hexagon] Add/fix patterns for 32/64-bit vector compares and logical ops
llvm-svn: 330330
2018-04-19 14:24:31 +00:00
Krzysztof Parzyszek
713a9c7a51 [Hexagon] Do not merge initializers for stack and non-stack expressions
Stack addressing needs addressing modes that provide an offset field
immediately following the frame index. An initializer from a non-stack
addressing could force the stack address to use a form that does not
provide an offset field.

llvm-svn: 330191
2018-04-17 15:23:09 +00:00
Galina Kistanova
8342105f58 Disable flaky tests till they get fixed.
llvm-svn: 329763
2018-04-10 22:07:29 +00:00
Krzysztof Parzyszek
971a8e292d [Hexagon] Handle subregisters when calculating iteration count in HW loops
llvm-svn: 329434
2018-04-06 17:51:57 +00:00
Krzysztof Parzyszek
d3874c8694 [Hexagon] Remove unneeded attributes from lit test
llvm-svn: 329078
2018-04-03 16:05:20 +00:00
Krzysztof Parzyszek
df9bb5dabc [Hexagon] Fix testcase
llvm-svn: 328899
2018-03-30 19:46:28 +00:00
Krzysztof Parzyszek
8e14c7cb4e [Hexagon] Avoid creating invalid offsets in packetizer
Two memory instructions with a dependency only on the address register
between the two (the first one of them being post-incrememnt) can be
packetized together after the offset on the second was updated to the
incremement value. Make sure that the new offset is valid for the
instruction.

llvm-svn: 328897
2018-03-30 19:28:37 +00:00
Krzysztof Parzyszek
62f3aefde3 [Hexagon] Fix printing :mem_noshuf on compiler-generated packets
llvm-svn: 328869
2018-03-30 15:09:05 +00:00
Krzysztof Parzyszek
e739d778c4 [Hexagon] Add support to handle bit-reverse load intrinsics
Patch by Sumanth Gundapaneni.

llvm-svn: 328774
2018-03-29 13:52:46 +00:00
Krzysztof Parzyszek
ff0c489bc6 [Hexagon] Add support for "new" circular buffer intrinsics
These instructions have been around for a long time, but we
haven't supported intrinsics for them. The "new" versions use
the CSx register for the start of the buffer instead of the K
field in the Mx register.

We need to use pseudo instructions for these instructions until
after register allocation. The problem is that these instructions
allocate a M0/CS0 or M1/CS1 pair. But, we can't generate code for
the CSx set-up until after register allocation when the Mx
register has been fixed for the instruction.

There is a related clang patch.

Patch by Brendon Cahoon.

llvm-svn: 328724
2018-03-28 19:38:29 +00:00
Krzysztof Parzyszek
d0aedbc0aa [Hexagon] Implement TTI::shouldMaximizeVectorBandwidth
llvm-svn: 328648
2018-03-27 18:10:47 +00:00
Krzysztof Parzyszek
314e3010b6 [Hexagon] Add more lit tests
llvm-svn: 328561
2018-03-26 17:53:48 +00:00
Krzysztof Parzyszek
d764226f07 [Pipeliner] Add missing loop carried dependences
The pipeliner is not adding a dependence edge for a loop carried
dependence, and ends up scheduling a load from iteration n prior
to an aliased store in iteration n-1.

The code that adds the loop carried dependences in the pipeliner
doesn't check if the memory objects for loads and stores are
"identified" (i.e., distinct) objects. If they are not, then the
code that adds the dependences needs to be conservative. The
objects can be used to check dependences only when they are
distinct objects.

The code that checks for loop carried dependences has been updated
to classify loads and stores that are not identified as "unknown"
values. A store with an "unknown" value can potentially create
a loop carried dependence with any pending load.

Patch by Brendon Cahoon.

llvm-svn: 328547
2018-03-26 16:50:11 +00:00
Krzysztof Parzyszek
c575ffd7a5 [Pipeliner] Use latency to compute RecMII
The patch contains severals changes needed to pipeline an example
that was transformed so that a Phi with a subreg is converted to
copies.

The pipeliner wasn't working for a couple of reasons.
- The RecMII was 3 instead of 2 due to the extra copies.
- Copy instructions contained a latency of 1.
- The node order algorithm was not choosing the best "bottom"
node, which caused an instruction to be scheduled that had a 
predecessor and successor already scheduled.
- Updated the Hexagon Machine Scheduler to check if the node is
latency bound when adding the cost for a 0-latency dependence.

The RecMII was 3 because the computation looks at the number of
nodes in the recurrence. The extra copy is an extra node but
it shouldn't increase the latency. The new RecMII computation
looks at the latency of the instructions in the recurrence. We
changed the latency of the dependence of a copy to 0. The latency
computation for the copy also checks the use of the copy (similar
to a reg_sequence).

The node order algorithm was not choosing the last instruction
in the recurrence for a bottom up traversal. This was when the
last instruction is a copy. A check was added when choosing the
instruction to check for NodeNum if the maxASAP is the same. This
means that the scheduler will not end up with another node in
the recurrence that has both a predecessor and successor already
scheduled.

The cost computation in Hexagon Machine Scheduler adds cost when
an instruction can be packetized with a zero-latency instruction.
We should only do this if the schedule is latency bound. 

Patch by Brendon Cahoon.

llvm-svn: 328542
2018-03-26 16:33:16 +00:00
Krzysztof Parzyszek
ef6658d2e0 [Pipeliner] Fix assert caused by pipeliner serialization
The pipeliner is asserting because the serialization step that 
occurs at the end is deleting an instruction.  The assert
occurs later on because there is a use without a definition.  

The problem occurs when an instruction defines a value used 
by a REQ_SEQUENCE and that value is used by a COPY instruction.
The latencies between these instructions are zero, so they are
put in to the same packet.  The serialization code is unable to
handle this correctly, and ends up putting the REG_SEQUENCE
before its definition.

There is special code in the serialization step that attempts
to handle zero-cost instructions (phis, copy, reg_sequence)
differently than regular instructions. Unfortunately, this means
the order does not come out correct.

This patch simplifies the code by changing the seperate steps for
handling zero-cost and regular instructions. Only phis are
handled separate now, since they should occurs first. Then, this
patch adds checks to make use the MoveUse is set to the smallest
value if there are multiple uses in a cycle.

Patch by Brendon Cahoon.

llvm-svn: 328540
2018-03-26 16:23:29 +00:00
Krzysztof Parzyszek
dac945e988 [Pipeliner] Fix check for order dependences when finalizing instructions
The code in orderDepdences that looks at the order dependences between
instructions was processing all the successor and predecessor order
dependences. However, we really only want to check for an order dependence
for instructions scheduled in the same cycle.

Also, fixed how the pipeliner handles output dependences. An output
dependence is also a potential loop carried dependence. The pipeliner
didn't handle this case properly so an invalid schedule could be created
that allowed an output dependence to be scheduled in the next iteration
at the same cycle.

Patch by Brendon Cahoon.

llvm-svn: 328516
2018-03-26 16:05:55 +00:00
Krzysztof Parzyszek
72c5d3ff50 [Pipeliner] Fix in the pipeliner phi reuse code
When the definition of a phi is used by a phi in the next iteration,
the pipeliner was assuming that the definition is processed first.
Because of the assumption, an incorrect phi name was used. This patch
has a check to see if the phi definition has been processed already.

Patch by Brendon Cahoon.

llvm-svn: 328510
2018-03-26 15:58:16 +00:00
Krzysztof Parzyszek
b1cd0ac92d [Pipeliner] Correctly update memoperands in the epilog
The pipeliner needs to be conservative when updating the memoperands
of instructions in the epilog. Previously, the pipeliner was changing
the offset of the memoperand based upon the scheduling stage. However,
that is incorrect when control flow branches around the kernel code.
The bug enabled a load and store to the same stack offset to be swapped.

This patch fixes the bug by updating the size of the memoperands to be
UINT_MAX. This conservative value means that dependences will be created
between other loads and stores.

Patch by Brendon Cahoon.

llvm-svn: 328508
2018-03-26 15:45:55 +00:00
Krzysztof Parzyszek
0ef1defd7c [Hexagon] Give priority to post-incremementing memory accesses in LSR
llvm-svn: 328506
2018-03-26 15:32:03 +00:00
Krzysztof Parzyszek
403e147450 [Hexagon] Boost profit for word-mask immediates, reduce for others
This avoids unnecessary splitting due to uninteresting immediates.

llvm-svn: 328364
2018-03-23 20:11:00 +00:00
Krzysztof Parzyszek
8bc2fe948c [Hexagon] Fold offset in base+immediate loads/stores
Optimize Ry = add(Rx,#n); memw(Ry+#0) = Rz  =>  memw(Rx,#n) = Rz.

Patch by Jyotsna Verma.

llvm-svn: 328355
2018-03-23 19:30:34 +00:00
Krzysztof Parzyszek
14957cb237 [Hexagon] Always generate mux out of predicated transfers if possible
HexagonGenMux would collapse pairs of predicated transfers if it assumed
that the predicated .new forms cannot be created. Turns out that generating
mux is preferable in almost all cases.
Introduce an option -hexagon-gen-mux-threshold that controls the minimum
distance between the instruction defining the predicate and the later of
the two transfers. If the distance is closer than the threshold, mux will
not be generated. Set the threshold to 0 by default.

llvm-svn: 328346
2018-03-23 18:43:09 +00:00
Krzysztof Parzyszek
d0cbb4efd7 [Hexagon] Avoid early if-conversion for one sided branches
Patch by Anand Kodnani.

llvm-svn: 328344
2018-03-23 18:00:18 +00:00
Krzysztof Parzyszek
693acd58bf [Hexagon] Two fixes in early if-conversion
- Fix checking for vector predicate registers.
- Avoid speculating llvm.lifetime.end intrinsic.

Patch by Harsha Jagasia and Brendon Cahoon.

llvm-svn: 328339
2018-03-23 17:46:09 +00:00
Jun Bum Lim
834ee28da9 [CodeGen] Add a new pass for PostRA sink
Summary:
This pass sinks COPY instructions into a successor block, if the COPY is not
used in the current block and the COPY is live-in to a single successor
(i.e., doesn't require the COPY to be duplicated).  This avoids executing the
the copy on paths where their results aren't needed.  This also exposes
additional opportunites for dead copy elimination and shrink wrapping.

These copies were either not handled by or are inserted after the MachineSink
pass. As an example of the former case, the MachineSink pass cannot sink
COPY instructions with allocatable source registers; for AArch64 these type
of copy instructions are frequently used to move function parameters (PhyReg)
into virtual registers in the entry block..

For the machine IR below, this pass will sink %w19 in the entry into its
successor (%bb.1) because %w19 is only live-in in %bb.1.

```
   %bb.0:
      %wzr = SUBSWri %w1, 1
      %w19 = COPY %w0
      Bcc 11, %bb.2
    %bb.1:
      Live Ins: %w19
      BL @fun
      %w0 = ADDWrr %w0, %w19
      RET %w0
    %bb.2:
      %w0 = COPY %wzr
      RET %w0
```
As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be
able to see %bb.0 as a candidate.

With this change I observed 12% more shrink-wrapping candidate and 13% more dead copies deleted  in spec2000/2006/2017 on AArch64.

Reviewers: qcolombet, MatzeB, thegameg, mcrosier, gberry, hfinkel, john.brawn, twoh, RKSimon, sebpop, kparzysz

Reviewed By: sebpop

Subscribers: evandro, sebpop, sfertile, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41463

llvm-svn: 328237
2018-03-22 20:06:47 +00:00
Krzysztof Parzyszek
e6c7dc68db [Hexagon] Eliminate subregisters from PHI nodes before pipelining
The pipeliner needs to remove instructions from the SlotIndexes
structure when they are deleted. Otherwise, the SlotIndexes map
has stale data, and an assert will occur when adding new
instructions.

This patch also changes the pipeliner to make the back-edge of
a loop carried dependence 1 cycle. The 1 cycle latency is added
to the anti-dependence that represents the back-edge. This
changes eliminates a couple of hacks added to the pipeliner to
handle the latency of the back-edge. It is needed to correctly
pipeline the test case for the sub-register elimination pass.

llvm-svn: 328113
2018-03-21 16:39:11 +00:00
Krzysztof Parzyszek
a0de9e901c [Hexagon] Add a few more lit tests, NFC
llvm-svn: 328023
2018-03-20 19:35:09 +00:00
Krzysztof Parzyszek
9bbcab4539 [Hexagon] Add heuristic to exclude critical path cost for scheduling
Patch by Brendon Cahoon.

llvm-svn: 328022
2018-03-20 19:26:27 +00:00
Krzysztof Parzyszek
2c676c8a0b [Hexagon] Correct the computation of TopReadyCycle and BotReadyCycle of SU
TopReadyCycle and BotReadyCycle were off by one cycle when an SU is either
the first instruction or the last instruction in a packet.

Patch by Ikhlas Ajbar.

llvm-svn: 328000
2018-03-20 17:03:27 +00:00
Krzysztof Parzyszek
8d2b1269ef [Hexagon] Improve scheduling based on register pressure
Patch by Brendon Cahoon.

llvm-svn: 327975
2018-03-20 12:28:43 +00:00
Krzysztof Parzyszek
862ceda9fb [Hexagon] Add REQUIRES: asserts to test/CodeGen/Hexagon/v6vec_inc1.ll
llvm-svn: 327907
2018-03-19 21:05:21 +00:00
Krzysztof Parzyszek
4128f44745 [Hexagon] Add a few more lit tests
llvm-svn: 327884
2018-03-19 19:03:18 +00:00
Krzysztof Parzyszek
f03e30a0ba [Hexagon] Avoid bank conflicts in post-RA scheduler
Avoid scheduling two loads in such a way that they would end up in the
same packet. If there is a load in a packet, try to schedule a non-load
next.

Patch by Brendon Cahoon.

llvm-svn: 327742
2018-03-16 20:55:49 +00:00
Krzysztof Parzyszek
a69e19ba70 [Hexagon] Add lit testcases for atomic intrinsics
Patch by Ben Craig.

llvm-svn: 327737
2018-03-16 20:21:43 +00:00
Krzysztof Parzyszek
071a8678db [Hexagon] Fix zero-extending non-HVX bool vectors
llvm-svn: 327712
2018-03-16 15:03:37 +00:00
Francis Visoiu Mistrih
ef0d5db760 [CodeGen] Use MIR syntax for MachineMemOperand printing
Get rid of the "; mem:" suffix and use the one we use in MIR: ":: (load 2)".

rdar://38163529

Differential Revision: https://reviews.llvm.org/D42377

llvm-svn: 327580
2018-03-14 21:52:13 +00:00
Krzysztof Parzyszek
c8f258d8ec [Hexagon] Fix typo in testcase
llvm-svn: 327310
2018-03-12 18:29:47 +00:00
Krzysztof Parzyszek
d6a132b493 [Hexagon] Counting leading/trailing bits is cheap
llvm-svn: 327308
2018-03-12 18:18:23 +00:00
Krzysztof Parzyszek
030af396d9 [Hexagon] Subtarget feature to emit one instruction per packet
This adds two features: "packets", and "nvj".

Enabling "packets" allows the compiler to generate instruction packets,
while disabling it will prevent it and disable all optimizations that
generate them. This feature is enabled by default on all subtargets.
The feature "nvj" allows the compiler to generate new-value jumps and it
implies "packets". It is enabled on all subtargets.

The exception is made for packets with endloop instructions, since they
require a certain minimum number of instructions in the packets to which
they apply. Disabling "packets" will not prevent hardware loops from
being generated.

llvm-svn: 327302
2018-03-12 17:47:46 +00:00
Krzysztof Parzyszek
e2578f42ed [Hexagon] Add REQUIRES: asserts to testcases that use -stats
llvm-svn: 327281
2018-03-12 15:20:36 +00:00
Krzysztof Parzyszek
40443bbb0a [Hexagon] Add REQUIRES: asserts to testcases that use -debug-only
llvm-svn: 327279
2018-03-12 15:11:16 +00:00
Krzysztof Parzyszek
66abdd815e [Hexagon] Add more lit tests
llvm-svn: 327271
2018-03-12 14:01:28 +00:00
Krzysztof Parzyszek
6f361c131d [Hexagon] Ignore indexed loads when handling unaligned loads
llvm-svn: 327037
2018-03-08 18:15:13 +00:00
Roorda, Jan-Willem
b03f33eb41 [Pipeliner] Fixed node order issue related to zero latency edges
Summary:
A desired property of the node order in Swing Modulo Scheduling is
that for nodes outside circuits the following holds: none of them is
scheduled after both a successor and a predecessor. We call
node orders that meet this property valid.

Although invalid node orders do not lead to the generation of incorrect
code, they can cause the pipeliner not being able to find a pipelined schedule
for arbitrary II. The reason is that after scheduling the successor and the
predecessor of a node, no room may be left to schedule the node itself.

For data flow graphs with 0-latency edges, the node ordering algorithm
of Swing Modulo Scheduling can generate such undesired invalid node orders.
This patch fixes that.

In the remainder of this commit message, I will give an example
demonstrating the issue, explain the fix, and explain how the the fix is tested.

Consider, as an example, the following data flow graph with all
edge latencies 0 and all edges pointing downward.

```
   n0
  /  \
n1    n3
  \  /
   n2
    |
   n4
```

Consider the implemented node order algorithm in top-down mode. In that mode,
the algorithm orders the nodes based on greatest Height and in case of equal
Height on lowest Movability. Finally, in case of equal Height and
Movability, given two nodes with an edge between them, the algorithm prefers
the source-node.

In the graph, for every node, the Height and Movability are equal to 0.
As will be explained below, the algorithm can generate the order n0, n1, n2, n3, n4.
So, node n3 is scheduled after its predecessor n0 and after its successor n2.

The reason that the algorithm can put node n2 in the order before node n3,
even though they have an edge between them in which node n3 is the source,
is the following: Suppose the algorithm has constructed the partial node
order n0, n1. Then, the nodes left to be ordered are nodes n2, n3, and n4. Suppose
that the while-loop in the implemented algorithm considers the nodes in
the order n4, n3, n2. The algorithm will start with node n4, and look for
more preferable nodes. First, node n4 will be compared with node n3. As the nodes
have equal Height and Movability and have no edge between them, the algorithm
will stick with node n4. Then node n4 is compared with node n2. Again the
Height and Movability are equal. But, this time, there is an edge between
the two nodes, and the algorithm will prefer the source node n2.
As there are no nodes left to compare, the algorithm will add node n2 to
the node order, yielding the partial node order n0, n1, n2. In this way node n2
arrives in the node-order before node n3.

To solve this, this patch introduces the ZeroLatencyHeight (ZLH) property
for nodes. It is defined as the maximum unweighted length of a path from the
given node to an arbitrary node in which each edge has latency 0.
So, ZLH(n0)=3, ZLH(n1)=ZLH(n3)=2, ZLH(n2)=1, and ZLH(n4)=0

In this patch, the preference for a greater ZeroLatencyHeight
is added in the top-down mode of the node ordering algorithm, after the
preference for a greater Height, and before the preference for a
lower Movability.

Therefore, the two allowed node-orders are n0, n1, n3, n2, n4 and n0, n3, n1, n2, n4.
Both of them are valid node orders.

In the same way, the bottom-up mode of the node ordering algorithm is adapted
by introducing the ZeroLatencyDepth property for nodes.

The patch is tested by adding extra checks to the following existing
lit-tests:
test/CodeGen/Hexagon/SUnit-boundary-prob.ll
test/CodeGen/Hexagon/frame-offset-overflow.ll
test/CodeGen/Hexagon/vect/vect-shuffle.ll

Before this patch, the pipeliner failed to pipeline the loops in these tests
due to invalid node-orders. After the patch, the pipeliner successfully
pipelines all these loops.

Reviewers: bcahoon

Reviewed By: bcahoon

Subscribers: Ayal, mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D43620

llvm-svn: 326925
2018-03-07 18:53:36 +00:00
Krzysztof Parzyszek
71647c0d9d [Hexagon] Rewrite non-HVX unaligned loads as pairs of aligned ones
This is a follow-up to r325169, this time for all types, not just HVX
vector types.

Disable this by default, since it's not always safe. 

llvm-svn: 326915
2018-03-07 17:27:18 +00:00
Krzysztof Parzyszek
c16e97ea3c [Hexagon] Update more testcases
llvm-svn: 326830
2018-03-06 19:15:58 +00:00
Krzysztof Parzyszek
2790a1e6e3 [Hexagon] Remove {{ *}} from testcases
The spaces in the instructions are now consistent.

llvm-svn: 326829
2018-03-06 19:07:21 +00:00
Krzysztof Parzyszek
b3966a5583 [Hexagon] Generate valignb for shifting shuffles (instead of vdelta)
llvm-svn: 326627
2018-03-02 22:22:19 +00:00
Krzysztof Parzyszek
a3ffcf4b0f [Hexagon] Handle VACOPY in isel lowering
llvm-svn: 326599
2018-03-02 18:35:57 +00:00