mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00
Commit Graph

398 Commits

Author SHA1 Message Date
Ulrich Weigand
461d4b06db [SystemZ] Fix truncstore + bswap codegen bug
SystemZTargetLowering::combineSTORE contains code to transform a
combination of STORE + BSWAP into a STRV type instruction.

This transformation is correct for regular stores, but not for
truncating stores.  The routine neglected to check for that case.

Fixes a miscompilation of llvm-objcopy with clang, which caused
test suite failures in the SystemZ multistage build bot.
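
As a rough illustration (not taken from the actual failure), source of the
following shape can end up as a truncating store of a BSWAP node; applying
the full-width STRV transform to it would store the wrong bytes:

  #include <cstdint>

  // Hypothetical pattern: byte-swap a 64-bit value, but store only the
  // low 32 bits.  This lowers to a truncating store of the BSWAP result,
  // which must not be turned into a 64-bit byte-reversing store.
  void store_low_half_of_bswap(uint64_t v, uint32_t *dst) {
    uint64_t swapped = __builtin_bswap64(v);
    *dst = static_cast<uint32_t>(swapped);
  }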

llvm-svn: 313669
2017-09-19 20:50:05 +00:00
NAKAMURA Takumi
0839b68eae Move llvm/test/CodeGen/X86/clear-liverange-spillreg.mir to SystemZ. It was in the wrong place.
llvm-svn: 313218
2017-09-14 00:03:23 +00:00
Jonas Paulsson
ed6377b66d [SystemZ] Add the CoveredBySubRegs bit to GPR64, GPR128 and FPR128 registers.
This bit is needed in order for the CalleeSavedRegs list to automatically
include the super registers if all of their subregs are present.

Thanks to Wei Mi for initially indicating this deficiency in the SystemZ
backend.

Review: Ulrich Weigand.
https://bugs.llvm.org/show_bug.cgi?id=34550

llvm-svn: 313023
2017-09-12 12:11:29 +00:00
Jonas Paulsson
8fe5ccd5d8 [SystemZ, MachineScheduler] Improve post-RA scheduling.
The idea of this patch is to continue the scheduler state over an MBB boundary
in the case where the successor block has only one predecessor. This means
that the scheduler will continue in the successor block (after emitting any
branch instructions) with e.g. maintained processor resource counters.
Benchmarks have been confirmed to benefit from this.

The algorithm in MachineScheduler.cpp that extracts scheduling regions of an
MBB has been extended so that the strategy may optionally reverse the order
of processing the regions themselves. This is controlled by a new method
doMBBSchedRegionsTopDown(), which defaults to false.

Handling the top-most region of an MBB first also means that a top-down
scheduler can continue the scheduler state across any scheduling boundary
between two regions inside the MBB.
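
A minimal sketch of how a scheduling strategy could opt in to the new hook
(the class name is hypothetical; the hook itself is the one described above):

  #include "llvm/CodeGen/MachineScheduler.h"
  using namespace llvm;

  // Hypothetical strategy: ask the driver to process the regions of each
  // MBB top-down so scheduler state can be carried forward across regions.
  class TopDownRegionsPostRAStrategy : public PostGenericScheduler {
  public:
    explicit TopDownRegionsPostRAStrategy(const MachineSchedContext *C)
        : PostGenericScheduler(C) {}
    bool doMBBSchedRegionsTopDown() const override { return true; }
  };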

Review: Ulrich Weigand, Matthias Braun, Andy Trick.
https://reviews.llvm.org/D35053

llvm-svn: 311072
2017-08-17 08:33:44 +00:00
Mikael Holmen
11cbc372b6 [IfConversion] Maintain the CFG when predicating/merging blocks in IfConvert*
Summary:
This fixes PR32721 in IfConvertTriangle and possible similar problems in
IfConvertSimple, IfConvertDiamond and IfConvertForkedDiamond.

In PR32721 we had a triangle

   EBB
   | \
   |  |
   | TBB
   |  /
   FBB

where FBB didn't have any successors at all since it ended with an
unconditional return. Then TBB and FBB were merged into EBB, but EBB
would still keep its successors, and the use of analyzeBranch and
CorrectExtraCFGEdges wouldn't help to remove them since the return
instruction is not analyzable (at least not on ARM).

The edge updating code and branch probability updating code are now pushed
into MergeBlocks(), which allows us to share the same update logic between
more callsites. This lets us remove several dependencies on analyzeBranch
and completely eliminate RemoveExtraEdges.

One thing that showed up with this patch was that IfConversion sometimes
left a successor with 0% probability even if there was no branch or
fallthrough to the successor.

One such example comes from the test case ifcvt_bad_zero_prob_succ.mir. The
indirect branch tBRIND can only jump to bb.1, but without the patch we
got:

  bb.0:
    successors: %bb.1(0x80000000)

  bb.1:
    successors: %bb.1(0x80000000), %bb.2(0x00000000)
    tBRIND %r1, 1, %cpsr
    B %bb.1

  bb.2:

There is no way to jump from bb.1 to bb.2, but there is still a 0% edge
from bb.1 to bb.2.

With the patch applied we instead get the expected:

  bb.0:
    successors: %bb.1(0x80000000)

  bb.1:
    successors: %bb.1(0x80000000)
    tBRIND %r1, 1, %cpsr
    B %bb.1

Since bb.2 had no predecessor at all, it was removed.

Several testcases had to be updated due to this since the removed
successor made the "Branch Probability Basic Block Placement" pass
sometimes place blocks in a different order.

Finally added a couple of new test cases:

* PR32721_ifcvt_triangle_unanalyzable.mir:
  Regression test for the original problem described in PR 32721.

* ifcvt_triangleWoCvtToNextEdge.mir:
  Regression test for problem that caused a revert of my first attempt
  to solve PR 32721.

* ifcvt_simple_bad_zero_prob_succ.mir:
  Test case showing the problem where a wrong successor with 0% probability
  was previously left.

* ifcvt_[diamond|forked_diamond|simple]_unanalyzable.mir
  Very simple test cases for the simple and (forked) diamond cases
  involving unanalyzable branches that can be nice to have as a base if
  wanting to write more complicated tests.

Reviewers: iteratee, MatzeB, grosser, kparzysz

Reviewed By: kparzysz

Subscribers: kbarton, davide, aemerson, nemanjai, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34099

llvm-svn: 310697
2017-08-11 06:57:08 +00:00
Evgeny Stupachenko
8c60bbb9d2 Reapply fix PR23384 (part 3 of 3) r304824 (was reverted in r305720).
The root cause of reverting was fixed - PR33514.

Summary:
The patch makes instruction count the highest priority in the
 LSR solution for X86 (previously, registers had the highest priority).
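
A simplified, standalone sketch of the new priority order (the struct is a
stand-in for TargetTransformInfo::LSRCost; field names are illustrative):

  #include <tuple>

  // Stand-in for TargetTransformInfo::LSRCost.
  struct LSRCostSketch {
    unsigned Insns, NumRegs, AddRecCost, NumIVMuls,
             NumBaseAdds, ScaleCost, ImmCost, SetupCost;
  };

  // X86-style comparison: instruction count is compared before the
  // register count and the remaining cost components.
  bool isLSRCostLessSketch(const LSRCostSketch &C1, const LSRCostSketch &C2) {
    return std::tie(C1.Insns, C1.NumRegs, C1.AddRecCost, C1.NumIVMuls,
                    C1.NumBaseAdds, C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
           std::tie(C2.Insns, C2.NumRegs, C2.AddRecCost, C2.NumIVMuls,
                    C2.NumBaseAdds, C2.ScaleCost, C2.ImmCost, C2.SetupCost);
  }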

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D30562

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 310289
2017-08-07 19:56:34 +00:00
Ulrich Weigand
9ee7a9a74c [SystemZ] Add support for 128-bit atomic load/store/cmpxchg
This adds support for the main 128-bit atomic operations,
using the SystemZ instructions LPQ, STPQ, and CDSG.

Generating these instructions is a bit more complex than usual
since the i128 type is not legal for the back-end.  Therefore,
we have to hook the LowerOperationWrapper and ReplaceNodeResults
TargetLowering callbacks.
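
The kind of source that exercises these operations looks like the sketch
below (whether std::atomic<__int128> is lock-free depends on the compiler
and target settings; this is only an illustration):

  #include <atomic>

  std::atomic<__int128> Counter;

  // Lock-free 128-bit atomics on SystemZ can use LPQ for the load and a
  // CDSG loop for the compare-and-swap.
  __int128 add_and_fetch_old(__int128 Delta) {
    __int128 Old = Counter.load(std::memory_order_acquire);
    while (!Counter.compare_exchange_weak(Old, Old + Delta))
      ;
    return Old;
  }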

llvm-svn: 310094
2017-08-04 18:57:58 +00:00
Ulrich Weigand
69fde521b5 [SystemZ] Eliminate unnecessary serialization operations
We currently emit a serialization operation (bcr 14, 0) before every
atomic load and after every atomic store.  This is overly conservative.
The SystemZ architecture actually does not require any serialization
for atomic loads, and requires a serialization after an atomic store only
if we need to enforce sequential consistency.  This is what other compilers
for the platform implement as well.
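
In C++ terms, only the last of the three operations below still needs a
trailing bcr 14,0 under the new scheme (illustrative example):

  #include <atomic>

  std::atomic<int> Flag;

  int load_acquire()        { return Flag.load(std::memory_order_acquire); } // no fence
  void store_release(int v) { Flag.store(v, std::memory_order_release); }    // no fence
  void store_seq_cst(int v) { Flag.store(v, std::memory_order_seq_cst); }    // bcr 14,0 after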

llvm-svn: 310093
2017-08-04 18:53:35 +00:00
Jonas Paulsson
e780323665 [SystemZ] test update
test/CodeGen/SystemZ/loop-01.ll was incorrectly updated by r308729.

llvm-svn: 308736
2017-07-21 13:14:17 +00:00
Jonas Paulsson
c38a4eb7d4 [SystemZ, LoopStrengthReduce]
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.

In order to achieve this, the following common code changes were made:

 * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls
 whether LSR should do instruction-based addressing evaluations by calling
 isLegalAddressingMode() with the Instruction pointers (see the sketch after
 these lists).
 * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
 as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
 not just loads or stores.

SystemZ changes:

 * isLSRCostLess() implemented with Insns first, and without ImmCost.
 * New function supportedAddressingMode() that is a helper for TTI methods
 looking at Instructions passed via pointers.
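
A simplified, standalone sketch of the hook's shape (class names are
stand-ins, not the real TTI plumbing):

  // Stand-in for the TTI default: instruction-based queries are off.
  class TTISketchBase {
  public:
    virtual ~TTISketchBase() = default;
    virtual bool LSRWithInstrQueries() const { return false; }
  };

  // SystemZ-style override: let LSR pass the memory Instruction* to
  // isLegalAddressingMode() so addressing modes are judged per instruction.
  class SystemZTTISketch : public TTISketchBase {
  public:
    bool LSRWithInstrQueries() const override { return true; }
  };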

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262
https://reviews.llvm.org/D35049

llvm-svn: 308729
2017-07-21 11:59:37 +00:00
Ulrich Weigand
bc658bf60a [SystemZ] Add support for IBM z14 processor (3/3)
This adds support for the new 128-bit vector float instructions of z14.
Note that these instructions actually only operate on the f128 type,
since each 128-bit vector register can hold only one 128-bit
float value.  However, this is still preferable to the legacy 128-bit
float instructions, since those operate on pairs of floating-point
registers (so we can hold at most 8 values in registers), while the
new instructions use single vector registers (so we can hold up to 32
values in registers).

Adding support includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions.  This includes allocating the f128
  type now to the VR128BitRegClass instead of FP128BitRegClass.
- Scheduler description support for the instructions.

Note that for a small number of operations, we have no new vector
instructions (like integer <-> 128-bit float conversions), and so
we use the legacy instruction and then reformat the operand
(i.e. copy between a pair of floating-point registers and a
vector register).

llvm-svn: 308196
2017-07-17 17:44:20 +00:00
Ulrich Weigand
ee0d5cd952 [SystemZ] Add support for IBM z14 processor (2/3)
This adds support for the new 32-bit vector float instructions of z14.
This includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions, including new LLVM intrinsics.
- Scheduler description support for the instructions.
- Update to the vector cost function calculations.

In general, CodeGen support for the new v4f32 instructions closely
matches support for the existing v2f64 instructions.

llvm-svn: 308195
2017-07-17 17:42:48 +00:00
Ulrich Weigand
5f15092063 [SystemZ] Add support for IBM z14 processor (1/3)
This patch series adds support for the IBM z14 processor.  This part includes:
- Basic support for the new processor and its features.
- Support for new instructions (except vector 32-bit float and 128-bit float).
- CodeGen for new instructions, including new LLVM intrinsics.
- Scheduler description for the new processor.
- Detection of z14 as host processor.

Support for the new 32-bit vector float and 128-bit vector float
instructions is provided by separate patches.

llvm-svn: 308194
2017-07-17 17:41:11 +00:00
Quentin Colombet
2d20120a2d [RegAllocFast] Don't insert kill flags of super-register for partial kill
When reusing a register for a new definition, the fast register allocator
used to insert a kill flag at the previous last use of that register to
inform later passes that this register is free between the redef and the
last use. However, this may be wrong when subregisters are involved.
Indeed, a partial redef would have triggered a kill of the full super
register, potentially wrongly marking all the other subregisters as
free. Given that we don't track which lanes are still live, we cannot set
the kill flag in such a case.

Note: This bug has been latent for about 7 years (r104056).

llvm.org/PR33677

llvm-svn: 307428
2017-07-07 19:25:45 +00:00
Ulrich Weigand
8dc10b2d40 [SystemZ] Fix missing emergency spill slot corner case
We sometimes need emergency spill slots for the register scavenger.
This may be the case when code needs to access a stack slot that
has an offset of 4096 or more relative to the stack pointer.

To make that determination, processFunctionBeforeFrameFinalized
currently simply checks the total stack frame size of the current
function.  But this is not enough, since code may need to access
stack slots in the caller's stack frame as well, in particular
incoming arguments stored on the stack.

This commit fixes the problem by taking argument slots into account.
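
A heavily simplified sketch of the decision (names and the helper are
illustrative, not the actual LLVM code):

  #include <cstdint>

  // SystemZ short-form load/store displacements are limited to 0..4095.
  constexpr uint64_t kMaxUnsignedDisplacement = 4096;

  // The fix: consider not only the local frame, but also the incoming
  // argument area in the caller's frame that the function may address.
  bool needsEmergencySpillSlots(uint64_t LocalFrameSize,
                                uint64_t IncomingArgAreaSize) {
    return LocalFrameSize + IncomingArgAreaSize >= kMaxUnsignedDisplacement;
  }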

llvm-svn: 306305
2017-06-26 16:50:32 +00:00
Jonas Paulsson
12cce93ec7 [SystemZ] Add a check against zero before calling getTestUnderMaskCond()
Csmith discovered that this function can be called with a zero argument,
in which case an assert for this was triggered.

This patch also adds a guard before the other call to this function since
it was missing, although the test only covers the case where it was
discovered.

Reduced test case attached as CodeGen/SystemZ/int-cmp-54.ll.

Review: Ulrich Weigand
llvm-svn: 306287
2017-06-26 13:38:27 +00:00
Ulrich Weigand
c7437c5490 [SystemZ] Remove unnecessary serialization before volatile loads
This reverts the use of TargetLowering::prepareVolatileOrAtomicLoad
introduced by r196905.  Nothing in the semantics of the "volatile"
keyword or the definition of the z/Architecture actually requires
that volatile loads are preceded by a serialization operation, and
no other compiler on the platform actually implements this.

Since we've now seen a use case where this additional serialization
causes noticeable performance degradation, this patch removes it.

The patch still leaves in the serialization before atomic loads,
which is now implemented directly in lowerATOMIC_LOAD.  (This also
seems overkill, but that can be addressed separately.)

llvm-svn: 306117
2017-06-23 15:56:14 +00:00
Jonas Paulsson
13d4242b40 [SystemZ] Fix trap issue and enable expensive checks.
The isBarrier/isTerminator flags have been removed from the SystemZ trap
instructions, so that tests do not fail with EXPENSIVE_CHECKS. This was just
an issue at -O0 and did not affect code output on benchmarks.

(As Eli pointed out: "targets are split over whether they consider their
"trap" a terminator; x86, AArch64, and NVPTX don't, but ARM, MIPS, PPC, and
SystemZ do. We should probably try to be consistent here.". This is still the
case, although SystemZ has switched sides).

SystemZ now returns true in isMachineVerifierClean() :-)

These Generic tests have been modified so that they can be run with or without
EXPENSIVE_CHECKS: CodeGen/Generic/llc-start-stop.ll and
CodeGen/Generic/print-machineinstrs.ll

Review: Ulrich Weigand, Simon Pilgrim, Eli Friedman
https://bugs.llvm.org/show_bug.cgi?id=33047
https://reviews.llvm.org/D34143

llvm-svn: 306106
2017-06-23 14:30:46 +00:00
Geoff Berry
6ea2cff39a [SelectionDAG] Allow sin/cos -> sincos optimization on GNU triples w/ just -fno-math-errno
Summary:
This change enables the sin(x) cos(x) -> sincos(x) optimization on GNU
target triples.  This optimization was being inhibited when -ffast-math
wasn't set because sincos in GLibC does not set errno, while sin and cos
do.  However, this optimization will only run if the attributes on the
sin/cos calls include readnone, which is how clang represents the fact
that it doesn't care about the errno values set by these functions (via
the -fno-math-errno flag).
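
For example, built with -fno-math-errno (so the calls carry readnone), the
two libcalls below can now be combined into a single sincos call on a GNU
target:

  #include <cmath>

  double sin_plus_cos(double x) {
    return std::sin(x) + std::cos(x);  // eligible for the sincos libcall
  }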

Reviewers: hfinkel, bogner

Subscribers: mcrosier, javed.absar, llvm-commits, paul.redmond

Differential Revision: https://reviews.llvm.org/D32921

llvm-svn: 305204
2017-06-12 17:15:41 +00:00
Quentin Colombet
45dde84d36 [SystemZ] Simplify test case. NFC
Remove useless successors information.

llvm-svn: 304615
2017-06-02 23:40:58 +00:00
Quentin Colombet
a3456a760c [RABasic] Properly update the LiveRegMatrix when LR splitting occur
Prior to this patch we used to not touch the LiveRegMatrix while doing
live-range splitting. In other words, when live-range splitting was
occurring, the LiveRegMatrix was not reflecting the changes.
This is generally fine because it means the query to the LiveRegMatrix
will be conservatively correct. However, when decisions are taken based on
what is going to happen on the interferences (e.g., when we spill a
register and know that it is going to be available for another one), we
might hit an assertion that the color used for the assignment is still
in use.

This patch makes sure the changes on the live-ranges are properly
reflected in the LiveRegMatrix, so the assertions don't break.
An alternative could have been to remove the assertion, but it would
make the invariants of the code and the general reasoning more
complicated in my opinion.

http://llvm.org/PR33057

llvm-svn: 304603
2017-06-02 22:46:31 +00:00
Nirav Dave
3633380341 Elide stores which are overwritten without being observed.
Summary:
In SelectionDAG, when a store is immediately chained to another store
to the same address, elide the first store as it has no observable
effects. This causes small improvements when dealing with intrinsics
lowered to stores.
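
A minimal illustration of the pattern (at -O2 the IR-level optimizer usually
removes such stores already; the SDAG change mainly helps stores that are
only created during lowering, e.g. for va_start):

  void overwrite(int *p) {
    *p = 1;  // immediately overwritten on the same chain: may be elided
    *p = 2;  // the only observable store
  }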

Test notes:

* Many testcases overwrite store addresses multiple times and needed
  minor changes, mainly making stores volatile to prevent the
  optimization from optimizing the test away.

* Many X86 test cases optimized out instructions associated with
  va_start.

* Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has
  dependencies to check and can probably be removed and potentially
  replaced with another test.

Reviewers: rnk, john.brawn

Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33206

llvm-svn: 303198
2017-05-16 19:43:56 +00:00
Jonas Paulsson
9727d7a8ef Handle a COPY with undef source operand in LowerCopy()
Llvm-stress discovered that a COPY may end up in ExpandPostRA::LowerCopy()
with an undef source operand. It is not possible for the target to handle
this, as this flag is not passed to TII->copyPhysReg().

This patch solves this by treating such a COPY as an identity COPY.

Review: Matthias Braun
https://reviews.llvm.org/D32892

llvm-svn: 302877
2017-05-12 06:32:03 +00:00
Jonas Paulsson
d401c2a29b [SystemZ] Implement getRepRegClassFor()
This method must return a valid register class, or the list-ilp isel
scheduler will crash. For MVT::Untyped nullptr was previously returned, but
now ADDR128BitRegClass is returned instead. This is needed just as long as
list-ilp (and probably also list-hybrid) is still there.

Review: Ulrich Weigand, A Trick
https://reviews.llvm.org/D32802

llvm-svn: 302649
2017-05-10 13:03:25 +00:00
Jonas Paulsson
466bbd4878 [SystemZ] Make copyPhysReg() add impl-use operands of super reg.
When a 128 bit COPY is lowered into two instructions, an impl-use operand of
the super-reg should be added to each new instruction in case one of the
sub-regs is undefined.

Review: Ulrich Weigand
llvm-svn: 302146
2017-05-04 13:33:30 +00:00
Jonas Paulsson
a537a87015 [SystemZ] Update kill-flag in splitMove().
EarlierMI needs to clear the kill flag on the first operand in case of a store.

Review: Ulrich Weigand
llvm-svn: 301177
2017-04-24 12:40:28 +00:00
Matt Arsenault
e0ead8d1f3 Add address space mangling to lifetime intrinsics
In preparation for allowing allocas to have non-0 addrspace.

llvm-svn: 299876
2017-04-10 20:18:21 +00:00
Jonas Paulsson
dd74c435b6 [SystemZ] Check for presence of vector support in SystemZISelLowering
A test case was found with llvm-stress that caused DAGCombiner to crash
when compiling for an older subtarget without vector support.

SystemZTargetLowering::combineTruncateExtract() should do nothing for older
subtargets.

This check was placed in canTreatAsByteVector(), which also helps in a few
other places.

Review: Ulrich Weigand
llvm-svn: 299763
2017-04-07 12:35:11 +00:00
Nirav Dave
7ec57298c0 [SystemZ] Prevent Merging Bitcast with non-normal loads
Fixes PR32505.

Reviewers: uweigand, jonpa

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31609

llvm-svn: 299552
2017-04-05 15:42:48 +00:00
Jonas Paulsson
168f955a23 [DAGCombiner] Don't make a BUILD_VECTOR with operands of illegal type.
When DAGCombiner visits a SIGN_EXTEND_INREG of a BUILD_VECTOR with
constant operands, a new BUILD_VECTOR node will be created with
transformed constants.

Llvm-stress found a case where the new BUILD_VECTOR had constant operands
of an illegal type, because the (legal) element type is in fact not a legal
scalar type.

This patch changes this so that the new BUILD_VECTOR has the same operand
type as the old one.

Review: Eli Friedman, Nirav Dave
https://bugs.llvm.org//show_bug.cgi?id=32422

llvm-svn: 299540
2017-04-05 13:45:37 +00:00
Jonas Paulsson
de98d3678e [SystemZ] Make sure of correct regclasses in insertSelect()
Since LOCR only accepts GR32 virtual registers, its operands must be copied
into this regclass in insertSelect(), when an LOCR is built. Otherwise, the
case where the source operand was GRX32 will produce invalid IR.

Review: Ulrich Weigand
llvm-svn: 299220
2017-03-31 14:06:59 +00:00
Jonas Paulsson
85e07f410f [SystemZ] Skip DAGCombining of vector node for older subtargets.
Even on older subtargets that lack vector support, there may be vector values
with just one element in the input program. These are converted during DAG
legalization to scalar values.

The pre-legalize SystemZ DAGCombiner methods should in this circumstance not
touch these nodes. This patch adds a check for this in
SystemZTargetLowering::combineEXTRACT_VECTOR_ELT().

Review: Ulrich Weigand
llvm-svn: 299213
2017-03-31 13:22:59 +00:00
Nirav Dave
ab62d18fbb [SDAG] Fix Stale SDNode usage in visitAND
Reorder CombineTo Calls to prevent potential use of deleted node.
Fixes PR32372.

Reviewers: jnspaulsson, RKSimon, uweigand, jonpa

Reviewed By: jonpa

Subscribers: jonpa, llvm-commits

Differential Revision: https://reviews.llvm.org/D31346

llvm-svn: 298920
2017-03-28 14:11:20 +00:00
Jonas Paulsson
946a6fbc82 [SystemZ] Don't drop any operands in expandZExtPseudo()
Make sure that any operands, e.g. of an implicit def of a super reg, are
transferred to the new instruction.

Review: Ulrich Weigand
llvm-svn: 298484
2017-03-22 06:03:32 +00:00
Jonas Paulsson
891e08dcf0 [DAGTypeLegalizer] Handle widening truncate to vector of i1.
Previously, PromoteIntRes_TRUNCATE() did not handle the case where
the operand needs widening, which resulted in llvm_unreachable().

This patch adds the needed handling, along with a test case.

Review: Eli Friedman, Simon Pilgrim.
https://reviews.llvm.org/D31077

llvm-svn: 298357
2017-03-21 10:24:14 +00:00
Jonas Paulsson
c9557adb6d [SystemZ] Don't drop MO flags in foldMemoryOperandImpl()
The def operand of the new LG/LD should have the old def operand's
flags and subreg index.

New test: test/CodeGen/SystemZ/fold-memory-op-impl.ll

Review: Ulrich Weigand
llvm-svn: 298341
2017-03-21 05:49:40 +00:00
Jonas Paulsson
5b6288c8ae [SystemZ] New CodeGen tests for vector compare / select.
New SystemZ tests for the improved codegen of vector compare and select,
including cases with a logical combination of two compares.

Review: Ulrich Weigand.
https://reviews.llvm.org/D29489

llvm-svn: 298049
2017-03-17 07:11:46 +00:00
Jonas Paulsson
34e3b8e7cc [SystemZ] Add use of super-reg in splitMove()
If one of the subregs of the 128 bit reg is undefined when splitMove() splits
a store into two instructions, a use of an undefined physical register
results.

To remedy this, an implicit use of the super register is added onto both new
instructions, along with propagated kill and undef flags.

This was discovered with llvm-stress, and that test case is attached as
test/CodeGen/SystemZ/splitMove_undefReg_mverifier.ll

Thanks to Matthias Braun for helping with a nice explanation.

Review: Ulrich Weigand
llvm-svn: 298047
2017-03-17 06:47:08 +00:00
Nirav Dave
889cd22a6a In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommitting with compile-time improvements

    Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner,
    as it separates the handling of non-interfering loads/stores from the
    store-merging logic.

    When merging stores, search up the chain through a single load, and
    find all possible stores by looking down through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly construct
    wider loads, but then promote them to float operations, which require
    more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyFromReg
         nodes, as these are captured by data dependence.

      6. Forward load-store values through TokenFactors containing
          {CopyToReg,CopyFromReg} values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit, as in
         some contexts invalid 64-bit operations are being generated. This
         can be removed once appropriate checks are added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.
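
    As a simple illustration of the kind of merging this enables (not from
    the patch itself), adjacent narrow stores like these become candidates
    for a single wider store:

      #include <cstdint>

      void write_header(uint8_t *p) {
        p[0] = 0x11;   // four adjacent byte stores...
        p[1] = 0x22;
        p[2] = 0x33;
        p[3] = 0x44;   // ...may be merged into one 32-bit store
      }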

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 297695
2017-03-14 00:34:14 +00:00
Jonas Paulsson
2490763fc1 [SystemZ] Add check VT.isSimple() in canTreateAsByteVector()
Since the BB-vectorizer can produce vectors of, for example, 3 elements,
this check is needed.

Review: Ulrich Weigand
llvm-svn: 297136
2017-03-07 09:49:31 +00:00
Chandler Carruth
7b066daf55 [SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

llvm-svn: 296862
2017-03-03 10:02:25 +00:00
Nirav Dave
e24ecaa975 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner,
    as it separates the handling of non-interfering loads/stores from the
    store-merging logic.

    When merging stores, search up the chain through a single load, and
    find all possible stores by looking down through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly construct
    wider loads, but then promote them to float operations, which require
    more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyFromReg
         nodes, as these are captured by data dependence.

      6. Forward load-store values through TokenFactors containing
          {CopyToReg,CopyFromReg} values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit, as in
         some contexts invalid 64-bit operations are being generated. This
         can be removed once appropriate checks are added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296476
2017-02-28 14:24:15 +00:00
Nirav Dave
e1556d9e43 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.

llvm-svn: 296279
2017-02-26 01:27:32 +00:00
Nirav Dave
0d7bce1241 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner,
    as it separates the handling of non-interfering loads/stores from the
    store-merging logic.

    When merging stores, search up the chain through a single load, and
    find all possible stores by looking down through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly construct
    wider loads, but then promote them to float operations, which require
    more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyFromReg
         nodes, as these are captured by data dependence.

      6. Forward load-store values through TokenFactors containing
          {CopyToReg,CopyFromReg} values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit, as in
         some contexts invalid 64-bit operations are being generated. This
         can be removed once appropriate checks are added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296252
2017-02-25 11:43:58 +00:00
Ahmed Bougacha
86ba72cd49 [TLI] Robustize SDAG LibFunc proto checking by merging it into TLI.
This re-applies commit r292189, reverted in r292191.

SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.

Use TLI instead, removing another heap of redundant code.

This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to a few tests:
- calling a non-variadic function via a variadic prototype isn't OK;
  it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter;  the SDAG code accepts any integer
  type, which meant using i32 on x86_64 worked.
- a handful of SystemZ tests check the SDAG support for lax prototype
  checking: Ulrich agrees on removing them.

I don't think it's worth supporting any of these (IMO) invalid
testcases.  Instead, fix them to be more meaningful.
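
For instance, only a call site matching the real prototype is now recognized
(illustrative declaration; mempcpy is a GNU extension):

  #include <cstddef>

  extern "C" void *mempcpy(void *dst, const void *src, std::size_t n);

  void *copy_bytes(void *dst, const void *src, std::size_t n) {
    // Recognized: the size argument has type size_t.  Declaring the size
    // as a 32-bit integer on x86_64 (as some old tests did) no longer
    // matches the libcall.
    return mempcpy(dst, src, n);
  }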

llvm-svn: 294028
2017-02-03 19:11:19 +00:00
Sanne Wouda
ded29c4b12 [LLC] Add an inline assembly diagnostics handler.
Summary:
llc would hit a fatal error for errors in inline assembly. The
diagnostic message is now printed.
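
A rough sketch of registering such a handler through the LLVMContext API of
that era (handler name and error plumbing are illustrative):

  #include "llvm/IR/LLVMContext.h"
  #include "llvm/Support/SourceMgr.h"
  #include "llvm/Support/raw_ostream.h"
  using namespace llvm;

  // Print inline-asm diagnostics instead of dying with a fatal error.
  static void InlineAsmDiagHandlerSketch(const SMDiagnostic &SMD,
                                         void *Context, unsigned LocCookie) {
    bool *HasError = static_cast<bool *>(Context);
    SMD.print(/*ProgName=*/nullptr, errs());
    if (SMD.getKind() == SourceMgr::DK_Error)
      *HasError = true;
  }

  void installInlineAsmHandler(LLVMContext &Ctx, bool &HasError) {
    Ctx.setInlineAsmDiagnosticHandler(InlineAsmDiagHandlerSketch, &HasError);
  }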

Reviewers: rengolin, MatzeB, javed.absar, anemet

Reviewed By: anemet

Subscribers: jyknight, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D29408

llvm-svn: 293999
2017-02-03 11:14:39 +00:00
Nirav Dave
63300d8c5e Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.

llvm-svn: 293915
2017-02-02 18:24:55 +00:00
Nirav Dave
d4909b474b In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommitting after fixing X86 inc/dec chain bug.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner,
    as it separates the handling of non-interfering loads/stores from the
    store-merging logic.

    When merging stores, search up the chain through a single load, and
    find all possible stores by looking down through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly construct
    wider loads, but then promote them to float operations, which require
    more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyFromReg
         nodes, as these are captured by data dependence.

      6. Forward load-store values through TokenFactors containing
          {CopyToReg,CopyFromReg} values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit, as in
         some contexts invalid 64-bit operations are being generated. This
         can be removed once appropriate checks are added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293893
2017-02-02 14:39:42 +00:00
Kyle Butt
9386601a22 CodeGen: Allow small copyable blocks to "break" the CFG.
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability in favor
of the other direction. For small blocks that can be duplicated we now skip
that requirement
as well, subject to some simple frequency calculations.

Differential Revision: https://reviews.llvm.org/D28583

llvm-svn: 293716
2017-01-31 23:48:32 +00:00
Justin Bogner
4634635ce4 SDAG: Update ChainNodesMatched during UpdateChains if a node is replaced
Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we
happened to replace a node during UpdateChains, because it would be
left in the list we were iterating over. This nulls out the pointer
when that happens so that we can avoid the issue.

Fixes llvm.org/PR31710

llvm-svn: 293522
2017-01-30 18:29:46 +00:00