1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 20:43:44 +02:00
Commit Graph

2219 Commits

Author SHA1 Message Date
Roman Tereshin
041719be2f [GlobalISel][AArch64] LegalizerInfo verifier: Fixing bugs exposed by LegalizerInfo::verify(...)
Reviewers: aemerson, qcolombet

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D46339

llvm-svn: 333618
2018-05-31 01:56:05 +00:00
Roman Tereshin
f1db1e6ccc [GlobalISel][AArch64] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call w/o fixing bugs
This is to make it clear what kind of bugs the LegalizerInfo::verifier
is able to catch and test its output

Reviewers: aemerson, qcolombet

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D46338

llvm-svn: 333597
2018-05-30 22:10:04 +00:00
Evandro Menezes
45428262f7 [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy
As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change
makes the inlining of `memset()` and `memcpy()` more aggressive when
compiling for speed.  The tuning remains the same when optimizing for size.

Patch by: Sebastian Pop <s.pop@samsung.com>
          Evandro Menezes <e.menezes@samsung.com>

Differential revision: https://reviews.llvm.org/D45098

llvm-svn: 333429
2018-05-29 15:58:50 +00:00
Amara Emerson
128e26e6c5 Revert "[AArch64] added FP16 vcvth intrinsic support"
This reverts commit r333410 due to bot failures.

llvm-svn: 333427
2018-05-29 15:34:22 +00:00
Luke Geeson
30260ba412 [AArch64] added FP16 vcvth intrinsic support
Summary: Change-Id: I0df845749c7689dfc99150ba7c19c7d0dadbd705

Reviewers: javed.absar, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: llvm-commits, SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46311

llvm-svn: 333410
2018-05-29 11:40:33 +00:00
Eli Friedman
381e806df7 [AArch64] Improve orr+movk sequences for MOVi64imm.
The existing code has three different ways to try to lower a 64-bit
immediate to the sequence ORR+MOVK.  The result is messy: it misses
some possible sequences, and the order of the checks means we sometimes
emit two MOVKs when we only need one.

Instead, just use a simple loop to try all possible two-instruction
ORR+MOVK sequences.

Differential Revision: https://reviews.llvm.org/D47176

llvm-svn: 333218
2018-05-24 19:38:23 +00:00
Geoff Berry
33474b29b1 [AArch64] Take advantage of variable shift/rotate amount implicit mod operation.
Summary:
Optimize code generated for variable shifts/rotates by taking advantage
of the implicit and/mod done on the variable shift amount register.

Resolves bug 27582 and bug 37421.

Reviewers: t.p.northover, qcolombet, MatzeB, javed.absar

Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46844

llvm-svn: 333214
2018-05-24 18:29:42 +00:00
Eli Friedman
f156ca7ef2 [MachineOutliner] Add "thunk" outlining for AArch64.
When we're outlining a sequence that ends in a call, we can save up to
three instructions in the outlined function by turning the call into
a tail-call. I refer to this as thunk outlining because the resulting
outlined function looks like a thunk; suggestions welcome for a better
name.

In addition to making the outlined function shorter, thunk outlining
allows outlining calls which would otherwise be illegal to outline:
we don't need to save/restore LR, so we don't need to prove anything
about the stack access patterns of the callee.

To make this work effectively, I also added
MachineOutlinerInstrType::LegalTerminator to the generic MachineOutliner
code; this allows treating an arbitrary instruction as a terminator in
the suffix tree.

Differential Revision: https://reviews.llvm.org/D47173

llvm-svn: 333015
2018-05-22 19:11:06 +00:00
Sanjay Patel
6ca90c97f9 [DAG] fold FP binops with undef operands to NaN
This is the FP sibling of D43141 with the corresponding IR change in rL327212.

We can't propagate undef here because if a variable operand is a NaN, these 
binops must propagate NaN. Neither global nor node-level fast-math makes a 
difference. If we have 'nnan', I think later folds can turn the NaN into undef.

The tests in X86/fp-undef.ll are meant to be the definitive verification for 
these folds - everything reduces identically now.

The other test changes are collateral damage. They may need to be altered to
preserve their intent.

Differential Revision: https://reviews.llvm.org/D47026

llvm-svn: 332920
2018-05-21 23:54:19 +00:00
Roman Lebedev
b563ad8227 [DAGCombine][X86][AArch64] Masked merge unfolding: vector edition.
Summary:
This **appears** to be the last missing piece for the masked merge pattern handling in the backend.

This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`), and we need to make sure that they are generated.

Differential Revision: https://reviews.llvm.org/D46528

llvm-svn: 332904
2018-05-21 21:41:02 +00:00
Roman Lebedev
1fc27264bc [X86][AArch64][NFC] Add tests for vector masked merge unfolding
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`).

Differential Revision: https://reviews.llvm.org/D46008

llvm-svn: 332903
2018-05-21 21:40:51 +00:00
Haicheng Wu
02c9019201 [GlobalMerge] Exit early if only one global is to be merged
To save some compilation time and prevent some unnecessary changes.

Differential Revision: https://reviews.llvm.org/D46640

llvm-svn: 332813
2018-05-19 18:00:02 +00:00
Eli Friedman
8fa755c873 [MachineOutliner] Count savings from outlining in bytes.
Counting the number of instructions is both unintuitive and inaccurate.
On AArch64, this only affects the generated remarks and certain rare
pseudo-instructions, but it will have a bigger impact on other targets.

Differential Revision: https://reviews.llvm.org/D46921

llvm-svn: 332685
2018-05-18 01:52:16 +00:00
Sanjay Patel
0e08478da6 [AArch64] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

Follow-up to:
https://reviews.llvm.org/rL332534
...because that change wasn't enough.

llvm-svn: 332636
2018-05-17 18:07:02 +00:00
Sanjay Patel
9bfc7ba08b [AArch64] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

llvm-svn: 332534
2018-05-16 21:57:57 +00:00
Eli Friedman
237d12abb5 [MachineOutliner] Don't outline instructions that modify SP.
This breaks the code which saves and restores LR, so we can't outline
without doing something more complicated for stack adjustment.

Found by inspection; we get lucky in most cases because getMemOpInfo
only handles STRWpost, not any other pre/post-increment forms. But it
hits a couple of artificial testcases in the tree.

Differential Revision: https://reviews.llvm.org/D46920

llvm-svn: 332529
2018-05-16 21:20:16 +00:00
Eli Friedman
e346f11594 [MachineOutliner] Don't save/restore LR for tail calls.
The cost computation assumes we do this correctly, but the actual
lowering was wrong.

Differential Revision: https://reviews.llvm.org/D46923

llvm-svn: 332514
2018-05-16 19:49:01 +00:00
Sirish Pande
f2ba0203b0 [AArch64] Gangup loads and stores for pairing.
Keep loads and stores together (target defines how many loads
and stores to gang up), such that it will help in pairing
and vectorization.

Differential Revision https://reviews.llvm.org/D46477

llvm-svn: 332482
2018-05-16 15:36:52 +00:00
Amara Emerson
ba73b15d7a [GlobalISel][IRTranslator] Split aggregates during IR translation.
We currently handle all aggregates by creating one large LLT, and letting the
legalizer deal with splitting them up. However using this approach means that
we can't support big endian code correctly.

This patch changes the way that the IRTranslator deals with aggregate values,
by splitting them up into their constituent element values. To do this, parts
of the translator need to be modified to deal with multiple VRegs for a single
Value.

A new Value to VReg mapper is introduced to help keep compile time under
control, currently there is no measurable impact on CTMark despite the extra
code being generated in some cases.

Patch is based on the original work of Tim Northover.

Differential Revision: https://reviews.llvm.org/D46018

llvm-svn: 332449
2018-05-16 10:32:02 +00:00
Peter Smith
c69d5b5cbb [AArch64] Support "S" inline assembler constraint
This patch re-introduces the "S" inline assembler constraint. This matches
an absolute symbolic address or a label reference. The primary use case is

asm("adrp %0, %1\n\t"
    "add %0, %0, :lo12:%1" : "=r"(addr) : "S"(&var));

I say re-introduces as it seems like "S" was implemented in the original
AArch64 backend, but it looks like it wasn't carried forward to the merged
backend. The original implementation had A and L modifiers that could be
used to print ":lo12:" to the string. It looks like gcc doesn't use these
and :lo12: is expected to be written in the inline assembly string so I've
not implemented A and L. Clang already supports the S modifier.

Fixes PR37180

Differential Revision: https://reviews.llvm.org/D46745

llvm-svn: 332444
2018-05-16 09:33:25 +00:00
Eli Friedman
6893497aee [MachineOutliner] Add optsize markings to outlined functions.
It doesn't matter much this late in the pipeline, but one place that
does check for it is the function alignment code.

Differential Revision: https://reviews.llvm.org/D46373

llvm-svn: 332415
2018-05-15 23:36:46 +00:00
Evandro Menezes
32ec7fd07b [AArch64] Improve single vector lane unscaled stores
When storing the 0th lane of a vector, use a simpler and usually more
efficient scalar store instead.  In this case, also using the unscaled
offset.

Differential revision: https://reviews.llvm.org/D46762

llvm-svn: 332394
2018-05-15 20:41:12 +00:00
Geoff Berry
4cf1405e34 [AArch64] Fix mir test case liveins info.
The test case added in r332265 had incomplete livein information which
was caught by the EXPENSIVE_CHECKS bot.  Fix the livein information and
add -verify-machineinstrs to the test case.

llvm-svn: 332367
2018-05-15 16:27:34 +00:00
Sanjay Patel
a66bd4e046 [DAG] propagate FMF for all FPMathOperators
This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups. 
It gets us most of the FMF functionality that we want without adding any state bits to the flags. It 
also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch.

It should provide a superset of the functionality from D46563 - the extra tests show propagation and 
codegen diffs for fcmp, vecreduce, and FP libcalls.

The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last 
node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently, 
so that shouldn't make any difference.

Differential Revision: https://reviews.llvm.org/D46854

llvm-svn: 332358
2018-05-15 14:16:24 +00:00
Sanjay Patel
d00ba62ad4 [AArch64] enhance test to show FMF loss; NFC
llvm-svn: 332301
2018-05-14 21:53:21 +00:00
Geoff Berry
8272ad3265 [BranchFolding] Allow hoisting to block with a single conditional branch.
Summary:
The BranchFolding pass is currently missing opportunities to hoist
common code if the hoisted-to block contains a single conditional branch
that has register uses.  This occurs somewhat frequently on AArch64 with
CBZ/TBZ opcodes.

This change also eliminates some code differences when debug info is
present since the presence of e.g. DBG_VALUE instructions in the
hoisted-to block can enable hoisting that wouldn't have occurred without
them.

Reviewers: MatzeB, rnk, kparzysz, twoh, aprantl, javed.absar

Subscribers: kristof.beyls, JDevlieghere, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46324

llvm-svn: 332265
2018-05-14 17:31:18 +00:00
Evandro Menezes
fca3a5cdd4 [AArch64] Improve single vector lane stores
When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead.

Differential revision: https://reviews.llvm.org/D46655

llvm-svn: 332251
2018-05-14 15:26:35 +00:00
Vedant Kumar
8a31d7c70d [DAGCombiner] Set the right SDLoc on a newly-created sextload (6/N)
This teaches tryToFoldExtOfLoad to set the right location on a
newly-created extload. With that in place, the logic for performing a
certain ([s|z]ext (load ...)) combine becomes identical for sexts and
zexts, and we can get rid of one copy of the logic.

The test case churn is due to dependencies on IROrders inherited from
the wrong SDLoc.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46158

llvm-svn: 332118
2018-05-11 18:40:08 +00:00
Geoff Berry
84b578a23c [AArch64] Fix performPostLD1Combine to check for constant lane index.
Summary:
performPostLD1Combine in AArch64ISelLowering looks for vector
insert_vector_elt of a loaded value which it can optimize into a single
LD1LANE instruction.  The code checking for the pattern was not checking
if the lane index was a constant which could cause two problems:

- an assert when lowering the LD1LANE ISD node since it assumes an
  constant operand

- an assert in isel if the lane index value depends on the
  post-incremented base register

Both of these issues are avoided by simply checking that the lane index
is a constant.

Fixes bug 35822.

Reviewers: t.p.northover, javed.absar

Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46591

llvm-svn: 332103
2018-05-11 16:25:06 +00:00
Roman Tereshin
ba67f4b6c1 [GlobalISel][Legalizer] Widening the second src op of shifts bug fix
The second source operand of G_SHL, G_ASHR, and G_LSHR must preserve its
value as a (small) unsigned integer, therefore its incorrect to widen it
in any way but by zero extending it.

G_SHL was using G_ANYEXT and G_ASHR - G_SEXT (which is correct for their
destination and first source operands, but not the "number of bits to
shift" operand).

Generally, shifts aren't as similar to regular binary operations as it
might seem, for instance, they aren't commutative nor associative and
the second source operand usually requires a special treatment.

Reviewers: bogner, javed.absar, aivchenk, rovka

Reviewed By: bogner

Subscribers: igorb, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46413

llvm-svn: 331926
2018-05-09 21:43:30 +00:00
Roman Tereshin
ef8dceef3c Reapplying r331819 [GlobalISel][Legalizer] More concise and faster widenScalar, NFC
The commit was a suspect for clang-cmake-aarch64-global-isel and
    clang-cmake-aarch64-quick bot failures, proved to be innocent.

llvm-svn: 331898
2018-05-09 17:28:18 +00:00
Daniel Sanders
7932a6b01b Revert r331816 and r331820 - [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit.
We know it wasn't r331819.

llvm-svn: 331846
2018-05-09 05:00:17 +00:00
Shiva Chen
a2029fa58e [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.
In order to set breakpoints on labels and list source code around
labels, we need collect debug information for labels, i.e., label
name, the function label belong, line number in the file, and the
address label located. In order to keep these information in LLVM
IR and to allow backend to generate debug information correctly.
We create a new kind of metadata for labels, DILabel. The format
of DILabel is

!DILabel(scope: !1, name: "foo", file: !2, line: 3)

We hope to keep debug information as much as possible even the
code is optimized. So, we create a new kind of intrinsic for label
metadata to avoid the metadata is eliminated with basic block.
The intrinsic will keep existing if we keep it from optimized out.
The format of the intrinsic is

llvm.dbg.label(metadata !1)

It has only one argument, that is the DILabel metadata. The
intrinsic will follow the label immediately. Backend could get the
label metadata through the intrinsic's parameter.

We also create DIBuilder API for labels to be used by Frontend.
Frontend could use createLabel() to allocate DILabel objects, and use
insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR.

Differential Revision: https://reviews.llvm.org/D45024

Patch by Hsiangkai Wang.

llvm-svn: 331841
2018-05-09 02:40:45 +00:00
Roman Tereshin
53a5f4ff4b Revert r331819 [GlobalISel][Legalizer] More concise and faster widenScalar, NFC
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit

llvm-svn: 331839
2018-05-09 01:43:12 +00:00
Roman Tereshin
40afff49ea [GlobalISel][Legalizer] More concise and faster widenScalar, NFC
Refactoring LegalizerHelper::widenScalar member function reducing its
size by approximately a factor of 2 and (hopefuly) making it more
straightforward and regular by introducing widenScalarSrc and
widenScalarDst helper methods.

The new widenScalar* methods mutate the instructions in place instead
of recreating them from scratch and removing the originals. The
compile time implications of this were measured on sqlite3
amalgamation, targeting AArch64 in -O0:

LegalizerHelper::widenScalar: > 25% faster
Legalizer::runOnMachineFunction: ~ 4.0 - 4.5% faster

Also adding MachineOperand::setCImm and refactoring out
MachineIRBuilder::recordInsertion methods to make the change possible.

Reviewers: aditya_nandakumar, bogner, javed.absar, t.p.northover, ab, dsanders, arsenm

Reviewed By: aditya_nandakumar

Subscribers: wdng, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46414

llvm-svn: 331819
2018-05-08 22:53:09 +00:00
Daniel Sanders
7e100ec90e [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Summary: Depends on D45541

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson

Reviewed By: aemerson

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45543

llvm-svn: 331816
2018-05-08 22:26:39 +00:00
Roman Lebedev
9f18bca620 [DAGCombine][NFC] Masked merge unfolding: comment: some tests are non-canonical
As requested in https://reviews.llvm.org/D46494#inline-407282

llvm-svn: 331650
2018-05-07 16:42:47 +00:00
Daniel Sanders
6b73e7b79e [globalisel] Remove redundant -global-isel option from tests that use -run-pass. NFC
As Roman Tereshin pointed out in https://reviews.llvm.org/D45541, the
-global-isel option is redundant when -run-pass is given. -global-isel sets up
the GlobalISel passes in the pass manager but -run-pass skips that entirely and
configures it's own pipeline.

llvm-svn: 331603
2018-05-05 21:19:59 +00:00
Daniel Sanders
96623363e9 [globalisel] Update GlobalISel emitter to match new representation of extending loads
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch changes the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

Each extending load can be lowered by the legalizer into separate extends
and loads, however a target that supports s1 will need the any-extending
load to extend to at least s8 since LLVM does not represent memory accesses
smaller than 8 bit. The legalizer can widenScalar G_LOAD into an
any-extending load but sign/zero-extending loads need help from something
else like a combiner pass. A follow-up patch that adds combiner helpers for
for this will follow.

The new representation requires that the MMO correctly reflect the memory
access so this has been corrected in a couple tests. I've also moved the
extending loads to their own tests since they are (mostly) separate opcodes
now. Additionally, the re-write appears to have invalidated two tests from
select-with-no-legality-check.mir since the matcher table no longer contains
loads that result in s1's and they aren't legal in AArch64 anymore.

Depends on D45540

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar

Reviewed By: rtereshin

Subscribers: javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45541

llvm-svn: 331601
2018-05-05 20:53:24 +00:00
Roman Lebedev
aabea07014 [DAGCombiner] Masked merge: don't touch "not" xor's.
Summary:
Split off form D46031.

It seems we don't want to transform the pattern if the `xor`'s are actually `not`'s.
In vector case, this breaks `andnpd` / `vandnps` patterns.

That being said, we may want to re-visit this `not` handling, maybe in D46073.

Reviewers: spatel, craig.topper, javed.absar

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46492

llvm-svn: 331595
2018-05-05 15:45:40 +00:00
Geoff Berry
5f4014e773 [MachineLICM] Debug intrinsics shouldn't affect hoist decisions
Summary:
When checking if an instruction stores to a given frame index, check
that the instruction can write to memory before looking at the memory
operands list to avoid e.g. DBG_VALUE instructions that reference a
frame index preventing a load from that index from being hoisted.

Reviewers: dblaikie, MatzeB, qcolombet, reames, javed.absar

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46284

llvm-svn: 331549
2018-05-04 19:25:09 +00:00
Adhemerval Zanella
14b1860ec6 [AArch64] Add missing testcase for r331522
llvm-svn: 331541
2018-05-04 17:21:26 +00:00
Adhemerval Zanella
78143c1a3b [AArch64] Custom Lower MULLH{S,U} for v16i8, v8i16, and v4i32
This patch adds a custom lowering for ISD::MULH{S,U} used on divide by
constant optimization (DAGCombiner::BuildSDIV and DAGCombiner::BuildUDIV).

New patterns for smull and umull are added, so AArch64ISD::{S,U}MULL
can be correctly lowered to smull2 and umull2.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46009

llvm-svn: 331522
2018-05-04 14:33:55 +00:00
Piotr Padlewski
1e96fe1a21 Rename invariant.group.barrier to launder.invariant.group
Summary:
This is one of the initial commit of "RFC: Devirtualization v2" proposal:
https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing

Reviewers: rsmith, amharc, kuhar, sanjoy

Subscribers: arsenm, nhaehnle, javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45111

llvm-svn: 331448
2018-05-03 11:03:01 +00:00
Eli Friedman
4d25e8bf01 [AArch64] Add more tests for 64-bit immediate lowering.
This adds a some more tests, and adds some notes to tests which are using
a suboptimal lowering.

The constants with suboptimal lowerings seem to be relatively rare in
practice, but it might be a fun project to work on improvements.

llvm-svn: 331304
2018-05-01 20:00:14 +00:00
Vedant Kumar
ba4e4efcfb [DAGCombiner] Set the right SDLoc on a newly-created zextload (1/N)
Setting the right SDLoc on a newly-created zextload fixes a line table
bug which resulted in non-linear stepping behavior.

Several backend tests contained CHECK lines which relied on the IROrder
inherited from the wrong SDLoc. This patch breaks that dependence where
feasbile and regenerates test cases where not.

In some cases, changing a node's IROrder may alter register allocation
and spill behavior. This can affect performance. I have chosen not to
prevent this by applying a "known good" IROrder to SDLocs, as this may
hide a more general bug in the scheduler, or cause regressions on other
test inputs.

rdar://33755881, Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D45995

llvm-svn: 331300
2018-05-01 19:26:15 +00:00
Daniel Sanders
794bd52155 Fix infinite loop after r331115
There are two separate fixes here:
* The lowering code for non-extending loads should report UnableToLegalize instead of emitting the same instruction.
* The target should not be requesting lowering of non-extending loads.

llvm-svn: 331201
2018-04-30 17:20:01 +00:00
Daniel Sanders
491fd6384e [globalisel][legalizerinfo] Introduce dedicated extending loads and add lowerings for them
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch begins changing the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

This patch introduces the new generic instructions and new variation on
G_LOAD and adds lowering for them to convert back to the existing
representations.

Depends on D45466

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, aemerson, javed.absar

Reviewed By: aemerson

Subscribers: aemerson, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45540

llvm-svn: 331115
2018-04-28 18:14:50 +00:00
Jessica Paquette
ee6b7b1495 [MachineOutliner] Add defs to calls + don't track liveness on outlined functions
This commit makes it so that if you outline a def of some register, then the
call instruction created by the outliner actually reflects that the register
is defined by the call. It also makes it so that outlined functions don't
have the TracksLiveness property.

Outlined calls shouldn't break liveness assumptions that someone might make.

This also un-XFAILs the noredzone test, and updates the calls test.

llvm-svn: 331095
2018-04-27 23:36:35 +00:00
Jun Bum Lim
4c87e73bbe [PostRASink] extend the live-in check for all aliased registers
Extend the live-in check for all aliased registers so that we can
allow sinking Copy instructions when only implicit def is in successor's
live-in.

llvm-svn: 331072
2018-04-27 19:59:20 +00:00
Daniel Sanders
a2906bd14b [globalisel][legalizerinfo] Add support for legalization based on the MachineMemOperand
Summary:
Currently only the memory size is supported but others can be added as
needed.

narrowScalar for G_LOAD and G_STORE now correctly update the
MachineMemOperand and will refuse to legalize atomics since those need more
careful expansions to maintain atomicity.

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar

Reviewed By: aemerson

Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45466

llvm-svn: 331071
2018-04-27 19:48:53 +00:00
Francis Visoiu Mistrih
fb4d267969 [AArch64] Place the first ldp at the end when ReverseCSRRestoreSeq is true
Put the first ldp at the end, so that the load-store optimizer can run
and merge the ldp and the add into a post-index ldp.

This didn't work in case no frame was needed and resulted in code size
regressions.

llvm-svn: 331044
2018-04-27 15:30:54 +00:00
Oliver Stannard
9fd29cce52 [AArch64] Codegen for v8.2A dot product intrinsics
This adds IR intrinsics for the AArch64 dot-product instructions introduced in
v8.2-A.

Differential revisioon: https://reviews.llvm.org/D46107

llvm-svn: 331036
2018-04-27 13:45:32 +00:00
Eli Friedman
588f10bea1 [MachineOutliner] Don't outline from functions with a section marking.
The program might have unusual expectations for functions; for example,
the Linux kernel's build system warns if it finds references from .text
to .init.data.

I'm not sure this is something we actually want to make any guarantees
about (there isn't any explicit rule that would disallow outlining
in this case), but we might want to be conservative anyway.

Differential Revision: https://reviews.llvm.org/D46091

llvm-svn: 331007
2018-04-27 00:21:34 +00:00
Geoff Berry
dfadfc4259 [AArch64] Fix scavenged spill slot base when stack realignment required.
Summary:
Use the FP for scavenged spill slot accesses to prevent corruption of
the callee-save region when the SP is re-aligned.

Based on problem and patch reported by @paulwalker-arm

This is an alternative to solution proposed in D45770

Reviewers: t.p.northover, paulwalker-arm, thegameg, javed.absar

Subscribers: qcolombet, mcrosier, paulwalker-arm, kristof.beyls, rengolin, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46063

llvm-svn: 330976
2018-04-26 18:50:45 +00:00
Francis Visoiu Mistrih
73884955f2 [MIR] Add support for debug metadata for fixed stack objects
Debug var, expr and loc were only supported for non-fixed stack objects.

This patch adds the following fields to the "fixedStack:" entries, and
renames the ones from "stack:" to:

* debug-info-variable
* debug-info-expression
* debug-info-location

Differential Revision: https://reviews.llvm.org/D46032

llvm-svn: 330859
2018-04-25 18:58:06 +00:00
Amara Emerson
d4697c6c9c [AArch64][GlobalISel] Implement selection for the llvm.trap intrinsic.
rdar://38674040

llvm-svn: 330831
2018-04-25 14:43:59 +00:00
Roman Lebedev
03d388a089 [X86][AArch64][NFC] Finish adding 'bad' tests for masked merge unfolding with constants.
I have initially committed basic tests in, rL330771,
but then quickly discovered that there are a few more
interesting patterns.

llvm-svn: 330819
2018-04-25 12:48:23 +00:00
Jessica Paquette
f7f9e664cd [MachineOutliner] Check for explicit uses of LR/W30 in MI operands
Before, the outliner would grab ADRPs that used LR/W30. This patch fixes
that by checking for explicit uses of those registers before the special-casing
for ADRPs.

This also adds a test that ensures that those sorts of ADRPs won't be outlined.

llvm-svn: 330783
2018-04-24 22:38:15 +00:00
Roman Lebedev
4c9a963a29 [X86][AArch64][NFC] Add tests for masked merge unfolding with %y = const
The fold was added in D45733.

This appears to be a regression.

llvm-svn: 330771
2018-04-24 21:23:22 +00:00
Petar Jovanovic
9686e666ef Correct dwarf unwind information in function epilogue
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

* CFI instructions do not affect code generation (they are not counted as
  instructions when tail duplicating or tail merging)
* Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Added CFIInstrInserter pass:

* analyzes each basic block to determine cfa offset and register are valid
  at its entry and exit
* verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
* inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.

Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D42848

llvm-svn: 330706
2018-04-24 10:32:08 +00:00
Roman Tereshin
68a9fbd154 [GlobalISel][Legalizer] Look thro copies while combining G_UNMERGE's
As we're becoming stricter w/ respect to not allowing vregs having LLTs
and regclasses assigned both mid-globalisel pipeline, the number of
extra copies grows, some of which separate G_UNMERGE's from their
corresponding G_MERGE's, becoming a performance concern.

It's worth mentioning that we're already looking through copies while
combining legalization artifacts for every kind of artifact but
G_UNMERGE.

Reviewed By: aditya_nandakumar

Reviewers: ab, t.p.northover, volkan, javed.absar

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45644

llvm-svn: 330660
2018-04-23 22:28:36 +00:00
Roman Lebedev
adffa0d402 [DAGCombiner] Unfold scalar masked merge if profitable
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andl`+`andn`/`andps`+`andnps` / `bic`/`bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`).
So we need to make sure that they are still generated.

If the mask is constant, we do nothing. InstCombine should have unfolded it.
Else, i use `hasAndNot()` TLI hook.

For now, only handle scalars.

https://rise4fun.com/Alive/bO6

----

I *really* don't like the code i wrote in `DAGCombiner::unfoldMaskedMerge()`.
It is super fragile. Is there something like IR Pattern Matchers for this?

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: andreadb, courbet, kristof.beyls, javed.absar, rengolin, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D45733

llvm-svn: 330646
2018-04-23 20:38:49 +00:00
Roman Lebedev
e2422e1db7 [X86][AArch64][NFC] Add tests for masked merge unfolding
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andl`+`andn`/`andps`+`andnps` / `bic`/`bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`).
I'm guessing `llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp` should be able to unfold this.

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: nemanjai, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45563

llvm-svn: 330645
2018-04-23 20:38:42 +00:00
Peter Collingbourne
b4b51eb3aa Reland r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses.", with a fix for the bot failure.
This reland includes a check to prevent the DAG combiner from folding an
offset that is smaller than the existing one. This can cause oscillations
between two possible DAGs, which was the cause of the hang and later assertion
failure observed on the lnt-ctmark-aarch64-O3-flto bot.
http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/

Original commit message:
> This is a code size win in code that takes offseted addresses
> frequently, such as C++ constructors that typically need to compute
> an offseted address of a vtable. This reduces the size of Chromium
> for Android's .text section by 108KB.

Differential Revision: https://reviews.llvm.org/D45199

llvm-svn: 330630
2018-04-23 19:09:34 +00:00
Eli Friedman
6c495ec529 [AArch64] Don't crash trying to resolve __stack_chk_guard.
In certain cases, the compiler might try to merge __stack_chk_guard with
another global variable.  (Or someone could theoretically define
__stack_chk_guard as an alias.)  In that case, make sure we don't crash.

Differential Revision: https://reviews.llvm.org/D45746

llvm-svn: 330495
2018-04-21 00:07:46 +00:00
Jessica Paquette
cb7a478192 Fix typo in test (verify-machine-instrs -> verify-machineinstrs)
llvm-svn: 330494
2018-04-20 23:37:48 +00:00
Jessica Paquette
f99bed0103 [MachineOutliner] XFAIL machine-outliner-noredzone.ll
The verifier began complaining about an undefined physical register in this
test. XFAILing for the purposes of getting a bot up while I look into it.

Failure:
http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-expensive/11385/

llvm-svn: 330493
2018-04-20 23:35:54 +00:00
Jessica Paquette
c755676edc [MachineOutliner] Change B instruction for tail calls to TCRETURNdi
First off, this is more correct than having the B. Second off, this was making
a bot upset. This fixes that.

Update the test to include -verify-machineinstrs as well to prevent stuff like
this slipping by non debug/assert builds in the future.

llvm-svn: 330459
2018-04-20 18:03:21 +00:00
Sanjay Patel
093a966fb5 [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
This was originally committed at rL328921 and reverted at rL329920 to
investigate failures in Chrome. This time I've added to the ReleaseNotes
to warn users of the potential of exposing UB and let me repeat that
here for more exposure:

  Optimization of floating-point casts is improved. This may cause surprising
  results for code that is relying on undefined behavior. Code sanitizers can
  be used to detect affected patterns such as this:

    int main() {
      float x = 4294967296.0f;
      x = (float)((int)x);
      printf("junk in the ftrunc: %f\n", x);
      return 0;
    }

    $ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out
    ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of 
                   representable values of type 'int'
    junk in the ftrunc: 0.000000


Original commit message:

fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC,
so replace a pair of casts with the equivalent node. We don't have to account for
special cases (NaN, INF) because out-of-range casts are undefined.

Differential Revision: https://reviews.llvm.org/D44909

llvm-svn: 330437
2018-04-20 15:07:55 +00:00
Amara Emerson
4ec41556e3 [AArch64] Add isel pattern for v8i8->v2f32 NVCASTs.
rdar://39454635

llvm-svn: 330276
2018-04-18 17:10:19 +00:00
Momchil Velikov
128a6355b2 Add tests for shrink wrapping and VLAs
Differential revision: https://reviews.llvm.org/D45727

llvm-svn: 330253
2018-04-18 13:37:12 +00:00
Tim Northover
ab39f1740c MachO: trap unreachable instructions
Debugability is more important than saving 4 bytes to let us to fall
through to nonense.

llvm-svn: 330073
2018-04-13 22:25:20 +00:00
Peter Collingbourne
82ccce4b49 Revert r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses."
Caused a hang and eventually an assertion failure in LTO builds
of 7zip-benchmark on aarch64 iOS targets.
http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/

llvm-svn: 330063
2018-04-13 20:21:00 +00:00
Peter Collingbourne
d59dff036b AArch64: Introduce a DAG combine for folding offsets into addresses.
This is a code size win in code that takes offseted addresses
frequently, such as C++ constructors that typically need to compute
an offseted address of a vtable. This reduces the size of Chromium
for Android's .text section by 108KB.

Differential Revision: https://reviews.llvm.org/D45199

llvm-svn: 329956
2018-04-12 21:23:55 +00:00
Jessica Paquette
e52f1ef99a [AArch64] Move AFI->setRedZone(false) to top of emitPrologue
AFI->setRedZone(false) was put in the wrong place before, and so it only fired
on functions that didn't have stack frames. This moves that to the top of
emitPrologue to make sure that every function without a redzone has it set
correctly.

This also adds a function representing one of the early exit cases (GHC calling
convention) to the MachineOutliner noredzone test to ensure that we can outline
from functions like these, where we never use a redzone.

llvm-svn: 329922
2018-04-12 16:16:18 +00:00
Sanjay Patel
d4c71ad454 revert r328921 - [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
This change is exposing UB in source code - as was warned/predicted. :)
See D44909 for discussion. Reverting while we figure out how to fix things.

llvm-svn: 329920
2018-04-12 15:27:01 +00:00
Hiroshi Inoue
eee649c09f [NFC] fix trivial typos in documents and comments
"is is" -> "is", "if if" -> "if", "or or" -> "or"

llvm-svn: 329878
2018-04-12 05:53:20 +00:00
Reid Kleckner
277138a332 [FastISel] Disable local value sinking by default
This is causing compilation timeouts on code with long sequences of
local values and calls (i.e. foo(1); foo(2); foo(3); ...).  It turns out
that code coverage instrumentation is a great way to create sequences
like this, which how our users ran into the issue in practice.

Intel has a tool that detects these kinds of non-linear compile time
issues, and Andy Kaylor reported it as PR37010.

The current sinking code scans the whole basic block once per local
value sink, which happens before emitting each call. In theory, local
values should only be introduced to be used by instructions between the
current flush point and the last flush point, so we should only need to
scan those instructions.

llvm-svn: 329822
2018-04-11 16:03:07 +00:00
Francis Visoiu Mistrih
6218e0db93 [AArch64] Add test case for r329797
Forgot to add a test case in the previous commit.

llvm-svn: 329805
2018-04-11 13:37:25 +00:00
Geoff Berry
eef4029a4d [AArch64][Falkor] Fix bug in Falkor HWPF collision avoidance pass.
Summary:
When inserting MOVs to avoid Falkor HWPF collisions, the non-base
register operand of load instructions (e.g. a register offset) was not
being considered live, so it could potentially have been used as a
scratch register, clobbering the actual offset value.

Reviewers: mcrosier

Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45502

llvm-svn: 329761
2018-04-10 21:43:03 +00:00
Amara Emerson
803368e4c4 [AArch64] Fix isel failure when BUILD_PAIR nodes are left over.
rdar://39175175

llvm-svn: 329743
2018-04-10 19:01:58 +00:00
Peter Collingbourne
78fa2dfe59 Revert r329611, "AArch64: Allow offsets to be folded into addresses with ELF."
Caused a build failure in check-tsan.

llvm-svn: 329718
2018-04-10 16:19:30 +00:00
Francis Visoiu Mistrih
4152237804 [AArch64] Use FP to access the emergency spill slot
In the presence of variable-sized stack objects, we always picked the
base pointer when resolving frame indices if it was available.

This makes us hit an assert where we can't reach the emergency spill
slot if it's too far away from the base pointer. Since on AArch64 we
decide to place the emergency spill slot at the top of the frame, it
makes more sense to use FP to access it.

The changes here don't affect only emergency spill slots but all the
frame indices. The goal here is to try to choose between FP, BP and SP
so that we minimize the offset and avoid scavenging, or worse, asserting
when trying to access a slot allocated by the scavenger.

Previously discussed here: https://reviews.llvm.org/D40876.

Differential Revision: https://reviews.llvm.org/D45358

llvm-svn: 329691
2018-04-10 11:29:40 +00:00
Peter Collingbourne
8fce7e82b9 AArch64: Allow offsets to be folded into addresses with ELF.
This is a code size win in code that takes offseted addresses
frequently, such as C++ constructors that typically need to compute
an offseted address of a vtable. It reduces the size of Chromium for
Android's .text section by 46KB, or 56KB with ThinLTO (which exposes
more opportunities to use a direct access rather than a GOT access).

Because the addend range is limited in COFF and Mach-O, this is
enabled for ELF only.

Differential Revision: https://reviews.llvm.org/D45199

llvm-svn: 329611
2018-04-09 19:59:57 +00:00
Michael Zolotukhin
e15a25d24e Remove MachineLoopInfo dependency from AsmPrinter.
Summary:
Currently MachineLoopInfo is used in only two places:
1) for computing IsBasicBlockInsideInnermostLoop field of MCCodePaddingContext, and it is never used.
2) in emitBasicBlockLoopComments, which is called only if `isVerbose()` is true.
Despite that, we currently have a dependency on MachineLoopInfo, which makes
pass manager to compute it and MachineDominator Tree. This patch removes the
use (1) and makes the use (2) lazy, thus avoiding some redundant
recomputations.

Reviewers: opaparo, gadi.haber, rafael, craig.topper, zvi

Subscribers: rengolin, javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D44812

llvm-svn: 329542
2018-04-09 00:54:47 +00:00
Guozhi Wei
fb70343484 [DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))
In our real world application, we found the following optimization is missed in DAGCombiner

(zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst))

If the user of original zext is an add, it may enable further lea optimization on x86.

This patch add a new function CombineZExtLogicopShiftLoad to do this optimization.

Differential Revision: https://reviews.llvm.org/D44402

llvm-svn: 329516
2018-04-07 23:36:10 +00:00
Tim Northover
beda738748 Reapply ARM: Do not spill CSR to stack on entry to noreturn functions
Should fix UBSan bot by also checking there's no "uwtable" attribute
before skipping. Otherwise the unwind table will be useless since its
moves expect CSRs to actually be preserved.

A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.

Should fix PR9970.

Patch mostly by myeisha (pmb).

llvm-svn: 329494
2018-04-07 10:57:03 +00:00
Vitaly Buka
37f0ead627 Revert "ARM: Do not spill CSR to stack on entry to noreturn functions"
Breaks ubsan test TestCases/Misc/missing_return.cpp on ARM

This reverts commit r329287

llvm-svn: 329486
2018-04-07 05:36:44 +00:00
Tim Northover
3a66ac020d ARM: Do not spill CSR to stack on entry to noreturn functions
A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.

Should fix PR9970.

Patch by myeisha (pmb).

llvm-svn: 329287
2018-04-05 14:26:06 +00:00
Peter Collingbourne
c154376f63 AArch64: Implement support for the shadowcallstack attribute.
The implementation of shadow call stack on aarch64 is quite different to
the implementation on x86_64. Instead of reserving a segment register for
the shadow call stack, we reserve the platform register, x18. Any function
that spills lr to sp also spills it to the shadow call stack, a pointer to
which is stored in x18.

Differential Revision: https://reviews.llvm.org/D45239

llvm-svn: 329236
2018-04-04 21:55:44 +00:00
John Brawn
b6e03097ce [AArch64] Add patterns matching (fabs (fsub x y)) to (fabd x y)
Differential Revision: https://reviews.llvm.org/D44573

llvm-svn: 329163
2018-04-04 10:12:53 +00:00
Jessica Paquette
9aae4e4ccc [MachineOutliner] Keep track of fns that use a redzone in AArch64FunctionInfo
This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It
returns true if the function is known to use a redzone, false if it is known
to not use a redzone, and no value otherwise.

This removes the requirement to pass -mno-red-zone when outlining for AArch64.

https://reviews.llvm.org/D45189

llvm-svn: 329120
2018-04-03 21:56:10 +00:00
Jessica Paquette
a2fb3c5cf5 [MachineOutliner][NFC] Make outlined functions have internal linkage
The linkage type on outlined functions was private before. This meant that if
you set a breakpoint in an outlined function, the debugger wouldn't be able to
give a sane name to the outlined function.

This commit changes the linkage type to internal and updates any tests that
relied on the prefixes on the names of outlined functions.
 

llvm-svn: 329116
2018-04-03 21:36:00 +00:00
Petr Hosek
2c556dfbc3 [AArch64] Reserve x18 register on Fuchsia
This register is reserved as a platform register on Fuchsia.

Differential Revision: https://reviews.llvm.org/D45105

llvm-svn: 328950
2018-04-01 23:44:04 +00:00
Sanjay Patel
559aa51573 [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, 
so replace a pair of casts with the equivalent node. We don't have to account for 
special cases (NaN, INF) because out-of-range casts are undefined.

Differential Revision: https://reviews.llvm.org/D44909

llvm-svn: 328921
2018-03-31 17:55:44 +00:00
Sanjay Patel
b36dd6007b [SelectionDAG] Removing FABS folding from DAGCombiner
The code has bugs dealing with -0.0.

Since D44550 introduced FABS pattern folding in InstCombine, 
this patch removes the now-redundant code that causes 
https://bugs.llvm.org/show_bug.cgi?id=36600.

Patch by Mikhail Dvoretckii!

Differential Revision: https://reviews.llvm.org/D44683

llvm-svn: 328872
2018-03-30 15:42:52 +00:00
Eli Friedman
3c02b623fd [MachineCopyPropagation] Handle COPY with overlapping source/dest.
MachineCopyPropagation::CopyPropagateBlock has a bunch of special
handling for COPY instructions. This handling assumes that COPY
instructions do not modify the source of the copy; this is wrong if
the COPY destination overlaps the source.

To fix the bug, check explicitly for this situation, and fall back to
the generic instruction handling.

This bug can't happen for most register classes because they don't
have this sort of overlap, but there are a few register classes
where this is possible. The testcase uses the AArch64 QQQQ register
class.

Differential Revision: https://reviews.llvm.org/D44911

llvm-svn: 328851
2018-03-30 00:56:03 +00:00
Jessica Paquette
1b7564f1ee [MachineOutliner] Simplify call outlining + require valid callee save info for call outlining
This commit simplifies the call outlining logic by removing references to the
Function associated with the callee. To do this, it requires that valid
callee save info is available to the outliner.

llvm-svn: 328719
2018-03-28 17:52:31 +00:00
Sanjay Patel
51fbc89db7 [AArch64] add ftrunc tests; NFC
As suggested in D44909.

llvm-svn: 328683
2018-03-28 00:56:00 +00:00
Jessica Paquette
b8eb640cc8 [MachineOutliner] AArch64: Don't outline ADRPs with un-outlinable operands
If an ADRP appears with, say, a CPI operand, we shouldn't outline it.

This moves the check for unsafe operands so that it occurs before the special-case
for ADRPs. Also add a test for outlining ADRPs.

llvm-svn: 328674
2018-03-27 22:23:48 +00:00
Artur Pilipenko
4ed25640c2 Fix a reoccuring typo in load-combine tests
%tmp = bitcast i32* %arg to i8*
   %tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0
-  %tmp2 = load i8, i8* %tmp, align 1
+  %tmp2 = load i8, i8* %tmp1, align 1

This doesn't change the semantics of the tests but makes use of %tmp1 which was originally intended.

llvm-svn: 328642
2018-03-27 17:33:50 +00:00
Krzysztof Parzyszek
43f8619548 Use .set instead of = when printing assignment in assembly output
On Hexagon "x = y" is a syntax used in most instructions, and is not
treated as a directive.

Differential Revision: https://reviews.llvm.org/D44256

llvm-svn: 328635
2018-03-27 16:44:41 +00:00
John Brawn
7a92fa740d [AArch64] Don't reduce the width of loads if it prevents combining a shift
Loads and stores can only shift the offset register by the size of the value
being loaded, but currently the DAGCombiner will reduce the width of the load
if it's followed by a trunc making it impossible to later combine the shift.

Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and
make it prevent the width reduction if this is what would happen, though do
allow it if reducing the load width will let us eliminate a later sign or zero
extend.

Differential Revision: https://reviews.llvm.org/D44794

llvm-svn: 328321
2018-03-23 14:47:07 +00:00
Amara Emerson
b429423707 [GlobalISel] Fix legalizer combine to not use illegal input G_EXTRACT.
This was being masked because GISel is enabled by default for -O0 and
the abort was disabled. Modified test to explicitly enable abort.

llvm-svn: 328311
2018-03-23 12:48:57 +00:00
Michael Zolotukhin
a1cdccdce3 State that CFG is preserved in 'Falkor HW Prefetch Fix Late Phase'.
That removes some redundant recomputations from the passes pipeline.

llvm-svn: 328272
2018-03-22 23:44:40 +00:00
Michael Zolotukhin
6aab60b5c7 Reapply "[test] Add tests for llc passes pipelines." with a fix for bots with expensive checks on.
llvm-svn: 328267
2018-03-22 23:02:48 +00:00
Jun Bum Lim
834ee28da9 [CodeGen] Add a new pass for PostRA sink
Summary:
This pass sinks COPY instructions into a successor block, if the COPY is not
used in the current block and the COPY is live-in to a single successor
(i.e., doesn't require the COPY to be duplicated).  This avoids executing the
the copy on paths where their results aren't needed.  This also exposes
additional opportunites for dead copy elimination and shrink wrapping.

These copies were either not handled by or are inserted after the MachineSink
pass. As an example of the former case, the MachineSink pass cannot sink
COPY instructions with allocatable source registers; for AArch64 these type
of copy instructions are frequently used to move function parameters (PhyReg)
into virtual registers in the entry block..

For the machine IR below, this pass will sink %w19 in the entry into its
successor (%bb.1) because %w19 is only live-in in %bb.1.

```
   %bb.0:
      %wzr = SUBSWri %w1, 1
      %w19 = COPY %w0
      Bcc 11, %bb.2
    %bb.1:
      Live Ins: %w19
      BL @fun
      %w0 = ADDWrr %w0, %w19
      RET %w0
    %bb.2:
      %w0 = COPY %wzr
      RET %w0
```
As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be
able to see %bb.0 as a candidate.

With this change I observed 12% more shrink-wrapping candidate and 13% more dead copies deleted  in spec2000/2006/2017 on AArch64.

Reviewers: qcolombet, MatzeB, thegameg, mcrosier, gberry, hfinkel, john.brawn, twoh, RKSimon, sebpop, kparzysz

Reviewed By: sebpop

Subscribers: evandro, sebpop, sfertile, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41463

llvm-svn: 328237
2018-03-22 20:06:47 +00:00
Aditya Nandakumar
47742e8024 [GISel]: Fix incorrect IRTranslation while translating null pointer types
https://reviews.llvm.org/D44762

Currently IRTranslator produces
%vreg17<def>(p0) = G_CONSTANT 0;

instead we should build
%vreg16(s64) = G_CONSTANT 0
%vreg17(p0) = G_INTTOPTR %vreg16

reviewed by @aemerson.

llvm-svn: 328218
2018-03-22 17:31:38 +00:00
Jonas Devlieghere
0a76321e9b Revert "[test] Add tests for llc passes pipelines."
This reverts r328159 because the two AArch64 tests fail on GreenDragon:
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/11030/

llvm-svn: 328188
2018-03-22 10:34:06 +00:00
Michael Zolotukhin
7b1c1345b8 [test] Add tests for llc passes pipelines.
This is basically an extension of existing test
test/CodeGen/X86/O0-pipeline.ll introduced in r302608.

llvm-svn: 328159
2018-03-21 22:17:13 +00:00
Abderrazek Zaafrani
5c95c02aa1 [AArch64] Add vmulxh_lane fp16 vector intrinsic
https://reviews.llvm.org/D44591

llvm-svn: 328035
2018-03-20 20:25:40 +00:00
Sanjay Patel
ec7d20d5b5 [AArch64] add fabs tests for PR36600; NFC
llvm-svn: 327995
2018-03-20 16:08:47 +00:00
Jessica Paquette
9f81b4f011 [MachineOutliner] AArch64: Emit CFI instructions when outlining calls
When outlining calls, the outliner needs to update CFI to ensure that, say,
exception handling works. This commit adds that functionality and adds a test
just for call outlining.

Call outlining stuff in machine-outliner.mir should be moved into
machine-outliner-calls.mir in a later commit.

llvm-svn: 327917
2018-03-19 22:48:40 +00:00
Martin Storsjo
1a347f2b7a [ARM, AArch64] Check the no-stack-arg-probe attribute for dynamic stack probes
This extends the use of this attribute on ARM and AArch64 from
SVN r325900 (where it was only checked for fixed stack
allocations on ARM/AArch64, but for all stack allocations on X86).

This also adds a testcase for the existing use of disabling the
fixed stack probe with the attribute on ARM and AArch64.

Differential Revision: https://reviews.llvm.org/D44291

llvm-svn: 327897
2018-03-19 20:06:50 +00:00
Jessica Paquette
b58c0c941e [MachineOutliner] Make KILLs invisible
At the point the outliner runs, KILLs don't impact anything, but they're still
considered unique instructions. This commit makes them invisible like
DebugValues so that they can still be outlined without impacting outlining
decisions.

llvm-svn: 327760
2018-03-16 22:53:34 +00:00
Sjoerd Meijer
942d7df325 [AArch64] Codegen tests for the Armv8.2-A FP16 intrinsics
This is a follow up of the AArch64 FP16 intrinsics work;
the codegen tests had not been added yet.

Differential Revision: https://reviews.llvm.org/D44510

llvm-svn: 327624
2018-03-15 13:42:28 +00:00
Reid Kleckner
20a3d2184f [FastISel] Sink local value materializations to first use
Summary:
Local values are constants, global addresses, and stack addresses that
can't be folded into the instruction that uses them. For example, when
storing the address of a global variable into memory, we need to
materialize that address into a register.

FastISel doesn't want to materialize any given local value more than
once, so it generates all local value materialization code at
EmitStartPt, which always dominates the current insertion point. This
allows it to maintain a map of local value registers, and it knows that
the local value area will always dominate the current insertion point.

The downside is that local value instructions are always emitted without
a source location. This is done to prevent jumpy line tables, but it
means that the local value area will be considered part of the previous
statement. Consider this C code:
  call1();      // line 1
  ++global;     // line 2
  ++global;     // line 3
  call2(&global, &local); // line 4

Today we end up with assembly and line tables like this:
  .loc 1 1
  callq call1
  leaq global(%rip), %rdi
  leaq local(%rsp), %rsi
  .loc 1 2
  addq $1, global(%rip)
  .loc 1 3
  addq $1, global(%rip)
  .loc 1 4
  callq call2

The LEA instructions in the local value area have no source location and
are treated as being on line 1. Stepping through the code in a debugger
and correlating it with the assembly won't make much sense, because
these materializations are only required for line 4.

This is actually problematic for the VS debugger "set next statement"
feature, which effectively assumes that there are no registers live
across statement boundaries. By sinking the local value code into the
statement and fixing up the source location, we can make that feature
work. This was filed as https://bugs.llvm.org/show_bug.cgi?id=35975 and
https://crbug.com/793819.

This change is obviously not enough to make this feature work reliably
in all cases, but I felt that it was worth doing anyway because it
usually generates smaller, more comprehensible -O0 code. I measured a
0.12% regression in code generation time with LLC on the sqlite3
amalgamation, so I think this is worth doing.

There are some special cases worth calling out in the commit message:
1. local values materialized for phis
2. local values used by no-op casts
3. dead local value code

Local values can be materialized for phis, and this does not show up as
a vreg use in MachineRegisterInfo. In this case, if there are no other
uses, this patch sinks the value to the first terminator, EH label, or
the end of the BB if nothing else exists.

Local values may also be used by no-op casts, which adds the register to
the RegFixups table. Without reversing the RegFixups map direction, we
don't have enough information to sink these instructions.

Lastly, if the local value register has no other uses, we can delete it.
This comes up when fastisel tries two instruction selection approaches
and the first materializes the value but fails and the second succeeds
without using the local value.

Reviewers: aprantl, dblaikie, qcolombet, MatzeB, vsk, echristo

Subscribers: dotdash, chandlerc, hans, sdardis, amccarth, javed.absar, zturner, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D43093

llvm-svn: 327581
2018-03-14 21:54:21 +00:00
Francis Visoiu Mistrih
ef0d5db760 [CodeGen] Use MIR syntax for MachineMemOperand printing
Get rid of the "; mem:" suffix and use the one we use in MIR: ":: (load 2)".

rdar://38163529

Differential Revision: https://reviews.llvm.org/D42377

llvm-svn: 327580
2018-03-14 21:52:13 +00:00
Francis Visoiu Mistrih
648e1e248b [AArch64] Emit CSR loads in the same order as stores
Optionally allow the order of restoring the callee-saved registers in the
epilogue to be reversed.

The flag -reverse-csr-restore-seq generates the following code:

```
stp     x26, x25, [sp, #-64]!
stp     x24, x23, [sp, #16]
stp     x22, x21, [sp, #32]
stp     x20, x19, [sp, #48]

; [..]

ldp     x24, x23, [sp, #16]
ldp     x22, x21, [sp, #32]
ldp     x20, x19, [sp, #48]
ldp     x26, x25, [sp], #64
ret
```

Note how the CSRs are restored in the same order as they are saved.

One exception to this rule is the last `ldp`, which allows us to merge
the stack adjustment and the ldp into a post-index ldp. This is done by
first generating:

ldp x26, x27, [sp]
add sp, sp, #64

which gets merged by the arm64 load store optimizer into

ldp x26, x25, [sp], #64

The flag is disabled by default.

llvm-svn: 327569
2018-03-14 20:34:03 +00:00
Francis Visoiu Mistrih
319b50aebd [AArch64] Keep track of MIFlags in the LoadStoreOptimizer
Merging:

* $x26, $x25 = frame-setup LDPXi $sp, 0
* $sp = frame-destroy ADDXri $sp, 64, 0

into an LDPXpost should preserve the flags from both instructions as
following:

* frame-setup frame-destroy LDPXpost

Differential Revision: https://reviews.llvm.org/D44446

llvm-svn: 327533
2018-03-14 17:10:58 +00:00
Martin Storsjo
ee100c3673 [AArch64] Don't produce R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC
Support for this relocation is missing in both LLD and GNU binutils
at the moment.

This reverts the ELF parts of SVN r327316.

llvm-svn: 327503
2018-03-14 13:09:10 +00:00
Martin Storsjo
8fd786d3d2 [AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str
Differential Revision: https://reviews.llvm.org/D44355

llvm-svn: 327316
2018-03-12 18:47:43 +00:00
George Burgess IV
be98554fc6 Revert r327199: "Clean up a temp file on the buildbots"
"I'll revert this tomorrow," I said yesterday. This should've reached
all the bots it can by now.

llvm-svn: 327230
2018-03-10 23:22:46 +00:00
Martin Storsjo
6ebf60a0f6 [AArch64] Implement native TLS for Windows
Differential Revision: https://reviews.llvm.org/D43971

llvm-svn: 327220
2018-03-10 19:05:21 +00:00
George Burgess IV
c5340f63b0 Clean up a temp file on the buildbots.
r327100 made us stop producing vecreduce-propagate-sd-flags.s, but it's
still sticking around on some bots. This makes the bots unhappy.

I'll revert this tomorrow.

llvm-svn: 327199
2018-03-10 02:51:10 +00:00
Sebastian Pop
5cb4d1ecb7 [x86][aarch64] ask the backend whether it has a vector blend instruction
The code to match and produce more x86 vector blends was enabled for all
architectures even though the transform may pessimize the code for other
architectures that do not provide a vector blend instruction.

Added an aarch64 testcase to check that a VZIP instruction is generated instead
of byte movs.

Differential Revision: https://reviews.llvm.org/D44118

llvm-svn: 327132
2018-03-09 14:29:21 +00:00
Martin Storsjo
671a0abcf3 [AArch64] Fix use of a regex in the win-alloca.ll test. NFC.
Check that the variable actually is the same as the one previously
matched.

llvm-svn: 327107
2018-03-09 09:45:37 +00:00
Matt Morehouse
2b690a0c14 Attempt to fix vecreduce-propagate-sd-flags.ll test.
Buildbots have been failing since r327079.

llvm-svn: 327100
2018-03-09 02:04:30 +00:00
Sameer AbuAsal
d64a063a3a Propagate flags to SDValue in SplitVecOp_VECREDUCE
This patch is a fix for PR36642.

 While legalizing long vector types, make sure the smaller types get the
 flags of the wider type.

 bugzilla link: https://bugs.llvm.org/show_bug.cgi?id=36642

Change-Id: I0c2829639f094c862c10a6b51b342d4c2563e1fa
llvm-svn: 327079
2018-03-08 23:41:40 +00:00
Sebastian Pop
845c533554 [AArch64] add missing pattern for insert_subvector undef
The attached testcase started failing after the patch to define
isExtractSubvectorCheap with the following pattern mismatch:

ISEL: Starting pattern match
  Initial Opcode index to 85068
    Match failed at index 85076
    LLVM ERROR: Cannot select: t47: v8i16 = insert_subvector undef:v8i16, t43, Constant:i64<0>

The code generated from llvm/lib/Target/AArch64/AArch64InstrInfo.td

def : Pat<(insert_subvector undef, (v4i16 FPR64:$src), (i32 0)),
          (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)), FPR64:$src, dsub)>;

is in ninja/lib/Target/AArch64/AArch64GenDAGISel.inc
At the location of the error it is:
/* 85076*/    OPC_CheckChild2Type, MVT::i32,

And it failed to match the type of operand 2.
Adding another def-pat for i64 fixes the failed def-pat error:

def : Pat<(insert_subvector undef, (v4i16 FPR64:$src), (i64 0)),
          (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)), FPR64:$src, dsub)>;

llvm-svn: 326949
2018-03-07 22:07:13 +00:00
Sebastian Pop
a5f9319c39 [AArch64] define isExtractSubvectorCheap
Following the ARM-neon backend, define isExtractSubvectorCheap to return true
when extracting low and high part of a neon register.

The patch disables a test in llvm/test/CodeGen/AArch64/arm64-ext.ll This
testcase is fragile in the sense that it requires a BUILD_VECTOR to "survive"
all DAG transforms until ISelLowering. The testcase is supposed to check that
AArch64TargetLowering::ReconstructShuffle() works, and for that we need a
BUILD_VECTOR in ISelLowering. As we now transform the BUILD_VECTOR earlier into
an VEXT + vector_shuffle, we don't have the BUILD_VECTOR pattern when we get to
ISelLowering. As there is no way to disable the combiner to only exercise the
code in ISelLowering, the patch disables the testcase.

Differential revision: https://reviews.llvm.org/D43973

llvm-svn: 326811
2018-03-06 16:54:55 +00:00
Volkan Keles
6a307ed4fd GlobalISel: IRTranslate llvm.fabs.* intrinsic
Summary:
Fabs is a common floating-point operation, especially for some expansions. This patch adds
a new generic opcode for llvm.fabs.* intrinsic in order to avoid building/matching this intrinsic.

Reviewers: qcolombet, aditya_nandakumar, dsanders, rovka

Reviewed By: aditya_nandakumar

Subscribers: kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D43864

llvm-svn: 326749
2018-03-05 22:31:55 +00:00
Evandro Menezes
d27cad6273 [AArch64] Harden test case
NFC

llvm-svn: 326724
2018-03-05 17:42:18 +00:00
Sebastian Pop
8a0c4b27b5 fix PR36582
The error occurs when reading i16 elements (as in the testcase) from a v8i8
with a pattern of <0,2,4,6>. As all the data in the vector is accessed, the
operation is not a VUZP. The patch stops the pattern recognition of VUZP when
EXTRACT_VECTOR_ELT has a different element type than BUILD_VECTOR.

llvm-svn: 326722
2018-03-05 17:35:49 +00:00
Evandro Menezes
701b911eac [AArch64] Improve code generation of constant vectors
Use the whole gammut of constant immediates available to set up a vector.
Instead of using, for example, `mov w0, #0xffff; dup v0.4s, w0`, which
transfers between register files, use the more efficient `movi v0.4s, #-1`
instead.  Not limited to just a few values, but any immediate value that can
be encoded by all the variants of `FMOV`, `MOVI`, `MVNI`, thus eliminating
the need to there be patterns to optimize special cases.

Differential revision: https://reviews.llvm.org/D42133

llvm-svn: 326718
2018-03-05 17:02:47 +00:00
Craig Topper
92f12ba673 [DAGCombiner] When combining zero_extend of a truncate, only mask before extending for vectors.
Masking first, prevents the extend from being combine with loads. Its also interfering with some vXi1 extraction code.

Differential Revision: https://reviews.llvm.org/D42679

llvm-svn: 326500
2018-03-01 22:32:25 +00:00
Sebastian Pop
cdc519913d [AArch64] generate vuzp instead of mov
when a BUILD_VECTOR is created out of a sequence of EXTRACT_VECTOR_ELT with a
specific pattern sequence, either <0, 2, 4, ...> or <1, 3, 5, ...>, replace the
BUILD_VECTOR with either vuzp1 or vuzp2.

With this patch LLVM generates the following code for the first function fun1 in the testcase:
adrp x8, .LCPI0_0
ldr  q0, [x8, :lo12:.LCPI0_0]
tbl  v0.16b, { v0.16b }, v0.16b
ext  v1.16b, v0.16b, v0.16b, #8
uzp1 v0.8b, v0.8b, v1.8b
str  d0, [x8]
ret

Without this patch LLVM currently generates this code:
adrp    x8, .LCPI0_0
ldr     q0, [x8, :lo12:.LCPI0_0]
tbl     v0.16b, { v0.16b }, v0.16b
mov     v1.16b, v0.16b
mov     v1.b[1], v0.b[2]
mov     v1.b[2], v0.b[4]
mov     v1.b[3], v0.b[6]
mov     v1.b[4], v0.b[8]
mov     v1.b[5], v0.b[10]
mov     v1.b[6], v0.b[12]
mov     v1.b[7], v0.b[14]
str     d1, [x8]
ret

llvm-svn: 326443
2018-03-01 15:47:39 +00:00
Roman Tereshin
4dc6df117e [GlobalISel][AArch64] Adding -disable-gisel-legality-check CL option
Currently it's impossible to test InstructionSelect pass with MIR which
is considered illegal by the Legalizer in Assert builds. In early stages
of porting an existing backend from SelectionDAG ISel to GlobalISel,
however, we would have very basic CallLowering, Legalizer, and
RegBankSelect implementations, but rather functional Instruction Select
with quite a few patterns selectable due to the semi-automatic porting
process borrowing them from SelectionDAG ISel.

As we are trying to define legality as a property of being selectable by
the instruction selector, it would be nice to be able to easily check
what the selector can do in its current state w/o the legality check
provided by the Legalizer getting in the way.

It also seems beneficial to have a regression testing set up that would
not allow the selector to silently regress in its support of the MIR not
supported yet by the previous passes in the GlobalISel pipeline.

This commit adds -disable-gisel-legality-check command line option to
llc that disables those legality checks in RegBankSelect and
InstructionSelect passes.

It also adds quite a few MIR test cases for AArch64's Instruction
Selector. Every one of them would fail on the legality check at the
moment, but will select just fine if the check is disabled. Every test
MachineFunction is intended to exercise a specific selection rule and
that rule only, encoded in the MachineFunction's name by the rule's
number, ID, and index of its GIM_Try opcode in TableGen'erated
MatchTable (-optimize-match-table=false).

Reviewers: ab, dsanders, qcolombet, rovka

Reviewed By: bogner

Subscribers: kristof.beyls, volkan, aditya_nandakumar, aemerson,
rengolin, t.p.northover, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D42886

llvm-svn: 326396
2018-03-01 00:27:48 +00:00
Chih-Hung Hsieh
13e607925f [TLS] use emulated TLS if the target supports only this mode
Emulated TLS is enabled by llc flag -emulated-tls,
which is passed by clang driver.
When llc is called explicitly or from other drivers like LTO,
missing -emulated-tls flag would generate wrong TLS code for targets
that supports only this mode.
Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether
emulated TLS code should be generated.
Unit tests are modified to run with and without the -emulated-tls flag.

Differential Revision: https://reviews.llvm.org/D42999

llvm-svn: 326341
2018-02-28 17:48:55 +00:00
Geoff Berry
c7dbf2fe82 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Re-enable commit r323991 now that r325931 has been committed to make
MachineOperand::isRenamable() check more conservative w.r.t. code
changes and opt-in on a per-target basis.

llvm-svn: 326208
2018-02-27 16:59:10 +00:00
Evandro Menezes
226db1297f [AArch64] Harden test cases
NFC

llvm-svn: 326147
2018-02-26 23:19:25 +00:00
Francis Visoiu Mistrih
0164936dcf [CodeGen] Don't omit any redundant information in -debug output
In r322867, we introduced IsStandalone when printing MIR in -debug
output. The default behaviour for that was:

1) If any of MBB, MI, or MO are -debug-printed separately, don't omit any
redundant information.

2) When -debug-printing a MF entirely, don't print any redundant
information.

3) When printing MIR, don't print any redundant information.

I'd like to change 2) to:

2) When -debug-printing a MF entirely, don't omit any redundant information.

Differential Revision: https://reviews.llvm.org/D43337

llvm-svn: 326094
2018-02-26 15:23:42 +00:00
Evandro Menezes
0342261d78 [PATCH] [AArch64] Add new target feature to fuse conditional select
This feature enables the fusion of the comparison and the conditional select
instructions together.

Differential revision: https://reviews.llvm.org/D42392

llvm-svn: 325939
2018-02-23 19:27:43 +00:00
Evandro Menezes
3837e2e5fc [AArch64] Improve macro fusion test case
Improve a vector in the test case for the fusion of address generation and
loads or stores.  Otherwise, NFC.

llvm-svn: 325839
2018-02-22 23:32:06 +00:00
Sander de Smalen
a1c0a1def4 Revert "[DebugInfo][FastISel] Fix dropping dbg.value()"
This patch reverts r325440 and r325438 because it triggers an
assertion in SelectionDAGBuilder.cpp. Also having debug enabled
may unintentionally affect code-gen. The patch is reverted until
we find a better solution.

llvm-svn: 325825
2018-02-22 19:53:59 +00:00
Evandro Menezes
b11fbd7d6f [AArch64] Refactor instructions using SIMD immediates
Get rid of icky goto loops and make the code easier to maintain.  Otherwise,
NFC.

Restore r324903 and fix PR36369.

Differentail revision: https://reviews.llvm.org/D43364

llvm-svn: 325621
2018-02-20 20:31:45 +00:00
Amara Emerson
a99b25a021 [AArch64][GlobalISel] When copying from a gpr32 to an fpr16 reg, convert to fpr32 first.
This is a follow on commit to r[x] where we fix the other direction of copy.
For this case, after converting the source from gpr32 -> fpr32, we use a
subregister copy, which is essentially what EXTRACT_SUBREG does in SDAG land.

https://reviews.llvm.org/D43444

llvm-svn: 325550
2018-02-20 05:11:57 +00:00
Amara Emerson
dc30913313 [AArch64][GlobalISel] Fix an assert fail/miscompile when fp16 types are copied
to gpr register banks.

PR36345.

rdar://36478867

Differential Revision: https://reviews.llvm.org/D43310

llvm-svn: 325463
2018-02-18 17:10:49 +00:00
Amara Emerson
ecf2cc8686 [AArch64][GlobalISel] Support G_INSERT/G_EXTRACT of types < s32 bits.
These are needed for operations on fp16 types in a later patch.

llvm-svn: 325462
2018-02-18 17:03:02 +00:00