1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 19:12:56 +02:00
Commit Graph

44986 Commits

Author SHA1 Message Date
Simon Pilgrim
877d6ce5ff [X86] Split ctpop/ctlz/cttz cost tests
This will make things a lot easier to test all the permutations of avx512 

llvm-svn: 303290
2017-05-17 19:57:20 +00:00
Matt Arsenault
b41d61b11a AMDGPU: Fix min3/max3 combines for f16/i16
Fix missing instruction definitions for min3/max3.

llvm-svn: 303284
2017-05-17 19:25:06 +00:00
Simon Pilgrim
52e72fd51b [X86][AVX512] Add 512-bit vector bitreverse costs + tests
llvm-svn: 303283
2017-05-17 19:20:20 +00:00
Sanjay Patel
66f3d22962 [InstCombine] add isCanonicalPredicate() helper function and use it; NFCI
There should be a slight efficiency improvement from handling icmp/fcmp with one matcher and reducing duplicated code.

The larger motivation is that there are questions about how predicate canonicalization is handled, and the refactoring
should make it easier if we want to change any of that behavior.

1. As noted in the code comment, we've chosen 3 of the 16 FCMP preds as not canonical. Why those 3? It goes back to 
   rL32751 from what I can tell, but I'm not sure if there's a justification for that rule.
2. We currently do not canonicalize integer select conditions. Should we use the same rule that applies to branches 
   for selects?
3. We currently do canonicalize some FP select conditions, and those rules would conflict with the rule shown here. 
   Should one or both be changed? 

No-functional-change-intended, but adding tests anyway because there's no coverage for most of the predicates.

Differential Revision: https://reviews.llvm.org/D33247

llvm-svn: 303261
2017-05-17 14:21:19 +00:00
Daniel Sanders
c6132632b6 [globalisel][tablegen] Import rules containing intrinsic_wo_chain.
Summary:
As of this patch, 1018 out of 3938 rules are currently imported.

Depends on D32275

Reviewers: qcolombet, kristof.beyls, rovka, t.p.northover, ab, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: dberris, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D32278

llvm-svn: 303259
2017-05-17 13:39:49 +00:00
Sanjay Patel
8c90f78395 [x86] Update tests in psubus.ll; NFC
Remove unnecessary memops to minimize tests.

Patch by Yulia Koval!

Differential Revision: https://reviews.llvm.org/D32643

llvm-svn: 303258
2017-05-17 13:39:16 +00:00
Krzysztof Parzyszek
3fd2e20288 [PPC] Properly update register save area offsets
The variables MinGPR/MinG8R were not updated properly when resetting the
offsets, which in the included testcase lead to saving the CR register
in the same location as R30.

This fixes another issue reported in PR26519.

Differential Revision: https://reviews.llvm.org/D33017

llvm-svn: 303257
2017-05-17 13:25:09 +00:00
Igor Breger
259f70612a [GlobalISel][X86] Support add i64 in IA32.
Summary: support G_UADDE instruction selection.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D33096

llvm-svn: 303255
2017-05-17 12:48:08 +00:00
Jonas Paulsson
4273fbf74e [SystemZ] Modelling of costs of divisions with a constant power of 2.
Such divisions will eventually be implemented with shifts which should
be reflected in the cost function.

Review: Ulrich Weigand
llvm-svn: 303254
2017-05-17 12:46:26 +00:00
Daniel Sanders
fb194eb754 [globalisel][tablegen] Require that all registers between instructions of a match are virtual.
Summary:
Without this, it's possible to encounter multiple defs for a register.

This is triggered by the current version of D32868 when applied to trunk.

Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls

Reviewed By: qcolombet

Subscribers: llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D32869

llvm-svn: 303253
2017-05-17 12:43:30 +00:00
Daniel Cederman
e286942bf0 [Sparc] Remove execute permissions from non-executable text files
Reviewers: jyknight, lero_chris, venkatra

Reviewed By: jyknight

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27127

llvm-svn: 303245
2017-05-17 11:05:20 +00:00
Diana Picus
3318f6498e [GlobalISel][TableGen] Fix handling of default operands
When looping through a destination pattern's operands to decide how many
default operands we need to introduce, we used to count the "expanded"
number of operands. So if one default operand would be rendered as 2
values, we'd count it as 2 operands, when in fact it needs to count as
only 1 operand regardless of how many values it expands to.

This turns out to be a problem only in some very specific cases, e.g.
when we have one operand with multiple default values followed by more
operands with default values (see the new test). In such a situation
we'd stop looping before looking at all the operands, and then error out
assuming that we don't have enough default operands to make up the
shortfall.

At the moment this only affects ARM.

The patch removes the loop counting default operands entirely and
assumes that we'll have to introduce values for any default operand that
we find (i.e. we're assuming it cannot be given as a child at all). It
also extracts the code for adding renderers for default operands into a
helper method.

Differential Revision: https://reviews.llvm.org/D33031

llvm-svn: 303240
2017-05-17 08:57:28 +00:00
Pavel Labath
79e9360669 [RuntimeDyld] Fix debug section relocation (pr20457)
Summary:
Debug info sections, (or non-SHF_ALLOC sections in general) should be
linked as if their load address was zero to emulate the behavior of the
static linker.

This bug was discovered because it was breaking lldb expression evaluation on
linux.

Reviewers: lhames

Subscribers: aprantl, eugene, clayborg, lldb-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D32899

llvm-svn: 303239
2017-05-17 08:47:28 +00:00
Gor Nishanov
ef81a105e5 [coroutines] Handle spills before catchswitch
If we need to spill the result of the PHI instruction, we insert the spill after
all of the PHIs and EHPads, however, in a catchswitch block there is no
room to insert the spill. Make room by splitting away catchswitch into a separate
block.

Before the fix:

    catch.dispatch:
       %val = phi i32 [ 1, %if.then ], [ 2, %if.else ]
       %switch = catchswitch within none [label %catch] unwind label %cleanuppad

After:

    catch.dispatch:
       %val = phi i32 [ 1, %if.then ], [ 2, %if.else ]
       %tok = cleanuppad within none []
       ; spill goes here
       cleanupret from %tok unwind label %catch.dispatch.switch
    catch.dispatch.switch:
       %switch = catchswitch within none [label %catch] unwind label %cleanuppad

https://reviews.llvm.org/D31846

llvm-svn: 303232
2017-05-17 03:09:22 +00:00
Davide Italiano
e6fa4e685f [NewGVN] Re-enable test now that the nondeterminism has been fixed.
llvm-svn: 303217
2017-05-16 22:27:06 +00:00
NAKAMURA Takumi
0fc3371c12 llvm/test/Transforms/InstCombine/debuginfo-skip.ll REQUIRES +asserts.
llvm-svn: 303216
2017-05-16 22:19:56 +00:00
Sanjay Patel
5c904052a0 [InstSimplify] add folds for constant mask of value shifted by constant
We would eventually catch these via demanded bits and computing known bits in InstCombine,
but I think it's better to handle the simple cases as soon as possible as a matter of efficiency.

This fold allows further simplifications based on distributed ops transforms. eg:
  %a = lshr i8 %x, 7
  %b = or i8 %a, 2
  %c = and i8 %b, 1

InstSimplify can directly fold this now:
  %a = lshr i8 %x, 7

Differential Revision: https://reviews.llvm.org/D33221

llvm-svn: 303213
2017-05-16 21:51:04 +00:00
Amara Emerson
b4afa9c73c Re-commit r302678, fixing PR33053.
The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions
which didn't have a lowering.

llvm-svn: 303211
2017-05-16 21:29:22 +00:00
Easwaran Raman
d91313ddb5 [Inliner] Do not mix callsite and callee hotness based updates.
Update threshold based on callee's hotness only when BFI is not available.
Otherwise use only callsite's hotness. This makes it easier to reason about
hotness related threshold updates.

Differential revision: https://reviews.llvm.org/D33157

llvm-svn: 303210
2017-05-16 21:18:09 +00:00
Tim Shen
82dcf06a2b [PPC] Add -ppc-asm-full-reg-names to atomic-2.ll. NFC.
Differential Revisions: https://reviews.llvm.org/D32763

llvm-svn: 303209
2017-05-16 20:58:55 +00:00
Matthias Braun
b5e4bc434a Test for r303197
llvm-svn: 303208
2017-05-16 20:53:27 +00:00
Tim Shen
d0970ab97a [PPC] Lower load acquire/seq_cst trailing fence to cmp + bne + isync.
Summary:
This fixes pr32392.

The lowering pipeline is:
llvm.ppc.cfence in IR -> PPC::CFENCE8 in isel -> Actual instructions in
expandPostRAPseudo.

The reason why expandPostRAPseudo is chosen is because previous passes
are likely eliminating instructions like cmpw 3, 3 (early CSE) and bne-
7, .+4 (some branch pass(s)).

Differential Revision: https://reviews.llvm.org/D32763

llvm-svn: 303205
2017-05-16 20:18:06 +00:00
Sanjay Patel
95166c2c5b [InstCombine] auto-generate better checks; NFC
llvm-svn: 303203
2017-05-16 20:09:32 +00:00
Dmitry Mikulin
2eecd88090 In debug builds non-trivial amount of time is spent in InstCombine processing
@llvm.dbg.* calls in visitCallInst(). They can be safely ignored.

llvm-svn: 303202
2017-05-16 20:08:49 +00:00
Reid Kleckner
6ef635c682 Revert "[X86] Replace slow LEA instructions in X86"
This reverts commit r303183, it broke various buildbots and introduced
sanitizer errors.

llvm-svn: 303199
2017-05-16 19:55:03 +00:00
Nirav Dave
3633380341 Elide stores which are overwritten without being observed.
Summary:
In SelectionDAG, when a store is immediately chained to another store
to the same address, elide the first store as it has no observable
effects. This is causes small improvements dealing with intrinsics
lowered to stores.

Test notes:

* Many testcases overwrite store addresses multiple times and needed
  minor changes, mainly making stores volatile to prevent the
  optimization from optimizing the test away.

* Many X86 test cases optimized out instructions associated with
  associated with va_start.

* Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has
  dependencies to check and can probably be removed and potentially
  replaced with another test.

Reviewers: rnk, john.brawn

Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33206

llvm-svn: 303198
2017-05-16 19:43:56 +00:00
Renato Golin
03c513ec21 Revert "[ARM] Mark LEApcrel instructions as isAsCheapAsAMove"
Revert "[ARM] Mark LEApcrel as not having side effects"

This reverts commit r303054 and r303053, as they broke the ARM
self-hosting buildbots:

http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/1550

http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost-neon/builds/1349

http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/1845

Offline investigation on course.

llvm-svn: 303193
2017-05-16 17:59:07 +00:00
Sanjay Patel
483e8a2253 [InstCombine] add motivational comment for tests; NFC
The referenced tests are derived from:
https://bugs.llvm.org/show_bug.cgi?id=32791
and:
https://reviews.llvm.org/D33172

The motivation for including negative tests may not be clear, so I'm adding an explanatory comment here.
In the post-commit thread for r303133:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170515/453793.html
...it was mentioned that we don't want to add redundant tests. This is a valid point. But in this case, 
we have a patch under review (D33172) that demonstrates that no existing regression tests are affected by
a proposed code change, but these are. Therefore, I think these tests have value not visible in any 
existing regression tests regardless of whether they show a transform.

Differential Revision: https://reviews.llvm.org/D33242

llvm-svn: 303185
2017-05-16 16:30:46 +00:00
Lama Saba
9f9269fa35 [X86] Replace slow LEA instructions in X86
According to Intel's Optimization Reference Manual for SNB+:
  " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must
    dispatch via port 1:
  - LEA that has all three source operands: base, index, and offset
  - LEA that uses base and index registers where the base is EBP, RBP,or R13
  - LEA that uses RIP relative addressing mode
  - LEA that uses 16-bit addressing mode "
  This patch currently handles the first 2 cases only.
 
Differential Revision: https://reviews.llvm.org/D32277

llvm-svn: 303183
2017-05-16 16:01:36 +00:00
Matthew Simpson
be5fce863d Revert 303174, 303176, and 303178
These commits are breaking the bots. Reverting to investigate.

llvm-svn: 303182
2017-05-16 15:50:30 +00:00
Matthew Simpson
f9ca5aa639 Make test target-specific
llvm-svn: 303178
2017-05-16 15:33:22 +00:00
Matthew Simpson
2b38ec1ab7 Fix test case to unbreak bots
llvm-svn: 303176
2017-05-16 15:20:27 +00:00
Matthew Simpson
fdeda43e2f [LV] Avoid potentential division by zero when selecting IC
llvm-svn: 303174
2017-05-16 14:43:55 +00:00
Gor Nishanov
e2a5e02b38 [coroutines] Handle unwind edge splitting
Summary:
RewritePHIs algorithm used in building of CoroFrame inserts a placeholder
```
%placeholder = phi [%val]
```
on every edge leading to a block starting with PHI node with multiple incoming edges,
so that if one of the incoming values was spilled and need to be reloaded, we have a
place to insert a reload. We use SplitEdge helper function to split the incoming edge.

SplitEdge function does not deal with unwind edges comping into a block with an EHPad.

This patch adds an ehAwareSplitEdge function that can correctly split the unwind edge.

For landing pads, we clone the landing pad into every edge block and replace the original
landing pad with a PHI collection the values from all incoming landing pads.

For WinEH pads, we keep the original EHPad in place and insert cleanuppad/cleapret in the
edge blocks.

Reviewers: majnemer, rnk

Reviewed By: majnemer

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D31845

llvm-svn: 303172
2017-05-16 14:11:39 +00:00
Igor Breger
7b3a99c110 [GlobalISel][X86] Split memop test file. NFC
llvm-svn: 303169
2017-05-16 13:37:31 +00:00
Max Kazantsev
66886e6d12 [SCEV] Fix sorting order for AddRecExprs
The existing sorting order in defined CompareSCEVComplexity sorts AddRecExprs
by loop depth, but does not pay attention to dominance of loops. This can
lead us to the following buggy situation:

for (...) { // loop1
  op1 = {A,+,B}
}
for (...) { // loop2
  op2 = {A,+,B}
  S = add op1, op2
}

In this case there is no guarantee that in operand list of S the op2 comes
before op1 (loop depth is the same, so they will be sorted just
lexicographically), so we can incorrectly treat S as a recurrence of loop1,
which is wrong.

This patch changes the sorting logic so that it places the dominated recs
before the dominating recs. This ensures that when we pick the first recurrency
in the operands order, it will be the bottom-most in terms of domination tree.
The attached test set includes some tests that produce incorrect SCEV
estimations and crashes with oldlogic.

Reviewers: sanjoy, reames, apilipenko, anna

Reviewed By: sanjoy

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33121

llvm-svn: 303148
2017-05-16 07:27:06 +00:00
Davide Italiano
c968f4e36b Revert "[NewGVN] Replace predicate info leftovers."
It's breaking the bots.

llvm-svn: 303142
2017-05-16 05:51:21 +00:00
Davide Italiano
08da355c1b [NewGVN] Replace predicate info leftovers.
Fixes PR32945.

Differential Revision:  https://reviews.llvm.org/D33226

llvm-svn: 303141
2017-05-16 05:23:23 +00:00
Sanjay Patel
3f32289f72 [InstCombine] add tests for PR32791; NFC
llvm-svn: 303133
2017-05-15 23:59:28 +00:00
Francis Visoiu Mistrih
6212dd268d [ShrinkWrapping] Handle restores on no-return paths
Shrink-wrapping uses post-dominators to find a restore point that
post-dominates all the uses of CSR / stack.

The way dominator trees are modeled in LLVM today is that unreachable
blocks are not present in a generic dominator tree, so, an unreachable node is
dominated by anything: include/llvm/Support/GenericDomTree.h:467.

Since for post-dominators, a no-return block is considered
"unreachable", calling findNearestCommonDominator on an unreachable node
A and a non-unreachable node B, will return B, which can be false. If we
find such node, we bail out since there is no good restore point
available.

rdar://problem/30186931

llvm-svn: 303130
2017-05-15 23:13:35 +00:00
Sanjay Patel
6a5f74bb94 [InstSimplify] add tests for unnecessary mask of shifted values; NFC
llvm-svn: 303127
2017-05-15 22:54:37 +00:00
Justin Bogner
e9f4448b87 Add "REQUIRES:" to the last few tests that use target specific intrinsics
llvm-svn: 303123
2017-05-15 22:15:22 +00:00
Tim Northover
cfd3f40830 AArch64: use linker-private symbols for globals in MachO.
We don't use section-relative relocations on AArch64, so all symbols must be at
least visible to the linker (i.e. properly global or l_whatever, but not
L_whatever).

llvm-svn: 303118
2017-05-15 21:51:38 +00:00
David Blaikie
cbf90e5613 PR32288: Describe a bool parameter's DWARF location with a simple register
There's no need (& a bit incorrect) to mask off the high bits of the
register reference when describing a simple bool value.

Reviewers: aprantl

Differential Revision: https://reviews.llvm.org/D31062

llvm-svn: 303117
2017-05-15 21:34:01 +00:00
Adam Nemet
f9607f0660 [SLP] Enable 64-bit wide vectorization on AArch64
ARM Neon has native support for half-sized vector registers (64 bits).  This
is beneficial for example for 2D and 3D graphics.  This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.

*** Performance Analysis

This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.

The results are with -O3 and PGO.  A negative percentage is an improvement.
The testsuite was run with a sample size of 4.

** SPEC

* CFP2006/482.sphinx3  -3.34%

A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.

* CFP2000/177.mesa     -3.34%
* CINT2000/256.bzip2   +6.97%

My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%.  There are also other problems with the codegen in
this loop so there is further room for improvement.

** LLVM testsuite

* SingleSource/Benchmarks/Misc/ReedSolomon               -10.75%

There are multiple small SLP vectorizations outside the hot code.  It's a bit
surprising that it adds up to 10%.  Some of this may be code-layout noise.

* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%

The opt-viewer screenshot can be seen at F3218284.  We start at a colder store
but the tree leads us into the hottest loop.

* MultiSource/Applications/lambda-0.1.3/lambda            -2.68%
* MultiSource/Benchmarks/Bullet/bullet                    -2.18%

This is using 3D vectors.

* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%

Noise, binary is unchanged.

* MultiSource/Benchmarks/Ptrdist/anagram/anagram          +4.90%

There is an additional SLP in the cold code.  The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.

* MultiSource/Applications/aha/aha                        +1.63%
* MultiSource/Applications/JM/lencod/lencod               +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark         +1.15%

Differential Revision: https://reviews.llvm.org/D31965

llvm-svn: 303116
2017-05-15 21:15:01 +00:00
Hans Wennborg
247e13c637 Revert r302678 "[AArch64] Enable use of reduction intrinsics."
This caused PR33053.

Original commit message:

> The new experimental reduction intrinsics can now be used, so I'm enabling this
> for AArch64. We will need this for SVE anyway, so it makes sense to do this for
> NEON reductions as well.
>
> The existing code to match shufflevector patterns are replaced with a direct
> lowering of the reductions to AArch64-specific nodes. Tests updated with the
> new, simpler, representation.
>
> Differential Revision: https://reviews.llvm.org/D32247

llvm-svn: 303115
2017-05-15 20:59:32 +00:00
Tim Northover
77c86e2d11 AArch64: diagnose unrecognized features in .cpu directive.
We were silently ignoring any features we couldn't match up, which led to
errors in an inline asm block missing the conventional "\n\t".

llvm-svn: 303108
2017-05-15 19:42:15 +00:00
Sanjay Patel
612c21f9a9 [InstCombine] restrict icmp fold with 2 sdiv exact operands (PR32949)
This is the InstCombine counterpart to D32954. 
I added some comments about the code duplication in:
rL302436

Alive-based verification:
http://rise4fun.com/Alive/dPw

This is a 2nd fix for the problem reported in:
https://bugs.llvm.org/show_bug.cgi?id=32949

Differential Revision: https://reviews.llvm.org/D32970

llvm-svn: 303105
2017-05-15 19:27:53 +00:00
Sanjay Patel
6116bcb0ac [InstSimplify] restrict icmp fold with 2 sdiv exact operands (PR32949)
These folds were introduced with https://reviews.llvm.org/rL127064 as part of solving:
https://bugs.llvm.org/show_bug.cgi?id=9343

As shown here:
http://rise4fun.com/Alive/C8
...however, the sdiv exact case needs a stronger predicate.

I opted for duplicated code instead of adding another fallthrough because I think that's 
easier to read (and edit in case we need/want to restrict/loosen the predicates any more).

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=32949
https://bugs.llvm.org/show_bug.cgi?id=32948

Differential Revision: https://reviews.llvm.org/D32954

llvm-svn: 303104
2017-05-15 19:16:49 +00:00
Evgeny Stupachenko
d11ab9e578 The patch adds CTLZ idiom recognition.
Summary:

The following loops should be recognized:
i = 0;
while (n) {
  n = n >> 1;
  i++;
  body();
}
use(i);

And replaced with builtin_ctlz(n) if body() is empty or
for CPUs that have CTLZ instruction converted to countable:

for (j = 0; j < builtin_ctlz(n); j++) {
  n = n >> 1;
  i++;
  body();
}
use(builtin_ctlz(n));

Reviewers: rengolin, joerg

Differential Revision: http://reviews.llvm.org/D32605

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 303102
2017-05-15 19:08:56 +00:00
Davide Italiano
a3335b259b [NewGVN] Fix verification of MemoryPhis in verifyMemoryCongruency().
verifyMemoryCongruency() filters out trivially dead MemoryDef(s),
as we find them immediately dead, before moving from TOP to a new
congruence class.
This fixes the same problem for PHI(s) skipping MemoryPhis if all
the operands are dead.

Differential Revision:  https://reviews.llvm.org/D33044

llvm-svn: 303100
2017-05-15 18:50:53 +00:00
Teresa Johnson
b00b861ff8 Add support for handling ifuncs to GlobalValue::getBaseObject
Summary:
All GlobalIndirectSymbol types (not just GlobalAlias) should return
their base object.

Without this patch LTO would warn "Unable to determine comdat of
alias!" for an ifunc.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D33202

llvm-svn: 303096
2017-05-15 18:28:29 +00:00
Kyle Butt
0bcf661a2a CodeGen: BlockPlacement: Increase tail duplication size for O3.
At O3 we are more willing to increase size if we believe it will improve
performance. The current threshold for tail-duplication of 2 instructions is
conservative, and can be relaxed at O3.

Benchmark results:
llvm test-suite:
6% improvement in aha, due to duplication of loop latch
3% improvement in hexxagon

2% slowdown in lpbench. Seems related, but couldn't completely diagnose.

Internal google benchmark:
Produces 4% improvement on internal google protocol buffer serialization
benchmarks.

Differential-Revision: https://reviews.llvm.org/D32324
llvm-svn: 303084
2017-05-15 17:30:47 +00:00
Simon Pilgrim
5188a44a1b [NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)
Follow up to D33147

NVPTXTargetLowering::LowerCall was trusting the default argument values.

Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.

Differential Revision: https://reviews.llvm.org/D33189

llvm-svn: 303082
2017-05-15 17:17:44 +00:00
Rafael Espindola
7369fbc825 Add an extra test for archive symbol tables.
The table should include only defined symbols.

llvm-svn: 303075
2017-05-15 15:56:23 +00:00
Simon Pilgrim
a09d9c5af0 [SLPVectorizer][X86] Add vectorization tests for vXi64/vXi32/vXi16/VXi8 add/sub/mul
llvm-svn: 303074
2017-05-15 15:48:15 +00:00
Florian Hahn
c8227d3439 [AArch64] Enable FeatureFuseAES on Cortex-A72.
This patch enables fusing dependent AESE/AESMC and AESD/AESIMC
instruction pairs on Cortex-A72, as recommended in the Software
Optimization Guide, section 4.10.

llvm-svn: 303073
2017-05-15 15:15:22 +00:00
Dmitry Preobrazhensky
5a5f736ba9 [AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64
See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D33123

llvm-svn: 303070
2017-05-15 14:28:23 +00:00
Simon Pilgrim
61ca3be831 [SLPVectorizer][X86] Add vectorization tests for vXi64/vXi32/vXi16/VXi8 shifts
llvm-svn: 303069
2017-05-15 14:27:11 +00:00
Dinar Temirbulatov
41867e5c63 Test commit.
llvm-svn: 303059
2017-05-15 13:14:04 +00:00
Dmitry Preobrazhensky
5cc1148d9a [AMDGPU][MC] Removed V_MQSAD_U16_U8
This instruction does not really exist

See Bug 33018: https://bugs.llvm.org//show_bug.cgi?id=33018

Reviewers: vpykhtin, artem.tamazov

Differential Revision: https://reviews.llvm.org/D33126

llvm-svn: 303055
2017-05-15 12:37:03 +00:00
John Brawn
3f252f0e9f [ARM] Mark LEApcrel instructions as isAsCheapAsAMove
Doing this means that if an LEApcrel is used in two places we will rematerialize
instead of generating two MOVs. This is particularly useful for printfs using
the same format string, where we want to generate an address into a register
that's going to get corrupted by the call.

Differential Revision: https://reviews.llvm.org/D32858

llvm-svn: 303054
2017-05-15 11:57:54 +00:00
John Brawn
1f37e6bdfc [ARM] Mark LEApcrel as not having side effects
Doing this lets us hoist it out of loops, and I've also marked it as
rematerializable the same as the thumb1 and thumb2 counterparts.

It looks like it being marked as such was just a mistake, as the commit that
made that change only mentions LEApcrelJT and in thumb1 and thumb2 only the
LEApcrelJT instructions were marked as having side-effects, so it looks like
the intent was to only mark LEApcrelJT as having side-effects but LEApcrel was
accidentally marked as such also.

Differential Revision: https://reviews.llvm.org/D32857

llvm-svn: 303053
2017-05-15 11:50:21 +00:00
Ayman Musa
46c492dbd0 [X86] Relocate code of replacement of subtarget unsupported masked memory intrinsics to run also on -O0 option.
Currently, when masked load, store, gather or scatter intrinsics are used, we check in CodeGenPrepare pass if the subtarget support these intrinsics, if not we replace them with scalar code - this is a functional transformation not an optimization (not optional).

CodeGenPrepare pass does not run when the optimization level is set to CodeGenOpt::None (-O0).

Functional transformation should run with all optimization levels, so here I created a new pass which runs on all optimization levels and does no more than this transformation.

Differential Revision: https://reviews.llvm.org/D32487

llvm-svn: 303050
2017-05-15 11:30:54 +00:00
Sam Kolton
396cd8fa77 [TableGen] Add EncoderMethod to RegisterOperand
Reviewers: stoklund, grosbach, vpykhtin

Differential Revision: https://reviews.llvm.org/D32493

llvm-svn: 303044
2017-05-15 10:13:07 +00:00
Arnaud A. de Grandmaison
5d50c74fd1 MCObjectStreamer : fail with a diagnostic when emitting an out of range value.
We were previously silently emitting bogus data in release mode,
making it very hard to diagnose the error, or crashing with an
assert in debug mode. A proper diagnostic is now always emitted
when the value to be emitted is out of range.

llvm-svn: 303041
2017-05-15 08:43:27 +00:00
Igor Breger
bbf2eba3b0 [GlobalISel][X86] G_BR instruction select test
llvm-svn: 303036
2017-05-15 07:03:38 +00:00
Daniel Jasper
72029fd63e Add '#' to test regex that I forgot in r303025.
llvm-svn: 303034
2017-05-15 04:58:27 +00:00
Daniel Jasper
9ff7dd68c1 Fix two tests that weren't correctly copied.
One didn't correctly fine the regex variable, the other still had a RUN
line for FNOBUILTIN-checks, which weren't copied to the file.

llvm-svn: 303025
2017-05-14 22:07:50 +00:00
Simon Pilgrim
58549a36e1 [X86][AVX1] Account for cost of extract/insert of 256-bit shifts
llvm-svn: 303023
2017-05-14 20:52:11 +00:00
Simon Pilgrim
3d4f03d434 [X86][AVX2] Fix costs for v4i64 ashr by splat
llvm-svn: 303022
2017-05-14 20:25:42 +00:00
Simon Pilgrim
662344253d [X86][AVX1] Account for cost of extract/insert of 256-bit shifts by splat
llvm-svn: 303021
2017-05-14 20:02:34 +00:00
Craig Topper
d441caa7a7 [X86] Add avx512vl command lines to the 128/256-bit vector-lzcnt tests so we can see what compare instructions are being used in the lookup table code.
I noticed the 512-bit lzcnts don't use the X86 specific lookup table code and instead use the EXPAND case in LegalizeDAG. I was toying around with fixing this and noticed it would require compare instructions that generate i1 masks and then converting from mask to vector. Then I noticed that we don't test which compares are used with avx512vl and no avx512cd.

llvm-svn: 303020
2017-05-14 19:38:11 +00:00
Craig Topper
7bf8b47c7e [X86] Cleanup some of the check-prefixes in the vector-lzcnt tests.
Remove an unneeded prefix from the 32-bit command line. Make all the 64-bit triples match. Replace ALL with X64 and remove it from the 32-bit test.

llvm-svn: 303019
2017-05-14 19:38:09 +00:00
Simon Pilgrim
9b072aaee8 [X86][AVX1] Account for cost of extract/insert of 256-bit SDIV/UDIV by mul sequences
llvm-svn: 303017
2017-05-14 18:52:15 +00:00
Shoaib Meenai
3230d2780d [COFF] Gracefully handle empty .drectve sections
Running `llvm-readobj -coff-directives msvcrt.lib` resulted in this error:

    Invalid data was encountered while parsing the file

This happened because some of the object files in the archive have empty
`.drectve` sections. These empty sections result in a `parse_failed` error being
returned from `COFFObjectFile::getSectionContents()`, which in turn caused
`llvm-readobj` to stop. With this change, `getSectionContents` now returns
success, and like before the resulting array is empty.

Patch by Dave Lee.

Differential Revision: https://reviews.llvm.org/D32652

llvm-svn: 303014
2017-05-14 18:34:56 +00:00
Simon Pilgrim
bd07f4d3ed [X86][XOP] XOP's general v16i8 shifts will be used instead of v8i16 shift + mask.
Tweak cost model to match what lowering actually does.

llvm-svn: 303013
2017-05-14 17:59:46 +00:00
Simon Pilgrim
d440f01082 [X86][SSE] Account for cost of extract/insert of v32i8 vector shifts
llvm-svn: 303012
2017-05-14 17:36:07 +00:00
Simon Pilgrim
75f3dd7e94 [X86][XOP] Account for cost of extract/insert of 256-bit vector shifts
llvm-svn: 303010
2017-05-14 13:38:53 +00:00
Simon Pilgrim
42caa7142b [X86][AVX] Allow 32-bit targets to peek through subvectors to extract constant splats for vXi64 shifts.
llvm-svn: 303009
2017-05-14 11:46:26 +00:00
Simon Pilgrim
3b5e5ceefe [X86][AVX] Add additional 32-bit target vector shift tests
Shows issue with 32-bits not being able to peek through subvectors to extract constant splats

llvm-svn: 303008
2017-05-14 11:13:03 +00:00
Craig Topper
6c2eb35d5e [InstSimplify] Add patterns for folding (A & B) | (~A ^ B) -> (~A ^ B) and its commuted variants.
We already had (A & ~B) | (A ^ B), but we missed the cases where the not was part of the xor.

llvm-svn: 303004
2017-05-14 07:54:43 +00:00
Craig Topper
3da63abd9f foo
llvm-svn: 303003
2017-05-14 07:54:40 +00:00
Xinliang David Li
5f25d42892 Renable test that was disabled due to cost analysis
llvm-svn: 303000
2017-05-14 02:58:39 +00:00
Zachary Turner
30b209f262 [llvm-pdbdump] Add the option to sort functions and data.
llvm-svn: 302998
2017-05-14 01:13:40 +00:00
Simon Pilgrim
fb7acbc016 [SelectionDAG] Added support for EXTRACT_SUBVECTOR/CONCAT_VECTORS demandedelts in ComputeNumSignBits
llvm-svn: 302997
2017-05-13 22:10:58 +00:00
Simon Pilgrim
e3e00995f1 [X86][SSE] Test showing missing EXTRACT_SUBVECTOR/CONCAT_VECTORS demandedelts support in ComputeNumSignBits
llvm-svn: 302994
2017-05-13 21:50:18 +00:00
Simon Pilgrim
dd6308c45b [SelectionDAG] Add VECTOR_SHUFFLE support to ComputeNumSignBits
llvm-svn: 302993
2017-05-13 19:57:10 +00:00
Simon Pilgrim
64ce41df9a [X86][SSE] Test showing inability of ComputeNumSignBits to resolve shuffles
llvm-svn: 302992
2017-05-13 17:41:07 +00:00
Justin Bogner
0e68367839 MSan: Mark MemorySanitizer tests that use x86 intrinsics as REQUIRES: x86
Tests that use target intrinsics are inherently target specific. Mark
them as such.

llvm-svn: 302990
2017-05-13 16:24:38 +00:00
Simon Pilgrim
91451ab4d4 [x86, SSE] AVX1 PR28129 (256-bit all-ones rematerialization)
Further perf tests on Jaguar indicate that:

vxorps  %ymm0, %ymm0, %ymm0
vcmpps  $15, %ymm0, %ymm0, %ymm0

is consistently faster (by about 9%) than:

vpcmpeqd  %xmm0, %xmm0, %xmm0
vinsertf128  $1, %xmm0, %ymm0, %ymm0

Testing equivalent code on a SandyBridge (E5-2640) puts it slightly (~3%) faster as well.

Committed on behalf of @dtemirbulatov

Differential Revision: https://reviews.llvm.org/D32416

llvm-svn: 302989
2017-05-13 13:42:35 +00:00
Simon Pilgrim
288ebc9253 [LoopOptimizer][Fix]PR32859, PR24738
The Loop vectorizer pass introduced undef value while it is fixing output of LCSSA form.
Here it is:

before: %e.0.ph = phi i32 [ 0, %for.inc.2.i ]
after: %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ undef, %middle.block ]

and after this change we have:

%e.0.ph = phi i32 [ 0, %for.inc.2.i ]
%e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ 0, %middle.block ]

Committed on behalf of @dtemirbulatov

Differential Revision: https://reviews.llvm.org/D33055

llvm-svn: 302988
2017-05-13 13:25:57 +00:00
Craig Topper
d39e7102a2 [InstCombine] Prevent InstCombine from triggering an extra iteration if something changed in the initial Worklist creation
Summary:
If the Worklist build causes an IR change this change flag currently factors into the flag for running another iteration of the iteration loop. But only changes during processing should trigger another loop.

This patch captures the worklist creation change flag into the outside the loop flag currently used for DbgDeclares and only sends that flag up to the caller. Rerunning the loop only depends on IC.run() now.

This uses the debug output of InstCombine to determine if one or two iterations run. I couldn't think of a better way to detect it since the second spurious iteration shoudn't make any visible changes. Just wasted computation.

I can do a pre-commit of the test case with the CHECK-NOT as a CHECK if this is an ok way to check this.

This is a subset of D31678 as I'm still not sure how to verify the analysis behavior for that.

Reviewers: davide, majnemer, spatel, chandlerc

Reviewed By: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32453

llvm-svn: 302982
2017-05-13 06:56:04 +00:00
Justin Bogner
104b204907 ConstProp: Split x86 SSE intrinsic tests out of calls.ll
This allows us to mark this as `REQUIRES: x86`, since it uses x86
target specific intrinsics.

llvm-svn: 302980
2017-05-13 05:52:17 +00:00
Justin Bogner
2d310588a2 InstCombine: Move tests that use target intrinsics into subdirectories
Tests with target intrinsics are inherently target specific, so it
doesn't actually make sense to run them if we've excluded their
target.

llvm-svn: 302979
2017-05-13 05:39:46 +00:00
NAKAMURA Takumi
0474b0e5ba Disable llvm/test/Transforms/NewGVN/pr32934.ll while Davide is investigating.
llvm-svn: 302977
2017-05-13 03:05:38 +00:00
Davide Italiano
c5771fbe04 [NewGVN] XFAIL a flaky test until I find out what's going on.
I bet the change is correct but this test seems to expose some underlying
problem that manifest only on some buildbots, and I'm not able to reproduce
locally. Unfortunately I can't debug right now but I don't want to annoy
people with spurious failures, so I'll XFAIL until I can take a look (over
the weekend).

llvm-svn: 302976
2017-05-13 02:45:47 +00:00
Dylan McKay
c826f672fe [AVR] When lowering Select8/Select16, put newly generated MBBs in the same spot
Contributed by Dr. Gergő Érdi.

Fixes a bug.

Raised from (https://github.com/avr-rust/rust/issues/49).

llvm-svn: 302973
2017-05-13 00:22:34 +00:00
Justin Bogner
9e6beb3722 AA: Use generic intrinsics for tests instead of target specific ones
Update a few tests to use llvm.masked.load/store instead of arm neon
vector loads and stores, and move the tests that are actually specific
to those arm intrinsics to their own files. This lets us mark the
tests that use target specific intrinsics as requiring those targets.

llvm-svn: 302972
2017-05-13 00:12:52 +00:00
Xinliang David Li
28a5d9c340 [PartialInlining] Profile based cost analysis
Implemented frequency based cost/saving analysis
and related options.

The pass is now in a state ready to be turne on
in the pipeline (in follow up).

Differential Revision: http://reviews.llvm.org/D32783

llvm-svn: 302967
2017-05-12 23:41:43 +00:00
Andrew Kaylor
392b1353f7 [TLI] Add mapping for various '__<func>_finite' forms of the math routines to SVML routines
Patch by Chris Chrulski

Differential Revision: https://reviews.llvm.org/D31789

llvm-svn: 302957
2017-05-12 22:11:26 +00:00
Andrew Kaylor
0792f83d68 [ConstantFolding] Add folding for various math '__<func>_finite' routines generated from -ffast-math
Patch by Chris Chrulski

Differential Revision: https://reviews.llvm.org/D31788

llvm-svn: 302956
2017-05-12 22:11:20 +00:00
Andrew Kaylor
95445317bd [TLI] Add declarations for various math header file routines from math-finite.h that create '__<func>_finite as functions
Patch by Chris Chrulski

Differential Revision: https://reviews.llvm.org/D31787

llvm-svn: 302955
2017-05-12 22:11:12 +00:00
Sanjay Patel
f353bb583f [x86] add vector tests for demanded bits; NFC
llvm-svn: 302949
2017-05-12 20:53:48 +00:00
Changpeng Fang
c5587a9cbd AMDGPU/SI: Don't promote to vector if the load/store is volatile.
Summary:
  We should not change volatile loads/stores in promoting alloca to vector.

Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D33107

llvm-svn: 302943
2017-05-12 20:31:12 +00:00
Simon Pilgrim
86a30cc8d4 [NVPTX] Don't flag StoreRetVal memory chain operands as ReadMem (PR32146)
This fixes 47 of the 75 NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.

Differential Revision: https://reviews.llvm.org/D33147

llvm-svn: 302942
2017-05-12 19:56:43 +00:00
Dehao Chen
d7d29ebf8d Add LiveRangeShrink pass to shrink live range within BB.
Summary: LiveRangeShrink pass moves instruction right after the definition with the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live-range within BB.

Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb

Reviewed By: MatzeB, andreadb

Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32563

llvm-svn: 302938
2017-05-12 19:29:27 +00:00
Tom Stellard
31c4d52ea9 AMDGPU: Add lit.local.cfg to disable global-isel tests when global-isel is disabled
This should fix bots broken by r302919.

llvm-svn: 302928
2017-05-12 17:59:30 +00:00
Reid Kleckner
ca8b6180fe [codeview] Fix assertion failure introduced in r295354 refactoring
CodeViewDebug sets Asm to nullptr to disable debug info generation.  You
can get a .ll file like no-cus.ll from 'clang -gcodeview -g0', which
happens in the ubsan test suite.

llvm-svn: 302923
2017-05-12 17:02:40 +00:00
Tom Stellard
e65dcab676 AMDGPU/GlobalISel: Mark 32-bit integer constants as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33115

llvm-svn: 302919
2017-05-12 16:46:46 +00:00
James Y Knight
9afced58c2 [SPARC] Support 'f' and 'e' inline asm constraints.
Based on patch by Patrick Boettcher and Chris Dewhurst.

Differential Revision: https://reviews.llvm.org/D29116

llvm-svn: 302911
2017-05-12 15:59:10 +00:00
Sanjay Patel
4f6e2e4397 [x86] add tests for potential vector narrowing optimization (PR32790)
llvm-svn: 302910
2017-05-12 15:56:39 +00:00
Davide Italiano
c3a517dde1 [LoopUnroll] Fix a test. REQUIRE should be REQUIRES.
Found by inspection.

llvm-svn: 302909
2017-05-12 15:30:58 +00:00
Davide Italiano
4d6c0b489d [NewGVN] Don't incorrectly reset the memory leader.
This code was missing a check for stores, so we were thinking the
congruency class didn't have any memory members, and reset the
memory leader.

Differential Revision:  https://reviews.llvm.org/D33056

llvm-svn: 302905
2017-05-12 15:22:45 +00:00
Serguei Katkov
4f8d50be22 [BPI] Ignore remainder while distributing the remaining probability from unreachanble
This is a follow up patch for https://reviews.llvm.org/rL300440
to address a comment.

To make implementation to be consistent with other cases we just
ignore the remainder after distribution of remaining probability between
reachable edges.

If we reduced the probability of some edges coming to unreachable
blocks we should distribute the remaining part across other edges
coming to reachable blocks to satisfy the condition that sum of all
probabilities should be equal to one. If this remaining part is not
divided by number of "reachable" edges then we get this remainder.
This remainder probability should be pretty small. Other cases just ignore
if the sum of probabilities is not equal to one so we do the same.

Reviewers: chandlerc, sanjoy, vsk, junbuml, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D32124

llvm-svn: 302883
2017-05-12 07:50:06 +00:00
Jonas Paulsson
9727d7a8ef Handle a COPY with undef source operand in LowerCopy()
Llvm-stress discovered that a COPY may end up in ExpandPostRA::LowerCopy()
with an undef source operand. It is not possible for the target to handle
this, as this flag is not passed to TII->copyPhysReg().

This patch solves this by treating such a COPY as an identity COPY.

Review: Matthias Braun
https://reviews.llvm.org/D32892

llvm-svn: 302877
2017-05-12 06:32:03 +00:00
Mikael Holmen
3048bfca17 [IfConversion] Keep the CFG updated incrementally in IfConvertTriangle
Summary:
Instead of using RemoveExtraEdges (which uses analyzeBranch, which cannot
always be trusted) at the end to fixup the CFG we keep the CFG updated as
we go along and remove or add branches and merge blocks.

This way we won't have any problems if the involved MBBs contain
unanalyzable instructions.

This fixes PR32721.

In that case we had a triangle

   EBB
   | \
   |  |
   | TBB
   |  /
   FBB

where FBB didn't have any successors at all since it ended with an
unconditional return. Then TBB and FBB were be merged into EBB, but EBB
would still keep its successors, and the use of analyzeBranch and
CorrectExtraCFGEdges wouldn't help to remove them since the return
instruction is not analyzable (at least not on ARM).

Reviewers: kparzysz, iteratee, MatzeB

Reviewed By: iteratee

Subscribers: aemerson, rengolin, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33037

llvm-svn: 302876
2017-05-12 06:28:58 +00:00
Chandler Carruth
a360b737fa [PM/Unswitch] Teach the new simple loop unswitch to handle loop
invariant PHI inputs and to rewrite PHI nodes during the actual
unswitching.

The checking is quite easy, but rewriting the PHI nodes is somewhat
surprisingly challenging. This should handle both branches and switches.

I think this is now a full featured trivial unswitcher, and more full
featured than the trivial cases in the old pass while still being (IMO)
somewhat simpler in how it works.

Next up is to verify its correctness in more widespread testing, and
then to add non-trivial unswitching.

Thanks to Davide and Sanjoy for the excellent review. There is one
remaining question that I may address in a follow-up patch (see the
review thread for details) but it isn't related to the functionality
specifically.

Differential Revision: https://reviews.llvm.org/D32699

llvm-svn: 302867
2017-05-12 02:19:59 +00:00
David Blaikie
ea27535943 DWARF: Avoid cross-CU references under Fission
Turns out that the Fission/Split DWARF package format (DWP) is currently
insufficient to handle cross-CU (ref_addr) references. So for now,
duplicate any debug info needed in these situations:
* inlined_subroutine's abstract_origin
* inlined variable's abstract_origin
* types

Keep the ref_addr behavior in general, including in the split DWARF
inline debug info that can be emitted into the object files for online
symbolication.
Keep a flag to use the old (ref_addr) behavior for testing ways of
addressing this limitation in the DWP tool (& for those not using DWP
packaging).

llvm-svn: 302858
2017-05-12 01:13:45 +00:00
Dehao Chen
228587901e Change sample profile writer to make it deterministic.
Summary: This patch changes the function profile output order to be deterministic. In order to make it easier to understand, hottest functions (with most total samples) is ordered first.

Reviewers: dnovillo, davidxl

Reviewed By: dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33111

llvm-svn: 302851
2017-05-11 23:43:44 +00:00
Teresa Johnson
81aa3660f7 Restrict call metadata based hotness detection to Sample PGO mode
Summary:
Don't use the metadata on call instructions for determining hotness
unless we are in sample PGO mode, where it is needed because profile
counts are not accurate. In instrumentation mode this is not necessary
and does more harm than good when calls have VP metadata that hasn't
been properly scaled after transformations or dropped after constant
prop based devirtualization (both should be fixed, but we don't need
to do this in the first place for instrumentation PGO).

This required adjusting a number of tests to distinguish between sample
and instrumentation PGO handling, and to add in profile summary metadata
so that getProfileCount can get the summary.

Reviewers: davidxl, danielcdh

Subscribers: aemerson, rengolin, mehdi_amini, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D32877

llvm-svn: 302844
2017-05-11 23:18:05 +00:00
Guozhi Wei
37cf363f24 [PPC] Change the register constraint of the first source operand of instruction mtvsrdd to g8rc_nox0
According to Power ISA V3.0 document, the first source operand of mtvsrdd is constant 0 if r0 is specified. So the corresponding register constraint should be g8rc_nox0.

This bug caused wrong output generated by 401.bzip2 when -mcpu=power9 and fdo are specified.

Differential Revision: https://reviews.llvm.org/D32880

llvm-svn: 302834
2017-05-11 22:17:35 +00:00
Easwaran Raman
7fb8c288b9 Decrease inlinecold-threshold to 45
I ran the test-suite (including SPEC 2006) in PGO mode comparing cold
thresholds of 225 and 45. Here are some stats on the text size:

Out of 904 tests that ran, 197 see a change in text size. The average
text size reduction (of all the 904 binaries) is 1.07%. Of the 197
binaries, 19 see a text size increase, as high as 18%, but most of them
are small single source benchmarks. There are 3 multisource benchmarks
with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On
the other side of the spectrum, 31 benchmarks see >10% size reduction
and 6 of them are MultiSource.

I haven't run the test-suite with other values of inlinecold-threshold.
Since we have a cold callsite threshold of 45, I picked this value.

Differential revision: https://reviews.llvm.org/D33106

llvm-svn: 302829
2017-05-11 21:36:28 +00:00
Chad Rosier
4962115cec [AArch64][MachineCombine] Fold FNMUL+FSUB -> FNMADD.
Differential Revision: http://reviews.llvm.org/D33101.

llvm-svn: 302822
2017-05-11 20:07:24 +00:00
Vadzim Dambrouski
80fb90f30d [MSP430] Generate EABI-compliant libcalls
Updates the MSP430 target to generate EABI-compatible libcall names.
As a byproduct, adjusts the hardware multiplier options available in
the MSP430 target, adds support for promotion of the ISD::MUL operation
for 8-bit integers, and correctly marks R11 as used by call instructions.

Patch by Andrew Wygle.

Differential Revision: https://reviews.llvm.org/D32676

llvm-svn: 302820
2017-05-11 19:56:14 +00:00
Matt Arsenault
4f44d0e3e0 AMDGPU: Remove tfe bit from flat instruction definitions
We don't use it and it was removed in gfx9, and the encoding
bit repurposed.

Additionally actually using it requires changing the output register
class, which wasn't done anyway.

llvm-svn: 302814
2017-05-11 17:38:33 +00:00
Matt Arsenault
fd9981d4ab AMDGPU: Pull fneg out of extract_vector_elt
This allows folding source modifiers in more f16 cases.
Makes it easier to select per-component packed neg modifiers.

llvm-svn: 302813
2017-05-11 17:26:25 +00:00
Adam Nemet
73703d12a4 [SLP] Emit optimization remarks
The approach I followed was to emit the remark after getTreeCost concludes
that SLP is profitable.  I initially tried emitting them after the
vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely
remember missing a few cases for example in HorizontalReduction::tryToReduce.

ORE is placed in BoUpSLP so that it's available from everywhere (notably
HorizontalReduction::tryToReduce).

We use the first instruction in the root bundle as the locator for the remark.
In order to get a sense how far the tree is spanning I've include the size of
the tree in the remark.  This is not perfect of course but it gives you at
least a rough idea about the tree.  Then you can follow up with -view-slp-tree
to really see the actual tree.

llvm-svn: 302811
2017-05-11 17:06:17 +00:00
Nemanja Ivanovic
58096023de [PowerPC] Eliminate integer compare instructions - vol. 1
This patch is the first in a series of patches to provide code gen for
doing compares in GPRs when the compare result is required in a GPR.

It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64
extensions. This first patch handles equality comparison on i32 operands with
the result sign or zero extended.

Differential Revision: https://reviews.llvm.org/D31847

llvm-svn: 302810
2017-05-11 16:54:23 +00:00
Simon Pilgrim
8ee014b2e7 [X86][AVX] Added zeroall/zeroupper scheduler tests
Missing on SandyBridge and Btver2 models

llvm-svn: 302804
2017-05-11 15:02:49 +00:00
Javed Absar
a6a50d93e8 [IR] Allow attributes with global variables
This patch extends llvm-ir to allow attributes to be set on global variables.
An RFC was sent out earlier by my colleague James Molloy: http://lists.llvm.org/pipermail/cfe-dev/2017-March/053100.html
A key part of that proposal was to extend LLVM-IR to carry attributes on global variables.
This generic feature could be useful for multiple purposes.
In our present context, it would be useful to carry user specified sections for bss/rodata/data.

Reviewed by: Jonathan Roelofs, Reid Kleckner
Differential Revision: https://reviews.llvm.org/D32009

llvm-svn: 302794
2017-05-11 12:28:08 +00:00
Alexander Potapenko
d099c88f0a [msan] Fix PR32842
It turned out that MSan was incorrectly calculating the shadow for int comparisons: it was done by truncating the result of (Shadow1 OR Shadow2) to i1, effectively rendering all bits except LSB useless.
This approach doesn't work e.g. in the case where the values being compared are even (i.e. have the LSB of the shadow equal to zero).
Instead, if CreateShadowCast() has to cast a bigger int to i1, we replace the truncation with an ICMP to 0.

This patch doesn't affect the code generated for SPEC 2006 binaries, i.e. there's no performance impact.

For the test case reported in PR32842 MSan with the patch generates a slightly more efficient code:

  orq     %rcx, %rax
  jne     .LBB0_6
, instead of:

  orl     %ecx, %eax
  testb   $1, %al
  jne     .LBB0_6

llvm-svn: 302787
2017-05-11 11:07:48 +00:00
Chandler Carruth
7ba7a0f05a [x86] Fix a failure to select with AVX-512 when the type legalizer
manages to form a VSELECT with a non-i1 element type condition. Those
are technically allowed in SDAG (at least, the generic type legalization
logic will form them and I wouldn't want to try to audit everything te
preclude forming them) so we need to be able to lower them.

This isn't too hard to implement. We mark VSELECT as custom so we get
a chance in C++, add a fast path for i1 conditions to get directly
handled by the patterns, and a fallback when we need to manually force
the condition to be an i1 that uses the vptestm instruction to turn
a non-mask into a mask.

This, unsurprisingly, generates awful code. But it at least doesn't
crash. This was actually impacting open source packages built with LLVM
for AVX-512 in the wild, so quickly landing a patch that at least stops
the immediate bleeding.

I think I've found where to fix the codegen quality issue, but less
confident of that change so separating it out from the thing that
doesn't change the result of any existing test case but causes mine to
not crash.

llvm-svn: 302785
2017-05-11 10:52:16 +00:00
Diana Picus
aa41f21260 [ARM][GlobalISel] Legalize narrow scalar ops by widening
This is the same as r292827 for AArch64: we widen 8- and 16-bit ADD, SUB
and MUL to 32 bits since we only have TableGen patterns for 32 bits.
See the commit message for r292827 for more details.

At this point we could just remove some of the tests for regbankselect
and instruction-select, since we're not going to see any narrow
operations at those levels anymore. Instead I decided to update them
with G_ANYEXT/G_TRUNC operations, so we can validate the full sequences
generated by the legalizer.

llvm-svn: 302782
2017-05-11 09:45:57 +00:00
Diana Picus
97796d8b55 [ARM][GlobalISel] Support for G_ANYEXT
G_ANYEXT can be introduced by the legalizer when widening scalars. Add
support for it in the register bank info (same mapping as everything
else) and in the instruction selector.

When selecting it, we treat it as a COPY, just like G_TRUNC. On this
occasion we get rid of some assertions in selectCopy so we can reuse it.
This shouldn't be a problem at the moment since we're not supporting any
complicated cases (e.g. FPR, different register banks). We might want to
separate the paths when we do.

llvm-svn: 302778
2017-05-11 08:28:31 +00:00
Igor Breger
d4690d9a2f [GlobalISel][X86] G_ICMP support.
Summary: support G_ICMP for scalar types i8/i16/i64.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits, krytarowski

Differential Revision: https://reviews.llvm.org/D32995

llvm-svn: 302774
2017-05-11 07:17:40 +00:00
David L. Jones
3e3254804c Revert "[SDAG] Relax conditions under stores of loaded values can be merged"
This reverts r302712.

The change fails with ASAN enabled:

ERROR: AddressSanitizer: use-after-poison on address ... at ...
READ of size 2 at ... thread T0
  #0 ... in llvm::SDNode::getNumValues() const <snip>/include/llvm/CodeGen/SelectionDAGNodes.h:855:42
  #1 ... in llvm::SDNode::hasAnyUseOfValue(unsigned int) const <snip>/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7270:3
  #2 ... in llvm::SDValue::use_empty() const <snip> include/llvm/CodeGen/SelectionDAGNodes.h:1042:17
  #3 ... in (anonymous namespace)::DAGCombiner::MergeConsecutiveStores(llvm::StoreSDNode*) <snip>/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12944:7

Reviewers: niravd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33081

llvm-svn: 302746
2017-05-10 23:56:21 +00:00
Sanjay Patel
c4d21ee61b [InstCombine] remove fold that swaps xor/or with constants; NFCI
// (X ^ C1) | C2 --> (X | C2) ^ (C1&~C2)

This canonicalization was added at:
https://reviews.llvm.org/rL7264 

By moving xors out/down, we can more easily combine constants. I'm adding
tests that do not change with this patch, so we can verify that those kinds
of transforms are still happening.

This is no-functional-change-intended because there's a later fold:
// (X^C)|Y -> (X|Y)^C iff Y&C == 0
...and demanded-bits appears to guarantee that any fold that would have
hit the fold we're removing here would be caught by that 2nd fold.

Similar reasoning was used in:
https://reviews.llvm.org/rL299384

The larger motivation for removing this code is that it could interfere with 
the fix for PR32706:
https://bugs.llvm.org/show_bug.cgi?id=32706

Ie, we're not checking if the 'xor' is actually a 'not', so we could reverse
a 'not' optimization and cause an infinite loop by altering an 'xor X, -1'. 

Differential Revision: https://reviews.llvm.org/D33050

llvm-svn: 302733
2017-05-10 21:33:55 +00:00
Matt Arsenault
46f6718d8f AMDGPU: Make some packed shuffles free
VOP3P instructions can encode access to either
half of the register.

llvm-svn: 302730
2017-05-10 21:29:33 +00:00
Nirav Dave
0603cde0c8 [SDAG] Relax conditions under stores of loaded values can be merged
Summary:

Allow consecutive stores whose values come from consecutive loads to
merged in the presense of other uses of the loads. Previously this was
disallowed as in general the merged load cannot be shared with the
other uses. Merging N stores into 1 may cause as many as N redundant
loads. However in the context of caching this should have neglible
affect on memory pressure and reduce instruction count making it
almost always a win.

Fixes PR32086.

Reviewers: spatel, jyknight, andreadb, hfinkel, efriedma

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30471

llvm-svn: 302712
2017-05-10 19:53:41 +00:00
Sanjay Patel
9a17eb1902 [InstSimplify, InstCombine] move 'or' simplification tests; NFC
Surprisingly, I don't think these are redundant for InstSimplify. 
They were just misplaced as InstCombine tests.

llvm-svn: 302684
2017-05-10 15:57:47 +00:00
Simon Pilgrim
3c3c9c0eef [X86][SSE] Check vec_set BUILD_VECTOR tests on both 32 and 64-bit targets
llvm-svn: 302683
2017-05-10 15:52:59 +00:00
Quentin Colombet
625bc4b0e8 [AArch64][RegisterBankInfo] Change the default mapping of fp stores.
For stores, check if the stored value is defined by a floating point
instruction and if yes, we return a default mapping with FPR instead
of GPR.

llvm-svn: 302679
2017-05-10 15:19:41 +00:00
Amara Emerson
08ca9bd16b [AArch64] Enable use of reduction intrinsics.
The new experimental reduction intrinsics can now be used, so I'm enabling this
for AArch64. We will need this for SVE anyway, so it makes sense to do this for
NEON reductions as well.

The existing code to match shufflevector patterns are replaced with a direct
lowering of the reductions to AArch64-specific nodes. Tests updated with the
new, simpler, representation.

Differential Revision: https://reviews.llvm.org/D32247

llvm-svn: 302678
2017-05-10 15:15:38 +00:00
Sanjay Patel
d548d9e52a [InstCombine] remove redundant tests
The first test in this file is duplicated exactly in and.ll -> test33.
We have commuted and vector variants there too.

The second test is a composite of 2 folds. The first fold is tested
independently in add.ll -> flip_and_mask (including vector variant).
After that transform fires, the IR is identical to the first transform.

llvm-svn: 302676
2017-05-10 14:54:49 +00:00
Sanjay Patel
984fd017a0 [InstCombine] fix auto-generated FileCheck-captured variable refs
The script at utils/update_test_checks.py has (had?) a bug when variables
start with the same sequence of letters (clearly, not all of the time).

llvm-svn: 302674
2017-05-10 14:40:04 +00:00
Sanjay Patel
cc782735dc [InstCombine] fix typo in test comment; NFC
llvm-svn: 302669
2017-05-10 14:25:23 +00:00
Ulrich Weigand
bb3bddd00e [SystemZ] Add miscellaneous instructions
This adds a few missing instructions for the assembler and
disassembler.  Those should be the last missing general-
purpose (Chapter 7) instructions for the z10 ISA.

llvm-svn: 302667
2017-05-10 14:20:15 +00:00
Ulrich Weigand
1ef7c92f6f [SystemZ] Add missing arithmetic instructions
This adds the remaining general arithmetic instructions
for assembler / disassembler use.  Most of these are not
useful for codegen; a few might be, and those are listed
in the README.txt for future improvements.

llvm-svn: 302665
2017-05-10 14:18:47 +00:00
Sam Clegg
aaf3055813 [llvm-readobj] Improve errors on invalid binary
The previous code was discarding the error message from
createBinary() by calling errorToErrorCode().
This meant that such error were always reported unhelpfully
as "Invalid data was encountered while parsing the file".

Other tools such as llvm-objdump already produce a more
the error message in this case.

Differential Revision: https://reviews.llvm.org/D32985

llvm-svn: 302664
2017-05-10 14:18:11 +00:00
Sanjay Patel
28b1842d00 [InstCombine] add (ashr (shl i32 X, 31), 31), 1 --> and (not X), 1
This is another step towards favoring 'not' ops over random 'xor' in IR:
https://bugs.llvm.org/show_bug.cgi?id=32706

This transformation may have occurred in longer IR sequences using computeKnownBits,
but that could be much more expensive to calculate.

As the scalar result shows, we do not currently favor 'not' in all cases. The 'not'
created by the transform is transformed again (unnecessarily). Vectors don't have
this problem because vectors are (wrongly) excluded from several other combines.

llvm-svn: 302659
2017-05-10 13:56:52 +00:00
Michael Zuckerman
0456ea13b1 [LLVM][inline-asm] Altmacro string escape character '!'
This patch is the fourth patch in a series of reviews for the Altmacro feature. 
This patch introduces a new escape character '!' and it depends on D32701.

according to https://sourceware.org/binutils/docs/as/Altmacro.html:
"single-character string escape
To include any single character literally in a string (even if the character would otherwise have some special meaning), you can prefix the character with !' (an exclamation mark). For example, you can write <4.3 !> 5.4!!>' to get the literal text `4.3 > 5.4!'. "

Differential Revision: https://reviews.llvm.org/D32792

llvm-svn: 302652
2017-05-10 13:08:11 +00:00
Mikael Holmen
0aa88ec197 [IfConversion] Add missing check in IfConversion/canFallThroughTo
Summary:
When trying to figure out if MBB could fallthrough to ToMBB (possibly by
falling through a bunch of other MBBs) we didn't actually check if there
was fallthrough between the last two blocks in the chain.

Reviewers: kparzysz, iteratee, MatzeB

Reviewed By: kparzysz, iteratee

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D32996

llvm-svn: 302650
2017-05-10 13:06:13 +00:00
Jonas Paulsson
d401c2a29b [SystemZ] Implement getRepRegClassFor()
This method must return a valid register class, or the list-ilp isel
scheduler will crash. For MVT::Untyped nullptr was previously returned, but
now ADDR128BitRegClass is returned instead. This is needed just as long as
list-ilp (and probably also list-hybrid) is still there.

Review: Ulrich Weigand, A Trick
https://reviews.llvm.org/D32802

llvm-svn: 302649
2017-05-10 13:03:25 +00:00
Dmitry Preobrazhensky
299fc6910a [AMDGPU][MC] Corrected v_madak/madmk to avoid printing "_e32" in disassembler output
See bug 32927: https://bugs.llvm.org//show_bug.cgi?id=32927

Reviewers: vpykhtin, artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D32913

llvm-svn: 302648
2017-05-10 13:00:28 +00:00
Igor Breger
6263596e23 [GlobalISel][X86] Split test file. NFC
llvm-svn: 302647
2017-05-10 12:58:31 +00:00
Ulrich Weigand
55d6a88557 [SystemZ] Add decimal integer instructions
This adds the set of decimal integer (BCD) instructions for
assembler / disassembler use.

llvm-svn: 302646
2017-05-10 12:42:45 +00:00
Ulrich Weigand
250144573e [SystemZ] Add crypto instructions
This adds the set of message-security assist instructions for
assembler / disassembler use.

llvm-svn: 302645
2017-05-10 12:42:00 +00:00
Ulrich Weigand
f0971128ba [SystemZ] Add translate/convert instructions
This adds the set of character-set translate and convert instructions
for assembler / disassembler use.

llvm-svn: 302644
2017-05-10 12:41:12 +00:00
Ulrich Weigand
7c859f5897 [SystemZ] Add missing memory/string instructions
This adds a number of missing memory and string instructions
for assembler / disassembler use.

llvm-svn: 302643
2017-05-10 12:40:15 +00:00
Ulrich Weigand
934386e031 [SystemZ] Reformat assembler/disassembler tests
The assembler and disassmebler test cases started out formatted and
sorted in a particular way, but this got lost over time as patches
were added.  Reformat them again.  NFC.

llvm-svn: 302642
2017-05-10 12:39:11 +00:00
Simon Pilgrim
ac3e69a0aa [DAGCombiner] Add vector support to fold (shl/srl 0, x) -> 0
llvm-svn: 302641
2017-05-10 12:34:27 +00:00
Chandler Carruth
20b358db9d Revert r301950: SpeculativeExecution: Stop using whitelist for costs
This pass doesn't correctly handle testing for when it is legal to hoist
arbitrary instructions. The whitelist happens to make it safe, so before
it is removed the pass's legality checks will need to be enhanced.

Details have been added to the code review thread for the patch.

llvm-svn: 302640
2017-05-10 12:30:07 +00:00
Amara Emerson
668fbd4cf5 Add a late IR expansion pass for the experimental reduction intrinsics.
This pass uses a new target hook to decide whether or not to expand a particular
intrinsic to the shuffevector sequence.

Differential Revision: https://reviews.llvm.org/D32245

llvm-svn: 302631
2017-05-10 09:42:49 +00:00
Igor Breger
7350527066 [GlobalISel][X86] G_ZEXT i1 to i32/i64 support.
Summary: Support G_ZEXT i1 to i32/i64 instruction selection.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D32965

llvm-svn: 302623
2017-05-10 06:52:58 +00:00
Ahmed Bougacha
d7292f3867 [CodeGen] Don't require AA in TwoAddress at -O0.
This is a follow-up to r302611, which moved an -O0 computation of DT
from SDAGISel to TwoAddress.

Don't use it here either, and avoid computing it completely.  The only
use was forwarding the analysis as an optional argument to utility
functions.

Differential Revision: https://reviews.llvm.org/D32766

llvm-svn: 302612
2017-05-10 00:56:00 +00:00
Ahmed Bougacha
d74b69039b [CodeGen] Don't require AA in SDAGISel at -O0.
Before r247167, the pass manager builder controlled which AA
implementations were used, exporting them all in the AliasAnalysis
analysis group.

Now, AAResultsWrapperPass always uses BasicAA, but still uses other AA
implementations if made available in the pass pipeline.

But regardless, SDAGISel is required at O0, and really doesn't need to
be doing fancy optimizations based on useful AA results.

Don't require AA at CodeGenOpt::None, and only use it otherwise.

This does have a functional impact (and one testcase is pessimized
because we can't reuse a load).  But I think that's desirable no matter
what.

Note that this alone doesn't result in less DT computations: TwoAddress
was previously able to reuse the DT we computed for SDAG.  That will be
fixed separately.

Differential Revision: https://reviews.llvm.org/D32766

llvm-svn: 302611
2017-05-10 00:39:30 +00:00
Ahmed Bougacha
8c31c46df1 [CodeGen] Compute DT/LI lazily in SafeStackLegacyPass. NFC.
We currently require SCEV, which requires DT/LI.  Those are expensive to
compute, but the pass only runs for functions that have the safestack
attribute.

Compute DT/LI to build SCEV lazily, only when the pass is actually going
to transform the function.

Differential Revision: https://reviews.llvm.org/D31302

llvm-svn: 302610
2017-05-10 00:39:25 +00:00
Ahmed Bougacha
c5a347f087 [CodeGen] Add an -O0 backend pipeline test. NFC.
This should hopefully makes changes to the O0 pipeline obvious; it's
easy to require expensive passes, and this helps make informed
decisions.

Case in point: in the few weeks separating the time when I initially
wrote this patch to the time when I committed, the test regressed as
r302103 added another use of DT!

llvm-svn: 302608
2017-05-10 00:39:17 +00:00
Sam Clegg
19dcd69096 [WebAssembly] Improve libObject support for wasm imports and exports
Previously we had only supported the importing and
exporting of functions and globals.

Also, add usefull overload of getWasmSymbol() and
getNumberOfSymbols() in support of lld port.

Differential Revision: https://reviews.llvm.org/D33011

llvm-svn: 302601
2017-05-09 23:48:41 +00:00
Sanjay Patel
bba8bf8184 [InstCombine] add tests for andn; NFC
llvm-svn: 302599
2017-05-09 23:40:13 +00:00
Keno Fischer
64e8b703ce [GVN] Fix a crash on encountering non-integral pointers
Summary:
This fixes the immediate crash caused by introducing an incorrect inttoptr
before attempting the conversion. There may still be a legality
check missing somewhere earlier for non-integral pointers, but this change
seems necessary in any case.

Reviewers: sanjoy, dberlin

Reviewed By: dberlin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32623

llvm-svn: 302587
2017-05-09 21:07:20 +00:00
Sanjay Patel
9807e9af6a [InstCombine] update test file to use FileCheck; NFC
llvm-svn: 302585
2017-05-09 20:46:12 +00:00
Zvi Rackover
8253c91a9b DAGCombine: Combine shuffles of splat-shuffles
Summary: Reapply r299047, but this time handle correctly splat-masks with undef elements.

Reviewers: spatel, RKSimon, eli.friedman, andreadb

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31961

llvm-svn: 302583
2017-05-09 20:25:38 +00:00
Matthew Simpson
383f076bfc [AArch64] Consider widening instructions in cost calculations
The AArch64 instruction set has a few "widening" instructions (e.g., uaddl,
saddl, uaddw, etc.) that take one or more doubleword operands and produce
quadword results. The operands are automatically sign- or zero-extended as
appropriate. However, in LLVM IR, these extends are explicit. This patch
updates TTI to consider these widening instructions as single operations whose
cost is attached to the arithmetic instruction. It marks extends that are part
of a widening operation "free" and applies a sub-target specified overhead
(zero by default) to the arithmetic instructions.

Differential Revision: https://reviews.llvm.org/D32706

llvm-svn: 302582
2017-05-09 20:18:12 +00:00
Reid Kleckner
1b8d831812 [codeview] Check for a DIExpression offset for local variables
Fixes inalloca parameters, which previously all pointed to the same
offset. Extend the test to use llvm-readobj so that we can test the
offset in a readable way.

llvm-svn: 302578
2017-05-09 19:59:29 +00:00
Adrian Prantl
05ee77f883 Make it illegal for two Functions to point to the same DISubprogram
As recently discussed on llvm-dev [1], this patch makes it illegal for
two Functions to point to the same DISubprogram and updates
FunctionCloner to also clone the debug info of a function to conform
to the new requirement. To simplify the implementation it also factors
out the creation of inlineAt locations from the Inliner into a
general-purpose utility in DILocation.

[1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/112661.html
<rdar://problem/31926379>

Differential Revision: https://reviews.llvm.org/D32975

This reapplies r302469 with a fix for a bot failure (reparentDebugInfo
now checks for the case the orig and new function are identical).

llvm-svn: 302576
2017-05-09 19:47:37 +00:00
Wolfgang Pieb
9b56d2f6b8 [DWARF] Fix a parsing issue with type unit headers.
Reviewers: dblaikie

Differential Revision: https://reviews.llvm.org/D32987

llvm-svn: 302574
2017-05-09 19:38:38 +00:00
Jacques Pienaar
d55c78bcc5 [lanai] Add computeKnownBitsForTargetNode for Lanai.
Summary: computeKnownBitsForTargetNode was not defined for Lanai which resulted in additional AND's with 0x1 for the output of SETCC instructions.

Reviewers: eliben, majnemer

Reviewed By: majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29605

llvm-svn: 302568
2017-05-09 18:35:26 +00:00
Sam Clegg
c698370461 [WebAssembly] Fix validation of start function
The check for valid start function was inverted.  Added a new
test in test/Object to check this case and fixed the existing
tests in for ObjectYAML.

Differential Revision: https://reviews.llvm.org/D32986

llvm-svn: 302560
2017-05-09 17:51:38 +00:00
Davide Italiano
334449f848 [NewGVN] Fix a consistent order for phi nodes operands.
The way we currently define congruency for two PHIExpression(s) is:

1) The operands to the phi functions are congruent
2) The PHIs are defined in the same BasicBlock.

NewGVN works under the assumption that phi operands are in predecessor
order, or at least in some consistent order. OTOH, is valid IR:

patatino:
  %meh = phi i16 [ %0, %winky ], [ %conv1, %tinky ]
  %banana = phi i16 [ %0, %tinky ], [ %conv1, %winky ]
  br label %end

and the in-memory representations of the two SSA registers have an
inconsistent order. This violation of NewGVN assumptions results into
two PHIs found congruent when they're not. While we think it's useful
to have always a consistent order enforced, let's fix this in NewGVN
sorting uses in predecessor order before creating a PHI expression.

Differential Revision:  https://reviews.llvm.org/D32990

llvm-svn: 302552
2017-05-09 16:58:28 +00:00
Craig Topper
4cf65193e1 [X86] Add more patterns for BZHI isel
This patch adds more patterns that a reasonable person might write that can be compiled to BZHI.

This adds support for

(~0U >> (32 - b)) & a;

and

a << (32 - b) >> (32 - b);

This was inspired by the code in APInt::clearUnusedBits.

This can pass an index of 32 to the bzhi instruction which a quick test of Haswell hardware shows will not mask any bits. Though the description text in the Intel manual says the "index is saturated to OperandSize-1". The pseudocode in the same manual indicates no bits will be zeroed for this case.

I think this is still missing cases where the subtract portion is an 8-bit operation.

Differential Revision: https://reviews.llvm.org/D32616

llvm-svn: 302549
2017-05-09 16:32:11 +00:00
Sanjay Patel
4f2714d90b [InstCombineCasts] Fix checks in sext->lshr->trunc pattern.
The comment says to avoid the case where zero bits are shifted into the truncated value, 
but the code checks that the shift is smaller than the truncated value instead of the 
number of bits added by the sign extension. Fixing this allows a shift by more than the 
value size to be introduced, which is undefined behavior, so the shift is capped at the 
value size minus one, which has the expected behavior of filling the value with the sign 
bit.

Patch by Jacob Young!

Differential Revision: https://reviews.llvm.org/D32285

llvm-svn: 302548
2017-05-09 16:24:59 +00:00
Guy Blank
52106f6b42 VX512] Only look at lower bit in constant scalar masks
for scalar masked instructions only the lower bit of the mask is relevant. so for constant masks we should either do an unmasked operation or no operation, depending on the value of the lower bit.
This patch handles cases where the lower bit is '1'.

Differential Revision: https://reviews.llvm.org/D32805

llvm-svn: 302546
2017-05-09 16:16:48 +00:00
Reid Kleckner
bed1389ae3 Re-land "Use the frame index side table for byval and inalloca arguments"
This re-lands r302483. It was not the cause of PR32977.

llvm-svn: 302544
2017-05-09 16:02:20 +00:00
Reid Kleckner
fc145824a1 Re-land "Don't add DBG_VALUE instructions for static allocas in dbg.declare"
This re-lands commit r302461. It was not the cause of PR32977.

llvm-svn: 302543
2017-05-09 16:01:47 +00:00
Hans Wennborg
1ddac6ae37 Revert r302469 "Make it illegal for two Functions to point to the same DISubprogram"
This caused PR32977.

Original commit message:

> Make it illegal for two Functions to point to the same DISubprogram
>
> As recently discussed on llvm-dev [1], this patch makes it illegal for
> two Functions to point to the same DISubprogram and updates
> FunctionCloner to also clone the debug info of a function to conform
> to the new requirement. To simplify the implementation it also factors
> out the creation of inlineAt locations from the Inliner into a
> general-purpose utility in DILocation.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/112661.html
> <rdar://problem/31926379>
>
> Differential Revision: https://reviews.llvm.org/D32975

llvm-svn: 302533
2017-05-09 14:44:15 +00:00
Anna Thomas
3580c4d010 [LV] Fix insertion point for shuffle vectors in first order recurrence
Summary:
In first order recurrence vectorization, when the previous value is a phi node, we need to
set the insertion point to the first non-phi node.
We can have the previous value being a phi node, due to the generation of new
IVs as part of trunc optimization [1].

[1] https://reviews.llvm.org/rL294967

Reviewers: mssimpso, mkuper

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32969

llvm-svn: 302532
2017-05-09 14:29:33 +00:00
Guy Blank
31e1f1d978 [X86][AVX512] Refine some avx512er intrinsics tests. NFC.
The modified tests should test the masked intrinsics.
Currently the mask is constant, which with a future patch (https://reviews.llvm.org/D32805) will cause the intrinsics to be replaced with an unmasked version.
This patch changes the constant mask to be a variable one.

llvm-svn: 302529
2017-05-09 14:03:51 +00:00
Serge Pavlov
b8ce9ec478 Add extra operand to CALLSEQ_START to keep frame part set up previously
Using arguments with attribute inalloca creates problems for verification
of machine representation. This attribute instructs the backend that the
argument is prepared in stack prior to  CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However CALLSEQ_END still keeps total frame size, as caller can
be responsible for cleanup of entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame size and the difference is treated by
MachineVerifier as stack error. Currently there is no way to distinguish
this case from actual errors.

This patch adds additional argument to CALLSEQ_START and its
target-specific counterparts to keep size of stack that is set up prior to
the call frame sequence. This argument allows MachineVerifier to calculate
actual frame size associated with frame setup instruction and correctly
process the case of inalloca arguments.

The changes made by the patch are:
- Frame setup instructions get the second mandatory argument. It
  affects all targets that use frame pseudo instructions and touched many
  files although the changes are uniform.
- Access to frame properties are implemented using special instructions
  rather than calls getOperand(N).getImm(). For X86 and ARM such
  replacement was made previously.
- Changes that reflect appearance of additional argument of frame setup
  instruction. These involve proper instruction initialization and
  methods that access instruction arguments.
- MachineVerifier retrieves frame size using method, which reports sum of
  frame parts initialized inside frame instruction pair and outside it.

The patch implements approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests failed with machine verifier enabled and listed
in PR27481.

Differential Revision: https://reviews.llvm.org/D32394

llvm-svn: 302527
2017-05-09 13:35:13 +00:00
Simon Dardis
76a4991023 Revert "[MIPS] Add support to match more patterns for DINS instruction"
This reverts commit rL302512. This broke the mips buildbots.

llvm-svn: 302526
2017-05-09 13:18:48 +00:00
Simon Pilgrim
50affcce7b [X86][SSE42] Lower v2i64/v4i64 ASHR(X, 63) as PCMPGTQ(0, X)
Similar to what we do for vXi8 ASHR(X, 7), use SSE42's PCMPGTQ to splat the sign instead of using the PSRAD+PSHUFD.

Avoiding bitcasts this improves combines that utilize computeNumSignBits, permits memory folding and reduces pipe pressure. Although it does require a second register, given that this is a (cheap) zero register the impact is minimal.

Differential Revision: https://reviews.llvm.org/D32973

llvm-svn: 302525
2017-05-09 13:14:40 +00:00
Guy Blank
03bb27e056 [X86][AVX512] Add test for masking of scalar instructions.
llvm-svn: 302519
2017-05-09 12:32:48 +00:00
Nikolai Bozhenov
3789a9bfa0 [X86] Clang option -fuse-init-array has no effect when generating for MCU target
Reviewers: Eugene.Zelenko, dschuff, craig.topper

Reviewed By: craig.topper

Subscribers: ahatanak, aaboud, DavidKreitzer, llvm-commits, cfe-commits

Differential Revision: https://reviews.llvm.org/D32543
Patch by AndreiGrischenko <andrei.l.grischenko@intel.com>

llvm-svn: 302513
2017-05-09 10:14:03 +00:00
Strahinja Petrovic
d020c9cb48 [MIPS] Add support to match more patterns for DINS instruction
This patch adds support for recognizing patterns to match
DINS instruction.

Differential Revision: https://reviews.llvm.org/D31465

llvm-svn: 302512
2017-05-09 10:02:00 +00:00
Reid Kleckner
1a48591876 Revert "Don't add DBG_VALUE instructions for static allocas in dbg.declare"
This reverts commit r302461.

It appears to be causing failures compiling gtest with debug info on the
Linux sanitizer bot. I was unable to reproduce the failure locally,
however.

llvm-svn: 302504
2017-05-09 01:57:44 +00:00
Teresa Johnson
7ff9f7abb3 Fix code section prefix for proper layout
Summary:
r284533 added hot and cold section prefixes based on profile
information, to enable grouping of hot/cold functions at link time.
However, it used "cold" as the prefix for cold sections, but gold only
recognizes "unlikely" (which is used by gcc for cold sections).
Therefore, cold sections were not properly being grouped. Switch to
using "unlikely"

Reviewers: danielcdh, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32983

llvm-svn: 302502
2017-05-09 01:43:24 +00:00
Reid Kleckner
e98eae6da6 Revert "Use the frame index side table for byval and inalloca arguments"
This reverts r302483 and it's follow up fix.

llvm-svn: 302493
2017-05-09 01:14:39 +00:00
Evgeniy Stepanov
49f6da0167 Ignore !associated metadata with null argument.
Fixes PR32577 (comment 10).
Such metadata may legitimately appear in LTO.

llvm-svn: 302485
2017-05-08 23:46:20 +00:00
Reid Kleckner
944adda3ae Relax Dwarf filecheck test for 32-bit hosts
llvm-svn: 302484
2017-05-08 23:27:52 +00:00