1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00
Commit Graph

143748 Commits

Author SHA1 Message Date
Matthias Braun
65593c842b PowerPC: Slight cleanup of getReservedRegs(); NFC
Change getReservedRegs() to not mark a register as reserved and then
revert that decision in some cases. Motivated by the discussion in
https://reviews.llvm.org/D29056

llvm-svn: 293073
2017-01-25 17:12:10 +00:00
Krzysztof Parzyszek
045e331ac6 Add loop pass insertion point EP_LateLoopOptimizations
Differential Revision: https://reviews.llvm.org/D28694

llvm-svn: 293067
2017-01-25 16:12:25 +00:00
Artur Pilipenko
c5d63dfc6a [Guards] Introduce loop-predication pass
This patch introduces guard based loop predication optimization. The new LoopPredication pass tries to convert loop variant range checks to loop invariant by widening checks across loop iterations. For example, it will convert

  for (i = 0; i < n; i++) {
    guard(i < len);
    ...
  }

to

  for (i = 0; i < n; i++) {
    guard(n - 1 < len);
    ...
  }

After this transformation the condition of the guard is loop invariant, so loop-unswitch can later unswitch the loop by this condition which basically predicates the loop by the widened condition:

  if (n - 1 < len)
    for (i = 0; i < n; i++) {
      ...
    } 
  else
    deoptimize

This patch relies on an NFC change to make ScalarEvolution::isMonotonicPredicate public (revision 293062).

Reviewed By: sanjoy

Differential Revision: https://reviews.llvm.org/D29034

llvm-svn: 293064
2017-01-25 16:00:44 +00:00
Chad Rosier
d7569c2c65 [AArch64] Minor code refactoring. NFC.
llvm-svn: 293063
2017-01-25 15:56:59 +00:00
Artur Pilipenko
e063c39b33 NFC. Make ScalarEvolution::isMonotonicPredicate public
Will be used by the upcoming LoopPredication optimization.

llvm-svn: 293062
2017-01-25 15:07:55 +00:00
Artur Pilipenko
6b244ca01f [InstCombine] Canonicalize guards for NOT OR condition
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine

Reviewed By: apilipenko

Differential Revision: https://reviews.llvm.org/D29075

Patch by Maxim Kazantsev.

llvm-svn: 293061
2017-01-25 14:45:12 +00:00
Simon Pilgrim
6c22b528b6 [InstCombine][SSE] Add support for PACKSS/PACKUS constant folding
Differential Revision: https://reviews.llvm.org/D28949

llvm-svn: 293060
2017-01-25 14:37:24 +00:00
Martin Bohme
02c46daf0c [ARM] GlobalISel: Fix stack-use-after-scope bug.
Summary:
Lifetime extension wasn't triggered on the result of BuildMI because the
reference was non-const. However, instead of adding a const, I've
removed the reference entirely as RVO should kick in anyway.

Reviewers: rovka, bkramer

Reviewed By: bkramer

Subscribers: aemerson, rengolin, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D29124

llvm-svn: 293059
2017-01-25 14:28:19 +00:00
Artur Pilipenko
71132836b4 [InstCombine] Canonicalize guards for AND condition
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine

Reviewed By: apilipenko

Differential Revision: https://reviews.llvm.org/D29074

Patch by Maxim Kazantsev.

llvm-svn: 293058
2017-01-25 14:20:52 +00:00
Artur Pilipenko
8d5f1040c3 [InstCombine] Allow InstrCombine to remove one of adjacent guards if they are equivalent
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine

Reviewed By: majnemer, apilipenko

Differential Revision: https://reviews.llvm.org/D29071

Patch by Maxim Kazantsev.

llvm-svn: 293056
2017-01-25 14:12:12 +00:00
Alexey Bataev
cb8becd477 [SLP] Improve horizontal vectorization for non-power-of-2 number of
instructions.

If number of instructions in horizontal reduction list is not power of 2
then only PowerOf2Floor(NumberOfInstructions) last elements are actually
vectorized, other instructions remain scalar. Patch tries to vectorize
the remaining elements either.

Differential Revision: https://reviews.llvm.org/D28959

llvm-svn: 293042
2017-01-25 09:54:38 +00:00
whitequark
e14626baaf Mark @llvm.powi.* as safe to speculatively execute.
Floating point intrinsics in LLVM are generally not speculatively
executed, since most of them are defined to behave the same as libm
functions, which set errno.

However, the @llvm.powi.* intrinsics do not correspond to any libm
function, and lacks any defined error handling semantics in LangRef.
It most certainly does not alter errno.

llvm-svn: 293041
2017-01-25 09:32:30 +00:00
Mohammed Agabaria
ee333a2999 [X86] enable memory interleaving for X86\SLM arch.
Differential Revision: https://reviews.llvm.org/D28547

llvm-svn: 293040
2017-01-25 09:14:48 +00:00
Artur Pilipenko
4d180f5ce3 Fix buildbot failures introduced by 293036
Fix unused variable, specify types explicitly to make VC compiler happy.

llvm-svn: 293039
2017-01-25 09:10:07 +00:00
Artur Pilipenko
0e6418640a [DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2.
The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html

This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows to stop the traversal earlier if we know that the result won't be used.

From the original commit:

Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32)a))

This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of individual bytes which constitute the resulting OR value. If all the OR bytes come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.

Reviewed By: RKSimon, filcab, chandlerc 

Differential Revision: https://reviews.llvm.org/D27861

llvm-svn: 293036
2017-01-25 08:53:31 +00:00
Diana Picus
3e1c91b8c7 [ARM] GlobalISel: Support i1 add and ABI extensions
Add support for:
* i1 add
* i1 function arguments, if passed through registers
* i1 returns, with ABI signext/zeroext

Differential Revision: https://reviews.llvm.org/D27706

llvm-svn: 293035
2017-01-25 08:47:40 +00:00
Diana Picus
216544ff2f [ARM] GlobalISel: Support i8/i16 ABI extensions
At the moment, this means supporting the signext/zeroext attribute on the return
type of the function. For function arguments, signext/zeroext should be handled
by the caller, so there's nothing for us to do until we start lowering calls.

Note that this does not include support for other extensions (i8 to i16), those
will be added later.

Differential Revision: https://reviews.llvm.org/D27705

llvm-svn: 293034
2017-01-25 08:10:40 +00:00
Serge Pavlov
8bed9fa904 Do not verify dominator tree if it has no roots
If dominator tree has no roots, the pass that calculates it is
likely to be skipped. It occures, for instance, in the case of
entities with linkage available_externally. Do not run tree
verification in such case.

Differential Revision: https://reviews.llvm.org/D28767

llvm-svn: 293033
2017-01-25 07:58:10 +00:00
Dean Michael Berris
ace789b028 Implemented color coding and Vertex labels in XRay Graph
Summary:
A patch to enable the llvm-xray graph subcommand to color edges and
vertices based on statistics and to annotate vertices with statistics.

Depends on D27243

Reviewers: dblaikie, dberris

Reviewed By: dberris

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D28225

llvm-svn: 293031
2017-01-25 07:14:43 +00:00
Coby Tayree
00c1f58b67 [X86]Enable the use of 'mov' with a 64bit GPR and a large immediate
Enable the next form (intel style):
"mov <reg64>, <largeImm>"
which is should be available,
where <largeImm> stands for immediates which exceed the range of a singed 32bit integer

Differential Revision: https://reviews.llvm.org/D28988

llvm-svn: 293030
2017-01-25 07:09:42 +00:00
Diana Picus
a678e82c94 [ARM] GlobalISel: Bail out on Thumb. NFC
Thumb is not supported yet, so bail out early.

llvm-svn: 293029
2017-01-25 07:08:53 +00:00
Matt Arsenault
9f4074601e AMDGPU: Check nsz instead of unsafe math
llvm-svn: 293028
2017-01-25 06:27:02 +00:00
Akira Hatanaka
68f4d270cd [SimplifyCFG] Do not sink and merge inline-asm instructions.
Conservatively disable sinking and merging inline-asm instructions as doing so
can potentially create arguments that cannot satisfy the inline-asm constraints.

For example, SimplifyCFG used to do the following transformation:

(before)
if.then:
  %0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 8)
  br label %if.end
if.else:
  %1 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 6)
  br label %if.end

(after)
  %.sink = select i1 %tobool, i32 6, i32 8
  %0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 %.sink)

This would result in a crash in the backend since only immediate integer operands
are permitted for constraint "n".

rdar://problem/30110806

Differential Revision: https://reviews.llvm.org/D29111

llvm-svn: 293025
2017-01-25 06:21:51 +00:00
Matt Arsenault
093401c700 DAG: Recognize no-signed-zeros-fp-math attribute
clang already emits this with -cl-no-signed-zeros, but codegen
doesn't do anything with it. Treat it like the other fast math
attributes, and change one place to use it.

llvm-svn: 293024
2017-01-25 06:08:42 +00:00
Justin Bogner
a115225f55 GlobalISel: Fix typo in error message
llvm-svn: 293023
2017-01-25 06:02:10 +00:00
NAKAMURA Takumi
1de33b6dae Ignore llvm/test/tools/llvm-symbolizer/coff-exports.test on mingw.
FIXME: Demangler could behave along not host but target.
For example, assume host=mingw, target=msc.

llvm-svn: 293021
2017-01-25 05:26:23 +00:00
Matt Arsenault
b570b62964 DAGCombiner: Allow negating ConstantFP after legalize
llvm-svn: 293019
2017-01-25 04:54:34 +00:00
Gerolf Hoflehner
a64ca53972 [InstCombine] Added regression test to narrow-swich.ll
llvm-svn: 293018
2017-01-25 04:34:59 +00:00
NAKAMURA Takumi
b8624b1643 Rewind instantiations of OuterAnalysisManagerProxy in r289317, r291651, and r291662.
I found root class should be instantiated for variadic tempate to instantiate static member explicitly.

This will fix failures in mingw DLL build.

llvm-svn: 293017
2017-01-25 04:26:29 +00:00
Matt Arsenault
ec49368879 AMDGPU: Implement early ifcvt target hooks.
Leave early ifcvt disabled for now since there are some
shader-db regressions.

This causes some immediate improvements, but could be better.
The cost checking that the pass does is based on critical path
length for out of order CPUs which we do not want so it skips out
on many cases we want.

llvm-svn: 293016
2017-01-25 04:25:02 +00:00
Peter Collingbourne
a6e1ed1de7 gold-plugin: Add the file path to the file open error diagnostic.
llvm-svn: 293013
2017-01-25 03:35:28 +00:00
Ahmed Bougacha
9401d2bc09 Try to prevent build breakage by touching a CMakeLists.txt.
Looks like our cmake goop for handling .inc->td dependencies doesn't
track the .td files.

This manifests as cmake complaining about missing files since r293009.

Force a rerun to avoid that.

llvm-svn: 293012
2017-01-25 02:55:24 +00:00
Chandler Carruth
c3aa937b25 [PM] Teach LoopUnroll to update the LPM infrastructure as it unrolls
loops.

We do this by reconstructing the newly added loops after the unroll
completes to avoid threading pass manager details through all the mess
of the unrolling infrastructure.

I've enabled some extra assertions in the LPM to try and catch issues
here and enabled a bunch of unroller tests to try and make sure this is
sane.

Currently, I'm manually running loop-simplify when needed. That should
go away once it is folded into the LPM infrastructure.

Differential Revision: https://reviews.llvm.org/D28848

llvm-svn: 293011
2017-01-25 02:49:01 +00:00
Ahmed Bougacha
365c1158a8 [GlobalISel] Generate selector for more integer binop patterns.
This surprisingly isn't NFC because there are patterns to select GPR
sub to SUBSWrr (rather than SUBWrr/rs); SUBS is later optimized to
SUB if NZCV is dead.  From ISel's perspective, both are fine.

llvm-svn: 293010
2017-01-25 02:41:38 +00:00
Ahmed Bougacha
b9bb47b7f8 [GlobalISel] Rename TargetGlobalISel.td to GISel/SelectionDAGCompat.td
llvm-svn: 293009
2017-01-25 02:41:26 +00:00
Greg Parker
efc0b2cc29 Reinstate "r292904 - [lit] Allow boolean expressions in REQUIRES and XFAIL
and UNSUPPORTED"

This reverts the revert in r292942.

llvm-svn: 293007
2017-01-25 02:26:03 +00:00
Gor Nishanov
750a36491c [coroutines] Spill the result of the invoke instruction correctly
Summary:
When we decide that the result of the invoke instruction need to be spilled, we need to insert the spill into a block that is on the normal edge coming out of the invoke instruction. (Prior to this change the code would insert the spill immediately after the invoke instruction, which breaks the IR, since invoke is a terminator instruction).

In the following example, we will split the edge going into %cont and insert the spill there.

```
  %r = invoke double @print(double 0.0) to label %cont unwind label %pad

  cont:
    %0 = call i8 @llvm.coro.suspend(token none, i1 false)
    switch i8 %0, label %suspend [i8 0, label %resume
                                  i8 1, label %cleanup]
  resume:
    call double @print(double %r)
```

Reviewers: majnemer

Reviewed By: majnemer

Subscribers: mehdi_amini, llvm-commits, EricWF

Differential Revision: https://reviews.llvm.org/D29102

llvm-svn: 293006
2017-01-25 02:25:54 +00:00
Tom Stellard
bbf29e433b AMDGPU add support for spilling to a user sgpr pointed buffers
Summary:
This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1].

Patch By: Dave Airlie

Reviewers: nhaehnle, arsenm, tstellarAMD

Reviewed By: arsenm

Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D25428

llvm-svn: 293000
2017-01-25 01:25:13 +00:00
Eugene Zelenko
7efc5e5021 [AArch64] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 292996
2017-01-25 00:29:26 +00:00
Justin Bogner
1aa5ce17e7 GlobalISel: Use the correct types when translating landingpad instructions
There was a bug here where we were using p0 instead of s32 for the
selector type in the landingpad. Instead of hardcoding these types we
should get the types from the landingpad instruction directly.

Note that we replicate an assert from SDAG here to only support
two-valued landingpads.

llvm-svn: 292995
2017-01-25 00:16:53 +00:00
Kevin Enderby
8bf2878f72 Fix llvm-objdump so it picks a good CPU based for Mach-O files
for CPU_SUBTYPE_ARM_V7S and CPU_SUBTYPE_ARM_V7K.

For these two cpusubtypes they should default to a cortex-a7 CPU
to give proper disassembly without a -mcpu= flag.

rdar://27431703

llvm-svn: 292993
2017-01-24 23:41:04 +00:00
Eugene Zelenko
4dcbc300ec [XCore] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 292988
2017-01-24 23:02:48 +00:00
Matt Arsenault
8060661ae3 AMDGPU: Remove spurious out branches after a kill
The sequence like this:
  v_cmpx_le_f32_e32 vcc, 0, v0
  s_branch BB0_30
  s_cbranch_execnz BB0_30
  ; BB#29:
  exp null off, off, off, off done vm
  s_endpgm
  BB0_30:
  ; %endif110

is likely wrong. The s_branch instruction will unconditionally jump
to BB0_30 and the skip block (exp done + endpgm) inserted for
performing the kill instruction will never be executed. This results
in a GPU hang with Star Ruler 2.

The s_branch instruction is added during the "Control Flow Optimizer"
pass which seems to re-organize the basic blocks, and we assume
that SI_KILL_TERMINATOR is always the last instruction inside a
basic block. Thus, after inserting a skip block we just go to the
next BB without looking at the subsequent instructions after the
kill, and the s_branch op is never removed.

Instead, we should remove the unconditional out branches and let
skip the two instructions if the exec mask is non-zero.

This patch fixes the GPU hang and doesn't introduce any regressions
with "make check".

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019

Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>

llvm-svn: 292985
2017-01-24 22:18:39 +00:00
Wei Mi
ca272c5437 Revert rL292621. Caused some internal build bot failures in apple.
llvm-svn: 292984
2017-01-24 22:15:06 +00:00
Eugene Zelenko
54635cc5dd [SystemZ] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 292983
2017-01-24 22:10:43 +00:00
Matt Arsenault
81a9bfe915 Enable FeatureFlatForGlobal on Volcanic Islands
This switches to the workaround that HSA defaults to
for the mesa path.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>

llvm-svn: 292982
2017-01-24 22:02:15 +00:00
Dehao Chen
abd6950761 Explicitly promote indirect calls before sample profile annotation.
Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation.

Reviewers: xur, davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29040

llvm-svn: 292979
2017-01-24 21:05:51 +00:00
Saleem Abdulrasool
8d12fdc869 Demangle: correct demangling for CV-qualified functions
When demangling a CV-qualified function type with a final reference type
parameter, we would treat the reference type parameter as a r-value ref
accidentally.  This would result in the improper decoration of the
function type itself.

Resolves PR31741!

llvm-svn: 292976
2017-01-24 20:04:58 +00:00
Saleem Abdulrasool
14ea071037 Demangle: use named values for CV qualifiers
Rather than hard-coding magic values of 1, 2, 4 (bit-field), use an enum
to name the values.  NFC.

llvm-svn: 292975
2017-01-24 20:04:56 +00:00
Ivan Krasin
4c52eb7431 Revert [AMDGPU][mc][tests][NFC] Add coverage/smoke tests for Gfx7 and Gfx8.
Reason: broke ASAN bots with a global buffer overflow.
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/2291

Each test contains 20-30K test cases but takes only several (from 4 to 10)
seconds to complete on average machine. The tests cover the majority of
AMDGPU Gfx7/Gfx8 instructions, including many dark corners, and intended
to quickly find out if something is broken.

llvm-svn: 292974
2017-01-24 19:58:59 +00:00