1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00
Commit Graph

127597 Commits

Author SHA1 Message Date
Hans Wennborg
8484662941 Revert r260979 "[X86] Enable the LEA optimization pass by default."
Asserts are still firing in Chromium builds. PR26575.

llvm-svn: 261058
2016-02-17 02:49:59 +00:00
Xinliang David Li
2bdd7e9a49 revert r261038: arm/aarch64 bot failure
llvm-svn: 261057
2016-02-17 02:39:34 +00:00
Mehdi Amini
919bd12aad Revert "Query the StringMap only once when creating MDString (NFC)"
This reverts commit r261030 and r261036.
(The revision was marked "approved" on phabricator, but some concerns
were raised on the mailing list. Thanks D. Blaikie for notifying me.)

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261055
2016-02-17 02:18:58 +00:00
Haicheng Wu
6b1e9d59b2 [AliasSetTracker] Teach AliasSetTracker about MemSetInst
This change is to fix the problem discussed in
http://lists.llvm.org/pipermail/llvm-dev/2016-February/095446.html.

llvm-svn: 261052
2016-02-17 02:01:50 +00:00
JF Bastien
08c122fb28 WebAssembly: update expected failures
r261050 seems to inadvertently fix the assertion failure.

llvm-svn: 261051
2016-02-17 01:59:23 +00:00
Dan Gohman
811eb4337e [WebAssembly] Call memcpy for large byval copies.
This fixes very slow compilation on
test/CodeGen/Generic/2010-11-04-BigByval.ll . Note that MaxStoresPerMemcpy
and friends are not yet carefully tuned so the cutoff point is currently
somewhat arbitrary. However, it's important that there be a cutoff point
so that we don't emit unbounded quantities of loads and stores.

llvm-svn: 261050
2016-02-17 01:43:37 +00:00
JF Bastien
83e991d52a WebAssembly: update expected test failures
r261032 adds frame address support.

llvm-svn: 261044
2016-02-17 00:34:15 +00:00
Chandler Carruth
4ec346556c [LCG] Construct an actual call graph with call-edge SCCs nested inside
reference-edge SCCs.

This essentially builds a more normal call graph as a subgraph of the
"reference graph" that was the old model. This allows both to exist and
the different use cases to use the aspect which addresses their needs.
Specifically, the pass manager and other *ordering* constrained logic
can use the reference graph to achieve conservative order of visit,
while analyses reasoning about attributes and other properties derived
from reachability can reason about the direct call graph.

Note that this isn't necessarily complete: it doesn't model edges to
declarations or indirect calls. Those can be found by scanning the
instructions of the function if desirable, and in fact every user
currently does this in order to handle things like calls to instrinsics.
If useful, we could consider caching this information in the call graph
to save the instruction scans, but currently that doesn't seem to be
important.

An important realization for why the representation chosen here works is
that the call graph is a formal subset of the reference graph and thus
both can live within the same data structure. All SCCs of the call graph
are necessarily contained within an SCC of the reference graph, etc.

The design is to build 'RefSCC's to model SCCs of the reference graph,
and then within them more literal SCCs for the call graph.

The formation of actual call edge SCCs is not done lazily, unlike
reference edge 'RefSCC's. Instead, once a reference SCC is formed, it
directly builds the call SCCs within it and stores them in a post-order
sequence. This is used to provide a consistent platform for mutation and
update of the graph. The post-order also allows for very efficient
updates in common cases by bounding the number of nodes (and thus edges)
considered.

There is considerable common code that I'm still looking for the best
way to factor out between the various DFS implementations here. So far,
my attempts have made the code harder to read and understand despite
reducing the duplication, which seems a poor tradeoff. I've not given up
on figuring out the right way to do this, but I wanted to wait until
I at least had the system working and tested to continue attempting to
factor it differently.

This also requires introducing several new algorithms in order to handle
all of the incremental update scenarios for the more complex structure
involving two edge colorings. I've tried to comment the algorithms
sufficiently to make it clear how this is expected to work, but they may
still need more extensive documentation.

I know that there are some changes which are not strictly necessarily
coupled here. The process of developing this started out with a very
focused set of changes for the new structure of the graph and
algorithms, but subsequent changes to bring the APIs and code into
consistent and understandable patterns also ended up touching on other
aspects. There was no good way to separate these out without causing
*massive* merge conflicts. Ultimately, to a large degree this is
a rewrite of most of the core algorithms in the LCG class and so I don't
think it really matters much.

Many thanks to the careful review by Sanjoy Das!

Differential Revision: http://reviews.llvm.org/D16802

llvm-svn: 261040
2016-02-17 00:18:16 +00:00
Reid Kleckner
30231d4f95 [X86] Fix a shrink-wrapping miscompile around __chkstk
__chkstk clobbers EAX. If EAX is live across the prologue, then we have
to take extra steps to save it. We already had code to do this if EAX
was a register parameter. This change adapts it to work when shrink
wrapping is used.

llvm-svn: 261039
2016-02-17 00:17:33 +00:00
Xinliang David Li
fc57478329 New test case: make sure alloc bit is not set for covmap section on Linux
llvm-svn: 261038
2016-02-17 00:14:52 +00:00
Dan Gohman
ea19c3e264 [WebAssembly] Use SDValue::getConstantOperandVal. NFC.
llvm-svn: 261037
2016-02-17 00:14:03 +00:00
Mehdi Amini
7e2849720e Fix MSVC bot: apparently visual studio does not like explicitly defaulted move ctor
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261036
2016-02-17 00:11:59 +00:00
Andrew Kaylor
eedb857024 Fix build LLVM with -D LLVM_USE_INTEL_JITEVENTS:BOOL=ON on Windows
Differential Revision: http://reviews.llvm.org/D16940

llvm-svn: 261033
2016-02-16 23:52:18 +00:00
Dan Gohman
7699d5ed4b [WebAssembly] Implement __builtin_frame_address.
Differential Revision: http://reviews.llvm.org/D17307

llvm-svn: 261032
2016-02-16 23:48:04 +00:00
Mehdi Amini
5c2caef940 Query the StringMap only once when creating MDString (NFC)
Summary: Loading IR with debug info improves MDString::get() from 19ms to 10ms.

Reviewers: dexonsmith

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16597

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261030
2016-02-16 23:05:56 +00:00
Mehdi Amini
6f127aff71 Define the ThinLTO Pipeline (experimental)
Summary:
On the contrary to Full LTO, ThinLTO can afford to shift compile time
from the frontend to the linker: both phases are parallel (even if
it is not totally "free": projects like clang are reusing product
from the "compile phase" for multiple link, think about
libLLVMSupport reused for opt, llc, etc.).

This pipeline is based on the proposal in D13443 for full LTO. We
didn't move forward on this proposal because the LTO link was far too
long after that. We believe that we can afford it with ThinLTO.

The ThinLTO pipeline integrates in the regular O2/O3 flow:

 - The compile phase perform the inliner with a somehow lighter
   function simplification. (TODO: tune the inliner thresholds here)
   This is intendend to simplify the IR and get rid of obvious things
   like linkonce_odr that will be inlined.
 - The link phase will run the pipeline from the start, extended with
   some specific passes that leverage the augmented knowledge we have
   during LTO. Especially after the inliner is done, a sequence of
   globalDCE/globalOpt is performed, followed by another run of the
   "function simplification" passes. It is not clear if this part
   of the pipeline will stay as is, as the split model of ThinLTO
   does not allow the same benefit as FullLTO without added tricks.

The measurements on the public test suite as well as on our internal
suite show an overall net improvement. The binary size for the clang
executable is reduced by 5%. We're still tuning it with the bringup
of ThinLTO and it will evolve, but this should provide a good starting
point.

Reviewers: tejohnson

Differential Revision: http://reviews.llvm.org/D17115

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261029
2016-02-16 23:02:29 +00:00
Mehdi Amini
639bf1e488 Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()" (NFC)
It is intended to contains the passes run over a function after the
inliner is done with a function and before it moves to its callers.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261028
2016-02-16 22:54:27 +00:00
Adam Nemet
f3d8c27701 Fix test from r261013
llvm-svn: 261027
2016-02-16 22:50:19 +00:00
Simon Pilgrim
3b4ddae9de [X86][AVX] Regenerated vselect tests
llvm-svn: 261026
2016-02-16 22:33:27 +00:00
Ahmed Bougacha
c1f422afe6 [X86] Remove the now-unused X86ISD::PSIGN. NFC.
llvm-svn: 261025
2016-02-16 22:14:12 +00:00
Ahmed Bougacha
0b74af0c16 [X86] Generalize logic blend of (x, -x) combine to match (-x, x).
I suspect this is what let PR26110 lie dormant for so long.

llvm-svn: 261024
2016-02-16 22:14:07 +00:00
Ahmed Bougacha
c6b1c28e14 [X86] Don't turn (c?-v:v) into (c?-v:0) by blindly using PSIGN.
Currently, we sometimes miscompile this vector pattern:
    (c ? -v : v)
We lower it to (because "c" is <4 x i1>, lowered as a vector mask):
    (~c & v) | (c & -v)

When we have SSSE3, we incorrectly lower that to PSIGN, which does:
    (c < 0 ? -v : c > 0 ? v : 0)
in other words, when c is either all-ones or all-zero:
    (c ? -v : 0)
While this is an old bug, it rarely triggers because the PSIGN combine
is too sensitive to operand order. This will be improved separately.

Note that the PSIGN tests are also incorrect. Consider:
    %b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
    %sub = sub nsw <4 x i32> zeroinitializer, %a
    %0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
    %1 = and <4 x i32> %a, %0
    %2 = and <4 x i32> %b.lobit, %sub
    %cond = or <4 x i32> %1, %2
    ret <4 x i32> %cond
if %b is zero:
    %b.lobit = <4 x i32> zeroinitializer
    %sub = sub nsw <4 x i32> zeroinitializer, %a
    %0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
    %1 = <4 x i32> %a
    %2 = <4 x i32> zeroinitializer
    %cond = or <4 x i32> %a, zeroinitializer
    ret <4 x i32> %a
whereas we currently generate:
    psignd %xmm1, %xmm0
    retq
which returns 0, as %xmm1 is 0.

Instead, use a pure logic sequence, as described in:
https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate

Fixes PR26110.

Differential Revision: http://reviews.llvm.org/D17181

llvm-svn: 261023
2016-02-16 22:14:03 +00:00
Ahmed Bougacha
f8020709f1 [X86] Extract PSIGN/BLENDVP tests into vector-blend.ll. NFC.
We're going to stop generating PSIGN, so calling a test "psign"
isn't ideal. Instead, call these tests what they really are:
variable blends using logic.
Also add a test to exhibit a case we're currently missing in
the PSIGN combine.

llvm-svn: 261022
2016-02-16 22:13:59 +00:00
Ahmed Bougacha
7fc85e4b5a [X86] Extract PSIGN/BLENDVP combine. NFC.
llvm-svn: 261021
2016-02-16 22:13:55 +00:00
Ahmed Bougacha
8f9b9ee793 [X86] Extract ANDNP combine. NFC.
This makes it IMO more readable and reduces indentation.

llvm-svn: 261020
2016-02-16 22:13:49 +00:00
Mehdi Amini
7adcf03313 Bitcode writer: fix a typo, using getName() instead of getSourceFileName()
When emitting the source filename, the encoding of the string
was checked against the name instead of the filename.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261019
2016-02-16 22:07:03 +00:00
Derek Schuff
a5f71b59fb [WebAssembly] Update torture test expectations
These were fixed with r260978

llvm-svn: 261017
2016-02-16 21:52:06 +00:00
Reid Kleckner
4b86dc7a52 [codeview] Bail on a DBG_VALUE register operand with no register
This apparently comes up when the register allocator decides that a
variable will become undef along a certain path.

Also improve the error message we emit when we can't map from LLVM
register number to CV register number.

llvm-svn: 261016
2016-02-16 21:49:26 +00:00
Derek Schuff
b9542a8754 [WebAssemly] Don't move calls or stores past intervening loads
The register stackifier currently checks for intervening stores (and
loads that may alias them) but doesn't account for the fact that the
instruction being moved may affect intervening loads.

Differential Revision: http://reviews.llvm.org/D17298

llvm-svn: 261014
2016-02-16 21:44:19 +00:00
Adam Nemet
e39424112f [LTO] Support Statistics
Summary:
I thought -Xlinker -mllvm -Xlinker -stats worked at some point but maybe
it never did.

For clang, I believe that stats are printed from cc1_main.  This patch
also prints them for LTO, specifically right after codegen happens.

I only looked at the C API for LTO briefly to see if this is a good
place.  Probably there are still cases where this wouldn't be printed
but it seems to be working for the common case.  I also experimented
putting this in the LTOCodeGenerator destructor but that didn't trigger
for me because ld64 does not destroy the LTOCodeGenerator.

Reviewers: dexonsmith, joker.eph

Subscribers: rafael, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D17302

llvm-svn: 261013
2016-02-16 21:41:51 +00:00
Reid Kleckner
eaf9090d89 [codeview] Fix assertion on non-memory, non-register DBG_VALUE instructions
Eventually we should find a way to describe constant variables, but it
is not obvious how to do this at the moment.

llvm-svn: 261010
2016-02-16 21:14:51 +00:00
Colin LeMahieu
d875c88104 [Hexagon] Adding relocation for code size, cold path optimization allowing a 23-bit 4-byte aligned relocation to be a valid instruction encoding.
The usual way to get a 32-bit relocation is to use a constant extender which doubles the size of the instruction, 4 bytes to 8 bytes.

Another way is to put a .word32 and mix code and data within a function.  The disadvantage is it's not a valid instruction encoding and jumping over it causes prefetch stalls inside the hardware.

This relocation packs a 23-bit value in to an "r0 = add(rX, #a)" instruction by overwriting the source register bits.  Since r0 is the return value register, if this instruction is placed after a function call which return void, r0 will be filled with an undefined value, the prefetch won't be confused, and the callee can access the constant value by way of the link register.

llvm-svn: 261006
2016-02-16 20:38:17 +00:00
Jun Bum Lim
bf77014eda [AArch64] Add pass to remove redundant copy after RA
Summary:
This change will add a pass to remove unnecessary zero copies in target blocks
of cbz/cbnz instructions. E.g., the copy instruction in the code below can be
removed because the cbz jumps to BB1 when x0 is zero :
  BB0:
    cbz x0, .BB1
  BB1:
    mov x0, xzr

Jun

Reviewers: gberry, jmolloy, HaoLiu, MatzeB, mcrosier

Subscribers: mcrosier, mssimpso, haicheng, bmakam, llvm-commits, aemerson, rengolin

Differential Revision: http://reviews.llvm.org/D16203

llvm-svn: 261004
2016-02-16 20:02:39 +00:00
Quentin Colombet
aa2c5cf11c [GlobalISel] Re-apply r260922-260923 with MSVC-friendly code.
Original message:
Get rid of the ifdefs in TargetLowering.
Introduce a new API used only by GlobalISel: CallLowering.
This API will contain target hooks dedicated to call lowering.

llvm-svn: 260998
2016-02-16 19:26:02 +00:00
Rafael Espindola
c13987d0b8 Pass a std::unique_ptr to IRMover::move.
It was already the one "destroying" the source module, now the API
reflects that.

llvm-svn: 260989
2016-02-16 18:50:12 +00:00
Derek Schuff
abc1815dc3 [WebAssembly] Insert COPY_LOCAL between CopyToReg and FrameIndex DAG nodes
CopyToReg nodes don't support FrameIndex operands. Other targets select
the FI to some LEA-like instruction, but since we don't have that, we
need to insert some kind of instruction that can take an FI operand and
produces a value usable by CopyToReg (i.e. in a vreg). So insert a dummy
copy_local between Op and its FI operand. This results in a redundant
copy which we should optimize away later (maybe in the post-FI-lowering
peephole pass).

Differential Revision: http://reviews.llvm.org/D17213

llvm-svn: 260987
2016-02-16 18:18:36 +00:00
Tom Stellard
5d2e8e7ab0 [AMDGPU] Rename $dst operand to $vdst for VOP instructions.
Summary: This change renames output operand for VOP instructions from dst to vdst. This is needed to enable decoding named operands for disassembler.

Reviewers: vpykhtin, tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, nhaustov

Projects: #llvm-amdgpu-spb

Differential Revision: http://reviews.llvm.org/D16920

llvm-svn: 260986
2016-02-16 18:14:56 +00:00
Philip Reames
c5e0566bcf Revert 260705, it appears to be causing pr26628
The root issue appears to be a confusion around what makeNoWrapRegion actually does.   It seems likely we need two versions of this function with slightly different semantics.

llvm-svn: 260981
2016-02-16 17:14:30 +00:00
Andrey Turetskiy
951c4b5ff8 [X86] Enable the LEA optimization pass by default.
Differential Revision: http://reviews.llvm.org/D16877

llvm-svn: 260979
2016-02-16 16:41:38 +00:00
Dan Gohman
9d4d64722a [WebAssembly] Switch from RPO sorting to topological sorting.
WebAssembly doesn't require full RPO; topological sorting is sufficient and
can preserve more of the MachineBlockPlacement ordering. Unfortunately, this
still depends a lot on heuristics, because while we use the
MachineBlockPlacement ordering as a guide, we can't use it in places where
it isn't topologically ordered. This area will require further attention.

llvm-svn: 260978
2016-02-16 16:22:41 +00:00
Aaron Ballman
a5c9f74621 A signed bitfield's range is [-1,0], so assigning 1 is technically an overflow. However, the other bitfield requires a signed value (it supports negative offsets), so it is slightly better to retain a signed 1-bit bitfield and use -1 instead of 1. Silences an MSVC warning.
llvm-svn: 260973
2016-02-16 15:35:51 +00:00
Aaron Ballman
fcd6af1dbf Reverting r260922-260923; they cause link failures with MSVC.
http://lab.llvm.org:8011/builders/lldb-x86-windows-msvc2015/builds/15436/steps/build/logs/stdio
http://bb.pgr.jp/builders/msbuild-llvmclang-x64-msc18-DA/builds/961/steps/build_llvm/logs/stdio

llvm-svn: 260972
2016-02-16 15:29:06 +00:00
Dan Gohman
25ef0b853a [WebAssembly] Create new registers instead of reusing old ones in RegStackify.
This avoids some complications updating LiveIntervals to be aware of the new
register lifetimes, because we can just compute new intervals from scratch
rather than describe how the old ones have been changed.

llvm-svn: 260971
2016-02-16 15:17:21 +00:00
Rafael Espindola
5bff063f80 Reapply r260489.
Original commit message:

[readobj] Dump DT_JMPREL relocations when outputting dynamic relocations.

The bits of r260488 it depends on have been committed.

llvm-svn: 260970
2016-02-16 15:16:00 +00:00
Dan Gohman
8579fc81f5 [WebAssembly] Implement support for custom NaN bit patterns.
llvm-svn: 260968
2016-02-16 15:14:23 +00:00
Rafael Espindola
7f9a56c892 Introduce a getAsRange helper.
This requires making an error message a bit more generic, but that seems
a reasonable tradeoff.

Extracted from r260488 but simplified a bit.

llvm-svn: 260967
2016-02-16 14:50:39 +00:00
Rafael Espindola
784b0296aa Move DynRegionInfo out of the ELFDumper.
This reduces indentation in preparation to adding a bit more code to it.

Extracted from r260488.

llvm-svn: 260963
2016-02-16 14:27:33 +00:00
Rafael Espindola
5c95668155 This reverts commit r260488 and r260489.
Original messages:
    Revert "[readobj] Handle ELF files with no section table or with no program headers."
    Revert "[readobj] Dump DT_JMPREL relocations when outputting dynamic relocations."

r260489 depends on r260488 and among other issues r260488 deleted error
handling code.

llvm-svn: 260962
2016-02-16 14:17:48 +00:00
Andrey Turetskiy
4106876da9 [X86] PR26575: Fix LEA optimization pass.
Add a missing check for a type of address displacement operand of the load/store instruction being a candidate for LEA substitution.

Ref: https://llvm.org/bugs/show_bug.cgi?id=26575

Differential Revision: http://reviews.llvm.org/D17261

llvm-svn: 260959
2016-02-16 12:47:45 +00:00
Benjamin Kramer
9505fd87c8 [Hexagon] Hoist nonnull assert up.
Once a pointer is turned into a reference it cannot be nullptr, clang
rightfully warns about this assert being a tautology. Put the assert
before the reference is created.

llvm-svn: 260949
2016-02-16 09:53:47 +00:00