1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 19:12:56 +02:00
Commit Graph

149360 Commits

Author SHA1 Message Date
Zachary Turner
d610ac22c2 [CodeView Type Merging] Avoid record deserialization when possible.
A profile shows the majority of time doing type merging is spent
deserializing records from sequences of bytes into friendly C++ structures
that we can easily access members of in order to find the type indices to
re-write.

Records are prefixed with their length, however, and most records have
type indices that appear at fixed offsets in the record. For these
records, we can save some cycles by just looking at the right place in the
byte sequence and re-writing the value, then skipping the record in the
type stream. This saves us from the costly deserialization of examining
every field, including potentially null terminated strings which are the
slowest, even though it was unnecessary to begin with.

In addition, we apply another optimization. Previously, after
deserializing a record and re-writing its type indices, we would
unconditionally re-serialize it in order to compute the hash of the
re-written record. This would result in an alloc and memcpy for every
record. If no type indices were re-written, however, this was an
unnecessary allocation. In this patch re-writing is made two phase. The
first phase discovers the indices that need to be rewritten and their new
values. This information is passed through to the de-duplication code,
which only copies and re-writes type indices in the serialized byte
sequence if at least one type index is different.

Some records have type indices which only appear after variable length
strings, or which have lists of type indices, or various other situations
that can make it tricky to make this optimization. While I'm not giving up
on optimizing these cases as well, for now we can get the easy cases out
of the way and lay the groundwork for more complicated cases later.

This patch yields another 50% speedup on top of the already large speedups
submitted over the past 2 days. In two tests I have run, I went from 9
seconds to 3 seconds, and from 16 seconds to 8 seconds.

Differential Revision: https://reviews.llvm.org/D33480

llvm-svn: 303914
2017-05-25 21:06:28 +00:00
Aaron Ballman
9614394657 Update the documentation and CMake file for Visual Studio generators.
By default, CMake uses a 32-bit toolchain, even when on a 64-bit platform targeting a 64-bit build. However, due to the size of the binaries involved, this can cause linker instabilities (such as the linker running out of memory). Guide people to the correct solution to get CMake to use the native toolchain.

llvm-svn: 303912
2017-05-25 21:01:30 +00:00
Kyle Butt
bba51a3040 PPC: Correct Size for GETtlsADDR
PPC::GETtlsADDR is lowered to a branch and a nop, by the assembly
printer. Its size was incorrectly marked as 4, correct it to 8. The
incorrect size can cause incorrect branch relaxation in
PPCBranchSelector under the right conditions.

llvm-svn: 303904
2017-05-25 19:37:41 +00:00
Nico Weber
3d129c51cd Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots.
llvm-svn: 303902
2017-05-25 19:19:29 +00:00
Manoj Gupta
8e696dc227 [AArch64]: add 'a' inline asm operand modifier.
Summary:
This is used in the Linux kernel, and effectively just means "print an
address". This brings back r193593.

Reviewed by: Renato Golin

Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls

Subscribers: aemerson, javed.absar, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D33558

llvm-svn: 303901
2017-05-25 19:07:57 +00:00
Adrian Prantl
70d4f7d141 Fix SelectionDAGBuilder::getDbgValue to not expect DW_OP_deref on FI vars
This fixes an oversight in r300522, which changed alloca
dbg.values to no longer emit a DW_OP_deref.

The array.ll testcase was regenerated from source.

Fixes PR33166:
https://bugs.llvm.org/show_bug.cgi?id=33166

llvm-svn: 303897
2017-05-25 18:54:10 +00:00
Adrian Prantl
39ff0a614a Delete an obsolete paragraph in LangRef.
llvm-svn: 303896
2017-05-25 18:54:06 +00:00
David Blaikie
7427ada9bf DebugInfo: Produce debug_{gnu_}pub{names,types} entries when explicitly requested, even in -gmlt or when empty
Turns out gold doesn't use the DW_AT_GNU_pubnames to decide whether to
parse the rest of the DIEs when building gdb-index. This causes gold to
trip over LLVM's output when there are DW_FORM_ref_addr present.

Gold does use the presence of a debug_gnu_pub{names,types} entry for the
CU to skip parsing the debug_info portion, so make sure that's included
even when empty (technically, when empty there couldn't be any ref_addr
anyway - it only came up when gmlt didn't produce any (even non-empty)
pubnames - but given what that reveals about gold's implementation, this
seems like a good thing to do for consistency).

llvm-svn: 303894
2017-05-25 18:50:28 +00:00
Bob Haarman
3d5da45928 [llvm-pdbdump] [yaml2pdb] always include object file name in module info
Summary:
Previously, the yaml2pdb subcommand of llvm-pdbdump only
included object file names in module info if a module info stream was
present. This change makes it so that we include the object file name
even if there is no module info stream for the module. As a result,
running
llvm-pdbdump pdb2yaml -dbi-module-info original.pdb > original.yaml &&
llvm-pdbdump yaml2pdb -pdb=new.pdb original.yaml && llvm-pdbdump
pdb2yaml -dbi-module-info new.pdb > new.yaml now produces identical
original.yaml and new.yaml files.

Reviewers: amccarth, zturner

Reviewed By: zturner

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D33463

llvm-svn: 303891
2017-05-25 18:04:17 +00:00
Daniel Berlin
e0a5735dce NewGVN: Fix PR 33119, PR 33129, due to regressed undef handling
Fix PR33120 and others by eliminating self-cycles a different way.

llvm-svn: 303875
2017-05-25 15:44:20 +00:00
Artur Pilipenko
140634dbe4 [InstCombine] Teach isAllocSiteRemovable to look through addrspacecasts
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D28565

llvm-svn: 303870
2017-05-25 15:14:48 +00:00
Sanjay Patel
0a14f0325e [InstCombine] make icmp-mul fold more efficient
There's probably a lot more like this (see also comments in D33338 about responsibility), 
but I suspect we don't usually get a visible manifestation.

Given the recent interest in improving InstCombine efficiency, another potential micro-opt
that could be repeated several times in this function: morph the existing icmp pred/operands
instead of creating a new instruction.

llvm-svn: 303860
2017-05-25 14:13:57 +00:00
Tim Corringham
ea6f7a370f [AMDGPU] add intrinsic for s_getpc
Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D32862

llvm-svn: 303859
2017-05-25 14:04:14 +00:00
Oren Ben Simhon
ff1b128201 [X86] Adding vpopcntd and vpopcntq instructions
AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq).

Differential Revision: https://reviews.llvm.org/D33169

llvm-svn: 303858
2017-05-25 13:45:23 +00:00
James Molloy
a00acdfb68 [GVNSink] Pacify MSVC
Don't convert an unsigned to a pointer for a sentinel, use a size_t instead.

llvm-svn: 303855
2017-05-25 13:14:10 +00:00
James Molloy
cc1d665bc2 [GVNSink] Don't define operator<< in NDEBUG
Without debug macros enabled, the raw_ostream operator<< overload
is unused.

llvm-svn: 303852
2017-05-25 13:11:18 +00:00
James Molloy
e219c46616 [GVNSink] GVNSink pass
This patch provides an initial prototype for a pass that sinks instructions based on GVN information, similar to GVNHoist. It is not yet ready for commiting but I've uploaded it to gather some initial thoughts.

This pass attempts to sink instructions into successors, reducing static
instruction count and enabling if-conversion.
We use a variant of global value numbering to decide what can be sunk.
Consider:

[ %a1 = add i32 %b, 1  ]   [ %c1 = add i32 %d, 1  ]
[ %a2 = xor i32 %a1, 1 ]   [ %c2 = xor i32 %c1, 1 ]
                 \           /
           [ %e = phi i32 %a2, %c2 ]
           [ add i32 %e, 4         ]

GVN would number %a1 and %c1 differently because they compute different
results - the VN of an instruction is a function of its opcode and the
transitive closure of its operands. This is the key property for hoisting
and CSE.

What we want when sinking however is for a numbering that is a function of
the *uses* of an instruction, which allows us to answer the question "if I
replace %a1 with %c1, will it contribute in an equivalent way to all
successive instructions?". The (new) PostValueTable class in GVN provides this
mapping.

This pass has some shown really impressive improvements especially for codesize already on internal benchmarks, so I have high hopes it can replace all the sinking logic in SimplifyCFG.

Differential revision: https://reviews.llvm.org/D24805

llvm-svn: 303850
2017-05-25 12:51:11 +00:00
Chandler Carruth
ccf2cc6211 [PM] Teach the PGO instrumentation pasess to run GlobalDCE before
instrumenting code.

This is important in the new pass manager. The old pass manager's
inliner has a small DCE routine embedded within it. The new pass manager
relies on the actual GlobalDCE pass for this.

Without this patch, instrumentation profiling with the new PM results in
massive code bloat in the object files because the instrumentation
itself ends up preventing DCE from working to remove the code.

We should probably change the instrumentation (and/or DCE) so that we
can eliminate dead code even if instrumented, but we shouldn't even
spend the time generating instrumentation for that code so this still
seems like a good patch.

Differential Revision: https://reviews.llvm.org/D33535

llvm-svn: 303845
2017-05-25 07:15:09 +00:00
Chandler Carruth
c357180b82 [PM/Unswitch] Fix a bug in the domtree update logic for the new unswitch
pass.

The original logic only considered direct successors of the hoisted
domtree nodes, but that isn't really enough. If there are other basic
blocks that are completely within the subtree, their successors could
just as easily be impacted by the hoisting.

The more I think about it, the more I think the correct update here is
to hoist every block on the dominance frontier which has an idom in the
chain we hoist across. However, this is subtle enough that I'd
definitely appreciate some more eyes on it.

Sadly, if this is the correct algorithm, it requires computing a (highly
localized) dominance frontier. I've done this in the simplest (IE, least
code) way I could come up with, but that may be too naive. Suggestions
welcome here, dominance update algorithms are not an area I've studied
much, so I don't have strong opinions.

In good news, with this patch, turning on simple unswitch passes the
LLVM test suite for me with asserts enabled.

Differential Revision: https://reviews.llvm.org/D32740

llvm-svn: 303843
2017-05-25 06:33:36 +00:00
Craig Topper
86bc93d48f [MVT] Fix the identation of the start of the MVT class. NFC
llvm-svn: 303841
2017-05-25 06:15:05 +00:00
Craig Topper
9d032a4588 [SelectionDAG] Fix off by one in a compare in getOperationAction.
If Op is equal to array_lengthof, the lookup would be out of bounds, but we were only checking for greater than. I suspect nothing ever passes in the equal value because its a sentinel to mark the end of the builtin opcodes and not a real opcode.

So really this fix is just so that the code looks right and makes sense.

llvm-svn: 303840
2017-05-25 05:38:40 +00:00
Chandler Carruth
23545833ac [LegacyPM] Make the 'addLoop' method accept a loop to add rather than
having it internally allocate the loop.

This is a much more flexible API and necessary in the new loop unswitch
to reasonably support both new and old PMs in common code. It also just
seems like a cleaner separation of concerns.

NFC, this should just be a pure refactoring.

Differential Revision: https://reviews.llvm.org/D33528

llvm-svn: 303834
2017-05-25 03:01:31 +00:00
Galina Kistanova
99635b8b60 Fixed nondeterminism in RuleMatcher::emit.
llvm-svn: 303829
2017-05-25 01:51:53 +00:00
Vitaly Buka
9f41e21d33 [libFuzzer] Don't replace custom signal handlers.
Summary:
This allows to keep handlers installed by sanitizers.
In other cases third-party code can replace handlers after libFuzzer
initialization anyway.

Reviewers: kcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33522

llvm-svn: 303828
2017-05-25 01:43:13 +00:00
George Karpenkov
0495ee8bbd Fix coverage check for full post-dominator basic blocks.
Coverage instrumentation which does not instrument full post-dominators
and full-dominators may skip valid paths, as the reasoning for skipping
blocks may become circular.
This patch fixes that, by only skipping
full post-dominators with multiple predecessors, as such predecessors by
definition can not be full-dominators.

llvm-svn: 303827
2017-05-25 01:41:46 +00:00
Gor Nishanov
813463be5f [coroutines] CoroFrame.cpp conform to coding convention (s/repeat/Repeat) (NFC)
llvm-svn: 303826
2017-05-25 01:07:10 +00:00
Gor Nishanov
4877a1a6fb [coroutines] Relocate instructions that maybe spilled after coro.begin
Summary:
Frontend generates store instructions after allocas, for example:

```
define i8* @f(i64 %this) "coroutine.presplit"="1" personality i32 0 {
entry:
  %this.addr = alloca i64
  store i64 %this, i64* %this.addr
  ..
  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)

```
Such instructions may require spilling into coro.frame, but, coro-frame address is only available after coro.begin and thus needs to be moved after coro.begin.
The only instructions that should not be moved are the arguments of coro.begin and all of their operands.

Reviewers: GorNishanov, majnemer

Reviewed By: GorNishanov

Subscribers: llvm-commits, EricWF

Differential Revision: https://reviews.llvm.org/D33527

llvm-svn: 303825
2017-05-25 00:46:20 +00:00
Tony Jiang
82d9e3d26c [PowerPC] Fix a performance bug for PPC::XXSLDWI.
There are some VectorShuffle Nodes in SDAG which can be selected to XXSLDWI
instruction, this patch recognizes them and does the selection to improve the
PPC performance.

llvm-svn: 303822
2017-05-24 23:48:29 +00:00
Rafael Espindola
bbfcd24e3e Print symbols from COFF import libraries.
This change allows llvm-nm to print symbols found in import libraries,
in part by allowing COFFImportFiles to be casted to SymbolicFiles.

Patch by Dave Lee!

llvm-svn: 303821
2017-05-24 23:40:36 +00:00
Eugene Zelenko
1b906aeb05 [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 303820
2017-05-24 23:10:29 +00:00
Gor Nishanov
a4569f33dd [coroutines] Allow rematerialization upto 4 times. Remove incorrect assert
Reviewers: majnemer

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D33524

llvm-svn: 303819
2017-05-24 23:01:02 +00:00
Sanjay Patel
5be683d752 [InstCombine] use m_APInt to allow icmp-mul-mul vector fold
The swapped operands in the first test is a manifestation of an 
inefficiency for vectors that doesn't exist for scalars because 
the IRBuilder checks for an all-ones mask for scalars, but not 
vectors.

llvm-svn: 303818
2017-05-24 22:58:17 +00:00
Sanjay Patel
ed417a0edd [InstCombine] add tests for icmp eq (mul X, C), (mul Y, C); NFC
llvm-svn: 303816
2017-05-24 22:36:14 +00:00
Sanjay Patel
701f71c45e [InstCombine] move tests and use FileCheck; NFC
llvm-svn: 303808
2017-05-24 21:48:25 +00:00
Nirav Dave
e5871eda8b [DAG] Prevent crashes when merging constant stores with high-bit set. NFC.
llvm-svn: 303802
2017-05-24 19:56:39 +00:00
Nirav Dave
7dc90034fc [AArch64] Prevent nested ADDs from address calc in splitStoreSplat. NFC
In preparation for late-stage store merging.

llvm-svn: 303800
2017-05-24 19:55:49 +00:00
Vitaly Buka
57d693c059 Revert "Revert "Attempt to pacify ASan and UBSan reports in CrashRecovery tests""
This dependents on r303729 which was reverted.

This reverts commit r303783.

llvm-svn: 303796
2017-05-24 19:11:12 +00:00
Craig Topper
7b59e06467 [InstCombine] Merge together the SimplifyDemandedUseBits implementations for ZExt and Trunc. NFC
While there avoid resizing the DemandedMask twice. Make a copy into a separate variable instead. This potentially removes an allocation on large bit widths.

With the use of the zextOrTrunc methods on APInt and KnownBits these can be made almost source identical. The only difference is the zero of the upper bits for ZExt. This is similar to how its done in computeKnownBits in ValueTracking.

llvm-svn: 303791
2017-05-24 18:40:25 +00:00
Vitaly Buka
39c1ad5ee6 Prevent UBSan report in CrashRecovery tests
Reverted by mistake with r303783.

llvm-svn: 303785
2017-05-24 18:11:57 +00:00
Vitaly Buka
e2e04cf2d8 Revert "Attempt to pacify ASan and UBSan reports in CrashRecovery tests"
It's not needed after r303729.

This reverts commit r303311.

llvm-svn: 303783
2017-05-24 17:58:09 +00:00
Teresa Johnson
6b6d815883 Fix a couple of typos in memory intrinsic optimization output (NFC)
s/instrinsic/intrinsic

llvm-svn: 303782
2017-05-24 17:55:25 +00:00
Zaara Syeda
28e46e17c2 P9: D-form vector load/store. Differential Revision: https://reviews.llvm.org/D33248
llvm-svn: 303780
2017-05-24 17:50:37 +00:00
Craig Topper
879361c40b [InstCombine] Use less bitwise operations to handle Instruction::SExt in SimplifyDemandedUseBits. Other improvements.
The current code created a NewBits mask and used it as a mask several times. One of them just before a call to trunc making it unnecessary. A call to getActiveBits can get us the same information for the case. We also ORed with this mask later when we should have just sign extended the known bits.

We also called trunc on the guaranteed to be zero KnownZeros/Ones masks entering this code. Creating appropriately sized temporary APInts is probably better.

Differential Revision: https://reviews.llvm.org/D32098

llvm-svn: 303779
2017-05-24 17:33:30 +00:00
Krzysztof Parzyszek
27f1aa93f7 Move machine-cse-physreg.mir to test/CodeGen/Thumb
llvm-svn: 303778
2017-05-24 17:20:47 +00:00
Craig Topper
5c66091fbf [InstSimplify] Simplify uadd/sadd/umul/smul with overflow intrinsics when the Zero or Undef is on the LHS.
Summary: This code was migrated from InstCombine a few years ago. InstCombine had nearby code that would move Constants to the RHS for these, but InstSimplify doesn't have such code on this path.

Reviewers: spatel, majnemer, davide

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33473

llvm-svn: 303774
2017-05-24 17:05:28 +00:00
Craig Topper
d36a5dface [ValueTracking] Convert most of the calls to computeKnownBits to use the version that returns the KnownBits object.
This continues the changes started when computeSignBit was replaced with this new version of computeKnowBits.

Differential Revision: https://reviews.llvm.org/D33431

llvm-svn: 303773
2017-05-24 16:53:07 +00:00
Craig Topper
5442613e26 [ValueTracking] Add OptimizationRemarkEmitter to the other signature for commuteKnownBits.
This is needed for an upcoming patch.

llvm-svn: 303772
2017-05-24 16:53:03 +00:00
Matthew Simpson
983065c8f6 Revert r291254: [AArch64] Reduce vector insert/extract cost for Falkor
The default vector insert/extract cost is more profitable on Falkor than the
reduced cost.

llvm-svn: 303771
2017-05-24 16:48:39 +00:00
Rafael Espindola
3a54931075 Add some tips on benchmarking.
llvm-svn: 303769
2017-05-24 16:39:12 +00:00
Nirav Dave
0855b10b8c [AMDGPU] Prevent too large store merges in AMDGPU Subtargets. NFCI.
Various address spaces on the SI and R600 subtargets have stricter
limits on memory access size that other address spaces. Use
canMergeStoresTo predicate to prevent the DAGCombiner from creating
these stores as they will be split up during legalization.

llvm-svn: 303767
2017-05-24 15:59:09 +00:00