1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 19:12:56 +02:00
Commit Graph

93578 Commits

Author SHA1 Message Date
Simon Pilgrim
e70f883bd5 [X86][SSE] Add 2 input shuffle support to matchBinaryVectorShuffle
Not actually used yet...

llvm-svn: 277919
2016-08-06 11:22:39 +00:00
Benjamin Kramer
a733725b3a Move helpers into anonymous namespaces. NFC.
llvm-svn: 277916
2016-08-06 11:13:10 +00:00
David Majnemer
be67decdd0 [CodeGen] Fix a -Wdocumentation warning
A parameter was documented with the wrong name.
No functionality change is intended.

llvm-svn: 277915
2016-08-06 08:37:12 +00:00
David Majnemer
cbcaf6adec [ValueTracking] Teach computeKnownBits about [su]min/max
Reasoning about a select in terms of a min or max allows us to derive a
tigher bound on the result.

llvm-svn: 277914
2016-08-06 08:16:00 +00:00
David Majnemer
52836ff27d [CallGraphSCCPass] Use an ArrayRef instead of a pair of iterators
No functional change is intended.

llvm-svn: 277913
2016-08-06 06:21:02 +00:00
Sanjoy Das
aca3962ed4 [InstCombine] Don't coerce non-integral pointers to integers
Reviewers: majnemer

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23231

llvm-svn: 277910
2016-08-06 02:58:48 +00:00
Matthias Braun
61a94e86f0 Revert "(refs/bisect/bad) GVN-hoist: enable by default"
GVN-Hoist appears to miscompile llvm-testsuite
SingleSource/Benchmarks/Misc/fbench.c at the moment.

I filed http://llvm.org/PR28880

This reverts commit r277786.

llvm-svn: 277909
2016-08-06 02:23:15 +00:00
Gor Nishanov
d04999a10d Part 4c: Coroutine Devirtualization: Devirtualize coro.resume and coro.destroy.
Summary:
This is the 4c patch of the coroutine series. CoroElide pass now checks if PostSplit coro.begin
is referenced by coro.subfn.addr intrinsics. If so replace coro.subfn.addrs with an appropriate coroutine
subfunction associated with that coro.begin.

Documentation and overview is here: http://llvm.org/docs/Coroutines.html.

Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3.Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4.Add coroutine devirtualization + tests.
ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
c) Do devirtualization <= we are here
5.Add CGSCC restart trigger + tests.
6.Add coroutine heap elision + tests.
7.Add the rest of the logic (split into more patches)

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23229

llvm-svn: 277908
2016-08-06 02:16:35 +00:00
Nico Weber
187570f797 Revert r277896.
It breaks ExecutionEngine/OrcLazy/weak-function.ll on most bots.

Script:
--
...
--
Exit Code: 1

Command Output (stderr):
--
Could not find main function.

llvm-svn: 277907
2016-08-06 02:00:45 +00:00
Kyle Butt
2437437aae CodeGen: If Convert blocks that would form a diamond when tail-merged.
The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.

If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.
define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
	%tmp1434 = icmp eq i32 %a, %b		; <i1> [#uses=1]
	br i1 %tmp1434, label %bb17, label %bb.outer

bb.outer:		; preds = %cond_false, %entry
	%b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
	%a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
	br label %bb

bb:		; preds = %cond_true, %bb.outer
	%indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
	%tmp. = sub i32 0, %b_addr.021.0.ph
	%tmp.40 = mul i32 %indvar, %tmp.
	%a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
	%tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
	br i1 %tmp3, label %cond_true, label %cond_false

cond_true:		; preds = %bb
	%tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
	%tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
	%indvar.next = add i32 %indvar, 1
	br i1 %tmp1437, label %bb17, label %bb

cond_false:		; preds = %bb
	%tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
	%tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
	br i1 %tmp14, label %bb17, label %bb.outer

bb17:		; preds = %cond_false, %cond_true, %entry
	%a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
	ret i32 %a_addr.026.1
}

Without tail-merging or diamond-tail if conversion:
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ble     LBB1_3
@ BB#2:                                 @ %cond_true
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r0, r0, r1
        cmp     r1, r0
        it      ne
        cmpne   r0, r1
        bgt     LBB1_4
LBB1_3:                                 @ %cond_false
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r1, r1, r0
        cmp     r1, r0
        bne     LBB1_1
LBB1_4:                                 @ %bb17
        bx      lr

With diamond-tail if conversion, but without tail-merging:
@ BB#0:                                 @ %entry
        cmp     r0, r1
        it      eq
        bxeq    lr
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ite     le
        suble   r1, r1, r0
        subgt   r0, r0, r1
        cmp     r1, r0
        bne     LBB1_1
@ BB#2:                                 @ %bb17
        bx      lr

llvm-svn: 277905
2016-08-06 01:52:37 +00:00
Kyle Butt
5a3ac8d61b IfConverter: Split ScanInstructions into 2 functions.
ScanInstructions is now 2 functions:
AnalyzeBranches and ScanInstructions. ScanInstructions also now takes a
pair of arguments delimiting the instructions to be scanned. This will
be used for forked diamond support to re-scan only a portion of the
block.

llvm-svn: 277904
2016-08-06 01:52:34 +00:00
Kyle Butt
f1dc880c22 IfConversion: Document countDuplicatedInstructions. NFC
llvm-svn: 277903
2016-08-06 01:52:33 +00:00
Kyle Butt
c2f044ae3a IfConversion: factor out 2 functions to skip debug instrs. NFC
Skipping debug instructions occurrs repeatedly, factor it out.

llvm-svn: 277902
2016-08-06 01:52:31 +00:00
Michael Zolotukhin
ca933d23c6 Revert "[LoopSimplify] Fix updating LCSSA after separating nested loops."
This reverts commit r277877.
Try to appease clang-x64-ninja-win7 buildbot.

llvm-svn: 277901
2016-08-06 01:48:51 +00:00
Lang Hames
60cc270b53 [ORC] Add (partial) weak symbol support to the CompileOnDemand layer.
This adds partial support for weak functions to the CompileOnDemandLayer by
modifying the addLogicalModule method to check for existing stub definitions
before building a new stub for a weak function. This scheme is sufficient to
support ODR definitions, but fails for general weak definitions if strong
definition is encountered after the first weak definition. (A more extensive
refactor will be required to fully support weak symbols).

This patch does *not* add weak symbol support to RuntimeDyld: I hope to add
that in the near future.

llvm-svn: 277896
2016-08-06 00:54:43 +00:00
Sanjoy Das
58e4d175cb [IRCE] Remove unused headers; NFC
llvm-svn: 277892
2016-08-06 00:02:01 +00:00
Sanjoy Das
ac0f2a4d82 [IRCE] Preserve loop-simplify form
Fixes PR28764.  Right now there is no way to test this, but (as
mentioned on the PR) with Michael Zolotukhin's yet to be checked in
LoopSimplify verfier, 8 of the llvm-lit tests for IRCE crash.

llvm-svn: 277891
2016-08-06 00:01:56 +00:00
Sanjay Patel
cf40a88b7b [InstCombine] refactor ctlz/cttz folds (NFCI)
Note that this fold really belongs in InstSimplify.
Refactoring here anyway as an intermediate step because
there's a planned addition to this function in D23134.

Differential Revision: https://reviews.llvm.org/D23223

llvm-svn: 277883
2016-08-05 22:42:46 +00:00
Daniel Berlin
816ae4062c [MSSA] Use depth first iterator instead of custom version.
Summary:
Originally the plan was to use the custom worklist to do some block popping,
and because we don't actually need a visited set. The custom one we have
here is slightly broken, and it's not worth fixing vs using depth_first_iterator since we aren't going to go the route we originally
were.

Fixes PR28874
Reviewers: george.burgess.iv

Subscribers: llvm-commits, gberry

Differential Revision: https://reviews.llvm.org/D23187

llvm-svn: 277880
2016-08-05 22:09:14 +00:00
Justin Bogner
049f0b1295 CodeView: Remove an unused variable
It was breaking the -Werror build.

llvm-svn: 277878
2016-08-05 21:57:10 +00:00
Michael Zolotukhin
fa30ea5db2 [LoopSimplify] Fix updating LCSSA after separating nested loops.
This fixes PR28825. The problem was that we only checked if a value from
a created inner loop is used in the outer loop, and fixed LCSSA for
them. But we missed to fixup LCSSA for values used in exits of the outer
loop.

llvm-svn: 277877
2016-08-05 21:52:58 +00:00
Zachary Turner
d023c59def Fix non portable include path.
llvm-svn: 277876
2016-08-05 21:50:02 +00:00
Daniel Berlin
a948a2224e [MSSA] Match assert vs llvm_unreachable style in verification functions.
llvm-svn: 277873
2016-08-05 21:47:20 +00:00
Daniel Berlin
242310219e Rewrite domination verifier to handle local domination as well.
Summary:
Rewrite domination verifier to handle local domination as well.
This catches a bug Geoff Berry noticed.

Reviewers: george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23184

llvm-svn: 277872
2016-08-05 21:46:52 +00:00
Zachary Turner
a3ce9cabee [CodeView] Decouple record deserialization from visitor dispatch.
Until now, our use case for the visitor has been to take a stream of bytes
representing a type stream, deserialize the records in sequence, and do
something with them, where "something" is determined by how the user
implements a particular set of callbacks on an abstract class.

For actually writing PDBs, however, we want to do the reverse. We have
some kind of description of the list of records in their in-memory format,
and we want to process each one. Perhaps by serializing them to a byte
stream, or perhaps by converting them from one description format (Yaml)
to another (in-memory representation).

This was difficult in the current model because deserialization and
invoking the callbacks were tightly coupled.

With this patch we change this so that TypeDeserializer is itself an
implementation of the particular set of callbacks. This decouples
deserialization from the iteration over a list of records and invocation
of the callbacks.  TypeDeserializer is initialized with another
implementation of the callback interface, so that upon deserialization it
can pass the deserialized record through to the next set of callbacks. In
a sense this is like an implementation of the Decorator design pattern,
where the Deserializer is a decorator.

This will be useful for writing Pdbs from yaml, where we have a
description of the type records in Yaml format. In this case, the visitor
implementation would have each visitation callback method implemented in
such a way as to extract the proper set of fields from the Yaml, and it
could maintain state that builds up a list of these records. Finally at
the end we can pass this information through to another set of callbacks
which serializes them into a byte stream.

Reviewed By: majnemer, ruiu, rnk
Differential Revision: https://reviews.llvm.org/D23177

llvm-svn: 277871
2016-08-05 21:45:34 +00:00
Marek Olsak
e93050ba9d AMDGPU/SI: Increase SGPR limit to 96 on Tonga/Iceland
Summary:
This is the setting of the Vulkan closed source driver.

It decreases the max wave count from 10 to 8.

26010 shaders in 14650 tests
Totals:
VGPRS: 829593 -> 808440 (-2.55 %)
Spilled SGPRs: 81878 -> 42226 (-48.43 %)
Spilled VGPRs: 367 -> 358 (-2.45 %)
Scratch VGPRs: 1764 -> 1748 (-0.91 %) dwords per thread
Code Size: 36677864 -> 35923932 (-2.06 %) bytes

There is a massive decrease in SGPR spilling in general and -7.4% spilled
VGPRs for DiRT Showdown (= SGPRs spilled to scratch?)

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23034

llvm-svn: 277867
2016-08-05 21:23:29 +00:00
Weiming Zhao
25d501fdb1 [ARM] Constant Materialize: imms with specific value can be encoded into mov.w
Summary: Thumb2 supports encoding immediates with specific patterns into mov.w by splatting the low 8 bits into other bytes.

I'm resubmitting this patch. The test case in the original commit
r277610 does not specify triple, so builds with differnt default triple
will have different output.

This patch fixed trile as thumb-darwin-apple.

Reviewers: john.brawn, jmolloy, bruno

Subscribers: jmolloy, aemerson, rengolin, samparker, llvm-commits

Differential Revision: https://reviews.llvm.org/D23090

llvm-svn: 277865
2016-08-05 20:58:29 +00:00
Davide Italiano
3bdeddf37c [FlattenCFG] Simplify + remove unused variable. NFCI.
llvm-svn: 277864
2016-08-05 20:53:35 +00:00
Dehao Chen
a21187d4ef Remove cold callsite heuristic that is not necessary because of cold callee heuristic.
llvm-svn: 277863
2016-08-05 20:49:04 +00:00
Dehao Chen
3e893cb241 Replace hot-callsite based heuristic to use its own threshold parameter instead of share inline-hint parameter
Summary: Hot callsites should have higher threshold than inline hints. This patch uses separate threshold parameter for hot callsites.

Reviewers: davidxl, eraman

Subscribers: Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D22368

llvm-svn: 277860
2016-08-05 20:28:41 +00:00
Mike Aizatsky
63896f700f [sanitizers] trace buffer API to use user-allocated buffer.
Differential Revision: https://reviews.llvm.org/D23185

llvm-svn: 277859
2016-08-05 20:09:53 +00:00
Ivan Krasin
2c83cdacf1 WholeProgramDevirt: print remarks with devirtualized method names.
Summary:
Chrome on Linux uses WholeProgramDevirt for speed ups, and it's
important to detect regressions on both sides: the toolchain,
if fewer methods get devirtualized after an update, and Chrome,
if an innocently looking change caused many hot methods become
virtual again.

The need to track devirtualized methods is not Chrome-specific,
but it's probably the only user of the pass at this time.

Reviewers: kcc

Differential Revision: https://reviews.llvm.org/D23219

llvm-svn: 277856
2016-08-05 19:45:16 +00:00
David Callahan
066bd5eb0e [ADCE] Refactoring for new functionality (NFC)
Summary:
This is another refactoring to break up the one function into three logical components functions.
Another non-functional change before we start added in features.

Reviewers: nadav, mehdi_amini, majnemer

Subscribers: twoh, freik, llvm-commits

Differential Revision: https://reviews.llvm.org/D23102

llvm-svn: 277855
2016-08-05 19:38:11 +00:00
Sanjoy Das
3c9e97ad45 [ConstantFolding] Don't create illegal (non-integral) inttoptrs
Reviewers: majnemer, arsenm

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23182

llvm-svn: 277854
2016-08-05 19:23:29 +00:00
David Callahan
2001723a88 [AutoFDO] Fix handling of empty profiles
Summary:
If a profile has no samples for a function, then the function "entry count" is set to the value 0. Several places in the code test that if the Function::getEntryCount is defined at all. Here we change to treat a 0 entry count the same as undefined.

In particular, this fixes a problem in getLayoutSuccessorProbThreshold in MachineBlockPlacement.cpp where we use a different and inferior heuristic for laying out basic blocks.

Reviewers: danielcdh, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23082

llvm-svn: 277849
2016-08-05 18:38:19 +00:00
Sanjoy Das
1c4709dd53 [SCEV] Don't infinitely recurse on unreachable code
llvm-svn: 277848
2016-08-05 18:34:14 +00:00
Kevin Enderby
538b15a9ad Add the first of what will be a long line of additional error checks for invalid Mach-O files.
This is where an LC_SEGMENT load command has a fileoff field that
extends past the end of the file.

Also fix llvm-nm and llvm-size to remove the errorToErrorCode() call so error messages are printed.
And needed to update a few test cases now that they do print the error messages just a
bit differently.

llvm-svn: 277845
2016-08-05 18:19:40 +00:00
Dehao Chen
359e5901cf Do not assign new discriminator for all intrinsics.
Summary: We do not care about intrinsic calls when assigning discriminators.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23212

llvm-svn: 277843
2016-08-05 17:56:49 +00:00
Tim Northover
5e1792ace2 GlobalISel: clear pending phis after MachineFunction translated
Test is just reordering the existing functions (it would trigger for any
function after one with a phi).

llvm-svn: 277841
2016-08-05 17:50:36 +00:00
Simon Pilgrim
88e8bdcf6a [X86][SSE] Add initial support for 2 input target shuffle combining.
At the moment only the INSERTPS matching can actually use 2 inputs but the plumbing is now in place.

llvm-svn: 277839
2016-08-05 17:36:14 +00:00
Tim Northover
fff7765455 GlobalISel: IRTranslate PHI instructions
llvm-svn: 277835
2016-08-05 17:16:40 +00:00
Ulrich Weigand
5e0692561d [PowerPC] Wrong fast-isel codegen for VSX floating-point loads
There were two locations where fast-isel would generate a LFD instruction
with a target register class VSFRC instead of F8RC when VSX was enabled.
This can ccause invalid registers to be used in certain cases, like:
   lfd 36, ...
instead of using a VSX load instruction.  The wrong register number gets
silently truncated, causing invalid code to be generated.


The first place is PPCFastISel::PPCEmitLoad, which had multiple problems:

1.) The IsVSSRC and IsVSFRC flags are not initialized correctly, since they
are computed from resultReg, which is still zero at this point in many cases.
Fixed by changing the helper routines to operate on a register class instead
of a register and passing in UseRC.
 
2.) Even with this fixed, Is64VSXLoad is still wrong due to a typo:

bool Is32VSXLoad = IsVSSRC && Opc == PPC::LFS;
bool Is64VSXLoad = IsVSSRC && Opc == PPC::LFD;

The second line needs to use isVSFRC (like PPCEmitStore does).

3.) Once both the above are fixed, we're now generating a VSX instruction --
but an incorrect one, since generation of an indexed instruction with null
index is wrong. Fixed by copying the code handling the same issue in
PPCEmitStore.


The second place is PPCFastISel::PPCMaterializeFP, where we would emit an
LFD to load a constant from the literal pool, and use the wrong result
register class. Fixed by hardcoding a F8RC class even on systems
supporting VSX.


Fixes: https://llvm.org/bugs/show_bug.cgi?id=28630

Differential Revision: https://reviews.llvm.org/D22632

llvm-svn: 277823
2016-08-05 15:22:05 +00:00
Zhan Jun Liau
26394d50f8 [SystemZ] Add missing classes and instructions
Summary:
Add instruction formats E, RSI, SSd, SSE, and SSF.

Added BRXH, BRXLE, PR, MVCK, STRAG, and ECTG instructions to test out
those formats.

Reviewers: uweigand

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23179

llvm-svn: 277822
2016-08-05 15:14:34 +00:00
Benjamin Kramer
e0adb168f1 [SimplifyCFG] Make range reduction code deterministic.
This generated IR based on the order of evaluation, which is different
between GCC and Clang. With that in mind you get bootstrap miscompares
if you compare a Clang built with GCC-built Clang vs. Clang built with
Clang-built Clang. Diagnosing that made my head hurt.

This also reverts commit r277337, which "fixed" the test case.

llvm-svn: 277820
2016-08-05 14:55:02 +00:00
Simon Pilgrim
8b91df35d2 [X86][SSE] Update the the target shuffle matches to use the effective mask's value type directly instead of via the input value type.
Preparation for adding 2 input support so we want to avoid unnecessary references to the input value type.

llvm-svn: 277817
2016-08-05 14:33:11 +00:00
Simon Pilgrim
1ff993a03f [X86][SSE] Consistently use the target shuffle root value type for vector size calculations. NFCI.
Preparation for adding 2 input support so we want to avoid unnecessary references to the input value type.

llvm-svn: 277814
2016-08-05 13:02:53 +00:00
NAKAMURA Takumi
ae5f0b7063 LLLexer.cpp: Avoid using BitsToDouble() to preserve SNaN like "double 0x7FF4000000000000".
We should not use double (or float) in the LLVM, unless it is really needed. x87 FP register doesn't preserve SNaN to move the value.

FIXME: APFloat() may have the constructor by raw bit.
llvm-svn: 277813
2016-08-05 11:59:49 +00:00
NAKAMURA Takumi
a0f33d0f1e Reformat.
llvm-svn: 277812
2016-08-05 11:59:45 +00:00
Simon Pilgrim
b37658fd3f [X86][SSE] Added target shuffle combine binary compute matching function. NFCI.
Added matchBinaryPermuteVectorShuffle and moved the blend+zero and insertps matching code into it.

llvm-svn: 277808
2016-08-05 11:16:53 +00:00
John Brawn
f0a061e202 Reapply r276973 "Adjust Registry interface to not require plugins to export a registry"
This differs from the previous version by being more careful about template
instantiation/specialization in order to prevent errors when building with
clang -Werror. Specifically:
 * begin is not defined in the template and is instead instantiated when Head
   is. I think the warning when we don't do that is wrong (PR28815) but for now
   at least do it this way to avoid the warning.
 * Instead of performing template specializations in LLVM_INSTANTIATE_REGISTRY
   instead provide a template definition then do explicit instantiation. No
   compiler I've tried has problems with doing it the other way, but strictly
   speaking it's not permitted by the C++ standard so better safe than sorry.

Original commit message:

Currently the Registry class contains the vestiges of a previous attempt to
allow plugins to be used on Windows without using BUILD_SHARED_LIBS, where a
plugin would have its own copy of a registry and export it to be imported by
the tool that's loading the plugin. This only works if the plugin is entirely
self-contained with the only interface between the plugin and tool being the
registry, and in particular this conflicts with how IR pass plugins work.

This patch changes things so that instead the add_node function of the registry
is exported by the tool and then imported by the plugin, which solves this
problem and also means that instead of every plugin having to export every
registry they use instead LLVM only has to export the add_node functions. This
allows plugins that use a registry to work on Windows if
LLVM_EXPORT_SYMBOLS_FOR_PLUGINS is used.

llvm-svn: 277806
2016-08-05 11:01:08 +00:00
Strahinja Petrovic
9cbffd73fc [PowerPC] fix passing long double arguments to function (soft-float)
This patch fixes passing long double type arguments to function in 
soft float mode. If there is less than 4 argument registers free 
(long double type is mapped in 4 gpr registers in soft float mode) 
long double type argument must be passed through stack.
Differential Revision: https://reviews.llvm.org/D20114.

llvm-svn: 277804
2016-08-05 08:47:26 +00:00
Nicolai Haehnle
7286dd419e [InstCombine] try to fold (select C, (sext A), B) into logical ops
Summary:
Turn (select C, (sext A), B) into (sext (select C, A, B')) when A is i1 and
B is a compatible constant, also for zext instead of sext. This will then be
further folded into logical operations.

The transformation would be valid for non-i1 types as well, but other parts of
InstCombine prefer to have sext from non-i1 as an operand of select.

Motivated by the shader compiler frontend in Mesa for AMDGPU, which emits i32
for boolean operations. With this change, the boolean logic is fully
recovered.

Reviewers: majnemer, spatel, tstellarAMD

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D22747

llvm-svn: 277801
2016-08-05 08:22:29 +00:00
Justin Bogner
443b814563 InstCombine: Clean up some trailing whitespace. NFC
llvm-svn: 277793
2016-08-05 01:09:48 +00:00
Justin Bogner
a9ca8f6cb7 InstCombine: Replace some never-null pointers with references. NFC
llvm-svn: 277792
2016-08-05 01:06:44 +00:00
Sebastian Pop
908ec888ad GVN-hoist: enable by default
llvm-svn: 277786
2016-08-04 23:49:07 +00:00
Sebastian Pop
16f65162d1 GVN-hoist: fix early exit logic
The patch splits a complex && if condition into easier to read and understand
logic.  That wrong early exit condition was letting some instructions with not
all operands available pass through when HoistingGeps was true.

Differential Revision: https://reviews.llvm.org/D23174

llvm-svn: 277785
2016-08-04 23:49:05 +00:00
Justin Bogner
c1fadbec04 IR: Provide an IRBuilder Inserter that calls a callback after insertion
Add a generalized IRBuilderCallbackInserter, which is just given a
callback to execute after insertion. This can be used to get rid of
the custom inserter in InstCombine, which will in turn allow me to add
target specific InstCombineCalls API for intrinsics without horrible
layering violations.

llvm-svn: 277784
2016-08-04 23:41:01 +00:00
Michael Kuperstein
7044572182 [LV, X86] Be more optimistic about vectorizing shifts.
Shifts with a uniform but non-constant count were considered very expensive to
vectorize, because the splat of the uniform count and the shift would tend to
appear in different blocks. That made the splat invisible to ISel, and we'd
scalarize the shift at codegen time.

Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we
are able to select the appropriate vector shifts. This updates the cost model to
to take this into account by making shifts by a uniform cheap again.

Differential Revision: https://reviews.llvm.org/D23049

llvm-svn: 277782
2016-08-04 22:48:03 +00:00
Sanjay Patel
d4b53a013e [InstCombine] use m_APInt to allow icmp eq (mul X, C1), C2 folds for splat constant vectors
This concludes the splat vector enhancements for foldICmpEqualityWithConstant().
Other commits in this series:
https://reviews.llvm.org/rL277762
https://reviews.llvm.org/rL277752
https://reviews.llvm.org/rL277738
https://reviews.llvm.org/rL277731
https://reviews.llvm.org/rL277659
https://reviews.llvm.org/rL277638
https://reviews.llvm.org/rL277629

llvm-svn: 277779
2016-08-04 22:19:27 +00:00
Kevin Enderby
cde48ff083 Clean up the logic of the Archive::Child::Child() with an assert to know Err is not a nullptr
when we are pointed at real data.

David Blaikie pointed out some odd logic in the case the Err value was a nullptr and
Lang Hames suggested it could be cleaned it up with an assert to know that Err is
not a nullptr when we are pointed at real data.  As only in the case of constructing
the sentinel value by pointing it at null data is Err is permitted to be a nullptr,
since no error could occur in that case.

With this change the testing for “if (Err)” is removed from the constructor’s logic
and *Err is used directly without any check after the assert().

llvm-svn: 277776
2016-08-04 21:54:19 +00:00
Tim Northover
97134fad94 GlobalISel: extend add widening to SUB, MUL, OR, AND and XOR.
These are the operations that are trivially identical. Division is omitted for
now because you need to use the correct sign/zero extension.

llvm-svn: 277775
2016-08-04 21:39:49 +00:00
Tim Northover
73cd86e9c7 GlobalISel: add support for G_MUL
llvm-svn: 277774
2016-08-04 21:39:44 +00:00
Tim Northover
6be3480298 GlobalISel: implement narrowing for G_ADD.
llvm-svn: 277769
2016-08-04 20:54:13 +00:00
Matt Arsenault
256e602a26 GVNHoist: Don't hoist convergent calls
llvm-svn: 277767
2016-08-04 20:52:57 +00:00
Lang Hames
06bc5445fb [ExecutionEngine] Refactor - Roll JITSymbolFlags functionality into JITSymbol.h
and remove the JITSymbolFlags header.

llvm-svn: 277766
2016-08-04 20:32:37 +00:00
David Majnemer
cac831b368 [coroutines] Part 4[ab]: Coroutine Devirtualization: Lower coro.resume and coro.destroy.
This is the forth patch in the coroutine series. CoroEaly pass now lowers coro.resume
and coro.destroy intrinsics by replacing them with an indirect call to an address
returned by coro.subfn.addr intrinsic. This is done so that CGPassManager recognizes
devirtualization when CoroElide replaces a call to coro.subfn.addr with an appropriate
function address.

Patch by Gor Nishanov!

Differential Revision: https://reviews.llvm.org/D22998

llvm-svn: 277765
2016-08-04 20:30:07 +00:00
Sanjay Patel
465c30dcbd [InstCombine] use m_APInt to allow icmp eq (and X, C1), C2 folds for splat constant vectors
llvm-svn: 277762
2016-08-04 20:05:02 +00:00
Yaxun Liu
6c8b6c7094 [OpenCL] Add missing tests for getOCLTypeName
Adding missing tests for OCL type names for half, float, double, char, short, long, and unknown.

Patch by Aaron En Ye Shi.

Differential Revision: https://reviews.llvm.org/D22964

llvm-svn: 277759
2016-08-04 19:45:00 +00:00
Zachary Turner
aa1e2354eb [CodeView] Use llvm::Error instead of std::error_code.
This eliminates the remnants of std::error_code from the
DebugInfo libraries.

llvm-svn: 277758
2016-08-04 19:39:55 +00:00
Tim Northover
9a77dbb39e AArch64: don't assume all i128s are BUILD_PAIRs
It leads to a crash when they're not. I'm *sure* I've made this mistake before,
at least once.

llvm-svn: 277755
2016-08-04 19:32:28 +00:00
Sanjay Patel
f15750bb28 [InstCombine] use m_APInt to allow icmp eq (or X, C1), C2 folds for splat constant vectors
llvm-svn: 277752
2016-08-04 19:12:12 +00:00
Tim Northover
7bdc05cfaa GlobalISel: also add G_TRUNC to IRTranslator.
llvm-svn: 277749
2016-08-04 18:35:17 +00:00
Tim Northover
da14a9a2d2 GlobalISel: add code to widen scalar G_ADD
llvm-svn: 277747
2016-08-04 18:35:11 +00:00
Derek Schuff
d2e71c8b48 [WebAssembly] Check return value of getRegForValue in FastISel
Previously, FastISel for WebAssembly wasn't checking the return value of
`getRegForValue` in certain cases, which would generate instructions
referencing NoReg. This patch fixes this behavior.

Patch by Dominic Chen

Differential Revision: https://reviews.llvm.org/D23100

llvm-svn: 277742
2016-08-04 18:01:52 +00:00
Krzysztof Parzyszek
5c74e232b1 [Hexagon] Validate register class when doing bit simplification
llvm-svn: 277740
2016-08-04 17:56:19 +00:00
Sanjay Patel
9e469c7feb [InstCombine] use m_APInt to allow icmp eq (op X, Y), C folds for splat constant vectors
I'm removing a misplaced pair of more specific folds from InstCombine in this patch as well,
so we know where those folds are happening in InstSimplify.

llvm-svn: 277738
2016-08-04 17:48:04 +00:00
Simon Pilgrim
7c40654a7f [X86][SSE] Rename target shuffle unary permute matching function. NFCI.
In preparation for adding a binary permute matching function.

llvm-svn: 277737
2016-08-04 17:16:50 +00:00
Alina Sbirlea
6b31c219ab LoadStoreVectorizer: Remove TargetBaseAlign. Keep alignment for stack adjustments.
Summary:
TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses.
A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted.

Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures:
- x86 failing tests either did not have the right target, or the right alignment.
- NVPTX failing tests did not have the right alignment.
- AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info
  considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks
  for a maximum size allowed and relies on the next condition checking for %4 for correctness.
  This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list).

Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure),
so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context.

Reviewers: arsenm, jlebar, tstellarAMD

Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D23068

llvm-svn: 277735
2016-08-04 16:38:44 +00:00
Daniel Sanders
dc7d5b28b3 [mips] Set Personality and LSDA encoding for FreeBSD
Reviewers: seanbruno, sdardis

Subscribers: tberghammer, danalbert, srhines, dsanders, sdardis, llvm-commits, seanbruno

Differential Revision: https://reviews.llvm.org/D23113

llvm-svn: 277732
2016-08-04 15:36:03 +00:00
Sanjay Patel
3cb5010af3 [InstCombine] use m_APInt to allow icmp eq (sub C1, X), C2 folds for splat constant vectors
llvm-svn: 277731
2016-08-04 15:19:25 +00:00
Simon Pilgrim
a8b6e796e4 [X86][SSE] Split off shuffle mask canonicalization from lowerVectorShuffle. NFCI.
The new function now returns true if the shuffle should be commuted.

This will allow target shuffle combines to share the code.

llvm-svn: 277728
2016-08-04 14:21:32 +00:00
Krzysztof Parzyszek
4a04420cea [Hexagon] Clear kill flags from modified registers in peephole optimizer
llvm-svn: 277727
2016-08-04 14:17:16 +00:00
Nikolai Bozhenov
2540ce6c57 [X86] Heuristic to selectively build Newton-Raphson SQRT estimation
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.

The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.

Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.

Differential Revision: https://reviews.llvm.org/D21379

llvm-svn: 277725
2016-08-04 12:47:28 +00:00
Hrvoje Varga
6bf6a83a4c [mips][microMIPS] Implement CFC1, CFC2, CTC1 and CTC2 instructions
Differential Revision: https://reviews.llvm.org/D22347

llvm-svn: 277719
2016-08-04 11:22:52 +00:00
Simon Pilgrim
70f3672799 [X86][SSE] Add initial costs for vector CTTZ/CTLZ
llvm-svn: 277716
2016-08-04 10:51:41 +00:00
Simon Pilgrim
fa572df732 [X86][SSE] Don't decide when to scalarize CTTZ/CTLZ for performance at lowering - this is what cost models are for
Improved CTTZ/CTLZ costings will be added shortly

llvm-svn: 277713
2016-08-04 10:14:39 +00:00
Simon Dardis
9fd70b27da [mips] Enable tail calls by default
Enable tail calls by default for (micro)MIPS(64).

microMIPS is slightly more tricky than doing it for MIPS(R6) or microMIPSR6.
microMIPS has two instruction encodings: 16bit and 32bit along with some
restrictions on the size of the instruction that can fill the delay slot.
For safe tail calls for microMIPS, the delay slot filler attempts to find
a correct size instruction for the delay slot of TAILCALL pseudos.

Reviewers: dsanders, vkalintris

Subscribers: jfb, dsanders, sdardis, llvm-commits

Differential Revision: https://reviews.llvm.org/D21138

llvm-svn: 277708
2016-08-04 09:17:07 +00:00
Diana Picus
f12d100d88 Typo fix in comment. NFC
llvm-svn: 277704
2016-08-04 08:25:08 +00:00
Dean Michael Berris
865b9eeeaa [XRay] Align entry and return sleds to 2 byte boundaries
This should ensure that we can atomically write two bytes (on top of the
retq and the one past it) and have those two bytes not straddle cache
lines.

We also move the label past the alignment instruction so that we can refer
to the actual first instruction, as opposed to potential padding before the
aligned instruction.

Update the tests to allow us to reflect the new order of assembly.

Reviewers: rSerge, echristo, majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23101

llvm-svn: 277701
2016-08-04 07:37:28 +00:00
Amaury Sechet
f98549c3e0 Add popcount(n) == bitsize(n) -> n == -1 transformation.
Summary: As per title.

Reviewers: majnemer, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23139

llvm-svn: 277694
2016-08-04 05:27:20 +00:00
David Majnemer
0f8f039a03 Forgot the dyn_cast_or_null intended for r277691.
llvm-svn: 277693
2016-08-04 04:47:18 +00:00
David Majnemer
3351460081 Reinstate "[CloneFunction] Don't remove side effecting calls"
This reinstates r277611 + r277614 and reverts r277642.  A cast_or_null
should have been a dyn_cast_or_null.

llvm-svn: 277691
2016-08-04 04:24:02 +00:00
Bruno Cardoso Lopes
7a82064117 Revert "GVN-hoist: enable by default" & "Make GVN Hoisting obey optnone/bisect."
This reverts commits r277685 & r277688. r277685 broke compiler-rt
compilation http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/23335
and r277685 is a followup from it.

llvm-svn: 277690
2016-08-04 04:16:24 +00:00
Chandler Carruth
824dbd8926 [PM] Change the name of the repeating utility to something less
overloaded (and simpler).

Sean rightly pointed out in code review that we've started using
"wrapper pass" as a specific part of the old pass manager, and in fact
it is more applicable there. Here, we really have a pass *template* to
build a repeated pass, so call it that.

llvm-svn: 277689
2016-08-04 03:52:53 +00:00
Sebastian Pop
4efd363a4c GVN-hoist: enable by default
As we addressed all compilation time problems with GVN-hoist
https://llvm.org/bugs/show_bug.cgi?id=28670
this patch turns GVN-hoist back by default.

Differential Revision: https://reviews.llvm.org/D23136

llvm-svn: 277685
2016-08-04 01:59:42 +00:00
Rui Ueyama
68a326fe7b pdbdump: Fix crash bug.
pdbdump calls DbiStreamBuilder::commit through PDBFileBuilder::commit
without calling DbiStreamBuilder::finalize. Because `finalize` initializes
`Header` member, `Header` remained nullptr which caused a crash bug.

Differential Revision: https://reviews.llvm.org/D23143

llvm-svn: 277681
2016-08-03 23:43:23 +00:00
Matthias Braun
f73ec6a530 RenameIndependentSubregs: Fix liveness query in rewriteOperands()
rewriteOperands() always performed liveness queries at the base index
rather than the RegSlot/Base as apropriate for the machine operand. This
could lead to illegal rewriting in some cases.

llvm-svn: 277661
2016-08-03 22:37:47 +00:00
Sanjay Patel
073234d16a [InstCombine] use m_APInt to allow icmp eq (add X, C1), C2 folds for splat constant vectors
llvm-svn: 277659
2016-08-03 22:08:44 +00:00
Kevin Enderby
83d2e01ea8 Clean up of libObject/Archive interfaces and change the last three uses of ErrorOr<>
changing them to Expected<> to allow them to pass through llvm Errors.
No functional change.

This commit by itself will break the next lld builds.  I’ll be committing the
matching change for lld immediately next.

llvm-svn: 277656
2016-08-03 21:57:47 +00:00
Guozhi Wei
d5e7d1acab [PPC] Handling CallInst in PPCBoolRetToInt
This patch fixes pr25548.

Current implementation of PPCBoolRetToInt doesn't handle CallInst correctly, so it failed to do the intended optimization when there is a CallInst with parameters. This patch fixed that.

llvm-svn: 277655
2016-08-03 21:43:51 +00:00