1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 12:12:47 +01:00
Commit Graph

160395 Commits

Author SHA1 Message Date
Craig Topper
2377f160f4 [X86] Add 512-bit unmasked pmulhrsw/pmulhw/pmulhuw intrinsics. Remove and auto upgrade 128/256/512 bit masked pmulhrsw/pmulhw/pmulhuw intrinsics.
The 128 and 256 bit versions were already not used by clang. This adds an equivalent unmasked 512 bit version. Then autoupgrades all sizes to use unmasked intrinsics plus select.

llvm-svn: 325559
2018-02-20 07:28:14 +00:00
Craig Topper
9470d5bcfb [X86] Remove GCCBuiltin from a bunch of intrinsics that aren't used by clang and should be removed.
llvm-svn: 325552
2018-02-20 05:49:22 +00:00
Serge Pavlov
5202bf068f Report fatal error in the case of out of memory
This is the second part of recommit of r325224. The previous part was
committed in r325426, which deals with C++ memory allocation. Solution
for C memory allocation involved functions `llvm::malloc` and similar.
This was a fragile solution because it caused ambiguity errors in some
cases. In this commit the new functions have names like `llvm::safe_malloc`.

The relevant part of original comment is below, updated for new function
names.

Analysis of fails in the case of out of memory errors can be tricky on
Windows. Such error emerges at the point where memory allocation function
fails, but manifests itself when null pointer is used. These two points
may be distant from each other. Besides, next runs may not exhibit
allocation error.

In some cases memory is allocated by a call to some of C allocation
functions, malloc, calloc and realloc. They are used for interoperability
with C code, when allocated object has variable size and when it is
necessary to avoid call of constructors. In many calls the result is not
checked for null pointer. To simplify checks, new functions are defined
in the namespace 'llvm': `safe_malloc`, `safe_calloc` and `safe_realloc`.
They behave as corresponding standard functions but produce fatal error if
allocation fails. This change replaces the standard functions like 'malloc'
in the cases when the result of the allocation function is not checked
for null pointer.

Finally, there are plain C code, that uses malloc and similar functions. If
the result is not checked, assert statement is added.

Differential Revision: https://reviews.llvm.org/D43010

llvm-svn: 325551
2018-02-20 05:41:26 +00:00
Amara Emerson
a99b25a021 [AArch64][GlobalISel] When copying from a gpr32 to an fpr16 reg, convert to fpr32 first.
This is a follow on commit to r[x] where we fix the other direction of copy.
For this case, after converting the source from gpr32 -> fpr32, we use a
subregister copy, which is essentially what EXTRACT_SUBREG does in SDAG land.

https://reviews.llvm.org/D43444

llvm-svn: 325550
2018-02-20 05:11:57 +00:00
Craig Topper
fc84bda806 [X86] Mark XOP vpmac* and vpmadc intrinsics as being commutative so that tablegen will generate patterns with the load in operand 0.
This allows loads to be folded during isel without the peephole pass.

llvm-svn: 325548
2018-02-20 03:58:14 +00:00
Craig Topper
59368d192a [X86] Make XOP VPCOM instructions commutable to fold loads during isel.
llvm-svn: 325547
2018-02-20 03:58:13 +00:00
Craig Topper
de1f98619d [X86] Make a helper function for commuting AVX512 VPCMP immediates since we do it in two places.
llvm-svn: 325546
2018-02-20 03:58:11 +00:00
Aditya Nandakumar
d8a3a0d897 [GISel]: Add pattern matchers for G_BITCAST/PTRTOINT/INTTOPTR
Adds pattern matchers for the above along with unit tests for the same.
https://reviews.llvm.org/D43479

llvm-svn: 325542
2018-02-19 23:11:53 +00:00
Sanjay Patel
ec74b351d9 [InstCombine] use CreateWithCopiedFlags to reduce code; NFCI
Also, move the folds with constants closer to make it easier to follow. 

llvm-svn: 325541
2018-02-19 23:09:03 +00:00
Brian Gesiak
0444eca9fd Revert "[mem2reg] Use range loops (NFCI)"
This reverts commit r325532.

llvm-svn: 325539
2018-02-19 22:48:51 +00:00
Craig Topper
6b23b7924a [X86] Use vpmovq2m/vpmovd2m for truncate to vXi1 when possible.
Previously we used vptestmd, but the scheduling data for SKX says vpmovq2m/vpmovd2m is lower latency. We already used vpmovb2m/vpmovw2m for byte/word truncates. So this is more consistent anyway.

llvm-svn: 325534
2018-02-19 22:07:31 +00:00
Sanjay Patel
54877d006d [InstCombine] allow fdiv with constant dividend folds with less than full -ffast-math
It's possible that we could allow this either 'arcp' or 'reassoc' alone, but this
should be conservatively better than what we have right now. GCC allows this with
only -freciprocal-math.

The last test is changed to show a case that is expected to fold, but we need D43398.

llvm-svn: 325533
2018-02-19 21:46:52 +00:00
Brian Gesiak
7874a0e6f0 [mem2reg] Use range loops (NFCI)
Summary:
Several for loops in PromoteMemoryToRegister.cpp leave their increment
expression empty, instead incrementing the iterator within the for loop
body. I believe this is because these loops were previously implemented
as while loops; see https://reviews.llvm.org/rL188327.

Incrementing the iterator within the body of the for loop instead of
in its increment expression makes it seem like the iterator will be
modified or conditionally incremented within the loop, but that is not
the case in these loops.

Instead, use range loops.

Test Plan: `check-llvm`

Reviewers: davide, bkramer

Reviewed By: davide, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43473

llvm-svn: 325532
2018-02-19 21:44:52 +00:00
Sanjay Patel
15bf1f9108 [InstCombine] refactor fdiv with constant dividend folds; NFC
The last fold that used to be here was not necessary. That's a 
combination of 2 folds (and there's a regression test to show that).

The transforms are guarded by isFast(), but that should be loosened.

llvm-svn: 325531
2018-02-19 21:17:58 +00:00
Sanjay Patel
4fe25aad51 [InstCombine] move fdiv tests; NFC
Also, use vector constants just to prove that already works.

llvm-svn: 325530
2018-02-19 21:13:39 +00:00
Brian Gesiak
a787cc44e8 [Coroutines] Move debug statement before assert
Summary:
Move a debug statement to above where an assertion is hit, so that the debug
statement can be inspected before a stack trace.

Test Plan: `check-llvm`

llvm-svn: 325529
2018-02-19 20:50:09 +00:00
Alexander Richardson
a80b0b4cfd [llvm-objcopy] Use the full filename in --add-gnu-debuglink
Summary:
The current implementation was writing the file name without the extension
whereas GNU objcopy writes the full filename. With this change GDB will now
load the .debug file instead of silently ignoring it.

Reviewers: jakehehrlich, jhenderson

Reviewed By: jakehehrlich

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43474

llvm-svn: 325528
2018-02-19 19:53:44 +00:00
Craig Topper
deb8d1bdf2 [X86] Stop swapping the operands of AVX512 setge.
We swapped the operands and used setle, but I don't see any reason to do that. I think this is a holdover from SSE where we swap and the invert to use pcmpgt. But with AVX512 we don't want an invert so we won't use pcmpgt. So there's no need to swap.

llvm-svn: 325527
2018-02-19 19:23:35 +00:00
Craig Topper
e96cad0b0e [X86] Reduce the number of isel pattern variations needed for VPTESTM/VPTESTNM matching.
Canonicalize EQ/NE PCMPM to have build vector all zeros on the RHS so we don't have to pattern match it in both locations. This significantly reduces the number of isel patterns needed since we also had to multiply it out with loads being in either operand of the 'and' input node and in the 'and' masking node.

This removes over 24000 bytes from the isel table.

llvm-svn: 325526
2018-02-19 19:23:31 +00:00
Steven Wu
a49fcd3ebf bitcode support change for fast flags compatibility
Summary: The discussion and as per need, each vendor needs a way to keep the old fast flags and the new fast flags in the auto upgrade path of the IR upgrader.  This revision addresses that issue.

Patched by Michael Berg

Reviewers: qcolombet, hans, steven_wu

Reviewed By: qcolombet, steven_wu

Subscribers: dexonsmith, vsk, mehdi_amini, andrewrk, MatzeB, wristow, spatel

Differential Revision: https://reviews.llvm.org/D43253

llvm-svn: 325525
2018-02-19 19:22:28 +00:00
Mark Searles
70cba954aa [AMDGPU] Make note of existing waitcnt instrs; this is add-on work related to suppression of redundant waitcnt instrs. It is necessary to make note of these existing waitcnt instrs so that we do not fall into an infinite loop when handling loops. Also, [NFC] some minor code clean-up.
llvm-svn: 325524
2018-02-19 19:19:59 +00:00
Simon Pilgrim
e1f55f9c90 [SelectionDAG] ComputeKnownBits - add support for SMIN+SMAX clamp patterns
If we have a clamp pattern, SMIN(SMAX(X, LO),HI) or SMAX(SMIN(X, HI),LO) then we can deduce that the number of signbits (zeros/ones) will be at least the minimum of the LO and HI constants.

ComputeKnownBits equivalent of D43338.

Differential Revision: https://reviews.llvm.org/D43463

llvm-svn: 325521
2018-02-19 18:08:16 +00:00
Mark Searles
bf29d8d265 [AMDGPU] Increased vector length for global/constant loads.
Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.

Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D43275

llvm-svn: 325518
2018-02-19 16:42:49 +00:00
David Green
9766fdd5cb [Dominators] Update DominatorTree compare in case roots are different
The compare function, unusually, returns false on same, true on
different. This fixes the conditions for different roots.

Reviewed as a part of D41298.

llvm-svn: 325517
2018-02-19 16:28:24 +00:00
Pavel Labath
627b30137f [CodeGen] Refactor AppleAccelTable
Summary:
This commit separates the abstract accelerator table data structure
from the code for writing out an on-disk representation of a specific
accelerator table format. The idea is that former (now called
AccelTable<T>) can be reused for the DWARF v5 accelerator tables
as-is, without any further customizations.

Some bits of the emission code (now living in the EmissionContext class)
can be reused for DWARF v5 as well, but the subtle differences in the
layout of various subtables mean the sharing is not always possible.
(Also, the individual emit*** functions are fairly simple so there's a
tradeoff between making a bigger general-purpose function, and two
smaller targeted functions.)

Another advantage of this setup is that more of the serialization logic
can be hidden in the .cpp file -- I have moved declarations of the
header and all the emission functions there.

Reviewers: JDevlieghere, aprantl, probinson, dblaikie

Subscribers: echristo, clayborg, vleschuk, llvm-commits

Differential Revision: https://reviews.llvm.org/D43285

llvm-svn: 325516
2018-02-19 16:12:20 +00:00
Sanjay Patel
6c99435a8b [TTI CostModel] change default cost of FP ops to 1 (PR36280)
This change was mentioned at least as far back as:
https://bugs.llvm.org/show_bug.cgi?id=26837#c26
...and I found a real program that is harmed by this: 
Himeno running on AMD Jaguar gets 6% slower with SLP vectorization:
https://bugs.llvm.org/show_bug.cgi?id=36280
...but the change here appears to solve that bug only accidentally.

The div/rem costs for x86 look very wrong in some cases, but that's already true, 
so we can fix those in follow-up patches. There's also evidence that more cost model
changes are needed to solve SLP problems as shown in D42981, but that's an independent 
problem (though the solution may be adjusted after this change is made).

Differential Revision: https://reviews.llvm.org/D43079

llvm-svn: 325515
2018-02-19 16:11:44 +00:00
Rafael Espindola
8e57037a71 Bring back r323297.
It was reverted because it broke the grub build. The reason the grub
build broke is because grub does its own relocation processing and was
not handing R_386_PLT32. Since grub has no dynamic linker, the fix is
trivial: handle R_386_PLT32 exactly like R_386_PC32.

On the report it was noted that they are using
-fno-integrated-assembler. The upstream GAS (starting with
451875b4f976a527395e9303224c7881b65e12ed) will already be producing a
R_386_PLT32 anyway, so they have to update their code one way or the
other

Original message:

Don't assume a null GV is local for ELF and MachO.

This is already a simplification, and should help with avoiding a plt
reference when calling an intrinsic with -fno-plt.

With this change we return false for null GVs, so the caller only
needs to check the new metadata to decide if it should use foo@plt or
*foo@got.

llvm-svn: 325514
2018-02-19 16:02:38 +00:00
Francis Visoiu Mistrih
e2a13616f8 [CodeGen] Fix tests breaking after r325505
llvm-svn: 325512
2018-02-19 15:51:17 +00:00
Charles Saternos
0c5dc7730f [ThinLTO] Add GraphTraits for FunctionSummaries
Add GraphTraits definitions to the FunctionSummary and ModuleSummaryIndex classes. These GraphTraits will be used to construct find SCC's in ThinLTO analysis passes.

Third attempt - moved function from lambda to static function due to build failures.

llvm-svn: 325506
2018-02-19 15:14:50 +00:00
Francis Visoiu Mistrih
5d2e990b7d Revert "[CodeGen] Move printing '\n' from MachineInstr::print to MachineBasicBlock::print"
This reverts commit r324681.

llvm-svn: 325505
2018-02-19 15:08:49 +00:00
Simon Pilgrim
c73a5be0c2 [X86][SSE] combineTruncateWithSat - use truncateVectorWithPACK down to 64-bit subvectors
Add support for chaining PACKSS/PACKUS down to 64-bit vectors by using only a single 128-bit input.

llvm-svn: 325494
2018-02-19 13:29:20 +00:00
Ivan A. Kosarev
c23c92625c [Transforms] Propagate new-format TBAA tags on simplification of memory-transfer intrinsics
With this patch in place, when a new-format TBAA tag is available
for a memory-transfer intrinsic call, we prefer propagating that
new-format tag. Otherwise, we fallback to the old approach where
we try to construct a proper TBAA access tag from 'tbaa.struct'
metadata.

Differential Revision: https://reviews.llvm.org/D41543

llvm-svn: 325488
2018-02-19 12:10:20 +00:00
Igor Laevsky
5480be3134 [llvm-opt-fuzzer] Add another pack of passes for continuous fuzzing
Differential Revision: https://reviews.llvm.org/D43384

llvm-svn: 325487
2018-02-19 11:57:07 +00:00
Dylan McKay
5690ef0222 [AVR] Set the program address space in the data layout
This adds the program memory address space setting to the AVR data
layout.

This setting was very recently added under r325479.

At the moment, there are no uses of this setting. In the future, things
such as switch lookup tables should reside there.

llvm-svn: 325481
2018-02-19 10:40:59 +00:00
Dylan McKay
ab1efa5f6d Add default address space for functions to the data layout (1/3)
Summary:
This adds initial support for letting targets specify which address
spaces their functions should reside in by default.

If a function is created by a frontend, it will get the default address space specified in the DataLayout, unless the frontend explicitly uses a more general `llvm::Function` constructor. Function address spaces will become a part of the bitcode and textual IR forms, as we do not have access to a data layout whilst parsing LL.

It will be possible to write IR that explicitly has `addrspace(n)` on a function. In this case, the function will reside in the specified space, ignoring the default in the DL.

This is the first step towards placing functions into the correct
address space for Harvard architectures.

Full patchset
* Add program address space to data layout D37052
* Require address space to be specified when creating functions D37054
* [clang] Require address space to be specified when creating functions D37057

Reviewers: pcc, arsenm, kparzysz, hfinkel, theraven

Reviewed By: theraven

Subscribers: arichardson, simoncook, rengolin, wdng, uabelho, bjope, asb, llvm-commits

Differential Revision: https://reviews.llvm.org/D37052

llvm-svn: 325479
2018-02-19 09:56:22 +00:00
Dylan McKay
a81cf82ee0 [AVR] Fix a lowering bug in AVRISelLowering.cpp
The parseFunctionArgs() method was directly reading the
arguments from a Function object, but is should have used the
arguments supplied by the SelectionDAGBuilder.

This was causing
the lowering code to only lower one argument, not two in some cases.

Thanks to @brainlag on GitHub for coming up with the working fix!

Patch-by: @brainlag on GitHub
llvm-svn: 325474
2018-02-19 08:28:38 +00:00
Eric Christopher
9287f3a5f6 Add LanaiMCTargetDesc.h to LanaiInstrInfo.h to make it self contained
with instruction enum definitions.

llvm-svn: 325473
2018-02-19 05:26:49 +00:00
Craig Topper
2d3b7e8d64 [X86] Correct a typo I made in combineToExtendCMOV recently.
We're accidentally checking that the same node is a constant twice instead of checking the other node.

This isn't a functional problem since we didn't do anything below that explicitly requires constants. It just means we may have introduced a sign_extend or zero_extend that won't fold out.

llvm-svn: 325469
2018-02-18 20:41:25 +00:00
Sanjay Patel
a75b2d22ae [PatternMatch, InstSimplify] enhance m_AllOnes() to ignore undef elements in vectors
Loosening the matcher definition reveals a subtle bug in InstSimplify (we should not
assume that because an operand constant matches that it's safe to return it as a result).

So I'm making that change here too (that diff could be independent, but I'm not sure how 
to reveal it before the matcher change).

This also seems like a good reason to *not* include matchers that capture the value.
We don't want to encourage the potential misstep of propagating undef values when it's
not allowed/intended.

I didn't include the capture variant option here or in the related rL325437 (m_One), 
but it already exists for other constant matchers.

llvm-svn: 325466
2018-02-18 18:05:08 +00:00
Sanjay Patel
1ba1b7ac34 [InstSimplify] add tests with vector undef elts; NFC
llvm-svn: 325465
2018-02-18 17:39:09 +00:00
Amara Emerson
e794d7bf90 Fix unused assertion variable warning.
llvm-svn: 325464
2018-02-18 17:28:34 +00:00
Amara Emerson
dc30913313 [AArch64][GlobalISel] Fix an assert fail/miscompile when fp16 types are copied
to gpr register banks.

PR36345.

rdar://36478867

Differential Revision: https://reviews.llvm.org/D43310

llvm-svn: 325463
2018-02-18 17:10:49 +00:00
Amara Emerson
ecf2cc8686 [AArch64][GlobalISel] Support G_INSERT/G_EXTRACT of types < s32 bits.
These are needed for operations on fp16 types in a later patch.

llvm-svn: 325462
2018-02-18 17:03:02 +00:00
Sanjay Patel
c987022ca6 [PatternMatch] reformatting and comment clean-ups; NFC
llvm-svn: 325461
2018-02-18 16:19:22 +00:00
Benjamin Kramer
97a9ef625d [Support] Replace hand-written scope_exit with make_scope_exit.
No functionality change intended.

llvm-svn: 325460
2018-02-18 16:05:40 +00:00
Haicheng Wu
7961b1b87b [AArch64] Coalesce Copy Zero during instruction selection
Add special case for copy of zero to avoid a double copy.

Differential Revision: https://reviews.llvm.org/D36104

llvm-svn: 325459
2018-02-18 13:51:33 +00:00
Jonas Paulsson
43ee82f217 [BPF] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Yonghong Song
llvm-svn: 325457
2018-02-18 10:09:54 +00:00
Craig Topper
3f02bf7302 [X86] Make masked pcmpeq commutable during isel so we can fold loads in other operand to the shorter encoding.
Previously we used the immediate encoding if the load was in operand 0 and the short encoding if the load was in operand 1.

This added an insane number of bytes to the size of the isel table. I'm wondering if we should always use the immediate form during isel and change to the short form during emission. This would remove the need to pattern match every combination for both the immediate form and the short form during isel. We could do the same with vpcmpgt

llvm-svn: 325456
2018-02-18 02:37:33 +00:00
Craig Topper
d1134f1e80 [X86] Add -show-mc-encoding to the avx512-vec-cmp.ll test and add test case to show that we're failing to use the shorter pcmpeq encoding when the memory arguemnt is the first argument.
This can't be spotted without showing the encodings since they have the same mnemonic.

llvm-svn: 325455
2018-02-18 02:37:32 +00:00
Simon Pilgrim
d1f44653e9 Revert: [llvm] r325448 - [ThinLTO] Add GraphTraits for FunctionSummaries
Add GraphTraits definitions to the FunctionSummary and ModuleSummaryIndex classes. These GraphTraits will be used to construct find SCC's in ThinLTO analysis passes.

Second attempt, since last patch caused stage2 build to fail (now using function_ref rather than std::function).

Reverted due to buildbot failures

llvm-svn: 325454
2018-02-18 00:01:36 +00:00