1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

121529 Commits

Author SHA1 Message Date
Igor Breger
eeecfc6724 AVX512: Implemented encoding and intrinsics for
vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11802

llvm-svn: 247276
2015-09-10 12:54:54 +00:00
Aaron Ballman
5197b310f6 Silencing C4141 warnings that were introduced en masse because __forceinline cannot be combined with inline in MSVC without triggering this diagnostic. This is safe to disable because clang will catch instances of the issue with -Wduplicate-decl-specifier, so we are not losing diagnostic coverage.
llvm-svn: 247275
2015-09-10 12:53:40 +00:00
Benjamin Kramer
f2b3e59406 [FileCheck] Use range-based for loops. NFC.
llvm-svn: 247272
2015-09-10 11:59:55 +00:00
Jakub Kuderski
1c398c712b There is a trunc(lshr (zext A), Cst) optimization in InstCombineCasts that
removes cast by performing the lshr on smaller types. However, currently there
is no trunc(lshr (sext A), Cst) variant.
This patch add such optimization by transforming trunc(lshr (sext A), Cst)
to ashr A, Cst.

Differential Revision: http://reviews.llvm.org/D12520

llvm-svn: 247271
2015-09-10 11:31:20 +00:00
Chandler Carruth
82d614a5b1 [ADT] Rewrite the StringRef::find implementation to be simpler, clearer,
and tremendously less reliant on the optimizer to fix things.

The code is always necessarily looking for the entire length of the
string when doing the equality tests in this find implementation, but it
previously was needlessly re-checking the size each time among other
annoyances.

By writing this so simply an ddirectly in terms of memcmp, it also is
about 8x faster in a debug build, which in turn makes FileCheck about 2x
faster in 'ninja check-llvm'. This saves about 8% of the time for
FileCheck-heavy parts of the test suite like the x86 backend tests.

llvm-svn: 247269
2015-09-10 11:17:49 +00:00
Silviu Baranga
82abdf7e36 [DAGCombine] Truncate BUILD_VECTOR operators if necessary when constant folding vectors
Summary:
The BUILD_VECTOR node will truncate its operators to match the
type. We need to take this into account when constant folding -
we need to perform a truncation before constant folding the elements.
This is because the upper bits can change the result, depending on
the operation type (for example this is the case for min/max).

This change also adds a regression test.

Reviewers: jmolloy

Subscribers: jmolloy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12697

llvm-svn: 247265
2015-09-10 10:34:34 +00:00
James Molloy
52c8c7c3ff Enable GlobalsAA by default
This can give significant improvements to alias analysis in some situations, and improves its testing coverage in all situations.

llvm-svn: 247264
2015-09-10 10:22:20 +00:00
James Molloy
786cc71fe2 Add GlobalsAA as preserved to a bunch of transforms
GlobalsAA must by definition be preserved in function passes, but the passmanager doesn't know that. Make each pass explicitly preserve GlobalsAA.

llvm-svn: 247263
2015-09-10 10:22:12 +00:00
Chandler Carruth
d57dfb9653 [ADT] Force inline several super boring and unusually hot methods on
SmallVector to further help debug builds not waste their time calling
one line functions.

To give you an idea of why this is worthwhile, this change alone gets
another >10% reduction in the runtime of TripleTest.Normalization! It's
now under 9 seconds for me. Sadly, this is the end of the easy wins for
that test. Anything further will require some different architecture of
the test itself. Still, I'm pretty happy. 'check-llvm' now is under 35s
for me.

llvm-svn: 247259
2015-09-10 09:46:47 +00:00
Chandler Carruth
96a2c57193 [ADT] Micro-optimize and force inlining for string switches.
These are now quite heavily used in unit tests and the host tools,
making it worth having them be reasonably fast even in an unoptimized
build. This change reduces the total runtime of TripleTest.Normalization
by yet another 10% to 15%. It is now under 10 seconds on my machine, and
the total check-llvm time has dropped from 38s to around 36s.

I experimented with a number of different options, and the code pattern
here consistently seemed to lower the cleanest, likely due to the
significantly simple CFG and far fewer redundant tests of 'Result'.

llvm-svn: 247257
2015-09-10 09:25:59 +00:00
James Molloy
255241a790 [ARM] Do not use vtrn for vectorshuffle if the order is reversed
The tests in isVTRNMask and isVTRN_v_undef_Mask should also check that the elements of the upper and lower half of the vectorshuffle occur in the correct order when both halves are used. Without this test the code assumes that it is correct to use vector transpose (vtrn) for the masks <1, 1, 0, 0> and <1, 3, 0, 2>, among others, but the transpose actually incorrectly generates shuffles for <0, 0, 1, 1> and <0, 2, 1, 3> in this case.

Patch by Jeroen Ketema!

llvm-svn: 247254
2015-09-10 08:42:28 +00:00
Chandler Carruth
277f8d2d80 [ADT] Apply a large hammer to StringRef functions: attribute always_inline.
The logic of this follows something Howard does in libc++ and something
I discussed with Chris eons ago -- for a lot of functions, there is
really no benefit to preserving "debug information" by leaving the
out-of-line even in debug builds. This is especially true as we now do
a very good job of preserving most debug information even in the face of
inlining. There are a bunch of methods in StringRef that we are paying
a completely unacceptable amount for with every debug build of every
LLVM developer.

Some day, we should fix Clang/LLVM so that developers can reasonable
use a default of something other than '-O0' and not waste their lives
waiting on *completely* unoptimized code to execute. We should have
a default that doesn't impede debugging while providing at least
plausable performance.

But today is not that day.

So today, I'm applying always_inline to the functions that are really
hurting the critical path for stuff like 'check_llvm'. I'm being very
cautious here, but there are a few other APIs that we really should do
this for as a matter of pragmatism. Hopefully we can rip this out some
day.

With this change, TripleTest.Normalization runtime decreases by over
10%, and the total 'check-llvm' time on my 48-core box goes from 38s to
just under 37s.

llvm-svn: 247253
2015-09-10 08:29:35 +00:00
Chandler Carruth
57fe727884 [Support] Fix the always_inline attribute macro to not include the
'inline' specifier. That specifier may or may not be valid for a given
function, or it may be required for correct linkage even when the
compiler doesn't support the always_inline attribute.

llvm-svn: 247252
2015-09-10 08:29:30 +00:00
Chandler Carruth
0d407eafe0 [ADT] Micro-optimize the Triple constructor by doing a single split and
re-using the resulting components rather than repeatedly splitting and
re-splitting to compute each component as part of the initializer list.

This is more work on PR23676. Sadly, it doesn't help much. It removes
the constructor from my profile, but doesn't make a sufficient dent in
the total time. But it should play together nicely with subsequent
changes.

llvm-svn: 247250
2015-09-10 07:51:43 +00:00
Chandler Carruth
e9c3d2d316 [ADT] Fix a confusing interface spec and some annoying peculiarities
with the StringRef::split method when used with a MaxSplit argument
other than '-1' (which nobody really does today, but which should
actually work).

The spec claimed both to split up to MaxSplit times, but also to append
<= MaxSplit strings to the vector. One of these doesn't make sense.
Given the name "MaxSplit", let's go with it being a max over how many
*splits* occur, which means the max on how many strings get appended is
MaxSplit+1. I'm not actually sure the implementation correctly provided
this logic either, as it used a really opaque loop structure.

The implementation was also playing weird games with nullptr in the data
field to try to rely on a totally opaque hidden property of the split
method that returns a pair. Nasty IMO.

Replace all of this with what is (IMO) simpler code that doesn't use the
pair returning split method, and instead just finds each separator and
appends directly. I think this is a lot easier to read, and it most
definitely matches the spec. Added some tests that exercise the corner
cases around StringRef() and StringRef("") that all now pass.

I'll start using this in code in the next commit.

llvm-svn: 247249
2015-09-10 07:51:37 +00:00
NAKAMURA Takumi
c05f204b7a GlobalsAAResult(&&): Move every members.
Or, one of MSVC builders failed with unexpected behavior.

llvm-svn: 247247
2015-09-10 07:16:42 +00:00
Elena Demikhovsky
49ab60d17b Added isUndef() interface for SDNode
Differential Revision: http://reviews.llvm.org/D12720

llvm-svn: 247246
2015-09-10 06:33:13 +00:00
Chandler Carruth
c484a28f0a [ADT] Switch a bunch of places in LLVM that were doing single-character
splits to actually use the single character split routine which does
less work, and in a debug build is *substantially* faster.

llvm-svn: 247245
2015-09-10 06:12:31 +00:00
Chandler Carruth
5e0afb35b0 [ADT] Add a single-character version of the small vector split routine
on StringRef. Finding and splitting on a single character is
substantially faster than doing it on even a single character StringRef
-- we immediately get to a *very* tuned memchr call this way.

Even nicer, we get to this even in a debug build, shaving 18% off the
runtime of TripleTest.Normalization, helping PR23676 some more.

llvm-svn: 247244
2015-09-10 06:07:03 +00:00
Chandler Carruth
6ce48f45a7 Add a way to skip the Go bindings tests even when Go is configured in
CMake.

The Go bindings tests in an unoptimized build take over 30 seconds for
me, making it the slowest test in 'check-llvm' by a factor of two.

I've only rigged this up fully to the CMake build. If someone is
interested in rigging it up to the autoconf build, they're welcome to do
so.

llvm-svn: 247243
2015-09-10 05:47:43 +00:00
Sanjoy Das
33ef65354e [ScalarEvolution] Fix PR24757.
Summary:
PR24757 was caused by some incorect math in
`ScalarEvolution::HowFarToZero` -- the smallest unsigned solution for X
in

  2^N * A = 2^N * X

is not necessarily A.

Reviewers: atrick, majnemer, meheff

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D12721

llvm-svn: 247242
2015-09-10 05:27:38 +00:00
Chandler Carruth
812097ec2e [LPM] Simplify this code and fix a compile error for compilers that
don't correctly implement the scoping rules of C++11 range based for
loops. This kind of aliasing isn't a good idea anyways (and wasn't
really intended).

llvm-svn: 247241
2015-09-10 04:22:36 +00:00
Chandler Carruth
b0c5d21894 [LPM] Use a map from analysis ID to immutable passes in the legacy pass
manager to avoid a slow linear scan of every immutable pass and on every
attempt to find an analysis pass.

This speeds up 'check-llvm' on an unoptimized build for me by 15%, YMMV.
It should also help (a tiny bit) other folks that are really
bottlenecked on repeated runs of tiny pass pipelines across small IR
files.

llvm-svn: 247240
2015-09-10 02:31:42 +00:00
Kit Barton
8bde6cb866 Enable the shrink wrapping optimization for PPC64.
The changes in this patch are as follows:
  1. Modify the emitPrologue and emitEpilogue methods to work properly when the prologue and epilogue blocks are not the first/last blocks in the function
  2. Fix a bug in PPCEarlyReturn optimization caused by an empty entry block in the function
  3. Override the runShrinkWrap PredicateFtor (defined in TargetMachine) to check whether shrink wrapping should run:
      Shrink wrapping will run on PPC64 (Little Endian and Big Endian) unless -enable-shrink-wrap=false is specified on command line

A new test case, ppc-shrink-wrapping.ll was created based on the existing shrink wrapping tests for x86, arm, and arm64.

Phabricator review: http://reviews.llvm.org/D11817

llvm-svn: 247237
2015-09-10 01:55:44 +00:00
Ahmed Bougacha
f0f3751dc9 [AArch64] Match FI+offset in STNP addressing mode.
First, we need to teach isFrameOffsetLegal about STNP.
It already knew about the STP/LDP variants, but those were probably
never exercised, because it's only the load/store optimizer that
generates STP/LDP, and the only user of the method is frame lowering,
which runs earlier.
The STP/LDP cases were wrong: they didn't take into account the fact
that they return two results, not one, so the immediate offset will be
the 4th operand, not the 3rd.

Follow-up to r247234.

llvm-svn: 247236
2015-09-10 01:54:43 +00:00
Davide Italiano
d66d78419d [MC] Convert all the remaining tests from macho-dump to llvm-readobj.
This sort-of deprecates macho-dump. It may take still a little while
to garbage collect it, but at least there's no real usage of it in
the tree anymore. New tests should always rely on llvm-readobj or
llvm-objdump.

llvm-svn: 247235
2015-09-10 01:50:00 +00:00
Ahmed Bougacha
ac10764756 [AArch64] Match base+offset in STNP addressing mode.
Followup to r247231.

llvm-svn: 247234
2015-09-10 01:48:29 +00:00
Mehdi Amini
0085639fea Makes EmitRecord() accepting ArrayRef and raw array (NFC)
After r247186, a vector is no longer needed as the push_front for
the code is removed.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247232
2015-09-10 01:45:55 +00:00
Ahmed Bougacha
5f7023b12c [AArch64] Support selecting STNP.
We could go through the load/store optimizer and match STNP where
we would have matched a nontemporal-annotated STP, but that's not
reliable enough, as an opportunistic optimization.
Insetad, we can guarantee emitting STNP, by matching them at ISel.
Since there are no single-input nontemporal stores, we have to
resort to some high-bits-extracting trickery to generate an STNP
from a plain store.

Also, we need to support another, LDP/STP-specific addressing mode,
base + signed scaled 7-bit immediate offset.
For now, only match the base. Let's make it smart separately.

Part of PR24086.

llvm-svn: 247231
2015-09-10 01:42:28 +00:00
Matt Arsenault
e27d2bced7 AMDGPU/SI: Fix more cases of losing exec operands
llvm-svn: 247230
2015-09-10 01:23:28 +00:00
Matt Arsenault
238e81b7c6 AMDGPU/SI: Fix creating v_mov_b32s without exec uses
This will be caught by existing tests with a
verifier check to be added in a future commit.

llvm-svn: 247229
2015-09-10 01:06:06 +00:00
Hans Wennborg
ddb1cf7aeb Revert r247216: "Fix Clang-tidy misc-use-override warnings, other minor fixes"
This caused build breakges, e.g.
http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu-gdb-75/builds/24926

llvm-svn: 247226
2015-09-10 00:57:26 +00:00
Ahmed Bougacha
d90c25bcb0 [CodeGen] Make x86 nontemporal store patfrags generic. NFC.
To be used by other targets.

llvm-svn: 247225
2015-09-10 00:53:15 +00:00
Philip Reames
a5cde832ad [RewriteStatepointsForGC] Minor refactor to use shared implementation [NFC]
llvm-svn: 247223
2015-09-10 00:44:10 +00:00
Philip Reames
0e2ea3ec9a [RewriteStatepointsForGC] Strengthen a confusingly weak assertion [NFC]
The assertion was weaker than it should be and gave the impression we're growing the number of base defining values being considered during the fixed point interation.  That's not true.  The tighter form of the assert is useful documentation.

llvm-svn: 247221
2015-09-10 00:32:56 +00:00
Philip Reames
5581a8a2b4 [RewriteStatepointsForGC] One last bit of naming [NFCI]
llvm-svn: 247220
2015-09-10 00:27:50 +00:00
Reid Kleckner
b5753fd45f [WinEH] Add codegen support for cleanuppad and cleanupret
All of the complexity is in cleanupret, and it mostly follows the same
codepaths as catchret, except it doesn't take a return value in RAX.

This small example now compiles and executes successfully on win32:
  extern "C" int printf(const char *, ...) noexcept;
  struct Dtor {
    ~Dtor() { printf("~Dtor\n"); }
  };
  void has_cleanup() {
    Dtor o;
    throw 42;
  }
  int main() {
    try {
      has_cleanup();
    } catch (int) {
      printf("caught it\n");
    }
  }

Don't try to put the cleanup in the same function as the catch, or Bad
Things will happen.

llvm-svn: 247219
2015-09-10 00:25:23 +00:00
Philip Reames
ad09731a88 [RewriteStatepointsForGC] Further style/naming fixup [NFCI]
llvm-svn: 247217
2015-09-10 00:22:49 +00:00
Hans Wennborg
b5db40bf43 Fix Clang-tidy misc-use-override warnings, other minor fixes
Patch by Eugene Zelenko!

Differential Revision: http://reviews.llvm.org/D12740

llvm-svn: 247216
2015-09-10 00:12:56 +00:00
Mehdi Amini
7ea419f1ad Bitcode Writer: EmitRecordWith* takes an ArrayRef instead of a SmallVector (NFC)
This reapply commit r247178 after post-commit review from D.Blaikie
in a way that makes it compatible with the existing API.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247215
2015-09-10 00:05:09 +00:00
Mehdi Amini
f9dd260ee5 Add makeArrayRef() overload for ArrayRef input (no-op/identity) NFC
The purpose is to allow templated wrapper to work with either
ArrayRef or any convertible operation:

template<typename Container>
void wrapper(const Container &Arr) {
  impl(makeArrayRef(Arr));
}

with Container being a std::vector, a SmallVector, or an ArrayRef.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247214
2015-09-10 00:05:04 +00:00
Philip Reames
153e484862 [RewriteStatepointsForGC] More naming cleanup [NFCI]
llvm-svn: 247213
2015-09-10 00:01:53 +00:00
Philip Reames
b1d69773b9 [RewriteStatepointsForGC] Code cleanup [NFC]
Factor out common code related to naming values, fix a small style issue.  More to follow in separate changes.

llvm-svn: 247211
2015-09-09 23:57:18 +00:00
Philip Reames
b2d2c7c606 [RewriteStatepointsForGC] Extend base pointer inference to handle insertelement
This change is simply enhancing the existing inference algorithm to handle insertelement instructions by conservatively inserting a new instruction to propagate the vector of associated base pointers. In the process, I'm ripping out the peephole optimizations which mostly helped cover the fact this hadn't been done.

Note that most of the newly inserted nodes will be nearly immediately removed by the post insertion optimization pass introduced in 246718. Arguably, we should be trying harder to avoid the malloc traffic here, but I'd rather get the code correct, then worry about compile time.

Unlike previous extensions of the algorithm to handle more case, I discovered the existing code was causing miscompiles in some cases. In particular, we had an implicit assumption that the peephole covered *all* insert element instructions, so if we had a value directly based on a insert element the peephole didn't cover, we proceeded as if it were a base anyways. Not good. I believe we had the same issue with shufflevector which is why I adjusted the predicate for them as well.

Differential Revision: http://reviews.llvm.org/D12583

llvm-svn: 247210
2015-09-09 23:40:12 +00:00
Philip Reames
b4dfda2138 [RewriteStatepointsForGC] Make base pointer inference deterministic
Previously, the base pointer algorithm wasn't deterministic. The core fixed point was (of course), but we were inserting new nodes and optimizing them in an order which was unspecified and variable. We'd somewhat hacked around this for testing by sorting by value name, but that doesn't solve the general determinism problem.

Instead, we can use the order of traversal over the def/use graph to give us a single consistent ordering. Today, this is a DFS order, but the exact order doesn't mater provided it's deterministic for a given input.

(Q: It is safe to rely on a deterministic order of operands right?)

Note that this only fixes the determinism within a single inference step. The inference step is currently invoked many times in a non-deterministic order. That's a future change in the sequence. :)

Differential Revision: http://reviews.llvm.org/D12640

llvm-svn: 247208
2015-09-09 23:26:08 +00:00
Peter Collingbourne
04c6c402ba LowerBitSets: Fix non-determinism bug.
Visit disjoint sets in a deterministic order based on the maximum BitSetNM
index, otherwise the order in which we visit them will depend on pointer
comparisons. This was being exposed by MSan.

llvm-svn: 247201
2015-09-09 22:30:32 +00:00
Reid Kleckner
40ce82e375 [SEH] Emit 32-bit SEH tables for the new EH IR
The 32-bit tables don't actually contain PC range data, so emitting them
is incredibly simple.

The 64-bit tables, on the other hand, use the same table for state
numbering as well as label ranges. This makes things more difficult, so
it will be implemented later.

llvm-svn: 247192
2015-09-09 21:10:03 +00:00
Dan Gohman
c4e3274379 [WebAssembly] Update target datalayout strings.
llvm-svn: 247187
2015-09-09 20:54:31 +00:00
Teresa Johnson
8df32eb93e Change EmitRecordWithAbbrevImpl to take Optional record code. NFC.
This change enables EmitRecord to pass the supplied record Code to
EmitRecordWithAbbrevImpl, rather than insert it into the Vals array.
It is an enabler for changing EmitRecord to take an ArrayRef<uintty> instead
of a SmallVectorImpl<uintty>&

Patch suggested by Duncan P. N. Exon Smith, modified by myself a bit to get
correct assertion checking.

llvm-svn: 247186
2015-09-09 20:53:31 +00:00
Piotr Padlewski
56f48d9943 ScalarEvolution assume hanging bugfix
http://reviews.llvm.org/D12719

llvm-svn: 247184
2015-09-09 20:47:30 +00:00