mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 11:33:24 +02:00
106958 Commits

Author SHA1 Message Date
Juergen Ributzka
98be3942ed [FastISel][AArch64] Use the correct register class to make the MI verifier happy.
This is mostly achieved by providing the correct register class manually,
because getRegClassFor always returns the GPR*AllRegClass for MVT::i32 and
MVT::i64.

Also clean up the code to use the FastEmitInst_* methods whenever possible. This
makes sure that the operands' register class is properly constrained. For all
the remaining cases this adds the missing constrainOperandRegClass calls for
each operand.

llvm-svn: 216225
2014-08-21 20:57:57 +00:00
David Blaikie
7a58463dea Explicitly pass ownership of the MemoryBuffer to AddNewSourceBuffer using std::unique_ptr
llvm-svn: 216223
2014-08-21 20:44:56 +00:00
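The ownership transfer this commit describes can be sketched in isolation. The types below (`BufferSketch`, `SourceMgrSketch`) and the member `addNewSourceBuffer` are hypothetical stand-ins written for illustration; they are not the LLVM `MemoryBuffer` or `SourceMgr` classes, and the real signature may differ.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memory buffer; not the LLVM MemoryBuffer class.
struct BufferSketch {
  std::string Text;
};

// Hypothetical stand-in for a source manager that owns its buffers.
class SourceMgrSketch {
  std::vector<std::unique_ptr<BufferSketch>> Buffers;

public:
  // Taking std::unique_ptr by value makes the ownership transfer visible
  // at the call site: the caller has to write std::move.
  std::size_t addNewSourceBuffer(std::unique_ptr<BufferSketch> Buf) {
    Buffers.push_back(std::move(Buf));
    return Buffers.size(); // 1-based id of the buffer just added
  }

  const BufferSketch &getBuffer(std::size_t Id) const {
    return *Buffers[Id - 1];
  }
};
```

With a raw pointer parameter, whether the callee takes ownership is only a convention; with `std::unique_ptr` the type system records it, and the caller's pointer is null after the call.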
Tom Stellard
214d6b1c7e R600/SI: Teach moveToVALU how to handle more S_LOAD_* instructions
llvm-svn: 216220
2014-08-21 20:41:00 +00:00
Tom Stellard
15e318920a R600/SI: Make sure SCRATCH_WAVE_OFFSET is added as Live-In to the function
This fixes a crash in an ocl conformance test.

llvm-svn: 216219
2014-08-21 20:40:58 +00:00
Tom Stellard
954e9d0d17 R600/SI: Remove unused SGPR spilling code
llvm-svn: 216218
2014-08-21 20:40:56 +00:00
Tom Stellard
51057b05fa R600/SI: Use eliminateFrameIndex() to expand SGPR spill pseudos
This will simplify the SGPR spilling and also allow us to use
MachineFrameInfo for calculating offsets, which should be more
reliable than our custom code.

This fixes a crash in some cases where a register would be spilled
in a branch such that the VGPR defined for spilling did not dominate
all the uses when restoring.

This fixes a crash in an ocl conformance test.  The test requires
register spilling and is too big to include.

llvm-svn: 216217
2014-08-21 20:40:54 +00:00
Tom Stellard
41cb1ccdb8 R600/SI: Handle VCC in SIRegisterInfo::getPhysRegSubReg()
This fixes a crash in an ocl conformance test.  The test requires
register spilling and is too big to include.

llvm-svn: 216216
2014-08-21 20:40:50 +00:00
Rafael Espindola
77b37756aa Rewrite the gold plugin to fix pr19901.
There is a fundamental difference between how the gold API and lib/LTO view
the LTO process.

The gold API talks about a particular symbol in a particular file. The lib/LTO
API talks about a symbol in the merged module.

The merged module is then defined in terms of the IR semantics. In particular,
a linkonce_odr GV is only copied if it is used, since it is valid to drop
unused linkonce_odr GVs.

In the testcase in pr19901 both properties collide. What happens is that gold
asks us to keep a particular linkonce_odr symbol, but the IR linker doesn't
copy it to the merged module and we never have a chance to ask lib/LTO to keep
it.

This patch fixes it by having a more direct implementation of the gold API. If
it asks us to keep a symbol, we change the linkage so it is not linkonce. If it
says we can drop a symbol, we do so. All of this before we even send the module
to lib/Linker.

Since now we don't have to produce LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN,
during symbol resolution we can use a temporary LLVMContext and do lazy
module loading. This allows us to keep the minimum possible amount of
allocated memory around. This should also allow as much parallelism as
we want, since there is no shared context.

llvm-svn: 216215
2014-08-21 20:28:55 +00:00
Jonathan Roelofs
152ca2e50f Satiate the sanitizer build bot
This fixes a missing initializer from r216182

llvm-svn: 216212
2014-08-21 20:09:15 +00:00
Rafael Espindola
eb4cd5d130 Move some logic to populateLTOPassManager.
This will avoid code duplication in the next commit which calls it directly
from the gold plugin.

llvm-svn: 216211
2014-08-21 20:03:44 +00:00
Adam Nemet
ab33858cc4 [AVX512] Add class to group common template arguments related to vector type
We discussed the issue of generality vs. readability of the AVX512 classes
recently.  I proposed this approach to try to hide and centralize the mappings
we commonly perform based on the vector type.  A new class X86VectorVTInfo
captures these.

The idea is to pass an instance of this class to classes/multiclasses instead
of the corresponding ValueType.  Then the class/multiclass can use its field
for things that derive from the type rather than passing all those as separate
arguments.

I modified avx512_valign to demonstrate this new approach.  As you can see
instead of 7 related template parameters we now have one.  The downside is
that we have to refer to fields for the derived values.  I named the argument
'_' in order to make this as invisible as possible.  Please let me know if you
absolutely hate this.  (Also once we allow local initializations in
multiclasses we can recover the original version by assigning the fields to
local variables.)

Another possible use-case for this class is to directly map things, e.g.:

  RegisterClass KRC = X86VectorVTInfo<32, i16>.KRC

llvm-svn: 216209
2014-08-21 19:50:07 +00:00
Alex Lorenz
d123485f6e Coverage Mapping: add function's hash to coverage function records.
The profile data format was recently updated and the new indexing api
requires the code coverage tool to know the function's hash as well
as the function's name to get the execution counts for a function.

Differential Revision: http://reviews.llvm.org/D4994

llvm-svn: 216207
2014-08-21 19:23:25 +00:00
Rafael Espindola
ef1b9eacdd llvm-gcc is dead.
llvm-svn: 216206
2014-08-21 19:22:24 +00:00
Eric Fiselier
54f5370778 [LIT] Remove documentation for method since it does not exist
llvm-svn: 216204
2014-08-21 18:52:58 +00:00
Rafael Espindola
43b3e883b8 Respect LibraryInfo in populateLTOPassManager and use it. NFC.
llvm-svn: 216203
2014-08-21 18:49:52 +00:00
Rafael Espindola
6518040a1a Remove dead code. NFC.
llvm-svn: 216201
2014-08-21 18:11:21 +00:00
Quentin Colombet
ca131ef450 [AArch64] Run a peephole pass right after AdvSIMD pass.
The AdvSIMD pass may produce copies that are not coalescer-friendly. The
peephole optimizer knows how to fix that as demonstrated in the test case.

<rdar://problem/12702965>

llvm-svn: 216200
2014-08-21 18:10:07 +00:00
Juergen Ributzka
3f0a3fc649 [FastISel][AArch64] Factor out ANDWri instruction generation into a helper function. NFCI.
llvm-svn: 216199
2014-08-21 18:02:25 +00:00
Moritz Roth
434b09b95f Thumb1 load/store optimizer: Improve code to materialize new base register.
There are two add-immediate instructions in Thumb1: tADDi8 and tADDi3. Only
the latter supports using different source and destination registers, so
whenever we materialize a new base register (at a certain offset) we'd do
so by moving the base register value to the new register and then adding in
place. This patch changes the code to use a single tADDi3 if the offset is
small enough to fit in 3 bits.

Differential Revision: http://reviews.llvm.org/D5006

llvm-svn: 216193
2014-08-21 17:11:03 +00:00
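The selection rule in this commit message can be sketched as a small decision function. This is an illustration only: `materializeNewBase` is a hypothetical name, the returned strings are informal pseudo-assembly rather than real MachineInstr construction, and the sketch assumes the offset fits the 8-bit immediate of tADDi8.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the (pseudo-assembly) instructions used to
// compute NewBase = Base + Offset under the rule described above.
std::vector<std::string> materializeNewBase(unsigned Offset) {
  // Assumption of this sketch: larger offsets are out of scope.
  assert(Offset <= 255 && "sketch only covers offsets that fit in 8 bits");

  // A 3-bit immediate holds 0..7, so a single three-operand add suffices.
  if (Offset <= 7)
    return {"tADDi3 NewBase, Base, #" + std::to_string(Offset)};

  // Otherwise copy the base first, then add in place with the 8-bit form.
  return {"tMOVr NewBase, Base",
          "tADDi8 NewBase, #" + std::to_string(Offset)};
}
```

For example, `materializeNewBase(4)` yields one instruction, while `materializeNewBase(8)` yields the two-instruction move-then-add sequence.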
Hans Wennborg
eeb27ba170 Use returns_nonnull in BumpPtrAllocator and MallocAllocator to avoid null-check in placement new
In both Clang and LLVM, this is a common pattern:

  Size = sizeof(DeclRefExpr) + SomeExtraStuff;
  void *Mem = Context.Allocate(Size, llvm::alignOf<DeclRefExpr>());
  return new (Mem) DeclRefExpr(...);

The annoying thing is that because the default placement-new operator has a
nothrow specification, the compiler will insert a null check of Mem before
calling the DeclRefExpr constructor. This null check is redundant for us,
because we expect the allocation functions to never return null.

By annotating the allocator functions with returns_nonnull, we can optimize
away these checks. Compiling clang with a recent version of Clang and measuring
with:

  $ perf stat -r20 bin/clang.patch -fsyntax-only -w gcc.c && perf stat -r20 bin/clang.orig -fsyntax-only -w gcc.c

Shows a 2.4% speed-up (+- 0.8%).

The pattern occurs in LLVM too. Measuring with -O3 (and now using bzip2.c
instead, because it's smaller):

  $ perf stat -r20 bin/clang.patch -O3 -w bzip2.c  &&  perf stat -r20 bin/clang.orig -O3 -w bzip2.c

Shows a 4.4% speed-up (+- 1%).

If anyone knows of a similar attribute we can use for MSVC, or some other
technique to get rid of the null check there, please let me know.

Differential Revision: http://reviews.llvm.org/D4989

llvm-svn: 216192
2014-08-21 17:10:00 +00:00
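The allocate-then-placement-new pattern from this commit message can be sketched on its own. Everything named below (`SKETCH_RETURNS_NONNULL`, `BumpArenaSketch`, `PointSketch`, `makePoint`) is hypothetical and is not the LLVM `BumpPtrAllocator`; the sketch assumes a GCC- or Clang-compatible compiler for the attribute and falls back to an empty macro elsewhere.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Use the attribute only where the compiler reports support for it.
#if defined(__has_attribute)
#  if __has_attribute(returns_nonnull)
#    define SKETCH_RETURNS_NONNULL __attribute__((returns_nonnull))
#  endif
#endif
#ifndef SKETCH_RETURNS_NONNULL
#  define SKETCH_RETURNS_NONNULL
#endif

// Hypothetical fixed-size bump allocator.
class BumpArenaSketch {
  alignas(std::max_align_t) unsigned char Buffer[1024];
  std::size_t Used = 0;

public:
  // Never returns null: running out of space aborts instead, which is
  // what makes the returns_nonnull promise safe to give.
  SKETCH_RETURNS_NONNULL void *allocate(std::size_t Size, std::size_t Align) {
    std::size_t Start = (Used + Align - 1) / Align * Align; // round up
    if (Start + Size > sizeof(Buffer))
      std::abort();
    Used = Start + Size;
    return Buffer + Start;
  }
};

struct PointSketch {
  int X, Y;
  PointSketch(int XIn, int YIn) : X(XIn), Y(YIn) {}
};

// The pattern from the commit message: allocate raw memory, then construct
// in place. With the annotation, the compiler may drop the null check on Mem.
PointSketch *makePoint(BumpArenaSketch &Arena, int X, int Y) {
  void *Mem = Arena.allocate(sizeof(PointSketch), alignof(PointSketch));
  return new (Mem) PointSketch(X, Y);
}
```

The annotation is a promise to the optimizer, so it is only appropriate when the function really cannot return null, as with the abort-on-exhaustion policy assumed here.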
Juergen Ributzka
83c6e645d7 [FastISel][AArch64] Remove redundant test.
These tests and many more are already covered by fast-isel-addressing-modes.ll.

llvm-svn: 216186
2014-08-21 16:40:05 +00:00
Jonathan Roelofs
31a7253462 Add a thread-model knob for lowering atomics on baremetal & single threaded systems
http://reviews.llvm.org/D4984

llvm-svn: 216182
2014-08-21 14:35:47 +00:00
Rafael Espindola
dd478b7122 Handle inlining in populateLTOPassManager like in populateModulePassManager.
No functionality change.

llvm-svn: 216178
2014-08-21 13:35:30 +00:00
Zinovy Nis
810d45c17a [CLNUP] Remove return after llvm_unreachable. Thanks to Hal Finkel for pointing this out.
llvm-svn: 216176
2014-08-21 13:30:05 +00:00
Benjamin Kramer
54a0e07417 DAGCombiner: Make concat_vector combine safe for EVTs and concat_vectors with many arguments.
PR20677

llvm-svn: 216175
2014-08-21 13:28:02 +00:00
Rafael Espindola
f79b5bf8bb Move DisableGVNLoadPRE from populateLTOPassManager to PassManagerBuilder.
llvm-svn: 216174
2014-08-21 13:13:17 +00:00
Josh Klontz
7806ab827f X86AsmPrinter MCJIT MSVC bug fix.
Summary:
This bug was introduced in r213006 which makes an assumption that MCSection is COFF for Windows MSVC. This assumption is broken for MCJIT users where ELF is used instead [1]. The fix is to change the MCSection cast to a dyn_cast.

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-December/068407.html.

Reviewers: majnemer

Reviewed By: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4872

llvm-svn: 216173
2014-08-21 12:55:27 +00:00
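The difference between an asserting cast and a checked cast, which is the core of this fix, can be sketched without LLVM headers. The classes and the helpers `castSketch` and `dynCastSketch` below are hypothetical imitations of the `classof`-based idiom; they are not the real `MCSection` hierarchy or the real `llvm::cast` and `llvm::dyn_cast` templates.

```cpp
#include <cassert>

// Hypothetical stand-in for a section base class with a kind tag.
class SectionSketch {
public:
  enum Kind { COFF, ELF };
  explicit SectionSketch(Kind K) : TheKind(K) {}
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

struct COFFSectionSketch : SectionSketch {
  unsigned Characteristics;
  explicit COFFSectionSketch(unsigned C)
      : SectionSketch(COFF), Characteristics(C) {}
  static bool classof(const SectionSketch *S) { return S->getKind() == COFF; }
};

struct ELFSectionSketch : SectionSketch {
  ELFSectionSketch() : SectionSketch(ELF) {}
  static bool classof(const SectionSketch *S) { return S->getKind() == ELF; }
};

// Asserting cast: the caller claims the dynamic type is already known.
template <typename To> const To *castSketch(const SectionSketch *S) {
  assert(To::classof(S) && "cast to an incompatible type");
  return static_cast<const To *>(S);
}

// Checked cast: returns null when the dynamic type does not match.
template <typename To> const To *dynCastSketch(const SectionSketch *S) {
  return To::classof(S) ? static_cast<const To *>(S) : nullptr;
}

// COFF-specific handling that tolerates a non-COFF section.
unsigned coffCharacteristicsOrZero(const SectionSketch &S) {
  if (const COFFSectionSketch *C = dynCastSketch<COFFSectionSketch>(&S))
    return C->Characteristics;
  return 0; // e.g. an ELF section handed in by a JIT user
}
```

Using `castSketch` in `coffCharacteristicsOrZero` would trip the assertion for an ELF section, which mirrors the broken assumption the commit describes.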
Oliver Stannard
7994364d2f [ARM] Enable DP copy, load and store instructions for FPv4-SP
The FPv4-SP floating-point unit is generally referred to as
single-precision only, but it does have double-precision registers and
load, store and GPR<->DPR move instructions which operate on them.
This patch enables the use of these registers, the main advantage of
which is that we now comply with the AAPCS-VFP calling convention.
This partially reverts r209650, which added some AAPCS-VFP support,
but did not handle return values or alignment of double arguments in
registers.

This patch also adds tests for Thumb2 code generation for
floating-point instructions and intrinsics, which previously only
existed for ARM.

llvm-svn: 216172
2014-08-21 12:50:31 +00:00
Rafael Espindola
7968b84e64 Sort declarations.
llvm-svn: 216171
2014-08-21 12:39:07 +00:00
Benjamin Kramer
6bec61b66d Make format_object_base's destructor protected and non-virtual.
It's not meant to be used with operator delete and this avoids emitting virtual
dtors for every derived format object.

llvm-svn: 216170
2014-08-21 11:22:05 +00:00
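The protected, non-virtual destructor idiom named in this commit can be sketched as follows. `FormatBaseSketch`, `FormatIntSketch`, and `render` are hypothetical names chosen for illustration and do not reproduce the actual `format_object_base` interface.

```cpp
#include <cstddef>
#include <cstdio>
#include <type_traits>

// Hypothetical polymorphic base that is never deleted through a base pointer.
class FormatBaseSketch {
protected:
  // Protected and non-virtual: "delete basePtr" does not compile outside the
  // hierarchy, and derived classes need no virtual destructor entry.
  ~FormatBaseSketch() = default;

public:
  virtual int print(char *Buf, std::size_t Size) const = 0;
};

class FormatIntSketch final : public FormatBaseSketch {
  int Value;

public:
  explicit FormatIntSketch(int V) : Value(V) {}
  int print(char *Buf, std::size_t Size) const override {
    return std::snprintf(Buf, Size, "%d", Value);
  }
};

static_assert(!std::has_virtual_destructor<FormatIntSketch>::value,
              "derived format objects carry no virtual destructor");

// Objects are used by reference and destroyed as their concrete type.
int render(const FormatBaseSketch &F, char *Buf, std::size_t Size) {
  return F.print(Buf, Size);
}
```

The idiom suits objects that live on the stack or as temporaries and are only ever passed by reference, which is the usage the commit message implies.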
Erik Verbruggen
3f6db9cd35 Reassociate x + -0.1234 * y into x - 0.1234 * y
This does not require -ffast-math, and it gives CSE/GVN more options to
eliminate duplicate expressions in, e.g.:

  return ((x + 0.1234 * y) * (x - 0.1234 * y));

Differential Revision: http://reviews.llvm.org/D4904

llvm-svn: 216169
2014-08-21 10:45:30 +00:00
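Why this rewrite needs no fast-math flag can be checked with a small sketch. Under IEEE-754 arithmetic in the default round-to-nearest mode, negating a value is exact, so adding a negated product and subtracting the product give the same result. The function names below are hypothetical.

```cpp
// Form before the reassociation: add a product with a negative constant.
double addNegatedProduct(double X, double Y) { return X + -0.1234 * Y; }

// Form after the reassociation: subtract the product with a positive constant.
double subtractProduct(double X, double Y) { return X - 0.1234 * Y; }
```

Once both spellings are canonicalized to the subtraction form, the product `0.1234 * y` appears identically in `x + 0.1234 * y` and `x - 0.1234 * y`, so CSE/GVN can compute it once.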
Benjamin Kramer
c0369ec35d X86: Turn redundant if into an assertion.
While there remove noop casts.

llvm-svn: 216168
2014-08-21 10:31:37 +00:00
Robert Khasanov
84f9d2664b [x86] Added _addcarry_ and _subborrow_ intrinsics
llvm-svn: 216164
2014-08-21 09:43:43 +00:00
Robert Khasanov
350e87272b [x86] SMAP: added HasSMAP attribute for CLAC/STAC, corrected attributes
llvm-svn: 216163
2014-08-21 09:34:12 +00:00
Robert Khasanov
3a54da967f [x86] Broadwell: ADOX/ADCX. Added _addcarryx_u{32|64} intrinsics to LLVM.
llvm-svn: 216162
2014-08-21 09:27:00 +00:00
Robert Khasanov
741d0742f0 [x86] Enable Broadwell target.
Added FeatureSMAP.

Broadwell ISA includes Haswell ISA + ADX + RDSEED + SMAP

llvm-svn: 216161
2014-08-21 09:16:12 +00:00
Zinovy Nis
6b7238f3b4 [INDVARS] Extend using of widening of induction variables for the cases of "sub nsw" and "mul nsw" instructions.
Currently only "add nsw" is widened. This patch eliminates tons of "sext" instructions for 64-bit code (and the corresponding target code) in cases like:

int N = 100;
float **A;

void foo(int x0, int x1)
{
        float * A_cur = &A[0][0];
        float * A_next = &A[1][0];
        for(int x = x0; x < x1; ++x)
        {
          // Currently only the [x+N] case is widened. The other 2 cases lead to sext.
          // This patch fixes it, so all 3 cases do not need sext.
          const float div = A_cur[x + N] + A_cur[x - N] + A_cur[x * N];
          A_next[x] = div;
        }
}
...
> clang++ test.cpp -march=core-avx2 -Ofast  -fno-unroll-loops -fno-tree-vectorize -S -o -

Differential Revision: http://reviews.llvm.org/D4695

llvm-svn: 216160
2014-08-21 08:25:45 +00:00
Elena Demikhovsky
511b2e1f89 IntelJITEventListener updates to fix breaks by recent changes to EngineBuilder and DIContext.
By Arch Robison.

llvm-svn: 216159
2014-08-21 07:01:55 +00:00
Craig Topper
65775cc03d Replace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size.
llvm-svn: 216158
2014-08-21 05:55:13 +00:00
David Majnemer
fb4e6230cf InstCombine: Fold ((A | B) & C1) ^ (B & C2) -> (A & C1) ^ B if C1^C2=-1
Adapted from a patch by Richard Smith, test-case written by me.

llvm-svn: 216157
2014-08-21 05:14:48 +00:00
Craig Topper
c80a14ad2f Remove custom implementations of max/min in StringRef that were originally added to work around an old gcc bug. I believe it's been fixed by now.
llvm-svn: 216156
2014-08-21 04:31:10 +00:00
Eric Fiselier
c5a9c84514 add self to credits
llvm-svn: 216155
2014-08-21 04:27:11 +00:00
Jiangning Liu
8819f1fe36 Fix a bug around truncating vector in const prop.
In the constant folding stage, "TRUNC" can't handle vector data types.

llvm-svn: 216149
2014-08-21 02:12:35 +00:00
Jiangning Liu
c14e7de948 Revert r216066, "Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type".
llvm-svn: 216147
2014-08-21 01:59:30 +00:00
Quentin Colombet
be2a400039 [PeepholeOptimizer] Take advantage of the isInsertSubreg property in the
advanced copy optimization.

This is the final step patch toward transforming:
udiv    r0, r0, r2
udiv    r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov    r0, r1, d16
bx      lr

into:
udiv    r0, r0, r2
udiv    r1, r1, r3
bx      lr

Indeed, thanks to this patch, this optimization is able to look through
vmov.32 d16[0], r0
vmov.32 d16[1], r1

and is able to rewrite the following sequence:
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov    r0, r1, d16

into simple generic GPR copies that the coalescer managed to remove.

<rdar://problem/12702965>

llvm-svn: 216144
2014-08-21 00:19:16 +00:00
Quentin Colombet
fbc7f5c996 [ARM] Mark VSETLNi32 with the InsertSubreg property and implement the related
target hook.

This patch teaches the compiler that:
dX = VSETLNi32 dY, rZ, imm
is the same as:
dX = INSERT_SUBREG dY, rZ, translateImmToSubIdx(imm)

<rdar://problem/12702965>

llvm-svn: 216143
2014-08-21 00:10:52 +00:00
James Molloy
65aa6c84f5 [LoopVectorize] Up the maximum unroll factor to 4 for AArch64
Only for Cortex-A57 and Cyclone for now, where it has shown wins.

llvm-svn: 216141
2014-08-21 00:02:51 +00:00
James Molloy
3bb7b30942 [LoopVectorizer] Limit unroll factor in the presence of nested reductions.
If we have a scalar reduction, we can increase the critical path length if the loop we're unrolling is inside another loop. Limit the unroll factor, by default to 2, so the critical path only gets increased by one reduction operation.

llvm-svn: 216140
2014-08-20 23:53:52 +00:00
Quentin Colombet
1849edbdf6 Add isInsertSubreg property.
This patch adds a new property: isInsertSubreg and the related target hooks:
TargetInstrInfo::getInsertSubregInputs and
TargetInstrInfo::getInsertSubregLikeInputs to specify that a target specific
instruction is a (kind of) INSERT_SUBREG.

The approach is similar to r215394.

<rdar://problem/12702965>

llvm-svn: 216139
2014-08-20 23:49:36 +00:00
Jonathan Roelofs
63874dab8e Lower thumbv4t & thumbv5 lo->lo copies through a push-pop sequence
On pre-v6 hardware, 'MOV lo, lo' gives undefined results, so such copies need to
be avoided. This patch favors a simple, quick-to-implement approach at the
expense of performance. As they say: correctness first, then performance.

See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075998.html for a few
ideas on how to make this better.

llvm-svn: 216138
2014-08-20 23:38:50 +00:00