mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 03:23:01 +02:00
Commit Graph

108203 Commits

Author SHA1 Message Date
Chandler Carruth
327441cee3 [x86] Fix a few more tiny patterns with the new vector shuffle lowering
that keep cropping up in the regression test suite.

This also addresses one of the issues raised on the mailing list with
failing to form 'movsd' in as many cases as we realistically should.
There will be corresponding patches forthcoming for v4f32 at least. This
was a lot of fuss for a relatively small gain, but all the fuss was on
my end trying different ways of holding the pieces of the x86 fragment
patterns *just right*. Now that it works, the code is reasonably simple.

In the new test cases I'm adding here, v2i64 sticks out as just plain
horrible. I've not come up with any great ideas here other than that it
would be nice to recognize when we're *going* to take a domain crossing
hit and cross earlier to get the decent instructions. At least with AVX
it is slightly less silly....
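
For illustration, a minimal IR sketch (function and value names are mine, not
from the patch) of a v2f64 shuffle that should now form 'movsd' -- low lane
taken from the second operand, high lane preserved from the first:

    define <2 x double> @blend_low(<2 x double> %a, <2 x double> %b) {
      ; lane 0 from %b, lane 1 from %a -- the movsd pattern
      %s = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 2, i32 1>
      ret <2 x double> %s
    }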

llvm-svn: 218756
2014-10-01 11:14:02 +00:00
Chandler Carruth
52fe0fc691 [x86] Delete some extraneous logic from the new vector shuffle lowering.
Nothing was relying on this and there are potentially some edge cases
that it would not be correct under. Removing it seems better than trying
to "fix" it as nothing was relying on it.

llvm-svn: 218755
2014-10-01 11:13:57 +00:00
Tom Coxon
50ff005894 [AArch64] Allow access to all system registers with MRS/MSR instructions.
The A64 instruction set includes a generic register syntax for accessing
implementation-defined system registers. The syntax for these registers is:
    S<op0>_<op1>_<CRn>_<CRm>_<op2>

The encoding space permitted for implementation-defined system registers
is:
    op0 op1  CRn   CRm   op2
    11  xxx  1x11  xxxx  xxx

The full encoding space can now be accessed:
    op0 op1  CRn   CRm   op2
    xx  xxx  xxxx  xxxx  xxx

This is useful to anyone needing to write assembly code supporting new
system registers before the assembler has learned the official names for
them.
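
As a sketch, such a register can be reached from LLVM IR through inline
assembly once the assembler accepts the generic syntax; the register name
below is an arbitrary illustration (op0=2 was rejected before this change),
not an architecturally defined register:

    define i64 @read_impdef_reg() {
      ; S2_3_C7_C4_1 encodes op0=2, op1=3, CRn=7, CRm=4, op2=1
      %v = call i64 asm sideeffect "mrs $0, S2_3_C7_C4_1", "=r"()
      ret i64 %v
    }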

llvm-svn: 218753
2014-10-01 10:13:59 +00:00
Evgeniy Stepanov
07f2031151 Revert r218721, r218735.
Failing bootstrap on Linux (arm, x86).

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13139/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/470
http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/8518

llvm-svn: 218752
2014-10-01 10:07:28 +00:00
Asiri Rathnayake
0e12aa2ad9 Add missing natural vector cast.
Summary: The natural vector cast node (similar to bitcast) AArch64ISD::NVCAST
was introduced in r217159 and r217138. This patch adds a missing cast from
v2f32 to v1i64 which is causing some compilation failures. Also added test
cases to cover various modimm types and BUILD_VECTORs with i64 elements.
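
One plausible IR-level trigger (a sketch, not taken from the patch's tests) is
a plain bitcast between the two equally sized vector types:

    define <1 x i64> @cast(<2 x float> %v) {
      %c = bitcast <2 x float> %v to <1 x i64>
      ret <1 x i64> %c
    }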

llvm-svn: 218751
2014-10-01 09:59:45 +00:00
NAKAMURA Takumi
db6522181c ADTTests/OptionalTest.cpp: Use LLVM_DELETED_FUNCTION.
llvm-svn: 218750
2014-10-01 09:14:43 +00:00
Oliver Stannard
f4985610cd [ARM] Add support for Cortex-M7, FPv5-SP and FPv5-DP (LLVM)
The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and
FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be
modelled using the same target feature, and all double-precision
operations are already disabled by the fp-only-sp target feature.

llvm-svn: 218747
2014-10-01 09:02:17 +00:00
Daniel Sanders
57fd199eaa [mips] Fix disassembly of [ls][wd]c[23], cache, and pref
Fixes PR21015 and PR20993.

Patch by Jun Koi

llvm-svn: 218745
2014-10-01 08:26:55 +00:00
Sasa Stankovic
f65f1c04c4 [mips] For indirect calls we don't need $gp to point to .got. The MIPS
linker doesn't generate a lazy binding stub for a function whose address is
taken in the program.

Differential Revision: http://reviews.llvm.org/D5067

llvm-svn: 218744
2014-10-01 08:22:21 +00:00
Justin Bogner
47571604c9 test: XFAIL the non-darwin gmlt test on darwin
r218702 disabled a -gmlt optimization for darwin, but this means the
non-darwin test isn't working there anymore.

llvm-svn: 218742
2014-10-01 05:45:45 +00:00
Lang Hames
61140d8e80 [MCJIT] Turn the getSymbolAddress free function created in r218626 into a static
member of RTDyldMemoryManager (and rename to getSymbolAddressInProcess).

The functionality this provides is very specific to RTDyldMemoryManager, so it
makes sense to keep it in that class to avoid accidental re-use.

No functional change.

llvm-svn: 218741
2014-10-01 04:11:13 +00:00
Nick Lewycky
abd4d05f7d Fix typo in comment from r218733
llvm-svn: 218739
2014-10-01 03:37:34 +00:00
Justin Bogner
4ceb3df28e InstrProf: Make coverage::Counter comparable
I'll be using this in a clang change very soon.

llvm-svn: 218736
2014-10-01 03:31:58 +00:00
Gerolf Hoflehner
a5e5769ed3 [InstCombine] Fix for assert build failures caused by r218721
The icmp-select-icmp optimization made the implicit assumption
that the select-icmp instructions are in the same block and asserted on it.
The fix explicitly checks for that condition and conservatively suppresses
the optimization when it is violated.

llvm-svn: 218735
2014-10-01 03:24:39 +00:00
Chandler Carruth
6ffdf68855 [x86] Teach the new vector shuffle lowering to be even more aggressive
in exposing the scalar value to the broadcast DAG fragment so that we
can catch even reloads and fold them into the broadcast.

This is somewhat magical I'm afraid but seems to work. It is also what
the old lowering did, and I've switched an old test to run both
lowerings demonstrating that we get the same result.

Unlike the old code, I'm not lowering f32 or f64 scalars through this
path when we only have AVX1. The target patterns include pretty heinous
code to re-cast those as shuffles when the scalar happens to not be
spilled because AVX1 provides no broadcast mechanism from registers
whatsoever. This is terribly brittle. I'd much rather go through our
generic lowering code to get this. If needed, we can add a peephole to
get even more opportunities to broadcast-from-spill-slots that are
exposed post-RA, but my suspicion is this just doesn't matter that much.
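
For instance, a splat of a freshly loaded scalar (a sketch in the era's IR
syntax; names are illustrative) can now fold the reload straight into the
broadcast:

    define <4 x float> @splat_load(float* %p) {
      %f = load float* %p
      %v = insertelement <4 x float> undef, float %f, i32 0
      ; splat of lane 0; with AVX2 this can fold to: vbroadcastss (%rdi), %xmm0
      %s = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> zeroinitializer
      ret <4 x float> %s
    }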

llvm-svn: 218734
2014-10-01 03:19:43 +00:00
Chandler Carruth
82405d868b [x86] Hoist the zext-lowering up in the v4i32 lowering routine -- it is
the same speed as pshufd but we can fold loads into the pmovzx
instructions.

This fixes some regressions that came up in the regression test suite
for the new vector shuffle lowering.
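
A sketch of the kind of v4i32 shuffle this affects -- interleaving with zeros
is recognized as a zero-extension of the low lanes:

    define <4 x i32> @zext_lo(<4 x i32> %x) {
      ; <x0, 0, x1, 0> == zero-extend of the two low lanes: a pmovzxdq candidate
      %z = shufflevector <4 x i32> %x, <4 x i32> zeroinitializer, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
      ret <4 x i32> %z
    }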

llvm-svn: 218733
2014-10-01 02:25:54 +00:00
Jordan Rose
9f866eaf0a Add an emplace(...) method to llvm::Optional<T>.
This can be used for in-place initialization of non-moveable types.
For compilers that don't support variadic templates, only up to four
arguments are supported. We can always add more, of course, but this
should be good enough until we move to a later MSVC that has full
support for variadic templates.

Inspired by std::experimental::optional from the "Library Fundamentals" C++ TS.
Reviewed by David Blaikie.

llvm-svn: 218732
2014-10-01 02:12:35 +00:00
David Blaikie
fe64c97aaa Implement DW_TAG_subrange_type with DW_AT_count rather than DW_AT_upper_bound
This allows proper disambiguation of unbounded arrays and arrays of zero
bound ("struct foo { int x[]; };" and "struct foo { int x[0]; }"). GCC
instead produces an upper bound of -1 in the latter situation, but count
seems tidier. This way lower_bound is provided if it's not the language
default and count is provided if the count is known, otherwise it's
omitted. Simple.

If someone wants to look at rdar://problem/12566646 and see if this
change is acceptable to that bug/fix, that might be helpful (see the
empty-and-one-elem-array.ll test case which cites that radar).

llvm-svn: 218726
2014-10-01 00:56:55 +00:00
Adam Nemet
d7923b8d25 [AVX512] Remove space before \t in AsmStrings.
llvm-svn: 218725
2014-10-01 00:41:32 +00:00
Chandler Carruth
ed2c9efc13 [x86] Teach the new vector shuffle lowering about VBROADCAST and
VPBROADCAST.

This has the somewhat expected pervasive impact. I don't know why
I forgot about this. Everything seems good with lots of significant
improvements in the tests.
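
E.g., a lane-0 splat (sketch; names illustrative) can now select the broadcast
forms directly:

    define <8 x float> @splat0(<8 x float> %x) {
      ; 256-bit splat of lane 0; with AVX2: vbroadcastss %xmm0, %ymm0
      %s = shufflevector <8 x float> %x, <8 x float> undef, <8 x i32> zeroinitializer
      ret <8 x float> %s
    }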

llvm-svn: 218724
2014-10-01 00:41:21 +00:00
NAKAMURA Takumi
79112cc68c llvm-cov/CoverageReport.cpp: Quick fix for msvcrt, since width specifier "z" is unavailable.
Note, mingw uses its own printf instead of msvcrt.

llvm-svn: 218723
2014-10-01 00:29:26 +00:00
NAKAMURA Takumi
64893e0425 llvm/test/DebugInfo/X86/gmlt.test: Get rid of %llc_dwarf. It should not be used with -mtriple.
Also, remove object-emission. test/DebugInfo/X86 doesn't require it.

llvm-svn: 218722
2014-10-01 00:29:16 +00:00
Gerolf Hoflehner
6bbc18ceb7 [InstCombine] Optimize icmp-select-icmp
In special cases select instructions can be eliminated by
replacing them with a cheaper bitwise operation even when the
select result is used outside its home block. The instances implemented
are patterns like
    %x=icmp.eq
    %y=select %x,%r, null
    %z=icmp.eq|neq %y, null
    br %z,true, false
==> %x=icmp.ne
    %y=icmp.eq %r,null
    %z=or %x,%y
    br %z,true,false
The optimization is integrated into the instruction
combiner and performed only when all uses of the select result can
be replaced by the select operand proper. For this, dominator information
is used, and dominance is now a required analysis pass in the combiner.
The optimization itself is iterative. The critical step is to replace the
select result with the non-constant select operand, so that the select becomes
local and the combiner iteratively works out simpler code patterns until it
eventually eliminates the select.
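
In concrete IR the 'before' shape looks like this sketch (simplified to a
single block for brevity; the patch targets the case where the select's
result escapes its home block):

    define i1 @f(i32* %r, i32 %a) {
      %x = icmp eq i32 %a, 0
      %y = select i1 %x, i32* %r, i32* null
      %z = icmp eq i32* %y, null
      ; combines to: %x.inv = icmp ne i32 %a, 0
      ;              %y.cmp = icmp eq i32* %r, null
      ;              %z     = or i1 %x.inv, %y.cmp
      ret i1 %z
    }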

rdar://17853760

llvm-svn: 218721
2014-10-01 00:13:22 +00:00
David Blaikie
5720297fff Omit DW_AT_inline under -gmlt to save a little more space.
llvm-svn: 218719
2014-09-30 23:29:16 +00:00
Hal Finkel
44b0224eae [BasicAA] Make better use of zext and sign information
Two related things:

 1. Fixes a bug when calculating the offset in GetLinearExpression. The code
    previously used zext to extend the offset, so negative offsets were converted
    to large positive ones.

 2. Enhance aliasGEP to deduce that, if the difference between two GEP
    allocations is positive and all the variables that govern the offset are also
    positive (i.e. the offset is strictly after the higher base pointer), then
    locations that fit in the gap between the two base pointers are NoAlias
    (see the sketch below).
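
A sketch of case 2 (in the era's IR syntax; names illustrative) -- the store to
%base fits entirely in the gap below %p2, so the two accesses can be reported
NoAlias:

    define void @f(i32* %base, i64 %i) {
      %i.pos = and i64 %i, 255                  ; offset variable provably non-negative
      %off = add i64 %i.pos, 1
      %p2 = getelementptr i32* %base, i64 %off  ; always at least %base + 4
      store i32 0, i32* %base                   ; 4-byte access, fits in the gap
      store i32 1, i32* %p2
      ret void
    }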

Patch by Nick White!

llvm-svn: 218714
2014-09-30 22:43:40 +00:00
David Blaikie
0fda7951bb DebugInfo: Sink the code emitting DW_AT_APPLE_omit_frame_ptr down to a more common spot.
No functional change. Pre-emptive refactoring before I start pushing
some of this subprogram creation down into DWARFCompileUnit so I can
build different subprograms in the skeleton unit from the dwo unit for
adding -gmlt-like data to the skeleton.

llvm-svn: 218713
2014-09-30 22:32:49 +00:00
Hans Wennborg
d9356b02c3 MSBuild integration: fix the loop in install.bat
It would previously not continue the platforms loop
unless it could find the latest toolset directory.

llvm-svn: 218712
2014-09-30 22:30:06 +00:00
Jingyue Wu
908e4ab376 [SimplifyCFG] threshold for folding branches with common destination
Summary:
This patch adds a threshold that controls the number of bonus instructions
allowed for folding branches with common destination. The original code allows
at most one bonus instruction. With this patch, users can customize the
threshold to allow multiple bonus instructions. The default threshold is still
1, so that the code behaves the same as before when users do not specify this
threshold.

The motivation of this change is that tuning this threshold significantly (up
to 25%) improves the performance of some CUDA programs in our internal code
base. In general, branch instructions are very expensive for GPU programs.
Therefore, it is sometimes worth trading more arithmetic computation for a more
straightened control flow. Here's a reduced example:

  __global__ void foo(int a, int b, int c, int d, int e, int n,
                      const int *input, int *output) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
      sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
    *output = sum;
  }

The select statement in the loop body translates to two branch instructions "if
((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination.
With the default threshold, SimplifyCFG is unable to fold them, because
computing the condition of the second branch "(i | c) ^ d > e" requires two
bonus instructions. With the threshold increased, SimplifyCFG can fold the two
branches so that the loop body contains only one branch, making the code
conceptually look like:

  sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];

Increasing the threshold significantly improves the performance of this
particular example. In the configuration where both conditions are guaranteed
to be true, increasing the threshold from 1 to 2 improves the performance by
18.24%. Even in the configuration where the first condition is false and the
second condition is true, which favors shortcuts, increasing the threshold from
1 to 2 still improves the performance by 4.35%.

We are still looking for a good threshold and maybe a better cost model than
just counting the number of bonus instructions. However, according to the above
numbers, we think it is at least worth adding a threshold to enable more
experiments and tuning. Let me know what you think. Thanks!
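
In IR terms the folded shape looks roughly like this sketch (names
illustrative); with the threshold raised to 2, the two bonus instructions can
be speculated into the predecessor and the branch conditions combined with an
'and':

  define i32 @fold(i32 %i, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %v) {
  entry:
    %xa = xor i32 %i, %a
    %c1 = icmp sgt i32 %xa, %b
    br i1 %c1, label %second, label %exit    ; shares destination %exit ...
  second:
    %oc = or i32 %i, %c                      ; bonus instruction #1
    %xd = xor i32 %oc, %d                    ; bonus instruction #2
    %c2 = icmp sgt i32 %xd, %e
    br i1 %c2, label %taken, label %exit     ; ... with this branch
  taken:
    ret i32 0
  exit:
    ret i32 %v
  }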

Test Plan: Added one test case to check the threshold is in effect

Reviewers: nadav, eliben, meheff, resistor, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D5529

llvm-svn: 218711
2014-09-30 22:23:38 +00:00
Chandler Carruth
5a27308c76 [x86] Add AVX1 and AVX2 testing to all of the 128-bit shuffle test
cases.

While clearly we don't need the AVX vector width, these ISA extensions
often cause us to select different instructions and we should cover them
even with the narrow vector width.

Also, while here, nuke the stress_test2 contents. There is no reason to
try to FileCheck this entire body when it is mostly a test for
successfully surviving the code generator.

llvm-svn: 218710
2014-09-30 22:16:23 +00:00
Chandler Carruth
9a6f4aeb91 [x86] Update the exact FileCheck syntax of the 256-bit and 512-bit
shuffle tests to match that used in the script I posted and now used
consistently in 128-bit tests.

Nothing interesting changing here, just using the label name as the
FileCheck label and a slightly more general comment marker consumption
strategy.

llvm-svn: 218709
2014-09-30 22:04:45 +00:00
David Blaikie
d848d2d5db Adjust test case addition in r218702 so as not to fail when the X86 target isn't built.
llvm-svn: 218708
2014-09-30 22:02:27 +00:00
Chandler Carruth
4f910094ef [x86] Rework all of the 128-bit vector shuffle tests with my handy test
updating script so that they are more thorough and consistent.

Specific fixes here include:
- Actually test VEX-encoded AVX mnemonics.
- Actually use an SSE 4.1 run to test SSE 4.1 features!
- Correctly check instructions sequences from the start of the function.
- Elide the shuffle operands and comment designator in a consistent way.
- Test all of the architectures instead of just the ones I was motivated
  to manually author.

I've gone back through and fixed up any egregious issues I spotted. Let
me know if I missed something you really dislike.

One downside to this is that we're now not as diligently using FileCheck
variables for registers. I would be much more concerned with this if we
had larger register usage, but there just aren't many interesting
register choices here and most of the registers are constrained by the
ABI. Ultimately, I don't think this is likely to be a maintenance
burden for these tests, and updating them again should be
straightforward.

llvm-svn: 218707
2014-09-30 21:44:34 +00:00
David Blaikie
c3f2904ef4 Disable the -gmlt optimization implemented in r218129 under Darwin due to issues with dsymutil.
r218129 omits DW_TAG_subprograms which have no inlined subroutines when
emitting -gmlt data. This makes -gmlt very low cost for -O0 builds.

Darwin's dsymutil reasonably considers a CU empty if it has no
subprograms (which occurs with the above optimization in -O0 programs
without any force_inline function calls) and drops the line table, CU,
and everything in this situation, making backtraces impossible.

Until dsymutil is modified to account for this, disable this
optimization on Darwin to preserve the desired functionality.
(see r218545, which should be reverted after this patch, for other
discussion/details)

Footnote:
In the long term, it doesn't look like this scheme (of simplified debug
info to describe inlining to enable backtracing) is tenable; it is far
too size-inefficient for optimized code (the DW_TAG_inlined_subprograms,
even once compressed, are nearly twice as large as the line table
itself (also compressed)) and we'll be considering things like Cary's
two level line table proposal to encode all this information directly in
the line table.

llvm-svn: 218702
2014-09-30 21:28:32 +00:00
Sanjay Patel
abcb3acee5 Use the target-specified iteration count to opt out of any further refinement of an estimate. NFC.
llvm-svn: 218700
2014-09-30 20:44:23 +00:00
Sanjay Patel
c095454ca2 Split the estimate() interface into separate functions for each type. NFC.
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use it's 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...

llvm-svn: 218698
2014-09-30 20:28:48 +00:00
Juergen Ributzka
4b2325a925 Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ.
Note: This version fixed an issue with the TBZ/TBNZ instructions that were
generated in FastISel. The issue was that the 64-bit version of TBZ (TBZX)
automagically sets the upper bit of the immediate field that is used to specify
the bit we want to test. To test any of the lower 32 bits we have to first
extract the subregister and use the 32-bit version of the TBZ instruction (TBZW).
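
A sketch of the case in question -- testing one of the lower bits of a 64-bit
value, which must use the W form:

    define i32 @bt(i64 %x) {
      %m = and i64 %x, 4                  ; test bit 2, i.e. one of the lower 32 bits
      %c = icmp eq i64 %m, 0
      br i1 %c, label %clear, label %set  ; expected: tbz w0, #2, <clear block>
    clear:
      ret i32 0
    set:
      ret i32 1
    }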

Original commit message:
Teach selectBranch to fold bit test and branch into a single instruction (TBZ or
TBNZ).

llvm-svn: 218693
2014-09-30 19:59:35 +00:00
Matt Arsenault
7bc4ebe27d R600/SI: Fix printing of clamp and omod
No tests for omod since nothing uses it yet, but
this should get rid of the remaining annoying trailing
zeros after some instructions.

llvm-svn: 218692
2014-09-30 19:49:48 +00:00
Matt Arsenault
ace0d8c9eb R600/SI: Update VOP3b to not include obsolete operands
abs / neg are now part of the srcN_modifiers operands

llvm-svn: 218691
2014-09-30 19:49:43 +00:00
Bradley Smith
4388d0e70f Extend C disassembler API to allow specifying target features
llvm-svn: 218682
2014-09-30 16:31:40 +00:00
Reed Kotler
b2d48eb5a0 Add numeric extend, truncate to mips fast-isel
Summary:
 Add numeric extend, truncate to mips fast-isel

 Reactivates D4827

Test Plan:
fpext.ll
loadstoreconv.ll

Reviewers: dsanders

Subscribers: mcrosier

Differential Revision: http://reviews.llvm.org/D5251

llvm-svn: 218681
2014-09-30 16:30:13 +00:00
Tom Coxon
856ab42e33 [AArch64] Remove unnecessary whitespace. (Test commit)
llvm-svn: 218680
2014-09-30 16:23:16 +00:00
Andrea Di Biagio
a5f619dff6 [DAG] Check in advance if a build_vector has a legal type before attempting to convert it into a shuffle.
Currently, the DAG Combiner only tries to convert type-legal build_vector nodes
into shuffles. This patch simply moves the logic that checks if a
build_vector has a legal value type up before we even start analyzing the
operands. This allows us to exit early from method
'visitBUILD_VECTOR' if the node type is known to be illegal.

No functional change intended.

llvm-svn: 218677
2014-09-30 15:30:22 +00:00
Alex Lorenz
5ffa87b1e0 Revert r218673 'llvm-cov: add test for report's function & file association.'
Test causes buildbot failures.

llvm-svn: 218676
2014-09-30 14:48:12 +00:00
Alex Lorenz
7197889d8a llvm-cov: add test for report's function & file association.
This commit adds a test which checks that the functions defined in header files will get associated with the header files rather than the source files in the reports.

Differential Revision: http://reviews.llvm.org/D5489

llvm-svn: 218673
2014-09-30 12:52:31 +00:00
Alex Lorenz
5990711e78 llvm-cov: Use the number of executed functions for the function coverage metric.
This commit fixes llvm-cov's function coverage metric by using the number of executed functions instead of the number of fully covered functions.

Differential Revision: http://reviews.llvm.org/D5196

llvm-svn: 218672
2014-09-30 12:45:13 +00:00
Lorenzo Martignoni
821c07a6f8 Introduce support for custom wrappers for vararg functions.
Differential Revision: http://reviews.llvm.org/D5412

llvm-svn: 218671
2014-09-30 12:33:16 +00:00
Robert Khasanov
f86e0f1129 [AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}.
Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com>

llvm-svn: 218670
2014-09-30 12:15:52 +00:00
Robert Khasanov
61a1201dfa [AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ}
Fixed lowering of these intrinsics for the cases when the mask is v2i1 or v4i1.
Now the cmp intrinsics are lowered in the following way:
 (i8 (int_x86_avx512_mask_pcmpeq_q_128
             (v2i64 %a), (v2i64 %b), (i8 %mask))) ->
 (i8 (bitcast
   (v8i1 (insert_subvector undef,
           (v2i1 (and (PCMPEQM %a, %b),
                      (extract_subvector
                         (v8i1 (bitcast %mask)), 0))), 0))))
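
At the IR level the 128-bit variant is invoked like this (a sketch built from
the intrinsic named above):

 declare i8 @llvm.x86.avx512.mask.pcmpeq.q.128(<2 x i64>, <2 x i64>, i8)

 define i8 @cmpeq(<2 x i64> %a, <2 x i64> %b, i8 %mask) {
   %r = call i8 @llvm.x86.avx512.mask.pcmpeq.q.128(<2 x i64> %a, <2 x i64> %b, i8 %mask)
   ret i8 %r
 }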

llvm-svn: 218669
2014-09-30 11:41:54 +00:00
Robert Khasanov
2972b6033d [AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW.
Added new operand type for intrinsics (IIT_V64)

llvm-svn: 218668
2014-09-30 11:32:22 +00:00
Robert Khasanov
06de3de89a [AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ.
Added CMP_MASK intrinsic type

llvm-svn: 218667
2014-09-30 11:19:50 +00:00