1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 12:02:58 +02:00
Commit Graph

17922 Commits

Author SHA1 Message Date
Chris Lattner
c9303c920d Fix LICM's memory promotion optimization to preserve TBAA tags when
promoting a store in a loop.  This was noticed when working on PR14753,
but isn't directly related.

llvm-svn: 171281
2012-12-31 08:37:17 +00:00
Chris Lattner
aea9f04075 teach instcombine to preserve TBAA tag when merging two stores, part of
PR14753

llvm-svn: 171279
2012-12-31 08:10:58 +00:00
Jakub Staszak
000956e03b Transform (A == C1 || A == C2) into (A & ~(C1 ^ C2)) == C1
if C1 and C2 differ only with one bit.
Fixes PR14708.

llvm-svn: 171270
2012-12-31 00:34:55 +00:00
Hal Finkel
3bc1b07a1b Support ppcf128 in SelectionDAG::getConstantFP
Fixes pr14751.

Patch by Kai; Thanks!

llvm-svn: 171261
2012-12-30 19:03:32 +00:00
Nadav Rotem
0e391907a5 LoopVectorizer: Fix a bug in the code that updates the loop exiting block.
LCSSA PHIs may have undef values. The vectorizer updates values that are used by outside users such as PHIs.
The bug happened because undefs are not loop values. This patch handles these PHIs.

PR14725

llvm-svn: 171251
2012-12-30 07:47:00 +00:00
Dmitri Gribenko
968fc2e59b Tests: rewrite 'opt ... %s' to 'opt ... < %s' so that opt does not emit a ModuleID
This is done to avoid odd test failures, like the one fixed in r171243.

llvm-svn: 171250
2012-12-30 02:33:22 +00:00
Dmitri Gribenko
abbc99565a Add a check to the test Analysis/ScalarEvolution/2010-09-03-RequiredTransitive.ll
This test did not test anything at all (except for opt crashing, but that was
not the reason why it was added).

llvm-svn: 171248
2012-12-30 01:42:34 +00:00
Dmitri Gribenko
e3769d450b Tests: rewrite 'opt ... %s' to 'opt ... < %s' so that opt does not emit a ModuleID
This is done to avoid odd test failures, like the one fixed in r171243.

llvm-svn: 171246
2012-12-30 01:28:40 +00:00
NAKAMURA Takumi
77a7dbf266 llvm/test/Transforms/GVN/null-aliases-nothing.ll: Fix a RUN line not to emit ModuleID.
Larry Evans reported it fails if source tree contains "load", like "download".

llvm-svn: 171243
2012-12-30 00:33:26 +00:00
Chandler Carruth
f3295eafa5 Fix a stunning oversight in the inline cost analysis. It was never
propagating one of the values it simplified to a constant across
a myriad of instructions. Notably, ptrtoint instructions when we had
a constant pointer (say, 0) didn't propagate that, blocking a massive
number of down-stream optimizations.

This was uncovered when investigating why we fail to inline and delete
the boilerplate in:

  void f() {
    std::vector<int> v;
    v.push_back(1);
  }

It turns out most of the efforts I've made thus far to improve the
analysis weren't making it far purely because of this. After this is
fixed, the store-to-load forwarding patch enables LLVM to optimize the
above to an empty function. We still can't nuke a second push_back, but
for different reasons.

There is a very real chance this will cause somewhat noticable changes
in inlining behavior, so please let me know if you see regressions (or
improvements!) because of this patch.

llvm-svn: 171196
2012-12-28 14:43:42 +00:00
Chandler Carruth
8f719ce3e5 Teach the inline cost analysis about calls that can be simplified and
how to propagate constants through insert and extract value
instructions.

With the recent improvements to instsimplify, this allows inline cost
analysis to constant fold through intrinsic functions, including notably
the with.overflow intrinsic math routines which often show up inside of
STL abstractions. This is yet another piece in the puzzle of breaking
down the code for:

  void f() {
    std::vector<int> v;
    v.push_back(1);
  }

But it still isn't enough. There are a pile of bugs in inline cost still
blocking this.

llvm-svn: 171195
2012-12-28 14:23:32 +00:00
Chandler Carruth
93a470549b Teach instsimplify to use the constant folder where appropriate for
constant folding calls. Add the initial tests for this which show that
now instsimplify can simplify blindingly obvious code patterns expressed
with both intrinsics and library calls.

llvm-svn: 171194
2012-12-28 14:23:29 +00:00
Nadav Rotem
8f2be23305 AVX: Move the ZEXT/ANYEXT DAGCo optimizations to the lowering of these optimizations. The old test cases still cover all of these lowering/optimizations. The single change that we have is that now anyext does not need to zero a register, because it does not use the exact code path as the zero_extend.
llvm-svn: 171178
2012-12-28 05:45:24 +00:00
Alexey Samsonov
b3bcf27f5a [ASan] Fix lifetime intrinsics handling. Now for each intrinsic we check if it describes one of 'interesting' allocas. Assume that allocas can go through casts and phi-nodes before apperaring as llvm.lifetime arguments
llvm-svn: 171153
2012-12-27 08:50:58 +00:00
Nadav Rotem
2f6e04c7be On AVX/AVX2 the type v8i1 is legalized to v8i16, which is an XMM sized
register. In most cases we actually compare or select YMM-sized registers
and mixing the two types creates horrible code. This commit optimizes
some of the transition sequences.

PR14657.

llvm-svn: 171148
2012-12-27 08:15:45 +00:00
Eric Christopher
7096eb5a5c For the dwarf5 split debug info code split out the string section
per compile unit/skeleton compile unit. Update tests accordingly.

llvm-svn: 171133
2012-12-27 02:14:01 +00:00
Eric Christopher
eb6482f385 FileCheck-ize.
llvm-svn: 171132
2012-12-27 02:13:58 +00:00
Eric Christopher
4786ae2a5f FileCheck-ize.
llvm-svn: 171131
2012-12-27 02:13:55 +00:00
Eric Christopher
c6fa1e95a2 Right now all of the relocations are 32-bit dwarf, and the relocation
information doesn't return an addend for Rel relocations. Go ahead
and use this information to fix relocation handling inside dwarfdump
for 32-bit ELF REL.

llvm-svn: 171126
2012-12-27 01:07:07 +00:00
Nadav Rotem
4cad811734 If all of the write objects are identified then we can vectorize the loop even if the read objects are unidentified.
PR14719.

llvm-svn: 171124
2012-12-26 23:30:53 +00:00
Nadav Rotem
90712b89cc LoopVectorizer: Optimize the vectorization of consecutive memory access when the iteration step is -1
llvm-svn: 171114
2012-12-26 19:08:17 +00:00
Evgeniy Stepanov
3c52fb6e43 [msan] Raise alignment of origin stores/loads when possible.
Origin alignment is as high as the alignment of the corresponding application
location, but never less than 4.

llvm-svn: 171110
2012-12-26 11:55:09 +00:00
NAKAMURA Takumi
88036873a3 llvm/test/CodeGen/X86: FileCheck-ize two tests in r171083.
llvm-svn: 171084
2012-12-26 03:19:30 +00:00
NAKAMURA Takumi
faa91faa15 llvm/test/CodeGen/X86: Disable avx in two tests corresponding to r171082.
llvm-svn: 171083
2012-12-26 03:08:55 +00:00
Hal Finkel
7cfc448749 BBVectorize: Use VTTI to compute costs for intrinsics vectorization
For the time being this includes only some dummy test cases. Once the
generic implementation of the intrinsics cost function does something other
than assuming scalarization in all cases, or some target specializes the
interface, some real test cases can be added.

Also, for consistency, I changed the type of IID from unsigned to Intrinsic::ID
in a few other places.

llvm-svn: 171079
2012-12-26 01:36:57 +00:00
Hal Finkel
f9b3cb9121 LoopVectorize: Enable vectorization of the fmuladd intrinsic
llvm-svn: 171076
2012-12-25 23:21:29 +00:00
Hal Finkel
8299a9e0b2 BBVectorize: Enable vectorization of the fmuladd intrinsic
llvm-svn: 171075
2012-12-25 22:36:08 +00:00
Hal Finkel
62efc81644 Loosen scheduling restrictions on the PPC dcbt intrinsic
As with the prefetch intrinsic to which it maps, simply have dcbt
marked as reading from and writing to its arguments instead of having
unmodeled side effects. While this might cause unwanted code motion
(because aliasing checks don't really capture cache-line sharing),
it is more important that prefetches in unrolled loops don't block
the scheduler from rearranging the unrolled loop body.

llvm-svn: 171073
2012-12-25 18:51:18 +00:00
Hal Finkel
6b98f1baa2 Expand PPC64 atomic load and store
Use of store or load with the atomic specifier on 64-bit types would
cause instruction-selection failures. As with the 32-bit case, these
can use the default expansion in terms of cmp-and-swap.

llvm-svn: 171072
2012-12-25 17:22:53 +00:00
Evgeniy Stepanov
f41a8d635d [msan] Fix handling of vectors of pointers.
VectorType::getInteger() can not be used with them, because pointer size
depends on the target.

llvm-svn: 171070
2012-12-25 16:04:38 +00:00
Evgeniy Stepanov
e7fdcc5fe1 [msan] Fix handling of select with vector condition.
llvm-svn: 171069
2012-12-25 14:56:21 +00:00
Benjamin Kramer
8827e5db0d Harden test so it's not affected by changes to compare lowering.
This only failed on hosts that don't have SSE41.

llvm-svn: 171066
2012-12-25 13:23:23 +00:00
Benjamin Kramer
e4eabce005 X86: Shave off one shuffle from the pcmpeqq sequence for SSE2 by making use of and commutativity.
llvm-svn: 171064
2012-12-25 13:09:08 +00:00
Benjamin Kramer
2e313ba01d X86: Custom lower <2 x i64> eq and ne when SSE41 is not available.
pcmpeqd, pshufd, pshufd, pand is cheaper than unpack + cmpq, sbbq, cmpq, sbbq + pack.
Small speedup on loop-vectorized viterbi (-march=core2).

llvm-svn: 171063
2012-12-25 12:54:19 +00:00
Nick Lewycky
56ef0e9560 Fix typo "Makre" -> "Make".
llvm-svn: 171043
2012-12-24 19:55:47 +00:00
NAKAMURA Takumi
d92f25d49d llvm/test/CodeGen/X86/fold-vex.ll: Add explicit triple.
llvm-svn: 171029
2012-12-24 11:14:06 +00:00
Nadav Rotem
37de0ccb55 Some x86 instructions can load/store one of the operands to memory. On SSE, this memory needs to be aligned.
When these instructions are encoded in VEX (on AVX) there is no such requirement. This changes the folding
tables and removes the alignment restrictions from VEX-encoded instructions.

llvm-svn: 171024
2012-12-24 09:40:33 +00:00
Nadav Rotem
ace51e510e LoopVectorizer: When checking for vectorizable types, also check
the StoreInst operands.

PR14705.

llvm-svn: 171023
2012-12-24 09:14:18 +00:00
Nadav Rotem
309d628c4f LoopVectorizer: Fix an endless loop in the code that looks for reductions.
The bug was in the code that detects PHIs in if-then-else block sequence.

PR14701.

llvm-svn: 171008
2012-12-24 01:22:06 +00:00
Nadav Rotem
fb56b5fe2e CostModel: Change the default target-independent implementation for finding
the cost of arithmetic functions. We now assume that the cost of arithmetic
operations that are marked as Legal or Promote is low, but ops that are
marked as custom are higher.

llvm-svn: 171002
2012-12-23 17:31:23 +00:00
Nadav Rotem
479c7fc4de We are not ready to estimate the cost of integer expansions based on the number of parts. This test is too noisy.
llvm-svn: 170999
2012-12-23 09:11:07 +00:00
Nadav Rotem
e237376e62 Loop Vectorizer: Update the cost model of scatter/gather operations and make
them more expensive.

llvm-svn: 170995
2012-12-23 07:23:55 +00:00
Benjamin Kramer
52d36d0ccd X86: Turn mul of <4 x i32> into pmuludq when no SSE4.1 is available.
pmuludq is slow, but it turns out that all the unpacking and packing of the
scalarized mul is even slower. 10% speedup on loop-vectorized paq8p.

llvm-svn: 170985
2012-12-22 16:07:56 +00:00
Benjamin Kramer
a9cc52a6b3 X86: Emit vector sext as shuffle + sra if vpmovsx is not available.
Also loosen the SSSE3 dependency a bit, expanded pshufb + psra is still better
than scalarized loads. Fixes PR14590.

llvm-svn: 170984
2012-12-22 11:34:28 +00:00
Nadav Rotem
df77ac1ba8 In some cases, due to scheduling constraints we copy the EFLAGS.
The only way to read the eflags is using push and pop. If we don't
adjust the stack then we run over the first frame index. This is
not something that we want to do, so we have to make sure that
our machine function does not copy the flags. If it does then
we have to emit the prolog that adjusts the stack.

rdar://12896831

llvm-svn: 170961
2012-12-21 23:48:49 +00:00
Akira Hatanaka
4b2123dd9c [mips] Fix encoding of BAL instruction. Also, fix assembler test case which
was not catching the error.

llvm-svn: 170953
2012-12-21 23:13:59 +00:00
Benjamin Kramer
64cd3cc5b3 try to unbreak ppc buildbots.
llvm-svn: 170913
2012-12-21 18:11:45 +00:00
Benjamin Kramer
9297ff5d80 X86: Match pmin/pmax as a target specific dag combine. This occurs during vectorization.
Part of PR14667.

llvm-svn: 170908
2012-12-21 17:46:58 +00:00
Tom Stellard
19e8c4abd6 R600: Expand vec4 INT <-> FP conversions
llvm-svn: 170901
2012-12-21 16:33:24 +00:00
Evgeniy Stepanov
ea2d72253e [msan] Remove unreachable blocks before instrumenting a function.
llvm-svn: 170883
2012-12-21 11:18:49 +00:00