1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00
Commit Graph

3427 Commits

Author SHA1 Message Date
Nadav Rotem
adb4fb5903 Change the cpu type in the test.
llvm-svn: 172963
2013-01-20 05:20:56 +00:00
Benjamin Kramer
28c812f680 LoopVectorizer: Emit memory checks into their own basic block.
This separates the check for "too few elements to run the vector loop" from the
"memory overlap" check, giving a lot nicer code and allowing to skip the memory
checks when we're not going to execute the vector code anyways. We still leave
the decision of whether to emit the memory checks as branches or setccs, but it
seems to be doing a good job. If ugly code pops up we may want to emit them as
separate blocks too. Small speedup on MultiSource/Benchmarks/MallocBench/espresso.

Most of this is legwork to allow multiple bypass blocks while updating PHIs,
dominators and loop info.

llvm-svn: 172902
2013-01-19 13:57:58 +00:00
Bill Wendling
9449705327 Reverting r171325 & r172363. This was causing a mis-compile on the self-hosted LTO build bots.
Okay, here's how to reproduce the problem:

1) Build a Release (or Release+Asserts) version of clang in the normal way.

2) Using the clang & clang++ binaries from (1), build a Release (or
   Release+Asserts) version of the same sources, but this time enable LTO ---
   specify the `-flto' flag on the command line.

3) Run the ARC migrator tests:

    $ arcmt-test --args -triple x86_64-apple-darwin10 -fsyntax-only -x objective-c++ ./src/tools/clang/test/ARCMT/cxx-rewrite.mm

You'll see that the output isn't correct (the whitespace is off).

The mis-compile is in the function `RewriteBuffer::RemoveText' in the
clang/lib/Rewrite/Core/Rewriter.cpp file. When that function and RewriteRope.cpp
are compiled with LTO and the `arcmt-test' executable is regenerated, you'll see
the error. When those files are not LTO'ed, then the output of the `arcmt-test'
is fine.

It is *really* hard to get a testcase out of this. I'll file a PR with what I
have currently.

--- Reverse-merging r172363 into '.':
U    include/llvm/Analysis/MemoryBuiltins.h
U    lib/Analysis/MemoryBuiltins.cpp

--- Reverse-merging r171325 into '.':
U    test/Transforms/InstCombine/objsize.ll
G    include/llvm/Analysis/MemoryBuiltins.h
G    lib/Analysis/MemoryBuiltins.cpp

llvm-svn: 172756
2013-01-17 21:28:46 +00:00
Michael Gottesman
fd06af4ad6 Added test for r172599 which fixes bugzilla://14584,rdar://11744105.
llvm-svn: 172656
2013-01-16 21:07:18 +00:00
Benjamin Kramer
efb9c9f20b Move test that depends on the x86 target into a target-specific directory.
Should fix the arm buildbot (which only builds the arm target).

llvm-svn: 172611
2013-01-16 13:25:56 +00:00
Benjamin Kramer
5c109df93a Remove triple from this test, it makes it fail when X86 TTI is missing.
Without a triple opt falls back to NoTTI which comes closer to LSR's pre-TTI behavior.

llvm-svn: 172609
2013-01-16 13:19:59 +00:00
Nadav Rotem
24c96fa80c Teach InstCombine to optimize extract of a value from a vector add operation with a constant zero.
llvm-svn: 172576
2013-01-15 23:43:14 +00:00
Shuxin Yang
36c3cb1b3b 1. Hoist minus sign as high as possible in an attempt to reveal
some optimization opportunities (in the enclosing supper-expressions).

   rule 1. (-0.0 - X ) * Y => -0.0 - (X * Y)
     if expression "-0.0 - X" has only one reference.

   rule 2. (0.0 - X ) * Y => -0.0 - (X * Y)
     if expression "0.0 - X" has only one reference, and
        the instruction is marked "noSignedZero".

2. Eliminate negation (The compiler was already able to handle these
    opt if the 0.0s are replaced with -0.0.)

   rule 3: (0.0 - X) * (0.0 - Y) => X * Y
   rule 4: (0.0 - X) * C => X * -C
   if the expr is flagged "noSignedZero".

3. 
  Rule 5: (X*Y) * X => (X*X) * Y
   if X!=Y and the expression is flagged with "UnsafeAlgebra".

   The purpose of this transformation is two-fold:
    a) to form a power expression (of X).
    b) potentially shorten the critical path: After transformation, the
       latency of the instruction Y is amortized by the expression of X*X,
       and therefore Y is in a "less critical" position compared to what it
      was before the transformation. 

4. Remove the InstCombine code about simplifiying "X * select".
   
   The reasons are following:
    a) The "select" is somewhat architecture-dependent, therefore the
       higher level optimizers are not able to precisely predict if
       the simplification really yields any performance improvement
       or not.

    b) The "select" operator is bit complicate, and tends to obscure
       optimization opportunities. It is btter to keep it as low as
       possible in expr tree, and let CodeGen to tackle the optimization.

llvm-svn: 172551
2013-01-15 21:09:32 +00:00
Renato Golin
af8f3454c8 Pattern-matched variables in post-inc-icmpzero.ll
Test was failing for clang-native-arm-cortex-a9 build-bot configuration.
The reason for the failure was the test was using hardcoded names.
The attached patch fixes this failure by replacing the hard-coded variables
names with pattern-matched variable names.

Patch by Manish Verma, ARM

llvm-svn: 172534
2013-01-15 15:22:45 +00:00
Shuxin Yang
974390c4aa This change is to implement following rules under the condition C_A and/or C_R
---------------------------------------------------------------------------
 C_A: reassociation is allowed
 C_R: reciprocal of a constant C is appropriate, which means 
    - 1/C is exact, or 
    - reciprocal is allowed and 1/C is neither a special value nor a denormal.
 -----------------------------------------------------------------------------

 rule1:  (X/C1) / C2 => X / (C2*C1)  (if C_A)
                     => X * (1/(C2*C1))  (if C_A && C_R)
 rule 2:  X*C1 / C2 => X * (C1/C2)  if C_A
 rule 3: (X/Y)/Z = > X/(Y*Z)  (if C_A && at least one of Y and Z is symbolic value)
 rule 4: Z/(X/Y) = > (Z*Y)/X  (similar to rule3)

 rule 5: C1/(X*C2) => (C1/C2) / X (if C_A)
 rule 6: C1/(X/C2) => (C1*C2) / X (if C_A)
 rule 7: C1/(C2/X) => (C1/C2) * X (if C_A)

llvm-svn: 172488
2013-01-14 22:48:41 +00:00
Andrew Trick
841ad0f303 SCEVExpander fix. RAUW needs to update the InsertedExpressions cache.
Note that this bug is only exposed because LTO fails to use TTI.

Fixes self-LTO of clang. rdar://13007381.

llvm-svn: 172462
2013-01-14 21:00:37 +00:00
Michael Gottesman
39de946871 Added bugzilla PR number to test case.
llvm-svn: 172369
2013-01-13 22:17:22 +00:00
Michael Gottesman
c6a7902080 Fixed an infinite loop in the block escape in analysis in ObjCARC caused by 2x blocks each assigned a value via a phi-node causing each to depend on the other.
A test case is provided as well.

llvm-svn: 172368
2013-01-13 22:12:06 +00:00
Nadav Rotem
c6cce40085 Fix PR14547. Handle induction variables of small sizes smaller than i32 (i8 and i16).
llvm-svn: 172348
2013-01-13 07:56:29 +00:00
Michael Gottesman
822cc01174 Fixed bug in ObjCARC where we were changing a call from objc_autoreleaseRV => objc_autorelease but were not updating the InstructionClass to IC_Autorelease.
llvm-svn: 172288
2013-01-12 01:25:19 +00:00
Michael Gottesman
c59f1a814a Fixed a bug where we were tail calling objc_autorelease causing an object to not be placed into an autorelease pool.
The reason that this occurs is that tail calling objc_autorelease eventually
tail calls -[NSObject autorelease] which supports fast autorelease. This can
cause us to violate the semantic gaurantees of __autoreleasing variables that
assignment to an __autoreleasing variables always yields an object that is
placed into the innermost autorelease pool.

The fix included in this patch works by:

1. In the peephole optimization function OptimizeIndividualFunctions, always
remove tail call from objc_autorelease.
2. Whenever we convert to/from an objc_autorelease, set/unset the tail call
keyword as appropriate.

*NOTE* I also handled the case where objc_autorelease is converted in
OptimizeReturns to an autoreleaseRV which still violates the ARC semantics. I
will be removing that in a later patch and I wanted to make sure that the tree
is in a consistent state vis-a-vis ARC always.

Additionally some test cases are provided and all tests that have tail call marked
objc_autorelease keywords have been modified so that tail call has been removed.

*NOTE* One test fails due to a separate bug that I am going to commit soon. Thus
I marked the check line TMP: instead of CHECK: so make check does not fail.

llvm-svn: 172287
2013-01-12 01:25:15 +00:00
Nadav Rotem
c79f1aa3f4 ARM Cost Model: Modify the target independent cost model to ask
the target if it supports the different CAST types. We didn't do this
on X86 because of the different register sizes and types, but on ARM
this makes sense.

llvm-svn: 172245
2013-01-11 19:54:13 +00:00
Nadav Rotem
008741a0e0 ARM Cost Model: We need to detect the max bitwidth of types in the loop in order to select the max vectorization factor.
We don't have a detailed analysis on which values are vectorized and which stay scalars in the vectorized loop so we use
another method. We look at reduction variables, loads and stores, which are the only ways to get information in and out
of loop iterations. If the data types are extended and truncated then the cost model will catch the cost of the vector
zext/sext/trunc operations.

llvm-svn: 172178
2013-01-11 07:11:59 +00:00
Michael Gottesman
0a961cfd24 Converted test dont-tce-tail-marked-call.ll to use FileCheck.
llvm-svn: 172172
2013-01-11 04:16:35 +00:00
Michael Gottesman
93204bc270 This commit is a 4x squash commit consisting of 4x functions converted to use FileCheck instead of grep.
Messages:
Converted test case trivial_codegen_tailcall.ll to use FileCheck.
Converted test return_constant.ll to use FileCheck instead of grep.
Converted test reorder_load.ll to use FileCheck instead of grep.
Converted test intervening-inst.ll to use FileCheck instead of grep.

llvm-svn: 172171
2013-01-11 04:12:53 +00:00
Shuxin Yang
5df00cb3b4 PR14904: Segmentation fault running pass 'Recognize loop idioms'
The root cause is mistakenly taking for granted that 
    "dyn_cast<Instruction>(a-Value)"
return a non-NULL instruction.

llvm-svn: 172145
2013-01-10 23:32:01 +00:00
Evan Cheng
2494d83c07 CastInst::castIsValid should return true if the dest type is the same as
Value's current type. The casting is trivial even for aggregate type.

llvm-svn: 172143
2013-01-10 23:22:53 +00:00
Owen Anderson
9e81fe53cf Teach InstCombine to hoist FABS and FNEG through FPTRUNC instructions. The application of these operations commutes with the truncation, so we should prefer to do them in the smallest size we can, to save register space, use smaller constant pool entries, etc.
llvm-svn: 172117
2013-01-10 22:06:52 +00:00
Nadav Rotem
d5f59a81d9 LoopVectorizer: Fix a bug in the vectorization of BinaryOperators. The BinaryOperator can be folded to an Undef, and we don't want to set NSW flags to undef vals.
PR14878

llvm-svn: 172079
2013-01-10 17:34:39 +00:00
Joey Gouly
f0946a3d34 Fix TryToShrinkGlobalToBoolean in GlobalOpt, so that it does not discard address spaces.
llvm-svn: 172051
2013-01-10 10:31:11 +00:00
Nadav Rotem
436dc952aa ARM Cost model: Use the size of vector registers and widest vectorizable instruction to determine the max vectorization factor.
llvm-svn: 172010
2013-01-09 22:29:00 +00:00
Benjamin Kramer
953e24a567 LICM: Hoist insertvalue/extractvalue out of loops.
Fixes PR14854.

llvm-svn: 171984
2013-01-09 18:12:03 +00:00
Nadav Rotem
4d391c52b0 ARM Cost Model: Add a basic vectorization unrolling test.
llvm-svn: 171931
2013-01-09 01:29:07 +00:00
Nadav Rotem
c14d0d93d9 Remove the -licm pass from the loop vectorizer test because the loop vectorizer does it now.
llvm-svn: 171930
2013-01-09 01:20:59 +00:00
Nadav Rotem
9c27f36e59 Cost Model: Move the 'max unroll factor' variable to the TTI and add initial Cost Model support on ARM.
llvm-svn: 171928
2013-01-09 01:15:42 +00:00
Shuxin Yang
2dc1fb7889 Consider expression "0.0 - X" as the negation of X if
- this expression is explicitly marked no-signed-zero, or
  - no-signed-zero of this expression can be derived from some context.

llvm-svn: 171922
2013-01-09 00:13:41 +00:00
Bill Wendling
c63cf71b21 Make sure we don't emit instructions before a landingpad instruction.
PR14782

llvm-svn: 171846
2013-01-08 10:51:32 +00:00
Nadav Rotem
4aa065a2a3 LoopVectorizer: Add support for floating point reductions
llvm-svn: 171812
2013-01-07 23:13:00 +00:00
Nadav Rotem
5906222eab LoopVectorizer: When we vectorizer and widen loops we process many elements at once. This is a good thing, except for
small loops. On small loops post-loop that handles scalars (and runs slower) can take more time to execute than the
rest of the loop. This patch disables widening of loops with a small static trip count.

llvm-svn: 171798
2013-01-07 21:54:51 +00:00
Shuxin Yang
9c476471f2 This change is to implement following rules:
o. X/C1 * C2 => X * (C2/C1) (if C2/C1 is neither special FP nor denormal)
  o. X/C1 * C2 -> X/(C1/C2)   (if C2/C1 is either specical FP or denormal, but C1/C2 is a normal Fp)

     Let MDC denote multiplication or dividion with one & only one operand being a constant
  o. (MDC ± C1) * C2 => (MDC * C2) ± (C1 * C2)
     (so long as the constant-folding doesn't yield any denormal or special value)

llvm-svn: 171793
2013-01-07 21:39:23 +00:00
Quentin Colombet
994ef807c2 When code size is the priority (Oz, MinSize attribute), help llvm
turning a code like this:

if (foo)
   free(foo)

into that:
free(foo)

Move a call to free from basic block FB into FB's predecessor, P,
when the path from P to FB is taken only if the argument of free is
not equal to NULL.

Some restrictions apply on P and FB to be sure that this code motion
is profitable. Namely:
1. FB must have only one predecessor P.
2. FB must contain only the call to free plus an unconditional
   branch to S.
3. P's successors are FB and S.

Because of 1., we will not increase the code size when moving the call
to free from FB to P.
Because of 2., FB will be empty after the move.
Because of 2. and 3., P's branch instruction becomes useless, so as FB
(simplifycfg will do the job).

llvm-svn: 171762
2013-01-07 18:37:41 +00:00
Chandler Carruth
0e84971ae2 Switch the SCEV expander and LoopStrengthReduce to use
TargetTransformInfo rather than TargetLowering, removing one of the
primary instances of the layering violation of Transforms depending
directly on Target.

This is a really big deal because LSR used to be a "special" pass that
could only be tested fully using llc and by looking at the full output
of it. It also couldn't run with any other loop passes because it had to
be created by the backend. No longer is this true. LSR is now just
a normal pass and we should probably lift the creation of LSR out of
lib/CodeGen/Passes.cpp and into the PassManagerBuilder. =] I've not done
this, or updated all of the tests to use opt and a triple, because
I suspect someone more familiar with LSR would do a better job. This
change should be essentially without functional impact for normal
compilations, and only change behvaior of targetless compilations.

The conversion required changing all of the LSR code to refer to the TTI
interfaces, which fortunately are very similar to TargetLowering's
interfaces. However, it also allowed us to *always* expect to have some
implementation around. I've pushed that simplification through the pass,
and leveraged it to simplify code somewhat. It required some test
updates for one of two things: either we used to skip some checks
altogether but now we get the default "no" answer for them, or we used
to have no information about the target and now we do have some.

I've also started the process of removing AddrMode, as the TTI interface
doesn't use it any longer. In some cases this simplifies code, and in
others it adds some complexity, but I think it's not a bad tradeoff even
there. Subsequent patches will try to clean this up even further and use
other (more appropriate) abstractions.

Yet again, almost all of the formatting changes brought to you by
clang-format. =]

llvm-svn: 171735
2013-01-07 14:41:08 +00:00
David Tweed
fc87109159 Fix a mistaken commit that included some debugging code.
llvm-svn: 171734
2013-01-07 13:41:55 +00:00
David Tweed
6e4a1bff34 There was a switch fall-through in the parser for textual LLVM that caused
bogus comparison operands to default to eq/oeq. Fix that, fix a couple of
tests that accidentally passed and test for bogus comparison opeartors
explicitly.

llvm-svn: 171733
2013-01-07 13:32:38 +00:00
Chandler Carruth
3487258579 Switch BBVectorize to directly depend on having a TTI analysis.
This could be simplified further, but Hal has a specific feature for
ignoring TTI, and so I preserved that.

Also, I needed to use it because a number of tests fail when switching
from a null TTI to the NoTTI nonce implementation. That seems suspicious
to me and so may be something that you need to look into Hal. I worked
it by preserving the old behavior for these tests with the flag that
ignores all target info.

llvm-svn: 171722
2013-01-07 10:22:36 +00:00
Andrew Trick
8c90c2b2d7 Fix a crash in LSR replaceCongruentIVs.
Indirect branch in the preheader crashes replaceCongruentIVs.
Fixes rdar://12910141.

llvm-svn: 171653
2013-01-06 05:59:39 +00:00
Nadav Rotem
9eefe3aaf8 Fix a typo. Remove the duplicated test.
llvm-svn: 171584
2013-01-05 01:17:46 +00:00
Nadav Rotem
836b9a9fda iLoopVectorize: Non commutative operators can be used as reduction variables as long as the reduction chain is used in the LHS.
PR14803.

llvm-svn: 171583
2013-01-05 01:15:47 +00:00
Nadav Rotem
ef10b99294 Force a fixed unroll count on the target independent tests.
This should fix clang-native-arm-cortex-a9. Thanks Renato.

llvm-svn: 171582
2013-01-05 00:58:48 +00:00
Andrew Trick
7bdf2be629 tabs-to-spaces
llvm-svn: 171550
2013-01-04 23:11:35 +00:00
Paul Redmond
6ce33a6ae9 Do not vectorize loops with subtraction reductions
Since subtraction does not commute the loop vectorizer incorrectly vectorizes
reductions such as x = A[i] - x.

Disabling for now.

llvm-svn: 171537
2013-01-04 22:10:16 +00:00
Manman Ren
6e5ea3bafd Memory Dependence Analysis: fix a miscompile that uses DT to approxmiate the
reachablity.

We conservatively approximate the reachability analysis by saying it is not
reachable if there is a single path starting from "From" and the path does not
reach "To".

rdar://12801584

llvm-svn: 171512
2013-01-04 19:19:47 +00:00
Nadav Rotem
cb3562a88e LoopVectorizer:
1. Add code to estimate register pressure.
2. Add code to select the unroll factor based on register pressure.
3. Add bits to TargetTransformInfo to provide the number of registers.

llvm-svn: 171469
2013-01-04 17:48:25 +00:00
Nadav Rotem
ea6706d777 LoopVectorizer: Test the unrolling flag.
llvm-svn: 171446
2013-01-03 01:47:31 +00:00
Nadav Rotem
a7cac72b7d Avoid vectorization when the function has the "noimplicitflot" attribute.
llvm-svn: 171429
2013-01-02 23:54:43 +00:00