Mirror of https://github.com/RPCS3/llvm-mirror.git (synced 2024-10-21 12:02:58 +02:00)
Commit Graph

87254 Commits

Author SHA1 Message Date
Chad Rosier
175af3914a Add a triple to this test.
llvm-svn: 169803
2012-12-11 00:51:36 +00:00
Chandler Carruth
ac8f03ddc1 Fix a miscompile in the DAG combiner. Previously, we would incorrectly
try to reduce the width of this load, and would end up transforming:

  (truncate (lshr (sextload i48 <ptr> as i64), 32) to i32)
to
  (truncate (zextload i32 <ptr+4> as i64) to i32)

We lost the sext attached to the load while building the narrower i32
load, and replaced it with a zext because lshr always zero-extends its
results. Instead, bail out of this combine when there is a conflict
between a sextload and a zext narrowing. The rest of the DAG combiner
still optimizes the code down to the proper single instruction:

  movswl 6(...),%eax

Which is exactly what we wanted. Previously we read past the end *and*
missed the sign extension:

  movl 6(...), %eax

llvm-svn: 169802
2012-12-11 00:36:57 +00:00
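
For illustration, a minimal C++ sketch of the kind of source that can lower to
the (truncate (lshr (sextload i48), 32)) pattern described in the commit above.
This is not taken from the commit; the function name, the little-endian 6-byte
field layout, and whether a given compiler forms this exact DAG are assumptions.

  #include <cstdint>
  #include <cstring>

  // Read a little-endian, sign-extended 48-bit field starting at p and return
  // its upper 16 bits as a 32-bit integer. A correct narrowing combine must
  // preserve the sign extension; the bug above turned it into a zero extension.
  int32_t high16_of_s48(const unsigned char *p) {
    uint64_t raw = 0;
    std::memcpy(&raw, p, 6);                             // the i48 load
    int64_t v = static_cast<int64_t>(raw << 16) >> 16;   // sign-extend from bit 47
    return static_cast<int32_t>(v >> 32);                // shift right by 32, truncate
  }
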
Paul Redmond
fde20fa567 move X86-specific test
This test case uses -mcpu=corei7, so it belongs in CodeGen/X86.

Reviewed by: Nadav

llvm-svn: 169801
2012-12-11 00:36:43 +00:00
Bill Wendling
10c1be166f Fix grammar-o.
llvm-svn: 169798
2012-12-11 00:23:07 +00:00
Chad Rosier
0b2e4a1ba8 Fall back to the selection dag isel to select tail calls.
This shouldn't affect codegen for -O0 compiles as tail call markers are not
emitted in unoptimized compiles.  Testing with the external/internal nightly
test suite reveals no change in compile-time performance.  Testing with -O1,
-O2 and -O3 with fast-isel enabled did not cause any compile-time or
execution-time failures.  All tests were performed on my x86 machine.
I'll monitor our ARM testers to ensure no regressions occur there.

In an upcoming clang patch I will be marking objc_autoreleaseReturnValue
and objc_retainAutoreleaseReturnValue as tail calls unconditionally.  While
it's theoretically true that this is just an optimization, it's an
optimization that we very much want to happen even at -O0, or else ARC
applications become substantially harder to debug.

Part of rdar://12553082

llvm-svn: 169796
2012-12-11 00:18:02 +00:00
Eric Christopher
2d11b002bc Refactor out the abbreviation handling into a separate class that
controls each of the abbreviation sets (only a single one at the
moment) and computes offsets separately as well for each set
of DIEs.

No real functional change; the ordering of abbreviations for the skeleton
CU changed, but only because we're computing them in a separate order. Fixed
the testcase not to care.

llvm-svn: 169793
2012-12-10 23:34:43 +00:00
Evan Cheng
86dd733bc8 Some enhancements for memcpy / memset inline expansion.
1. Teach it to use overlapping unaligned load / store to copy / set the trailing
   bytes. e.g. On x86, use two pairs of movups / movaps for 17 - 31 byte copies
   (see the sketch after this entry).
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
   x86 and ARM.
3. When memcpy from a constant string, do *not* replace the load with a constant
   if it's not possible to materialize an integer immediate with a single
   instruction (required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned loads / stores more aggressively if target hooks indicate they
   are "fast".
5. Update ARM target hooks to use unaligned loads / stores. e.g. vld1.8 / vst1.8.
   Also increase the threshold to something reasonable (8 for memset, 4 pairs
   for memcpy).

This significantly improves Dhrystone, up to 50% on ARM iOS devices.

rdar://12760078

llvm-svn: 169791
2012-12-10 23:21:26 +00:00
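
To illustrate point 1 of the commit above, a stand-alone C++ sketch of the
overlapping-copy trick for trailing bytes. This is illustrative source code, not
the SelectionDAG expansion the commit changes; the 16-byte block size and the
function name are assumptions, and a backend would emit the two block copies as
unaligned vector loads / stores (e.g. movups on x86) rather than libc calls.

  #include <cstddef>
  #include <cstring>

  // Copy n bytes (17 <= n <= 31) with exactly two 16-byte block copies: one for
  // the first 16 bytes and one, overlapping the first, for the last 16 bytes.
  // This avoids a scalar byte loop for the odd-sized tail.
  void copy_17_to_31(void *dst, const void *src, std::size_t n) {
    char head[16], tail[16];
    std::memcpy(head, src, 16);                                      // first 16 bytes
    std::memcpy(tail, static_cast<const char *>(src) + n - 16, 16);  // last 16 bytes
    std::memcpy(dst, head, 16);
    std::memcpy(static_cast<char *>(dst) + n - 16, tail, 16);        // overlaps the head store
  }
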
Arnold Schwaighofer
182d1ce4b7 Optimistically analyse Phi cycles
Analyse Phis under the starting assumption that they are NoAlias. Recursively
look at their inputs.
If they MayAlias/MustAlias, there must be an input that makes them so.

Addresses bug 14351.

llvm-svn: 169788
2012-12-10 23:02:41 +00:00
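
A rough, self-contained C++ sketch of the optimistic scheme described above. This
is not the BasicAA implementation; the toy Value model, the AliasResult enum, and
the way inputs are checked against the other value are simplifying assumptions.

  #include <set>
  #include <utility>
  #include <vector>

  enum class AliasResult { NoAlias, MayAlias, MustAlias };

  // Toy value: either a base object identified by allocId, or a phi of inputs.
  struct Value {
    int allocId = -1;                      // meaningful only when phiInputs is empty
    std::vector<const Value *> phiInputs;  // non-empty for a phi
  };

  // Base case for non-phi values (grossly simplified).
  static AliasResult aliasLeaves(const Value &a, const Value &b) {
    return a.allocId == b.allocId ? AliasResult::MustAlias : AliasResult::NoAlias;
  }

  // Optimistically assume a (phi, value) query is NoAlias while the phi's inputs
  // are being examined; if some input aliases the other value, report MayAlias.
  static AliasResult alias(const Value &a, const Value &b,
                           std::set<std::pair<const Value *, const Value *>> &assumed) {
    if (a.phiInputs.empty() && b.phiInputs.empty())
      return aliasLeaves(a, b);
    const Value &phi = a.phiInputs.empty() ? b : a;
    const Value &other = a.phiInputs.empty() ? a : b;
    if (!assumed.insert({&phi, &other}).second)
      return AliasResult::NoAlias;         // already in a cycle: keep the assumption
    for (const Value *in : phi.phiInputs)
      if (alias(*in, other, assumed) != AliasResult::NoAlias)
        return AliasResult::MayAlias;      // some input makes the pair alias
    return AliasResult::NoAlias;
  }
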
Lang Hames
313bb2d202 Defer call to InitSections until after MCContext has been initialized. If
InitSections is called before the MCContext is initialized, it could cause
duplicate temporary symbols to be emitted later (after context initialization
resets the temporary label counter).

llvm-svn: 169785
2012-12-10 22:49:11 +00:00
Anshuman Dasgupta
38feea3c57 Fix PR14568: Prevent the DFA packetizer from making an invalid read
beyond array bounds.

No test case since I cannot reproduce an ICE with this bug. According
to Carlos -- the bug reporter -- a segfault occurs only when LLVM is
compiled with a specific version of GCC.

llvm-svn: 169783
2012-12-10 22:45:57 +00:00
Eric Christopher
b3b9b702cb Rearrange vars and make comments more obvious.
llvm-svn: 169780
2012-12-10 22:25:41 +00:00
Eric Christopher
5b2c77f097 Remove blank line at top of file.
llvm-svn: 169779
2012-12-10 22:25:38 +00:00
Eric Christopher
9fed81d6be Fix a coding style nit.
llvm-svn: 169776
2012-12-10 22:00:20 +00:00
Nadav Rotem
a043fa4083 Enable the loop vectorizer only at -O2 and above. (Still disabled by default.)
llvm-svn: 169774
2012-12-10 21:45:01 +00:00
Tom Stellard
3801f0fed5 LegalizeDAG: Allow type promotion of scalar loads
llvm-svn: 169773
2012-12-10 21:41:58 +00:00
Tom Stellard
c8da3bd0a1 LegalizeDAG: Allow type promotion for scalar stores
llvm-svn: 169772
2012-12-10 21:41:54 +00:00
Nadav Rotem
417eaafbc4 Split the LoopVectorizer into H and CPP.
llvm-svn: 169771
2012-12-10 21:39:02 +00:00
Bill Wendling
0af6f08453 Revert r169656.
The linker will call `lto_codegen_add_must_preserve_symbol' on all globals that
should be kept around. The linker will pretend that a dylib is being created.
<rdar://problem/12528059>

llvm-svn: 169770
2012-12-10 21:33:45 +00:00
Eli Bendersky
139a219553 Add a test for explicitly exercising the mc-relax-all flag.
llvm-svn: 169764
2012-12-10 20:36:01 +00:00
Eli Bendersky
074c9e1b36 Cleanup formatting, comments and naming.
llvm-svn: 169762
2012-12-10 20:13:43 +00:00
Akira Hatanaka
c10e48ba6a [mips] Set HWEncoding field of registers. Delete function
getMipsRegisterNumbering and use MCRegisterInfo::getEncodingValue instead.

llvm-svn: 169760
2012-12-10 20:04:40 +00:00
Eric Christopher
c67794597d Use the somewhat semantic term "split dwarf"; it better matches what's
going on and makes a lot of the terminology in comments make more sense.

llvm-svn: 169758
2012-12-10 19:51:21 +00:00
Eric Christopher
2bf7bdcd23 Delete the FissionCU.
llvm-svn: 169757
2012-12-10 19:51:18 +00:00
Eric Christopher
67243c354a Reorder fission variables.
llvm-svn: 169756
2012-12-10 19:51:13 +00:00
Bill Wendling
bb1f8f293a Don't use a red zone for code coverage if the user specified `-mno-red-zone'.
The `-mno-red-zone' flag wasn't being propagated to the functions that code
coverage generates. This allowed some of them to use the red zone when that
wasn't allowed.
<rdar://problem/12843084>

llvm-svn: 169754
2012-12-10 19:46:49 +00:00
Nadav Rotem
196fc7cc8c Add support for reverse induction variables. For example:
while (i--)
  sum += A[i];

llvm-svn: 169752
2012-12-10 19:25:06 +00:00
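
A hedged, compilable C++ version of the example above (the function name and types
are illustrative; whether the vectorizer actually transforms the loop depends on
the target and its cost model):

  // Sums A[i-1], A[i-2], ..., A[0] using a reverse induction variable, the
  // pattern this commit teaches the loop vectorizer to recognize.
  int sum_reversed(const int *A, int i) {
    int sum = 0;
    while (i--)
      sum += A[i];
    return sum;
  }
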
Jim Grosbach
14cbefa1f6 CMake: Don't run 'git svn' if there is no .git/svn directory.
If the local checkout does not have 'git svn' references set up, don't try
to use 'git svn' for version information.

llvm-svn: 169749
2012-12-10 19:03:37 +00:00
Eli Bendersky
9c8e9c6edd This patch adds statistics for other non-DWARF fragments emitted by
the assembler. This is useful in order to know how the numbers add up,
since in particular the Align fragments account for a non-trivial
portion of the emitted fragments (especially at -O0, which sets
relax-all).

llvm-svn: 169747
2012-12-10 18:59:39 +00:00
Hal Finkel
3b65689ab9 Use GetUnderlyingObjects in misched
misched used GetUnderlyingObject in order to break false load/store
dependencies, and the -enable-aa-sched-mi feature similarly relied on
GetUnderlyingObject in order to ensure it is safe to use the aliasing analysis.
Unfortunately, GetUnderlyingObject does not recurse through phi nodes, and so
(especially due to LSR) all of these mechanisms failed for
induction-variable-dependent loads and stores inside loops.

This change replaces uses of GetUnderlyingObject with GetUnderlyingObjects
(which will recurse through phi and select instructions) in misched.

Andy reviewed, tested and simplified this patch; Thanks!

llvm-svn: 169744
2012-12-10 18:49:16 +00:00
Sean Silva
ffc628ff80 Fix funky copy-pasted grammatical error.
PR14343

llvm-svn: 169742
2012-12-10 18:37:26 +00:00
Chandler Carruth
7e4aad1c1f Revert "Make '-mtune=x86_64' assume fast unaligned memory accesses."
Accidental commit... git svn betrayed me. Sorry for the noise.

llvm-svn: 169741
2012-12-10 18:23:52 +00:00
Chandler Carruth
a64587b996 Make '-mtune=x86_64' assume fast unaligned memory accesses.
Summary:
Not all chips targeted by x86_64 have this feature, but a dramatically
increasing number do. Specifying a chip-specific tuning parameter will
continue to turn the feature on or off as appropriate for that
particular chip, but the generic flag should try to achieve the best
performance on the most widely available hardware. Today, the number of
chips with fast UA access dwarfs those without in the x86-64 space.

Note that this also brings LLVM's code generation for this '-march' flag
more in line with that of modern GCCs.

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D195

llvm-svn: 169740
2012-12-10 18:22:42 +00:00
Chandler Carruth
d5a4e512ca Fix a typo in my previous commit -- bloomfield is 0x1A not 0x2A.
Thanks to the PaX folks for noticing in review! We need some tests here,
any suggestions welcome...

llvm-svn: 169739
2012-12-10 18:22:40 +00:00
Chandler Carruth
3e5f2c328d Address a FIXME and update the fast unaligned memory feature for newer
Intel chips.

The model number rules were determined by inspecting Intel's
documentation for their newer chip model numbers. My understanding is
that all of the newer Intel chips have fast unaligned memory access, but
if anyone is concerned about a particular chip, just shout.

No tests updated; it's not clear we have dedicated tests for the chips'
various features, but if anyone would like tests (or can point me at
some existing ones), I'm happy to oblige.

llvm-svn: 169730
2012-12-10 09:18:44 +00:00
Chandler Carruth
4686de879c Add a new visitor for walking the uses of a pointer value.
This visitor provides infrastructure for recursively traversing the
use-graph of a pointer-producing instruction like an alloca or a malloc.
It maintains a worklist of uses to visit, so it can handle very deep
recursions. It automatically looks through instructions which simply
translate one pointer to another (bitcasts and GEPs). It tracks the
offset relative to the original pointer as long as that offset remains
constant and exposes it during the visit as an APInt offset. Finally, it
performs conservative escape analysis.

However, currently it has some limitations that should be addressed
going forward:
1) It doesn't handle vectors of pointers.
2) It doesn't provide a cheaper visitor when the constant offset
   tracking isn't needed.
3) It doesn't support non-instruction pointer values.

The current functionality is exactly what is required to implement the
SROA pointer-use visitors in terms of this one, rather than in terms of
their own ad-hoc base visitor, which was always very poorly specified.
SROA has been converted to use this, and the code that this utility now
provides has been deleted there.

Technically speaking, using this new visitor allows SROA to handle a few
more cases than it previously did. It is now more aggressive in ignoring
chains of instructions which look like they would defeat SROA, but in
fact do not because they never result in a read or write of memory.
While this is "neat", it shouldn't be interesting for real programs as
any such chains should have been removed by other passes long before we
get to SROA. As a consequence, I've not added any tests for these
features -- it shouldn't be part of SROA's contract to perform such
heroics.

The goal is to extend the functionality of this visitor going forward,
and re-use it from passes like ASan that can benefit from doing
a detailed walk of the uses of a pointer.

Thanks to Ben Kramer for the code review rounds and lots of help
reviewing and debugging this patch.

llvm-svn: 169728
2012-12-10 08:28:39 +00:00
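
A compact, self-contained C++ sketch of the worklist-driven traversal described
above. This is not the LLVM visitor's API; the toy instruction model, the plain
int64_t offset (the real code tracks an APInt, and only while it stays constant),
and the blunt escape rule are simplifying assumptions, and deduplication of
revisited users is omitted for brevity.

  #include <cstdint>
  #include <deque>
  #include <vector>

  // Toy user of a pointer: it either forwards the pointer (Bitcast, Gep with a
  // constant byte offset), accesses it (Load, Store), or does something we
  // cannot reason about (Other), which is treated as an escape.
  enum class Kind { Bitcast, Gep, Load, Store, Other };

  struct Inst {
    Kind kind;
    int64_t gepOffset = 0;        // only meaningful for Gep
    std::vector<Inst *> users;    // users of this instruction's result
  };

  struct PtrInfo {
    bool escaped = false;         // pointer leaked to something opaque
    bool accessed = false;        // saw a load or store through the pointer
  };

  // Walk all transitive uses of root with an explicit worklist (so deep chains
  // do not overflow the native stack), tracking the constant byte offset from
  // the original pointer through bitcasts and GEPs.
  PtrInfo visitPtrUses(Inst &root) {
    PtrInfo info;
    struct Item { Inst *inst; int64_t offset; };
    std::deque<Item> worklist;
    for (Inst *u : root.users)
      worklist.push_back({u, 0});
    while (!worklist.empty()) {
      Item it = worklist.front();
      worklist.pop_front();
      switch (it.inst->kind) {
      case Kind::Bitcast:
      case Kind::Gep: {
        int64_t off =
            it.offset + (it.inst->kind == Kind::Gep ? it.inst->gepOffset : 0);
        for (Inst *u : it.inst->users)   // look through pointer-to-pointer ops
          worklist.push_back({u, off});
        break;
      }
      case Kind::Load:
      case Kind::Store:
        info.accessed = true;            // real memory access at it.offset
        break;
      case Kind::Other:
        info.escaped = true;             // conservative escape
        break;
      }
    }
    return info;
  }
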
Craig Topper
0f4945c76d Teach DAG combine to handle vector add/sub with vectors of all 0s.
llvm-svn: 169727
2012-12-10 08:12:29 +00:00
NAKAMURA Takumi
db0a1830f6 [CMake] TARGET_TRIPLE may be an internal alias of LLVM_DEFAULT_TARGET_TRIPLE.
llvm-svn: 169726
2012-12-10 07:14:29 +00:00
NAKAMURA Takumi
10a9cdfc27 [CMake] Update dependencies to intrinsics_gen corresponding to r169711.
llvm-svn: 169724
2012-12-10 05:27:15 +00:00
Bill Wendling
5d24fa9e9d Revert to old behavior until the linker can pass the export-dynamic option.
llvm-svn: 169720
2012-12-10 02:51:16 +00:00
Chandler Carruth
c9b6bd9712 Fix PR14548: SROA was crashing on a mixture of i1 and i8 loads and stores.
When SROA was evaluating a mixture of i1 and i8 loads and stores, in
just a particular case, it would tickle a latent bug where we compared
bits to bytes rather than bits to bits. As a consequence of the latent
bug, we would allow integers through which were not byte-size multiples,
a situation the later rewriting code was never intended to handle.

In release builds this could trigger all manner of oddities, but the
reported issue in PR14548 was forming invalid bitcast instructions.

The only downside of this fix is that it makes it more clear that SROA
in its current form is not capable of handling mixed i1 and i8 loads and
stores. Sometimes with the previous code this would work by luck, but
usually it would crash, so I'm not terribly worried. I'll watch the LNT
numbers just to be sure.

llvm-svn: 169719
2012-12-10 00:54:45 +00:00
Dmitri Gribenko
891cde588c Documentation: convert ReleaseNotes.html to reST.
Patch by Anthony Mykhailenko with small fixes by me.

llvm-svn: 169714
2012-12-09 23:14:26 +00:00
Michael Ilseman
2f7539fd12 Reorganize FastMathFlags to be a wrapper around unsigned, and streamline some interfaces.
llvm-svn: 169712
2012-12-09 21:12:04 +00:00
Paul Redmond
e43761293d LoopVectorize: support vectorizing intrinsic calls
- added a function to VectorTargetTransformInfo to query the cost of intrinsics
- vectorize trivially vectorizable intrinsic calls such as sin, cos, log, etc.

Reviewed by: Nadav

llvm-svn: 169711
2012-12-09 20:42:17 +00:00
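
A hedged C++ example of the kind of loop this enables (illustrative only; which
calls are treated as trivially vectorizable intrinsics, and whether a given
target profits from vectorizing them, is decided by the cost model):

  #include <cmath>

  // Each iteration calls a math routine that can be mapped to a vector
  // intrinsic (here sin), so the whole loop becomes a vectorization candidate
  // once intrinsic calls have a cost-model entry.
  void apply_sin(float *out, const float *in, int n) {
    for (int i = 0; i < n; ++i)
      out[i] = std::sin(in[i]);
  }
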
Michael Ilseman
92f7651045 Have the bitcode reader/writer just use FPMathOperator's fast math enum directly
llvm-svn: 169710
2012-12-09 20:23:16 +00:00
Paul Redmond
b778deb83a test commit.
llvm-svn: 169709
2012-12-09 19:46:31 +00:00
Chris Lattner
ae27e4b10f So many people have touched this, it doesn't make sense to ascribe authorship anymore.
llvm-svn: 169704
2012-12-09 16:55:39 +00:00
Jakub Staszak
3375cd11b4 Use m_OneUse pattern instead of hasOneUse() method.
No functionality change.

llvm-svn: 169703
2012-12-09 16:06:44 +00:00
Sean Silva
099e0abea5 docs: Convert GarbageCollection.html to reST
Patch by Alexander Zinenko!

llvm-svn: 169702
2012-12-09 15:52:47 +00:00
Jakub Staszak
30bae3f07e Remove trailing spaces.
llvm-svn: 169701
2012-12-09 15:37:46 +00:00
Dmitri Gribenko
d029408178 Documentation: HowToReleaseLLVM.rst: remove trailing whitespace.
llvm-svn: 169700
2012-12-09 15:33:26 +00:00