1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 03:33:20 +01:00
Commit Graph

103285 Commits

Author SHA1 Message Date
Hal Finkel
c52e65b830 Move late partial-unrolling thresholds into the processor definitions
The old method used by X86TTI to determine partial-unrolling thresholds was
messy (because it worked by testing target features), and also would not
correctly identify the target CPU if certain target features were disabled.
After some discussions on IRC with Chandler et al., it was decided that the
processor scheduling models were the right containers for this information
(because it is often tied to special uop dispatch-buffer sizes).

This does represent a small functionality change:
 - For generic x86-64 (which uses the SB model and, thus, will get some
   unrolling).
 - For AMD cores (because they still currently use the SB scheduling model)
 - For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump
   the default threshold to 50; we're working on a test case for this).
Otherwise, nothing has changed for any other targets. The logic, however, has
been moved into BasicTTI, so other targets may now also opt-in to this
functionality simply by setting LoopMicroOpBufferSize in their processor
model definitions.

llvm-svn: 208289
2014-05-08 09:14:44 +00:00
Tobias Grosser
358e9a97e7 Revert "SCEV: Use I = vector<>.erase(I) to iterate and delete at the same time"
as committed in r208282. The original commit was incorrect.

llvm-svn: 208286
2014-05-08 07:55:34 +00:00
Hao Liu
be513c440d AArch64/ARM64: Port NEON post-increment load/store with 2/3/4 vectors to ARM64 backend.
llvm-svn: 208284
2014-05-08 07:38:13 +00:00
Tobias Grosser
4c447db9fb SCEV: Use I = vector<>.erase(I) to iterate and delete at the same time
llvm-svn: 208282
2014-05-08 07:12:44 +00:00
Richard Smith
34ed6bf95c [modules] Add missing #include.
llvm-svn: 208276
2014-05-08 02:34:32 +00:00
Saleem Abdulrasool
fffc610ca8 test: fix silly typo
Oh silly Darwin and your case insensitive file system.

llvm-svn: 208274
2014-05-08 01:41:04 +00:00
Saleem Abdulrasool
84a61727f4 ARM: support FK_SecRel_2 relocations on WoA
This adds FK_SecRel_2 relocation support to ARM.  This enables the building of
object files for armv7-windows-msvc which enables CodeView line tables for
debugging as opposed to armv7-windows-itanium which currently uses DWARF.

llvm-svn: 208273
2014-05-08 01:35:57 +00:00
Richard Smith
90c10bd484 Simplify and fix incorrect comment. No functionality change.
llvm-svn: 208272
2014-05-08 01:08:43 +00:00
Filipe Cabecinhas
275860c4fd Lower certain build_vectors to insertps instructions
Summary:
Vectors built with zeros and elements in the same order as another
(source) vector are optimized to be built using a single insertps
instruction.
Also optimize when we move one element in a vector to a different place
in that vector while zeroing out some of the other elements.

Further optimizations are possible, described in TODO comments.
I will be implementing at least some of them in the near future.

Added some tests for different cases where this optimization triggers.

Reviewers: nadav, delena, craig.topper

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3521

llvm-svn: 208271
2014-05-08 00:25:16 +00:00
Lang Hames
eb0e367f97 Back out r208257 while I investigate tester failures.
llvm-svn: 208267
2014-05-07 23:35:53 +00:00
Duncan P. N. Exon Smith
f5258e4d2c GlobalValue: Assert symbols with local linkage have default visibility
The change to ExtractGV.cpp has no functionality change except to avoid
the asserts.  Existing testcases already cover this, so I didn't add a
new one.

llvm-svn: 208264
2014-05-07 23:00:22 +00:00
Duncan P. N. Exon Smith
c74b2b0974 IR: Don't allow non-default visibility on local linkage
Visibilities of `hidden` and `protected` are meaningless for symbols
with local linkage.

  - Change the assembler to reject non-default visibility on symbols
    with local linkage.

  - Change the bitcode reader to auto-upgrade `hidden` and `protected`
    to `default` when the linkage is local.

  - Update LangRef.

<rdar://problem/16141113>

llvm-svn: 208263
2014-05-07 22:57:20 +00:00
Duncan P. N. Exon Smith
bba2550124 LTO: Assert visibility of local linkage when merging symbols
`ModuleLinker::getLinkageResult()` shouldn't create symbols with local
linkage and non-default visibility -- in fact, symbols with local
linkage shouldn't be merged at all.  Assert to that effect.

llvm-svn: 208262
2014-05-07 22:55:46 +00:00
Duncan P. N. Exon Smith
9739ca9e59 LTO: Check local linkage first
Since visibility is meaningless for symbols with local linkage, check
local linkage before visibility when setting symbol attributes.

When linkage is `internal` and the visibility is `hidden`, the exposed
attribute is now `LTO_SYMBOL_SCOPE_INTERNAL` instead of
`LTO_SYMBOL_SCOPE_HIDDEN`.  Although the bitfield allows *both* to be
specified, the combination is nonsense anyway.

Given changes (in progress) to drop visibility when a symbol has local
linkage, this almost has no functionality change: it's mostly a cleanup
to clarify the logic.

The exception is when something has `appending` linkage.  Before this
change, such symbols would be advertised as `LTO_SYMBOL_SCOPE_INTERNAL`;
now, they'll be given `LTO_SYMBOL_SCOPE_COMMON`.

Unfortunately this is really awkward to test.  This only changes what we
advertise to linkers (before running LTO), not what the final object
looks like.  In theory I could add `DEBUG` output to `llvm-lto` (and
test with "REQUIRES: asserts"), but follow-up commits to disallow
`internal hidden` simplify this anyway.

<rdar://problem/16141113>

llvm-svn: 208261
2014-05-07 22:53:14 +00:00
Quentin Colombet
548eb2e304 [X86] Add a test case for r208252.
Prior to r208252, the FMA 231 family was marked as isCommutable. However the
memory variants of this family are not commutable. Therefore, we did not
implemented the findCommutedOpIndices for those variants and missed that
the default implementation (more or less: commute indices 1 and 2) was
firing behind our back.
As a result, as demonstrated in the test case before the fix, we were
transforming a = b * c + a into a = a * c + b.

I.e., before r208252 we were generating for this test case:
vmovaps %xmm0, %xmm1
vmoss (%rsi), %xmm0
vfmadd231ss (%rdi), %xmm1, %xmm0

Instead of:
vmoss (%rsi), %xmm1
vfmadd231ss (%rdi), %xmm1, %xmm0

<rdar://problem/16800495> 

llvm-svn: 208260
2014-05-07 22:52:58 +00:00
Lang Hames
0dd4713eee [RuntimeDyld] Make RuntimeDyldImpl::resolveExternalSymbols preserve the
relocation entries it applies.

Prior to this patch, RuntimeDyldImpl::resolveExternalSymbols discarded
relocations for external symbols once they had been applied. This causes issues
if the client calls MCJIT::finalizeLoadedModules more than once, and updates the
location of any symbols in between (e.g. by calling MCJIT::mapSectionAddress).

No test case yet: None of our in-tree memory managers support moving sections
around. I'll have to hack up a dummy memory manager before I can write a unit
test.

Fixes <rdar://problem/16764378>

llvm-svn: 208257
2014-05-07 22:34:08 +00:00
Hal Finkel
a22ec95e68 [X86TTI] Remove the unrolling branch limits
The loop stream detector (LSD) on modern Intel cores, which optimizes the
execution of small loops, has limits on the number of taken branches in
addition to uop-count limits (modern AMD cores have similar limits).
Unfortunately, at the IR level, estimating the number of branches that will be
taken is difficult. For one thing, it strongly depends on later passes (block
placement, etc.). The original implementation took a conservative approach and
limited the maximal BB DFS depth of the loop.  However, fairly-extensive
benchmarking by several of us has revealed that this is the wrong approach. In
fact, there are zero known cases where the branch limit prevents a detrimental
unrolling (but plenty of cases where it does prevent beneficial unrolling).

While we could improve the current branch counting logic by incorporating
branch probabilities, this further complication seems unjustified without a
motivating regression. Instead, unless and until a regression appears, the
branch counting will be removed.

llvm-svn: 208255
2014-05-07 22:25:18 +00:00
Justin Bogner
81ce7e44c7 llvm-cov: Fix some funny indentation (NFC)
Noticed by Duncan Exon Smith. Thanks!

llvm-svn: 208253
2014-05-07 21:50:43 +00:00
Quentin Colombet
9b13d839be [X86] Selectively mark the FMA variants inside a family as isCommutable.
Given a FMA family (e.g., 213, 231), not all the variants (i.e., register or
memory) are commutable.
E.g., for the 213 family (with the syntax src1, src2, src3):
fmaXXX213 A, B, reg3/mem3 == fmaXXX213 B, A, reg3/mem3

Now consider the 231 family:
fmaXXX231 A, B, reg3 == fmaXXX231 A, reg3, B
But
fmaXXX231 A, B, mem3 != fmaXXX231 A, mem3, B
Indeed, mem3 cannot be the second argument of the memory variant of fmaXXX231.

Working on a reduced test case!

<rdar://problem/16800495>

llvm-svn: 208252
2014-05-07 21:43:35 +00:00
Eric Christopher
91701a28d0 Reformat a couple of functions for clarity.
llvm-svn: 208248
2014-05-07 21:05:47 +00:00
Nico Weber
4dc9c7ea6b Let OnDiskHashTable call the destructor of its Items.
OnDiskHashTable::insert() calls the Item constructor via placement new, but
nothing called the destructor.  This matters in cases when the Info template
parameter has key_type or data_type typedefs that have a destructor, for
example like IdentifierIndexWriterTrait in clang's GlobalModuleIndex.cpp.

This fixes a 5-year old bug that's been around since the OnDiskHashTable code
was added in r64192.  Bug found by LSan!

llvm-svn: 208243
2014-05-07 19:55:38 +00:00
Rafael Espindola
290b146df0 Replace a virtual with an override.
llvm-svn: 208242
2014-05-07 19:52:32 +00:00
Jyotsna Verma
c9cece4644 [Hexagon] Add New TSFlags to be used in the upcoming patches.
llvm-svn: 208239
2014-05-07 19:07:34 +00:00
Sebastian Pop
866eb1eecf avoid segfaulting
*Quotient and *Remainder don't have to be initialized.

llvm-svn: 208238
2014-05-07 19:00:37 +00:00
Sebastian Pop
8f355b84ae do not collect undef terms
llvm-svn: 208237
2014-05-07 19:00:32 +00:00
Matt Arsenault
903ece3700 Fix using wrong result type for setcc.
When reducing the bitwidth of a comparison against a constant, the
original setcc's result type was used, which was incorrect.

No test since I don't think any other in tree targets change the
bitwidth of the setcc type depending on the bitwidth of the compared
type.

llvm-svn: 208236
2014-05-07 18:26:58 +00:00
Eric Christopher
423c1a5415 Debug.h already includes raw_ostream.h, no need to include it again.
llvm-svn: 208235
2014-05-07 18:19:04 +00:00
Adam Nemet
9c5e483a57 [Test] Remove c-index-test from the list of substitutions
All the tests are under the clang tests and none should be under llvm moving
forward.

The topic was discussed in this thread:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140428/214905.html

llvm-svn: 208234
2014-05-07 18:16:02 +00:00
Sebastian Pop
d5cb815565 split delinearization pass in 3 steps
To compute the dimensions of the array in a unique way, we split the
delinearization analysis in three steps:

- find parametric terms in all memory access functions
- compute the array dimensions from the set of terms
- compute the delinearized access functions for each dimension

The first step is executed on all the memory access functions such that we
gather all the patterns in which an array is accessed. The second step reduces
all this information in a unique description of the sizes of the array. The
third step is delinearizing each memory access function following the common
description of the shape of the array computed in step 2.

This rewrite of the delinearization pass also solves a problem we had with the
previous implementation: because the previous algorithm was by induction on the
structure of the SCEV, it would not correctly recognize the shape of the array
when the memory access was not following the nesting of the loops: for example,
see polly/test/ScopInfo/multidim_only_ivs_3d_reverse.ll

; void foo(long n, long m, long o, double A[n][m][o]) {
;
;   for (long i = 0; i < n; i++)
;     for (long j = 0; j < m; j++)
;       for (long k = 0; k < o; k++)
;         A[i][k][j] = 1.0;

Starting with this patch we no longer delinearize access functions that do not
contain parameters, for example in test/Analysis/DependenceAnalysis/GCD.ll

;;  for (long int i = 0; i < 100; i++)
;;    for (long int j = 0; j < 100; j++) {
;;      A[2*i - 4*j] = i;
;;      *B++ = A[6*i + 8*j];

these accesses will not be delinearized as the upper bound of the loops are
constants, and their access functions do not contain SCEVUnknown parameters.

llvm-svn: 208232
2014-05-07 18:01:20 +00:00
Chandler Carruth
17e2aa3c0a [x86] Make the 'x86-64' cpu, what I see as and many use as the generic
default architecture for reasonable modern x86 processors, actually be
modern. This processor model should essentially be "tuned" for modern
x86 chips as much as possible without undue penalties on any specific
architecture. Previously we weren't even using the nice scheduling
models. There are a few other tweaks needed here, but this change at
least I have benchmarked across a decent swatch of chips (intel's
clovertown, westmere, and sandybridge; amd's istanbul) and seen no
significant regressions.

If anyone has suggested ways to test this, just let me know. Somewhat
alarmingly, no existing tests failed.

llvm-svn: 208230
2014-05-07 17:37:03 +00:00
Chandler Carruth
36ad1ec2b0 Tidy up whitespace with clang-format prior to making significant
changes.

llvm-svn: 208229
2014-05-07 17:36:59 +00:00
Simon Atanasyan
a8319970e6 [yaml2obj] Support ELF x86 relocations.
llvm-svn: 208228
2014-05-07 17:06:38 +00:00
Rafael Espindola
643c61edc3 Style update: don't duplicate the function name.
llvm-svn: 208227
2014-05-07 17:04:45 +00:00
Alexey Samsonov
0b0a2744cc [CMake] Add build rules for llvm-PerfectShuffle utility
llvm-svn: 208225
2014-05-07 16:54:00 +00:00
Rafael Espindola
04d183f259 Style update: don't duplicate the function name.
llvm-svn: 208224
2014-05-07 16:43:23 +00:00
Chad Rosier
da933c5aba [ARM64][fast-isel] Disable target specific optimizations at -O0. Functionally,
this patch disables the dead register elimination pass and the load/store pair
optimization pass at -O0.  The ILP optimizations don't require the optimization
level to be checked because the call to addILPOpts is predicated with the
necessary check.  The AdvSIMDScalar pass is disabled by default at all
optimization levels.  This patch leaves that pass disabled by default.

Also, move command-line options into ARM64TargetMachine.cpp and add a few
additional flags to aid in debugging.  This fixes an issue with the
-debug-pass=Structure flag where passes were printed, but not actually run
(i.e., AdvSIMDScalar pass).

llvm-svn: 208223
2014-05-07 16:41:55 +00:00
Daniel Sanders
44cc643eee [mips] Add highly experimental support for MIPS-I, MIPS-II, MIPS-III, and MIPS-V
Summary:
These processors will only be available for the integrated assembler at
first (CodeGen will emit a fatal error saying they are not implemented).

The intention is to work through the existing instructions and correctly
annotate the ISA they were added in so that we have a sufficiently good
base to start MIPS64r6 development. MIPS64r6 removes/re-encodes certain
instructions and I believe it is best to define ISA's using set-union's
as far as possible rather than using set-subtraction.

Reviewers: vmedic

Subscribers: emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D3569

llvm-svn: 208221
2014-05-07 16:25:22 +00:00
Justin Bogner
9296fda55a llvm-cov: Explicitly namespace llvm::make_unique to keep MSVC happy
This is a followup to r208171, where a call to make_unique was
disambiguated for MSVC. Disambiguate two more calls, and remove the
comment about it since this is what we do everywhere.

llvm-svn: 208219
2014-05-07 16:01:27 +00:00
Rafael Espindola
3c06343011 Use range loop.
llvm-svn: 208218
2014-05-07 14:53:32 +00:00
Michael Zolotukhin
5fe056ff21 [InstCombine] Add optimization of redundant insertvalue instructions.
rdar://problem/11861387

llvm-svn: 208214
2014-05-07 14:30:18 +00:00
Daniel Sanders
1cc95dd8b6 [mips] Add FGR_32/FGR_64/GPR_64 adjectives and use then instead of FGRPredicates/GPRPredicates
Summary:
No functional change (confirmed by diffing tablegen-erated files).

Depends on D3642

Reviewers: vmedic, dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D3645

llvm-svn: 208213
2014-05-07 14:25:43 +00:00
Daniel Sanders
e8ed4c1087 [mips] Add INSN_<name> adverbs and start using them instead of AdditionalPredicates overrides
Summary:
No functional change

Depends on D3641

Reviewers: vmedic

Reviewed By: vmedic

Differential Revision: http://reviews.llvm.org/D3642

llvm-svn: 208212
2014-05-07 14:11:46 +00:00
Evgeniy Stepanov
c02f1a9f96 [msan] Fix -fsanitize=memory -fno-integrated-as.
llvm-svn: 208211
2014-05-07 14:10:51 +00:00
Tim Northover
aed9e378c3 AArch64/ARM64: optimise vector selects & enable test
When performing a scalar comparison that feeds into a vector select,
it's actually better to do the comparison on the vector side: the
scalar route would be "CMP -> CSEL -> DUP", the vector is "CM -> DUP"
since the vector comparisons are all mask based.

llvm-svn: 208210
2014-05-07 14:10:27 +00:00
Daniel Sanders
ff0b1220dc [mips] Add ISA_<name> adverbs and start using them instead of AdditionalPredicates overrides
Summary:
One small functional change. The recently added PAUSE instruction now has
the HasStdEnc predicate which was accidentally removed by a Requires<>.

Depends on D3640

Reviewers: vmedic

Reviewed By: vmedic

Differential Revision: http://reviews.llvm.org/D3641

llvm-svn: 208209
2014-05-07 13:57:22 +00:00
Rafael Espindola
765e5e78cf Remove the UseCFI option from createAsmStreamer.
We were already always passing true, this just removes the option.

llvm-svn: 208205
2014-05-07 13:00:43 +00:00
Ed Maste
1645f9d9b6 DebugInfo: Use enum instead of unsigned
This makes debuging DebugInfo generation with LLDB a little more pleasant.

Differential Revision: http://reviews.llvm.org/D3626

llvm-svn: 208202
2014-05-07 12:49:08 +00:00
Daniel Sanders
9bda911eb9 [mips] Continue splitting Instruction.Predicates into smaller lists and re-join them with !listconcat
Summary:
Move IsGP64bit into GPRPredicates, and IsFP64bit/NotFP64bit into FGRPredicates

No functional change (confirmed by diffing tablegen-erated files).

Depends on D3639

Reviewers: vmedic

Reviewed By: vmedic

Differential Revision: http://reviews.llvm.org/D3640

llvm-svn: 208201
2014-05-07 12:48:37 +00:00
James Molloy
9af6fee0f6 [ARM64-BE] Fix fast-isel, and add appropriate RUN lines to appropriate tests.
llvm-svn: 208200
2014-05-07 12:33:55 +00:00
James Molloy
7951361ffc [ARM64-BE] Fix variable-argument saving.
llvm-svn: 208199
2014-05-07 12:33:48 +00:00