mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 03:23:01 +02:00
Commit Graph

75778 Commits

Author SHA1 Message Date
Erik Eckstein
99b67265b7 SLPVectorizer: limit the number of alias checks to reduce the runtime.
In case of blocks with many memory-accessing instructions, alias checking can take a lot of time
(because calculating the memory dependencies has quadratic complexity).
I chose a limit which resulted in no changes when running the benchmarks.
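
To make the idea concrete, here is a minimal standalone sketch of a capped pairwise dependency scan; the names and the limit value are illustrative, not the actual SLPVectorizer code:

  #include <cstddef>
  #include <utility>
  #include <vector>

  struct MemAccess {};  // stand-in for a load/store instruction

  // Stand-in for an expensive AliasAnalysis query.
  static bool mayAlias(const MemAccess &, const MemAccess &) { return false; }

  // Illustrative budget; the real pass picks a value that left the benchmarks unchanged.
  static const unsigned AliasedCheckLimit = 10;

  static void computeDependencies(const std::vector<MemAccess> &Accesses,
                                  std::vector<std::pair<size_t, size_t>> &Deps) {
    for (size_t I = 0; I < Accesses.size(); ++I) {
      unsigned ChecksDone = 0;
      for (size_t J = I + 1; J < Accesses.size(); ++J) {
        if (ChecksDone++ >= AliasedCheckLimit) {
          // Budget spent: conservatively record a dependency rather than
          // paying the quadratic cost of further alias queries.
          Deps.emplace_back(I, J);
          continue;
        }
        if (mayAlias(Accesses[I], Accesses[J]))
          Deps.emplace_back(I, J);
      }
    }
  }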

llvm-svn: 226439
2015-01-19 09:33:38 +00:00
Hal Finkel
3eec8e9415 [PowerPC] Minor correction to r226432
We don't need to exclude patchpoints from the implicit r2 dependence in
FastISel because it is added as an implicit operand and, thus, should not
confuse the StackMap code.

By inspection / no test case.

llvm-svn: 226434
2015-01-19 07:44:45 +00:00
Michael Kuperstein
83ab484af5 [MIScheduler] Slightly better handling of constrainLocalCopy when both source and dest are local
This fixes PR21792.

Differential Revision: http://reviews.llvm.org/D6823

llvm-svn: 226433
2015-01-19 07:30:47 +00:00
Hal Finkel
89328f88bf [PowerPC] Add r2 as an operand for all calls under both PPC64 ELF V1 and V2
Our PPC64 ELF V2 call lowering logic added r2 as an operand to all direct call
instructions in order to represent the dependency on the TOC base pointer
value. Restricting this to ELF V2, however, does not seem to make sense: calls
under ELF V1 have the same dependence, and indirect calls have an r2 dependence
just as direct ones do. Make sure the dependence is noted for all calls under both
ELF V1 and ELF V2.

llvm-svn: 226432
2015-01-19 07:20:27 +00:00
Craig Topper
3d7b2aaaae [x86] Change AVX512 intrinsics to take an 8-bit immediate for the comparison kind instead of a 32-bit immediate. This better aligns with the emitted instruction. It also matches the SSE and AVX1 equivalents. Also add auto-upgrade support.
llvm-svn: 226430
2015-01-19 06:07:27 +00:00
Chandler Carruth
111b7302d1 [PM] Lift the analyses into the interface for
SplitLandingPadPredecessors and remove the Pass argument from its
interface.

Another step to the utilities being usable with both old and new pass
managers.

llvm-svn: 226426
2015-01-19 03:03:39 +00:00
David Blaikie
4a6a34ad21 unique_ptrify the RelInfo parameter to TargetRegistry::createMCSymbolizer
llvm-svn: 226416
2015-01-18 20:45:48 +00:00
David Blaikie
8839ae1559 std::unique_ptrify the MCStreamer argument to createAsmPrinter
llvm-svn: 226414
2015-01-18 20:29:04 +00:00
Hal Finkel
847797329b [PowerPC] Don't hard-code R2 as register when processing TOC relocations
Instructions that have high-order TOC relocations always carry R2 as their base
register, so it does not matter whether we take the register from the
instruction or just hard-code it in PPCAsmPrinter. In the future, however, we
might want to apply these relocations to instructions using a different
register, so taking the register from the instruction is a better thing to do.
No change in functionality here, however.

llvm-svn: 226403
2015-01-18 15:59:44 +00:00
Hal Finkel
0f5a2dae42 [PowerPC] Add some FIXMEs for fastcc and FPR <-> GPR moves
So we don't forget, once we support FPR <-> GPR moves on the P8, we'll likely
want to re-visit this part of the calling convention.

llvm-svn: 226401
2015-01-18 14:31:10 +00:00
Hal Finkel
e9b3a1a030 [PowerPC] Initial PPC64 calling-convention changes for fastcc
The default calling convention specified by the PPC64 ELF (V1 and V2) ABI is
designed to work with both prototyped and non-prototyped/varargs functions. As
a result, GPRs and stack space are allocated for every argument, even those
that are passed in floating-point or vector registers.

GlobalOpt::OptimizeFunctions will transform local non-varargs functions (that
do not have their address taken) to use the 'fast' calling convention.

When functions are using the 'fast' calling convention, don't allocate GPRs for
arguments passed in other types of registers, and don't allocate stack space for
arguments passed in registers. Other changes for the fast calling convention
may be added in the future.
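
A self-contained sketch of the allocation rule being described (hypothetical names and limits; the real logic lives in the PPC64 calling-convention lowering): under fastcc, an argument that travels in an FPR or vector register no longer also reserves a shadow GPR or stack slot.

  #include <cstdint>

  enum class ArgClass { Integer, FloatingPoint, Vector };

  struct AllocState {
    unsigned GPRsUsed = 0;    // general-purpose argument registers consumed
    uint64_t StackBytes = 0;  // bytes of parameter-save area consumed
  };

  // Hypothetical helper: with the default ABI every argument reserves a GPR
  // (or stack space) even when its value is passed in an FPR or vector
  // register; with fastcc that shadow allocation is skipped.
  void allocateArg(ArgClass C, bool IsFastCC, AllocState &S) {
    if (C == ArgClass::Integer || !IsFastCC) {
      if (S.GPRsUsed < 8)
        ++S.GPRsUsed;        // shadow (or real) GPR allocation
      else
        S.StackBytes += 8;   // shadow (or real) stack allocation
    }
    // The FPR/vector register assignment itself is unchanged and elided here.
  }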

llvm-svn: 226399
2015-01-18 12:08:47 +00:00
Chandler Carruth
f5a71dfd5e [PM] Pull the analyses used for another utility routine into its API
rather than relying on the pass object.

This one is a bit annoying, but will pay off. First, supporting this one
will make the next one much easier, and for utilities like LoopSimplify,
this is moving them (slowly) closer to not having to pass the pass
object around throughout their APIs.

llvm-svn: 226396
2015-01-18 09:21:15 +00:00
Chandler Carruth
80cee50edc [PM] Sink the specific analyses preserved by SplitBlock into its
interface, removing Pass from its interface.

This also makes those analyses optional so that passes which don't even
preserve these (or use them) can skip the logic entirely.

llvm-svn: 226394
2015-01-18 02:39:37 +00:00
Chandler Carruth
67d589eee8 [PM] Replace another Pass argument with specific analyses that are
optionally updated by MergeBlockIntoPredecessors.

No functionality changed, just refactoring to clear the way for the new
pass manager.

llvm-svn: 226392
2015-01-18 02:11:23 +00:00
Chandler Carruth
da64aee97c [PM] Refactor how the LoopRotation pass accesses the DominatorTree.
Instead of querying the pass everywhere we need to, do that once and
cache a pointer in the pass object. This is simpler, and I'm about
to add yet another place where we need to dig out that pointer.

llvm-svn: 226391
2015-01-18 02:08:05 +00:00
Chandler Carruth
dd6154bd12 [PM] Lift the actual analyses used into the interface rather than
accepting a Pass and querying it for analyses.

This is necessary to allow the utilities to work both with the old and
new pass managers, and I also think this makes the interface much more
clear and helps the reader know what analyses the utility can actually
handle. I plan to repeat this process iteratively to clean up all the
pass utilities.
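
The shape of the refactoring, as a minimal sketch with stand-in types (not the actual utility signatures): instead of taking a Pass and digging analyses out of it, a utility takes exactly the analyses it can update, which works identically under the old and the new pass manager.

  struct DominatorTree {};
  struct LoopInfo {};
  struct BasicBlock {};
  struct Pass {
    // Old style: the utility queries the pass for whatever it might need.
    DominatorTree *getDomTree() { return nullptr; }
    LoopInfo *getLoopInfo() { return nullptr; }
  };

  // Before: the caller hands over the whole pass object.
  void splitBlockOld(BasicBlock *BB, Pass *P) {
    DominatorTree *DT = P ? P->getDomTree() : nullptr;
    LoopInfo *LI = P ? P->getLoopInfo() : nullptr;
    (void)BB; (void)DT; (void)LI;  // ...update whatever is non-null...
  }

  // After: the caller passes the analyses it actually has; either may be null.
  void splitBlockNew(BasicBlock *BB, DominatorTree *DT = nullptr,
                     LoopInfo *LI = nullptr) {
    (void)BB; (void)DT; (void)LI;  // ...update whatever is non-null...
  }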

llvm-svn: 226386
2015-01-18 01:45:07 +00:00
Chandler Carruth
3c308b83f1 [PM] Now that LoopInfo isn't in the Pass type hierarchy, it is much
cleaner to derive from the generic base.

This removes a ton of boilerplate code and somewhat strange and
pointless indirections. It also removes a bunch of the previously needed
friend declarations. To fully remove these, I also lifted the verify
logic into the generic LoopInfoBase, which seems good anyways -- it is
generic and useful logic even for the machine side.

llvm-svn: 226385
2015-01-18 01:25:51 +00:00
Chandler Carruth
48bfaee103 [PM] Clean up more warnings my refactoring exposed, where we now have
unused variables in a no-asserts build.

I've fixed this by putting the entire loop behind an #ifndef as it
contains nothing other than asserts.

llvm-svn: 226377
2015-01-17 14:49:23 +00:00
Chandler Carruth
dbbe22f963 [PM] Remove a dead field.
This was dead even before I refactored how we initialized it, but my
refactoring made it trivially dead and it is now caught by a Clang
warning. This fixes the warning and should clean up the -Werror bot
failures (sorry!).

llvm-svn: 226376
2015-01-17 14:31:35 +00:00
Chandler Carruth
c47432114d [PM] Split the LoopInfo object apart from the legacy pass, creating
a LoopInfoWrapperPass to wire the object up to the legacy pass manager.

This switches all the clients of LoopInfo over and paves the way to port
LoopInfo to the new pass manager. No functionality change is intended
with this iteration.
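
For a legacy-pass-manager client, the conversion looks roughly like this (the wrapper-pass API is the one introduced here; the client pass itself is hypothetical):

  #include "llvm/Analysis/LoopInfo.h"
  #include "llvm/IR/Function.h"
  #include "llvm/Pass.h"

  using namespace llvm;

  namespace {
  // Hypothetical client pass that needs LoopInfo, which is now reached
  // through the LoopInfoWrapperPass shim instead of being a pass itself.
  struct ExampleLoopClient : public FunctionPass {
    static char ID;
    ExampleLoopClient() : FunctionPass(ID) {}

    void getAnalysisUsage(AnalysisUsage &AU) const override {
      AU.addRequired<LoopInfoWrapperPass>();  // was: AU.addRequired<LoopInfo>()
      AU.setPreservesAll();
    }

    bool runOnFunction(Function &F) override {
      // was: LoopInfo &LI = getAnalysis<LoopInfo>();
      LoopInfo &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
      (void)F; (void)LI;
      return false;
    }
  };
  }
  char ExampleLoopClient::ID = 0;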

llvm-svn: 226373
2015-01-17 14:16:18 +00:00
Hal Finkel
470187b350 [PowerPC] Don't list R11 as a patchpoint scratch register
R11's status is the same under both the PPC64 ELF V1 and V2 ABIs: it is
reserved for use as an "environment pointer" for compilation models that
require such a thing. We don't; we also don't need a second scratch register,
and because we support only "local" patchpoint call targets, we might as well
let R11 be used for anyregcc patchpoints.

llvm-svn: 226369
2015-01-17 03:57:34 +00:00
Mehdi Amini
e119c6afaf Improve DAG combine pass on certain IR vector patterns
Loading two 2x32-bit float vectors into the bottom half of a 256-bit vector
produced suboptimal code in AVX2 mode with certain IR combinations.

In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32
(undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical,
but then mysteriously generated rather bad code; the movq/movhpd combination
didn't match.

The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs
would get promoted to 4f32 by the type legalizer, eventually resulting
in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing
these were both half the output size, concatted them and then produced
a shuffle. However, the resulting concat + shuffle was more complex than
it should be; in the case where the upper half of the output is undef, we
probably want to generate shuffle + concat instead.

This enhancement causes the vector_shuffle combine step to recognize this
suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR
in case the same suboptimal pattern occurs for other reasons.

This results in the optimizer correctly producing the optimal movq + movhpd
sequence for all three variations on this IR, even with AVX2.
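
For reference, source along these lines produces the IR shape in question (a hypothetical example, not the committed test case): two 2-float loads are packed into the low half of a 256-bit value, and the expected lowering is the movq + movhpd pair.

  #include <immintrin.h>

  // Hypothetical illustration: load two <2 x float> chunks, pack them into a
  // 4-float vector, and place that in the low half of a 256-bit vector with
  // the upper half left undefined; this mirrors the 2f32 + 2f32 -> 8f32 pattern.
  __m256 combine_low(const float *a, const float *b) {
    __m128 lo = _mm_castpd_ps(_mm_load_sd((const double *)a));  // 64-bit (two-float) load
    __m128 hi = _mm_castpd_ps(_mm_load_sd((const double *)b));  // second pair
    __m128 four = _mm_movelh_ps(lo, hi);   // pack the two pairs together
    return _mm256_castps128_ps256(four);   // low half defined, high half undef
  }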

I've included a test case.

Radar link: rdar://problem/19287012
Fix for PR 21943.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 226360
2015-01-17 01:35:56 +00:00
Lang Hames
b909265827 [RuntimeDyld] Tidy up emitCommonSymbols a little. NFC.
llvm-svn: 226358
2015-01-17 00:55:05 +00:00
Richard Trieu
75776fb604 Remove std::move that was preventing return value optimization.
llvm-svn: 226356
2015-01-17 00:46:44 +00:00
Matthias Braun
522730babd RegisterCoalescer: Cleanup and improved comment for a subtle detail.
llvm-svn: 226353
2015-01-17 00:33:13 +00:00
Matthias Braun
28a5259b65 RegisterCoalescer: Cleanup by factoring out a common expression
llvm-svn: 226352
2015-01-17 00:33:11 +00:00
Matthias Braun
bc1c8ad9a4 RegisterCoalescer: Cleanup comment style
- Consistently put comments above the function declaration, not the
  definition. To achieve this some duplicate comments got merged and
  some comment parts describing implementation details got moved into their
  functions.
- Consistently use doxygen comments above functions.
- Do not use doxygen comments inside functions.

llvm-svn: 226351
2015-01-17 00:33:09 +00:00
Matthias Braun
2d7fcc6f07 RegisterCoalescer: Drive-by typo + whitespace fix
llvm-svn: 226350
2015-01-17 00:33:06 +00:00
Lang Hames
2fdc54a320 [RuntimeDyld] Remove the brace initialization that was introduced in r226341.
Evidently MSVC doesn't like it.

llvm-svn: 226349
2015-01-17 00:32:56 +00:00
Philip Reames
b94fb91ee1 Update a comment
Be a bit more explicit about the fact that addrspace(1) is not reserved.

llvm-svn: 226344
2015-01-16 23:21:07 +00:00
Philip Reames
f7355553da clang-format all the GC related files (NFC)
Nothing interesting here...

llvm-svn: 226342
2015-01-16 23:16:12 +00:00
Lang Hames
58d75069f4 [RuntimeDyld] Track symbol visibility in RuntimeDyld.
RuntimeDyld symbol info previously consisted of just a Section/Offset pair. This
patch replaces that pair type with a SymbolInfo class that also tracks symbol
visibility. A new method, RuntimeDyld::getExportedSymbolLoadAddress, is
introduced which only returns a non-zero result for exported symbols. For
non-exported or non-existent symbols this method will return zero. The
RuntimeDyld::getSymbolAddress method retains its current behavior, returning
non-zero results for all symbols regardless of visibility.
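
A rough standalone sketch of the idea (illustrative names; the in-tree type differs in detail): the old Section/Offset pair grows a visibility flag so that exported and hidden symbols can be told apart.

  #include <cstdint>

  class SymbolInfo {
  public:
    enum class Visibility { Default, Hidden };

    SymbolInfo(unsigned SectionID, uint64_t Offset, Visibility V)
        : SectionID(SectionID), Offset(Offset), Vis(V) {}

    // getExportedSymbolLoadAddress-style queries only look at exported symbols.
    bool isExported() const { return Vis == Visibility::Default; }
    unsigned getSectionID() const { return SectionID; }
    uint64_t getOffset() const { return Offset; }

  private:
    unsigned SectionID;
    uint64_t Offset;
    Visibility Vis;
  };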

No in-tree clients of RuntimeDyld are changed. The newly introduced
functionality will be used by the Orc APIs.

No test case: Since this patch doesn't modify the behavior for any in-tree
clients we don't have a good tool to test this with yet. Once Orc is in we can
use it to write regression tests that test these changes.

llvm-svn: 226341
2015-01-16 23:13:56 +00:00
Kevin Enderby
2473644cb2 Fix the Archive::Child::getRawSize() method used by llvm-objdump’s -archive-headers option
and tweak its use in llvm-objdump.  Add back the test case for the -archive-headers option.

llvm-svn: 226332
2015-01-16 22:10:36 +00:00
Colin LeMahieu
b0ae008450 [Hexagon] Converting halfword to doubleword multiply intrinsics.
llvm-svn: 226326
2015-01-16 21:41:57 +00:00
Colin LeMahieu
3f0206ff73 [Hexagon] Converting accumulating halfword multiply intrinsics to patterns.
llvm-svn: 226324
2015-01-16 21:36:34 +00:00
Colin LeMahieu
6559af4ce0 [Hexagon] Beginning converting intrinsics to patterns instead of duplicated definitions. Converting halfword multiply intrinsics.
llvm-svn: 226318
2015-01-16 20:38:54 +00:00
Colin LeMahieu
c4ad57bf83 [Hexagon] Fix r226309: replacement atomic store patterns didn't actually exist; added new versions.
llvm-svn: 226315
2015-01-16 20:16:14 +00:00
Saleem Abdulrasool
7c9433dcb1 X86: fix comment typo in AsmParser
Fix a typo.  NFC.

llvm-svn: 226313
2015-01-16 20:16:06 +00:00
Philip Reames
c6126d1689 Move ownership of GCStrategy objects to LLVMContext
Note: This change ended up being slightly more controversial than expected.  Chandler has tentatively okayed this for the moment, but I may be revisiting this in the near future after we settle some high level questions.

Rather than have the GCStrategy object owned by the GCModuleInfo - which is an immutable analysis pass used mainly by gc.root - have it be owned by the LLVMContext. This simplifies the ownership logic (i.e. can you have two instances of the same strategy at once?), but more importantly, allows us to access the GCStrategy in the middle end optimizer. To this end, I add an accessor through Function which becomes the canonical way to get at a GCStrategy instance.
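
A rough standalone sketch of the ownership model being described, with hypothetical stand-ins for LLVMContext, Function, and GCStrategy: the context owns one strategy instance per strategy name, and the function-level accessor resolves the function's GC attribute against that cache.

  #include <map>
  #include <memory>
  #include <string>

  struct GCStrategy {
    explicit GCStrategy(std::string Name) : Name(std::move(Name)) {}
    std::string Name;
  };

  struct Context {
    // One strategy instance per GC name, owned by the context so that both
    // CodeGen and mid-level optimizer passes can reach it.
    GCStrategy &getGCStrategy(const std::string &Name) {
      std::unique_ptr<GCStrategy> &Slot = Strategies[Name];
      if (!Slot)
        Slot.reset(new GCStrategy(Name));
      return *Slot;
    }
    std::map<std::string, std::unique_ptr<GCStrategy>> Strategies;
  };

  struct Func {
    Context &Ctx;
    std::string GCName;  // corresponds to the function's "gc" attribute
    // The canonical way to reach the strategy from a function, per the change above.
    GCStrategy &getGCStrategy() const { return Ctx.getGCStrategy(GCName); }
  };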

In the near future, this will allow me to move some of the checks from http://reviews.llvm.org/D6808 into the Verifier itself, and to introduce optimization legality predicates for some of the recent additions to InstCombine. (These will follow as separate changes.)

Differential Revision: http://reviews.llvm.org/D6811

llvm-svn: 226311
2015-01-16 20:07:33 +00:00
Colin LeMahieu
1a5115554a [Hexagon] Removing old duplicate atomic load/store patterns.
llvm-svn: 226309
2015-01-16 19:53:35 +00:00
Philip Reames
ae413f6c26 Remove gc.root's findCustomSafePoints mechanism
Searching all of the existing gc.root implementations I'm aware of (all three of them), I found exactly one use of this mechanism, and that was to implement a performance improvement that should have been applied to the default lowering.

Having this function requires a dependency on a CodeGen class (MachineFunction) in a class which is otherwise completely independent of CodeGen. I could solve this differently, but given that I see absolutely no value in preserving this mechanism, I'm just going to get rid of it.

Note: This is the first time I'm intentionally breaking previously supported gc.root functionality. Given that 3.6 has branched, I believe this is a good time to do this.

Differential Revision: http://reviews.llvm.org/D7004

llvm-svn: 226305
2015-01-16 19:33:28 +00:00
Colin LeMahieu
33e84773e4 [Hexagon] Converting old patterns to new versions using classes.
llvm-svn: 226304
2015-01-16 19:29:59 +00:00
Adam Nemet
9fe8c32290 [AVX512] Add intrinsics for masked aligned FP loads and stores
Similar to the unaligned cases.
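
At the source level, the corresponding operations look like this (an illustrative C-level use of the aligned masked forms, assuming an AVX-512 target; the commit itself adds the LLVM-IR-level intrinsics these compile down to):

  #include <immintrin.h>

  // Illustrative only: masked, aligned 512-bit FP load and store.  src and dst
  // are assumed to be 64-byte aligned; lanes with a zero mask bit are left
  // untouched by the store and take the pass-through value on the load.
  void masked_copy(float *dst, const float *src, __mmask16 k) {
    __m512 v = _mm512_mask_load_ps(_mm512_setzero_ps(), k, src);
    _mm512_mask_store_ps(dst, k, v);
  }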

Test was generated with update_llc_test_checks.py.

Part of <rdar://problem/17688758>

llvm-svn: 226296
2015-01-16 18:50:09 +00:00
Duncan P. N. Exon Smith
97ed3e1e77 IR: Allow 16-bits for column info
Raise the limit for column information from 8 bits to 16 bits.

llvm-svn: 226291
2015-01-16 17:33:08 +00:00
Duncan P. N. Exon Smith
fb3f1458b0 IR: Cleanup dead code, NFC
Line/column fixups already exist in `MDLocation`.  Delete the duplicated
logic in `DebugLoc`.

llvm-svn: 226290
2015-01-16 17:31:29 +00:00
Colin LeMahieu
8abdb53b9d [Hexagon] Updating call/jump instruction patterns.
llvm-svn: 226288
2015-01-16 17:05:27 +00:00
Andrea Di Biagio
d2869df8d3 [X86][DAG] Disable target specific combine on INSERTPS dag nodes at -O0.
This patch disables target specific combine on X86ISD::INSERTPS dag nodes
if optlevel is CodeGenOpt::None.

The backend currently implements a target specific combine rule that converts
a vector load used by an INSERTPS dag node into a scalar load plus a
scalar_to_vector. This allows ISel to select a single INSERTPSrm instead of
two instructions (i.e. a vector load plus INSERTPSrr).

However, the existing target combine rule on INSERTPS nodes only works under
the assumption that ISel will always be able to match an INSERTPSrm. This is
not true in general at -O0, since the backend only allows folding a load into
the memory operand of an instruction if the optimization level is not
CodeGenOpt::None.

In the example below:

//
__m128 test(__m128 a, __m128 *b) {
  __m128 c = _mm_insert_ps(a, *b, 1 << 6);
  return c;
}
//

Before this patch, at -O0, the backend would have canonicalized the load to 'b'
into a scalar load plus scalar_to_vector. Later on, ISel would have selected an
INSERTPSrr leaving the insertps mask in an inconsistent state:

  movss 4(%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3].

With this patch, the backend avoids folding the vector load into the operand of
the INSERTPS. The new codegen at -O0 is:

  movaps (%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # %xmm1[1],xmm0[1,2,3].

llvm-svn: 226277
2015-01-16 14:55:26 +00:00
Toma Tabacu
d8b3aeeabe [mips] Remove a redundant semicolon and add space before curly brackets. NFC.
llvm-svn: 226269
2015-01-16 10:45:15 +00:00
Timur Iskhodzhanov
6d120e1a54 Revert r226242 - Revert Revert Don't create new comdats in CodeGen
This breaks AddressSanitizer (ninja check-asan) on Windows

llvm-svn: 226251
2015-01-16 08:38:45 +00:00
Hal Finkel
04316a019c [PowerPC] Adjust PatchPoints for ppc64le
Bill Schmidt pointed out that some adjustments would be needed to properly
support powerpc64le (using the ELF V2 ABI). For one thing, R11 is not available
as a scratch register, so we need to use R12. R12 is also available under ELF
V1, so to maintain consistency, I flipped the order to make R12 the first
scratch register in the array under both ABIs.
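
A hedged sketch of the shape of that table (illustrative values and names, not the verbatim PPC backend code): patchpoint lowering walks a null-terminated scratch-register list in order, so putting R12 first makes it the preferred scratch register under both ABIs. (r226369, above, later drops R11 from the list entirely.)

  // Illustrative stand-ins for the PPC physical register numbers.
  enum PPCReg : unsigned { NoRegister = 0, X11 = 11, X12 = 12 };

  // Shape of the scratch-register hook: a null-terminated list, consumed in
  // order by the patchpoint/stackmap lowering.
  const unsigned *getScratchRegisters() {
    static const unsigned ScratchRegs[] = { X12, X11, NoRegister };
    return ScratchRegs;
  }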

llvm-svn: 226247
2015-01-16 04:40:58 +00:00