1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00
Commit Graph

23878 Commits

Author SHA1 Message Date
Benjamin Kramer
cc45aefeb0 X86: If SSE4.1 is missing lower SMUL_LOHI of v4i32 to pmuludq and fix up the high parts.
This is more expensive than pmuldq but still cheaper than scalarizing the whole thing.

llvm-svn: 207370
2014-04-27 18:47:41 +00:00
Saleem Abdulrasool
0d344f3333 MC: duplicate .file test for WoA (SVN r207341)
Since the COFF tests are dependent on X86, duplicate the test for ARM.  Use the
default check prefix.

llvm-svn: 207365
2014-04-27 16:10:57 +00:00
NAKAMURA Takumi
e37f6b57b8 Revert r206989, "Mark llvm/test/BugPoint/compile-custom.ll as XFAIL:vg_leak." It has been fixed since r207265.
llvm-svn: 207355
2014-04-27 11:59:33 +00:00
Benjamin Kramer
ef79b623b8 Update test not to check for a shuffle of an all-zero vector.
llvm-svn: 207354
2014-04-27 11:54:45 +00:00
Benjamin Kramer
f9669b910d SelectionDAG: Aggressively fold shuffles of constant splats.
llvm-svn: 207352
2014-04-27 11:41:06 +00:00
Saleem Abdulrasool
61c5d7a3ff tests: Windows ARM now supports object emission
Update lit.cfg with the fact that LLVM can now generate WoA PE/COFF objects.

llvm-svn: 207347
2014-04-27 04:29:36 +00:00
Saleem Abdulrasool
9cbf05197a COFF: move ARM COFF test to ARM directory
The COFF tests all assume X86.  Just move the new COFF tests under ARM to
appease the build bots.

llvm-svn: 207346
2014-04-27 04:29:32 +00:00
Saleem Abdulrasool
7ece6ce430 Add WoA object file emission support
Introduce support for WoA PE/COFF object file emission from LLVM.  Add the new
target specific PE/COFF Streamer (ARMWinCOFFStreamer) that handles the ARM
specific behaviour of PE/COFF object emission.  ARM exception information is not
yet emitted and is a TODO item.

The ARM specific object writer (ARMWinCOFFObjectWriter) handles the ARM specific
relocation handling in conjunction with the WinCOFFObjectWriter in the MC layer.
The MC layer needs to be updated to deal with the relocation adjustments.
Branch relocations are adjusted by 4 bytes (unlikely their ELF counterparts).

Minor tweaks to switch multiple conditional checks into equivalent switch
statements.  The ObjectFileInfo is updated to relax the object file setup for
Windows COFF.  Move the architecture checks into an assertion.  Windows COFF is
currently only supported on x86, x86_64, and ARM (thumb).  Rather than
defaulting to ELF, we will refuse to generate an object file.  This is better
though as you do not get an (arbitrary) object file which is different from the
request.

llvm-svn: 207345
2014-04-27 03:48:22 +00:00
Benjamin Kramer
e6a357c0fc DAGCombiner: Simplify code a bit, make more transforms work with vectors.
llvm-svn: 207338
2014-04-26 23:09:49 +00:00
David Blaikie
b12eecfe0b DebugInfo: Fix and test a regression caused by r207263 causing the DW_AT_object_pointer to go missing on blocks
Noticed by inspection. Test coverage added.

llvm-svn: 207333
2014-04-26 22:12:18 +00:00
David Blaikie
ce1b727607 Include C++ source for debug info test case committed in r207323
llvm-svn: 207324
2014-04-26 18:25:07 +00:00
David Blaikie
25a7c1380d DWARF Type Units: Avoid emitting type units under fission if the type requires an address.
Since there's no way to ensure the type unit in the .dwo and the type
unit skeleton in the .o are correlated, this cannot work.

This implementation is a bit inefficient for a few reasons, called out
in comments.

llvm-svn: 207323
2014-04-26 17:27:38 +00:00
Benjamin Kramer
dfc082bbd6 X86TTI: i16/i32 vector div with a constant (splat) divisor are reasonably cheap now.
Turn vectorization back on.

llvm-svn: 207320
2014-04-26 14:53:05 +00:00
Benjamin Kramer
6ac674546f X86: Lower SMUL_LOHI of v4i32 to pmuldq when SSE4.1 is available.
llvm-svn: 207318
2014-04-26 14:12:19 +00:00
Benjamin Kramer
aad9317559 X86: Add patterns for MULHU/MULHS of v8i16 and v16i16.
This gets us pretty code for divs of i16 vectors. Turn the existing
intrinsics into the corresponding nodes.

llvm-svn: 207317
2014-04-26 13:01:03 +00:00
Benjamin Kramer
163df6bc62 DAGCombiner: Turn divs of vector splats into vectorized multiplications.
Otherwise the legalizer would just scalarize everything. Support for
mulhi in the targets isn't that great yet so on most targets we get
exactly the same scalarized output. Add a test for x86 vector udiv.

I had to disable the mulhi nodes on ARM because there aren't any patterns
for it. As far as I know ARM has instructions for getting the high part of
a multiply so this should be fixed.

llvm-svn: 207315
2014-04-26 12:06:28 +00:00
Michael Zolotukhin
1c3340dd78 Revert r206749 till a final decision about the intrinsics is made.
llvm-svn: 207313
2014-04-26 09:56:41 +00:00
Gerolf Hoflehner
13cf626de6 RecursivelyDeleteTriviallyDeadInstructions() could remove
more than 1 instruction. The caller need to be aware of this
and adjust instruction iterators accordingly.

rdar://16679376

Repaired r207302.

llvm-svn: 207309
2014-04-26 05:58:11 +00:00
Juergen Ributzka
d783eb73d8 [DAG] During DAG legalization keep opaque constants even after expanding.
The included test case would return the incorrect results, because the expansion
of an shift with a constant shift amount of 0 would generate undefined behavior.

This is because ExpandShiftByConstant assumes that all shifts by constants with
a value of 0 have already been optimized away. This doesn't happen for opaque
constants and usually this isn't a problem, because opaque constants won't take
this code path - they are not supposed to. In the case that the opaque constant
has to be expanded by the legalizer, the legalizer would drop the opaque flag.
In this case we hit the limitations of ExpandShiftByConstant and create incorrect
code.

This commit fixes the legalizer by not dropping the opaque flag when expanding
opaque constants and adding an assertion to ExpandShiftByConstant to catch this
not supported case in the future.

This fixes <rdar://problem/16718472>

llvm-svn: 207304
2014-04-26 02:58:04 +00:00
Gerolf Hoflehner
99a3e5b9f5 Revert commit r207302 since build failures
have been reported.

llvm-svn: 207303
2014-04-26 02:03:17 +00:00
Gerolf Hoflehner
54bf53d166 RecursivelyDeleteTriviallyDeadInstructions() could remove
more than 1 instruction. The caller need to be aware of this
and adjust instruction iterators accordingly.

rdar://16679376

llvm-svn: 207302
2014-04-26 01:19:16 +00:00
Quentin Colombet
f5142bf03e [X86] Implement TargetLowering::getScalingFactorCost hook.
Scaling factors are not free on X86 because every "complex" addressing mode
breaks the related instruction into 2 allocations instead of 1.

<rdar://problem/16730541>

llvm-svn: 207301
2014-04-26 01:11:26 +00:00
Andrea Di Biagio
815dfb7574 [InstCombine][X86] Teach how to fold calls to SSE2/AVX2 packed logical shift
right intrinsics.

A packed logical shift right with a shift count bigger than or equal to the
element size always produces a zero vector. In all other cases, it can be
safely replaced by a 'lshr' instruction.

llvm-svn: 207299
2014-04-26 01:03:22 +00:00
Filipe Cabecinhas
304b05721d Appease the almighty buildbots.
llvm-svn: 207295
2014-04-26 00:02:37 +00:00
Filipe Cabecinhas
54c5ad74d7 Optimization for certain shufflevector by using insertps.
Summary:
If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower
certain shufflevectors to an insertps instruction:
When most of the shufflevector result's elements come from one vector (and
keep their index), and one element comes from another vector or a memory
operand.

Added tests for insertps optimizations on shufflevector.
Added support and tests for v4i32 vector optimization.

Reviewers: nadav

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3475

llvm-svn: 207291
2014-04-25 23:51:17 +00:00
Duncan P. N. Exon Smith
c54b3a7e23 Revert "blockfreq: Approximate irreducible control flow"
This reverts commit r207286.  It causes an ICE on the
cmake-llvm-x86_64-linux buildbot [1]:

    llvm/lib/Analysis/BlockFrequencyInfo.cpp: In lambda function:
    llvm/lib/Analysis/BlockFrequencyInfo.cpp:182:1: internal compiler error: in get_expr_operands, at tree-ssa-operands.c:1035

[1]: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/12093/steps/build_llvm/logs/stdio

llvm-svn: 207287
2014-04-25 23:16:58 +00:00
Duncan P. N. Exon Smith
3189616c35 blockfreq: Approximate irreducible control flow
Previously, irreducible backedges were ignored.  With this commit,
irreducible SCCs are discovered on the fly, and modelled as loops with
multiple headers.

This approximation specifies the headers of irreducible sub-SCCs as its
entry blocks and all nodes that are targets of a backedge within it
(excluding backedges within true sub-loops).  Block frequency
calculations act as if we insert a new block that intercepts all the
edges to the headers.  All backedges and entries to the irreducible SCC
point to this imaginary block.  This imaginary block has an edge (with
even probability) to each header block.

The result is now reasonable enough that I've added a number of
testcases for irreducible control flow.  I've outlined in
`BlockFrequencyInfoImpl.h` ways to improve the approximation.

<rdar://problem/14292693>

llvm-svn: 207286
2014-04-25 23:08:57 +00:00
Adrian Prantl
ca946260a4 Unbreak the gdb buildbot by not lowering dbg.declare intrinsics for arrays.
llvm-svn: 207284
2014-04-25 23:00:25 +00:00
Eric Christopher
0914022daa Make sure that rangelists are also relative to the compile unit
low_pc similar to location lists.

Fixes PR19563

llvm-svn: 207283
2014-04-25 22:23:54 +00:00
Tom Roeder
a8ef44c3ed Add an -mattr option to the gold plugin to support subtarget features in LTO
This adds support for an -mattr option to the gold plugin and to llvm-lto. This
allows the caller to specify details of the subtarget architecture, like +aes,
or +ssse3 on x86.  Note that this requires a change to the include/llvm-c/lto.h
interface: it adds a function lto_codegen_set_attr and it increments the
version of the interface.

llvm-svn: 207279
2014-04-25 21:46:51 +00:00
Adam Nemet
c4071d355a [LoopStrengthReduce] Don't trim formula that uses a subset of required registers
Consider this use from the new testcase:

  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
    reg({1000,+,-1}<nw><%for.body>)
    -3003 + reg({3,+,3}<nw><%for.body>)
    -1001 + reg({1,+,1}<nuw><nsw><%for.body>)
    -1000 + reg({0,+,1}<nw><%for.body>)
    -3000 + reg({0,+,3}<nuw><%for.body>)
    reg({-1000,+,1}<nw><%for.body>)
    reg({-3000,+,3}<nsw><%for.body>)

This is the last use we consider for a solution in SolveRecurse, so CurRegs is
a large set.  (CurRegs is the set of registers that are needed by the
previously visited uses in the in-progress solution.)

ReqRegs is {
  {3,+,3}<nw><%for.body>,
  {1,+,1}<nuw><nsw><%for.body>
}

This is the intersection of the regs used by any of the formulas for the
current use and CurRegs.

Now, the code requires a formula to contain *all* these regs (the comment is
simply wrong), otherwise the formula is immediately disqualified.  Obviously,
no formula for this use contains two regs so they will all get disqualified.

The fix modifies the check to allow the formula in this case.  The idea is
that neither of these formulae is introducing any new registers which is the
point of this early pruning as far as I understand.

In terms of set arithmetic, we now allow formulas whose used regs are a subset
of the required regs not just the other way around.

There are few more loops in the test-suite that are now successfully LSRed.  I
have benchmarked those and found very minimal change.

Fixes <rdar://problem/13965777>

llvm-svn: 207271
2014-04-25 21:02:21 +00:00
Adrian Prantl
7566e72bb8 This reapplies r207235 with an additional bugfixes caught by the msan
buildbot - do not insert debug intrinsics before phi nodes.

Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.

Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.

This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source

rdar://problem/16679879
http://reviews.llvm.org/D3374

llvm-svn: 207269
2014-04-25 20:49:25 +00:00
Adrian Prantl
319db7c542 Revert "This reapplies r207130 with an additional testcase+and a missing check for"
This reverts commit 207235 to investigate msan buildbot breakage.

llvm-svn: 207250
2014-04-25 18:18:09 +00:00
Saleem Abdulrasool
ee8639eae1 ARM: remove @llvm.arm.sevl
This intrinsic is no longer needed with the new @llvm.arm.hint(i32) intrinsic
which provides a generic, extensible manner for adding hint instructions.  This
functionality can now be represented as @llvm.arm.hint(i32 5).

llvm-svn: 207246
2014-04-25 17:51:25 +00:00
Manman Ren
5d62183fae [inline cold threshold] Command line argument for inline threshold will
override the default cold threshold.

When we use command line argument to set the inline threshold, the default
cold threshold will not be used. This is in line with how we use
OptSizeThreshold. When we want a higher threshold for all functions, we
do not have to set both inline threshold and cold threshold.

llvm-svn: 207245
2014-04-25 17:34:55 +00:00
Saleem Abdulrasool
710c823515 ARM: provide a new generic hint intrinsic
Introduce the llvm.arm.hint(i32) intrinsic that can be used to inject hints into
the instruction stream. This is particularly useful for generating IR from a
compiler where the user may inject an intrinsic (e.g. __yield). These are then
pattern substituted into the correct instruction which already existed.

llvm-svn: 207242
2014-04-25 17:24:24 +00:00
Adrian Prantl
c3a2bdef85 Reapply r207135 without modifications.
Debug info: Let dbg.values inserted by LowerDbgDeclare inherit the location
of the dbg.value. This gets rid of tons of redundant variable DIEs in
subscopes.

rdar://problem/14874886, rdar://problem/16679936

llvm-svn: 207236
2014-04-25 17:01:04 +00:00
Adrian Prantl
7f9d1e9fd6 This reapplies r207130 with an additional testcase+and a missing check for
AllocaInst that was missing in one location.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.

Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.

This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source

rdar://problem/16679879
http://reviews.llvm.org/D3374

llvm-svn: 207235
2014-04-25 17:01:00 +00:00
Tilmann Scheller
0fdf10c5a5 [ARM64] When compiling for ELF in PIC mode, local symbols shouldn't go through the GOT
There's no need for local symbols to go through the GOT, in fact it seems GNU ld is not even emitting GOT entries for local symbols and will error out when trying to resolve a GOT relocation for a local symbol.

This bug triggers when bootstrapping clang on AArch64 Linux with -fPIC and the ARM64 backend. The AArch64 backend is not affected.

With this commit it's now possible to bootstrap clang on AArch64 Linux with the ARM64 backend (-fPIC, -O3).

llvm-svn: 207226
2014-04-25 13:43:18 +00:00
Jiangning Liu
8300b91450 [ARM64] Handle fp128 for parameter passing on stack
llvm-svn: 207222
2014-04-25 12:07:03 +00:00
Tim Northover
1017923163 ARM64: fix assertion in ISelDAGToDAG
Also an unused variable, so double bonus!

This should deal with PR19548.

llvm-svn: 207221
2014-04-25 10:48:47 +00:00
Bradley Smith
51d6f272a3 [ARM64] Print preferred aliases for SFBM/UBFM in InstPrinter
llvm-svn: 207219
2014-04-25 10:25:29 +00:00
Kevin Qin
6a9c40e76d [ARM64] Add RUN lines for "–target arm64 –mattr=-fp-armv8" on AArch64 no-fp test.
This patch is a supplement of implementing predicate of FP, enabling aarch64 backend
no-fp tests on arm64 target for verification. During this, one bug is exposed and
fixed by this patch.

llvm-svn: 207215
2014-04-25 09:44:20 +00:00
Kevin Qin
f56429c01e [ARM64] Support crc predicate on ARM64.
According to the specification, CRC is an optional extension of the
architecture.

llvm-svn: 207214
2014-04-25 09:25:42 +00:00
Duncan P. N. Exon Smith
6117699d7e blockfreq: Only one mass distribution per node
Remove the concepts of "forward" and "general" mass distributions, which
was wrong.  The split might have made sense in an early version of the
algorithm, but it's definitely wrong now.

<rdar://problem/14292693>

llvm-svn: 207195
2014-04-25 04:38:43 +00:00
Duncan P. N. Exon Smith
7be280eab6 blockfreq: Use better branch weights in multiexit test
The branch weights were even before.  Make them different.

<rdar://problem/14292693>

llvm-svn: 207193
2014-04-25 04:38:37 +00:00
Duncan P. N. Exon Smith
d2094f508a blockfreq: Clean up irreducible testcases
Strip irreducible testcases to pure control flow.  The function calls
made the branch weights more believable but cluttered it up a lot.
There isn't going to be any constant analysis here, so just use dumb
branch logic to clarify the important parts.

<rdar://problem/14292693>

llvm-svn: 207192
2014-04-25 04:38:35 +00:00
Karthik Bhat
a6070a9b75 Allow vectorization of bit intrinsics in BB Vectorizer.
This patch adds support for vectorization of  bit intrinsics such as bswap,ctpop,ctlz,cttz.

llvm-svn: 207174
2014-04-25 03:33:48 +00:00
Justin Bogner
532de088bc ProfileData: Treat missing function counts as malformed
llvm-svn: 207172
2014-04-25 02:45:33 +00:00
Adrian Prantl
0338f80f17 Revert "This reapplies r207130 with an additional testcase+and a missing check for"
Typo in testcase.

llvm-svn: 207166
2014-04-25 00:42:50 +00:00