1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 21:42:54 +02:00
Commit Graph

72130 Commits

Author SHA1 Message Date
Robert Khasanov
3a54da967f [x86] Broadwell: ADOX/ADCX. Added _addcarryx_u{32|64} intrinsics to LLVM.
llvm-svn: 216162
2014-08-21 09:27:00 +00:00
Robert Khasanov
741d0742f0 [x86] Enable Broadwell target.
Added FeatureSMAP.

Broadwell ISA includes Haswell ISA + ADX + RDSEED + SMAP

llvm-svn: 216161
2014-08-21 09:16:12 +00:00
Zinovy Nis
6b7238f3b4 [INDVARS] Extend using of widening of induction variables for the cases of "sub nsw" and "mul nsw" instructions.
Currently only "add nsw" are widened. This patch eliminates tons of "sext" instructions for 64 bit code (and the corresponding target code) in cases like:

int N = 100;
float **A;

void foo(int x0, int x1)
{
        float * A_cur = &A[0][0];
        float * A_next = &A[1][0];
        for(int x = x0; x < x1; ++x).
        {
          // Currently only [x+N] case is widened. Others 2 cases lead to sext.
          // This patch fixes it, so all 3 cases do not need sext.
          const float div = A_cur[x + N] + A_cur[x - N] + A_cur[x * N];
          A_next[x] = div;
        }
}
...
> clang++ test.cpp -march=core-avx2 -Ofast  -fno-unroll-loops -fno-tree-vectorize -S -o -

Differential Revision: http://reviews.llvm.org/D4695

llvm-svn: 216160
2014-08-21 08:25:45 +00:00
Elena Demikhovsky
511b2e1f89 IntelJITEventListener updates to fix breaks by recent changes to EngineBuilder and DIContext.
By Arch Robison.

llvm-svn: 216159
2014-08-21 07:01:55 +00:00
Craig Topper
65775cc03d Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size.
llvm-svn: 216158
2014-08-21 05:55:13 +00:00
David Majnemer
fb4e6230cf InstCombine: Fold ((A | B) & C1) ^ (B & C2) -> (A & C1) ^ B if C1^C2=-1
Adapted from a patch by Richard Smith, test-case written by me.

llvm-svn: 216157
2014-08-21 05:14:48 +00:00
Craig Topper
c80a14ad2f Remove custom implementations of max/min in StringRef that was originally added to work an old gcc bug. I believe its been fixed by now.
llvm-svn: 216156
2014-08-21 04:31:10 +00:00
Jiangning Liu
8819f1fe36 Fix a bug around truncating vector in const prop.
In constant folding stage, "TRUNC" can't handle vector data type.

llvm-svn: 216149
2014-08-21 02:12:35 +00:00
Jiangning Liu
c14e7de948 Revert r216066, "Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type".
llvm-svn: 216147
2014-08-21 01:59:30 +00:00
Quentin Colombet
be2a400039 [PeepholeOptimizer] Take advantage of the isInsertSubreg property in the
advanced copy optimization.

This is the final step patch toward transforming:
udiv    r0, r0, r2
udiv    r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov    r0, r1, d16
bx      lr

into:
udiv    r0, r0, r2
udiv    r1, r1, r3
bx      lr

Indeed, thanks to this patch, this optimization is able to look through
vmov.32 d16[0], r0
vmov.32 d16[1], r1

and is able to rewrite the following sequence:
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov    r0, r1, d16

into simple generic GPR copies that the coalescer managed to remove.

<rdar://problem/12702965>

llvm-svn: 216144
2014-08-21 00:19:16 +00:00
Quentin Colombet
fbc7f5c996 [ARM] Mark VSETLNi32 with the InsertSubreg property and implement the related
target hook.

This patch teaches the compiler that:
dX = VSETLNi32 dY, rZ, imm
is the same as:
dX = INSERT_SUBREG dY, rZ, translateImmToSubIdx(imm)

<rdar://problem/12702965>

llvm-svn: 216143
2014-08-21 00:10:52 +00:00
James Molloy
65aa6c84f5 [LoopVectorize] Up the maximum unroll factor to 4 for AArch64
Only for Cortex-A57 and Cyclone for now, where it has shown wins.

llvm-svn: 216141
2014-08-21 00:02:51 +00:00
James Molloy
3bb7b30942 [LoopVectorizer] Limit unroll factor in the presence of nested reductions.
If we have a scalar reduction, we can increase the critical path length if the loop we're unrolling is inside another loop. Limit, by default to 2, so the critical path only gets increased by one reduction operation.

llvm-svn: 216140
2014-08-20 23:53:52 +00:00
Quentin Colombet
1849edbdf6 Add isInsertSubreg property.
This patch adds a new property: isInsertSubreg and the related target hooks:
TargetIntrInfo::getInsertSubregInputs and
TargetInstrInfo::getInsertSubregLikeInputs to specify that a target specific
instruction is a (kind of) INSERT_SUBREG.

The approach is similar to r215394.

<rdar://problem/12702965>

llvm-svn: 216139
2014-08-20 23:49:36 +00:00
Jonathan Roelofs
63874dab8e Lower thumbv4t & thumbv5 lo->lo copies through a push-pop sequence
On pre-v6 hardware, 'MOV lo, lo' gives undefined results, so such copies need to
be avoided. This patch trades simplicity for implementation time at the expense
of performance... As they say: correctness first, then performance.

See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075998.html for a few
ideas on how to make this better.

llvm-svn: 216138
2014-08-20 23:38:50 +00:00
Quentin Colombet
2d800208a6 [PeepholeOptimizer] Take advantage of the isExtractSubreg property in the
advanced copy optimization.

This patch is a step toward transforming:
udiv	r0, r0, r2
udiv	r1, r1, r3
vmov.32	d16[0], r0
vmov.32	d16[1], r1
vmov	r0, r1, d16
bx	lr

into:
udiv	r0, r0, r2
udiv	r1, r1, r3
bx	lr

Indeed, thanks to this patch, this optimization is able to look through
vmov r0, r1, d16
but it does not understand yet
vmov.32 d16[0], r0
vmov.32 d16[1], r1

Comming patches will fix that and update the related test case.

<rdar://problem/12702965>

llvm-svn: 216136
2014-08-20 23:13:02 +00:00
Yi Jiang
aee2472606 New InstCombine pattern: (icmp ult/ule (A + C1), C3) | (icmp ult/ule (A + C2), C3) to (icmp ult/ule ((A & ~(C1 ^ C2)) + max(C1, C2)), C3) under certain condition
llvm-svn: 216135
2014-08-20 22:55:40 +00:00
Alexey Samsonov
b9f04a556e Don't allow MCStreamer::EmitIntValue to output 0-byte integers.
It makes no sense and can hide bugs. In particular, it lead
to left shift by 64 bits, which is an undefined behavior,
properly reported by UBSan.

llvm-svn: 216134
2014-08-20 22:46:38 +00:00
Quentin Colombet
67dd6f5956 [ARM] Mark VMOVRRD with the ExtractSubreg property and implement the related
target hook.

This patch teaches the compiler that:
rX, rY = VMOVRRD dZ
is the same as:
rX = EXTRACT_SUBREG dZ, ssub_0
rY = EXTRACT_SUBREG dZ, ssub_1

<rdar://problem/12702965>

llvm-svn: 216132
2014-08-20 22:16:19 +00:00
Alexey Samsonov
a64dcf1293 Fix undefined behavior (left shift of negative value) in SystemZ backend.
This bug is reported by UBSan.

llvm-svn: 216131
2014-08-20 21:56:43 +00:00
Quentin Colombet
b02f26b10f Add isExtractSubreg property.
This patch adds a new property: isExtractSubreg and the related target hooks:
TargetIntrInfo::getExtractSubregInputs and
TargetInstrInfo::getExtractSubregLikeInputs to specify that a target specific
instruction is a (kind of) EXTRACT_SUBREG.

The approach is similar to r215394.

<rdar://problem/12702965>

llvm-svn: 216130
2014-08-20 21:51:26 +00:00
Alexey Samsonov
e029f253b7 Fix null reference creation in SelectionDAG constructor.
Store TargetSelectionDAGInfo as a pointer instead of a reference:
getSelectionDAGInfo() may not be implemented for certain backends
(e.g. it's not currently implemented for R600).

This bug is reported by UBSan.

llvm-svn: 216129
2014-08-20 21:40:15 +00:00
Alexey Samsonov
439f7833fd Fix undefined behavior (left shift of negative value) in Hexagon backend.
This bug is reported by UBSan.

llvm-svn: 216125
2014-08-20 21:22:03 +00:00
Alexey Samsonov
08af4466dd Cleanup: Delete seemingly unused reference to MachineDominatorTree from ScheduleDAGInstrs.
llvm-svn: 216124
2014-08-20 20:57:26 +00:00
Sanjay Patel
3effcd99a5 Don't prevent a vselect of constants from becoming a single load (PR20648).
Fix for PR20648 - http://llvm.org/bugs/show_bug.cgi?id=20648

This patch checks the operands of a vselect to see if all values are constants.
If yes, bail out of any further attempts to create a blend or shuffle because
SelectionDAGLegalize knows how to turn this kind of vselect into a single load.

This already happens for machines without SSE4.1, so the added checks just send
more targets down that path.

Differential Revision: http://reviews.llvm.org/D4934

llvm-svn: 216121
2014-08-20 20:34:56 +00:00
Duncan P. N. Exon Smith
305e4c4ae6 X86: Align the stack on word boundaries in LowerFormalArguments()
The goal of the patch is to implement section 3.2.3 of the AMD64 ABI
correctly.  The controlling sentence is, "The size of each argument gets
rounded up to eightbytes.  Therefore the stack will always be eightbyte
aligned." The equivalent sentence in the i386 ABI page 37 says, "At all
times, the stack pointer should point to a word-aligned area."  For both
architectures, the stack pointer is not being rounded up to the nearest
eightbyte or word between the last normal argument and the first
variadic argument.

Patch by Thomas Jablin!

llvm-svn: 216119
2014-08-20 19:40:59 +00:00
Alexey Samsonov
d47abf2d7c Fix null reference creation in ScheduleDAGInstrs constructor call.
Both MachineLoopInfo and MachineDominatorTree may be null in ScheduleDAGMI
constructor call. It is undefined behavior to take references to these values.

This bug is reported by UBSan.

llvm-svn: 216118
2014-08-20 19:36:05 +00:00
Keno Fischer
1dbe2693fc Do not insert a tail call when returning multiple values on X86
Summary: This fixes http://llvm.org/bugs/show_bug.cgi?id=19530.
The problem is that X86ISelLowering erroneously thought the third call
was eligible for tail call elimination.
It would have been if it's return value was actually the one returned
by the calling function, but here that is not the case and
additional values are being returned.

Test Plan: Test case from the original bug report is included.

Reviewers: rafael

Reviewed By: rafael

Subscribers: rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D4968

llvm-svn: 216117
2014-08-20 19:00:37 +00:00
Alexey Samsonov
73266a6708 Fix undefined behavior (left shift by 64 bits) in ScaledNumber::toString().
This bug is reported by UBSan.

llvm-svn: 216116
2014-08-20 18:30:07 +00:00
Sanjay Patel
55357e1fa3 critical-anti-dependency breaker: don't use reg def info from kill insts (PR20308)
In PR20308 ( http://llvm.org/bugs/show_bug.cgi?id=20308 ), the critical-anti-dependency breaker
caused a miscompile because it broke a WAR hazard using a register that it thinks is available
based on info from a kill inst. Until PR18663 is solved, we shouldn't use any def/use info from
a kill because they are really just nops.

This patch adds guard checks for kills around calls to ScanInstruction() where the DefIndices
array is set. For good measure, add an assert in ScanInstruction() so we don't hit this bug again.

The test case is a reduced version of the code from the bug report.

Differential Revision: http://reviews.llvm.org/D4977

llvm-svn: 216114
2014-08-20 18:03:00 +00:00
Quentin Colombet
5404c5510b [PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of
the isRegSequence property.

This is a follow-up of r215394 and r215404, which respectively introduces the
isRegSequence property and uses it for ARM.

Thanks to the property introduced by the previous commits, this patch is able
to optimize the following sequence:
vmov	d0, r2, r3
vmov	d1, r0, r1
vmov	r0, s0
vmov	r1, s2
udiv	r0, r1, r0
vmov	r1, s1
vmov	r2, s3
udiv	r1, r2, r1
vmov.32	d16[0], r0
vmov.32	d16[1], r1
vmov	r0, r1, d16
bx	lr

into:
udiv	r0, r0, r2
udiv	r1, r1, r3
vmov.32	d16[0], r0
vmov.32	d16[1], r1
vmov	r0, r1, d16
bx	lr

This patch refactors how the copy optimizations are done in the peephole
optimizer. Prior to this patch, we had one copy-related optimization that
replaced a copy or bitcast by a generic, more suitable (in terms of register
file), copy.

With this patch, the peephole optimizer features two copy-related optimizations:
1. One for rewriting generic copies to generic copies:
PeepholeOptimizer::optimizeCoalescableCopy.
2. One for replacing non-generic copies with generic copies:
PeepholeOptimizer::optimizeUncoalescableCopy.

The goals of these two optimizations are slightly different: one rewrite the
operand of the instruction (#1), the other kills off the non-generic instruction
and replace it by a (sequence of) generic instruction(s).

Both optimizations rely on the ValueTracker introduced in r212100.

The ValueTracker has been refactored to use the information from the
TargetInstrInfo for non-generic instruction. As part of the refactoring, we
switched the tracking from the index of the definition to the actual register
(virtual or physical). This one change is to provide better consistency with
register related APIs and to ease the use of the TargetInstrInfo.

Moreover, this patch introduces a new helper class CopyRewriter used to ease the
rewriting of generic copies (i.e., #1).

Finally, this patch adds a dead code elimination pass right after the peephole
optimizer to get rid of dead code that may appear after rewriting.

This is related to <rdar://problem/12702965>.

Review: http://reviews.llvm.org/D4874
llvm-svn: 216088
2014-08-20 17:41:48 +00:00
Rafael Espindola
016a21994a Remove unused field.
llvm-svn: 216086
2014-08-20 17:33:44 +00:00
Juergen Ributzka
7c8f6aa104 [FastISel][AArch64] Don't fold the sign-/zero-extend from i1 into the compare.
This fixes a bug I introduced in a previous commit (r216033). Sign-/Zero-
extension from i1 cannot be folded into the ADDS/SUBS instructions. Instead both
operands have to be sign-/zero-extended with separate instructions.

Related to <rdar://problem/17913111>.

llvm-svn: 216073
2014-08-20 16:34:15 +00:00
Aaron Ballman
1a252af32a Silencing a -Wcast-qual warning. NFC.
llvm-svn: 216068
2014-08-20 12:54:13 +00:00
Aaron Ballman
85f7f5057f Silencing an MSVC C4334 warning ('<<' : result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)). NFC.
llvm-svn: 216067
2014-08-20 12:14:35 +00:00
Jiangning Liu
c3dd378a9e Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type
legalization stage. With those two optimizations, fewer signed/zero extension
instructions can be inserted, and then we can expose more opportunities to
Machine CSE pass in back-end.

llvm-svn: 216066
2014-08-20 12:05:15 +00:00
Pavel Chupin
77b41f178f [x32] Fix FrameIndex check in SelectLEA64_32Addr
Summary:
Fixes http://llvm.org/bugs/show_bug.cgi?id=20016 reproducible on new
lea-5.ll case.
Also use RSP/RBP for x32 lea to save 1 byte used for 0x67 prefix in
ESP/EBP case.

Test Plan: lea tests modified to include x32/nacl and new test added

Reviewers: nadav, dschuff, t.p.northover

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D4929

llvm-svn: 216065
2014-08-20 11:59:22 +00:00
Yi Kong
57329b1cc2 ARM: Fix codegen for rbit intrinsic
LLVM generates illegal `rbit r0, #352` instruction for rbit intrinsic.
According to ARM ARM, rbit only takes register as argument, not immediate.
The correct instruction should be rbit <Rd>, <Rm>.

The bug was originally introduced in r211057.

Differential Revision: http://reviews.llvm.org/D4980

llvm-svn: 216064
2014-08-20 10:40:20 +00:00
David Majnemer
03fa77d0ce InstCombine: Annotate sub with nuw when we prove it's safe
We can prove that a 'sub' can be a 'sub nuw' if the left-hand side is
negative and the right-hand side is non-negative.

llvm-svn: 216045
2014-08-20 07:17:31 +00:00
Craig Topper
baef4556f6 Fix an off by 1 bug that prevented SmallPtrSet from using all of its 'small' capacity. Then fix the early return in the move constructor that prevented 'small' moves from clearing the NumElements in the moved from object. The directed test missed this because it was always testing large moves due to the off by 1 bug.
llvm-svn: 216044
2014-08-20 04:41:36 +00:00
Peter Collingbourne
e94d272f0d [dfsan] Treat vararg custom functions like unimplemented functions.
Because declarations of these functions can appear in places like autoconf
checks, they have to be handled somehow, even though we do not support
vararg custom functions. We do so by printing a warning and calling the
uninstrumented function, as we do for unimplemented functions.

llvm-svn: 216042
2014-08-20 01:40:23 +00:00
Juergen Ributzka
ce5953230a [FastISel][AArch64] Use the proper FMOV instruction to materialize a +0.0.
Use FMOVWSr/FMOVXDr instead of FMOVSr/FMOVDr, which have the proper register
class to be used with the zero register. This makes the MachineInstruction
verifier happy again.

This is related to <rdar://problem/18027157>.

llvm-svn: 216040
2014-08-20 01:10:36 +00:00
David Majnemer
b02f8f16bc InstCombine: Annotate sub with nsw when we prove it's safe
We can prove that a 'sub' can be a 'sub nsw' under certain conditions:
- The sign bits of the operands is the same.
- Both operands have more than 1 sign bit.

The subtraction cannot be a signed overflow in either case.

llvm-svn: 216037
2014-08-19 23:36:30 +00:00
Juergen Ributzka
21b19be38f [FastISel][AArch64] Factor out ADDS/SUBS instruction emission and add support for extensions and shift folding.
Factor out the ADDS/SUBS instruction emission code into helper functions and
make the helper functions more clever to support most of the different ADDS/SUBS
instructions the architecture support. This includes better immedediate support,
shift folding, and sign-/zero-extend folding.

This fixes <rdar://problem/17913111>.

llvm-svn: 216033
2014-08-19 22:29:55 +00:00
Rafael Espindola
ee775d5673 Split parseAssembly into parseAssembly and parseAssemblyInto.
This should restore the functionality of parsing new code into an existing
module without the confusing interface.

llvm-svn: 216031
2014-08-19 22:05:47 +00:00
Alexey Samsonov
8dee78d45c Delete unused argument in AArch64MCInstLower constructor: it doesn't
use Mangler, and Mangler is in fact not even created when AArch64MCInstLower
is constructed.

This bug is reported by UBSan.

llvm-svn: 216030
2014-08-19 21:51:08 +00:00
Duncan P. N. Exon Smith
56ea569496 IR: Implement uselistorder assembly directives
Implement `uselistorder` and `uselistorder_bb` assembly directives,
which allow the use-list order to be recovered when round-tripping to
assembly.

This is the bulk of PR20515.

llvm-svn: 216025
2014-08-19 21:30:15 +00:00
Duncan P. N. Exon Smith
986b92fd8e verify-uselistorder: Force -preserve-bc-use-list-order
llvm-svn: 216022
2014-08-19 21:08:27 +00:00
Lang Hames
a4e0584cbc [MCJIT] Allow '$' characters in symbol names in RuntimeDyldChecker.
llvm-svn: 216017
2014-08-19 20:04:45 +00:00
Duncan P. N. Exon Smith
6eb8c34f72 IR: Fix ConstantExpr::replaceUsesOfWithOnConstant()
Change `ConstantExpr` to follow the model the other constants are using:
only malloc a replacement if it's going to be used.  This fixes a subtle
bug where if an API user had used `ConstantExpr::get()` already to
create the replacement but hadn't given it any users, we'd delete the
replacement.

This relies on r216015 to thread `OnlyIfReduced` through
`ConstantExpr::getWithOperands()`.

llvm-svn: 216016
2014-08-19 20:03:35 +00:00