1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-29 23:12:55 +01:00
Commit Graph

10369 Commits

Author SHA1 Message Date
Matt Arsenault
62262e12fa R600/SI: Default to no single precision denormals.
llvm-svn: 213017
2014-07-14 23:40:43 +00:00
David Majnemer
94c981273e CodeGen: Stick constant pool entries in COMDAT sections for WinCOFF
COFF lacks a feature that other object file formats support: mergeable
sections.

To work around this, MSVC sticks constant pool entries in special COMDAT
sections so that each constant is in it's own section.  This permits
unused constants to be dropped and it also allows duplicate constants in
different translation units to get merged together.

This fixes PR20262.

Differential Revision: http://reviews.llvm.org/D4482

llvm-svn: 213006
2014-07-14 22:57:27 +00:00
Andrea Di Biagio
1b83284869 [DAGCombiner] Add more rules to combine shuffle vector dag nodes.
This patch teaches the DAGCombiner how to fold a pair of shuffles
according to rules:
  1.  shuffle(shuffle A, B, M0), B, M1) -> shuffle(A, B, M2)
  2.  shuffle(shuffle A, B, M0), A, M1) -> shuffle(A, B, M3)

The new rules would only trigger if the resulting shuffle has legal type and
legal mask.

Added test 'combine-vec-shuffle-3.ll' to verify that DAGCombiner correctly
folds shuffles on x86 when the resulting mask is legal. Also added some negative
cases to verify that we avoid introducing illegal shuffles.

llvm-svn: 213001
2014-07-14 22:46:26 +00:00
Bill Wendling
1a86df990e Unify the lowering of arguments during SjLj prepare.
The 'select true, %arg, undef' instruction can be used for both aggregate and
non-aggregate arguments.

llvm-svn: 212967
2014-07-14 18:21:11 +00:00
Saleem Abdulrasool
6f4137a9fc X86: correct 64-bit atomics on 32-bit
We would emit a libcall for a 64-bit atomic on x86 after SVN r212119.  This was
due to the misuse of hasCmpxchg16 to indicate if cmpxchg8b was supported on a
32-bit target.  They were added at different times and would result in the
border condition being mishandled.

This fixes the border case to emit the cmpxchg8b instruction for 64-bit atomic
operations on x86 at the cost of restoring a long-standing bug in the codegen.
We emit a cmpxchg8b on all x86 targets even where the CPU does not support this
instruction (pre-Pentium CPUs).  Although this bug should be fixed, this was
present prior to SVN r212119 and this change, so this is not really introducing
a regression.

llvm-svn: 212956
2014-07-14 16:28:13 +00:00
Tim Northover
4ef7afc394 X86: remove temporary atomicrmw used during lowering.
We construct a temporary "atomicrmw xchg" instruction when lowering atomic
stores for widths that aren't supported natively. This isn't on the top-level
worklist though, so it won't be removed automatically and we have to do it
ourselves once that itself has been lowered.

Thanks Saleem for pointing this out!

llvm-svn: 212948
2014-07-14 15:31:13 +00:00
Daniel Sanders
3af4859657 [mips] For the FP64A ABI, odd-numbered double-precision moves must not use mtc1/mfc1.
Summary:
This is because the FP64A the hardware will redirect 32-bit reads/writes
from/to odd-numbered registers to the upper 32-bits of the corresponding
even register. In effect, simulating FR=0 mode when FR=0 mode is not
available.

Unfortunately, we have to make the decision to avoid mfc1/mtc1 before
register allocation so we currently do this for even registers too.

FPXX has a similar requirement on 32-bit architectures that lack
mfhc1/mthc1 so this patch also handles the affected moves from the FPU for
FPXX too. Moves to the FPU were supported by an earlier commit.

Differential Revision: http://reviews.llvm.org/D4484

llvm-svn: 212938
2014-07-14 13:08:14 +00:00
Daniel Sanders
fc1f878e70 [mips] Use MFHC1 when it is available (MIPS32r2 and later) for both FP32 and FP64 moves
Summary:
This is similar to r210771 which did the same thing for MTHC1.

Also corrected MTHC1_D32 and MTHC1_D64 which used AFGR64 and FGR64 on the
wrong definitions.

Differential Revision: http://reviews.llvm.org/D4483

llvm-svn: 212936
2014-07-14 12:41:31 +00:00
Tim Northover
4ac35c9d7b AArch64: remove unnecessary pseudo-instruction.
Sufficiently twisted use of TableGen lets us write patterns directly for f16
(as an i16 promoted to i32) -> f32 conversion.

llvm-svn: 212933
2014-07-14 11:16:02 +00:00
Sasa Stankovic
d8104abd36 [mips] Expand BuildPairF64 to a spill and reload when the O32 FPXX ABI is
enabled and mthc1 and dmtc1 are not available (e.g. on MIPS32r1)

This prevents the upper 32-bits of a double precision value from being moved to
the FPU with mtc1 to an odd-numbered FPU register. This is necessary to ensure
that the code generated executes correctly regardless of the current FPU mode.

MIPS32r2 and above continues to use mtc1/mthc1, while MIPS-IV and above continue
to use dmtc1.

Differential Revision: http://reviews.llvm.org/D4465

llvm-svn: 212930
2014-07-14 09:40:29 +00:00
Bill Wendling
93c9860cb7 Support lowering of empty aggregates.
This crash was pretty common while compiling Rust for iOS (armv7). Reason -
SjLj preparation step was lowering aggregate arguments as ExtractValue +
InsertValue. ExtractValue has assertion which checks that there is some data in
value, which is not true in case of empty (no fields) structures. Rust uses
them quite extensively so this patch uses a 'select true, %val, undef'
instruction to lower the argument.

Patch by Valerii Hiora.

llvm-svn: 212922
2014-07-14 06:22:36 +00:00
Andrea Di Biagio
4242d12adc [DAGCombiner] Fix a crash caused by a missing check for legal type when trying to fold shuffles.
Verify that DAGCombiner does not crash when trying to fold a pair of shuffles
according to rule (added at r212539):
  (shuffle (shuffle A, Undef, M0), Undef, M1) -> (shuffle A, Undef, M2)

The DAGCombiner avoids folding shuffles if the resulting shuffle dag node
is not legal for the target. That means, the resulting shuffle must have
legal type and legal mask.

Before, the DAGCombiner only called method
'TargetLowering::isShuffleMaskLegal' to check if it was "safe" to fold according
to the above-mentioned rule. However, this caused a crash in the x86 backend
since method 'isShuffleMaskLegal' always expects to be called on a
legal vector type.

llvm-svn: 212915
2014-07-13 21:02:14 +00:00
Matt Arsenault
f6ce1be587 R600: Run more tests with promote alloca disabled.
Re-run tests changed in r211110 to test both paths.
Also fix broken check line.

llvm-svn: 212895
2014-07-13 02:46:17 +00:00
Matt Arsenault
fd221a07bc R600: Run private-memory test with and without alloca promote
The unpromoted path still needs to be tested since we can't
always promote to using LDS.

llvm-svn: 212894
2014-07-13 02:18:06 +00:00
Saleem Abdulrasool
00c379d824 AArch64: add support for llvm.aarch64.hint intrinsic
This adds a llvm.aarch64.hint intrinsic to mirror the llvm.arm.hint in order to
support the various hint intrinsic functions in the ACLE.

Add an optional pattern field that permits the subclass to specify the pattern
that matches the selection.  The intrinsic pattern is set as mayLoad, mayStore,
so overload the value for the definition of the hint instruction.

llvm-svn: 212883
2014-07-12 21:20:49 +00:00
Matt Arsenault
31e0179e8a R600: Add missing tests for some intrinsics
llvm-svn: 212870
2014-07-12 00:36:19 +00:00
Ulrich Weigand
1aae06e395 [PowerPC] Fix invalid displacement created by LocalStackAlloc
This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that
could result in the LocalStackAlloc pass creating an MI instruction
out-of-range displacement:
        %vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32)
        %G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31
(In final assembler output the top bits are stripped off, resulting
in a negative offset loading from below the stack pointer.)

Common code expects the isFrameOffsetLegal routine to verify whether
adding a given offset to the offset already present in the instruction
results in a valid displacement.  However, on PowerPC the routine
did not take the already present instruction offset into account.

This commit fixes isFrameOffsetLegal to add the instruction offset,
and updates a local caller (needsFrameBaseReg) to no longer add the
instruction offset itself before calling isFrameOffsetLegal.

Reviewed by Hal Finkel.

llvm-svn: 212832
2014-07-11 17:19:31 +00:00
Marek Olsak
7757b563f1 R600/SI: Use i32 vectors for resources and samplers
This affects new intrinsics only.

What surprises me is that v32i8 still works.

llvm-svn: 212831
2014-07-11 17:11:52 +00:00
Marek Olsak
6db1789f95 R600/SI: add sample and image intrinsics exposing all instruction fields
We need the intrinsics with offsets, so why not just add them all.
The R128 parameter will also be useful for reducing SGPR usage.
GL_ARB_image_load_store also adds some image GLSL modifiers like "coherent",
so Mesa will probably translate those to slc, glc, etc.

When LLVM 3.5 is released, I'll switch Mesa to these new intrinsics.

llvm-svn: 212830
2014-07-11 17:11:46 +00:00
Oliver Stannard
6090374181 ARM: Allow __fp16 as a function arg or return type for AArch64
ACLE 2.0 allows __fp16 to be used as a function argument or return
type. This enables this for AArch64.

llvm-svn: 212812
2014-07-11 13:33:46 +00:00
Quentin Colombet
854e94b5f3 [X86] Fix the inversion of low and high bits for the lowering of MUL_LOHI.
Also add a few comments.

<rdar://problem/17581756>

llvm-svn: 212808
2014-07-11 12:08:23 +00:00
Jan Vesely
f405b95fd6 R600: Implement float to long/ulong
Use alg. from LegalizeDAG.cpp
Move Expand setting to SIISellowering

v2: Extend existing tests instead of creating new ones
v3: use separate LowerFPTOSINT function
v4: use TargetLowering::expandFP_TO_SINT
    add comment about using FP_TO_SINT for uints

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 212773
2014-07-10 22:40:21 +00:00
Zoran Jovanovic
45b7aee40f [mips] Emit two CFI offset directives per double precision SDC1/LDC1
instead of just one for FR=1 registers
Differential Revision: http://reviews.llvm.org/D4310

llvm-svn: 212769
2014-07-10 22:23:30 +00:00
Andrea Di Biagio
dad37bbafb Extend the test coverage in combine-vec-shuffle-2.ll adding some negative tests.
Add test cases where we don't expect to trigger the combine optimizations
introduced at revision 212748.

No functional change intended.

llvm-svn: 212756
2014-07-10 18:59:41 +00:00
Matt Arsenault
49a604853d Revert "Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine.""
Don't try to convert the select condition type.

llvm-svn: 212750
2014-07-10 18:21:04 +00:00
Andrea Di Biagio
29af13b774 [DAG] Further improve the logic in DAGCombiner that folds a pair of shuffles into a single shuffle if the resulting mask is legal.
This patch teaches the DAGCombiner how to fold shuffles according to the
following new rules:
  1. shuffle(shuffle(x, y), undef) -> x
  2. shuffle(shuffle(x, y), undef) -> y
  3. shuffle(shuffle(x, y), undef) -> shuffle(x, undef)
  4. shuffle(shuffle(x, y), undef) -> shuffle(y, undef)

The backend avoids to combine shuffles according to rules 3. and 4. if
the resulting shuffle does not have a legal mask. This is to avoid introducing
illegal shuffles that are potentially expanded into a sub-optimal sequence of
target specific dag nodes during vector legalization.

Added test case combine-vec-shuffle-2.ll to verify that we correctly triggers
the new rules when combining shuffles.

llvm-svn: 212748
2014-07-10 18:04:55 +00:00
Akira Hatanaka
fc40b8c4ea [X86] Mark pseudo instruction TEST8ri_NOEREX as hasSIdeEffects=0.
Also, add a case clause in X86InstrInfo::shouldScheduleAdjacent to enable
macro-fusion.

<rdar://problem/15680770>

llvm-svn: 212747
2014-07-10 18:00:53 +00:00
Zoran Jovanovic
87a12de86d [mips] Added FPXX modeless calling convention.
Differential Revision: http://reviews.llvm.org/D4293

llvm-svn: 212726
2014-07-10 15:36:12 +00:00
Tim Northover
3a26804901 AArch64: correctly fast-isel i8 & i16 multiplies
We were asking for a register for type i8 or i16 which caused an assert.

rdar://problem/17620015

llvm-svn: 212718
2014-07-10 14:18:46 +00:00
Daniel Sanders
aa92e911ca [mips] Add support for -modd-spreg/-mno-odd-spreg
Summary:
When -mno-odd-spreg is in effect, 32-bit floating point values are not
permitted in odd FPU registers. The option also prohibits 32-bit and 64-bit
floating point comparison results from being written to odd registers.

This option has three purposes:
* It allows support for certain MIPS implementations such as loongson-3a that
  do not allow the use of odd registers for single precision arithmetic.
* When using -mfpxx, -mno-odd-spreg is the default and this allows us to
  statically check that code is compliant with the O32 FPXX ABI since mtc1/mfc1
  instructions to/from odd registers are guaranteed not to appear for any
  reason. Once this has been established, the user can then re-enable
  -modd-spreg to regain the use of all 32 single-precision registers.
* When using -mfp64 and -mno-odd-spreg together, an O32 extension named
  O32 FP64A is used as the ABI. This is intended to provide almost all
  functionality of an FR=1 processor but can also be executed on a FR=0 core
  with the assistance of a hardware compatibility mode which emulates FR=0
  behaviour on an FR=1 processor.

* Added '.module oddspreg' and '.module nooddspreg' each of which update
  the .MIPS.abiflags section appropriately
* Moved setFpABI() call inside emitDirectiveModuleFP() so that the caller
  doesn't have to remember to do it.
* MipsABIFlags now calculates the flags1 and flags2 member on demand rather
  than trying to maintain them in the same format they will be emitted in.

There is one portion of the -mfp64 and -mno-odd-spreg combination that is not
implemented yet. Moves to/from odd-numbered double-precision registers must not
use mtc1. I will fix this in a follow-up.

Differential Revision: http://reviews.llvm.org/D4383

llvm-svn: 212717
2014-07-10 13:38:23 +00:00
Chandler Carruth
b55565f54f [x86,SDAG] Introduce any- and sign-extend-vector-inreg nodes analogous
to the zero-extend-vector-inreg node introduced previously for the same
purpose: manage the type legalization of widened extend operations,
especially to support the experimental widening mode for x86.

I'm adding both because sign-extend is expanded in terms of any-extend
with shifts to propagate the sign bit. This removes the last
fundamental scalarization from vec_cast2.ll (a test case that hit many
really bad edge cases for widening legalization), although the trunc
tests in that file still appear scalarized because the the shuffle
legalization is scalarizing. Funny thing, I've been working on that.

Some initial experiments with this and SSE2 scenarios is showing
moderately good behavior already for sign extension. Still some work to
do on the shuffle combining on X86 before we're generating optimal
sequences, but avoiding scalarization is a huge step forward.

llvm-svn: 212714
2014-07-10 12:32:32 +00:00
NAKAMURA Takumi
07e99292d7 llvm/test/CodeGen/X86/shift-parts.ll: FileCheck-ize. (from r212640)
llvm-svn: 212709
2014-07-10 11:37:39 +00:00
NAKAMURA Takumi
ea850a0edc Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine."
This caused miscompilation on, at least, x86-64. SExt(i1 cond) confused other optimizations.

llvm-svn: 212708
2014-07-10 11:37:28 +00:00
Chandler Carruth
b0b7a4016c [x86] Add another combine that is particularly useful for the new vector
shuffle lowering: match shuffle patterns equivalent to an unpcklwd or
unpckhwd instruction.

This allows us to use generic lowering code for v8i16 shuffles and match
the unpack pattern late.

llvm-svn: 212705
2014-07-10 11:09:29 +00:00
Daniel Sanders
a59d10a761 Make it possible for ints/floats to return different values from getBooleanContents()
Summary:
On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer
comparisons return 0 or 1.

Updated the various uses of getBooleanContents. Two simplifications had to be
disabled when float and int boolean contents differ:
- ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially
  discoverable (i.e. when the condition of the VSELECT is a SETCC node).
- visitVSELECT (select C, 0, 1) -> (xor C, 1).
  Come to think of it, this one could test for the common case of 'C'
  being a SETCC too.

Preserved existing behaviour for all other targets and updated the affected
MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low'
variable was counting in the wrong direction because it thought it could simply
add the result of the comparison.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D4389

llvm-svn: 212697
2014-07-10 10:18:12 +00:00
Chandler Carruth
ee588e17c9 [x86] Expand the target DAG combining for PSHUFD nodes to be able to
combine into half-shuffles through unpack instructions that expand the
half to a whole vector without messing with the dword lanes.

This fixes some redundant instructions in splat-like lowerings for
v16i8, which are now getting to be *really* nice.

llvm-svn: 212695
2014-07-10 09:57:36 +00:00
Chandler Carruth
7c6ee1b539 [x86] Tweak the v16i8 single input special case lowering for shuffles
that splat i8s into i16s.

Previously, we would try much too hard to arrange a sequence of i8s in
one half of the input such that we could unpack them into i16s and
shuffle those into place. This isn't always going to be a cheaper i8
shuffle than our other strategies. The case where it is always going to
be cheaper is when we can arrange all the necessary inputs into one half
using just i16 shuffles. It happens that viewing the problem this way
also makes it much easier to produce an efficient set of shuffles to
move the inputs into one half and then unpack them.

With this, our splat code gets one step closer to being not terrible
with the new experimental lowering strategy. It also exposes two
combines missing which I will add next.

llvm-svn: 212692
2014-07-10 09:16:40 +00:00
Chandler Carruth
7132367354 [x86] Initial improvements to the new shuffle lowering for v16i8
shuffles specifically for cases where a small subset of the elements in
the input vector are actually used.

This is specifically targetted at improving the shuffles generated for
trunc operations, but also helps out splat-like operations.

There is still some really low-hanging fruit here that I want to address
but this is a huge step in the right direction.

llvm-svn: 212680
2014-07-10 04:34:06 +00:00
Hao Liu
cb23c5a35e [AArch64]Fix an assertion failure in DAG Combiner about concating 2 build_vector.
llvm-svn: 212677
2014-07-10 03:41:50 +00:00
Matt Arsenault
834bcebaa6 R600/SI: Add support for llvm.convert.{to|from}.fp16
llvm-svn: 212676
2014-07-10 03:22:20 +00:00
David Blaikie
4464e33224 Recommit r212203: Don't try to construct debug LexicalScopes hierarchy for functions that do not have top level debug information.
Reverted by Eric Christopher (Thanks!) in r212203 after Bob Wilson
reported LTO issues. Duncan Exon Smith and Aditya Nandakumar helped
provide a reduced reproduction, though the failure wasn't too hard to
guess, and even easier with the example to confirm.

The assertion that the subprogram metadata associated with an
llvm::Function matches the scope data referenced by the DbgLocs on the
instructions in that function is not valid under LTO. In LTO, a C++
inline function might exist in multiple CUs and the subprogram metadata
nodes will refer to the same llvm::Function. In this case, depending on
the order of the CUs, the first intance of the subprogram metadata may
not be the one referenced by the instructions in that function and the
assertion will fail.

A test case (test/DebugInfo/cross-cu-linkonce-distinct.ll) is added, the
assertion removed and a comment added to explain this situation.

Original commit message:

If a function isn't actually in a CU's subprogram list in the debug info
metadata, ignore all the DebugLocs and don't try to build scopes, track
variables, etc.

While this is possibly a minor optimization, it's also a correctness fix
for an incoming patch that will add assertions to LexicalScopes and the
debug info verifier to ensure that all scope chains lead to debug info
for the current function.

Fix up a few test cases that had broken/incomplete debug info that could
violate this constraint.

Add a test case where this occurs by design (inlining a
debug-info-having function in an attribute nodebug function - we want
this to work because /if/ the nodebug function is then inlined into a
debug-info-having function, it should be fine (and will work fine - we
just stitch the scopes up as usual), but should the inlining not happen
we need to not assert fail either).

llvm-svn: 212649
2014-07-09 21:02:41 +00:00
Matt Arsenault
479a1f90e1 Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine.
Do this if the truncate is free and the select is legal.

llvm-svn: 212640
2014-07-09 19:12:07 +00:00
Jim Grosbach
10098b8e10 AArch64: Better codegen for storing to __fp16.
Storing will generally be immediately preceded by rounding from an f32
or f64, so make sure to match those patterns directly to convert into the
FPR16 register class directly rather than going through the integer GPRs.

This also eliminates an extra step in the convert-from-f64 path
which was first converting to f32 and then to f16 from there.

rdar://17594379

llvm-svn: 212638
2014-07-09 18:55:52 +00:00
Chandler Carruth
646382f7ef [x86] Fix a bug in my new zext-vector-inreg DAG trickery where we were
not widening the input type to the node sufficiently to let the ext take
place in a register.

This would in turn result in a mysterious bitcast assertion failure
downstream. First change here is to add back the helpful assert I had in
an earlier version of the code to catch this immediately.

Next change is to add support to the type legalization to detect when we
have widened the operand either too little or too much (for whatever
reason) and find a size-matched legal vector type to convert it to
first. This can also fail so we get a new fallback path, but that seems
OK.

With this, we no longer crash on vec_cast2.ll when using widening. I've
also added the CHECK lines for the zero-extend cases here. We still need
to support sign-extend and trunc (or something) to get plausible code
for the other two thirds of this test which is one of the regression
tests that showed the most scalarization when widening was
force-enabled. Slowly closing in on widening being a viable legalization
strategy without it resorting to scalarization at every turn. =]

llvm-svn: 212614
2014-07-09 12:36:54 +00:00
Benjamin Kramer
ba4b6af99b X86: When lowering v8i32 himuls use the correct shuffle masks for AVX2.
Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work out
as we have to blend two vectors. While there remove unecessary cross-lane moves
from the shuffles so the backend can lower it to palignr instead of vperm.

Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2.

llvm-svn: 212611
2014-07-09 11:12:39 +00:00
Chandler Carruth
eb00ef26a3 [x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening
vector types to be legal and a ZERO_EXTEND node is encountered.

When we use widening to legalize vector types, extend nodes are a real
challenge. Either the input or output is likely to be legal, but in many
cases not both. As a consequence, we don't really have any way to
represent this situation and the prior code in the widening legalization
framework would just scalarize the extend operation completely.

This patch introduces a new DAG node to represent doing a zero extend of
a vector "in register". The core of the idea is to allow legal but
different vector types in the input and output. The output vector must
have fewer lanes but wider elements. The operation is defined to zero
extend the low elements of the input to the size of the output elements,
and drop all of the high elements which don't have a corresponding lane
in the output vector.

It also includes generic expansion of this node in terms of blending
a zero vector into the high elements of the vector and bitcasting
across. This in turn yields extremely nice code for x86 SSE2 when we use
the new widening legalization logic in conjunction with the new shuffle
lowering logic.

There is still more to do here. We need to support sign extension, any
extension, and potentially int-to-float conversions. My current plan is
to continue using similar synthetic nodes to model each of these
transitions with generic lowering code for each one.

However, with this patch LLVM already reaches performance parity with
GCC for the core C loops of the x264 code (assuming you disable the
hand-written assembly versions) when compiling for SSE2 and SSE3
architectures and enabling the new widening and lowering logic for
vectors.

Differential Revision: http://reviews.llvm.org/D4405

llvm-svn: 212610
2014-07-09 10:58:18 +00:00
Daniel Sanders
80d45ce1e9 [mips][mips64r6] Correct select patterns that have the condition or true/false values backwards
Summary: This bug caused SingleSource/Regression/C/uint64_to_float and SingleSource/UnitTests/2002-05-02-CastTest3 to fail (among others).

Differential Revision: http://reviews.llvm.org/D4388

llvm-svn: 212608
2014-07-09 10:47:26 +00:00
Daniel Sanders
f8ec2a1561 [mips][mips64r6] Correct cond names in the cmp.cond.[ds] instructions
Summary:
It seems we accidentally read the wrong column of the table MIPS64r6 spec
and used the names for c.cond.fmt instead of cmp.cond.fmt.

Differential Revision: http://reviews.llvm.org/D4387

llvm-svn: 212607
2014-07-09 10:40:20 +00:00
Daniel Sanders
847346c646 [mips][mips64r6] Use JALR for indirect branches instead of JR (which is not available on MIPS32r6/MIPS64r6)
Summary:
This completes the change to use JALR instead of JR on MIPS32r6/MIPS64r6.

Reviewers: jkolek, vmedic, zoran.jovanovic, dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4269

llvm-svn: 212605
2014-07-09 10:21:59 +00:00
Daniel Sanders
471e2a9293 [mips][mips64r6] Use JALR for returns instead of JR (which is not available on MIPS32r6/MIPS64r6)
Summary:
RET, and RET_MM have been replaced by a pseudo named PseudoReturn.
In addition a version with a 64-bit GPR named PseudoReturn64 has been
added.

Instruction selection for a return matches RetRA, which is expanded post
register allocation to PseudoReturn/PseudoReturn64. During MipsAsmPrinter,
this PseudoReturn/PseudoReturn64 are emitted as:
- (JALR64 $zero, $rs) on MIPS64r6
- (JALR $zero, $rs) on MIPS32r6
- (JR_MM $rs) on microMIPS
- (JR $rs) otherwise

On MIPS32r6/MIPS64r6, 'jr $rs' is an alias for 'jalr $zero, $rs'. To aid
development and review (specifically, to ensure all cases of jr are
updated), these aliases are temporarily named 'r6.jr' instead of 'jr'.
A follow up patch will change them back to the correct mnemonic.

Added (JALR $zero, $rs) to MipsNaClELFStreamer's definition of an indirect
jump, and removed it from its definition of a call.
Note: I haven't accounted for MIPS64 in MipsNaClELFStreamer since it's
doesn't appear to account for any MIPS64-specifics.

The return instruction created as part of eh_return expansion is now expanded
using expandRetRA() so we use the right return instruction on MIPS32r6/MIPS64r6
('jalr $zero, $rs').

Also, fixed a misuse of isABI_N64() to detect 64-bit wide registers in
expandEhReturn().

Reviewers: jkolek, vmedic, mseaborn, zoran.jovanovic, dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4268

llvm-svn: 212604
2014-07-09 10:16:07 +00:00
Chandler Carruth
0b494c0aa6 [x86] Re-apply a variant of the x86 side of r212324 now that the rest
has settled without incident, removing the x86-specific and overly
strict 'isVectorSplat' routine in favor of generic and more powerful
splat detection.

The primary motivation and result of this is that the x86 backend can
now see through splats which contain undef elements. This is essential
if we are using a widening form of legalization and I've updated a test
case to also run in that mode as before this change the generated code
for the test case was completely scalarized.

This version of the patch much more carefully handles the undef lanes.
- We aren't overly conservative about them in the shift lowering
  (where we will never use the splat itself).
- One place where the splat would have been re-used by the existing code
  now explicitly constructs a new constant splat that will be safe.
- The broadcast lowering is much more reasonable with undefs by doing
  a correct check of whether the splat is the only user of a loaded
  value, checking that the splat actually crosses multiple lanes before
  using a broadcast, and handling broadcasts of non-constant splats.

As a consequence of the last bullet, the weird usage of vpshufd instead
of vbroadcast is gone, and we actually can lower an AVX splat with
vbroadcastss where before we emitted a really strange pattern of
a vector load and a manual splat across the vector.

llvm-svn: 212602
2014-07-09 10:06:58 +00:00
Jim Grosbach
4c963f72c5 AArch64: Better codegen for loading from __fp16.
Loading will generally extend to an f32 or an 64, so make sure
to match those patterns directly to load into the FPR16 register
class directly rather than going through the integer GPRs.

This also eliminates an extra step in the convert-to-f64 path
which was first converting to f32 and then to f64 from there.

rdar://17594379

llvm-svn: 212573
2014-07-08 23:28:48 +00:00
Andrea Di Biagio
ac6b7b5b82 [DAG] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.
This patch teaches how to fold a shuffle according to rule:
  shuffle (shuffle (x, undef, M0), undef, M1) -> shuffle(x, undef, M2)

We do this only if the resulting mask M2 is legal; this is to avoid introducing
illegal shuffles that are potentially expanded into a sub-optimal sequence
of target specific dag nodes.

This patch has the advantage of being target independent, since it works on ISD
nodes. Therefore, all targets (not only x86) can take advantage of this rule.
The idea behind this patch is that most shuffle pairs can be safely combined
before we run the legalizer on vector operations. This allows us to
combine/simplify dag nodes earlier in the process and not only immediately
before instruction selection stage.

That said. This patch is not meant to replace any existing target specific
combine rules; backends might still introduce new shuffles during legalization
stage. Also, this rule is very simple and avoids to aggressively optimize
shuffles.

llvm-svn: 212539
2014-07-08 15:22:29 +00:00
Vladimir Medic
f3dd261ec8 Mips.abiflags is a new implicitly generated section that will be present on all new modules. The section contains a versioned data structure which represents essentially information to allow a program loader to determine the requirements of the application. This patch implements mips.abiflags section and provides test cases for it.
llvm-svn: 212519
2014-07-08 08:59:22 +00:00
Chandler Carruth
be294c3274 [x86,SDAG] Sink the logic for folding shuffles of splats more
aggressively from the x86 shuffle lowering to the generic SDAG vector
shuffle formation code.

This code already tried to fold away shuffles of splats! It just had
lots of bugs and couldn't handle the case my new x86 shuffle lowering
needed.

First, it failed to correctly compute whether N2 was undef because it
pre-computed this, then did transformations which could *make* N2 undef,
then failed to ever re-consider the precomputed state.

Second, it didn't look through bitcasts at all, even in the safe cases
where they are just element-type bitcasts with no change to the number
of elements.

Third, it didn't handle all-zero bit casts nicely the way my code in the
x86 side of things did, which is essential to getting good zext-shuffle
lowerings.

But all of these are generic. I just ported the code down to this layer
and fixed the surrounding bugs. Tests exercising this in the x86 backend
still pass and some silly code in widen_cast-6.ll gets better. I updated
that test to be a bit more precise but it's still pretty unclear what
the value of the test is in this day and age.

llvm-svn: 212517
2014-07-08 08:45:38 +00:00
Andrea Di Biagio
e6975da1cd [x86] Fix assertion failure caused by a wrong combine of PSHUFD nodes with different types.
When combining a sequence of two PSHUFD dag nodes into a single PSHUFD,
make sure that we assign the correct type to the resulting PSHUFD.
X86ISD::PSHUFD dag nodes can be either MVT::v4i32 or MVT::v4f32.

Before this change, an assertion failure was triggered in method
'DAGCombinerInfo::CombineTo' when trying to combine the shuffles from the test
below into a single PSHUFD.

define <4 x float> @test1(<4 x float> %V) {
  %1 = shufflevector <4 x float> %V, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
  %2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
  ret <4 x float> %2
}

llvm-svn: 212498
2014-07-07 23:25:23 +00:00
Juergen Ributzka
19aa45cd04 [FastISel][X86] Fix smul.with.overflow.i8 lowering.
Add custom lowering code for signed multiply instruction selection, because the
default FastISel instruction selection for ISD::MUL will use unsigned multiply
for the i8 type and signed multiply for all other types. This would set the
incorrect flags for the overflow check.

This fixes <rdar://problem/17549300>

llvm-svn: 212493
2014-07-07 21:52:21 +00:00
Louis Gerbarg
30a2dcd984 Allow AArch64FastISel to degrade graceully in the presence of an MVT::i128
Currently AArch64FastISel crashes if it tries to extend an integer into an
MVT::i128. This can happen by creating 128 bit integers like so:

  typedef unsigned int uint128_t __attribute__((mode(TI)));
  typedef int sint128_t __attribute__((mode(TI)));

This patch makes EmitIntExt check for their presence and then falls back to
SelectionDAG.

Tests included.

rdar://17516686

llvm-svn: 212492
2014-07-07 21:37:51 +00:00
Ulrich Weigand
d3a8fd6703 [PowerPC] Fix testcase regression
Use -mcpu to avoid different codegen depending on host platform.

llvm-svn: 212478
2014-07-07 19:41:54 +00:00
Ulrich Weigand
8f2aff0b0c [PowerPC] Fix "byval align" arguments
Arguments passed as "byval align" should get the specified alignment
in the parameter save area.  There was some code in PPCISelLowering.cpp
that attempted to implement this, but this didn't work correctly:
while code did update the ArgOffset value, it neglected to update
the PtrOff value (which was already computed from the old ArgOffset),
and it also neglected to update GPR_idx -- fields skipped due to
alignment in the save area must likewise be skipped in GPRs.

This patch fixes and simplifies this logic by:
- handling argument offset alignment right at the beginning
  of argument processing, using a new helper routine
  CalculateStackSlotAlignment (this avoids having to update
  PtrOff and other derived values later on)
- not tracking GPR_idx separately, but always computing the
  correct GPR_idx for each argument *from* its ArgOffset
- removing some redundant computation in LowerFormalArguments:
  MinReservedArea must equal ArgOffset after argument processing,
  so there's no use in computing it twice.

[This doesn't change the behavior of the current clang front-end,
since that never creates "byval align" arguments at the moment.
This will change with a follow-on patch, however.]

llvm-svn: 212476
2014-07-07 19:26:41 +00:00
Chandler Carruth
9fd1035842 [x86] Revert r212324 which was too aggressive w.r.t. allowing undef
lanes in vector splats.

The core problem here is that undef lanes can't *unilaterally* be
considered to contribute to splats. Their handling needs to be more
cautious. There is also a reported failure of the nightly testers
(thanks Tobias!) that may well stem from the same core issue. I'm going
to fix this theoretical issue, factor the APIs a bit better, and then
verify that I don't see anything bad with Tobias's reduction from the
test suite before recommitting.

Original commit message for r212324:
  [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
  any constant, constant FP, or undef splat and to tolerate any undef
  lanes in a splat, then replace all uses of isSplatVector in X86's
  lowering with it.

  This fixes issues where undef lanes in an otherwise splat vector would
  prevent the splat logic from firing. It is a touch more awkward to use
  this interface, but it is much more accurate. Suggestions for better
  interface structuring welcome.

  With this fix, the code generated with the widening legalization
  strategy for widen_cast-4.ll is *dramatically* improved as the special
  lowering strategies for a v16i8 SRA kick in even though the high lanes
  are undef.

  We also get a slightly different choice for broadcasting an aligned
  memory location, and use vpshufd instead of vbroadcastss. This looks
  like a minor win for pipelining and domain crossing, but a minor loss
  for the number of micro-ops. I suspect its a wash, but folks can
  easily tweak the lowering if they want.

llvm-svn: 212475
2014-07-07 19:03:32 +00:00
Matt Arsenault
f39a57f581 R600: Fix mishandling of load / store chains.
Fixes various bugs with reordering loads and stores.
Scalarized vector loads weren't collecting the chains
at all.

llvm-svn: 212473
2014-07-07 18:34:45 +00:00
Chandler Carruth
bf23b76679 [x86] Teach the new vector shuffle lowering code to handle what is
essentially a DAG combine that never gets a chance to run.

We might typically expect DAG combining to remove shuffles-of-splats and
other similar patterns, but we don't get a chance to run the DAG
combiner when we recursively form sub-shuffles during the lowering of
a shuffle. So instead hand-roll a really important combine directly into
the lowering code to detect shuffles-of-splats, especially shuffles of
an all-zero splat which needn't even have the same element width, etc.

This lets the new vector shuffle lowering handle shuffles which
implement things like zero-extension really nicely. This will become
even more important when I wire the legalization of zero-extension to
vector shuffles with the new widening legalization strategy.

llvm-svn: 212444
2014-07-07 09:06:58 +00:00
Tim Northover
96e2bce418 CodeGen: it turns out that NAND is not the same thing as BIC. At all.
We've been performing the wrong operation on ARM for "atomicrmw nand" for
years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b".
This bled over into the generic expansion pass.

So I assume no-one has ever actually tried to do an atomic nand in the real
world. Oh well.

llvm-svn: 212443
2014-07-07 09:06:35 +00:00
Saleem Abdulrasool
b5f61cbe40 ARM: properly lower dllimport'ed global values
This completes the handling for DLL import storage symbols when lowering
instructions.  A DLL import storage symbol must have an additional load
performed prior to use.  This is applicable to variables and functions.

This is particularly important for non-function symbols as it is possible to
handle function references by emitting a thunk which performs the translation
from the unprefixed __imp_ symbol to the proper symbol (although, this is a
non-optimal lowering).  For a variable symbol, no such thunk can be
accommodated.

llvm-svn: 212431
2014-07-07 05:18:35 +00:00
Kevin Qin
c72c5a1639 [AArch64] Normalize all constants to build a vector.
The value of constant operands will be truncated to fit element width.

llvm-svn: 212428
2014-07-07 02:45:40 +00:00
Daniel Sanders
dc654abb9f [mips] Add tests for the 'ret', 'call', and 'indirectbr' LLVM IR instruction.
Summary:
The tests in this directory are intended to test a single IR instruction
with as few dependencies on other instructions as possible. The aim is to
be very confident that each LLVM-IR instruction is implemented correctly and
with the optimal sequence of instructions, as well as to make it easy to tell
what is tested, and make it easier to bring up new ISA revisions in the
future. This gives us a good foundation on which to test bigger things.

These particular tests will allow testing that MIPS32r6/MIPS64r6 generate
the correct return instruction for returns, calls, and indirect branches.
This will be a bit tricky since the assembly text is identical but the
instruction is actually different. On MIPS32r6/MIPS64r6 'jr $rs' has been
removed in favour of the equivalent 'jalr $zero, $rs'. 'jr $rs' remains as
an alias for 'jalr $zero, $rs'.

Differential Revision: http://reviews.llvm.org/D4266

llvm-svn: 212345
2014-07-04 15:16:14 +00:00
NAKAMURA Takumi
5a124748fa llvm/test/CodeGen/XCore/dwarf_debug.ll: Fix not to be affected by *-win32.
llvm-svn: 212335
2014-07-04 11:58:03 +00:00
NAKAMURA Takumi
a2b6dfbbca llvm/test/CodeGen/X86/vector-gep.ll: Appease to add -mtriple=i686-linux.
This doesn't pass if stack alignment is not 16, like cygming, *bsd.

llvm-svn: 212334
2014-07-04 11:55:40 +00:00
Tim Northover
2112de25ee llvm-readobj: fix MachO relocatoin printing a bit.
There were two issues here:
1. At the very least, scattered relocations cannot use the same code to
   determine the corresponding symbol being referred to. For some reason we
   pretend there is no symbol, even when one actually exists in the symtab, so to
   match this behaviour getRelocationSymbol should simply return symbols_end for
   scattered relocations.
2. Printing "-" when we can't get a symbol (including the scattered case, but
   not exclusively), isn't that helpful. In both cases there *is* interesting
   information in that field, so we should print it. As hex will do.

Small part of rdar://problem/17553104

llvm-svn: 212332
2014-07-04 10:57:56 +00:00
Chandler Carruth
3bdff47d29 [x86] Relax the line in this check to pacify build bots.
I still don't love testing the comments, but its the only sane way to
check shuffle instructions...

llvm-svn: 212326
2014-07-04 08:39:30 +00:00
Chandler Carruth
806e0340eb [x86] Move some check lines to be slightly easier for me to find.
(meant to put this cleanup in the previous patch, sorry)

llvm-svn: 212325
2014-07-04 08:19:37 +00:00
Chandler Carruth
ab7167839a [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
any constant, constant FP, or undef splat and to tolerate any undef
lanes in a splat, then replace all uses of isSplatVector in X86's
lowering with it.

This fixes issues where undef lanes in an otherwise splat vector would
prevent the splat logic from firing. It is a touch more awkward to use
this interface, but it is much more accurate. Suggestions for better
interface structuring welcome.

With this fix, the code generated with the widening legalization
strategy for widen_cast-4.ll is *dramatically* improved as the special
lowering strategies for a v16i8 SRA kick in even though the high lanes
are undef.

We also get a slightly different choice for broadcasting an aligned
memory location, and use vpshufd instead of vbroadcastss. This looks
like a minor win for pipelining and domain crossing, but a minor loss
for the number of micro-ops. I suspect its a wash, but folks can easily
tweak the lowering if they want.

llvm-svn: 212324
2014-07-04 08:11:49 +00:00
Robert Lytton
463595f38e XCore target: remove incorrect DebugLoc entries from prologue
Summary: This was causing the prologue_end to be incorrectly positioned.

Differential Revision: http://reviews.llvm.org/D4122

llvm-svn: 212318
2014-07-04 06:38:22 +00:00
Eric Christopher
33b8321c4c Temporarily revert "Don't try to construct debug LexicalScopes hierarchy for functions that do not have top level debug information." as it appears to be breaking some LTO constructs.
This reverts commit r212203.

llvm-svn: 212298
2014-07-03 22:24:54 +00:00
Andrea Di Biagio
fc8cb32b3f [X86] Add ISel patterns to select 'f32_to_f16' and 'f16_to_f32' dag nodes.
This patch adds tablegen patterns to select F16C float-to-half-float
conversion instructions from 'f32_to_f16' and 'f16_to_f32' dag nodes.

If the target doesn't have F16C, then 'f32_to_f16' and 'f16_to_f32'
are expanded into library calls.

llvm-svn: 212293
2014-07-03 21:51:06 +00:00
Yi Kong
b99d414631 [ARM] Implement ISB memory barrier intrinsic
Adds support for __builtin_arm_isb. Also corrects DSB and ISB instructions
modelling by adding has-side-effects property.

llvm-svn: 212276
2014-07-03 16:00:41 +00:00
Sanjay Patel
bbb5460c77 bug fix for PR20020: anti-dependency-breaker causes miscompilation
This patch sets the 'KeepReg' bit for any tied and live registers during the PrescanInstruction() phase of the dependency breaking algorithm. It then checks those 'KeepReg' bits during the ScanInstruction() phase to avoid changing any tied registers. For more details, please see comments in:
http://llvm.org/bugs/show_bug.cgi?id=20020

I added two FIXME comments for code that I think can be removed by using register iterators that include self. I don't want to include those code changes with this patch, however, to keep things as small as possible.

The test case is larger than I'd like, but I don't know how to reduce it further and still produce the failing asm.

Differential Revision: http://reviews.llvm.org/D4351

llvm-svn: 212275
2014-07-03 15:19:40 +00:00
Ulrich Weigand
7f6ccb0182 Fix ppcf128 component access on little-endian systems
The PowerPC 128-bit long double data type (ppcf128 in LLVM) is in fact a
pair of two doubles, where one is considered the "high" or
more-significant part, and the other is considered the "low" or
less-significant part.  When a ppcf128 value is stored in memory or a
register pair, the high part always comes first, i.e. at the lower
memory address or in the lower-numbered register, and the low part
always comes second.  This is true both on big-endian and little-endian
PowerPC systems.  (Similar to how with a complex number, the real part
always comes first and the imaginary part second, no matter the byte
order of the system.)

This was implemented incorrectly for little-endian systems in LLVM.
This commit fixes three related issues:

- When printing an immediate ppcf128 constant to assembler output
  in emitGlobalConstantFP, emit the high part first on both big-
  and little-endian systems.

- When lowering a ppcf128 type to a pair of f64 types in SelectionDAG
  (which is used e.g. when generating code to load an argument into a
  register pair), use correct low/high part ordering on little-endian
  systems.

- In a related issue, because lowering ppcf128 into a pair of f64 must
  operate differently from lowering an int128 into a pair of i64,
  bitcasts between ppcf128 and int128 must not be optimized away by the
  DAG combiner on little-endian systems, but must effect a word-swap.

Reviewed by Hal Finkel.

llvm-svn: 212274
2014-07-03 15:06:47 +00:00
NAKAMURA Takumi
890d548a80 Let llvm/test/CodeGen/X86/lower-bitcast.ll tolerant of win32 calling convention.
llvm-svn: 212258
2014-07-03 07:25:00 +00:00
Chandler Carruth
d8bdeffa9e [x86] Fix the completely broken vector widening legalization of bswap.
This operation was classified as a binary operation in the widening
logic for some reason (clearly, untested). It is in fact a unary
operation. Add a RUN line to a test to exercise this for x86.

Note that again the vector widening strategy doesn't regress anything
and in one case removes a totally unecessary instruction that we
couldn't avoid when promoting the element type.

llvm-svn: 212257
2014-07-03 07:04:38 +00:00
Chandler Carruth
0dba9eafa3 [x86] Fix crashes in lowering bitcast instructions with the widening
mode.

This also runs the test in that mode which would reproduce the crash.
What I love is that *every single FIXME* in the test is addressed by
switching to widening.

llvm-svn: 212254
2014-07-03 03:43:47 +00:00
Chandler Carruth
636964a893 [aarch64] Add a test that should have been in r212242 but I forgot to
add it. Sorry about that.

llvm-svn: 212251
2014-07-03 02:12:26 +00:00
Chandler Carruth
fc0fe5064b [codegen,aarch64] Add a target hook to the code generator to control
vector type legalization strategies in a more fine grained manner, and
change the legalization of several v1iN types and v1f32 to be widening
rather than scalarization on AArch64.

This fixes an assertion failure caused by scalarizing nodes like "v1i32
trunc v1i64". As v1i64 is legal it will fail to scalarize v1i32.

This also provides a foundation for other targets to have more granular
control over how vector types are legalized.

Patch by Hao Liu, reviewed by Tim Northover. I'm committing it to allow
some work to start taking place on top of this patch as it adds some
really important hooks to the backend that I'd like to immediately start
using. =]

http://reviews.llvm.org/D4322

llvm-svn: 212242
2014-07-03 00:23:43 +00:00
Adam Nemet
e66defe4d5 [X86] AVX512: Allow writemask argument in vpermt* intrinsics
llvm-svn: 212223
2014-07-02 21:26:01 +00:00
Adam Nemet
ef83b36688 [X86] AVX512: Add writemask variants for vperm*2*
This includes assembler and codegen support (see the new tests in
avx512-encodings.s and avx512-shuffle.ll).

<rdar://problem/17492620>

llvm-svn: 212221
2014-07-02 21:25:54 +00:00
Tom Stellard
209c137768 R600: Promote i64 loads to v2i32
llvm-svn: 212216
2014-07-02 20:53:54 +00:00
David Blaikie
f63c5eb709 Don't try to construct debug LexicalScopes hierarchy for functions that do not have top level debug information.
If a function isn't actually in a CU's subprogram list in the debug info
metadata, ignore all the DebugLocs and don't try to build scopes, track
variables, etc.

While this is possibly a minor optimization, it's also a correctness fix
for an incoming patch that will add assertions to LexicalScopes and the
debug info verifier to ensure that all scope chains lead to debug info
for the current function.

Fix up a few test cases that had broken/incomplete debug info that could
violate this constraint.

Add a test case where this occurs by design (inlining a
debug-info-having function in an attribute nodebug function - we want
this to work because /if/ the nodebug function is then inlined into a
debug-info-having function, it should be fine (and will work fine - we
just stitch the scopes up as usual), but should the inlining not happen
we need to not assert fail either).

llvm-svn: 212203
2014-07-02 18:31:35 +00:00
Duncan P. N. Exon Smith
3ca30c6b7f AArch64: Re-enable AArch64AddressTypePromotion
This reverts commits r212189 and r212190.

While this pass was accidentally disabled (until r212073), r205437
slipped in a use of `auto` that should have been `auto&`.

This fixes PR20188.

llvm-svn: 212201
2014-07-02 18:17:40 +00:00
Duncan P. N. Exon Smith
971c17962b XFAIL the test to go with r202189
llvm-svn: 212190
2014-07-02 17:07:03 +00:00
Chad Rosier
f00a3b6176 Revert "Revert "MachineScheduler: better book-keeping for asserts.""
This reverts commit r212109, which reverted r212088.

However, disable the assert as it's not necessary for correctness.  There are
several corner cases that the assert needed to handle better for in-order
scheduling, but none of them are incorrect scheduler behavior. The assert is
mainly there to collect good unit tests like this and ensure that the
target-independent scheduler is working as expected with the various machine
models.

llvm-svn: 212187
2014-07-02 16:46:08 +00:00
Benjamin Kramer
972b4f1dd9 X86: When combining shuffles just remove shuffles that are completely redundant.
CombineTo doesn't allow replacing a node with itself so this would crash if the
combined shuffle is the same as the input shuffle.

llvm-svn: 212181
2014-07-02 15:09:44 +00:00
Elena Demikhovsky
2a53c3fac5 AVX-512: dec/inc instructions are slow on KNL
After Alexey Volkov, I'm adding the same property for KNL, that prefers ADD/SUB instead of INC/DEC.
Added a test.

llvm-svn: 212178
2014-07-02 14:11:05 +00:00
Tim Northover
f7aeb57c87 X86: delegate expanding atomic libcalls to generic code.
On targets without cmpxchg16b or cmpxchg8b, the borderline atomic
operations were slipping through the gaps.

X86AtomicExpand.cpp was delegating to ISelLowering. Generic
ISelLowering was delegating to X86ISelLowering and X86ISelLowering was
asserting. The correct behaviour is to expand to a libcall, preferably
in generic ISelLowering.

This can be achieved by X86ISelLowering deciding it doesn't want the
faff after all.

llvm-svn: 212134
2014-07-01 21:44:59 +00:00
Tim Northover
60e9ada729 X86: expand atomics in IR instead of as MachineInstrs.
The logic for expanding atomics that aren't natively supported in
terms of cmpxchg loops is much simpler to express at the IR level. It
also allows the normal optimisations and CodeGen improvements to help
out with atomics, instead of using a limited set of possible
instructions..

rdar://problem/13496295

llvm-svn: 212119
2014-07-01 18:53:31 +00:00
Chad Rosier
92846e46b5 Revert "MachineScheduler: better book-keeping for asserts."
This reverts commit r212088, which is causing a number of spec
failures.  Will provide reduced test cases shortly.
PR20057

llvm-svn: 212109
2014-07-01 17:23:11 +00:00
Andrew Trick
449b2dbbfb MachineScheduler: better book-keeping for asserts.
Fixes another test case under PR20057.

llvm-svn: 212088
2014-07-01 03:23:13 +00:00
Duncan P. N. Exon Smith
8dd9ef0369 AArch64: Actually do address type promotion
AArch64AddressTypePromotion was doing nothing because it was using the
old semantics of `Use` and `uses()`, when it really wanted to get at the
`users()`.

llvm-svn: 212073
2014-06-30 23:42:14 +00:00
Adrian Prantl
2e31f00493 Debug info: split out complex DIVariable address expressions into a
separate MDNode so they can be uniqued via folding set magic. To conserve
space, DIVariable nodes are still variable-length, with the last two
fields being optional.

No functional change.
http://reviews.llvm.org/D3526

llvm-svn: 212050
2014-06-30 17:17:35 +00:00
Andrea Di Biagio
2bc066b1b4 [X86] Add support for builtin to read performance monitoring counters.
This patch adds support for a new builtin instruction called
__builtin_ia32_rdpmc.

Builtin '__builtin_ia32_rdpmc' is defined as a 'GCC builtin'; on X86, it can
be used to read performance monitoring counters. It takes as input the index
of the performance counter to read, and returns the value of the specified
performance counter as a 64-bit number.

Calls to this new builtin will map to instruction RDPMC.
The index in input to the builtin call is moved to register %ECX. The result
of the builtin call is the value of the specified performance counter (RDPMC
would return that quantity in registers RDX:RAX).

This patch:
 - Adds builtin int_x86_rdpmc as a GCCBuiltin;
 - Adds a new x86 DAG node called 'RDPMC_DAG';
 - Teaches how to lower this new builtin;
 - Adds an ISel pattern to select instruction RDPMC;
 - Fixes the definition of instruction RDPMC adding %RAX and %RDX as
   implicit definitions, and adding %ECX as implicit use;
 - Adds a LLVM test to verify that the new builtin is correctly selected.

llvm-svn: 212049
2014-06-30 17:14:21 +00:00
Chad Rosier
e739f85e32 [AArch64] Unsized types don't specify an alignment.
PR20109

llvm-svn: 212045
2014-06-30 15:03:00 +00:00
Chad Rosier
2954f936f3 [AArch64] Convert mul x, -(pow2 +/- 1) to shift + add/sub.
The combine for mul x, pow2 +/- 1 is unchanged. Test cases for
both combines as well as mul x, pow2 have been added as well.

llvm-svn: 212044
2014-06-30 14:51:14 +00:00
Chandler Carruth
747f8eca83 [x86] Fix a bug in the v8i16 shuffling exposed by the new splat-like
lowering for v16i8.

ASan and some bots caught this bug with existing test cases. Fixing it
even fixed a miscompile with one of the test cases. I'm still a bit
suspicious of this test case as I've not taken a proper amount of time
to think about it, but the fix here is strict goodness.

llvm-svn: 211976
2014-06-28 05:46:28 +00:00
Chandler Carruth
d3455a84b3 [x86] Add handling for splat-like widenings of v16i8 shuffles.
These show up really frequently, not the least with actual splats. =] We
lowered these quite badly before. The new code path tries to widen i8
shuffles to i16 shuffles in a splat-like way. There are still some
inefficiencies in our i16 splat logic though, so we aren't really done
here.

Also, for certain patterns (bit of a gather-and-splat) we still
generate pretty silly code, and I've left a fixme for addressing it.
However, I'm not actually worried about this code pattern as much. The
old shuffle lowering generates a 29 instruction monstrosity for it that
should execute much more slowly.

llvm-svn: 211974
2014-06-28 05:16:40 +00:00
David Majnemer
28c9316247 This file wasn't supposed to be checked in
This was generated while trying to debug a test, it shouldn't have been
checked in.

Thanks to Alexander Kornienko for spotting this.

llvm-svn: 211973
2014-06-28 01:56:50 +00:00
Matt Arsenault
000617e7b3 Revert "Temporary hack to try cleaning extra .s file from bots."
llvm-svn: 211967
2014-06-27 23:11:26 +00:00
Matt Arsenault
20ec742a75 Temporary hack to try cleaning extra .s file from bots.
llvm-svn: 211963
2014-06-27 21:43:50 +00:00
Chad Rosier
98a58b0e56 [AArch64] Fix memset ICE when memset value is f128.
llvm-svn: 211960
2014-06-27 21:05:09 +00:00
Chandler Carruth
c99cbd9cc7 [x86] Fix another bug hit when bootstrapping with the new shuffle
lowering.

For maximum irony, I had already discovered this bug, diagnosed it, and
left FIXMEs about it in the test cases. =[ I just failed to go back over
those until after i had reduced a bootstrap miscompile down to a single
TU, stared at the assembly for an hour, and figured out the bug. Again.

Oh well.

llvm-svn: 211955
2014-06-27 20:07:40 +00:00
Justin Holewinski
2a38f449ff [NVPTX] Add reflect intrinsic (better than matching by function name)
Also clean up some of the logic in NVVMReflect.cpp while we're messing around in there.

llvm-svn: 211948
2014-06-27 18:36:11 +00:00
Justin Holewinski
3774707bba [NVPTX] Add 'b' asm constraint
llvm-svn: 211946
2014-06-27 18:36:06 +00:00
Justin Holewinski
caff2b4efe [NVPTX] Error out if initializer is given for variable in an address space that does not support initialization
llvm-svn: 211943
2014-06-27 18:36:01 +00:00
Justin Holewinski
004cf81d64 [NVPTX] Add support for .managed variables for UVM
llvm-svn: 211942
2014-06-27 18:35:58 +00:00
Justin Holewinski
6888ca5201 [NVPTX] Emit .weak linkage for link_once, weak, available_externally, and common linkage
llvm-svn: 211941
2014-06-27 18:35:56 +00:00
Justin Holewinski
0ed0e4b150 [NVPTX] Fix handling of ldg/ldu intrinsics.
The address space of the pointer must be global (1) for these intrinsics.  There must also be alignment metadata attached to the intrinsic calls, e.g.

%val = tail call i32 @llvm.nvvm.ldu.i.global.i32.p1i32(i32 addrspace(1)* %ptr), !align !0

!0 = metadata !{i32 4}

llvm-svn: 211939
2014-06-27 18:35:51 +00:00
Justin Holewinski
6fc5dab1cf [NVPTX] Clean up argument lowering code and properly handle alignment for structs and vectors
llvm-svn: 211938
2014-06-27 18:35:44 +00:00
Justin Holewinski
97b1e28f31 [NVPTX] Add support for [SHL,SRA,SRL]_PARTS
llvm-svn: 211936
2014-06-27 18:35:40 +00:00
Justin Holewinski
7663d1fd5a [NVPTX] Implement fma and imad contraction as target DAGCombiner patterns
This also introduces DAGCombiner patterns for mul.wide to multiply two smaller integers and produce a larger integer

llvm-svn: 211935
2014-06-27 18:35:37 +00:00
Justin Holewinski
1c8d752df2 [NVPTX] Add support for efficient rotate instructions on SM 3.2+
llvm-svn: 211934
2014-06-27 18:35:33 +00:00
Justin Holewinski
b7ecf6b4d6 [NVPTX] Add missing isel patterns for 64-bit atomics
llvm-svn: 211933
2014-06-27 18:35:30 +00:00
Justin Holewinski
2ffa2d24b0 [NVPTX] Add isel patterns for bit-field extract (bfe)
llvm-svn: 211932
2014-06-27 18:35:27 +00:00
Justin Holewinski
7c9cd5f566 [NVPTX] Add support for isspacep instruction
llvm-svn: 211931
2014-06-27 18:35:24 +00:00
Justin Holewinski
ab34c507cf [NVPTX] Add support for envreg reads
llvm-svn: 211930
2014-06-27 18:35:21 +00:00
Justin Holewinski
a3b8653f47 [NVPTX] Emit .weak when linkage is not external, internal, or private
llvm-svn: 211926
2014-06-27 18:35:10 +00:00
Chandler Carruth
ec4fb8a59b [x86] Fix a miscompile in the new shuffle lowering uncovered by
a bootstrap.

I managed to mis-remember how PACKUS worked on x86, and was using undef
for the high bytes instead of zero. The fix is fairly obvious.

llvm-svn: 211922
2014-06-27 18:25:23 +00:00
David Majnemer
abf7854d05 IR: Add COMDATs to the IR
This new IR facility allows us to represent the object-file semantic of
a COMDAT group.

COMDATs allow us to tie together sections and make the inclusion of one
dependent on another. This is required to implement features like MS
ABI VFTables and optimizing away certain kinds of initialization in C++.

This functionality is only representable in COFF and ELF, Mach-O has no
similar mechanism.

Differential Revision: http://reviews.llvm.org/D4178

llvm-svn: 211920
2014-06-27 18:19:56 +00:00
David Blaikie
06bae5c2fe Fix test so it doesn't try to write out temporary files into the test tree.
llvm-svn: 211916
2014-06-27 17:45:43 +00:00
Matt Arsenault
128df7aaf1 R600: Don't crash on unhandled instruction in promote alloca
llvm-svn: 211906
2014-06-27 16:52:49 +00:00
Ulrich Weigand
85c764c15a [PowerPC] Constrain base register in PPCRegisterInfo::resolveFrameIndex
I've run into a bug where current LLVM at -O0 (with fast-isel)
generated invalid code like:

        ld 0, 20936(1)                  # 8-byte Folded Reload
        stw 12, 10348(0)
        stw 12, 10344(0)

The underlying vreg had been introduced as base register by the
Local Stack Slot Allocation pass.  That register was constrained
to G8RC by PPCRegisterInfo::materializeFrameBaseRegister to match
the ADDI instruction used to set it, but it was *not* constrained
to G8RC_NOX0 to fit the *use* of the register in an address.

That should have happened in PPCRegisterInfo::resolveFrameIndex.
This patch adds an appropriate constrainRegClass call.

Reviewed by Hal Finkel.

llvm-svn: 211897
2014-06-27 13:04:12 +00:00
Chandler Carruth
e7aaf337d0 [x86] Teach the target combine step to aggressively fold pshufd insturcions.
Summary:
This allows it to fold pshufd instructions across intervening
half-shuffles and other noise. This pattern actually shows up in the
generic lowering tests, but I've also added direct tests using
intrinsics to make sure that the specific desired functionality is
working even if the lowering stuff changes in the future.

Differential Revision: http://reviews.llvm.org/D4292

llvm-svn: 211892
2014-06-27 11:40:13 +00:00
Chandler Carruth
127f1b532c [x86] Teach the target-specific combining how to aggressively fold
half-shuffles, even looking through intervening instructions in a chain.

Summary:
This doesn't happen to show up with any test cases I've found for the current
shuffle lowering, but previous attempts would benefit from this and it seems
generally useful. I've tested it directly using intrinsics, which also shows
that it will work with hand vectorized code as well.

Note that even though pshufd isn't directly used in these tests, it gets
exercised because we combine some of the half shuffles into a pshufd
first, and then merge them.

Differential Revision: http://reviews.llvm.org/D4291

llvm-svn: 211890
2014-06-27 11:34:40 +00:00
Chandler Carruth
c62b16ce89 [x86] Teach the X86 backend to DAG-combine SSE2 shuffles that are
trivially redundant.

This fixes several cases in the new vector shuffle lowering algorithm
which would generate redundant shuffle instructions for the sake of
simplicity.

I'm also deleting a testcase which was somewhat ridiculous. It was
checking for a bug in 2007 about incorrectly transforming shuffles by
looking for the string "-86" in the output of a pretty substantial
function. This test case doesn't seem to have any value at this point.

Differential Revision: http://reviews.llvm.org/D4240

llvm-svn: 211889
2014-06-27 11:27:52 +00:00
Chandler Carruth
e28f70fce3 [x86] Begin a significant overhaul of how vector lowering is done in the
x86 backend.

This sketches out a new code path for vector lowering, hidden behind an
off-by-default flag while it is under development. The fundamental idea
behind the new code path is to aggressively break down the problem space
in ways that ease selecting the odd set of instructions available on
x86, and carefully avoid scalarizing code even when forced to use older
ISAs. Notably, this starts off restricting itself to SSE2 and implements
the complete vector shuffle and blend space for 128-bit vectors in SSE2
without scalarizing. The plan is to layer on top of this ISA extensions
where we can bail out of the complex SSE2 lowering and opt for
a cheaper, specialized instruction (or set of instructions). It also
needs to be generalized to AVX and AVX512 vector widths.

Currently, this does a decent but not perfect job for SSE2. There are
some specific shortcomings that I plan to address:
- We need a peephole combine to fold together shuffles where possible.
  There are cases where a previous shuffle could be modified slightly to
  arrange for elements to be in the correct position and a later shuffle
  eliminated. Doing this eagerly added quite a bit of complexity, and
  so my plan is to combine away these redundancies afterward.
- There are a lot more clever ways to use unpck and pack that need to be
  added. This is essential for real world shuffles as it turns out...

Once SSE2 is polished a bit I should be able to get interesting numbers
on performance improvements on benchmarks conducive to vectorization.
All of this will be off by default until it is functionally equivalent
of course.

Differential Revision: http://reviews.llvm.org/D4225

llvm-svn: 211888
2014-06-27 11:23:44 +00:00
Andrew Trick
a8cb3163c0 MachineScheduler: add some book-keeping to fix an assert.
Fixe for Bug 20057 - Assertion failied in llvm::SUnit* llvm::SchedBoundary::pickOnlyChoice(): Assertion `i <= (HazardRec->getMaxLookAhead() + MaxObservedStall) && "permanent hazard"'

Thanks to Chad for the test case.

llvm-svn: 211865
2014-06-27 04:57:05 +00:00
Matt Arsenault
19ebc28a85 R600: Add some testcases for promote alloca pass.
More complicated GEPs are skipped. Add some tests to
actually stress this skipping.

llvm-svn: 211859
2014-06-27 03:55:55 +00:00
Juergen Ributzka
4e8c3d809f [StackMaps] Enable patchpoint liveness analysis per default.
llvm-svn: 211817
2014-06-26 23:39:52 +00:00
Juergen Ributzka
f24160ef4c [Stackmaps] Remove the liveness calculation for stackmap intrinsics.
There is no need to calculate the liveness information for stackmaps. The
liveness information is still available for the patchpoint intrinsic and
that is also the intended usage model.

Related to <rdar://problem/17473725>

llvm-svn: 211816
2014-06-26 23:39:44 +00:00
Matt Arsenault
c2d6bb970a R600/SI: Add FP mode bits to binary.
The default rounding mode to initialize the mode register needs
to be reported to the runtime. Fill in other bits a kernel
may be interested in setting for future use.

llvm-svn: 211791
2014-06-26 17:22:30 +00:00
Andrea Di Biagio
3be8532e69 [X86] Improve the selection of SSE3/AVX addsub instructions.
This patch teaches the backend how to canonicalize a shuffle vectors
according to the rule:

 - (shuffle (FADD A, B), (FSUB A, B), Mask) ->
       (shuffle (FSUB A, -B), (FADD A, -B), Mask)

Where 'Mask' is:
  <0,5,2,7>            ;; for v4f32 and v4f64 shuffles.
  <0,3>                ;; for v2f64 shuffles.
  <0,9,2,11,4,13,6,15> ;; for v8f32 shuffles.

In general, ISel only knows how to pattern-match a canonical
'fadd + fsub + blendi' dag node sequence into an ADDSUB instruction.

This new rule allows to convert a non-canonical dag sequence into a
canonical one that will be matched by a single ADDSUB at ISel stage.

The idea of converting a non-canonical ADDSUB into a canonical one by
swapping the first two operands of the shuffle, and then negating the
second operand of the FADD and FSUB, was originally proposed by Hal Finkel.

llvm-svn: 211771
2014-06-26 10:45:21 +00:00
Matt Arsenault
435a7c1256 R600: Fix vector FMA
llvm-svn: 211757
2014-06-26 01:28:05 +00:00
Juergen Ributzka
87fbbdf13d [FastISel][X86] Only fold the cmp into the select when both instructions are in the same basic block.
If the cmp is in a different basic block, then it is possible that not all
operands of that compare have defined registers. This can happen when one of
the operands to the cmp is a load and the load gets folded into the cmp. In
this case FastISel will skip the load instruction and the vreg is never
defined.

llvm-svn: 211730
2014-06-25 20:06:12 +00:00
Andrea Di Biagio
de7985179f [X86] Always prefer to lower a VECTOR_SHUFFLE into a BLENDI instead of SHUFP (or VPERM2X128).
This patch teaches method 'LowerVECTOR_SHUFFLE' to give higher precedence to
the check for 'isBlendMask'; the idea is that, when possible, we should firstly
check if a shuffle performs a blend, and in case, try to lower it into a BLENDI
instead of selecting a SHUFP or (worse) a VPERM2X128.

In general:
 - AVX VBLENDPS/D always have better latency and throughput than VPERM2F128;
 - BLENDPS/D instructions tend to always have better 'reciprocal throughput'
   than the equivalent SHUFPS/D;
 - Both BLENDPS/D and SHUFPS/D are often decoded into the same number of
   m-ops; however, a m-op obtained from a BLENDPS/D can be scheduled to more
   than one execution port.

This patch:
 - Moves the check for 'isBlendMask' immediately before the check for
   'isSHUFPMask' within method 'LowerVECTOR_SHUFFLE';
 - Updates existing tests for sse/avx shuffle/blend instructions to verify
   that we select (v)blendps/d when possible (instead of (v)shufps/d or
   vperm2f128).

llvm-svn: 211720
2014-06-25 17:41:58 +00:00
Eli Bendersky
def2619060 Rename loop unrolling and loop vectorizer metadata to have a common prefix.
[LLVM part]

These patches rename the loop unrolling and loop vectorizer metadata
such that they have a common 'llvm.loop.' prefix.  Metadata name
changes:

llvm.vectorizer.* => llvm.loop.vectorizer.*
llvm.loopunroll.* => llvm.loop.unroll.*

This was a suggestion from an earlier review
(http://reviews.llvm.org/D4090) which added the loop unrolling
metadata. 

Patch by Mark Heffernan.

llvm-svn: 211710
2014-06-25 15:41:00 +00:00
Chandler Carruth
2262038283 [x86] Add intrinsics for the pshufd, pshuflw, and pshufhw instructions.
llvm-svn: 211694
2014-06-25 13:12:54 +00:00
NAKAMURA Takumi
c5a2c81f7e Re-apply r211399, "Generate native unwind info on Win64" with a fix to ignore SEH pseudo ops in X86 JIT emitter.
--
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI.  It handles all corner cases (I hope), including stack
realignment.

Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.

Patch by Vadim Chugunov!

Reviewed By: rnk

Differential Revision: http://reviews.llvm.org/D4081

llvm-svn: 211691
2014-06-25 12:41:52 +00:00
Andrea Di Biagio
7f4e676911 [X86] Add target combine rule to select ADDSUB instructions from a build_vector
This patch teaches the backend how to combine a build_vector that implements
an 'addsub' between packed float vectors into a sequence of vector add
and vector sub followed by a VSELECT.

The new VSELECT is expected to be lowered into a BLENDI.
At ISel stage, the sequence 'vector add + vector sub + BLENDI' is
pattern-matched against ISel patterns added at r211427 to select
'addsub' instructions.
Added three more ISel patterns for ADDSUB.

Added test sse3-avx-addsub-2.ll to verify that we correctly emit 'addsub'
instructions.

llvm-svn: 211679
2014-06-25 10:02:21 +00:00
Rafael Espindola
f314cd4f5f Fix a regression from r211653.
The method was empty in the null streamer but I mistakenly replaced it with
the aborting one in MCStreamer.

llvm-svn: 211666
2014-06-25 05:31:22 +00:00
NAKAMURA Takumi
dc31d81b4f CodeGen/X86/pr20088.ll: Add -march=x86-64, or llc fails due to non-x86 default target.
llvm-svn: 211659
2014-06-25 03:05:47 +00:00
Juergen Ributzka
236ea1c61a [FastISel][X86] Fold XALU condition into branch and compare.
Optimize the codegen of select and branch instructions to directly use the
EFLAGS from the {s|u}{add|sub|mul}.with.overflow intrinsics.

llvm-svn: 211645
2014-06-24 23:51:21 +00:00
Tom Stellard
840992bb71 R600: Promote i64 stores to v2i32
Now we need only one 64-bit pattern for stores.

llvm-svn: 211643
2014-06-24 23:33:04 +00:00
Rafael Espindola
c0fea93ce8 Print a=b as an assignment.
In assembly the expression a=b is parsed as an assignment, so it should be
printed as one.

This remove a truly horrible hack for producing a label with "a=.". It would
be used by codegen but would never be reached by the asm parser. Sorry I
missed this when it was first committed.

llvm-svn: 211639
2014-06-24 22:45:16 +00:00
Matt Arsenault
37d6d91b5b R600: Fix inconsistency in rsq instructions.
R600 was using a clamped version of rsq, but SI was not. Add a
new rsq_clamped intrinsic and use them consistently.

It's unclear to me from the documentation what behavior
the R600 instructions have, so I assume they have the legacy behavior
described by the SI documents. For R600, use RECIPSQRT_IEEE
for both llvm.AMDGPU.rsq.legacy and llvm.AMDGPU.rsq. R600 also
has RECIPSQRT_FF, which I'm not sure how it fits in here.

llvm-svn: 211637
2014-06-24 22:13:39 +00:00
David Blaikie
ad1b24c1a0 Fix up scoping in a few tests (and delete one that validates unnecessary behavior).
Most of this is just tests that were silently succeeding in spite of
schema changes I made over a year ago. Cleaning them up as they lead to
failures in a change I'm working on/will come soon.

test/DebugInfo/2010-01-19-DbgScope.ll was removed as it tested miscoping
where a DebugLoc described a location not in the current function. The
test case doesn't describe why this is a valid situation and should be
supported, so I'm removing it and shortly going to commit changes that
make this firmly unsupported/assert-fail.

llvm-svn: 211628
2014-06-24 20:10:27 +00:00
Bill Schmidt
bfe90f8c83 [PPC64] Fix PR20071 (fctiduz generated for targets lacking that instruction)
PR20071 identifies a problem in PowerPC's fast-isel implementation for
floating-point conversion to integer.  The fctiduz instruction was added in
Power ISA 2.06 (i.e., Power7 and later).  However, this instruction is being
generated regardless of which 64-bit PowerPC target is selected.

The intent is for fast-isel to punt to DAG selection when this instruction is
not available.  This patch implements that change.  For testing purposes, the
existing fast-isel-conversion.ll test adds a RUN line for -mcpu=970 and tests
for the expected code generation.  Additionally, the existing test
fast-isel-conversion-p5.ll was found to be incorrectly expecting the
unavailable instruction to be generated.  I've removed these test variants
since we have adequate coverage in fast-isel-conversion.ll.

llvm-svn: 211627
2014-06-24 20:05:18 +00:00
Robert Khasanov
c1b0024016 vpblend intrinsics combines as shifts intrinsics due to absence return stmt between them
Fix PR20088

Differential Revision: http://reviews.llvm.org/D4277

llvm-svn: 211617
2014-06-24 18:08:04 +00:00
Weiming Zhao
23e9a680c2 Resubmit commit r211533
"Fix PR20056: Implement pseudo LDR <reg>, =<literal/label> for AArch64"
Missed files are added in this commit.

llvm-svn: 211605
2014-06-24 16:21:38 +00:00
Christian Pirker
4deae9a4a4 ARM: Fix TPsoft for Thumb mode
Reviewed at http://reviews.llvm.org/D4230

llvm-svn: 211601
2014-06-24 15:45:59 +00:00
Kevin Qin
22b89e0ae0 [AArch64] Fix a build_vector pattern match fail
caused by defect in isBuildVectorAllZeros().

llvm-svn: 211567
2014-06-24 05:37:27 +00:00
Juergen Ributzka
978bad2e2e [FastISel][X86] Lower unsupported selects to control-flow.
The extends the select lowering coverage by emiting pseudo cmov
instructions. These insturction will be later on lowered to control-flow to
simulate the select.

llvm-svn: 211545
2014-06-23 21:55:44 +00:00
Juergen Ributzka
9a7053da70 [FastISel][X86] Add support for floating-point select.
This extends the select lowering to support floating-point selects. The
lowering depends on SSE instructions and that the conditon comes from a
floating-point compare. Under this conditions it is possible to emit an
optimized instruction sequence that doesn't require any branches to
simulate the select.

llvm-svn: 211544
2014-06-23 21:55:40 +00:00
Juergen Ributzka
5ad4bec279 [FastISel][X86] Optimize selects when the condition comes from a compare.
Optimize the select instructions sequence to use the EFLAGS directly from a
compare when possible.

llvm-svn: 211543
2014-06-23 21:55:36 +00:00
Rafael Espindola
90186f5931 [Mips] Add a target streamer when creating a null streamer.
Should fix DebugInfo/global.ll on the mips bot.

llvm-svn: 211527
2014-06-23 19:43:40 +00:00
Matt Arsenault
0cbe5b2357 R600/SI: Fix div_scale intrinsic.
The operand that must match one of the others does matter,
and implement selecting for it.

llvm-svn: 211523
2014-06-23 18:28:28 +00:00
Christian Pirker
42c866343b ARMEB: Vector extend operations
Reviewed at http://reviews.llvm.org/D4043

llvm-svn: 211520
2014-06-23 18:05:53 +00:00
Matt Arsenault
68285ece4f R600: Move add/sub with overflow out of AMDILISelLowering
Add more tests for these.

llvm-svn: 211517
2014-06-23 18:00:49 +00:00
Matt Arsenault
b319678001 R600/SI: Handle i64 sub.
We can handle it the same way as add

llvm-svn: 211514
2014-06-23 18:00:38 +00:00
Ulrich Weigand
ac0412b387 [PowerPC] Allow stack frames without parameter save area
The PPCFrameLowering::determineFrameLayout routine currently ensures
that every function that allocates a stack frame provides space for the
parameter save area (via PPCFrameLowering::getMinCallFrameSize).

This is actually not necessary.  There may be functions that never call
another routine but still allocate a frame; those do not require the
parameter save area.  In the future, with the ELFv2 ABI, even some
routines that do call other functions do not need to allocate the
parameter save area.

While it is not a bug to allocate the parameter area when it is not
needed, it is better to avoid it to save stack space.

Note that when any particular function call requires the parameter save
area, this space will already have been included by ABI code in the size
the CALLSEQ_START insn is annotated with, and therefore included in the
size returned by MFI->getMaxCallFrameSize().

This means that determineFrameLayout simply does not need to care about
the parameter save area.  (It still needs to ensure that every frame
provides the linkage area.)  This is implemented by this patch.

Note that this exposed a bug in the new fast-isel code where the parameter
area was *not* included in the CALLSEQ_START size; this is also fixed.

A couple of test cases needed to be adapted for the new (smaller) stack
frame size those tests now see.

llvm-svn: 211495
2014-06-23 13:47:52 +00:00
Ulrich Weigand
e1fa36144d [PowerPC] Fix on-stack AltiVec arguments with 64-bit SVR4
Current 64-bit SVR4 code seems to have some remnants of Darwin code
in AltiVec argument handing.  This had the effect that AltiVec arguments
(or subsequent arguments) were not correctly placed in the parameter area
in some cases.

The correct behaviour with the 64-bit SVR4 ABI is:
- All AltiVec arguments take up space in the parameter area, just like
  any other arguments, whether vararg or not.
- They are always 16-byte aligned, skipping a parameter area doubleword
  (and the associated GPR, if any), if necessary.

This patch implements the correct behaviour and adds a test case.
(Verified against GCC behaviour via the ABI compat test suite.)

llvm-svn: 211492
2014-06-23 12:36:34 +00:00
NAKAMURA Takumi
eca89a0522 Revert r211399, "Generate native unwind info on Win64"
It broke Legacy JIT Tests on x86_64-{mingw32|msvc}, aka Windows x64.

llvm-svn: 211480
2014-06-22 22:00:56 +00:00
Jan Vesely
fd4ac4d9f5 R600: Add udivrem test
v2: move < %s to the end of the line
    space after ;
    add v4i32 test

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 211476
2014-06-22 21:42:58 +00:00
Filipe Cabecinhas
f28bdb1d10 Fix PR20087 by using the source index when changing the vector load
llvm-svn: 211472
2014-06-22 17:21:37 +00:00
Benjamin Kramer
8edc3ff6a3 Legalizer: Add support for splitting insert_subvectors.
We handle this by spilling the whole thing to the stack and doing the
insertion as a store.

PR19492. This happens in real code because the vectorizer creates v2i128 when AVX is enabled.

llvm-svn: 211435
2014-06-21 12:56:42 +00:00
Andrea Di Biagio
53a4d382f7 [X86] Add ISel patterns to select SSE3/AVX ADDSUB instructions.
This patch adds ISel patterns to select SSE3/AVX ADDSUB instructions
from a sequence of "vadd + vsub + blend".

Example:

///
typedef float float4 __attribute__((ext_vector_type(4)));

float4 foo(float4 A, float4 B) {
  float4 X = A - B;
  float4 Y = A + B;
  return (float4){X[0], Y[1], X[2], Y[3]};
}
///

Before this patch, (with flag -mcpu=corei7) llc produced the following
assembly sequence:
  movaps  %xmm0, %xmm2
  addps   %xmm1, %xmm2
  subps   %xmm1, %xmm0
  blendps $10, %xmm2, %xmm0


With this patch, we now get a single
  addsubps  %xmm1, %xmm0

llvm-svn: 211427
2014-06-21 01:31:15 +00:00
Reid Kleckner
40d9c6f936 Generate native unwind info on Win64
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI.  It handles all corner cases (I hope), including stack
realignment.

Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.

Patch by Vadim Chugunov!

Reviewed By: rnk

Differential Revision: http://reviews.llvm.org/D4081

llvm-svn: 211399
2014-06-20 20:35:47 +00:00
Tom Stellard
ae7faa387d R600/SI: Add patterns for ctpop inside a branch
llvm-svn: 211378
2014-06-20 17:06:11 +00:00
Tom Stellard
2d56aec1cd R600/SI: Add a pattern for f32 ftrunc
llvm-svn: 211377
2014-06-20 17:06:09 +00:00
Tom Stellard
0c64d4a322 R600: Expand vector flog2
llvm-svn: 211376
2014-06-20 17:06:07 +00:00
Tom Stellard
f2ad4b6bb8 R600: Expand vector fexp2
llvm-svn: 211375
2014-06-20 17:06:05 +00:00
Tom Stellard
d4aa49ad5e R600/SI: Add a VALU pattern for i64 xor
llvm-svn: 211373
2014-06-20 17:05:57 +00:00
Ulrich Weigand
3f880bd06e [PowerPC] Fix small argument stack slot offset for LE
When small arguments (structures < 8 bytes or "float") are passed in a
stack slot in the ppc64 SVR4 ABI, they must reside in the least
significant part of that slot.  On BE, this means that an offset needs
to be added to the stack address of the parameter, but on LE, the least
significant part of the slot has the same address as the slot itself.

This changes the PowerPC back-end ABI code to only add the small
argument stack slot offset for BE.  It also adds test cases to verify
the correct behavior on both BE and LE.

llvm-svn: 211368
2014-06-20 16:34:05 +00:00
Rafael Espindola
f4a9d636ea Move test so that it is skipped if the ARM target is not enabled.
llvm-svn: 211366
2014-06-20 15:30:38 +00:00
Oliver Stannard
8087ec46e0 Emit the ARM build attributes ABI_PCS_wchar_t and ABI_enum_size.
Emit the ARM build attributes ABI_PCS_wchar_t and ABI_enum_size based on
module flags metadata.

llvm-svn: 211349
2014-06-20 10:08:11 +00:00
Zoran Jovanovic
895809b0be ps][mips64r6] Added LSA/DLSA instructions
Differential Revision: http://reviews.llvm.org/D3897

llvm-svn: 211346
2014-06-20 09:28:09 +00:00
Alp Toker
6580a45cd6 Fix typos
llvm-svn: 211304
2014-06-19 19:41:26 +00:00
Andrea Di Biagio
87a0386d9d [X86] Teach how to combine horizontal binop even in the presence of undefs.
Before this change, the backend was unable to fold a build_vector dag
node with UNDEF operands into a single horizontal add/sub.

This patch teaches how to combine a build_vector with UNDEF operands into a
horizontal add/sub when possible. The algorithm conservatively avoids to combine
a build_vector with only a single non-UNDEF operand.

Added test haddsub-undef.ll to verify that we correctly fold horizontal binop
even in the presence of UNDEFs.

llvm-svn: 211265
2014-06-19 10:29:41 +00:00
Matt Arsenault
47b0791075 R600: Add a few tests I forgot to add.
These belong with r210827

llvm-svn: 211253
2014-06-19 04:24:43 +00:00
Matt Arsenault
b82983ef6a R600/SI: Add intrinsics for various math instructions.
These will be used for custom lowering and for library
implementations of various math functions, so it's useful
to expose these as builtins.

llvm-svn: 211247
2014-06-19 01:19:19 +00:00
Matt Arsenault
c86884a54a R600: Handle fnearbyint
The difference from rint isn't really relevant here,
so treat them as equivalent. OpenCL doesn't have nearbyint,
so this is sort of pointless other than for completeness.

llvm-svn: 211229
2014-06-18 22:03:45 +00:00
Marek Olsak
d80d0ca951 R600/SI: add gather4 and getlod intrinsics (v3)
This contains all the previous patches + getlod support on top of it.
It doesn't use SDNodes anymore, so it's quite small.
It also adds v16i8 to SReg_128, which is used for the sampler descriptor.

Reviewed-by: Tom Stellard
llvm-svn: 211228
2014-06-18 22:00:29 +00:00
Jan Vesely
3b8464bc4e R600: Expand vector fceil
Move fp64 fceil tests to fceil64.ll

v2: rebase

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 211194
2014-06-18 17:57:29 +00:00
Ulrich Weigand
e256aa48b9 [PowerPC] Simplify and improve loading into TOC register
During an indirect function call sequence on the 64-bit SVR4 ABI,
generate code must load and then restore the TOC register.

This does not use a regular LOAD instruction since the TOC
register r2 is marked as reserved.  Instead, the are two
special instruction patterns:

 let RST = 2, DS = 2 in
 def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg),
                     "ld 2, 8($reg)", IIC_LdStLD,
                     [(PPCload_toc i64:$reg)]>, isPPC64;
 
 let RST = 2, DS = 10, RA = 1 in
 def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins),
                     "ld 2, 40(1)", IIC_LdStLD,
                     [(PPCtoc_restore)]>, isPPC64;

Note that these not only restrict the destination of the
load to r2, but they also restrict the *source* of the
load to particular address combinations.  The latter is
a problem when we want to support the ELFv2 ABI, since
there the TOC save slot is no longer at 40(1).

This patch replaces those two instructions with a single
instruction pattern that only hard-codes r2 as destination,
but supports generic addresses as source.  This will allow
supporting the ELFv2 ABI, and also helps generate more
efficient code for calls to absolute addresses (allowing
simplification of the ppc64-calls.ll test case).

llvm-svn: 211193
2014-06-18 17:52:49 +00:00
Ulrich Weigand
11bbc07d92 [PowerPC] Add back test case for absolute calls (removed in r211174)
As requested by Hal Finkel, this adds back a test for calls to
a known-constant function pointer value, and verifies that the
64-bit SVR4 indirect function call sequence is used.

llvm-svn: 211190
2014-06-18 17:28:56 +00:00
Arnold Schwaighofer
69cfc9ae09 Add a triple so that right syntax is choosen on mac osx systems
llvm-svn: 211188
2014-06-18 17:20:49 +00:00
Matt Arsenault
a46ba4c9d1 R600/SI: Add intrinsics for brev instructions
llvm-svn: 211187
2014-06-18 17:13:57 +00:00
Matt Arsenault
48848ba546 R600/SI: Prettier operand printing for 64-bit ops.
Copy what is done for 32-bit already so the order is about the same.

llvm-svn: 211186
2014-06-18 17:13:51 +00:00
Matheus Almeida
d742950090 [mips] SYNC $stype instruction was added in Mips32
but SYNC with an implied operand ($stype = 0) is valid since Mips2.

llvm-svn: 211185
2014-06-18 17:10:30 +00:00
Matt Arsenault
068d030935 R600: Implement f64 ftrunc, ffloor and fceil.
CI has instructions for these, so this fixes them for older hardware.

llvm-svn: 211183
2014-06-18 17:05:30 +00:00
Matt Arsenault
77f7e6fc35 R600: Custom lower f64 frint for pre-CI
llvm-svn: 211182
2014-06-18 17:05:26 +00:00
Adam Nemet
d36a2c2dba [X86] AVX512: Add non-temporal stores
Note that I followed the AVX2 convention here and didn't add LLVM intrinsics
for stores.  These can be generated with the nontemporal hint on LLVM IR
stores (see new test). The GCC builtins are lowered directly into nontemporal
stores.

<rdar://problem/17082571>

llvm-svn: 211176
2014-06-18 16:51:10 +00:00
Ulrich Weigand
1458f67d3e [PowerPC] Do not use BLA with the 64-bit SVR4 ABI
The PowerPC back-end uses BLA to implement calls to functions at
known-constant addresses, which is apparently used for certain
system routines on Darwin.

However, with the 64-bit SVR4 ABI, this is actually incorrect.
An immediate function pointer value on this platform is not
directly usable as a target address for BLA:
- in the ELFv1 ABI, the function pointer value refers to the
  *function descriptor*, not the code address
- in the ELFv2 ABI, the function pointer value refers to the
  global entry point, but BL(A) would only be correct when
  calling the *local* entry point

This bug didn't show up since using immediate function pointer
values is not usually done in the 64-bit SVR4 ABI in the first
place.  However, I ran into this issue with a certain use case
of LLVM as JIT, where immediate function pointer values were
uses to implement callbacks from JITted code to helpers in
statically compiled code.

Fixed by simply not using BLA with the 64-bit SVR4 ABI.

llvm-svn: 211174
2014-06-18 16:14:04 +00:00