1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-28 22:42:52 +01:00
Commit Graph

6157 Commits

Author SHA1 Message Date
Simon Pilgrim
d5bdbd921b CONCAT_VECTOR of BUILD_VECTOR - minor fix
Fixed issue with the combine of CONCAT_VECTOR of 2 BUILD_VECTOR nodes - the optimisation wasn't ensuring that the scalar operands of both nodes were the same type/size for implicit truncation.

Test case spotted by Patrik Hagglund

llvm-svn: 235371
2015-04-21 08:05:43 +00:00
Pawel Bylica
b046734bb1 Fix generic shift expansion when shift amount is 0
Summary:
This fixes http://llvm.org/bugs/show_bug.cgi?id=16439. 

This is one possible way to approach this. The other would be to split InL>>(nbits-Amt) into (InL>>(nbits-1-Amt))>>1, which is also valid since since we only need to care about Amt up nbits-1. It's hard to tell which one is better since the shift might be expensive if this stage of expansion is not yet a legal machine integer, whereas comparisons with zero are relatively cheap at all sizes, but more expensive than a shift if the shift is on a legal machine type. 

Patch by Keno Fischer!

Test Plan: regression test from http://reviews.llvm.org/D7752

Reviewers: chfast, resistor

Reviewed By: chfast, resistor

Subscribers: sanjoy, resistor, chfast, llvm-commits

Differential Revision: http://reviews.llvm.org/D4978

llvm-svn: 235370
2015-04-21 06:28:36 +00:00
Matthias Braun
3df6448424 X86: Do not select X86 custom vector nodes if operand types don't match
X86ISD::ADDSUB, X86ISD::(F)HADD, X86ISD::(F)HSUB should not be selected
if the operand types do not match the result type because vector type
legalization cannot deal with this for custom nodes.

Testcase X86ISD::ADDSUB is attached. I could not create a testcase for
the FHADD/FHSUB cases because of: https://llvm.org/bugs/show_bug.cgi?id=23296

Differential Revision: http://reviews.llvm.org/D9120

llvm-svn: 235367
2015-04-21 01:13:41 +00:00
Sanjay Patel
3391f7c43c use update_llc_test_checks.py to tighten checking
Also, replace win and linux runs with a generic run because that
makes no difference in what this test is checking.

llvm-svn: 235361
2015-04-20 23:31:53 +00:00
Andrea Di Biagio
b10889be03 [X86][FastIsel] Fix assertion failure when selecting int-to-double conversion (PR23273).
This fixes a regression introduced at revision 231243.
The target-independent selection algorithm in FastISel knows how to select
a SINT_TO_FP if the target is SSE but not AVX. That is because on X86, the
tablegen'd 'fastEmit' functions know how to select CVTSI2SSrr and CVTSI2SDrr.

Method X86FastISel::X86SelectSIToFP was therefore working under the
wrong assumption that the target was AVX. That assumption was incorrect since
we can have a target that is neither AVX nor SSE.

So, rather than asserting for the presence of AVX, we should have had an
early exit from 'X86SelectSIToFP' if the target was not AVX.
This patch fixes the issue replacing the invalid assertion with an early exit.

Thanks to Dimitry Andric for reporting this problem and for providing a small
reproducible testcase. Added test pr23273.ll.

llvm-svn: 235295
2015-04-20 11:56:59 +00:00
Simon Pilgrim
292688507a [X86][SSE] Fix for getScalarValueForVectorElement to detect scalar sources requiring truncation.
The fix ensures that scalar sources inserted into a vector are the correct bit size.

Integer scalar sources from BUILD_VECTOR and SCALAR_TO_VECTOR nodes may require truncation that this function doesn't currently support.

llvm-svn: 235281
2015-04-19 22:16:49 +00:00
Simon Pilgrim
1a7066b9c2 [X86][SSE] Extended copysign tests to include llvm intrinsic implementation and constant folding.
llvm-svn: 235279
2015-04-19 21:34:57 +00:00
Simon Pilgrim
cdf694c820 [X86][AVX2] Force execution domain on broadcast folding tests.
llvm-svn: 235260
2015-04-18 21:24:16 +00:00
Simon Pilgrim
048a4e3644 [X86][SSE] Force execution domain on float/double unpack shuffle tests.
llvm-svn: 235259
2015-04-18 18:50:55 +00:00
Sanjay Patel
d7195a62c2 [X86, AVX] add an exedepfix entry for vmovq == vmovlps == vmovlpd
This is the AVX extension of r235014:
http://llvm.org/viewvc/llvm-project?view=revision&revision=235014

Review:
http://reviews.llvm.org/D8691

llvm-svn: 235210
2015-04-17 17:02:37 +00:00
Nico Weber
ffd0269e17 Revert r235154-r235156, they cause asserts when building win64 code (http://crbug.com/477988)
llvm-svn: 235170
2015-04-17 09:10:43 +00:00
Reid Kleckner
560ebfc81d Fix test failure due to racing commits
It looks like r235145 changed the .ll syntax for variadic calls. Update
tests to use the new syntax.

llvm-svn: 235156
2015-04-17 01:09:53 +00:00
Reid Kleckner
52ead7bcab [SEH] Reimplement x64 SEH using WinEHPrepare
This now emits simple, unoptimized xdata tables for __C_specific_handler
based on the handlers listed in @llvm.eh.actions calls produced by
WinEHPrepare.

This adds support for running __finally blocks when exceptions are
thrown, and removes the old landingpad fan-in codepath.

I ran some manual execution tests on small basic test cases with and
without optimization, as well as on Chrome base_unittests, which uses a
small amount of SEH.  I'm sure there are bugs, and we may need to
revert.

llvm-svn: 235154
2015-04-17 01:01:27 +00:00
David Blaikie
dfadb4e9ee [opaque pointer type] Add textual IR support for explicit type parameter to the call instruction
See r230786 and r230794 for similar changes to gep and load
respectively.

Call is a bit different because it often doesn't have a single explicit
type - usually the type is deduced from the arguments, and just the
return type is explicit. In those cases there's no need to change the
IR.

When that's not the case, the IR usually contains the pointer type of
the first operand - but since typed pointers are going away, that
representation is insufficient so I'm just stripping the "pointerness"
of the explicit type away.

This does make the IR a bit weird - it /sort of/ reads like the type of
the first operand: "call void () %x(" but %x is actually of type "void
()*" and will eventually be just of type "ptr". But this seems not too
bad and I don't think it would benefit from repeating the type
("void (), void () * %x(" and then eventually "void (), ptr %x(") as has
been done with gep and load.

This also has a side benefit: since the explicit type is no longer a
pointer, there's no ambiguity between an explicit type and a function
that returns a function pointer. Previously this case needed an explicit
type (eg: a function returning a void() function was written as
"call void () () * @x(" rather than "call void () * @x(" because of the
ambiguity between a function returning a pointer to a void() function
and a function returning void).

No ambiguity means even function pointer return types can just be
written alone, without writing the whole function's type.

This leaves /only/ the varargs case where the explicit type is required.

Given the special type syntax in call instructions, the regex-fu used
for migration was a bit more involved in its own unique way (as every
one of these is) so here it is. Use it in conjunction with the apply.sh
script and associated find/xargs commands I've provided in rr230786 to
migrate your out of tree tests. Do let me know if any of this doesn't
cover your cases & we can iterate on a more general script/regexes to
help others with out of tree tests.

About 9 test cases couldn't be automatically migrated - half of those
were functions returning function pointers, where I just had to manually
delete the function argument types now that we didn't need an explicit
function type there. The other half were typedefs of function types used
in calls - just had to manually drop the * from those.

import fileinput
import sys
import re

pat = re.compile(r'((?:=|:|^|\s)call\s(?:[^@]*?))(\s*$|\s*(?:(?:\[\[[a-zA-Z0-9_]+\]\]|[@%](?:(")?[\\\?@a-zA-Z0-9_.]*?(?(3)"|)|{{.*}}))(?:\(|$)|undef|inttoptr|bitcast|null|asm).*$)')
addrspace_end = re.compile(r"addrspace\(\d+\)\s*\*$")
func_end = re.compile("(?:void.*|\)\s*)\*$")

def conv(match, line):
  if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)):
    return line
  return line[:match.start()] + match.group(1)[:match.group(1).rfind('*')].rstrip() + match.group(2) + line[match.end():]

for line in sys.stdin:
  sys.stdout.write(conv(re.search(pat, line), line))

llvm-svn: 235145
2015-04-16 23:24:18 +00:00
Hans Wennborg
ff837f8fc0 Revert the switch lowering change (r235101, r235103, r235106)
Looks like it broke the sanitizer-ppc64-linux1 build. Reverting for now.

llvm-svn: 235108
2015-04-16 15:43:26 +00:00
Hans Wennborg
78be2559e7 Add a triple to switch.ll test.
llvm-svn: 235103
2015-04-16 15:09:33 +00:00
Hans Wennborg
bc33cd14d7 Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)
This is a major rewrite of the SelectionDAG switch lowering. The previous code
would lower switches as a binary tre, discovering clusters of cases
suitable for lowering by jump tables or bit tests as it went along. To increase
the likelihood of finding jump tables, the binary tree pivot was selected to
maximize case density on both sides of the pivot.

By not selecting the pivot in the middle, the binary trees would not always
be balanced, leading to performance problems in the generated code.

This patch rewrites the lowering to search for clusters of cases
suitable for jump tables or bit tests first, and then builds the binary
tree around those clusters. This way, the binary tree will always be balanced.

This has the added benefit of decoupling the different aspects of the lowering:
tree building and jump table or bit tests finding are now easier to tweak
separately.

For example, this will enable us to balance the tree based on profile info
in the future.

The algorithm for finding jump tables is O(n^2), whereas the previous algorithm
was O(n log n) for common cases, and quadratic only in the worst-case. This
doesn't seem to be major problem in practice, e.g. compiling a file consisting
of a 10k-case switch was only 30% slower, and such large switches should be rare
in practice. Compiling e.g. gcc.c showed no compile-time difference.  If this
does turn out to be a problem, we could limit the search space of the algorithm.

This commit also disables all optimizations during switch lowering in -O0.

Differential Revision: http://reviews.llvm.org/D8649

llvm-svn: 235101
2015-04-16 14:49:23 +00:00
Ahmed Bougacha
697a3e0440 [CodeGen] Re-apply r234809 (concat of scalars), with an x86_mmx fix.
The only type that isn't an integer, isn't floating point, and isn't
a vector; ladies and gentlemen, the gift that keeps on giving: x86_mmx!

Fixes PR23246.

Original message (reverted in r235062):
[CodeGen] Combine concat_vectors of scalars into build_vector.

Combine something like:
  (v8i8 concat_vectors (v2i8 bitcast (i16)) x4)
into:
  (v8i8 (bitcast (v4i16 BUILD_VECTOR (i16) x4)))

If any of the scalars are floating point, use that throughout.

Differential Revision: http://reviews.llvm.org/D8948

llvm-svn: 235072
2015-04-16 02:39:14 +00:00
Duncan P. N. Exon Smith
380b5bd2b0 DebugInfo: Remove 'inlinedAt:' field from MDLocalVariable
Remove 'inlinedAt:' from MDLocalVariable.  Besides saving some memory
(variables with it seem to be single largest `Metadata` contributer to
memory usage right now in -g -flto builds), this stops optimization and
backend passes from having to change local variables.

The 'inlinedAt:' field was used by the backend in two ways:

 1. To tell the backend whether and into what a variable was inlined.
 2. To create a unique id for each inlined variable.

Instead, rely on the 'inlinedAt:' field of the intrinsic's `!dbg`
attachment, and change the DWARF backend to use a typedef called
`InlinedVariable` which is `std::pair<MDLocalVariable*, MDLocation*>`.
This `DebugLoc` is already passed reliably through the backend (as
verified by r234021).

This commit removes the check from r234021, but I added a new check
(that will survive) in r235048, and changed the `DIBuilder` API in
r235041 to require a `!dbg` attachment whose 'scope:` is in the same
`MDSubprogram` as the variable's.

If this breaks your out-of-tree testcases, perhaps the script I used
(mdlocalvariable-drop-inlinedat.sh) will help; I'll attach it to PR22778
in a moment.

llvm-svn: 235050
2015-04-15 22:29:27 +00:00
Duncan P. N. Exon Smith
565a460434 DebugInfo: Add missing !dbg attachments to intrinsics
Add missing `!dbg` attachments to `@llvm.dbg.*` intrinsics.  I updated
these using a script (add-dbg-to-intrinsics.sh) that I'll attach to
PR22778 for posterity.

llvm-svn: 235040
2015-04-15 21:04:10 +00:00
Sanjay Patel
e7d64b577c [X86] add an exedepfix entry for movq == movlps == movlpd
This is a 1-line patch (with a TODO for AVX because that will affect
even more regression tests) that lets us substitute the appropriate
64-bit store for the float/double/int domains.

It's not clear to me exactly what the difference is between the 0xD6 (MOVPQI2QImr) and 
0x7E (MOVSDto64mr) opcodes, but this is apparently the right choice.

Differential Revision: http://reviews.llvm.org/D8691

llvm-svn: 235014
2015-04-15 15:47:51 +00:00
Sanjay Patel
f5d182f516 [x86] Implement combineRepeatedFPDivisors
Set the transform bar at 2 divisions because the fastest current
x86 FP divider circuit is in SandyBridge / Haswell at 10 cycle
latency (best case) relative to a 5 cycle multiplier. 
So that's the worst case for this transform (no latency win), 
but multiplies are obviously pipelined while divisions are not,
so there's still a big throughput win which we would expect to
show up in typical FP code.

These are the sequences I'm comparing:

  divss   %xmm2, %xmm0
  mulss   %xmm1, %xmm0
  divss   %xmm2, %xmm0

Becomes:

  movss   LCPI0_0(%rip), %xmm3    ## xmm3 = mem[0],zero,zero,zero
  divss   %xmm2, %xmm3
  mulss   %xmm3, %xmm0
  mulss   %xmm1, %xmm0
  mulss   %xmm3, %xmm0

[Ignore for the moment that we don't optimize the chain of 3 multiplies
into 2 independent fmuls followed by 1 dependent fmul...this is the DAG
version of: https://llvm.org/bugs/show_bug.cgi?id=21768 ...if we fix that,
then the transform becomes even more profitable on all targets.]

Differential Revision: http://reviews.llvm.org/D8941

llvm-svn: 235012
2015-04-15 15:22:55 +00:00
Krzysztof Parzyszek
2f1ef7b970 Change the testcase mtriple to x86_64-unknown-unknown
llvm-svn: 234900
2015-04-14 15:28:42 +00:00
Krzysztof Parzyszek
6d2e3b066b Add mtriple to test case to avoid problems with different naming schemes
llvm-svn: 234793
2015-04-13 20:24:40 +00:00
Krzysztof Parzyszek
3efcf81e03 Allow memory intrinsics to be tail calls
llvm-svn: 234764
2015-04-13 17:16:45 +00:00
Sanjoy Das
68c711fb56 [InstCombine][CodeGenPrep] Create llvm.uadd.with.overflow in CGP.
Summary:
This change moves creating calls to `llvm.uadd.with.overflow` from
InstCombine to CodeGenPrep.  Combining overflow check patterns into
calls to the said intrinsic in InstCombine inhibits optimization because
it introduces an intrinsic call that not all other transforms and
analyses understand.

Depends on D8888.

Reviewers: majnemer, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8889

llvm-svn: 234638
2015-04-10 21:07:09 +00:00
Reid Kleckner
b5569055ec Avoid spewing binary to stdout in some filetype=obj tests
llvm-svn: 234627
2015-04-10 19:36:55 +00:00
Sanjay Patel
6fe3c1e8ff use update_llc_test_checks.py to tighten checking
test features, not CPUs

remove unnecessary cruft

llvm-svn: 234622
2015-04-10 18:31:42 +00:00
Akira Hatanaka
5d29050f58 [DAGCombine] Fix a bug in MergeConsecutiveStores.
The bug manifests when there are two loads and two stores chained as follows in
a DAG,

(ld v3f32) -> (st f32) -> (ld v3f32) -> (st f32)

and the stores' values are extracted from the preceding vector loads.

MergeConsecutiveStores would replace the first store in the chain with the
merged vector store, which would create a cycle between the merged store node
and the last load node that appears in the chain.

This commits fixes the bug by replacing the last store in the chain instead.

rdar://problem/20275084

Differential Revision: http://reviews.llvm.org/D8849

llvm-svn: 234430
2015-04-08 20:34:53 +00:00
Sanjay Patel
ff4da24886 fixed to test features, not CPU models
llvm-svn: 234413
2015-04-08 16:51:42 +00:00
Daniel Jasper
1e59aac021 Add test showing that MachineLICM is calculating register pressure wrong
More details: http://llvm.org/PR23143

llvm-svn: 234309
2015-04-07 11:41:40 +00:00
Rafael Espindola
4214740c57 Use sext in fast isel.
Fast isel used to zero extends immediates to 64 bits. This normally goes
unnoticed because the value is truncated to 32 bits for output.

Two cases were it is noticed:

* We fail to use smaller encodings.
* If the original constant was smaller than i32.

In the tests using i1 constants, codegen would change to use -1, which is fine
(and matches what regular isel does) since only the lowest bit is then used.

Instead, this patch then changes the ir to use i8 constants, which looks more
like what clang produces.

llvm-svn: 234249
2015-04-06 22:29:07 +00:00
Simon Pilgrim
a5a17c0b23 [X86][SSE] Use (V)PINSRB for direct byte insertion in 16i8 buildvector on SSE4.1 targets
This patch allows SSE4.1 targets to use (V)PINSRB to create 16i8 vectors by inserting i8 scalars directly into a XMM register instead of merging pairs of i8 scalars into a i16 and using the SSE2 PINSRW instruction.

This allows folding of byte loads and reduces scalar register usage as well.

Differential Revision: http://reviews.llvm.org/D8839

llvm-svn: 234193
2015-04-06 18:39:00 +00:00
Simon Pilgrim
e8ed19bff1 [DAGCombiner] Add support for FCEIL, FFLOOR and FTRUNC vector constant folding
Differential Revision: http://reviews.llvm.org/D8715

llvm-svn: 234179
2015-04-06 17:15:41 +00:00
Rafael Espindola
0601e2fab8 Implement unique sections with an unique ID.
This allows the compiler/assembly programmer to switch back to a
section. This in turn fixes the bootstrap failure on powerpc (tested
on gcc110) without changing the ppc codegen at all.

I will try to cleanup the various getELFSection overloads in a  followup patch.
Just using a default argument now would lead to ambiguities.

llvm-svn: 234099
2015-04-04 18:02:01 +00:00
Simon Pilgrim
9b09e6fee5 [DAGCombiner] Canonicalize vector constants for ADD/MUL/AND/OR/XOR re-association
Scalar integers are commuted to move constants to the RHS for re-association - this ensures vectors do the same.

llvm-svn: 234092
2015-04-04 10:20:31 +00:00
Craig Topper
54ac1d95cd [X86] Don't use GR64 register 'and with immediate' instructions if the immediate is zero in the upper 33-bits or upper 57-bits. Use GR32 instructions instead.
Previously the patterns didn't have high enough priority and we would only use the GR32 form if the only the upper 32 or 56 bits were zero.

Fixes PR23100.

llvm-svn: 234075
2015-04-04 02:08:20 +00:00
Sanjay Patel
ac7b801edf use update_llc_test_checks.py to tighten checking; remove unnecessary testing params
llvm-svn: 234029
2015-04-03 17:17:50 +00:00
Sanjay Patel
09dfdcbcd0 use update_llc_test_checks.py to tighten checking; remove unnecessary testing params
llvm-svn: 234027
2015-04-03 17:13:31 +00:00
Sanjay Patel
ad7b3271f0 use update_llc_test_checks.py to tighten checking; remove unnecessary testing params
llvm-svn: 234024
2015-04-03 17:09:37 +00:00
Sanjay Patel
682732f0ca use update_llc_test_checks.py to tighten checking
remove redundant and unnecessary test parameters

llvm-svn: 234022
2015-04-03 17:02:48 +00:00
Sanjay Patel
074d61ed44 add checks; remove redundant testing parameters
llvm-svn: 234020
2015-04-03 16:44:42 +00:00
Sanjay Patel
73ded85ef2 use update_llc_test_checks.py to tighten checking; remove darwin and sandybridge overspecification
llvm-svn: 234017
2015-04-03 16:06:58 +00:00
Simon Pilgrim
828e8b6f6d Added vector tests for DAGCombiner::ReassociateOps
Missing vector tests for rL233482

llvm-svn: 234015
2015-04-03 15:04:46 +00:00
Simon Pilgrim
4f794f11fb [X86] Added SSE4.2 CRC32 memory folding patterns + tests
llvm-svn: 234013
2015-04-03 14:24:40 +00:00
Simon Pilgrim
380961e86b [X86][3DNow] Added 3DNow! memory folding patterns + tests
llvm-svn: 234008
2015-04-03 11:50:30 +00:00
Simon Pilgrim
a600d06557 [X86][MMX] Added MMX stack folding tests
llvm-svn: 234006
2015-04-03 11:01:15 +00:00
Simon Pilgrim
335a565d46 [DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR
This patch attempts to fold the shuffling of 'scalar source' inputs - BUILD_VECTOR and SCALAR_TO_VECTOR nodes - if the shuffle node is the only user. This folds away a lot of unnecessary shuffle nodes, and allows quite a bit of constant folding that was being missed.

Differential Revision: http://reviews.llvm.org/D8516

llvm-svn: 234004
2015-04-03 10:02:21 +00:00
Sanjay Patel
ee1b1c4540 [AVX] Improve insertion of i8 or i16 into low element of 256-bit zero vector
Without this patch, we split the 256-bit vector into halves and produced something like:
	movzwl	(%rdi), %eax
	vmovd	%eax, %xmm0
	vxorps	%xmm1, %xmm1, %xmm1
	vblendps	$15, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]

Now, we eliminate the xor and blend because those zeros are free with the vmovd:
        movzwl  (%rdi), %eax
        vmovd   %eax, %xmm0

This should be the final fix needed to resolve PR22685:
https://llvm.org/bugs/show_bug.cgi?id=22685

llvm-svn: 233941
2015-04-02 20:21:52 +00:00
Sanjay Patel
c17be4ec1a [X86, AVX] adjust tablegen patterns to generate better code for scalar insertion into zero vector (PR23073)
For code like this:

define <8 x i32> @load_v8i32() {
  ret <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
}

We produce this AVX code:

_load_v8i32:                            ## @load_v8i32
  movl	$7, %eax
  vmovd	%eax, %xmm0
  vxorps	%ymm1, %ymm1, %ymm1
  vblendps	$1, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]
  retq

There are at least 2 bugs in play here:

    We're generating a blend when a move scalar does the same job using 2 less instruction bytes (see FIXMEs).
    We're not matching an existing pattern that would eliminate the xor and blend entirely. The zero bytes are free with vmovd.

The 2nd fix involves an adjustment of "AddedComplexity" [1] and mostly masks the 1st problem.

[1] AddedComplexity has close to no documentation in the source. 
The best we have is this comment: "roughly corresponds to the number of nodes that are covered". 
It appears that x86 has bastardized this definition by inflating its values for some other
undocumented reason. For example, we have a pattern with "AddedComplexity = 400" (!). 

I searched my way to this page:
https://groups.google.com/forum/#!topic/llvm-dev/5UX-Og9M0xQ

Differential Revision: http://reviews.llvm.org/D8794

llvm-svn: 233931
2015-04-02 17:56:17 +00:00
Elena Demikhovsky
74d944b41a AVX-512: intrinsics for VPADD, VPMULDQ and VPSUB
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 233906
2015-04-02 10:51:40 +00:00
Philip Reames
d3d2df55e2 Teach gcroot how to handle dynamically realigned frames
I'm playing with supporting custom stack map formats with statepoints.  While 
doing so, I noticed that the existing implementation didn't indicate inherently 
unsized frames.  This change essentially just ports the functionality that already 
exists for the default StackMaps section to custom stackmaps.

llvm-svn: 233891
2015-04-02 05:00:40 +00:00
Benjamin Kramer
a711d21dc6 [X86] Don't accidentally select shll $1, %eax when shrinking an immediate.
addl has higher throughput and this was needlessly picking a suboptimal
encoding causing PR23098.

I wish there was a way of doing this without further duplicating tbl-
generated patterns, but so far I haven't found one.

llvm-svn: 233832
2015-04-01 19:01:09 +00:00
Sanjay Patel
e16f423a6a [X86, AVX] fix zero-extending integer operand load patterns to use integer instructions
This is a follow-on to r233704 and another partial fix for PR22685:
https://llvm.org/bugs/show_bug.cgi?id=22685

llvm-svn: 233724
2015-03-31 18:43:43 +00:00
Sanjay Patel
c33fdfa218 [X86, AVX] try to lowerVectorShuffleAsElementInsertion() for all 256-bit vector sub-types
I suggested this change in D7898 (http://llvm.org/viewvc/llvm-project?view=revision&revision=231354)

It improves the v4i64 case although not optimally. This AVX codegen:

  vmovq {{.*#+}} xmm0 = mem[0],zero
  vxorpd %ymm1, %ymm1, %ymm1
  vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]

Becomes:

  vmovsd {{.*#+}} xmm0 = mem[0],zero

Unfortunately, this doesn't completely solve PR22685. There are still at least 2 problems under here:

    We're not handling v32i8 / v16i16.
    We're not getting the FP / int domains right for instruction selection.

But since this patch alone appears to do no harm, reduces code duplication, and helps v4i64, 
I'm submitting this patch ahead of fixing the above.

Differential Revision: http://reviews.llvm.org/D8341

llvm-svn: 233704
2015-03-31 16:32:11 +00:00
Ahmed Bougacha
5df03ee9ed [X86] Generate MOVNT for all vector types.
We used to miss non-Q YMM integer vectors, and, non-Q/D XMM integer
vectors.
While there, change the v4i32 patterns to prefer MOVNTDQ.

llvm-svn: 233668
2015-03-31 03:16:51 +00:00
Paul Robinson
84aea79e47 Verify 'optnone' can run DAG combiner when appropriate.
Adds a test to verify the behavior that r233153 restored: 'optnone'
does not spuriously disable the DAG combiner, and in fact there are
cases where the DAG combiner must run (even at -O0 or 'optnone') in
order for codegen to succeed.

Differential Revision: http://reviews.llvm.org/D8614

llvm-svn: 233584
2015-03-30 19:37:44 +00:00
Simon Pilgrim
48aa5918cd [X86] Ensure integer domain on scalar i64 load/store stack folding tests. NFC
llvm-svn: 233553
2015-03-30 15:25:51 +00:00
Elena Demikhovsky
0e38b477c5 AVX-512: blank lines, duplicated tests, no functional changes
see comments http://reviews.llvm.org/D6835

llvm-svn: 233528
2015-03-30 09:29:28 +00:00
Elena Demikhovsky
28fa559e12 AVX-512: added intrinsics for VPAND, VPOR and VPXOR
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 233525
2015-03-30 08:30:34 +00:00
Benjamin Kramer
40ebf4f0c7 [inline asm] Don't reject duplicated matching constraints
They're harmless and it's easy to generate them from clang, leading to
a crash in LLVM. Found by afl-fuzz.

llvm-svn: 233500
2015-03-29 20:33:07 +00:00
Duncan P. N. Exon Smith
c7161ce869 DebugInfo: Fix testcases with invalid MDSubprogram nodes
Fix testcases that don't pass the verifier after a WIP patch to check
`MDSubprogram` operands more effectively.  I found the following issues:

  - When `isDefinition: false`, the `variables:` field might point at
    `!{i32 786468}`, or at a tuple that pointed at an empty tuple with
    the comment "previously: invalid DW_TAG_base_type" (I vaguely recall
    adding those comments during an upgrade script).  In these cases, I
    just dropped the array.
  - The `variables:` field might point at something like `!{!{!8}}`,
    where `!8` was an `MDLocation`.  I removed the extra layer of
    indirection.
  - Invalid `type:` (not an `MDSubroutineType`).

llvm-svn: 233466
2015-03-28 02:26:45 +00:00
Duncan P. N. Exon Smith
4d29d309de DebugInfo: Fix bad debug info for compile units and types
Fix debug info in these tests, which started failing with a WIP patch to
verify compile units and types.  The problems look like they were all
caused by bitrot.  They fell into these categories:

  - Using `!{i32 0}` instead of `!{}`.
  - Using `!{null}` instead of `!{}`.
  - Using `!MDExpression()` instead of `!{}`.
  - Using `!8` instead of `!{!8}`.
  - `file:` references that pointed at `MDCompileUnit`s instead of the
    same `MDFile` as the compile unit.
  - `file:` references that were numerically off-by-one or (off-by-ten).

llvm-svn: 233415
2015-03-27 20:46:33 +00:00
Quentin Colombet
fa5e8e64e6 [RegisterCoalescer] Refine the terminal rule to still consider the terminal
nodes.
When a node is terminal it is pushed at the end of the list of the copies to
coalesce instead of being completely ignored. In effect, this reduces its
priority over non-terminal nodes.

Because of that, we do not miss the rematerialization opportunities, nor the
copies that can be merged with more complex, than the terminal rule,
interference checks.

Related to PR22768.

llvm-svn: 233395
2015-03-27 18:37:15 +00:00
Duncan P. N. Exon Smith
f680a75cce LLParser: Require non-null scope for MDLocation and MDLocalVariable
Change `LLParser` to require a non-null `scope:` field for both
`MDLocation` and `MDLocalVariable`.  There's no need to wait for the
verifier for this check.  This also allows their `::getImpl()` methods
to assert that the incoming scope is non-null.

llvm-svn: 233394
2015-03-27 17:56:39 +00:00
Rafael Espindola
7bdf3f2e35 Close unique sections when switching away from them.
It is not possible to switch back to unique secitons, so close them
automatically when switching away.

llvm-svn: 233380
2015-03-27 15:01:40 +00:00
Philip Reames
007c0af082 Require a GC strategy be specified for functions which use gc.statepoint
This was discussed a while back and I left it optional for migration.  Since it's been far more than the 'week or two' that was discussed, time to actually make this manditory.  

llvm-svn: 233357
2015-03-27 05:09:33 +00:00
Philip Reames
1e1d0dcae5 Allow explicit spill slots to be specified for a gc.statepoint
This patch adds support for explicitly provided spill slots in the GC arguments of a gc.statepoint.  This is somewhat analogous to gcroot, but leverages the STATEPOINT MI node and StackMap infrastructure.  The motivation for this is:
1) The stack spilling code for gc.statepoints hasn't advanced as fast as I'd like.  One major option is to give up on doing spilling in the backend and do it at the IR level instead.  We'd give up the ability to have gc values in registers, but that's a minor cost in practice.  We are not neccessarily moving in that direction, but having the ability to prototype such a thing cheaply is interesting.
2) I want to port the gcroot lowering to use the statepoint infastructure.  Given the metadata printers for gcroot expect a fixed set of stack roots, it's easiest to just reuse the explicit stack slots and pass them directly to the underlying statepoint.  

I'm holding off on the documentation for the new feature until I'm reasonable sure this is going to stick around.

llvm-svn: 233356
2015-03-27 04:52:48 +00:00
Andrew Trick
cd803b2a1d Reintroduce the SelectionDAG scheduler test for r233351.
This test returns nonnative integer types which aren't supported on all targets.
The real issue with the SelectionDAG scheduler is with x86 EFLAGS.

llvm-svn: 233355
2015-03-27 04:42:52 +00:00
Duncan P. N. Exon Smith
bbcad499ab DebugInfo: Update testcases with invalid variables
Fix testcases whose variables are invalid.  I'm working on a patch that
adds `Verifier` checks for `MDLocalVariable` (and `MDGlobalVariable`),
and these failed because:

  - `scope:` fields need to point at `MDLocalScope` and can't be null.
  - `file:` fields need to point at `MDFile`.
  - `inlinedAt:` fields need to point at `MDLocation`.

llvm-svn: 233349
2015-03-27 01:58:34 +00:00
Andrea Di Biagio
e2ef12f536 [X86][FastIsel] Teach how to select vector load instructions.
This patch teaches fast-isel how to select 128-bit vector load instructions.
Added test CodeGen/X86/fast-isel-vecload.ll

Differential Revision: http://reviews.llvm.org/D8605

llvm-svn: 233270
2015-03-26 11:29:02 +00:00
Quentin Colombet
f4be6ae977 [RegisterCoalescer] Add a rule to consider more profitable copies first when
those are in the same basic block.
The previous approach was the topological order of the basic block.

By default this rule is disabled.

Related to PR22768.

llvm-svn: 233241
2015-03-26 01:01:48 +00:00
Simon Pilgrim
2751d2eb79 [DAGCombiner] Add support for TRUNCATE + FP_EXTEND vector constant folding
This patch adds supports for the vector constant folding of TRUNCATE and FP_EXTEND instructions and tidies up the SINT_TO_FP and UINT_TO_FP instructions to match.

It also moves the vector constant folding for the FNEG and FABS instructions to use the DAG.getNode() functionality like the other unary instructions.

Differential Revision: http://reviews.llvm.org/D8593

llvm-svn: 233224
2015-03-25 22:30:31 +00:00
Sanjay Patel
57d34fb183 [X86, AVX] improve insertion into zero element of 256-bit vector
This patch allows AVX blend instructions to handle insertion into the low
element of a 256-bit vector for the appropriate data types.

For f32, instead of:

   vblendps	$1, %xmm1, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[1,2,3]
   vblendps	$15, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7]

we get:

   vblendps	$1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3,4,5,6,7]

For f64, instead of:

   vmovsd	%xmm1, %xmm0, %xmm1     ## xmm1 = xmm1[0],xmm0[1]
   vblendpd	$3, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1],ymm0[2,3]

we get:

   vblendpd	$1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3]

For the hardware-neglected integer data types, I left a TODO comment in the
code and added regression tests for a follow-on patch.

Differential Revision: http://reviews.llvm.org/D8609

llvm-svn: 233199
2015-03-25 17:36:01 +00:00
Sanjay Patel
8d24c6e726 use update_llc_test_checks.py to tighten checking in these tests
1. There were no CHECK-LABELs, so we could match instructions from the wrong function.
2. The use of zero operands meant multiple xor instructions could match some CHECKs.
3. The test was over-specified to need a Sandybridge CPU and Darwin triple.

llvm-svn: 233198
2015-03-25 17:34:11 +00:00
Andrea Di Biagio
7910ff680c [X86] Simplify check lines in tests. No functional change.
Also, removed unused check lines from test atomic6432.ll.

llvm-svn: 233181
2015-03-25 11:44:19 +00:00
Paul Robinson
ef36d53059 'optnone' should not disable DAG combiner.
Reverts the code change from r221168 and the relevant test.
It was a mistake to disable the combiner, and based on the ultimate
definition of 'optnone' we shouldn't have considered the test case
as failing in the first place.

llvm-svn: 233153
2015-03-25 00:10:24 +00:00
Reid Kleckner
b3c593a951 X86: Fix frameescape when not using an FP
We can't use TargetFrameLowering::getFrameIndexOffset directly, because
Win64 really wants the offset from the stack pointer at the end of the
prologue. Instead, use X86FrameLowering::getFrameIndexOffsetFromSP(),
which is a pretty close approximiation of that. It fails to handle cases
with interestingly large stack alignments, which is pretty uncommon on
Win64 and is TODO.

llvm-svn: 233137
2015-03-24 23:46:01 +00:00
Sanjay Patel
b1b1054a09 [X86, AVX] recognize shufflevector with zero input as a vperm2 (PR22984)
vperm2x128 instructions have the special ability (aka free hardware capability)
to shuffle zero values into a vector.

This patch recognizes that type of shuffle and generates the appropriate
control byte.

https://llvm.org/bugs/show_bug.cgi?id=22984

Differential Revision: http://reviews.llvm.org/D8563

llvm-svn: 233100
2015-03-24 19:19:07 +00:00
Simon Pilgrim
67352336fb [SelectionDAG] Fixed issue with uitofp vector constant folding being treated as sitofp
While the uitofp scalar constant folding treats an integer as an unsigned value (from lang ref):

%X = sitofp i8 -1 to double ; yields double:-1.0
%Y = uitofp i8 -1 to double ; yields double:255.0

The vector constant folding was always using sitofp:

%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>

This patch fixes this so that the correct opcode is used for sitofp and uitofp.

%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double 255.0, double 255.0>

Differential Revision: http://reviews.llvm.org/D8560

llvm-svn: 233033
2015-03-23 22:44:55 +00:00
Duncan P. N. Exon Smith
e1ff992bb4 DebugInfo: Overload get() in DIDescriptor subclasses
Continue to simplify the `DIDescriptor` subclasses, so that they behave
more like raw pointers.  Remove `getRaw()`, replace it with an
overloaded `get()`, and overload the arrow and cast operators.  Two
testcases started to crash on the arrow operators with this change
because of `scope:` references that weren't real scopes.  I fixed them.
Soon I'll add verifier checks for them too.

This also adds explicit dereference operators.  Previously, the builtin
dereference against `operator MDNode *()` would have worked, but now the
builtins are ambiguous.

llvm-svn: 233030
2015-03-23 21:54:07 +00:00
Simon Pilgrim
c93710534f Tidied up vec_zero_cse.ll test. NFCI.
Added target triple and refactored the CHECKs to be per function.

llvm-svn: 232894
2015-03-21 14:05:12 +00:00
Eric Christopher
3d3373d3e2 Cache the Function dependent subtarget on the MachineFunction.
As preparation for removing the getSubtargetImpl() call from
TargetMachine go ahead and flip the switch on caching the function
dependent subtarget and remove the bare getSubtargetImpl call
from the X86 port. As part of this add a few tests that show we
can generate code and assemble on X86 based on features/cpu on
the Function.

llvm-svn: 232879
2015-03-21 03:13:10 +00:00
Sanjay Patel
34ad366455 [X86] Prefer blendps over insertps codegen for one special case
With this patch, for this one exact case, we'll generate:

  blendps %xmm0, %xmm1, $1

instead of:

  insertps %xmm0, %xmm1, $0

If there's a memory operand available for load folding and we're
optimizing for size, we'll still generate the insertps.

The detailed performance data motivation for this may be found in D7866; 
in summary, blendps has 2-3x throughput vs. insertps on widely used chips.

Differential Revision: http://reviews.llvm.org/D8332

llvm-svn: 232850
2015-03-20 21:19:52 +00:00
Daniel Jasper
86b4584e4b [MBP] Don't outline short optional branches
With the option -outline-optional-branches, LLVM will place optional
branches out of line (more details on r231230).

With this patch, this is not done for short optional branches. A short
optional branch is a branch containing a single block with an
instruction count below a certain threshold (defaulting to 3). Still
everything is guarded under -outline-optional-branches).

Outlining a short branch can't significantly improve code locality. It
can however decrease performance because of the additional jmp and in
cases where the optional branch is hot. This fixes a compile time
regression I have observed in a benchmark.

Review: http://reviews.llvm.org/D8108
llvm-svn: 232802
2015-03-20 10:00:37 +00:00
Sanjay Patel
7bfdf498b8 [X86, AVX] use blends instead of insert128 with index 0
Another case of x86-specific shuffle strength reduction:
avoid generating insert*128 instructions with index 0 because
they are slower than their non-lane-changing blend equivalents.

Shuffle lowering already catches most of these cases, but
the zero vector case and some other paths such as in the
modified test in vector-shuffle-256-v32.ll were getting
through.

Differential Revision: http://reviews.llvm.org/D8366

llvm-svn: 232773
2015-03-19 22:29:40 +00:00
Simon Pilgrim
3e49d9aa2b Fixed failing test due to missing target triple causing different results on different buildbots.
llvm-svn: 232685
2015-03-18 22:51:45 +00:00
Simon Pilgrim
6f98dca24d [X86][SSE] Avoid scalarization of v2i64 vector shifts (REAPPLIED)
Fixed broken tests.

Differential Revision: http://reviews.llvm.org/D8416

llvm-svn: 232682
2015-03-18 22:18:51 +00:00
Eric Christopher
60fdac43a1 Revert "[X86][SSE] Avoid scalarization of v2i64 vector shifts" as it
appears to have broken tests/bots.

This reverts commit r232660.

llvm-svn: 232670
2015-03-18 21:01:00 +00:00
Simon Pilgrim
97919c9f36 [X86][SSE] Avoid scalarization of v2i64 vector shifts
Currently v2i64 vectors shifts (non-equal shift amounts) are scalarized, costing 4 x extract, 2 x x86-shifts and 2 x insert instructions - and it gets even more awkward on 32-bit targets.

This patch separately shifts the vector by both shift amounts and then shuffles the partial results back together, costing 2 x shuffles and 2 x sse-shifts instructions (+ 2 movs on pre-AVX hardware).

Note - this patch only improves the SHL / LSHR logical shifts as only these are supported in SSE hardware.

Differential Revision: http://reviews.llvm.org/D8416

llvm-svn: 232660
2015-03-18 19:35:31 +00:00
Matthias Braun
77986c7d5d TableGen: Fix register class lane masks being too conservative.
When calculating the lanemask of a register class we have to include the
masks of subregisters supported by any of the class members, not just
the ones supported by all class members.

This fixes problems when coalescing towards a subclass with additional
subregisters available.

The attached testcase works fine as is, but does crash if you enable
subregister liveness on x86 without this change applied.

llvm-svn: 232652
2015-03-18 17:56:09 +00:00
Sanjay Patel
fa74d9a602 Use utils/update_llc_test_checks.py to update all CHECKs
The checks here were so vague that we could nuke intrinsics
from existence and still pass the test because we'd match
the function name.

llvm-svn: 232647
2015-03-18 16:38:44 +00:00
Sanjay Patel
e60e76fab6 fixed to test features, not CPU model
The 'vmovntdq' was only passing due to a fluke in
SandyBridge codegen that splits 32-byte stores in half, 
but that meant that the test was not correctly checking
for the 32-byte store that we thought we were generating.

The lax checking in this file will be addressed in
another commit. There are bigger problems here.

llvm-svn: 232644
2015-03-18 16:07:10 +00:00
Daniel Jasper
3b0ddfa292 Change test to accept an additional critical edge split.
The two hot blocks are right next to each other and I verified that
there is no performance regression by compressing/uncompressing some
files with a minigzip built with the different options.

llvm-svn: 232629
2015-03-18 12:45:45 +00:00
Josh Magee
9342392187 Add testcases for BEXTR.
These BEXTR cases are a check for the 64-bit load form and two negative cases where the bitrange is non-contiguous.  From a private patch equivalent to r189742/PR17028.

llvm-svn: 232580
2015-03-18 01:34:06 +00:00
David Majnemer
ec30fe4691 DAGCombiner: fold (xor (shl 1, x), -1) -> (rotl ~1, x)
Targets which provide a rotate make it possible to replace a sequence of
(XOR (SHL 1, x), -1) with (ROTL ~1, x).  This saves an instruction on
architectures like X86 and POWER(64).

Differential Revision: http://reviews.llvm.org/D8350

llvm-svn: 232572
2015-03-18 00:03:36 +00:00
David Majnemer
de51ea1b14 COFF: Let globals with private linkage reside in their own section
COFF COMDATs (for selection kinds other than 'select any') require at
least one non-section symbol in the symbol table.
Satisfy this by morally enhancing the linkage from private to internal.

Differential Revision: http://reviews.llvm.org/D8394

llvm-svn: 232570
2015-03-17 23:54:51 +00:00
David Majnemer
a16c93669c Revert "COFF: Let globals with private linkage reside in their own section"
This reverts commit r232539.  This was committed accidently.

llvm-svn: 232543
2015-03-17 20:41:11 +00:00
David Majnemer
9c4a0b633b COFF: Let globals with private linkage reside in their own section
Summary:
COFF COMDATs (for selection kinds other than 'select any') require at
least one non-section symbol in the symbol table.
Satisfy this by morally enhancing the linkage from private to internal.

Reviewers: rafael

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8374

llvm-svn: 232539
2015-03-17 20:39:25 +00:00
Rafael Espindola
7351149bdc Replace a use of GetTempSymbol with createTempSymbol.
This is cleaner and avoids a crash in a corner case.

llvm-svn: 232471
2015-03-17 12:54:04 +00:00
David Majnemer
dbb6db49e6 CodeGen: @llvm.eh.typeid.for replaced @llvm.eh.typeid.for.i32
We removed @llvm.eh.typeid.for.i32 and replaced it with
@llvm.eh.typeid.for quite some time ago.  Fix up some test cases which
never got updated.

llvm-svn: 232421
2015-03-16 21:36:38 +00:00
Duncan P. N. Exon Smith
9b4abdcdd5 DebugInfo: Fix testcases that fail -verify-debug-info=true
As part of PR22777, fix testcases that fail the debug info verifier.
The changes fall into the following categories:

  - Empty `filename:` fields in `MDFile`s.  Compile units and some types
    require non-empty filenames.  A number of testcases have empty
    filenames, probably due to hand-reduction of testcases.
  - Not-quite empty arrays: `!{i32 0}`.  This used to be equivalent in
    the debug info schema to `!{}`.  They cause problems for
    `!MDSubroutineType`'s `types:` array, since it requires all operands
    to be valid types.  (Note that `!{null}` is the correct type array
    for functions that take no arguments and return `void`.)
  - Significantly bitrotted testcases.  Nodes got left behind a few
    upgrades ago because of missing or invalid tags.

llvm-svn: 232415
2015-03-16 21:10:12 +00:00
Sanjay Patel
418bf65e9d fixed to test feature, not CPU
llvm-svn: 232398
2015-03-16 18:24:28 +00:00
Sanjay Patel
356f802585 add CHECK-LABELs for more reliable testing
llvm-svn: 232391
2015-03-16 17:59:07 +00:00
Sanjay Patel
e787fac623 fixed to test feature, not CPU; removed unnecessary declaration
llvm-svn: 232387
2015-03-16 17:01:34 +00:00
Rafael Espindola
f61c5f7d3a Use the i8 immediate cmp instructions when possible.
llvm-svn: 232378
2015-03-16 14:25:08 +00:00
Simon Pilgrim
cd898479c0 [SSE} Added tests for float4-float3 conversions (PR11580)
llvm-svn: 232324
2015-03-15 16:19:15 +00:00
Simon Pilgrim
dee198643e Simplified some stack folding tests.
Replaced explicit pmovzx* intrinsic tests with general shuffles

llvm-svn: 232286
2015-03-14 23:16:43 +00:00
Daniel Jasper
a27feb2b3d [MachineLICM] First steps of sinking GEPs near calls.
Specifically, if there are copy-like instructions in the loop header
they are moved into the loop close to their uses. This reduces the live
intervals of the values and can avoid register spills.

This is working towards a fix for http://llvm.org/PR22230.
Review: http://reviews.llvm.org/D7259

Next steps:
- Find a better cost model (which non-copy instructions should be sunk?)
- Make this dependent on register pressure

llvm-svn: 232262
2015-03-14 10:58:38 +00:00
Ahmed Bougacha
b8ea46fe37 Add a bunch of CHECK missing colons in tests. NFC.
Some wouldn't pass;  fixed most, the rest will be fixed separately.

llvm-svn: 232239
2015-03-14 01:43:57 +00:00
Rafael Espindola
97eb5e9037 Use add32ri8 and friends on fast isel.
This fixes pr22854.

The core issue on the bug is that there are multiple instructions that
print the same in assembly. In fact, there doesn't seem to be any
syntax for specifying that a constant that fits in 8 bits should use a 32 bit
immediate.

The attached patch changes fast isel to consider i16immSExt8,
i32immSExt8, and i64immSExt8. They were disabled because fastisel didn’t know
to call the predicate back in the day.

llvm-svn: 232223
2015-03-13 22:18:18 +00:00
David Blaikie
3ea2df7c7b [opaque pointer type] Add textual IR support for explicit type parameter to gep operator
Similar to gep (r230786) and load (r230794) changes.

Similar migration script can be used to update test cases, which
successfully migrated all of LLVM and Polly, but about 4 test cases
needed manually changes in Clang.

(this script will read the contents of stdin and massage it into stdout
- wrap it in the 'apply.sh' script shown in previous commits + xargs to
apply it over a large set of test cases)

import fileinput
import sys
import re

rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)

def conv(match):
  line = match.group(1)
  line += match.group(4)
  line += ", "
  line += match.group(2)
  return line

line = sys.stdin.read()
off = 0
for match in re.finditer(rep, line):
  sys.stdout.write(line[off:match.start()])
  sys.stdout.write(conv(match))
  off = match.end()
sys.stdout.write(line[off:])

llvm-svn: 232184
2015-03-13 18:20:45 +00:00
Andrea Di Biagio
f18bec8053 [X86][AVX] Fix wrong lowering of v4x64 shuffles into concat_vector plus extract_subvector nodes.
This patch fixes a bug in the shuffle lowering logic implemented by function
'lowerV2X128VectorShuffle'.

The are few cases where function 'lowerV2X128VectorShuffle' wrongly expands a
shuffle of two v4X64 vectors into a CONCAT_VECTORS of two EXTRACT_SUBVECTOR
nodes. The problematic expansion only occurs when the shuffle mask M has an
'undef' element at position 2, and M is equivalent to mask <0,1,4,5>.
In that case, the algorithm propagates the wrong vector to one of the two
new EXTRACT_SUBVECTOR nodes.

Example:
;;
define <4 x double> @test(<4 x double> %A, <4 x double> %B) {
entry:
  %0 = shufflevector <4 x double> %A, <4 x double> %B, <4 x i32><i32 undef, i32 1, i32 undef, i32 5>
  ret <4 x double> %0
}
;;

Before this patch, llc (-mattr=+avx) generated:
  vinsertf128 $1, %xmm0, %ymm0, %ymm0

With this patch, llc correctly generates:
  vinsertf128 $1, %xmm1, %ymm0, %ymm0

Added test lower-vec-shuffle-bug.ll

Differential Revision: http://reviews.llvm.org/D8259

llvm-svn: 232179
2015-03-13 17:29:49 +00:00
Sanjay Patel
13a9b5db63 [X86, AVX2] Replace inserti128 and extracti128 intrinsics with generic shuffles
This should complete the job started in r231794 and continued in r232045:
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.

AVX2 introduced proper integer variants of the hacked integer insert/extract
C intrinsics that were created for this same functionality with AVX1.

This should complete the removal of insert/extract128 intrinsics.

The Clang precursor patch for this change was checked in at r232109.

llvm-svn: 232120
2015-03-12 23:16:18 +00:00
Simon Pilgrim
22ed1c063a Removed useless palignr test - we don't actually provide a llvm.x86.ssse3.palign.r.128 intrinsic
Differential Revision: http://reviews.llvm.org/D8302

llvm-svn: 232108
2015-03-12 21:42:03 +00:00
Quentin Colombet
77e9397bd4 [X86] Fix a regression introduced by r223641.
The permps and permd instructions have their operands swapped compared to the
intrinsic definition. Therefore, they do not fall into the INTR_TYPE_2OP
category.

I did not create a new category for those two, as they are the only one AFAICT
in that case.

<rdar://problem/20108262>

llvm-svn: 232085
2015-03-12 19:34:12 +00:00
Andrea Di Biagio
b58c185f5a [X86] Fix wrong target specific combine on SETCC nodes.
Part of the folding logic implemented by function 'PerformISDSETCCCombine'
only worked under the assumption that the condition code in input could have
been either SETNE or SETEQ.
Unfortunately that assumption was incorrect, and in some cases the algorithm
ended up incorrectly folding SETCC nodes.

The incorrect folding only affected SETCC dag nodes where:
 - one of the operands was a build_vector of all zeroes;
 - the other operand was a SIGN_EXTEND from a vector of MVT:i1 elements;
 - the condition code was neither SETNE nor SETEQ.

Example:
  (setcc (v4i32 (sign_extend v4i1:%A)), (v4i32 VectorOfAllZeroes), setge)

Before this patch, the entire dag node sequence from the example was
incorrectly folded to node %A.

With this patch, the dag node sequence is folded to a
  (xor %A, (v4i1 VectorOfAllOnes)).

Added test setcc-combine.ll.

Thanks to Greg Bedwell for spotting this issue.

llvm-svn: 232046
2015-03-12 15:16:58 +00:00
Sanjay Patel
aa9ea9aae7 [X86, AVX] replace vextractf128 intrinsics with generic shuffles
Now that we've replaced the vinsertf128 intrinsics, 
do the same for their extract twins.

This is very much like D8086 (checked in at r231794):
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.

This is also the LLVM sibling to the cfe D8275 patch.

Differential Revision: http://reviews.llvm.org/D8276

llvm-svn: 232045
2015-03-12 15:15:19 +00:00
Simon Pilgrim
69f357aff0 [X86][AVX2] Added missing palignr stack folding test
llvm-svn: 232033
2015-03-12 13:12:33 +00:00
Reid Kleckner
997aab28ac Stop calling DwarfEHPrepare from WinEHPrepare
Instead, run both EH preparation passes, and have them both ignore
functions with unrecognized EH personalities. Pass delegation involved
some hacky code for creating an AnalysisResolver that we don't need now.

llvm-svn: 231995
2015-03-12 00:36:20 +00:00
Reid Kleckner
789a64e429 Handle big index in getelementptr instruction
CodeGen incorrectly ignores (assert from APInt) constant index bigger
than 2^64 in getelementptr instruction. This is a test and fix for that.

Patch by Paweł Bylica!

Reviewed By: rnk

Subscribers: majnemer, rnk, mcrosier, resistor, llvm-commits

Differential Revision: http://reviews.llvm.org/D8219

llvm-svn: 231984
2015-03-11 23:36:10 +00:00
Sanjay Patel
b2c96e6a26 add CHECK-LABELs for better reliability
llvm-svn: 231962
2015-03-11 20:12:07 +00:00
Rafael Espindola
268252825b Put jump tables in unique sections on COFF.
If a function is going in an unique section (because of -ffunction-sections
for example), putting a jump table in .rodata will keep .rodata alive and
that will keep alive any other function that also has a jump table.

Instead, put the jump table in a unique section that is associated with the
function.

llvm-svn: 231961
2015-03-11 19:58:37 +00:00
Derek Schuff
5d3b63bf2e Make NaCl's use of .init_array for static constructors match Linux
Summary:
The generic ELF TargetObjectFile defaults to .ctors, but Linux's
defaults to .init_array by calling InitializeELF with the value of
UseInitArray from TargetMachine. Make NaCl's behavior match.

Reviewers: jvoung
Differential Revision: http://reviews.llvm.org/D8240

llvm-svn: 231934
2015-03-11 16:16:09 +00:00
Quentin Colombet
b7df98faee [CodeGenPrepare] Refine the cost model provided by the promotion helper.
- Use TargetLowering to check for the actual cost of each extension.
- Provide a factorized method to check for the cost of an extension:
  TargetLowering::isExtFree.
- Provide a virtual method TargetLowering::isExtFreeImpl for targets to be able
  to tune the cost of non-free extensions.

This refactoring offers a better granularity to model what really happens on
different targets.

No performance changes and very few code differences.

Part of <rdar://problem/19267165> 

llvm-svn: 231855
2015-03-10 21:48:15 +00:00
Igor Laevsky
ec23b1a840 Teach lowering to correctly handle invoke statepoint and gc results tied to them. Note that we still can not lower gc.relocates for invoke statepoints.
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.
(Resubmitting this change after not being able to reproduce buildbot failure)

Differential Revision: http://reviews.llvm.org/D7760

llvm-svn: 231800
2015-03-10 16:26:48 +00:00
Sanjay Patel
5c62e16cdb [X86, AVX] replace vinsertf128 intrinsics with generic shuffles
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.

This is the sibling patch for the Clang half of this change:
http://reviews.llvm.org/D8088

Differential Revision: http://reviews.llvm.org/D8086

llvm-svn: 231794
2015-03-10 16:08:36 +00:00
Ahmed Bougacha
c37ed67e4a [CodeGen] Replace the reused stores' chain for extractelt expansion.
This fixes a subtle issue that was introduced in r205153.

When reusing a store for the extractelement expansion (to load directly
from it, inserting of going through the stack), later stores to the
same location might have overwritten the data we were expecting to
extract from.

To fix that, we need to explicitly replace the chain going out of the
reused store, so that later stores also have an explicit dependency on
the generated element-extracting loads, and can't clobber them.

rdar://20066785
Differential Revision: http://reviews.llvm.org/D8180

llvm-svn: 231721
2015-03-09 22:51:05 +00:00
Ahmed Bougacha
03e5ee9dd0 [X86] Add nounwind to vector-idiv.ll testcases. NFC.
In preparation for a patch where cfi directives get in the way.

llvm-svn: 231720
2015-03-09 22:46:02 +00:00
Reid Kleckner
dd109017af Reland r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation
Fix the double-deletion of AnalysisResolver when delegating through to
Dwarf EH preparation by creating one from scratch. Hopefully the new
pass manager simplifies this.

This reverts commit r229952.

llvm-svn: 231719
2015-03-09 22:45:16 +00:00
Rafael Espindola
90d175f38e Print jump tables before exception tables.
In the case where just tables are part of the function section, this produces
more readable assembly by avoiding switching to the eh section and back
to .text.

This would also break with non unique section names, as trying to switch to
a unique section actually creates a new one.

llvm-svn: 231677
2015-03-09 18:29:12 +00:00
Andrea Di Biagio
a0999d27ed Fix line ending in test CodeGen/X86/pr22774.ll. NFC.
Also, replaced line with 'target triple' with flag -mtriple on the RUN line.
Removed the data layout string as it is not needed.

llvm-svn: 231654
2015-03-09 15:02:01 +00:00
Andrea Di Biagio
171a02e1ca [X86][AVX] Fix wrong lowering of VPERM2X128 nodes
There were cases where the backend computed a wrong permute mask for a VPERM2X128 node.

Example:
\code
define <8 x float> @foo(<8 x float> %a, <8 x float> %b) {
  %shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 undef, i32 6, i32 7>
  ret <8 x float> %shuffle
}
\code end

Before this patch, llc (with -mattr=+avx) emitted the following vperm2f128:
  vperm2f128 $0, %ymm0, %ymm0, %ymm0  # ymm0 = ymm0[0,1,0,1]

With this patch, llc emits a vperm2f128 with a correct permute mask:
  vperm2f128 $17, %ymm0, %ymm0, %ymm0  # ymm0 = ymm0[2,3,2,3]

Differential Revision: http://reviews.llvm.org/D8119

llvm-svn: 231601
2015-03-08 16:28:47 +00:00
Andrea Di Biagio
7cad5c0bec [DAGCombiner] Fix wrong folding of AND dag nodes.
This patch fixes the logic in the DAGCombiner that folds an AND node according
to rule: (and (X (load V)), C) -> (X (load V))

An AND between a vector load 'X' and a constant build_vector 'C' can be folded
into the load itself only if we can prove that the AND operation is redundant.
The algorithm implemented by 'visitAND' firstly computes the splat value 'S'
from C, and then checks if S has the lower 'B' bits set (where B is the size in
bits of the vector element type). The algorithm takes into account also the
'undef' bits in the splat mask.

Unfortunately, the algorithm only worked under the assumption that the size of S
is a multiple of the vector element type. With this patch, we conservatively
avoid folding the AND if the splat bits are not compatible with the vector
element type.

Added X86 test and-load-fold.ll

Differential Revision: http://reviews.llvm.org/D8085

llvm-svn: 231563
2015-03-07 12:24:55 +00:00
Simon Pilgrim
ce7343d964 [DAGCombiner] SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(V,C)) -> VECTOR_SHUFFLE
This patch attempts to convert a SCALAR_TO_VECTOR using an operand from an EXTRACT_VECTOR_ELT into a VECTOR_SHUFFLE.

This prevents many cases of spilling scalar data between the gpr + simd registers. 

At present the optimization only accepts cases where there is no TRUNC of the scalar type (i.e. all types must match).

Differential Revision: http://reviews.llvm.org/D8132

llvm-svn: 231554
2015-03-07 05:52:42 +00:00
Sanjay Patel
1c3a2b6e94 fixed to test features, not CPUs
llvm-svn: 231524
2015-03-06 21:50:42 +00:00
Sanjay Patel
ea19afe756 fixed to test features, not CPUs
llvm-svn: 231523
2015-03-06 21:50:27 +00:00
Sanjay Patel
e784ddba37 loosen checking for buildbots
llvm-svn: 231522
2015-03-06 21:30:18 +00:00
Sanjay Patel
3e7adabcbe fixed to test only the feature, not the feature and a CPU
llvm-svn: 231521
2015-03-06 21:24:56 +00:00
Sanjay Patel
067fca92e1 fixed to test only the feature, not the feature and a CPU
llvm-svn: 231520
2015-03-06 21:19:32 +00:00
Sanjay Patel
fa0ccb5b9e fixed test to use FileCheck
llvm-svn: 231519
2015-03-06 21:16:15 +00:00
Sanjay Patel
c2d60fe833 fixed to use CHECK-LABELs
llvm-svn: 231517
2015-03-06 21:05:02 +00:00
Sanjay Patel
0627c13e13 fixed to test only the feature, not the feature and a CPU
llvm-svn: 231516
2015-03-06 20:58:15 +00:00
Sanjay Patel
d2d58cee6c fixed to test only the feature, not the feature and a CPU
llvm-svn: 231515
2015-03-06 20:57:40 +00:00
Sanjay Patel
b9a5d78c41 fixed to test feature, not CPU
llvm-svn: 231513
2015-03-06 20:51:25 +00:00
Sanjay Patel
af6ff9d393 fixed to test features, not CPUs
llvm-svn: 231512
2015-03-06 20:46:16 +00:00
Sanjay Patel
16546869a1 fixed test to use SSE2 attribute
llvm-svn: 231510
2015-03-06 20:38:55 +00:00
Sanjay Patel
061498e6b3 fixed to test only the feature, not the feature and a CPU
llvm-svn: 231509
2015-03-06 20:34:20 +00:00
Matthias Braun
47028079f1 DAGCombiner: Canonicalize select(and/or,x,y) depending on target.
This is based on the following equivalences:
select(C0 & C1, X, Y) <=> select(C0, select(C1, X, Y), Y)
select(C0 | C1, X, Y) <=> select(C0, X, select(C1, X, Y))

Many target cannot perform and/or on the CPU flags and therefore the
right side should be choosen to avoid materializign the i1 flags in an
integer register. If the target can perform this operation efficiently
we normalize to the left form.

Differential Revision: http://reviews.llvm.org/D7622

llvm-svn: 231507
2015-03-06 19:49:10 +00:00
Michael Zolotukhin
b7a87300a2 LegalizeTypes: Handle shift by 0 in ExpandShiftByConstant.
Though such shifts are usually optimized away by combiner, we still can
encounter them after a vector shift is legalized.

llvm-svn: 231443
2015-03-06 01:13:01 +00:00
Sanjay Patel
e7dd0e711b [AVX] Lower / fast-isel scalar FP selects into VBLENDV instructions (PR22483)
This patch reduces code size for all AVX targets and increases speed for some chips.

SSE 4.1 introduced the useless (see code comments) 2-register form of BLENDV and
only in the packed float/double flavors.

AVX subsequently made the instruction useful by adding a 4-register operand form.

So we just need to paper over the lack of scalar forms of this instruction, complicate
the code to choose float or double forms, and use blendv on scalars since all FP is in
xmm registers anyway.

This gives us an approximately 50% speed up for a blendv microbenchmark sequence
on SandyBridge and Haswell:
blendv : 29.73 cycles/iter
logic : 43.15 cycles/iter

No new test cases with this patch because:

1. fast-isel-select-sse.ll tests the positive side for regular X86 lowering and fast-isel
2. sse-minmax.ll and fp-select-cmp-and.ll confirm that we're not firing for scalar selects without AVX
3. fp-select-cmp-and.ll and logical-load-fold.ll confirm that we're not firing for scalar selects with constants.

http://llvm.org/bugs/show_bug.cgi?id=22483

Differential Revision: http://reviews.llvm.org/D8063

llvm-svn: 231408
2015-03-05 21:46:54 +00:00
David Majnemer
bdce8557da X86: Optimize address mode matching for FRAME_ALLOC_RECOVER nodes
We know that the absolute symbol will be less than 2GB and thus will
always fit.

llvm-svn: 231389
2015-03-05 18:50:12 +00:00
Reid Kleckner
d0e0d012a0 Replace llvm.frameallocate with llvm.frameescape
Turns out it's pretty straightforward and simplifies the implementation.

Reviewers: andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D8051

llvm-svn: 231386
2015-03-05 18:26:34 +00:00
Simon Pilgrim
2d456bc887 [DagCombiner] Allow shuffles to merge through bitcasts
Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening).

This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead.

Dropped old ShuffleToZext test - this patch removes the use of the zext and vector-zext.ll covers these anyhow.

Differential Revision: http://reviews.llvm.org/D7939

llvm-svn: 231380
2015-03-05 17:14:04 +00:00
Igor Laevsky
26ad396d2f Revert change r231366 as it broke clang-native-arm-cortex-a9 Analysis/properties.m test.
llvm-svn: 231374
2015-03-05 15:41:14 +00:00
Elena Demikhovsky
a71b2e475e AVX-512, SKX: Enabled masked_load/store operations for this target.
Added lowering for ISD::CONCAT_VECTORS and ISD::INSERT_SUBVECTOR for i1 vectors,
it is needed to pass all masked_memop.ll tests for SKX.

llvm-svn: 231371
2015-03-05 15:11:35 +00:00
Igor Laevsky
eb9f695bc2 Teach lowering to correctly handle invoke statepoint and gc results tied to them. Note that we still can not lower gc.relocates for invoke statepoints.
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.

llvm-svn: 231366
2015-03-05 14:11:21 +00:00
Craig Topper
558157f7a7 [X86] Use vmovss to handle inserting an element into index 0 of a v8f32 vector of zeros.
llvm-svn: 231354
2015-03-05 06:38:42 +00:00
Chandler Carruth
9b555d5159 [MBP] Revert r231238 which attempted to fix a nasty bug where MBP is
just arbitrarily interleaving unrelated control flows once they get
moved "out-of-line" (both outside of natural CFG ordering and with
diamonds that cannot be fully laid out by chaining fallthrough edges).

This easy solution doesn't work in practice, and it isn't just a small
bug. It looks like a very different strategy will be required. I'm
working on that now, and it'll again go behind some flag so that
everyone can experiment and make sure it is working well for them.

llvm-svn: 231332
2015-03-05 01:07:03 +00:00
Andrea Di Biagio
c64113491c [X86][FastISel] Simplify the logic in method X86SelectSIToFP.
The target-independent selection algorithm in FastISel already knows how
to select a SINT_TO_FP if the target is SSE but not AVX.

On targets that have SSE but not AVX, the tablegen'd 'fastEmit' functions
for ISD::SINT_TO_FP know how to select instruction X86::CVTSI2SSrr
(for an i32 to f32 conversion) and X86::CVTSI2SDrr (for an i32 to f64
conversion).

This patch simplifies the logic in method X86SelectSIToFP knowing that
the code would not be reachable if the subtarget doesn't have AVX.
No functional change intended.

llvm-svn: 231243
2015-03-04 14:23:25 +00:00
Chandler Carruth
0fe0191603 [MBP] Fix a really horrible bug in MachineBlockPlacement, but behind
a flag for now.

First off, thanks to Daniel Jasper for really pointing out the issue
here. It's been here forever (at least, I think it was there when
I first wrote this code) without getting really noticed or fixed.

The key problem is what happens when two reasonably common patterns
happen at the same time: we outline multiple cold regions of code, and
those regions in turn have diamonds or other CFGs for which we can't
just topologically lay them out. Consider some C code that looks like:

  if (a1()) { if (b1()) c1(); else d1(); f1(); }
  if (a2()) { if (b2()) c2(); else d2(); f2(); }
  done();

Now consider the case where a1() and a2() are unlikely to be true. In
that case, we might lay out the first part of the function like:

  a1, a2, done;

And then we will be out of successors in which to build the chain. We go
to find the best block to continue the chain with, which is perfectly
reasonable here, and find "b1" let's say. Laying out successors gets us
to:

  a1, a2, done; b1, c1;

At this point, we will refuse to lay out the successor to c1 (f1)
because there are still un-placed predecessors of f1 and we want to try
to preserve the CFG structure. So we go get the next best block, d1.

... wait for it ...

Except that the next best block *isn't* d1. It is b2! d1 is waaay down
inside these conditionals. It is much less important than b2. Except
that this is exactly what we didn't want. If we keep going we get the
entire set of the rest of the CFG *interleaved*!!!

  a1, a2, done; b1, c1; b2, c2; d1, f1; d2, f2;

So we clearly need a better strategy here. =] My current favorite
strategy is to actually try to place the block whose predecessor is
closest. This very simply ensures that we unwind these kinds of CFGs the
way that is natural and fitting, and should minimize the number of cache
lines instructions are spread across.

It also happens to be *dead simple*. It's like the datastructure was
specifically set up for this use case or something. We only push blocks
onto the work list when the last predecessor for them is placed into the
chain. So the back of the worklist *is* the nearest next block.

Unfortunately, a change like this is going to cause *soooo* many
benchmarks to swing wildly. So for now I'm adding this under a flag so
that we and others can validate that this is fixing the problems
described, that it seems possible to enable, and hopefully that it fixes
more of our problems long term.

llvm-svn: 231238
2015-03-04 12:18:08 +00:00
Daniel Jasper
3a2dd1b264 Add a flag to experiment with outlining optional branches.
In a CFG with the edges A->B->C and A->C, B is an optional branch.

LLVM's default behavior is to lay the blocks out naturally, i.e. A, B,
C, in order to improve code locality and fallthroughs. However, if a
function contains many of those optional branches only a few of which
are taken, this leads to a lot of unnecessary icache misses. Moving B
out of line can work around this.

Review: http://reviews.llvm.org/D7719
llvm-svn: 231230
2015-03-04 11:05:34 +00:00
Michael Kuperstein
c46a58c60e [DAGCombine] Fix a bug in a BUILD_VECTOR combine
When trying to convert a BUILD_VECTOR into a shuffle, we try to split a single source vector that is twice as wide as the destination vector. 
We can not do this when we also need the zero vector to create a blend.
This fixes PR22774.

Differential Revision: http://reviews.llvm.org/D8040

llvm-svn: 231219
2015-03-04 07:27:39 +00:00
Filipe Cabecinhas
1f8bf333db Fix the test for r231201. We don't crash anymore.
llvm-svn: 231207
2015-03-04 02:09:40 +00:00
Rafael Espindola
eb2f517737 Use the vanilla func_end symbol for .size.
No need to create yet another temp symbol.

llvm-svn: 231198
2015-03-04 01:35:23 +00:00
Eric Christopher
3c27e28cc5 Weaken the check for a specific movl on the twoaddr-coalesce-3
test - we only care that there are two moves in the loop and not
which part is relative to which register anyhow.

llvm-svn: 231191
2015-03-04 01:19:17 +00:00
Filipe Cabecinhas
566e4876d5 Fix the x86-upgrade-avx2-vbroadcast.ll test by commenting the CHECK lines
llvm-svn: 231187
2015-03-04 00:49:12 +00:00
Rafael Espindola
64eb6d620c Drop the "eh_" from eh_func_begin and eh_func_end.
They will be used for more than eh tables.

llvm-svn: 231185
2015-03-04 00:27:43 +00:00
Juergen Ributzka
a0c556be5c Remove 'llvm.x86.avx2.vbroadcasti128' intrinsic.
The intrinsic is no longer generated by the front-end. Remove the intrinsic and
auto-upgrade it to a vector shuffle.

Reviewed by Nadav

This is related to rdar://problem/18742778.

llvm-svn: 231182
2015-03-04 00:13:25 +00:00
Eric Christopher
a02ce78817 Update twoaddr-coalesce-3.ll to run on darwin and linux machines:
a) Default relocation model differences,
b) Different numbers of # in comments

llvm-svn: 231178
2015-03-03 23:56:20 +00:00
Andrew Kaylor
f62d4b660d Moving WinEH outlining tests to an architecture neutral location
llvm-svn: 231155
2015-03-03 22:33:39 +00:00
Eric Christopher
5d29fd1be5 Fix a problem where the TwoAddressInstructionPass which generate redundant register moves in a loop.
From:
int M, total;
void foo() {
int i;
for (i = 0; i < M; i++) {
  total = total + i / 2;
}
}

This is the kernel loop:

.LBB0_2: # %for.body

=>This Inner Loop Header: Depth=1
movl %edx, %esi
movl %ecx, %edx
shrl $31, %edx
addl %ecx, %edx
sarl %edx
addl %esi, %edx
incl %ecx
cmpl %eax, %ecx
jl .LBB0_2
--------------------------
The first mov insn "movl %edx, %esi" could be removed if we change "addl %esi, %edx" to "addl %edx, %esi".

The IR before TwoAddressInstructionPass is:
BB#2: derived from LLVM BB %for.body

Predecessors according to CFG: BB#1 BB#2
    %vreg3<def> = COPY %vreg12<kill>; GR32:%vreg3,%vreg12
    %vreg2<def> = COPY %vreg11<kill>; GR32:%vreg2,%vreg11
    %vreg7<def,tied1> = SHR32ri %vreg3<tied0>, 31, %EFLAGS<imp-def,dead>; GR32:%vreg7,%vreg3
    %vreg8<def,tied1> = ADD32rr %vreg3<tied0>, %vreg7<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg8,%vreg3,%vreg7
    %vreg9<def,tied1> = SAR32r1 %vreg8<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg9,%vreg8
    %vreg4<def,tied1> = ADD32rr %vreg9<kill,tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg4,%vreg9,%vreg2
    %vreg5<def,tied1> = INC64_32r %vreg3<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg5,%vreg3
    CMP32rr %vreg5, %vreg0, %EFLAGS<imp-def>; GR32:%vreg5,%vreg0
    %vreg11<def> = COPY %vreg4; GR32:%vreg11,%vreg4
    %vreg12<def> = COPY %vreg5<kill>; GR32:%vreg12,%vreg5
    JL_4 <BB#2>, %EFLAGS<imp-use,kill>
Now TwoAddressInstructionPass will choose vreg9 to be tied with vreg4. However, it doesn't see that there is copy from vreg4 to vreg11 and another copy from vreg11 to vreg2 inside the loop body. To remove those copies, it is necessary to choose vreg2 to be tied with vreg4 instead of vreg9. This code pattern commonly appears when there is reduction operation in a loop.

So check for a reversed copy chain and if we encounter one then we can commute the add instruction so we can avoid a copy.

Patch by Wei Mi.
http://reviews.llvm.org/D7806

llvm-svn: 231148
2015-03-03 22:03:03 +00:00
Andrew Kaylor
6232795e40 Outline cleanup handlers for native Windows C++ exception handling
Differential Revision: http://reviews.llvm.org/D7865

llvm-svn: 231117
2015-03-03 20:00:16 +00:00
Reid Kleckner
34a636c32a Make llvm.eh.begincatch use an outparam
Ultimately, __CxxFrameHandler3 needs us to put a stack offset in a
table, and it will take responsibility for copying the exception object
into that slot. Modelling the exception object as an SSA value returned
by begincatch isn't going to work in general, so make it use an output
parameter.

Reviewers: andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D7920

llvm-svn: 231086
2015-03-03 17:41:09 +00:00
Duncan P. N. Exon Smith
8d1b74869c DebugInfo: Move new hierarchy into place
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464.  I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned.  Let me know if I'm wrong :).

The code changes are fairly mechanical:

  - Bumped the "Debug Info Version".
  - `DIBuilder` now creates the appropriate subclass of `MDNode`.
  - Subclasses of DIDescriptor now expect to hold their "MD"
    counterparts (e.g., `DIBasicType` expects `MDBasicType`).
  - Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
    for printing comments.
  - Big update to LangRef to describe the nodes in the new hierarchy.
    Feel free to make it better.

Testcase changes are enormous.  There's an accompanying clang commit on
its way.

If you have out-of-tree debug info testcases, I just broke your build.

  - `upgrade-specialized-nodes.sh` is attached to PR22564.  I used it to
    update all the IR testcases.
  - Unfortunately I failed to find way to script the updates to CHECK
    lines, so I updated all of these by hand.  This was fairly painful,
    since the old CHECKs are difficult to reason about.  That's one of
    the benefits of the new hierarchy.

This work isn't quite finished, BTW.  The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro).  Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers.  I
also expect to make a few schema changes now that it's easier to reason
about everything.

llvm-svn: 231082
2015-03-03 17:24:31 +00:00
Daniel Jasper
5816fc16b8 During PHI elimination, split critical edges that move copies out of loops.
This prevents the behavior observed in llvm.org/PR22369. I am not sure
whether I am reading the code correctly, but the early exit based on
isLiveOutPastPHIs() seems to make the wrong assumption that
RegisterCoalescer won't be able to coalesce those copies later.

This change hides the new behavior behind -no-phi-elim-live-out-early-exit
as it currently breaks four tests:
 * Assertion in:
     CodeGen/Hexagon/hwloop-cleanup.ll
 * Worse code in:
     CodeGen/X86/coalescer-commute4.ll
     CodeGen/X86/phys_subreg_coalesce-2.ll
     CodeGen/X86/zlib-longest-match.ll
   The root cause here seems to be that the heuristic that determines
   the visitation order in RegisterCoalescer gets less lucky.

llvm-svn: 231064
2015-03-03 10:23:11 +00:00
Ahmed Bougacha
d8b9ab0f6c [X86] Special-case 2x CMOV when custom-inserting.
This lets us avoid a few copies that are otherwise hard to get rid of.
The way this is done is, the custom-inserter looks at the following
instruction for another CMOV, and replaces both at the same time.
A previous version used a new CMOV2 opcode, but the custom inserter
is expected to be able to return a different basic block anyway, which
means it's OK - though far from ideal - to alter that block's contents.
Explicitly document that, in case it ever makes a difference.
Alternatives welcome!

Follow-up to r231045.

rdar://19767934
Closes http://reviews.llvm.org/D8019

llvm-svn: 231046
2015-03-03 01:21:16 +00:00
Ahmed Bougacha
10768a3027 [X86] Combine (cmov (and/or (setcc) (setcc))) into (cmov (cmov)).
Fold and/or of setcc's to double CMOV:

(CMOV F, T, ((cc1 | cc2) != 0)) -> (CMOV (CMOV F, T, cc1), T, cc2)
(CMOV F, T, ((cc1 & cc2) != 0)) -> (CMOV (CMOV T, F, !cc1), F, !cc2)

When we can't use the CMOV instruction, it might increase branch
mispredicts.  When we can, or when there is no mispredict, this
improves throughput and reduces register pressure.

These can't be catched by generic combines, because the pattern can
appear when legalizing some instructions (such as fcmp une).

rdar://19767934
http://reviews.llvm.org/D7634

llvm-svn: 231045
2015-03-03 01:09:14 +00:00
Reid Kleckner
a9f6fbee65 Fix cppeh breakage due to racing commits
llvm-svn: 231044
2015-03-03 01:04:39 +00:00
Andrew Kaylor
8cfaab59d9 Remap arguments and non-alloca values used by outlined C++ exception handlers.
Differential Revision: http://reviews.llvm.org/D7844

llvm-svn: 231042
2015-03-03 00:41:03 +00:00
Reid Kleckner
8e7faa02a8 WinEH: Run opt -instnamer over some cppeh tests and update CHECKs
In the future, we should run the output of clang through instnamer to
make it easier to manually edit test cases.

No functionality change.

llvm-svn: 231037
2015-03-03 00:05:35 +00:00
Elena Demikhovsky
e032aa37b6 AVX-512: Added mask and rounding mode for scalar arithmetics
Added more tests for scalar instructions to destinguish between AVX and AVX-512 forms.

llvm-svn: 230891
2015-03-01 07:44:04 +00:00
Sanjay Patel
46d370c549 avoid infinite looping when folding vector multiplies of constants (PR22698)
We were missing a check for the following fold in DAGCombiner:

// fold (fmul (fmul x, c1), c2) -> (fmul x, (fmul c1, c2))

If 'x' is also a constant, then we shouldn't do anything. Otherwise, we could end up swapping the operands back and forth forever.

This should fix:
http://llvm.org/bugs/show_bug.cgi?id=22698

Differential Revision: http://reviews.llvm.org/D7917

llvm-svn: 230884
2015-03-01 00:09:35 +00:00
Sanjay Patel
0f499d9c03 fixed to test only the feature, not the feature and a CPU
llvm-svn: 230883
2015-03-01 00:02:03 +00:00
Sanjay Patel
cd857fab16 make the tested feature (SSE2) explicit
llvm-svn: 230881
2015-02-28 23:55:24 +00:00
Duncan P. N. Exon Smith
d6d88d0f42 DebugInfo: Fix invalid file reference in CodeGen/X86/unknown-location.ll
There are two types of files in the old (current) debug info schema.

    !0 = !{!"some/filename", !"/path/to/dir"}
    !1 = !{!"0x29", !0} ; [ DW_TAG_file_type ]

!1 has a wrapper class called `DIFile` which inherits from `DIScope` and
is referenced in 'scope' fields.

!0 is called a "file node", and debug info nodes with a 'file' field
point at one of these directly -- although they're built in `DIBuilder`
by sending in a `DIFile` and reaching into it.

In the new hierarchy, I unified these nodes as `MDFile` (which `DIFile`
is a lightweight wrapper for) in r230057.  Moving the new hierarchy into
place (and upgrading testcases) caused CodeGen/X86/unknown-location.ll
to start failing -- apparently "0x29" was previously showing up in the
linetable as a filename, causing:

    .loc 2 4 3

(where 2 points at filename "0x29") instead of:

    .loc 1 4 3

(where 1 points at the actual filename).

Change the testcase to use the old schema correctly.

llvm-svn: 230880
2015-02-28 23:52:24 +00:00
Sanjay Patel
5a17442330 fixed to test only the feature, not the feature and a CPU
llvm-svn: 230878
2015-02-28 23:47:09 +00:00
Craig Topper
c3d656cc84 [X86] Remove the blendpd/blendps/pblendw/pblendd intrinsics. They can represented by shuffle_vector instructions.
llvm-svn: 230860
2015-02-28 19:33:17 +00:00
David Blaikie
ab043ff680 [opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.

A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)

import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
2015-02-27 21:17:42 +00:00
Charles Davis
5c83517500 Target/X86: Never use the redzone for Win64 ABI functions.
Summary:
Until now, we did this (among other things) based on whether or not the
target was Windows. This is clearly wrong, not just for Win64 ABI functions
on non-Windows, but for System V ABI functions on Windows, too. In this
change, we make this decision based on the ABI the calling convention
specifies instead.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7953

llvm-svn: 230793
2015-02-27 21:11:16 +00:00
David Blaikie
0d99339102 [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.

This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.

* This doesn't modify gep operators, only instructions (operators will be
  handled separately)

* Textual IR changes only. Bitcode (including upgrade) and changing the
  in-memory representation will be in separate changes.

* geps of vectors are transformed as:
    getelementptr <4 x float*> %x, ...
  ->getelementptr float, <4 x float*> %x, ...
  Then, once the opaque pointer type is introduced, this will ultimately look
  like:
    getelementptr float, <4 x ptr> %x
  with the unambiguous interpretation that it is a vector of pointers to float.

* address spaces remain on the pointer, not the type:
    getelementptr float addrspace(1)* %x
  ->getelementptr float, float addrspace(1)* %x
  Then, eventually:
    getelementptr float, ptr addrspace(1) %x

Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.

update.py:
import fileinput
import sys
import re

ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile(       r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

def conv(match, line):
  if not match:
    return line
  line = match.groups()[0]
  if len(match.groups()[5]) == 0:
    line += match.groups()[2]
  line += match.groups()[3]
  line += ", "
  line += match.groups()[1]
  line += "\n"
  return line

for line in sys.stdin:
  if line.find("getelementptr ") == line.find("getelementptr inbounds"):
    if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
      line = conv(re.match(ibrep, line), line)
  elif line.find("getelementptr ") != line.find("getelementptr ("):
    line = conv(re.match(normrep, line), line)
  sys.stdout.write(line)

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).

The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7636

llvm-svn: 230786
2015-02-27 19:29:02 +00:00
Eric Christopher
7c8f775d46 Remove the Forward Control Flow Integrity pass and its dependencies.
This work is currently being rethought along different lines and
if this work is needed it can be resurrected out of svn. Remove it
for now as no current work in ongoing on it and it's unused. Verified
with the authors before removal.

llvm-svn: 230780
2015-02-27 19:03:38 +00:00
Mehdi Amini
32875af6e3 Change the fast-isel-abort option from bool to int to enable "levels"
Summary:
Currently fast-isel-abort will only abort for regular instructions,
and just warn for function calls, terminators, function arguments.
There is already fast-isel-abort-args but nothing for calls and
terminators.

This change turns the fast-isel-abort options into an integer option,
so that multiple levels of strictness can be defined.
This will help no being surprised when the "abort" option indeed does
not abort, and enables the possibility to write test that verifies
that no intrinsics are forgotten by fast-isel.

Reviewers: resistor, echristo

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D7941

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 230775
2015-02-27 18:32:11 +00:00
Rafael Espindola
95e3cfd5ed Centralize handling of the eh_begin and eh_end labels.
This removes a bit of duplicated code and more importantly, remembers the
labels so that they don't need to be looked up by name.

This in turn allows for any name to be used and avoids a crash if the name
we wanted was already taken.

llvm-svn: 230772
2015-02-27 18:18:39 +00:00
Chandler Carruth
9471b89b1d [x86] Run most of the rest of the shuffle combining over non-128-bit
vectors. This lets us fix the rest of the v16 lowering problems when
pshufb is clearly better.

We might still be able to improve some of the lowerings by enabling the
other combine-based rewriting to fire for non-128-bit vectors, but this
at least should remove any regressions from using the fancy v16i16
lowering strategy.

llvm-svn: 230753
2015-02-27 12:13:14 +00:00
Chandler Carruth
36698dfd52 [x86] Teach a bunch of the x86-specific shuffle combining to work with
256-bit vectors as well as 128-bit vectors. Fixes some of the redundant
shuffles for v16i16.

llvm-svn: 230752
2015-02-27 11:45:13 +00:00
Chandler Carruth
a1c6bfd527 [x86] Make the v8i16 clever single-input shuffle lowering usable for
repeated 128-bit lane shuffles of wider vector types and use it to lower
256-bit v16i16 vector shuffles where applicable.

This should let us perfectly lowering the pattern of pshuflw and pshufhw
even for AVX2 256-bit patterns.

I've not added AVX-512 support, but it should be trivial for someone
working on that to wire up.

Note that currently this generates bad, long shuffle chains because we
don't combine 256-bit target shuffles. The subsequent patches will fix
that.

llvm-svn: 230751
2015-02-27 11:33:46 +00:00
Chandler Carruth
c1e4fdbb66 [x86] Add a bunch more tests for v16i16 shuffles. All of these are taken
by mirroring v8i16 test cases across both 128-bit lanes. This should
highlight problems where we aren't correctly using 128-bit shuffles to
implement things.

llvm-svn: 230750
2015-02-27 11:25:10 +00:00
Charles Davis
6a532329fd Target/X86: Save Win64 non-volatile registers in a Win64 ABI function.
Summary:
This change causes us to actually save non-volatile registers in a Win64
ABI function that calls a System V ABI function, and vice-versa.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7919

llvm-svn: 230714
2015-02-27 00:57:01 +00:00
Rafael Espindola
efb97e5733 Put jump tables in distinct sections if -ffunction-sections is used.
A small regression in r230411 was that we were basing the decision on
-fdata-sections.

llvm-svn: 230707
2015-02-26 23:55:11 +00:00
Chandler Carruth
38a7e2c9a6 [x86] Fix PR22706 where we would incorrectly try lower a v32i8 dynamic
blend as legal.

We made the same mistake in two different places. Whenever we are custom
lowering a v32i8 blend we need to check whether we are custom lowering
it only for constant conditions that can be shuffled, or whether we
actually have AVX2 and full dynamic blending support on bytes. Both are
fixed, with comments added to make it clear what is going on and a new
test case.

llvm-svn: 230695
2015-02-26 22:15:34 +00:00
Reid Kleckner
d2d2baefa1 Don't sibcall between SysV and Win64 convention functions
The shadow stack space expectations won't match.

Fixes PR22709.

llvm-svn: 230667
2015-02-26 19:43:20 +00:00
Paul Robinson
b0132db1ee When the source has a series of assignments, users reasonably want to
have the debugger step through each one individually. Turn off the
combine for adjacent stores at -O0 so we get this behavior.

Possibly, DAGCombine shouldn't run at all at -O0, but that's for
another day; see PR22346.

Differential Revision: http://reviews.llvm.org/D7181

llvm-svn: 230659
2015-02-26 18:47:57 +00:00
Bruno Cardoso Lopes
a831838ec1 [X86][MMX] Fix a typo in a couple of tests
llvm-svn: 230638
2015-02-26 15:16:09 +00:00
Bruno Cardoso Lopes
7282dad6b1 [X86][MMX] Remove widening experimental flag from MMX tests.
Turns out that after the past MMX commits, we don't need to rely on this
flag to get better codegen for MMX. Also update the tests to become
triple neutral.

llvm-svn: 230637
2015-02-26 15:10:38 +00:00
David Majnemer
0e99f5bfb9 X86, Win64: Allow 'mov' to restore the stack pointer if we have a FP
The Win64 epilogue structure is very restrictive, it permits a very
small number of opcodes and none of them are 'mov'.

This means that given:
  mov %rbp, %rsp
  pop %rbp

The mov isn't the epilogue, only the pop is.  This is problematic unless
a frame pointer is present in which case we are free to do whatever we'd
like in the "body" of the function.  If a frame pointer is present,
unwinding will undo the prologue operations in reverse order regardless
of the fact that we are at an instruction which is reseting the stack
pointer.

llvm-svn: 230543
2015-02-25 21:13:37 +00:00
Sanjoy Das
60e1014097 Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap
(The change was landed in r230280 and caused the regression PR22674.
This version contains a fix and a test-case for PR22674).
    
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
    
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
    
Apart from the attached test case, another (more realistic)
manifestation of the bug can be seen in
Transforms/IndVarSimplify/pr20680.ll.

Differential Revision: http://reviews.llvm.org/D7778

llvm-svn: 230533
2015-02-25 20:02:59 +00:00
Bruno Cardoso Lopes
5570fdd2bb [X86][MMX] Reapply: Add MMX instructions to foldable tables
Reapply r230248.

Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.

llvm-svn: 230499
2015-02-25 15:14:02 +00:00
Rafael Espindola
565d0485ca Support SHF_MERGE sections in COMDATs.
This patch unifies the comdat and non-comdat code paths. By doing this
it add missing features to the comdat side and removes the fixed
section assumptions from the non-comdat side.

In ELF there is no one true section for "4 byte mergeable" constants.
We are better off computing the required properties of the section
and asking the context for it.

llvm-svn: 230411
2015-02-25 00:52:15 +00:00
Eric Christopher
ceeffadd1e Make this test even more OS and register allocation neutral.
llvm-svn: 230404
2015-02-25 00:12:11 +00:00
Eric Christopher
e992fbba6a Make this test not dependent upon the triple. All that was needed
was some flexibility in the check line for the comment basic block.

llvm-svn: 230400
2015-02-24 23:43:26 +00:00
Simon Pilgrim
50fa51790b Reapplied D7816 & rL230177 & rL230278 - with an additional fix toensure that the smallest build vector input scalar type is always used. Additional (crash) test cases already committed.
llvm-svn: 230388
2015-02-24 22:08:56 +00:00
Andrew Kaylor
914fdbf483 Fixing eol-style
llvm-svn: 230378
2015-02-24 20:49:35 +00:00
Eric Christopher
db420cc1dd Revert:
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Mon Feb 23 23:04:28 2015 +0000

    Fix based on post-commit comment on D7816 & rL230177 - BUILD_VECTOR operand truncation was using the the BV's output scalar type instead of the input type.

and

Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Sun Feb 22 18:17:28 2015 +0000

    [DagCombiner] Generalized BuildVector Vector Concatenation

    The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.

    This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.

    This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.

    Differential Revision: http://reviews.llvm.org/D7816

as the root cause of PR22678 which is causing an assertion inside the DAG combiner.

I'll follow up to the main thread as well.

llvm-svn: 230358
2015-02-24 19:11:00 +00:00
Hans Wennborg
fb5af71543 Revert r230280: "Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap"
This caused PR22674, failing this assert:

Instructions.h:2281: llvm::Value* llvm::PHINode::getOperand(unsigned int) const: Assertion `i_nocapture < OperandTraits<PHINode>::operands(this) && "getOperand() out of range!"' failed.

llvm-svn: 230341
2015-02-24 16:19:29 +00:00
Michael Kuperstein
0bcb785609 [x32] x32 should use ebx as the base pointer.
This fixes the original issue in PR22655, but not the secondary one.

llvm-svn: 230334
2015-02-24 15:27:13 +00:00
David Majnemer
f29778c502 X86: Only use 'lea' in Win64 epilogues if a frame pointer exists
We can only use 'add' in epilogues, 'lea' is not permitted unless we've
established a frame pointer in the prologue.

llvm-svn: 230286
2015-02-24 00:11:32 +00:00
Sanjoy Das
9dddcf7f33 Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.

This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.

Apart from the attached test case, another (more realistic) manifestation
of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll.

NOTE: this change was landed with an incorrect commit message in
rL230275 and was reverted for that reason in rL230279.  This commit
message is the correct one.

Differential Revision: http://reviews.llvm.org/D7778

llvm-svn: 230280
2015-02-23 23:22:58 +00:00
Sanjoy Das
e5ca05754e Revert 230275.
230275 got committed with an incorrect commit message due to a mixup
on my side.  Will re-land in a few moments with the correct commit
message.

llvm-svn: 230279
2015-02-23 23:13:22 +00:00
Andrea Di Biagio
fc5bcff188 [X86] Teach how to custom lower double-to-half conversions under fast-math.
This patch teaches the backend how to expand a double-half conversion into
a double-float conversion immediately followed by a float-half conversion.
We do this only under fast-math, and if float-half conversions are legal
for the target.

Added test CodeGen/X86/fastmath-float-half-conversion.ll

Differential Revision: http://reviews.llvm.org/D7832

llvm-svn: 230276
2015-02-23 22:59:02 +00:00
Sanjoy Das
421baaa45f Fix bug 22641
The bug was a result of getPreStartForExtend interpreting nsw/nuw
flags on an add recurrence more strongly than is legal.  {S,+,X}<nsw>
implies S+X is nsw only if the backedge of the loop is taken at least
once.

Differential Revision: http://reviews.llvm.org/D7808

llvm-svn: 230275
2015-02-23 22:55:13 +00:00
David Majnemer
b6ac6360c4 X86: Use a smaller 'mov' instruction for stack probe calls
Prologue emission, in some cases, requires calls to a stack probe helper
function.  The amount of stack to probe is passed as a register
argument in the Win64 ABI but the instruction sequence used is
pessimistic: it assumes that the number of bytes to probe is greater
than 4 GB.

Instead, select a more appropriate opcode depending on the number of
bytes we are going to probe.

llvm-svn: 230270
2015-02-23 21:50:30 +00:00
David Majnemer
5bffee412d X86: Use 'mov' instead of 'lea' in Win64 SEH prologues when possible
'mov' and 'lea' are equivalent when the displacement applied with 'lea'
is zero.  However, 'mov' should encode smaller.

llvm-svn: 230269
2015-02-23 21:50:27 +00:00
Bruno Cardoso Lopes
a8c6f58e91 [X86][MMX] Fix test to reflect current codegen
This test failed in several buildbots, a bit unclear how that happen
since this was the previous behavior before r230248.

llvm-svn: 230258
2015-02-23 20:57:46 +00:00
Andrew Kaylor
a496164a47 Adding test for Windows EH frame variable remapping.
llvm-svn: 230250
2015-02-23 20:04:51 +00:00
Andrew Kaylor
4e424eb088 Remap frame variables for native Windows exception handling.
Differential Revision: http://reviews.llvm.org/D7770

llvm-svn: 230249
2015-02-23 20:01:56 +00:00
Bruno Cardoso Lopes
390bb4572c Revert "[X86][MMX] Add MMX instructions to foldable tables"
This reverts commit r230226 since it breaks win buildbots.

llvm-svn: 230248
2015-02-23 19:53:37 +00:00
Bruno Cardoso Lopes
01a01f5b4a [X86] Add specific mtriple in order to appease builbots
llvm-svn: 230229
2015-02-23 15:33:40 +00:00
Bruno Cardoso Lopes
ae9a9b4601 [X86][MMX] Add MMX instructions to foldable tables
Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.

llvm-svn: 230226
2015-02-23 15:23:22 +00:00
Bruno Cardoso Lopes
8fe8621194 [X86][MMX] Support folding loads in psll, psrl and psra intrinsics
llvm-svn: 230225
2015-02-23 15:23:14 +00:00
Bruno Cardoso Lopes
8b65bbca30 [X86][MMX] Add tests for pslli, psrli and psrai intrinsics
Add tests to cover the RR form of the pslli, psrli and psrai intrinsics.
In the next commit, the loads are going to be folded and the
instructions use the RM form.

llvm-svn: 230224
2015-02-23 15:23:06 +00:00
Elena Demikhovsky
b15d81ba19 AVX-512: recommitted 229837 + bugfix + test
llvm-svn: 230223
2015-02-23 15:12:31 +00:00
Simon Pilgrim
fb2ad0f430 [DagCombiner] Generalized BuildVector Vector Concatenation
The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.

This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.

This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.

Differential Revision: http://reviews.llvm.org/D7816

llvm-svn: 230177
2015-02-22 18:17:28 +00:00
Simon Pilgrim
1794ce916b [X86][SSE] Added shuffle based integer zero extension tests.
llvm-svn: 230145
2015-02-21 21:25:16 +00:00
David Majnemer
15414a7819 Win64: Stack alignment constraints aren't applied during SET_FPREG
Stack realignment occurs after the prolog, not during, for Win64.
Because of this, don't factor in the maximum stack alignment when
establishing a frame pointer.

This fixes PR22572.

llvm-svn: 230113
2015-02-21 01:04:47 +00:00
Rafael Espindola
2661d3d69a Use short names for jumptable sections.
Also refactor code to remove some duplication.

llvm-svn: 230087
2015-02-20 23:28:28 +00:00
Andrea Di Biagio
bd0813ded2 [X86][FastIsel] Teach how to select float-half conversion intrinsics.
This patch teaches X86FastISel how to select intrinsic 'convert_from_fp16' and
intrinsic 'convert_to_fp16'.
If the target has F16C, we can select VCVTPS2PHrr for a float-half conversion,
and VCVTPH2PSrr for a half-float conversion.

Differential Revision: http://reviews.llvm.org/D7673

llvm-svn: 230043
2015-02-20 19:37:14 +00:00
Chandler Carruth
ef5f7bdec3 [x86] Remove the old vector shuffle lowering code and its flag.
The new shuffle lowering has been the default for some time. I've
enabled the new legality testing by default with no really blocking
regressions. I've fuzz tested this very heavily (many millions of fuzz
test cases have passed at this point). And this cleans up a ton of code.
=]

Thanks again to the many folks that helped with this transition. There
was a lot of work by others that went into the new shuffle lowering to
make it really excellent.

In case you aren't using a diff algorithm that can handle this:
  X86ISelLowering.cpp: 22 insertions(+), 2940 deletions(-)

llvm-svn: 229964
2015-02-20 04:25:04 +00:00
Chandler Carruth
55eca885d4 [x86] Now that the new vector shuffle legality is enabled and everything
is going well, remove the flag and the code for the old legality tests.

This is the first step toward removing the entire old vector shuffle
lowering. *Much* more code to delete coming up next.

llvm-svn: 229963
2015-02-20 03:59:35 +00:00
Chandler Carruth
6b218ec380 [x86] Make the new vector shuffle legality test on by default, which
reflects the fact that the x86 backend can in fact lower any shuffle you
want it to with reasonably high code quality.

My recent work on the new vector shuffle has made this regress *very*
little. The diff in the test cases makes me very, very happy.

llvm-svn: 229958
2015-02-20 03:05:47 +00:00
Chandler Carruth
74f60d7a7d [x86] Clean up a couple of test cases with the new update script. Split
one test case that is only partially tested in 32-bits into two test
cases so that the script doesn't generate massive spews of tests for the
cases we don't care about.

llvm-svn: 229955
2015-02-20 02:44:13 +00:00
Chandler Carruth
4c2aad4a26 Revert r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation
This doesn't pass 'ninja check-llvm' for me. Lots of tests, including
the ones updated, fail with crashes and other explosions.

llvm-svn: 229952
2015-02-20 02:15:36 +00:00
Reid Kleckner
13eabcdf32 EH: Prune unreachable resume instructions during Dwarf EH preparation
Today a simple function that only catches exceptions and doesn't run
destructor cleanups ends up containing a dead call to _Unwind_Resume
(PR20300). We can't remove these dead resume instructions during normal
optimization because inlining might introduce additional landingpads
that do have cleanups to run. Instead we can do this during EH
preparation, which is guaranteed to run after inlining.

Fixes PR20300.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7744

llvm-svn: 229944
2015-02-20 01:00:19 +00:00
Eric Christopher
c93875565e Revert "AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics."
The instructions were being generated on architectures that don't support avx512.

This reverts commit r229837.

llvm-svn: 229942
2015-02-20 00:45:28 +00:00
Sanjay Patel
3da4617a80 add X86 load folding tests for unary math ops
X86 load folding is fragile; eg, the tests here
don't work without AVX even though they should. This
is because we have a mix of tablegen patterns that have
been added over time, and we have a load folding table
used by the peephole optimizer that has to be kept in 
sync with the ever-changing ISA and tablegen defs.

llvm-svn: 229870
2015-02-19 16:59:11 +00:00
Chandler Carruth
6d618f4a18 [x86] Delete still more piles of complex code now that we have a good
systematic lowering of v8i16.

This required a slight strategy shift to prefer unpack lowerings in more
places. While this isn't a cut-and-dry win in every case, it is in the
overwhelming majority. There are only a few places where the old
lowering would probably be a touch faster, and then only by a small
margin.

In some cases, this is yet another significant improvement.

llvm-svn: 229859
2015-02-19 15:21:57 +00:00
Chandler Carruth
fad94c932a [x86] Teach the unpack lowering how to lower with an initial unpack in
addition to lowering to trees rooted in an unpack.

This saves shuffles and or registers in many various ways, lets us
handle another class of v4i32 shuffles pre SSE4.1 without domain
crosses, etc.

llvm-svn: 229856
2015-02-19 15:06:13 +00:00
Chandler Carruth
6f2050671d [x86] Dramatically improve v8i16 shuffle lowering by not using its
terribly complex partial blend logic.

This code path was one of the more complex and bug prone when it first
went in and it hasn't faired much better. Ultimately, with the simpler
basis for unpack lowering and support bit-math blending, this is
completely obsolete. In the worst case without this we generate
different but equivalent instructions. However, in many cases we
generate much better code. This is especially true when blends or pshufb
is available.

This does expose one (minor) weakness of the unpack lowering that I'll
try to address.

In case you were wondering, this is actually a big part of what I've
been trying to pull off in the recent string of commits.

llvm-svn: 229853
2015-02-19 14:08:24 +00:00
Chandler Carruth
73087f5ce9 [x86] Remove the final fallback in the v8i16 lowering that isn't really
needed, and significantly improve the SSSE3 path.

This makes the new strategy much more clear. If we can blend, we just go
with that. If we can't blend, we try to permute into an unpack so
that we handle cases where the unpack doing the blend also simplifies
the shuffle. If that fails and we've got SSSE3, we now call into
factored-out pshufb lowering code so that we leverage the fact that
pshufb can set up a blend for us while shuffling. This generates great
code, especially because we *know* we don't have a fast blend at this
point. Finally, we fall back on decomposing into permutes and blends
because we do at least have a bit-math-based blend if we need to use
that.

This pretty significantly improves some of the v8i16 code paths. We
never need to form pshufb for the single-input shuffles because we have
effective target-specific combines to form it there, but we were missing
its effectiveness in the blends.

llvm-svn: 229851
2015-02-19 13:56:49 +00:00
Chandler Carruth
ba91b52308 [x86] Simplify the pre-SSSE3 v16i8 lowering significantly by decomposing
them into permutes and a blend with the generic decomposition logic.

This works really well in almost every case and lets the code only
manage the expansion of a single input into two v8i16 vectors to perform
the actual shuffle. The blend-based merging is often much nicer than the
pack based merging that this replaces. The only place where it isn't we
end up blending between two packs when we could do a single pack. To
handle that case, just teach the v2i64 lowering to handle these blends
by digging out the operands.

With this we're down to only really random permutations that cause an
explosion of instructions.

llvm-svn: 229849
2015-02-19 13:15:12 +00:00
Chandler Carruth
b0373058d2 [x86] Remove the insanely over-aggressive unpack lowering strategy for
v16i8 shuffles, and replace it with new facilities.

This uses precise patterns to match exact unpacks, and the new
generalized unpack lowering only when we detect a case where we will
have to shuffle both inputs anyways and they terminate in exactly
a blend.

This fixes all of the blend horrors that I uncovered by always lowering
blends through the vector shuffle lowering. It also removes *sooooo*
much of the crazy instruction sequences required for v16i8 lowering
previously. Much cleaner now.

The only "meh" aspect is that we sometimes use pshufb+pshufb+unpck when
it would be marginally nicer to use pshufb+pshufb+por. However, the
difference there is *tiny*. In many cases its a win because we re-use
the pshufb mask. In others, we get to avoid the pshufb entirely. I've
left a FIXME, but I'm dubious we can really do better than this. I'm
actually pretty happy with this lowering now.

For SSE2 this exposes some horrors that were really already there. Those
will have to fixed by changing a different path through the v16i8
lowering.

llvm-svn: 229846
2015-02-19 12:10:37 +00:00
Elena Demikhovsky
41438d50e6 AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics.
llvm-svn: 229837
2015-02-19 10:48:04 +00:00
Chandler Carruth
72b437995e [x86] Add support for bit-wise blending and use it in the v8 and v16
lowering paths. I'm going to be leveraging this to simplify a lot of the
overly complex lowering of v8 and v16 shuffles in pre-SSSE3 modes.

Sadly, this isn't profitable on v4i32 and v2i64. There, the float and
double blending instructions for pre-SSE4.1 are actually pretty good,
and we can't beat them with bit math. And once SSE4.1 comes around we
have direct blending support and this ceases to be relevant.

Also, some of the test cases look odd because the domain fixer
canonicalizes these to floating point domain. That's OK, it'll use the
integer domain when it matters and some day I may be able to update
enough of LLVM to canonicalize the other way.

This restores almost all of the regressions from teaching x86's vselect
lowering to always use vector shuffle lowering for blends. The remaining
problems are because the v16 lowering path is still doing crazy things.
I'll be re-arranging that strategy in more detail in subsequent commits
to finish recovering the performance here.

llvm-svn: 229836
2015-02-19 10:46:52 +00:00
Chandler Carruth
105d2fa5e8 [x86,sdag] Two interrelated changes to the x86 and sdag code.
First, don't combine bit masking into vector shuffles (even ones the
target can handle) once operation legalization has taken place. Custom
legalization of vector shuffles may exist for these patterns (making the
predicate return true) but that custom legalization may in some cases
produce the exact bit math this matches. We only really want to handle
this prior to operation legalization.

However, the x86 backend, in a fit of awesome, relied on this. What it
would do is mark VSELECTs as expand, which would turn them into
arithmetic, which this would then match back into vector shuffles, which
we would then lower properly. Amazing.

Instead, the second change is to teach the x86 backend to directly form
vector shuffles from VSELECT nodes with constant conditions, and to mark
all of the vector types we support lowering blends as shuffles as custom
VSELECT lowering. We still mark the forms which actually support
variable blends as *legal* so that the custom lowering is bypassed, and
the legal lowering can even be used by the vector shuffle legalization
(yes, i know, this is confusing. but that's how the patterns are
written).

This makes the VSELECT lowering much more sensible, and in fact should
fix a bunch of bugs with it. However, as you'll see in the test cases,
right now what it does is point out the *hilarious* deficiency of the
new vector shuffle lowering when it comes to blends. Fortunately, my
very next patch fixes that. I can't submit it yet, because that patch,
somewhat obviously, forms the exact and/or pattern that the DAG combine
is matching here! Without this patch, teaching the vector shuffle
lowering to produce the right code infloops in the DAG combiner. With
this patch alone, we produce terrible code but at least lower through
the right paths. With both patches, all the regressions here should be
fixed, and a bunch of the improvements (like using 2 shufps with no
memory loads instead of 2 andps with memory loads and an orps) will
stay. Win!

There is one other change worth noting here. We had hilariously wrong
vectorization cost estimates for vselect because we fell through to the
code path that assumed all "expand" vector operations are scalarized.
However, the "expand" lowering of VSELECT is vector bit math, most
definitely not scalarized. So now we go back to the correct if horribly
naive cost of "1" for "not scalarized". If anyone wants to add actual
modeling of shuffle costs, that would be cool, but this seems an
improvement on its own. Note the removal of 16 and 32 "costs" for doing
a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
course, we don't right now because of OMG bad code, but I'm going to fix
that. Next patch. I promise.

llvm-svn: 229835
2015-02-19 10:36:19 +00:00
Chandler Carruth
4ca147df5a [x86] Merge checks for a recently added test case that is the same on
all SSE variants and AVX variants.

llvm-svn: 229770
2015-02-18 23:20:49 +00:00
Reid Kleckner
53bc7b1858 Add an IR-to-IR test for dwarf EH preparation using opt
This tests the simple resume instruction elimination logic that we have
before making some changes to it.

llvm-svn: 229768
2015-02-18 23:17:41 +00:00
Reid Kleckner
c8eb2119c4 dos2unix the WinEH file and tests
llvm-svn: 229735
2015-02-18 19:52:46 +00:00
Andrew Kaylor
eca1819627 Adding implementation to outline C++ catch handlers for native Windows 64 exception handling.
Differential Revision: http://reviews.llvm.org/D7363

llvm-svn: 229715
2015-02-18 18:31:51 +00:00
Michael Kuperstein
8617e481c7 Fixes two issue in SimplifyDemandedBits of sext_in_reg:
1) We should not try to simplify if the sext has multiple uses
2) There is no need to simplify is the source value is already sign-extended.

Patch by Gil Rapaport <gil.rapaport@intel.com>

Differential Revision: http://reviews.llvm.org/D6949

llvm-svn: 229659
2015-02-18 09:43:40 +00:00
Chandler Carruth
fbca0e7b75 [x86] Refactor the bit shift code the same as I just did the byte shift
code.

While this didn't have the miscompile (it used MatchLeft consistently)
it missed some cases where it could use right shifts. I've added a test
case Craig Topper came up with to exercise the right shift matching.

This code is really identical between the two. I'm going to merge them
next so that we don't keep two copies of all of this logic.

llvm-svn: 229655
2015-02-18 09:19:58 +00:00
Daniel Jasper
3f53d83cd0 Remove experimental options to control machine block placement.
This reverts r226034. Benchmarking with those flags has not revealed
anything interesting.

llvm-svn: 229648
2015-02-18 08:18:07 +00:00
Elena Demikhovsky
2ae2229fab AVX-512: Added support for FP instructions with embedded rounding mode.
By Asaf Badouh <asaf.badouh@intel.com>

llvm-svn: 229645
2015-02-18 07:59:20 +00:00
Craig Topper
347250558c [X86] Add another test case for the bug fixed in r229642. With the bug a vpsrldq was emitted instead of pslldq.
llvm-svn: 229643
2015-02-18 07:45:43 +00:00
Chandler Carruth
e6fc612338 [x86] Rewrite the byte shift detection to not use boolean variables to
track state.

I didn't like this in the code review because the pattern tends to be
error prone, but I didn't see a clear way to rewrite it. Turns out that
there were bugs here, I found them when fuzz testing our shuffle
lowering for correctness on x86.

The core of the problem is that we need to consistently test all our
preconditions for the same directionality of shift and the same input
vector. Instead, formulate this as two predicates (one doesn't depend on
the input in any way), pass things like the directionality and input
vector as inputs, and loop over the alternatives.

This fixes a pattern of very rare miscompiles coming out of this code.
Turned up roughly 4 out of every 1 million v8 shuffles in my fuzz
testing. The new code is over half a million test runs with no failures
yet. I've also fuzzed every other function in the lowering code with
over 3.5 million test cases and not discovered any other miscompiles.

llvm-svn: 229642
2015-02-18 07:13:48 +00:00
Craig Topper
398dc737fa [X86] Remove AVX2 and SSE2 pslldq and psrldq intrinsics. We can represent them in IR with vector shuffles now. All their uses have been removed from clang in favor of shuffles.
llvm-svn: 229640
2015-02-18 06:24:44 +00:00
Andrea Di Biagio
016c12ee8d [X86][FastIsel] Teach how to select scalar integer to float/double conversions.
This patch teaches fast-isel how to select a (V)CVTSI2SSrr for an integer to 
float conversion, and how to select a (V)CVTSI2SDrr for an integer to double
conversion.

Added test 'fast-isel-int-float-conversion.ll'.

Differential Revision: http://reviews.llvm.org/D7698

llvm-svn: 229589
2015-02-17 23:40:58 +00:00
Rafael Espindola
be6855ec2a Add r228939 back with a fix.
The problem in the original patch was not switching back to .text after printing
an eh table.

Original message:

On ELF, put PIC jump tables in a non executable section.

Fixes PR22558.

llvm-svn: 229586
2015-02-17 23:34:51 +00:00
Rafael Espindola
686783175f Add a test showing the problem in r228939.
If an EH table is printed in between the function and the jump table we would
fail to switch back to the text section to print the jump table.

llvm-svn: 229580
2015-02-17 23:21:46 +00:00
Simon Pilgrim
6da8e17d64 [X86][SSE] Generalised unpckl/unpckh shuffle matching
Added commuted unpckl/unpckh shuffle matching patterns as many cases containing undefined lanes fail to commute by themselves.

Differential Revision: http://reviews.llvm.org/D7564

llvm-svn: 229571
2015-02-17 22:24:32 +00:00
Sanjay Patel
f2c498a0c4 use a triple instead of a cpu; less builbot sadness
llvm-svn: 229563
2015-02-17 21:59:54 +00:00
Rafael Espindola
6d441224de Add testcases I missed in r229541.
llvm-svn: 229542
2015-02-17 20:50:39 +00:00
Sanjay Patel
ec471315ab make basic block label matching more flexible for less sad buildbots
llvm-svn: 229535
2015-02-17 20:29:31 +00:00
Sanjay Patel
5abe0e38c5 prevent folding a scalar FP load into a packed logical FP instruction (PR22371)
Change the memory operands in sse12_fp_packed_scalar_logical_alias from scalars to vectors. 
That's what the hardware packed logical FP instructions define: 128-bit memory operands.
There are no scalar versions of these instructions...because this is x86.

Generating the wrong code (folding a scalar load into a 128-bit load) is still possible
using the peephole optimization pass and the load folding tables. We won't completely
solve this bug until we either fix the lowering in fabs/fneg/fcopysign and any other
places where scalar FP logic is created or fix the load folding in foldMemoryOperandImpl()
to make sure it isn't changing the size of the load.

Differential Revision: http://reviews.llvm.org/D7474

llvm-svn: 229531
2015-02-17 20:08:21 +00:00
Sanjay Patel
0117b06b3f Canonicalize splats as build_vectors (PR22283)
This is a follow-on patch to:
http://reviews.llvm.org/D7093

That patch canonicalized constant splats as build_vectors, 
and this patch removes the constant check so we can canonicalize
all splats as build_vectors.

This fixes the 2nd test case in PR22283:
http://llvm.org/bugs/show_bug.cgi?id=22283

The unfortunate code duplication between SelectionDAG and DAGCombiner
is discussed in the earlier patch review. At least this patch is just
removing code...

This improves an existing x86 AVX test and changes codegen in an ARM test.

Differential Revision: http://reviews.llvm.org/D7389

llvm-svn: 229511
2015-02-17 16:54:32 +00:00
Andrea Di Biagio
c38ea9f435 [X86][FastISel] Add missing flag -fast-isel-abort to run lines in test fast-isel-fptrunc-fpext.ll.
Flag -fast-isel-abort is required in order to verify that X86FastISel
never fails to select FPExt (float-to-double) and FPTrunc (double-to-float).
No Functional change intended.

llvm-svn: 229489
2015-02-17 12:25:49 +00:00
Elena Demikhovsky
30ee20b16b AVX-512: changes in intel_ocl_bi calling conventions
- added mask types v8i1 and v16i1 to possible function parameters
- enabled passing 512-bit vectors in standard CC
- added a test for KNL intel_ocl_bi conventions

llvm-svn: 229482
2015-02-17 09:20:12 +00:00
Michael Kuperstein
812c46b9de [X86] Combine vector anyext + and into a vector zext
Vector zext tends to get legalized into a vector anyext, represented as a vector shuffle with an undef vector + a bitcast, that gets ANDed with a mask that zeroes the undef elements.
Combine this into an explicit shuffle with a zero vector instead. This allows shuffle lowering to match it as a zext, instead of matching it as an anyext and emitting an explicit AND.
This combine only covers a subset of the cases, but it's a start.

Differential Revision: http://reviews.llvm.org/D7666

llvm-svn: 229480
2015-02-17 08:22:51 +00:00
Chandler Carruth
b51ccf2731 [x86] Teach the unpack lowering to try wider element unpacks.
This allows it to match still more places where previously we would have
to fall back on floating point shuffles or other more complex lowering
strategies.

I'm hoping to replace some of the hand-rolled unpack matching with this
routine is it gets more and more clever.

llvm-svn: 229463
2015-02-17 02:12:24 +00:00
Hal Finkel
0e48ce380c Specify arch in test/CodeGen/X86/float-conv-elim.ll
This test was failing on non-x86 hosts because it specified a cpu of x86_64,
but not an architecture. x86_64 is obviously not a valid cpu on all
architectures.

llvm-svn: 229460
2015-02-17 00:11:19 +00:00
Cameron McInally
857d405820 [AVX512] Make 512b vector floating point rounds legal on AVX512.
llvm-svn: 229445
2015-02-16 22:15:42 +00:00
Simon Pilgrim
e9778d75a1 [X86][SSE] Add SSE MOVQ instructions to SSEPackedInt domain
Patch to explicitly add the SSE MOVQ (rr,mr,rm) instructions to SSEPackedInt domain - prevents a number of costly domain switches.

Differential Revision: http://reviews.llvm.org/D7600

llvm-svn: 229439
2015-02-16 21:50:56 +00:00
Mehdi Amini
d44ae390cb SelectionDAG: fold (fp_to_u/sint (s/uint_to_fp)) here too
Update SPARC tests to match.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 229438
2015-02-16 21:47:58 +00:00
Craig Topper
b6e168f770 [X86] Remove the multiply by 8 that goes into the shift constant for X86ISD::VSHLDQ and X86ISD::VSRLDQ. This simplifies the pattern matching in isel and allows these nodes to become the patterns embedded in the instruction.
llvm-svn: 229431
2015-02-16 20:52:07 +00:00
Chandler Carruth
d44ede78e3 [x86] Add a generic unpack-targeted lowering technique. This can be used
to generically lower blends and is particularly nice because it is
available frome SSE2 onward. This removes a lot of the remaining domain
crossing blends in SSE2 code.

I'm hoping to replace some of the "interleaved" lowering hacks with
something closer to this which should be more principled. First, this
needs to learn how to detect and use other interleavings besides that of
the natural type provided. That will be a follow-up patch though.

llvm-svn: 229378
2015-02-16 12:28:18 +00:00
Chandler Carruth
3816521c2e [x86] Switch this test to use checks generated by my update script. NFC
llvm-svn: 229377
2015-02-16 12:23:22 +00:00
Chandler Carruth
358c1db65e [x86] Add initial basic support for forming blends of v16i8 vectors.
This blend instruction is ... really lame. The register usage is insane.
As a consequence this is probably only *barely* better than 2 pshufbs
followed by a por, and that mostly because it only has to read from
a single memory location.

However, this doesn't fix as much as I kind of expected, so more to go.
Pretty sure that the ordering and delegation of v16i8 is just really,
really bad.

llvm-svn: 229373
2015-02-16 10:58:23 +00:00
Chandler Carruth
db7a8ca276 [x86] Add some more test cases for i8 vector blends.
llvm-svn: 229372
2015-02-16 10:51:49 +00:00
Craig Topper
b3a29e8067 [X86] Add support for lowering shuffles to 256-bit PALIGNR instruction.
llvm-svn: 229359
2015-02-16 06:29:06 +00:00
Craig Topper
988e9c859c [X86] Remove some hard tab characters from tests.
llvm-svn: 229358
2015-02-16 06:29:02 +00:00
Chandler Carruth
a34a4a834e [x86] Teach the 128-bit vector shuffle lowering routines to take
advantage of the existence of a reasonable blend instruction.

The 256-bit vector shuffle lowering has leveraged the general technique
of decomposed shuffles and blends for quite some time, but this never
made it back into the 128-bit code, and there are a large number of
patterns where this is substantially better. For example, this removes
almost all domain crossing in vector shuffles that involve some blend
and some permutation with SSE4.1 and later. See the massive reduction
in 'shufps' for integer test cases in this commit.

This isn't perfect yet for a few reasons:

1) The v8i16 shuffle lowering continues to plague me. We don't always
   form an unpack-based blend when that would be better. But the wins
   pretty drastically outstrip the losses here.
2) The v16i8 shuffle lowering is just a disaster here. I never went and
   implemented blend support here for some terrible reason. I'll do
   that next probably. I've not updated it for now.

More variations on this technique are coming as well -- we don't
shuffle-into-unpack or shuffle-into-palignr, both of which would also be
profitable.

Note that some test cases grow significantly in the number of
instructions, but I expect to actually be faster. We use
pshufd+pshufd+blendw instead of a single shufps, but the pshufd's are
very likely to pipeline well (two ports on most modern intel chips) and
the blend is a *very* fast instruction. The domain switch penalty will
essentially always be more than a blend instruction, which is the only
increase in tree height.

llvm-svn: 229350
2015-02-16 01:52:02 +00:00
Chandler Carruth
572fa3dbba [x86] Clean up a few test cases with the update script. NFC
llvm-svn: 229349
2015-02-16 01:39:50 +00:00
Simon Pilgrim
9849f0e2cd Added (still inefficient) shuffle test case for PR21138
llvm-svn: 229321
2015-02-15 18:21:39 +00:00
Simon Pilgrim
dc017ce5f6 Added some test cases of missed opportunities to use unpckl/unpckh shuffles
llvm-svn: 229313
2015-02-15 15:07:45 +00:00
Simon Pilgrim
ddbf019542 [X86][AVX2] vpslldq/vpsrldq byte shifts for AVX2
This patch refactors the existing lowerVectorShuffleAsByteShift function to add support for 256-bit vectors on AVX2 targets.

It also fixes a tablegen issue that prevented the lowering of vpslldq/vpsrldq vec256 instructions.

Differential Revision: http://reviews.llvm.org/D7596

llvm-svn: 229311
2015-02-15 13:19:52 +00:00
Chandler Carruth
eb48186ef8 [x86] Add the test case from PR22412, we now get this right even with
the new vector shuffle legality.

llvm-svn: 229310
2015-02-15 12:45:05 +00:00
Chandler Carruth
2fac6b1c98 [x86] Teach the decomposed shuffle/blend lowering to use an early blend
when that will allow it to lower with a single permute instead of
multiple permutes.

It tries to detect when it will only have to do a single permute in
either case to maximize folding of loads and such.

This cuts a *lot* of the avx2 shuffle permute counts in half. =]

llvm-svn: 229309
2015-02-15 12:42:15 +00:00
Chandler Carruth
0b684c6980 [SDAG] Teach the SelectionDAG to canonicalize vector shuffles of splats
directly into blends of the splats.

These patterns show up even very late in the vector shuffle lowering
where we don't have any chance for DAG combining to kick in, and
blending is a tremendously simpler operation to model. By coercing the
shuffle into a blend we can much more easily match and lower shuffles of
splats.

Immediately with this change there are significantly more blends being
matched in the x86 vector shuffle lowering.

llvm-svn: 229308
2015-02-15 12:18:12 +00:00
Chandler Carruth
1d8146cec4 [x86] Stop shuffling zero vectors. =]
I was somewhat surprised this pattern really came up, but it does. It
seems better to just directly handle it than try to special case every
place where we end up forming a shuffle that devolves to a shuffle of
a zero vector.

llvm-svn: 229301
2015-02-15 10:34:52 +00:00
Chandler Carruth
5c0c778648 [x86] When splitting 256-bit vectors into 128-bit vectors, don't extract
subvectors from buildvectors. That doesn't really make any sense and it
breaks all of the down-stream matching of buildvectors to cleverly lower
shuffles.

With this, we now get the shift-based lowering of 256-bit vector
shuffles with AVX1 when we split them into 128-bit vectors. We also do
much better on the zero-extension patterns, although there remains quite
a bit of room for improvement here.

llvm-svn: 229299
2015-02-15 10:12:02 +00:00
Chandler Carruth
65b90c2fa8 [x86] Update some tests with the latest version of my script and llc.
This mostly adds some shuffle decode comments and cleans up indentation.

llvm-svn: 229296
2015-02-15 09:26:15 +00:00