1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00
Commit Graph

4637 Commits

Author SHA1 Message Date
Alexey Volkov
f369538f31 [x86] Do not convert to cmp32 for Atom arch by Sergey Okunev
Differential Revision: http://llvm-reviews.chandlerc.com/D2824

llvm-svn: 205288
2014-04-01 08:13:07 +00:00
Juergen Ributzka
9c6cfb73c8 [Stackmaps] Update the stackmap format to use 64-bit relocations for the function address and properly align all entries.
This commit updates the stackmap format to version 1 to indicate the
reorganizaion of several fields. This was done in order to align stackmap
entries to their natural alignment and to minimize padding.

Fixes <rdar://problem/16005902>

llvm-svn: 205254
2014-03-31 22:14:04 +00:00
Yaron Keren
2dd78cdf41 Two updated tests for MinGW 32 and 64 exception handling code generation.
llvm-svn: 205227
2014-03-31 17:34:15 +00:00
Akira Hatanaka
cca54c2eea [x86] Fix printing of register operands with q modifier.
Emit 32-bit register names instead of 64-bit register names if the target does
not have 64-bit general purpose registers.

<rdar://problem/14653996>

llvm-svn: 205067
2014-03-28 23:28:07 +00:00
David Majnemer
11f5ba4322 X86: Disable IsLegalToCallImmediateAddr for Win32
WinCOFF cannot form PC relative relocations to support absolute
MCValues.  We should reenable this once WinCOFF supports emission of
IMAGE_REL_I386_REL32 relocations.

This fixes PR19272.

llvm-svn: 205058
2014-03-28 21:40:47 +00:00
Rafael Espindola
5c8926deed Prevent alias from pointing to weak aliases.
This adds back r204781.

Original message:

Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204934
2014-03-27 15:26:56 +00:00
Elena Demikhovsky
624ece9d50 AVX-512: Implemented masking for integer arithmetic & logic instructions.
By Robert Khasanov rob.khasanov@gmail.com

llvm-svn: 204906
2014-03-27 09:45:08 +00:00
Ekaterina Romanova
777994499d This is a fix for PR# 19051. I noticed code gen differences due to code motion when running tests with and without the debug info at O2. The problem is in branch folding. A loop wanted to skip the debug info, but actually it didn't do so.
llvm-svn: 204865
2014-03-26 22:15:28 +00:00
Jim Grosbach
88bf32af3b Fix for incorrect address sinking in the presence of potential overflows.
In some cases it is possible for CGP to attempt to reuse a base address from
another basic block. In those cases we have to be sure that all the address
math was either done at the same bit width, or that none of it overflowed
before it was extended.

Patch by Louis Gerbarg <lgg@apple.com>

rdar://16307442

llvm-svn: 204833
2014-03-26 17:27:01 +00:00
Hans Wennborg
ce9682473f Revert "X86 memcpy lowering: use "rep movs" even when esi is used as base pointer" (r204174)
>  For functions where esi is used as base pointer, we would previously fall ba
>  from lowering memcpy with "rep movs" because that clobbers esi.
>
>  With this patch, we just store esi in another physical register, and restore
>  it afterwards. This adds a little bit of register preassure, but the more
>  efficient memcpy should be worth it.
>
>  Differential Revision: http://llvm-reviews.chandlerc.com/D2968

This didn't work. I was ending up with code like this:

  lea     edi,[esi+38h]
  mov     ecx,0Fh
  mov     edx,esi
  mov     esi,ebx
  rep movs dword ptr es:[edi],dword ptr [esi]
  lea     ecx,[esi+74h] <-- Ooops, we're now using esi before restoring it from edx.
  add     ebx,3Ch
  mov     esi,edx

I guess if we want to do this we need stronger glue or something, or doing the expansion
much later.

llvm-svn: 204829
2014-03-26 16:30:54 +00:00
Cameron McInally
8872097c93 Fix AVX512 Gather and Scatter execution domains.
llvm-svn: 204804
2014-03-26 13:50:50 +00:00
Renato Golin
2c1112ea41 Add @llvm.clear_cache builtin
Implementing the LLVM part of the call to __builtin___clear_cache
which translates into an intrinsic @llvm.clear_cache and is lowered
by each target, either to a call to __clear_cache or nothing at all
incase the caches are unified.

Updating LangRef and adding some tests for the implemented architectures.
Other archs will have to implement the method in case this builtin
has to be compiled for it, since the default behaviour is to bail
unimplemented.

A Clang patch is required for the builtin to be lowered into the
llvm intrinsic. This will be done next.

llvm-svn: 204802
2014-03-26 12:52:28 +00:00
Rafael Espindola
63a8ff6883 Revert "Prevent alias from pointing to weak aliases."
This reverts commit r204781.

I will follow up to with msan folks to see what is what they
were trying to do with aliases to weak aliases.

llvm-svn: 204784
2014-03-26 06:14:40 +00:00
Rafael Espindola
c9179b8b50 Prevent alias from pointing to weak aliases.
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204781
2014-03-26 04:48:47 +00:00
Quentin Colombet
5812800068 [X86] Add broadcast instructions to the table used by ExeDepsFix pass.
Adds the different broadcast instructions to the ReplaceableInstrsAVX2 table.
That way the ExeDepsFix pass can take better decisions when AVX2 broadcasts are
across domain (int <-> float).

In particular, prior to this patch we were generating:
  vpbroadcastd  LCPI1_0(%rip), %ymm2
  vpand %ymm2, %ymm0, %ymm0
  vmaxps  %ymm1, %ymm0, %ymm0 ## <- domain change penalty

Now, we generate the following nice sequence where everything is in the float
domain:
  vbroadcastss  LCPI1_0(%rip), %ymm2
  vandps  %ymm2, %ymm0, %ymm0
  vmaxps  %ymm1, %ymm0, %ymm0

<rdar://problem/16354675>

llvm-svn: 204770
2014-03-26 00:10:22 +00:00
Adam Nemet
66a311bff9 [X86] Generate VPSHUFB for in-place v16i16 shuffles
This used to resort to splitting the 256-bit operation into two 128-bit
shuffles and then recombining the results.

Fixes <rdar://problem/16167303>

llvm-svn: 204735
2014-03-25 17:47:06 +00:00
Cameron McInally
649a597374 Fix AVX2 Gather execution domains.
llvm-svn: 204713
2014-03-25 12:36:38 +00:00
David Majnemer
68a1631530 WinCOFF: Add support for -fdata-sections
This is a pretty straight forward translation for COFF, we just need to
stick the data in a COMDAT section marked as
IMAGE_COMDAT_SELECT_NODUPLICATES.

N.B. We must be careful to avoid sticking entities with private linkage
in COMDAT groups.  COFF is pretty hostile to the renaming of entities so
we must be careful to disallow GlobalVariables with unstable names.

llvm-svn: 204703
2014-03-25 06:14:26 +00:00
Quentin Colombet
ac3c109b60 [X86][ISelDAG] Add missing fallback patterns for avx2 broadcast instructions.
Those patterns are used when the load cannot be folded into the related broadcast
during the select phase.
This happens when the load gets additional uses that were not anticipated during
the previous lowering phases (constant vector to constant load, then constant
load reused) or when selection DAG is not able to prove that folding the load
will not create a cycle in the DAG.

<rdar://problem/16074331>

llvm-svn: 204631
2014-03-24 17:54:19 +00:00
David Majnemer
978341c0ae WinCOFF: Add support for -ffunction-sections
This is a pretty straight forward translation for COFF, we just need to
stick the function in a COMDAT section marked as
IMAGE_COMDAT_SELECT_NODUPLICATES.

llvm-svn: 204565
2014-03-23 17:47:39 +00:00
Andrea Di Biagio
84fdff1b7f [DAG] Fix an assertion failure caused by an invalid cast in method 'BuildVectorSDNode::isConstantSplat'
This patch renames method 'isConstantSplat' as 'getConstantSplatValue'
(mainly for consistency reasons), and rewrites its logic to ensure
that we always perform a legal 'cast<ConstantSDNode>'.

Added test shift-combine-crash.ll to verify that DAGCombiner no longer crashes with an assertion failure in the attempt to simplify a vector shift by a vector of all undef counts.

llvm-svn: 204536
2014-03-22 01:47:22 +00:00
Manman Ren
d30a28764a Register allocator: add condition to hoist a spill to outer loop.
We make sure a spill is not hoisted to a hotter outer loop by adding
a condition. Hoist a spill to outer loop if there are multiple dependents
(it can be beneficial if more than one dependents are hoisted) or
if DepSV (the hoisting source) is hotter than SV (the hoisting destination).

rdar://16268194

llvm-svn: 204522
2014-03-21 21:46:24 +00:00
Rafael Espindola
32e335fee3 Move codegen test over to MC.
llvm-svn: 204490
2014-03-21 17:55:34 +00:00
Rafael Espindola
ae72c7bb72 Convert test to using cfi.
An unnamed global in llvm still produces a regular symbol.

llvm-svn: 204488
2014-03-21 17:38:01 +00:00
Rafael Espindola
7f667c4902 Remove redundant test.
The production of the .eh symbols is done from MC now and we already have tests
for it.

llvm-svn: 204483
2014-03-21 17:26:35 +00:00
Rafael Espindola
196cb72a5f Split out the MC part of this test.
llvm-svn: 204481
2014-03-21 17:16:11 +00:00
Juergen Ributzka
4470e9c92d [Constant Hoisting] Make the constant materialization cost operand dependent
Extend the target hook to take also the operand index into account when
calculating the cost of the constant materialization.

Related to <rdar://problem/16381500>

llvm-svn: 204435
2014-03-21 06:04:45 +00:00
Rafael Espindola
239d7d1128 Convert a CodeGen test into a MC test.
llvm-svn: 204421
2014-03-21 00:55:42 +00:00
Rafael Espindola
6e9dde5149 Port test to cfi.
llvm-svn: 204416
2014-03-21 00:30:24 +00:00
Rafael Espindola
038051f4de Convert another CodeGen test into a MC test.
llvm-svn: 204412
2014-03-20 23:35:00 +00:00
Rafael Espindola
63a2bb51f9 Remove unused options from test.
llvm-svn: 204401
2014-03-20 21:38:04 +00:00
Juergen Ributzka
c55e0f3fc7 Revert "[Constant Hoisting] Extend coverage of the constant hoisting pass."
I will break this up into smaller pieces for review and recommit.

llvm-svn: 204393
2014-03-20 20:17:13 +00:00
Juergen Ributzka
7dae5f7baa [Constant Hoisting] Extend coverage of the constant hoisting pass.
This commit extends the coverage of the constant hoisting pass, adds additonal
debug output and updates the function names according to the style guide.

Related to <rdar://problem/16381500>

llvm-svn: 204389
2014-03-20 19:55:52 +00:00
Hans Wennborg
114745e7d4 X86 memcpy lowering: use "rep movs" even when esi is used as base pointer
For functions where esi is used as base pointer, we would previously fall back
from lowering memcpy with "rep movs" because that clobbers esi.

With this patch, we just store esi in another physical register, and restore
it afterwards. This adds a little bit of register preassure, but the more
efficient memcpy should be worth it.

Differential Revision: http://llvm-reviews.chandlerc.com/D2968

llvm-svn: 204174
2014-03-18 20:04:34 +00:00
Michael Zolotukhin
b974cb54e8 Fix test lsr-normalization.ll broken in r204161.
llvm-svn: 204166
2014-03-18 18:17:59 +00:00
Michael Zolotukhin
a73e842ea7 Add stride normalization to SCEV Normalize/Denormalize transformation.
llvm-svn: 204161
2014-03-18 17:34:03 +00:00
Andrea Di Biagio
b329156058 [DAGCombiner] teach how to simplify xor/and/or nodes according to the following rules:
1)  (AND (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (AND (A, B), C, Mask)
 2)  (OR  (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (OR  (A, B), C, Mask)
 3)  (XOR (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (XOR (A, B), V_0, Mask)

 4)  (AND (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, AND (A, B), Mask)
 5)  (OR  (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, OR  (A, B), Mask)
 6)  (XOR (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (V_0, XOR (A, B), Mask)

llvm-svn: 204160
2014-03-18 17:12:59 +00:00
Matt Arsenault
c95c06bda9 Make DAGCombiner work on vector bitshifts with constant splat vectors.
llvm-svn: 204071
2014-03-17 18:58:01 +00:00
Adam Nemet
65c87794ae [VectorLegalizer/X86] Don't unvectorize fp_to_uint for v8f32->v8i16
Rather than LegalizeAction::Expand, this needs LegalizeAction::Promote to get
promoted to fp_to_sint v8f32->v8i32.  This is a legal operation on AVX.

For that to work properly, we also need to teach the legalizer about the
specific promotion required here.  The default vector promotion uses
bitcasting to a vector type of the same total size.  We want to promote the
vector element type, effectively widening the operation and then truncating
the result.  This is analogous to the current logic of how int_to_fp is
promoted.

The change also factors out some code from the int_to_fp promotion code to
ValueType::widenIntegerVectorElementType.  This is now shared between
int_to_fp and fp_to_int.

There is no longer need for the custom lowering of fp_to_sint f32->v8i16 in
X86.  It can now go through the new target-independent fp_to_*int promotion
logic.

I also checked that no other target uses Promote for these ops yet, so there
shouldn't be any unexpected change in behavior.

Fixes <rdar://problem/16202247>

llvm-svn: 204058
2014-03-17 17:06:14 +00:00
Lang Hames
8992c5f69e [X86] New and improved VZeroUpperInserter optimization.
- Adds support for inserting vzerouppers before tail-calls.
  This is enabled implicitly by having MachineInstr::copyImplicitOps preserve
  regmask operands, which allows VZeroUpperInserter to see where tail-calls use
  vector registers.

- Fixes a bug that caused the previous version of this optimization to miss some
  vzeroupper insertion points in loops. (Loops-with-vector-code that followed
  loops-without-vector-code were mistakenly overlooked by the previous version).

- New algorithm never revisits instructions.

Fixes <rdar://problem/16228798>

llvm-svn: 204021
2014-03-17 01:22:54 +00:00
Adrian Prantl
ba6b9e907e Re-add checks that were in this testcase before it was converted to dwarfdump.
llvm-svn: 203981
2014-03-14 23:08:21 +00:00
Rafael Espindola
d15cd32b9f Remove the linker_private and linker_private_weak linkages.
These linkages were introduced some time ago, but it was never very
clear what exactly their semantics were or what they should be used
for. Some investigation found these uses:

* utf-16 strings in clang.
* non-unnamed_addr strings produced by the sanitizers.

It turns out they were just working around a more fundamental problem.
For some sections a MachO linker needs a symbol in order to split the
section into atoms, and llvm had no idea that was the case. I fixed
that in r201700 and it is now safe to use the private linkage. When
the object ends up in a section that requires symbols, llvm will use a
'l' prefix instead of a 'L' prefix and things just work.

With that, these linkages were already dead, but there was a potential
future user in the objc metadata information. I am still looking at
CGObjcMac.cpp, but at this point I am convinced that linker_private
and linker_private_weak are not what they need.

The objc uses are currently split in

* Regular symbols (no '\01' prefix). LLVM already directly provides
whatever semantics they need.
* Uses of a private name (start with "\01L" or "\01l") and private
linkage. We can drop the "\01L" and "\01l" prefixes as soon as llvm
agrees with clang on L being ok or not for a given section. I have two
patches in code review for this.
* Uses of private name and weak linkage.

The last case is the one that one could think would fit one of these
linkages. That is not the case. The semantics are

* the linker will merge these symbol by *name*.
* the linker will hide them in the final DSO.

Given that the merging is done by name, any of the private (or
internal) linkages would be a bad match. They allow llvm to rename the
symbols, and that is really not what we want. From the llvm point of
view, these objects should really be (linkonce|weak)(_odr)?.

For now, just keeping the "\01l" prefix is probably the best for these
symbols. If we one day want to have a more direct support in llvm,
IMHO what we should add is not a linkage, it is just a hidden_symbol
attribute. It would be applicable to multiple linkages. For example,
on weak it would produce the current behavior we have for objc
metadata. On internal, it would be equivalent to private (and we
should then remove private).

llvm-svn: 203866
2014-03-13 23:18:37 +00:00
Kevin Enderby
02a99aab20 Add -mtriple=x86_64-linux to this test case to fix the build bots.5
The original commit was r203829.

llvm-svn: 203844
2014-03-13 20:31:19 +00:00
Ekaterina Romanova
b9d21b7ce1 Fix for http://llvm.org/bugs/show_bug.cgi?id=18590
This patch fixes the bug in peephole optimization that folds a load which defines one vreg into the one and only use of that vreg. With debug info, a DBG_VALUE that referenced the vreg considered to be a use, preventing the optimization. The fix is to ignore DBG_VALUE's during the optimization, and undef a DBG_VALUE that references a vreg that gets removed.
Patch by Trevor Smigiel!

llvm-svn: 203829
2014-03-13 18:47:12 +00:00
Manuel Jacob
a25ab845ae CodeGenPrep: sink extends of illegal types into use block.
Summary:
This helps the instruction selector to lower an i64 * i64 -> i128
multiplication into a single instruction on targets which support it.

This is an update of D2973 which was reverted because of a bug reported
as PR19084.

Reviewers: t.p.northover, chapuni

Reviewed By: t.p.northover

CC: llvm-commits, alex, chapuni

Differential Revision: http://llvm-reviews.chandlerc.com/D3021

llvm-svn: 203797
2014-03-13 13:36:25 +00:00
Elena Demikhovsky
d3e6f6628c AVX-512: masked load/store + intrinsics for them.
llvm-svn: 203790
2014-03-13 12:05:52 +00:00
Adam Nemet
fc761e9a09 [X86] Add peephole for masked rotate amount
Extend what's currently done for shift because the HW performs this masking
implicitly:

   (rotl:i32 x, (and y, 31)) -> (rotl:i32 x, y)

I use the newly factored out multiclass that was only supporting shifts so
far.

For testing I extended my testcase for the new rotation idiom.

<rdar://problem/15295856>

llvm-svn: 203718
2014-03-12 21:20:55 +00:00
Rafael Espindola
d866898775 Reject alias to undefined symbols in the verifier.
On ELF and COFF an alias is just another name for a position in the file.
There is no way to refer to a position in another file, so an alias to
undefined is meaningless.

MachO currently doesn't support aliases. The spec has a N_INDR, which when
implemented will have a different set of restrictions. Adding support for
it shouldn't be harder than any other IR extension.

For now, having the IR represent what is actually possible with current
tools makes it easier to fix the design of GlobalAlias.

llvm-svn: 203705
2014-03-12 20:15:49 +00:00
Hans Wennborg
7ce76d19aa X86: Don't generate 64-bit movd after cmpneqsd in 32-bit mode (PR19059)
This fixes the bug where we would bitcast the 64-bit floating point result
of cmpneqsd to a 64-bit integer even on 32-bit targets.

Differential Revision: http://llvm-reviews.chandlerc.com/D3009

llvm-svn: 203581
2014-03-11 15:49:24 +00:00
Tim Northover
68c567a38a IR: add a second ordering operand to cmpxhg for failure
The syntax for "cmpxchg" should now look something like:

	cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic

where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).

rdar://problem/15996804

llvm-svn: 203559
2014-03-11 10:48:52 +00:00
Jim Grosbach
3b6ef12947 X86: Enable ISel of 16-bit MOVBE instructions.
When the MOVBE instructions are available, use them for 16-bit endian
swapping as well as for 32 and 64 bit.

The patterns were already present on the instructions, but weren't being
matched because the operation was unconditionally marked to 'Expand.'
Change that to be conditional on whether the MOVBE instructions are
available. Use 'rolw' to implement the in-register version (32 and 64
bit have the dedicated 'bswap' instruction for that).

Patch by Louis Gerbarg <lgg@apple.com>.

rdar://15479984

llvm-svn: 203524
2014-03-11 00:44:14 +00:00
Matt Arsenault
998df7332f Fix undefined behavior in vector shift tests.
These were all shifting the same amount as the bitwidth.

llvm-svn: 203519
2014-03-11 00:01:41 +00:00
NAKAMURA Takumi
baf4a0d596 Revert r203230, "CodeGenPrep: sink extends of illegal types into use block."
It choked i686 stage2.

llvm-svn: 203386
2014-03-09 11:01:07 +00:00
David Majnemer
4036f15710 IR: Change inalloca's grammar a bit
The grammar for LLVM IR is not well specified in any document but seems
to obey the following rules:

 - Attributes which have parenthesized arguments are never preceded by
   commas.  This form of attribute is the only one which ever has
   optional arguments.  However, not all of these attributes support
   optional arguments: 'thread_local' supports an optional argument but
   'addrspace' does not.  Interestingly, 'addrspace' is documented as
   being a "qualifier".  What constitutes a qualifier?  I cannot find a
   definition.

 - Some attributes use a space between the keyword and the value.
   Examples of this form are 'align' and 'section'.  These are always
   preceded by a comma.

 - Otherwise, the attribute has no argument.  These attributes do not
   have a preceding comma.

Sometimes an attribute goes before the instruction, between the
instruction and it's type, or after it's type.  'atomicrmw' has
'volatile' between the instruction and the type while 'call' has 'tail'
preceding the instruction.

With all this in mind, it seems most consistent for 'inalloca' on an
'inalloca' instruction to occur before between the instruction and the
type.  Unlike the current formulation, there would be no preceding
comma.  The combination 'alloca inalloca' doesn't look particularly
appetizing, perhaps a better spelling of 'inalloca' is down the road.

llvm-svn: 203376
2014-03-09 06:41:58 +00:00
Adam Nemet
05756683c9 Update comment from r203315 based on review
llvm-svn: 203361
2014-03-08 21:51:55 +00:00
David Blaikie
6f7025ca24 DebugInfo: further improvements to test following up on r203329
llvm-svn: 203337
2014-03-08 02:45:53 +00:00
David Blaikie
097ffbf12d DebugInfo: Fix test fallout from r203323
Will fix this harder in a moment.

llvm-svn: 203329
2014-03-08 01:32:51 +00:00
Adam Nemet
f591fbc1db [DAGCombiner] Recognize another rotation idiom
This is the new idiom:

  x<<(y&31) | x>>((0-y)&31)

which is recognized as:

  x ROTL (y&31)

The change refines matchRotateSub.  In
Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos is
Pos' & (OpSize - 1) we can just use Pos' instead of Pos.

llvm-svn: 203315
2014-03-07 23:56:28 +00:00
Arnold Schwaighofer
bd1d167eb0 ISel: Make VSELECT selection terminate in cases where the condition type has to
be split and the result type widened.

When the condition of a vselect has to be split it makes no sense widening the
vselect and thereby widening the condition. We end up in an endless loop of
widening (vselect result type) and splitting (condition mask type) doing this.
Instead, split both the condition and the vselect and widen the result.

I ran this over the test suite with i686 and mattr=+sse and saw no regressions.

Fixes PR18036.

llvm-svn: 203311
2014-03-07 23:25:55 +00:00
Tim Northover
781b15d502 CodeGenPrep: sink extends of illegal types into use block.
This helps the instruction selector to lower an i64 * i64 -> i128
multiplication into a single instruction on targets which support it.

Patch by Manuel Jacob.

llvm-svn: 203230
2014-03-07 11:04:30 +00:00
Rafael Espindola
cb9ca86245 Replace PROLOG_LABEL with a new CFI_INSTRUCTION.
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.

The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.

The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.

I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.

The net result is that we don't create temporary labels that are never used.

llvm-svn: 203204
2014-03-07 06:08:31 +00:00
Rafael Espindola
fe5dfa44c9 Remove shouldEmitUsedDirectiveFor.
Clang now uses llvm.compiler.used for these cases.

llvm-svn: 203174
2014-03-06 22:47:08 +00:00
Rafael Espindola
f350832c74 Convert test to FileCheck.
llvm-svn: 203173
2014-03-06 22:21:43 +00:00
Andrea Di Biagio
4586b3b78c [X86] Teach the DAGCombiner how to fold a OR of two shufflevector nodes.
This patch teaches the DAGCombiner how to fold a binary OR between two
shufflevector into a single shuffle vector when possible.

The rules are:
  1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1)
  2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2)

The DAGCombiner can take advantage of the fact that OR is commutative and
compute two possible shuffle masks (Mask1 and Mask2) for the resulting
shuffle node.

Before folding a dag according to either rule 1 or 2, DAGCombiner verifies
that the resulting shuffle mask is legal for the target.
DAGCombiner would firstly try to fold according to 1.; If not possible
then it will try to fold according to 2.
If both Mask1 and Mask2 are illegal then we conservatively don't fold
the OR instruction.

llvm-svn: 203156
2014-03-06 20:19:52 +00:00
Rafael Espindola
d987164eed Always print the implicit .text at the start of an asm file.
Before llvm-mc would print it, but llc was assuming that it would produce
another section changing directive before one was needed. That assumption is
false with inline asm.

Fixes PR19049.

Another option would be to always create the section, but in the asm printer
avoid printing sections changes during initialization. That would work, but
* We do use the fact that llvm-mc prints it in testing. The tests can be changed
  if needed.
* A quick poll on IRC suggest that most developers prefer the implicit .text to
  be printed.

llvm-svn: 203001
2014-03-05 20:09:15 +00:00
Cameron McInally
80fa2d42e5 Lower AVX v4i64->v4i32 truncate to one shuffle.
llvm-svn: 202996
2014-03-05 19:41:16 +00:00
Andrew Trick
bcd20c0c29 Make stackmap machineinstrs clobber the scratch regs too.
Patchpoints already did this. Doing it for stackmaps is a convenience
for the runtime in the event that it needs to scratch register to
patch or perform a runtime call thunk.

Unlike patchpoints, we just assume the AnyRegCC calling
convention. This is the only language and target independent calling
convention specific to stackmaps so makes sense.  Although the calling
convention is not currently used to select the scratch registers.

llvm-svn: 202943
2014-03-05 07:08:16 +00:00
Hans Wennborg
c1cb270dba Check for dynamic allocas and inline asm that clobbers sp before building
selection dag (PR19012)

In X86SelectionDagInfo::EmitTargetCodeForMemcpy we check with MachineFrameInfo
to make sure that ESI isn't used as a base pointer register before we choose to
emit rep movs (which clobbers esi).

The problem is that MachineFrameInfo wouldn't know about dynamic allocas or
inline asm that clobbers the stack pointer until SelectionDAGBuilder has
encountered them.

This patch fixes the problem by checking for such things when building the
FunctionLoweringInfo.

Differential Revision: http://llvm-reviews.chandlerc.com/D2954

llvm-svn: 202930
2014-03-05 02:43:26 +00:00
Elena Demikhovsky
838b163a58 AVX-512: Fixed extract_vector_elt for v8i1 vector
llvm-svn: 202624
2014-03-02 09:19:44 +00:00
Manman Ren
fe531c446c SpillPlacement: fix a bug in iterate.
Inside iterate, we scan backwards then scan forwards in a loop. When iteration
is not zero, the last node was just updated so we can skip it. But when
iteration is zero, we can't skip the last node.

For the testing case, fixing this will save a spill and move register copies
from hot path to cold path.

llvm-svn: 202557
2014-02-28 23:05:31 +00:00
Adam Nemet
0fe89b88ce Test commit
llvm-svn: 202528
2014-02-28 18:44:39 +00:00
Daniel Sanders
98ea718b1d Stop test/CodeGen/X86/v4i32load-crash.ll targeting non-X86-64 targets.
Summary:
Fixes an issue where a test attempts to use -mcpu=x86-64 on non-X86-64 targets.
This triggers an assertion in the MIPS backend since it doesn't know what ABI to
use by default for unrecognized processors.

CC: llvm-commits, rafael

Differential Revision: http://llvm-reviews.chandlerc.com/D2877

llvm-svn: 202369
2014-02-27 09:24:31 +00:00
Andrew Trick
ba61c4e6cf Add a limit to the heuristic that register allocates instructions in local order.
This handles pathological cases in which we see 2x increase in spill
code for large blocks (~50k instructions). I don't have a unit test
for this behavior.

Fixes rdar://16072279.

llvm-svn: 202304
2014-02-26 22:07:26 +00:00
Quentin Colombet
e639a79f72 Lower unsigned vsetcc to psubus in certain cases
The current approach to lower a vsetult is to flip the sign bit of the
operands, swap the operands and then use a (signed) pcmpgt.  psubus (unsigned
saturating subtract) can be used to emulate a vsetult more efficiently:

+    case ISD::SETULT: {
+      // If the comparison is against a constant we can turn this into a
+      // setule.  With psubus, setule does not require a swap.  This is
+      // beneficial because the constant in the register is no longer
+      // destructed as the destination so it can be hoisted out of a loop.

I also enable lowering via psubus in a few other cases where it's clearly
beneficial: setule and setuge if minu/maxu cannot be used.
    
rdar://problem/14338765

Patch by Adam Nemet <anemet@apple.com>.

llvm-svn: 202301
2014-02-26 21:39:12 +00:00
Rafael Espindola
4caf003955 Store a DataLayout in Module.
Now that DataLayout is not a pass, store one in Module.

Since the C API expects to be able to get a char* to the datalayout description,
we have to keep a std::string somewhere. This patch keeps it in Module and also
uses it to represent modules without a DataLayout.

Once DataLayout is mandatory, we should probably move the string to DataLayout
itself since it won't be necessary anymore to represent the special case of a
module without a DataLayout.

llvm-svn: 202190
2014-02-25 20:01:08 +00:00
Elena Demikhovsky
1804845947 AVX-512: Fixed encoding of VPTESTMQ
llvm-svn: 201980
2014-02-23 14:28:35 +00:00
Benjamin Kramer
f8dda2f6b3 Make test more resilient against scheduling decisions.
Should bring the atom buildbots back to life.

llvm-svn: 201951
2014-02-22 20:14:02 +00:00
NAKAMURA Takumi
1f5bffb985 llvm/test/CodeGen/X86/shift-pcmp.ll: Tweak to appease FileCheck. "CHECK-LABEL" doesn't identify labels magically and CHECK-LABEL behaves free from other contexts.
For targeting pecoff, ".def foo" appears before ".short 32".

          .def    foo;
  ...
  .LCPI0_0:
          .short  32
  foo:

CHECK-LABEL seeks not from ".short 32" but from the top of the input.

llvm-svn: 201931
2014-02-22 07:27:04 +00:00
Quentin Colombet
60b53fae3c [DAGCombiner] PCMP* sets its result to all ones or zeros so we can AND with the
shifted mask rather than masking and shifting separately.

The patch adds this transformation to the DAGCombiner:

  (shl (and (setcc:i8v16 ...) N01C) N1C) -> (and (setcc:i8v16 ...) N01C<<N1C)

<rdar://problem/16054492>

Patch by Adam Nemet <anemet@apple.com>

llvm-svn: 201906
2014-02-21 23:42:41 +00:00
Elena Demikhovsky
27104d29fd AVX-512: added a lit test for truncate operation
llvm-svn: 201763
2014-02-20 07:34:13 +00:00
Rafael Espindola
aea6192f20 Add back r201608, r201622, r201624 and r201625
r201608 made llvm corretly handle private globals with MachO. r201622 fixed
a bug in it and r201624 and r201625 were changes for using private linkage,
assuming that llvm would do the right thing.

They all got reverted because r201608 introduced a crash in LTO. This patch
includes a fix for that. The issue was that TargetLoweringObjectFile now has
to be initialized before we can mangle names of private globals. This is
trivially true during the normal codegen pipeline (the asm printer does it),
but LTO has to do it manually.

llvm-svn: 201700
2014-02-19 17:23:20 +00:00
Cameron McInally
7173a45caf Fix AVX512 vector sqrt assembly strings.
llvm-svn: 201681
2014-02-19 15:16:09 +00:00
Daniel Jasper
bf4e7d8ac3 Revert r201622 and r201608.
This causes the LLVMgold plugin to segfault. More information on the
replies to r201608.

llvm-svn: 201669
2014-02-19 12:26:01 +00:00
Rafael Espindola
5381278d2b Avoid an infinite cycle with private linkage and -f{data|function}-sections.
When outputting an object we check its section to find its name, but when
looking for the section with -ffunction-section we look for the symbol name.

Break the loop by requesting a name with the private prefix when constructing
the section name. This matches the behavior before r201608.

llvm-svn: 201622
2014-02-19 01:28:30 +00:00
Rafael Espindola
d39a573c72 Fix PR18743.
The IR
@foo = private constant i32 42

is valid, but before this patch we would produce an invalid MachO from it. It
was invalid because it would use an L label in a section where the liker needs
the labels in order to atomize it.

One way of fixing it would be to just reject this IR in the backend, but that
would not be very front end friendly.

What this patch does is use an 'l' prefix in sections that we know the linker
requires symbols for atomizing them. This allows frontends to just use
private and not worry about which sections they go to or how the linker handles
them.

One small issue with this strategy is that now a symbol name depends on the
section, which is not available before codegen. This is not a problem in
practice. The reason is that it only happens with private linkage, which will
be ignored by the non codegen users (llvm-nm and llvm-ar).

llvm-svn: 201608
2014-02-18 22:24:57 +00:00
Tim Northover
448249fd73 X86: use vpsllvd (& friends) for 16-bit shifts on Haswell
llvm-svn: 201558
2014-02-18 11:15:32 +00:00
Elena Demikhovsky
0e85630ee2 AVX-512: implemented zext fron i1 to i16
llvm-svn: 201502
2014-02-17 07:29:33 +00:00
Elena Demikhovsky
2cdad2b3d4 AVX-512: simpyfied BUILD_VECTOR for masks; fixed cmp/test sequence
llvm-svn: 201487
2014-02-16 11:34:23 +00:00
Nico Rieck
02b2fbee0d Fix broken CHECK lines
llvm-svn: 201479
2014-02-16 07:31:05 +00:00
Rafael Espindola
753d7ba297 Use __literal16. It has been supported by the linker since 2005.
llvm-svn: 201365
2014-02-13 23:16:11 +00:00
Rafael Espindola
367df40fdc .file is only available on ELF, use a triple instead of -march.
llvm-svn: 201337
2014-02-13 15:38:16 +00:00
Daniel Sanders
7a3a160940 Re-commit: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.

The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.

All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.

Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
  (fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
  (should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
  (should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
  to be enabled regardless of default setting or -no-integrated-as.
  (should fix SystemZ buildbots)

Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2686

llvm-svn: 201333
2014-02-13 14:44:26 +00:00
Juergen Ributzka
19fe55f203 [DAG] Fix the recognition of opaque constants in the SelectionDAGBuilder.
This fix checks the original LLVM IR node to identify opaque constants by
looking for the bitcast-constant pattern. Originally we looked at the generated
SDNode, but this might lead to incorrect results. The SDNode could have been
generated by an constant expression that was folded to a constant.

This fixes <rdar://problem/16050719>

llvm-svn: 201291
2014-02-13 04:19:26 +00:00
Andrea Di Biagio
b682c0a265 [X86] Teach the backend how to lower vector shift left into multiply rather than scalarizing it.
Instead of expanding a packed shift into a sequence of scalar shifts,
the backend now tries (when possible) to convert the vector shift into a
vector multiply.

Before this change, a shift of a MVT::v8i16 vector by a
build_vector of constants was always scalarized into a long sequence of "vector
extracts + scalar shifts + vector insert".
With this change, if there is SSE2 support, we emit a single vector multiply.

This change also affects SSE4.1, AVX, AVX2 shifts:
 - A shift of a MVT::v4i32 vector by a build_vector of non uniform constants
is now lowered when possible into a single SSE4.1 vector multiply.
 - Packed v16i16 shift left by constant build_vector are now expanded when
possible into a single AVX2 vpmullw.
This change also improves the lowering of AVX512f vector shifts.

Added test CodeGen/X86/vec_shift6.ll with some code examples that are affected
by this change.

llvm-svn: 201271
2014-02-12 23:42:28 +00:00
Daniel Sanders
656c4d360b Revert r201237+r201238: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
It introduced multiple test failures in the buildbots.

llvm-svn: 201241
2014-02-12 15:39:20 +00:00
Daniel Sanders
e647d6441b Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.

The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.

All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.

Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2686

llvm-svn: 201237
2014-02-12 14:44:54 +00:00
David Blaikie
4af7429b68 DebugInfo: Remove dependence on file numbering in the line table.
These tests were unnecessarily sensitive to the presence and ordering of
elements in the line table file_names list which will break on a future
change I'm working on.

llvm-svn: 201185
2014-02-11 21:46:46 +00:00
Robert Lougher
834d664400 Teach the DAGCombiner how to fold concat_vector nodes when the input is two
BUILD_VECTOR nodes, e.g.:

(concat_vectors (BUILD_VECTOR a1, a2, a3, a4), (BUILD_VECTOR b1, b2, b3, b4))
->
(BUILD_VECTOR a1, a2, a3, a4, b1, b2, b3, b4)

This fixes an issue with AVX, where a sequence was not recognized as a 256-bit
vbroadcast due to the concat_vectors.

llvm-svn: 201158
2014-02-11 15:42:46 +00:00
Elena Demikhovsky
31d965117b AVX: fixed a bug in LowerVECTOR_SHUFFLE
llvm-svn: 201140
2014-02-11 10:21:53 +00:00
Elena Demikhovsky
ac4dfca982 AVX-512: Optimized BUILD_VECTOR pattern;
fixed encoding of VEXTRACTPS instruction.

llvm-svn: 201134
2014-02-11 07:25:59 +00:00