There is nothing special about quotes and newlines from the object
file point of view, only the assembler has to worry about expanding
the \n and \".
This patch then removes the special handling from the Mangler.
llvm-svn: 194667
Accepting quotes is a property of an assembler, not of an object file. For
example, ELF can support any names for sections and symbols, but the gnu
assembler only accepts quotes in some contexts and llvm-mc in a few more.
LLVM should not produce different symbols based on a guess about which assembler
will be reading the code it is printing.
llvm-svn: 194575
This patch reapplies r193676 with an additional fix for the Hexagon backend. The
SystemZ backend has already been fixed by r194148.
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.
This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. Now the type
legalizer will split both VSELECT and SETCC.
This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.
Reviewed by Nadav
llvm-svn: 194542
We already know how to fold a reload from a frameindex without
analyzing the load instruction. Generalize this to handle any
frameindex load. This streamlines the logic for rematerializing loads
from stack arguments. As a side effect, it allows stackmaps to record
a stack argument location without spilling it.
Verified no effect on codegen for llvm test-suite.
llvm-svn: 194497
Fixes <rdar://15432754> [JS] Assertion: "Folded a def to a non-store!"
The primary purpose of anyregcc is to prevent a patchpoint's call
arguments and return value from being spilled. They must be available
in a register, although the calling convention does not pin the
register. It's up to the front end to avoid using this convention for
calls with more arguments than allocatable registers.
llvm-svn: 194428
This patch moves the jump address materialization inside the noop slide. This
enables patching of the materialization itself or its complete removal. This
patch also adds the ability to define scratch registers that can be used safely
by the code called from the patchpoint intrinsic. At least one scratch register
is required, because that one is used for the materialization of the jump
address. This patch depends on D2009.
Differential Revision: http://llvm-reviews.chandlerc.com/D2074
Reviewed by Andy
llvm-svn: 194306
The idea of the AnyReg Calling Convention is to provide the call arguments in
registers, but not to force them to be placed in a paticular order into a
specified set of registers. Instead it is up tp the register allocator to assign
any register as it sees fit. The same applies to the return value (if
applicable).
Differential Revision: http://llvm-reviews.chandlerc.com/D2009
Reviewed by Andy
llvm-svn: 194293
MorphNodeTo is not safe to call during DAG building. It eagerly
deletes dependent DAG nodes which invalidates the NodeMap. We could
expose a safe interface for morphing nodes, but I don't think it's
worth it. Just create a new MachineNode and replaceAllUsesWith.
My understaning of the SD design has been that we want to support
early target opcode selection. That isn't very well supported, but
generally works. It seems reasonable to rely on this feature even if
it isn't widely used.
llvm-svn: 194102
When an extend more than doubles the size of the elements (e.g., a zext
from v16i8 to v16i32), the normal legalization method of splitting the
vectors will run into problems as by the time the destination vector is
legal, the source vector is illegal. The end result is the operation
often becoming scalarized, with the typical horrible performance. For
example, on x86_64, the simple input of:
define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind {
%tmp = zext <16 x i8> %a to <16 x i32>
store <16 x i32> %tmp, <16 x i32>*%p
ret void
}
Generates:
.section __TEXT,__text,regular,pure_instructions
.section __TEXT,__const
.align 5
LCPI0_0:
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.section __TEXT,__text,regular,pure_instructions
.globl _bar
.align 4, 0x90
_bar:
vpunpckhbw %xmm0, %xmm0, %xmm1
vpunpckhwd %xmm0, %xmm1, %xmm2
vpmovzxwd %xmm1, %xmm1
vinsertf128 $1, %xmm2, %ymm1, %ymm1
vmovaps LCPI0_0(%rip), %ymm2
vandps %ymm2, %ymm1, %ymm1
vpmovzxbw %xmm0, %xmm3
vpunpckhwd %xmm0, %xmm3, %xmm3
vpmovzxbd %xmm0, %xmm0
vinsertf128 $1, %xmm3, %ymm0, %ymm0
vandps %ymm2, %ymm0, %ymm0
vmovaps %ymm0, (%rdi)
vmovaps %ymm1, 32(%rdi)
vzeroupper
ret
So instead we can check if there are legal types that enable us to split
more cleverly when the input vector is already legal such that we don't
turn it into an illegal type. If the extend is such that it's more than
doubling the size of the input we check if
- the number of vector elements is even,
- the source type is legal,
- the type of a split source is illegal,
- the type of an extended (by doubling element size) source is legal, and
- the type of that extended source when split is legal.
If the conditions are met, instead of just splitting both the
destination and the source types, we create an extend that only goes up
one "step" (doubling the element width), and the continue legalizing the
rest of the operation normally. The result is that this operates as a
new, more effecient, termination condition for the loop of "split the
operation until the destination type is legal."
With this change, the above example now compiles to:
_bar:
vpxor %xmm1, %xmm1, %xmm1
vpunpcklbw %xmm1, %xmm0, %xmm2
vpunpckhwd %xmm1, %xmm2, %xmm3
vpunpcklwd %xmm1, %xmm2, %xmm2
vinsertf128 $1, %xmm3, %ymm2, %ymm2
vpunpckhbw %xmm1, %xmm0, %xmm0
vpunpckhwd %xmm1, %xmm0, %xmm3
vpunpcklwd %xmm1, %xmm0, %xmm0
vinsertf128 $1, %xmm3, %ymm0, %ymm0
vmovaps %ymm0, 32(%rdi)
vmovaps %ymm2, (%rdi)
vzeroupper
ret
This generalizes a custom lowering that was added a while back to the
ARM backend. That lowering is no longer necessary, and is removed. The
testcases for it, however, provide excellent ARM tests for this change
and so remain.
rdar://14735100
llvm-svn: 193727
With this patch llvm produces a weak_def_can_be_hidden for linkonce_odr
if they are also unnamed_addr or don't have their address taken.
There is not a lot of documentation about .weak_def_can_be_hidden, but
from the old discussion about linkonce_odr_auto_hide and the name of
the directive this looks correct: these symbols can be hidden.
Testing this with the ld64 in Xcode 5 linking clang reduces the number of
exported symbols from 21053 to 19049.
llvm-svn: 193718
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.
This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. This mask has
usually the same size as the VSELECT return type (except for Intel KNL). Now the
type legalizer will split both VSELECT and SETCC.
This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.
Reviewed by Nadav
llvm-svn: 193676
On sandy bridge (PR17654) we now get
vpxor %xmm1, %xmm1, %xmm1
vpunpckhbw %xmm1, %xmm0, %xmm2
vpunpcklbw %xmm1, %xmm0, %xmm0
vinsertf128 $1, %xmm2, %ymm0, %ymm0
On haswell it's a simple
vpmovzxbw %xmm0, %ymm0
There is a maze of duplicated and dead transforms and patterns in this
area. Remove the dead custom lowering of zext v8i16 to v8i32, that's
already handled by LowerAVXExtend.
llvm-svn: 193262
- Skip instructions added in prolog. For specific targets, prolog may
insert helper function calls (e.g. _chkstk will be called when
there're more than 4K bytes allocated on stack). However, these
helpers don't use/def YMM/XMM registers.
llvm-svn: 193261
the instruction defenitions and ISEL reflect this.
Prior to this patch these instructions took an i32i8imm, and the high bits were
dropped during encoding. This led to incorrect behavior for shifts by
immediates higher than 255. This patch fixes that issue by detecting large
immediate shifts and returning constant zero (for logical shifts) or capping
the shift amount at an encodable value (for arithmetic shifts).
Fixes <rdar://problem/14968098>
llvm-svn: 193096
This ensures that the prefix data is treated as part of the function for
the purpose of debug info. This provides a better debugging experience,
among other things by allowing a debug info client to correctly look up
a function in debug info given a function pointer.
llvm-svn: 193042
This caused the clang-native-mingw32-win7 buildbot to break.
The assembler was complaining about the following lines that were showing up
in the asm for CrashRecoveryContext.cpp:
movl $"__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4", 4(%eax)
calll "_AddVectoredExceptionHandler@8"
.def "__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4";
"__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4":
calll "_RemoveVectoredExceptionHandler@4"
Reverting for now.
llvm-svn: 192940
When canonicalizing dags according to the rule
(shl (zext (shr X, c1) ), c1) ==> (zext (shl (shr X, c1), c1))
remember to add the new shl dag to the DAGCombiner worklist of nodes.
If we don't explicitly add it to the worklist of nodes to visit, we
may not trigger later on the rule that folds the shift left + logical
shift right into a AND instruction with bitmask.
llvm-svn: 192883
Consider the following:
typedef unsigned short ushort4U __attribute__((ext_vector_type(4),
aligned(2)));
typedef unsigned short ushort4 __attribute__((ext_vector_type(4)));
typedef unsigned short ushort8 __attribute__((ext_vector_type(8)));
typedef int int4 __attribute__((ext_vector_type(4)));
int4 __bbase_cvt_int(ushort4 v) {
ushort8 a;
a.lo = v;
return _mm_cvtepu16_epi32(a);
}
This generates the, not unreasonable, IR:
define <4 x i32> @foo0(double %v.coerce) nounwind ssp {
%tmp = bitcast double %v.coerce to <4 x i16>
%tmp1 = shufflevector <4 x i16> %tmp, <4 x i16> undef, <8 x i32> <i32
%0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
%tmp2 = tail call <4 x i32> @llvm.x86.sse41.pmovzxwd(<8 x i16> %tmp1)
ret <4 x i32> %tmp2
}
The problem is when type legalization gets hold of the v4i16. It
legalizes that by spilling to the stack, then doing a zero-extending
load. Things go even more silly from there, ending up with something
like:
_foo0:
movsd %xmm0, -8(%rsp) <== Spill to the stack.
movq -8(%rsp), %xmm0 <== Reload it right back out.
pmovzxwd %xmm0, %xmm1 <== Here's what we actually asked for.
pblendw $1, %xmm1, %xmm0 <== We don't need this at all
pmovzxwd %xmm0, %xmm0 <== We already did this
ret
The v8i8 to v8i16 zext intrinsic gives even worse results, with two
table lookups via pshufb instructions(!!).
To avoid all that, we can move the bitcasting until after we've formed
the wider (legal) vector type. Then our normal codegen flows along
nicely and we get the expected:
_foo0:
pmovzxwd %xmm0, %xmm0
ret
rdar://15245794
llvm-svn: 192866
The reason this got reverted was that the @feat.00 symbol which was emitted
for every TU became quoted, and on cygwin/mingw we use the gas assembler which
couldn't handle the quotes.
This commit fixes the problem by only emitting @feat.00 for win32, where we use
clang -cc1as to assemble. gas would just drop this symbol anyway, so there is no
loss there.
With @feat.00 gone, there shouldn't be quoted symbols showing up on cygwin since
it uses the Itanium ABI, which doesn't put these funny characters in symbols.
> Because of win32 mangling, we produce symbol and section names with
> funny characters in them, most notably @ characters.
>
> MC would choke on trying to parse its own assembly output. This patch addresses
> that by:
>
> - Making @ trigger quoting of symbol names
> - Also quote section names in the same way
> - Just parse section names like other identifiers (to allow for quotes)
> - Don't assume @ signifies a symbol variant if it is in a string.
llvm-svn: 192859
bulldozer and piledriver. Support for the instruction itself seems to have
already been added in r178040.
Differential Revision: http://llvm-reviews.chandlerc.com/D1933
llvm-svn: 192828
This happens e.g. with <2 x i64> -1 on x86_32. It cannot be generated directly
because i64 is illegal. It would be nice if getNOT would handle this
transparently, but I don't see a way to generate a legal constant there right
now. Fixes PR17487.
llvm-svn: 192795