Summary:
This fixes PR27617.
Bug description: The SLPVectorizer asserts on encountering GEPs with different index types, such as i8 and i64.
The patch includes a simple relaxation of the assert to allow constants being of different types, along with a regression test that will provoke the unrelaxed assert.
Reviewers: nadav, mzolotukhin
Subscribers: JesperAntonsson, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D20685
Patch by Jesper Antonsson!
llvm-svn: 272206
Summary:
Consider the following diamond CFG:
A
/ \
B C
\/
D
Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81*20%=16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81*25%=20.25, and the desired ABDC layout will be generated.
Reviewers: djasper, davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20989
llvm-svn: 272203
Summary:
Now DISubroutineType has a 'cc' field which should be a DW_CC_ enum. If
it is present and non-zero, the backend will emit it as a
DW_AT_calling_convention attribute. On the CodeView side, we translate
it to the appropriate enum for the LF_PROCEDURE record.
I added a new LLVM vendor specific enum to the list of DWARF calling
conventions. DWARF does not appear to attempt to standardize these, so I
assume it's OK to do this until we coordinate with GCC on how to emit
vectorcall convention functions.
Reviewers: dexonsmith, majnemer, aaboud, amccarth
Subscribers: mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D21114
llvm-svn: 272197
with user specified count has been applied.
Summary:
Previously SetLoopAlreadyUnrolled() set the disable pragma only if
there was some loop metadata.
Now it set the pragma in all cases. This helps to prevent multiple
unroll when -unroll-count=N is given.
Reviewers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D20765
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 272195
Again, the Microsoft linker does not like empty substreams.
We still emit an empty string table if CodeView is enabled, but that
doesn't cause problems because it always contains at least one null
byte.
llvm-svn: 272183
Absence of may-unwind calls is not enough to guarantee that a
UB-generating use of an add-rec poison in the loop latch will actually
cause UB. We also need to guard against calls that terminate the thread
or infinite loop themselves.
This partially addresses PR28012.
llvm-svn: 272181
The worklist algorithm introduced in rL271151 didn't check to see if the
direct users of the post-inc add recurrence propagates poison. This
change fixes the problem and makes the code structure more obvious.
Note for release managers: correctness wise, this bug wasn't a
regression introduced by rL271151 -- the behavior of SCEV around
post-inc add recurrences was strictly improved (in terms of correctness)
in rL271151.
llvm-svn: 272179
Teach AArch64RegisterBankInfo that G_OR can be mapped on either GPR or
FPR for 64-bit or 32-bit values.
Add test cases demonstrating how this information is used to coalesce a
computation on a single register bank.
llvm-svn: 272170
The MSR instructions can write to the CPSR, but we did not model this
fact, so we could emit them in the middle of IT blocks, changing the
condition flags for later instructions in the block.
The tests use two calls to llvm.write_register.i32 because it is valid
to use these instructions at the end of an IT block, which if conversion
does do in some cases. With two calls, the first clobbers the flags, so
a branch has to be used to make the second one conditional.
Differential Revision: http://reviews.llvm.org/D21139
llvm-svn: 272154
Changes since the initial commit:
- Use echo instead of printf. This should side-step the character
escaping issues on Windows.
Differential Revision: http://reviews.llvm.org/D20980
llvm-svn: 272068
Summary:
The presence of this attribute indicates that VGPR outputs should be computed
in whole quad mode. This will be used by Mesa for prolog pixel shaders, so
that derivatives can be taken of shader inputs computed by the prolog, fixing
a bug.
The generated code could certainly be improved: if a prolog pixel shader is
used (which isn't common in modern OpenGL - they're used for gl_Color, polygon
stipples, and forcing per-sample interpolation), Mesa will use this attribute
unconditionally, because it has to be conservative. So WQM may be used in the
prolog when it isn't really needed, and furthermore a silly back-and-forth
switch is likely to happen at the boundary between prolog and main shader
parts.
Fixing this is a bit involved: we'd first have to add a mechanism by which
LLVM writes the WQM-related input requirements to the main shader part binary,
and then Mesa specializes the prolog part accordingly. At that point, we may
as well just compile a monolithic shader...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D20839
llvm-svn: 272063
Summary:
This patch is adding support for the MSVC buffer security check implementation
The buffer security check is turned on with the '/GS' compiler switch.
* https://msdn.microsoft.com/en-us/library/8dbf701c.aspx
* To be added to clang here: http://reviews.llvm.org/D20347
Some overview of buffer security check feature and implementation:
* https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx
* http://www.ksyash.com/2011/01/buffer-overflow-protection-3/
* http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html
For the following example:
```
int example(int offset, int index) {
char buffer[10];
memset(buffer, 0xCC, index);
return buffer[index];
}
```
The MSVC compiler is adding these instructions to perform stack integrity check:
```
push ebp
mov ebp,esp
sub esp,50h
[1] mov eax,dword ptr [__security_cookie (01068024h)]
[2] xor eax,ebp
[3] mov dword ptr [ebp-4],eax
push ebx
push esi
push edi
mov eax,dword ptr [index]
push eax
push 0CCh
lea ecx,[buffer]
push ecx
call _memset (010610B9h)
add esp,0Ch
mov eax,dword ptr [index]
movsx eax,byte ptr buffer[eax]
pop edi
pop esi
pop ebx
[4] mov ecx,dword ptr [ebp-4]
[5] xor ecx,ebp
[6] call @__security_check_cookie@4 (01061276h)
mov esp,ebp
pop ebp
ret
```
The instrumentation above is:
* [1] is loading the global security canary,
* [3] is storing the local computed ([2]) canary to the guard slot,
* [4] is loading the guard slot and ([5]) re-compute the global canary,
* [6] is validating the resulting canary with the '__security_check_cookie' and performs error handling.
Overview of the current stack-protection implementation:
* lib/CodeGen/StackProtector.cpp
* There is a default stack-protection implementation applied on intermediate representation.
* The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie.
* An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast).
* Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and errors handling.
* Guard manipulation and comparison are added directly to the intermediate representation.
* lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
* lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
* There is an implementation that adds instrumentation during instruction selection (for better handling of sibbling calls).
* see long comment above 'class StackProtectorDescriptor' declaration.
* The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr).
* 'getSDagStackGuard' returns the appropriate stack guard (security cookie)
* The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'.
* include/llvm/Target/TargetLowering.h
* Contains function to retrieve the default Guard 'Value'; should be overriden by each target to select which implementation is used and provide Guard 'Value'.
* lib/Target/X86/X86ISelLowering.cpp
* Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm.
Function-based Instrumentation:
* The MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instructions.
* To support function-based instrumentation, this patch is
* adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h),
* If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue.
* modifying (SelectionDAGISel.cpp) do avoid producing basic blocks used for inline instrumentation,
* generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp),
* if FastISEL (not SelectionDAG), using the fallback which rely on the same function-based implemented over intermediate representation (StackProtector.cpp).
Modifications
* adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp)
* adding support function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h)
Results
* IR generated instrumentation:
```
clang-cl /GS test.cc /Od /c -mllvm -print-isel-input
```
```
*** Final LLVM Code input to ISel ***
; Function Attrs: nounwind sspstrong
define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 {
entry:
%StackGuardSlot = alloca i8* <<<-- Allocated guard slot
%0 = call i8* @llvm.stackguard() <<<-- Loading Stack Guard value
call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot) <<<-- Prologue intrinsic call (store to Guard slot)
%index.addr = alloca i32, align 4
%offset.addr = alloca i32, align 4
%buffer = alloca [10 x i8], align 1
store i32 %index, i32* %index.addr, align 4
store i32 %offset, i32* %offset.addr, align 4
%arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0
%1 = load i32, i32* %index.addr, align 4
call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false)
%2 = load i32, i32* %index.addr, align 4
%arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2
%3 = load i8, i8* %arrayidx, align 1
%conv = sext i8 %3 to i32
%4 = load volatile i8*, i8** %StackGuardSlot <<<-- Loading Guard slot
call void @__security_check_cookie(i8* %4) <<<-- Epilogue function-based check
ret i32 %conv
}
```
* SelectionDAG generated instrumentation:
```
clang-cl /GS test.cc /O1 /c /FA
```
```
"?example@@YAHHH@Z": # @"\01?example@@YAHHH@Z"
# BB#0: # %entry
pushl %esi
subl $16, %esp
movl ___security_cookie, %eax <<<-- Loading Stack Guard value
movl 28(%esp), %esi
movl %eax, 12(%esp) <<<-- Store to Guard slot
leal 2(%esp), %eax
pushl %esi
pushl $204
pushl %eax
calll _memset
addl $12, %esp
movsbl 2(%esp,%esi), %esi
movl 12(%esp), %ecx <<<-- Loading Guard slot
calll @__security_check_cookie@4 <<<-- Epilogue function-based check
movl %esi, %eax
addl $16, %esp
popl %esi
retl
```
Reviewers: kcc, pcc, eugenis, rnk
Subscribers: majnemer, llvm-commits, hans, thakis, rnk
Differential Revision: http://reviews.llvm.org/D20346
llvm-svn: 272053
This patch does a few things:
- Unifies AttrAll and AttrUnknown (since they were used for more or less
the same purpose anyway).
- Introduces AttrEscaped, an attribute that notes that a value escapes
our analysis for a given set, but not that an unknown value flows into
said set.
- Removes functions that take bit indices, since we also had functions
that took bitsets, and the use of both (with similar names) was
unclear and bug-prone.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21000
llvm-svn: 272040
These instructions end in "S" but are not flag-setting, so they need including
in the list of special cases in the assembly parser.
Differential Revision: http://reviews.llvm.org/D21077
llvm-svn: 272015
Currently the only way to use the (V)MOVNTDQA nontemporal vector loads instructions is through the int_x86_sse41_movntdqa style builtins.
This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch.
We currently still fold nontemporal loads into suitable instructions, we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that its dealing with a nontemporal memory transaction.
There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets.
Differential Review: http://reviews.llvm.org/D20965
llvm-svn: 272010
Using an LLVM IR aggregate return value type containing three
or more integer values causes an abort in the fast isel pass.
This patch adds two more registers to RetCC_PPC64_ELF_FIS to
allow returning up to four integers with fast isel, just the
same as is currently supported with regular isel (RetCC_PPC).
This is needed for Swift and (possibly) other non-clang frontends.
Fixes PR26190.
llvm-svn: 272005
We currently only combine to blend+zero if the target value type has 8 elements or less, but this was missing a lot of cases where the combined mask had been widened.
This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases.
llvm-svn: 272003
A Thumb-2 post-indexed LDR instruction such as:
ldr.w r0, [r1], #4
Can be rewritten as:
ldm.n r1!, {r0}
LDMs can be more expensive than LDRs on some cores, so this has been enabled only in minsize mode.
llvm-svn: 272002
If we have an LDM that uses only low registers and doesn't write to its base register:
ldm.w r0, {r1, r2, r3}
And that base register is dead after the LDM, then we can convert it to writeback form and use a narrow encoding:
ldm.n r0!, {r1, r2, r3}
Obviously, this introduces a new register write and so can cause WAW hazards, so I've enabled it only in minsize mode. This is a code size trick that ARM Compiler 5 ("armcc") does that we don't.
llvm-svn: 272000
SHT_GNU_verneed (.gnu.version_r) is a version dependency section.
It was the last symbol versioning relative section that was not dumped,
now it is.
Differential revision: http://reviews.llvm.org/D21024
llvm-svn: 271998
The Thumb2 conditional branch B<cond>.W has a different encoding (T3)
to the unconditional branch B.W (T4) as it needs to record <cond>.
As the encoding is different the B<cond>.W is given a different
relocation type.
ELF for the ARM Architecture 4.6.1.6 (Table-13) states that
R_ARM_THM_JUMP19 should be used for B<cond>.W. At present the
MC layer is using the R_ARM_THM_JUMP24 from B.W.
This change makes B<cond>.W use R_ARM_THM_JUMP19 and alters the
existing test that checks for R_ARM_THM_JUMP24 to expect
R_ARM_THM_JUMP19.
llvm-svn: 271997
Unlike native shifts, the AVX2 per-element shift instructions VPSRAV/VPSRLV/VPSLLV handle out of range shift values (logical shifts set the result to zero, arithmetic shifts splat the sign bit).
If the shift amount is constant we can sometimes convert these instructions to native shifts:
1 - if all shift amounts are in range then the conversion is trivial.
2 - out of range arithmetic shifts can be clamped to the (bitwidth - 1) (a legal shift amount) before conversion.
3 - logical shifts just return zero if all elements have out of range shift amounts.
In addition, UNDEF shift amounts are handled - either as an UNDEF shift amount in a native shift or as an UNDEF in the logical 'all out of range' zero constant special case for logical shifts.
Differential Revision: http://reviews.llvm.org/D19675
llvm-svn: 271996
This patch adds support for folding undef/zero/constant inputs to MOVMSK instructions.
The SSE/AVX versions can be fully folded, but the MMX version can only handle undef inputs.
Differential Revision: http://reviews.llvm.org/D20998
llvm-svn: 271990
TLS access requires an offset from the TLS index. The index itself is the
section-relative distance of the symbol. For ARM, the relevant relocation
(IMAGE_REL_ARM_SECREL) is applied as a constant. This means that the value may
not be an immediate and must be lowered into a constant pool. This offset will
not be base relocated. We were previously emitting the actual address of the
symbol which would be base relocated and would therefore be the vaue offset by
the ImageBase + TLS Offset.
llvm-svn: 271974
This reverts commit r271962 and reinstantes r271957.
MSVC's linker doesn't appear to like it if you have an empty symbol
substream, so only open a symbol substream if we're going to emit
something about globals into it.
Makes check-asan pass.
llvm-svn: 271965
scalarizePHI only looked for phis that have exactly two uses - the "latch"
use, and an extract. Unfortunately, we can not assume all equivalent extracts
are CSE'd, since InstCombine itself may create an extract which is a duplicate
of an existing one. This extends it to handle several distinct extracts from
the same index.
This should fix at least some of the performance regressions from PR27988.
Differential Revision: http://reviews.llvm.org/D20983
llvm-svn: 271961
Changes since the initial commit:
- Normalize file paths read from the file to prevent Windows path
separators from escaping parts of the path.
- Since we need to store the normalized file paths in WeightedFile,
don't do tricky things to keep the source MemoryBuffer alive.
- Don't use list-initialization for a std::string in WeightedFile.
Differential Revision: http://reviews.llvm.org/D20980
llvm-svn: 271953
Changes since the initial commit:
- Normalize file paths read from the file to prevent Windows path
separators from escaping parts of the path.
- Since we need to store the normalized file paths in WeightedFile,
don't do tricky things to keep the source MemoryBuffer alive.
Differential Revision: http://reviews.llvm.org/D20980
llvm-svn: 271949
This is the simplest possible patch to get some kind of YAML
output. All it dumps is the MSF header fields so that in
theory an empty MSF file could be reconstructed.
Reviewed By: ruiu, majnemer
Differential Revision: http://reviews.llvm.org/D20971
llvm-svn: 271939
In some cases, when simplifying with SCEV, we might consider pointer values as
just usual integer values. Thus, we might get a different type from what we
had originally in the map of simplified values, and hence we need to check
types before operating on the values.
This fixes PR28015.
llvm-svn: 271931
Summary:
Fix LSRInstance::HoistInsertPosition() to check the original insert
position block first for a canonical insertion point that is dominated
by all inputs. This leads to SCEV being able to reuse more instructions
since it currently tracks the instructions it creates for reuse by
keeping a table of <Value, insert point> pairs.
Originally reviewed in http://reviews.llvm.org/D18001
Reviewers: atrick
Subscribers: llvm-commits, mzolotukhin, mcrosier
Differential Revision: http://reviews.llvm.org/D18480
llvm-svn: 271929
The data strucutre in the new FPO stream is described in the
PE/COFF spec. There is one record per function if frame pointer
is omitted.
Differential Revision: http://reviews.llvm.org/D20999
llvm-svn: 271926
The code layout that TailMerging (inside BranchFolding) works on is not the
final layout optimized based on the branch probability. Generally, after
BlockPlacement, many new merging opportunities emerge.
This patch calls Tail Merging after MBP and calls MBP again if Tail Merging
merges anything.
Differential Revision: http://reviews.llvm.org/D20276
llvm-svn: 271925
In r271810 ( http://reviews.llvm.org/rL271810 ), I loosened the check
above this to work for any Constant rather than ConstantInt. AFAICT,
that part makes sense if we can determine that the shrunken/extended
constant remained equal. But it doesn't make sense for this later
transform where we assume that the constant DID change.
This could assert for a ConstantExpr:
https://llvm.org/bugs/show_bug.cgi?id=28011
And it could be wrong for a vector as shown in the added regression test.
llvm-svn: 271908
Another step for unification llvm assembler/disassembler with sp3.
Besides, CodeGen output is a bit improved, thus changes in CodeGen tests.
Assembler/Disassembler tests updated/added.
Differential Revision: http://reviews.llvm.org/D20796
llvm-svn: 271900
Summary:
This hasn't been caught before because it requires noalias or similarly
strong alias analysis to actually reproduce.
Fixes http://llvm.org/PR27952 .
Reviewers: hfinkel, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20944
llvm-svn: 271858
Since FoldOpIntoPhi speculates the binary operation to potentially each
of the predecessors of the PHI node (pulling it out of arbitrary control
dependence in the process), we can FoldOpIntoPhi only if we know the
operation doesn't have UB.
This also brings up an interesting profitability question -- the way it
is written today, commonIRemTransforms will hoist out work from
dynamically dead code into code that will execute at runtime. Perhaps
that isn't the best canonicalization?
Fixes PR27968.
llvm-svn: 271857
Summary:
There are some rough corners, since the new pass manager doesn't have
(as far as I can tell) LoopSimplify and LCSSA, so I've updated the
tests to run them separately in the old pass manager in the lit tests.
We also don't have an equivalent for AU.setPreservesCFG() in the new
pass manager, so I've left a FIXME.
Reviewers: bogner, chandlerc, davide
Subscribers: sanjoy, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20783
llvm-svn: 271846
A basic block could contain:
%cp = cleanuppad []
cleanupret from %cp unwind to caller
This basic block is empty and is thus a candidate for removal. However,
there can be other uses of %cp outside of this basic block. This is
only possible in unreachable blocks.
Make our transform more correct by checking that the pad has a single
user before removing the BB.
This fixes PR28005.
llvm-svn: 271816
Windows itanium is nearly identical to windows-msvc (MS ABI for C, itanium for
C++). Enable the TLS support for the target similar to the MSVC model.
llvm-svn: 271797
The AVX2 v16i16 shift lowering works by unpacking to 2 x v8i32, performing the shift and then truncating the result.
The unpacking is used to place the values in the upper 16-bits so that we can correctly sign-extend for SRA shifts. Unfortunately we weren't ensuring that the lower 16-bits were zero to ensure that SHL correctly shifts in zero bits.
llvm-svn: 271796
Add the MMX implementation to the SimplifyDemandedUseBits SSE/AVX MOVMSK support added in D19614
Requires a minor tweak as llvm.x86.mmx.pmovmskb takes a x86_mmx argument - so we have to be explicit about the implied v8i8 vector type.
llvm-svn: 271789
Original commit message:
[sancov] Run sancov tests on more platforms
The only tests that need to be run on Linux are the ones that use C++
demangling. I'm assuming they will fail on Mac, since __cxa_demangle
there won't handle the non-double-underscore prefixed mangled names.
llvm-svn: 271763
and/or tests aren't working on Windows currently.
There seems to be some problem with quoting the file paths. I don't
understand the test structure here or the code well enough to try to
come up with a way to correctly handle paths with back slashes in them,
and this has caused the Windows builds to be failing for 7 hours now, so
I'm reverting the whole thing to bring them back to life. Sorry for the
disruption, but a couple of these were bug fixes anyways that can be
folded into a fresh commit.
Reverts the following patches:
r271756: Clean up the way we create the input filenames buffer (NFC)
r271748: Fix use-after-free from discarded MemoryBuffer (NFC)
r271710: Fix option description (NFC)
r271709: Add option to ingest filepaths from a file
llvm-svn: 271760
Summary:
Adds an option -esan-assume-intra-cache-line which causes esan to assume
that a single memory access touches just one cache line, even if it is not
aligned, for better performance at a potential accuracy cost. Experiments
show that the performance difference can be 2x or more, and accuracy loss
is typically negligible, so we turn this on by default. This currently
applies just to the working set tool.
Reviewers: aizatsky
Subscribers: vitalybuka, zhaoqin, kcc, eugenis, llvm-commits
Differential Revision: http://reviews.llvm.org/D20978
llvm-svn: 271743
Summary:
Previously we would try to load PDBs for every PE executable we tried to
symbolize. If that failed, we would fall back to DWARF. If there wasn't
any DWARF, we'd print mostly useless symbol information using the export
table.
With this change, we only try to load PDBs for executables that claim to
have them. If that fails, we can now print an error rather than falling
back silently. This should make it a lot easier to diagnose and fix
common symbolization issues, such as not having DIA or not having a PDB.
Reviewers: zturner, eugenis
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20982
llvm-svn: 271725
Under emscripten, C code can take the address of a function implemented
in Javascript (which is exposed via an import in wasm). Because imports
do not have linear memory address in wasm, we need to generate a thunk
to be the target of the indirect call; it call the import directly.
To make this possible, LLVM needs to emit the type signatures for these
functions, because they may not be called directly or referred to other
than where the address is taken.
This uses s new .s directive (.functype) which specifies the signature.
Differential Revision: http://reviews.llvm.org/D20891
Re-apply r271599 but instead of bailing with an error when a declared
function has multiple returns, replace it with a pointer argument. Also
add the test case I forgot to 'git add' last time around.
llvm-svn: 271703
The only tests that need to be run on Linux are the ones that use C++
demangling. I'm assuming they will fail on Mac, since __cxa_demangle
there won't handle the non-double-underscore prefixed mangled names.
llvm-svn: 271695
This re-applies r271611, and hopefully the bots won't break this time.
Although ld64 always outputs linkedit data in the same order, it isn't actually required to. This change makes yaml2obj resilient if the offsets are in arbitrary order.
llvm-svn: 271687
This only translates data members for now. Translating overloaded
methods is complicated, so I stopped short of doing that.
Reviewers: aaboud
Differential Revision: http://reviews.llvm.org/D20924
llvm-svn: 271680
We were assuming all SBFX-like operations would have the shl/asr form, but often
when the field being extracted is an i8 or i16, we end up with a
SIGN_EXTEND_INREG acting on a shift instead.
This is a port of r213754 from ARM to AArch64.
llvm-svn: 271677
There was concern that creating bitcasts for the simpler potential select pattern:
define <2 x i64> @vecBitcastOp1(<4 x i1> %cmp, <2 x i64> %a) {
%a2 = add <2 x i64> %a, %a
%sext = sext <4 x i1> %cmp to <4 x i32>
%bc = bitcast <4 x i32> %sext to <2 x i64>
%and = and <2 x i64> %a2, %bc
ret <2 x i64> %and
}
might lead to worse code for some targets, so this patch is matching the larger
patterns seen in the test cases.
The motivating example for this patch is this IR produced via SSE intrinsics in C:
define <2 x i64> @gibson(<2 x i64> %a, <2 x i64> %b) {
%t0 = bitcast <2 x i64> %a to <4 x i32>
%t1 = bitcast <2 x i64> %b to <4 x i32>
%cmp = icmp sgt <4 x i32> %t0, %t1
%sext = sext <4 x i1> %cmp to <4 x i32>
%t2 = bitcast <4 x i32> %sext to <2 x i64>
%and = and <2 x i64> %t2, %a
%neg = xor <4 x i32> %sext, <i32 -1, i32 -1, i32 -1, i32 -1>
%neg2 = bitcast <4 x i32> %neg to <2 x i64>
%and2 = and <2 x i64> %neg2, %b
%or = or <2 x i64> %and, %and2
ret <2 x i64> %or
}
For an AVX target, this is currently:
vpcmpgtd %xmm1, %xmm0, %xmm2
vpand %xmm0, %xmm2, %xmm0
vpandn %xmm1, %xmm2, %xmm1
vpor %xmm1, %xmm0, %xmm0
retq
With this patch, it becomes:
vpmaxsd %xmm1, %xmm0, %xmm0
Differential Revision: http://reviews.llvm.org/D20774
llvm-svn: 271676
new instruction to ARM and AArch64 targets and several system registers.
Patch by: Roger Ferrer Ibanez and Oliver Stannard
Differential Revision: http://reviews.llvm.org/D20282
llvm-svn: 271670
Summary:
Added custom converters for SDWA instruction to support optional operands and modifiers.
Support for _dpp and _sdwa suffixes that allows to force DPP or SDWA encoding for instructions.
Reviewers: tstellarAMD, vpykhtin, artem.tamazov
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D20625
llvm-svn: 271655
Summary:
N32 support will follow in a later patch since the symbol version of 'la'
incorrectly believes N32 to have 64-bit pointers and rejects it early.
This fixes the three incorrectly expanded 'la' macros found in bionic.
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: http://reviews.llvm.org/D20820
llvm-svn: 271644
This patch begins adding support for lowering to the XOP VPERMIL2PD/VPERMIL2PS shuffle instructions - adding the X86ISD::VPERMIL2 opcode and cleaning up the usage.
The internal llvm intrinsics were assuming the shuffle mask operand was the same type as the float/double input operands (I guess to simplify the intrinsic definitions in X86InstrXOP.td to a single value type). These needed changing to integer types (matching the clang builtin and the AMD intrinsics definitions), an auto upgrade path is added to convert old calls.
Mask decoding/target shuffle support will be added in future patches.
Differential Revision: http://reviews.llvm.org/D20049
llvm-svn: 271633
When printing line information and file checksums, we were printing
the file offset field from the struct header. This teaches
llvm-pdbdump how to turn those numbers into the filename. In the
case of file checksums, this is done by looking in the global
string table. In the case of line contributions, this is done
by indexing into the file names buffer of the DBI stream. Why
they use a different technique I don't know.
llvm-svn: 271630
To facilitate this, a couple of changes had to be made:
1. `ModuleSubstream` got moved from `DebugInfo/PDB` to
`DebugInfo/CodeView`, and various codeview related types are defined
there. It turns out `DebugInfo/CodeView/Line.h` already defines many of
these structures, but this is really old code that is not endian aware,
doesn't interact well with `StreamInterface` and not very helpful for
getting stuff out of a PDB. Eventually we should migrate the old readobj
`COFFDumper` code to these new structures, or at least merge their
functionality somehow.
2. A `ModuleSubstream` visitor is introduced. Depending on where your
module substream array comes from, different subsets of record types can
be expected. We are already hand parsing these substream arrays in many
places especially in `COFFDumper.cpp`. In the future we can migrate these
paths to the visitor as well, which should reduce a lot of code in
`COFFDumper.cpp`.
Differential Revision: http://reviews.llvm.org/D20936
Reviewed By: ruiu, majnemer
llvm-svn: 271621
Summary:
Instrument GEP instruction for counting the number of struct field
address calculation to approximate the number of struct field accesses.
Adds test struct_field_count_basic.ll to test the struct field
instrumentation.
Reviewers: bruening, aizatsky
Subscribers: junbuml, zhaoqin, llvm-commits, eugenis, vitalybuka, kcc, bruening
Differential Revision: http://reviews.llvm.org/D20892
llvm-svn: 271619
Although ld64 always outputs linkedit data in the same order, it isn't actually required to. This change makes yaml2obj resilient if the offsets are in arbitrary order.
llvm-svn: 271611
This reverts r271599, it broke the integration tests.
More places than I expected had nontrival return types in imports, or
else the check was wrong.
llvm-svn: 271606
This commit adds round tripping for MachO symbol data. Symbols are entries in the name list, that contain offsets into the string table which is at the end of the __LINKEDIT segment.
llvm-svn: 271604
The original tests were intended to show a missing transform that would
be solved by D20774:
http://reviews.llvm.org/D20774
But it's not clear that the transform for the simpler tests is a win for
all targets. Make the tests show a larger pattern that should be a win
regardless of the cost of bitcast instructions.
llvm-svn: 271603
Summary:
The module pass pipeline includes a late LICM run after loop
unrolling. LCSSA is implicitly run as a pass dependency of LICM. However no
cleanup pass was run after this, so the LCSSA nodes ended in the optimized output.
Reviewers: hfinkel, mehdi_amini
Subscribers: majnemer, bruno, mzolotukhin, mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D20606
llvm-svn: 271602
Under emscripten, C code can take the address of a function implemented
in Javascript (which is exposed via an import in wasm). Because imports
do not have linear memory address in wasm, we need to generate a thunk
to be the target of the indirect call; it call the import directly.
To make this possible, LLVM needs to emit the type signatures for these
functions, because they may not be called directly or referred to other
than where the address is taken.
This uses s new .s directive (.functype) which specifies the signature.
Differential Revision: http://reviews.llvm.org/D20891
llvm-svn: 271599
This first pass only splits apart the records and dumps the line
info kinds and binary data. Subsequent patches will parse out
the binary data into more useful information and dump it in
detail.
llvm-svn: 271576
There are a lot of different kinds of loads to test for,
and these were scattered around inconsistently with
some redundancy. Try to comprehensively test all loads
in a consistent way.
llvm-svn: 271571
If the processor name failed to parse for amdgcn,
the resulting output would have R600 ISA in it.
If the processor name was missing or invalid for R600,
the wavefront size would not be set and there would be
crashes from missing itinerary data.
Fixes crashes in future commit caused by dividing by the unset/0
wavefront size.
llvm-svn: 271561
Unlike other sections that can grow to any size, the COFF section header
stream has maximum length because each record is fixed size and the COFF
file format limits the maximum number of sections. So I decided to not
create a specific stream class for it. Instead, I added a member function
to DbiStream class which returns a vector of COFF headers.
Differential Revision: http://reviews.llvm.org/D20717
llvm-svn: 271557
Summary:
Also convert test/CodeGen/PowerPC/vsx-ldst-builtin-le.ll to use
FileCheck instead of two grep and count runs.
This change is needed to avoid spurious diffs in these tests when
EarlyCSE is improved to use MemorySSA and can do more load elimination.
Reviewers: hfinkel
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20238
llvm-svn: 271553
The DIType* for void is the null pointer. A null DIType can never be a
qualified type, so we can just exit the loop at this point and go to
getTypeIndex(BaseTy).
Fixes PR27984
llvm-svn: 271550
Summary:
In PR29973 Sanjay Patel reported an assertion failure when a certain
loop was optimized, for a target without SSE2 support. It turned out
this was because of the AVG pattern detection introduced in rL253952.
Prevent the assertion failure by bailing out early in
`detectAVGPattern()`, if the target does not support SSE2.
Also add a minimized test case.
Reviewers: congh, eli.friedman, spatel
Subscribers: emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D20905
llvm-svn: 271548
Do not issue lexing errors found during the parsing of macro body
definitions and parseIdentifier function in AsmParser. This changes the
Parser to not issue a lexing error when we reach an error, but rather
when it is consumed allowing us time to examine and recover from an
error.
As a result, of this, we stop issuing a both lexing error and a parsing
error in floating-literals test. Minor tweak to parseDirectiveRealValue
to favor more meaningful lexing error over less helpful parse error.
Reviewers: rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20535
llvm-svn: 271542
This directory is used to find if there is a PDB associated with an
executable. I plan to use this functionality to teach llvm-symbolizer
whether it should use DIA or DWARF to symbolize a given DLL.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D20885
llvm-svn: 271539
Inline virtual functions has linkeonceodr linkage (emitted in comdat on
supporting targets). If the vtable for the class is not emitted in the
defining module, function won't be address taken thus its address is not
recorded. At the mercy of the linker, if the per-func prf_data from this
module (in comdat) is picked at link time, we will lose mapping from
function address to its hash val. This leads to missing icall promotion.
The second test case (currently disabled) in compiler_rt (r271528):
instrprof-icall-prom.test demostrates the bug. The first profile-use
subtest is fine due to linker order difference.
With this change, no missing icall targets is found in instrumented clang's
raw profile.
llvm-svn: 271532
Summary:
When this flag is specified, the target llvm-lto is not built, but is still
used as a dependency of the test targets. cmake 2.8 silently ignored this
situation, but with cmake_minimum_required(3.4) it becomes an error. Fix this
by avoiding the inclusion of the target as a dependency.
Reviewers: beanz
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20882
llvm-svn: 271530
Summary:
If the target requests it, use emptry spaces in the fixed and
callee-save stack area to allocate local stack objects.
AArch64: Change last callee-save reg stack object alignment instead of
size to leave a gap to take advantage of above change.
Reviewers: t.p.northover, qcolombet, MatzeB
Subscribers: rengolin, mcrosier, llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D20220
llvm-svn: 271527
Although this was intended to be NFC, the test case wiggle shows a change in
code scheduling/RA caused by a difference in the SDLoc() generation.
Depending on how you look at it, this is the (dis)advantage of exact checking
in regression tests.
llvm-svn: 271526
This patch removes the llvm intrinsics (V)CVTTPS2DQ and VCVTTPD2DQ truncation (round to zero) conversions and auto-upgrades to FP_TO_SINT calls instead.
Note: I looked at updating CVTTPD2DQ as well but this still requires a lot more work to correctly lower.
Differential Revision: http://reviews.llvm.org/D20860
llvm-svn: 271510
Use the type index of the underlying type unless we have a typedef from
long to HRESULT; HRESULT typedefs are translated to T_HRESULT.
llvm-svn: 271494
I'm not sure why this was missing for so long.
This also exposed that we were picking floating point 256-bit VMOVNTPS for some integer types in normal isel for AVX1 even though VMOVNTDQ is available. In practice it doesn't matter due to the execution dependency fix pass, but it required extra isel patterns. Fixing that in a follow up commit.
llvm-svn: 271481
Add support for the new pass manager to MemorySSA pass.
Change MemorySSA to be computed eagerly upon construction.
Change MemorySSAWalker to be owned by the MemorySSA object that creates
it.
Reviewers: dberlin, george.burgess.iv
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19664
llvm-svn: 271432
When the index is known to be constant 0, insert directly into the the low half,
instead of spilling, performing the insert in-memory, and reloading.
Differential Revision: http://reviews.llvm.org/D20763
llvm-svn: 271428
Fix PR27943 "Bad machine code: Using an undefined physical register".
SUBFC8 implicitly defines the CR0 register, but this was omitted in
the instruction definition.
Patch by Jameson Nash <jameson@juliacomputing.com>
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D20802
llvm-svn: 271425
This patch extends CFLAA to recognize allocation functions such as
malloc, free, etc, so we can treat them more aggressively.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D20776
llvm-svn: 271421
Patch by Taewook Oh
Summary: Patch for Bug 27478. Make BasicAliasAnalysis claims NoAlias if two GEPs index different fields of the same structure.
Reviewers: hfinkel, dberlin
Subscribers: dberlin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20665
llvm-svn: 271415
Summary:
Re-enable lifetime-start-on-first-use for stack coloring,
but explicitly disable it for slots with more than one start
or end lifetime marker.
Bug: 27903
Reviewers: wmi, tejohnson, qcolombet, gbiv
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20739
llvm-svn: 271412
Previously, whenever we needed a vector IV, we would create it on the fly,
by splatting the scalar IV and adding a step vector. Instead, we can create a
real vector IV. This tends to save a couple of instructions per iteration.
This only changes the behavior for the most basic case - integer primary
IVs with a constant step.
Differential Revision: http://reviews.llvm.org/D20315
llvm-svn: 271410
Summary:
This is meant to be the tiniest step towards DIType to CV type index
translation that I could come up with. Whenever translation fails, we use type
index zero, which is the unknown type.
Reviewers: aaboud, zturner
Subscribers: llvm-commits, amccarth
Differential Revision: http://reviews.llvm.org/D20840
llvm-svn: 271408
Summary:
Change some of the internal interfaces in Loads.cpp to keep track of the
number of bytes we're trying to prove dereferenceable using an explicit
`Size` parameter.
Before this, the `Size` parameter was implicitly inferred from the
pointee type of the pointer whose dereferenceability we were trying to
prove, causing us to be conservative around bitcasts. This was
unfortunate since bitcast instructions are no-ops and should never
break optimizations. With an explicit `Size` parameter, we're more
precise (as shown in the test cases), and the code is simpler.
We should eventually move towards a `DerefQuery` struct that groups
together a base pointer, an offset, a size and an alignment; but this
patch is a first step.
Reviewers: apilipenko, dblaikie, hfinkel, reames
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20764
llvm-svn: 271406
Summary:
It isn't clear what is the operational meaning of loading or storing an
unsized types, since it cannot be lowered into something meaningful.
Since there does not seem to be any practical need for it either, make
such loads and stores illegal IR.
Reviewers: majnemer, chandlerc
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20846
llvm-svn: 271402
This adds an additional matcher to select UBFX(..) from SRL(AND(..)) in
ARMISelDAGToDAG to help with code size.
Patch by David Green.
Differential Revision: http://reviews.llvm.org/D20667
llvm-svn: 271384
This will be necessary to allow the global merge pass to attach
multiple debug info metadata nodes to global variables once we reverse
the edge from DIGlobalVariable to GlobalVariable.
Differential Revision: http://reviews.llvm.org/D20414
llvm-svn: 271358
This patch adds an IR, assembly and bitcode representation for metadata
attachments for globals. Future patches will port existing features to use
these new attachments.
Differential Revision: http://reviews.llvm.org/D20074
llvm-svn: 271348
Refactor LiveIntervals::renameDisconnectedComponents() to be a pass.
Also change the name to "RenameIndependentSubregs":
- renameDisconnectedComponents() worked on a MachineFunction at a time
so it is a natural candidate for a machine function pass.
- The algorithm is testable with a .mir test now.
- This also fixes a problem where the lazy renaming as part of the
MachineScheduler introduced IMPLICIT_DEF instructions after the number
of a nodes in a region were counted leading to a mismatch.
Differential Revision: http://reviews.llvm.org/D20507
llvm-svn: 271345
Physregs have no associated register class, do not attempt to modify it
in Thumb2InstrInfo::storeRegToStackSlot()/loadFromStackSlot().
llvm-svn: 271339
This patch fixes bug https://llvm.org/bugs/show_bug.cgi?id=27897.
When query memory access cost, current SLP always passes in alignment value of 1 (unaligned), so it gets a very high cost of scalar memory access, and wrongly vectorize memory loads in the test case.
It can be fixed by simply giving correct alignment.
llvm-svn: 271333
when the object is from a slice of a Mach-O Universal Binary use something like
"foo.o (for architecture i386)" as part of the error message when expected.
Also fixed places in these tools that were ignoring object file errors from
MachOUniversalBinary::getAsObjectFile() when the code moved on to see if
the slice was an archive.
To do this MachOUniversalBinary::getAsObjectFile() and
MachOUniversalBinary::getObjectForArch() were changed from returning
ErrorOr<...> to Expected<...> then that was threaded up to its users.
Converting these interfaces to Expected<> from ErrorOr<> does involve
touching a number of places. To contain the changes for now the use of
errorToErrorCode() is still used in two places yet to be fully converted.
llvm-svn: 271332
Code like the following is considered broken, and doesn't need to be
supported by our AA magicks:
void getFoo(int *P) {
int *PAlias = (int *)((char *)NULL + (uintptr_t)P);
}
This patch makes CFLAA drop support for code like this.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D20775
llvm-svn: 271322
We think it's OK to generate half fminnan because it's legal for the
transform-to type (f32; r245196). However, PromoteFloatRes was missing
the case; simply promote like the other binops, including minnum.
llvm-svn: 271317
Adds the method MCStreamer::EmitBinaryData, which is usually an alias
for EmitBytes. In the MCAsmStreamer case, it is overridden to emit hex
dump output like this:
.byte 0x0e, 0x00, 0x08, 0x10
.byte 0x03, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x10, 0x00, 0x00
Also, when verbose asm comments are enabled, this patch prints the dump
output for each comment before its record, like this:
# ArgList (0x1000) {
# TypeLeafKind: LF_ARGLIST (0x1201)
# NumArgs: 0
# Arguments [
# ]
# }
.byte 0x06, 0x00, 0x01, 0x12
.byte 0x00, 0x00, 0x00, 0x00
This should make debugging easier and testing more convenient.
Reviewers: aaboud
Subscribers: majnemer, zturner, amccarth, aaboud, llvm-commits
Differential Revision: http://reviews.llvm.org/D20711
llvm-svn: 271313
A constant pool holding the address of a variable in equivalent to
a got entry. It produces exactly the same instruction sequence as a
got use and unlike a got use this is not uniqued by the linker.
llvm-svn: 271311
Enforce compact branch register restrictions such as the use of the zero
register, both operands being the same register. Emit clear error in such
cases as the issue is subtle.
For bovc and bnvc, silently fixup such cases when emitting objects directly,
like LLVM started doing in rL269899.
Reviewers: vkalintiris, dsanders
Differential Review: http://reviews.llvm.org/D20475
llvm-svn: 271301
The MachO export trie is a serially encoded trie keyed by symbol name. This code parses the trie and preserves the structure so that it can be dumped again.
llvm-svn: 271300
The assumption, made in insert() that weak functions are always inserted after strong functions,
is only true in the first round of adding functions.
In subsequent rounds this is no longer guaranteed , because we might remove a strong function from the tree (because it's modified) and add it later,
where an equivalent weak function already exists in the tree.
This change removes the assert in insert() and explicitly enforces a weak->strong order.
This also removes the need of two separate loops in runOnModule().
llvm-svn: 271299
Summary:
Creates a global variable containing preliminary information
for the cache-fragmentation tool runtime.
Passes a pointer to the variable (null if no variable is created) to the
compilation unit init and exit routines in the runtime.
Reviewers: aizatsky, bruening
Subscribers: filcab, kubabrecka, bruening, kcc, vitalybuka, eugenis, llvm-commits, zhaoqin
Differential Revision: http://reviews.llvm.org/D20541
llvm-svn: 271298
Added support to map intrinsics
__builtin_arm_{ldc,ldcl,ldc2,ldc2l,stc,stcl,stc2,stc2l}
to their ARM instructions.
Differential Revision: http://reviews.llvm.org/D20564
llvm-svn: 271271
beqc and bnec cannot have $rs == $rt. Inhibit compact branch creation
if that would occur.
Reviewers: vkalintiris, dsanders
Differential Revision: http://reviews.llvm.org/D20624
llvm-svn: 271260
This adds support to the backed to actually support SjLj EH as an exception
model. This is *NOT* the default model, and requires explicitly opting into it
from the frontend. GCC supports this model and for MinGW can still be enabled
via the `--using-sjlj-exceptions` options.
Addresses PR27749!
llvm-svn: 271244
The exit-on-error flag is necessary to avoid some assertions/unreachables. We
can get past them by creating a few dummy nodes.
Fixes PR27768, PR27769.
Differential Revision: http://reviews.llvm.org/D20726
llvm-svn: 271200
Summary:
If we can prove that an op.with.overflow intrinsic does not overflow, we
can get rid of the intrinsic, and replace it with non-wrapping
arithmetic.
This was first checked in at r265913 but reverted in r265950 because it
exposed some issues around how SCEV handled post-inc add recurrences.
Those issues have now been fixed.
Reviewers: atrick, regehr
Subscribers: sanjoy, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18685
llvm-svn: 271153
Summary:
This change teaches SCEV to see reduce `(extractvalue
0 (op.with.overflow X Y))` into `op X Y` (with a no-wrap tag if
possible).
This was first checked in at r265912 but reverted in r265950 because it
exposed some issues around how SCEV handled post-inc add recurrences.
Those issues have now been fixed.
Reviewers: atrick, regehr
Subscribers: mcrosier, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18684
llvm-svn: 271152
Fixes PR27315.
The post-inc version of an add recurrence needs to "follow the same
rules" as a normal add or subtract expression. Otherwise we miscompile
programs like
```
int main() {
int a = 0;
unsigned a_u = 0;
volatile long last_value;
do {
a_u += 3;
last_value = (long) ((int) a_u);
if (will_add_overflow(a, 3)) {
// Leave, and don't actually do the increment, so no UB.
printf("last_value = %ld\n", last_value);
exit(0);
}
a += 3;
} while (a != 46);
return 0;
}
```
This patch changes SCEV to put no-wrap flags on post-inc add recurrences
only when the poison from a potential overflow will go ahead to cause
undefined behavior.
To avoid regressing performance too much, I've assumed infinite loops
without side effects is undefined behavior to prove poison<->UB
equivalence in more cases. This isn't ideal, but is not new to LLVM as
a whole, and far better than the situation I'm trying to fix.
llvm-svn: 271151
This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused.
Reapplied now that the the companion patch (D20684) removes/auto-upgrade the clang intrinsics has been committed.
Differential Revision: http://reviews.llvm.org/D20686
llvm-svn: 271131
Summary:
When RF_NullMapMissingGlobalValues is set, mapValue can return null
for GlobalValue. When mapping the operands of a constant that is
referenced from metadata, we need to handle this case and actually
return null instead of mapping this constant.
Reviewers: dexonsmith, rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20713
llvm-svn: 271129
We were producing R_X86_64_GOTPCRELX for invalid instructions and
sometimes producing R_X86_64_GOTPCRELX instead of
R_X86_64_REX_GOTPCRELX.
llvm-svn: 271118
It would be better to check the valid/expected size of the immediate operand, but this is
generally better than what we print right now.
Differential Revision: http://reviews.llvm.org/D20385
llvm-svn: 271114
This matches the behavior of GNU assembler which supports symbolic
expressions in absolute expressions used in assembly directives.
Differential Revision: http://reviews.llvm.org/D20752
llvm-svn: 271102