When an overflow intrinsic is followed by a non-overflow instruction,
replace the latter with an extract. For example:
%sadd = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
%sadd3 = add i32 %a, %b
Here the add statement will be replaced by an extract.
When an overflow intrinsic follows a non-overflow instruction, a clone
of the intrinsic is inserted before the normal instruction, which makes
it the same as the previous case. Subsequent runs of GVN can then clean
up the duplicate instructions and insert the extract.
This fixes PR8817.
llvm-svn: 203553
The official specifications state the name to be ARMNT (as per the Microsoft
Portable Executable and Common Object Format Specification v8.3).
llvm-svn: 203530
When the MOVBE instructions are available, use them for 16-bit endian
swapping as well as for 32 and 64 bit.
The patterns were already present on the instructions, but weren't being
matched because the operation was unconditionally marked to 'Expand.'
Change that to be conditional on whether the MOVBE instructions are
available. Use 'rolw' to implement the in-register version (32 and 64
bit have the dedicated 'bswap' instruction for that).
Patch by Louis Gerbarg <lgg@apple.com>.
rdar://15479984
llvm-svn: 203524
During LTO, user-supplied definitions of C library functions often
exist. -instcombine uses Module::getOrInsertFunction() to get a handle
on library functions (e.g., @puts, when optimizing @printf).
Previously, Module::getOrInsertFunction() would rename any matching
functions with local linkage, and create a new declaration. In LTO,
this is the opposite of desired behaviour, as it skips by the
user-supplied version of the library function and creates a new
undefined reference which the linker often cannot resolve.
After some discussing with Rafael on the list, it looks like it's
undesired behaviour. If a consumer actually *needs* this behaviour, we
should add new API with a more explicit name.
I added two testcases: one specifically for the -instcombine behaviour
and one for the LTO flow.
<rdar://problem/16165191>
llvm-svn: 203513
the legalization cost must be included to get an accurate
estimation of the total cost of the scalarized vector.
The inaccurate cost triggered unprofitable SLP vectorization on
32-bit X86.
Summary:
Include legalization overhead when computing scalarization cost
Reviewers: hfinkel, nadav
CC: chandlerc, rnk, llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2992
llvm-svn: 203509
Summary:
When the sample profiles include discriminator information,
use the discriminator values to distinguish instruction weights
in different basic blocks.
This modifies the BodySamples mapping to map <line, discriminator> pairs
to weights. Instructions on the same line but different blocks, will
use different discriminator values. This, in turn, means that the blocks
may have different weights.
Other changes in this patch:
- Add tests for positive values of line offset, discriminator and samples.
- Change data types from uint32_t to unsigned and int and do additional
validation.
Reviewers: chandlerc
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2857
llvm-svn: 203508
Extend the error message generated by the Verifier when an intrinsic
name does not match the expected mangling to include the expected
name. Simplifies debugging.
Patch by Philip Reames!
llvm-svn: 203490
optimize a call to a llvm intrinsic to something that invovles a call to a C
library call, make sure it sets the right calling convention on the call.
e.g.
extern double pow(double, double);
double t(double x) {
return pow(10, x);
}
Compiles to something like this for AAPCS-VFP:
define arm_aapcs_vfpcc double @t(double %x) #0 {
entry:
%0 = call double @llvm.pow.f64(double 1.000000e+01, double %x)
ret double %0
}
declare double @llvm.pow.f64(double, double) #1
Simplify libcall (part of instcombine) will turn the above into:
define arm_aapcs_vfpcc double @t(double %x) #0 {
entry:
%__exp10 = call double @__exp10(double %x) #1
ret double %__exp10
}
declare double @__exp10(double)
The pre-instcombine code works because calls to LLVM builtins are special.
Instruction selection will chose the right calling convention for the call.
However, the code after instcombine is wrong. The call to __exp10 will use
the C calling convention.
I can think of 3 options to fix this.
1. Make "C" calling convention just work since the target should know what CC
is being used.
This doesn't work because each function can use different CC with the "pcs"
attribute.
2. Have Clang add the right CC keyword on the calls to LLVM builtin.
This will work but it doesn't match the LLVM IR specification which states
these are "Standard C Library Intrinsics".
3. Fix simplify libcall so the resulting calls to the C routines will have the
proper CC keyword. e.g.
%__exp10 = call arm_aapcs_vfpcc double @__exp10(double %x) #1
This works and is the solution I implemented here.
Both solutions #2 and #3 would work. After carefully considering the pros and
cons, I decided to implement #3 for the following reasons.
1. It doesn't change the "spec" of the intrinsics.
2. It's a self-contained fix.
There are a couple of potential downsides.
1. There could be other places in the optimizer that is broken in the same way
that's not addressed by this.
2. There could be other calling conventions that need to be propagated by
simplify-libcall that's not handled.
But for now, this is the fix that I'm most comfortable with.
llvm-svn: 203488
* Add masking instructions before loads and stores (in MC layer).
* Add masking instructions after SP changes (in MC layer).
* Forbid loads, stores and SP changes in delay slots (in MI layer).
Differential Revision: http://llvm-reviews.chandlerc.com/D2904
llvm-svn: 203484
NVPTX, like the other backends, relies on generic symbol name sanitizing done by
MCSymbol. However, the ptxas assembler is more stringent and disallows some
additional characters in symbol names.
See PR19099 for more details.
llvm-svn: 203483
Summary:
This is a white lie to workaround a widespread bug in the -mfp64
implementation.
The problem is that none of the 32-bit fpu ops mention the fact that they
clobber the upper 32-bits of the 64-bit FPR. This allows MFHC1 to be
scheduled on the wrong side of most 32-bit FPU ops. Fixing that requires a
major overhaul of the FPU implementation which can't be done right now due to
time constraints.
MFHC1 is one of two affected instructions. These instructions are the only
FPU instructions that don't read or write the lower 32-bits. We therefore
pretend that it reads the bottom 32-bits to artificially create a dependency and
prevent the scheduler changing the behaviour of the code.
The other instruction is MTHC1 which will be fixed once I've have found a failing
test case for it.
The testcase is test-suite/SingleSource/UnitTests/Vector/simple.c when
given TARGET_CFLAGS="-mips32r2 -mfp64 -mmsa".
Reviewers: jacksprat, matheusalmeida
Reviewed By: jacksprat
Differential Revision: http://llvm-reviews.chandlerc.com/D2966
llvm-svn: 203464
The function was making too many assumptions about its input:
1. The NEON_VDUP optimisation was far too aggressive, assuming (I
think) that the input would always be BUILD_VECTOR.
2. We were treating most unknown concats as legal (by returning Op
rather than SDValue()). I think only concats of pairs of vectors are
actually legal.
http://llvm.org/PR19094
llvm-svn: 203450