This bug is introduced by r211144. The element of operand may be
smaller than the element of result, but previous commit can
only handle the contrary condition. This commit is to handle this
scenario and generate optimized codes like ZIP1.
llvm-svn: 213830
When we had a vector_shuffle where we had an input from each vector, we
could miscompile it because we were assuming the input from V2 wouldn't
be moved from where it was on the vector.
Added a test case.
llvm-svn: 213826
Add `Value::sortUseList()`, templated on the comparison function to use.
The sort is an iterative merge sort that uses a binomial vector of
already-merged lists to limit the size overhead to `O(1)`.
This is part of PR5680.
llvm-svn: 213824
We use gep to access the global array "switch.table", and the table index
should be treated as unsigned. When the highest bit is 1, this commit
zero-extends the index to an integer type with larger size.
For a switch on i2, we used to generate:
%switch.tableidx = sub i2 %0, -2
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx
It is incorrect when %switch.tableidx is 2 or 3. The fix is to generate
%switch.tableidx = sub i2 %0, -2
%switch.tableidx.zext = zext i2 %switch.tableidx to i3
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext
rdar://17735071
llvm-svn: 213815
It isn't reasonable to test storing things using undef pointers --
storing through those is at best "good luck" and really should be
transformed to "unreachable". Random changes in the combiner can
randomly break these tests for no good reason. I'm following up on the
original commit regarding the right long-term strategy here.
llvm-svn: 213810
There were still some disassembler bits in lib/MC, but their use of Object
was only visible in the includes they used, not in the symbols.
llvm-svn: 213808
While the subprogram map cache used by Dead Argument Elimination works
there, I made a mistake when reusing it for Argument Promotion in
r212128 because ArgPromo may transform functions more than once whereas
DAE transforms each function only once, removing all the dead arguments
in one go.
To address this, ensure that the map is updated after each argument
promotion.
In retrospect it might be a little wasteful to create a map of all
subprograms when only handling a single CGSCC, but the alternative is
walking the debug info for each function in the CGSCC that gets updated.
It's not clear to me what the right tradeoff is there, but since the
current tradeoff seems to be working OK (and the code to keep things
updated is very cheap), let's stick with that for now.
llvm-svn: 213805
Also the debug location I had here was bogus, describing the location of
the call site as in the callee - and unnecessary, so just drop it.
llvm-svn: 213803
The transform to constant fold unary operations with an AND across a
vector comparison applies when the constant is not a splat of a scalar
as well.
llvm-svn: 213800
The folding of unary operations through a vector compare and mask operation
is only safe if the unary operation result is of the same size as its input.
For example, it's not safe for [su]itofp from v4i32 to v4f64.
llvm-svn: 213799
Constant fold the lanes of the input constant build_vector individually
so we correctly handle when the vector elements are not all the same
constant value.
PR20394
llvm-svn: 213798
I used the wrong method to obtain the return type inside FinishCall. This fix
simply uses the return type from FastLowerCall, which we already determined to
be a valid type.
Reduced test case from Chad. Thanks.
llvm-svn: 213788
With optimizations disabled, we disable the isel patterns for mul.wide; but we
were still generating MULWIDE ISD nodes. Now, we only try to generate MULWIDE
ISD nodes in DAGCombine if the optimization level is not zero.
llvm-svn: 213773
The target-independent DAGcombiner will generate:
asr w1, X, #31 w1 = splat sign bit.
add X, X, w1, lsr #28 X = X + 0 or pow2-1
asr w0, X, asr #4 w0 = X/pow2
However, the add + shifts is expensive, so generate:
add w0, X, 15 w0 = X + pow2-1
cmp X, wzr X - 0
csel X, w0, X, lt X = (X < 0) ? X + pow2-1 : X;
asr w0, X, asr 4 w0 = X/pow2
llvm-svn: 213758
We were assuming all SBFX-like operations would have the shl/asr form, but
often when the field being extracted is an i8 or i16, we end up with a
SIGN_EXTEND_INREG acting on a shift instead. Simple enough to check for though.
llvm-svn: 213754
Although the final shifter operand is a rotate, this actually only matters for
the half-word extends when the amount == 24. Otherwise folding a shift in is
just as good.
llvm-svn: 213753
This pass attempts to speculatively use a sqrt instruction if one exists on the target, falling back to a libcall if the target instruction returned NaN.
This was enabled for MIPS and System-Z, but is well guarded and is good for most targets - GCC does this for (that I've checked) X86, ARM and AArch64.
llvm-svn: 213752
The ARM ARM prohibits STRB instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STRB instructions with unpredictable behavior.
llvm-svn: 213750
There really is no arm64_be: it was a useful fiction to test big-endian support
while both backends existed in parallel, but now the only platform that uses
the name (iOS) doesn't have a big-endian variant, let alone one called
"arm64_be".
llvm-svn: 213748
The ARM ARM prohibits STR instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STR instructions with unpredictable behavior.
llvm-svn: 213745
Having both Triple::arm64 and Triple::aarch64 is extremely confusing, and
invites bugs where only one is checked. In reality, the only legitimate
difference between the two (arm64 usually means iOS) is also present in the OS
part of the triple and that's what should be checked.
We still parse the "arm64" triple, just canonicalise it to Triple::aarch64, so
there aren't any LLVM-side test changes.
llvm-svn: 213743
This chang fully reverts r211771.
That revision added a canonicalization rule which has the potential to causes a
combine-cycle in the target-independent canonicalizing DAG combine.
The plan is to move the logic that forms target specific addsub nodes as part of
the lowering of shuffles.
llvm-svn: 213736
instruction sequences with CHECK-NEXT for these test cases.
This notably exposes how absolutely horrible the generated code is for
several of these test cases, and will make any future updates to the
test as our vector instruction selection gets better.
llvm-svn: 213732
The post-indexed instructions were missing the constraint, causing unpredictable STRH instructions to be emitted.
The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined, however it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically and then the constraint is definitely needed.
llvm-svn: 213729