These tests were using SI-NOT: MOVREL to make sure concat vectors
weren't being lowered to stack loads and stores, but we now use
scratch buffers for the stack instead of registers, so we need to
add an additional SI-NOT check for scratch buffers.
With this change I was able to uncover one broken test which will
be fixed in a future commit.
llvm-svn: 215269
I accidentally also used INC/DEC for unsigned arithmetic, which doesn't work
because INC/DEC don't set the carry flag, which is what the unsigned overflow
check relies on.
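For illustration, a small standalone sketch (mine, not from the patch) of the
kind of pattern affected; on x86 the add's carry flag feeds the unsigned
overflow check, and lowering the add to INC would leave that flag unset:
  #include <cstdint>
  #include <cstdio>

  // Returns true iff x + 1 wraps (unsigned overflow, i.e. the carry on x86).
  static bool incWraps(uint32_t x) {
    uint32_t r;
    return __builtin_add_overflow(x, 1u, &r);
  }

  int main() {
    printf("%d %d\n", incWraps(0xFFFFFFFFu), incWraps(1u)); // prints: 1 0
    return 0;
  }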
llvm-svn: 215237
Also added the testcase that should have been in r215194.
This behaviour has surprised me a few times now. The problem is that the
generated MipsSubtarget::ParseSubtargetFeatures() contains code like this:
  if ((Bits & Mips::FeatureABICalls) != 0) IsABICalls = true;
so '-abicalls' means 'leave it at the default' and '+abicalls' means 'set it
to true'. In this case (and the similar -modd-spreg case), I'd like the code
to be
  IsABICalls = (Bits & Mips::FeatureABICalls) != 0;
or possibly:
  if ((Bits & Mips::FeatureABICalls) != 0)
    IsABICalls = true;
  else
    IsABICalls = false;
and preferably arrange for 'Bits & Mips::FeatureABICalls' to be true by default
(on some triples).
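A toy standalone model of the surprise (hypothetical code, not the generated
file), showing that '-abicalls' merely leaves the default in place because
the generated code only ever sets the flag:
  #include <cstdio>

  enum { FeatureABICalls = 1 << 0 }; // stand-in for Mips::FeatureABICalls

  int main() {
    bool IsABICalls = false; // whatever the default happens to be
    unsigned Bits = 0;       // '-abicalls': the feature bit is simply absent
    if ((Bits & FeatureABICalls) != 0) // sets, but never clears
      IsABICalls = true;
    printf("IsABICalls = %d\n", IsABICalls); // still the default, not forced off
    return 0;
  }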
llvm-svn: 215211
For best-case performance on Cortex-A57, we should try to use a balanced mix of odd and even D-registers when performing a critical sequence of independent, non-quadword FP/ASIMD floating-point multiply or multiply-accumulate operations.
This pass attempts to detect situations where the register allocation may adversely affect this load balancing and to change the registers used so as to better utilize the CPU.
Ideally we'd just take each multiply or multiply-accumulate in turn and allocate it alternating even or odd registers. However, multiply-accumulates are most efficiently performed in the same functional unit as their accumulation operand. Therefore this pass tries to find maximal sequences ("Chains") of multiply-accumulates linked via their accumulation operand, and assign them all the same "color" (oddness/evenness).
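As a rough sketch of the chain idea (illustrative names and data structures,
not the actual pass):
  #include <cstdio>
  #include <unordered_map>
  #include <vector>

  struct Inst {
    int Id;
    bool IsMulAcc;   // multiply-accumulate?
    int Accumulator; // Id of the def of the accumulation operand, or -1
  };

  enum class Color { Even, Odd };

  int main() {
    // fmul #0; fmla #1 accumulating #0; fmla #2 accumulating #1; fmul #3.
    std::vector<Inst> Insts = {
        {0, false, -1}, {1, true, 0}, {2, true, 1}, {3, false, -1}};
    std::unordered_map<int, Color> Colors;
    Color Next = Color::Even;
    for (const Inst &I : Insts) {
      if (I.IsMulAcc && Colors.count(I.Accumulator)) {
        // Extend the chain: keep the accumulation operand's color so the
        // accumulate runs in the same functional unit as its input.
        Colors[I.Id] = Colors[I.Accumulator];
      } else {
        // Start a new chain and alternate colors to balance odd/even registers.
        Colors[I.Id] = Next;
        Next = (Next == Color::Even) ? Color::Odd : Color::Even;
      }
    }
    for (const Inst &I : Insts)
      printf("inst %d -> %s\n", I.Id,
             Colors[I.Id] == Color::Even ? "even" : "odd");
    return 0;
  }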
This optimization affects S-register and D-register floating-point multiplies and FMADD/FMAs, as well as vector (floating-point only) multiplies and FMADD/FMAs. Q-register (128-bit vector) instructions are not affected.
llvm-svn: 215199
This short-circuited our error reporting for incorrectly specified
target triples (you'd get AArch64 code instead).
Should fix PR20567.
llvm-svn: 215191
GlobalOpt didn't know how to simulate InsertValueInst or
ExtractValueInst. Optimizing these is pretty straightforward.
N.B. This came up when looking at clang's IRGen for MS ABI member
pointers; they are represented as aggregates.
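In plain C++ terms (an analogy, not the evaluator's real code), simulating
these two instructions is just element replacement and projection on a
constant aggregate:
  #include <array>
  #include <cstdio>

  using Aggregate = std::array<long, 2>; // e.g. an MS ABI member pointer

  static Aggregate insertValue(Aggregate Agg, long Val, unsigned Idx) {
    Agg[Idx] = Val; // insertvalue: a copy with one element replaced
    return Agg;
  }

  static long extractValue(const Aggregate &Agg, unsigned Idx) {
    return Agg[Idx]; // extractvalue: read one element back out
  }

  int main() {
    Aggregate A{};
    A = insertValue(A, 42, 0);
    printf("%ld\n", extractValue(A, 0)); // 42, computable at compile time
    return 0;
  }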
llvm-svn: 215184
This completes one item from the to-do list of r215125, "Generate masking
instruction variants with tablegen".
The AddedComplexity is needed just like for the k variant.
Added a codegen test based on valignq.
llvm-svn: 215173
__stack_chk_guard.
Handle the case where the pointer operand of the load instruction that loads the
stack guard is not a global variable but instead a bitcast.
  %StackGuard = load i8** bitcast (i64** @__stack_chk_guard to i8**)
  call void @llvm.stackprotector(i8* %StackGuard, i8** %StackGuardSlot)
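The shape of the fix might look something like this (a hypothetical sketch
against the LLVM C++ API, not the actual patch):
  #include "llvm/IR/GlobalVariable.h"
  #include "llvm/IR/Instructions.h"
  #include "llvm/Support/Casting.h"
  using namespace llvm;

  // Find the stack-guard global behind a load, looking through bitcasts
  // (as in the IR above) instead of requiring a GlobalVariable operand.
  static const GlobalVariable *getStackGuard(const LoadInst *LI) {
    const Value *Ptr = LI->getPointerOperand();
    return dyn_cast<GlobalVariable>(Ptr->stripPointerCasts());
  }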
Original test case provided by Ana Pazos.
This fixes PR20558.
llvm-svn: 215167
Due to an unnecessary special case, inlined arguments that happened to
be from the same function as they were inlined into were misclassified
as non-inlined arguments and would overwrite the actual non-inlined
arguments. Assert that we never overwrite a function's arguments, and
stop misclassifying inlined arguments as non-inlined arguments to fix
this issue.
Excuse the rather crappy test case; handcrafted IR might do better, as
might someone who understands better how to tickle the inliner into
creating a recursive inlining situation like this (though it may also be
necessary to tickle the variable in a particular way so that it is
recorded in the MMI side table and goes down this particular path for
location information).
llvm-svn: 215157
a base GOT entry.
Summary:
Get tip-of-tree MIPS fast-isel to pass test-suite.
Two bugs were fixed:
1) One-bit booleans were treated as 1-bit signed integers, so the literal '1' could become sign-extended (see the sketch after this list).
2) MIPS uses the GOT for PIC, but in certain cases (with string constants, for example) many items can be referenced from the same GOT entry, and this case was not handled properly.
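A quick standalone illustration of bug 1 (my example, not from the patch):
  #include <cstdint>
  #include <cstdio>

  int main() {
    int Bit = 1;                  // an i1 'true'
    int32_t SExt = -(int32_t)Bit; // 1-bit sign extension: 0b1 -> all ones
    int32_t ZExt = (int32_t)Bit;  // 1-bit zero extension: 0b1 -> 1
    printf("sext=%d zext=%d\n", SExt, ZExt); // sext=-1 zext=1
    return 0;
  }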
Test Plan: test-suite
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: mcrosier
Differential Revision: http://reviews.llvm.org/D4801
llvm-svn: 215155
be deleted. This will be reapplied as soon as possible, and in any case
before the 3.6 branch date.
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reverts commits r215111, 215115, 215116, 215117, 215136.
llvm-svn: 215154
Re-commit of r214832,r21469 with a work-around that avoids the previous
problem when building with GCC.
The work-around is to use SmallVector instead of ArrayRef of basic
blocks in preservesResourceLen() (MachineCombiner.cpp).
llvm-svn: 215151
this case, the code path dealing with vector promotion was missing the explicit
checks for lifetime intrinsics that were present on the corresponding integer
promotion path.
llvm-svn: 215148
BranchFolderPass was not correctly setting the basic block branch weights when
tail-merging created or merged blocks. This patch recomputes the weights of
tail-merged blocks using the following formula:
  branch_weight(merged block -> successor j) =
    sum over all merged blocks bb of
      block_frequency(bb) * branch_probability(bb -> j)
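A minimal numeric sketch of the formula (hypothetical values):
  #include <cstdio>

  int main() {
    // Two tail-merged blocks with block frequencies and per-successor
    // branch probabilities (successors j = 0, 1).
    double Freq[2] = {10.0, 30.0};
    double Prob[2][2] = {{0.25, 0.75}, {0.50, 0.50}};
    for (int J = 0; J < 2; ++J) {
      double Weight = 0.0;
      for (int BB = 0; BB < 2; ++BB)
        Weight += Freq[BB] * Prob[BB][J]; // block_frequency * branch_probability
      printf("merged -> successor %d: weight %.1f\n", J, Weight);
    }
    return 0;
  }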
<rdar://problem/16256423>
llvm-svn: 215135
Currently FileCheck errors out on empty input. This is usually the
right thing to do, but it makes tests like "this command does not emit
some error message" hard to write. This usually leads to people using
"command 2>&1 | count 0" instead, and then the bots that use guard
malloc fail a few hours later (guard malloc writes its own diagnostics
to stderr, so the count is no longer zero).
By adding a flag to FileCheck that allows empty inputs, we can make
tests that consist entirely of "CHECK-NOT" lines feasible.
llvm-svn: 215127
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.
Thanks to Lang Hames for making MCJIT a good replacement!
llvm-svn: 215111
Summary:
These directives are used to toggle whether the assembler accepts MSA-specific instructions or not.
Patch by Matheus Almeida and Toma Tabacu.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D4783
llvm-svn: 215099
shuffle lowering.
This is closely related to the previous one. Here we failed to use the
source offset when swapping in the other case -- where we end up
swapping the *final* shuffle. The cause of this bug is a bit different:
I simply wasn't thinking about the fact that this mask is actually
a slice of a wide mask and thus has numbers that need SourceOffset
applied. Simple fix. It would be even simpler with an algorithm-y thing
to use here, but correctness first. =]
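To make the offset issue concrete, a hypothetical standalone sketch (not the
lowering code itself):
  #include <cstdio>

  int main() {
    // An 8-wide mask; we operate on the 4-element slice at offset 4.
    int WideMask[8] = {0, 1, 2, 3, 6, 7, 4, 5};
    const int SourceOffset = 4;
    int *Slice = WideMask + SourceOffset;
    for (int I = 0; I < 4; ++I) {
      // Entries in the slice are still wide-mask indices: go slice-relative,
      // swap the two pairs, then add SourceOffset back when storing.
      // Forgetting the add-back is exactly the kind of bug described above.
      int Rel = Slice[I] - SourceOffset;
      Slice[I] = (Rel ^ 2) + SourceOffset; // Rel ^ 2 swaps pair 0 with pair 1
    }
    for (int I = 0; I < 8; ++I)
      printf("%d ", WideMask[I]); // 0 1 2 3 4 5 6 7
    printf("\n");
    return 0;
  }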
llvm-svn: 215095
via the fuzz tester.
Here I missed an offset when round-tripping a value through a shuffle
mask. I got it right 2 lines below. See a problem? I do. ;] I'll
probably be adding a little "swap" algorithm which accepts a range and
two values and swaps those values where they occur in the range. Don't
really have a name for it, let me know if you do.
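Such a helper might look like this (name and shape are mine):
  #include <cstdio>

  // Swap every occurrence of A and B within [First, Last).
  template <typename It, typename T>
  static void swapValues(It First, It Last, const T &A, const T &B) {
    for (; First != Last; ++First) {
      if (*First == A)
        *First = B;
      else if (*First == B)
        *First = A;
    }
  }

  int main() {
    int Mask[6] = {0, 1, 2, 1, 0, 3};
    swapValues(Mask, Mask + 6, 0, 1);
    for (int M : Mask)
      printf("%d ", M); // 1 0 2 0 1 3
    printf("\n");
    return 0;
  }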
llvm-svn: 215094
through the new fuzzer.
This one is great: bad operator precedence led the modulus to happen at
the wrong point. None of the asserts fired because the values past the
end of the 4-element region we were looking at were usually the right
ones anyway. We probably could have gotten a crash here with ASan +
fuzzing, but the correctness tests pinpointed this really nicely.
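The trap, reduced to a standalone example (not the actual expression from the
patch):
  #include <cstdio>

  int main() {
    int Base = 4, Idx = 3;
    printf("%d\n", Base + Idx % 4);   // 7: '%' binds tighter, so Base + (Idx % 4)
    printf("%d\n", (Base + Idx) % 4); // 3: the intended wrap into the region
    return 0;
  }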
llvm-svn: 215092
Summary:
Since pointers are 32-bit on x32, we can use ebp and esp as the frame and
stack pointers. Some operations, like PUSH/POP and CFI_INSTRUCTION, still
require a 64-bit register, so the 64-bit MachineFramePtr is used where needed.
X86_64 NaCl uses 64-bit frame/stack pointers; however, it turns out that
both isTarget64BitLP64 and isTarget64BitILP32 are true for NaCl. That issue
is addressed here as well by making isTarget64BitLP64 false.
Also mark hasReservedSpillSlot unreachable on X86. See inlined comments.
Test Plan: Add one new simple test and upgrade two existing tests with an x32
target case.
Reviewers: nadav, dschuff
Subscribers: llvm-commits, zinovy.nis
Differential Revision: http://reviews.llvm.org/D4617
llvm-svn: 215091
fuzz testing.
The function which tested for adjacency did what it said on the tin, but
when I called it, I wanted it to do something more thorough: I wanted to
know if the *pairs* of shuffle elements were adjacent and started at
0 mod 2. In one place I had the decency to try to test for this, but in
the other it was completely skipped, miscompiling this test case. Fix
this by making the helper actually do what I wanted it to do everywhere
I called it (and removing the now redundant code in one place).
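The thorough check described above might look roughly like this (a simplified
sketch; the real predicate also has to cope with undef elements):
  #include <cstdio>

  // Every pair of mask elements must be adjacent and start at 0 mod 2,
  // so the shuffle can be widened to half as many lanes of twice the width.
  // Assumes N is even.
  static bool canWidenPairs(const int *Mask, int N) {
    for (int I = 0; I + 1 < N; I += 2)
      if (Mask[I] % 2 != 0 || Mask[I + 1] != Mask[I] + 1)
        return false;
    return true;
  }

  int main() {
    int Good[4] = {2, 3, 0, 1}; // (2,3) and (0,1): widenable
    int Bad[4] = {1, 2, 0, 1};  // (1,2) is adjacent but starts at an odd index
    printf("%d %d\n", canWidenPairs(Good, 4), canWidenPairs(Bad, 4)); // 1 0
    return 0;
  }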
I *really* dislike the name "canWidenShuffleElements" for this
predicate. If anyone can come up with a better name, please let me know.
The other name I thought about was "canWidenShuffleMask" but is it
really widening the mask to reduce the number of lanes shuffled? I don't
know. Naming things is hard.
llvm-svn: 215089
It also allows nested { } expressions; now that they are sized, we can
pull bits from the nested value.
Previously, everything in { } had to be convertible to a single bit.
However, now that binary literals are sized, it's useful to be able to
initialize a range of bits.
So, for example, it's now possible to write:
  bits<8> x = { 0, 1, { 0b1001 }, 0, 0b0 }
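In plain C++, the concatenation works out like this (my illustration;
assuming, as in TableGen, that the leftmost element lands in the most
significant bits):
  #include <cstdio>

  int main() {
    unsigned X = 0;
    X = (X << 1) | 0x0; // 1 bit:  0
    X = (X << 1) | 0x1; // 1 bit:  1
    X = (X << 4) | 0x9; // 4 bits: the nested, now-sized literal 0b1001
    X = (X << 1) | 0x0; // 1 bit:  0
    X = (X << 1) | 0x0; // 1 bit:  0b0
    printf("0x%02x\n", X); // 0x64; the widths sum to exactly 8 bits
    return 0;
  }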
llvm-svn: 215086