Use the same COPY_TO_REGCLASS approach as for the 2-register *_sfp instructions.
This change made a big difference in the code generated for the
CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing
a fine job, but some instructions that were previously moved outside the loop
are not moved now. It's using fewer VFP registers now, which is generally
a good thing, so I think the estimates for register pressure changed and that
affected the LICM behavior. Since that isn't obviously wrong, I've just
changed the test file. This completes the work for Radar 8711675.
llvm-svn: 121730
Jakob Olesen suggested that we can avoid the need for separate pseudo
instructions here by using COPY_TO_REGCLASS in the patterns. The pattern
gets pretty ugly but it seems to work well. Partial fix for Radar 8711675.
llvm-svn: 121718
as a "long" direct branch. While the mnemonics are the same, they encode the branch offset differently, and
the Darwin assembler appears to prefer the "long" form for direct branches. Thus, in the name of bitwise
equivalence, provide encoding and fixup support for it.
llvm-svn: 121710
when the wider type is legal. This allows us to compile:
define zeroext i16 @test1(i16 zeroext %x) nounwind {
entry:
%div = udiv i16 %x, 33
ret i16 %div
}
into:
test1: # @test1
movzwl 4(%esp), %eax
imull $63551, %eax, %eax # imm = 0xF83F
shrl $21, %eax
ret
instead of:
test1: # @test1
movw $-1985, %ax # imm = 0xFFFFFFFFFFFFF83F
mulw 4(%esp)
andl $65504, %edx # imm = 0xFFE0
movl %edx, %eax
shrl $5, %eax
ret
Implementing rdar://8760399 and example #4 from:
http://blog.regehr.org/archives/320
We should implement the same thing for [su]mul_hilo, but I don't
have immediate plans to do this.
llvm-svn: 121696
for each constant pool entry. Using WriteTypeSymbolic here takes time
proportional to the size of the module, for each constant pool entry.
This speeds up -verbose-asm llc on 252.eon (a random testcase at my disposal)
from 4.4s to 2.137s. llc takes 2.11s with asm-verbose off, so this is now a
pretty reasonable cost for verbose comments.
llvm-svn: 121691
when simplifying, allowing them to be eagerly turned into switches. This
is the last step required to get "Example 7" from this blog post:
http://blog.regehr.org/archives/320
On X86, we now generate this machine code, which (to my eye) seems better
than the ICC generated code:
_crud: ## @crud
## BB#0: ## %entry
cmpb $33, %dil
jb LBB0_4
## BB#1: ## %switch.early.test
addb $-34, %dil
cmpb $58, %dil
ja LBB0_3
## BB#2: ## %switch.early.test
movzbl %dil, %eax
movabsq $288230376537592865, %rcx ## imm = 0x400000017001421
btq %rax, %rcx
jb LBB0_4
LBB0_3: ## %lor.rhs
xorl %eax, %eax
ret
LBB0_4: ## %lor.end
movl $1, %eax
ret
llvm-svn: 121690
location in simplifycfg. In the old days, SimplifyCFG was never run on
the entry block, so we had to scan over all preds of the BB passed into
simplifycfg to do this xform, now we can just check blocks ending with
a condbranch. This avoids a scan over all preds of every simplified
block, which should be a significant compile-time perf win on functions
with lots of edges. No functionality change.
llvm-svn: 121668
class A<bit a, bits<3> x, bits<3> y> {
bits<3> z;
let z = !if(a, x, y);
}
The variable z will get the value of x when 'a' is 1 and 'y' when a is '0'.
llvm-svn: 121666