type to truncate to should be the number of bits of the value that are
preserved, not the number that are clobbered with sign-extension.
This fixes regressions in ldecod.
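A minimal C++ illustration of the distinction (not from the testcase):

  #include <cstdint>
  // Truncating an i64 to i32 and sign-extending back preserves the low
  // 32 bits and clobbers the high 32 with copies of the sign bit; the
  // truncation type is named by the 32 preserved bits, not the 32 clobbered.
  int64_t trunc_then_sext(int64_t x) {
    return (int64_t)(int32_t)x;
  }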
llvm-svn: 69704
%reg1498<def> = MOV32rm %reg1024, 1, %reg0, 12, %reg0, Mem:LD(4,4) [sunkaddr39 + 0]
%reg1506<def> = MOV32rm %reg1024, 1, %reg0, 8, %reg0, Mem:LD(4,4) [sunkaddr42 + 0]
%reg1486<def> = MOV32rr %reg1506
%reg1486<def> = XOR32rr %reg1486, %reg1498, %EFLAGS<imp-def,dead>
%reg1510<def> = MOV32rm %reg1024, 1, %reg0, 4, %reg0, Mem:LD(4,4) [sunkaddr45 + 0]
=>
%reg1498<def> = MOV32rm %reg2036, 1, %reg0, 12, %reg0, Mem:LD(4,4) [sunkaddr39 + 0]
%reg1506<def> = MOV32rm %reg2037, 1, %reg0, 8, %reg0, Mem:LD(4,4) [sunkaddr42 + 0]
%reg1486<def> = MOV32rr %reg1506
%reg1486<def> = XOR32rr %reg1486, %reg1498, %EFLAGS<imp-def,dead>
%reg1510<def> = MOV32rm %reg2038, 1, %reg0, 4, %reg0, Mem:LD(4,4) [sunkaddr45 + 0]
From linearscan's point of view, reg2036, 2037, and 2038 are each separate registers, each "killed" after a single use. The physical register assigned to a reloaded value thus becomes available immediately and is often clobbered right away. e.g. in this case reg1498 is allocated EAX while reg2036 is allocated RAX. This means we end up with multiple reloads from the same stack slot in the same basic block.
Now linearscan recognizes when there are other reloads from the same stack slot in the same BB. It will "downgrade" RAX (and its aliases) after reg2036 is allocated until the next reload (reg2037) is done. This greatly increases the likelihood that reloads from the same stack slot are reused.
This speeds up sha1 from OpenSSL by 5.8%. It is also an across the board win for SPEC2000 and 2006.
llvm-svn: 69585
type as the vector element type: allow them to be of
a wider integer type than the element type all the way
through the system, and not just as far as LegalizeDAG.
This should be safe because it used to be this way
(the old type legalizer would produce such nodes), so
backends should be able to handle it. In fact only
targets which have legal vector types with an illegal
promoted element type will ever see this (e.g. <4 x i16>
on ppc). This fixes a regression with the new type
legalizer (vec_splat.ll). Also, treat SCALAR_TO_VECTOR
the same as BUILD_VECTOR. After all, it is just a
special case of BUILD_VECTOR.
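A scalar sketch of the resulting semantics, assuming the usual implicit-truncation rule for wide BUILD_VECTOR operands (illustrative, not the DAG API):

  #include <array>
  #include <cstdint>
  // Each promoted i32 operand of a <4 x i16> BUILD_VECTOR is implicitly
  // truncated to the element type, exactly as these casts model.
  std::array<uint16_t, 4> build_vec_4i16(int32_t a, int32_t b,
                                         int32_t c, int32_t d) {
    return { (uint16_t)a, (uint16_t)b, (uint16_t)c, (uint16_t)d };
  }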
llvm-svn: 69467
for the optimization it's testing to kick in (although
it improves the code, getting rid of all spills).
I don't understand the optimization well enough to
rescue the test, so XFAILing.
llvm-svn: 69409
leaq foo@TLSGD(%rip), %rdi
as part of the instruction sequence. Using a register other than %rdi and then
copying it to %rdi is not valid.
llvm-svn: 69350
register is available and when it's profitable.
e.g.
xorq %r12<kill>, %r13
addq %rax, -184(%rbp)
addq %r13, -184(%rbp)
==>
xorq %r12<kill>, %r13
movq -184(%rbp), %r12
addq %rax, %r12
addq %r13, %r12
movq %r12, -184(%rbp)
Two more instructions, but fewer memory accesses. It can also open up
opportunities for more optimizations.
llvm-svn: 69341
have pointer types, though in contrast to C pointer types, SCEV
addition is never implicitly scaled. This not only eliminates the
need for special code like IndVars' EliminatePointerRecurrence
and LSR's own GEP expansion code, it also does a better job because
it lets the normal optimizations handle pointer expressions just
like integer expressions.
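For contrast, a C-level view of the scaling that SCEV now makes explicit (illustrative):

  #include <cstdint>
  // C pointer addition scales by the element size; SCEV addition does not,
  // so the GEP below is analyzed as the byte-level expression (p + 4*i)
  // rather than as an opaque pointer computation.
  int32_t *index_into(int32_t *p, int64_t i) {
    return p + i;  // address = (char *)p + 4 * i
  }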
Also, since LLVM IR GEPs can't directly index into multi-dimensional
VLAs, moving the GEP analysis out of client code and into the SCEV
framework makes it easier for clients to handle multi-dimensional
VLAs the same way as other arrays.
Some existing regression tests show improved optimization.
test/CodeGen/ARM/2007-03-13-InstrSched.ll in particular improved to
the point where if-conversion started kicking in; I turned it off
for this test to preserve the intent of the test.
llvm-svn: 69258
sext around sext(shorter IV + constant), using a
longer IV instead, when it can figure out the
add can't overflow. This comes up a lot in
subscripting; mainly affects 64-bit targets.
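A typical subscripting loop this affects on 64-bit targets (a sketch, not the original testcase):

  // With a 32-bit induction variable, addressing a[i + 1] needs
  // sext(i + 1) to 64 bits each iteration; once the add provably can't
  // overflow, a 64-bit IV removes the per-iteration sign extension.
  void shift_down(double *a, int n) {
    for (int i = 0; i < n; ++i)
      a[i] = a[i + 1];
  }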
llvm-svn: 69123
llvm.dbg.region.end intrinsic. This nested llvm.dbg.func.start/llvm.dbg.region.end pair now enables DW_TAG_inlined_subroutine support in the code generator.
llvm-svn: 69118
operator is used by a CopyToReg to export the value to a different
block, don't reuse the CopyToReg's register for the subreg operation
result if the register isn't precisely the right class for the
subreg operation.
Also, rename the h-registers.ll test, now that there is more
than one.
llvm-svn: 69087
- Add patterns for h-register extract, which avoids a shift and mask,
and in some cases a temporary register.
- Add address-mode matching for turning (X>>(8-n))&(255<<n), where
n is a valid address-mode scale value, into an h-register extract
and a scaled-offset address (see the sketch after this list).
- Replace X86's MOV32to32_ and related instructions with the new
target-independent COPY_TO_SUBREG instruction.
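A sketch of the address-mode pattern from the second bullet, instantiated with n = 2 (values hypothetical):

  #include <cstdint>
  // (x >> (8-n)) & (255 << n) with n = 2 equals ((x >> 8) & 255) << 2:
  // the second byte of x, pre-scaled by 4, which maps to an h-register
  // extract feeding a scaled-offset address.
  uint32_t scaled_high_byte(uint32_t x) {
    return (x >> 6) & (255u << 2);
  }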
On x86-64 there are complicated constraints on h registers, and
CodeGen doesn't currently provide a high-level way to express all of them,
so they are handled with a bunch of special code. This code currently only
supports extracts where the result is used by a zero-extend or a store,
though these are fairly common.
These transformations are not always beneficial; since there are only
4 h registers, they sometimes require extra move instructions, and
this sometimes increases register pressure because it can force out
values that would otherwise be in one of those registers. However,
this appears to be relatively uncommon.
llvm-svn: 68962
to support C99 inline, GNU extern inline, etc. Related bugzillas
include PR3517, PR3100, and PR2933. Nothing uses this yet, but it
appears to work.
llvm-svn: 68940
1. Sinking would crash when the first instruction of a block was
sunk due to iterator problems.
2. Instructions could be sunk to their current block, causing an
infinite loop.
This fixes PR3968.
llvm-svn: 68787
register destinations that are tied to source operands. The
TargetInstrDescr::findTiedToSrcOperand method silently fails for inline
assembly. The existing MachineInstr::isRegReDefinedByTwoAddr was very
close to doing what is needed, so this revision makes a few changes to
that method and also renames it to isRegTiedToUseOperand (for consistency
with the very similar isRegTiedToDefOperand and because it handles both
two-address instructions and inline assembly with tied registers).
llvm-svn: 68714
in addition to ZERO_EXTEND and SIGN_EXTEND. Fix a bug in the
way it checked for live-out values, and simplify the way it
finds users by using SDNode::use_iterator's (relatively) new
features. Also, make it slightly more permissive on targets
with free truncates.
In SelectionDAGBuild, avoid creating ANY_EXTEND nodes that are
larger than necessary. If the target's ShiftAmountTy has
enough bits, use it. This exposes the truncate to optimization
early, enabling more optimizations.
llvm-svn: 68670
integer types, unless they are already strange. This prevents it from
turning the code produced by SROA into crazy libcalls and stuff that
the code generator can't handle. In the attached example, the result
was an i96 multiply that caused the x86 backend to assert.
Note that if TargetData had an idea of what the legal types are for
a target, this could be used to stop instcombine from introducing
i64 muls, as Scott wanted.
llvm-svn: 68598
with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
instructions), and teach the DAGCombiner to take advantage of this on
targets which support it. This eliminates many redundant
zero-extension operations on x86-64.
This adds a new TargetLowering hook, isZExtFree. It's similar to
isTruncateFree, except it only applies to actual definitions, and not
no-op truncates which may not zero the high bits.
Also, this adds a new optimization to SimplifyDemandedBits: transform
operations like x+y into (zext (add (trunc x), (trunc y))) on targets
where all the casts are no-ops. In contexts where the high part of the
add is explicitly masked off, this allows the mask operation to be
eliminated. Fix the DAGCombiner to avoid undoing these transformations
to eliminate casts on targets where the casts are no-ops.
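A worked example of the narrowing transform, assuming free truncates on the target (illustrative):

  #include <cstdint>
  // When only the low 32 bits of a 64-bit add are demanded, the add can
  // be narrowed: (x + y) & 0xffffffff == zext((uint32_t)x + (uint32_t)y).
  uint64_t low32_of_add(uint64_t x, uint64_t y) {
    return (uint64_t)((uint32_t)x + (uint32_t)y);  // explicit mask eliminated
  }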
Also, this adds a new two-address lowering heuristic. Since
two-address lowering runs before coalescing, it helps to be able to
look through copies when deciding whether commuting and/or
three-address conversion are profitable.
Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
the case that a clobber range extended both before and beyond an
existing live range. In that case, multiple live ranges need to be
added. This was exposed by the new subreg coalescing code.
Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
spiller behavior it was looking for no longer occurs with the new
instruction selection.
llvm-svn: 68576
builds.
--- Reverse-merging (from foreign repository) r68552 into '.':
U test/CodeGen/X86/tls8.ll
U test/CodeGen/X86/tls10.ll
U test/CodeGen/X86/tls2.ll
U test/CodeGen/X86/tls6.ll
U lib/Target/X86/X86Instr64bit.td
U lib/Target/X86/X86InstrSSE.td
U lib/Target/X86/X86InstrInfo.td
U lib/Target/X86/X86RegisterInfo.cpp
U lib/Target/X86/X86ISelLowering.cpp
U lib/Target/X86/X86CodeEmitter.cpp
U lib/Target/X86/X86FastISel.cpp
U lib/Target/X86/X86InstrInfo.h
U lib/Target/X86/X86ISelDAGToDAG.cpp
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
U lib/Target/X86/X86ISelLowering.h
U lib/Target/X86/X86InstrInfo.cpp
U lib/Target/X86/X86InstrBuilder.h
U lib/Target/X86/X86RegisterInfo.td
llvm-svn: 68560
This introduces a small regression in generated code quality
in the case where we are just computing addresses, not
loading values.
Will work on it and on X86-64 support.
llvm-svn: 68552
Constant, MDString and MDNode which can only be used by globals with a name
that starts with "llvm." or as arguments to a function with the same naming
restriction.
llvm-svn: 68420
e.g.
%reg1024<def> = MOV r1
%reg1025<def> = ADD %reg1024, %reg1026
r0 = MOV %reg1025
If it's not possible / profitable to commute ADD, then turning ADD into a LEA saves a copy.
llvm-svn: 68065
x * 40
=>
shlq $3, %rdi
leaq (%rdi,%rdi,4), %rax
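The decomposition behind this sequence, written out in C for clarity: 40 = 8 * 5, so the multiply splits into one shift and one lea.

  #include <cstdint>
  uint64_t mul40(uint64_t x) {
    uint64_t t = x << 3;  // shlq $3, %rdi
    return t + t * 4;     // leaq (%rdi,%rdi,4), %rax
  }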
This has the added benefit of allowing more multiplies to be folded into addressing modes. e.g.
a * 24 + b
=>
leaq (%rdi,%rdi,2), %rax
leaq (%rsi,%rax,8), %rax
llvm-svn: 67917
call, we should treat "i64 zext" as the start of a constant expr, but
"i64 0 zext" as an argument with an obsolete attribute on it (this form
is already tested by test/Assembler/2007-07-30-AutoUpgradeZextSext.ll).
Make the autoupgrade logic more discerning to avoid treating "i64 zext"
as an old-style attribute, causing us to reject a valid constant expr.
This fixes PR3876.
llvm-svn: 67682
to/from integer types that are not intptr_t to convert to intptr_t
then do an integer conversion to the dest type. This exposes the
cast to the optimizer.
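A source-level view of the canonical form (illustrative names):

  #include <cstdint>
  // A pointer-to-i32 cast is split into a ptrtoint to intptr_t followed
  // by an ordinary integer conversion, which the optimizer can see through.
  uint32_t ptr_low_bits(void *p) {
    return (uint32_t)(uintptr_t)p;  // ptrtoint, then trunc on 64-bit targets
  }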
llvm-svn: 67638
1. Make instcombine always canonicalize trunc x to i1 into an icmp(x&1). This
exposes the AND to other instcombine xforms and is more of what the code
generator expects (a source-level sketch follows below).
2. Rewrite the remaining trunc pattern match to use 'match', which
simplifies it a lot.
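The canonical form from point 1, in source terms (illustrative):

  #include <cstdint>
  // trunc i32 %x to i1 keeps only bit 0, i.e. it becomes
  // icmp ne (and %x, 1), 0.
  bool low_bit(uint32_t x) {
    return (x & 1) != 0;
  }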
llvm-svn: 67635
e.g. when allocating for GR32, bh is not used, so update bl's spill weight
as well. bl should get the same spill weight; otherwise it will be chosen
as a spill candidate, since spilling bh doesn't make ebx available.
This fixes PR2866.
llvm-svn: 67574
same as a normal i80 {low64, high16} rather
than its own {high64, low16}. A depressing number
of places know about this; I think I got them all.
Bitcode readers and writers convert back to the old
form to avoid breaking compatibility.
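A sketch of the new word layout, assuming the usual x87 field split (hypothetical helper, not LLVM's API):

  #include <cstdint>
  // Stored like any other i80: word 0 holds the low 64 bits (the
  // significand) and word 1 the high 16 bits (sign and exponent).
  struct X87Fields { uint64_t Significand; uint16_t SignExp; };
  X87Fields splitF80(const uint64_t Words[2]) {
    return { Words[0], (uint16_t)Words[1] };
  }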
llvm-svn: 67562
%RAX<def> = ...
%RAX<def> = SUBREG_TO_REG 0, %EAX:3<kill>, 3
The first def is defining RAX, not EAX so the top bits were not zero-extended.
llvm-svn: 67511
linkage: the value may be replaced with something
different at link time. (Frontends that want to
allow values to be loaded out of weak constants can
give their constants weak_odr linkage).
llvm-svn: 67407
not safe in general because the immediate could be an arbitrary
value that does not fit in a 32-bit pcrel displacement.
Conservatively fall back to loading the value into a register
and calling through it.
We still do the optimization on X86-32.
llvm-svn: 67142
it is not APInt clean, but even when it is it needs to be evaluated carefully
to determine whether it is actually profitable.
This fixes the crash in PR3806.
llvm-svn: 67134
size by the array amount as an i32 value instead of promoting from
i32 to i64 then doing the multiply. Not doing this broke wrap-around
assumptions that the optimizers (validly) made. The ultimate real
fix for this is to introduce an i64 version of alloca and remove MallocInst.
This fixes PR3829
llvm-svn: 67093
vector shuffle mask. Forced the mask to be built using i32. Note: this will
be irrelevant once vector_shuffle no longer takes a build vector for the
shuffle mask.
llvm-svn: 67076
- Fix fabs, fneg for f32 and f64.
- Use BuildVectorSDNode.isConstantSplat, now that the functionality exists
- Continue to improve i64 constant lowering. Lower certain special constants
to the constant pool when they correspond to SPU's shufb instruction's
special mask values. This avoids the overhead of performing a shuffle on a
zero-filled vector just to get the special constant when the memory load
suffices.
llvm-svn: 67067
U test/CodeGen/X86/2009-03-13-PHIElimBug.ll
D test/CodeGen/X86/2009-03-16-PHIElimInLPad.ll
U lib/CodeGen/PHIElimination.cpp
r67049 was causing this failure:
Running /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/test/CodeGen/X86/dg.exp ...
FAIL: /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/test/CodeGen/X86/2009-03-13-PHIElimBug.ll for PR3784
Failed with exit(1) at line 1
while running: llvm-as < /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/test/CodeGen/X86/2009-03-13-PHIElimBug.ll | llc -march=x86 | /usr/bin/grep -A 2 {call f} | /usr/bin/grep movl
child process exited abnormally
llvm-svn: 67051
how invokes are set up. The fix could be disturbed by
register copies coming after the EH_LABEL, and also didn't
behave quite right when it was the invoke result that
was used in a phi node. Also (see new testcase) fix
another phi elimination bug while there: register copies
in the landing pad need to come after the EH_LABEL, because
that's where execution branches to when unwinding. If they
come before the EH_LABEL then they will never be executed...
Also tweak the original testcase so it doesn't use a no-longer
existing counter.
The accumulated phi elimination changes fix two of seven Ada
testsuite failures that turned up after landing pad critical
edge splitting was turned off. So there's probably more to come.
llvm-svn: 67049
ptrtoint and inttoptr in X86FastISel. These casts aren't always
handled in the generic FastISel code because X86 sometimes needs
custom code to do truncation and zero-extension.
llvm-svn: 66988
by inserting explicit zero extensions where necessary. Included
is a testcase where SelectionDAG produces a virtual register
holding an i1 value which FastISel previously mistakenly assumed
to be zero-extended.
llvm-svn: 66941
1. The ConstantPoolSDNode alignment field is the log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. The MachineConstantPool alignment field is also a log2 value.
3. However, some places create ConstantPoolSDNodes with raw alignment values rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values (illustrated after the lists below).
4. Constant pool entry offsets are computed when the entries are created, but the asm printer groups entries by section, so the offsets are no longer valid. The asm printer nevertheless uses them to determine the size of the padding between entries.
5. The asm printer uses an expensive data structure, a multimap, to track constant pool entries by section.
6. The asm printer iterates over a SmallPtrSet when emitting constant pool entries. This is non-deterministic.
Solutions:
1. The ConstantPoolSDNode alignment field is changed to keep a non-log2 value.
2. The MachineConstantPool alignment field is also changed to keep a non-log2 value.
3. Functions that create ConstantPool nodes now pass in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field; it's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in the asm printer and the JIT.
5. The asm printer uses a cheaper data structure to group constant pool entries.
6. The asm printer computes entry offsets after grouping is done.
7. Change the JIT code to compute entry offsets on the fly.
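An illustration of problem 3 above (values hypothetical): passing a raw byte alignment where a log2 value is expected inflates the recorded alignment exponentially.

  #include <cstdio>
  int main() {
    unsigned ByteAlign = 8;               // the intended alignment
    unsigned Recorded = 1u << ByteAlign;  // misread as log2: 2^8 = 256
    std::printf("recorded alignment: %u\n", Recorded);
    return 0;
  }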
llvm-svn: 66875
for i32/i64 expressions (we could also do i16 on cpus where
i16 lea is fast, but I didn't add this). On the example, we now
generate:
_test:
movl 4(%esp), %eax
cmpl $42, (%eax)
setl %al
movzbl %al, %eax
leal 4(%eax,%eax,8), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
cmpl $41, (%eax)
movl $4, %ecx
movl $13, %eax
cmovg %ecx, %eax
ret
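The arithmetic behind the lea in the first sequence, written as C (names illustrative):

  #include <cstdint>
  // With c = (v < 42) ? 1 : 0 from setl/movzbl, 4 + 9*c selects between
  // 13 and 4 branchlessly: leal 4(%eax,%eax,8) computes c + 8*c + 4.
  uint32_t select_13_or_4(int32_t v) {
    uint32_t c = v < 42;
    return 4 + 9 * c;
  }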
llvm-svn: 66869
operands can't both be fully folded at the same time. For example,
in the included testcase, a global variable is being added with
an add of two values. The global variable wants RIP-relative
addressing, so it can't share the address with another base
register, but it's still possible to fold the initial add.
llvm-svn: 66865
in the Ada testcase. Reverting this only covers up
the real problem, which is a nasty conceptual difficulty
in the phi elimination pass: when eliminating phi nodes
in landing pads, the register copies need to come before
the invoke, not at the end of the basic block which is
too late... See PR3784.
llvm-svn: 66826
related transformations out of target-specific dag combine into the
ARM backend. These were added by Evan in r37685 with no testcases
and only seem to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).
Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0 -> (zext(cond) << 3). This happens frequently
with the recently added cp constant select optimization, but is a
very general xform. For example, we now compile the second example
in const-select.ll to:
_test:
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
seta %al
movzbl %al, %eax
movl 4(%esp), %ecx
movsbl (%ecx,%eax,4), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
leal 4(%eax), %ecx
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
cmovbe %eax, %ecx
movsbl (%ecx), %eax
ret
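The general xform in source terms (illustrative):

  #include <cstdint>
  // cond ? 8 : 0 becomes zext(cond) << 3: no branch, no cmov.
  uint32_t sel8(bool cond) {
    return (uint32_t)cond << 3;
  }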
This passes multisource and dejagnu.
llvm-svn: 66779
alignment of the generated constant pool entry to the
desired alignment of a type. If we don't do this, we end up
trying to do a movsd from 4-byte-aligned memory. This fixes
450.soplex and 456.hmmer.
llvm-svn: 66641
1. Use the same value# to represent unknown values being merged into sub-registers.
2. When the coalescer commutes an instruction and the destination is a physical register, update its sub-registers by merging in the extended ranges.
llvm-svn: 66610
to obtain debug info about them.
Introduce helpers to access debug info for global variables. Also introduce a
helper that works for both local and global variables.
llvm-svn: 66541
the same way the "test" instruction does in overflow cases,
so eliminating the test is only safe when those bits aren't
needed, as is the case for COND_E and COND_NE, or if it
can be proven that no overflow will occur. For now, just
restrict the optimization to COND_E and COND_NE and don't
do any overflow analysis.
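A sketch of why the equality conditions are the safe case (illustrative):

  #include <cstdint>
  // After r = a - b, the sub's zero flag already answers r == 0, so the
  // following test can be removed for COND_E/COND_NE; signed conditions
  // also need the overflow flag, which a test computes differently.
  bool equal(int32_t a, int32_t b) {
    int32_t r = (int32_t)((uint32_t)a - (uint32_t)b);  // sub sets ZF
    return r == 0;                                     // test r,r removable
  }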
llvm-svn: 66318