1. (((x) & 0xFF00) >> 8) | (((x) & 0x00FF) << 8)
=> (bswap x) >> 16
2. ((x&0xff)<<8)|((x&0xff00)>>8)|((x&0xff000000)>>8)|((x&0x00ff0000)<<8))
=> (rotl (bswap x) 16)
This allows us to eliminate most of the def : Pat patterns for ARM rev16
revsh instructions. It catches many more cases for ARM and x86.
rdar://9609108
llvm-svn: 133503
ops.
This is a rewrite of the IV simplification algorithm used by
-disable-iv-rewrite. To avoid perturbing the default mode, I
temporarily split the driver and created SimplifyIVUsersNoRewrite. The
idea is to avoid doing opcode/pattern matching inside
IndVarSimplify. SCEV already does it. We want to optimize with the
full generality of SCEV, but optimize def-use chains top down on-demand rather
than rewriting the entire expression bottom-up. This was easy to do
for operations that SCEV can prove are identity function. So we're now
eliminating bitmasks and zero extends this way.
A result of this rewrite is that indvars -disable-iv-rewrite no longer
requires IVUsers.
llvm-svn: 133502
In cases such as the attached test, where the case value for a switch
destination is used in a phi node that follows the destination, it
might be better to replace that value with the condition value of the
switch, so that more blocks can be folded away with
TryToSimplifyUncondBranchFromEmptyBlock because there are less
conflicts in the phi node.
llvm-svn: 133344
type's bitwidth matches the (allocated) size of the alloca. This severely
pessimizes vector scalar replacement when the only vector type being used is
something like <3 x float> on x86 or ARM whose allocated size matches a
<4 x float>.
I hope to fix some of the flawed assumptions about allocated size throughout
scalar replacement and reenable this in most cases.
llvm-svn: 133338
for pre-2.9 bitcode files. We keep x86 unaligned loads, movnt, crc32, and the
target indep prefetch change.
As usual, updating the testsuite is a PITA.
llvm-svn: 133337
* rounding modes for fp add, mul, sub now use .rn
* float -> int rounding correctly uses .rzi not .rni
* 32bit fdiv for sm13 uses div.rn (instead of div.approx)
* 32bit fdiv for sm10 now uses div (instead of div.approx)
Approx is not IEEE 754 compatible (and should be optionally set by a flag to the backend instead). The .rn rounding modifier is the PTX default anyway, but it's better to be explicit.
All these modifiers should be available by using __fmul_rz functions for example, but support will need to be added for this in the backend.
Patch by Dan Bailey
llvm-svn: 133253
In Thumb mode we cannot handle GPR virtual registers, even though some
instructions can. When isel is lowering a CopyFromReg, it should limit
itself to subclasses of getRegClassFor(VT).
<rdar://problem/9624323>
llvm-svn: 133210
accumulator forwarding. Specifically (from SVN log entry):
Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier
accumulator forwarding:
vadd d3, d0, d1
vmul d3, d3, d2
=>
vmul d3, d0, d2
vmla d3, d1, d2
Make sure it catches cases where operand 1 is add/fadd/sub/fsub, which was
intended in the original revision.
llvm-svn: 133127
optimizations when emitting calls to the function; instead those calls may
use faster relocations which require the function to be immediately resolved
upon loading the dynamic object featuring the call. This is useful when it
is known that the function will be called frequently and pervasively and
therefore there is no merit in delaying binding of the function.
Currently only implemented for x86-64, where it turns into a call through
the global offset table.
Patch by Dan Gohman, who assures me that he's going to add LangRef documentation
for this once it's committed.
llvm-svn: 133080
Note that this actually changes code generation, and someone who
understands this target better should check the changes.
- R12Q is now allocatable. I think it was omitted from the allocation
order by mistake since it isn't reserved. It as apparently used as a
GOT pointer sometimes, and it should probably be reserved if that is
the case.
- The GR64 registers are allocated in a different order now. The
register allocator will automatically put the CSRs last. There were
other changes to the order that may have been significant.
The test fix is because r0 and r1 swapped places in the allocation order.
llvm-svn: 133067
the bits being cleared by the AND are not demanded by the BFI.
The previous BFI dag combine rule was actually incorrect (or used to be
correct until BFI representation changed).
rdar://9609030
llvm-svn: 133034
converted to add x,x if x is a undef. add undef, undef does not guarantee
that the resulting low order bit is zero.
Fixes <rdar://problem/9453156> and <rdar://problem/9487392>.
llvm-svn: 133022
types (with power of two types such as 8,16,32 .. 512).
Fix a bug in the integer promotion of bitcast nodes. Enable integer expanding
only if the target of the conversion is an integer (when the type action is
scalarize).
Add handling to the legalization of vector load/store in cases where the saved
vector is integer-promoted.
llvm-svn: 132985
might overflow. Re-typing the alloca to a larger type (e.g. double)
hoists a shift into the alloca, potentially exposing overflow in the
expression. rdar://problem/9265821
llvm-svn: 132926
In particular, don't spill dirty registers only to satisfy a hint. It is
not worth it.
The attached test case provides an example where the fast allocator
would spill a register when other registers are available.
llvm-svn: 132900
we try to branch to them.
Before we were creating successor lists with duplicated entries. Fixing that
found a bug in isBlockOnlyReachableByFallthrough that would causes it to
return the wrong answer for
-----------
...
jne foo
jmp bar
foo:
----------
llvm-svn: 132882
causing an assertion failure downstream. This fixes <rdar://problem/9562908>.
This really seems like it should always be set at CCState creation time, so mistakes like
this can never happen. I'll take a look at doing that.
llvm-svn: 132811
The potential DAGCombine which enforces this more generally messes up some other very fragile patterns, so I'm leaving that alone, at least for now.
llvm-svn: 132809
pad, separating the exception and selector calls from the new lpad. Teaching
it not to do that, or to properly adjust the CFG afterwards, is out of
scope because it would require the other edges to the landing pad to be split
as well (effectively). Instead, just recover from the most likely cases
during inlining. The best long-term solution is to change the exception
representation and commit to either requiring or not requiring the more
complex edge-splitting logic; this is just a shorter-term hack.
llvm-svn: 132799
assuming that all offsets are legal vector accesses, and thus trying to access
the float member of { <2 x float>, float } as the 3rd element of the first
member.
llvm-svn: 132766
former was using the size of the entire alloca, whereas the latter was correctly using
the allocated size of the immediate type being converted (which may differ from the size
of the alloca). This fixes PR10082.
llvm-svn: 132759
- cfi directives are not inserted at the right location or in the right order.
- The source MachineLocation for the cfi directive that changes the cfa register
to $fp should be MachineLocation::VirtualFP.
- A PROLOG_LABEL that marks the beginning of cfi_offset directives for
callee-saved register is emitted even when no callee-saved registers are
saved.
- When a callee-saved double precision register is saved, two cfi_offset
directives, one for each of the paired single precision registers, should be
emitted.
llvm-svn: 132703
When local live range splitting creates a live range with the same
number of instructions as the old range, mark it as RS_Local. When such
a range is seen again, require that it be split in a way that reduces
the number of instructions. That guarantees we are making progress while
still being able to perform 3 -> 2+3 splits as required by PR10070.
This also means that the PrevSlot map is no longer needed. This was also
used to estimate new spill weights, but that is no longer necessary
after slotIndexes::insertMachineInstrInMaps() got the extra Late
insertion argument.
llvm-svn: 132697
then we don't want to set the destination in the indirect branch to the
destination. This is because the indirect branch needs its destinations to have
had their block addresses taken. This isn't so of the new critical edge that's
split during this process. If it turns out that the destination block has only
one predecessor, and that being a BB with an indirect branch, then it won't be
marked as 'used' and may be removed.
PR10072
llvm-svn: 132638
redundant with partially-aliasing loads.
When computing what portion of a clobbering load value is needed,
it doesn't consider phi-translation which may have occurred
between the clobbing load and the redundant load.
llvm-svn: 132631
A TableGen backend can define how certain classes can be expanded into
ordered sets of defs, typically by evaluating a specific field in the
record. The SetTheory class can then evaluate DAG expressions that refer
to these named sets.
A number of standard set and list operations are predefined, and the
backend can add more specialized operators if needed. The -print-sets
backend is used by SetTheory.td to provide examples.
This is intended to simplify how register classes are defined:
def GR32_NOSP : RegisterClass<"X86", [i32], 32, (sub GR32, ESP)>;
llvm-svn: 132621
queries in the case of a DAG, where a query reaches a node
visited earlier, but it's not on a cycle. This avoids
MayAlias results in cases where BasicAA is expected to
return MustAlias or PartialAlias in order to protect TBAA.
llvm-svn: 132609