these would try hard to match constants by inverting the bits
and recursively matching. There are two problems with this:
1) some patterns would match when we didn't want them to (theoretical)
2) this is insanely expensive to do, and most often pointless.
This was apparently useful in just 2 instcombine cases, which I
added code to handle explicitly. This change speeds up 'opt'
time on 176.gcc by 1% and produces bitwise identical code.
llvm-svn: 123518
This is needed to allow an InstAlias for an instruction with an "OptionalDef"
result register (like ARM's cc_out) where you want to set the optional register
to reg0.
llvm-svn: 123490
disabled in this checkin. Sorry for the large diffs due to
refactoring. New functionality is all guarded by EnableSchedCycles.
Scheduling the isel DAG is inherently imprecise, but we give it a best
effort:
- Added MayReduceRegPressure to allow stalled nodes in the queue only
if there is a regpressure need.
- Added BUHasStall to allow checking for either dependence stalls due to
latency or resource stalls due to pipeline hazards.
- Added BUCompareLatency to encapsulate and standardize the heuristics
for minimizing stall cycles (vs. reducing register pressure).
- Modified the bottom-up heuristic (now in BUCompareLatency) to
prioritize nodes by their depth rather than height. As long as it
doesn't stall, height is irrelevant. Depth represents the critical
path to the DAG root.
- Added hybrid_ls_rr_sort::isReady to filter stalled nodes before
adding them to the available queue.
Related Cleanup: most of the register reduction routines do not need
to be templates.
llvm-svn: 123468
simplification present in fully optimized code (I think instcombine fails to
transform some of these when "X-Y" has more than one use). Fires here and
there all over the test-suite, for example it eliminates 8 subtractions in
the final IR for 445.gobmk, 2 subs in 447.dealII, 2 in paq8p etc.
llvm-svn: 123442
threading of shifts over selects and phis while there. This fires here and
there in the testsuite, to not much effect. For example when compiling spirit
it fires 5 times, during early-cse, resulting in 6 more cse simplifications,
and 3 more terminators being folded by jump threading, but the final bitcode
doesn't change in any interesting way: other optimizations would have caught
the opportunity anyway, only later.
llvm-svn: 123441
early in the cleanup code and one late interlaced with the inliner. The second one is
important because inlining and other scalar optzns can unpin allocas, allowing them to
be split up and promoted. While important for performance, this is also relatively
rare, and we would previously force a (non-lazy) computation of DomFrontiers, which
happened even if nothing became unpinned.
With this patch, the first pass of scalarrepl still promotes the vast bulk of allocas
in programs, but hte second pass has changed to use SSAUpdater, which is more "sparse"
and lazy. This speeds up opt -O3 time on kimwitu++ (a c++ app) by about 1%. The
numbers are interesting: the first pass promotes ~17500 allocas. The second pass
promotes about 1600. For non-C++ codes, the compile time win should be greater,
because the second pass of scalarrepl does less.
llvm-svn: 123437
instead of DomTree/DomFrontier. This may be interesting for reducing compile
time. This is currently disabled, but seems to work just fine.
When this is enabled, we eliminate two runs of dominator frontier, one in the
"early per-function" optimizations and one in the "interlaced with inliner"
function passes.
llvm-svn: 123434
- Fixed :upper16: fix up routine. It should be shifting down the top 16 bits first.
- Added support for Thumb2 :lower16: and :upper16: fix up.
- Added :upper16: and :lower16: relocation support to mach-o object writer.
llvm-svn: 123424
most important simplifications, as well as resolving phase ordering issues where instcombine
would inhibit important CSE'ing opportunities, for instance on BitBench/drop3.
llvm-svn: 123418
While there, I noticed that the transform "undef >>a X -> undef" was wrong.
For example if X is 2 then the top two bits must be equal, so the result can
not be anything. I fixed this in the constant folder as well. Also, I made
the transform for "X << undef" stronger: it now folds to undef always, even
though X might be zero. This is in accordance with the LangRef, but I must
admit that it is fairly aggressive. Also, I added "i32 X << 32 -> undef"
following the LangRef and the constant folder, likewise fairly aggressive.
llvm-svn: 123417