LZCNT instructions are available. Force promotion to i32 to get
a smaller encoding since the fix-ups necessary are just as complex for
either promoted type
We can't do standard promotion for CTLZ when lowering through BSR
because it results in poor code surrounding the 'xor' at the end of this
instruction. Essentially, if we promote the entire CTLZ node to i32, we
end up doing the xor on a 32-bit CTLZ implementation, and then
subtracting appropriately to get back to an i8 value. Instead, our
custom logic just uses the knowledge of the incoming size to compute
a perfect xor. I'd love to know of a way to fix this, but so far I'm
drawing a blank. I suspect the legalizer could be more clever and/or it
could collude with the DAG combiner, but how... ;]
llvm-svn: 147251
my C-brain happy. Remove the unnecessary bits of pedantic IR fluff like
nounwind. Remove stray uses comments. Name things semantically rather
than tN so that adding a new test in the middle doesn't cause pain, and
so that new tests can be grouped semantically.
This exposes how little systematic testing is going on here. I noticed
this by finding several bugs via inspection and wondering why this test
wasn't catching any of them. =[
llvm-svn: 147248
X86ISelLowering C++ code. Because this is lowered via an xor wrapped
around a bsr, we want the dagcombine which runs after isel lowering to
have a chance to clean things up. In particular, it is very common to
see code which looks like:
(sizeof(x)*8 - 1) ^ __builtin_clz(x)
Which is trying to compute the most significant bit of 'x'. That's
actually the value computed directly by the 'bsr' instruction, but if we
match it too late, we'll get completely redundant xor instructions.
The more naive code for the above (subtracting rather than using an xor)
still isn't handled correctly due to the dagcombine getting confused.
Also, while here fix an issue spotted by inspection: we should have been
expanding the zero-undef variants to the normal variants when there is
an 'lzcnt' instruction. Do so, and test for this. We don't want to
generate unnecessary 'bsr' instructions.
These two changes fix some regressions in encoding and decoding
benchmarks. However, there is still a *lot* to be improve on in this
type of code.
llvm-svn: 147244
use the zero-undefined variants of CTTZ and CTLZ. These are just simple
patterns for now, there is more to be done to make real world code using
these constructs be optimized and codegen'ed properly on X86.
The existing tests are spiffed up to check that we no longer generate
unnecessary cmov instructions, and that we generate the very important
'xor' to transform bsr which counts the index of the most significant
one bit to the number of leading (most significant) zero bits. Also they
now check that when the variant with defined zero result is used, the
cmov is still produced.
llvm-svn: 146974
I followed three heuristics for deciding whether to set 'true' or
'false':
- Everything target independent got 'true' as that is the expected
common output of the GCC builtins.
- If the target arch only has one way of implementing this operation,
set the flag in the way that exercises the most of codegen. For most
architectures this is also the likely path from a GCC builtin, with
'true' being set. It will (eventually) require lowering away that
difference, and then lowering to the architecture's operation.
- Otherwise, set the flag differently dependending on which target
operation should be tested.
Let me know if anyone has any issue with this pattern or would like
specific tests of another form. This should allow the x86 codegen to
just iteratively improve as I teach the backend how to differentiate
between the two forms, and everything else should remain exactly the
same.
llvm-svn: 146370
non-zero.
- Teach X86 cmov optimization to eliminate the cmov from ctlz, cttz extension
when the source of X86ISD::BSR / X86ISD::BSF is proven to be non-zero.
rdar://9490949
llvm-svn: 131948