llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-01 08:23:21 +01:00

Author	SHA1	Message	Date
Chandler Carruth	7a5c52fadf	Use standard promotion for i8 CTTZ nodes and i8 CTLZ nodes when the LZCNT instructions are available. Force promotion to i32 to get a smaller encoding since the fix-ups necessary are just as complex for either promoted type We can't do standard promotion for CTLZ when lowering through BSR because it results in poor code surrounding the 'xor' at the end of this instruction. Essentially, if we promote the entire CTLZ node to i32, we end up doing the xor on a 32-bit CTLZ implementation, and then subtracting appropriately to get back to an i8 value. Instead, our custom logic just uses the knowledge of the incoming size to compute a perfect xor. I'd love to know of a way to fix this, but so far I'm drawing a blank. I suspect the legalizer could be more clever and/or it could collude with the DAG combiner, but how... ;] llvm-svn: 147251	2011-12-24 12:12:34 +00:00
Chandler Carruth	82b7a7478b	Add systematic testing for cttz as well, and fix the bug I spotted by inspection earlier. llvm-svn: 147250	2011-12-24 11:46:10 +00:00
Chandler Carruth	b52ba33d0a	Add i8 and i64 testing for ctlz on x86. Also simplify the i16 test. llvm-svn: 147249	2011-12-24 11:26:59 +00:00
Chandler Carruth	800a803717	Tidy up this rather crufty test. Put the declarations at the top to make my C-brain happy. Remove the unnecessary bits of pedantic IR fluff like nounwind. Remove stray uses comments. Name things semantically rather than tN so that adding a new test in the middle doesn't cause pain, and so that new tests can be grouped semantically. This exposes how little systematic testing is going on here. I noticed this by finding several bugs via inspection and wondering why this test wasn't catching any of them. =[ llvm-svn: 147248	2011-12-24 11:26:57 +00:00
Chandler Carruth	9ef50ef1f7	Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the X86ISelLowering C++ code. Because this is lowered via an xor wrapped around a bsr, we want the dagcombine which runs after isel lowering to have a chance to clean things up. In particular, it is very common to see code which looks like: (sizeof(x)8 - 1) ^ __builtin_clz(x) Which is trying to compute the most significant bit of 'x'. That's actually the value computed directly by the 'bsr' instruction, but if we match it too late, we'll get completely redundant xor instructions. The more naive code for the above (subtracting rather than using an xor) still isn't handled correctly due to the dagcombine getting confused. Also, while here fix an issue spotted by inspection: we should have been expanding the zero-undef variants to the normal variants when there is an 'lzcnt' instruction. Do so, and test for this. We don't want to generate unnecessary 'bsr' instructions. These two changes fix some regressions in encoding and decoding benchmarks. However, there is still a lot* to be improve on in this type of code. llvm-svn: 147244	2011-12-24 10:55:54 +00:00
Chandler Carruth	7564e8371a	Begin teaching the X86 target how to efficiently codegen patterns that use the zero-undefined variants of CTTZ and CTLZ. These are just simple patterns for now, there is more to be done to make real world code using these constructs be optimized and codegen'ed properly on X86. The existing tests are spiffed up to check that we no longer generate unnecessary cmov instructions, and that we generate the very important 'xor' to transform bsr which counts the index of the most significant one bit to the number of leading (most significant) zero bits. Also they now check that when the variant with defined zero result is used, the cmov is still produced. llvm-svn: 146974	2011-12-20 11:19:37 +00:00
Chandler Carruth	2bedf185c9	Manually upgrade the test suite to specify the flag to cttz and ctlz. I followed three heuristics for deciding whether to set 'true' or 'false': - Everything target independent got 'true' as that is the expected common output of the GCC builtins. - If the target arch only has one way of implementing this operation, set the flag in the way that exercises the most of codegen. For most architectures this is also the likely path from a GCC builtin, with 'true' being set. It will (eventually) require lowering away that difference, and then lowering to the architecture's operation. - Otherwise, set the flag differently dependending on which target operation should be tested. Let me know if anyone has any issue with this pattern or would like specific tests of another form. This should allow the x86 codegen to just iteratively improve as I teach the backend how to differentiate between the two forms, and everything else should remain exactly the same. llvm-svn: 146370	2011-12-12 11:59:10 +00:00
Evan Cheng	b5950697e8	- Teach SelectionDAG::isKnownNeverZero to return true (op x, c) when c is non-zero. - Teach X86 cmov optimization to eliminate the cmov from ctlz, cttz extension when the source of X86ISD::BSR / X86ISD::BSF is proven to be non-zero. rdar://9490949 llvm-svn: 131948	2011-05-24 01:48:22 +00:00
Chris Lattner	8be174c089	filecheckize a test and mark these wiht a cpu so it passes on hosts without cmovs. llvm-svn: 98521	2010-03-14 22:31:16 +00:00
Dan Gohman	df2896d609	Eliminate more uses of llvm-as and llvm-dis. llvm-svn: 81290	2009-09-08 23:54:48 +00:00
Evan Cheng	6909ff8c4b	Fix ctlz and cttz. llvm definition requires them to return number of bits in of the src type when value is zero. llvm-svn: 45029	2007-12-14 08:30:15 +00:00
Evan Cheng	51cf86ded0	Implement ctlz and cttz with bsr and bsf. llvm-svn: 45024	2007-12-14 02:13:44 +00:00

12 Commits