llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-31 16:02:52 +01:00

Author	SHA1	Message	Date
Nadav Rotem	06ab05f47a	CostModel: increase the default cost of supported floating point operations from 1 to two. Fixed a few tests that changes because now the cost of one insert + a vector operation on two doubles is lower than two scalar operations on doubles. llvm-svn: 179413	2013-04-12 21:15:03 +00:00
Arnold Schwaighofer	329430aeac	X86 cost model: Vector shifts are expensive in most cases The default logic does not correctly identify costs of casts because they are marked as custom on x86. For some cases, where the shift amount is a scalar we would be able to generate better code. Unfortunately, when this is the case the value (the splat) will get hoisted out of the loop, thereby making it invisible to ISel. radar://13130673 radar://13537826 llvm-svn: 178703	2013-04-03 21:46:05 +00:00
Michael Liao	fe785c9579	Correct cost model for vector shift on AVX2 - After moving logic recognizing vector shift with scalar amount from DAG combining into DAG lowering, we declare to customize all vector shifts even vector shift on AVX is legal. As a result, the cost model needs special tuning to identify these legal cases. llvm-svn: 177586	2013-03-20 22:01:10 +00:00
Arnold Schwaighofer	e60e6fc70f	X86 cost model: Adjust cost for custom lowered vector multiplies This matters for example in following matrix multiply: int mmult(int rows, int cols, int m1, int m2, int m3) { int i, j, k, val; for (i=0; i<rows; i++) { for (j=0; j<cols; j++) { val = 0; for (k=0; k<cols; k++) { val += m1[i][k] * m2[k][j]; } m3[i][j] = val; } } return(m3); } Taken from the test-suite benchmark Shootout. We estimate the cost of the multiply to be 2 while we generate 9 instructions for it and end up being quite a bit slower than the scalar version (48% on my machine). Also, properly differentiate between avx1 and avx2. On avx-1 we still split the vector into 2 128bits and handle the subvector muls like above with 9 instructions. Only on avx-2 will we have a cost of 9 for v4i64. I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an add instead of a mul because with a mul we now no longer vectorize. I did verify that the mul would be indeed more expensive when vectorized with 3 kernels: for (i ...) r += a[i] * 3; for (i ...) m1[i] = m1[i] * 3; // This matches the test case in avx1.ll and a matrix multiply. In each case the vectorized version was considerably slower. radar://13304919 llvm-svn: 176403	2013-03-02 04:02:52 +00:00
Nadav Rotem	6dac3b0c66	Cost Model: change the default cost of control flow instructions (br / ret / ...) to zero. llvm-svn: 169423	2012-12-05 21:21:26 +00:00
Nadav Rotem	4def3aace5	Implement the cost of abnormal x86 instruction lowering as a table. llvm-svn: 167395	2012-11-05 19:32:46 +00:00
Nadav Rotem	c9bbabd5e9	X86 CostModel: Add support for a some of the common arithmetic instructions for SSE4, AVX and AVX2. llvm-svn: 167347	2012-11-03 00:39:56 +00:00

7 Commits