Jim Grosbach
f2f14a2d43
X86: Resolve a long standing FIXME and properly isel pextr[bw].
...
Generalize the AArch64 .td nodes for AssertZext and AssertSext. Use
them to match the relevant pextr store instructions.
The test widen_load-2.ll requires a slight change because with the
stores gone, the remaining instructions are scheduled in a different
order.
Add test cases for SSE4 and AVX variants.
Resolves rdar://13414672.
Patch by Adam Nemet <anemet@apple.com>.
llvm-svn: 200957
2014-02-07 00:16:33 +00:00
Benjamin Kramer
c63386d01a
X86: Turn fp selects into mask operations.
...
double test(double a, double b, double c, double d) { return a<b ? c : d; }
before:
_test:
ucomisd %xmm0, %xmm1
ja LBB0_2
movaps %xmm3, %xmm2
LBB0_2:
movaps %xmm2, %xmm0
after:
_test:
cmpltsd %xmm1, %xmm0
andpd %xmm0, %xmm2
andnpd %xmm3, %xmm0
orpd %xmm2, %xmm0
Small speedup on Benchmarks/SmallPT
llvm-svn: 187706
2013-08-04 12:05:16 +00:00
Benjamin Kramer
4ee689feed
X86: Add a note.
...
llvm-svn: 175408
2013-02-17 23:34:14 +00:00
Chris Lattner
4a8f2bcb32
some peepholes that should match horizontal add/sub operations.
...
llvm-svn: 163103
2012-09-03 02:58:21 +00:00
Benjamin Kramer
080ccc13a6
Add a note for -ffast-math optimization of vector norm.
...
llvm-svn: 153031
2012-03-19 00:43:34 +00:00
Benjamin Kramer
54f11215e8
This is now implemented.
...
llvm-svn: 146258
2011-12-09 15:45:57 +00:00
Lang Hames
be4997db2f
Add a natural stack alignment field to TargetData, and prevent InstCombine from
...
promoting allocas to preferred alignments that exceed the natural
alignment. This avoids some potentially expensive dynamic stack realignments.
The natural stack alignment is set in target data strings via the "S<size>"
option. Size is in bits and must be a multiple of 8. The natural stack alignment
defaults to "unspecified" (represented by a zero value), and the "unspecified"
value does not prevent any alignment promotions. Target maintainers that care
about avoiding promotions should explicitly add the "S<size>" option to their
target data strings.
llvm-svn: 141599
2011-10-10 23:42:08 +00:00
Benjamin Kramer
19bcaa5d51
Add a note about SSE4.1 roundss/roundsd.
...
llvm-svn: 125438
2011-02-12 17:58:16 +00:00
Chris Lattner
908d8e9de2
update this.
...
llvm-svn: 113116
2010-09-05 20:22:09 +00:00
Chris Lattner
eb4c7e43cc
we should pattern match the SSE complex arithmetic ops.
...
llvm-svn: 112109
2010-08-25 23:31:42 +00:00
Chris Lattner
f4dfc7aaab
random improvement for variable shift codegen.
...
llvm-svn: 111813
2010-08-23 17:30:29 +00:00
Jakob Stoklund Olesen
eeabe43059
Remove obsolete README_SSE note.
...
We are generating movaps for all XMM register copies, including scalar
floating point values. This is known to be at least as good as movss and movsd
for all known architectures up to and including Nehalem because it avoids a
partial register stall.
The SSEDomainFix pass will switch movaps to movdqa when appropriate (i.e., when
operands come from the integer unit). We don't now that switching movaps to
movapd has any benefit.
The same applies to andps -> pand.
llvm-svn: 108096
2010-07-11 17:13:42 +00:00
Chris Lattner
6a9b6e3253
some notes about suboptimal insertps's
...
llvm-svn: 107613
2010-07-05 05:48:41 +00:00
Eli Friedman
37ee2be8cc
Remove some already-fixed README entries.
...
llvm-svn: 105377
2010-06-03 01:47:31 +00:00
Eli Friedman
6a9bddea09
Remove README entry which no longer compiles to something sane.
...
llvm-svn: 105376
2010-06-03 01:16:51 +00:00
Dan Gohman
37bf232609
Floating-point add, sub, and mul are now spelled fadd, fsub, and fmul,
...
respectively.
llvm-svn: 97531
2010-03-02 01:11:08 +00:00
Dan Gohman
92b6122204
Fix "the the" and similar typos.
...
llvm-svn: 95781
2010-02-10 16:03:48 +00:00
Chris Lattner
7c64c9ca21
add a note from PR6194
...
llvm-svn: 95649
2010-02-09 05:45:29 +00:00
Chris Lattner
a635219775
move the PR6214 microoptzn to this file.
...
llvm-svn: 95299
2010-02-04 07:32:01 +00:00
Chris Lattner
72a06499b3
this is an SSE-specific issue.
...
llvm-svn: 93373
2010-01-13 23:29:11 +00:00
Chris Lattner
feb6b56242
Bill implemented this.
...
llvm-svn: 63752
2009-02-04 19:09:07 +00:00
Chris Lattner
eae4653469
add a note, this is why we're faster at SciMark-MonteCarlo with
...
SSE disabled.
llvm-svn: 63751
2009-02-04 19:08:01 +00:00
Evan Cheng
2a965124b7
The memory alignment requirement on some of the mov{h|l}p{d|s} patterns are 16-byte. That is overly strict. These instructions read / write f64 memory locations without alignment requirement.
...
llvm-svn: 63195
2009-01-28 08:35:02 +00:00
Chris Lattner
c018045520
add a note
...
llvm-svn: 56391
2008-09-20 19:17:53 +00:00
Chris Lattner
61e771be29
add a note
...
llvm-svn: 54964
2008-08-19 00:41:02 +00:00
Evan Cheng
71fbfe73c1
- Fix a x86 vector isel bug: illegal transformation of a vector_shuffle into a
...
shift.
- Add a readme entry for a missing vector_shuffle optimization that results in
awful codegen.
llvm-svn: 52740
2008-06-25 20:52:59 +00:00
Evan Cheng
d312ced1cf
This is done.
...
llvm-svn: 51526
2008-05-24 00:10:13 +00:00
Evan Cheng
4f660778f0
Use movlps / movhps to modify low / high half of 16-byet memory location.
...
llvm-svn: 51501
2008-05-23 21:23:16 +00:00
Dan Gohman
e8422fc112
Elaborate on the entry on integer vector multiplication by constants.
...
llvm-svn: 51491
2008-05-23 18:05:39 +00:00
Evan Cheng
e7ec4690e1
New entry.
...
llvm-svn: 51487
2008-05-23 17:28:11 +00:00
Chris Lattner
4c1ffef5af
we compile multiply-by-constant into horrible code. Doesn't sse4 have some
...
instruction for doing this?
llvm-svn: 51473
2008-05-23 04:29:53 +00:00
Chris Lattner
a11adf725d
add a note
...
llvm-svn: 51062
2008-05-13 19:56:20 +00:00
Chris Lattner
c9eb6a7d64
add a note
...
llvm-svn: 51060
2008-05-13 18:48:54 +00:00
Evan Cheng
9e15622879
Instead of a vector load, shuffle and then extract an element. Load the element from address with an offset.
...
pshufd $1, (%rdi), %xmm0
movd %xmm0, %eax
=>
movl 4(%rdi), %eax
llvm-svn: 51026
2008-05-13 08:35:03 +00:00
Evan Cheng
e4ee4c2870
On x86, it's safe to treat i32 load anyext as a normal i32 load. Ditto for i8 anyext load to i16.
...
llvm-svn: 51019
2008-05-13 00:54:02 +00:00
Evan Cheng
fcbdc8bd6e
Xform bitconvert(build_pair(load a, load b)) to a single load if the load locations are at the right offset from each other.
...
llvm-svn: 51008
2008-05-12 23:04:07 +00:00
Anton Korobeynikov
ad83aeb489
Add note
...
llvm-svn: 50959
2008-05-11 14:33:15 +00:00
Chris Lattner
9f994482f5
add a note, this is actually not too bad to implement.
...
llvm-svn: 49466
2008-04-10 05:54:50 +00:00
Chris Lattner
869325c4c4
move the x86-32 part of PR2108 here.
...
llvm-svn: 49465
2008-04-10 05:37:47 +00:00
Chris Lattner
b628208161
Finish implementing a readme entry: when inserting an i64 variable
...
into a vector of zeros or undef, and when the top part is obviously
zero, we can just use movd + shuffle. This allows us to compile
vec_set-B.ll into:
_test3:
movl $1234567, %eax
andl 4(%esp), %eax
movd %eax, %xmm0
ret
instead of:
_test3:
subl $28, %esp
movl $1234567, %eax
andl 32(%esp), %eax
movl %eax, (%esp)
movl $0, 4(%esp)
movq (%esp), %xmm0
addl $28, %esp
ret
llvm-svn: 48090
2008-03-09 05:42:06 +00:00
Chris Lattner
b741ebba29
add a note
...
llvm-svn: 48064
2008-03-09 01:08:22 +00:00
Chris Lattner
17f68a3075
Implement a readme entry, compiling
...
#include <xmmintrin.h>
__m128i doload64(short x) {return _mm_set_epi16(0,0,0,0,0,0,0,1);}
into:
movl $1, %eax
movd %eax, %xmm0
ret
instead of a constant pool load.
llvm-svn: 48063
2008-03-09 01:05:04 +00:00
Chris Lattner
ff9dc0af80
This one looks easy, add a note.
...
llvm-svn: 48055
2008-03-08 22:32:39 +00:00
Chris Lattner
b12697f8bb
move these to the appropriate file
...
llvm-svn: 48054
2008-03-08 22:28:45 +00:00
Chris Lattner
83e0b885f8
evan implemented this.
...
llvm-svn: 47948
2008-03-05 17:11:51 +00:00
Chris Lattner
7571a88209
add a note
...
llvm-svn: 47939
2008-03-05 07:22:39 +00:00
Chris Lattner
299977b5ca
Evan implemented these.
...
llvm-svn: 47828
2008-03-02 18:05:14 +00:00
Chris Lattner
b714906acf
upgrade some entries, remove stuff that is done.
...
llvm-svn: 47109
2008-02-14 06:19:02 +00:00
Nate Begeman
5f18794295
readme updates
...
llvm-svn: 47051
2008-02-13 07:06:12 +00:00
Nate Begeman
5a4e290b70
Enable SSE4 codegen and pattern matching.
...
Add some notes to the README.
llvm-svn: 46949
2008-02-11 04:19:36 +00:00