1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 20:12:56 +02:00
Commit Graph

427 Commits

Author SHA1 Message Date
Nadav Rotem
b461f2190e Add X86-SSE4 codegen support for vector-select.
llvm-svn: 139285
2011-09-08 08:11:19 +00:00
Rafael Espindola
9182560b8f Fix comment. Noticed by Duncan.
llvm-svn: 139161
2011-09-06 19:29:31 +00:00
Duncan Sands
d1311488fe Add codegen support for vector select (in the IR this means a select
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons.  Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all").  Patch mostly by
Nadav Rotem.

llvm-svn: 139159
2011-09-06 19:07:46 +00:00
Duncan Sands
6939ae53ac Split the init.trampoline intrinsic, which currently combines GCC's
init.trampoline and adjust.trampoline intrinsics, into two intrinsics
like in GCC.  While having one combined intrinsic is tempting, it is
not natural because typically the trampoline initialization needs to
be done in one function, and the result of adjust trampoline is needed
in a different (nested) function.  To get around this llvm-gcc hacks the
nested function lowering code to insert an additional parent variable
holding the adjust.trampoline result that can be accessed from the child
function.  Dragonegg doesn't have the luxury of tweaking GCC code, so it
stored the result of adjust.trampoline in the memory GCC set aside for
the trampoline itself (this is always available in the child function),
and set up some new memory (using an alloca) to hold the trampoline.
Unfortunately this breaks Go which allocates trampoline memory on the
heap and wants to use it even after the parent has exited (!).  Rather
than doing even more hacks to get Go working, it seemed best to just use
two intrinsics like in GCC.  Patch mostly by Sanjoy Das.

llvm-svn: 139140
2011-09-06 13:37:06 +00:00
Rafael Espindola
9db302e741 Adds support for variable sized allocas. For a variable sized alloca,
code is inserted to first check if the current stacklet has enough
space. If so, space is allocated by simply decrementing the stack
pointer. Otherwise a runtime routine (__morestack_allocate_stack_space
in libgcc) is called which allocates the required memory from the
heap.

Patch by Sanjoy Das.

llvm-svn: 138818
2011-08-30 19:47:04 +00:00
Rafael Espindola
7721c15106 Adds a SelectionDAG node X86SegAlloca which will be custom lowered
from DYNAMIC_STACKALLOC.

Two new pseudo instructions (SEG_ALLOCA_32 and SEG_ALLOCA_64) which
will match X86SegAlloca (based on word size) are also added.  They
will be custom emitted to inject the actual stack handling code.

Patch by Sanjoy Das.

llvm-svn: 138814
2011-08-30 19:43:21 +00:00
Eli Friedman
9f95c7d381 Add support for generating CMPXCHG16B on x86-64 for the cmpxchg IR instruction.
llvm-svn: 138660
2011-08-26 21:21:21 +00:00
Craig Topper
1da38a34a6 Break 256-bit vector int add/sub/mul into two 128-bit operations to avoid costly scalarization. Fixes PR10711.
llvm-svn: 138427
2011-08-24 06:14:18 +00:00
Bruno Cardoso Lopes
98531dfd08 Introduce matching patterns for vbroadcast AVX instruction. The idea is to
match splats in the form (splat (scalar_to_vector (load ...))) whenever
the load can be folded. All the logic and instruction emission is
working but because of PR8156, there are no ways to match loads, cause
they can never be folded for splats. Thus, the tests are XFAILed, but
I've tested and exercised all the logic using a relaxed version for
checking the foldable loads, as if the bug was already fixed. This
should work out of the box once PR8156 gets fixed since MayFoldLoad will
work as expected.

llvm-svn: 137810
2011-08-17 02:29:19 +00:00
Bruno Cardoso Lopes
2d100ca13c The VPERM2F128 is a AVX instruction which permutes between two 256-bit
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.

llvm-svn: 137519
2011-08-12 21:48:26 +00:00
Bruno Cardoso Lopes
02bbf20b02 Cleanup PALIGNR handling and remove the old palign pattern fragment.
Also make PALIGNR masks to don't match 256-bits, which isn't supported
It's also a step to solve PR10489

llvm-svn: 136448
2011-07-29 01:30:59 +00:00
Eli Friedman
842ea169de Code generation for 'fence' instruction.
llvm-svn: 136283
2011-07-27 22:21:52 +00:00
Bruno Cardoso Lopes
8830fde434 The vpermilps and vpermilpd have different behaviour regarding the
usage of the shuffle bitmask. Both work in 128-bit lanes without
crossing, but in the former the mask of the high part is the same
used by the low part while in the later both lanes have independent
masks. Handle this properly and and add support for vpermilpd.

llvm-svn: 136200
2011-07-27 00:56:34 +00:00
Bruno Cardoso Lopes
e53bb853ea Recognize unpckh* masks and match 256-bit versions. The new versions are
different from the previous 128-bit because they work in lanes.
Update a few comments and add testcases

llvm-svn: 136157
2011-07-26 22:03:40 +00:00
Bruno Cardoso Lopes
85d8ca32c9 More movsldup/movshdup cleanup. Rewrite the mask matching function and add
support for 256-bit versions (but no instruction selection yet, coming next).

llvm-svn: 136050
2011-07-26 02:39:28 +00:00
Bruno Cardoso Lopes
b7b9688aa5 -Inspected a AVX code block added by someone in early Feb. This was never used
and was actually very wrong, fix it and make it simpler. Also remove the
ConcatVectors function, which is unused now.

- Fix a introduction of useless nodes in r126664 and r126264. The
VUNPCKL* should never be introduced cause we don't want duplicate
nodes for 128 AVX and non-AVX modes, the actual instruction
difference only exists during isel, but not for target specific DAG
nodes. We only introduce V* target nodes when there is no 128-bit
version already there.

- Fix a fragile test and make it more useful.

llvm-svn: 135729
2011-07-22 00:15:07 +00:00
Bruno Cardoso Lopes
ba1a2a9135 Add support for 256-bit versions of VPERMIL instruction. This is a new
instruction introduced in AVX, which can operate on 128 and 256-bit vectors.
It considers a 256-bit vector as two independent 128-bit lanes. It can permute
any 32 or 64 elements inside a lane, and restricts the second lane to
have the same permutation of the first one. With the improved splat support
introduced early today, adding codegen for this instruction enable more
efficient 256-bit code:

Instead of:
  vextractf128  $0, %ymm0, %xmm0
  punpcklbw %xmm0, %xmm0
  punpckhbw %xmm0, %xmm0
  vinsertf128 $0, %xmm0, %ymm0, %ymm1
  vinsertf128 $1, %xmm0, %ymm1, %ymm0
  vextractf128  $1, %ymm0, %xmm1
  shufps  $1, %xmm1, %xmm1
  movss %xmm1, 28(%rsp)
  movss %xmm1, 24(%rsp)
  movss %xmm1, 20(%rsp)
  movss %xmm1, 16(%rsp)
  vextractf128  $0, %ymm0, %xmm0
  shufps  $1, %xmm0, %xmm0
  movss %xmm0, 12(%rsp)
  movss %xmm0, 8(%rsp)
  movss %xmm0, 4(%rsp)
  movss %xmm0, (%rsp)
  vmovaps (%rsp), %ymm0
We get:
  vextractf128  $0, %ymm0, %xmm0
  punpcklbw %xmm0, %xmm0
  punpckhbw %xmm0, %xmm0
  vinsertf128 $0, %xmm0, %ymm0, %ymm1
  vinsertf128 $1, %xmm0, %ymm1, %ymm0
  vpermilps $85, %ymm0, %ymm0

llvm-svn: 135662
2011-07-21 01:55:47 +00:00
Chris Lattner
e1fe7061ce land David Blaikie's patch to de-constify Type, with a few tweaks.
llvm-svn: 135375
2011-07-18 04:54:35 +00:00
Nadav Rotem
b93249b1e7 [VECTOR-SELECT]
During type legalization we often use the SIGN_EXTEND_INREG SDNode.
When this SDNode is legalized during the LegalizeVector phase, it is
scalarized because non-simple types are automatically marked to be expanded.
In this patch we add support for lowering SIGN_EXTEND_INREG manually.
This fixes CodeGen/X86/vec_sext.ll when running with the '-promote-elements'
flag.

llvm-svn: 135144
2011-07-14 11:11:14 +00:00
Bruno Cardoso Lopes
b98f50da03 The target specific node PANDN name is misleading. That happens because
it's later selected to a ANDNPD/ANDNPS instruction instead of the PANDN
instruction. Rename it.

llvm-svn: 135087
2011-07-13 21:36:47 +00:00
Eric Christopher
3cd31a95dd Use getRegForInlineAsmConstraint instead of custom defining regclasses
via vectors.

Part of rdar://9643582

llvm-svn: 134079
2011-06-29 17:23:50 +00:00
Evan Cheng
f86a6485e7 Remove TargetOptions.h dependency from X86Subtarget.
llvm-svn: 133726
2011-06-23 17:54:54 +00:00
Eric Christopher
1ae9ec6124 Add a parameter to CCState so that it can access the MachineFunction.
No functional change.

Part of PR6965

llvm-svn: 132763
2011-06-08 23:55:35 +00:00
Stuart Hastings
d044ba7a9f Followup to 132458, omit unnecessary stack copy when x87 input is a
load.  rdar://problem/6373334

llvm-svn: 132696
2011-06-06 23:15:58 +00:00
Stuart Hastings
ea8b49dff3 Reapply 132424 with fixes. This fixes PR10068.
rdar://problem/5993888

llvm-svn: 132606
2011-06-03 23:53:54 +00:00
Eric Christopher
d68494ffdd Have LowerOperandForConstraint handle multiple character constraints.
Part of rdar://9119939

llvm-svn: 132510
2011-06-02 23:16:42 +00:00
Rafael Espindola
1299f014d4 Revert 132424 to fix PR10068.
llvm-svn: 132479
2011-06-02 19:57:47 +00:00
Stuart Hastings
9a085fb9d8 Recommit 132404 with fixes. rdar://problem/5993888
llvm-svn: 132424
2011-06-01 21:33:14 +00:00
Stuart Hastings
4b33767382 Revert 132404 to appease a buildbot. rdar://problem/5993888
llvm-svn: 132419
2011-06-01 19:52:20 +00:00
Stuart Hastings
23f5ceda96 Add support for x86 CMPEQSS and friends. These instructions do a
floating-point comparison, generate a mask of 0s or 1s, and generally
DTRT with NaNs.  Only profitable when the user wants a materialized 0
or 1 at runtime.  rdar://problem/5993888

llvm-svn: 132404
2011-06-01 17:17:45 +00:00
Stuart Hastings
fdc9e4af68 FGETSIGN support for x86, using movmskps/pd. Will be enabled with a
patch to TargetLowering.cpp.  rdar://problem/5660695

llvm-svn: 132388
2011-06-01 04:39:42 +00:00
Stuart Hastings
837a958ff6 Reverting 132105: it broke some LLVM-GCC DejaGNU tests.
llvm-svn: 132108
2011-05-26 04:09:49 +00:00
Stuart Hastings
e704bfb21e Correctly handle a one-word struct passed byval on x86_64.
rdar://problem/6920088

llvm-svn: 132105
2011-05-26 02:44:56 +00:00
Eli Friedman
42d94ce561 Clean up the mess created by r131467+r131469.
llvm-svn: 131471
2011-05-17 18:02:22 +00:00
Stuart Hastings
581113d8a0 Revert 131467 due to buildbot complaint.
llvm-svn: 131469
2011-05-17 16:59:46 +00:00
Stuart Hastings
a2509a7ec3 Fix an obscure issue in X86_64 parameter passing: if a tiny byval is
passed as the fifth parameter, insure it's passed correctly (in R9).
rdar://problem/6920088

llvm-svn: 131467
2011-05-17 16:45:55 +00:00
Nadav Rotem
2a654a69ed Add custom lowering of X86 vector SRA/SRL/SHL when the shift amount is a splat vector.
llvm-svn: 131179
2011-05-11 08:12:09 +00:00
Eli Friedman
12e590e760 Make the logic for determining function alignment more explicit. No functionality change.
llvm-svn: 131012
2011-05-06 20:34:06 +00:00
Evan Cheng
dd99a0a548 Re-apply r127953 with fixes: eliminate empty return block if it has no predecessors; update dominator tree if cfg is modified.
llvm-svn: 127981
2011-03-21 01:19:09 +00:00
Daniel Dunbar
34c65737c3 Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors
to canonicalize IR", it broke a lot of things.

llvm-svn: 127954
2011-03-19 21:47:14 +00:00
Evan Cheng
c5f50f7322 SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR
to have single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
  switch(x) {
  case 1: return f1();
  case 2: return f2();
  case 3: return f3();
  case 4: return f4();
  case 5: return f5();
  case 6: return f6();
  }
}

=>
LBB0_2:                                 ## %sw.bb
  callq   _f1
  popq    %rbp
  ret
LBB0_3:                                 ## %sw.bb1
  callq   _f2
  popq    %rbp
  ret
LBB0_4:                                 ## %sw.bb3
  callq   _f3
  popq    %rbp
  ret

This patch teaches codegenprep to duplicate returns when the return value
is a phi and where the phi operands are produced by tail calls followed by
an unconditional branch:

sw.bb7:                                           ; preds = %entry
  %call8 = tail call i32 @f5() nounwind
  br label %return
sw.bb9:                                           ; preds = %entry
  %call10 = tail call i32 @f6() nounwind
  br label %return
return:
  %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
  ret i32 %retval.0

This allows codegen to generate better code like this:

LBB0_2:                                 ## %sw.bb
        jmp     _f1                     ## TAILCALL
LBB0_3:                                 ## %sw.bb1
        jmp     _f2                     ## TAILCALL
LBB0_4:                                 ## %sw.bb3
        jmp     _f3                     ## TAILCALL

rdar://9147433

llvm-svn: 127953
2011-03-19 17:17:39 +00:00
Cameron Zwarich
cea63dc052 Move more logic into getTypeForExtArgOrReturn.
llvm-svn: 127809
2011-03-17 14:53:37 +00:00
Cameron Zwarich
a5746339cc Rename getTypeForExtendedInteger() to getTypeForExtArgOrReturn().
llvm-svn: 127807
2011-03-17 14:21:56 +00:00
Cameron Zwarich
2bb1e45ea3 The x86-64 ABI says that a bool is only guaranteed to be sign-extended to a byte
rather than an int. Thankfully, this only causes LLVM to miss optimizations, not
generate incorrect code.

This just fixes the zext at the return. We still insert an i32 ZextAssert when
reading a function's arguments, but it is followed by a truncate and another i8
ZextAssert so it is not optimized.

llvm-svn: 127766
2011-03-16 22:20:18 +00:00
Cameron Zwarich
a1920d7f51 Move getRegPressureLimit() from TargetLoweringInfo to TargetRegisterInfo.
llvm-svn: 127175
2011-03-07 21:56:36 +00:00
Owen Anderson
bd26993873 Allow targets to specify a the type of the RHS of a shift parameterized on the type of the LHS.
llvm-svn: 126518
2011-02-25 21:41:48 +00:00
David Greene
7b0539174a [AVX] General VUNPCKL codegen support.
llvm-svn: 126264
2011-02-22 23:31:46 +00:00
David Greene
7de7347ee8 [AVX] Support VSINSERTF128 with more patterns and appropriate
infrastructure.  This makes lowering 256-bit vectors to 128-bit
vectors simple when 256-bit vector support is not available.

llvm-svn: 124868
2011-02-04 16:08:29 +00:00
David Greene
2753be260c [AVX] VEXTRACTF128 support. This commit includes patterns for
matching EXTRACT_SUBVECTOR to VEXTRACTF128 along with support routines
to examine and translate index values.  VINSERTF128 comes next.  With
these two in place we can begin supporting more AVX operations as
INSERT/EXTRACT can be used as a fallback when 256-bit support is not
available.

llvm-svn: 124797
2011-02-03 15:50:00 +00:00
David Greene
5c173a307b [AVX] Add INSERT_SUBVECTOR and support it on x86. This provides a
default implementation for x86, going through the stack in a similr
fashion to how the codegen implements BUILD_VECTOR.  Eventually this
will get matched to VINSERTF128 if AVX is available.

llvm-svn: 124307
2011-01-26 19:13:22 +00:00