1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00
Commit Graph

7206 Commits

Author SHA1 Message Date
Benjamin Kramer
788f55c7d4 DAGCombiner: Fold a shuffle on CONCAT_VECTORS into a new CONCAT_VECTORS if possible.
This pattern occurs in SROA output due to the way vector arguments are lowered
on ARM.

The testcase from PR15525 now compiles into this, which is better than the code
we got with the old scalarrepl:
_Store:
	ldr.w	r9, [sp]
	vmov	d17, r3, r9
	vmov	d16, r1, r2
	vst1.8	{d16, d17}, [r0]
	bx	lr

Differential Revision: http://llvm-reviews.chandlerc.com/D647

llvm-svn: 179106
2013-04-09 17:41:43 +00:00
Hal Finkel
c72f2476a9 Use virtual base registers on PPC
On PowerPC, non-vector loads and stores have r+i forms; however, in functions
with large stack frames these were not being used to access slots far from the
stack pointer because such slots were out of range for the signed 16-bit
immediate offset field. This increases register pressure because we need a
separate register for each offset (when the r+r form is used). By enabling
virtual base registers, we can deal with large stack frames without unduly
increasing register pressure.

llvm-svn: 179105
2013-04-09 17:27:09 +00:00
Hal Finkel
68742d4f1f Convert test PowerPC/2007-09-07-LoadStoreIdxForms to FileCheck
llvm-svn: 179104
2013-04-09 17:26:55 +00:00
Jakob Stoklund Olesen
6a275455ac Compute correct frame sizes for SPARC v9 64-bit frames.
The save area is twice as big and there is no struct return slot. The
stack pointer is always 16-byte aligned (after adding the bias).

Also eliminate the stack adjustment instructions around calls when the
function has a reserved stack frame.

llvm-svn: 179083
2013-04-09 04:37:47 +00:00
Hal Finkel
0daaa8e2de Generate PPC early conditional returns
PowerPC has a conditional branch to the link register (return) instruction: BCLR.
This should be used any time when we'd otherwise have a conditional branch to a
return. This adds a small pass, PPCEarlyReturn, which runs just prior to the
branch selection pass (and, importantly, after block placement) to generate
these conditional returns when possible. It will also eliminate unconditional
branches to returns (these happen rarely; most of the time these have already
been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice
optimization for small functions that do not maintain a stack frame.

llvm-svn: 179026
2013-04-08 16:24:03 +00:00
Tim Northover
8eb5637d73 AArch64: remove barriers from AArch64 atomic operations.
I've managed to convince myself that AArch64's acquire/release
instructions are sufficient to guarantee C++11's required semantics,
even in the sequentially-consistent case.

llvm-svn: 179005
2013-04-08 08:40:41 +00:00
Hal Finkel
9af1fa0d50 Cleanup and improve PPC fsel generation
First, we should not cheat: fsel-based lowering of select_cc is a
finite-math-only optimization (the ISA manual, section F.3 of v2.06, makes
this clear, as does a note in our own README).

This also adds fsel-based lowering of EQ and NE condition codes. As it turned
out, fsel generation was covered by a grand total of zero regression test
cases. I've added some test cases to cover the existing behavior (which is now
finite-math only), as well as the new EQ cases.

llvm-svn: 179000
2013-04-07 22:11:09 +00:00
Jakob Stoklund Olesen
658f7754a3 Implement LowerCall_64 for the SPARC v9 64-bit ABI.
There is still no support for byval arguments (which I don't think are
needed) and varargs.

llvm-svn: 178993
2013-04-07 19:10:57 +00:00
Jakob Stoklund Olesen
594e073ba0 Implement LowerReturn_64 for SPARC v9.
Integer return values are sign or zero extended by the callee, and
structs up to 32 bytes in size can be returned in registers.

The CC_Sparc64 CallingConv definition is shared between
LowerFormalArguments_64 and LowerReturn_64. Function arguments and
return values are passed in the same registers.

The inreg flag is also used for return values. This is required to handle
C functions returning structs containing floats and ints:

  struct ifp {
    int i;
    float f;
  };

  struct ifp f(void);

LLVM IR:

  define inreg { i32, float } @f() {
     ...
     ret { i32, float } %retval
  }

The ABI requires that %retval.i is returned in the high bits of %i0
while %retval.f goes in %f1.

Without the inreg return value attribute, %retval.i would go in %i0 and
%retval.f would go in %f3 which is a more efficient way of returning
%multiple values, but it is not ABI compliant for returning C structs.

llvm-svn: 178966
2013-04-06 23:57:33 +00:00
Jakob Stoklund Olesen
2e3d792bf5 SPARC v9 stack pointer bias.
64-bit SPARC v9 processes use biased stack and frame pointers, so the
current function's stack frame is located at %sp+BIAS .. %fp+BIAS where
BIAS = 2047.

This makes more local variables directly accessible via [%fp+simm13]
addressing.

llvm-svn: 178965
2013-04-06 21:38:57 +00:00
Hal Finkel
6c65d3a736 Implement PPCInstrInfo::FoldImmediate
There are certain PPC instructions into which we can fold a zero immediate
operand. We can detect such cases by looking at the register class required
by the using operand (so long as it is not otherwise constrained).

llvm-svn: 178961
2013-04-06 19:30:30 +00:00
Jakob Stoklund Olesen
2076ffbc15 Complete formal arguments for the SPARC v9 64-bit ABI.
All arguments are formally assigned to stack positions and then promoted
to floating point and integer registers. Since there are more floating
point registers than integer registers, this can cause situations where
floating point arguments are assigned to registers after integer
arguments that where assigned to the stack.

Use the inreg flag to indicate 32-bit fragments of structs containing
both float and int members.

The three-way shadowing between stack, integer, and floating point
registers requires custom argument lowering. The good news is that
return values are passed in the exact same way, and we can share the
code.

Still missing:

 - Update LowerReturn to handle structs returned in registers.
 - LowerCall.
 - Variadic functions.

llvm-svn: 178958
2013-04-06 18:32:12 +00:00
Tom Stellard
8ad4f7c25b R600/SI: Add support for buffer stores v2
v2:
  - Use the ADDR64 bit

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178931
2013-04-05 23:31:51 +00:00
Tom Stellard
917d7412f1 R600/SI: Add processor types for each SI variant
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178928
2013-04-05 23:31:35 +00:00
Tom Stellard
17fba38a4b R600/SI: Avoid generating S_MOVs with 64-bit immediates v2
SITargetLowering::analyzeImmediate() was converting the 64-bit values
to 32-bit and then checking if they were an inline immediate.  Some
of these conversions caused this check to succeed and produced
S_MOV instructions with 64-bit immediates, which are illegal.

v2:
  - Clean up logic

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178927
2013-04-05 23:31:20 +00:00
Hal Finkel
b556d850c9 Enable early if conversion on PPC
On cores for which we know the misprediction penalty, and we have
the isel instruction, we can profitably perform early if conversion.
This enables us to replace some small branch sequences with selects
and avoid the potential stalls from mispredicting the branches.

Enabling this feature required implementing canInsertSelect and
insertSelect in PPCInstrInfo; isel code in PPCISelLowering was
refactored to use these functions as well.

llvm-svn: 178926
2013-04-05 23:29:01 +00:00
Timur Iskhodzhanov
c004e9db2d Make the test/CodeGen/X86/win32_sret.ll reliable on any CPU by explicitly specifying the -mcpu
llvm-svn: 178885
2013-04-05 17:05:56 +00:00
Renato Golin
9d05117f2b Reverting 178851 as it broke buildbots
llvm-svn: 178883
2013-04-05 16:39:53 +00:00
Stepan Dyatkovskiy
98f7dac944 Fix for PR14824: "Optimization arm_ldst_opt inserts newly generated instruction vldmia at incorrect position".
Patch introduces memory operands tracking in ARMLoadStoreOpt::LoadStoreMultipleOpti. For each register it keeps the order of load operations as it was before optimization pass.
It is kind of deep improvement of fix proposed by Hao: http://llvm.org/bugs/show_bug.cgi?id=14824#c4
But it also tracks conflicts between different register classes (e.g. D2 and S5).
For more details see:
Bug description: http://llvm.org/bugs/show_bug.cgi?id=14824
LLVM Commits discussion: 
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130311/167936.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130318/168688.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130325/169376.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130401/170238.html

llvm-svn: 178851
2013-04-05 05:52:14 +00:00
Andrew Trick
6da34cd35e RegisterPressure heuristics currently require signed comparisons.
llvm-svn: 178823
2013-04-05 00:31:34 +00:00
Hal Finkel
1dc78e666b PPC: Improve code generation for mixed-precision reciprocal sqrt
The DAGCombine logic that recognized a/sqrt(b) and transformed it into
a multiplication by the reciprocal sqrt did not handle cases where the
sqrt and the division were separated by an fpext or fptrunc.

llvm-svn: 178801
2013-04-04 22:44:12 +00:00
Jakob Stoklund Olesen
a53fa8d450 Avoid high-latency false CPSR dependencies even for tMOVSi.
The Thumb2SizeReduction pass avoids false CPSR dependencies, except it
still aggressively creates tMOVi8 instructions because they are so
common.

Avoid creating false CPSR dependencies even for tMOVi8 instructions when
the the CPSR flags are known to have high latency. This allows integer
computation to overlap floating point computations.

Also process blocks in a reverse post-order and propagate high-latency
flags to successors.

<rdar://problem/13468102>

llvm-svn: 178773
2013-04-04 18:25:36 +00:00
Stepan Dyatkovskiy
0562afa331 New-password-test commit.
llvm-svn: 178765
2013-04-04 16:11:18 +00:00
Vincent Lejeune
a680946842 R600: Take export into account when computing cf address
llvm-svn: 178761
2013-04-04 13:59:59 +00:00
Jakob Stoklund Olesen
1969a96fcd Add SPARC v9 support for select on 64-bit compares.
This requires v9 cmov instructions using the %xcc flags instead of the
%icc flags.

Still missing:
- Select floats on %xcc flags.
- Select i64 on %fcc flags.

llvm-svn: 178737
2013-04-04 03:08:00 +00:00
Vincent Lejeune
6a4ef74f44 R600: Fix last ALU of a clause being emitted in a separate clause
llvm-svn: 178675
2013-04-03 18:24:47 +00:00
Bill Schmidt
990515e4c4 Fix PR15632: No support for ppcf128 floating-point remainder on PowerPC.
For this we need to use a libcall.  Previously LLVM didn't implement
libcall support for frem, so I've added it in the usual
straightforward manner.  A test case from the bug report is included.

llvm-svn: 178639
2013-04-03 13:05:44 +00:00
Timur Iskhodzhanov
0976f711d6 Temporarily relax the WIN32 checks in the SRet test to fix the Atom D2700 bot
llvm-svn: 178635
2013-04-03 12:17:15 +00:00
Timur Iskhodzhanov
ecd533f0ec Fix SRet for thiscall in i686-pc-win32
llvm-svn: 178634
2013-04-03 11:27:54 +00:00
Jakob Stoklund Olesen
3b7eaf9bb6 Add 64-bit compare + branch for SPARC v9.
The same compare instruction is used for 32-bit and 64-bit compares. It
sets two different sets of flags: icc and xcc.

This patch adds a conditional branch instruction using the xcc flags for
64-bit compares.

llvm-svn: 178621
2013-04-03 04:41:44 +00:00
Hal Finkel
0208f7c744 Use PPC reciprocal estimates with Newton iteration in fast-math mode
When unsafe FP math operations are enabled, we can use the fre[s] and
frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together
with some Newton iteration, in order to quickly generate floating-point
division and sqrt results. All of these instructions are separately optional,
and so each has its own feature flag (except for the Altivec instructions,
which are covered under the existing Altivec flag). Doing this is not only
faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these
computations to be pipelined with other computations in order to hide their
overall latency.

I've also added a couple of missing fnmsub patterns which turned out to be
missing (but are necessary for good code generation of the Newton iterations).
Altivec needs a similar fix, but that will probably be more complicated because
fneg is expanded for Altivec's v4f32.

llvm-svn: 178617
2013-04-03 04:01:11 +00:00
Akira Hatanaka
f08d3a5a83 [mips] Small update to the implementation of eh.return for Mips.
This patch initializes t9 to the handler address, but only if the relocation
model is pic. This handles the case where handler to which eh.return jumps 
points to the start of the function.

Patch by Sasa Stankovic.

llvm-svn: 178588
2013-04-02 23:02:07 +00:00
NAKAMURA Takumi
d8a9117bcb llvm/test/CodeGen/X86: Unmark them out of XFAIL:cygming, in atomic{32|64}.ll and handle-move.ll, corresponding to r178549.
This reverts r176808, r176798, and r177914.

llvm-svn: 178583
2013-04-02 22:35:08 +00:00
Bill Schmidt
c98ed219d3 Fix PR15630: Replace faulty stdcx. with stwcx.
When doing a partword atomic operation, a lwarx was being paired with
a stdcx. instead of a stwcx. when compiling for a 64-bit target.  The
target has nothing to do with it in this case; we always need a stwcx.

Thanks to Kai Nacke for reporting the problem.

llvm-svn: 178559
2013-04-02 18:37:08 +00:00
Jakob Stoklund Olesen
b0a5a72daf Don't attempt MTM heuristics without a scheduling model present.
This should fix the PPC buildbots.

llvm-svn: 178558
2013-04-02 18:26:45 +00:00
Chad Rosier
908153170e [fast-isel] Use the correct API to disable FastLowerArguments for Win64.
llvm-svn: 178549
2013-04-02 16:31:41 +00:00
Arnold Schwaighofer
c9508f817a DAGCombiner: Merge store/loads when we have extload/truncstores
This is helps on architectures where i8,i16 are not legal but we have byte, and
short loads/stores. Allowing us to merge copies like the one below on ARM.

copy(char *a, char *b, int n) {
 do {
   int t0 = a[0];
   int t1 = a[1];
   b[0] = t0;
   b[1] = t1;

radar://13536387

llvm-svn: 178546
2013-04-02 15:58:51 +00:00
Preston Gurd
fca710bf70 Simplify test cases for Atom preferring call register indirect over
call memory indirect (32 and 64 bit).

llvm-svn: 178541
2013-04-02 14:25:06 +00:00
Jakob Stoklund Olesen
8a184a7fe4 Add 64-bit load and store instructions.
There is only a few new instructions, the rest is handled with patterns.

llvm-svn: 178528
2013-04-02 04:09:28 +00:00
Jakob Stoklund Olesen
22fe26207f Basic 64-bit ALU operations.
SPARC v9 extends all ALU instructions to 64 bits, so we simply need to
add patterns to use them for both i32 and i64 values.

llvm-svn: 178527
2013-04-02 04:09:23 +00:00
Jakob Stoklund Olesen
d57f9ab92f Materialize 64-bit immediates.
The last resort pattern produces 6 instructions, and there are still
opportunities for materializing some immediates in fewer instructions.

llvm-svn: 178526
2013-04-02 04:09:17 +00:00
Jakob Stoklund Olesen
5ef2195726 Add 64-bit shift instructions.
SPARC v9 defines new 64-bit shift instructions. The 32-bit shift right
instructions are still usable as zero and sign extensions.

This adds new F3_Sr and F3_Si instruction formats that probably should
be used for the 32-bit shifts as well. They don't really encode an
simm13 field.

llvm-svn: 178525
2013-04-02 04:09:12 +00:00
Jakob Stoklund Olesen
9fbfce2d11 Add support for 64-bit calling convention.
This is far from complete, but it is enough to make it possible to write
test cases using i64 arguments.

Missing features:
- Floating point arguments.
- Receiving arguments on the stack.
- Calls.

llvm-svn: 178523
2013-04-02 04:09:02 +00:00
Vincent Lejeune
dc0e12bd5b R600: Add support for native control flow
llvm-svn: 178505
2013-04-01 21:48:05 +00:00
Vincent Lejeune
11918406b3 R600: Emit CF_ALU and use true kcache register.
llvm-svn: 178503
2013-04-01 21:47:42 +00:00
Hal Finkel
9191cdb5f2 Fix a bad assert in PPCTargetLowering
llvm-svn: 178489
2013-04-01 18:42:58 +00:00
Hal Finkel
9cd1e5e93c Add triple to test/CodeGen/PowerPC/stfiwx-2
llvm-svn: 178486
2013-04-01 18:18:44 +00:00
Arnold Schwaighofer
a2a475a83d Merge load/store sequences with adresses: base + index + offset
We would also like to merge sequences that involve a variable index like in the
example below.

    int index = *idx++
    int i0 = c[index+0];
    int i1 = c[index+1];
    b[0] = i0;
    b[1] = i1;

By extending the parsing of the base pointer to handle dags that contain a
base, index, and offset we can handle examples like the one above.

The dag for the code above will look something like:

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i8 load %index))))

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i32 add (i32 signextend (i8 load %index))
                                         (i32 1)))))

The code that parses the tree ignores the intermediate sign extensions. However,
if there is a sign extension it needs to be on all indexes.

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (add (i8 load %index)
                                     (i8 1))))
 vs

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i32 add (i32 signextend (i8 load %index))
                                         (i32 1)))))
radar://13536387

llvm-svn: 178483
2013-04-01 18:12:58 +00:00
Hal Finkel
f184647a53 Add more PPC floating-point conversion instructions
The P7 and A2 have additional floating-point conversion instructions which
allow a direct two-instruction sequence (plus load/store) to convert from all
combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores,
only some combinations were directly available).

llvm-svn: 178480
2013-04-01 17:52:07 +00:00
Hal Finkel
55f144f923 Fix PowerPC/cttz.ll to specify a cpu (and use FileCheck)
llvm-svn: 178472
2013-04-01 16:31:56 +00:00