1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-01 16:33:37 +01:00
Commit Graph

605 Commits

Author SHA1 Message Date
Bill Wendling
f10d5c00fc Consider this code snippet:
float t1(int argc) {
  return (argc == 1123) ? 1.234f : 2.38213f;
}

We would generate truly awful code on ARM (those with a weak stomach should look
away):

_t1:
  movw   r1, #1123
  movs   r2, #1
  movs   r3, #0
  cmp    r0, r1
  mov.w  r0, #0
  it     eq
  moveq  r0, r2
  movs   r1, #4
  cmp    r0, #0
  it     ne
  movne  r3, r1
  adr    r0, #LCPI1_0
  ldr    r0, [r0, r3]
  bx     lr

The problem was that legalization was creating a cascade of SELECT_CC nodes, for
for the comparison of "argc == 1123" which was fed into a SELECT node for the ?:
statement which was itself converted to a SELECT_CC node. This is because the
ARM back-end doesn't have custom lowering for SELECT nodes, so it used the
default "Expand".

I added a fairly simple "LowerSELECT" to the ARM back-end. It takes care of this
testcase, but can obviously be expanded to include more cases.

Now we generate this, which looks optimal to me:

_t1:
  movw   r1, #1123
  movs   r2, #0
  cmp    r0, r1
  adr    r0, #LCPI0_0
  it     eq
  moveq  r2, #4
  ldr    r0, [r0, r2]
  bx     lr
  .align  2
LCPI0_0:
  .long   1075344593  @ float 2.382130e+00
  .long   1067316150  @ float 1.234000e+00

llvm-svn: 110799
2010-08-11 08:43:16 +00:00
Evan Cheng
f8604b772e Report error if codegen tries to instantiate a ARM target when the cpu does support it. e.g. cortex-m* processors.
llvm-svn: 110798
2010-08-11 07:17:46 +00:00
Bill Wendling
8c5ac0d30c Update test to match output of optimize compares for ARM.
llvm-svn: 110765
2010-08-11 01:05:02 +00:00
Bill Wendling
37ac7cfa7d The optimize comparisons pass removes the "cmp" instruction this is checking for.
llvm-svn: 110739
2010-08-10 22:16:05 +00:00
Rafael Espindola
6d53fded19 Fix eabi calling convention when a 64 bit value shadows r3.
Without this what was happening was:

* R3 is not marked as "used"
* ARM backend thinks it has to save it to the stack because of vaarg
* Offset computation correctly ignores it
* Offsets are wrong

llvm-svn: 110446
2010-08-06 15:35:32 +00:00
Bill Wendling
46dea7e086 Testcase for r110248.
llvm-svn: 110249
2010-08-04 21:56:30 +00:00
Bob Wilson
6a2437480a Combine NEON VABD (absolute difference) intrinsics with ADDs to make VABA
(absolute difference with accumulate) intrinsics.  Radar 8228576.

llvm-svn: 110170
2010-08-04 00:12:08 +00:00
Anton Korobeynikov
5e3d50ec58 Currently EH lowering code expects typeinfo to be global only.
This assumption is not satisfied due to global mergeing.
Workaround the issue by temporary disablinge mergeing of const globals.
Also, ignore LLVM "special" globals. This fixes PR7716

llvm-svn: 109423
2010-07-26 18:45:39 +00:00
Evan Cheng
f215e55d5f - Allow target to specify when is register pressure "too high". In most cases,
it's too late to start backing off aggressive latency scheduling when most
  of the registers are in use so the threshold should be a bit tighter.
- Correctly handle live out's and extract_subreg etc.
- Enable register pressure aware scheduling by default for hybrid scheduler.
  For ARM, this is almost always a win on # of instructions. It's runtime
  neutral for most of the tests. But for some kernels with high register
  pressure it can be a huge win. e.g. 464.h264ref reduced number of spills by
  54 and sped up by 20%.

llvm-svn: 109279
2010-07-23 22:39:59 +00:00
Evan Cheng
5aa6a25102 More register pressure aware scheduling work.
llvm-svn: 109064
2010-07-21 23:53:58 +00:00
Eric Christopher
3d118d5e8a Baby steps towards ARM fast-isel.
llvm-svn: 109047
2010-07-21 22:26:11 +00:00
Rafael Espindola
9aab8413b8 Fix calling convention on ARM if vfp2+ is enabled.
llvm-svn: 109009
2010-07-21 11:38:30 +00:00
Jim Grosbach
270540da7b Add combiner patterns to more effectively utilize the BFI (bitfield insert)
instruction for non-constant operands. This includes the case referenced
in the README.txt regarding a bitfield copy.

llvm-svn: 108608
2010-07-17 03:30:54 +00:00
Jim Grosbach
749f4fca0a Add basic support to code-gen the ARM/Thumb2 bit-field insert (BFI) instruction
and a combine pattern to use it for setting a bit-field to a constant
value. More to come for non-constant stores.

llvm-svn: 108570
2010-07-16 23:05:05 +00:00
Evan Cheng
ffbae6ad52 Split -enable-finite-only-fp-math to two options:
-enable-no-nans-fp-math and -enable-no-infs-fp-math. All of the current codegen fp math optimizations only care whether the fp arithmetics arguments and results can never be NaN.

llvm-svn: 108465
2010-07-15 22:07:12 +00:00
Jim Grosbach
e2d1ecbe70 Improve 64-subtraction of immediates when parts of the immediate can fit
in the literal field of an instruction. E.g.,
long long foo(long long a) {
  return a - 734439407618LL;
}

rdar://7038284

llvm-svn: 108339
2010-07-14 17:45:16 +00:00
Bob Wilson
34f481e895 Add support for NEON VMVN immediate instructions.
llvm-svn: 108324
2010-07-14 06:31:50 +00:00
Bob Wilson
0f581a998c Add an ARM-specific DAG combining to avoid redundant VDUPLANE nodes.
Radar 7373643.

llvm-svn: 108303
2010-07-14 01:22:12 +00:00
Bob Wilson
7feb850d36 Use a target-specific VMOVIMM DAG node instead of BUILD_VECTOR to represent
NEON VMOV-immediate instructions.  This simplifies some things.

llvm-svn: 108275
2010-07-13 21:16:48 +00:00
Evan Cheng
069f1f7c9a Extend the r107852 optimization which turns some fp compare to code sequence using only i32 operations. It now optimize some f64 compares when fp compare is exceptionally slow (e.g. cortex-a8). It also catches comparison against 0.0.
llvm-svn: 108258
2010-07-13 19:27:42 +00:00
Rafael Espindola
84716579d4 Fix va_arg for doubles. With this patch VAARG nodes always contain the
correct alignment information, which simplifies ExpandRes_VAARG a bit.

The patch introduces a new alignment information to TargetLoweringInfo. This is
needed since the two natural candidates cannot be used:

* The 's' in target data: If this is set to the minimal alignment of any
  argument, getCallFrameTypeAlignment would return 4 for doubles on ARM for
  example.
* The getTransientStackAlignment method. It is possible for an architecture to
  have argument less aligned than what we maintain the stack pointer.

llvm-svn: 108072
2010-07-11 04:01:49 +00:00
Jim Grosbach
b591b3b48d In the presence of variable sized objects, allocate an emergency spill slot.
rdar://8131327

llvm-svn: 108008
2010-07-09 20:27:06 +00:00
Jakob Stoklund Olesen
b4c406a101 Fix test to be less sensitive of regalloc accidents
llvm-svn: 107951
2010-07-09 01:32:11 +00:00
Bob Wilson
f15e542bdc Print "dregpair" NEON operands with a space between them, for readability and
consistency with other instructions that have lists of register operands.

llvm-svn: 107944
2010-07-09 00:47:20 +00:00
Bob Wilson
dd02fe62a2 Reenable DAG combining for vector shuffles. It looks like it was temporarily
disabled and then never turned back on again.  Adjust some tests, one because
this change avoids an unnecessary instruction, and the other to make it
continue testing what it was intended to test.

llvm-svn: 107941
2010-07-09 00:38:12 +00:00
Evan Cheng
5307ec12d7 Check for FiniteOnlyFPMath as well.
llvm-svn: 107904
2010-07-08 20:12:24 +00:00
Evan Cheng
3e8530bf14 r107852 is only safe with -enable-unsafe-fp-math to account for +0.0 == -0.0.
llvm-svn: 107856
2010-07-08 06:01:49 +00:00
Evan Cheng
ed3f224f04 Optimize some vfp comparisons to integer ones. This patch implements the simplest case when the following conditions are met:
1. The arguments are f32.
2. The arguments are loads and they have no uses other than the comparison.
3. The comparison code is EQ or NE.

e.g.
        vldr.32 s0, [r1]
        vldr.32 s1, [r0]
        vcmpe.f32       s1, s0
        vmrs    apsr_nzcv, fpscr
	beq     LBB0_2
=>
        ldr     r1, [r1]
        ldr     r0, [r0]
        cmp     r0, r1
        beq     LBB0_2

More complicated cases will be implemented in subsequent patches.

llvm-svn: 107852
2010-07-08 02:08:50 +00:00
Dale Johannesen
2df647f882 Changes to ARM tail calls, mostly cosmetic.
Add explicit testcases for tail calls within the same module.
Duplicate some code to humor those who think .w doesn't apply on ARM.
Leave this disabled on Thumb1, and add some comments explaining why it's hard
and won't gain much.

llvm-svn: 107851
2010-07-08 01:18:23 +00:00
Rafael Espindola
e5689571a1 Don't create neon moves in CopyRegToReg. NEONMoveFixPass will do the conversion
if profitable.

llvm-svn: 107673
2010-07-06 16:24:34 +00:00
Bob Wilson
f96aab3079 Fix incorrect asm-printing of some NEON immediates. Fix weak testcase so
that it checks the immediate values, not just the instructions opcodes.
Radar 8110263.

llvm-svn: 107487
2010-07-02 17:23:44 +00:00
Bill Wendling
90b6422f2f Implement the "linker_private_weak" linkage type. This will be used for
Objective-C metadata types which should be marked as "weak", but which the
linker will remove upon final linkage. However, this linkage isn't specific to
Objective-C.

For example, the "objc_msgSend_fixup_alloc" symbol is defined like this:

      .globl l_objc_msgSend_fixup_alloc
      .weak_definition l_objc_msgSend_fixup_alloc
      .section __DATA, __objc_msgrefs, coalesced
      .align 3
l_objc_msgSend_fixup_alloc:
       .quad   _objc_msgSend_fixup
       .quad   L_OBJC_METH_VAR_NAME_1

This is different from the "linker_private" linkage type, because it can't have
the metadata defined with ".weak_definition".

Currently only supported on Darwin platforms.

llvm-svn: 107433
2010-07-01 21:55:59 +00:00
Jakob Stoklund Olesen
fff50dd31d Fix the handling of partial redefines in the fast register allocator.
A partial redefine needs to be treated like a tied operand, and the register
must be reloaded while processing use operands.

This fixes a bug where partially redefined registers were processed as normal
defs with a reload added. The reload could clobber another use operand if it was
a kill that allowed register reuse.

llvm-svn: 107193
2010-06-29 19:15:30 +00:00
Bob Wilson
ef26313c6b Fix a register scavenger crash when dealing with undefined subregs.
The LowerSubregs pass needs to preserve implicit def operands attached to
EXTRACT_SUBREG instructions when it replaces those instructions with copies.

llvm-svn: 107189
2010-06-29 18:42:49 +00:00
Rafael Espindola
832e4ddde7 Add a VT argument to getMinimalPhysRegClass and replace the copy related uses
of getPhysicalRegisterRegClass with it.

If we want to make a copy (or estimate its cost), it is better to use the
smallest class as more efficient operations might be possible.

llvm-svn: 107140
2010-06-29 14:02:34 +00:00
Bob Wilson
5ae34cf120 Unlike other targets, ARM now uses BUILD_VECTORs post-legalization so they
can't be changed arbitrarily by the DAGCombiner without checking if it is
running after legalization.

llvm-svn: 107097
2010-06-28 23:40:25 +00:00
Rafael Espindola
317a02739d When splitting a VAARG, remember its alignment.
This produces terrible but correct code.

llvm-svn: 106952
2010-06-26 18:22:20 +00:00
Daniel Dunbar
c0c37cea64 Thumb2ITBlockPass: Fix a possible dereference of an invalid iterator. This was
introduced in r106343, but only showed up recently (with a particular compiler &
linker combination) because of the particular check, and because we have no
builtin checking for dereferencing the end of an array, which is truly
unfortunate.

llvm-svn: 106908
2010-06-25 23:14:54 +00:00
Evan Cheng
346aecdb8b Change if-conversion block size limit checks to add some flexibility.
llvm-svn: 106901
2010-06-25 22:42:03 +00:00
Dan Gohman
0be71f4660 Teach EmitLiveInCopies to omit copies for unused virtual registers,
and to clean up unused incoming physregs from the live-in list.

llvm-svn: 106805
2010-06-24 22:23:02 +00:00
Bill Wendling
eebd6fa159 It's possible that a flag is added to the SDNode that points back to the
original SDNode. This is badness. Also, this function allows one SDNode to point
multiple flags to another SDNode. Badness as well.

llvm-svn: 106793
2010-06-24 22:00:37 +00:00
Jakob Stoklund Olesen
9f8104463f Replace a big gob of old coalescer logic with the new CoalescerPair class.
CoalescerPair can determine if a copy can be coalesced, and which register gets
merged away. The old logic in SimpleRegisterCoalescing had evolved into
something a bit too convoluted.

This second attempt fixes some crashes that only occurred Linux.

llvm-svn: 106769
2010-06-24 18:15:01 +00:00
Dan Gohman
d79ac4a097 Eliminate the other half of the BRCOND optimization, and update
as many tests as possible.

llvm-svn: 106749
2010-06-24 15:24:03 +00:00
Jakob Stoklund Olesen
1c9d50ed92 Revert "Replace a big gob of old coalescer logic with the new CoalescerPair class."
Whiny buildbots.

llvm-svn: 106710
2010-06-24 00:52:22 +00:00
Jakob Stoklund Olesen
19abbf4387 Replace a big gob of old coalescer logic with the new CoalescerPair class.
CoalescerPair can determine if a copy can be coalesced, and which register gets
merged away. The old logic in SimpleRegisterCoalescing had evolved into
something a bit too convoluted.

llvm-svn: 106701
2010-06-24 00:12:39 +00:00
Bill Wendling
777c025ad8 We are missing opportunites to use ldm. Take code like this:
void t(int *cp0, int *cp1, int *dp, int fmd) {
  int c0, c1, d0, d1, d2, d3;
  c0 = (*cp0++ & 0xffff) | ((*cp1++ << 16) & 0xffff0000);
  c1 = (*cp0++ & 0xffff) | ((*cp1++ << 16) & 0xffff0000);
  /* ... */
}

It code gens into something pretty bad. But with this change (analogous to the
X86 back-end), it will use ldm and generate few instructions.

llvm-svn: 106693
2010-06-23 23:00:16 +00:00
Dale Johannesen
f0763e6300 Reinstate correct test, remove the real invalidated test.
llvm-svn: 106664
2010-06-23 18:56:06 +00:00
Dale Johannesen
4e632b4237 Remove tests invalidated by previous checkin.
llvm-svn: 106663
2010-06-23 18:53:12 +00:00
Bob Wilson
cb414cca8d Thumb1 functions using @llvm.returnaddress were not saving the incoming LR.
Radar 8031193.

llvm-svn: 106582
2010-06-22 22:04:24 +00:00
Evan Cheng
167a8655c7 Fix PR7421: bug in kill transferring logic. It was ignoring loads / stores which have already been processed.
llvm-svn: 106481
2010-06-21 21:21:14 +00:00