1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 12:02:58 +02:00
Commit Graph

1563 Commits

Author SHA1 Message Date
Jakob Stoklund Olesen
05cec5db28 Completely disallow partial copies in adjustCopiesBackFrom().
Partial copies can show up even when CoalescerPair.isPartial() returns
false. For example:

   %vreg24:dsub_0<def> = COPY %vreg31:dsub_0; QPR:%vreg24,%vreg31

Such a partial-partial copy is not good enough for the transformation
adjustCopiesBackFrom() needs to do.

llvm-svn: 166944
2012-10-29 17:51:52 +00:00
Quentin Colombet
bcd2bc3437 [code size][ARM] Emit regular call instructions instead of the move, branch sequence
llvm-svn: 166854
2012-10-27 01:10:17 +00:00
Jakob Stoklund Olesen
4b4db880a3 Revert r163298 "Optimize codegen for VSETLNi{8,16,32} operating on Q registers."
Keep the integer_insertelement test case, the new coalescer can handle
this kind of lane insertion without help from pseudo-instructions.

llvm-svn: 166835
2012-10-26 23:39:46 +00:00
Evan Cheng
f97472cdf6 Fix a miscompilation caused by a typo. When turning a adde with negative value
into a sbc with a positive number, the immediate should be complemented, not
negated. Also added a missing pattern for ARM codegen.

rdar://12559385

llvm-svn: 166613
2012-10-24 19:53:01 +00:00
Bill Wendling
e97df2d337 When a block ends in an indirect branch, add its successors to the machine basic block.
The CFG of the machine function needs to know that the targets of the indirect
branch are successors to the indirect branch.
<rdar://problem/12529625>

llvm-svn: 166448
2012-10-22 23:30:04 +00:00
Shuxin Yang
3ad15929e7 This patch is to fix radar://8426430. It is about llvm support of __builtin_debugtrap()
which is supposed to consistently raise SIGTRAP across all systems. In contrast,
__builtin_trap() behave differently on different systems. e.g. it raises SIGTRAP on ARM, and
SIGILL on X86. The purpose of __builtin_debugtrap() is to consistently provide "trap"
functionality, in the mean time preserve the compatibility with on gcc on __builtin_trap().

  The X86 backend is already able to handle debugtrap(). This patch is to:
  1) make front-end recognize "__builtin_debugtrap()" (emboddied in the one-line change to Clang).
  2) In DAG legalization phase, by default, "debugtrap" will be replaced with "trap", which
     make the __builtin_debugtrap() "available" to all existing ports without the hassle of
     changing their code.
  3) If trap-function is specified (via -trap-func=xyz to llc), both __builtin_debugtrap() and
     __builtin_trap() will be expanded into the function call of the specified trap function.
    This behavior may need change in the future.

  The provided testing-case is to make sure 2) and 3) are working for ARM port, and we
already have a testing case for x86. 

llvm-svn: 166300
2012-10-19 20:11:16 +00:00
Stepan Dyatkovskiy
ece4c2a9c1 ARM:
Removed extra stack frame object for fixed byval arguments,
VarArgsStyleRegisters invocation was reworked due to some improper usage in
past. PR14099 also demonstrates it.

llvm-svn: 166273
2012-10-19 08:23:06 +00:00
Jakob Stoklund Olesen
1cfbe5c549 Revert r166046 "Switch back to the old coalescer for now to fix the 32 bit bit"
A fix for PR14098, including the test case is in the next commit.

llvm-svn: 166067
2012-10-16 22:51:55 +00:00
Rafael Espindola
2f08719190 Switch back to the old coalescer for now to fix the 32 bit bit
llvm+clang+compiler-rt bootstrap.

llvm-svn: 166046
2012-10-16 19:34:06 +00:00
Stepan Dyatkovskiy
09c6b0a273 Issue:
Stack is formed improperly for long structures passed as byval arguments for
EABI mode.

If we took AAPCS reference, we can found the next statements:

A: "If the argument requires double-word alignment (8-byte), the NCRN (Next
Core Register Number) is rounded up to the next even register number." (5.5
Parameter Passing, Stage C, C.3).

B: "The alignment of an aggregate shall be the alignment of its most-aligned
component." (4.3 Composite Types, 4.3.1 Aggregates).

So if we have structure with doubles (9 double fields) and 3 Core unused
registers (r1, r2, r3): caller should use r2 and r3 registers only.
Currently r1,r2,r3 set is used, but it is invalid.

Callee VA routine should also use r2 and r3 regs only. All is ok here. This
behaviour is guessed by rounding up SP address with ADD+BFC operations.

Fix:
Main fix is in ARMTargetLowering::HandleByVal. If we detected AAPCS mode and
8 byte alignment, we waste odd registers then.

P.S.:
I also improved LDRB_POST_IMM regression test. Since ldrb instruction will
not generated by current regression test after this patch. 

llvm-svn: 166018
2012-10-16 07:16:47 +00:00
Jim Grosbach
8df1c73056 ARM: v1i64 and v2i64 VBSL intrinsic support.
rdar://12502028

llvm-svn: 165981
2012-10-15 21:23:40 +00:00
Silviu Baranga
e3e0e84559 Fixed PR13938: the ARM backend was crashing because it couldn't select a VDUPLANE node with the vector input size different from the output size. This was bacause the BUILD_VECTOR lowering code didn't check that the size of the input vector was correct for using VDUPLANE.
llvm-svn: 165929
2012-10-15 09:41:32 +00:00
Jakob Stoklund Olesen
4aac24404c Drop <def,dead> flags when merging into an unused lane.
The new coalescer can merge a dead def into an unused lane of an
otherwise live vector register.

Clear the <dead> flag when that happens since the flag refers to the
full virtual register which is still live after the partial dead def.

This fixes PR14079.

llvm-svn: 165877
2012-10-13 17:26:47 +00:00
Jakob Stoklund Olesen
533711462c Allow for loops in LiveIntervals::pruneValue().
It is possible that the live range of the value being pruned loops back
into the kill MBB where the search started. When that happens, make sure
that the beginning of KillMBB is also pruned.

Instead of starting a DFS at KillMBB and skipping the root of the
search, start a DFS at each KillMBB successor, and allow the search to
loop back to KillMBB.

This fixes PR14078.

llvm-svn: 165872
2012-10-13 16:15:31 +00:00
Manman Ren
6d1b15f406 ARM: tail-call inside a function where part of a byval argument is on caller's
local frame causes problem.

For example:
void f(StructToPass s) {
  g(&s, sizeof(s));
}
will cause problem with tail-call since part of s is passed via registers and
saved in f's local frame. When g tries to access s, part of s may be corrupted
since f's local frame is popped out before the tail-call.

The current fix is to disable tail-call if getVarArgsRegSaveSize is not 0 for
the caller. This is a conservative approach, if we can prove the address of
s or part of s is not taken and passed to g, it should be okay to perform
tail-call.

rdar://12442472

llvm-svn: 165853
2012-10-12 23:39:43 +00:00
Jim Grosbach
f931c78724 ARM: Mark VSELECT as 'expand'.
The backend already pattern matches to form VBSL when it can. We may want to
teach it to use the vbsl intrinsics at some point to prevent machine licm from
mucking with this, but using the Expand is completely correct.

http://llvm.org/bugs/show_bug.cgi?id=13831
http://llvm.org/bugs/show_bug.cgi?id=13961

Patch by Peter Couperus <peter.couperus@st.com>.

llvm-svn: 165845
2012-10-12 22:59:21 +00:00
Evan Cheng
76b896ec91 Legalizer optimize a pair of div / mod to a call to divrem libcall if they are
not legal. However, it should use a div instruction + mul + sub if divide is
legal. The rem legalization code was missing a check and incorrectly uses a
divrem libcall even when div is legal.

rdar://12481395

llvm-svn: 165778
2012-10-12 01:15:47 +00:00
Evan Cheng
72074df318 Add isel patterns for v2f32 / v4f32 neon.vbsl intrinsics. rdar://12471808
llvm-svn: 165673
2012-10-10 23:06:34 +00:00
Stepan Dyatkovskiy
06c2fdd18f Fix for LDRB instruction:
SDNode for LDRB_POST_IMM is invalid: number of registers added to SDNode fewer
that described in .td.

7 ops is needed, but SDNode with only 6 is created.

In more details:
In ARMInstrInfo.td, in multiclass AI2_ldridx, in definition _POST_IMM, offset
operand is defined as am2offset_imm. am2offset_imm is complex parameter type,
and actually it consists from dummy register and imm itself. As I understood
trick with dummy reg was made for AsmParser. In ARMISelLowering.cpp, this dummy
register was not added to SDNode, and it cause crash in Peephole Optimizer pass.

The problem fixed by setting up additional dummy reg when emitting
LDRB_POST_IMM instruction.

llvm-svn: 165617
2012-10-10 11:43:40 +00:00
Stepan Dyatkovskiy
5182bb8695 Issue description:
SchedulerDAGInstrs::buildSchedGraph ignores dependencies between FixedStack
objects and byval parameters. So loading byval parameters from stack may be
inserted *before* it will be stored, since these operations are treated as
independent.

Fix:
Currently ARMTargetLowering::LowerFormalArguments saves byval registers with
FixedStack MachinePointerInfo. To fix the problem we need to store byval
registers with MachinePointerInfo referenced to first the "byval" parameter.

Also commit adds two new fields to the InputArg structure: Function's argument
index and InputArg's part offset in bytes relative to the start position of
Function's argument. E.g.: If function's argument is 128 bit width and it was
splitted onto 32 bit regs, then we got 4 InputArg structs with same arg index,
but different offset values. 

llvm-svn: 165616
2012-10-10 11:37:36 +00:00
Jim Grosbach
3237e667f1 ARM: locate user-defined text sections next to default text.
Make sure functions located in user specified text sections (via the
section attribute) are located together with the default text sections.
Otherwise, for large object files, the relocations for call instructions
are more likely to be out of range. This becomes even more likely in the
presence of LTO.

rdar://12402636

llvm-svn: 165254
2012-10-04 21:33:24 +00:00
Silviu Baranga
c4986e5454 Fixed a bug in the ExecutionDependencyFix pass that caused dependencies to not propagate through implicit defs.
llvm-svn: 165102
2012-10-03 08:29:36 +00:00
Jakob Stoklund Olesen
f4d8b0432e Make sure the whole live range is covered when values are pruned twice.
JoinVals::pruneValues() calls LIS->pruneValue() to avoid conflicts when
overlapping two different values. This produces a set of live range end
points that are used to reconstruct the live range (with SSA update)
after joining the two registers.

When a value is pruned twice, the set of end points was insufficient:

  v1 = DEF
  v1 = REPLACE1
  v1 = REPLACE2
  KILL v1

The end point at KILL would only reconstruct the live range from
REPLACE2 to KILL, leaving the range REPLACE1-REPLACE2 dead.

Add REPLACE2 as an end point in this case so the full live range is
reconstructed.

This fixes PR13999.

llvm-svn: 165056
2012-10-02 21:46:39 +00:00
Bob Wilson
ee6a40c517 Add LLVM support for Swift.
llvm-svn: 164899
2012-09-29 21:43:49 +00:00
Bob Wilson
fdb7fc6060 Whitespace.
llvm-svn: 164898
2012-09-29 21:27:31 +00:00
Jakob Stoklund Olesen
6f2b596e57 Enable the new coalescer algorithm by default.
The new coalescer is better at merging values into unused vector lanes,
improving NEON code.

llvm-svn: 164794
2012-09-27 21:06:02 +00:00
Jush Lu
ff46f6b0c6 [arm-fast-isel] Add support for ELF PIC.
This is a preliminary step towards ELF support; currently ARMFastISel hasn't
been used for ELF object files yet.

llvm-svn: 164759
2012-09-27 05:21:41 +00:00
NAKAMURA Takumi
f8c0be6df4 ARM/atomicrmw_minmax.ll: Fix RUN line.
llvm-svn: 164687
2012-09-26 10:12:20 +00:00
James Molloy
220547e625 Fix ordering of operands on lowering of atomicrmw min/max nodes on ARM.
llvm-svn: 164685
2012-09-26 09:48:32 +00:00
Bill Wendling
28f1f0139e Generate an error message instead of asserting or segfaulting when we have a
scalar-to-vector conversion that we cannot handle. For instance, when an invalid
constraint is used in an inline asm statement.
<rdar://problem/12284092>

llvm-svn: 164662
2012-09-26 06:16:18 +00:00
Bill Wendling
9a7fa167f9 Generate an error message instead of asserting or segfaulting when we have a
scalar-to-vector conversion that we cannot handle. For instance, when an invalid
constraint is used in an inline asm statement.
<rdar://problem/12284092>

llvm-svn: 164657
2012-09-26 04:04:19 +00:00
Chad Rosier
a58913fc00 [fast-isel] Fallback to SelectionDAG isel if we require strict alignment for
non-aligned i32 loads/stores.
rdar://12304911

llvm-svn: 164381
2012-09-21 16:58:35 +00:00
NAKAMURA Takumi
a7a5a7cd7d llvm/test/CodeGen/ARM/fast-isel.ll: Fix possible typos, s/@unaligned_i16_store/@unaligned_i16_load/g.
I guess this had apparently passed in +Asserts possibly due to verborsity.

llvm-svn: 164350
2012-09-21 01:15:05 +00:00
Chad Rosier
d80fb0b13d Testcase does not need to be this strict.
llvm-svn: 164347
2012-09-21 00:47:08 +00:00
Chad Rosier
c15c6508e0 Add newline.
llvm-svn: 164346
2012-09-21 00:43:18 +00:00
Chad Rosier
8a1b0217f6 [fast-isel] Fallback to SelectionDAG isel if we require strict alignment for
non-halfword-aligned i16 loads/stores.
rdar://12304911

llvm-svn: 164345
2012-09-21 00:41:42 +00:00
Jim Grosbach
135898ebe3 ARM: Use a dedicated intrinsic for vector bitwise select.
The expression based expansion too often results in IR level optimizations
splitting the intermediate values into separate basic blocks, preventing
the formation of the VBSL instruction as the code author intended. In
particular, LICM would often hoist part of the computation out of a loop.

rdar://11011471

llvm-svn: 164340
2012-09-21 00:18:20 +00:00
Jakob Stoklund Olesen
801e92ce89 Ignore PHI-defs for -new-coalescer interference checks.
A PHI can't create interference on its own. If two live ranges interfere
at a PHI, they must also interfere when leaving one of the PHI
predecessors.

llvm-svn: 164330
2012-09-20 23:08:42 +00:00
Evan Cheng
959ad65636 Try to make these tests more portable.
llvm-svn: 164320
2012-09-20 21:35:21 +00:00
Jakob Stoklund Olesen
557b4e64be Resolve conflicts involving dead vector lanes for -new-coalescer.
A common coalescing conflict in vector code is lane insertion:

  %dst = FOO
  %src = BAR
  %dst:ssub0 = COPY %src

The live range of %src interferes with the ssub0 lane of %dst, but that
lane is never read after %src would have clobbered it. That makes it
safe to merge the live ranges and eliminate the COPY:

  %dst = FOO
  %dst:ssub0 = BAR

This patch teaches the new coalescer to resolve conflicts where dead
vector lanes would be clobbered, at least as long as the clobbered
vector lanes don't escape the basic block.

llvm-svn: 164250
2012-09-19 21:29:18 +00:00
Evan Cheng
1a3416521f MOVi16 (movw) is only legal on cpus with V6T2 support. rdar://12300648
llvm-svn: 164169
2012-09-18 21:24:16 +00:00
James Molloy
4cb3751b3e More domain conversion; convert VFP VMOVS to NEON instructions in more cases - when we may clobber the other S-lane by converting an S to a D instruction, make an effort to work out if the S lane is clobberable or not.
llvm-svn: 164114
2012-09-18 08:31:15 +00:00
Evan Cheng
82c85585f9 Use vld1 / vst2 for unaligned v2f64 load / store. e.g. Use vld1.16 for 2-byte
aligned address. Based on patch by David Peixotto.

Also use vld1.64 / vst1.64 with 128-bit alignment to take advantage of alignment
hints. rdar://12090772, rdar://12238782

llvm-svn: 164089
2012-09-18 01:42:45 +00:00
Jakob Stoklund Olesen
b761721812 Merge into undefined lanes under -new-coalescer.
Add LIS::pruneValue() and extendToIndices(). These two functions are
used by the register coalescer when merging two live ranges requires
more than a trivial value mapping as supported by LiveInterval::join().

The pruneValue() function can remove the part of a value number that is
going to conflict in join(). Afterwards, extendToIndices can restore the
live range, using any new dominating value numbers and updating the SSA
form.

Use this complex value mapping to support merging a register into a
vector lane that has a conflicting value, but the clobbered lane is
undef.

llvm-svn: 164074
2012-09-17 23:03:25 +00:00
Silviu Baranga
aa267976b5 Removed the VMLxForwarding feature for the Cortex-A15 target.
llvm-svn: 164030
2012-09-17 14:10:54 +00:00
Silviu Baranga
11ff2a551d This patch introduces A15 as a target in LLVM.
llvm-svn: 163803
2012-09-13 15:05:10 +00:00
Kristof Beyls
76d939497b Fix constant folding through bitcasts by no longer relying on undefined behaviour (converting NaN values between float and double).
SelectionDAG::getConstantFP(double Val, EVT VT, bool isTarget);
should not be used when Val is not a simple constant (as the comment in
SelectionDAG.h indicates). This patch avoids using this function
when folding an unknown constant through a bitcast, where it cannot be
guaranteed that Val will be a simple constant.

llvm-svn: 163703
2012-09-12 11:25:02 +00:00
Jakob Stoklund Olesen
ab77839866 Don't attempt to use flags from predicated instructions.
The ARM backend can eliminate cmp instructions by reusing flags from a
nearby sub instruction with similar arguments.

Don't do that if the sub is predicated - the flags are not written
unconditionally.

<rdar://problem/12263428>

llvm-svn: 163535
2012-09-10 19:17:25 +00:00
James Molloy
fe38f1d2b0 Fix an assertion failure when optimising a shufflevector incorrectly into concat_vectors, and a followup bug with SelectionDAG::getNode() creating nodes with invalid types.
llvm-svn: 163511
2012-09-10 14:01:21 +00:00
Craig Topper
eb1db45675 Set operation action for FFLOOR to Expand for all vector types for X86. Set FFLOOR of v4f32 to Expand for ARM. v2f64 was already correct.
llvm-svn: 163458
2012-09-08 04:58:43 +00:00
James Molloy
791ec0aa52 Improve codegen for BUILD_VECTORs on ARM.
If we have a BUILD_VECTOR that is mostly a constant splat, it is often better to splat that constant then insertelement the non-constant lanes instead of insertelementing every lane from an undef base.

llvm-svn: 163304
2012-09-06 09:55:02 +00:00
James Molloy
90179e600b Optimize codegen for VSETLNi{8,16,32} operating on Q registers. Degenerate to a VSETLN on D registers, instead of an (INSERT_SUBREG (VSETLN (EXTRACT_SUBREG ))) sequence to help the register coalescer.
llvm-svn: 163298
2012-09-06 09:16:01 +00:00
Jakob Stoklund Olesen
0324528c8c Use predication instead of pseudo-opcodes when folding into MOVCC.
Now that it is possible to dynamically tie MachineInstr operands,
predicated instructions are possible in SSA form:

  %vreg3<def> = SUBri %vreg1, -2147483647, pred:14, pred:%noreg, %opt:%noreg
  %vreg4<def,tied1> = MOVCCr %vreg3<tied0>, %vreg1, %pred:12, pred:%CPSR

Becomes a predicated SUBri with a tied imp-use:

  SUBri %vreg1, -2147483647, pred:13, pred:%CPSR, opt:%noreg, %vreg1<imp-use,tied0>

This means that any instruction that is safe to move can be folded into
a MOVCC, and the *CC pseudo-instructions are no longer needed.

The test case changes reflect that Thumb2SizeReduce recognizes the
predicated instructions. It didn't understand the pseudos.

llvm-svn: 163274
2012-09-05 23:58:02 +00:00
Tim Northover
4e03b89c79 Strip old MachineInstrs *after* we know we can put them back.
Previous patch accidentally decided it couldn't convert a VFP to a
NEON instruction after it had already destroyed the old one. Not a
good move.

llvm-svn: 163230
2012-09-05 18:37:53 +00:00
Silviu Baranga
6f46bb1705 Fixed the DAG combiner to better handle the folding of AND nodes for vector types. The previous code was making the assumption that the length of the bitmask returned by isConstantSplat was equal to the size of the vector type. Now we first make sure that the splat value has at least the length of the vector lane type, then we only use as many fields as we have available in the splat value.
llvm-svn: 163203
2012-09-05 08:57:21 +00:00
Arnold Schwaighofer
d606c6fcdf Patch to implement UMLAL/SMLAL instructions for the ARM architecture
This patch corrects the definition of umlal/smlal instructions and adds support
for matching them to the ARM dag combiner.

Bug 12213

Patch by Yin Ma!

llvm-svn: 163136
2012-09-04 14:37:49 +00:00
Nadav Rotem
d1815a0763 Not all targets have efficient ISel code generation for select instructions.
For example, the ARM target does not have efficient ISel handling for vector
selects with scalar conditions. This patch adds a TLI hook which allows the
different targets to report which selects are supported well and which selects
should be converted to CF duting codegen prepare.

llvm-svn: 163093
2012-09-02 12:10:19 +00:00
Nadav Rotem
3425b10f0f Generate better select code by allowing the target to use scalar select, and not sign-extend.
llvm-svn: 163086
2012-09-02 08:20:07 +00:00
Owen Anderson
27ba45c764 Teach DAG combine a number of tricks to simplify FMA expressions in fast-math mode.
llvm-svn: 163051
2012-09-01 06:04:27 +00:00
Jakob Stoklund Olesen
eb687a399c Fix a couple of typos in EmitAtomic.
Thumb2 instructions are mostly constrained to rGPR, not tGPR which is
for Thumb1.

rdar://problem/12203728

llvm-svn: 162968
2012-08-31 02:08:34 +00:00
Nadav Rotem
17b9d8cba2 Currently targets that do not support selects with scalar conditions and vector operands - scalarize the code. ARM is such a target
because it does not support CMOV of vectors. To implement this efficientlyi, we broadcast the condition bit and use a sequence of NAND-OR
to select between the two operands. This is the same sequence we use for targets that don't have vector BLENDs (like SSE2).

rdar://12201387

llvm-svn: 162926
2012-08-30 19:17:29 +00:00
Tim Northover
627f946e05 Add support for moving pure S-register to NEON pipeline if desired
llvm-svn: 162898
2012-08-30 10:17:45 +00:00
Jush Lu
5a78c68e1d [arm-fast-isel] Add support for ARM PIC.
llvm-svn: 162823
2012-08-29 02:41:21 +00:00
Bill Wendling
d49e183a6f Make sure we add the predicate after all of the registers are added.
<rdar://problem/12183003>

llvm-svn: 162703
2012-08-27 22:12:44 +00:00
Stepan Dyatkovskiy
56ead97c8d Rejected 169195. As Duncan commented, bitcasting to proper type is wrong approach. We need to insert some valid TRANCATE node here.
llvm-svn: 162354
2012-08-22 09:33:55 +00:00
Tim Northover
ff29cd0939 Add correct set of regression tests for r162094 commit.
llvm-svn: 162276
2012-08-21 12:43:03 +00:00
Jakob Stoklund Olesen
4403f82dbf Add a missing def flag.
*** Bad machine code: Explicit definition marked as use ***
- function:    test_cos
- basic block: BB#0 L.entry (0x7ff2a2024fd0)
- instruction: VSETLNi32 %D11, %D11<undef>, %R0, 0, pred:14, pred:%noreg, %Q5<imp-use,kill>, %Q5<imp-def>
- operand 0:   %D11

llvm-svn: 162247
2012-08-21 00:34:53 +00:00
Jakob Stoklund Olesen
a3264c242c Don't add CFG edges for redundant conditional branches.
IR that hasn't been through SimplifyCFG can look like this:

  br i1 %b, label %r, label %r

Make sure we don't create duplicate Machine CFG edges in this case.

Fix the machine code verifier to accept conditional branches with a
single CFG edge.

llvm-svn: 162230
2012-08-20 21:39:52 +00:00
Stepan Dyatkovskiy
0a4850d1ef Forget to add testcase for r162195. Sorry.
llvm-svn: 162196
2012-08-20 08:03:18 +00:00
Jakob Stoklund Olesen
e78d4a5b08 Also combine zext/sext into selects for ARM.
This turns common i1 patterns into predicated instructions:

  (add (zext cc), x) -> (select cc (add x, 1), x)
  (add (sext cc), x) -> (select cc (add x, -1), x)

For a function like:

  unsigned f(unsigned s, int x) {
    return s + (x>0);
  }

We now produce:

  cmp r1, #0
  it  gt
  addgt.w r0, r0, #1

Instead of:

  movs  r2, #0
  cmp r1, #0
  it  gt
  movgt r2, #1
  add r0, r2

llvm-svn: 162177
2012-08-18 21:25:22 +00:00
Jakob Stoklund Olesen
ece4a53017 Also pass logical ops to combineSelectAndUse.
Add these transformations to the existing add/sub ones:

  (and (select cc, -1, c), x) -> (select cc, x, (and, x, c))
  (or  (select cc, 0, c), x)  -> (select cc, x, (or, x, c))
  (xor (select cc, 0, c), x)  -> (select cc, x, (xor, x, c))

The selects can then be transformed to a single predicated instruction
by peephole.

This transformation will make it possible to eliminate the ISD::CAND,
COR, and CXOR custom DAG nodes.

llvm-svn: 162176
2012-08-18 21:25:16 +00:00
Jakob Stoklund Olesen
40eb30013e Avoid folding ADD instructions with FI operands.
PEI can't handle the pseudo-instructions. This can be removed when the
pseudo-instructions are replaced by normal predicated instructions.

Fixes PR13628.

llvm-svn: 162130
2012-08-17 20:55:34 +00:00
Tim Northover
1de091468c Implement NEON domain switching for scalar <-> S-register vmovs on ARM
llvm-svn: 162094
2012-08-17 11:32:52 +00:00
Jakob Stoklund Olesen
88217b055d Add ADD and SUB to the predicable ARM instructions.
It is not my plan to duplicate the entire ARM instruction set with
predicated versions. We need a way of representing predicated
instructions in SSA form without requiring a separate opcode.

Then the pseudo-instructions can go away.

llvm-svn: 162061
2012-08-16 23:21:55 +00:00
Jush Lu
767c82d4e0 [arm-fast-isel] Add support for fastcc.
Without fastcc support, the caller just falls through to CallingConv::C
for fastcc, but callee still uses fastcc, this inconsistency of calling
convention is a problem, and fastcc support can fix it.

llvm-svn: 162013
2012-08-16 05:15:53 +00:00
Jakob Stoklund Olesen
55aee8b58a Fold predicable instructions into MOVCC / t2MOVCC.
The ARM select instructions are just predicated moves. If the select is
the only use of an operand, the instruction defining the operand can be
predicated instead, saving one instruction and decreasing register
pressure.

This implementation can turn AND/ORR/EOR instructions into their
corresponding ANDCC/ORRCC/EORCC variants. Ideally, we should be able to
predicate any instruction, but we don't yet support predicated
instructions in SSA form.

llvm-svn: 161994
2012-08-15 22:16:39 +00:00
Bill Wendling
8c8ad77086 Rework test so that it reproduces the error without the horrible flag.
llvm-svn: 161989
2012-08-15 21:10:18 +00:00
Evan Cheng
625c0ca5ee Use vld1/vst1 to load/store f64 if alignment is < 4 and the target allows unaligned access. rdar://12091029
llvm-svn: 161962
2012-08-15 17:44:53 +00:00
Anton Korobeynikov
d13403fbd1 The names of VFP variants of half-to-float conversion instructions were
reversed. This leads to wrong codegen for float-to-half conversion
intrinsics which are used to support storage-only fp16 type.
NEON variants of same instructions are fine.

llvm-svn: 161907
2012-08-14 23:36:01 +00:00
Nadav Rotem
eb22b069bb During the CodeGenPrepare we often lower intrinsics (such as objsize)
and allow some optimizations to turn conditional branches into unconditional.
This commit adds a simple control-flow optimization which merges two consecutive
basic blocks which are connected by a single edge. This allows the codegen to
operate on larger basic blocks.

rdar://11973998

llvm-svn: 161852
2012-08-14 05:19:07 +00:00
NAKAMURA Takumi
79503bef02 llvm/test/CodeGen/ARM/floorf.ll: Add explicit -mtriple=arm-unknown-unknown, or it fails on msvc.
llvm-svn: 161825
2012-08-14 00:56:06 +00:00
Owen Anderson
c5b77c0317 Add a roundToIntegral method to APFloat, which can be parameterized over various rounding modes. Use this to implement SelectionDAG constant folding of FFLOOR, FCEIL, and FTRUNC.
llvm-svn: 161807
2012-08-13 23:32:49 +00:00
Nadav Rotem
03c4d5f036 Do not optimize (or (and X,Y), Z) into BFI and other sequences if the AND ISDNode has more than one user.
rdar://11876519

llvm-svn: 161775
2012-08-13 18:52:44 +00:00
Eric Christopher
3aea549423 Add support for the %H output modifier.
Patch by Weiming Zhao.

llvm-svn: 161768
2012-08-13 18:18:52 +00:00
Tim Northover
7acf444c11 Add test for previous commit correcting NEON load patterns.
llvm-svn: 161750
2012-08-13 10:38:45 +00:00
Arnold Schwaighofer
c751a25aed Revert 161581: Patch to implement UMLAL/SMLAL instructions for the ARM
architecture

It broke MultiSource/Applications/JM/ldecod/ldecod on armv7 thumb O0 g and armv7
thumb O3.

llvm-svn: 161736
2012-08-12 05:11:56 +00:00
Arnold Schwaighofer
f3d4d73157 Patch to implement UMLAL/SMLAL instructions for the ARM architecture
This patch corrects the definition of umlal/smlal instructions and adds support
for matching them to the ARM dag combiner.

Bug 12213

Patch by Yin Ma!

llvm-svn: 161581
2012-08-09 15:25:52 +00:00
Nadav Rotem
63ebca3806 Fix the legalization of ExtLoad on ARM. ExpandUnalignedLoad did not properly
handle the cases where the memory value type was illegal. 
PR 13111. 

llvm-svn: 161565
2012-08-09 01:56:44 +00:00
Bob Wilson
b173ddc53e Add test triples to fix win32 failures. Revert workaround from r161292.
I don't have a win32 system to test, so hopefully I got them all fixed here.

llvm-svn: 161519
2012-08-08 20:31:37 +00:00
Anton Korobeynikov
6dd5c91aae Add stack spill / reload instructions for DTriple and DQuad register classes, which
were missed for no reason. This fixes PR13377

llvm-svn: 161299
2012-08-04 13:16:12 +00:00
Bob Wilson
9f6e25017a Refactor and check "onlyReadsMemory" before optimizing builtins.
This patch is mostly just refactoring a bunch of copy-and-pasted code, but
it also adds a check that the call instructions are readnone or readonly.
That check was already present for sin, cos, sqrt, log2, and exp2 calls, but
it was missing for the rest of the builtins being handled in this code.

llvm-svn: 161282
2012-08-03 23:29:17 +00:00
Jush Lu
ca5d760bf2 [arm-fast-isel] Add support for shl, lshr, and ashr.
llvm-svn: 161230
2012-08-03 02:37:48 +00:00
Jakob Stoklund Olesen
ed1a4d695a Clear kill flags in removeCopyByCommutingDef().
We are extending live ranges, so kill flags are not accurate. They
aren't needed until they are recomputed after RA anyway.

<rdar://problem/11950722>

llvm-svn: 161023
2012-07-31 02:47:24 +00:00
Jush Lu
54c7329b88 [arm-fast-isel] Add support for vararg function calls.
llvm-svn: 160500
2012-07-19 09:49:00 +00:00
Chandler Carruth
5d1c4f0605 Fix a somewhat nasty crasher in PR13378. This crashes inside of
LiveIntervals due to the two-addr pass generating bogus MI code.

The crux of the issue was a loop nesting problem. The intent of the code
which attempts to transform instructions before converting them to
two-addr form is to defer and reprocess any transformed instructions as
the second processing is likely to have more opportunities to coalesce
copies, etc. Unfortunately, there was one section of processing that was
not deferred -- the INSERT_SUBREG rewriting. Due to quirks of how this
rewriting proceeded, not only did it occur early, it removed the bits of
information needed for the deferred processing to correctly generate the
necessary two address form (specifically inserting a copy), but didn't
trigger any immediate assertions and produced what appeared to be
already valid two-address from code. Thus, the assertion only fired much
later in the pipeline.

The fix is to hoist the transformation logic up layer to where it can
more firmly defer all further processing, and to teach the normal
processing to handle an edge case previously handled as part of the
transformation logic. This edge case (already matched tied register
operands) needs to *not* defer any steps.

As has been brought up repeatedly in the process: wow does this code
need refactoring. I *may* squeeze in some time to at least bring sanity
to this loop... but wow... =]

Thanks to Jakob for helpful hints on the way here, and the review.

llvm-svn: 160443
2012-07-18 18:58:22 +00:00
Joel Jones
4ce75efda5 More replacing of target-dependent intrinsics with target-indepdent
intrinsics.  The second instruction(s) to be handled are the vector versions 
of count set bits (ctpop).

The changes here are to clang so that it generates a target independent 
vector ctpop when it sees an ARM dependent vector bits set count.  The changes 
in llvm are to match the target independent vector ctpop and in 
VMCore/AutoUpgrade.cpp to update any existing bc files containing ARM 
dependent vector pop counts with target-independent ctpops.  There are also 
changes to an existing test case in llvm for ARM vector count instructions and 
to a test for the bitcode upgrade.

<rdar://problem/11892519>

There is deliberately no test for the change to clang, as so far as I know, no
consensus has been reached regarding how to test neon instructions in clang;
q.v. <rdar://problem/8762292>

llvm-svn: 160410
2012-07-18 00:02:16 +00:00
Joel Jones
12ea066486 This is one of the first steps at moving to replace target-dependent
intrinsics with target-indepdent intrinsics.  The first instruction(s) to be 
handled are the vector versions of count leading zeros (ctlz).

The changes here are to clang so that it generates a target independent 
vector ctlz when it sees an ARM dependent vector ctlz.  The changes in llvm 
are to match the target independent vector ctlz and in VMCore/AutoUpgrade.cpp 
to update any existing bc files containing ARM dependent vector ctlzs with 
target-independent ctlzs.  There are also changes to an existing test case in 
llvm for ARM vector count instructions and a new test for the bitcode upgrade.

<rdar://problem/11831778>

There is deliberately no test for the change to clang, as so far as I know, no
consensus has been reached regarding how to test neon instructions in clang;
q.v. <rdar://problem/8762292>

llvm-svn: 160200
2012-07-13 23:25:25 +00:00
Manman Ren
0479bff92d ARM: Fix optimizeCompare to correctly check safe condition.
It is safe if CPSR is killed or re-defined.
When we are done with the basic block, check whether CPSR is live-out.
Do not optimize away cmp if CPSR is live-out.

llvm-svn: 160090
2012-07-11 22:51:44 +00:00
Owen Anderson
e0c4f94d45 Teach the DAG combiner to turn sitofp/uitofp from i1 into a conditional move, since there are only two possible values.
Previously, this would become an integer extension operation, followed by a real integer->float conversion.

llvm-svn: 159957
2012-07-09 20:31:12 +00:00
NAKAMURA Takumi
bd5a31d598 Revert r159804, "[arm-fast-isel] Add support for vararg function calls."
It broke LLVM :: CodeGen/Thumb2/large-call.ll on several hosts.

llvm-svn: 159817
2012-07-06 11:12:44 +00:00
Jush Lu
4fdc23801c [arm-fast-isel] Add support for vararg function calls.
llvm-svn: 159804
2012-07-06 03:02:37 +00:00
Chandler Carruth
5d3a0ce4e5 Fix the remaining TCL-style quotes found in the testsuite. This is
another mechanical change accomplished though the power of terrible Perl
scripts.

I have manually switched some "s to 's to make escaping simpler.

While I started this to fix tests that aren't run in all configurations,
the massive number of tests is due to a really frustrating fragility of
our testing infrastructure: things like 'grep -v', 'not grep', and
'expected failures' can mask broken tests all too easily.

Essentially, I'm deeply disturbed that I can change the testsuite so
radically without causing any change in results for most platforms. =/

llvm-svn: 159547
2012-07-02 19:09:46 +00:00
Chandler Carruth
d200829a4f Convert the uses of '|&' to use '2>&1 |' instead, which works on old
versions of Bash. In addition, I can back out the change to the lit
built-in shell test runner to support this.

This should fix the majority of fallout on Darwin, but I suspect there
will be a few straggling issues.

llvm-svn: 159544
2012-07-02 18:37:59 +00:00
Chandler Carruth
8a358b3669 Convert all tests using TCL-style quoting to use shell-style quoting.
This was done through the aid of a terrible Perl creation. I will not
paste any of the horrors here. Suffice to say, it require multiple
staged rounds of replacements, state carried between, and a few
nested-construct-parsing hacks that I'm not proud of. It happens, by
luck, to be able to deal with all the TCL-quoting patterns in evidence
in the LLVM test suite.

If anyone is maintaining large out-of-tree test trees, feel free to poke
me and I'll send you the steps I used to convert things, as well as
answer any painful questions etc. IRC works best for this type of thing
I find.

Once converted, switch the LLVM lit config to use ShTests the same as
Clang. In addition to being able to delete large amounts of Python code
from 'lit', this will also simplify the entire test suite and some of
lit's architecture.

Finally, the test suite runs 33% faster on Linux now. ;]
For my 16-hardware-thread (2x 4-core xeon e5520): 36s -> 24s

llvm-svn: 159525
2012-07-02 12:47:22 +00:00
Rafael Espindola
dd05a97f8e Now that RegistersDefinedFromSameValue handles one instruction being an
implicit_def, the other instruction can be anything, including instructions
that define multiple values. Be careful about that and don't assume what operand
0 is.
Fixes pr13249.

llvm-svn: 159509
2012-07-01 17:08:01 +00:00
Manman Ren
bd339c27e1 ARM: update peephole optimization.
More condition codes are included when deciding whether to remove cmp after
a sub instruction. Specifically, we extend from GE|LT|GT|LE to 
GE|LT|GT|LE|HS|LS|HI|LO|EQ|NE. If we have "sub a, b; cmp b, a; movhs", we
should be able to replace with "sub a, b; movls".

rdar: 11725965
llvm-svn: 159166
2012-06-25 21:49:38 +00:00
Pete Cooper
9f89f00988 DAG legalisation can now handle illegal fma vector types by scalarisation
llvm-svn: 159092
2012-06-24 00:05:44 +00:00
Hans Wennborg
8c011bd43a Extend the IL for selecting TLS models (PR9788)
This allows the user/front-end to specify a model that is better
than what LLVM would choose by default. For example, a variable
might be declared as

  @x = thread_local(initialexec) global i32 42

if it will not be used in a shared library that is dlopen'ed.

If the specified model isn't supported by the target, or if LLVM can
make a better choice, a different model may be used.

llvm-svn: 159077
2012-06-23 11:37:03 +00:00
Evan Cheng
2d498dc096 (sub X, imm) gets canonicalized to (add X, -imm)
There are patterns to handle immediates when they fit in the immediate field.
e.g. %sub = add i32 %x, -123
=>   sub r0, r0, #123
Add patterns to catch immediates that do not fit but should be materialized
with a single movw instruction rather than movw + movt pair.
e.g. %sub = add i32 %x, -65535
=>   movw r1, #65535
     sub r0, r0, r1

rdar://11726136

llvm-svn: 159057
2012-06-23 00:29:06 +00:00
Lang Hames
7d298105e5 Rename fp-op fusion option (yet again) for compatibility with GCC option.
llvm-svn: 159042
2012-06-22 22:31:00 +00:00
Andrew Trick
764fe3bfef ARM scheduling fix: compute predicated implicit use properly.
Minor drive by fix to cleanup latency computation. Calling
getOperandLatency with a deliberately incorrect operand index does not
give you the latency you want.

llvm-svn: 158959
2012-06-22 02:50:31 +00:00
Lang Hames
68cf87e3ef Rename -allow-excess-fp-precision flag to -fuse-fp-ops, and switch from a
boolean flag to an enum: { Fast, Standard, Strict } (default = Standard).

This option controls the creation by optimizations of fused FP ops that store
intermediate results in higher precision than IEEE allows (E.g. FMAs). The
behavior of this option is intended to match the behaviour specified by a
soon-to-be-introduced frontend flag: '-ffuse-fp-ops'.

Fast mode - allows formation of fused FP ops whenever they're profitable.

Standard mode - allow fusion only for 'blessed' FP ops. At present the only
blessed op is the fmuladd intrinsic. In the future more blessed ops may be
added.

Strict mode - allow fusion only if/when it can be proven that the excess
precision won't effect the result.

Note: This option only controls formation of fused ops by the optimizers.  Fused
operations that are explicitly requested (e.g. FMA via the llvm.fma.* intrinsic)
will always be honored, regardless of the value of this option.

Internally TargetOptions::AllowExcessFPPrecision has been replaced by
TargetOptions::AllowFPOpFusion.

llvm-svn: 158956
2012-06-22 01:09:09 +00:00
Lang Hames
662801dbc8 Add a missing llvm.fma -> VFNMS pattern to the ARM backend.
llvm-svn: 158902
2012-06-21 06:10:00 +00:00
Evan Cheng
5e3a175c65 Emit a single _udivmodsi4 libcall instead of two separate _udivsi3 and
_umodsi3 libcalls if they have the same arguments. This optimization
was apparently broken if one of the node was replaced in place.
rdar://11714607

llvm-svn: 158900
2012-06-21 05:56:05 +00:00
Lang Hames
f0b9601a6d Add DAG-combines for aggressive FMA formation.
This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or
FSUB + FMUL. The combines are performed when:
(a) Either
      AllowExcessFPPrecision option (-enable-excess-fp-precision for llc)
        OR
      UnsafeFPMath option (-enable-unsafe-fp-math)
    are set, and
(b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of
    the FADD/FSUB, and
(c) The FMUL only has one user (the FADD/FSUB).

If your target has fast FMA instructions you can make use of these combines by
overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for
types supported by your FMA instruction, and adding patterns to match ISD::FMA
to your FMA instructions.

llvm-svn: 158757
2012-06-19 22:51:23 +00:00
Manman Ren
6d2895c506 ARM: use NOEN loads and stores if possible when handling struct byval.
This change is to be enabled in clang.

rdar://9877866

llvm-svn: 158684
2012-06-18 22:23:48 +00:00
Joel Jones
3d5ae56be4 This change handles a another case for generating the bic instruction
when a compile time constant is known.  This occurs when implicitly zero 
extending function arguments from 16 bits to 32 bits.  The 8 bit case doesn't
need to be handled, as the 8 bit constants are encoded directly, thereby
not needing a separate load instruction to form the constant into a register.

<rdar://problem/11481151>

llvm-svn: 158659
2012-06-18 14:51:32 +00:00
Manman Ren
7ffcd63dea ARM: optimization for sub+abs.
This patch will optimize abs(x-y)
FROM
sub, movs, rsbmi
TO
subs, rsbmi

For abs, we will use cmp instead of movs. This is necessary because we already
have an existing peephole pass which optimizes away cmp following sub.

rdar: 11633193
llvm-svn: 158551
2012-06-15 21:32:12 +00:00
Jakob Stoklund Olesen
6fd22231ba Preserve <undef> flags in ARMExpandPseudo.
This probably mostly shows up in bugpoint-generated code.

llvm-svn: 158527
2012-06-15 17:46:54 +00:00
Manman Ren
797e146fae Revert: test/CodeGen/ARM/iabs.ll in r158441
Sorry that I accidently checked in this file with my previous commit.

llvm-svn: 158442
2012-06-14 06:04:02 +00:00
Manman Ren
e3471c0bdf InstCombine: fix a bug when combining (fcmp cc0 x, y) && (fcmp cc1 x, y).
uno && ueq was converted to ueq, it should be converted to uno.

llvm-svn: 158441
2012-06-14 05:57:42 +00:00
Andrew Trick
1115f0067a sched: fix latency of memory dependence chain edges for consistency.
For store->load dependencies that may alias, we should always use
TrueMemOrderLatency, which may eventually become a subtarget hook. In
effect, we should guarantee at least TrueMemOrderLatency on at least
one DAG path from a store to a may-alias load.

This should fix the standard mode as well as -enable-aa-sched-mi".

llvm-svn: 158380
2012-06-13 02:39:03 +00:00
Chad Rosier
9734752b2c [arm-fast-isel] Add support for -arm-long-calls.
Patch by Jush Lu <jush.msn@gmail.com>.

llvm-svn: 158368
2012-06-12 19:25:13 +00:00
Jakob Stoklund Olesen
b6d5f36f3d Fix test that depends on register allocation.
The test is really checking the prolog/epilog load/store multiple
formation.

llvm-svn: 158328
2012-06-11 21:14:28 +00:00
Bill Wendling
2320ff76d7 Re-enable the CMN instruction.
We turned off the CMN instruction because it had semantics which we weren't
getting correct. If we are comparing with an immediate, then it's okay to use
the CMN instruction.
<rdar://problem/7569620>

llvm-svn: 158302
2012-06-11 08:07:26 +00:00
Jakob Stoklund Olesen
ce0f9aef12 Don't run RAFast in the optimizing regalloc pipeline.
The fast register allocator is not supposed to work in the optimizing
pipeline. It doesn't make sense to compute live intervals, run full copy
coalescing, and then run RAFast.

Fast register allocation in the optimizing pipeline is better done by
RABasic.

llvm-svn: 158242
2012-06-08 23:15:12 +00:00
Joel Jones
6fb688ff18 Revert commit r157966
llvm-svn: 157972
2012-06-05 00:47:21 +00:00
Joel Jones
fdd5284d04 This change handles a another case for generating the bic instruction
when a compile time constant is known.  This occurs when implicitly zero 
extending function arguments from 16 bits to 32 bits.

<rdar://problem/11481151>

llvm-svn: 157966
2012-06-04 23:38:57 +00:00
Nadav Rotem
6969a673e9 Remove the "-promote-elements" flag. This flag is now enabled by default.
llvm-svn: 157925
2012-06-04 11:27:21 +00:00
Manman Ren
f8ff637b8d ARM: add testing case for struct byval
rdar://9877866

llvm-svn: 157876
2012-06-02 05:37:44 +00:00
Owen Anderson
060176473a Make this testcase independent of register allocation.
llvm-svn: 157761
2012-05-31 18:07:02 +00:00
Owen Anderson
38e510d831 Switch the canonical FMA term operand order to match both the comment I wrote and the usual LLVM convention.
llvm-svn: 157708
2012-05-30 18:54:50 +00:00
Owen Anderson
49ddb93c30 Teach DAGCombine to canonicalize the position of a constant in the term operands of an FMA node.
llvm-svn: 157707
2012-05-30 18:50:39 +00:00
Chad Rosier
80159b26a5 [arm-fast-isel] Add support for the llvm.frameaddress() intrinsic.
Patch by Jush Lu <jush.msn@gmail.com>.

llvm-svn: 157696
2012-05-30 17:23:22 +00:00
Evan Cheng
a46443d6db Teach taildup to update livein set. rdar://11538365
llvm-svn: 157663
2012-05-30 00:42:39 +00:00
Chris Lattner
3ae033bbe9 These tests used intrinsics with the wrong prototype. They weren't caught because
the old verifier just checked that something "was a pointer", but not that the pointee
was correct.

llvm-svn: 157544
2012-05-27 19:35:41 +00:00
Chad Rosier
d9dca3aef3 [arm-fast-isel] Add support for non-global callee.
Patch by Jush Lu <jush.msn@gmail.com>.

llvm-svn: 157336
2012-05-23 18:38:57 +00:00
Nuno Lopes
944814b41a revert my previous patches that introduced an additional parameter to the objectsize intrinsic.
After a lot of discussion, we realized it's not the best option for run-time bounds checking

llvm-svn: 157255
2012-05-22 15:25:31 +00:00
Jakob Stoklund Olesen
3bee83c3a5 Transfer memory operands to the right instruction.
They need to go on the PICLDR as the verifier points out.

llvm-svn: 157151
2012-05-20 06:38:42 +00:00
Jim Grosbach
343a996ca5 Refactor data-in-code annotations.
Use a dedicated MachO load command to annotate data-in-code regions.
This is the same format the linker produces for final executable images,
allowing consistency of representation and use of introspection tools
for both object and executable files.

Data-in-code regions are annotated via ".data_region"/".end_data_region"
directive pairs, with an optional region type.

data_region_directive := ".data_region" { region_type }
region_type := "jt8" | "jt16" | "jt32" | "jta32"
end_data_region_directive := ".end_data_region"

The previous handling of ARM-style "$d.*" labels was broken and has
been removed. Specifically, it didn't handle ARM vs. Thumb mode when
marking the end of the section.

rdar://11459456

llvm-svn: 157062
2012-05-18 19:12:01 +00:00
Tim Northover
fc7737e7dc Remove incorrect pattern for ARM SMML instruction.
Patch by Meador Inge.

llvm-svn: 156989
2012-05-17 13:12:13 +00:00
Jakob Stoklund Olesen
1da786c936 Enable sub-sub-register copy coalescing.
It is now possible to coalesce weird skewed sub-register copies by
picking a super-register class larger than both original registers. The
included test case produces code like this:

  vld2.32 {d16, d17, d18, d19}, [r0]!
  vst2.32 {d18, d19, d20, d21}, [r0]

We still perform interference checking as if it were a normal full copy
join, so this is still quite conservative. In particular, the f1 and f2
functions in the included test case still have remaining copies because
of false interference.

llvm-svn: 156878
2012-05-15 23:31:35 +00:00
Chad Rosier
4a65a2a197 [fast-isel] Add support for selecting @llvm.trap().
llvm-svn: 156646
2012-05-11 21:33:49 +00:00
Chad Rosier
72bd34ca71 [fast-isel] Remove -disable-arm-fast-isel option. -fast-isel=0 suffices. Minor cleanup.
llvm-svn: 156632
2012-05-11 19:40:25 +00:00
Chad Rosier
c20de37076 [fast-isel] Cleaner fix for when we're unable to handle a non-double multi-reg
retval.  Hoists check before emitting the call to avoid unnecessary work.
rdar://11430407
PR12796

llvm-svn: 156628
2012-05-11 18:51:55 +00:00
Manman Ren
c82d0e71b9 ARM: peephole optimization to remove cmp instruction
This patch will optimize the following cases:
  sub r1, r3 | sub r1, imm
  cmp r3, r1 or cmp r1, r3 | cmp r1, imm
  bge L1

TO
  subs r1, r3
  bge  L1 or ble L1

If the branch instruction can use flag from "sub", then we can replace
"sub" with "subs" and eliminate the "cmp" instruction.

rdar: 10734411
llvm-svn: 156599
2012-05-11 01:30:47 +00:00
Manman Ren
5abcae1320 Revert: 156550 "ARM: peephole optimization to remove cmp instruction"
This commit broke an external linux bot and gave a compile-time warning.

llvm-svn: 156556
2012-05-10 18:49:43 +00:00
Manman Ren
727b7d5e4c ARM: peephole optimization to remove cmp instruction
This patch will optimize the following cases:
  sub r1, r3 | sub r1, imm
  cmp r3, r1 or cmp r1, r3 | cmp r1, imm
  bge L1

TO
  subs r1, r3
  bge  L1 or ble L1

If the branch instruction can use flag from "sub", then we can replace
"sub" with "subs" and eliminate the "cmp" instruction.

rdar: 10734411
llvm-svn: 156550
2012-05-10 16:48:21 +00:00
Nuno Lopes
e8880a9916 change the objectsize intrinsic signature: add a 3rd parameter to denote the maximum runtime performance penalty that the user is willing to accept.
This commit only adds the parameter. Code taking advantage of it will follow.

llvm-svn: 156473
2012-05-09 15:52:43 +00:00
Owen Anderson
8adb0322ce Teach DAG combine to fold x-x to 0.0 when unsafe FP math is enabled.
llvm-svn: 156324
2012-05-07 20:51:25 +00:00
Owen Anderson
e3d41b44cc Teach DAGCombine the same multiply-by-1.0 folding trick when doing FMAs, just like it now knows for FMULs.
llvm-svn: 156029
2012-05-02 22:17:40 +00:00
Owen Anderson
3cfe269707 Teach DAG combine that multiplication by 1.0 can always be constant folded.
llvm-svn: 156023
2012-05-02 21:32:35 +00:00
Bob Wilson
d2f6ff588b Don't introduce illegal types when creating vmull operations. <rdar://11324364>
ARM BUILD_VECTORs created after type legalization cannot use i8 or i16
operands, since those types are not legal.  Instead use i32 operands, which
will be implicitly truncated by the BUILD_VECTOR to match the element type.

llvm-svn: 155824
2012-04-30 16:53:34 +00:00
Lang Hames
7d83af4ed0 Fix the order of the operands in the llvm.fma intrinsic patterns for ARM,
<rdar://problem/11325085>.

llvm-svn: 155724
2012-04-27 18:51:24 +00:00
Evan Cheng
f35523d08a Implement a bastardized ABI.
llvm-svn: 155686
2012-04-27 02:11:10 +00:00
Tim Northover
876c151146 Use VLD1 in NEON extenting-load patterns instead of VLDR.
On some cores it's a bad idea for performance to mix VFP and NEON instructions
and since these patterns are NEON anyway, the NEON load should be used.

llvm-svn: 155630
2012-04-26 08:46:29 +00:00
Evan Cheng
7f9bf43bcf MachineBasicBlock::SplitCriticalEdge() should follow LLVM IR variant and refuse to break edge to EH landing pad. rdar://11300144
llvm-svn: 155470
2012-04-24 19:06:55 +00:00
James Molloy
44927f5296 Fix bad EXTRACT_SUBREG in instruction selection for extending-loads on NEON.
llvm-svn: 154915
2012-04-17 08:18:00 +00:00
Jakob Stoklund Olesen
8bea8a2373 FileCheckize these tests.
Add an extra test to ldr_post with an immediate increment.

llvm-svn: 154859
2012-04-16 20:56:42 +00:00
Jakob Stoklund Olesen
8d6641a2b2 Disable code placement for this test.
It makes it less sensitive to small changes in heuristics.

llvm-svn: 154857
2012-04-16 20:49:06 +00:00
Chandler Carruth
728acc9bd9 Flip the new block-placement pass to be on by default.
This is mostly to test the waters. I'd like to get results from FNT
build bots and other bots running on non-x86 platforms.

This feature has been pretty heavily tested over the last few months by
me, and it fixes several of the execution time regressions caused by the
inlining work by preventing inlining decisions from radically impacting
block layout.

I've seen very large improvements in yacr2 and ackermann benchmarks,
along with the expected noise across all of the benchmark suite whenever
code layout changes. I've analyzed all of the regressions and fixed
them, or found them to be impossible to fix. See my email to llvmdev for
more details.

I'd like for this to be in 3.1 as it complements the inliner changes,
but if any failures are showing up or anyone has concerns, it is just
a flag flip and so can be easily turned off.

I'm switching it on tonight to try and get at least one run through
various folks' performance suites in case SPEC or something else has
serious issues with it. I'll watch bots and revert if anything shows up.

llvm-svn: 154816
2012-04-16 13:49:17 +00:00
Evan Cheng
3499593c7e On Darwin targets, only use vfma etc. if the source use fma() intrinsic explicitly.
llvm-svn: 154689
2012-04-13 18:59:28 +00:00
Evan Cheng
f138fb4599 Add more fused mul+add/sub patterns. rdar://10139676
llvm-svn: 154484
2012-04-11 06:59:47 +00:00
Evan Cheng
b5291aea18 Match (fneg (fma) to vfnma. rdar://10139676
llvm-svn: 154469
2012-04-11 01:21:25 +00:00
Evan Cheng
eaf8eba8c4 Merge fma.ll into fusedMAC.ll
llvm-svn: 154466
2012-04-11 01:03:11 +00:00
Owen Anderson
a8319713a4 Move the constant-folding support for FP_ROUND in SelectionDAG from the one-operand version of getNode() to the two-operand version, since it became a two-operand node at sound point.
Zap a testcase that this allows us to completely fold away.

llvm-svn: 154447
2012-04-10 22:46:53 +00:00
Evan Cheng
f9617f7f54 Handle llvm.fma.* intrinsics. rdar://10914096
llvm-svn: 154439
2012-04-10 21:40:28 +00:00
Eric Christopher
ec1405e930 To ensure that we have more accurate line information for a block
don't elide the branch instruction if it's the only one in the block,
otherwise it's ok.

PR9796 and rdar://11215207

llvm-svn: 154417
2012-04-10 18:18:10 +00:00
Anton Korobeynikov
0fc5fe0430 Transform div to mul with reciprocal only when fp imm is legal.
This fixes PR12516 and uncovers one weird problem in legalize (workarounded)

llvm-svn: 154394
2012-04-10 13:22:49 +00:00
Evan Cheng
d9ff163215 Add proper checks.
llvm-svn: 154379
2012-04-10 03:15:42 +00:00
Evan Cheng
5825e9dbf5 Fix a long standing tail call optimization bug. When a libcall is emitted
legalizer always use the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.

PR12419
rdar://9770785
rdar://11195178

llvm-svn: 154370
2012-04-10 01:51:00 +00:00
Chad Rosier
a588421976 When performing a truncating store, it's possible to rearrange the data
in-register, such that we can use a single vector store rather then a 
series of scalar stores.

For func_4_8 the generated code

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vmov.u16	r0, d16[3]
	strb	r0, [r2, #3]
	vmov.u16	r0, d16[2]
	strb	r0, [r2, #2]
	vmov.u16	r0, d16[1]
	strb	r0, [r2, #1]
	vmov.u16	r0, d16[0]
	strb	r0, [r2]
	bx	lr

becomes

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vuzp.8	d16, d17
	vst1.32	{d16[0]}, [r2, :32]
	bx	lr

I'm not fond of how this combine pessimizes 2012-03-13-DAGCombineBug.ll,
but I couldn't think of a way to judiciously apply this combine.

This

	ldrh	r0, [r0, #4]
	strh	r0, [r1]

becomes

	vldr	d16, [r0]
	vmov.u16	r0, d16[2]
	vmov.32	d16[0], r0
	vuzp.16	d16, d17
	vst1.32	{d16[0]}, [r1, :32]

PR11158
rdar://10703339

llvm-svn: 154340
2012-04-09 20:32:02 +00:00
Duncan Sands
cd52f3d447 Convert floating point division by a constant into multiplication by the
reciprocal if converting to the reciprocal is exact.  Do it even if inexact
if -ffast-math.  This substantially speeds up ac.f90 from the polyhedron
benchmarks.

llvm-svn: 154265
2012-04-07 20:04:00 +00:00
Jakob Stoklund Olesen
bb7b631def Allow negative immediates in ARM and Thumb2 compares.
ARM and Thumb2 mode can use cmn instructions to compare against negative
immediates. Thumb1 mode can't.

llvm-svn: 154183
2012-04-06 17:45:04 +00:00
James Molloy
3604b95957 An oversight when applying the patches for r150956 and r150957 to a vanilla tree meant I forgot to svn add these testcases.
Noticed while investigating PR12274!

llvm-svn: 154090
2012-04-05 10:01:12 +00:00
Jakob Stoklund Olesen
e1ae4f161c Pass the right sign to TLI->isLegalICmpImmediate.
LSR can fold three addressing modes into its ICmpZero node:

  ICmpZero BaseReg + Offset      => ICmp BaseReg, -Offset
  ICmpZero -1*ScaleReg + Offset  => ICmp ScaleReg, Offset
  ICmpZero BaseReg + -1*ScaleReg => ICmp BaseReg, ScaleReg

The first two cases are only used if TLI->isLegalICmpImmediate() likes
the offset.

Make sure the right Offset sign is passed to this method in the second
case. The ARM version is not symmetric.

<rdar://problem/11184260>

llvm-svn: 154079
2012-04-05 03:10:56 +00:00
Jakob Stoklund Olesen
0419ed395c Implement ARMBaseInstrInfo::commuteInstruction() for MOVCCr.
A MOVCCr instruction can be commuted by inverting the condition. This
can help reduce register pressure and remove unnecessary copies in some
cases.

<rdar://problem/11182914>

llvm-svn: 154033
2012-04-04 18:23:42 +00:00
Jakob Stoklund Olesen
97f47c37b6 Allocate virtual registers in ascending order.
This is just the fallback tie-breaker ordering, the main allocation
order is still descending size.

Patch by Shamil Kurmangaleev!

llvm-svn: 153904
2012-04-02 22:30:39 +00:00
Lang Hames
dbc3175c89 During two-address lowering, rescheduling an instruction does not untie
operands. Make TryInstructionTransform return false to reflect this.
Fixes PR11861.

llvm-svn: 153892
2012-04-02 19:58:43 +00:00
Nadav Rotem
2729f54295 This commit contains a few changes that had to go in together.
1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B))
   (and also scalar_to_vector).

2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src).
   Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B))

3. Optimize swizzles of shuffles:  shuff(shuff(x, y), undef) -> shuff(x, y).

4. Fix an X86ISelLowering optimization which was very bitcast-sensitive.

Code which was previously compiled to this:

movd    (%rsi), %xmm0
movdqa  .LCPI0_0(%rip), %xmm2
pshufb  %xmm2, %xmm0
movd    (%rdi), %xmm1
pshufb  %xmm2, %xmm1
pxor    %xmm0, %xmm1
pshufb  .LCPI0_1(%rip), %xmm1
movd    %xmm1, (%rdi)
ret

Now compiles to this:

movl    (%rsi), %eax
xorl    %eax, (%rdi)
ret

llvm-svn: 153848
2012-04-01 19:31:22 +00:00
Evan Cheng
f3c23907f5 ARM target should allow codegenprep to duplicate ret instructions to enable tailcall opt. rdar://11140249
llvm-svn: 153717
2012-03-30 01:24:39 +00:00
Lang Hames
5685c98688 Change the constant in this testcase so that it results in a constant pool
load.

llvm-svn: 153704
2012-03-29 23:52:38 +00:00
Evan Cheng
72cf12e416 ARM has a peephole optimization which looks for a def / use pair. The def
produces a 32-bit immediate which is consumed by the use. It tries to 
fold the immediate by breaking it into two parts and fold them into the
immmediate fields of two uses. e.g
       movw    r2, #40885
       movt    r3, #46540
       add     r0, r0, r3
=>
       add.w   r0, r0, #3019898880
       add.w   r0, r0, #30146560
;
However, this transformation is incorrect if the user produces a flag. e.g.
       movw    r2, #40885
       movt    r3, #46540
       adds    r0, r0, r3
=>
       add.w   r0, r0, #3019898880
       adds.w  r0, r0, #30146560
Note the adds.w may not set the carry flag even if the original sequence
would.

rdar://11116189

llvm-svn: 153484
2012-03-26 23:31:00 +00:00
Eli Bendersky
3ef88c1833 Continue cleanup of LIT, getting rid of the remaining artifacts from dejagnu
* Removed test/lib/llvm.exp - it is no longer needed 
* Deleted the dg.exp reading code from test/lit.cfg. There are no dg.exp files
  left in the test suite so this code is no longer required. test/lit.cfg is
  now much shorter and clearer 
* Removed a lot of duplicate code in lit.local.cfg files that need access to
  the root configuration, by adding a "root" attribute to the TestingConfig
  object. This attribute is dynamically computed to provide the same
  information as was previously provided by the custom getRoot functions. 
* Documented the config.root attribute in docs/CommandGuide/lit.pod

llvm-svn: 153408
2012-03-25 09:02:19 +00:00
Andrew Trick
121e3e153c Remove -enable-lsr-nested in time for 3.1.
Tests cases have been removed but attached to open PR12330.

llvm-svn: 153286
2012-03-22 22:42:45 +00:00
Chad Rosier
ce3a78be60 [fast-isel] Fold "urem x, pow2" -> "and x, pow2-1". This should fix the 271%
execution-time regression for nsieve-bits on the ARMv7 -O0 -g nightly tester.
This may also improve compile-time on architectures that would otherwise 
generate a libcall for urem (e.g., ARM) or fall back to the DAG selector.
rdar://10810716

llvm-svn: 153230
2012-03-22 00:21:17 +00:00
Chad Rosier
c6293b3e41 Fix test case from r153135.
llvm-svn: 153140
2012-03-20 21:49:54 +00:00
Anton Korobeynikov
ccc669ff8f Perform mul combine when multiplying wiht negative constants.
Patch by Weiming Zhao!
This fixes PR12212

llvm-svn: 153049
2012-03-19 19:19:50 +00:00
Chad Rosier
e007850778 [fast-isel] Address Eli's comments for r152847. Specifically, add a test case
and still allow immediate encoding, just not with cmn.
rdar://11038907

llvm-svn: 152869
2012-03-15 22:54:20 +00:00
Jim Grosbach
3812c82b92 ARM case-insensitive checking for APSR_nzcv.
rdar://11056591

llvm-svn: 152846
2012-03-15 21:34:14 +00:00
Lang Hames
7918b0b225 Use vmov.f32 to materialize f32 consts on ARM. This relaxes constraints on
register allocation by allowing all 32 D-registers to be used. Patch by Cameron
Zwarich.

llvm-svn: 152824
2012-03-15 18:49:02 +00:00
Evan Cheng
155a7230b7 DAG combine incorrectly optimize (i32 vextract (v4i16 load $addr), c) to
(i16 load $addr+c*sizeof(i16)) and replace uses of (i32 vextract) with the
i16 load. It should issue an extload instead: (i32 extload $addr+c*sizeof(i16)).

rdar://11035895

llvm-svn: 152675
2012-03-13 22:00:52 +00:00
Evan Cheng
f04f2e7a52 Extend r148086 to check for [r +/- reg] address mode. This fixes queens performance regression (due to increased register pressure from overly aggressive pre-inc formation).
llvm-svn: 152162
2012-03-06 23:33:32 +00:00
Jakob Stoklund Olesen
d4e1cb591a Add <imp-def> operands when reloading into physregs.
When an instruction only writes sub-registers, it is still necessary to
add an <imp-def> operand for the super-register.  When reloading into a
virtual register, rewriting will add the operand, but when loading
directly into a virtual register, the <imp-def> operand is still
necessary.

llvm-svn: 152095
2012-03-06 02:48:17 +00:00
Lang Hames
a49054ac9c Split fpscr into two registers: FPSCR and FPSCR_NZCV.
The fpscr register contains both flags (set by FP operations/comparisons) and
control bits. The control bits (FPSCR) should be reserved, since they're always
available and needn't be defined before use. The flag bits (FPSCR_NZCV) should
like to be unreserved so they can be hoisted by MachineCSE. This fixes PR12165.

llvm-svn: 152076
2012-03-06 00:19:55 +00:00
Sebastian Pop
e6eeed8151 updated patch for the ARM fused multiply add/sub
In this update:
- I assumed neon2 does not imply vfpv4, but neon and vfpv4 imply neon2.
- I kept setting .fpu=neon-vfpv4 code attribute because that is what the
assembler understands.

Patch by Ana Pazos <apazos@codeaurora.org>

llvm-svn: 152036
2012-03-05 17:39:52 +00:00
Jakob Stoklund Olesen
fd29132e44 Use <def,undef> operands when spilling NEON bundles.
MachineOperands that define part of a virtual register must have an
<undef> flag if they are not intended as read-modify-write operands.

The old trick of adding an <imp-def> operand doesn't work any longer.

Fixes PR12177.

llvm-svn: 152008
2012-03-04 18:40:30 +00:00
Bill Wendling
88f55b45b2 Do trivial CSE of dead BBs during codegen preparation.
Some BBs can become dead after codegen preparation. If we delete them here, it
could help enable tail-call optimizations later on.
<rdar://problem/10256573>

llvm-svn: 152002
2012-03-04 10:46:01 +00:00
Jakob Stoklund Olesen
cfef07fd05 Fix RA-dependent test.
llvm-svn: 151958
2012-03-03 00:26:30 +00:00
Evan Cheng
31b407de17 Neuter the optimization I implemented with r107852 and r108258 which turn some
floating point equality comparisons into integer ones with -ffast-math. The
issue is the optimization causes +0.0 != -0.0.

Now the optimization is only done when one side is known to be 0.0. The other
side's sign bit is masked off for the comparison.

rdar://10964603

llvm-svn: 151861
2012-03-01 23:27:13 +00:00