1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00
Commit Graph

7574 Commits

Author SHA1 Message Date
Benjamin Kramer
9205d91a11 DAGCombiner: Generate a correct constant for vector types when folding (xor (and)) into (and (not)).
PR15948.

llvm-svn: 181597
2013-05-10 14:09:52 +00:00
Tom Stellard
7edf38bf1f R600: Remove AMDILPeeopholeOptimizer and replace optimizations with tablegen patterns
The BFE optimization was the only one we were actually using, and it was
emitting an intrinsic that we don't support.

https://bugs.freedesktop.org/show_bug.cgi?id=64201

Reviewed-by: Christian König <christian.koenig@amd.com>

NOTE: This is a candidate for the 3.3 branch.
llvm-svn: 181580
2013-05-10 02:09:45 +00:00
Tom Stellard
ed363c57b2 R600: Expand SUB for v2i32/v4i32
Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>

NOTE: This is a candidate for the 3.3 branch.
llvm-svn: 181579
2013-05-10 02:09:39 +00:00
Tom Stellard
3ca3d250c6 R600: Expand MUL for v4i32/v2i32
Fixes piglit test for OpenCL builtin mul24, and allows mad24 to run.

Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>

NOTE: This is a candidate for the 3.3 branch.
llvm-svn: 181578
2013-05-10 02:09:34 +00:00
Tom Stellard
56fef8261c R600: Expand SRA for v4i32/v2i32
v2: Add v4i32 test

Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>

NOTE: This is a candidate for the 3.3 branch.
llvm-svn: 181577
2013-05-10 02:09:29 +00:00
Tom Stellard
5d4a5a0d37 R600: Expand vselect for v4i32 and v2i32
v2: Add vselect v4i32 test

Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>

NOTE: This is a candidate for the 3.3 branch.
llvm-svn: 181576
2013-05-10 02:09:24 +00:00
Owen Anderson
e34b034b7d Teach SelectionDAG to constant fold all-constant FMA nodes the same way that it constant folds FADD, FMUL, etc.
llvm-svn: 181555
2013-05-09 22:27:13 +00:00
Bill Wendling
59bb428cc9 Generate a compact unwind encoding in the face of a stack alignment push.
We generate a `push' of a random register (%rax) if the stack needs to be
aligned by the size of that register. However, this could mess up compact unwind
generation. In particular, we want to still generate compact unwind in the
presence of this monstrosity.

Check if the push of of the %rax/%eax register. If it is and it's marked with
the `FrameSetup' flag, then we can generate a compact unwind encoding for the
function only if the push is the last FrameSetup instruction.

llvm-svn: 181540
2013-05-09 20:10:38 +00:00
Jyotsna Verma
15d448ba0c Hexagon: Use relation map for getMatchingCondBranchOpcode() and
getInvertedPredicatedOpcode() functions instead of switch cases.

llvm-svn: 181530
2013-05-09 18:25:44 +00:00
Richard Osborne
7504cb9f47 [XCore] Fix handling of functions where only the LR is spilled.
Previously we only checked if the LR required saving if the frame size was
non zero. However because the caller reserves 1 word for the callee to use
that doesn't count towards our frame size it is possible for the LR to need
saving and for the frame size to be 0.

We didn't hit when the LR needed saving because of a function calls because
the 1 word of stack we must allocate for our callee means the frame size
is always non zero in this case. However we can hit this case if the LR is
clobbered in inline asm.

llvm-svn: 181520
2013-05-09 16:43:42 +00:00
Akira Hatanaka
a1d814e7b8 [mips] Add instruction selection pattern for (seteq $LHS, 0).
llvm-svn: 181459
2013-05-08 19:38:04 +00:00
Bill Schmidt
7f1a2b5212 Fix handling of anonymous aggregate parameters for powerpc*-apple-darwin8.
This fixes bug 15821 similarly to the powerpc64-linux fix for bug 14779.

Patch by David Fang.

llvm-svn: 181449
2013-05-08 17:22:33 +00:00
Michel Danzer
e95e9672c2 R600/SI: Add lit tests for llvm.SI.imageload and llvm.SI.resinfo intrinsics
Adapted from the llvm.SI.sample test.

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 181425
2013-05-08 13:07:29 +00:00
Hal Finkel
1636de8210 PPCInstrInfo::optimizeCompareInstr should not optimize FP compares
The floating-point record forms on PPC don't set the condition register bits
based on a comparison with zero (like the integer record forms do), but rather
based on the exception status bits.

llvm-svn: 181423
2013-05-08 12:16:14 +00:00
Nick Lewycky
a88ff03516 Fix a bug in codegenprep where it was losing track of values OptimizeMemoryInst
by switching to a ValueMap. Patch by Andrea DiBiagio!

llvm-svn: 181397
2013-05-08 09:00:10 +00:00
David Majnemer
fdba035711 DAGCombiner: Simplify inverted bit tests
Fold (xor (and x, y), y) -> (and (not x), y)

This removes an opportunity for a constant to appear twice.

llvm-svn: 181395
2013-05-08 06:44:42 +00:00
Jyotsna Verma
37863260ff Hexagon: Fix Small Data support to handle -G 0 correctly.
llvm-svn: 181344
2013-05-07 19:53:00 +00:00
Jyotsna Verma
5307666fe8 Reverting r181331.
Missing file, HexagonSplitConst32AndConst64.cpp, from lib/Target/Hexagon/CMakeLists.txt.

llvm-svn: 181334
2013-05-07 17:12:35 +00:00
Jyotsna Verma
af0c734e1b Hexagon: Fix Small Data support to handle -G 0 correctly.
llvm-svn: 181331
2013-05-07 16:42:15 +00:00
Bill Wendling
0c1e625af5 Reduce attributes.
llvm-svn: 181245
2013-05-06 20:57:23 +00:00
Tom Stellard
fb8e73f3af R600: Emit config values in register / value pairs
Reviewed-by: Vincent Lejeune <vljn@ovi.com>
Tested-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 181228
2013-05-06 17:50:51 +00:00
Tom Stellard
6c3f6e1b02 R600: Stop emitting the instruction type byte before each instruction
Reviewed-by: Vincent Lejeune <vljn@ovi.com>
Tested-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 181225
2013-05-06 17:50:44 +00:00
Tom Stellard
ebe049fd75 R600: Emit ISA for CALL_FS_* instructions
Reviewed-by: Vincent Lejeune <vljn@ovi.com>
Tested-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 181223
2013-05-06 17:50:26 +00:00
Ulrich Weigand
1431b3c2f5 [SystemZ] Add CodeGen test cases
This adds all CodeGen tests for the SystemZ target.

This version of the patch incorporates feedback from a review by
Sean Silva.  Thanks to all reviewers!

Patch by Richard Sandiford.

llvm-svn: 181204
2013-05-06 16:17:29 +00:00
Michael Kuperstein
74685eff73 Fix slightly too aggressive conact_vector optimization.
(Would sometimes optimize away conacts used to extend a vector with undef values)

llvm-svn: 181186
2013-05-06 08:06:13 +00:00
Bill Wendling
454b468436 Add a testcase that checks that we generate functions with frame
pointers or not depending upon the function attributes.

llvm-svn: 181180
2013-05-06 05:45:57 +00:00
Evan Cheng
2fcdb53946 Test case for r181160 and r181161. rdar://13782395
llvm-svn: 181162
2013-05-05 18:07:15 +00:00
Stepan Dyatkovskiy
c06cd03f6e For ARM backend, fixed "byval" attribute support.
Now even the small structures could be passed within byval (small enough
to be stored in GPRs).
In regression tests next function prototypes are checked:

PR15293:
  %artz = type { i32 }
  define void @foo(%artz* byval %s)
  define void @foo2(%artz* byval %s, i32 %p, %artz* byval %s2)
foo: "s" stored in R0
foo2: "s" stored in R0, "s2" stored in R2.

Next AAPCS rules are checked:
5.5 Parameters Passing, C.4 and C.5,
"ParamSize" is parameter size in 32bit words:
-- NSAA != 0, NCRN < R4 and NCRN+ParamSize > R4.
   Parameter should be sent to the stack; NCRN := R4.
-- NSAA != 0, and NCRN < R4, NCRN+ParamSize < R4.
   Parameter stored in GPRs; NCRN += ParamSize.

llvm-svn: 181148
2013-05-05 07:48:36 +00:00
David Majnemer
d5ba4da281 Remove a recently redundant transform from X86ISelLowering.
X86ISelLowering has support to treat:
(icmp ne (and (xor %flags, -1), (shl 1, flag)), 0)

as if it were actually:
(icmp eq (and %flags, (shl 1, flag)), 0)

However, r179386 has code at the InstCombine level to handle this.

llvm-svn: 181145
2013-05-05 02:00:10 +00:00
Tim Northover
d4f2cac7b6 AArch64: support literal pool access in large memory model.
llvm-svn: 181120
2013-05-04 16:54:07 +00:00
Tim Northover
4ef2500d01 AArch64: support large code model for jump-tables
llvm-svn: 181119
2013-05-04 16:54:00 +00:00
Tim Northover
ece66eacb2 AArch64: implement support for blockaddress in large code model
llvm-svn: 181118
2013-05-04 16:53:53 +00:00
Tim Northover
87645e02c0 AArch64: implement large code model access to global variables.
The MOVZ/MOVK instruction sequence may not be the most efficient (a
literal-pool load could be better) but adding that would require
reinstating the ConstantIslands pass.

For now the sequence is correct, and that's enough. Beware, as of
commit GNU ld does not appear to support the relocations needed for
this. Its primary purpose (for now) will be to support JITed code,
since in that case there is no guarantee of where your code will end
up in memory relative to external symbols it references.

llvm-svn: 181117
2013-05-04 16:53:46 +00:00
Reed Kotler
b89d9a0181 Remove some uneeded pseudos in the presence of the naked function attribute.
llvm-svn: 181072
2013-05-03 23:17:24 +00:00
Akira Hatanaka
5f295bccfc [mips] Split the DSP control register and define one register for each field of
its fields.

This removes false dependencies between DSP instructions which access different
fields of the the control register. Implicit register operands are added to
instructions RDDSP and WRDSP after instruction selection, depending on the
value of the mask operand.

llvm-svn: 181041
2013-05-03 18:37:49 +00:00
Tom Stellard
2165728987 R600: Expand vector or, shl, srl, and xor nodes
llvm-svn: 181035
2013-05-03 17:21:31 +00:00
Tom Stellard
f2fd0109a0 R600: Add pattern for SHA-256 Ma function
This can be optimized using the BFI_INT instruction.

llvm-svn: 181033
2013-05-03 17:21:20 +00:00
Akira Hatanaka
ab6ee99fe0 [mips] Handle reading, writing or copying of ccond field of DSP control
register.

- Define pseudo instructions which store or load ccond field of the DSP
  control register.
- Emit the pseudos in MipsSEInstrInfo::storeRegToStack and loadRegFromStack.
- Expand the pseudos before callee-scan save.
- Emit instructions RDDSP or WRDSP to copy between ccond field and GPRs. 

llvm-svn: 180969
2013-05-02 23:07:05 +00:00
Vincent Lejeune
5d7b2a4aea R600: Signed literals are 64bits wide
llvm-svn: 180960
2013-05-02 21:53:03 +00:00
Vincent Lejeune
3ff31b75b3 R600: If previous bundle is dot4, PV valid chan is always X
llvm-svn: 180959
2013-05-02 21:52:55 +00:00
Vincent Lejeune
97fb65a788 R600: Add a test to check that use_kill is emitted
llvm-svn: 180958
2013-05-02 21:52:46 +00:00
Vincent Lejeune
62da1453e1 R600: Prettier asmPrint of Alu
llvm-svn: 180956
2013-05-02 21:52:30 +00:00
Pranav Bhandarkar
520d26e773 Hexagon - Add peephole optimizations for zero extends.
* lib/Target/Hexagon/HexagonInstrInfo.td: Add patterns to combine a
	sequence of a pair of i32->i64 extensions followed by a "bitwise or"
	into COMBINE_rr.
	* lib/Target/Hexagon/HexagonPeephole.cpp: Copy propagate Rx in the
	instruction Rp = COMBINE_Ir_V4(0, Rx) to the uses of Rp:subreg_loreg.
	* test/CodeGen/Hexagon/union-1.ll: New test.
	* test/CodeGen/Hexagon/combine_ir.ll: Fix test.

llvm-svn: 180946
2013-05-02 20:22:51 +00:00
Manman Ren
0e2c14381d TBAA: remove !tbaa from testing cases if not used.
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.

llvm-svn: 180935
2013-05-02 18:11:35 +00:00
Michael Liao
ca62375d96 Rewrite X86 codegen regression test with FileCheck
llvm-svn: 180910
2013-05-02 06:20:42 +00:00
Michael Liao
ec28235c2a Avoid generating tempfile(s) never used
As DejaGNU is deprecated, it seems pipe-jam issue doesn't exist any more.

llvm-svn: 180892
2013-05-01 22:46:50 +00:00
Bill Wendling
218b457a2f Revert r180737. The companion patch was reverted, and this is not relevant right now.
llvm-svn: 180889
2013-05-01 22:32:08 +00:00
Nadav Rotem
d62966b79d Optimize away nop CONCAT_VECTOR nodes.
Optimize CONCAT_VECTOR nodes that merge EXTRACT_SUBVECTOR values that extract from the same vector.

rdar://13402653
PR15866

llvm-svn: 180871
2013-05-01 19:18:51 +00:00
Rafael Espindola
056abc9e26 Put VMOVPQIto64rr in the VRPDI class.
Patch by Joshua Magee.

llvm-svn: 180842
2013-05-01 13:00:16 +00:00
Michael Liao
cff6c527b8 Forget remove the tempfile argument
llvm-svn: 180838
2013-05-01 05:45:57 +00:00
Michael Liao
349e772e4f More rewrites of x86 codegen regression tests with FileCheck
llvm-svn: 180837
2013-05-01 05:34:30 +00:00
Akira Hatanaka
f5c940dea8 [mips] Fix handling of instructions which copy to/from accumulator registers.
Expand copy instructions between two accumulator registers before callee-saved
scan is done. Handle copies between integer GPR and hi/lo registers in
MipsSEInstrInfo::copyPhysReg. Delete pseudo-copy instructions that are not
needed.

llvm-svn: 180827
2013-04-30 23:22:09 +00:00
Stephen Lin
84b2d4dbd4 Only pass 'returned' to target-specific lowering code when the value of entire register is guaranteed to be preserved.
llvm-svn: 180825
2013-04-30 22:49:28 +00:00
Akira Hatanaka
0bca7f3584 [mips] Instruction selection patterns for DSP-ASE vector select and compare
instructions.

llvm-svn: 180820
2013-04-30 22:37:26 +00:00
Adrian Prantl
7482401c1d Temporarily revert "Change the informal convention of DBG_VALUE so that we can express a"
because it breaks some buildbots.

This reverts commit 180816.

llvm-svn: 180819
2013-04-30 22:35:14 +00:00
Adrian Prantl
baf0a98faa Change the informal convention of DBG_VALUE so that we can express a
register-indirect address with an offset of 0.
It used to be that a DBG_VALUE is a register-indirect value if the offset
(operand 1) is nonzero. The new convention is that a DBG_VALUE is
register-indirect if the first operand is a register and the second
operand is an immediate. For plain registers use the combination reg, reg.

rdar://problem/13658587

llvm-svn: 180816
2013-04-30 22:16:46 +00:00
Hal Finkel
2ac40ae0d3 LocalStackSlotAllocation improvements
First, taking advantage of the fact that the virtual base registers are allocated in order of the local frame offsets, remove the quadratic register-searching behavior. Because of the ordering, we only need to check the last virtual base register created.

Second, store the frame index in the FrameRef structure, and get the frame index and the local offset from this structure at the top of the loop iteration. This allows us to de-nest the loops in insertFrameReferenceRegisters (and I think makes the code cleaner). I also moved the needsFrameBaseReg check into the first loop over instructions so that we don't bother pushing FrameRefs for instructions that don't want a virtual base register anyway.

Lastly, and this is the only functionality change, avoid the creation of single-use virtual base registers. These are currently not useful because, in general, they end up replacing what would be one r+r instruction with an add and a r+i instruction. Committing this removes the XFAIL in CodeGen/PowerPC/2007-09-07-LoadStoreIdxForms.ll

Jim has okayed this off-list.

llvm-svn: 180799
2013-04-30 20:04:37 +00:00
Manman Ren
0b37dd0efc TBAA: remove !tbaa from testing cases if not used.
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.

llvm-svn: 180796
2013-04-30 17:52:57 +00:00
Vincent Lejeune
25352bd54f R600: fix loop-address.ll test
Texture cache is now used when shader type is not specified

llvm-svn: 180785
2013-04-30 12:47:56 +00:00
Michael Liao
c67e0fc9ea Rewrite X86 codegen regression test with FileCheck
llvm-svn: 180776
2013-04-30 07:51:08 +00:00
Vincent Lejeune
29f24e0ce8 R600: use native for alu
llvm-svn: 180761
2013-04-30 00:14:38 +00:00
Vincent Lejeune
e641cd06c9 R600: Add FetchInst bit to instruction defs to denote vertex/tex instructions
v2[Vincent Lejeune]: Split FetchInst into usesTextureCache/usesVertexCache

llvm-svn: 180755
2013-04-30 00:13:39 +00:00
Michael Liao
3db1a24464 Rewrite test in FileCheck instead of grep in X86 codegen
llvm-svn: 180754
2013-04-30 00:13:38 +00:00
Manman Ren
180923f053 TBAA: remove !tbaa from testing cases if not used.
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.

llvm-svn: 180745
2013-04-29 22:58:55 +00:00
Bill Wendling
e24a89874b Duplicate a testcase.
llvm-svn: 180744
2013-04-29 22:42:47 +00:00
Michael Liao
4e03c7690d Rewrite some tests with FileCHeck in X86 codegen
- Revise previous patches of the same purpose by fixing
  *) grep <PA> | not grep <PB> semantically is not the same as
     CHECK: <PA>{{^<PB>.*$}} as the former will check all occurrences of <PA>
     while the later only check the first match. As the result, CHECK needs
     putting in all place where <PA> occurs.
  *) grep <PA> | count <N> needs a final CHECK-NOT of the same pattern.
     (As 'CHECK-<N>' is proposed for discussion, converting 'grep | count <N>'
      where N > 1 is postponed.)

llvm-svn: 180742
2013-04-29 22:41:29 +00:00
Tom Stellard
33e7a52e1c R600: Use correct CF_END instruction on Northern Island GPUs
llvm-svn: 180735
2013-04-29 22:23:58 +00:00
Tom Stellard
a22d2b47f3 R600: Fix encoding of CF_END_{EG, R600} instructions
The EOP bit was not being encoded.

llvm-svn: 180734
2013-04-29 22:23:54 +00:00
Rafael Espindola
ef355ff0c1 Make all darwin ppc stubs local.
This fixes pr15763.
Patch by David Fang.

llvm-svn: 180657
2013-04-27 00:43:16 +00:00
Benjamin Kramer
583d3f2591 Make CHECK lines a bit less strict so they also match code generated for win64.
Hopefully brings the windows buildbots back to life.

llvm-svn: 180630
2013-04-26 21:04:21 +00:00
Tom Stellard
de2ad0a8f1 R600: Initialize AMDGPUMachineFunction::ShaderType to ShaderType::COMPUTE
We need to intialize this to something and since clang does not set
the shader type attribute and clang is used only for compute shaders,
initializing it to COMPUTE seems like the best choice.

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 180620
2013-04-26 18:32:24 +00:00
Benjamin Kramer
40a2d53c85 ARM/NEON: Pattern match vector integer abs to vabs.
llvm-svn: 180604
2013-04-26 15:00:57 +00:00
Benjamin Kramer
11723aa321 X86: Now that we have a canonical form for vector integer abs, match it into pabs.
llvm-svn: 180600
2013-04-26 12:05:21 +00:00
Benjamin Kramer
7ce75fb032 DAGCombiner: Canonicalize vector integer abs in the same way we do it for scalars.
This already helps SSE2 x86 a lot because it lacks an efficient way to
represent a vector select. The long term goal is to enable the backend to match
a canonicalized pattern into a single instruction (e.g. vabs or pabs).

llvm-svn: 180597
2013-04-26 09:19:19 +00:00
Preston Gurd
0547d81fdb This patch adds the X86FixupLEAs pass, which will reduce instruction
latency for certain models of the Intel Atom family, by converting
instructions into their equivalent LEA instructions, when it is both
useful and possible to do so.

llvm-svn: 180573
2013-04-25 20:29:37 +00:00
Chad Rosier
3030f76a0d [inline asm] Add a test case for r180226. The specific issue is that the inline
assembly is requesting a 64-bit register, which is invalid for i386.
rdar://13731657

llvm-svn: 180445
2013-04-25 17:10:21 +00:00
Silviu Baranga
c656dd9bdd Fix constant folding for one lane vector types. Constant folding one lane vector types not returns a vector instead of a scalar.
llvm-svn: 180254
2013-04-25 09:32:33 +00:00
Tom Stellard
48d161332e R600: Use SHT_PROGBITS for the .AMDGPU.config section
The libelf implementation that is distributed here:
http://www.mr511.de/software/english.html
will not parse sections that are marked SHT_NULL.

llvm-svn: 180230
2013-04-24 23:56:14 +00:00
Andrew Trick
73014520d6 MI Sched: eliminate local vreg copies.
For now, we just reschedule instructions that use the copied vregs and
let regalloc elliminate it. I would really like to eliminate the
copies on-the-fly during scheduling, but we need a complete
implementation of repairIntervalsInRange() first.

The general strategy is for the register coalescer to eliminate as
many global copies as possible and shrink live ranges to be
extended-basic-block local. The coalescer should not have to worry
about resolving local copies (e.g. it shouldn't attemp to reorder
instructions). The scheduler is a much better place to deal with local
interference. The coalescer side of this equation needs work.

llvm-svn: 180193
2013-04-24 15:54:43 +00:00
Jyotsna Verma
4a5d195942 Hexagon: Use multiclass for combine and STri[bhwd]_shl_V4 instructions.
llvm-svn: 180145
2013-04-23 21:17:40 +00:00
Stephen Lin
44a24c9593 Add more tests for r179925 to verify correct handling of signext/zeroext; strengthen condition check to require actual MVT::i32 virtual register types, just in case (no actual functionality change)
llvm-svn: 180138
2013-04-23 19:42:25 +00:00
Jyotsna Verma
624f2e0434 Hexagon: Remove assembler mapped instruction definitions.
llvm-svn: 180133
2013-04-23 19:15:55 +00:00
Vincent Lejeune
3666f07489 R600: Use .AMDGPU.config section to emit stacksize
llvm-svn: 180124
2013-04-23 17:34:12 +00:00
Vincent Lejeune
e5ba5f1b14 R600: Add CF_END
llvm-svn: 180123
2013-04-23 17:34:00 +00:00
Jyotsna Verma
20903a7aba Hexagon: Remove duplicate instructions to handle global/immediate values
for absolute/absolute-set addressing modes.

llvm-svn: 180120
2013-04-23 17:11:46 +00:00
Rafael Espindola
f7c86d97a1 Move test from grep to FileCheck.
llvm-svn: 180092
2013-04-23 12:03:27 +00:00
Akira Hatanaka
913bf6194a [mips] In performDSPShiftCombine, check that all elements in the vector are
shifted by the same amount and the shift amount is smaller than the element
size.

llvm-svn: 180039
2013-04-22 19:58:23 +00:00
Stephen Lin
4e394628e7 Extra paranoid test for r179925 (verify that tail calls are not generated to 'this'-returning constructors of objects with different 'this' pointers than the caller)
llvm-svn: 180032
2013-04-22 17:23:49 +00:00
Stepan Dyatkovskiy
8adaf54376 Fix for 5.5 Parameter Passing --> Stage C:
-- C.4 and C.5 statements, when NSAA is not equal to SP.
 -- C.1.cp statement for VA functions. Note: There are no VFP CPRCs in a
    variadic procedure.

Before this patch "NSAA != 0" means "don't use GPRs anymore ". But there are
some exceptions in AAPCS.
1. For non VA function: allocate all VFP regs for CPRC. When all VFPs are allocated
   CPRCs would be sent to stack, while non CPRCs may be still allocated in GRPs.
2. Check that for VA functions all params uses GPRs and then stack.
   No exceptions, no CPRCs here.

llvm-svn: 180011
2013-04-22 13:06:52 +00:00
Arnaud A. de Grandmaison
087fe129d8 Cleanup: test source files do not need to be executable
llvm-svn: 180003
2013-04-22 08:02:43 +00:00
David Blaikie
9bfe15c313 Revert "Revert "PR14606: debug info imported_module support""
This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll

I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even
though the debug info was clearly invalid on all of them, but this ought to fix
it.

llvm-svn: 179996
2013-04-22 06:12:31 +00:00
Jim Grosbach
3104dcf2ca Legalize vector truncates by parts rather than just splitting.
Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result time, which is then widened and a
scalarization step is introduced which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.

Add a legalization hook for the operands of a TRUNCATE node, which will
be encountered after the result type has been legalized, but if the
operand type is still illegal. If simple splitting of both types
ends up with the result type of each half still being legal, just
do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
we can get more clever with power-two vectors. Specifically,
split the input type, but also widen the result element size, then
concatenate the halves and truncate again.  For example on ARM,
To perform a "%res = v8i8 trunc v8i32 %in" we transform to:
  %inlo = v4i32 extract_subvector %in, 0
  %inhi = v4i32 extract_subvector %in, 4
  %lo16 = v4i16 trunc v4i32 %inlo
  %hi16 = v4i16 trunc v4i32 %inhi
  %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
  %res = v8i8 trunc v8i16 %in16

This allows instruction selection to generate three VMOVN instructions
instead of a sequences of moves, stores and loads.

Update the ARMTargetTransformInfo to take this improved legalization
into account.

Consider the simplified IR:

define <16 x i8> @test1(<16 x i32>* %ap) {
  %a = load <16 x i32>* %ap
  %tmp = trunc <16 x i32> %a to <16 x i8>
  ret <16 x i8> %tmp
}

define <8 x i8> @test2(<8 x i32>* %ap) {
  %a = load <8 x i32>* %ap
  %tmp = trunc <8 x i32> %a to <8 x i8>
  ret <8 x i8> %tmp
}

Previously, we would generate the truly hideous:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #20
	bic	sp, sp, #7
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d24, d25}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	vld1.64	{d18, d19}, [r2:128]
	add	r1, r0, #16
	vmovn.i32	d22, q8
	vld1.64	{d16, d17}, [r1:128]
	vmovn.i32	d20, q9
	vmovn.i32	d18, q12
	vmov.u16	r0, d22[3]
	strb	r0, [sp, #15]
	vmov.u16	r0, d22[2]
	strb	r0, [sp, #14]
	vmov.u16	r0, d22[1]
	strb	r0, [sp, #13]
	vmov.u16	r0, d22[0]
	vmovn.i32	d16, q8
	strb	r0, [sp, #12]
	vmov.u16	r0, d20[3]
	strb	r0, [sp, #11]
	vmov.u16	r0, d20[2]
	strb	r0, [sp, #10]
	vmov.u16	r0, d20[1]
	strb	r0, [sp, #9]
	vmov.u16	r0, d20[0]
	strb	r0, [sp, #8]
	vmov.u16	r0, d18[3]
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	vldmia	sp, {d16, d17}
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	mov	sp, r7
	pop	{r7}
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #12
	bic	sp, sp, #7
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d20, d21}, [r0:128]
	vmovn.i32	d18, q8
	vmov.u16	r0, d18[3]
	vmovn.i32	d16, q10
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	ldm	sp, {r0, r1}
	mov	sp, r7
	pop	{r7}
	bx	lr

Now, however, we generate the much more straightforward:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d20, d21}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	add	r1, r0, #16
	vld1.64	{d18, d19}, [r2:128]
	vld1.64	{d22, d23}, [r1:128]
	vmovn.i32	d17, q8
	vmovn.i32	d16, q9
	vmovn.i32	d18, q10
	vmovn.i32	d19, q11
	vmovn.i16	d17, q8
	vmovn.i16	d16, q9
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d18, d19}, [r0:128]
	vmovn.i32	d16, q8
	vmovn.i32	d17, q9
	vmovn.i16	d16, q8
	vmov	r0, r1, d16
	bx	lr

llvm-svn: 179989
2013-04-21 23:47:41 +00:00
Jim Grosbach
2582e2e539 ARM: Split out cost model vcvt testcases.
They had a separate RUN line already, so may as well be in a separate file.

llvm-svn: 179988
2013-04-21 23:47:37 +00:00
Jakob Stoklund Olesen
c9f30e9065 Passing arguments to varags functions under the SPARC v9 ABI.
Arguments after the fixed arguments never use the floating point
registers.

llvm-svn: 179987
2013-04-21 21:36:49 +00:00
Jakob Stoklund Olesen
d8a2b84611 Fix the SETHIimm pattern for 64-bit code.
Don't ignore the high 32 bits of the immediate.

llvm-svn: 179985
2013-04-21 21:18:03 +00:00
Tim Northover
593f76e08e ARM: fix part of test which actually needed an asserts build
This should fix a buildbot failure that occurred after r179977.

llvm-svn: 179978
2013-04-21 12:20:19 +00:00
Tim Northover
943f2a9234 ARM: Use ldrd/strd to spill 64-bit pairs when available.
This allows common sp-offsets to be part of the instruction and is
probably faster on modern CPUs too.

llvm-svn: 179977
2013-04-21 11:57:07 +00:00
Bill Wendling
61eb6957c5 Remove tbaa metadata.
llvm-svn: 179970
2013-04-21 01:38:25 +00:00
Jakob Stoklund Olesen
cbfbef04da Compile varargs functions for SPARCv9.
With a little help from the frontend, it looks like the standard va_*
intrinsics can do the job.

Also clean up an old bitcast hack in LowerVAARG that dealt with
unaligned double loads. Load SDNodes can specify an alignment now.

Still missing: Calling varargs functions with float arguments.

llvm-svn: 179961
2013-04-20 22:49:16 +00:00
Tim Northover
de5285eb6f ARM: don't add FrameIndex offset for LDMIA (has no immediate)
Previously, when spilling 64-bit paired registers, an LDMIA with both
a FrameIndex and an offset was produced. This kind of instruction
shouldn't exist, and the extra operand was being confused with the
predicate, causing aborts later on.

This removes the invalid 0-offset from the instruction being
produced.

llvm-svn: 179956
2013-04-20 19:31:00 +00:00
Stephen Lin
98df7358cd Minor renaming of tests (for consistency with an in-development patch)
llvm-svn: 179954
2013-04-20 16:21:26 +00:00
Benjamin Kramer
a7e8f887fe Don't litter .s files in test directory.
llvm-svn: 179937
2013-04-20 10:43:40 +00:00
Stephen Lin
9d99ba2071 Add CodeGen support for functions that always return arguments via a new parameter attribute 'returned', which is taken advantage of in target-independent tail call opportunity detection and in ARM call lowering (when placed on an integral first parameter).
llvm-svn: 179925
2013-04-20 05:14:40 +00:00
Stephen Lin
65c1101eba Allow tail call opportunity detection through nested and/or multiple iterations of extractelement/insertelement indirection
llvm-svn: 179924
2013-04-20 04:27:51 +00:00
Akira Hatanaka
11b4211d68 [mips] Instruction selection patterns for DSP-ASE vector shifts.
llvm-svn: 179906
2013-04-19 23:21:32 +00:00
Hal Finkel
9e44a50443 Fix PPC optimizeCompareInstr swapped-sub argument handling
When matching a compare with a subtract where the arguments of the compare are
swapped w.r.t. the arguments of the subtract, we need to negate the predicates
(or CR bit indices) of the users. This, however, is not the same as inverting
the predicate (negating LT -> GT, but inverting LT -> GE, for example). The ARM
backend seems to do this correctly, but when I adapted the code for the PPC
backend, I introduced an error in this logic.

Comparison optimization is now enabled again by default.

llvm-svn: 179899
2013-04-19 22:08:38 +00:00
Anton Korobeynikov
f95220dd8b Do not mangle in MS-way the globals with magic \001 in the name.
Based on the patch by David Nadlinger!

llvm-svn: 179889
2013-04-19 21:20:56 +00:00
Bill Wendling
e8c6d1cb09 Make test slightly more readable.
llvm-svn: 179888
2013-04-19 21:14:59 +00:00
Bill Wendling
7256108f6f Add a testcase to make sure we generate the proper compact unwind section for a function that cannot produce a compact unwind encoding.
llvm-svn: 179887
2013-04-19 21:07:11 +00:00
Eric Christopher
88bdd26cc9 Revert "PR14606: debug info imported_module support"
This reverts commit r179836 as it seems to have caused test failures.

llvm-svn: 179840
2013-04-19 07:47:16 +00:00
David Blaikie
46f35f8e56 PR14606: debug info imported_module support
Adding another CU-wide list, in this case of imported_modules (since they
should be relatively rare, it seemed better to add a list where each element
had a "context" value, rather than add a (usually empty) list to every scope).
This takes care of DW_TAG_imported_module, but to fully address PR14606 we'll
need to expand this to cover DW_TAG_imported_declaration too.

llvm-svn: 179836
2013-04-19 06:57:04 +00:00
Tom Stellard
017c53ebbd R600: Add pattern for the BFI_INT instruction
llvm-svn: 179830
2013-04-19 02:11:06 +00:00
Tom Stellard
b767059700 R600: Reorganize lit tests and document how they should be organized
llvm-svn: 179828
2013-04-19 02:10:53 +00:00
Hal Finkel
b96e61374f Disable PPC comparison optimization by default
This seems to cause a stage-2 LLVM compile failure (by crashing TableGen); do
I'm disabling this for now.

llvm-svn: 179807
2013-04-18 22:54:25 +00:00
Hal Finkel
44190578df Implement optimizeCompareInstr for PPC
Many PPC instructions have a so-called 'record form' which stores to a specific
condition register the result of comparing the result of the instruction with
zero (always as a signed comparison). For integer operations on PPC64, this is
always a 64-bit comparison.

This implementation is derived from the implementation in the ARM backend;
there are some differences because PPC condition registers are allocatable
virtual registers (although the record forms always use a specific one), and we
look for a matching subtraction instruction after the compare (but before the
first use) in addition to before it.

llvm-svn: 179802
2013-04-18 22:15:08 +00:00
Benjamin Kramer
aeff9e581b X86: Add an SSE2 lowering for 64 bit compares when pcmpgtq (SSE4.2) isn't available.
This pattern started popping up in vectorized min/max reductions.

llvm-svn: 179797
2013-04-18 21:37:45 +00:00
Derek Schuff
c55a3d43a9 Allow misaligned stores in x86 fast-isel.
In X86FastISel::X86SelectStore(), improperly aligned stores are rejected and
handled by the DAG-based ISel.  However, X86FastISel::X86SelectLoad() makes
no such requirement.  There doesn't appear to be an x86 architectural
correctness issue with allowing potentially unaligned store instructions.
This patch removes this restriction.

Patch by Jim Stichnot.

llvm-svn: 179774
2013-04-18 17:41:08 +00:00
Hao Liu
ca09ec237c Fix for PR14824, An ARM Load/Store Optimization bug
llvm-svn: 179751
2013-04-18 09:11:08 +00:00
Eli Bendersky
802610971f This patch teaches x86 fast-isel to generate the native div/idiv instructions
for the sdiv/srem/udiv/urem bitcode instructions.  This is done for the i8,
i16, and i32 types, as well as i64 for the x86_64 target.

Patch by Jim Stichnoth

llvm-svn: 179715
2013-04-17 20:10:13 +00:00
Vincent Lejeune
cd0483fb18 R600: Make Export Instruction not duplicable
llvm-svn: 179686
2013-04-17 15:17:39 +00:00
Richard Osborne
3bc2e6cf63 [XCore] Extend test to check positve offsets are folded into addresses.
llvm-svn: 179621
2013-04-16 20:05:52 +00:00
Richard Osborne
d8d60d4b61 [XCore] Give test more generic name.
I intend to extend the test with more offset folding checks

llvm-svn: 179620
2013-04-16 19:56:55 +00:00
Richard Osborne
53ec25c8fa [XCore] Convert a couple of tests to FileCheck.
llvm-svn: 179619
2013-04-16 19:41:19 +00:00
Logan Chien
6f13ff357d Implement ARM unwind opcode assembler.
llvm-svn: 179591
2013-04-16 12:02:21 +00:00
Jakob Stoklund Olesen
b4edc00933 Add 64-bit multiply and divide instructions for SPARC v9.
llvm-svn: 179582
2013-04-16 02:57:02 +00:00
Tom Stellard
bd67f8cd81 R600/SI: Emit config values in register value pairs.
Instead of emitting config values in a predefined order, the code
emitter will now emit a 32-bit register index followed by the 32-bit
config value.

llvm-svn: 179546
2013-04-15 17:51:35 +00:00
Tom Stellard
a44e2e18a1 R600/SI: Emit configuration value in the .AMDGPU.config ELF section
llvm-svn: 179545
2013-04-15 17:51:30 +00:00
Tom Stellard
cb4468b00a R600: Emit ELF formatted code rather than raw ISA.
llvm-svn: 179544
2013-04-15 17:51:21 +00:00
Tim Northover
b5dc8bb136 Avoid outputting temporary test file into source tree.
llvm-svn: 179532
2013-04-15 15:49:13 +00:00
Hal Finkel
371be65604 Fix PPC64 CR spill location for callee-saved registers
This fixes an ABI bug for non-Darwin PPC64. For the callee-saved condition
registers, the spill location is specified relative to the stack pointer (SP +
8). However, this is not relative to the SP after the new stack frame is
established, but instead relative to the caller's stack pointer (it is stored
into the linkage area of the parent's stack frame).

So, like with the link register, we don't directly spill the CRs with other
callee-saved registers, but just mark them to be spilled during prologue
generation.

In practice, this reverts r179457 for PPC64 (but leaves it in place for PPC32).

llvm-svn: 179500
2013-04-15 02:07:05 +00:00
Jakob Stoklund Olesen
3b790b7f2e Use i32 for all SPARC shift amounts, even in 64-bit mode.
Test case by llvm-stress.

llvm-svn: 179477
2013-04-14 05:48:50 +00:00
Jakob Stoklund Olesen
8fafe8cd31 Add support for the abs64 SPARC v9 code model.
For when 16 TB just isn't enough.

llvm-svn: 179474
2013-04-14 05:10:36 +00:00
Jakob Stoklund Olesen
d29f125f5b Add support for the SPARC v9 abs44 code model.
This is the default model for non-PIC 64-bit code. It supports
text+data+bss linked anywhere in the low 16 TB of the address space.

llvm-svn: 179473
2013-04-14 04:57:51 +00:00
Jakob Stoklund Olesen
b5173ad8fb Also put target flags on SPARC constant pool references.
Constant pool entries are accessed exactly the same way as global
variables.

llvm-svn: 179471
2013-04-14 04:35:16 +00:00
Jakob Stoklund Olesen
c23fada5f9 Fix patterns for 64-bit pointers.
This fixes the pic32 code model for SPARC v9.

llvm-svn: 179469
2013-04-14 01:53:23 +00:00
Jakob Stoklund Olesen
6182cc630a Define SPARC code models.
Currently, only abs32 and pic32 are implemented. Add a test case for
abs32 with 64-bit code. 64-bit PIC code is currently broken.

llvm-svn: 179463
2013-04-13 19:02:23 +00:00
Hal Finkel
978a847acb Spill and restore PPC CR registers using the FP when we have one
For functions that need to spill CRs, and have dynamic stack allocations, the
value of the SP during the restore is not what it was during the save, and so
we need to use the FP in these cases (as for all of the other spills and
restores, but the CR restore has a special code path because its reserved slot,
like the link register, is specified directly relative to the adjusted SP).

llvm-svn: 179457
2013-04-13 08:09:20 +00:00
Andrew Trick
2bd87ad8d4 Further generalize this scheduler test.
The order of copies depends on queue order, which is not very stable.

llvm-svn: 179456
2013-04-13 07:37:27 +00:00
Andrew Trick
fb2a8d10f8 Fix a dislexic regex.
llvm-svn: 179455
2013-04-13 07:29:21 +00:00
Andrew Trick
1ef71359cd Add a missing REQUIRES: asserts
llvm-svn: 179453
2013-04-13 06:12:46 +00:00
Andrew Trick
861493bc4f MI-Sched: schedule physreg copies.
The register allocator expects minimal physreg live ranges. Schedule
physreg copies accordingly. This is slightly tricky when they occur in
the middle of the scheduling region. For now, this is handled by
rescheduling the copy when its associated instruction is
scheduled. Eventually we may instead bundle them, but only if we can
preserve the bundles as parallel copies during regalloc.

llvm-svn: 179449
2013-04-13 06:07:40 +00:00
Akira Hatanaka
e0468ce3e1 [mips] Reapply r179420 and r179421.
llvm-svn: 179434
2013-04-13 00:55:41 +00:00
Akira Hatanaka
b0b85e00d8 Revert r179420 and r179421.
llvm-svn: 179422
2013-04-12 22:40:07 +00:00
Akira Hatanaka
737648f84c [mips] Instruction selection patterns for carry-setting and using add
instructions.

llvm-svn: 179421
2013-04-12 22:24:52 +00:00
Akira Hatanaka
d809bc8eeb [mips] v4i8 and v2i16 add, sub and mul instruction selection patterns.
llvm-svn: 179420
2013-04-12 22:14:24 +00:00
Nico Rieck
1162bb7a1d Replace coff-/elf-dump with llvm-readobj
llvm-svn: 179361
2013-04-12 04:06:46 +00:00
Nadav Rotem
f96cc4976d Fix the test on linux by setting the triple and the align format
llvm-svn: 179354
2013-04-12 01:07:16 +00:00
Nadav Rotem
662256bafa Add a flag to align all basic blocks in the function.
When debugging performance regressions we often ask ourselves if the regression
that we see is due to poor isel/sched/ra or due to some micro-architetural
problem.  When comparing two code sequences one good way to rule out front-end
bottlenecks (and other the issues) is to force code alignment. This pass adds
a flag that forces the alignment of all of the basic blocks in the program.

llvm-svn: 179353
2013-04-12 00:48:32 +00:00
Preston Gurd
8e4196dfe6 Use FileCheck instead of grep.
llvm-svn: 179322
2013-04-11 21:39:01 +00:00
Jack Carter
834846d7a8 Mips specific inline asm memory operand modifier test case
These changes are based on commit responses for r179135.

llvm-svn: 179315
2013-04-11 19:39:19 +00:00
Eli Bendersky
0ce49fd520 Add a CHECK-NOT for a more faithful translation of the original grep | count 2.
Thanks to Reid Kleckner for catching this.

llvm-svn: 179289
2013-04-11 14:43:19 +00:00
Benjamin Kramer
f15ba24b8d Add missing colons to check lines.
llvm-svn: 179277
2013-04-11 12:41:41 +00:00
Benjamin Kramer
4413e71a39 FileCheckize a bunch of tests.
llvm-svn: 179276
2013-04-11 12:32:23 +00:00
Michael Liao
877d1576e6 Optimize vector select from all 0s or all 1s
As packed comparisons in AVX/SSE produce all 0s or all 1s in each SIMD lane,
vector select could be simplified to AND/OR or removed if one or both values
being selected is all 0s or all 1s.

llvm-svn: 179267
2013-04-11 05:15:54 +00:00
Michael Liao
87125582e9 Enhance bool simplifcation in X86 to handle more cases
This patch is revised based on patch from Victor Umansky
<victor.umansky@intel.com>. More cases are handled in X86's bool
simplification, i.e.
- SETCC_CARRY
- value is truncated to i1 with AND

As a by-product, PR5443 is also fixed.

llvm-svn: 179265
2013-04-11 04:43:09 +00:00
Eli Bendersky
90daaa543a Rewrite some of the test/CodeGen/X86 tests to use FileCheck instead of grep
llvm-svn: 179241
2013-04-10 23:30:20 +00:00
Hal Finkel
03d47320aa Manually remove successors in if conversion when CopyAndPredicateBlock is used
In the simple and triangle if-conversion cases, when CopyAndPredicateBlock is
used because the to-be-predicated block has other predecessors, we need to
explicitly remove the old copied block from the successors list. Normally if
conversion relies on TII->AnalyzeBranch combined with BB->CorrectExtraCFGEdges
to cleanup the successors list, but if the predicated block contained an
un-analyzable branch (such as a now-predicated return), then this will fail.

These extra successors were causing a problem on PPC because it was causing
later passes (such as PPCEarlyReturm) to leave dead return-only basic blocks in
the code.

llvm-svn: 179227
2013-04-10 22:05:25 +00:00
Jack Carter
8e319ed798 Mips specific inline asm memory operand modifier test case
These changes are based on commit responses for r179135.

llvm-svn: 179225
2013-04-10 22:02:32 +00:00
Michel Danzer
c1562afdde R600/SI: Add pattern for AMDGPUurecip
21 more little piglits with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 179186
2013-04-10 17:17:56 +00:00
Reed Kotler
68e5128508 This is for an experimental option -mips-os16. The idea is to compile all
Mips32 code as Mips16 unless it can't be compiled as Mips 16. For now this
would happen as long as floating point instructions are not needed.
Probably it would also make sense to compile as mips32 if atomic operations
are needed too. There may be other cases too.

A module pass prescans the IR and adds the mips16 or nomips16 attribute
to functions depending on the functions needs.

Mips 16 mode can result in a 40% code compression by utililizing 16 bit
encoding of many instructions.

The hope is for this to replace the traditional gcc way of dealing with
Mips16 code using floating point which involves essentially using soft float
but with a library implemented using mips32 floating point. This gcc 
method also requires creating stubs so that Mips32 code can interact with
these Mips 16 functions that have floating point needs. My conjecture is
that in reality this traditional gcc method would never win over this
new method.

I will be implementing the traditional gcc method also. Some of it is already
done but I needed to do the stubs to finish the work and those required
this mips16/32 mixed mode capability.

I have more ideas for to make this new method much better and I think the old
method will just live in llvm for anyone that needs the backward compatibility
but I don't for what reason that would be needed.

llvm-svn: 179185
2013-04-10 16:58:04 +00:00
Vincent Lejeune
daa1e69206 R600: Add VTX_READ_* and RAT_WRITE_CACHELESS_* when computing cf addr
llvm-svn: 179174
2013-04-10 13:29:20 +00:00
Christian Konig
f40f671bab R600/SI: dynamical figure out the reg class of MIMG
Depending on the number of bits set in the writemask.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 179166
2013-04-10 08:39:16 +00:00
Christian Konig
76cd1a76c2 R600/SI: adjust writemask to only the used components
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 179165
2013-04-10 08:39:08 +00:00
Christian Konig
ffddac18a4 R600/SI: remove image sample writemask
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 179164
2013-04-10 08:39:01 +00:00
Evan Cheng
9f82233851 __sincosf_stret returns sinf / cosf in bits 0:31 and 32:63 of xmm0, not in
xmm0 / xmm1.

rdar://13599493

llvm-svn: 179141
2013-04-10 01:26:07 +00:00
Jack Carter
03f8f98410 Mips specific inline asm operand modifier 'D'
Modifier 'D' is to use the second word of a double integer.

We had previously implemented the pure register varient of 
the modifier and this patch implements the memory reference.



#include "stdio.h"

int b[8] = {0,1,2,3,4,5,6,7};
void main()
{
    int i;
    
    // The first word. Notice, no 'D'
    {asm (
    "lw    %0,%1;"
    : "=r" (i)
    : "m" (*(b+4))
    );}
    
    printf("%d\n",i);

    // The second word
    {asm (
    "lw    %0,%D1;"
    : "=r" (i)
    : "m" (*(b+4))
    );}
    
    printf("%d\n",i);
}

llvm-svn: 179135
2013-04-09 23:19:50 +00:00
Hal Finkel
8b05494b58 Allow PPC B and BLR to be if-converted into some predicated forms
This enables us to form predicated branches (which are the same conditional
branches we had before) and also a larger set of predicated returns (including
instructions like bdnzlr which is a conditional return and loop-counter
decrement all in one).

At the moment, if conversion does not capture all possible opportunities. A
simple example is provided in early-ret2.ll, where if conversion forms one
predicated return, and then the PPCEarlyReturn pass picks up the other one. So,
at least for now, we'll keep both mechanisms.

llvm-svn: 179134
2013-04-09 22:58:37 +00:00
Reed Kotler
9b753510a5 This patch enables llvm to switch between compiling for mips32/mips64
and mips16 on a per function basis.

Because this patch is somewhat involved I have provide an overview of the
key pieces of it.

The patch is written so as to not change the behavior of the non mixed
mode. We have tested this a lot but it is something new to switch subtargets
so we don't want any chance of regression in the mainline compiler until
we have more confidence in this.

Mips32/64 are very different from Mip16 as is the case of ARM vs Thumb1.
For that reason there are derived versions of the register info, frame info, 
instruction info and instruction selection classes.

Now we register three separate passes for instruction selection.
One which is used to switch subtargets (MipsModuleISelDAGToDAG.cpp) and then
one for each of the current subtargets (Mips16ISelDAGToDAG.cpp and
MipsSEISelDAGToDAG.cpp).

When the ModuleISel pass runs, it determines if there is a need to switch
subtargets and if so, the owning pointers in MipsTargetMachine are
appropriately changed.

When 16Isel or SEIsel is run, they will return immediately without doing
any work if the current subtarget mode does not apply to them.

In addition, MipsAsmPrinter needs to be reset on a function basis.

The pass BasicTargetTransformInfo is substituted with a null pass since the
pass is immutable and really needs to be a function pass for it to be
used with changing subtargets. This will be fixed in a follow on patch.

llvm-svn: 179118
2013-04-09 19:46:01 +00:00
Benjamin Kramer
788f55c7d4 DAGCombiner: Fold a shuffle on CONCAT_VECTORS into a new CONCAT_VECTORS if possible.
This pattern occurs in SROA output due to the way vector arguments are lowered
on ARM.

The testcase from PR15525 now compiles into this, which is better than the code
we got with the old scalarrepl:
_Store:
	ldr.w	r9, [sp]
	vmov	d17, r3, r9
	vmov	d16, r1, r2
	vst1.8	{d16, d17}, [r0]
	bx	lr

Differential Revision: http://llvm-reviews.chandlerc.com/D647

llvm-svn: 179106
2013-04-09 17:41:43 +00:00
Hal Finkel
c72f2476a9 Use virtual base registers on PPC
On PowerPC, non-vector loads and stores have r+i forms; however, in functions
with large stack frames these were not being used to access slots far from the
stack pointer because such slots were out of range for the signed 16-bit
immediate offset field. This increases register pressure because we need a
separate register for each offset (when the r+r form is used). By enabling
virtual base registers, we can deal with large stack frames without unduly
increasing register pressure.

llvm-svn: 179105
2013-04-09 17:27:09 +00:00
Hal Finkel
68742d4f1f Convert test PowerPC/2007-09-07-LoadStoreIdxForms to FileCheck
llvm-svn: 179104
2013-04-09 17:26:55 +00:00
Jakob Stoklund Olesen
6a275455ac Compute correct frame sizes for SPARC v9 64-bit frames.
The save area is twice as big and there is no struct return slot. The
stack pointer is always 16-byte aligned (after adding the bias).

Also eliminate the stack adjustment instructions around calls when the
function has a reserved stack frame.

llvm-svn: 179083
2013-04-09 04:37:47 +00:00
Hal Finkel
0daaa8e2de Generate PPC early conditional returns
PowerPC has a conditional branch to the link register (return) instruction: BCLR.
This should be used any time when we'd otherwise have a conditional branch to a
return. This adds a small pass, PPCEarlyReturn, which runs just prior to the
branch selection pass (and, importantly, after block placement) to generate
these conditional returns when possible. It will also eliminate unconditional
branches to returns (these happen rarely; most of the time these have already
been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice
optimization for small functions that do not maintain a stack frame.

llvm-svn: 179026
2013-04-08 16:24:03 +00:00
Tim Northover
8eb5637d73 AArch64: remove barriers from AArch64 atomic operations.
I've managed to convince myself that AArch64's acquire/release
instructions are sufficient to guarantee C++11's required semantics,
even in the sequentially-consistent case.

llvm-svn: 179005
2013-04-08 08:40:41 +00:00
Hal Finkel
9af1fa0d50 Cleanup and improve PPC fsel generation
First, we should not cheat: fsel-based lowering of select_cc is a
finite-math-only optimization (the ISA manual, section F.3 of v2.06, makes
this clear, as does a note in our own README).

This also adds fsel-based lowering of EQ and NE condition codes. As it turned
out, fsel generation was covered by a grand total of zero regression test
cases. I've added some test cases to cover the existing behavior (which is now
finite-math only), as well as the new EQ cases.

llvm-svn: 179000
2013-04-07 22:11:09 +00:00
Jakob Stoklund Olesen
658f7754a3 Implement LowerCall_64 for the SPARC v9 64-bit ABI.
There is still no support for byval arguments (which I don't think are
needed) and varargs.

llvm-svn: 178993
2013-04-07 19:10:57 +00:00
Jakob Stoklund Olesen
594e073ba0 Implement LowerReturn_64 for SPARC v9.
Integer return values are sign or zero extended by the callee, and
structs up to 32 bytes in size can be returned in registers.

The CC_Sparc64 CallingConv definition is shared between
LowerFormalArguments_64 and LowerReturn_64. Function arguments and
return values are passed in the same registers.

The inreg flag is also used for return values. This is required to handle
C functions returning structs containing floats and ints:

  struct ifp {
    int i;
    float f;
  };

  struct ifp f(void);

LLVM IR:

  define inreg { i32, float } @f() {
     ...
     ret { i32, float } %retval
  }

The ABI requires that %retval.i is returned in the high bits of %i0
while %retval.f goes in %f1.

Without the inreg return value attribute, %retval.i would go in %i0 and
%retval.f would go in %f3 which is a more efficient way of returning
%multiple values, but it is not ABI compliant for returning C structs.

llvm-svn: 178966
2013-04-06 23:57:33 +00:00
Jakob Stoklund Olesen
2e3d792bf5 SPARC v9 stack pointer bias.
64-bit SPARC v9 processes use biased stack and frame pointers, so the
current function's stack frame is located at %sp+BIAS .. %fp+BIAS where
BIAS = 2047.

This makes more local variables directly accessible via [%fp+simm13]
addressing.

llvm-svn: 178965
2013-04-06 21:38:57 +00:00
Hal Finkel
6c65d3a736 Implement PPCInstrInfo::FoldImmediate
There are certain PPC instructions into which we can fold a zero immediate
operand. We can detect such cases by looking at the register class required
by the using operand (so long as it is not otherwise constrained).

llvm-svn: 178961
2013-04-06 19:30:30 +00:00
Jakob Stoklund Olesen
2076ffbc15 Complete formal arguments for the SPARC v9 64-bit ABI.
All arguments are formally assigned to stack positions and then promoted
to floating point and integer registers. Since there are more floating
point registers than integer registers, this can cause situations where
floating point arguments are assigned to registers after integer
arguments that where assigned to the stack.

Use the inreg flag to indicate 32-bit fragments of structs containing
both float and int members.

The three-way shadowing between stack, integer, and floating point
registers requires custom argument lowering. The good news is that
return values are passed in the exact same way, and we can share the
code.

Still missing:

 - Update LowerReturn to handle structs returned in registers.
 - LowerCall.
 - Variadic functions.

llvm-svn: 178958
2013-04-06 18:32:12 +00:00
Tom Stellard
8ad4f7c25b R600/SI: Add support for buffer stores v2
v2:
  - Use the ADDR64 bit

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178931
2013-04-05 23:31:51 +00:00
Tom Stellard
917d7412f1 R600/SI: Add processor types for each SI variant
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178928
2013-04-05 23:31:35 +00:00
Tom Stellard
17fba38a4b R600/SI: Avoid generating S_MOVs with 64-bit immediates v2
SITargetLowering::analyzeImmediate() was converting the 64-bit values
to 32-bit and then checking if they were an inline immediate.  Some
of these conversions caused this check to succeed and produced
S_MOV instructions with 64-bit immediates, which are illegal.

v2:
  - Clean up logic

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178927
2013-04-05 23:31:20 +00:00
Hal Finkel
b556d850c9 Enable early if conversion on PPC
On cores for which we know the misprediction penalty, and we have
the isel instruction, we can profitably perform early if conversion.
This enables us to replace some small branch sequences with selects
and avoid the potential stalls from mispredicting the branches.

Enabling this feature required implementing canInsertSelect and
insertSelect in PPCInstrInfo; isel code in PPCISelLowering was
refactored to use these functions as well.

llvm-svn: 178926
2013-04-05 23:29:01 +00:00
Timur Iskhodzhanov
c004e9db2d Make the test/CodeGen/X86/win32_sret.ll reliable on any CPU by explicitly specifying the -mcpu
llvm-svn: 178885
2013-04-05 17:05:56 +00:00
Renato Golin
9d05117f2b Reverting 178851 as it broke buildbots
llvm-svn: 178883
2013-04-05 16:39:53 +00:00
Stepan Dyatkovskiy
98f7dac944 Fix for PR14824: "Optimization arm_ldst_opt inserts newly generated instruction vldmia at incorrect position".
Patch introduces memory operands tracking in ARMLoadStoreOpt::LoadStoreMultipleOpti. For each register it keeps the order of load operations as it was before optimization pass.
It is kind of deep improvement of fix proposed by Hao: http://llvm.org/bugs/show_bug.cgi?id=14824#c4
But it also tracks conflicts between different register classes (e.g. D2 and S5).
For more details see:
Bug description: http://llvm.org/bugs/show_bug.cgi?id=14824
LLVM Commits discussion: 
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130311/167936.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130318/168688.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130325/169376.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130401/170238.html

llvm-svn: 178851
2013-04-05 05:52:14 +00:00
Andrew Trick
6da34cd35e RegisterPressure heuristics currently require signed comparisons.
llvm-svn: 178823
2013-04-05 00:31:34 +00:00
Hal Finkel
1dc78e666b PPC: Improve code generation for mixed-precision reciprocal sqrt
The DAGCombine logic that recognized a/sqrt(b) and transformed it into
a multiplication by the reciprocal sqrt did not handle cases where the
sqrt and the division were separated by an fpext or fptrunc.

llvm-svn: 178801
2013-04-04 22:44:12 +00:00
Jakob Stoklund Olesen
a53fa8d450 Avoid high-latency false CPSR dependencies even for tMOVSi.
The Thumb2SizeReduction pass avoids false CPSR dependencies, except it
still aggressively creates tMOVi8 instructions because they are so
common.

Avoid creating false CPSR dependencies even for tMOVi8 instructions when
the the CPSR flags are known to have high latency. This allows integer
computation to overlap floating point computations.

Also process blocks in a reverse post-order and propagate high-latency
flags to successors.

<rdar://problem/13468102>

llvm-svn: 178773
2013-04-04 18:25:36 +00:00
Stepan Dyatkovskiy
0562afa331 New-password-test commit.
llvm-svn: 178765
2013-04-04 16:11:18 +00:00
Vincent Lejeune
a680946842 R600: Take export into account when computing cf address
llvm-svn: 178761
2013-04-04 13:59:59 +00:00
Jakob Stoklund Olesen
1969a96fcd Add SPARC v9 support for select on 64-bit compares.
This requires v9 cmov instructions using the %xcc flags instead of the
%icc flags.

Still missing:
- Select floats on %xcc flags.
- Select i64 on %fcc flags.

llvm-svn: 178737
2013-04-04 03:08:00 +00:00
Vincent Lejeune
6a4ef74f44 R600: Fix last ALU of a clause being emitted in a separate clause
llvm-svn: 178675
2013-04-03 18:24:47 +00:00
Bill Schmidt
990515e4c4 Fix PR15632: No support for ppcf128 floating-point remainder on PowerPC.
For this we need to use a libcall.  Previously LLVM didn't implement
libcall support for frem, so I've added it in the usual
straightforward manner.  A test case from the bug report is included.

llvm-svn: 178639
2013-04-03 13:05:44 +00:00
Timur Iskhodzhanov
0976f711d6 Temporarily relax the WIN32 checks in the SRet test to fix the Atom D2700 bot
llvm-svn: 178635
2013-04-03 12:17:15 +00:00
Timur Iskhodzhanov
ecd533f0ec Fix SRet for thiscall in i686-pc-win32
llvm-svn: 178634
2013-04-03 11:27:54 +00:00
Jakob Stoklund Olesen
3b7eaf9bb6 Add 64-bit compare + branch for SPARC v9.
The same compare instruction is used for 32-bit and 64-bit compares. It
sets two different sets of flags: icc and xcc.

This patch adds a conditional branch instruction using the xcc flags for
64-bit compares.

llvm-svn: 178621
2013-04-03 04:41:44 +00:00
Hal Finkel
0208f7c744 Use PPC reciprocal estimates with Newton iteration in fast-math mode
When unsafe FP math operations are enabled, we can use the fre[s] and
frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together
with some Newton iteration, in order to quickly generate floating-point
division and sqrt results. All of these instructions are separately optional,
and so each has its own feature flag (except for the Altivec instructions,
which are covered under the existing Altivec flag). Doing this is not only
faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these
computations to be pipelined with other computations in order to hide their
overall latency.

I've also added a couple of missing fnmsub patterns which turned out to be
missing (but are necessary for good code generation of the Newton iterations).
Altivec needs a similar fix, but that will probably be more complicated because
fneg is expanded for Altivec's v4f32.

llvm-svn: 178617
2013-04-03 04:01:11 +00:00
Akira Hatanaka
f08d3a5a83 [mips] Small update to the implementation of eh.return for Mips.
This patch initializes t9 to the handler address, but only if the relocation
model is pic. This handles the case where handler to which eh.return jumps 
points to the start of the function.

Patch by Sasa Stankovic.

llvm-svn: 178588
2013-04-02 23:02:07 +00:00
NAKAMURA Takumi
d8a9117bcb llvm/test/CodeGen/X86: Unmark them out of XFAIL:cygming, in atomic{32|64}.ll and handle-move.ll, corresponding to r178549.
This reverts r176808, r176798, and r177914.

llvm-svn: 178583
2013-04-02 22:35:08 +00:00
Bill Schmidt
c98ed219d3 Fix PR15630: Replace faulty stdcx. with stwcx.
When doing a partword atomic operation, a lwarx was being paired with
a stdcx. instead of a stwcx. when compiling for a 64-bit target.  The
target has nothing to do with it in this case; we always need a stwcx.

Thanks to Kai Nacke for reporting the problem.

llvm-svn: 178559
2013-04-02 18:37:08 +00:00
Jakob Stoklund Olesen
b0a5a72daf Don't attempt MTM heuristics without a scheduling model present.
This should fix the PPC buildbots.

llvm-svn: 178558
2013-04-02 18:26:45 +00:00
Chad Rosier
908153170e [fast-isel] Use the correct API to disable FastLowerArguments for Win64.
llvm-svn: 178549
2013-04-02 16:31:41 +00:00
Arnold Schwaighofer
c9508f817a DAGCombiner: Merge store/loads when we have extload/truncstores
This is helps on architectures where i8,i16 are not legal but we have byte, and
short loads/stores. Allowing us to merge copies like the one below on ARM.

copy(char *a, char *b, int n) {
 do {
   int t0 = a[0];
   int t1 = a[1];
   b[0] = t0;
   b[1] = t1;

radar://13536387

llvm-svn: 178546
2013-04-02 15:58:51 +00:00
Preston Gurd
fca710bf70 Simplify test cases for Atom preferring call register indirect over
call memory indirect (32 and 64 bit).

llvm-svn: 178541
2013-04-02 14:25:06 +00:00
Jakob Stoklund Olesen
8a184a7fe4 Add 64-bit load and store instructions.
There is only a few new instructions, the rest is handled with patterns.

llvm-svn: 178528
2013-04-02 04:09:28 +00:00
Jakob Stoklund Olesen
22fe26207f Basic 64-bit ALU operations.
SPARC v9 extends all ALU instructions to 64 bits, so we simply need to
add patterns to use them for both i32 and i64 values.

llvm-svn: 178527
2013-04-02 04:09:23 +00:00
Jakob Stoklund Olesen
d57f9ab92f Materialize 64-bit immediates.
The last resort pattern produces 6 instructions, and there are still
opportunities for materializing some immediates in fewer instructions.

llvm-svn: 178526
2013-04-02 04:09:17 +00:00
Jakob Stoklund Olesen
5ef2195726 Add 64-bit shift instructions.
SPARC v9 defines new 64-bit shift instructions. The 32-bit shift right
instructions are still usable as zero and sign extensions.

This adds new F3_Sr and F3_Si instruction formats that probably should
be used for the 32-bit shifts as well. They don't really encode an
simm13 field.

llvm-svn: 178525
2013-04-02 04:09:12 +00:00
Jakob Stoklund Olesen
9fbfce2d11 Add support for 64-bit calling convention.
This is far from complete, but it is enough to make it possible to write
test cases using i64 arguments.

Missing features:
- Floating point arguments.
- Receiving arguments on the stack.
- Calls.

llvm-svn: 178523
2013-04-02 04:09:02 +00:00
Vincent Lejeune
dc0e12bd5b R600: Add support for native control flow
llvm-svn: 178505
2013-04-01 21:48:05 +00:00
Vincent Lejeune
11918406b3 R600: Emit CF_ALU and use true kcache register.
llvm-svn: 178503
2013-04-01 21:47:42 +00:00
Hal Finkel
9191cdb5f2 Fix a bad assert in PPCTargetLowering
llvm-svn: 178489
2013-04-01 18:42:58 +00:00
Hal Finkel
9cd1e5e93c Add triple to test/CodeGen/PowerPC/stfiwx-2
llvm-svn: 178486
2013-04-01 18:18:44 +00:00
Arnold Schwaighofer
a2a475a83d Merge load/store sequences with adresses: base + index + offset
We would also like to merge sequences that involve a variable index like in the
example below.

    int index = *idx++
    int i0 = c[index+0];
    int i1 = c[index+1];
    b[0] = i0;
    b[1] = i1;

By extending the parsing of the base pointer to handle dags that contain a
base, index, and offset we can handle examples like the one above.

The dag for the code above will look something like:

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i8 load %index))))

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i32 add (i32 signextend (i8 load %index))
                                         (i32 1)))))

The code that parses the tree ignores the intermediate sign extensions. However,
if there is a sign extension it needs to be on all indexes.

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (add (i8 load %index)
                                     (i8 1))))
 vs

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i32 add (i32 signextend (i8 load %index))
                                         (i32 1)))))
radar://13536387

llvm-svn: 178483
2013-04-01 18:12:58 +00:00
Hal Finkel
f184647a53 Add more PPC floating-point conversion instructions
The P7 and A2 have additional floating-point conversion instructions which
allow a direct two-instruction sequence (plus load/store) to convert from all
combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores,
only some combinations were directly available).

llvm-svn: 178480
2013-04-01 17:52:07 +00:00
Hal Finkel
55f144f923 Fix PowerPC/cttz.ll to specify a cpu (and use FileCheck)
llvm-svn: 178472
2013-04-01 16:31:56 +00:00
Hal Finkel
9eed3ac928 Add the PPC popcntw instruction
The popcntw instruction is available whenever the popcntd instruction is
available, and performs a separate popcnt on the lower and upper 32-bits.
Ignoring the high-order count, this can be used for the 32-bit input case
(saving on the explicit zero extension otherwise required to use popcntd).

llvm-svn: 178470
2013-04-01 15:58:15 +00:00
Benjamin Kramer
790bd5fb50 X86: Promote sitofp <8 x i16> to <8 x i32> when AVX is available.
A vector sext + sitofp is a lot cheaper than 8 scalar conversions.

llvm-svn: 178448
2013-03-31 12:49:15 +00:00
Hal Finkel
085f61160f Add the PPC lfiwax instruction
This instruction is available on modern PPC64 CPUs, and is now used
to improve the SINT_TO_FP lowering (by eliminating the need for the
separate sign extension instruction and decreasing the amount of
needed stack space).

llvm-svn: 178446
2013-03-31 10:12:51 +00:00
Hal Finkel
7bdfbd6570 Cleanup PPC(64) i32 -> float/double conversion
The existing SINT_TO_FP code for i32 -> float/double conversion was disabled
because it relied on broken EXTSW_32/STD_32 instruction definitions. The
original intent had been to enable these 64-bit instructions to be used on CPUs
that support them even in 32-bit mode.  Unfortunately, this form of lying to
the infrastructure was buggy (as explained in the FIXME comment) and had
therefore been disabled.

This re-enables this functionality, using regular DAG nodes, but only when
compiling in 64-bit mode. The old STD_32/EXTSW_32 definitions (which were dead)
are removed.

llvm-svn: 178438
2013-03-31 01:58:02 +00:00
Benjamin Kramer
86e90ea8b4 DAGCombine: visitXOR can replace a node without returning it, bail out in that case.
Fixes the crash reported in PR15608.

llvm-svn: 178429
2013-03-30 21:28:18 +00:00
Benjamin Kramer
50725426cb Change '@SECREL' suffix to GAS-compatible '@SECREL32'.
'@SECREL' is what is used by the Microsoft assembler, but GNU as expects '@SECREL32'.
With the patch, the MC-generated code works fine in combination with a recent GNU as (2.23.51.20120920 here).

Patch by David Nadlinger!
Differential Revision: http://llvm-reviews.chandlerc.com/D429

llvm-svn: 178427
2013-03-30 16:21:50 +00:00
Justin Holewinski
21480942b2 [NVPTX] Remove support for SM < 2.0. This was never fully supported anyway.
llvm-svn: 178417
2013-03-30 14:29:30 +00:00
Justin Holewinski
23056edada [NVPTX] Add NVVMReflect pass to allow compile-time selection of
specific code paths.

This allows us to write code like:

  if (__nvvm_reflect("FOO"))
    // Do something
  else
    // Do something else

and compile into a library, then give "FOO" a value at kernel
compile-time so the check becomes a no-op.

llvm-svn: 178416
2013-03-30 14:29:25 +00:00
Akira Hatanaka
bc81d23802 [mips] Add patterns for DSP indexed load instructions.
llvm-svn: 178408
2013-03-30 02:14:45 +00:00
Akira Hatanaka
5ff9493456 [mips] Fix DSP instructions to have explicit accumulator register operands.
Check that instruction selection can select multiply-add/sub DSP instructions
from a pattern that doesn't have intrinsics.

llvm-svn: 178406
2013-03-30 01:58:00 +00:00
Akira Hatanaka
6c9ddf6943 [mips] Move the code which does dag-combine for multiply-add/sub nodes to
derived class MipsSETargetLowering.

We shouldn't be generating madd/msub nodes if target is Mips16, since Mips16
doesn't have support for multipy-add/sub instructions.

llvm-svn: 178404
2013-03-30 01:42:24 +00:00
Timur Iskhodzhanov
d7d35221f7 Exclude the X86/complex-fca.ll test at it probably wasn't supposed to work on Windows
llvm-svn: 178375
2013-03-29 21:54:00 +00:00
Hal Finkel
c14a74d243 Implement FRINT lowering on PPC using frin
Like nearbyint, rint can be implemented on PPC using the frin instruction. The
complication comes from the fact that rint needs to set the FE_INEXACT flag
when the result does not equal the input value (and frin does not do that). As
a result, we use a custom inserter which, after the rounding, compares the
rounded value with the original, and if they differ, explicitly sets the XX bit
in the FPSCR register (which corresponds to FE_INEXACT).

Once LLVM has better modeling of the floating-point environment we should be
able to (often) eliminate this extra complexity.

llvm-svn: 178362
2013-03-29 19:41:55 +00:00
Benjamin Kramer
279e5cfa9a Remove the old CodePlacementOpt pass.
It was superseded by MachineBlockPlacement and disabled by default since LLVM 3.1.

llvm-svn: 178349
2013-03-29 17:14:24 +00:00
Hal Finkel
3493fd8e51 Add PPC FP rounding instructions fri[mnpz]
These instructions are available on the P5x (and later) and on the A2. They
implement the standard floating-point rounding operations (floor, trunc, etc.).
One caveat: frin (round to nearest) does not implement "ties to even", and so
is only enabled in fast-math mode.

llvm-svn: 178337
2013-03-29 08:57:48 +00:00
Michael Liao
427149cbcf Add support of RDSEED defined in AVX2 extension
llvm-svn: 178314
2013-03-28 23:41:26 +00:00
Michael Liao
aec693ab31 Enhance boolean simplification to handle 16-/64-bit RDRAND
- RDRAND always clears the destination value when a random value is not
  available (i.e. CF == 0). This value is truncated or zero-extended as
  the false boolean value to be returned. Boolean simplification needs
  to skip this 'zext' or 'trunc' node.

llvm-svn: 178312
2013-03-28 23:38:52 +00:00
Timur Iskhodzhanov
d7de83d51c Make Win32 put the SRet address into EAX, fixes PR15556
llvm-svn: 178291
2013-03-28 21:30:04 +00:00
Hal Finkel
8672bd7c2a Specify CPUs on the PPC bswap-load-store test
Otherwise, the CHECK-NOT's might trigger depending on the host's CPU.

llvm-svn: 178287
2013-03-28 20:35:18 +00:00
Hal Finkel
88670ad5f4 Only enable 64-bit bswap DAG combines for PPC64
Compiling in 32-bit mode on a P7 would assert after 64-bit DAG combines were
added for bswap with load/store. This is because these combines are really only
valid in 64-bit mode, regardless of the CPU (and this was not being checked).

llvm-svn: 178286
2013-03-28 20:23:46 +00:00
Jyotsna Verma
f28cc49519 Hexagon: Enable SupportDebugInfomation and DwarfInSection flags.
llvm-svn: 178279
2013-03-28 19:34:49 +00:00
Hal Finkel
f359927db6 Add the PPC64 ldbrx/stdbrx instructions
These are 64-bit load/store with byte-swap, and available on the P7 and the A2.
Like the similar instructions for 16- and 32-bit words, these are matched in the
target DAG-combine phase against load/store-bswap pairs.

llvm-svn: 178276
2013-03-28 19:25:55 +00:00
Jyotsna Verma
8a524534a6 Hexagon: Use multiclass for gp-relative instructions.
Remove noV4T gp-relative instructions.

llvm-svn: 178246
2013-03-28 16:25:57 +00:00
Hal Finkel
c21c3cf09e Add the PPC64 popcntd instruction
PPC ISA 2.06 (P7, A2, etc.) has a popcntd instruction. Add this instruction and
tell TTI about it so that popcount-loop recognition will know about it.

llvm-svn: 178233
2013-03-28 13:29:47 +00:00
Hal Finkel
4d8aed70c1 Cleanup PPC CR-spill kill flags and 32- vs. 64-bit instructions
There were a few places where kill flags were not being set correctly, and
where 32-bit instruction variants were being used with 64-bit registers. After
r178180, this code was being triggered causing llc to assert.

llvm-svn: 178220
2013-03-28 03:38:16 +00:00
David Blaikie
377434ec76 Revert "Adding DIImportedModules to DIScopes."
This reverts commit 342d92c7a0adeabc9ab00f3f0d88d739fe7da4c7.

Turns out we're going with a different schema design to represent
DW_TAG_imported_modules so we won't need this extra field.

llvm-svn: 178215
2013-03-28 02:44:59 +00:00
Preston Gurd
787c145b5f This patch follows is a follow up to r178171, which uses the register
form of call in preference to memory indirect on Atom.

In this case, the patch applies the optimization to the code for reloading
spilled registers.

The patch also includes changes to sibcall.ll and movgs.ll, which were
failing on the Atom buildbot after the first patch was applied.

This patch by Sriram Murali.

llvm-svn: 178193
2013-03-27 23:16:18 +00:00
Preston Gurd
b6ed645cb6 For the current Atom processor, the fastest way to handle a call
indirect through a memory address is to load the memory address into
a register and then call indirect through the register.

This patch implements this improvement by modifying SelectionDAG to
force a function address which is a memory reference to be loaded
into a virtual register.

Patch by Sriram Murali.

llvm-svn: 178171
2013-03-27 19:14:02 +00:00
Christian Konig
510c335233 R600/SI: add SETO/SETUO patterns
6 more piglit tests.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 178145
2013-03-27 15:27:31 +00:00
Hal Finkel
937268691d Print PPC ZERO as 0 (not r0) even on Darwin
It seems that the Darwin PPC assembler requires r0 to be written as 0 when it
means 0 (at least in lwarx/stwcx.). Fixes PR15605.

llvm-svn: 178142
2013-03-27 13:20:52 +00:00
Silviu Baranga
bd61af84a7 Enabling the generation of dependency breakers for partial updates on Cortex-A15. Also fixing a small bug in getting the update clearence for VLD1LNd32.
llvm-svn: 178134
2013-03-27 12:38:44 +00:00
Christian Konig
fb305cbcea R600/SI: add cummuting of rev instructions
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 178127
2013-03-27 09:12:59 +00:00