1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00
Commit Graph

19193 Commits

Author SHA1 Message Date
Benjamin Kramer
aeff9e581b X86: Add an SSE2 lowering for 64 bit compares when pcmpgtq (SSE4.2) isn't available.
This pattern started popping up in vectorized min/max reductions.

llvm-svn: 179797
2013-04-18 21:37:45 +00:00
Anat Shemer
ca5036302e In the function InstCombiner::visitExtractElementInst() removed the limitation that extract is promoted over a cast only if the cast has only one use.
llvm-svn: 179786
2013-04-18 19:56:44 +00:00
Anat Shemer
2d789b4b53 Added a function scalarizePHI() that sclarizes a vector phi instruction if it has only 2 uses: one to promote the vector phi in a loop and the other use is an extract operation of one element at a constant location.
llvm-svn: 179783
2013-04-18 19:35:39 +00:00
Rafael Espindola
c44b97c596 At Jim Grosbach's request detemplate Object/MachO.h.
We are still able to handle mixed endian objects by swapping one struct at a
time.

llvm-svn: 179778
2013-04-18 18:08:55 +00:00
Derek Schuff
c55a3d43a9 Allow misaligned stores in x86 fast-isel.
In X86FastISel::X86SelectStore(), improperly aligned stores are rejected and
handled by the DAG-based ISel.  However, X86FastISel::X86SelectLoad() makes
no such requirement.  There doesn't appear to be an x86 architectural
correctness issue with allowing potentially unaligned store instructions.
This patch removes this restriction.

Patch by Jim Stichnot.

llvm-svn: 179774
2013-04-18 17:41:08 +00:00
Arnold Schwaighofer
acd551152c LoopVectorizer: Recognize min/max reductions
A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y)
sequence in LLVM. If we see such a sequence we can treat it just as any other
commutative binary instruction and reduce it.

This appears to help bzip2 by about 1.5% on an imac12,2.

radar://12960601

llvm-svn: 179773
2013-04-18 17:22:34 +00:00
Benjamin Kramer
18f31a4d5e LoopVectorize: Use a set to avoid longer cycles in the reduction chain too.
Fixes PR15748.

llvm-svn: 179757
2013-04-18 14:29:13 +00:00
Hao Liu
ca09ec237c Fix for PR14824, An ARM Load/Store Optimization bug
llvm-svn: 179751
2013-04-18 09:11:08 +00:00
David Majnemer
72034bc02f Revert "Combine bit test + conditional or into simple math"
It is causing stage2 builds to fail, let's get them running again.

llvm-svn: 179750
2013-04-18 08:42:33 +00:00
David Majnemer
7dd2b94d65 Combine bit test + conditional or into simple math
Simplify:
(select (icmp eq (and X, C1), 0), Y, (or Y, C2))

Into:
(or (shl (and X, C1), C3), y)

Where:
C3 = Log(C2) - Log(C1)

If:
C1 and C2 are both powers of two

llvm-svn: 179748
2013-04-18 07:30:07 +00:00
Michael Gottesman
41ef390b2d [objc-arc] Do not mismatch up retains inside a for loop with releases outside said for loop in the presense of differing provenance caused by escaping blocks.
This occurs due to an alloca representing a separate ownership from the
original pointer. Thus consider the following pseudo-IR:

  objc_retain(%a)
  for (...) {
    objc_retain(%a)
    %block <- %a
    F(%block)
    objc_release(%block)
  }
  objc_release(%a)

From the perspective of the optimizer, the %block is a separate
provenance from the original %a. Thus the optimizer pairs up the inner
retain for %a and the outer release from %a, resulting in segfaults.

This is fixed by noting that the signature of a mismatch of
retain/releases inside the for loop is a Use/CanRelease top down with an
None bottom up (since bottom up the Retain-CanRelease-Use-Release
sequence is completed by the inner objc_retain, but top down due to the
differing provenance from the objc_release said sequence is not
completed). In said case in CheckForCFGHazards, we now clear the state
of %a implying that no pairing will occur.

Additionally a test case is included.

rdar://12969722

llvm-svn: 179747
2013-04-18 05:39:45 +00:00
Michael Gottesman
475607a0fa Streamline arc-annotation test (removing some cases which do not add any extra coverage) and set it up to use FileCheck variables to make the test more robust.
llvm-svn: 179745
2013-04-18 04:34:06 +00:00
Akira Hatanaka
ae4353c654 [mips] DSP-ASE move from HI/LO register instructions.
llvm-svn: 179739
2013-04-18 00:52:44 +00:00
Peter Collingbourne
a0d11d0e11 Add support for subsections to the ELF assembler. Fixes PR8717.
Differential Revision: http://llvm-reviews.chandlerc.com/D598

llvm-svn: 179725
2013-04-17 21:18:16 +00:00
Chad Rosier
1efbeb717f [ms-inline asm] Add support for the minus unary operator. Previously, we were
unable to handle cases such as __asm mov eax, 8*-8.

This patch also attempts to simplify the state machine.  Further, the error
reporting has been improved.  Test cases included, but more will be added to
the clang side shortly.
rdar://13668445

llvm-svn: 179719
2013-04-17 21:01:45 +00:00
Eli Bendersky
802610971f This patch teaches x86 fast-isel to generate the native div/idiv instructions
for the sdiv/srem/udiv/urem bitcode instructions.  This is done for the i8,
i16, and i32 types, as well as i64 for the x86_64 target.

Patch by Jim Stichnoth

llvm-svn: 179715
2013-04-17 20:10:13 +00:00
Arnold Schwaighofer
e1dc8ae8c8 X86 cost model: Exit before calling getSimpleVT on non-simple VTs
getSimpleVT can only handle simple value types.

radar://13676022

llvm-svn: 179714
2013-04-17 20:04:53 +00:00
Quentin Colombet
c65f67e600 Fix treatment of ARM unallocated hint instructions.
The reference manual defines only 5 permitted values for the immediate field of the "hint" instruction:
1. nop (imm == 0)
2. yield (imm == 1)
3. wfe (imm == 2)
4. wfi (imm == 3)
5. sev (imm == 4)

Therefore, restrict the permitted values for the "hint" instruction to 0 through 4.

Patch by Mihail Popa <Mihail.Popa@arm.com>

llvm-svn: 179707
2013-04-17 18:46:12 +00:00
Vincent Lejeune
cd0483fb18 R600: Make Export Instruction not duplicable
llvm-svn: 179686
2013-04-17 15:17:39 +00:00
Eric Christopher
35dc509db9 This appears to be no longer necessary for the testsuite.
llvm-svn: 179667
2013-04-17 06:37:30 +00:00
David Blaikie
53eed4fdaa PR15149/r174304 improvement - print hex for unknown dwarf language codes & add a test case
CR feedback from Rafael Espindola and Paul Robinson.

llvm-svn: 179664
2013-04-17 03:41:36 +00:00
Peter Collingbourne
648f68b9e0 Do not optimise fprintf() calls if its return value is used.
Differential Revision: http://llvm-reviews.chandlerc.com/D620

llvm-svn: 179661
2013-04-17 02:01:10 +00:00
Jack Carter
e773ca9ec6 Mips assembler: Enable handling of nested expressions
This patch allows the Mips assembler to parse and emit nested 
expressions as instruction operands. It also extends the 
expansion of memory instructions when an offset is given as 
an expression. 

Contributer: Vladimir Medic
llvm-svn: 179657
2013-04-17 00:18:04 +00:00
Richard Osborne
3bc2e6cf63 [XCore] Extend test to check positve offsets are folded into addresses.
llvm-svn: 179621
2013-04-16 20:05:52 +00:00
Richard Osborne
d8d60d4b61 [XCore] Give test more generic name.
I intend to extend the test with more offset folding checks

llvm-svn: 179620
2013-04-16 19:56:55 +00:00
Richard Osborne
53ec25c8fa [XCore] Convert a couple of tests to FileCheck.
llvm-svn: 179619
2013-04-16 19:41:19 +00:00
Logan Chien
6f13ff357d Implement ARM unwind opcode assembler.
llvm-svn: 179591
2013-04-16 12:02:21 +00:00
Alexey Samsonov
8fcb809efb llvm-objdump: Don't print contents of BSS sections: it makes no sense and crashes llvm-objdump on relocated objects with large bss
llvm-svn: 179589
2013-04-16 10:53:11 +00:00
Hans Wennborg
9311589e8d simplifycfg: Fix integer overflow converting switch into icmp.
If a switch instruction has a case for every possible value of its type,
with the same successor, SimplifyCFG would replace it with an icmp ult,
but the computation of the bound overflows in that case, which inverts
the test.

Patch by Jed Davis!

llvm-svn: 179587
2013-04-16 08:35:36 +00:00
Jakob Stoklund Olesen
b4edc00933 Add 64-bit multiply and divide instructions for SPARC v9.
llvm-svn: 179582
2013-04-16 02:57:02 +00:00
Jim Grosbach
10785fcd52 ARM: Add VACLT and VACLE assembly aliases.
These are aliases for VACGT and VACGE, respectively, with the source
operands reversed.

rdar://13638090

llvm-svn: 179575
2013-04-15 22:42:50 +00:00
Bill Wendling
97863d4274 We are not able to bitcast a pointer to an integral value.
Two return types are not equivalent if one is a pointer and the other is an
integral. This is because we cannot bitcast a pointer to an integral value.
PR15185

llvm-svn: 179569
2013-04-15 22:33:50 +00:00
Jack Carter
6a3d1c59be Mips assembler: Explicit floating point condition register recognition.
This patch allows the assembler to recognize $fcc0 
as a valid register for conditional move instructions. 

Corresponding test cases have been added.

Contributer: Vladimir Medic
llvm-svn: 179567
2013-04-15 22:21:55 +00:00
Nadav Rotem
40ad92b46d SLPVectorizer: Make it a function pass and add code for hoisting the vector-gather sequence out of loops.
llvm-svn: 179562
2013-04-15 22:00:26 +00:00
Tom Stellard
bd67f8cd81 R600/SI: Emit config values in register value pairs.
Instead of emitting config values in a predefined order, the code
emitter will now emit a 32-bit register index followed by the 32-bit
config value.

llvm-svn: 179546
2013-04-15 17:51:35 +00:00
Tom Stellard
a44e2e18a1 R600/SI: Emit configuration value in the .AMDGPU.config ELF section
llvm-svn: 179545
2013-04-15 17:51:30 +00:00
Tom Stellard
cb4468b00a R600: Emit ELF formatted code rather than raw ISA.
llvm-svn: 179544
2013-04-15 17:51:21 +00:00
Tim Northover
b5dc8bb136 Avoid outputting temporary test file into source tree.
llvm-svn: 179532
2013-04-15 15:49:13 +00:00
Eric Christopher
2f014c4d8d Revert "Recommit r179497 after fixing uninitialized variable." until
I can fix the testcases here:

http://lab.llvm.org:8011/builders/clang-native-arm-cortex-a9/builds/6952

This reverts commit r179512 due to testcases specifying triples
that they didn't actually mean and causing failures on other platforms.

llvm-svn: 179513
2013-04-15 07:31:37 +00:00
Eric Christopher
d78f751bfe Recommit r179497 after fixing uninitialized variable.
llvm-svn: 179512
2013-04-15 07:07:21 +00:00
Nadav Rotem
7ab2574900 SLPVectorizer: Add support for vectorizing trees that start at compare instructions.
llvm-svn: 179504
2013-04-15 04:25:27 +00:00
Hal Finkel
371be65604 Fix PPC64 CR spill location for callee-saved registers
This fixes an ABI bug for non-Darwin PPC64. For the callee-saved condition
registers, the spill location is specified relative to the stack pointer (SP +
8). However, this is not relative to the SP after the new stack frame is
established, but instead relative to the caller's stack pointer (it is stored
into the linkage area of the parent's stack frame).

So, like with the link register, we don't directly spill the CRs with other
callee-saved registers, but just mark them to be spilled during prologue
generation.

In practice, this reverts r179457 for PPC64 (but leaves it in place for PPC32).

llvm-svn: 179500
2013-04-15 02:07:05 +00:00
Eric Christopher
d4e829416f Revert "Remove some unused triple and data layout."
This reverts commit r179497 and the accompanying commit as it broke random platforms that aren't osx.

llvm-svn: 179499
2013-04-14 23:35:36 +00:00
Eric Christopher
ba040e7f36 Remove some unused triple and data layout.
llvm-svn: 179498
2013-04-14 23:32:44 +00:00
Nico Rieck
3863b37eaf Use object file specific section type for initial text section
llvm-svn: 179494
2013-04-14 21:18:36 +00:00
David Majnemer
1dc3d3f7a0 Reorders two transforms that collide with each other
One performs: (X == 13 | X == 14) -> X-13 <u 2
The other: (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1

The problem is that there are certain values of C1 and C2 that
trigger both transforms but the first one blocks out the second,
this generates suboptimal code.

Reordering the transforms should be better in every case and
allows us to do interesting stuff like turn:
  %shr = lshr i32 %X, 4
  %and = and i32 %shr, 15
  %add = add i32 %and, -14
  %tobool = icmp ne i32 %add, 0

into:
  %and = and i32 %X, 240
  %tobool = icmp ne i32 %and, 224

llvm-svn: 179493
2013-04-14 21:15:43 +00:00
Nadav Rotem
e7f3991d24 Make the command line triple match the module triple.
llvm-svn: 179492
2013-04-14 20:13:05 +00:00
Jakob Stoklund Olesen
3b790b7f2e Use i32 for all SPARC shift amounts, even in 64-bit mode.
Test case by llvm-stress.

llvm-svn: 179477
2013-04-14 05:48:50 +00:00
Nadav Rotem
e708aba7d7 Remove unused function attributes.
llvm-svn: 179476
2013-04-14 05:47:04 +00:00
Nadav Rotem
e380208d4f SLPVectorizer: Add support for trees that don't start at binary operators, and add the cost of extracting values from the roots of the tree.
llvm-svn: 179475
2013-04-14 05:15:53 +00:00
Jakob Stoklund Olesen
8fafe8cd31 Add support for the abs64 SPARC v9 code model.
For when 16 TB just isn't enough.

llvm-svn: 179474
2013-04-14 05:10:36 +00:00
Jakob Stoklund Olesen
d29f125f5b Add support for the SPARC v9 abs44 code model.
This is the default model for non-PIC 64-bit code. It supports
text+data+bss linked anywhere in the low 16 TB of the address space.

llvm-svn: 179473
2013-04-14 04:57:51 +00:00
Jakob Stoklund Olesen
b5173ad8fb Also put target flags on SPARC constant pool references.
Constant pool entries are accessed exactly the same way as global
variables.

llvm-svn: 179471
2013-04-14 04:35:16 +00:00
Nadav Rotem
433f05f5de SLPVectorizer: add initial support for reduction variable vectorization.
llvm-svn: 179470
2013-04-14 03:22:20 +00:00
Jakob Stoklund Olesen
c23fada5f9 Fix patterns for 64-bit pointers.
This fixes the pic32 code model for SPARC v9.

llvm-svn: 179469
2013-04-14 01:53:23 +00:00
Jakob Stoklund Olesen
6182cc630a Define SPARC code models.
Currently, only abs32 and pic32 are implemented. Add a test case for
abs32 with 64-bit code. 64-bit PIC code is currently broken.

llvm-svn: 179463
2013-04-13 19:02:23 +00:00
Benjamin Kramer
ebee156edb GlobalDCE: Fix an oversight in my last commit that could lead to crashes.
There is a Constant with non-constant operands: blockaddress.

llvm-svn: 179460
2013-04-13 16:11:14 +00:00
Benjamin Kramer
fedd86f086 Fix a scalability issue with complex ConstantExprs.
This is basically the same fix in three different places. We use a set to avoid
walking the whole tree of a big ConstantExprs multiple times.

For example: (select cmp, (add big_expr 1), (add big_expr 2))
We don't want to visit big_expr twice here, it may consist of thousands of
nodes.

The testcase exercises this by creating an insanely large ConstantExprs out of
a loop. It's questionable if the optimizer should ever create those, but this
can be triggered with real C code. Fixes PR15714.

llvm-svn: 179458
2013-04-13 12:53:18 +00:00
Hal Finkel
978a847acb Spill and restore PPC CR registers using the FP when we have one
For functions that need to spill CRs, and have dynamic stack allocations, the
value of the SP during the restore is not what it was during the save, and so
we need to use the FP in these cases (as for all of the other spills and
restores, but the CR restore has a special code path because its reserved slot,
like the link register, is specified directly relative to the adjusted SP).

llvm-svn: 179457
2013-04-13 08:09:20 +00:00
Andrew Trick
2bd87ad8d4 Further generalize this scheduler test.
The order of copies depends on queue order, which is not very stable.

llvm-svn: 179456
2013-04-13 07:37:27 +00:00
Andrew Trick
fb2a8d10f8 Fix a dislexic regex.
llvm-svn: 179455
2013-04-13 07:29:21 +00:00
Andrew Trick
1ef71359cd Add a missing REQUIRES: asserts
llvm-svn: 179453
2013-04-13 06:12:46 +00:00
Andrew Trick
861493bc4f MI-Sched: schedule physreg copies.
The register allocator expects minimal physreg live ranges. Schedule
physreg copies accordingly. This is slightly tricky when they occur in
the middle of the scheduling region. For now, this is handled by
rescheduling the copy when its associated instruction is
scheduled. Eventually we may instead bundle them, but only if we can
preserve the bundles as parallel copies during regalloc.

llvm-svn: 179449
2013-04-13 06:07:40 +00:00
Rafael Espindola
7beab5afc5 Finish templating MachObjectFile over endianness.
We are now able to handle big endian macho files in llvm-readobject. Thanks to
David Fang for providing the object files.

llvm-svn: 179440
2013-04-13 01:45:40 +00:00
Akira Hatanaka
e0468ce3e1 [mips] Reapply r179420 and r179421.
llvm-svn: 179434
2013-04-13 00:55:41 +00:00
Akira Hatanaka
b0b85e00d8 Revert r179420 and r179421.
llvm-svn: 179422
2013-04-12 22:40:07 +00:00
Akira Hatanaka
737648f84c [mips] Instruction selection patterns for carry-setting and using add
instructions.

llvm-svn: 179421
2013-04-12 22:24:52 +00:00
Akira Hatanaka
d809bc8eeb [mips] v4i8 and v2i16 add, sub and mul instruction selection patterns.
llvm-svn: 179420
2013-04-12 22:14:24 +00:00
Nadav Rotem
17c064fb45 Revert r179409 because it caused some warnings and some of the build bots fail.
llvm-svn: 179418
2013-04-12 22:02:26 +00:00
Benjamin Kramer
9f30ffb4a7 InstCombine: Check the operand types before merging fcmp ord & fcmp ord.
Fixes PR15737.

llvm-svn: 179417
2013-04-12 21:56:23 +00:00
Nadav Rotem
51df846152 SLPVectorizer: add support for vectorization of diamond shaped trees. We now perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from.
llvm-svn: 179414
2013-04-12 21:16:54 +00:00
Nadav Rotem
06ab05f47a CostModel: increase the default cost of supported floating point operations from 1 to two. Fixed a few tests that changes because now the cost of one insert + a vector operation on two doubles is lower than two scalar operations on doubles.
llvm-svn: 179413
2013-04-12 21:15:03 +00:00
Nadav Rotem
99420caf0a Add support for additional vector instructions in the interpreter.
patch by Veselov, Yuri <Yuri.Veselov@intel.com>.

llvm-svn: 179409
2013-04-12 20:45:20 +00:00
Quentin Colombet
226206b401 ARM: Correct printing of pre-indexed operands.
According to the ARM reference manual, constant offsets are mandatory for pre-indexed addressing modes.
The MC disassembler was not obeying this when the offset is 0.
It was producing instructions like: str r0, [r1]!.
Correct syntax is: str r0, [r1, #0]!.

This change modifies the dumping of operands so that the offset is always printed, regardless of its value, when pre-indexed addressing mode is used.

Patch by Mihail Popa <Mihail.Popa@arm.com>

llvm-svn: 179398
2013-04-12 18:47:25 +00:00
David Majnemer
eec2fe2c55 Simplify (A & ~B) in icmp if A is a power of 2
The transform will execute like so:
(A & ~B) == 0 --> (A & B) != 0
(A & ~B) != 0 --> (A & B) == 0

llvm-svn: 179386
2013-04-12 17:25:07 +00:00
Arnold Schwaighofer
48b1c3e915 LoopVectorizer: integer division is not a reduction operation
Don't classify idiv/udiv as a reduction operation. Integer division is lossy.
For example : (1 / 2) * 4 != 4/2.

Example:

int a[] = { 2, 5, 2, 2}
int x = 80;

for()
  x /= a[i];

Scalar:
  x /= 2 // = 40
  x /= 5 // = 8
  x /= 2 // = 4
  x /= 2 // = 2

Vectorized:

 <80, 1> / <2,5> //= <40,0>
 <40, 0> / <2,2> //= <20,0>

 20*0 = 0

radar://13640654

llvm-svn: 179381
2013-04-12 15:15:19 +00:00
Tim Northover
50af088d71 AArch64: use full triple for ELF tests
These tests rely specifically on the names of ELF relocations, let alone any
other detail. There's no way they'd work if LLVM was emitting something else by
default.

llvm-svn: 179376
2013-04-12 12:54:58 +00:00
Tim Northover
7d51031f92 AArch64: remove over-zealous use of CHECK-NEXT
It turns out some platforms (e.g. Windows) lay out their llvm-mc slightly
differently with extra newlines; there was no real reason for the test lines to
be consecutive, so this relaxes the FileCheck.

llvm-svn: 179375
2013-04-12 12:54:49 +00:00
Nico Rieck
e4564483fb Teach llvm-readobj to print ELF program headers
llvm-svn: 179363
2013-04-12 04:07:39 +00:00
Nico Rieck
a1dea0c714 Remove obsolete object file dumpers
llvm-svn: 179362
2013-04-12 04:07:13 +00:00
Nico Rieck
1162bb7a1d Replace coff-/elf-dump with llvm-readobj
llvm-svn: 179361
2013-04-12 04:06:46 +00:00
Nico Rieck
3ac415cf98 Add extensive relocation tests for llvm-readobj
This test ensures that relocation type names returned by libObject match
the raw relocation type value.

llvm-svn: 179360
2013-04-12 04:02:23 +00:00
Nadav Rotem
f96cc4976d Fix the test on linux by setting the triple and the align format
llvm-svn: 179354
2013-04-12 01:07:16 +00:00
Nadav Rotem
662256bafa Add a flag to align all basic blocks in the function.
When debugging performance regressions we often ask ourselves if the regression
that we see is due to poor isel/sched/ra or due to some micro-architetural
problem.  When comparing two code sequences one good way to rule out front-end
bottlenecks (and other the issues) is to force code alignment. This pass adds
a flag that forces the alignment of all of the basic blocks in the program.

llvm-svn: 179353
2013-04-12 00:48:32 +00:00
Rafael Espindola
af3cfcbc9f Add 179294 back, but don't use bit fields so that it works on big endian hosts.
Original message:

Print more information about relocations.

With this patch llvm-readobj now prints if a relocation is pcrel, its length,
if it is extern and if it is scattered.

It also refactors the code a bit to use bit fields instead of shifts and
masks all over the place.

llvm-svn: 179345
2013-04-12 00:17:33 +00:00
Manman Ren
6a0ccf041c Aliasing rules for struct-path aware TBAA.
Added PathAliases to check if two struct-path tags can alias.
Added command line option -struct-path-tbaa.

llvm-svn: 179337
2013-04-11 23:24:18 +00:00
Preston Gurd
8e4196dfe6 Use FileCheck instead of grep.
llvm-svn: 179322
2013-04-11 21:39:01 +00:00
David Majnemer
82ec1d080e Optimize icmp involving addition better
Allows LLVM to optimize sequences like the following:

%add = add nsw i32 %x, 1
%cmp = icmp sgt i32 %add, %y

into:

%cmp = icmp sge i32 %x, %y

as well as:

%add1 = add nsw i32 %x, 20
%add2 = add nsw i32 %y, 57
%cmp = icmp sge i32 %add1, %add2

into:

%add = add nsw i32 %y, 37
%cmp = icmp sle i32 %cmp, %x

llvm-svn: 179316
2013-04-11 20:05:46 +00:00
Jack Carter
834846d7a8 Mips specific inline asm memory operand modifier test case
These changes are based on commit responses for r179135.

llvm-svn: 179315
2013-04-11 19:39:19 +00:00
Rafael Espindola
6c9d485a40 Revert my last two commits while I debug what is wrong in a big endian host.
llvm-svn: 179303
2013-04-11 17:46:10 +00:00
Rafael Espindola
857bda0e10 Print more information about relocations.
With this patch llvm-readobj now prints if a relocation is pcrel, its length,
if it is extern and if it is scattered.

It also refactors the code a bit to use bit fields instead of shifts and
masks all over the place.

llvm-svn: 179294
2013-04-11 16:31:37 +00:00
Benjamin Kramer
3b38288ea2 Fix for wrong instcombine on vector insert/extract
When trying to collapse sequences of insertelement/extractelement
instructions into single shuffle instructions, there is one specific
case where the Instruction Combiner wrongly updates the resulting
Mask of shuffle indexes.

The problem is in function CollectShuffleElments.

If we have a sequence of insert/extract element instructions
like the one below:

  %tmp1 = extractelement <4 x float> %LHS, i32 0
  %tmp2 = insertelement <4 x float> %RHS, float %tmp1, i32 1
  %tmp3 = extractelement <4 x float> %RHS, i32 2
  %tmp4 = insertelement <4 x float> %tmp2, float %tmp3, i32 3

Where:
  . %RHS will have a mask of [4,5,6,7]
  . %LHS will have a mask of [0,1,2,3]

The Mask of shuffle indexes is wrongly computed to [4,1,6,7]
instead of [4,0,6,7].
When analyzing %tmp2 in order to compute the Mask for the
resulting shuffle instruction, the algorithm forgets to update
the mask index at position 1 with the index associated to the
element extracted from %LHS by instruction %tmp1.

Patch by Andrea DiBiagio!

llvm-svn: 179291
2013-04-11 15:10:09 +00:00
Eli Bendersky
0ce49fd520 Add a CHECK-NOT for a more faithful translation of the original grep | count 2.
Thanks to Reid Kleckner for catching this.

llvm-svn: 179289
2013-04-11 14:43:19 +00:00
Benjamin Kramer
f15ba24b8d Add missing colons to check lines.
llvm-svn: 179277
2013-04-11 12:41:41 +00:00
Benjamin Kramer
4413e71a39 FileCheckize a bunch of tests.
llvm-svn: 179276
2013-04-11 12:32:23 +00:00
Michael Liao
877d1576e6 Optimize vector select from all 0s or all 1s
As packed comparisons in AVX/SSE produce all 0s or all 1s in each SIMD lane,
vector select could be simplified to AND/OR or removed if one or both values
being selected is all 0s or all 1s.

llvm-svn: 179267
2013-04-11 05:15:54 +00:00
Michael Liao
75c886a312 Add CLAC/STAC instruction encoding/decoding support
As these two instructions in AVX extension are privileged instructions for
special purpose, it's only expected to be used in inlined assembly.

llvm-svn: 179266
2013-04-11 04:52:28 +00:00
Michael Liao
87125582e9 Enhance bool simplifcation in X86 to handle more cases
This patch is revised based on patch from Victor Umansky
<victor.umansky@intel.com>. More cases are handled in X86's bool
simplification, i.e.
- SETCC_CARRY
- value is truncated to i1 with AND

As a by-product, PR5443 is also fixed.

llvm-svn: 179265
2013-04-11 04:43:09 +00:00
Rafael Espindola
3ac0776aa4 Add MachO-x86-64 tests.
The object was already checked in, but was not being tested.

llvm-svn: 179256
2013-04-11 02:52:29 +00:00
Eli Bendersky
90daaa543a Rewrite some of the test/CodeGen/X86 tests to use FileCheck instead of grep
llvm-svn: 179241
2013-04-10 23:30:20 +00:00
Nico Rieck
8e22855ea6 MC: Support COFF image-relative MCSymbolRefs
Add support for the COFF relocation types IMAGE_REL_I386_DIR32NB and
IMAGE_REL_AMD64_ADDR32NB for 32- and 64-bit respectively. These are
similar to normal 4-byte relocations except that they do not include
the base address of the image.

Image-relative relocations are used for debug information (32-bit) and
SEH unwind tables (64-bit).

A new MCSymbolRef variant called 'VK_COFF_IMGREL32' is introduced to
specify such relocations. For AT&T assembly, this variant can be accessed
using the symbol suffix '@imgrel'.

llvm-svn: 179240
2013-04-10 23:28:17 +00:00
Hal Finkel
03d47320aa Manually remove successors in if conversion when CopyAndPredicateBlock is used
In the simple and triangle if-conversion cases, when CopyAndPredicateBlock is
used because the to-be-predicated block has other predecessors, we need to
explicitly remove the old copied block from the successors list. Normally if
conversion relies on TII->AnalyzeBranch combined with BB->CorrectExtraCFGEdges
to cleanup the successors list, but if the predicated block contained an
un-analyzable branch (such as a now-predicated return), then this will fail.

These extra successors were causing a problem on PPC because it was causing
later passes (such as PPCEarlyReturm) to leave dead return-only basic blocks in
the code.

llvm-svn: 179227
2013-04-10 22:05:25 +00:00
Jack Carter
8e319ed798 Mips specific inline asm memory operand modifier test case
These changes are based on commit responses for r179135.

llvm-svn: 179225
2013-04-10 22:02:32 +00:00
Kay Tiong Khoo
5f12d15d44 fixed xsave, xsaveopt, xrstor mnemonics with intel syntax; added test cases
llvm-svn: 179223
2013-04-10 21:52:25 +00:00
Eric Christopher
d419b3e326 Revert "Update the version of dwarf we say we're emitting to at least 3."
temporarily while we work on plumbing through some changes to continue
supporting gdb on darwin.

This reverts commit r179122.

llvm-svn: 179222
2013-04-10 21:45:07 +00:00
Jyotsna Verma
b244dff2f1 Add object-emission flag for lit tests. This flag is used
to disable following tests for Hexagon that require direct object
generation support.

DebugInfo/dwarf-public-names.ll
DebugInfo/dwarf-version.ll
DebugInfo/member-pointers.ll
DebugInfo/namespace.ll
DebugInfo/two-cus-from-same-file.ll

Fixes bug 15616 - http://llvm.org/bugs/show_bug.cgi?id=15616

llvm-svn: 179209
2013-04-10 19:53:26 +00:00
Nadav Rotem
6a6b998435 Make the SLP store-merger less paranoid about function calls. We check for function calls when we check if it is safe to sink instructions.
llvm-svn: 179207
2013-04-10 19:41:36 +00:00
Michel Danzer
c1562afdde R600/SI: Add pattern for AMDGPUurecip
21 more little piglits with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 179186
2013-04-10 17:17:56 +00:00
Reed Kotler
68e5128508 This is for an experimental option -mips-os16. The idea is to compile all
Mips32 code as Mips16 unless it can't be compiled as Mips 16. For now this
would happen as long as floating point instructions are not needed.
Probably it would also make sense to compile as mips32 if atomic operations
are needed too. There may be other cases too.

A module pass prescans the IR and adds the mips16 or nomips16 attribute
to functions depending on the functions needs.

Mips 16 mode can result in a 40% code compression by utililizing 16 bit
encoding of many instructions.

The hope is for this to replace the traditional gcc way of dealing with
Mips16 code using floating point which involves essentially using soft float
but with a library implemented using mips32 floating point. This gcc 
method also requires creating stubs so that Mips32 code can interact with
these Mips 16 functions that have floating point needs. My conjecture is
that in reality this traditional gcc method would never win over this
new method.

I will be implementing the traditional gcc method also. Some of it is already
done but I needed to do the stubs to finish the work and those required
this mips16/32 mixed mode capability.

I have more ideas for to make this new method much better and I think the old
method will just live in llvm for anyone that needs the backward compatibility
but I don't for what reason that would be needed.

llvm-svn: 179185
2013-04-10 16:58:04 +00:00
Peter Collingbourne
672b843bca Use a scheme closer to that of GNU as when deciding the type of a
symbol with multiple .type declarations.

Differential Revision: http://llvm-reviews.chandlerc.com/D607

llvm-svn: 179184
2013-04-10 16:52:15 +00:00
Vincent Lejeune
daa1e69206 R600: Add VTX_READ_* and RAT_WRITE_CACHELESS_* when computing cf addr
llvm-svn: 179174
2013-04-10 13:29:20 +00:00
Reid Kleckner
724d65c212 [test] Use lit's shell test runner on Windows
Summary:
I did a local comparison between using bash and using lit's runner, and
more of the suite passes with lit than passes with bash.  Most of the
bash failures have to do with /dev/null, which is nonsensical on
Windows, but the lit runner handles it.

The lit shell runner is also much faster than bash, so I would expect
most Windows devs would want it by default.

The behavior can be overridden on any OS by setting
LIT_USE_INTERNAL_SHELL to 0 or 1 in the environment.

Reviewers: chapuni, ddunbar

CC: llvm-commits, timurrrr

Differential Revision: http://llvm-reviews.chandlerc.com/D559

llvm-svn: 179173
2013-04-10 13:11:38 +00:00
Tim Northover
b82f729eb5 ARM: Make "SMC" instructions conditional on new TrustZone architecture feature.
These instructions aren't universally available, but depend on a specific
extension to the normal ARM architecture (rather than, say, v6/v7/...) so a new
feature is appropriate.

This also enables the feature by default on A-class cores which usually have
these extensions, to avoid breaking existing code and act as a sensible
default.

llvm-svn: 179171
2013-04-10 12:08:35 +00:00
Christian Konig
f40f671bab R600/SI: dynamical figure out the reg class of MIMG
Depending on the number of bits set in the writemask.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 179166
2013-04-10 08:39:16 +00:00
Christian Konig
76cd1a76c2 R600/SI: adjust writemask to only the used components
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 179165
2013-04-10 08:39:08 +00:00
Christian Konig
ffddac18a4 R600/SI: remove image sample writemask
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 179164
2013-04-10 08:39:01 +00:00
Evan Cheng
9f82233851 __sincosf_stret returns sinf / cosf in bits 0:31 and 32:63 of xmm0, not in
xmm0 / xmm1.

rdar://13599493

llvm-svn: 179141
2013-04-10 01:26:07 +00:00
Jack Carter
03f8f98410 Mips specific inline asm operand modifier 'D'
Modifier 'D' is to use the second word of a double integer.

We had previously implemented the pure register varient of 
the modifier and this patch implements the memory reference.



#include "stdio.h"

int b[8] = {0,1,2,3,4,5,6,7};
void main()
{
    int i;
    
    // The first word. Notice, no 'D'
    {asm (
    "lw    %0,%1;"
    : "=r" (i)
    : "m" (*(b+4))
    );}
    
    printf("%d\n",i);

    // The second word
    {asm (
    "lw    %0,%D1;"
    : "=r" (i)
    : "m" (*(b+4))
    );}
    
    printf("%d\n",i);
}

llvm-svn: 179135
2013-04-09 23:19:50 +00:00
Hal Finkel
8b05494b58 Allow PPC B and BLR to be if-converted into some predicated forms
This enables us to form predicated branches (which are the same conditional
branches we had before) and also a larger set of predicated returns (including
instructions like bdnzlr which is a conditional return and loop-counter
decrement all in one).

At the moment, if conversion does not capture all possible opportunities. A
simple example is provided in early-ret2.ll, where if conversion forms one
predicated return, and then the PPCEarlyReturn pass picks up the other one. So,
at least for now, we'll keep both mechanisms.

llvm-svn: 179134
2013-04-09 22:58:37 +00:00
Eric Christopher
8a0c2a7dbd Update the version of dwarf we say we're emitting to at least 3.
Deals with a dwarf2 -> dwarf3 DW_FORM_ref_addr change.

llvm-svn: 179122
2013-04-09 20:22:47 +00:00
Reed Kotler
9b753510a5 This patch enables llvm to switch between compiling for mips32/mips64
and mips16 on a per function basis.

Because this patch is somewhat involved I have provide an overview of the
key pieces of it.

The patch is written so as to not change the behavior of the non mixed
mode. We have tested this a lot but it is something new to switch subtargets
so we don't want any chance of regression in the mainline compiler until
we have more confidence in this.

Mips32/64 are very different from Mip16 as is the case of ARM vs Thumb1.
For that reason there are derived versions of the register info, frame info, 
instruction info and instruction selection classes.

Now we register three separate passes for instruction selection.
One which is used to switch subtargets (MipsModuleISelDAGToDAG.cpp) and then
one for each of the current subtargets (Mips16ISelDAGToDAG.cpp and
MipsSEISelDAGToDAG.cpp).

When the ModuleISel pass runs, it determines if there is a need to switch
subtargets and if so, the owning pointers in MipsTargetMachine are
appropriately changed.

When 16Isel or SEIsel is run, they will return immediately without doing
any work if the current subtarget mode does not apply to them.

In addition, MipsAsmPrinter needs to be reset on a function basis.

The pass BasicTargetTransformInfo is substituted with a null pass since the
pass is immutable and really needs to be a function pass for it to be
used with changing subtargets. This will be fixed in a follow on patch.

llvm-svn: 179118
2013-04-09 19:46:01 +00:00
Nadav Rotem
96f8f45bd5 Add support for bottom-up SLP vectorization infrastructure.
This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations.
The infrastructure has three potential users:

  1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).

  2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.

  3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.

This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code:

void SAXPY(int *x, int *y, int a, int i) {
  x[i]   = a * x[i]   + y[i];
  x[i+1] = a * x[i+1] + y[i+1];
  x[i+2] = a * x[i+2] + y[i+2];
  x[i+3] = a * x[i+3] + y[i+3];
}

llvm-svn: 179117
2013-04-09 19:44:35 +00:00
Eric Christopher
48e215c890 The .dwo section shouldn't contain the unrelocated values (and
therefore not at all) of the pc or statement list. We also don't
need to emit the compilation dir so save so space and time
and don't bother.

Fix up the testcase accordingly and verify that we don't emit
the attributes or the items that they use.

llvm-svn: 179114
2013-04-09 19:23:15 +00:00
Nadav Rotem
8743b338cb Revert r176408 and r176407 to address PR15540.
llvm-svn: 179111
2013-04-09 18:16:05 +00:00
Benjamin Kramer
788f55c7d4 DAGCombiner: Fold a shuffle on CONCAT_VECTORS into a new CONCAT_VECTORS if possible.
This pattern occurs in SROA output due to the way vector arguments are lowered
on ARM.

The testcase from PR15525 now compiles into this, which is better than the code
we got with the old scalarrepl:
_Store:
	ldr.w	r9, [sp]
	vmov	d17, r3, r9
	vmov	d16, r1, r2
	vst1.8	{d16, d17}, [r0]
	bx	lr

Differential Revision: http://llvm-reviews.chandlerc.com/D647

llvm-svn: 179106
2013-04-09 17:41:43 +00:00
Hal Finkel
c72f2476a9 Use virtual base registers on PPC
On PowerPC, non-vector loads and stores have r+i forms; however, in functions
with large stack frames these were not being used to access slots far from the
stack pointer because such slots were out of range for the signed 16-bit
immediate offset field. This increases register pressure because we need a
separate register for each offset (when the r+r form is used). By enabling
virtual base registers, we can deal with large stack frames without unduly
increasing register pressure.

llvm-svn: 179105
2013-04-09 17:27:09 +00:00
Hal Finkel
68742d4f1f Convert test PowerPC/2007-09-07-LoadStoreIdxForms to FileCheck
llvm-svn: 179104
2013-04-09 17:26:55 +00:00
Eli Bendersky
3fbc32c278 Rewrite test/Linker tests to use FileCheck instead of grep.
Some translations here are not 1x1 because there are grep|grep
chains that are non-trivial to implement in terms of FileCheck features. I
made an effort for the tests to remain as similar as possible; do let me know
if you notice anything fishy. The good news are that some buggy tests were
fixed (grep | not grep - a bug waiting to happen).

llvm-svn: 179102
2013-04-09 16:51:13 +00:00
Michael Gottesman
66864f3c32 Converted 8x tests of SimplifyCFG to use FileCheck instead of grep.
llvm-svn: 179087
2013-04-09 05:18:53 +00:00
Nadav Rotem
fad36d8034 Revert 179071 because it is not the right way to support non standard new/new[] operators.
llvm-svn: 179084
2013-04-09 04:43:46 +00:00
Jakob Stoklund Olesen
6a275455ac Compute correct frame sizes for SPARC v9 64-bit frames.
The save area is twice as big and there is no struct return slot. The
stack pointer is always 16-byte aligned (after adding the bias).

Also eliminate the stack adjustment instructions around calls when the
function has a reserved stack frame.

llvm-svn: 179083
2013-04-09 04:37:47 +00:00
Nadav Rotem
e99e862deb c++ new operators are not malloc-like functions because they do not return uninitialized memory.
Users may overide new-operators and implement any function that they like.

llvm-svn: 179071
2013-04-08 23:40:47 +00:00
Eli Bendersky
d65a755ddf Rewrite test/Integer tests to use FileCheck instead of grep
llvm-svn: 179047
2013-04-08 20:18:15 +00:00
Eli Bendersky
88f7dd304e Rewrite test/ExecutionEngine tests to use FileCheck instead of grep
llvm-svn: 179043
2013-04-08 19:51:36 +00:00
Eli Bendersky
ea93ef7358 Rewrite test/Verifier tests to use FileCheck instead of grep
llvm-svn: 179036
2013-04-08 18:33:51 +00:00
Arnold Schwaighofer
3218da2403 X86 cost model: Model cost for uitofp and sitofp on SSE2
The costs are overfitted so that I can still use the legalization factor.

For example the following kernel has about half the throughput vectorized than
unvectorized when compiled with SSE2. Before this patch we would vectorize it.

unsigned short A[1024];
double B[1024];
void f() {
  int i;
  for (i = 0; i < 1024; ++i) {
    B[i] = (double) A[i];
  }
}

radar://13599001

llvm-svn: 179033
2013-04-08 18:05:48 +00:00
Hal Finkel
0daaa8e2de Generate PPC early conditional returns
PowerPC has a conditional branch to the link register (return) instruction: BCLR.
This should be used any time when we'd otherwise have a conditional branch to a
return. This adds a small pass, PPCEarlyReturn, which runs just prior to the
branch selection pass (and, importantly, after block placement) to generate
these conditional returns when possible. It will also eliminate unconditional
branches to returns (these happen rarely; most of the time these have already
been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice
optimization for small functions that do not maintain a stack frame.

llvm-svn: 179026
2013-04-08 16:24:03 +00:00
Chandler Carruth
c0d878f95e Simplify the quoting here. Our lit emulator doesn't deal well with the
nested quoting schemes, and they're not important here...

llvm-svn: 179014
2013-04-08 10:07:50 +00:00
Tim Northover
8eb5637d73 AArch64: remove barriers from AArch64 atomic operations.
I've managed to convince myself that AArch64's acquire/release
instructions are sufficient to guarantee C++11's required semantics,
even in the sequentially-consistent case.

llvm-svn: 179005
2013-04-08 08:40:41 +00:00
Hal Finkel
9af1fa0d50 Cleanup and improve PPC fsel generation
First, we should not cheat: fsel-based lowering of select_cc is a
finite-math-only optimization (the ISA manual, section F.3 of v2.06, makes
this clear, as does a note in our own README).

This also adds fsel-based lowering of EQ and NE condition codes. As it turned
out, fsel generation was covered by a grand total of zero regression test
cases. I've added some test cases to cover the existing behavior (which is now
finite-math only), as well as the new EQ cases.

llvm-svn: 179000
2013-04-07 22:11:09 +00:00
Arnold Schwaighofer
16848bcf4a TargetLowering: Fix getTypeConversion handling of extended vector types
The code in getTypeConversion attempts to promote the element vector type
before it trys to split or widen the vector.
After it failed finding a legal vector type by promoting it would continue using
the promoted vector element type. Thereby missing legal splitted vector types.
For example the type v32i32 that has a legal split of 4 x v3i32 on x86/sse2
would be transformed to: v32i256 and from there on successively split to:
v16i256, v8i256, v1i256 and then finally ends up as an i64 type.
By resetting the vector element type to the original vector element type that
existed before the promotion the code will attempt to split the vector type to
smaller vector widths of the same type.

llvm-svn: 178999
2013-04-07 20:22:56 +00:00
Jakob Stoklund Olesen
658f7754a3 Implement LowerCall_64 for the SPARC v9 64-bit ABI.
There is still no support for byval arguments (which I don't think are
needed) and varargs.

llvm-svn: 178993
2013-04-07 19:10:57 +00:00
Chandler Carruth
8d04726f54 Fix PR15674 (and PR15603): a SROA think-o.
The fix for PR14972 in r177055 introduced a real think-o in the *store*
side, likely because I was much more focused on the load side. While we
can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily
widen a value to be stored, as that changes the width of memory access!
Lock down the code path in the store rewriting which would do this to
only handle the intended circumstance.

All of the existing tests continue to pass, and I've added a test from
the PR.

llvm-svn: 178974
2013-04-07 11:47:54 +00:00
Eric Christopher
546e1942f2 DW_FORM_sec_offset should be a relocation on platforms that use
a relocation across sections. Do this for DW_AT_stmt list in the
skeleton CU and check the relocations in the debug_info section.

Add a FIXME for multiple CUs.

llvm-svn: 178969
2013-04-07 03:43:09 +00:00
Jakob Stoklund Olesen
594e073ba0 Implement LowerReturn_64 for SPARC v9.
Integer return values are sign or zero extended by the callee, and
structs up to 32 bytes in size can be returned in registers.

The CC_Sparc64 CallingConv definition is shared between
LowerFormalArguments_64 and LowerReturn_64. Function arguments and
return values are passed in the same registers.

The inreg flag is also used for return values. This is required to handle
C functions returning structs containing floats and ints:

  struct ifp {
    int i;
    float f;
  };

  struct ifp f(void);

LLVM IR:

  define inreg { i32, float } @f() {
     ...
     ret { i32, float } %retval
  }

The ABI requires that %retval.i is returned in the high bits of %i0
while %retval.f goes in %f1.

Without the inreg return value attribute, %retval.i would go in %i0 and
%retval.f would go in %f3 which is a more efficient way of returning
%multiple values, but it is not ABI compliant for returning C structs.

llvm-svn: 178966
2013-04-06 23:57:33 +00:00
Jakob Stoklund Olesen
2e3d792bf5 SPARC v9 stack pointer bias.
64-bit SPARC v9 processes use biased stack and frame pointers, so the
current function's stack frame is located at %sp+BIAS .. %fp+BIAS where
BIAS = 2047.

This makes more local variables directly accessible via [%fp+simm13]
addressing.

llvm-svn: 178965
2013-04-06 21:38:57 +00:00
Hal Finkel
6c65d3a736 Implement PPCInstrInfo::FoldImmediate
There are certain PPC instructions into which we can fold a zero immediate
operand. We can detect such cases by looking at the register class required
by the using operand (so long as it is not otherwise constrained).

llvm-svn: 178961
2013-04-06 19:30:30 +00:00
Jakob Stoklund Olesen
2076ffbc15 Complete formal arguments for the SPARC v9 64-bit ABI.
All arguments are formally assigned to stack positions and then promoted
to floating point and integer registers. Since there are more floating
point registers than integer registers, this can cause situations where
floating point arguments are assigned to registers after integer
arguments that where assigned to the stack.

Use the inreg flag to indicate 32-bit fragments of structs containing
both float and int members.

The three-way shadowing between stack, integer, and floating point
registers requires custom argument lowering. The good news is that
return values are passed in the exact same way, and we can share the
code.

Still missing:

 - Update LowerReturn to handle structs returned in registers.
 - LowerCall.
 - Variadic functions.

llvm-svn: 178958
2013-04-06 18:32:12 +00:00
Tom Stellard
8ad4f7c25b R600/SI: Add support for buffer stores v2
v2:
  - Use the ADDR64 bit

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178931
2013-04-05 23:31:51 +00:00
Tom Stellard
917d7412f1 R600/SI: Add processor types for each SI variant
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178928
2013-04-05 23:31:35 +00:00
Tom Stellard
17fba38a4b R600/SI: Avoid generating S_MOVs with 64-bit immediates v2
SITargetLowering::analyzeImmediate() was converting the 64-bit values
to 32-bit and then checking if they were an inline immediate.  Some
of these conversions caused this check to succeed and produced
S_MOV instructions with 64-bit immediates, which are illegal.

v2:
  - Clean up logic

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178927
2013-04-05 23:31:20 +00:00
Hal Finkel
b556d850c9 Enable early if conversion on PPC
On cores for which we know the misprediction penalty, and we have
the isel instruction, we can profitably perform early if conversion.
This enables us to replace some small branch sequences with selects
and avoid the potential stalls from mispredicting the branches.

Enabling this feature required implementing canInsertSelect and
insertSelect in PPCInstrInfo; isel code in PPCISelLowering was
refactored to use these functions as well.

llvm-svn: 178926
2013-04-05 23:29:01 +00:00
Michael Gottesman
d803c63c69 An objc_retain can serve as a use for a different pointer.
This is the counterpart to commit r160637, except it performs the action
in the bottomup portion of the data flow analysis.

llvm-svn: 178922
2013-04-05 22:54:32 +00:00
Michael Gottesman
fcbd79805b Properly model precise lifetime when given an incomplete dataflow sequence.
The normal dataflow sequence in the ARC optimizer consists of the following
states:

    Retain -> CanRelease -> Use -> Release

The optimizer before this patch stored the uses that determine the lifetime of
the retainable object pointer when it bottom up hits a retain or when top down
it hits a release. This is correct for an imprecise lifetime scenario since what
we are trying to do is remove retains/releases while making sure that no
``CanRelease'' (which is usually a call) deallocates the given pointer before we
get to the ``Use'' (since that would cause a segfault).

If we are considering the precise lifetime scenario though, this is not
correct. In such a situation, we *DO* care about the previous sequence, but
additionally, we wish to track the uses resulting from the following incomplete
sequences:

  Retain -> CanRelease -> Release   (TopDown)
  Retain <- Use <- Release          (BottomUp)

*NOTE* This patch looks large but the most of it consists of updating
test cases. Additionally this fix exposed an additional bug. I removed
the test case that expressed said bug and will recommit it with the fix
in a little bit.

llvm-svn: 178921
2013-04-05 22:54:28 +00:00
Shuxin Yang
5cf388a00f Disable the optimization about promoting vector-element-access with symbolic index.
This optimization is unstable at this moment; it 
  1) block us on a very important application
  2) PR15200
  3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll
     (the CHECK command compare the output against wrong result)

   I personally believe this optimization should not have any impact on the
autovectorized code, as auto-vectorizer is supposed to put gather/scatter
in a "right" way.  Although in theory downstream optimizaters might reveal 
some gather/scatter optimization opportunities, the chance is quite slim.

   For the hand-crafted vectorizing code, in term of redundancy elimination,
load-CSE, copy-propagation and DSE can collectively achieve the same result,
but in much simpler way. On the other hand, these optimizers are able to 
improve the code in a incremental way; in contrast, SROA is sort of all-or-none
approach. However, SROA might slighly win in stack size, as it tries to figure 
out a stretch of memory tightenly cover the area accessed by the dynamic index.

 rdar://13174884
 PR15200

llvm-svn: 178912
2013-04-05 21:07:08 +00:00
Akira Hatanaka
70bcdf07be [mips] XFAIL test-interp-vec-loadstore.ll in an attempt to turn builder
llvm-mips-linux green.

llvm-mips-linux runs on a big endian machine. This test passes if I change 'e'
to 'E' in the target data layout string.

llvm-svn: 178910
2013-04-05 20:54:46 +00:00
Timur Iskhodzhanov
c004e9db2d Make the test/CodeGen/X86/win32_sret.ll reliable on any CPU by explicitly specifying the -mcpu
llvm-svn: 178885
2013-04-05 17:05:56 +00:00
Renato Golin
9d05117f2b Reverting 178851 as it broke buildbots
llvm-svn: 178883
2013-04-05 16:39:53 +00:00
Chad Rosier
59dbb08c9b [ms-inline asm] Add support for numeric displacement expressions in bracketed
memory operands.

Essentially, this layers an infix calculator on top of the parsing state
machine.  The scale on the index register is still expected to be an immediate

 __asm mov eax, [eax + ebx*4]

and will not work with more complex expressions.  For example,

 __asm mov eax, [eax + ebx*(2*2)]

The plus and minus binary operators assume the numeric value of a register is
zero so as to not change the displacement.  Register operands should never
be an operand for a multiply or divide operation; the scale*indexreg
expression is always replaced with a zero on the operand stack to prevent
such a case.
rdar://13521380

llvm-svn: 178881
2013-04-05 16:28:55 +00:00
Rafael Espindola
79787bd764 Don't fetch pointers from a InMemoryStruct.
InMemoryStruct is extremely dangerous as it returns data from an internal
buffer when the endiannes doesn't match. This should fix the tests on big
endian hosts.

llvm-svn: 178875
2013-04-05 15:15:22 +00:00
Ulrich Weigand
045c5fae3d Respect Addend when processing MCJIT relocations to local/global symbols.
When the RuntimeDyldELF::processRelocationRef routine finds the target
symbol of a relocation in the local or global symbol table, it performs
a section-relative relocation:

    Value.SectionID = lsi->second.first;
    Value.Addend = lsi->second.second;

At this point, however, any Addend that might have been specified in
the original relocation record is lost.  This is somewhat difficult to
trigger for relocations within the code section since they usually
do not contain non-zero Addends (when built with the default JIT code
model, in any case).  However, the problem can be reliably triggered
by a relocation within the data section caused by code like:

 int test[2] = { -1, 0 };
 int *p = &test[1];

The initializer of "p" will need a relocation to "test + 4".  On
platforms using RelA relocations this means an Addend of 4 is required.
Current code ignores this addend when processing the relocation,
resulting in incorrect execution.

Fixed by taking the Addend into account when processing relocations
to symbols found in the local or global symbol table.

Tested on x86_64-linux and powerpc64-linux.

llvm-svn: 178869
2013-04-05 13:29:04 +00:00
Alexey Samsonov
5cf62b262f llvm-symbolizer: correctly parse filenames given in quotes
llvm-svn: 178859
2013-04-05 09:22:24 +00:00
Alexey Samsonov
c2465d2e7e Add a basic test for llvm-symbolizer tool
llvm-svn: 178858
2013-04-05 08:30:13 +00:00
Alexey Samsonov
13f4bf7f12 Add obj2yaml to test dependencies
llvm-svn: 178852
2013-04-05 07:26:37 +00:00
Stepan Dyatkovskiy
98f7dac944 Fix for PR14824: "Optimization arm_ldst_opt inserts newly generated instruction vldmia at incorrect position".
Patch introduces memory operands tracking in ARMLoadStoreOpt::LoadStoreMultipleOpti. For each register it keeps the order of load operations as it was before optimization pass.
It is kind of deep improvement of fix proposed by Hao: http://llvm.org/bugs/show_bug.cgi?id=14824#c4
But it also tracks conflicts between different register classes (e.g. D2 and S5).
For more details see:
Bug description: http://llvm.org/bugs/show_bug.cgi?id=14824
LLVM Commits discussion: 
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130311/167936.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130318/168688.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130325/169376.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130401/170238.html

llvm-svn: 178851
2013-04-05 05:52:14 +00:00
Rafael Espindola
8aff750403 The ppc bots say this is the last broken line, so lets try one more :-(
llvm-svn: 178849
2013-04-05 05:36:37 +00:00
Rafael Espindola
af616aedfe One more try before I just delete the macho bits until tomorrow.
llvm-svn: 178847
2013-04-05 05:15:39 +00:00
Rafael Espindola
391fa49a86 More test loosening.
Sorry for so many commits, but llvm is still building on my ppc vm.

llvm-svn: 178843
2013-04-05 04:54:42 +00:00
Rafael Espindola
348c01a389 Loosen this test too.
llvm-svn: 178841
2013-04-05 04:37:55 +00:00
Rafael Espindola
0bd976d9fe Loosen this test.
Looks like there is a big endian/little endian problem here. Loosen the
test to try to get the bots green while llvm builds on a ppc qemu vm.

The failure was in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/

llvm-svn: 178839
2013-04-05 04:31:09 +00:00
Rafael Espindola
98c3e8bc71 Add a test for obj2yaml in preparation for refactoring it.
llvm-svn: 178829
2013-04-05 02:02:05 +00:00
Andrew Trick
6da34cd35e RegisterPressure heuristics currently require signed comparisons.
llvm-svn: 178823
2013-04-05 00:31:34 +00:00
Arnold Schwaighofer
abd363c1bc LoopVectorizer: Pass OperandValueKind information to the cost model
Pass down the fact that an operand is going to be a vector of constants.

This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86
back. It had degraded to scalar performance due to my pervious shift cost change
that made all shifts expensive on x86.

radar://13576547

llvm-svn: 178809
2013-04-04 23:26:27 +00:00
Arnold Schwaighofer
52871434dd X86 cost model: Differentiate cost for vector shifts of constants
SSE2 has efficient support for shifts by a scalar. My previous change of making
shifts expensive did not take this into account marking all shifts as expensive.
This would prevent vectorization from happening where it is actually beneficial.

With this change we differentiate between shifts of constants and other shifts.

radar://13576547

llvm-svn: 178808
2013-04-04 23:26:24 +00:00
Hal Finkel
1dc78e666b PPC: Improve code generation for mixed-precision reciprocal sqrt
The DAGCombine logic that recognized a/sqrt(b) and transformed it into
a multiplication by the reciprocal sqrt did not handle cases where the
sqrt and the division were separated by an fpext or fptrunc.

llvm-svn: 178801
2013-04-04 22:44:12 +00:00
Jyotsna Verma
8017d2c8b6 Disable 2010-10-01-crash.ll for Hexagon as the Hexagon frontend will
never produce a byval parameter with size < 8 bytes.

llvm-svn: 178792
2013-04-04 21:05:46 +00:00
Rafael Espindola
5e796656b1 Add back parsing of header charactestics.
It had been dropped during the switch to yaml::IO. Also add a test going
from yaml2obj to llvm-readobj. It can be extended as we add more
fields/formats to yaml2obj.

llvm-svn: 178786
2013-04-04 20:30:52 +00:00
Richard Osborne
25a2bd3084 [XCore] Add bru instruction.
llvm-svn: 178783
2013-04-04 20:05:35 +00:00
Richard Osborne
2eabe25672 [XCore] The RRegs register class is a superset of GRRegs.
At the time when the XCore backend was added there were some issues with
with overlapping register classes but these all seem to be fixed now.
Describing the register classes correctly allow us to get rid of a
codegen only instruction (LDAWSP_lru6_RRegs) and it means we can
disassemble ru6 instructions that use registers above r11.

llvm-svn: 178782
2013-04-04 19:57:46 +00:00
Jakob Stoklund Olesen
a53fa8d450 Avoid high-latency false CPSR dependencies even for tMOVSi.
The Thumb2SizeReduction pass avoids false CPSR dependencies, except it
still aggressively creates tMOVi8 instructions because they are so
common.

Avoid creating false CPSR dependencies even for tMOVi8 instructions when
the the CPSR flags are known to have high latency. This allows integer
computation to overlap floating point computations.

Also process blocks in a reverse post-order and propagate high-latency
flags to successors.

<rdar://problem/13468102>

llvm-svn: 178773
2013-04-04 18:25:36 +00:00
Stepan Dyatkovskiy
0562afa331 New-password-test commit.
llvm-svn: 178765
2013-04-04 16:11:18 +00:00
Vincent Lejeune
a680946842 R600: Take export into account when computing cf address
llvm-svn: 178761
2013-04-04 13:59:59 +00:00
Alexey Samsonov
9e33512a49 Propagate path to ASan/MSan symbolizer into test environment to produce useful reports on errors.
llvm-svn: 178749
2013-04-04 07:41:00 +00:00
Jakob Stoklund Olesen
1969a96fcd Add SPARC v9 support for select on 64-bit compares.
This requires v9 cmov instructions using the %xcc flags instead of the
%icc flags.

Still missing:
- Select floats on %xcc flags.
- Select i64 on %fcc flags.

llvm-svn: 178737
2013-04-04 03:08:00 +00:00
Arnold Schwaighofer
329430aeac X86 cost model: Vector shifts are expensive in most cases
The default logic does not correctly identify costs of casts because they are
marked as custom on x86.

For some cases, where the shift amount is a scalar we would be able to generate
better code. Unfortunately, when this is the case the value (the splat) will get
hoisted out of the loop, thereby making it invisible to ISel.

radar://13130673
radar://13537826

llvm-svn: 178703
2013-04-03 21:46:05 +00:00
Rafael Espindola
af01832c73 Implement the "mips endian" for r_info.
Normally r_info is just a 32 of 64 bit number matching the endian of the rest
of the file. Unfortunately, mips 64 bit little endian is special: The top 32
bits are a little endian number and the following 32 are a big endian one.

llvm-svn: 178694
2013-04-03 21:02:51 +00:00
Richard Osborne
8c4177b262 [XCore] Check disassembly of the st8 instruction.
llvm-svn: 178689
2013-04-03 20:07:11 +00:00
Richard Osborne
a7413cf3e7 [XCore] Update disassembler test to improve coverage of the instructions.
Previously some instructions were unintentionally covered twice and
others were not covered at all.

llvm-svn: 178688
2013-04-03 20:07:06 +00:00
Eric Christopher
df46cef31b Implements low-level object file format specific output for COFF and
ELF with support for:

- File headers
- Section headers + data
- Relocations
- Symbols
- Unwind data (only COFF/Win64)

The output format follows a few rules:
- Values are almost always output one per line (as elf-dump/coff-dump already do). - Many values are translated to something readable (like enum names), with the raw value in parentheses.
- Hex numbers are output in uppercase, prefixed with "0x".
- Flags are sorted alphabetically.
- Lists and groups are always delimited.

Example output:
---------- snip ----------
Sections [
  Section {
    Index: 1
    Name: .text (5)
    Type: SHT_PROGBITS (0x1)
    Flags [ (0x6)
      SHF_ALLOC (0x2)
      SHF_EXECINSTR (0x4)
    ]
    Address: 0x0
    Offset: 0x40
    Size: 33
    Link: 0
    Info: 0
    AddressAlignment: 16
    EntrySize: 0
    Relocations [
      0x6 R_386_32 .rodata.str1.1 0x0
      0xB R_386_PC32 puts 0x0
      0x12 R_386_32 .rodata.str1.1 0x0
      0x17 R_386_PC32 puts 0x0
    ]
    SectionData (
      0000: 83EC04C7 04240000 0000E8FC FFFFFFC7  |.....$..........|
      0010: 04240600 0000E8FC FFFFFF31 C083C404  |.$.........1....|
      0020: C3                                   |.|
    )
  }
]
---------- snip ----------

Relocations and symbols can be output standalone or together with the section header as displayed in the example.
This feature set supports all tests in test/MC/COFF and test/MC/ELF (and I suspect all additional tests using elf-dump), making elf-dump and coff-dump deprecated.

Patch by Nico Rieck!

llvm-svn: 178679
2013-04-03 18:31:38 +00:00
Eric Christopher
99a330354d Implement sectionContainsSymbol for ELF.
Patch by Nico Rieck!

llvm-svn: 178677
2013-04-03 18:31:19 +00:00
Eric Christopher
8cfce53956 When dumping clear the arm/thumb flag for now.
Patch by Nico Rieck!

llvm-svn: 178676
2013-04-03 18:31:12 +00:00
Vincent Lejeune
6a4ef74f44 R600: Fix last ALU of a clause being emitted in a separate clause
llvm-svn: 178675
2013-04-03 18:24:47 +00:00
Bill Schmidt
990515e4c4 Fix PR15632: No support for ppcf128 floating-point remainder on PowerPC.
For this we need to use a libcall.  Previously LLVM didn't implement
libcall support for frem, so I've added it in the usual
straightforward manner.  A test case from the bug report is included.

llvm-svn: 178639
2013-04-03 13:05:44 +00:00
Tim Northover
2550df2b22 AArch64: implement ETMv4 trace system registers.
llvm-svn: 178637
2013-04-03 12:31:29 +00:00
Timur Iskhodzhanov
0976f711d6 Temporarily relax the WIN32 checks in the SRet test to fix the Atom D2700 bot
llvm-svn: 178635
2013-04-03 12:17:15 +00:00
Timur Iskhodzhanov
ecd533f0ec Fix SRet for thiscall in i686-pc-win32
llvm-svn: 178634
2013-04-03 11:27:54 +00:00
Jakob Stoklund Olesen
3b7eaf9bb6 Add 64-bit compare + branch for SPARC v9.
The same compare instruction is used for 32-bit and 64-bit compares. It
sets two different sets of flags: icc and xcc.

This patch adds a conditional branch instruction using the xcc flags for
64-bit compares.

llvm-svn: 178621
2013-04-03 04:41:44 +00:00
Hal Finkel
0208f7c744 Use PPC reciprocal estimates with Newton iteration in fast-math mode
When unsafe FP math operations are enabled, we can use the fre[s] and
frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together
with some Newton iteration, in order to quickly generate floating-point
division and sqrt results. All of these instructions are separately optional,
and so each has its own feature flag (except for the Altivec instructions,
which are covered under the existing Altivec flag). Doing this is not only
faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these
computations to be pipelined with other computations in order to hide their
overall latency.

I've also added a couple of missing fnmsub patterns which turned out to be
missing (but are necessary for good code generation of the Newton iterations).
Altivec needs a similar fix, but that will probably be more complicated because
fneg is expanded for Altivec's v4f32.

llvm-svn: 178617
2013-04-03 04:01:11 +00:00
Rafael Espindola
de2bc5c06e Fix the fde encoding used by mips to match gas.
This finally fixes the encoding. The patch also
* Removes eh-frame.ll. It was an unnecessary .ll to .o test that was checking
  the wrong value.
* Merge fde-reloc.s and eh-frame.s into a single test, since the only difference
  was the run lines.
* Don't blindly test the content of the entire .eh_frame section. It makes it
  hard to anyone actually fixing a bug and hitting a difference in a binary
  blob. Instead, use a CHECK for each field and document what is being checked.

llvm-svn: 178615
2013-04-03 03:13:19 +00:00
Michael Gottesman
05c38c0189 Remove an optimization where we were changing an objc_autorelease into an objc_autoreleaseReturnValue.
The semantics of ARC implies that a pointer passed into an objc_autorelease
must live until some point (potentially down the stack) where an
autorelease pool is popped. On the other hand, an
objc_autoreleaseReturnValue just signifies that the object must live
until the end of the given function at least.

Thus objc_autorelease is stronger than objc_autoreleaseReturnValue in
terms of the semantics of ARC* implying that performing the given
strength reduction without any knowledge of how this relates to
the autorelease pool pop that is further up the stack violates the
semantics of ARC.

*Even though objc_autoreleaseReturnValue if you know that no RV
optimization will occur is more computationally expensive.

llvm-svn: 178612
2013-04-03 02:57:24 +00:00
Akira Hatanaka
f08d3a5a83 [mips] Small update to the implementation of eh.return for Mips.
This patch initializes t9 to the handler address, but only if the relocation
model is pic. This handles the case where handler to which eh.return jumps 
points to the start of the function.

Patch by Sasa Stankovic.

llvm-svn: 178588
2013-04-02 23:02:07 +00:00
Eric Christopher
c6f97cf1a0 Support and test template arguments for unions.
llvm-svn: 178586
2013-04-02 22:55:56 +00:00
NAKAMURA Takumi
d8a9117bcb llvm/test/CodeGen/X86: Unmark them out of XFAIL:cygming, in atomic{32|64}.ll and handle-move.ll, corresponding to r178549.
This reverts r176808, r176798, and r177914.

llvm-svn: 178583
2013-04-02 22:35:08 +00:00
Bill Schmidt
c98ed219d3 Fix PR15630: Replace faulty stdcx. with stwcx.
When doing a partword atomic operation, a lwarx was being paired with
a stdcx. instead of a stwcx. when compiling for a 64-bit target.  The
target has nothing to do with it in this case; we always need a stwcx.

Thanks to Kai Nacke for reporting the problem.

llvm-svn: 178559
2013-04-02 18:37:08 +00:00
Jakob Stoklund Olesen
b0a5a72daf Don't attempt MTM heuristics without a scheduling model present.
This should fix the PPC buildbots.

llvm-svn: 178558
2013-04-02 18:26:45 +00:00
Chad Rosier
908153170e [fast-isel] Use the correct API to disable FastLowerArguments for Win64.
llvm-svn: 178549
2013-04-02 16:31:41 +00:00
Arnold Schwaighofer
c9508f817a DAGCombiner: Merge store/loads when we have extload/truncstores
This is helps on architectures where i8,i16 are not legal but we have byte, and
short loads/stores. Allowing us to merge copies like the one below on ARM.

copy(char *a, char *b, int n) {
 do {
   int t0 = a[0];
   int t1 = a[1];
   b[0] = t0;
   b[1] = t1;

radar://13536387

llvm-svn: 178546
2013-04-02 15:58:51 +00:00
Preston Gurd
fca710bf70 Simplify test cases for Atom preferring call register indirect over
call memory indirect (32 and 64 bit).

llvm-svn: 178541
2013-04-02 14:25:06 +00:00
Bill Wendling
2b9f48d238 Use a worklist to avoid a sneaky iterator invalidation.
The iterator could be invalidated when it's recursively deleting a whole bunch
of constant expressions in a constant initializer.

Note: This was only reproducible if `opt' was run on a `.bc' file. If `opt' was
run on a `.ll' file, it wouldn't crash. This is why the test first pushes the
`.ll' file through `llvm-as' before feeding it to `opt'.

PR15440

llvm-svn: 178531
2013-04-02 08:16:45 +00:00
Jakob Stoklund Olesen
8a184a7fe4 Add 64-bit load and store instructions.
There is only a few new instructions, the rest is handled with patterns.

llvm-svn: 178528
2013-04-02 04:09:28 +00:00
Jakob Stoklund Olesen
22fe26207f Basic 64-bit ALU operations.
SPARC v9 extends all ALU instructions to 64 bits, so we simply need to
add patterns to use them for both i32 and i64 values.

llvm-svn: 178527
2013-04-02 04:09:23 +00:00
Jakob Stoklund Olesen
d57f9ab92f Materialize 64-bit immediates.
The last resort pattern produces 6 instructions, and there are still
opportunities for materializing some immediates in fewer instructions.

llvm-svn: 178526
2013-04-02 04:09:17 +00:00
Jakob Stoklund Olesen
5ef2195726 Add 64-bit shift instructions.
SPARC v9 defines new 64-bit shift instructions. The 32-bit shift right
instructions are still usable as zero and sign extensions.

This adds new F3_Sr and F3_Si instruction formats that probably should
be used for the 32-bit shifts as well. They don't really encode an
simm13 field.

llvm-svn: 178525
2013-04-02 04:09:12 +00:00
Jakob Stoklund Olesen
9fbfce2d11 Add support for 64-bit calling convention.
This is far from complete, but it is enough to make it possible to write
test cases using i64 arguments.

Missing features:
- Floating point arguments.
- Receiving arguments on the stack.
- Calls.

llvm-svn: 178523
2013-04-02 04:09:02 +00:00
Jack Carter
b48d003b7d Mips direct object exception handling regression
Revision 177141 caused a regression in all but
mips64 little endian. That is because none of the
other Mips targets had test cases checking the 
contents of the .eh_frame section. This patch fixes
both the llvm code and adds an assembler test case 
to include the current 4 flavors.

The test cases unfortunately rely on llvm-objdump. A
preferable method would be to use a pretty printer output
such as what readelf -wf <elf_file> would give.

I also changed the name of the test case to correct a typo.

llvm-svn: 178506
2013-04-01 21:55:15 +00:00
Vincent Lejeune
dc0e12bd5b R600: Add support for native control flow
llvm-svn: 178505
2013-04-01 21:48:05 +00:00
Vincent Lejeune
11918406b3 R600: Emit CF_ALU and use true kcache register.
llvm-svn: 178503
2013-04-01 21:47:42 +00:00
Hal Finkel
9191cdb5f2 Fix a bad assert in PPCTargetLowering
llvm-svn: 178489
2013-04-01 18:42:58 +00:00
Hal Finkel
9cd1e5e93c Add triple to test/CodeGen/PowerPC/stfiwx-2
llvm-svn: 178486
2013-04-01 18:18:44 +00:00
Shuxin Yang
74f54ae4b2 Correct assertion condition
llvm-svn: 178484
2013-04-01 18:13:05 +00:00
Arnold Schwaighofer
a2a475a83d Merge load/store sequences with adresses: base + index + offset
We would also like to merge sequences that involve a variable index like in the
example below.

    int index = *idx++
    int i0 = c[index+0];
    int i1 = c[index+1];
    b[0] = i0;
    b[1] = i1;

By extending the parsing of the base pointer to handle dags that contain a
base, index, and offset we can handle examples like the one above.

The dag for the code above will look something like:

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i8 load %index))))

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i32 add (i32 signextend (i8 load %index))
                                         (i32 1)))))

The code that parses the tree ignores the intermediate sign extensions. However,
if there is a sign extension it needs to be on all indexes.

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (add (i8 load %index)
                                     (i8 1))))
 vs

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i32 add (i32 signextend (i8 load %index))
                                         (i32 1)))))
radar://13536387

llvm-svn: 178483
2013-04-01 18:12:58 +00:00
Hal Finkel
f184647a53 Add more PPC floating-point conversion instructions
The P7 and A2 have additional floating-point conversion instructions which
allow a direct two-instruction sequence (plus load/store) to convert from all
combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores,
only some combinations were directly available).

llvm-svn: 178480
2013-04-01 17:52:07 +00:00
Hal Finkel
55f144f923 Fix PowerPC/cttz.ll to specify a cpu (and use FileCheck)
llvm-svn: 178472
2013-04-01 16:31:56 +00:00
Hal Finkel
9eed3ac928 Add the PPC popcntw instruction
The popcntw instruction is available whenever the popcntd instruction is
available, and performs a separate popcnt on the lower and upper 32-bits.
Ignoring the high-order count, this can be used for the 32-bit input case
(saving on the explicit zero extension otherwise required to use popcntd).

llvm-svn: 178470
2013-04-01 15:58:15 +00:00
Nadav Rotem
fe272b52da Add support for vector data types in the LLVM interpreter.
Patch by:
Veselov, Yuri <Yuri.Veselov@intel.com>

llvm-svn: 178469
2013-04-01 15:53:30 +00:00
Benjamin Kramer
7634eefc37 X86TTI: Add accurate costs for itofp operations, based on the actual instruction counts.
llvm-svn: 178459
2013-04-01 10:23:49 +00:00
Benjamin Kramer
790bd5fb50 X86: Promote sitofp <8 x i16> to <8 x i32> when AVX is available.
A vector sext + sitofp is a lot cheaper than 8 scalar conversions.

llvm-svn: 178448
2013-03-31 12:49:15 +00:00
Hal Finkel
085f61160f Add the PPC lfiwax instruction
This instruction is available on modern PPC64 CPUs, and is now used
to improve the SINT_TO_FP lowering (by eliminating the need for the
separate sign extension instruction and decreasing the amount of
needed stack space).

llvm-svn: 178446
2013-03-31 10:12:51 +00:00
Hal Finkel
7bdfbd6570 Cleanup PPC(64) i32 -> float/double conversion
The existing SINT_TO_FP code for i32 -> float/double conversion was disabled
because it relied on broken EXTSW_32/STD_32 instruction definitions. The
original intent had been to enable these 64-bit instructions to be used on CPUs
that support them even in 32-bit mode.  Unfortunately, this form of lying to
the infrastructure was buggy (as explained in the FIXME comment) and had
therefore been disabled.

This re-enables this functionality, using regular DAG nodes, but only when
compiling in 64-bit mode. The old STD_32/EXTSW_32 definitions (which were dead)
are removed.

llvm-svn: 178438
2013-03-31 01:58:02 +00:00
Benjamin Kramer
86e90ea8b4 DAGCombine: visitXOR can replace a node without returning it, bail out in that case.
Fixes the crash reported in PR15608.

llvm-svn: 178429
2013-03-30 21:28:18 +00:00
Benjamin Kramer
50725426cb Change '@SECREL' suffix to GAS-compatible '@SECREL32'.
'@SECREL' is what is used by the Microsoft assembler, but GNU as expects '@SECREL32'.
With the patch, the MC-generated code works fine in combination with a recent GNU as (2.23.51.20120920 here).

Patch by David Nadlinger!
Differential Revision: http://llvm-reviews.chandlerc.com/D429

llvm-svn: 178427
2013-03-30 16:21:50 +00:00
Justin Holewinski
21480942b2 [NVPTX] Remove support for SM < 2.0. This was never fully supported anyway.
llvm-svn: 178417
2013-03-30 14:29:30 +00:00
Justin Holewinski
23056edada [NVPTX] Add NVVMReflect pass to allow compile-time selection of
specific code paths.

This allows us to write code like:

  if (__nvvm_reflect("FOO"))
    // Do something
  else
    // Do something else

and compile into a library, then give "FOO" a value at kernel
compile-time so the check becomes a no-op.

llvm-svn: 178416
2013-03-30 14:29:25 +00:00
Shuxin Yang
c53fc5dc4c Implement XOR reassociation. It is based on following rules:
rule 1: (x | c1) ^ c2 => (x & ~c1) ^ (c1^c2),
     only useful when c1=c2
  rule 2: (x & c1) ^ (x & c2) = (x & (c1^c2))
  rule 3: (x | c1) ^ (x | c2) = (x & c3) ^ c3 where c3 = c1 ^ c2
  rule 4: (x | c1) ^ (x & c2) => (x & c3) ^ c1, where c3 = ~c1 ^ c2

 It reduces an application's size (in terms of # of instructions) by 8.9%.
 Reviwed by Pete Cooper. Thanks a lot!

 rdar://13212115  

llvm-svn: 178409
2013-03-30 02:15:01 +00:00
Akira Hatanaka
bc81d23802 [mips] Add patterns for DSP indexed load instructions.
llvm-svn: 178408
2013-03-30 02:14:45 +00:00
Akira Hatanaka
5ff9493456 [mips] Fix DSP instructions to have explicit accumulator register operands.
Check that instruction selection can select multiply-add/sub DSP instructions
from a pattern that doesn't have intrinsics.

llvm-svn: 178406
2013-03-30 01:58:00 +00:00
Akira Hatanaka
6c9ddf6943 [mips] Move the code which does dag-combine for multiply-add/sub nodes to
derived class MipsSETargetLowering.

We shouldn't be generating madd/msub nodes if target is Mips16, since Mips16
doesn't have support for multipy-add/sub instructions.

llvm-svn: 178404
2013-03-30 01:42:24 +00:00
Michael Gottesman
1bc2d353ed Updated test0 of retain-not-declared.ll to reflect the fact that objc-arc-expand runs before objc-arc/objc-arc-contract.
Specifically, objc-arc-expand will make sure that the
objc_retainAutoreleasedReturnValue, objc_autoreleaseReturnValue, and ret
will all have %call as an argument.

llvm-svn: 178382
2013-03-29 22:44:59 +00:00
Timur Iskhodzhanov
d7d35221f7 Exclude the X86/complex-fca.ll test at it probably wasn't supposed to work on Windows
llvm-svn: 178375
2013-03-29 21:54:00 +00:00
Michael Gottesman
c7ef28e19f Add clang.arc.used to ModuleHasARC so ARC always runs if said call is present in a module.
clang.arc.used is an interesting call for ARC since ObjCARCContract
needs to run to remove said intrinsic to avoid a linker error (since the
call does not exist).

llvm-svn: 178369
2013-03-29 21:15:23 +00:00
Adrian Prantl
dd8a6dc41a move testcase into appropriate X86 subdirectory.
llvm-svn: 178364
2013-03-29 20:14:08 +00:00
Hal Finkel
c14a74d243 Implement FRINT lowering on PPC using frin
Like nearbyint, rint can be implemented on PPC using the frin instruction. The
complication comes from the fact that rint needs to set the FE_INEXACT flag
when the result does not equal the input value (and frin does not do that). As
a result, we use a custom inserter which, after the rounding, compares the
rounded value with the original, and if they differ, explicitly sets the XX bit
in the FPSCR register (which corresponds to FE_INEXACT).

Once LLVM has better modeling of the floating-point environment we should be
able to (often) eliminate this extra complexity.

llvm-svn: 178362
2013-03-29 19:41:55 +00:00
Adrian Prantl
324e4868cb Split the llvm/tools/clang/test/CodeGenObjC/debug-info-blocks.m testcase into a CFE and LLVM part.
rdar://problem/12767564

llvm-svn: 178353
2013-03-29 18:08:14 +00:00
Benjamin Kramer
279e5cfa9a Remove the old CodePlacementOpt pass.
It was superseded by MachineBlockPlacement and disabled by default since LLVM 3.1.

llvm-svn: 178349
2013-03-29 17:14:24 +00:00
Hal Finkel
3493fd8e51 Add PPC FP rounding instructions fri[mnpz]
These instructions are available on the P5x (and later) and on the A2. They
implement the standard floating-point rounding operations (floor, trunc, etc.).
One caveat: frin (round to nearest) does not implement "ties to even", and so
is only enabled in fast-math mode.

llvm-svn: 178337
2013-03-29 08:57:48 +00:00
Jack Carter
ab230573a8 [Mips Assembler] Add support for OR macro with imediate opperand
Mips assembler supports macros that allows the OR instruction 
to have an immediate parameter. This patch adds an instruction 
alias that converts this macro into a Mips ORI instruction. 

Contributer: Vladimir Medic
llvm-svn: 178316
2013-03-28 23:45:13 +00:00
Michael Liao
427149cbcf Add support of RDSEED defined in AVX2 extension
llvm-svn: 178314
2013-03-28 23:41:26 +00:00
Michael Liao
aec693ab31 Enhance boolean simplification to handle 16-/64-bit RDRAND
- RDRAND always clears the destination value when a random value is not
  available (i.e. CF == 0). This value is truncated or zero-extended as
  the false boolean value to be returned. Boolean simplification needs
  to skip this 'zext' or 'trunc' node.

llvm-svn: 178312
2013-03-28 23:38:52 +00:00
Jack Carter
1e744ec264 [Mips Assembler] Add alias definitions for jal
Mips assembler allows following to be used as aliased instructions:
jal $rs for jalr $rs
jal $rd,$rd for jalr $rd,$rs

This patch provides alias definitions in td files and test cases to show the usage.

Contributer: Vladimir Medic
llvm-svn: 178304
2013-03-28 23:02:21 +00:00
Timur Iskhodzhanov
d7de83d51c Make Win32 put the SRet address into EAX, fixes PR15556
llvm-svn: 178291
2013-03-28 21:30:04 +00:00