1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

26077 Commits

Author SHA1 Message Date
Chandler Carruth
57acc5d880 [x86] As a follow-up to r217819, don't check for VSELECT legality now
that we don't use VSELECT and directly emit an addsub synthetic node.
Also remove a stale comment referencing VSELECT.

The test case is updated to use 'core2' which only has SSE3, not SSE4.1,
and it still passes. Previously it would not because we lacked
sufficient blend support to legalize the VSELECT.

llvm-svn: 217849
2014-09-16 00:24:42 +00:00
Chandler Carruth
bb9f352cdf [x86] Add the beginnings of a proper DAG combine to match ADDSUBPS and
ADDSUBPD nodes out of blends of adds and subs.

This allows us to actually form these instructions with SSE3 rather than
only forming them when we had both SSE3 for the ADDSUB instructions and
SSE4.1 for the blend instructions. ;] Kind-of important.

I've adjusted the CPU requirements on one of the tests to demonstrate
this kicking in nicely for an SSE3 cpu configuration.

llvm-svn: 217848
2014-09-16 00:15:20 +00:00
Juergen Ributzka
d6c73385b8 [FastISel][AArch64] Add missing test case for previous commit.
This adds the missing test case for the previous commit:
Allow handling of vectors during return lowering for little endian machines.

Sorry for the noise.

llvm-svn: 217847
2014-09-15 23:47:57 +00:00
Juergen Ributzka
25497b3f2d [FastISel][AArch64] Lower sin/cos/pow to runtime lib calls.
Also lower sin/cos/pow to runtime lib calls.

This fixes rdar://problem/18343468.

llvm-svn: 217839
2014-09-15 22:33:06 +00:00
Justin Bogner
bd9ae45726 llvm-cov: Make debug output more consistent
This changes the debug output of the llvm-cov tool to consistently
write to stderr, and moves the highlighting output closer to where
it's relevant.

llvm-svn: 217838
2014-09-15 22:23:29 +00:00
Justin Bogner
50e68a6de5 llvm-cov: Fix an issue with showing regions but not counts
In r217746, though it was supposed to be NFC, I broke llvm-cov's
handling of showing regions without showing counts. This should've
shown up in the existing tests, except they were checking debug output
that was displayed regardless of what was actually output. I've moved
the relevant debug output to a more appropriate place so that the
tests catch this kind of thing.

llvm-svn: 217835
2014-09-15 22:12:28 +00:00
Rafael Espindola
c596f4f4ed Add back tests for empty function in SPARC and PowerPC.
llvm-svn: 217834
2014-09-15 22:11:07 +00:00
Juergen Ributzka
e596897b8b [FastISel][AArch64] Add lowering support for frem.
This lowers frem to a runtime libcall inside fast-isel.

The test case also checks the CallLoweringInfo bug that was exposed by this
change.

This fixes rdar://problem/18342783.

llvm-svn: 217833
2014-09-15 22:07:49 +00:00
Juergen Ributzka
dd6e5e3f62 [FastISel][AArch64] Improve floating-point compare support.
Add support for the last two missing fcmp condition codes: UEQ and ONE.

This fixes rdar://problem/18341575.

llvm-svn: 217823
2014-09-15 20:47:16 +00:00
Reed Kotler
03071c7bb3 Add mips32 r1 to the list of supported targets for Mips fast-isel
Summary:
Expand list of supported targets for Mips to include mips32 r1.
Previously it only include r2. More patches are coming where there is 
a difference but in the current patches as pushed upstream, r1 and r2
are equivalent.

Test Plan:
simplestorefp1.ll

add new build bots at mips to test this flavor at both -O0 and -O2

Reviewers: dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D5306

llvm-svn: 217821
2014-09-15 20:30:25 +00:00
NAKAMURA Takumi
647331dc16 llvm/test/CodeGen/X86/peephole-fold-movsd.ll: Relax an expression for win32.
llvm-svn: 217806
2014-09-15 19:00:31 +00:00
Rafael Espindola
12bf1eaab3 Add a triple to fix the bots.
llvm-svn: 217805
2014-09-15 18:54:41 +00:00
Rafael Espindola
6e5ce4f5db Fix a lot of confusion around inserting nops on empty functions.
On MachO, and MachO only, we cannot have a truly empty function since that
breaks the linker logic for atomizing the section.

When we are emitting a frame pointer, the presence of an unreachable will
create a cfi instruction pointing past the last instruction. This is perfectly
fine. The FDE information encodes the pc range it applies to. If some tool
cannot handle this, we should explicitly say which bug we are working around
and only work around it when it is actually relevant (not for ELF for example).

Given the unreachable we could omit the .cfi_def_cfa_register, but then
again, we could also omit the entire function prologue if we wanted to.

llvm-svn: 217801
2014-09-15 18:32:58 +00:00
Quentin Colombet
798b42868c [CodeGenPrepare][AddressingModeMatcher] Fix a think-o for the sext(zext) -> zext promotion
introduced in r217629.
We were returning the old sext instead of the new zext as the promoted instruction!

Thanks Joerg Sonnenberger for the test case.

llvm-svn: 217800
2014-09-15 18:26:58 +00:00
Akira Hatanaka
7402171777 [X86] Fix a bug in X86's peephole optimization.
Peephole optimization was folding MOVSDrm, which is a zero-extending double
precision floating point load, into ADDPDrr, which is a SIMD add of two packed
double precision floating point values.

(before)
%vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21
%vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21; VR128:%vreg23,%vreg20,%vreg21

(after)
%vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20

X86InstrInfo::foldMemoryOperandImpl already had the logic that prevented this
from happening. However the check wasn't being conducted for loads from stack
objects. This commit factors out the logic into a new function and uses it for
checking loads from stack slots are not zero-extending loads.

rdar://problem/18236850

llvm-svn: 217799
2014-09-15 18:23:52 +00:00
Matt Arsenault
bbce701d89 CHECK-LABELize test
llvm-svn: 217797
2014-09-15 17:56:56 +00:00
Matt Arsenault
c4d4d57a16 R600/SI: Prefer selecting more e64 instruction forms.
Add some more tests to make sure better operand
choices are still made. Leave some cases that seem
to have no reason to ever be e64 alone.

llvm-svn: 217789
2014-09-15 17:15:02 +00:00
Matt Arsenault
579ac03562 R600/SI: Make sure double vector fmul is tested
llvm-svn: 217787
2014-09-15 17:04:54 +00:00
Matt Arsenault
8e07f5aa14 R600/SI: Add some mubuf testcases.
I noticed some odd looking cases where addr64 wasn't set
when storing to a pointer in an SGPR. This seems to be intentional,
and partially tested already.

The documentation seems to describe addr64 in terms of which registers
addressing modifiers come from, but I would expect to always need
addr64 when using 64-bit pointers. If no offset is applied,
it makes sense to not need to worry about doing a 64-bit add
for the final address. A small immediate offset can be applied,
so is it OK to not have addr64 set if a carry is necessary when adding
the base pointer in the resource to the offset?

llvm-svn: 217785
2014-09-15 16:48:01 +00:00
Matt Arsenault
236d59890d R600/SI: Add preliminary support for flat address space
llvm-svn: 217777
2014-09-15 15:41:53 +00:00
Toma Tabacu
a227ffeb8e [mips] Marked the DADDiu instruction aliases as MIPS III.
Patch by Vasileios Kalintiris.

Differential Revision: http://reviews.llvm.org/D5239

llvm-svn: 217770
2014-09-15 14:47:46 +00:00
Chandler Carruth
3ee73d8f19 [x86] Begin emitting PBLENDW instructions for integer blend operations
when SSE4.1 is available.

This removes a ton of domain crossing from blend code paths that were
ending up in the floating point code path.

This is just the tip of the iceberg though. The real switch is for
integer blend lowering to more actively rely on this instruction being
available so we don't hit shufps at all any longer. =] That will come in
a follow-up patch.

Another place where we need better support is for using PBLENDVB when
doing so avoids the need to have two complementary PSHUFB masks.

llvm-svn: 217767
2014-09-15 12:40:54 +00:00
Chandler Carruth
2b509d476d [x86] Add an explicit SSE3 run to this test and flesh out a bunch of
missing specific checks.

While there is a lot of redundancy here where all-but-one mode use the
same code generation, I'd rather have each variant spelled out and
checked so that readers aren't misled by an omission in the test suite.

llvm-svn: 217765
2014-09-15 11:40:20 +00:00
Chandler Carruth
7097776d2e [x86] Teach the x86 DAG combiner to form UNPCKLPS and UNPCKHPS
instructions from the relevant shuffle patterns.

This is the last tweak I'm aware of to generate essentially perfect
v4f32 and v2f64 shuffles with the new vector shuffle lowering up through
SSE4.1. I'm sure I've missed some and it'd be nice to check since v4f32
is amenable to exhaustive exploration, but this is all of the tricks I'm
aware of.

With AVX there is a new trick to use the VPERMILPS instruction, that's
coming up in a subsequent patch.

llvm-svn: 217761
2014-09-15 11:26:25 +00:00
Chandler Carruth
1aad39bc7a [x86] Teach the x86 DAG combiner to form MOVSLDUP and MOVSHDUP
instructions when it finds an appropriate pattern.

These are lovely instructions, and its a shame to not use them. =] They
are fast, and can hand loads folded into their operands, etc.

I've also plumbed the comment shuffle decoding through the various
layers so that the test cases are printed nicely.

llvm-svn: 217758
2014-09-15 11:15:23 +00:00
Chandler Carruth
8d1918703c [x86] Undo a flawed transform I added to form UNPCK instructions when
AVX is available, and generally tidy up things surrounding UNPCK
formation.

Originally, I was thinking that the only advantage of PSHUFD over UNPCK
instruction variants was its free copy, and otherwise we should use the
shorter encoding UNPCK instructions. This isn't right though, there is
a larger advantage of being able to fold a load into the operand of
a PSHUFD. For UNPCK, the operand *must* be in a register so it can be
the second input.

This removes the UNPCK formation in the target-specific DAG combine for
v4i32 shuffles. It also lifts the v8 and v16 cases out of the
AVX-specific check as they are potentially replacing multiple
instructions with a single instruction and so should always be valuable.
The floating point checks are simplified accordingly.

This also adjusts the formation of PSHUFD instructions to attempt to
match the shuffle mask to one which would fit an UNPCK instruction
variant. This was originally motivated to allow it to match the UNPCK
instructions in the combiner, but clearly won't now.

Eventually, we should add a MachineCombiner pass that can form UNPCK
instructions post-RA when the operand is known to be in a register and
thus there is no loss.

llvm-svn: 217755
2014-09-15 10:35:41 +00:00
Chandler Carruth
c0324d3763 [x86] Teach the new vector shuffle lowering to use 'punpcklwd' and
'punpckhwd' instructions when suitable rather than falling back to the
generic algorithm.

While we could canonicalize to these patterns late in the process, that
wouldn't help when the freedom to use them is only visible during
initial lowering when undef lanes are well understood. This, it turns
out, is very important for matching the shuffle patterns that are used
to lower sign extension. Fixes a small but relevant regression in
gcc-loops with the new lowering.

When I changed this I noticed that several 'pshufd' lowerings became
unpck variants. This is bad because it removes the ability to freely
copy in the same instruction. I've adjusted the widening test to handle
undef lanes correctly and now those will correctly continue to use
'pshufd' to lower. However, this caused a bunch of churn in the test
cases. No functional change, just churn.

Both of these changes are part of addressing a general weakness in the
new lowering -- it doesn't sufficiently leverage undef lanes. I've at
least a couple of patches that will help there at least in an academic
sense.

llvm-svn: 217752
2014-09-15 09:02:37 +00:00
David Majnemer
9985d36f3b InstSimplify: Simplify trivial and/or of icmps
Some ICmpInsts when anded/ored with another ICmpInst trivially reduces
to true or false depending on whether or not all integers or no integers
satisfy the intersected/unioned range.

This sort of trivial looking code can come about when InstCombine
performs a range reduction-type operation on sdiv and the like.

This fixes PR20916.

llvm-svn: 217750
2014-09-15 08:15:28 +00:00
Chandler Carruth
3aaf043ab0 [x86] Teach the new vector shuffle lowering to use BLENDPS and BLENDPD.
These are super simple. They even take precedence over crazy
instructions like INSERTPS because they have very high throughput on
modern x86 chips.

I still have to teach the integer shuffle variants about this to avoid
so many domain crossings. However, due to the particular instructions
available, that's a touch more complex and so a separate patch.

Also, the backend doesn't seem to realize it can commute blend
instructions by negating the mask. That would help remove a number of
copies here. Suggestions on how to do this welcome, it's an area I'm
less familiar with.

llvm-svn: 217744
2014-09-14 23:43:33 +00:00
NAKAMURA Takumi
ee2b59b691 llvm/test/CodeGen/X86/vec_shuffle-38.ll: Add explicit -mtriple=x86_64-unknown to avoid incompatibility of win32.
llvm-svn: 217742
2014-09-14 23:39:01 +00:00
Chandler Carruth
20b263c37d [x86] Add an SSE41 mode to this test. Nothing interesting here, its the
same as SSE3.

llvm-svn: 217741
2014-09-14 23:28:12 +00:00
Chandler Carruth
a6586c7127 [x86] Switch this test to use an ALL prefix with special SSE2 and SSE3
variants where significant.

This will make it more obvious what is happening when we start using
blends in SSE41.

llvm-svn: 217740
2014-09-14 23:19:37 +00:00
Chandler Carruth
22df61c791 [x86] Add some test cases where we should emit blendpd in SSE4.1. No
actual change yet though.

llvm-svn: 217739
2014-09-14 23:15:52 +00:00
Chandler Carruth
31d65bc142 [x86] Teach the vector combiner that picks a canonical shuffle from to
support transforming the forms from the new vector shuffle lowering to
use 'movddup' when appropriate.

A bunch of the cases where we actually form 'movddup' don't actually
show up in the test results because something even later than DAG
legalization maps them back to 'unpcklpd'. If this shows back up as
a performance problem, I'll probably chase it down, but it is at least
an encoded size loss. =/

To make this work, also always do this canonicalizing step for floating
point vectors where the baseline shuffle instructions don't provide any
free copies of their inputs. This also causes us to canonicalize
unpck[hl]pd into mov{hl,lh}ps (resp.) which is a nice encoding space
win.

There is one test which is "regressed" by this: extractelement-load.
There, the test case where the optimization it is testing *fails*, the
exact instruction pattern which results is slightly different. This
should probably be fixed by having the appropriate extract formed
earlier in the DAG, but that would defeat the purpose of the test.... If
this test case is critically important for anyone, please let me know
and I'll try to work on it. The prior behavior was actually contrary to
the comment in the test case and seems likely to have been an accident.

llvm-svn: 217738
2014-09-14 22:41:37 +00:00
Matt Arsenault
89c98636b8 R600/SI: Fix broken check lines
llvm-svn: 217736
2014-09-14 18:32:05 +00:00
Juergen Ributzka
e238e394d2 [FastISel][AArch64] Add support for non-native types for logical ops.
Extend the logical ops selection to also support non-native types such as i1,
i8, and i16.

Fixes rdar://problem/18330589.

llvm-svn: 217732
2014-09-13 23:46:28 +00:00
Chad Rosier
f160a60b0d [AArch64] Update test case to pass with post-RA MI scheduler.
Check that the post RA scheduler is being skipped, regardless of
whether it's the top-down list latency scheduler or the post-RA
MI scheduler.

llvm-svn: 217725
2014-09-13 03:23:23 +00:00
Nick Kledzik
1776c9e2b9 Stop suppress error messages in test case to see why one buildbot is failing
llvm-svn: 217715
2014-09-12 22:46:01 +00:00
Nick Kledzik
269b17eed0 [llvm-objdump] support -rebase option for mach-o to dump rebasing info
Similar to my previous -exports-trie option, the -rebase option dumps info from
the LC_DYLD_INFO load command. The rebasing info is a list of the the locations
that dyld needs to adjust if a mach-o image is not loaded at its preferred 
address. Since ASLR is now the default, images almost never load at their
preferred address, and thus need to be rebased by dyld.

llvm-svn: 217709
2014-09-12 21:34:15 +00:00
Justin Bogner
1635d485a8 llvm-profdata: Avoid undefined behaviour when reading raw profiles
The raw profiles that are generated in compiler-rt always add padding
so that each profile is aligned, so we can simply treat files that
don't have this property as malformed.

Caught by Alexey's new ubsan bot. Thanks!

llvm-svn: 217708
2014-09-12 21:22:55 +00:00
Chad Rosier
a74d292633 FileCheckize. NFC.
llvm-svn: 217698
2014-09-12 17:55:16 +00:00
Chad Rosier
e7b10df26b [AArch64] Enable post-RA MI scheduler.
Phabricator Revision: http://reviews.llvm.org/D5278
Patch by Sanjin Sijaric!

llvm-svn: 217693
2014-09-12 17:40:39 +00:00
Jordan Rose
6655e823e0 [lit] Parse all strings as UTF-8 rather than ASCII.
As far as I can tell UTF-8 has been supported since the beginning of Python's
codec support, and it's the de facto standard for text these days, at least
for primarily-English text. This allows us to put Unicode into lit RUN lines.

rdar://problem/18311663

llvm-svn: 217688
2014-09-12 16:46:05 +00:00
NAKAMURA Takumi
1668088977 llvm/test/CodeGen/X86/vec_ctbits.ll: Add explicit -mtriple=x86_64-unknown. It was incompatible to Win32 x64.
llvm-svn: 217683
2014-09-12 15:10:56 +00:00
Zoran Jovanovic
f1edd29dad [mips][microMIPS] Implement JRADDIUSP instruction
Differential Revision: http://reviews.llvm.org/D5046

llvm-svn: 217681
2014-09-12 14:29:54 +00:00
Bill Schmidt
b47e7e1e1d Address comments on r217622
llvm-svn: 217680
2014-09-12 14:26:36 +00:00
Zoran Jovanovic
e18ff10a80 [mips][microMIPS] Implement BGEZALS and BLTZALS instructions
Differential Revision: http://reviews.llvm.org/D5004

llvm-svn: 217678
2014-09-12 13:51:58 +00:00
Zoran Jovanovic
29a4b03447 [mips][microMIPS] Implement JALS and JALRS instructions.
Differential Revision: http://reviews.llvm.org/D5003

llvm-svn: 217676
2014-09-12 13:43:41 +00:00
Zoran Jovanovic
4d73b21714 [mips][microMIPS] Implement TLBP, TLBR, TLBWI and TLBWR instructions
Differential Revision: http://reviews.llvm.org/D5211

llvm-svn: 217675
2014-09-12 13:33:33 +00:00
James Molloy
b9abbdacdc [ARM] Teach the cost model that cross-class copies are costly.
Cross-class copies being expensive is actually a trait of the microarchitecture, but as I haven't yet seen an example of a microarchitecture where they're cheap it seems best to just enable this by default, covering the non-mcpu build case.

llvm-svn: 217674
2014-09-12 13:29:40 +00:00