1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 19:12:56 +02:00
Commit Graph

9283 Commits

Author SHA1 Message Date
Lang Hames
8992c5f69e [X86] New and improved VZeroUpperInserter optimization.
- Adds support for inserting vzerouppers before tail-calls.
  This is enabled implicitly by having MachineInstr::copyImplicitOps preserve
  regmask operands, which allows VZeroUpperInserter to see where tail-calls use
  vector registers.

- Fixes a bug that caused the previous version of this optimization to miss some
  vzeroupper insertion points in loops. (Loops-with-vector-code that followed
  loops-without-vector-code were mistakenly overlooked by the previous version).

- New algorithm never revisits instructions.

Fixes <rdar://problem/16228798>

llvm-svn: 204021
2014-03-17 01:22:54 +00:00
Adrian Prantl
ba6b9e907e Re-add checks that were in this testcase before it was converted to dwarfdump.
llvm-svn: 203981
2014-03-14 23:08:21 +00:00
Ulrich Weigand
59b05e81f9 [ppc64] Avoid copy relocs in named rodata sections
Commit r181723 introduced code to avoid placing initialized variables
needing relocations into the .rodata section, which avoid copy relocs
that do not work as expected on ppc64 function references.

The same treatment is also needed for *named* .rodata.XXX sections.
This patch changes PPC64LinuxTargetObjectFile::SelectSectionForGlobal
to modify "Kind" *before* calling the default SelectSectionForGlobal
routine, instead of first calling the default routine and then just
checking for the (main) .rodata section afterwards.

llvm-svn: 203921
2014-03-14 12:45:22 +00:00
Rafael Espindola
d15cd32b9f Remove the linker_private and linker_private_weak linkages.
These linkages were introduced some time ago, but it was never very
clear what exactly their semantics were or what they should be used
for. Some investigation found these uses:

* utf-16 strings in clang.
* non-unnamed_addr strings produced by the sanitizers.

It turns out they were just working around a more fundamental problem.
For some sections a MachO linker needs a symbol in order to split the
section into atoms, and llvm had no idea that was the case. I fixed
that in r201700 and it is now safe to use the private linkage. When
the object ends up in a section that requires symbols, llvm will use a
'l' prefix instead of a 'L' prefix and things just work.

With that, these linkages were already dead, but there was a potential
future user in the objc metadata information. I am still looking at
CGObjcMac.cpp, but at this point I am convinced that linker_private
and linker_private_weak are not what they need.

The objc uses are currently split in

* Regular symbols (no '\01' prefix). LLVM already directly provides
whatever semantics they need.
* Uses of a private name (start with "\01L" or "\01l") and private
linkage. We can drop the "\01L" and "\01l" prefixes as soon as llvm
agrees with clang on L being ok or not for a given section. I have two
patches in code review for this.
* Uses of private name and weak linkage.

The last case is the one that one could think would fit one of these
linkages. That is not the case. The semantics are

* the linker will merge these symbol by *name*.
* the linker will hide them in the final DSO.

Given that the merging is done by name, any of the private (or
internal) linkages would be a bad match. They allow llvm to rename the
symbols, and that is really not what we want. From the llvm point of
view, these objects should really be (linkonce|weak)(_odr)?.

For now, just keeping the "\01l" prefix is probably the best for these
symbols. If we one day want to have a more direct support in llvm,
IMHO what we should add is not a linkage, it is just a hidden_symbol
attribute. It would be applicable to multiple linkages. For example,
on weak it would produce the current behavior we have for objc
metadata. On internal, it would be equivalent to private (and we
should then remove private).

llvm-svn: 203866
2014-03-13 23:18:37 +00:00
Kevin Enderby
02a99aab20 Add -mtriple=x86_64-linux to this test case to fix the build bots.5
The original commit was r203829.

llvm-svn: 203844
2014-03-13 20:31:19 +00:00
Ekaterina Romanova
b9d21b7ce1 Fix for http://llvm.org/bugs/show_bug.cgi?id=18590
This patch fixes the bug in peephole optimization that folds a load which defines one vreg into the one and only use of that vreg. With debug info, a DBG_VALUE that referenced the vreg considered to be a use, preventing the optimization. The fix is to ignore DBG_VALUE's during the optimization, and undef a DBG_VALUE that references a vreg that gets removed.
Patch by Trevor Smigiel!

llvm-svn: 203829
2014-03-13 18:47:12 +00:00
Tom Stellard
c33b600343 R600: LDS instructions shouldn't implicitly define OQAP
LDS instructions are pseudo instructions which model
the OQAP defs and uses within a single instruction.

This fixes a hang in the opencv MedianFilter tests.

llvm-svn: 203818
2014-03-13 17:13:04 +00:00
Mark Seaborn
7481b1f4be Cleanup: Remove use of old "-enable-correct-eh-support" option from a test
This option enables LowerInvoke's obsolete SJLJ EH support, but the
target used in this test (ARM Darwin) no longer uses the LowerInvoke
pass, so the option has no effect here.  This target currently uses
the newer SjLjEHPrepare pass instead.

This cleanup will help with removing "-enable-correct-eh-support".

Differential Revision: http://llvm-reviews.chandlerc.com/D3064

llvm-svn: 203810
2014-03-13 16:23:00 +00:00
Hans Wennborg
26925002c2 [ARM] Use symbolic register names in .cfi directives only with IAS (PR19110)
This is a follow-up to r203635. Saleem pointed out that since symbolic register
names are much easier to read, it would be good if we could turn them off only
when we really need to because we're using an external assembler.

Differential Revision: http://llvm-reviews.chandlerc.com/D3056

llvm-svn: 203806
2014-03-13 15:56:41 +00:00
Manuel Jacob
a25ab845ae CodeGenPrep: sink extends of illegal types into use block.
Summary:
This helps the instruction selector to lower an i64 * i64 -> i128
multiplication into a single instruction on targets which support it.

This is an update of D2973 which was reverted because of a bug reported
as PR19084.

Reviewers: t.p.northover, chapuni

Reviewed By: t.p.northover

CC: llvm-commits, alex, chapuni

Differential Revision: http://llvm-reviews.chandlerc.com/D3021

llvm-svn: 203797
2014-03-13 13:36:25 +00:00
Elena Demikhovsky
d3e6f6628c AVX-512: masked load/store + intrinsics for them.
llvm-svn: 203790
2014-03-13 12:05:52 +00:00
Hal Finkel
8b6358ead9 [PowerPC] Initial support for the VSX instruction set
VSX is an ISA extension supported on the POWER7 and later cores that enhances
floating-point vector and scalar capabilities. Among other things, this adds
<2 x double> support and generally helps to reduce register pressure.

The interesting part of this ISA feature is the register configuration: there
are 64 new 128-bit vector registers, the 32 of which are super-registers of the
existing 32 scalar floating-point registers, and the second 32 of which overlap
with the 32 Altivec vector registers. This makes things like vector insertion
and extraction tricky: this can be free but only if we force a restriction to
the right register subclass when needed. A new "minipass" PPCVSXCopy takes care
of this (although it could do a more-optimal job of it; see the comment about
unnecessary copies below).

Please note that, currently, VSX is not enabled by default when targeting
anything because it is not yet ready for that.  The assembler and disassembler
are fully implemented and tested. However:

 - CodeGen support causes miscompiles; test-suite runtime failures:
      MultiSource/Benchmarks/FreeBench/distray/distray
      MultiSource/Benchmarks/McCat/08-main/main
      MultiSource/Benchmarks/Olden/voronoi/voronoi
      MultiSource/Benchmarks/mafft/pairlocalalign
      MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
      SingleSource/Benchmarks/CoyoteBench/almabench
      SingleSource/Benchmarks/Misc/matmul_f64_4x4

 - The lowering currently falls back to using Altivec instructions far more
   than it should. Worse, there are some things that are scalarized through the
   stack that shouldn't be.

 - A lot of unnecessary copies make it past the optimizers, and this needs to
   be fixed.

 - Many more regression tests are needed.

Normally, I'd fix these things prior to committing, but there are some
students and other contributors who would like to work this, and so it makes
sense to move this development process upstream where it can be subject to the
regular code-review procedures.

llvm-svn: 203768
2014-03-13 07:58:58 +00:00
Adam Nemet
fc761e9a09 [X86] Add peephole for masked rotate amount
Extend what's currently done for shift because the HW performs this masking
implicitly:

   (rotl:i32 x, (and y, 31)) -> (rotl:i32 x, y)

I use the newly factored out multiclass that was only supporting shifts so
far.

For testing I extended my testcase for the new rotation idiom.

<rdar://problem/15295856>

llvm-svn: 203718
2014-03-12 21:20:55 +00:00
Rafael Espindola
d866898775 Reject alias to undefined symbols in the verifier.
On ELF and COFF an alias is just another name for a position in the file.
There is no way to refer to a position in another file, so an alias to
undefined is meaningless.

MachO currently doesn't support aliases. The spec has a N_INDR, which when
implemented will have a different set of restrictions. Adding support for
it shouldn't be harder than any other IR extension.

For now, having the IR represent what is actually possible with current
tools makes it easier to fix the design of GlobalAlias.

llvm-svn: 203705
2014-03-12 20:15:49 +00:00
Matt Arsenault
469ede65b2 R600: Fix trunc store from i64 to i1
llvm-svn: 203695
2014-03-12 18:45:52 +00:00
Daniel Sanders
0ea082ce7e [mips] BSEL's and BINS[RL] operands are reversed compared to the vselect node used in the pattern.
Summary:
Correct the match patterns and the lowerings that made the CodeGen tests pass despite the mistakes.

The original testcase that discovered the problem was SingleSource/UnitTests/SignlessType/factor.c in test-suite.
During review, we also found that some of the existing CodeGen tests were incorrect and fixed them:
* bitwise.ll: In bsel_v16i8 the IfSet/IfClear were reversed because bsel and bmnz have different operand orders and the test didn't correctly account for this. bmnz goes 'IfClear, IfSet, CondMask', while bsel goes 'CondMask, IfClear, IfSet'.
* vec.ll: In the cases where a bsel is emitted as a bmnz (they are the same operation with a different input tied to the result) the operands were in the wrong order.
* compare.ll and compare_float.ll: The bsel operand order was correct for a greater-than comparison, but a greater-than comparison instruction doesn't exist. Lowering this operation inverts the condition so the IfSet/IfClear need to be swapped to match.

The differences between BSEL, BMNZ, and BMZ and how they map to/from vselect are rather confusing. I've therefore added a note to MSA.txt to explain this in a single place in addition to the comments that explain each case.

Reviewers: matheusalmeida, jacksprat

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D3028

llvm-svn: 203657
2014-03-12 11:54:00 +00:00
Tim Northover
3912f10885 ARM: correct Dwarf output for non-contiguous VFP saves.
When the list of VFP registers to be saved was non-contiguous (so multiple
vpush/vpop instructions were needed) these were being ordered oddly, as in:
    vpush {d8, d9}
    vpush {d11}

This led to the layout in memory being [d11, d8, d9] which is ugly and doesn't
match the CFI_INSTRUCTIONs we're generating either (so Dwarf info would be
broken).

This switches the order of vpush/vpop (in both prologue and epilogue,
obviously) so that the Dwarf locations are correct again.

rdar://problem/16264856

llvm-svn: 203655
2014-03-12 11:29:23 +00:00
Hans Wennborg
bbde26f39a [ARM] Use DWARF register numbers for CFI directives in ELF assembly
It seems gas can't handle CFI directives with VFP register names ("d12", etc.).
This broke us trying to build Chromium for Android after 201423.

A gas bug has been filed: https://sourceware.org/bugzilla/show_bug.cgi?id=16694

compnerd suggested making this conditional on whether we're using the integrated
assembler or not. I'll look into that in a follow-up patch.

Differential Revision: http://llvm-reviews.chandlerc.com/D3049

llvm-svn: 203635
2014-03-12 03:52:34 +00:00
Hans Wennborg
7ce76d19aa X86: Don't generate 64-bit movd after cmpneqsd in 32-bit mode (PR19059)
This fixes the bug where we would bitcast the 64-bit floating point result
of cmpneqsd to a 64-bit integer even on 32-bit targets.

Differential Revision: http://llvm-reviews.chandlerc.com/D3009

llvm-svn: 203581
2014-03-11 15:49:24 +00:00
Saleem Abdulrasool
878ae23fa9 ARM: honour -f{no-,}optimize-sibling-calls
Use the options in the ARMISelLowering to control whether tail calls are
optimised or not.  Previously, this option was entirely ignored on the ARM
target and only honoured on x86.

This option is mostly useful in profiling scenarios.  The default remains that
tail call optimisations will be applied.

llvm-svn: 203577
2014-03-11 15:09:54 +00:00
Saleem Abdulrasool
d4d06957bd ARM: remove ancient -arm-tail-calls option
This option is from 2010, designed to work around a linker issue on Darwin for
ARM.  According to grosbach this is no longer an issue and this option can
safely be removed.

llvm-svn: 203576
2014-03-11 15:09:49 +00:00
Saleem Abdulrasool
75c162a52d ARM: enable tail call optimisation on Thumb 2
Tail call optimisation was previously disabled on all targets other than
iOS5.0+.  This enables the tail call optimisation on all Thumb 2 capable
platforms.

The test adjustments are to remove the IR hint "tail" to function invocation.
The tests were designed assuming that tail call optimisations would not kick in
which no longer holds true.

llvm-svn: 203575
2014-03-11 15:09:44 +00:00
Tim Northover
68c567a38a IR: add a second ordering operand to cmpxhg for failure
The syntax for "cmpxchg" should now look something like:

	cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic

where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).

rdar://problem/15996804

llvm-svn: 203559
2014-03-11 10:48:52 +00:00
Jim Grosbach
3b6ef12947 X86: Enable ISel of 16-bit MOVBE instructions.
When the MOVBE instructions are available, use them for 16-bit endian
swapping as well as for 32 and 64 bit.

The patterns were already present on the instructions, but weren't being
matched because the operation was unconditionally marked to 'Expand.'
Change that to be conditional on whether the MOVBE instructions are
available. Use 'rolw' to implement the in-register version (32 and 64
bit have the dedicated 'bswap' instruction for that).

Patch by Louis Gerbarg <lgg@apple.com>.

rdar://15479984

llvm-svn: 203524
2014-03-11 00:44:14 +00:00
Matt Arsenault
998df7332f Fix undefined behavior in vector shift tests.
These were all shifting the same amount as the bitwidth.

llvm-svn: 203519
2014-03-11 00:01:41 +00:00
Eli Bendersky
a3ecc3bf5a Followup to r203483 - add test.
[forgot to 'svn add' before committing r203483]

llvm-svn: 203485
2014-03-10 20:36:04 +00:00
Sasa Stankovic
37538d4bfa [mips] Implement NaCl sandboxing of loads, stores and SP changes:
* Add masking instructions before loads and stores (in MC layer).
  * Add masking instructions after SP changes (in MC layer).
  * Forbid loads, stores and SP changes in delay slots (in MI layer).

Differential Revision: http://llvm-reviews.chandlerc.com/D2904

llvm-svn: 203484
2014-03-10 20:34:23 +00:00
Reed Kotler
e1cab9f9f1 Fix regression with -O0 for mips .
llvm-svn: 203469
2014-03-10 16:31:25 +00:00
Tim Northover
2f522988cc AArch64: fix LowerCONCAT_VECTORS for new CodeGen.
The function was making too many assumptions about its input:

1. The NEON_VDUP optimisation was far too aggressive, assuming (I
think) that the input would always be BUILD_VECTOR.

2. We were treating most unknown concats as legal (by returning Op
rather than SDValue()). I think only concats of pairs of vectors are
actually legal.

http://llvm.org/PR19094

llvm-svn: 203450
2014-03-10 09:34:07 +00:00
NAKAMURA Takumi
baf4a0d596 Revert r203230, "CodeGenPrep: sink extends of illegal types into use block."
It choked i686 stage2.

llvm-svn: 203386
2014-03-09 11:01:07 +00:00
David Majnemer
4036f15710 IR: Change inalloca's grammar a bit
The grammar for LLVM IR is not well specified in any document but seems
to obey the following rules:

 - Attributes which have parenthesized arguments are never preceded by
   commas.  This form of attribute is the only one which ever has
   optional arguments.  However, not all of these attributes support
   optional arguments: 'thread_local' supports an optional argument but
   'addrspace' does not.  Interestingly, 'addrspace' is documented as
   being a "qualifier".  What constitutes a qualifier?  I cannot find a
   definition.

 - Some attributes use a space between the keyword and the value.
   Examples of this form are 'align' and 'section'.  These are always
   preceded by a comma.

 - Otherwise, the attribute has no argument.  These attributes do not
   have a preceding comma.

Sometimes an attribute goes before the instruction, between the
instruction and it's type, or after it's type.  'atomicrmw' has
'volatile' between the instruction and the type while 'call' has 'tail'
preceding the instruction.

With all this in mind, it seems most consistent for 'inalloca' on an
'inalloca' instruction to occur before between the instruction and the
type.  Unlike the current formulation, there would be no preceding
comma.  The combination 'alloca inalloca' doesn't look particularly
appetizing, perhaps a better spelling of 'inalloca' is down the road.

llvm-svn: 203376
2014-03-09 06:41:58 +00:00
Adam Nemet
05756683c9 Update comment from r203315 based on review
llvm-svn: 203361
2014-03-08 21:51:55 +00:00
David Blaikie
6f7025ca24 DebugInfo: further improvements to test following up on r203329
llvm-svn: 203337
2014-03-08 02:45:53 +00:00
David Blaikie
097ffbf12d DebugInfo: Fix test fallout from r203323
Will fix this harder in a moment.

llvm-svn: 203329
2014-03-08 01:32:51 +00:00
Adam Nemet
f591fbc1db [DAGCombiner] Recognize another rotation idiom
This is the new idiom:

  x<<(y&31) | x>>((0-y)&31)

which is recognized as:

  x ROTL (y&31)

The change refines matchRotateSub.  In
Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos is
Pos' & (OpSize - 1) we can just use Pos' instead of Pos.

llvm-svn: 203315
2014-03-07 23:56:28 +00:00
Arnold Schwaighofer
bd1d167eb0 ISel: Make VSELECT selection terminate in cases where the condition type has to
be split and the result type widened.

When the condition of a vselect has to be split it makes no sense widening the
vselect and thereby widening the condition. We end up in an endless loop of
widening (vselect result type) and splitting (condition mask type) doing this.
Instead, split both the condition and the vselect and widen the result.

I ran this over the test suite with i686 and mattr=+sse and saw no regressions.

Fixes PR18036.

llvm-svn: 203311
2014-03-07 23:25:55 +00:00
Sasa Stankovic
5f4d984f32 Moved test file from test/MC/Mips to test/CodeGen/Mips.
llvm-svn: 203298
2014-03-07 22:08:46 +00:00
Tom Stellard
bee4678d48 R600/SI: Using SGPRs is illegal for instructions that read carry-out from VCC
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 203281
2014-03-07 20:12:39 +00:00
Tom Stellard
230af572ff R600/SI: Custom lower i1 stores
These are sometimes created by the shrink to boolean optimization in the
globalopt pass.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 203280
2014-03-07 20:12:33 +00:00
Tim Northover
781b15d502 CodeGenPrep: sink extends of illegal types into use block.
This helps the instruction selector to lower an i64 * i64 -> i128
multiplication into a single instruction on targets which support it.

Patch by Manuel Jacob.

llvm-svn: 203230
2014-03-07 11:04:30 +00:00
Rafael Espindola
cb9ca86245 Replace PROLOG_LABEL with a new CFI_INSTRUCTION.
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.

The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.

The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.

I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.

The net result is that we don't create temporary labels that are never used.

llvm-svn: 203204
2014-03-07 06:08:31 +00:00
Rafael Espindola
fe5dfa44c9 Remove shouldEmitUsedDirectiveFor.
Clang now uses llvm.compiler.used for these cases.

llvm-svn: 203174
2014-03-06 22:47:08 +00:00
Rafael Espindola
f350832c74 Convert test to FileCheck.
llvm-svn: 203173
2014-03-06 22:21:43 +00:00
Andrea Di Biagio
4586b3b78c [X86] Teach the DAGCombiner how to fold a OR of two shufflevector nodes.
This patch teaches the DAGCombiner how to fold a binary OR between two
shufflevector into a single shuffle vector when possible.

The rules are:
  1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1)
  2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2)

The DAGCombiner can take advantage of the fact that OR is commutative and
compute two possible shuffle masks (Mask1 and Mask2) for the resulting
shuffle node.

Before folding a dag according to either rule 1 or 2, DAGCombiner verifies
that the resulting shuffle mask is legal for the target.
DAGCombiner would firstly try to fold according to 1.; If not possible
then it will try to fold according to 2.
If both Mask1 and Mask2 are illegal then we conservatively don't fold
the OR instruction.

llvm-svn: 203156
2014-03-06 20:19:52 +00:00
Matt Arsenault
8140d7d370 R600: Fix extloads from i8 / i16 to i64.
This appears to only be working for global loads. Private
and local break for other reasons.

llvm-svn: 203135
2014-03-06 17:34:12 +00:00
Matt Arsenault
f68a94e609 R600/SI: Expand selects on vectors.
llvm-svn: 203134
2014-03-06 17:34:03 +00:00
Richard Osborne
7d4ecf1273 [XCore] Add support for the "m" inline asm constraint.
Summary:
This provides support for CP and DP relative global accesses in inline
asm.

Reviewers: robertlytton

Reviewed By: robertlytton

Differential Revision: http://llvm-reviews.chandlerc.com/D2943

llvm-svn: 203129
2014-03-06 16:37:48 +00:00
Chad Rosier
6c9595d931 [AArch64] This is a work in progress to provide a machine description
for the Cortex-A53 subtarget in the AArch64 backend.

This patch lays the ground work to annotate each AArch64 instruction
(no NEON yet) with a list of SchedReadWrite types. The patch also
provides the Cortex-A53 processor resources, maps those the the default
SchedReadWrites, and provides basic latency. NEON support will be added
in a subsequent patch with proper forwarding logic.

Verification was done by setting the pre-RA scheduler to linearize to
better gauge the effect of the MIScheduler. Even without modeling the
forward logic, the results show a modest improvement for Cortex-A53.

Reviewers: apazos, mcrosier, atrick
Patch by Dave Estes <cestes@codeaurora.org>!

llvm-svn: 203125
2014-03-06 16:04:00 +00:00
Hal Finkel
c04da1f4c7 Fixup PPC Darwin i1 argument handling
Like on other targets, we need to zero_extend/truncate i1 args before copying
them to GPRs.

llvm-svn: 203045
2014-03-06 00:45:19 +00:00
Hal Finkel
0373847793 When using CR bit registers on PPC32, handle the i1 vaarg case
When copying an i1 value into a GPR for a vaarg call, we need to explicitly
zero-extend the i1 value (otherwise an invalid CRBIT -> GPR copy will be
generated).

llvm-svn: 203041
2014-03-06 00:23:33 +00:00