1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-27 05:53:07 +01:00
Commit Graph

71930 Commits

Author SHA1 Message Date
Adam Nemet
f0c71981d5 [AVX512] Switch FMA intrinsics to the masking version
This does the renaming and updates the lowering logic.

Part of <rdar://problem/17688758>

llvm-svn: 215664
2014-08-14 17:13:30 +00:00
Adam Nemet
181433c3de [X86] Break out logic to map FMA Intrinsic number to Opcode
No functional change.  Will be used to lower AVX512 masking FMA intrinsics.

llvm-svn: 215663
2014-08-14 17:13:27 +00:00
Adam Nemet
4afa7f042a [AVX512] Add enum for the static rounding types
No functional change.  This will be used by the new FMA intrinsic lowering
code.

We can probably add NO_EXC here as well, I am just not too familiar with this
part of AVX512 yet.  We can add that later.

llvm-svn: 215662
2014-08-14 17:13:26 +00:00
Adam Nemet
a00c3f7239 [AVX512] Break out the logic to lower masking intrinsics
No functional change.  This will be used by the FMA intrinsic lowering as well
and hopefully many more.

llvm-svn: 215661
2014-08-14 17:13:24 +00:00
Adam Nemet
ccc6a7155c [AVX512] Add masking variant for the FMA instructions
This change further evolves the base class AVX512_masking in order to make it
suitable for the masking variants of the FMA instructions.

Besides AVX512_masking there is now a new base class that instructions
including FMAs can use: AVX512_masking_3src.  With three-source (destructive)
instructions one of the sources is already tied to the destination.  This
difference from AVX512_masking is captured by this new class.  The common bits
between _masking and _masking_3src are broken out into a new super class
called AVX512_masking_common.

As with valign, there is some corresponding restructuring of the underlying
format classes.  The idea is the same we want to derive from two classes
essentially: one providing the format bits and another format-independent
multiclass supplying the various masking and non-masking instruction variants.

Existing fma tests in avx512-fma*.ll provide coverage here for the non-masking
variants.  For masking, the next patches in the series will add intrinsics and
intrinsic tests.

For AVX512_masking_3src to work, the (ins ...) dag has to be passed *without*
the leading source operand that is tied to dst ($src1).  This is necessary to
properly construct the (ins ...) for the different variants.  For the record,
I did check that if $src is mistakenly included, you do get a fairly intuitive
error message from the tablegen backend.

Part of <rdar://problem/17688758>

llvm-svn: 215660
2014-08-14 17:13:19 +00:00
Juergen Ributzka
49e924e64e Revert "[FastISel][AArch64] Add support for more addressing modes."
This reverts commits r215597, because it might have broken the build bots.

llvm-svn: 215659
2014-08-14 17:10:54 +00:00
Hal Finkel
d98fc22cbf Add noalias metadata for general calls (not just memory intrinsics) during inlining
When preserving noalias function parameter attributes by adding noalias
metadata in the inliner, we should do this for general function calls (not just
memory intrinsics). The logic is very similar to what already existed (except
that we want to add this metadata even for functions taking no relevant
parameters). This metadata can be used by ModRef queries in the caller after
inlining.

This addresses the first part of PR20500. Adding noalias metadata during
inlining is still turned off by default.

llvm-svn: 215657
2014-08-14 16:44:03 +00:00
Moritz Roth
464943b6ce Testing commit access.
Remove a trailing whitespace.

llvm-svn: 215653
2014-08-14 16:20:50 +00:00
Chad Rosier
9e119aad03 [Reassociation] Add support for reassociation with unsafe algebra.
Vector instructions are (still) not supported for either integer or floating
point.  Hopefully, that work will be landed shortly.

llvm-svn: 215647
2014-08-14 15:23:01 +00:00
Sanjay Patel
e7588bf533 optimize vector fneg of bitcasted integer value
This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops.

This patch is very similar to a fabs patch committed at r214892.

Differential Revision: http://reviews.llvm.org/D4852

llvm-svn: 215646
2014-08-14 15:15:28 +00:00
Rafael Espindola
1b2eb4e486 Delete support for AuroraUX.
auroraux.org is not resolving.

I will add this to the release notes as soon as I figure out where to put the
3.6 release notes :-)

llvm-svn: 215645
2014-08-14 15:15:09 +00:00
Aaron Ballman
5e89d939ac Silencing an MSVC C4334 warning ('<<' : result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)). NFC.
llvm-svn: 215642
2014-08-14 13:43:57 +00:00
Chandler Carruth
b216c5b9a3 [x86] Begin stubbing out the AVX support in the new vector shuffle
lowering scheme.

Currently, this just directly bails to the fallback path of splitting
the 256-bit vector into two 128-bit vectors, operating there, and then
joining the results back together. While the results are far from
perfect, they are *shockingly* good for what we're doing here. I'll be
layering the rest of the functionality on top of this piece by piece and
updating tests as I go.

Note that 256-bit vectors in this mode are still somewhat WIP. While
I think the code paths that I'm adding here are clean and good-to-go,
there are still a lot of 128-bit assumptions that I'll need to stomp out
as I march through the functional spread here.

llvm-svn: 215637
2014-08-14 12:13:59 +00:00
Zoran Jovanovic
25db0be672 [mips][microMIPS] MicroMIPS Compact Branch Instructions BEQZC and BNEZC
Differential Revision: http://reviews.llvm.org/D3545

llvm-svn: 215636
2014-08-14 12:09:10 +00:00
Toma Tabacu
06b97e75b0 [mips] Add assembler support for the "la $reg,symbol" pseudo-instruction.
Summary:
This pseudo-instruction allows the programmer to load an address from a symbolic expression into a register.

Patch by David Chisnall.
His work was sponsored by: DARPA, AFRL

I've made some minor changes to the original, such as improving the formatting and adding some comments, and I've also added a test case.

Reviewers: dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D4808

llvm-svn: 215630
2014-08-14 10:29:17 +00:00
Daniel Sanders
29c87ba390 [mips] Rename [gs]etCanHaveModuleDir to more natural names
Summary:
getCanHaveModuleDir() is renamed to isModuleDirectiveAllowed(), and
setCanHaveModuleDir() is renamed to forbidModuleDirective() since it is only
ever given a false argument.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4885

llvm-svn: 215628
2014-08-14 09:18:14 +00:00
Chandler Carruth
b65d715ac5 [SDAG] Fix a bug in the DAG combiner where we would fail to return the
input node after manually adding it to the worklist and using CombineTo.

Once we use CombineTo the input node may have been deleted. Despite this
being *completely confusing* and somewhat broken, the only way to
"correctly" return from a DAG combine after potentially deleting the
input node is to return *that exact node*....

But really, this code should just never have used CombineTo. It won't do
what it wants (returning the node as mentioned above just causes the
combine to infloop). The correct way to combine away a casted load to
a load of the correct type is to RAUW the chain directly and then return
the loaded value to replace the actual value node.

I managed to find this with the vector shuffle fuzzer even though it
clearly has nothing at all to do with vector shuffles and rather those
happen to trigger a load of a constant pool that hits this combine *just
right*. I've included the test as it is small and a nice stress test
that the infrastructure isn't asserting.

llvm-svn: 215622
2014-08-14 08:18:34 +00:00
David Majnemer
8e04706b9d InstCombine: ((A | ~B) ^ (~A | B)) to A ^ B
Proof using CVC3 follows:
$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR((A | ~B),(~A |B)) = BVXOR(A,B);
$ cvc3 t.cvc
Valid.

Patch by Mayur Pandey!

Differential Revision: http://reviews.llvm.org/D4883

llvm-svn: 215621
2014-08-14 06:46:25 +00:00
David Majnemer
26a2d38959 AArch64: Silence warning in AArch64FastISel
GCC was emitting a signed vs unsigned comparison warning.

llvm-svn: 215620
2014-08-14 06:44:51 +00:00
David Majnemer
a878644a06 Added InstCombine Transform for ((B | C) & A) | B -> B | (A & C)
Transform ((B | C) & A) | B --> B | (A & C)

Z3 Link: http://rise4fun.com/Z3/hP6p

Patch by Sonam Kumari!

Differential Revision: http://reviews.llvm.org/D4865

llvm-svn: 215619
2014-08-14 06:41:38 +00:00
Saleem Abdulrasool
9b9af49013 MC: AsmLexer: handle multi-character CommentStrings correctly
As X86MCAsmInfoDarwin uses '##' as CommentString although a single '#' starts a
comment a workaround for this special case is added.

Fixes divisions in constant expressions for the AArch64 assembler and other
targets which use '//' as CommentString.

Patch by Janne Grunau!

llvm-svn: 215615
2014-08-14 02:51:43 +00:00
Lang Hames
ada52150c3 [MCJIT] Support DisableSymbolSearching and InstallLazyFunctionCreator in MCJIT.
Patch by Anthony Pesch. Thanks Anthony!

llvm-svn: 215613
2014-08-14 02:38:20 +00:00
Chandler Carruth
f5c0be3c57 [SDAG] Fix a case where we would iteratively legalize a node during
combining by replacing it with something else but not re-process the
node afterward to remove it.

In a truly remarkable stroke of bad luck, this would (in the test case
attached) end up getting some other node combined into it without ever
getting re-processed. By adding it back on to the worklist, in addition
to deleting the dead nodes more quickly we also ensure that if it
*stops* being dead for any reason it makes it back through the
legalizer. Without this, the test case will end up failing during
instruction selection due to an and node with a type we don't have an
instruction pattern for.

It took many million runs of the shuffle fuzz tester to find this.

llvm-svn: 215611
2014-08-14 01:07:37 +00:00
Quentin Colombet
5b4be241d4 [X86] Fix the value of the low mask for the lowering of MUL_LOHI for v4i32.
Found by code inspection.

llvm-svn: 215604
2014-08-13 23:49:24 +00:00
Akira Hatanaka
87c53cc314 [AArch64, fast-isel] Fall back to SelectionDAG to select tail calls.
Certain functions such as objc_autoreleaseReturnValue have to be called as
tail-calls even at -O0. Since normal fast-isel doesn't emit calls as tail calls,
we have to fall back to SelectionDAG to select calls that are marked as tail.

<rdar://problem/17991614>

llvm-svn: 215600
2014-08-13 23:23:58 +00:00
Juergen Ributzka
1428168ab7 [FastISel][AArch64] Add support for more addressing modes.
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.

For Example:
  lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
  ldr x0, [x0, x1]

  sxtw x1, w1
  lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
  ldr x0, [x0, x1]

llvm-svn: 215597
2014-08-13 22:53:29 +00:00
Juergen Ributzka
48c446b654 [FastISel][X86] Add large code model support for materializing floating-point constants.
In the large code model for X86 floating-point constants are placed in the
constant pool and materialized by loading from it. Since the constant pool
could be far away, a PC relative load might not work. Therefore we first
materialize the address of the constant pool with a movabsq and then load
from there the floating-point value.

Fixes <rdar://problem/17674628>.

llvm-svn: 215595
2014-08-13 22:25:35 +00:00
Juergen Ributzka
812ef43f49 [FastISel][X86] Use XOR to materialize the "0" value.
llvm-svn: 215594
2014-08-13 22:22:17 +00:00
Juergen Ributzka
9760c4b351 [FastISel][X86] Emit more efficient instructions for integer constant materialization.
This mostly affects the i64 value type, which always resulted in an 15byte
mobavsq instruction to materialize any constant. The custom code checks the
value of the immediate and tries to use a different and smaller mov
instruction when possible.

This fixes <rdar://problem/17420988>.

llvm-svn: 215593
2014-08-13 22:18:11 +00:00
Juergen Ributzka
410902c24a [FastISel][AArch64] Make use of the zero register when possible.
This change materializes now the value "0" from the zero register.
The zero register can be folded by several instruction, so no
materialization is need at all.

Fixes <rdar://problem/17924413>.

llvm-svn: 215591
2014-08-13 22:13:14 +00:00
Juergen Ributzka
3e89dc8eab [FastISel] Let the target decide first if it wants to materialize a constant.
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.

On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Also
some very funny floating-point materialization could be observed too.

On AArch64 it would materialize the constant 0 in a register even the
architecture has an actual "zero" register.

On ARM it would generate unnecessary mov instructions or not use mvn.

This change simply changes the order and always asks the target first if it
likes to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.

Related to <rdar://problem/17420988>.

llvm-svn: 215588
2014-08-13 22:08:02 +00:00
Gerolf Hoflehner
971b863baf [MachineCombiner] Removal of dangling DBG_VALUES after combining [20598]
This is a cleaner solution to the problem described in r215431.
When instructions are combined a dangling DBG_VALUE is removed.
This resolves bug 20598.

llvm-svn: 215587
2014-08-13 22:07:36 +00:00
Juergen Ributzka
a1bfac9396 [FastISel][X86] Refactor constant materialization. NFCI.
Split the constant materialization code into three separate helper functions for
Integer-, Floating-Point-, and GlobalValue-Constants.

llvm-svn: 215586
2014-08-13 22:01:55 +00:00
Juergen Ributzka
fdca7fc3da [FastISel][ARM] Use MOVT/MOVW if the subtarget requests it.
This change is also in preparation for a future change to make sure that
the constant materialization uses MOVT/MOVW when available and not a load
from the constant pool.

llvm-svn: 215584
2014-08-13 21:42:19 +00:00
Juergen Ributzka
1531c2e27a [FastISel][ARM] Fix a bug in the integer materialization code.
getRegClassFor returns the incorrect register class when in Thumb2 mode.
This fix simply manually selects the register class as in the code just a few
lines above.

There is no test case for this code, because the code is currently
unreachable. This will be changed in a future commit and existing test
cases will exercise this code.

llvm-svn: 215583
2014-08-13 21:39:18 +00:00
Juergen Ributzka
0b3487625b [FastISel][AArch64] Cleanup constant materialization code. NFCI.
Cleanup and prepare constant materialization code for future commits.

llvm-svn: 215582
2014-08-13 21:34:04 +00:00
Gerolf Hoflehner
97f5907d85 [Cleanup] Utility function to erase instruction and mark DBG_Values
New function to erase a machine instruction and mark DBG_VALUE
for removal. A DBG_VALUE is marked for removal when it references
an operand defined in the instruction.
Use the new function to cleanup code in dead machine instruction
removal pass.

llvm-svn: 215580
2014-08-13 21:15:23 +00:00
Quentin Colombet
2563daec04 [MachineDominatorTree] Provide a method to inform a MachineDominatorTree that a
critical edge has been split. The MachineDominatorTree will when lazy update the
underlying dominance properties when require.

** Context **

This is a follow-up of r215410.
Each time a critical edge is split this invalidates the dominator tree
information. Thus, subsequent queries of that interface will be slow until the
underlying information is actually recomputed (costly).

** Problem **

Prior to this patch, splitting a critical edge needed to query the dominator
tree to update the dominator information.
Therefore, splitting a bunch of critical edges will likely produce poor
performance as each query to the dominator tree will use the slow query path.
This happens a lot in passes like MachineSink and PHIElimination.

** Proposed Solution **

Splitting a critical edge is a local modification of the CFG. Moreover, as soon
as a critical edge is split, it is not critical anymore and thus cannot be a
candidate for critical edge splitting anymore. In other words, the predecessor
and successor of a basic block inserted on a critical edge cannot be inserted by
critical edge splitting.

Using these observations, we can pile up the splitting of critical edge and
apply then at once before updating the DT information.

The core of this patch moves the update of the MachineDominatorTree information
from MachineBasicBlock::SplitCriticalEdge to a lazy MachineDominatorTree.

** Performance **

Thanks to this patch, the motivating example compiles in 4- minutes instead of
6+ minutes. No test case added as the motivating example as nothing special but
being huge!

The binaries are strictly identical for all the llvm test-suite + SPECs with and
without this patch for both Os and O3.

Regarding compile time, I observed only noise, although on average I saw a
small improvement.

<rdar://problem/17894619>

llvm-svn: 215576
2014-08-13 21:00:07 +00:00
Jan Vesely
c5cea652e3 utils: Fix segfault in flattencfg
v2: continue iterating through the rest of the bb
    use for loop

v3: initialize FlattenCFG pass in ScalarOps
    add test

v4: split off initializing flattencfg to a separate patch
    add comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215574
2014-08-13 20:31:53 +00:00
Jan Vesely
4ea706ef68 Initialize FlattenCFG pass
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215573
2014-08-13 20:31:52 +00:00
Matt Arsenault
16534528fc R600: Correctly set the src value offset for scalarized kernel args
This for some reason fixes v1i64 kernel arguments on pre-SI. This
currently breaks some other cases in the kernel-args.ll test for R600,
but I'm not particularly confident in the new output. VTX_READ_* are not
used for some of the scalarized cases, and the code reading from the
constant buffer doesn't make much sense to me.

llvm-svn: 215564
2014-08-13 18:14:11 +00:00
Benjamin Kramer
da144ed5a2 Canonicalize header guards into a common format.
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)

Changes made by clang-tidy with minor tweaks.

llvm-svn: 215558
2014-08-13 16:26:38 +00:00
Andrea Di Biagio
2e80a6ad4d [DAGCombiner] Improved target independent vector shuffle combine rule.
This patch improves the existing algorithm in DAGCombiner that
attempts to fold shuffles according to rule:
  shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3)

Before this change, there were cases where the DAGCombiner conservatively
avoided folding shuffles even if the resulting mask would have been legal.
That is because the algorithm wrongly assumed that commuting
an illegal shuffle mask would always produce an illegal mask.

With this change, we now correctly compute the commuted shuffle mask before
calling method 'isShuffleMaskLegal' on it.
On X86, this improves for example the codegen for the following function:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7>
  %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3>
  ret <4 x i32> %2
}

Before this change the X86 backend (-mcpu=corei7) generated
the following assembly code for function @test:
  shufps $-23, %xmm0, %xmm1  # xmm1 = xmm1[1,2],xmm0[2,3]
  movhlps %xmm1, %xmm1       # xmm1 = xmm1[1,1]
  movaps %xmm1, %xmm0

Now we produce:
  movhlps %xmm0, %xmm0       # xmm0 = xmm0[1,1]

Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly
fold according to the above-mentioned rule.

llvm-svn: 215555
2014-08-13 16:09:40 +00:00
Toma Tabacu
178e6e7886 [mips] Refactor calls to setCanHaveModuleDir.
Summary:
Moved some calls to setCanHaveModuleDir to the MipsTargetStreamer base class and removed the resulting empty functions from the MipsTargetELFStreamer class.

Also fixed a missing call to setCanHaveModuleDir in MipsTargetELFStreamer::emitDirectiveSetMicroMips.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: tomatabacu

Differential Revision: http://reviews.llvm.org/D4781

llvm-svn: 215542
2014-08-13 12:48:12 +00:00
Chandler Carruth
5652971ecf [optnone] Make the optnone attribute effective at suppressing function
attribute and function argument attribute synthesizing and propagating.

As with the other uses of this attribute, the goal remains a best-effort
(no guarantees) attempt to not optimize the function or assume things
about the function when optimizing. This is particularly useful for
compiler testing, bisecting miscompiles, triaging things, etc. I was
hitting specific issues using optnone to isolate test code from a test
driver for my fuzz testing, and this is one step of fixing that.

llvm-svn: 215538
2014-08-13 10:49:33 +00:00
Aaron Ballman
2ae3eb4c49 Silence a -Wparenthesis warning with these asserts. NFC.
llvm-svn: 215537
2014-08-13 10:49:07 +00:00
Robert Khasanov
8a64ff14da [SKX] Extended non-temporal load/store instructions for AVX512VL subsets.
Added avx512_movnt_vl multiclass for handling 256/128-bit forms of instruction.
Added encoding and lowering tests.

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>

llvm-svn: 215536
2014-08-13 10:46:00 +00:00
Daniel Sanders
1e9ddf188d Re-commit: [mips] Implement .ent, .end, .frame, .mask and .fmask.
Patch by Matheus Almeida and Toma Tabacu

The lld test failure on the previous attempt to commit was caused by the
addition of the .pdr section causing the offsets it was checking to change.
This has been fixed by removing the .ent/.end directives from that test since
they weren't really needed.

llvm-svn: 215535
2014-08-13 10:07:34 +00:00
Chandler Carruth
a501271c18 Revert r215415 which causse MSan to crash on a great deal of C++ code.
I've followed up on the original commit as well.

llvm-svn: 215532
2014-08-13 09:19:39 +00:00
Elena Demikhovsky
cff52c24c9 AVX-512: Fixed a bug in shufflevector lowering.
PALIGNR instruction does not exist in AVX-512F set.
Added a test.

llvm-svn: 215526
2014-08-13 07:58:43 +00:00