1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00
Commit Graph

30661 Commits

Author SHA1 Message Date
Daniel Sanders
c027690ada [mips] Fix unused variable warning introduced in r221056
llvm-svn: 221058
2014-11-01 18:53:01 +00:00
Daniel Sanders
f5c1f29334 [mips] Remove ByValArgInfo::Address in favour of CCValAssign::getMemLocOffset(). NFC.
Summary: ByValArgInfo is practically the same as CCState::ByValInfo now.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5976

llvm-svn: 221057
2014-11-01 18:44:56 +00:00
Daniel Sanders
698188b376 [mips] Move F128 argument handling into MipsCCState as we did for returns. NFC.
Summary:
There are a couple more changes to make before analyzeFormalArguments can
be merged into the standard AnalyzeFormalArguments. I've had to temporarily
poke a couple holes in MipsCCState's encapsulation to save having to make
all the required changes for this merge all at once*. These will be removed
shortly.

* We must merge our ByVal argument handling with the implementation in CCState.
  This will be done over the next three patches, then the fourth will merge
  analyzeFormalArguments with AnalyzeFormalArguments.

Depends on D5967

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5969

llvm-svn: 221056
2014-11-01 18:38:03 +00:00
Daniel Sanders
6acd6b7829 [mips] Remove MipsCC::CCInfo. NFC.
Summary:
It's now passed in as an argument to functions that need it. Eventually
this argument will be replaced by the 'this' pointer for a MipsCCState
object.

Depends on D5966

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5967

llvm-svn: 221054
2014-11-01 18:13:52 +00:00
Daniel Sanders
b89d32de0f [mips] Removed MipsCC::fixedArgFn(). NFC
Summary:
There is one remaining trace of it in MipsCC::analyzeCallOperands() where
Mips16 might override the calling convention. This will moved into
tablegen-erated code later.

Depends on D5965

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5966

llvm-svn: 221053
2014-11-01 17:44:51 +00:00
Daniel Sanders
79790d7c5e [tablegen] Add CustomCallingConv and use it to tablegen-erate the outermost parts of the Mips O32 implementation
Summary:
CustomCallingConv is simply a CallingConv that tablegen should not generate the
implementation for. It allows regular CallingConv's to delegate to these custom
functions. This is (currently) necessary for Mips and we cannot use CCCustom
without having to adapt to the different API that CCCustom uses.

This brings us a bit closer to being able to remove
MipsCC::analyzeCallOperands and MipsCC::analyzeFormalArguments in favour of
the common implementation.

No functional change to the targets.

Depends on D3341

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: vmedic, llvm-commits

Differential Revision: http://reviews.llvm.org/D5965

llvm-svn: 221052
2014-11-01 17:38:22 +00:00
Rafael Espindola
20114e4c02 Remove redundant calls to isMaterializable.
This removes calls to isMaterializable in the following cases:

* It was redundant with a call to isDeclaration now that isDeclaration returns
  the correct answer for materializable functions.
* It was followed by a call to Materialize. Just call Materialize and check EC.

llvm-svn: 221050
2014-11-01 16:46:18 +00:00
Daniel Sanders
d266fea71e Revert r221048 - Test commit
It seems I can't commit unless $DBUS_SESSION_BUS_ADDRESS is set correctly and
it is not set for ssh sessions.

llvm-svn: 221049
2014-11-01 16:08:03 +00:00
Daniel Sanders
f112d3f8cc Test commit
Added some whitespace to debug some authentication issues I'm having.

llvm-svn: 221048
2014-11-01 16:00:40 +00:00
Adrian Prantl
401e61e9a6 Revert "Temporarily revert r220777 to sort out build bot breakage."
This reverts commit r221028. Later commits depend on this and
reverting just this one causes even more bots to fail.

llvm-svn: 221041
2014-11-01 03:19:45 +00:00
NAKAMURA Takumi
2419cf2c14 Revert r220779, "[AVX512] Removed special case for cmp instructions in getVectorMaskingNode. Now cmp intrinsics lower as other intrinsics through VSELECT, and then VSELECT tranforms to AND in PerformSELECTCombine."
Since r221028 (reverting r220777), this caused failures.

llvm-svn: 221040
2014-11-01 01:36:14 +00:00
Adrian Prantl
379cee6cd0 Temporarily revert r220777 to sort out build bot breakage.
"[x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros."

llvm-svn: 221028
2014-11-01 00:26:59 +00:00
Duncan P. N. Exon Smith
7004fd9aac IR: MDNode => Value: Instruction::getMetadata()
Change `Instruction::getMetadata()` to return `Value` as part of
PR21433.

Update most callers to use `Instruction::getMDNode()`, which wraps the
result in a `cast_or_null<MDNode>`.

llvm-svn: 221024
2014-11-01 00:10:31 +00:00
Reid Kleckner
4f54f1c0fd Revert "R600: Add missing file to CMakeLists.txt"
This reverts commit r220998.

It should've been reverted with the other change.

llvm-svn: 221021
2014-10-31 23:39:10 +00:00
Reid Kleckner
660a86c442 Revert "R600: Make sure to inline all internal functions"
This reverts commit r220996.

It introduced layering violations causing link errors in many
configurations.

llvm-svn: 221020
2014-10-31 23:35:26 +00:00
Reid Kleckner
626248cfdf Work around bugs in MSVC "14" CTP 3's conversion logic
It appears to ignore or find ambiguous MachineInstrBuilder's conversion
operators that allow conversion to MachineInstr* and
MachineBasicBlock::bundle_iterator.

As a workaround, add an explicit way to get the MachineInstr.

llvm-svn: 221017
2014-10-31 23:19:46 +00:00
Tom Stellard
7f51cfe90c R600: Add IPO to the list of required libraries
llvm-svn: 221004
2014-10-31 21:52:08 +00:00
Tom Stellard
992a2f893a R600: Add missing file to CMakeLists.txt
llvm-svn: 220998
2014-10-31 20:56:36 +00:00
Tom Stellard
044bedf9b1 R600: Don't promote allocas when one of the users is a ptrtoint instruction
We need to figure out how to track ptrtoint values all the
way until result is converted back to a pointer in order
to correctly rewrite the pointer type.

llvm-svn: 220997
2014-10-31 20:52:04 +00:00
Tom Stellard
8a8077171a R600: Make sure to inline all internal functions
Function calls aren't supported yet.

llvm-svn: 220996
2014-10-31 20:52:02 +00:00
Bill Schmidt
5c5103e17e [PowerPC] Initial VSX intrinsic support, with min/max for vector double
Now that we have initial support for VSX, we can begin adding
intrinsics for programmer access to VSX instructions.  This patch adds
basic support for VSX intrinsics in general, and tests it by
implementing intrinsics for minimum and maximum for the vector double
data type.

The LLVM portion of this is quite straightforward.  There is a
companion patch for Clang.

llvm-svn: 220988
2014-10-31 19:19:07 +00:00
Chad Rosier
14722ca1a2 [AArch64] Check Dest Register Liveness in CondOpt pass.
Our internal test reveals such case should not be transformed:

  cmp x17, #3
  b.lt .LBB10_15
  ...
  subs x12, x12, #1
  b.gt .LBB10_1

where x12 is a liveout, becomes:

  cmp x17, #2
  b.le .LBB10_15
  ...
  subs x12, x12, #2
  b.ge .LBB10_1

Unable to provide test case as it's difficult to reproduce on community branch.

http://reviews.llvm.org/D6048
Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>!

llvm-svn: 220987
2014-10-31 19:02:38 +00:00
Quentin Colombet
06167df4ad [CodeGenPrepare] Move extractelement close to store if they can be combined.
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote any scalar operations to vector operations in the
way to make that possible.


** Context **

Some targets use different register files for both vector and scalar operations.
This means that transitioning from one domain to another may incur copy from one
register file to another. These copies are not coalescable and may be expensive.
For example, according to the scheduling model, on cortex-A8 a vector to GPR
move is 20 cycles.


** Motivating Example **

Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
 %in1 = load <2 x i32>* %addr1, align 8
 %extract = extractelement <2 x i32> %in1, i32 1
 %out = or i32 %extract, 1
 store i32 %out, i32* %dest, align 4
 ret void
}

As it is, this IR generates the following assembly on armv7:
  vldr  d16, [r0]            @vector load  
  vmov.32 r0, d16[1]  @ cross-register-file copy: 20 cycles
  orr r0, r0, #1           @ scalar bitwise or
  str r0, [r1]               @ scalar store
  bx  lr

Whereas we could generate much faster code:
  vldr  d16, [r0]               @ vector load
  vorr.i32  d16, #0x1     @ vector bitwise or
  vst1.32 {d16[1]}, [r1:32] @ vector extract + store
  bx  lr

Half of the computation made in the vector is useless, but this allows to get
rid of the expensive cross-register-file copy.


** Proposed Solution **

To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends by a store and the
target is able to combine an extract with a store.

Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so, extracted as some point[1]. Moreover,
this is customary that targets feature stores that perform a vector extract (see
AArch64 and X86 for instance).

The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of details and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically in TargetTransformInfo, everything that is legal has a cost of 1,
whereas, even if a vector type is legal, usually a vector operation is slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counter balanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.

For now, the optimization is just enabled for ARM prior than v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.

[1] We can imagine targets that can combine an extractelement with  other
instructions than just stores. If we want to go into that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.

Differential Revision: http://reviews.llvm.org/D5921

<rdar://problem/14170854>

llvm-svn: 220978
2014-10-31 17:52:53 +00:00
Chad Rosier
9bc5bd5f9a [AArch64] CondOpt pass is missing FCMP instructions when searching backward for
a CMP which defines the flags used by B.CC.

http://reviews.llvm.org/D6047
Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>!

llvm-svn: 220961
2014-10-31 15:17:36 +00:00
Ulrich Weigand
5b86d1f937 [PowerPC] Load BlockAddress values from the TOC in 64-bit SVR4 code
Since block address values can be larger than 2GB in 64-bit code, they
cannot be loaded simply using an @l / @ha pair, but instead must be
loaded from the TOC, just like GlobalAddress, ConstantPool, and
JumpTable values are.

The commit also fixes a bug in PPCLinuxAsmPrinter::doFinalization where
temporary labels could not be used as TOC values, since code would
attempt (and fail) to use GetOrCreateSymbol to create a symbol of the
same name as the temporary label.

llvm-svn: 220959
2014-10-31 10:33:14 +00:00
Robert Khasanov
3e398a3800 [AVX512] Added VBROADCAST{SS/SD} encoding for VL subset.
Refactored through AVX512_maskable
        

llvm-svn: 220908
2014-10-30 14:21:47 +00:00
Robert Khasanov
b26f057244 [AVX512] Implemented AVX512VL FP bnary packed instructions (VADDP*, VSUBP*, VMULP*, VDIVP*, VMAXP*, VMINP*)
Refactored through AVX512_maskable
Added encoding tests for them.

llvm-svn: 220858
2014-10-29 15:43:02 +00:00
Robert Khasanov
4bea47e6eb [AVX512] Fix VSQRT packed instructions internal names.
No functional change

llvm-svn: 220808
2014-10-28 18:22:41 +00:00
Robert Khasanov
2ca56ad410 [AVX512] Extended avx512_sqrt_packed (sqrt instructions) to VL subset.
Refactored through AVX512_maskable

llvm-svn: 220806
2014-10-28 18:15:20 +00:00
Robert Khasanov
d134194df8 [AVX-512] Expanded rsqrt/rcp instructions to VL subset.
Refactored multiclass through AVX512_maskable

llvm-svn: 220783
2014-10-28 16:37:13 +00:00
Robert Khasanov
2b3eb6af20 [AVX512] Removed special case for cmp instructions in getVectorMaskingNode. Now cmp intrinsics lower as other intrinsics through VSELECT, and then VSELECT tranforms to AND in PerformSELECTCombine.
No functional change.

llvm-svn: 220779
2014-10-28 16:17:14 +00:00
Robert Khasanov
fa811056f3 [x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros.
This transformation worked if selector is produced by SETCC, however SETCC is needed only if we consider to swap operands. So I replaced SETCC check for this case.
Added tests for vselect of <X x i1> values.

llvm-svn: 220777
2014-10-28 15:59:40 +00:00
Robert Khasanov
a87de33c61 [AVX512] Bring back vector-shuffle lowering support through broadcasts
Ffter commit at rev219046 512-bit broadcasts lowering become non-optimal. Most of tests on broadcasting and embedded broadcasting were changed and they doesn’t produce efficient code.

Example below is from commit changes (it’s the first test from test/CodeGen/X86/avx512-vbroadcast.ll):

 define   <16 x i32> @_inreg16xi32(i32 %a) {
 ; CHECK-LABEL: _inreg16xi32:
 ; CHECK:       ## BB#0:
-; CHECK-NEXT:    vpbroadcastd %edi, %zmm0
+; CHECK-NEXT:    vmovd %edi, %xmm0
+; CHECK-NEXT:    vpbroadcastd %xmm0, %ymm0
+; CHECK-NEXT:    vinserti64x4 $1, %ymm0, %zmm0, %zmm0
 ; CHECK-NEXT:    retq
 %b = insertelement <16 x i32> undef, i32 %a, i32 0
 %c = shufflevector <16 x i32> %b, <16 x i32> undef, <16 x i32> zeroinitializer
 ret <16 x i32> %c
}

Here, 256-bit broadcast was generated instead of 512-bit one.

In this patch
1) I added vector-shuffle lowering through broadcasts
2) Removed asserts and branches likes because this is incorrect
-  assert(Subtarget->hasDQI() && "We can only lower v8i64 with AVX-512-DQI");
3) Fixed lowering tests

llvm-svn: 220774
2014-10-28 12:28:51 +00:00
Reid Kleckner
af3046bd9e X86: Implement the vectorcall calling convention
This is a Microsoft calling convention that supports both x86 and x86_64
subtargets. It passes vector and floating point arguments in XMM0-XMM5,
and passes them indirectly once they are consumed.

Homogenous vector aggregates of up to four elements can be passed in
sequential vector registers, but this part is not implemented in LLVM
and will be handled in Clang.

On 32-bit x86, it is similar to fastcall in that it uses ecx:edx as
integer register parameters and is callee cleanup. On x86_64, it
delegates to the normal win64 calling convention.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D5943

llvm-svn: 220745
2014-10-28 01:29:26 +00:00
Tim Northover
136fb499eb AArch64: enable Cortex-A57 FP balancing on Cortex-A53.
Benchmarks have shown that it's harmless to the performance there, and having a
unified set of passes between the two cores where possible helps big.LITTLE
deployment.

Patch by Z. Zheng.

llvm-svn: 220744
2014-10-28 01:24:32 +00:00
NAKAMURA Takumi
231973f544 AArch64InstrInfo.h: Fix a warning introduced in clang r220703. [-Winconsistent-missing-override]
llvm-svn: 220739
2014-10-27 23:29:27 +00:00
Adam Nemet
f35a0bba1b [AVX512] Add vpermil variable version
This is implemented via a multiclass that derives from the vperm imm
multiclass.

Fixes <rdar://problem/18426089>

llvm-svn: 220737
2014-10-27 23:08:40 +00:00
Adam Nemet
134480598b [AVX512] Clean up avx512_perm_imm to use X86VectorVTInfo
No functionality change.  No change in X86.td.expanded except that we only set
the CD8 attributes for the memory variants.  (This shouldn't be used unless we
have a memory operand.)

llvm-svn: 220736
2014-10-27 23:08:37 +00:00
Adam Nemet
8cf84f2568 [AVX512] Derive vpermil* from avx512_perm_imm
This used to derive from avx512_pshuf_imm which is confusing.

NFC.  Compared X86.td.expanded.

llvm-svn: 220735
2014-10-27 23:08:34 +00:00
Adam Nemet
78db8293e5 [AVX512] Fix copy-and-paste bugs in vpermil
1) i512mem -> f512mem (this is the packed FP input being permuted)
2) element size is 64 bits in EVEX_CD8 for PD.

(A good illustration why X86VectorVTInfo is useful)

llvm-svn: 220734
2014-10-27 23:08:31 +00:00
Pete Cooper
01b2132972 Fix a stackmap bug introduced in r220710.
For a call to not return in to the stackmap shadow, the shadow must end with the call.

To do this, we must insert any required nops *before* the call, and not after it.

llvm-svn: 220728
2014-10-27 22:38:45 +00:00
Juergen Ributzka
b3117b0b86 [FastISel][AArch64] Emit immediate version of icmp (subs) for null pointer check.
This is a minor change to use the immediate version when the operand is a null
value. This should get rid of an unnecessary 'mov' instruction in debug
builds and align the code more with the one generated by SelectionDAG.

This fixes rdar://problem/18785125.

llvm-svn: 220713
2014-10-27 19:58:36 +00:00
Juergen Ributzka
76c57b0570 [FastISel][AArch64] Optimize compare-and-branch for i1 to use 'tbz'.
Minor enhancement to use 'tbz' for i1 compare-and-branch to get rid of an 'and'
instruction.

This fixes rdar://problem/18784953.

llvm-svn: 220712
2014-10-27 19:46:23 +00:00
Pete Cooper
87efb91a50 Stackmap shadows should consider call returns a branch target.
To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets.

A return from a call should count as a branch target as patching over the instructions after the call would lead to incorrect behaviour for threads currently making that call, when they return.

llvm-svn: 220710
2014-10-27 19:40:35 +00:00
Juergen Ributzka
3423c5ccda [FastISel][AArch64] Use 'cbz' also for null values (pointers).
The pattern matching for a 'ConstantInt' value was too restrictive. Checking for
a 'Constant' with a bull value is sufficient for using an 'cbz/cbnz' instruction.

This fixes rdar://problem/18784732.

llvm-svn: 220709
2014-10-27 19:38:05 +00:00
Juergen Ributzka
64c2c99226 [FastISel][AArch64] Don't fold the 'and' instruction into the 'tbz/tbnz' instruction if it is in a different basic block.
This fixes a bug where the input register was not defined for the 'tbz/tbnz'
instruction. This happened, because we folded the 'and' instruction from a
different basic block.

This fixes rdar://problem/18784013.

llvm-svn: 220704
2014-10-27 19:16:48 +00:00
Juergen Ributzka
57783726dd [FastISel][AArch64] Fix load/store with frame indices.
At higher optimization levels the LLVM IR may contain more complex patterns for
loads/stores from/to frame indices. The 'computeAddress' function wasn't able to
handle this and triggered an assertion.

This fix extends the possible addressing modes for frame indices.

This fixes rdar://problem/18783298.

llvm-svn: 220700
2014-10-27 18:21:58 +00:00
Lang Hames
77d387a954 [PBQP] Unique allowed-sets for nodes in the PBQP graph and use pairs of these
sets as keys into a cache of interference matrice values in the Interference
constraint adder.

Creating interference matrices was one of the large remaining time-sinks in
PBQP. Caching them reduces the total compile time (when using PBQP) on the
nightly test suite by ~10%.

llvm-svn: 220688
2014-10-27 17:44:25 +00:00
NAKAMURA Takumi
a1ef3346aa Prune CRLF.
llvm-svn: 220678
2014-10-27 12:37:26 +00:00
Oliver Stannard
9a595b2769 [ARM] Select VMAXNM and VMINNM regardless of operand order
Currently, the ARM backend will select the VMAXNM and VMINNM for these C
expressions:
  (a < b) ? a : b
  (a > b) ? a : b
but not these expressions:
  (a > b) ? b : a
  (a < b) ? b : a

This patch allows all of these expressions to be matched.

llvm-svn: 220671
2014-10-27 09:23:02 +00:00
Yuri Gorshenin
0701551505 [asan-asm-instrumentation] Added comment describing how asm instrumentation works.
Summary: [asan-asm-instrumentation] Added comment describing how asm instrumentation works.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5970

llvm-svn: 220670
2014-10-27 08:38:54 +00:00
Elena Demikhovsky
467839f2f5 AVX-512: Fixed encoding of VPBROADCASTM and added SKX forms of this instruction
llvm-svn: 220638
2014-10-26 09:52:24 +00:00
Simon Pilgrim
c20b3d1fdd [X86][SSE] Vector integer/float conversion memory folding
Tidied up some entries in the folding tables so that they are under the correct comment section (they were categorised as AVX2 instructions when they're AVX1).

Minor patch agreed with qcolombet.

llvm-svn: 220613
2014-10-25 08:11:20 +00:00
Jingyue Wu
376aaf44c4 [NVPTX] aligned byte-buffers for vector return types
Summary:
Fixes PR21100 which is caused by inconsistency between the declared return type
and the expected return type at the call site. The new behavior is consistent
with nvcc and the NVPTXTargetLowering::getPrototype function.

Test Plan: test/Codegen/NVPTX/vector-return.ll

Reviewers: jholewinski

Reviewed By: jholewinski

Subscribers: llvm-commits, meheff, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D5612

llvm-svn: 220607
2014-10-25 03:46:16 +00:00
Kevin Enderby
9e33867d25 Fix a Mach-O assembler segfault for a subtraction expression with an undefined symbol.
In a Mach-O object file a relocatable expression of the form
SymbolA - SymbolB + constant is allowed when both symbols are
defined in a section.  But when either symbol is undefined it
is an error.

The code was crashing when it had an undefined symbol in this case.
And should have printed a error message using the location information
in the relocation entry.

rdar://18678402

llvm-svn: 220599
2014-10-24 22:39:40 +00:00
Simon Pilgrim
29b3d266a4 [X86][SSE] Bitcast assertion in XFormVExtractWithShuffleIntoLoad
Minor patch to fix an issue in XFormVExtractWithShuffleIntoLoad where a load is unary shuffled, then bitcast (to a type with the same number of elements) before extracting an element.

An undef was created for the second shuffle operand using the original (post-bitcasted) vector type instead of the pre-bitcasted type like the rest of the shuffle node - this was then causing an assertion on the different types later on inside SelectionDAG::getVectorShuffle.

Differential Revision: http://reviews.llvm.org/D5917

llvm-svn: 220592
2014-10-24 21:04:41 +00:00
Colin LeMahieu
a5b4b06680 [Hexagon] Resubmission of 220427
Modified library structure to deal with circular dependency between HexagonInstPrinter and HexagonMCInst.
Adding encoding bits for add opcode.
Adding llvm-mc tests.
Removing unit tests.

http://reviews.llvm.org/D5624

llvm-svn: 220584
2014-10-24 19:00:32 +00:00
Sanjay Patel
4e42d925c1 Allow AVX vrsqrtps generation.
This is a follow-on to r220570 that allows a 256-bit (v8f32)
version of vrsqrtps to be generated.

llvm-svn: 220579
2014-10-24 17:59:18 +00:00
Sanjay Patel
d9b7837012 Use rsqrt (X86) to speed up reciprocal square root calcs
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.

For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).

This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..

See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900

Differential Revision: http://reviews.llvm.org/D5658

llvm-svn: 220570
2014-10-24 17:02:16 +00:00
Daniel Sanders
3747982b4e [mips] Replace MipsABIEnum with a MipsABIInfo class.
Summary:
No functional change yet, it's just an object replacement for an enum.
It will allow us to gather ABI information in a single place so that we can
start testing for properties of the ABI's instead of the ABI itself.

For example we will eventually be able to use:
  ABI.MinStackAlignmentInBytes()
instead of:
  (isABI_N32() || isABI_N64()) ? 16 : 8
which is clearer and more maintainable.

Reviewers: matheusalmeida

Reviewed By: matheusalmeida

Differential Revision: http://reviews.llvm.org/D3341

llvm-svn: 220568
2014-10-24 16:15:27 +00:00
Daniel Sanders
656a7ed742 [mips] Fix >80-column line
llvm-svn: 220564
2014-10-24 14:46:00 +00:00
Daniel Sanders
5f05e5e050 [mips] Remove redundant code in RetCC_MipsN. NFC.
Summary:
i32 is always promoted to i64 so it no longer makes sense to assign i32 to
registers.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5964

llvm-svn: 220561
2014-10-24 13:49:54 +00:00
Daniel Sanders
d767694dc4 [mips] For N32/N64, structs must be passed in the upper bits of a register.
Summary:
Most structs were fixed by r218451 but those of between >32-bits and
<64-bits remained broken since they were not marked with [ASZ]ExtUpper.
This patch fixes the remaining cases by using
CCPromoteToUpperBitsInType<i64> on i64's in addition to i32 and smaller.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5963

llvm-svn: 220556
2014-10-24 13:09:19 +00:00
Oliver Stannard
4116b199b2 [AArch64] Fix fast-isel of cbz of i1, i8, i16
This fixes a miscompilation in the AArch64 fast-isel which was
triggered when a branch is based on an icmp with condition eq or ne,
and type i1, i8 or i16. The cbz instruction compares the whole 32-bit
register, so values with the bottom 1, 8 or 16 bits clear would cause
the wrong branch to be taken.

llvm-svn: 220553
2014-10-24 09:54:41 +00:00
Adam Nemet
01508c4733 [AVX512] FMA support for the 231 variants
This is asm/diasm-only support, similar to AVX.

For ISeling the register variant, they are no different from 213 other than
whether the multiplication or the addition operand is destructed.

For ISeling the memory variant, i.e. to fold a load, they are no different
than the 132 variant.  The addition operand (op3) in both cases can come from
memory.  Again the ony difference is which operand is destructed.

There could be a post-RA pass that would convert a 213 or 132 into a 231.

Part of <rdar://problem/17082571>

llvm-svn: 220540
2014-10-24 00:03:00 +00:00
Adam Nemet
3beded2b5d [AVX512] Introduce fma3p_forms from AVX
This multiclass generates the different forms: 213, 231, 132 in AVX.

132 in AVX512 is a separate class but I am planning to use this same
multiclass to generate 231 relying on the nice the null_frag trick from AVX to
disable codegen pattern for 231.

No functionality change, no change in X86.td.expanded except for the different
instruction definition names.

llvm-svn: 220539
2014-10-24 00:02:55 +00:00
Ahmed Bougacha
af95bf4558 [X86] Improve mul w/ overflow codegen, to MUL8+SETO.
Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where
3 are really needed.

This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to
MUL8/IMUL8 + SETO.

i8 is a special case because there is no two/three operand variants of
(I)MUL8, so the first operand and return value need to go in AL/AX.

Also, we can't write patterns for these instructions: TableGen refuses
patterns where output operands don't match SDNode results. In this case,
instructions where the output operand is an implicitly defined register.

A related special case (and FIXME) exists for MUL8 (X86InstrArith.td):

  // FIXME: Used for 8-bit mul, ignore result upper 8 bits.
  // This probably ought to be moved to a def : Pat<> if the
  // syntax can be accepted.
  [(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)]

Ideally, these go away with UMUL8, but we still need to improve TableGen
support of implicit operands in patterns.

Before this change:
  movsbl  %sil, %eax
  movsbl  %dil, %ecx
  imull   %eax, %ecx
  movb    %cl, %al
  sarb    $7, %al
  movzbl  %al, %eax
  movzbl  %ch, %esi
  cmpl    %eax, %esi
  setne   %al

After:
  movb    %dil, %al
  imulb   %sil
  seto    %al

Also, remove a made-redundant testcase for PR19858, and enable more FastISel
ALU-overflow tests for SelectionDAG too.

Differential Revision: http://reviews.llvm.org/D5809

llvm-svn: 220516
2014-10-23 21:55:31 +00:00
Renato Golin
cd30ecdc65 Do not emit intermediate register for zero FP immediate
This updates check for double precision zero floating point constant to allow
use of instruction with immediate value rather than temporary register.
Currently "a == 0.0", where "a" is of "double" type generates:

vmov.i32        d16, #0x0
vcmpe.f64       d0, d16

With this change it becomes:

vcmpe.f64        d0, #0

Patch by Sergey Dmitrouk.

llvm-svn: 220486
2014-10-23 15:31:50 +00:00
NAKAMURA Takumi
b1cc8f92d6 Hexagon/Disassembler/LLVMBuild.txt: Update libdeps.
llvm-svn: 220482
2014-10-23 11:32:16 +00:00
NAKAMURA Takumi
985e79f39f Hexagon/LLVMBuild.txt: Prune CRLF.
llvm-svn: 220481
2014-10-23 11:32:03 +00:00
NAKAMURA Takumi
0cdee77f2d [CMake] Prune CRLF in CMakeLists.txt(s).
llvm-svn: 220480
2014-10-23 11:31:50 +00:00
NAKAMURA Takumi
d8b493b106 Revert r220427, "[Hexagon] Adding encoding bits for add opcode."
It brought cyclic dependecy between HexagonAsmPrinter and HexagonDesc.

llvm-svn: 220478
2014-10-23 11:31:22 +00:00
Zoran Jovanovic
417b0378da [mips][microMIPS] Implement ADDIUR1SP instruction
Differential Revision: http://reviews.llvm.org/D5153

llvm-svn: 220477
2014-10-23 11:13:59 +00:00
Zoran Jovanovic
7d4ae91afe ps][microMIPS] Implement ADDIUR2 instruction
Differential Revision: http://reviews.llvm.org/D5151

llvm-svn: 220476
2014-10-23 11:06:34 +00:00
Zoran Jovanovic
c644ce2e43 ps][microMIPS] Implement LI16 instruction
Differential Revision: http://reviews.llvm.org/D5149

llvm-svn: 220475
2014-10-23 10:59:24 +00:00
Zoran Jovanovic
0172553d1e [mips][microMIPS] Implement CodeGen support for SLL16 and SRL16 instructions
Differential Revision: http://reviews.llvm.org/D5774

llvm-svn: 220474
2014-10-23 10:42:01 +00:00
Oliver Stannard
b9f2d13ec6 [Thumb2] Improve disassembly of memory hints
Currently, the ARM disassembler will disassemble the Thumb2 memory hint
instructions (PLD, PLDW and PLI), even for targets which do not have
these instructions. This patch adds the required checks to the
disassmebler.

llvm-svn: 220472
2014-10-23 08:52:58 +00:00
Akira Hatanaka
f1f0b1a1fd [ARM, stack protector] If supported, use armv7 instructions.
This commit enables using movt/movw to load the stack guard address:

movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8))
movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8))
ldr r0, [pc, r0]

Previously a pc-relative load was emitted:

ldr r0, LCPI0_0
ldr r0, [pc, r0]

rdar://problem/18740489

llvm-svn: 220470
2014-10-23 04:17:05 +00:00
Colin LeMahieu
88461f5b7a [Hexagon] Adding encoding bits for add opcode.
Adding llvm-mc tests.
Removing unit tests.

http://reviews.llvm.org/D5624

llvm-svn: 220427
2014-10-22 20:58:35 +00:00
Chad Rosier
19ef65a831 [AArch64] Add support for the .inst directive.
This has been implement using the MCTargetStreamer interface as is done in the
ARM, Mips and PPC backends.

Phabricator: http://reviews.llvm.org/D5891
PR20964

llvm-svn: 220422
2014-10-22 20:35:57 +00:00
Hans Wennborg
dbfab2bba5 Fix VS2012 build; C++11 type aliases are not supported.
llvm-svn: 220399
2014-10-22 17:47:49 +00:00
Colin LeMahieu
f359444940 Ammending 220393 - Removing unused decoding tables.
llvm-svn: 220397
2014-10-22 17:23:01 +00:00
Colin LeMahieu
05568816b4 Ammending 220393 - Removing unused functions.
llvm-svn: 220396
2014-10-22 17:03:19 +00:00
Bill Schmidt
1033a05374 [PATCH] Support select-cc for VSFRC when VSX is enabled
A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to
handle <2 x double> cases.  This patch adds SELECT_VSFRC and
SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the
f64 type when VSX is enabled.  The changes are analogous to those in
the previous patch.  I've added a new variant to vsx.ll to test the
code generation.

(I also cleaned up a little formatting in PPCInstrVSX.td from the
previous patch.)

llvm-svn: 220395
2014-10-22 16:58:20 +00:00
Colin LeMahieu
f6f8bdf091 [Hexagon] Adding basic disassembler.
Marking all instructions as CodeGenOnly since encoding bits are not set yet.
http://reviews.llvm.org/D5829?vs=on&id=15023&whitespace=ignore-all#toc

llvm-svn: 220393
2014-10-22 16:49:14 +00:00
Bill Schmidt
884633f291 [PowerPC] Support select-cc for VSX
The tests test/CodeGen/Generic/select-cc.ll and
test/CodeGen/PowerPC/select-cc.ll both fail with VSX enabled.  The
problem is that the lowering logic for the SELECT and SELECT_CC
operations doesn't currently support the VSX registers.  This patch
fixes that.

In lib/Target/PowerPC/PPCInstrInfo.td, we have pseudos to handle this
for other register classes.  Similar pseudos are added in
PPCInstrVSX.td (they must be there, because the "vsrc" register class
definition appears there) for the VSRC register class.  The
SELECT_VSRC pseudo is then used in pattern matching for SELECT_CC.

The rest of the patch just adds logic for SELECT_VSRC wherever similar
logic appears for SELECT_VRRC.

There are no new test cases because the existing tests above test
this, along with a variant in test/CodeGen/PowerPC/vsx.ll.

After discussion with Hal, a future patch will add similar _VSFRC
variants to override f64 type handling (currently using F8RC).

llvm-svn: 220385
2014-10-22 13:13:40 +00:00
Arnaud A. de Grandmaison
e8d641238b [AArch64] Cleanup A57PBQPConstraints
And add a long awaited testcase.

llvm-svn: 220381
2014-10-22 12:40:20 +00:00
Jyoti Allur
555583e57e [Thumb/Thumb2] Implement restrictions on SP in register list on LDM, STM variants in thumb mode
llvm-svn: 220379
2014-10-22 10:41:14 +00:00
Matt Arsenault
2257f6b589 Add minnum / maxnum codegen
llvm-svn: 220342
2014-10-21 23:01:01 +00:00
Matt Arsenault
98d33a4281 R600/SI: Add missing parameter to div_fmas intrinsic
llvm-svn: 220338
2014-10-21 22:20:55 +00:00
Matt Arsenault
f60479b756 R600: Use default GlobalDirective
The overridden one wasn't inserting a space,
so you would end up with .globalfoo

llvm-svn: 220329
2014-10-21 21:08:36 +00:00
Arnaud A. de Grandmaison
73624b6ac4 [PBQP] Teach PassConfig to tell if the default register allocator is used.
This enables targets to adapt their pass pipeline to the register
allocator in use. For example, with the AArch64 backend, using PBQP
with the cortex-a57, the FPLoadBalancing pass is no longer necessary.

llvm-svn: 220321
2014-10-21 20:47:22 +00:00
Rafael Espindola
f1e8d6839e Drop support for an old version of ld64 (from darwin 9).
llvm-svn: 220310
2014-10-21 18:31:09 +00:00
Matt Arsenault
bfd13cde1b R600/SI: Add pattern for bswap
llvm-svn: 220304
2014-10-21 16:25:08 +00:00
NAKAMURA Takumi
2ca8461092 X86AsmInstrumentation.cpp: Dissolve initializer-ranged-for. MSC17 disliked it.
llvm-svn: 220301
2014-10-21 16:22:52 +00:00
Colin LeMahieu
0a3ea08c1c Test commit
Fixing brief comment.

llvm-svn: 220299
2014-10-21 16:03:10 +00:00
Bill Schmidt
b654c2bb89 [PowerPC] Avoid VSX FMA mutate when killed product reg = addend reg
With VSX enabled, test/CodeGen/PowerPC/recipest.ll exposes a bug in
the FMA mutation pass.  If we have a situation where a killed product
register is the same register as the FMA target, such as:

   %vreg5<def,tied1> = XSNMSUBADP %vreg5<tied0>, %vreg11, %vreg5,
                       %RM<imp-use>; VSFRC:%vreg5 F8RC:%vreg11 

then the substitution makes no sense.  We end up getting a crash when
we try to extend the interval associated with the killed product
register, as there is already a live range for %vreg5 there.  This
patch just disables the mutation under those circumstances.

Since recipest.ll generates different code with VMX enabled, I've
modified that test to use -mattr=-vsx.  I've borrowed the code from
that test that exposed the bug and placed it in fma-mutate.ll, where
it tests several mutation opportunities including the "bad" one.

llvm-svn: 220290
2014-10-21 13:02:37 +00:00
Oliver Stannard
ecd544ccbd [ARM] NEON 32-bit scalar moves are also available in VFPv2
The 32-bit variants of the NEON scalar<->GPR move instructions are
also available in VFPv2. The 8- and 16-bit variants do require NEON.

Note that the checks in the test file are all -DAG because they are
checking a mixture of stdout and stderr, and the ordering is not
guaranteed.

llvm-svn: 220288
2014-10-21 11:49:14 +00:00
Yuri Gorshenin
a8f1a7e5da [asan-asm-instrumentation] Fixed memory accesses with rbp as a base or an index register.
Summary: Fixed memory accesses with rbp as a base or an index register.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5819

llvm-svn: 220283
2014-10-21 10:22:27 +00:00
Oliver Stannard
70943b7fa8 [Thumb2] LDRS?[BH] cannot load to the PC
The Thumb2 LDRS?[BH] instructions are not valid when the destination
register is the PC (these encodings are used for preload hints).

llvm-svn: 220278
2014-10-21 09:14:15 +00:00
Zoran Jovanovic
5e356e74e3 [mips][microMIPS] Implement ADDU16 and SUBU16 instructions
Differential Revision: http://reviews.llvm.org/D5118

llvm-svn: 220276
2014-10-21 08:44:58 +00:00
Zoran Jovanovic
26b6fdd712 [mips][microMIPS] Implement AND16, NOT16, OR16 and XOR16 instructions
Differential Revision: http://reviews.llvm.org/D5117

llvm-svn: 220275
2014-10-21 08:32:40 +00:00
Zoran Jovanovic
7982f485af [mips][microMIPS] Implement microMIPS 16-bit instructions registers
Differential Revision: http://reviews.llvm.org/D5116

llvm-svn: 220273
2014-10-21 08:23:11 +00:00
Rafael Espindola
6ffbd5bf5d Fix a bit of confusion about .set and produce more readable assembly.
Every target we support has support for assembly that looks like

a = b - c
.long a

What is special about MachO is that the above combination suppresses the
production of a relocation.

With this change we avoid producing the intermediary labels when they don't
add any value.

llvm-svn: 220256
2014-10-21 01:17:30 +00:00
Quentin Colombet
1691b0c568 [X86] Fix a bug in the lowering of the mask of VSELECT.
X86 code to lower VSELECT messed a bit with the bits set in the mask of VSELECT
when it knows it can be lowered into BLEND. Indeed, only the high bits need to be
set for those and it optimizes those accordingly.
However, when the mask is a compile time constant, the lowering will be handled
by the generic optimizer and those modifications will generate bad code in the
generic optimizer.

This patch fixes that by preventing the optimization if the VSELECT will be
handled by the generic optimizer.

<rdar://problem/18675020>

llvm-svn: 220242
2014-10-20 23:13:30 +00:00
Simon Pilgrim
0ccb373260 [X86] Memory folding for commutative instructions (updated)
This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning.

Updated version of r219584 (reverted in r219595) - the commutation attempt now explicitly ensures that neither of the commuted source operands are tied to the destination operand / register, which was the source of all the regressions that occurred with the original patch attempt.

Added additional regression test case provided by Joerg Sonnenberger.

Differential Revision: http://reviews.llvm.org/D5818

llvm-svn: 220239
2014-10-20 22:14:22 +00:00
Tim Northover
7a41a526ce ARM: rework Thumb1 frame index rewriting
The previous code had a few problems, motivating the choices here.

1. It could create instructions clobbering CPSR, but the incoming MachineInstr
   didn't reflect this. A potential source of corruption. This is why the patch
   has a new PseudoInst for before lowering.
2. Similarly, there was some code to handle the incoming instruction not being
   ARMCC::AL, but this would have caused massive problems if it was actually
   invoked when a complex offset needing more than one instruction was requested.
3. It wasn't designed to handle unaligned pointers (or offsets). These should
   probably be minimised anyway, but the code needs to deal with them properly
   regardless.
4. It had some rather dubious ad-hoc code to avoid calling
   emitThumbRegPlusImmediate, a function which should be designed to do precisely
   this job.

We seem to cover the common cases correctly now, and hopefully can enhance
emitThumbRegPlusImmediate to handle any extra optimisations we need to add in
future.

llvm-svn: 220236
2014-10-20 21:28:41 +00:00
Oliver Stannard
4a54d08d15 [Thumb2] RFE, SRS and "SUBS pc, lr" are undefined on v7M
These instructions are related to the v7[AR] exception model, and are
not defined on v7M.

llvm-svn: 220204
2014-10-20 15:37:35 +00:00
Sid Manning
66f7ab4998 Remove unnecessary else.
llvm-svn: 220200
2014-10-20 13:08:19 +00:00
Oliver Stannard
23d7518ec9 [ARM] Do not select SMULW[BT] or SMLAW[BT]
The current instruction selection patterns for SMULW[BT] and SMLAW[BT]
are incorrect. These instructions multiply a 32-bit and a 16-bit value
(both signed) and return the top 32 bits of the 48-bit result. This
preserves the 16 bits of overflow, whereas the patterns they currently
match truncate the result to 16 bits then sign extend.

To select these instructions, we would need to match an ISD::SMUL_LOHI,
a sign extend, two shifts and an or. There is no way to match SMUL_LOHI
in an instruction pattern as it defines multiple values, so this would
have to be done in C++. I have raised
http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct
selection of these instructions.

This fixes http://llvm.org/bugs/show_bug.cgi?id=19396

llvm-svn: 220196
2014-10-20 11:30:35 +00:00
Oliver Stannard
3b66d91575 [Thumb] Fix crash in Thumb1RegisterInfo::rewriteFrameIndex
This function can, for some offsets from the SP, split one instruction
into two. Since it re-uses the original instruction as the first
instruction of the result, we need ensure its result register is not
marked as dead before we use it in the second instruction.

llvm-svn: 220194
2014-10-20 11:00:18 +00:00
Bob Wilson
113df60c0c Use triple predicate functions instead of checking values directly. NFC.
llvm-svn: 220155
2014-10-19 00:39:30 +00:00
Aaron Watry
ea88a00c76 R600/SI: Add global atomicrmw xchg
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220110
2014-10-17 23:33:03 +00:00
Aaron Watry
d852df8bcb R600/SI: Add global atomicrmw xor
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220109
2014-10-17 23:33:01 +00:00
Aaron Watry
5f5371bf76 R600/SI: Add global atomicrmw or
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220108
2014-10-17 23:32:59 +00:00
Aaron Watry
d94b6445ca R600/SI: Add global atomicrmw min/umin
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220107
2014-10-17 23:32:57 +00:00
Aaron Watry
0c7ec3cff8 R600/SI: Add global atomicrmw max/umax
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220106
2014-10-17 23:32:56 +00:00
Aaron Watry
f93e43a3b6 R600/SI: Add global atomicrmw and
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220105
2014-10-17 23:32:54 +00:00
Aaron Watry
ad528e135c R600/SI: Add global atomicrmw sub
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220104
2014-10-17 23:32:52 +00:00
Bill Schmidt
4290f0e275 [PowerPC] Change assert to better form
llvm-svn: 220092
2014-10-17 21:19:59 +00:00
Matt Arsenault
8a07439402 R600/SI: Remove redundant setting of instruction bits
These are all set on the instruction base classes.

llvm-svn: 220091
2014-10-17 21:13:11 +00:00
Bill Schmidt
3e4d52c10c [PowerPC] Change liveness testing in VSX FMA mutation pass
With VSX enabled, LLVM crashes when compiling
test/CodeGen/PowerPC/fma.ll.  I traced this to the liveness test
that's revised in this patch. The interval test is designed to only
work for virtual registers, but in this case the AddendSrcReg is
physical. Since there is already a walk of the MIs between the
AddendMI and the FMA, I added a check for def/kill of the AddendSrcReg
in that loop.  At Hal Finkel's request, I converted the liveness test
to an assert restricted to virtual registers.

I've changed the fma.ll test to have VSX and non-VSX variants so we
can test both kinds of multiply-adds.

llvm-svn: 220090
2014-10-17 21:02:44 +00:00
Matt Arsenault
58e20fe65d Fix typo
llvm-svn: 220068
2014-10-17 18:02:31 +00:00
Matt Arsenault
b82c7f40b2 R600/SI: Also check for FPImm literal constants
llvm-svn: 220067
2014-10-17 18:00:50 +00:00
Matt Arsenault
9be7600d7c R600/SI: Allow commuting with source modifiers
llvm-svn: 220066
2014-10-17 18:00:48 +00:00
Matt Arsenault
2d0382bc47 R600/SI: Simplify code with hasModifiersSet
llvm-svn: 220065
2014-10-17 18:00:45 +00:00
Matt Arsenault
9c6d84b515 R600/SI: Fix general commuting breaking src mods
The generic code trying to use findCommutedOpIndices won't
understand that it needs to swap the modifier operands also,
so it should fail if they are set.

llvm-svn: 220064
2014-10-17 18:00:43 +00:00
Matt Arsenault
c43bde1e40 R600/SI: Cleanup code with ChangeToFPImmediate
llvm-svn: 220063
2014-10-17 18:00:41 +00:00
Matt Arsenault
69078d03ff R600/SI: Allow comuting fp immediates
llvm-svn: 220062
2014-10-17 18:00:39 +00:00
Matt Arsenault
082244cff4 R600/SI: Use early return instead of checking condition twice
Any commutable instruction will have at least src1.

llvm-svn: 220061
2014-10-17 18:00:37 +00:00
Matt Arsenault
0b41e1680b R600/SI: Use complex pattern for MUBUF load patterns.
This eliminates a use of the SI_ADDR64_RSRC pseudo

llvm-svn: 220057
2014-10-17 17:43:00 +00:00
Matt Arsenault
3ce8aba254 R600/SI: Remove SI_BUFFER_RSRC pseudo
Just use REG_SEQUENCE directly, so there are fewer
instructions to need to deal with later.

llvm-svn: 220056
2014-10-17 17:42:56 +00:00
Andrea Di Biagio
7d3fa43e58 [X86] Fix missed selection of non-temporal store of zero vector.
When the input to a store instruction was a zero vector, the backend
always selected a normal vector store regardless of the non-temporal
hint. This is fixed by this patch.

This fixes PR19370.

llvm-svn: 220054
2014-10-17 17:27:06 +00:00
James Molloy
b06648acfb [AArch64] Fix a silent codegen fault in BUILD_VECTOR lowering.
We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same.

While we're at it, avoid writing to std::vector::end()...

Bug found with random testing and a lot of coffee.

llvm-svn: 220051
2014-10-17 17:06:31 +00:00
Bill Schmidt
d3f8b7e4eb [PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation
Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64
types, but does not yet use lxvw4x and stxvw4x for 4x32 types.  This
patch adds that support.

As with lxvd2x/stxvd2x, this involves straightforward overriding of
the patterns normally recognized for lvx/stvx, with preference given
to the VSX patterns when VSX is enabled.

In addition, the logic for permitting misaligned memory accesses is
modified so that v4r32 and v4i32 are treated the same as v2f64 and
v2i64 when VSX is enabled.  Finally, the DAG generation for unaligned
loads is changed to just use a normal LOAD (which will become lxvw4x)
on P8 and later hardware, where unaligned loads are preferred over
lvsl/lvx/lvx/vperm.

A number of tests now generate the VSX loads/stores instead of
lvx/stvx, so this patch adds VSX variants to those tests.  I've also
added <4 x float> tests to the vsx.ll test case, and created a
vsx-p8.ll test case to be used for testing code generation for the
P8Vector feature.  For now, that simply tests the unaligned load/store
behavior.

This has been tested along with a temporary patch to enable the VSX
and P8Vector features, with no new regressions encountered with or
without the temporary patch applied.

llvm-svn: 220047
2014-10-17 15:13:38 +00:00
Jan Vesely
1949cffd79 Mips: Only set divrem i64 to custom on 64bit
Reviewed-by: Daniel Sanders <daniel.sanders@imgtec.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220046
2014-10-17 14:45:28 +00:00
Vasileios Kalintiris
86aabae168 [mips] Add support for COP1's Branch-On-Cond-Likely instructions
Summary: Depends on D5782

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5802

llvm-svn: 220042
2014-10-17 14:08:28 +00:00
Vasileios Kalintiris
f0e37a4687 [mips] Add support for COP0's Branch-On-Cond-Likely instructions
Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5782

llvm-svn: 220036
2014-10-17 12:38:35 +00:00
Akira Hatanaka
1c4225a09b ARM: Fix a bug which was causing convergence failure in constant-island pass.
The bug is in ARMConstantIslands::createNewWater where the upper bound of the
new water split point is computed:

// This could point off the end of the block if we've already got constant
// pool entries following this block; only the last one is in the water list.
// Back past any possible branches (allow for a conditional and a maximally
// long unconditional).
if (BaseInsertOffset + 8 >= UserBBI.postOffset()) {
  BaseInsertOffset = UserBBI.postOffset() - UPad - 8;
  DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset));
}

The split point is supposed to be somewhere between the machine instruction that
loads from the constant pool entry and the end of the basic block, before branch
instructions. The code above is fine if the basic block is large enough and
there are a sufficient number of instructions following the machine instruction.
However, if the machine instruction is near the end of the basic block,
BaseInsertOffset can point to the machine instruction or another instruction
that precedes it, and this can lead to convergence failure.

This commit fixes this bug by ensuring BaseInsertOffset is larger than the
offset of the instruction following the constant-loading instruction.

rdar://problem/18581150

llvm-svn: 220015
2014-10-17 01:31:47 +00:00
Matt Arsenault
cb1558abf3 R600/SI: Simplify debug printing
llvm-svn: 219999
2014-10-17 00:36:20 +00:00
Matt Arsenault
f84bb3f382 R600/SI: Remove another VALU pattern
llvm-svn: 219988
2014-10-16 23:33:37 +00:00
Robin Morisset
8dc41d55aa Erase fence insertion from SelectionDAGBuilder.cpp (NFC)
Summary:
Backends can use setInsertFencesForAtomic to signal to the middle-end that
montonic is the only memory ordering they can accept for
stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger
ordering to fences + monotonic accesses is currently living in
SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it
for several reasons:
- There is lots of redundancy to avoid: extremely similar logic already
  exists in AtomicExpand.
- The current code in SelectionDAGBuilder does not use any target-hooks, it
  does the same transformation for every backend that requires it
- As a result it is plain *unsound*, as it was apparently designed for ARM.
  It happens to mostly work for the other targets because they are extremely
  conservative, but Power for example had to switch to AtomicExpand to be
  able to use lwsync safely (see r218331).
- Because it produces IR-level fences, it cannot be made sound ! This is noted
  in the C++11 standard (section 29.3, page 1140):
```
Fences cannot, in general, be used to restore sequential consistency for atomic
operations with weaker ordering semantics.
```
It can also be seen by the following example (called IRIW in the litterature):
```
atomic<int> x = y = 0;
int r1, r2, r3, r4;
Thread 0:
  x.store(1);
Thread 1:
  y.store(1);
Thread 2:
  r1 = x.load();
  r2 = y.load();
Thread 3:
  r3 = y.load();
  r4 = x.load();
```
r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst.
But if they are lowered to monotonic accesses, no amount of fences can prevent it..

This patch does three things (I could cut it into parts, but then some of them
would not be tested/testable, please tell me if you would prefer that):
- it provides a default implementation for emitLeadingFence/emitTrailingFence in
terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder.
As we saw above, this is unsound, but the best that can be done without knowing
the targets well (and there is a comment warning about this risk).
- it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default
implementation (that exactly replicates the logic of SelectionDAGBuilder, so no
functional change)
- it finally erase this logic from SelectionDAGBuilder as it is dead-code.

Ideally, each target would define its own override for emitLeading/TrailingFence
using target-specific fences, but I do not know the Sparc/Mips/XCore memory model
well enough to do this, and they appear to be dealing fine with the ARM-inspired
default expansion for now (probably because they are overly conservative, as
Power was). If anyone wants to compile fences more agressively on these
platforms, the long comment should make it clear why he should first override
emitLeading/TrailingFence.

Test Plan: make check-all, no functional change

Reviewers: jfb, t.p.northover

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5474

llvm-svn: 219957
2014-10-16 20:34:57 +00:00
Matt Arsenault
ecbe5b08b8 R600/SI: Remove unnecessary VALU patterns
These haven't been necessary since allowing
selecting SALU instructions in non-entry blocks
was enabled.

llvm-svn: 219956
2014-10-16 20:31:50 +00:00
Matt Arsenault
75125bd463 R600: Fix nonsensical implementation of computeKnownBits for BFE
This was resulting in invalid simplifications of sdiv

llvm-svn: 219953
2014-10-16 20:07:40 +00:00
Rafael Espindola
253081a9b6 Delete -std-compile-opts.
These days -std-compile-opts was just a silly alias for -O3.

llvm-svn: 219951
2014-10-16 20:00:02 +00:00
Juergen Ributzka
3743a4c0d2 [AArch64] Fix miscompile of sdiv-by-power-of-2.
When the constant divisor was larger than 32bits, then the optimized code
generated for the AArch64 backend would emit the wrong code, because the shift
was defined as a shift of a 32bit constant '(1<<Lg2(divisor))' and we would
loose the upper 32bits.

This fixes rdar://problem/18678801.

llvm-svn: 219934
2014-10-16 16:41:15 +00:00
Vasileios Kalintiris
5822b72dd9 [mips] Account for endianess when expanding BuildPairF64/ExtractElementF64 nodes.
Summary:
In order to support big endian targets for the BuildPairF64 nodes we
just need to swap the low/high pair registers. Additionally, for the
ExtractElementF64 nodes we have to calculate the correct stack offset
with respect to the node's register/operand that we want to extract.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5753

llvm-svn: 219931
2014-10-16 15:41:51 +00:00
Vasileios Kalintiris
8d39330886 [mips] Marked the DI/EI instruction aliases as MIPS32r2
Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5751

llvm-svn: 219927
2014-10-16 15:23:52 +00:00
Vasileios Kalintiris
b62fa9afef Test commit access: remove extra new line at the end of file
llvm-svn: 219925
2014-10-16 14:37:00 +00:00
Matt Arsenault
c79fc2137a R600: Remove dead function
llvm-svn: 219879
2014-10-16 00:08:09 +00:00