1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 21:13:02 +02:00
Commit Graph

30713 Commits

Author SHA1 Message Date
Tom Roeder
f8bc1a9968 Add Forward Control-Flow Integrity.
This commit adds a new pass that can inject checks before indirect calls to
make sure that these calls target known locations. It supports three types of
checks and, at compile time, it can take the name of a custom function to call
when an indirect call check fails. The default failure function ignores the
error and continues.

This pass incidentally moves the function JumpInstrTables::transformType from
private to public and makes it static (with a new argument that specifies the
table type to use); this is so that the CFI code can transform function types
at call sites to determine which jump-instruction table to use for the check at
that site.

Also, this removes support for jumptables in ARM, pending further performance
analysis and discussion.

Review: http://reviews.llvm.org/D4167
llvm-svn: 221708
2014-11-11 21:08:02 +00:00
Sanjay Patel
54a084ae08 Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385).
This is a first step for generating SSE rcp instructions for reciprocal
calcs when fast-math allows it. This is very similar to the rsqrt optimization
enabled in D5658 ( http://reviews.llvm.org/rL220570 ).

For now, be conservative and only enable this for AMD btver2 where performance
improves significantly both in terms of latency and throughput.

We may never enable this codegen for Intel Core* chips because the divider circuits
are just too fast. On SandyBridge, divss can be as fast as 10 cycles versus the 21
cycle critical path for the rcp + mul + sub + mul + add estimate.

Follow-on patches may allow configuration of the number of Newton-Raphson refinement
steps, add AVX512 support, and enable the optimization for more chips.

More background here: http://llvm.org/bugs/show_bug.cgi?id=21385

Differential Revision: http://reviews.llvm.org/D6175

llvm-svn: 221706
2014-11-11 20:51:00 +00:00
Bill Schmidt
2b1221e06b [PowerPC] Replace foul hackery with real calls to __tls_get_addr
My original support for the general dynamic and local dynamic TLS
models contained some fairly obtuse hacks to generate calls to
__tls_get_addr when lowering a TargetGlobalAddress.  Rather than
generating real calls, special GET_TLS_ADDR nodes were used to wrap
the calls and only reveal them at assembly time.  I attempted to
provide correct parameter and return values by chaining CopyToReg and
CopyFromReg nodes onto the GET_TLS_ADDR nodes, but this was also not
fully correct.  Problems were seen with two back-to-back stores to TLS
variables, where the call sequences ended up overlapping with unhappy
results.  Additionally, since these weren't real calls, the proper
register side effects of a call were not recorded, so clobbered values
were kept live across the calls.

The proper thing to do is to lower these into calls in the first
place.  This is relatively straightforward; see the changes to
PPCTargetLowering::LowerGlobalTLSAddress() in PPCISelLowering.cpp.
The changes here are standard call lowering, except that we need to
track the fact that these calls will require a relocation.  This is
done by adding a machine operand flag of MO_TLSLD or MO_TLSGD to the
TargetGlobalAddress operand that appears earlier in the sequence.

The calls to LowerCallTo() eventually find their way to
LowerCall_64SVR4() or LowerCall_32SVR4(), which call FinishCall(),
which calls PrepareCall().  In PrepareCall(), we detect the calls to
__tls_get_addr and immediately snag the TargetGlobalTLSAddress with
the annotated relocation information.  This becomes an extra operand
on the call following the callee, which is expected for nodes of type
tlscall.  We change the call opcode to CALL_TLS for this case.  Back
in FinishCall(), we change it again to CALL_NOP_TLS for 64-bit only,
since we require a TOC-restore nop following the call for the 64-bit
ABIs.

During selection, patterns in PPCInstrInfo.td and PPCInstr64Bit.td
convert the CALL_TLS nodes into BL_TLS nodes, and convert the
CALL_NOP_TLS nodes into BL8_NOP_TLS nodes.  This replaces the code
removed from PPCAsmPrinter.cpp, as the BL_TLS or BL8_NOP_TLS
nodes can now be emitted normally using their patterns and the
associated printTLSCall print method.

Finally, as a result of these changes, all references to get-tls-addr
in its various guises are no longer used, so they have been removed.

There are existing TLS tests to verify the changes haven't messed
anything up).  I've added one new test that verifies that the problem
with the original code has been fixed.

llvm-svn: 221703
2014-11-11 20:44:09 +00:00
Rafael Espindola
d10e16c7f3 Use a 8 bit immediate when possible.
This fixes pr21529.

llvm-svn: 221700
2014-11-11 19:46:36 +00:00
Dario Domizioli
138d3c8b1e [X86][ELF] Fix PR20243 - leaf frame pointer bug with TLS access
The ISel lowering for global TLS access in PIC mode was creating a pseudo 
instruction that is later expanded to a call, but the code was not 
setting the hasCalls flag in the MachineFrameInfo alongside the adjustsStack 
flag. This caused some functions to be mistakenly recognized as leaf functions,
and this in turn affected the decision to eliminate the frame pointer.

With the fix, hasCalls is properly set and the leaf frame pointer is correctly
preserved.

llvm-svn: 221695
2014-11-11 18:44:49 +00:00
Vasileios Kalintiris
d0bab7c3e7 [mips] Add preliminary support for the MIPS II target.
Summary:
This patch enables code generation for the MIPS II target. Pre-Mips32
targets don't have the MUL instruction, so we add the correspondent
pattern that uses the MULT/MFLO combination in order to retrieve the
product.

This is WIP as we don't support code generation for select nodes due to
the lack of conditional-move instructions.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6150

llvm-svn: 221686
2014-11-11 11:43:55 +00:00
Vasileios Kalintiris
9d62a870f7 [mips] Add hardware register name "hwr_ulr" ($29)
The canonical name when printing assembly is still $29. The reason is that
GAS does not accept "$hwr_ulr" at the moment.

This addresses the comments from r221307, which reverted the original
commit r221299.

llvm-svn: 221685
2014-11-11 11:22:39 +00:00
Andrea Di Biagio
8872877147 [X86] Add missing check for 'isINSERTPSMask' in method 'isShuffleMaskLegal'.
This helps the DAGCombiner to identify more opportunities to fold shuffles.

llvm-svn: 221684
2014-11-11 11:20:31 +00:00
Vasileios Kalintiris
88cc5e39d4 Recommit "[mips] Add names and tests for the hardware registers"
The original commit r221299 was reverted in r221307.  I removed the name
"hrw_ulr" ($29) from the original commit because two tests were failing.

llvm-svn: 221681
2014-11-11 10:31:31 +00:00
Craig Topper
b267ba2986 Use uint64_t as the type for the X86 TSFlag format enum. Allows removal of the VEXShift hack that was used to access the higher bits of TSFlags.
llvm-svn: 221673
2014-11-11 07:32:32 +00:00
Michael Kuperstein
cd941e090d [X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext
This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details.
Recommitting - This time, with a hopefully working test.

Differential Revision: http://reviews.llvm.org/D6128

llvm-svn: 221672
2014-11-11 07:07:40 +00:00
Jingyue Wu
3d5e891a7e [NVPTX] Remove dead code in NVPTXTargetTransformInfo (NFC)
llvm-svn: 221668
2014-11-11 05:24:04 +00:00
Rafael Espindola
476a83ecd4 MCAsmParserExtension has a copy of the MCAsmParser. Use it.
Base classes were storing a second copy.

llvm-svn: 221667
2014-11-11 05:18:41 +00:00
Quentin Colombet
6564d569c9 [X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if
AVX2 is available.
According to IACA, the new lowering has a throughput of 8 cycles instead of 13
with the previous one.

Althought this lowering kicks in some SPECs benchmarks, the performance
improvement was within the noise.

Correctness testing has been done for the whole range of uint32_t with the
following program:
    uint4 v = (uint4) {0,1,2,3};
    uint32_t i;
    
    //Check correctness over entire range for uint4 -> float4 conversion
    for( i = 0; i < 1U << (32-2); i++ )
    {
        float4 t = test(v);
        float4 c = correct(v);
        
        if( 0xf != _mm_movemask_ps( t == c ))
        {
            printf( "Error @ %vx: %vf vs. %vf\n", v, c, t);
            return -1;
        }
        
        v += 4;
    }
Where "correct" is the old lowering and "test" the new one.

The patch adds a test case for the two custom lowering instruction.
It also modifies the vector cost model, which is why cast.ll and uitofp.ll are
modified.
2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7
instructions instead of 4 (3 more constant loads).

rdar://problem/18153096>

llvm-svn: 221657
2014-11-11 02:23:47 +00:00
Michael Kuperstein
c9f353294c Reverting r221626 due to a too-strict test.
llvm-svn: 221629
2014-11-10 21:07:41 +00:00
Juergen Ributzka
500ec86b5d [AArch64][FastISel] Fix kill flags for integer extends.
In the case we optimize an integer extend away and replace it directly with the
source register, we also have to clear all kill flags at all its uses.
This is necessary, because the orignal IR instruction might be trivially dead,
but we replaced it with a nop at MI level.

llvm-svn: 221628
2014-11-10 21:05:31 +00:00
Michael Kuperstein
db34b4e22e [X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext
This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits.
See PR20494 for details.

Differential Revision: http://reviews.llvm.org/D6128

llvm-svn: 221626
2014-11-10 20:40:21 +00:00
Jingyue Wu
74a580e42d [NVPTX] Add an NVPTX-specific TargetTransformInfo
Summary:
It currently only implements hasBranchDivergence, and will be extended
in later diffs.

Split from D6188.

Test Plan: make check-all

Reviewers: jholewinski

Reviewed By: jholewinski

Subscribers: llvm-commits, meheff, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D6195

llvm-svn: 221619
2014-11-10 18:38:25 +00:00
Rafael Espindola
8cb53479d4 Misc style fixes. NFC.
This fixes a few cases of:

* Wrong variable name style.
* Lines longer than 80 columns.
* Repeated names in comments.
* clang-format of the above.

This make the next patch a lot easier to read.

llvm-svn: 221615
2014-11-10 18:11:10 +00:00
Zoran Jovanovic
557d63136e [mips][microMIPS] Fix issue with delay slot filler and microMIPS
Differential Revision: http://reviews.llvm.org/D6193

llvm-svn: 221612
2014-11-10 17:27:56 +00:00
Daniel Sanders
141b848f86 [mips] Fix sret arguments for N32/N64 which were accidentally broken in r221534.
llvm-svn: 221604
2014-11-10 15:57:53 +00:00
Matt Arsenault
fe816ec6cd R600: Remove unused define
llvm-svn: 221543
2014-11-07 20:45:00 +00:00
Daniel Sanders
a0cd02ace9 [mips] Promote i32 arguments to i64 for the N32/N64 ABI and fix <64-bit structs...
Summary:
... and after all that refactoring, it's possible to distinguish softfloat
floating point values from integers so this patch no longer breaks softfloat to
do it.

Remove direct handling of i32's in the N32/N64 ABI by promoting them to
i64. This more closely reflects the ABI documentation and also fixes
problems with stack arguments on big-endian targets.

We now rely on signext/zeroext annotations (already generated by clang) and
the Assert[SZ]ext nodes to avoid the introduction of unnecessary sign/zero
extends.

It was not possible to convert three tests to use signext/zeroext. These tests
are bswap.ll, ctlz-v.ll, ctlz-v.ll. It's not possible to put signext on a
vector type so we just accept the sign extends here for now. These tests don't
pass the vectors the same way clang does (clang puts multiple elements in the
same argument, these map 1 element to 1 argument) so we don't need to worry too
much about it.

With this patch, all known N32/N64 bugs should be fixed and we now pass the
first 10,000 tests generated by ABITest.py.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6117

llvm-svn: 221534
2014-11-07 16:54:21 +00:00
Daniel Sanders
412a9e70ae [mips] Removed the remainder of MipsCC. NFC.
Summary:
One of the calls to AllocateStack (the one in LowerCall) doesn't look like
it should be there but it was there before and removing it breaks the
frame size calculation.

Reviewers: vmedic, theraven

Reviewed By: theraven

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6116

llvm-svn: 221529
2014-11-07 15:33:08 +00:00
Daniel Sanders
5bfeec4042 [mips] Remove MipsCC::reservedArgArea() in favour of MipsABIInfo::GetCalleeAllocdArgSizeInBytes(). NFC.
Summary:

Reviewers: theraven, vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6115

llvm-svn: 221528
2014-11-07 15:03:53 +00:00
NAKAMURA Takumi
092d3a0319 MipsCCState.h: Use LLVM_DELETED_FUNCTION for msc17.
llvm-svn: 221527
2014-11-07 14:56:31 +00:00
Daniel Sanders
a45773664f [mips] Move MipsCCState to a separate file and clang-formatted it.
Summary: Depends on D6113

Reviewers: theraven, vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6114

llvm-svn: 221525
2014-11-07 14:24:31 +00:00
Daniel Sanders
2b361cb67c [mips] Fix unused variable warnings introduced in r221521
llvm-svn: 221522
2014-11-07 12:43:01 +00:00
Daniel Sanders
9db1197993 [mips] Remove remaining use of MipsCC::intArgRegs() in favour of MipsABIInfo::GetByValArgRegs() and MipsABIInfo::GetVarArgRegs()
Summary: Depends on D6112

Reviewers: theraven, vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6113

llvm-svn: 221521
2014-11-07 12:21:37 +00:00
Daniel Sanders
0e05d1ee29 [mips] Remove MipsCC::getRegVT(). NFC
Summary: It's no longer used.

Reviewers: vmedic, theraven

Reviewed By: theraven

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6112

llvm-svn: 221519
2014-11-07 12:02:59 +00:00
Daniel Sanders
1b0b5063fe [mips] Remove MipsCC::analyzeCallOperands in favour of CCState::AnalyzeCallOperands. NFC
Summary:
In addition to the usual f128 workaround, it was also necessary to provide
a means of accessing ArgListEntry::IsFixed.

Reviewers: theraven, vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6111

llvm-svn: 221518
2014-11-07 11:43:49 +00:00
Daniel Sanders
95f569ae9e [mips] Move SpecialCallingConv to MipsCCState and use it from tablegen-erated code. NFC
Summary:
In the long run, it should probably become a calling convention in its own
right but for now just move it out of
MipsISelLowering::analyzeCallOperands() so that we can drop this function
in favour of CCState::AnalyzeCallOperands().

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6085

llvm-svn: 221517
2014-11-07 11:10:48 +00:00
Daniel Sanders
1e7ac2a9c7 [mips] Removed IsVarArg from MipsISelLowering::analyzeCallOperands(). NFC.
Summary:
CCState objects already carry this information in their isVarArg() method.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6084

llvm-svn: 221516
2014-11-07 10:45:16 +00:00
Ahmed Bougacha
b887a994f2 [AArch64] Keep flags on condition vreg when instantiating a CB branch.
Reversing a CB* instruction used to drop the flags on the condition. On the
included testcase, this lead to a read from an undefined vreg.
Using addOperand keeps the flags, here <undef>.

Differential Revision: http://reviews.llvm.org/D6159

llvm-svn: 221507
2014-11-07 02:50:00 +00:00
Simon Pilgrim
e2d60b3127 [X86][SSE] Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq)
Fixed an issue with the (v)cvttps2dq and (v)cvttpd2dq instructions being incorrectly put in the 2 source operand folding tables instead of the 1 source operand and added the missing SSE/AVX versions.

Also added missing (v)cvtps2dq and (v)cvtpd2dq instructions to the folding tables.

Differential Revision: http://reviews.llvm.org/D6001

llvm-svn: 221489
2014-11-06 22:15:41 +00:00
Ahmed Bougacha
d69ce819ea [X86] Add VFMADDSUB cases for the 213->231 custom inserter.
Also add tests for vfmadd/vfmsub.

llvm-svn: 221488
2014-11-06 22:04:15 +00:00
Ahmed Bougacha
97c6b57f51 [X86] Add missing FMA3 VFMADDSUB in the emitter.
Also reuse the fma4 intrinsic test to cover fma3 instructions too.

llvm-svn: 221487
2014-11-06 21:58:11 +00:00
Colin LeMahieu
02f427b6fb [Hexagon] Adding basic Hexagon ELF object emitter.
llvm-svn: 221465
2014-11-06 17:05:51 +00:00
Eli Bendersky
39bb0a448e Clean up NVPTXLowerStructArgs.cpp. NFC
* Remove unnecessary const_casts and C-style casts
* Simplify attribute access code
* Simplify ArrayRef creation
* 80-col and clang-format

llvm-svn: 221464
2014-11-06 17:05:49 +00:00
Daniel Sanders
427838b573 [mips] Removed IsSoftFloat from MipsISelLowering::analyzeCallOperands(). NFC
Summary:
It isn't used anymore.

Depends on D6081

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6083

llvm-svn: 221463
2014-11-06 16:48:57 +00:00
Daniel Sanders
c33a78ead1 [mips] Removed MipsISelLowering::analyzeFormalArguments() in favour of CCState::AnalyzeFormalArguments()
Summary:
As with returns, we must be able to identify f128 arguments despite them
being lowered away. We do this with a pre-analyze step that builds a
vector and then we use this vector from the tablegen-erated code.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6081

llvm-svn: 221461
2014-11-06 16:36:30 +00:00
Andrea Di Biagio
a697bfdcaf [X86] When commuting SSE immediate blend, make sure that the new blend mask is a valid imm8.
Example:
define <4 x i32> @test(<4 x i32> %a, <4 x i32> %b) {
  %shuffle = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 4, i32 5, i32 6, i32 3>
  ret <4 x i32> %shuffle
}

Before llc (-mattr=+sse4.1), produced the following assembly instruction:
  pblendw $4294967103, %xmm1, %xmm0

After
  pblendw $63, %xmm1, %xmm0

llvm-svn: 221455
2014-11-06 14:36:45 +00:00
Aaron Ballman
b7915a3ee5 Fixing some -Wcast-qual warnings; NFC.
llvm-svn: 221454
2014-11-06 14:32:30 +00:00
Toma Tabacu
8e2e22ba28 [mips] Tolerate the use of the %z inline asm operand modifier with non-immediates.
Summary:
Currently, we give an error if %z is used with non-immediates, instead of continuing as if the %z isn't there.

For example, you use the %z operand modifier along with the "Jr" constraints ("r" makes the operand a register, and "J" makes it an immediate, but only if its value is 0). 
In this case, you want the compiler to print "$0" if the inline asm input operand turns out to be an immediate zero and you want it to print the register containing the operand, if it's not.

We give an error in the latter case, and we shouldn't (GCC also doesn't).

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6023

llvm-svn: 221453
2014-11-06 14:25:42 +00:00
Sasa Stankovic
e20257b78e [mips] Add the following MIPS options that control gp-relative addressing of
small data items: -mgpopt, -mlocal-sdata, -mextern-sdata. Implement gp-relative
addressing for constants.

Differential Revision: http://reviews.llvm.org/D4903

llvm-svn: 221450
2014-11-06 13:20:12 +00:00
Toma Tabacu
5d123a7fbb [mips] Improve error/warning messages and testing for the .cpload assembler directive.
Summary:
Improved warning message when using .cpload inside a reorder section and added an error message for using .cpload with Mips16 enabled.
Modified the tests to fit with the changes mentioned above, added a test-case for the N32 ABI in cpload.s and did some reformatting to make the tests easier to read.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5465

llvm-svn: 221447
2014-11-06 10:02:45 +00:00
David Majnemer
8c2f8688d2 X86, MC: Tidy up some whitespace in GetRelocType
No functionality change intended.

llvm-svn: 221443
2014-11-06 08:10:37 +00:00
Quentin Colombet
d01225e402 [X86] Lower VSELECT into SHRUNKBLEND when we shrink the bits used into the
condition to match a blend.
This prevents optimizations that work on VSELECT to perform invalid
transformations. Indeed, the optimized condition does not match the vector
boolean content that is expected and bad things may happen.

This patch yields the exact same code on the whole test-suite + specs (-O3 and
-O3 -march=core-avx2), it improves one test case (vector-blend.ll) and fixes a
bug reduced in vselect-avx.ll.

<rdar://problem/18819506>

llvm-svn: 221429
2014-11-06 02:25:03 +00:00
Simon Pilgrim
c2b395ce18 [X86][SSE] Vector integer to float conversion memory folding
Added missing memory folding for the (V)CVTDQ2PS instructions - we can safely fold these (but not the (V)CVTDQ2PD versions which have a register/memory size discrepancy in the source operand). I've added a test case demonstrating that stack folding now works.

Differential Revision: http://reviews.llvm.org/D5981

llvm-svn: 221407
2014-11-05 22:28:25 +00:00
Matt Arsenault
92d039adc4 R600/SI: Fix omod display for VOP3b
llvm-svn: 221387
2014-11-05 19:35:00 +00:00
Derek Schuff
2c22caff55 [x86 fast-isel] Materialize allocas with the correct-sized lea for ILP32
Summary:
X86FastISel::fastMaterializeAlloca was incorrectly conditioning its
opcode selection on subtarget bitness rather than pointer size.

Differential Revision: http://reviews.llvm.org/D6136

llvm-svn: 221386
2014-11-05 19:27:21 +00:00
Matt Arsenault
3bdbaa2185 R600/SI: Move all rsrc building functions to SIISelLowering
llvm-svn: 221383
2014-11-05 19:01:19 +00:00
Matt Arsenault
37abb7d7e6 R600/SI: Remove SI_ADDR64_RSRC
llvm-svn: 221382
2014-11-05 19:01:17 +00:00
Justin Holewinski
c8bdbc6d01 [NVPTX] Add NVPTXLowerStructArgs pass
This works around the limitation that PTX does not allow .param space
loads/stores with arbitrary pointers.

If a function has a by-val struct ptr arg, say foo(%struct.x *byval %d), then
add the following instructions to the first basic block :

%temp = alloca %struct.x, align 8
%tt1 = bitcast %struct.x * %d to i8 *
%tt2 = llvm.nvvm.cvt.gen.to.param %tt2
%tempd = bitcast i8 addrspace(101) * to %struct.x addrspace(101) *
%tv = load %struct.x addrspace(101) * %tempd
store %struct.x %tv, %struct.x * %temp, align 8

The above code allocates some space in the stack and copies the incoming
struct from param space to local space. Then replace all occurences of %d
by %temp.

Fixes PR21465.

llvm-svn: 221377
2014-11-05 18:19:30 +00:00
Duncan P. N. Exon Smith
fb1e9e11dd IR: MDNode => Value: NamedMDNode::getOperator()
Change `NamedMDNode::getOperator()` from returning `MDNode *` to
returning `Value *`.  To reduce boilerplate at some call sites, add a
`getOperatorAsMDNode()` for named metadata that's expected to only
return `MDNode` -- for now, that's everything, but debug node named
metadata (such as llvm.dbg.cu and llvm.dbg.sp) will soon change.  This
is part of PR21433.

Note that there's a follow-up patch to clang for the API change.

llvm-svn: 221375
2014-11-05 18:16:03 +00:00
Tilmann Scheller
aa24ed47f6 [ARM] Remove more dead code.
Dead code identified by the Clang static analyzer.

llvm-svn: 221372
2014-11-05 17:45:04 +00:00
Zoran Jovanovic
f79dde9378 ps][microMIPS] Implement CodeGen support for ANDI16 instruction
llvm-svn: 221371
2014-11-05 17:43:00 +00:00
Colin LeMahieu
f858df62bd [Hexagon] [NFC] Alphabetizing cmake files.
llvm-svn: 221370
2014-11-05 17:38:48 +00:00
Zoran Jovanovic
a0eca4a912 ps][microMIPS] Implement CodeGen support for SLL16 and SRL16 instructions
llvm-svn: 221369
2014-11-05 17:38:31 +00:00
Tilmann Scheller
d42fb07871 [ARM] Remove another redundant assignment.
Found by the Clang static analyzer.

llvm-svn: 221368
2014-11-05 17:34:04 +00:00
Zoran Jovanovic
3aa8010f59 [mips][microMIPS] Implement ANDI16 instruction
llvm-svn: 221367
2014-11-05 17:31:00 +00:00
Tilmann Scheller
9602608981 [ARM] Remove redundant assignment.
Found by the Clang static analyzer.

llvm-svn: 221366
2014-11-05 17:28:19 +00:00
Tilmann Scheller
98cc58a81f [ARM] Remove dead code identified by the Clang static analyzer.
llvm-svn: 221358
2014-11-05 17:10:43 +00:00
Zoran Jovanovic
cc26c757fc [mips][microMIPS] Mark symbols as microMIPS if necessary
Differential Revision: http://reviews.llvm.org/D6039

llvm-svn: 221355
2014-11-05 16:35:20 +00:00
Zoran Jovanovic
9298c57c2a Reverted revisions 221351, 221352 and 221353.
llvm-svn: 221354
2014-11-05 16:19:59 +00:00
Zoran Jovanovic
59ff702bee [mips][microMIPS] Implement CodeGen support for ANDI16 instruction
Differential Revision: http://reviews.llvm.org/D5797

llvm-svn: 221353
2014-11-05 15:54:05 +00:00
Zoran Jovanovic
f62372a286 [mips][microMIPS] Implement CodeGen support for SLL16 and SRL16 instructions
Differential Revision: http://reviews.llvm.org/D5933

llvm-svn: 221352
2014-11-05 15:46:53 +00:00
Zoran Jovanovic
d61600192c [mips][microMIPS] Implement ANDI16 instruction
Differential Revision: http://reviews.llvm.org/D5163

llvm-svn: 221351
2014-11-05 15:39:41 +00:00
Tom Stellard
0384d3478f R600/SI: Change all instruction assembly names to lowercase.
This matches the format produced by the AMD proprietary driver.

//==================================================================//
// Shell script for converting .ll test cases: (Pass the .ll files
   you want to convert to this script as arguments).
//==================================================================//

; This was necessary on my system so that A-Z in sed would match only
; upper case.  I'm not sure why.
export LC_ALL='C'

TEST_FILES="$*"

MATCHES=`grep -v Patterns SIInstructions.td | grep -o '"[A-Z0-9_]\+["e]' | grep -o '[A-Z0-9_]\+' | sort -r`

for f in $TEST_FILES; do
  # Check that there are SI tests:
  grep -q -e 'verde' -e 'bonaire' -e 'SI' -e 'tahiti' $f
  if [ $? -eq 0 ]; then
    for match in $MATCHES; do
      sed -i -e "s/\([ :]$match\)/\L\1/" $f
    done

    # Try to get check lines with partial instruction names
    sed -i 's/\(;[ ]*SI[A-Z\\-]*: \)\([A-Z_0-9]\+\)/\1\L\2/' $f
  fi
done

sed -i -e 's/bb0_1/BB0_1/g' ../../../test/CodeGen/R600/infinite-loop.ll
sed -i -e 's/SI-NOT: bfe/SI-NOT: {{[^@]}}bfe/g'../../../test/CodeGen/R600/llvm.AMDGPU.bfe.*32.ll ../../../test/CodeGen/R600/sext-in-reg.ll
sed -i -e 's/exp_IEEE/EXP_IEEE/g' ../../../test/CodeGen/R600/llvm.exp2.ll
sed -i -e 's/numVgprs/NumVgprs/g' ../../../test/CodeGen/R600/register-count-comments.ll
sed -i 's/\(; CHECK[-NOT]*: \)\([A-Z_0-9]\+\)/\1\L\2/' ../../../test/CodeGen/R600/select64.ll ../../../test/CodeGen/R600/sgpr-copy.ll

//==================================================================//
// Shell script for converting .td files (run this last)
//==================================================================//

export LC_ALL='C'
sed -i -e '/Patterns/!s/\("[A-Z0-9_]\+[ "e]\)/\L\1/g' SIInstructions.td
sed -i -e 's/"EXP/"exp/g' SIInstrInfo.td

llvm-svn: 221350
2014-11-05 14:50:53 +00:00
Andrea Di Biagio
38de0c92c6 [X86] Teach method 'isVectorClearMaskLegal' how to check for legal blend masks.
This patch improves the folding of vector AND nodes into blend operations for
targets that feature SSE4.1. A vector AND node where one of the operands is
a constant build_vector with elements that are either zero or all-ones can be
converted into a blend.

This allows for example to simplify the following code:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
  %2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
  %3 = or <4 x i32> %1, %2
  ret <4 x i32> %3
}

Before this patch llc (-mcpu=corei7) generated:
        andps  LCPI1_0(%rip), %xmm0, %xmm0
        andps  LCPI1_1(%rip), %xmm1, %xmm1
        orps   %xmm1, %xmm0, %xmm0
        retq

With this patch we generate a single 'vpblendw'.

llvm-svn: 221343
2014-11-05 13:04:14 +00:00
Oliver Stannard
cd482a4c4e [ARM] Honor FeatureD16 in the assembler and disassembler
Some ARM FPUs only have 16 double-precision registers, rather than the
normal 32. LLVM represents this with the D16 target feature. This is
currently used by CodeGen to avoid using high registers when they are
not available, but the assembler and disassembler do not.

I fix this in the assmebler and disassembler rather than the
InstrInfo.td files, as the latter would require a large number of
changes everywhere one of the floating-point instructions is referenced
in the backend. This solution is similar to the one used for
co-processor numbers and MSR masks.

llvm-svn: 221341
2014-11-05 12:06:39 +00:00
Tim Northover
b0edeb50ba ARM: try to add extra CS-register whenever stack alignment >= 8.
We currently try to push an even number of registers to preserve 8-byte
alignment during a function's prologue, but only when the stack alignment is
prcisely 8. Many of the reasons for doing it apply also when that alignment > 8
(the extra store is often free, and can save another stack adjustment, though
less frequently for 16-byte stack alignment).

llvm-svn: 221321
2014-11-05 00:27:20 +00:00
Tim Northover
ea6751aca3 ARM/Dwarf: correctly align stack before callee-saved VPRs
We were making an attempt to do this by adding an extra callee-saved GPR (so
that there was an even number in the list), but when that failed we went ahead
and pushed anyway.

This had a couple of potential issues:
  + The .cfi directives we emit misplaced dN because they were based on
    PrologEpilogInserter's calculation.
  + Unaligned stores can be less efficient.
  + Unaligned stores can actually fault (likely only an issue in niche cases,
    but possible).

This adds a final explicit stack adjustment if all other options fail, so that
the actual locations of the registers match up with where they should be.

llvm-svn: 221320
2014-11-05 00:27:13 +00:00
Simon Pilgrim
c7629829ea [X86][SSE] Enable commutation for SSE immediate blend instructions
Patch to allow (v)blendps, (v)blendpd, (v)pblendw and vpblendd instructions to be commuted - swaps the src registers and inverts the blend mask.

This is primarily to improve memory folding (see new tests), but it also improves the quality of shuffles (see modified tests).

Differential Revision: http://reviews.llvm.org/D6015

llvm-svn: 221313
2014-11-04 23:25:08 +00:00
Juergen Ributzka
dc5e1808f1 [AArch64] Use the correct register class for ORR.
While fixing up the register classes in the machine combiner in a previous
commit I missed one.

This fixes the last one and adds a test case.

llvm-svn: 221308
2014-11-04 22:20:07 +00:00
Rafael Espindola
9c149d6662 Revert "[mips] Add names and tests for the hardware registers"
This reverts commit r221299.

The tests

    LLVM :: MC/Disassembler/Mips/mips32.txt
    LLVM :: MC/Disassembler/Mips/mips32_le.txt

were failing.

llvm-svn: 221307
2014-11-04 22:15:05 +00:00
Vasileios Kalintiris
79193bf819 [mips] Move COP2 & COP3 load/store instructions from MipsInstrFPU.td to MipsInstrInfo.td. NFC.
Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5843

llvm-svn: 221300
2014-11-04 21:45:16 +00:00
Vasileios Kalintiris
2d9f60f771 [mips] Add names and tests for the hardware registers
Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5763

llvm-svn: 221299
2014-11-04 21:30:44 +00:00
Andrea Di Biagio
1f25a26da7 [X86] Add 'FeatureSlowSHLD' to cpu 'bdver3'. Also explicit set FeatureAVX and FeatureSSE4A for all the bdver* cpus.
This patch adds 'FeatureSlowSHLD' to 'bdver3'.
According to the official AMD optimization guide for amdfam15: "Using
alternative code in place of SHLD achieves lower overall latency and
requires fewer execution resources. The 32-bit and 64-bit forms of
ADD, ADC, SHR, and LEA (except 16-bit form) are DirectPath
instructions, while SHLD is a VectorPath instruction."

This patch also explicitly sets feature AVX and SSE4A for all the bdver*
cpus. This part of the patch is a non-functional change and it is mainly
done for clarity reasons (Both XOP and FMA4 already imply AVX and SSE4A).

llvm-svn: 221296
2014-11-04 21:18:09 +00:00
Matt Arsenault
8d0302bad1 R600/SI: Rename div_scale dest operands to match documentation
llvm-svn: 221291
2014-11-04 20:29:20 +00:00
Benjamin Kramer
b612f063c8 AArch64: Pattern match integer vector abs like we do on ARM.
This kind of pattern is emitted by the loop vectorizer.

llvm-svn: 221289
2014-11-04 20:10:06 +00:00
Toma Tabacu
93fe1e2fb1 [mips] Improve support for the .set mips16/nomips16 assembler directives.
Summary:
Appropriately set/clear the FeatureBit for Mips16 when these assembler directives are used and also emit ".set nomips16" (previously, only ".set mips16" was being emitted).

These improvements allow for better testing of the .cpload/.cprestore assembler directives (which are not supposed to work when Mips16 is enabled).

Test Plan: The test is bare-bones because there are no MC tests for Mips16 instructions (there's only one, which checks that the Mips16 ELF header flag gets set), and that suggests to me that it has not been implemented yet in the IAS.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5462

llvm-svn: 221277
2014-11-04 17:18:07 +00:00
NAKAMURA Takumi
d354a0d376 R600/LLVMBuild.txt: Add TransformUtils.
llvm-svn: 221228
2014-11-04 02:16:53 +00:00
Colin LeMahieu
f34b00f529 [Hexagon] Reverting 220584 to address ASAN errors.
llvm-svn: 221210
2014-11-04 00:14:36 +00:00
Akira Hatanaka
5a78b496e9 Rename variables to conform to llvm coding standards.
Differential Revision: http://reviews.llvm.org/D6062

llvm-svn: 221204
2014-11-03 23:24:10 +00:00
Akira Hatanaka
0619ecdf1b [AArch64] Make function processLogicalImmediate more efficient. NFC.
llvm-svn: 221199
2014-11-03 23:06:31 +00:00
Ahmed Bougacha
bc21735f09 [X86] Add debug print name for X86ISD::[US]MUL8. NFC-ish.
The opcodes were added in r220516, but I forgot to add the print names.

llvm-svn: 221185
2014-11-03 21:25:18 +00:00
Akira Hatanaka
e7d08f497e [ARM, inline-asm] Fix ARMTargetLowering::getRegForInlineAsmConstraint to return
register class tGPRRegClass if the target is thumb1.

This commit fixes a crash that occurs during register allocation which was
triggered when a virtual register defined by an inline-asm instruction had to
be spilled.
 
rdar://problem/18740489

llvm-svn: 221178
2014-11-03 20:37:04 +00:00
Ahmed Bougacha
f2a8134721 [X86] 8bit divrem: Improve codegen for AH register extraction.
For 8-bit divrems where the remainder is used, we used to generate:
    divb  %sil
    shrw  $8, %ax
    movzbl  %al, %eax

That was to avoid an H-reg access, which is problematic mainly because
it isn't possible in REX-prefixed instructions.

This patch optimizes that to:
    divb  %sil
    movzbl  %ah, %eax

To do that, we explicitly extend AH, and extract the L-subreg in the
resulting register.  The extension is done using the NOREX variants of
MOVZX.  To support signed operations, MOVSX_NOREX is also added.
Further, this introduces a new SDNode type, [us]divrem_ext_hreg, which is
then lowered to a sequence containing a single zext (rather than 2).

Differential Revision: http://reviews.llvm.org/D6064

llvm-svn: 221176
2014-11-03 20:26:35 +00:00
Tom Stellard
5b15714b76 Reapply: R600: Make sure to inline all internal functions
Function calls aren't supported yet.

This was reverted due to build breakages, which should be fixed now.

llvm-svn: 221173
2014-11-03 19:49:05 +00:00
Duncan P. N. Exon Smith
8f49c8202f IR: MDNode => Value: Instruction::getAllMetadataOtherThanDebugLoc()
Change `Instruction::getAllMetadataOtherThanDebugLoc()` from a vector of
`MDNode` to one of `Value`.  Part of PR21433.

llvm-svn: 221167
2014-11-03 18:13:57 +00:00
Charlie Turner
05675977c5 Remove the cortex-a9-mp CPU.
This CPU definition is redundant. The Cortex-A9 is defined as
supporting multiprocessing extensions. Remove its definition and
update appropriate tests.

LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only
difference between the two CPU definitions in ARM.td is that
cortex-a9-mp contains the feature FeatureMP for multiprocessing
extensions.

This is redundant since the Cortex-A9 is defined as having
multiprocessing extensions in the TRMs. armcc also defines the
Cortex-A9 as having multiprocessing extensions by default.

Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7
llvm-svn: 221166
2014-11-03 17:38:00 +00:00
Oliver Stannard
2c73f413f7 [AArch64] Fix miscompile of comparison with 0xffffffffffffffff
Some literals in the AArch64 backend had 15 'f's rather than 16, causing
comparisons with a constant 0xffffffffffffffff to be miscompiled.

llvm-svn: 221157
2014-11-03 15:28:40 +00:00
Sid Manning
f60eba6543 Handle ctor/init_array initialization.
Hexagon was not calling InitializeELF and could not select between
ctors and init_array.

Phabricator revision: http://reviews.llvm.org/D6061

llvm-svn: 221156
2014-11-03 14:56:05 +00:00
Daniel Sanders
b694a1f8a6 [mips] Remove unused prototype and variable. NFC.
llvm-svn: 221146
2014-11-03 10:14:57 +00:00
Matt Arsenault
8491fb66a7 R600: Don't unnecessarily repeat the register class
llvm-svn: 221119
2014-11-02 23:46:59 +00:00
Matt Arsenault
89859004b1 R600/SI: Use REG_SEQUENCE instead of INSERT_SUBREGs
llvm-svn: 221118
2014-11-02 23:46:54 +00:00
Matt Arsenault
1838bf2925 Support REG_SEQUENCE in tablegen.
The problem is mostly that variadic output instruction
aren't handled, so it is rejected for having an inconsistent
number of operands, and then the right number of operands
isn't emitted.

llvm-svn: 221117
2014-11-02 23:46:51 +00:00
Daniel Sanders
913fc0e3df Re-commit r221056 and others with fix, "[mips] Move F128 argument handling into MipsCCState as we did for returns. NFC."
sret arguments can never originate from an f128 argument so we detect
sret arguments and push false into OriginalArgWasF128.

llvm-svn: 221102
2014-11-02 16:09:29 +00:00
NAKAMURA Takumi
67a64a5c44 Revert r221056 and others, "[mips] Move F128 argument handling into MipsCCState as we did for returns. NFC."
r221056 "[mips] Move F128 argument handling into MipsCCState as we did for returns. NFC."
  r221058 "[mips] Fix unused variable warning introduced in r221056"
  r221059 "[mips] Move all ByVal handling into CCState and tablegen-erated code. NFC."
  r221061 "Renamed CCState members that appear to misspell 'Processed' as 'Proceed'. NFC."

It cuased an undefined behavior in LLVM :: CodeGen/Mips/return-vector.ll.

llvm-svn: 221081
2014-11-02 04:43:54 +00:00