1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00
Commit Graph

39863 Commits

Author SHA1 Message Date
Javed Absar
90db73a094 [ARM] Assign cost of scaling for Cortex-R52
This patch assigns cost of the scaling used in addressing for Cortex-R52.

On Cortex-R52 a negated register offset takes longer than a non-negated
register offset, in a register-offset addressing mode.

Differential Revision: http://reviews.llvm.org/D25670

Reviewer: jmolloy
llvm-svn: 284460
2016-10-18 09:08:54 +00:00
Simon Pilgrim
293c1a3deb [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.

This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.

It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.

Differential Revision: https://reviews.llvm.org/D23808

llvm-svn: 284459
2016-10-18 07:42:15 +00:00
Dean Michael Berris
b135433036 [XRay] Support for for tail calls for ARM no-Thumb
This patch adds simplified support for tail calls on ARM with XRay instrumentation.

Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall
-std=c++14  -ffunction-sections -fdata-sections` (this list doesn't include my
specific flags like --target=armv7-linux-gnueabihf etc.), the following program

    #include <cstdio>
    #include <cassert>
    #include <xray/xray_interface.h>

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() {
      std::printf("In fC()\n");
    }

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() {
      std::printf("In fB()\n");
      fC();
    }

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() {
      std::printf("In fA()\n");
      fB();
    }

    // Avoid infinite recursion in case the logging function is instrumented (so calls logging
    //   function again).
    [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret)
    {
      printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
    }

    int main(int argc, char* argv[]) {
      __xray_set_handler(simplyPrint);

      printf("Patching...\n");
      __xray_patch();
      fA();

      printf("Unpatching...\n");
      __xray_unpatch();
      fA();

      return 0;
    }

gives the following output:

    Patching...
    XRay: functionId=3 type=0.
    In fA()
    XRay: functionId=3 type=1.
    XRay: functionId=2 type=0.
    In fB()
    XRay: functionId=2 type=1.
    XRay: functionId=1 type=0.
    XRay: functionId=1 type=1.
    In fC()
    Unpatching...
    In fA()
    In fB()
    In fC()

So for function fC() the exit sled seems to be called too much before function
exit: before printing In fC().

Debugging shows that the above happens because printf from fC is also called as
a tail call. So first the exit sled of fC is executed, and only then printf is
jumped into. So it seems we can't do anything about this with the current
approach (i.e. within the simplification described in
https://reviews.llvm.org/D23988 ).

Differential Revision: https://reviews.llvm.org/D25030

llvm-svn: 284456
2016-10-18 05:54:15 +00:00
Craig Topper
f805b093b4 [X86] Fix DecodeVPERMVMask to handle cases where the constant pool entry has a different type than the shuffle itself.
This is especially important for 32-bit targets with 64-bit shuffle elements.

llvm-svn: 284453
2016-10-18 04:48:33 +00:00
Craig Topper
a71d800155 [AVX-512] Fix DecodeVPERMV3Mask to handle cases where the constant pool entry has a different type than the shuffle itself.
Summary: This is especially important for 32-bit targets with 64-bit shuffle elements.This is similar to how PSHUFB and VPERMIL handle the same problem.

Reviewers: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25666

llvm-svn: 284451
2016-10-18 04:00:32 +00:00
Craig Topper
baaf1b7511 [AVX-512] Add support for decoding shuffle mask from constant pool for masked VPERMILPS/PD.
llvm-svn: 284450
2016-10-18 03:36:52 +00:00
Konstantin Zhuravlyov
f1c9f3143b [AMDGPU] Mark .note section SHF_ALLOC so lld creates a segment for it
Differential Revision: https://reviews.llvm.org/D25694

llvm-svn: 284435
2016-10-17 22:40:15 +00:00
Tim Northover
2bc7209c52 GlobalISel: support wider range of load/store sizes in AArch64.
llvm-svn: 284406
2016-10-17 18:36:53 +00:00
Tom Stellard
e57c681680 AMDGPU/SI: LowerParameter() should be computing align based on memory type
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25203

llvm-svn: 284398
2016-10-17 16:56:19 +00:00
Tom Stellard
a4414c957a AMDGPU/SI: Fix LowerParameter() for i16 arguments
Summary:
If we are loading an i16 value from a 32-bit memory location, then
we need to be able to truncate the loaded value to i16.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25198

llvm-svn: 284397
2016-10-17 16:21:45 +00:00
Craig Topper
4538af9fb0 [X86] Fix shuffle decoding assertions to print the right number of required operands. Update the checks themselves to be >= to the same number instead of > one less than the required number.
llvm-svn: 284365
2016-10-17 06:41:18 +00:00
Craig Topper
0b7e675d9c [AVX-512] Add shuffle combining support for vpermi2var shuffles derived from existing support for vpermt2var.
llvm-svn: 284357
2016-10-17 04:26:47 +00:00
Craig Topper
6621b37696 [AVX-512] Add support for turning a 256-bit load that goes to both halfs of an insert_subvector into a subvector broadcast.
Differential Revision: https://reviews.llvm.org/D25650

llvm-svn: 284353
2016-10-16 23:29:51 +00:00
Craig Topper
de787f6ced [AVX-512] Fix the operand order for vpermi2var_qi intrinsics to match the other vpermi2var intrinsics.
llvm-svn: 284329
2016-10-16 04:54:35 +00:00
Craig Topper
fc3bffdd20 [AVX-512] Correct execution domain for VPERMT2PS and VPERMI2PS.
llvm-svn: 284328
2016-10-16 04:54:31 +00:00
Craig Topper
a6d8c5e30e [AVX-512] Move (v4i64 (X86SubVBroadcast (v2i64))) alternate patterns under a HasVLX predicate. Similar for floating point.
llvm-svn: 284327
2016-10-16 04:54:26 +00:00
Davide Italiano
a92b11f81e [ArmFastISel] Kill dead code. NFCI.
llvm-svn: 284320
2016-10-16 01:09:39 +00:00
Konstantin Zhuravlyov
1aefc15a1a [MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG.
Differential Revision: https://reviews.llvm.org/D24577

llvm-svn: 284312
2016-10-15 22:01:18 +00:00
Craig Topper
865d3464cd [AVX-512] Add shuffle comments for vbroadcast instructions.
llvm-svn: 284305
2016-10-15 16:26:07 +00:00
Craig Topper
86fd8363e9 [AVX-512] Rename VPBROADCASTI32X2 and VPBROADCASTF32X2 instruction classes to match the mnemonic which does not include a 'P'.
llvm-svn: 284304
2016-10-15 16:26:02 +00:00
Tom Stellard
eeebdabf5d AMDGPU/SI: Handle s_getreg hazard in GCNHazardRecognizer
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25526

llvm-svn: 284298
2016-10-15 00:58:14 +00:00
Tim Northover
dc91ae935f GlobalISel: rename legalizer components to match others.
The previous names were both misleading (the MachineLegalizer actually
contained the info tables) and inconsistent with the selector & translator (in
having a "Machine") prefix. This should make everything sensible again.

The only functional change is the name of a couple of command-line options.

llvm-svn: 284287
2016-10-14 22:18:18 +00:00
Guozhi Wei
19888a3d98 [PPC] Shorter sequence to load 64bit constant with same hi/lo words
This is a patch to implement pr30640.

When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register.

This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together.

Differential Revision: https://reviews.llvm.org/D25521

llvm-svn: 284276
2016-10-14 20:41:50 +00:00
Tom Stellard
ac33187376 AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations
Summary:
We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff)

Reviewers: arsenm, nhaehnle

Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24672

llvm-svn: 284267
2016-10-14 19:14:29 +00:00
Krzysztof Parzyszek
7e79b59e6f The real fix for post-r284255 failures
llvm-svn: 284264
2016-10-14 19:06:25 +00:00
Krzysztof Parzyszek
8ac6d87b06 Workaround to eliminate check-llvm failures after r284255
llvm-svn: 284262
2016-10-14 18:36:42 +00:00
David L Kreitzer
1233356719 Add a pass to optimize patterns of vectorized interleaved memory accesses for
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.

Patch by Farhana Aleen

Differential revision: http://reviews.llvm.org/D24681

llvm-svn: 284260
2016-10-14 18:20:41 +00:00
Tom Stellard
19c9255423 AMDGPU/SI: Don't allow unaligned scratch access
Summary: The hardware doesn't support this.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25523

llvm-svn: 284257
2016-10-14 18:10:39 +00:00
Krzysztof Parzyszek
ad19e572fb [RDF] Switch RegisterRef to be a pair (Register, LaneMask)
Use PackedRegisterRef to store the register information in the graph nodes.

This commit also removes support for virtual registers. It has never been
tested or used. It will be possible to add it back if there is a need.

llvm-svn: 284255
2016-10-14 17:57:55 +00:00
David L Kreitzer
d68b19752a [safestack] Use non-thread-local unsafe stack pointer for Contiki OS
Patch by Michael LeMay

Differential revision: http://reviews.llvm.org/D19852

llvm-svn: 284254
2016-10-14 17:56:00 +00:00
Eric Christopher
259a7b3aba Revert "In preparation for removing getNameWithPrefix off of
TargetMachine," as it's causing sanitizer/memory issues until I
can track down this set.

This reverts commit r284203

llvm-svn: 284252
2016-10-14 17:28:23 +00:00
Pierre Gousseau
1c56ea6dd5 [X86] Take advantage of the lzcnt instruction on btver2 architectures when ORing comparisons to zero.
This change adds transformations such as:
  zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
  To:
  srl(or(ctlz(x), ctlz(y)), log2(bitsize(x))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.

Differential Revision: https://reviews.llvm.org/D23446

llvm-svn: 284248
2016-10-14 16:41:38 +00:00
Nicolai Haehnle
5c5844a79f AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes
Summary:
This will be used for 64-bit MULHU, which is in turn used for the 64-bit
divide-by-constant optimization (see D24822).

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25289

llvm-svn: 284224
2016-10-14 10:30:00 +00:00
Simon Dardis
7bd8310ae6 [mips] Fix aui/daui/dahi/dati for MIPSR6
For compatiblity with binutils, define these instructions to take
two registers with a 16bit unsigned immediate. Both of the registers
have to be same for dahi and dati.

Reviewers: dsanders, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D21473

llvm-svn: 284218
2016-10-14 09:31:42 +00:00
Nicolai Haehnle
1c18014661 AMDGPU: Fix use-after-frees
Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25312

llvm-svn: 284215
2016-10-14 09:03:04 +00:00
Michael Zuckerman
6441a0b461 [x86][ms-inline-asm] use of "jmp short" in asm is not supported
Committing in the name of Ziv Izhar: After check-all and LGTM .

The following patch is for compatability with Microsoft.
Microsoft ignores the keyword "short" when used after a jmp, for example:
__asm {
      jmp short label
      label:
      }

A test for that patch will be added in another patch, since it's located in clang's codegen tests. Link will be added shortly.
link to test: https://reviews.llvm.org/D24958

Differential Revision: https://reviews.llvm.org/D24957

llvm-svn: 284211
2016-10-14 08:09:40 +00:00
Eric Christopher
bf50905153 In preparation for removing getNameWithPrefix off of TargetMachine,
sink the current behavior into the callers and sink
TargetMachine::getNameWithPrefix into TargetMachine::getSymbol.

llvm-svn: 284203
2016-10-14 05:47:41 +00:00
Eric Christopher
403df02984 Tidy the calls to getCurrentSection().first -> getCurrentSectionOnly to help
readability a bit.

llvm-svn: 284202
2016-10-14 05:47:37 +00:00
Konstantin Zhuravlyov
b970547eb1 [AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external and global address space variables
Differential Revision: https://reviews.llvm.org/D25562

llvm-svn: 284196
2016-10-14 04:37:34 +00:00
Konstantin Zhuravlyov
8c3f44a8af [AMDGPU] Add 32-bit lo/hi got and pc relative variant kinds and emit appropriate relocations
Differential Revision: https://reviews.llvm.org/D25548

llvm-svn: 284195
2016-10-14 04:21:32 +00:00
Saleem Abdulrasool
520d92773b CodeGen: use MSVC division on windows itanium
Windows itanium is identical to MSVC when dealing with everything but C++.
Lower the math routines into msvcrt rather than compiler-rt.

llvm-svn: 284175
2016-10-13 23:00:11 +00:00
Saleem Abdulrasool
ec882c3b4f CodeGen: adjust floating point operations in Windows itanium
Windows itanium is equivalent to MSVC except in C++ mode.  Ensure that the
promote the 32-bit floating point operations to their 64-bit equivalences.

llvm-svn: 284173
2016-10-13 22:38:15 +00:00
Sriraman Tallam
404f8de41a New llc option pie-copy-relocations to optimize access to extern globals.
This option indicates copy relocations support is available from the linker
when building as PIE and allows accesses to extern globals to avoid the GOT.

Differential Revision: https://reviews.llvm.org/D24849

llvm-svn: 284160
2016-10-13 20:54:39 +00:00
Nirav Dave
7c2dd71bf8 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r284151 which appears to be triggering a LTO
failures on Hexagon

llvm-svn: 284157
2016-10-13 20:23:25 +00:00
Nirav Dave
01829c947a In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after upstream changes.

   Simplify Consecutive Merge Store Candidate Search

   Now that address aliasing is much less conservative, push through
   simplified store merging search which only checks for parallel stores
   through the chain subgraph. This is cleaner as the separation of
   non-interfering loads/stores from the store-merging logic.

   Whem merging stores, search up the chain through a single load, and
   finds all possible stores by looking down from through a load and a
   TokenFactor to all stores visited. This improves the quality of the
   output SelectionDAG and generally the output CodeGen (with some
   exceptions).

   Additional Minor Changes:

       1. Finishes removing unused AliasLoad code
       2. Unifies the the chain aggregation in the merged stores across
       code paths
       3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
       4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

   This finishes the change Matt Arsenault started in r246307 and
   jyknight's original patch.

   Many tests required some changes as memory operations are now
   reorderable. Some tests relying on the order were changed to use
   volatile memory operations

   Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 284151
2016-10-13 19:20:16 +00:00
Quentin Colombet
a3a1780e1a [AArch64][RegisterBankInfo] Switch to fully static opds mapping for G_BITCAST.
NFC.

llvm-svn: 284146
2016-10-13 18:46:38 +00:00
Igor Breger
41d9f78ab6 [X86][AVX512] Fix sext v32i1 -> v32i8 lowering.
Fix PR30600.

Differential Revision: https://reviews.llvm.org/D25554

llvm-svn: 284134
2016-10-13 17:20:38 +00:00
Reid Kleckner
46c5ea7083 Fix for PR30687. Avoid dereferencing MBB.end().
We don't need to return a MachineInstr* from these stack probe insertion
calls anyway. If we ever need to add it back, we can return an iterator
instead.

Based on a patch by David Kreitzer

This bug is a consequence of

r279314 | dexonsmith | 2016-08-19 13:40:12 -0700 (Fri, 19 Aug 2016) | 110 lines

We hit the "Assertion `!NodePtr->isKnownSentinel()' failed" assertion,
but only when inserting a stack probe call at the end of an MBB, which
isn't necessarily a common situation.

Differential Revision: https://reviews.llvm.org/D25566

llvm-svn: 284130
2016-10-13 15:48:48 +00:00
Javed Absar
35adbf7085 [ARM]: Assign cost of scaling used in addressing mode for ARM cores
This patch assigns cost of the scaling used in addressing.
On many ARM cores, a negated register offset takes longer than a
non-negated register offset, in a register-offset addressing mode.

For instance:

LDR R0, [R1, R2 LSL #2]
LDR R0, [R1, -R2 LSL #2]

Above, (1) takes less cycles than (2).

By assigning appropriate scaling factor cost, we enable the LLVM
to make the right trade-offs in the optimization and code-selection phase.

Differential Revision: http://reviews.llvm.org/D24857

Reviewers: jmolloy, rengolin
llvm-svn: 284127
2016-10-13 14:57:43 +00:00
Matt Arsenault
e8b0ec39b7 AMDGPU: Assume spilling will occur at -O0
Because everything live is spilled at the end of a
block by fast regalloc, assume this will happen and
avoid the copies of the resource descriptor.

llvm-svn: 284119
2016-10-13 13:10:00 +00:00
Matt Arsenault
62c9ce83e6 AMDGPU: Fix truncate to bool warnings
llvm-svn: 284116
2016-10-13 12:45:16 +00:00
Simon Dardis
df162f15a6 [mips] Add IAS support for dvp, evp
These instructions were only defined for microMIPSR6 previously. Add
definitions for MIPSR6, correct definitions for microMIPSR6, flag these
instructions as having unmodelled side effects (they disable/enable
virtual processors) and add missing disassember tests for microMIPSR6.

Reviewers: vkalintiris

Differential Review: https://reviews.llvm.org/D24291

llvm-svn: 284115
2016-10-13 12:12:56 +00:00
Oren Ben Simhon
270c1993be [X86] Basic additions to support RegCall Calling Convention.
The Register Calling Convention (RegCall) was introduced by Intel to optimize parameter transfer on function call.
This calling convention ensures that as many values as possible are passed or returned in registers.
This commit presents the basic additions to LLVM CodeGen in order to support RegCall in X86.

Differential Revision: http://reviews.llvm.org/D25022

llvm-svn: 284108
2016-10-13 07:53:43 +00:00
Daniel Jasper
6c31819c9c Silence unused warning in non-assert builds.
llvm-svn: 284107
2016-10-13 06:39:44 +00:00
Craig Topper
f16d9627b0 [AVX-512] Teach shuffle lowering to recognize 512-bit zero extends.
llvm-svn: 284105
2016-10-13 05:29:41 +00:00
Craig Topper
c306b650bf [X86] Simplify the lowering code for extracting and inserting subvectors.
We don't need to check if AVX is enabled. It's implied by the operation action being set to Custom.
We don't need to check both the input and output type widths. We only need to check the type that's being inserted or extracted. The other type is known to be a legal type and we can assume its a different width.

llvm-svn: 284102
2016-10-13 04:14:47 +00:00
Quentin Colombet
dab36c3ff5 [AArch64][RegisterBankInfo] Provide alternative mappings for 64-bit load
This allows RegBankSelect in greedy mode to get rid some of the cross
register bank copies when loads are involved in the chain of
computation.

llvm-svn: 284097
2016-10-13 01:01:23 +00:00
Reid Kleckner
b5d0510883 Correct PrivateLinkage for COFF
- Use storage class C_STAT for 'PrivateLinkage' The storage class for
  PrivateLinkage should equal to the Internal Linkage.

- Set 'PrivateGlobalPrefix' from "L" to ".L" for MM_WinCOFF (includes
  x86_64) MM_WinCOFF has empty GlobalPrefix '\0' so PrivateGlobalPrefix
  "L" may conflict to the normal symbol name starting with 'L'.

Based on a patch by Han Sangjin! Manually updated test cases.

llvm-svn: 284096
2016-10-13 00:55:24 +00:00
Quentin Colombet
6441d42855 [AArch64][RegisterBankInfo] Provide alternative mappings for G_BITCASTs.
Thanks to this patch, RegBankSelect is able to get rid of some register
bank copies as demonstrated in the test case.

llvm-svn: 284094
2016-10-13 00:34:48 +00:00
Quentin Colombet
233216e34f [AArch64][RegisterBankInfo] Describe cross regbank copies statically.
NFC.

llvm-svn: 284091
2016-10-13 00:12:06 +00:00
Quentin Colombet
4261d80dae [AArch64][RegisterBankInfo] Use static mapping for same bank G_BITCAST.
NFC.

llvm-svn: 284090
2016-10-13 00:12:04 +00:00
Quentin Colombet
e473164643 [AArch64][MachineLegalizer] Mark more G_BITCAST as legal.
Basically any vector types that fits in a 32-bit register is also valid
as far as copies are concerned.

llvm-svn: 284089
2016-10-13 00:12:01 +00:00
Quentin Colombet
d92691e6b8 [AArch64][RegisterBankInfo] Bump the cost of vector loads.
This does not change anything yet, because we do not offer any
alternative mapping.

llvm-svn: 284088
2016-10-13 00:11:59 +00:00
Quentin Colombet
c92eb5dfcb [AArch64][RegisterBankInfo] Use a proper cost for cross regbank G_BITCASTs.
This does not change anything yet, because we do not offer any
alternative mapping.

llvm-svn: 284087
2016-10-13 00:11:57 +00:00
Quentin Colombet
cc0c9c861b [AArch64][RegisterBankInfo] Provide more realistic copy costs.
llvm-svn: 284086
2016-10-13 00:11:55 +00:00
Tim Northover
498e50998f GlobalISel: support G_TRUNC selection on AArch64.
Ahmed's patch again.

llvm-svn: 284075
2016-10-12 22:49:15 +00:00
Tim Northover
3df46ddb6d GlobalISel: support int <-> float conversions on AArch64.
More of Ahmed's work.

llvm-svn: 284074
2016-10-12 22:49:11 +00:00
Tim Northover
b9b4b9615e GlobalISel: select G_FCMP instructions on AArch64.
Another of Ahmed's patches.

llvm-svn: 284073
2016-10-12 22:49:07 +00:00
Tim Northover
b2faa745c9 GlobalISel: support selection of G_ICMP on AArch64.
Patch from Ahmed Bougaca again.

llvm-svn: 284072
2016-10-12 22:49:04 +00:00
Tim Northover
98fb949e04 GlobalISel: select G_BRCOND instructions on AArch64.
llvm-svn: 284071
2016-10-12 22:49:01 +00:00
Tim Northover
17cd0939b4 GlobalISel: mark G_BRCOND on s1 as legal.
It's going to be a TBNZ (at -O0) anyway, so the high bits don't matter.

llvm-svn: 284070
2016-10-12 22:48:36 +00:00
Albert Gutowski
14303dbdfa Create llvm.addressofreturnaddress intrinsic
Summary: We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows.

Reviewers: majnemer, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25293

llvm-svn: 284061
2016-10-12 22:13:19 +00:00
Matt Arsenault
fb5c675468 AMDGPU: Initial implementation of VGPR indexing mode
This is the most basic handling of the indirect access
pseudos using GPR indexing mode. This currently only enables
the mode for a single v_mov_b32 and then disables it.
This is much more complicated to use than the movrel instructions,
so a new optimization pass is probably needed to fold the access
into the uses and keep the mode enabled for them.

llvm-svn: 284031
2016-10-12 18:49:05 +00:00
Matt Arsenault
cb0c02c980 AMDGPU: Add instruction definitions for VGPR indexing
VI added a second method of indexing into VGPRs
besides using v_movrel*

llvm-svn: 284027
2016-10-12 18:00:51 +00:00
Tom Stellard
52a5870f77 AMDGPU/SI: Change mimg intrinsic signatures
This makes more fields overridable and removes redundant bits.

Patch by: Changpeng Fang

llvm-svn: 284024
2016-10-12 16:35:29 +00:00
Alexey Bataev
e0af07de49 NFC: The Cost Model specialization, by Andrey Tischenko
The current Cost Model implementation is very inaccurate and has to be
updated, improved, re-implemented to be able to take into account the
concrete CPU models and the concrete targets where this Cost Model is
being used. For example, the Latency Cost Model should be differ from
Code Size Cost Model, etc.
This patch is the first step to launch the developing and implementation
of a new Cost Model generation.

Differential Revision: https://reviews.llvm.org/D25186

llvm-svn: 284012
2016-10-12 13:24:13 +00:00
Quentin Colombet
85ea9448d6 [AArch64][InstrustionSelector] Teach the selector about G_BITCAST.
llvm-svn: 283973
2016-10-12 03:57:52 +00:00
Quentin Colombet
442f16b444 [AArch64][InstructionSelector] Refactor the handling of copies.
Although Copies are not specific to preISel, we still have to assign them
a proper register class. However, given they are not constrained to
anything we do not have to handle the source register at the copy. It
will be properly mapped when reaching the related definition.

In the process, the handlong of G_ANYEXT is slightly modified as those
end up being selected as copy. The difference is that when register size
do not match on both sides, we need to insert SUBREG_TO_REG operation,
otherwise the post RA copy expansion will not be happy!

llvm-svn: 283972
2016-10-12 03:57:49 +00:00
Quentin Colombet
4d726ae788 [AArch64][MachineLegalizer] Mark more bitcasts as legal.
Those are copies, we do not have to do any legalization action for them.

llvm-svn: 283970
2016-10-12 03:57:43 +00:00
Tim Shen
97a4f6a6e9 [PPCMIPeephole] Fix splat elimination
Summary:
In PPCMIPeephole, when we see two splat instructions, we can't simply do the following transformation:
  B = Splat A
  C = Splat B
=>
  C = Splat A
because B may still be used between these two instructions. Instead, we should make the second Splat a PPC::COPY and let later passes decide whether to remove it or not:
  B = Splat A
  C = Splat B
=>
  B = Splat A
  C = COPY B

Fixes PR30663.

Reviewers: echristo, iteratee, kbarton, nemanjai

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D25493

llvm-svn: 283961
2016-10-12 00:48:25 +00:00
Tim Northover
dfe83991c2 GlobalISel: support same-size casts on AArch64.
Mostly Ahmed's work again, I'm just sprucing things up slightly before
committing.

llvm-svn: 283952
2016-10-11 22:29:23 +00:00
Reid Kleckner
f071744d81 Re-land "[Thumb] Save/restore high registers in Thumb1 pro/epilogues"
Reverts r283938 to reinstate r283867 with a fix.

The original change had an ArrayRef referring to a destroyed temporary
initializer list. Use plain C arrays instead.

llvm-svn: 283942
2016-10-11 21:14:03 +00:00
Reid Kleckner
a5f5d77747 Revert "[Thumb] Save/restore high registers in Thumb1 pro/epilogues"
This reverts r283867.

This appears to be an infinite loop:

    while (HiRegToSave != AllHighRegs.end() && CopyReg != AllCopyRegs.end()) {
      if (HiRegsToSave.count(*HiRegToSave)) {
        ...

        CopyReg = findNextOrderedReg(++CopyReg, CopyRegs, AllCopyRegs.end());
        HiRegToSave =
            findNextOrderedReg(++HiRegToSave, HiRegsToSave, AllHighRegs.end());
      }
    }

llvm-svn: 283938
2016-10-11 20:54:41 +00:00
Tim Northover
f238c4b92c GlobalISel: support selection of extend operations.
Patch mostly by Ahmed Bougaca.

llvm-svn: 283937
2016-10-11 20:50:21 +00:00
Konstantin Zhuravlyov
277eacdebe [AMDGPU] Refactor waitcnt encoding
- Refactor bit packing/unpacking
- Calculate bit mask given bit shift and bit width
- Introduce function for decoding bits of waitcnt
- Introduce function for encoding bits of waitcnt
- Introduce function for getting waitcnt mask (instead of using bare numbers)
- Introduce function fot getting max waitcnt(s) (instead of using bare numbers)

Differential Revision: https://reviews.llvm.org/D25298

llvm-svn: 283919
2016-10-11 18:58:22 +00:00
Mehdi Amini
93003b9ee7 Fix "static initialization order fiasco" for the XCore Target.
I fixed all the other Targets in r283702, and interestingly the
sanitizers are only now "sometimes" catching this bug on the only
one I missed.

llvm-svn: 283914
2016-10-11 18:22:41 +00:00
NAKAMURA Takumi
9ce6934080 ARMMachineFunctionInfo.cpp: Add an initializer of ARMFunctionInfo::ReturnRegsCount in the explicit ctor.
It caused crash since r283867.

llvm-svn: 283909
2016-10-11 17:38:30 +00:00
NAKAMURA Takumi
f500124315 Reformat.
llvm-svn: 283908
2016-10-11 17:38:25 +00:00
Daniel Jasper
9861a3ad52 Silence unused warning in non-assert builds.
llvm-svn: 283899
2016-10-11 16:22:36 +00:00
Changpeng Fang
1632d75b59 AMDGPU/SI: Update ISA version numbers for Tonga and Polaris10/11.
Differential Revision:
  http://reviews.llvm.org/D25454

Reviewers:
  tstellarAMD

llvm-svn: 283893
2016-10-11 16:00:47 +00:00
Oliver Stannard
282bb1aaf0 [Thumb] Save/restore high registers in Thumb1 pro/epilogues
The high registers are not allocatable in Thumb1 functions, but they
could still be used by inline assembly, so we need to save and restore
the callee-saved high registers (r8-r11) in the prologue and epilogue.

This is complicated by the fact that the Thumb1 push and pop
instructions cannot access these registers. Therefore, we have to move
them down into low registers before pushing, and move them back after
popping into low registers.

In most functions, we will have low registers that are also being
pushed/popped, which we can use as the temporary registers for
saving/restoring the high registers. However, this is not guaranteed, so
we may need to push some extra low registers to ensure that the high
registers can be saved/restored. For correctness, it would be sufficient
to use just one low register, but if we have enough low registers
available then we only need one push/pop instruction, rather than one
per high register.

We can also use the argument/return registers when they are not live,
and the link register when saving (but not restoring), reducing the
number of extra registers we need to push.

There are still a few extreme edge cases where we need two push/pop
instructions, because not enough low registers can be made live in the
prologue or epilogue.

In addition to the regression tests included here, I've also tested this
using a script to generate functions which clobber different
combinations of registers, have different numbers of argument and return
registers (including variadic arguments), allocate different fixed sized
objects on the stack, and do or don't use variable sized allocas and the
__builtin_return_address intrinsic (all of which affect the available
registers in the prologue and epilogue). I ran these functions in a test
harness which verifies that all of the callee-saved registers are
correctly preserved.

Differential Revision: https://reviews.llvm.org/D24228

llvm-svn: 283867
2016-10-11 10:12:25 +00:00
Oliver Stannard
674ab91499 [ARM] Fix registers clobbered by SjLj EH on soft-float targets
Currently, the Int_eh_sjlj_dispatchsetup intrinsic is marked as
clobbering all registers, including floating-point registers that may
not be present on the target. This is technically true, as we could get
linked against code that does use the FP registers, but that will not
actually work, as the soft-float code cannot save and restore the FP
registers. SjLj exception handling can only work correctly if either all
or none of the code is built for a target with FP registers. Therefore,
we can assume that, when Int_eh_sjlj_dispatchsetup is compiled for a
soft-float target, it is only going to be linked against other
soft-float code, and so only clobbers the general-purpose registers.
This allows us to check that no non-savable registers are clobbered when
generating the prologue/epilogue.

Differential Revision: https://reviews.llvm.org/D25180

llvm-svn: 283866
2016-10-11 10:06:59 +00:00
Diana Picus
aa4f835b48 [AArch64] Allow label arithmetic with add/sub/cmp
Allow instructions such as 'cmp w0, #(end - start)' by folding the
expression into a constant. For ELF, we fold only if the symbols are in
the same section. For MachO, we fold if the expression contains only
symbols that are not linker visible.

Fixes https://llvm.org/bugs/show_bug.cgi?id=18920

Differential Revision: https://reviews.llvm.org/D23834

llvm-svn: 283862
2016-10-11 09:17:47 +00:00
Quentin Colombet
f515b3c80c [AArch64][InstructionSelector] Teach how to select FP load/store.
This patch allows to select 32 and 64-bit FP load and store.

llvm-svn: 283832
2016-10-11 00:21:14 +00:00
Quentin Colombet
4a90fcde10 [AArch64][InstructionSelector] Teach the selector how to handle vector OR.
This only adds the support for 64-bit vector OR. Adding more sizes is
not difficult, but it requires a bigger refactoring because ORs work on
any size, not necessarly the ones that match the width of the register
width. Right now, this is not expressed in the legalization, so don't
bother pushing the refactoring yet.

llvm-svn: 283831
2016-10-11 00:21:11 +00:00
Quentin Colombet
453627f3cf [AArch64][MachineLegalizer] Mark v2s32 G_LOAD as legal.
Actually every 64-bit loads are legal, but right now the API does not
offer a simple way to express that.

llvm-svn: 283829
2016-10-11 00:21:08 +00:00
Peter Collingbourne
861bb221e9 Revert r283690, "MC: Remove unused entities."
llvm-svn: 283814
2016-10-10 22:49:37 +00:00
Tim Northover
7cb88e7053 GlobalISel: select G_GLOBAL_VALUE uses on AArch64.
llvm-svn: 283809
2016-10-10 21:50:00 +00:00
Tim Northover
94d146decd GlobalISel: allow G_GLOBAL_VALUEs in AArch64 legalization.
llvm-svn: 283808
2016-10-10 21:49:53 +00:00
Tim Northover
de665d6f28 GlobalISel: support selecting G_GEP instructions.
They're basically just an alias for G_ADD on AArch64.

llvm-svn: 283807
2016-10-10 21:49:49 +00:00
Tim Northover
c4bdf87acf GlobalISel: support selecting constants on AArch64.
llvm-svn: 283806
2016-10-10 21:49:42 +00:00
Alexandros Lamprineas
7a09ca4165 [ARM] Fix invalid VLDM/VSTM access when targeting Big Endian with NEON
The instructions VLDM/VSTM can only access word-aligned memory
locations and produce alignment fault if the condition is not met.

The compiler currently generates VLDM/VSTM for v2f64 load/store
regardless the alignment of the memory access. Instead, if a v2f64
load/store is not word-aligned, the compiler should generate
VLD1/VST1. For each non double-word-aligned VLD1/VST1, a VREV
instruction should be generated when targeting Big Endian.

Differential Revision: https://reviews.llvm.org/D25281

llvm-svn: 283763
2016-10-10 16:01:54 +00:00
Zvi Rackover
8755b41ec0 [X86] Prefer rotate by 1 over rotate by imm
Summary:
Rotate by 1 is translated to 1 micro-op, while rotate with imm8 is translated to 2 micro-ops.

Fixes pr30644.

Reviewers: delena, igorb, craig.topper, spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D25399

llvm-svn: 283758
2016-10-10 14:43:55 +00:00
Chris Dewhurst
17365ee380 This pass, fixing an erratum in some LEON 2 processors ensures that the SDIV instruction is not issued, but replaced by SDIVcc instead, which does not exhibit the error. Unit test included.
Differential Review: https://reviews.llvm.org/D24660

llvm-svn: 283727
2016-10-10 08:53:06 +00:00
Daniel Jasper
0d0a9b6265 Fix WebAssembly build after r283702.
llvm-svn: 283723
2016-10-10 06:49:55 +00:00
Craig Topper
47e80c7323 [AVX-512] Add missing pattern sext or zext from bytes to quad words with a 128-bit load as input.
llvm-svn: 283720
2016-10-10 06:25:48 +00:00
Michael Zuckerman
78eb7f43a4 [x86][inline-asm][llvm] accept 'v' constraint
Commit in the name of:Coby Tayree
1.'v' constraint for (x86) non-avx arch imitates the already implemented 'x' constraint, i.e. allows XMM{0-15} & YMM{0-15} depending on the apparent arch & mode (32/64).
2.for the avx512 arch it allows [X,Y,Z]MM{0-31} (mode dependent)

This patch applies the needed changes to clang
 clang patch: https://reviews.llvm.org/D25004

Differential Revision: D25005
 

llvm-svn: 283717
2016-10-10 05:48:56 +00:00
Dylan McKay
01431d3493 [AVR] Enable generation of the TableGen assembly writer tables
This also changes the order of the statements in CMakeLists.txt to be
alphabetical.

llvm-svn: 283711
2016-10-10 01:28:45 +00:00
Craig Topper
b83b79db6c [AVX-512] Port 128 and 256-bit memory->register sign/zero extend patterns from SSE file. Also add a minimal set for 512-bit.
llvm-svn: 283704
2016-10-09 23:08:39 +00:00
Craig Topper
62c3b0af31 [X86] Remove redundant patterns. The same pattern appears a few lines up.
llvm-svn: 283703
2016-10-09 23:08:33 +00:00
Mehdi Amini
fa86e5fee9 Move the global variables representing each Target behind accessor function
This avoids "static initialization order fiasco"

Differential Revision: https://reviews.llvm.org/D25412

llvm-svn: 283702
2016-10-09 23:00:34 +00:00
Elena Demikhovsky
d561a4f25b DAG: Setting Masked-Expand-Load as a variant of Masked-Load node
Masked-expand-load node represents load operation that loads a variable amount of elements from memory according to amount of "true" bits in the mask and expands the loaded elements according to their position in the mask vector.
Right now, the node is used in intrinsics for VEXPAND* instructions. 
The work is done towards implementation of masked.expandload and masked.compressstore intrinsics.

Differential Revision: https://reviews.llvm.org/D25322

llvm-svn: 283694
2016-10-09 10:48:52 +00:00
Craig Topper
1903b91730 [AVX-512] Fix execution domain for EVEX encoded VINSERTPS.
llvm-svn: 283692
2016-10-09 06:41:47 +00:00
Peter Collingbourne
99affdec93 MC: Remove unused entities.
llvm-svn: 283691
2016-10-09 04:39:13 +00:00
Peter Collingbourne
f90dab8459 Target: Remove unused entities.
llvm-svn: 283690
2016-10-09 04:38:57 +00:00
Craig Topper
08e1980363 [AVX-512] Add subvector insert and extract to load/store folding tables.
llvm-svn: 283689
2016-10-09 03:54:13 +00:00
Craig Topper
e78b57f9ce [AVX-512] Add the vector down convert instructions to the store folding tables.
llvm-svn: 283687
2016-10-09 03:54:05 +00:00
Mehdi Amini
a6cfd067ac Turn cl::values() (for enum) from a vararg function to using C++ variadic template
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:

 va_start(ValueArgs, Desc);

with Desc being a StringRef.

Differential Revision: https://reviews.llvm.org/D25342

llvm-svn: 283671
2016-10-08 19:41:06 +00:00
Craig Topper
c02d4a8c91 [AVX-512] Fix a bug in getLargestLegalSuperClass where we inflated to VR128X/VR256X even when VLX isn't supported.
This seems to have been responsible for the XMM16-31 spills observed in PR29112. With this fixed the test case has been modified to no longer have a spill of XMM16.

llvm-svn: 283668
2016-10-08 18:49:57 +00:00
Colin LeMahieu
6f13f69e9a [Hexagon] Adding change of flow max 1 (cofMax1) TS flag for marking this restriction rather than implying it from TypeJR.
llvm-svn: 283665
2016-10-08 17:18:51 +00:00
Sebastian Pop
e1edccb1b0 [AArch64] Avoid generating indexed vector instructions for Exynos
Avoid generating indexed vector instructions for Exynos. This is needed for
fmla/fmls/fmul/fmulx. For example, the instruction

  fmla v0.4s, v1.4s, v2.s[1]

is less efficient than the instructions

  dup v2.4s, v2.s[1]
  fmla v0.4s, v1.4s, v2.4s

Patch written by Abderrazek Zaafrani.

Differential Revision: https://reviews.llvm.org/D21571

llvm-svn: 283663
2016-10-08 12:30:07 +00:00
Dylan McKay
7daf0ee243 [AVR] Add backend dependencies to MCTargetDesc/LLVMBuild.txt
llvm-svn: 283642
2016-10-08 01:14:23 +00:00
Dylan McKay
6ea48bea7f Fix incorrect assertion in AVRFrameLowering.cpp
This wasn't looking at the right instruction, and would always fail.

llvm-svn: 283640
2016-10-08 01:10:36 +00:00
Dylan McKay
729f814148 [AVR] Don't worry about call frame size when initializing frame pointer
We previously only used the frame pointer if the frame pointer was too
big. This was to work around a bug (described in this old commit)

https://sourceforge.net/p/avr-llvm/code/204/tree//llvm/trunk/AVR/AVRFrameLowering.cpp?diff=50d64d912718465cb887d17a:203

I mistakenly invered the condition assuming it was a typo. I am now
removing it because it doesn't seem to be a problem anymore (plus it's a
dirty hack).

llvm-svn: 283639
2016-10-08 01:10:31 +00:00
Dylan McKay
6d9258b05a [AVR] Don't shadow container while iterating in range-based loop
This works on clang, but fails on GCC 4.6

llvm-svn: 283638
2016-10-08 01:09:06 +00:00
Dylan McKay
9bc70c9179 [AVR] Use references rather than pointers in AVRISelLowering
llvm-svn: 283636
2016-10-08 01:06:21 +00:00
Dylan McKay
bb08c7e2e9 Allow a maximum of 64 bits to be returned in registers
The rest spills to the stack

Authored by Jake Goulding

llvm-svn: 283635
2016-10-08 01:05:09 +00:00
Dylan McKay
1e3e36d597 [AVR] Expand MULHS for all types
Once MULHS was expanded, this exposed an issue where the condition
register was thought to be 16-bit. This caused an attempt to copy a
16-bit register to an 8-bit register.

Authored by Jake Goulding

llvm-svn: 283634
2016-10-08 01:01:49 +00:00
Dylan McKay
ba8fefd85c [AVR] Add the 'SoftFail' field to all instruction formats
This will be used in the future for disassembly.

llvm-svn: 283630
2016-10-08 00:55:46 +00:00
Dylan McKay
cec36d0272 [AVR] Set up the instruction printer and the assembly backend
llvm-svn: 283629
2016-10-08 00:50:11 +00:00
Dylan McKay
24c6381412 [AVR] Add dependencies to AVR libraries in AVRCodeGen
llvm-svn: 283628
2016-10-08 00:45:24 +00:00
Dylan McKay
4c9681ec4f [AVR] Add missing subdirectories to LLVMBuild
llvm-svn: 283627
2016-10-08 00:42:58 +00:00
Dylan McKay
2919eeaedd [AVR] Add the assembly printer
Summary: This adds the AVRAsmPrinter class.

Reviewers: arsenm, kparzysz

Subscribers: llvm-commits, wdng, beanz, japaric, mgorny

Differential Revision: https://reviews.llvm.org/D25271

llvm-svn: 283623
2016-10-08 00:02:36 +00:00
Tom Stellard
d2d7b48ba5 AMDGPU/SI: Handle div_fmas hazard in GCNHazardRecognizer
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25250

llvm-svn: 283622
2016-10-07 23:42:48 +00:00
Tom Stellard
7128408133 AMDGPU/SI: Add support for 8-byte relocations
Reviewers: arsenm, kzhuravl

Subscribers: wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25375

llvm-svn: 283593
2016-10-07 20:36:58 +00:00
Colin LeMahieu
c46eeafb38 [Hexagon][NFC] Using documented instruction type name V4LDST instead of MEMOP.
llvm-svn: 283582
2016-10-07 19:11:28 +00:00
Tom Stellard
0f9040e19e AMDGPU/SI: Emit fixups for long branches
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25366

llvm-svn: 283570
2016-10-07 16:01:18 +00:00
Artem Tamazov
d87d4169f2 [AMDGPU][mc] Add support for buffer_load_dwordx3, buffer_store_dwordx3.
Partially fixes Bug 28232.
Lit tests added.

Differential Revision: https://reviews.llvm.org/D25367

llvm-svn: 283567
2016-10-07 15:53:16 +00:00
Sam Kolton
cbf772ce03 [AMDGPU] Assembler: support v_mac_f32 DPP and SDWA. Move getNamedOperandIdx to AMDGPUBaseInfo.h
Reviewers: artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D25084

llvm-svn: 283560
2016-10-07 14:46:06 +00:00
Konstantin Zhuravlyov
12f780d0fd [AMDGPU] AMDGPUCodeGenPrepare: remove extra ';'
llvm-svn: 283558
2016-10-07 14:39:53 +00:00
Konstantin Zhuravlyov
602c1c7441 [AMDGPU] Promote uniform (i1, i16] operations to i32
Differential Revision: https://reviews.llvm.org/D25302

llvm-svn: 283555
2016-10-07 14:22:58 +00:00
Javed Absar
a39b9200fb [ARM]: add missing switch case for cortex-r52
Adds a missing switch case for handling cortex-r52
in init-subtarget-features.

llvm-svn: 283551
2016-10-07 13:41:55 +00:00
Martin Storsjo
59ad31e4f1 [ARM] Reapply: Use __rt_div functions for divrem on Windows
Reapplying r283383 after revert in r283442. The additional fix
is a getting rid of a stray space in a function name, in the
refactoring part of the commit.

This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.

The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).

Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.

Differential Revision: https://reviews.llvm.org/D25332

llvm-svn: 283550
2016-10-07 13:28:53 +00:00
Javed Absar
636ccd32c2 [ARM]: Add Cortex-R52 target to LLVM
This patch adds Cortex-R52, the new ARM real-time processor, to LLVM. 
Cortex-R52 implements the ARMv8-R architecture.

llvm-svn: 283542
2016-10-07 12:06:40 +00:00
Simon Pilgrim
1b51489947 [X86][SSE] Update register class during MOVSD/MOVSS - BLENDPD/BLENDPS commutation
MOVSD/MOVSS take a 128-bit register and a FR32/FR64 register input, the commutation code wasn't taking this into account leading to verification errors.

This patch inserts a vreg copy mi to ensure that the registers are correct.

Fix for PR30607

Differential Revision: https://reviews.llvm.org/D25280

llvm-svn: 283539
2016-10-07 11:18:38 +00:00
Oliver Stannard
7be9f2d236 [ARM] Don't convert switches to lookup tables of pointers with ROPI/RWPI
With the ROPI and RWPI relocation models we can't always have pointers
to global data or functions in constant data, so don't try to convert switches
into lookup tables if any value in the lookup table would require a relocation.
We can still safely emit lookup tables of other values, such as simple
constants.

Differential Revision: https://reviews.llvm.org/D24462

llvm-svn: 283530
2016-10-07 08:48:24 +00:00
Mehdi Amini
ddf3596f96 Use StringRef in ARMELFStreamer (NFC)
llvm-svn: 283529
2016-10-07 08:48:07 +00:00
Nicolai Haehnle
65a5ad0126 AMDGPU: Fix use-after-free in SIOptimizeExecMasking
Summary:
There was a bug with sequences like

   s_mov_b64 s[0:1], exec
   s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill>
   ...
   s_mov_b64_term exec, s[2:3]

because s[2:3] was defined and used in the same instruction, ending up with
SaveExecInst inside OtherUseInsts.

Note that the test case also exposes an unrelated bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25306

llvm-svn: 283528
2016-10-07 08:40:14 +00:00
Mehdi Amini
eb017ddb32 Use StringReg in TargetParser APIs (NFC)
llvm-svn: 283527
2016-10-07 08:37:29 +00:00
Mehdi Amini
2052587793 Revert "Revert "Add a static_assert to enforce that parameters to llvm::format() are not totally unsafe""
This reverts commit r283510 and reapply r283509, with updates to
clang-tools-extra as well.

llvm-svn: 283525
2016-10-07 08:25:42 +00:00