1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00
Commit Graph

219 Commits

Author SHA1 Message Date
Robert Khasanov
ae2da173af [SKX] Enabling SKX target and AVX512BW, AVX512DQ, AVX512VL features.
Enabling HasAVX512{DQ,BW,VL} predicates.
Adding VK2, VK4, VK32, VK64 masked register classes.
Adding new types (v64i8, v32i16) to VR512.
Extending calling conventions for new types (v64i8, v32i16)

Patch by Zinovy Nis <zinovy.y.nis@intel.com>
Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>

llvm-svn: 213545
2014-07-21 14:54:21 +00:00
Sanjay Patel
2f0f025b2b Move Post RA Scheduling flag bit into SchedMachineModel
Refactoring; no functional changes intended

    Removed PostRAScheduler bits from subtargets (X86, ARM).
    Added PostRAScheduler bit to MCSchedModel class.
    This bit is set by a CPU's scheduling model (if it exists).
    Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
    Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
    Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
    Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.
    Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86: 
       a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget. 
       b. MIPS overrides the CPU's postRA settings by enabling postRA for everything. 
       c. PPC overrides the CPU's postRA settings by enabling postRA for everything. 
       d. X86 is the only target that actually has postRA specified via sched model info.

Differential Revision: http://reviews.llvm.org/D4217

llvm-svn: 213101
2014-07-15 22:39:58 +00:00
Eric Christopher
39dfca6dbe Move to a private function to initialize the subtarget dependencies
so that we can use initializer lists for the X86Subtarget.

llvm-svn: 210614
2014-06-11 00:25:19 +00:00
Eric Christopher
451eb4e990 Use unique_ptr for X86Subtarget pointer members.
llvm-svn: 210606
2014-06-10 23:26:47 +00:00
Eric Christopher
de9b19fdc2 Move all of the x86 subtarget initialized variables down into the x86 subtarget
from the x86 target machine. Should be no functional change.

llvm-svn: 210479
2014-06-09 17:08:19 +00:00
Alexey Volkov
a3a5a1d7f1 [X86] Use ADD/SUB instead of INC/DEC for Silvermont
According to Intel Software Optimization Manual 
on Silvermont INC or DEC instructions require 
an additional uop to merge the flags.
As a result, a branch instruction depending 
on an INC or a DEC instruction incurs a 1 cycle penalty.

Differential Revision: http://reviews.llvm.org/D3990

llvm-svn: 210466
2014-06-09 11:40:41 +00:00
Eric Christopher
7880d61aac Make early if conversion dependent upon the subtarget and add
a subtarget hook to enable. Unconditionally add to the pass pipeline
for targets that might want to use it. No functional change.

llvm-svn: 209340
2014-05-21 23:40:26 +00:00
Alexey Volkov
9a03018603 [X86] Tune LEA usage for Silvermont
According to Intel Software Optimization Manual on Silvermont in some cases LEA
is better to be replaced with ADD instructions:
"The rule of thumb for ADDs and LEAs is that it is justified to use LEA
with a valid index and/or displacement for non-destructive destination purposes
(especially useful for stack offset cases), or to use a SCALE.
Otherwise, ADD(s) are preferable."

Differential Revision: http://reviews.llvm.org/D3826

llvm-svn: 209198
2014-05-20 08:55:50 +00:00
Jim Grosbach
51592aeaa3 X86: Remove TargetMachine CPU auto-detection.
This logic is properly in the realm of whatever is creating the
TargetMachine. This makes plain 'llc foo.ll' consistent across
heterogenous machines.

llvm-svn: 206094
2014-04-12 01:34:29 +00:00
Yaron Keren
18f37e5fc5 Added isTargetWindowsMSVC(), renamed isTargetMingw() to isTargetWindowsGNU()
and isTargetCygwin() to isTargetWindowsCygwin() to be consistent with the
four Windows environments in Triple.h.

Suggestion by Saleem Abdulrasool!

llvm-svn: 205393
2014-04-02 04:27:51 +00:00
Yaron Keren
0524cf3d81 isTargetWindows() renamed to isTargetKnownWindowsMSVC()
to reflect its current functionality.

Based on Takumi NAKAMURA suggestion.

llvm-svn: 205338
2014-04-01 18:15:34 +00:00
Craig Topper
5cb50d052d [C++11] Mark more classes in the X86 target as 'final'.
llvm-svn: 205166
2014-03-31 06:53:13 +00:00
NAKAMURA Takumi
81cbcfd543 X86Subtarget.h: isTargetWindows() should tell whether he is targeting msvc.
FYI, !isWindowsGNUEnvironment() is insufficient. It missed cygwin.

FIXME: The name "isTargetWindows" should be fixed.
llvm-svn: 205124
2014-03-30 04:35:00 +00:00
Saleem Abdulrasool
d42d60171a Canonicalise Windows target triple spellings
Construct a uniform Windows target triple nomenclature which is congruent to the
Linux counterpart.  The old triples are normalised to the new canonical form.
This cleans up the long-standing issue of odd naming for various Windows
environments.

There are four different environments on Windows:

MSVC: The MS ABI, MSVCRT environment as defined by Microsoft
GNU: The MinGW32/MinGW32-W64 environment which uses MSVCRT and auxiliary libraries
Itanium: The MSVCRT environment + libc++ built with Itanium ABI
Cygnus: The Cygwin environment which uses custom libraries for everything

The following spellings are now written as:

i686-pc-win32 => i686-pc-windows-msvc
i686-pc-mingw32 => i686-pc-windows-gnu
i686-pc-cygwin => i686-pc-windows-cygnus

This should be sufficiently flexible to allow us to target other windows
environments in the future as necessary.

llvm-svn: 204977
2014-03-27 22:50:05 +00:00
Craig Topper
465f748cb7 [C++11] Add 'override' keyword to virtual methods that override their base class.
llvm-svn: 203378
2014-03-09 07:44:38 +00:00
Craig Topper
b0056a4ca7 Switch all uses of LLVM_OVERRIDE to just use 'override' directly.
llvm-svn: 202621
2014-03-02 09:09:27 +00:00
David Woodhouse
5bd74bdaf8 [x86] Kill gratuitous X86_{32,64}TargetMachine subclasses, use X86TargetMachine
llvm-svn: 198720
2014-01-08 00:08:50 +00:00
Craig Topper
201bd5add3 [x86] Add basic support for .code16
This is not really expected to work right yet. Mostly because we will
still emit the OpSize (0x66) prefix in all the wrong places, along with
a number of other corner cases. Those will all be fixed in the subsequent
commits.

Patch from David Woodhouse.

llvm-svn: 198584
2014-01-06 04:55:54 +00:00
Rafael Espindola
859cb122ba Synchronize the NaCl DataLayout strings with the ones in clang.
Patch by Derek Schuff.

llvm-svn: 197640
2013-12-19 00:44:37 +00:00
Tim Northover
54d769f4eb Make Triple's isOSBinFormatXXX functions partition triple-space.
Most users would be surprised if "isCOFF" and "isMachO" were simultaneously
true, unless they'd put the compiler in a box with a gun attached to a photon
detector.

This makes sure precisely one of the three formats is true for any triple and
simplifies some target logic based on that.

llvm-svn: 196934
2013-12-10 16:57:43 +00:00
Ekaterina Romanova
eda4e2e4a7 SHLD/SHRD are VectorPath (microcode) instructions known to have poor latency on certain architectures. While generating SHLD/SHRD instructions is acceptable when optimizing for size, optimizing for speed on these platforms should be implemented using alternative sequences of instructions composed of add, adc, shr, shl, or and lea which are directPath instructions. These alternative instructions not only have a lower latency but they also increase the decode bandwidth by allowing simultaneous decoding of a third directPath instruction.
AMD's processors family K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD's processors, I disabled folding (or (x << c) | (y >> (64 - c))) when we are not optimizing for size.

It might be beneficial to disable this folding for some of the Intel's processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel.

llvm-svn: 195383
2013-11-21 23:21:26 +00:00
Yaron Keren
3fb42fb8b5 (this is a corrected patch)
Calling _chkstk is required on ELF as well as COFF on Windows. Without 
_chkstk, functions requiring large stack crash in initialization code.

Previous code tested for COFF format but not Mach-O and this patch modifies 
the code to test for Windows OS (both Windows target and MingW target) 
but not Mach-O object format: Looks like macho environment was used to 
build some EFI code.
 
Credits to Andrew MacPherson.

llvm-svn: 193289
2013-10-23 23:37:01 +00:00
Rafael Espindola
b6d34eea66 Revert "Calling _chkstk is required on ELF as well as COFF on Windows. Without _chkstk functions requiring large stack crash in initialization code. Previous code tested for COFF format but not Mach-O and this patch modifies the code to test for Windows."
This reverts commit r193263.

It is causing CodeGen/X86/mingw-alloca.ll to fail.

llvm-svn: 193275
2013-10-23 21:45:09 +00:00
Yaron Keren
56f5c84f6c Calling _chkstk is required on ELF as well as COFF on Windows.
Without _chkstk functions requiring large stack crash in 
initialization code. Previous code tested for COFF format but 
not Mach-O and this patch modifies the code to test for Windows.

Credits to Andrew MacPherson.

llvm-svn: 193263
2013-10-23 19:40:07 +00:00
Andrew Trick
e3e67d4a0a Enable MI Sched for x86.
This changes the SelectionDAG scheduling preference to source
order. Soon, the SelectionDAG scheduler can be bypassed saving
a nice chunk of compile time.

Performance differences that result from this change are often a
consequence of register coalescing. The register coalescer is far from
perfect. Bugs can be filed for deficiencies.

On x86 SandyBridge/Haswell, the source order schedule is often
preserved, particularly for small blocks.

Register pressure is generally improved over the SD scheduler's ILP
mode. However, we are still able to handle large blocks that require
latency hiding, unlike the SD scheduler's BURR mode. MI scheduler also
attempts to discover the critical path in single-block loops and
adjust heuristics accordingly.

The MI scheduler relies on the new machine model. This is currently
unimplemented for AVX, so we may not be generating the best code yet.

Unit tests are updated so they don't depend on SD scheduling heuristics.

llvm-svn: 192750
2013-10-15 23:33:07 +00:00
Yunzhong Gao
14609c71b2 Adding a feature flag to the llvm backend for x86 TBM instruction set.
Adding TBM feature to bdver2 processor; piledriver supports this instruction set
according to the following document:
http://developer.amd.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf

Phabricator code review is located here: http://llvm-reviews.chandlerc.com/D1692

llvm-svn: 191324
2013-09-24 18:21:52 +00:00
Preston Gurd
0411803c14 Adds support for Atom Silvermont (SLM) - -march=slm
Implements Instruction scheduler latencies for Silvermont,
using latencies from the Intel Silvermont Optimization Guide.

Auto detects SLM.

Turns on post RA scheduler when generating code for SLM.

llvm-svn: 190717
2013-09-13 19:23:28 +00:00
Ben Langmuir
9981cd7cfe Partial support for Intel SHA Extensions (sha1rnds4)
Add basic assembly/disassembly support for the first Intel SHA
instruction 'sha1rnds4'. Also includes feature flag, and test cases.

Support for the remaining instructions will follow in a separate patch.

llvm-svn: 190611
2013-09-12 15:51:31 +00:00
Cameron Esfahani
ee2e44fcc0 Clean up some usage of Triple. The base class has methods for determining if the target is iOS and Linux.
llvm-svn: 189604
2013-08-29 20:23:14 +00:00
NAKAMURA Takumi
ad1675ea4f X86Subtarget.h: Recognize x86_64-cygwin.
In the LLVM side, x86_64-cygwin is almost as same as x86_64-mingw32.

llvm-svn: 189436
2013-08-28 03:04:02 +00:00
Craig Topper
33a600320c Rename mattr names for AVX-512 to from avx-512 -> avx512f, avx-512-pfi -> av512pf, avx-512-cdi -> avx512cd, avx-512-eri->avx512er. This matches better with official docs and what gcc patches appearto be using. I didn't touch the has* functions or the feature flag names to avoid change the td and lowering file while commits are still happening.
llvm-svn: 188859
2013-08-21 03:57:57 +00:00
Elena Demikhovsky
118b5b6492 I'm starting to commit KNL backend. I'll push patches one-by-one. This patch includes support for the extended register set XMM16-31, YMM16-31, ZMM0-31.
The full ISA you can see here: http://software.intel.com/en-us/intel-isa-extensions

llvm-svn: 187030
2013-07-24 11:02:47 +00:00
Charles Davis
2b2075f834 Target/X86: Add explicit Win64 and System V/x86-64 calling conventions.
Summary:
This patch adds explicit calling convention types for the Win64 and
System V/x86-64 ABIs. This allows code to override the default, and use
the Win64 convention on a target that wants to use SysV (and
vice-versa). This is needed to implement the `ms_abi` and `sysv_abi` GNU
attributes.

Reviewers:

CC:

llvm-svn: 186144
2013-07-12 06:02:35 +00:00
Andrew Trick
18751012bb Revert "Temporarily enable MI-Sched on X86."
This reverts commit 98a9b72e8c56dc13a2617de84503a3d78352789c.

llvm-svn: 184823
2013-06-25 02:48:58 +00:00
Andrew Trick
716b547d13 Temporarily enable MI-Sched on X86.
Sorry for the unit test churn. I'll try to make the change permanently
next time.

llvm-svn: 184705
2013-06-24 09:13:20 +00:00
Preston Gurd
0547d81fdb This patch adds the X86FixupLEAs pass, which will reduce instruction
latency for certain models of the Intel Atom family, by converting
instructions into their equivalent LEA instructions, when it is both
useful and possible to do so.

llvm-svn: 180573
2013-04-25 20:29:37 +00:00
Michael Liao
427149cbcf Add support of RDSEED defined in AVX2 extension
llvm-svn: 178314
2013-03-28 23:41:26 +00:00
Preston Gurd
b6ed645cb6 For the current Atom processor, the fastest way to handle a call
indirect through a memory address is to load the memory address into
a register and then call indirect through the register.

This patch implements this improvement by modifying SelectionDAG to
force a function address which is a memory reference to be loaded
into a virtual register.

Patch by Sriram Murali.

llvm-svn: 178171
2013-03-27 19:14:02 +00:00
Michael Liao
3515920fbd Add HLE target feature
llvm-svn: 178082
2013-03-26 22:46:02 +00:00
Michael Liao
969ef73c31 Add PREFETCHW codegen support
- Add 'PRFCHW' feature defined in AVX2 ISA extension

llvm-svn: 178040
2013-03-26 17:47:11 +00:00
Bill Wendling
fb87157cc8 Reinitialize the ivars in the subtarget so that they can be reset with the new features.
llvm-svn: 175336
2013-02-16 01:36:26 +00:00
Bill Wendling
3e17fc6664 Temporary revert of 175320.
llvm-svn: 175322
2013-02-15 23:22:32 +00:00
Bill Wendling
ecc7822c1e Reinitialize the ivars in the subtarget.
When we're recalculating the feature set of the subtarget, we need to have the
ivars in their initial state.

llvm-svn: 175320
2013-02-15 23:18:01 +00:00
Bill Wendling
cc20bdf27b Use the 'target-features' and 'target-cpu' attributes to reset the subtarget features.
If two functions require different features (e.g., `-mno-sse' vs. `-msse') then
we want to honor that, especially during LTO. We can do that by resetting the
subtarget's features depending upon the 'target-feature' attribute.

llvm-svn: 175314
2013-02-15 22:31:27 +00:00
Kay Tiong Khoo
45b3d90921 added basic support for Intel ADX instructions
-feature flag, instructions definitions, test cases

llvm-svn: 175196
2013-02-14 19:08:21 +00:00
Evan Cheng
2e2cde560f Teach SDISel to combine fsin / fcos into a fsincos node if the following
conditions are met:
1. They share the same operand and are in the same BB.
2. Both outputs are used.
3. The target has a native instruction that maps to ISD::FSINCOS node or
   the target provides a sincos library call.

Implemented the generic optimization in sdisel and enabled it for
Mac OSX. Also added an additional optimization for x86_64 Mac OSX by
using an alternative entry point __sincos_stret which returns the two
results in xmm0 / xmm1.

rdar://13087969
PR13204

llvm-svn: 173755
2013-01-29 02:32:37 +00:00
Eli Bendersky
391ff99738 In this patch, we teach X86_64TargetMachine that it has a ILP32
(defined by the x32 ABI) mode, in which case its pointers are 32-bits
in size. This knowledge is also added to X86RegisterInfo that now
returns the appropriate registers in getPointerRegClass.

There are many outcomes to this change. In order to keep the patches
separate and manageable, we start by focusing on some simple testable
cases. The patch adds a test with passing a pointer to a function -
focusing on the difference between the two data models for x86-64.
Another test is added for handling of 'sret' arguments (and
functionality is added in X86ISelLowering to make it work).

A note on naming: the "x32 ABI" document refers to the AMD64
architecture (in LLVM it's distinguished by being is64Bits() in the
x86 subtarget) with two variations: the LP64 (default) data model, and
the ILP32 data model. This patch adds predicates to the subtarget
which are consistent with this naming scheme.

llvm-svn: 173503
2013-01-25 22:07:43 +00:00
Preston Gurd
4b0d66f924 Pad Short Functions for Intel Atom
The current Intel Atom microarchitecture has a feature whereby
when a function returns early then it is slightly faster to execute
a sequence of NOP instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction until
the return address is ready.

When compiling for X86 Atom only, this patch will run a pass,
called "X86PadShortFunction" which will add NOP instructions where less
than four cycles elapse between function entry and return.

It includes tests.

This patch has been updated to address Nadav's review comments
- Optimize only at >= O1 and don't do optimization if -Os is set
- Stores MachineBasicBlock* instead of BBNum
- Uses DenseMap instead of std::map
- Fixes placement of braces

Patch by Andy Zhang.

llvm-svn: 171879
2013-01-08 18:27:24 +00:00
Nadav Rotem
900cb45dec Revert revision 171524. Original message:
URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev
Log:
The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171603
2013-01-05 05:42:48 +00:00
Preston Gurd
b1c34fa73f The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171524
2013-01-04 20:54:54 +00:00