1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00
Commit Graph

165 Commits

Author SHA1 Message Date
Alexey Volkov
a3a5a1d7f1 [X86] Use ADD/SUB instead of INC/DEC for Silvermont
According to Intel Software Optimization Manual 
on Silvermont INC or DEC instructions require 
an additional uop to merge the flags.
As a result, a branch instruction depending 
on an INC or a DEC instruction incurs a 1 cycle penalty.

Differential Revision: http://reviews.llvm.org/D3990

llvm-svn: 210466
2014-06-09 11:40:41 +00:00
Alexey Volkov
9a03018603 [X86] Tune LEA usage for Silvermont
According to Intel Software Optimization Manual on Silvermont in some cases LEA
is better to be replaced with ADD instructions:
"The rule of thumb for ADDs and LEAs is that it is justified to use LEA
with a valid index and/or displacement for non-destructive destination purposes
(especially useful for stack offset cases), or to use a SCALE.
Otherwise, ADD(s) are preferable."

Differential Revision: http://reviews.llvm.org/D3826

llvm-svn: 209198
2014-05-20 08:55:50 +00:00
Chandler Carruth
17e2aa3c0a [x86] Make the 'x86-64' cpu, what I see as and many use as the generic
default architecture for reasonable modern x86 processors, actually be
modern. This processor model should essentially be "tuned" for modern
x86 chips as much as possible without undue penalties on any specific
architecture. Previously we weren't even using the nice scheduling
models. There are a few other tweaks needed here, but this change at
least I have benchmarked across a decent swatch of chips (intel's
clovertown, westmere, and sandybridge; amd's istanbul) and seen no
significant regressions.

If anyone has suggested ways to test this, just let me know. Somewhat
alarmingly, no existing tests failed.

llvm-svn: 208230
2014-05-07 17:37:03 +00:00
Benjamin Kramer
b0e4e7faba Add a description for AMD's bdver4 (aka Excavator).
This is just bdver3 + AVX2 + BMI2.

llvm-svn: 207847
2014-05-02 15:47:07 +00:00
Alexey Volkov
e9ac44dbb0 Enable FeatureFastUAMem for Silvermont processor
Differential Revision: http://llvm-reviews.chandlerc.com/D2982

llvm-svn: 203218
2014-03-07 09:03:49 +00:00
Alexey Volkov
75bd45540c Test commit
Removed whitespace

llvm-svn: 203216
2014-03-07 08:28:44 +00:00
Craig Topper
201bd5add3 [x86] Add basic support for .code16
This is not really expected to work right yet. Mostly because we will
still emit the OpSize (0x66) prefix in all the wrong places, along with
a number of other corner cases. Those will all be fixed in the subsequent
commits.

Patch from David Woodhouse.

llvm-svn: 198584
2014-01-06 04:55:54 +00:00
Rafael Espindola
427ca8d886 Change the default of AsmWriterClassName and isMCAsmWriter.
llvm-svn: 196065
2013-12-02 04:55:42 +00:00
Ekaterina Romanova
eda4e2e4a7 SHLD/SHRD are VectorPath (microcode) instructions known to have poor latency on certain architectures. While generating SHLD/SHRD instructions is acceptable when optimizing for size, optimizing for speed on these platforms should be implemented using alternative sequences of instructions composed of add, adc, shr, shl, or and lea which are directPath instructions. These alternative instructions not only have a lower latency but they also increase the decode bandwidth by allowing simultaneous decoding of a third directPath instruction.
AMD's processors family K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD's processors, I disabled folding (or (x << c) | (y >> (64 - c))) when we are not optimizing for size.

It might be beneficial to disable this folding for some of the Intel's processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel.

llvm-svn: 195383
2013-11-21 23:21:26 +00:00
Benjamin Kramer
2d870f327a X86: Add a description for AMD bdver3 aka Steamroller.
This is just bdver2 + FSGSBase.

llvm-svn: 193984
2013-11-04 10:29:20 +00:00
Yunzhong Gao
23e948dd2f Enabling 3DNow! prefetch instruction for a few AMD processors: bobcat, jaguar,
bulldozer and piledriver. Support for the instruction itself seems to have
already been added in r178040.

Differential Revision: http://llvm-reviews.chandlerc.com/D1933

llvm-svn: 192828
2013-10-16 19:04:11 +00:00
Nick Lewycky
e9c94635b3 Rename this feature to "cx16" to match gcc's flag name. Apparently these strings
are directly tied to the flag names in clang with no remapping in between?

llvm-svn: 192044
2013-10-05 20:11:44 +00:00
Yunzhong Gao
14609c71b2 Adding a feature flag to the llvm backend for x86 TBM instruction set.
Adding TBM feature to bdver2 processor; piledriver supports this instruction set
according to the following document:
http://developer.amd.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf

Phabricator code review is located here: http://llvm-reviews.chandlerc.com/D1692

llvm-svn: 191324
2013-09-24 18:21:52 +00:00
Preston Gurd
3d32b70ccf Remove unused code, which had been commented out.
llvm-svn: 190869
2013-09-17 16:53:36 +00:00
Craig Topper
4edc623bda Make F16C feature flag imply AVX rather than just checking both at the patterns.
llvm-svn: 190775
2013-09-16 04:29:58 +00:00
Preston Gurd
0411803c14 Adds support for Atom Silvermont (SLM) - -march=slm
Implements Instruction scheduler latencies for Silvermont,
using latencies from the Intel Silvermont Optimization Guide.

Auto detects SLM.

Turns on post RA scheduler when generating code for SLM.

llvm-svn: 190717
2013-09-13 19:23:28 +00:00
Ben Langmuir
9981cd7cfe Partial support for Intel SHA Extensions (sha1rnds4)
Add basic assembly/disassembly support for the first Intel SHA
instruction 'sha1rnds4'. Also includes feature flag, and test cases.

Support for the remaining instructions will follow in a separate patch.

llvm-svn: 190611
2013-09-12 15:51:31 +00:00
Benjamin Kramer
163da340ac X86: Add a description of the Intel Atom Silvermont CPU.
Currently this is just the atom model with SSE4.2 enabled.

llvm-svn: 189669
2013-08-30 14:05:32 +00:00
Rafael Espindola
112bdc8929 Rename features to match what gcc and clang use.
There is no advantage in being different and using the same names simplifies
clang a bit.

llvm-svn: 189141
2013-08-23 20:21:34 +00:00
Craig Topper
33a600320c Rename mattr names for AVX-512 to from avx-512 -> avx512f, avx-512-pfi -> av512pf, avx-512-cdi -> avx512cd, avx-512-eri->avx512er. This matches better with official docs and what gcc patches appearto be using. I didn't touch the has* functions or the feature flag names to avoid change the td and lowering file while commits are still happening.
llvm-svn: 188859
2013-08-21 03:57:57 +00:00
Elena Demikhovsky
505373db43 Added encoding prefixes for KNL instructions (EVEX).
Added 512-bit operands printing.
Added instruction formats for KNL instructions.

llvm-svn: 187324
2013-07-28 08:28:38 +00:00
Elena Demikhovsky
118b5b6492 I'm starting to commit KNL backend. I'll push patches one-by-one. This patch includes support for the extended register set XMM16-31, YMM16-31, ZMM0-31.
The full ISA you can see here: http://software.intel.com/en-us/intel-isa-extensions

llvm-svn: 187030
2013-07-24 11:02:47 +00:00
Benjamin Kramer
96ad18d591 X86: Add target description for btver2; make autodetection logic aware of AVX.
llvm-svn: 181005
2013-05-03 10:20:08 +00:00
Preston Gurd
0547d81fdb This patch adds the X86FixupLEAs pass, which will reduce instruction
latency for certain models of the Intel Atom family, by converting
instructions into their equivalent LEA instructions, when it is both
useful and possible to do so.

llvm-svn: 180573
2013-04-25 20:29:37 +00:00
Chad Rosier
645b701422 [asm parser] Add support for predicating MnemonicAlias based on the assembler
variant/dialect.  Addresses a FIXME in the emitMnemonicAliases function.
Use and test case to come shortly.
rdar://13688439 and part of PR13340.

llvm-svn: 179804
2013-04-18 22:35:36 +00:00
Michael Liao
427149cbcf Add support of RDSEED defined in AVX2 extension
llvm-svn: 178314
2013-03-28 23:41:26 +00:00
Nadav Rotem
401bba05fe Add the Haswell machine model.
llvm-svn: 178301
2013-03-28 22:34:46 +00:00
Preston Gurd
b6ed645cb6 For the current Atom processor, the fastest way to handle a call
indirect through a memory address is to load the memory address into
a register and then call indirect through the register.

This patch implements this improvement by modifying SelectionDAG to
force a function address which is a memory reference to be loaded
into a virtual register.

Patch by Sriram Murali.

llvm-svn: 178171
2013-03-27 19:14:02 +00:00
Michael Liao
3515920fbd Add HLE target feature
llvm-svn: 178082
2013-03-26 22:46:02 +00:00
Jakob Stoklund Olesen
43b68b7eb9 Enable SandyBridgeModel for all modern Intel P6 descendants.
All Intel CPUs since Yonah look a lot alike, at least at the granularity
of the scheduling models. We can add more accurate models for
processors that aren't Sandy Bridge if required. Haswell will probably
need its own.

The Atom processor and anything based on NetBurst is completely
different. So are the non-Intel chips.

llvm-svn: 178080
2013-03-26 22:19:12 +00:00
Michael Liao
969ef73c31 Add PREFETCHW codegen support
- Add 'PRFCHW' feature defined in AVX2 ISA extension

llvm-svn: 178040
2013-03-26 17:47:11 +00:00
Kay Tiong Khoo
45b3d90921 added basic support for Intel ADX instructions
-feature flag, instructions definitions, test cases

llvm-svn: 175196
2013-02-14 19:08:21 +00:00
Preston Gurd
4b0d66f924 Pad Short Functions for Intel Atom
The current Intel Atom microarchitecture has a feature whereby
when a function returns early then it is slightly faster to execute
a sequence of NOP instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction until
the return address is ready.

When compiling for X86 Atom only, this patch will run a pass,
called "X86PadShortFunction" which will add NOP instructions where less
than four cycles elapse between function entry and return.

It includes tests.

This patch has been updated to address Nadav's review comments
- Optimize only at >= O1 and don't do optimization if -Os is set
- Stores MachineBasicBlock* instead of BBNum
- Uses DenseMap instead of std::map
- Fixes placement of braces

Patch by Andy Zhang.

llvm-svn: 171879
2013-01-08 18:27:24 +00:00
Nadav Rotem
900cb45dec Revert revision 171524. Original message:
URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev
Log:
The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171603
2013-01-05 05:42:48 +00:00
Preston Gurd
b1c34fa73f The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171524
2013-01-04 20:54:54 +00:00
Chandler Carruth
63f4e5f8b9 Make '-mtune=x86_64' assume fast unaligned memory accesses.
Not all chips targeted by x86_64 have this feature, but a dramatically
increasing number do. Specifying a chip-specific tuning parameter will
continue to turn the feature on or off as appropriate for that
particular chip, but the generic flag should try to achieve the best
performance on the most widely available hardware. Today, the number of
chips with fast UA access dwarfs those without in the x86-64 space.

Note that this also brings LLVM's code generation for this '-march' flag
more in line with that of modern GCCs. Reviewed by Dan Gohman.

llvm-svn: 170269
2012-12-15 09:01:13 +00:00
Chandler Carruth
7e4aad1c1f Revert "Make '-mtune=x86_64' assume fast unaligned memory accesses."
Accidental commit... git svn betrayed me. Sorry for the noise.

llvm-svn: 169741
2012-12-10 18:23:52 +00:00
Chandler Carruth
a64587b996 Make '-mtune=x86_64' assume fast unaligned memory accesses.
Summary:
Not all chips targeted by x86_64 have this feature, but a dramatically
increasing number do. Specifying a chip-specific tuning parameter will
continue to turn the feature on or off as appropriate for that
particular chip, but the generic flag should try to achieve the best
performance on the most widely available hardware. Today, the number of
chips with fast UA access dwarfs those without in the x86-64 space.

Note that this also brings LLVM's code generation for this '-march' flag
more in line with that of modern GCCs.

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D195

llvm-svn: 169740
2012-12-10 18:22:42 +00:00
Chandler Carruth
3e5f2c328d Address a FIXME and update the fast unaligned memory feature for newer
Intel chips.

The model number rules were determined by inspecting Intel's
documentation for their newer chip model numbers. My understanding is
that all of the newer Intel chips have fast unaligned memory access, but
if anyone is concerned about a particular chip, just shout.

No tests updated; it's not clear we have dedicated tests for the chips'
various features, but if anyone would like tests (or can point me at
some existing ones), I'm happy to oblige.

llvm-svn: 169730
2012-12-10 09:18:44 +00:00
Michael Liao
59114df23b Add support of RTM from TSX extension
- Add RTM code generation support throught 3 X86 intrinsics:
  xbegin()/xend() to start/end a transaction region, and xabort() to abort a
  tranaction region

llvm-svn: 167573
2012-11-08 07:28:54 +00:00
Michael Liao
62bd496b54 Atom has SIMD instruction set extension up to SSSE3
llvm-svn: 166665
2012-10-25 07:06:48 +00:00
Craig Topper
09bb7db4f3 Fix 80-column violation
llvm-svn: 165089
2012-10-03 03:56:12 +00:00
Roman Divacky
a811b158e5 Add support for AMD Geode.
llvm-svn: 163710
2012-09-12 14:36:02 +00:00
Preston Gurd
c80dc7d214 Generic Bypass Slow Div
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presents of the optimization when calculating
either the quotient, remainder,  or both.

Patch by Tyler Nowicki!

llvm-svn: 163150
2012-09-04 18:22:17 +00:00
Anitha Boyapati
161fc750a1 Patch to enable FMA on bdver2 target. Make XOP feature enable FMA4 as well.
llvm-svn: 162012
2012-08-16 04:04:02 +00:00
Anitha Boyapati
5443ee0d76 (no commit message)
llvm-svn: 162010
2012-08-16 03:50:04 +00:00
Andrew Trick
b9c8074dcd I'm introducing a new machine model to simultaneously allow simple
subtarget CPU descriptions and support new features of
MachineScheduler.

MachineModel has three categories of data:
1) Basic properties for coarse grained instruction cost model.
2) Scheduler Read/Write resources for simple per-opcode and operand cost model (TBD).
3) Instruction itineraties for detailed per-cycle reservation tables.

These will all live side-by-side. Any subtarget can use any
combination of them. Instruction itineraries will not change in the
near term. In the long run, I expect them to only be relevant for
in-order VLIW machines that have complex contraints and require a
precise scheduling/bundling model. Once itineraries are only actively
used by VLIW-ish targets, they could be replaced by something more
appropriate for those targets.

This tablegen backend rewrite sets things up for introducing
MachineModel type #2: per opcode/operand cost model.

llvm-svn: 159891
2012-07-07 04:00:00 +00:00
Craig Topper
5837bcfc02 Rename FMA3 feature flag to just FMA to match gcc so it can be added to clang.
llvm-svn: 157903
2012-06-03 18:58:46 +00:00
Benjamin Kramer
cb686400fb X86: Rename the CLMUL target feature to PCLMUL.
It was renamed in gcc/gas a while ago and causes all kinds of
confusion because it was named differently in llvm and clang.

llvm-svn: 157745
2012-05-31 14:34:17 +00:00
Craig Topper
405f995b07 Make XOP and FMA4 require SSE4A to match GCC behavior. Use this to simplify Bulldozer feature list.
llvm-svn: 155897
2012-05-01 06:54:48 +00:00