1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 20:12:56 +02:00
Commit Graph

9169 Commits

Author SHA1 Message Date
Preston Gurd
66b9c4fcf9 Bypass Slow Divides
* Only apply divide bypass optimization when not optimizing for size. 
* Fixed bug caused by constant for 0 value of type Int32,
  used dividend type to generate the constant instead.
* For atom x86-64 apply the divide bypass to use 16-bit divides instead of
  64-bit divides when operand values are small enough.
* Added lit tests for 64-bit divide bypass.

Patch by Tyler Nowicki!

llvm-svn: 176442
2013-03-04 18:13:57 +00:00
Arnold Schwaighofer
e60e6fc70f X86 cost model: Adjust cost for custom lowered vector multiplies
This matters for example in following matrix multiply:

int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}

Taken from the test-suite benchmark Shootout.

We estimate the cost of the multiply to be 2 while we generate 9 instructions
for it and end up being quite a bit slower than the scalar version (48% on my
machine).

Also, properly differentiate between avx1 and avx2. On avx-1 we still split the
vector into 2 128bits and handle the subvector muls like above with 9
instructions.
Only on avx-2 will we have a cost of 9 for v4i64.

I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would be indeed more expensive when vectorized with 3
kernels:

for (i ...)
   r += a[i] * 3;
for (i ...)
  m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.

In each case the vectorized version was considerably slower.

radar://13304919

llvm-svn: 176403
2013-03-02 04:02:52 +00:00
Michael Liao
1e621fbd2f Fix PR10475
- ISD::SHL/SRL/SRA must have either both scalar or both vector operands
  but TLI.getShiftAmountTy() so far only return scalar type. As a
  result, backend logic assuming that breaks.
- Rename the original TLI.getShiftAmountTy() to
  TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
  return target-specificed scalar type or the same vector type as the
  1st operand.
- Fix most TICG logic assuming TLI.getShiftAmountTy() a simple scalar
  type.

llvm-svn: 176364
2013-03-01 18:40:30 +00:00
Duncan Sands
268f52a52f GCC thinks that this variable might be used uninitialized (it isn't).
llvm-svn: 176341
2013-03-01 09:46:03 +00:00
Yiannis Tsiouris
b2a123a008 Re-format comments (and check commit access)
llvm-svn: 176270
2013-02-28 16:59:10 +00:00
Nadav Rotem
6489b3c2b7 Revert r176166 because it broke one of the lit tests.
llvm-svn: 176171
2013-02-27 05:56:20 +00:00
Nadav Rotem
c64a5a0435 std::string to StringRef.
llvm-svn: 176166
2013-02-27 05:23:56 +00:00
Chad Rosier
5d41dc7229 [fast-isel] Make sure the FastLowerArguments function checks to make sure the
arguments type is a simple type.
rdar://13290455

llvm-svn: 176066
2013-02-26 01:05:31 +00:00
Michael Liao
ad0b9ecc47 Refine fix to PR10499, no functionality change
- Put expensive checking after simple one

llvm-svn: 176060
2013-02-25 23:16:36 +00:00
Michael Liao
ff7d7ec88b Fix PR10499
- Check whether SSE is available before lowering all 1s vector building with
  PCMPEQD, which is only available from SSE2

llvm-svn: 176058
2013-02-25 23:01:03 +00:00
Chad Rosier
37142b6930 [fast-isel] Add X86FastIsel::FastLowerArguments to handle functions with 6 or
fewer scalar integer (i32 or i64) arguments. It completely eliminates the need
for SDISel for trivial functions.

Also, add the new llc -fast-isel-abort-args option, which is similar to
-fast-isel-abort option, but for formal argument lowering.

llvm-svn: 176052
2013-02-25 21:59:35 +00:00
Chad Rosier
77e46d6eb6 [ms-inline asm] Add support for the pushad/popad mnemonics.
rdar://13254235

llvm-svn: 176036
2013-02-25 19:06:27 +00:00
Nadav Rotem
0740239f87 Revert r169638 because it broke Mesa llvmpipe tests.
Fix PR15239.

llvm-svn: 175985
2013-02-24 07:09:35 +00:00
Benjamin Kramer
bdb1d9aad3 X86: Disable cmov-memory patterns on subtargets without cmov.
Fixes PR15115.

llvm-svn: 175962
2013-02-23 10:40:58 +00:00
Peter Collingbourne
7dc1ee08f5 x86_64: designate most general purpose and SSE registers as callee save under coldcc
llvm-svn: 175911
2013-02-22 19:19:44 +00:00
Eli Bendersky
37f247b8d8 Move the eliminateCallFramePseudoInstr method from TargetRegisterInfo
to TargetFrameLowering, where it belongs. Incidentally, this allows us
to delete some duplicated (and slightly different!) code in TRI.

There are potentially other layering problems that can be cleaned up
as a result, or in a similar manner.

The refactoring was OK'd by Anton Korobeynikov on llvmdev.

Note: this touches the target interfaces, so out-of-tree targets may
be affected.

llvm-svn: 175788
2013-02-21 20:05:00 +00:00
Eli Bendersky
79457c8f90 getX86SubSuperRegister has a special mode with High=true for i64 which
exists solely to enable it to call itself for i8 with some registers.
The proposed patch simplifies the function somewhat to make the High
bit only meaningful for the i8 mode, which makes sense. No functional
difference (getX86SubSuperRegister is not getting called from anywhere
outside with i64 and High=true).

llvm-svn: 175762
2013-02-21 16:40:18 +00:00
Jim Grosbach
89c0252c2a MCParser: Update method names per coding guidelines.
s/AddDirectiveHandler/addDirectiveHandler/
s/ParseMSInlineAsm/parseMSInlineAsm/
s/ParseIdentifier/parseIdentifier/
s/ParseStringToEndOfStatement/parseStringToEndOfStatement/
s/ParseEscapedString/parseEscapedString/
s/EatToEndOfStatement/eatToEndOfStatement/
s/ParseExpression/parseExpression/
s/ParseParenExpression/parseParenExpression/
s/ParseAbsoluteExpression/parseAbsoluteExpression/
s/CheckForValidSection/checkForValidSection/

http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

No functional change intended.

llvm-svn: 175675
2013-02-20 22:21:35 +00:00
Jim Grosbach
233487d8a2 Update TargetLowering ivars for name policy.
http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

ivars should be camel-case and start with an upper-case letter. A few in
TargetLowering were starting with a lower-case letter.

No functional change intended.

llvm-svn: 175667
2013-02-20 21:13:59 +00:00
Chad Rosier
f46d46cb82 [ms-inline asm] Make the comment a bit more verbose.
llvm-svn: 175641
2013-02-20 18:03:44 +00:00
Elena Demikhovsky
0886fb4d55 I optimized the following patterns:
sext <4 x i1> to <4 x i64>
 sext <4 x i8> to <4 x i64>
 sext <4 x i16> to <4 x i64>
 
I'm running Combine on SIGN_EXTEND_IN_REG and revert SEXT patterns:
 (sext_in_reg (v4i64 anyext (v4i32 x )), ExtraVT) -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x , ExtraVT)))
 
 The sext_in_reg (v4i32 x) may be lowered to shl+sar operations.
 The "sar" does not exist on 64-bit operation, so lowering sext_in_reg (v4i64 x) has no vector solution.

I also added a cost of this operations to the AVX costs table.

llvm-svn: 175619
2013-02-20 12:42:54 +00:00
Chad Rosier
2be41be7b9 [ms-inline asm] Force the use of a base pointer if the MachineFunction includes
MS-style inline assembly.

This is a follow-on to r175334.  Forcing a FP to be emitted doesn't ensure it
will be used.  Therefore, force the base pointer as well.  We now treat MS
inline assembly in the same way we treat functions with dynamic stack
realignment and VLAs.  This guarantees the BP will be used to reference 
parameters and locals.
rdar://13218191

llvm-svn: 175576
2013-02-19 23:50:45 +00:00
Jakub Staszak
8aa8fcbbf4 Add obvious constantness.
llvm-svn: 175560
2013-02-19 21:54:59 +00:00
Benjamin Kramer
e94a9e6d14 Clean up HiPE prologue emission a bit and avoid signed arithmetic tricks.
No intended functionality change.

llvm-svn: 175536
2013-02-19 17:32:57 +00:00
Rafael Espindola
2e18dbb394 Move LLVM_LIBRARY_VISIBILITY for consistency with what was done to
PPCJITInfo.cpp in r175394.

llvm-svn: 175531
2013-02-19 17:14:33 +00:00
Eli Bendersky
cf980445ac Make pass name more precise and fix comment.
llvm-svn: 175525
2013-02-19 16:38:32 +00:00
Craig Topper
6b083f50d1 Fix capitalization in comment to match function name.
llvm-svn: 175497
2013-02-19 07:43:59 +00:00
Jakub Staszak
201dd59b34 Use array_pod_sort instead of std::sort.
llvm-svn: 175472
2013-02-18 23:18:22 +00:00
NAKAMURA Takumi
fd77684daa X86FrameLowering.cpp: Fixup. Sorry for the breakage.
llvm-svn: 175467
2013-02-18 23:15:21 +00:00
NAKAMURA Takumi
ea78a0dbe1 X86FrameLowering.cpp: Fix a warning in -Asserts. [-Wunused-variable]
llvm-svn: 175464
2013-02-18 23:08:49 +00:00
Chad Rosier
0914ae8db3 Remove a useless assert.
llvm-svn: 175463
2013-02-18 22:20:16 +00:00
Benjamin Kramer
f776243f47 Fix a 32/64 bit incompatibility in the HiPE prologue generation.
llvm-svn: 175458
2013-02-18 21:45:01 +00:00
Benjamin Kramer
462b555ebe Support for HiPE-compatible code emission, patch by Yiannis Tsiouris.
llvm-svn: 175457
2013-02-18 20:55:12 +00:00
Benjamin Kramer
4ee689feed X86: Add a note.
llvm-svn: 175408
2013-02-17 23:34:14 +00:00
Jakub Staszak
933b3fda5f Return false instead of 0.
llvm-svn: 175402
2013-02-17 18:35:25 +00:00
NAKAMURA Takumi
355a81a32e [msvc x64] Update X86CompilationCallback_Win64.asm corresponding to r175267.
llvm-svn: 175363
2013-02-16 16:04:29 +00:00
Jakub Staszak
0b70c7ab00 Minor cleanups. No functionality change.
llvm-svn: 175359
2013-02-16 13:34:26 +00:00
Bill Wendling
fb87157cc8 Reinitialize the ivars in the subtarget so that they can be reset with the new features.
llvm-svn: 175336
2013-02-16 01:36:26 +00:00
Chad Rosier
1062ec80b5 [ms-inline asm] Do not omit the frame pointer if we have ms-inline assembly.
If the frame pointer is omitted, and any stack changes occur in the inline
assembly, e.g.: "pusha", then any C local variable or C argument references
will be incorrect.  

I pass no judgement on anyone who would do such a thing. ;)
rdar://13218191

llvm-svn: 175334
2013-02-16 01:25:28 +00:00
Bill Wendling
3e17fc6664 Temporary revert of 175320.
llvm-svn: 175322
2013-02-15 23:22:32 +00:00
Bill Wendling
ecc7822c1e Reinitialize the ivars in the subtarget.
When we're recalculating the feature set of the subtarget, we need to have the
ivars in their initial state.

llvm-svn: 175320
2013-02-15 23:18:01 +00:00
Bill Wendling
cc20bdf27b Use the 'target-features' and 'target-cpu' attributes to reset the subtarget features.
If two functions require different features (e.g., `-mno-sse' vs. `-msse') then
we want to honor that, especially during LTO. We can do that by resetting the
subtarget's features depending upon the 'target-feature' attribute.

llvm-svn: 175314
2013-02-15 22:31:27 +00:00
Chad Rosier
f2e3e4b3af [ms-inline asm] Adjust the EndLoc to account for the ']'.
llvm-svn: 175312
2013-02-15 21:58:13 +00:00
Rafael Espindola
64454b7056 Give these callbacks hidden visibility. It is better to not export them more
than we need to and some ELF linkers complain about directly accessing symbols
with default visibility.

llvm-svn: 175268
2013-02-15 14:15:59 +00:00
Rafael Espindola
50b6c2a0e2 Don't make assumptions about the mangling of static functions in extern "C"
blocks. We still don't have consensus if we should try to change clang or
the standard, but llvm should work with compilers that implement the current
standard and mangle those functions.

llvm-svn: 175267
2013-02-15 14:08:43 +00:00
Benjamin Kramer
e11f88e804 Make helpers static. Add missing include so LLVMInitializeObjCARCOpts gets C linkage.
llvm-svn: 175264
2013-02-15 12:30:38 +00:00
Eli Bendersky
3c50981d2a The operand listing is very much outdated.
llvm-svn: 175220
2013-02-14 23:17:03 +00:00
Jakub Staszak
9ef0442cde Simplify code. Remove "else after return".
llvm-svn: 175212
2013-02-14 21:50:09 +00:00
Kay Tiong Khoo
45b3d90921 added basic support for Intel ADX instructions
-feature flag, instructions definitions, test cases

llvm-svn: 175196
2013-02-14 19:08:21 +00:00
Nadav Rotem
402093b121 80-col
llvm-svn: 175189
2013-02-14 18:20:48 +00:00
Elena Demikhovsky
3a155506e7 Fixed a bug in X86TargetLowering::LowerVectorIntExtend() (assertion failure).
Added a test.

llvm-svn: 175144
2013-02-14 08:20:26 +00:00
Rafael Espindola
11ec830693 Revert r175120 and r175121. Clang is producing the expected asm names again.
llvm-svn: 175133
2013-02-14 03:33:34 +00:00
Rafael Espindola
d214fef222 Don't assume the mangling of static functions.
llvm-svn: 175121
2013-02-14 02:49:18 +00:00
Nick Lewycky
9a61e050d5 Don't build tail calls to functions with three inreg arguments on x86-32 PIC.
Fixes PR15250!

llvm-svn: 175092
2013-02-13 21:59:15 +00:00
Chad Rosier
ed40f84fdc [ms-inline-asm] Add support for memory references that have non-immediate
displacements.
rdar://12974533

llvm-svn: 175083
2013-02-13 21:33:44 +00:00
Benjamin Kramer
34ab81b7fa X86: Disable generation of rep;movsl when %esi is used as a base pointer.
This happens when there is both stack realignment and a dynamic alloca in the
function. If we overwrite %esi (rep;movsl uses fixed registers) we'll lose the
base pointer and the next register spill will write into oblivion.

Fixes PR15249 and unbreaks firefox on i386/freebsd. Mozilla uses dynamic allocas
and freebsd a 4 byte stack alignment.

llvm-svn: 175057
2013-02-13 13:40:35 +00:00
Elena Demikhovsky
a4a4bded4d Prevent insertion of "vzeroupper" before call that preserves YMM registers, since a caller uses preserved registers across the call.
llvm-svn: 175043
2013-02-13 08:02:04 +00:00
Eric Christopher
a2c85e433f Check i1 as well as i8 variables for 8 bit registers for x86 inline
assembly.

llvm-svn: 175036
2013-02-13 06:01:05 +00:00
Kay Tiong Khoo
d299c572f3 Added 0x0D to 2-byte opcode extension table for prefetch* variants
Fixed decode of existing 3dNow prefetchw instruction
Intel is scheduled to add a compatible prefetchw (same encoding) to future CPUs

llvm-svn: 174920
2013-02-12 00:19:12 +00:00
Kay Tiong Khoo
09400e6c4a *fixed disassembly of some i386 system insts with intel syntax
*added file for test cases for i386 intel syntax

llvm-svn: 174900
2013-02-11 19:46:36 +00:00
Eli Bendersky
1854305220 This is a follow-up on r174446, now taking Atom processors into
account. Atoms use LEA for updating SP in prologs/epilogs, and the
exact LEA opcode depends on the data model.

Also reapplying the test case which was added and then reverted
(because of Atom failures), this time specifying explicitly the CPU in
addition to the triple. The test case now checks all variations (data
mode, cpu Atom vs. Core).

llvm-svn: 174542
2013-02-06 20:43:57 +00:00
Eli Bendersky
63b704faa4 Make sure the correct opcodes are used to SUB and ADD the stack
pointer in function prologs/epilogs. The opcodes should depend on the
data model (LP64 vs. ILP32) rather than the architecture bit-ness.

llvm-svn: 174446
2013-02-05 21:53:29 +00:00
Jakob Stoklund Olesen
83ad73208a Move MRI liveouts to X86 return instructions.
llvm-svn: 174402
2013-02-05 17:59:48 +00:00
Eli Bendersky
89664e61b6 Fix comments
llvm-svn: 174390
2013-02-05 16:53:11 +00:00
Benjamin Kramer
ae05ca2d32 X86: Open up some opportunities for constant folding by postponing shift lowering.
Fixes PR15141.

llvm-svn: 174327
2013-02-04 15:19:33 +00:00
Benjamin Kramer
ab649797e0 X86: Simplify code. No functionality change.
llvm-svn: 174326
2013-02-04 15:19:25 +00:00
Evgeniy Stepanov
389e9dd213 More MSan/ASan annotations.
This change lets us bootstrap LLVM/Clang under ASan and MSan. It contains
fixes for 2 issues:

- X86JIT reads return address from stack, which MSan does not know is
  initialized.
- bugpoint tests run binaries with RLIMIT_AS. This does not work with certain
  Sanitizers.

We are no longer including config.h in Compiler.h with this change.

llvm-svn: 174306
2013-02-04 07:03:24 +00:00
David Sehr
59597001bc Two changes relevant to LEA and x32:
1) allows the use of RIP-relative addressing in 32-bit LEA instructions under
   x86-64 (ILP32 and LP64)
2) separates the size of address registers in 64-bit LEA instructions from
   control by ILP32/LP64.

llvm-svn: 174208
2013-02-01 19:28:09 +00:00
Chad Rosier
ebbd4433e6 [PEI] Pass the frame index operand number to the eliminateFrameIndex function.
Each target implementation was needlessly recomputing the index.
Part of rdar://13076458

llvm-svn: 174083
2013-01-31 20:02:54 +00:00
Eric Christopher
44ea43314a Whitespace.
llvm-svn: 174009
2013-01-31 00:50:48 +00:00
Eric Christopher
ae708feb79 Check and allow floating point registers to select the size of the
register for inline asm. This conforms to how gcc allows for effective
casting of inputs into gprs (fprs is already handled).

llvm-svn: 174008
2013-01-31 00:50:46 +00:00
Evan Cheng
4d1a496923 Restrict sin/cos optimization to 64-bit only for now. 32-bit is a bit messy and less critical.
llvm-svn: 173987
2013-01-30 22:56:35 +00:00
Evan Cheng
3d095b1549 Remove dead code.
llvm-svn: 173812
2013-01-29 18:08:22 +00:00
Hans Wennborg
4df1f32131 Fix typo in X86BaseInfo.h that I introduced in r157818.
llvm-svn: 173798
2013-01-29 14:05:57 +00:00
Craig Topper
a8c721bc88 Merge SSE and AVX shuffle instructions in the comment printer.
llvm-svn: 173777
2013-01-29 07:54:31 +00:00
Evan Cheng
2e2cde560f Teach SDISel to combine fsin / fcos into a fsincos node if the following
conditions are met:
1. They share the same operand and are in the same BB.
2. Both outputs are used.
3. The target has a native instruction that maps to ISD::FSINCOS node or
   the target provides a sincos library call.

Implemented the generic optimization in sdisel and enabled it for
Mac OSX. Also added an additional optimization for x86_64 Mac OSX by
using an alternative entry point __sincos_stret which returns the two
results in xmm0 / xmm1.

rdar://13087969
PR13204

llvm-svn: 173755
2013-01-29 02:32:37 +00:00
Craig Topper
8c275cea97 Fix 256-bit PALIGNR comment decoding to understand that it works on independent 256-bit lanes.
llvm-svn: 173674
2013-01-28 07:41:18 +00:00
Craig Topper
e60d6ab3f8 Add missing break in 256-bit palignr comment printing. No test case yet because the comment itself is still wrong.
llvm-svn: 173669
2013-01-28 07:19:11 +00:00
Craig Topper
97391f52d3 Fix inconsistent usage of PALIGN and PALIGNR when referring to the same instruction.
llvm-svn: 173667
2013-01-28 06:48:25 +00:00
Benjamin Kramer
b7b4734d8b X86: Decode PALIGN operands so I don't have to do it in my head.
llvm-svn: 173572
2013-01-26 13:31:37 +00:00
Benjamin Kramer
f6126f19f4 X86: Do splat promotion later, so the optimizer can chew on it first.
This catches many cases where we can emit a more efficient shuffle for a
specific mask or when the mask contains undefs. Once the splat is lowered to
unpacks we can't do that anymore.

There is a possibility of moving the promotion after pshufb matching, but I'm
not sure if pshufb with a mask loaded from memory is faster than 3 shuffles, so
I avoided that for now.

llvm-svn: 173569
2013-01-26 11:44:21 +00:00
Eli Bendersky
391ff99738 In this patch, we teach X86_64TargetMachine that it has a ILP32
(defined by the x32 ABI) mode, in which case its pointers are 32-bits
in size. This knowledge is also added to X86RegisterInfo that now
returns the appropriate registers in getPointerRegClass.

There are many outcomes to this change. In order to keep the patches
separate and manageable, we start by focusing on some simple testable
cases. The patch adds a test with passing a pointer to a function -
focusing on the difference between the two data models for x86-64.
Another test is added for handling of 'sret' arguments (and
functionality is added in X86ISelLowering to make it work).

A note on naming: the "x32 ABI" document refers to the AMD64
architecture (in LLVM it's distinguished by being is64Bits() in the
x86 subtarget) with two variations: the LP64 (default) data model, and
the ILP32 data model. This patch adds predicates to the subtarget
which are consistent with this naming scheme.

llvm-svn: 173503
2013-01-25 22:07:43 +00:00
Renato Golin
efde585fc3 Moving Cost Tables up to share with other targets
llvm-svn: 173382
2013-01-24 23:01:00 +00:00
Michael Liao
f1ce1e547c Fix an issue of pseudo atomic instruction DAG schedule
- Add list of physical registers clobbered in pseudo atomic insts
  Physical registers are clobbered when pseudo atomic instructions are
  expanded. Add them in clobber list to prevent DAG scheduler to
  mis-schedule them after these insns are declared side-effect free.
- Add test case from Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 173200
2013-01-22 21:47:38 +00:00
Benjamin Kramer
f2fc27452c X86: Make sure we account for the FMA4 register immediate value, otherwise rip-rel relocations will be off by one byte.
PR15040.

llvm-svn: 173176
2013-01-22 18:05:59 +00:00
Eli Bendersky
36076df691 Initial patch for x32 ABI support.
Add the x32 environment kind to the triple, and separate the concept of
pointer size and callee save stack slot size, since they're not equal
on x32.

llvm-svn: 173175
2013-01-22 18:02:49 +00:00
Tim Northover
52ba1e77cb Make APFloat constructor require explicit semantics.
Previously we tried to infer it from the bit width size, with an added
IsIEEE argument for the PPC/IEEE 128-bit case, which had a default
value. This default value allowed bugs to creep in, where it was
inappropriate.

llvm-svn: 173138
2013-01-22 09:46:31 +00:00
Craig Topper
c227faa439 Use <0 checks in place of ==-1 because it results in simpler code.
llvm-svn: 173010
2013-01-21 07:25:16 +00:00
Craig Topper
8c9d15eee7 Use MVT instead of EVT in LowerVECTOR_SHUFFLEtoBlend.
llvm-svn: 173009
2013-01-21 07:19:54 +00:00
Craig Topper
f0715ea5bb Remove trailing whitespace.
llvm-svn: 173008
2013-01-21 06:57:59 +00:00
Craig Topper
364a4f7a27 Fix some 80 column violations.
llvm-svn: 173006
2013-01-21 06:21:54 +00:00
Craig Topper
636472593a Make helper method static.
llvm-svn: 173005
2013-01-21 06:13:28 +00:00
Craig Topper
27f55b0886 Convert more EVT's to MVT's in the lowering methods.
llvm-svn: 172995
2013-01-20 21:50:27 +00:00
Craig Topper
31bc22abcd Capitalize lowerTRUNCATE so that it matches the other lower functions in this file despite it not matching coding standards.
llvm-svn: 172994
2013-01-20 21:34:37 +00:00
Renato Golin
5260b180e5 Revert CostTable algorithm, will re-write
llvm-svn: 172992
2013-01-20 20:57:20 +00:00
Craig Topper
ca029d2150 Make LowerVSETCC a static function and use MVT instead of EVT.
llvm-svn: 172969
2013-01-20 09:02:22 +00:00
Nadav Rotem
94213533f7 Revert 172708.
The optimization handles esoteric cases but adds a lot of complexity both to the X86 backend and to other backends.
This optimization disables an important canonicalization of chains of SEXT nodes and makes SEXT and ZEXT asymmetrical.
Disabling the canonicalization of consecutive SEXT nodes into a single node disables other DAG optimizations that assume
that there is only one SEXT node. The AVX mask optimizations is one example. Additionally this optimization does not update the cost model.

llvm-svn: 172968
2013-01-20 08:35:56 +00:00
Craig Topper
33f4f75f64 Make some helper methods static.
llvm-svn: 172936
2013-01-20 00:50:58 +00:00
Craig Topper
c36bc959d8 Remove DebugLoc argument from static function. It can easily be obtained from the SVOp passed in.
llvm-svn: 172935
2013-01-20 00:43:42 +00:00
Craig Topper
cae5aa7eae Use MVT instead of EVT in more instruction lowering code.
llvm-svn: 172933
2013-01-20 00:38:18 +00:00
Craig Topper
33af4801d0 Use MVT instead of EVT in more of the shuffle lowering code.
llvm-svn: 172930
2013-01-19 23:36:09 +00:00
Craig Topper
c7903010f6 Capitalize LowerVectorIntExtend to be consistent with all the other lower functions in this file.
llvm-svn: 172927
2013-01-19 23:14:09 +00:00
Nadav Rotem
6d7bb8551d On Sandybridge split unaligned 256bit stores into two xmm-sized stores.
llvm-svn: 172894
2013-01-19 08:38:41 +00:00
Craig Topper
a19b081754 Use MVT instead of EVT when computing shuffle immediates since they can only be for legal types. Keeps compiler from generating unneeded checks and handling for extended types.
llvm-svn: 172893
2013-01-19 08:27:45 +00:00
Nadav Rotem
de06852537 On Sandybridge loading unaligned 256bits using two XMM loads (vmovups and vinsertf128) is faster than using a single vmovups instruction.
llvm-svn: 172868
2013-01-18 23:10:30 +00:00
Craig Topper
a7891c8da9 Calculate vector element size more directly for VINSERTF128/VEXTRACTF128 immediate handling. Also use MVT since this only called on legal types during pattern matching.
llvm-svn: 172797
2013-01-18 08:41:28 +00:00
Craig Topper
4389c60096 Minor formatting fix. No functional change.
llvm-svn: 172795
2013-01-18 07:27:20 +00:00
Craig Topper
c06d31bb06 Spelling fix: extened->extended. Trailing whitespace in same function.
llvm-svn: 172793
2013-01-18 06:50:59 +00:00
Craig Topper
a819bc40e0 Make more use of is128BitVector/is256BitVector in place of getSizeInBits() == 128/256.
llvm-svn: 172792
2013-01-18 06:44:29 +00:00
Chad Rosier
66c139114d [ms-inline asm] Make the error message more generic now that we support the
'SIZE' and 'LENGTH' operators.

llvm-svn: 172773
2013-01-18 00:50:59 +00:00
Chad Rosier
bb513e22fa [ms-inline asm] Add support for the 'SIZE' and 'LENGTH' operators.
Part of rdar://12576868

llvm-svn: 172743
2013-01-17 19:21:48 +00:00
Elena Demikhovsky
461c2bd18c Optimization for the following SIGN_EXTEND pairs:
v8i8  -> v8i64, 
v8i8  -> v8i32, 
v4i8  -> v4i64, 
v4i16 -> v4i64 
for AVX and AVX2.

Bug 14865.

llvm-svn: 172708
2013-01-17 09:59:53 +00:00
Craig Topper
c5444baf77 Combine AVX and SSE forms of MOVSS and MOVSD into the same multiclasses so they get instantiated together.
llvm-svn: 172704
2013-01-17 06:59:42 +00:00
Jakob Stoklund Olesen
4cc85cb304 Provide a place for targets to insert ILP optimization passes.
Move the early if-conversion pass into this group.

ILP optimizations usually need to find the right balance between
register pressure and ILP using the MachineTraceMetrics analysis to
identify critical paths and estimate other costs. Such passes should run
together so they can share dominator tree and loop info analyses.

Besides if-conversion, future passes to run here here could include
expression height reduction and ARM's MLxExpansion pass.

llvm-svn: 172687
2013-01-17 00:58:38 +00:00
Renato Golin
1487c2a7ac Change CostTable model to be global to all targets
Moving the X86CostTable to a common place, so that other back-ends
can share the code. Also simplifying it a bit and commoning up
tables with one and two types on operations.

llvm-svn: 172658
2013-01-16 21:29:55 +00:00
Chad Rosier
1f23c079a7 [ms-inline asm] Extend support for parsing Intel bracketed memory operands that
have an arbitrary ordering of the base register, index register and displacement.
rdar://12527141

llvm-svn: 172484
2013-01-14 22:31:35 +00:00
Craig Topper
58b9662000 Simplify nested strconcats in X86 td files since strconcat can take more than 2 arguments.
llvm-svn: 172379
2013-01-14 07:46:34 +00:00
Craig Topper
7dac5e7e3d Create a single multiclass for SSE and AVX version of MOVL/MOVH. Prevents needing to specify everything twice. No functional change intended
llvm-svn: 172378
2013-01-14 07:26:58 +00:00
Nick Lewycky
07a4cc5052 Fix typo in comment.
llvm-svn: 172364
2013-01-13 19:03:55 +00:00
Benjamin Kramer
26eae94ea6 X86: Add patterns for X86ISD::VSEXT in registers.
Those can occur when something between the sextload and the store is on the same
chain and blocks isel. Fixes PR14887.

llvm-svn: 172353
2013-01-13 11:37:04 +00:00
Preston Gurd
7affdf3bdd Update patch for the pad short functions pass for Intel Atom (only).
Adds a check for -Oz, changes the code to not re-visit BBs,
and skips over DBG_VALUE instrs.

Patch by Andy Zhang.

llvm-svn: 172258
2013-01-11 22:06:56 +00:00
NAKAMURA Takumi
da4d0cbcc1 X86AsmParser.cpp: Fix up r172148, to add initializer in another CreateMem().
llvm-svn: 172157
2013-01-11 01:13:54 +00:00
Jakub Staszak
4beed9fd38 Remove heavy and unused #inclues from X86TargetObjectFile.cpp.
llvm-svn: 172151
2013-01-10 23:43:56 +00:00
Chad Rosier
217f7fad13 [ms-inline asm] Make sure we set a default value for AddressOf. Follow on to
r172121.

llvm-svn: 172148
2013-01-10 23:39:07 +00:00
Chad Rosier
f66d08be5c [ms-inline asm] Add support for calling functions from inline assembly.
Part of rdar://12991541

llvm-svn: 172121
2013-01-10 22:10:27 +00:00
Nadav Rotem
436dc952aa ARM Cost model: Use the size of vector registers and widest vectorizable instruction to determine the max vectorization factor.
llvm-svn: 172010
2013-01-09 22:29:00 +00:00
Nadav Rotem
18e176ccaa Efficient lowering of vector sdiv when the divisor is a splatted power of two constant.
PR 14848. The lowered sequence is based on the existing sequence the target-independent
DAG Combiner creates for the scalar case.

Patch by Zvi Rackover.

llvm-svn: 171953
2013-01-09 05:14:33 +00:00
Eric Christopher
44e3142d09 Last in the series of removing unnecessary '0' arguments for
address space. Reordered the EmitULEB128IntValue arguments to
make this easier.

llvm-svn: 171949
2013-01-09 03:52:05 +00:00
Andrew Trick
c15e94c204 MIsched: add an ILP window property to machine model.
This was an experimental option, but needs to be defined
per-target. e.g. PPC A2 needs to aggressively hide latency.

I converted some in-order scheduling tests to A2. Hal is working on
more test cases.

llvm-svn: 171946
2013-01-09 03:36:49 +00:00
Eric Christopher
38c8e00aa9 These functions have default arguments of 0 for the last arg. Use
them.

llvm-svn: 171933
2013-01-09 01:57:54 +00:00
Nadav Rotem
9c27f36e59 Cost Model: Move the 'max unroll factor' variable to the TTI and add initial Cost Model support on ARM.
llvm-svn: 171928
2013-01-09 01:15:42 +00:00
Preston Gurd
4b0d66f924 Pad Short Functions for Intel Atom
The current Intel Atom microarchitecture has a feature whereby
when a function returns early then it is slightly faster to execute
a sequence of NOP instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction until
the return address is ready.

When compiling for X86 Atom only, this patch will run a pass,
called "X86PadShortFunction" which will add NOP instructions where less
than four cycles elapse between function entry and return.

It includes tests.

This patch has been updated to address Nadav's review comments
- Optimize only at >= O1 and don't do optimization if -Os is set
- Stores MachineBasicBlock* instead of BBNum
- Uses DenseMap instead of std::map
- Fixes placement of braces

Patch by Andy Zhang.

llvm-svn: 171879
2013-01-08 18:27:24 +00:00
Eli Bendersky
4699968d0b Renamed MCInstFragment to MCRelaxableFragment and added some comments.
No change in functionality.

llvm-svn: 171822
2013-01-08 00:22:56 +00:00
Jordan Rose
c95190a559 Change SMRange to be half-open (exclusive end) instead of closed (inclusive)
This is necessary not only for representing empty ranges, but for handling
multibyte characters in the input. (If the end pointer in a range refers to
a multibyte character, should it point to the beginning or the end of the
character in a char array?) Some of the code in the asm parsers was already
assuming this anyway.

llvm-svn: 171765
2013-01-07 19:00:49 +00:00
Craig Topper
8884832622 Remove # from the beginning and end of def names.
llvm-svn: 171696
2013-01-07 05:26:58 +00:00
Craig Topper
b80024c8e6 Remove unnecessary # tokens at the beginning and end of defm names.
llvm-svn: 171694
2013-01-07 05:04:39 +00:00
Chandler Carruth
7723d75e9e Fix the enumerator names for ShuffleKind to match tho coding standards,
and make its comments doxygen comments.

llvm-svn: 171688
2013-01-07 03:20:02 +00:00
Chandler Carruth
601fa4e996 Make the popcnt support enums and methods have more clear names and
follow the conding conventions regarding enumerating a set of "kinds" of
things.

llvm-svn: 171687
2013-01-07 03:16:03 +00:00
Chandler Carruth
3c0f5d4efb Move TargetTransformInfo to live under the Analysis library. This no
longer would violate any dependency layering and it is in fact an
analysis. =]

llvm-svn: 171686
2013-01-07 03:08:10 +00:00
Chandler Carruth
30bd563e01 Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.

The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least on
implementation.

The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but tho
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.

The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.

The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.

The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.

The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
a helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.

The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.

Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits, this one is clearly big enough.

Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.

Commits to update DragonEgg and Clang will be made presently.

llvm-svn: 171681
2013-01-07 01:37:14 +00:00
Craig Topper
7af95b6c84 Fix suffix handling for parsing and printing of cvtsi2ss, cvtsi2sd, cvtss2si, cvttss2si, cvtsd2si, and cvttsd2si to match gas behavior.
cvtsi2* should parse with an 'l' or 'q' suffix or no suffix at all. No suffix should be treated the same as 'l' suffix. Printing should always print a suffix. Previously we didn't parse or print an 'l' suffix.
cvtt*2si/cvt*2si should parse with an 'l' or 'q' suffix or not suffix at all. No suffix should use the destination register size to choose encoding. Printing should not print a suffix.

Original 'l' suffix issue with cvtsi2* pointed out by Michael Kuperstein.

llvm-svn: 171668
2013-01-06 20:39:29 +00:00
Evan Cheng
80b19ffea6 Fix for PR14739. It's not safe to fold a load into a call across a store. Thanks to Nick Lewycky for the initial patch.
llvm-svn: 171665
2013-01-06 19:00:15 +00:00
Craig Topper
942e03f627 Recommit r171461 which was incorrectly reverted. Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks.
llvm-svn: 171608
2013-01-05 07:39:25 +00:00
Nadav Rotem
900cb45dec Revert revision 171524. Original message:
URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev
Log:
The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171603
2013-01-05 05:42:48 +00:00
Jakub Staszak
369da81e4b Move 'break' to the right place to prevent fallthru. There is no test-case
because conditions in the next case prevented from doing anything nasty.

llvm-svn: 171549
2013-01-04 23:01:26 +00:00
Preston Gurd
b1c34fa73f The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171524
2013-01-04 20:54:54 +00:00
Nadav Rotem
cb3562a88e LoopVectorizer:
1. Add code to estimate register pressure.
2. Add code to select the unroll factor based on register pressure.
3. Add bits to TargetTransformInfo to provide the number of registers.

llvm-svn: 171469
2013-01-04 17:48:25 +00:00
Nadav Rotem
08d6ff1eaf Revert revision: 171467. This transformation is incorrect and makes some tests fail. Original message:
Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1.
Added a test.

llvm-svn: 171468
2013-01-04 17:35:21 +00:00
Elena Demikhovsky
d675e085b0 Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1.
Added a test.

llvm-svn: 171467
2013-01-03 08:48:33 +00:00
Michael Gottesman
5f81e1c2d0 Revert "Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks."
This reverts commit r171461 since it breaks the following tests:

Clang :: Analysis/outofbound-notwork.c
Clang :: Analysis/string-fail.c
Clang :: CXX/basic/basic.lookup/basic.lookup.qual/p6-0x.cpp
Clang :: CXX/basic/basic.lookup/basic.lookup.unqual/p15.cpp
Clang :: CXX/dcl.dcl/dcl.spec/dcl.fct.spec/p4.cpp
Clang :: CXX/dcl.dcl/dcl.spec/dcl.stc/p10.cpp
Clang :: CXX/temp/temp.param/p14.cpp
Clang :: CXX/temp/temp.res/temp.dep.res/temp.point/p1.cpp
Clang :: CodeGen/2009-02-13-zerosize-union-field-ppc.c
Clang :: CodeGen/blocks-2.c
Clang :: CodeGen/libcalls-d.c
Clang :: CodeGen/libcalls-ld.c
Clang :: CodeGenCXX/conversion-function.cpp
Clang :: CodeGenCXX/debug-info-limit-type.cpp
Clang :: CodeGenCXX/inheriting-constructor.cpp
Clang :: FixIt/fixit-errors.c
Clang :: FixIt/fixit-pmem.cpp
Clang :: Modules/namespaces.cpp
Clang :: PCH/changed-files.c
Clang :: PCH/pr4489.c
Clang :: PCH/source-manager-stack.c
Clang :: Parser/cxx-ambig-decl-expr-xfail.cpp
Clang :: SemaCXX/switch-implicit-fallthrough-cxx98.cpp
Clang :: SemaTemplate/instantiate-function-1.mm

llvm-svn: 171466
2013-01-03 08:18:30 +00:00