1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-27 05:53:07 +01:00
Commit Graph

33952 Commits

Author SHA1 Message Date
James Molloy
aa51550e7c [ARM] Allow vmin/vmax of scalars to be emitted without UseNEONForFP.
This overrides the default to more closely resemble the hand-crafted matching logic in ISelLowering. It makes sense, as there is no VFP equivalent of vmin or vmax, to use them when they're available even if in general VFP ops should be preferred.

This should be NFC.

llvm-svn: 244915
2015-08-13 17:28:20 +00:00
Ulrich Weigand
6643dc8666 [SystemZ] Support large LLVM IR struct return values
Recent mesa/llvmpipe crashes on SystemZ due to a failed assertion when
attempting to compile a routine with a return type of
  { <4 x float>, <4 x float>, <4 x float>, <4 x float> }
on a system without vector instruction support.

This is because after legalizing the vector type, we get a return value
consisting of 16 floats, which cannot all be returned in registers.

Usually, what should happen in this case is that the target's CanLowerReturn
routine rejects the return type, in which case SelectionDAG falls back to
implementing a structure return in memory via implicit reference.

However, the SystemZ target never actually implemented any CanLowerReturn
routine, and thus would accept any struct return type.

This patch fixes the crash by implementing CanLowerReturn.  As a side effect,
this also handles fp128 return values, fixing a todo that was noted in
SystemZCallingConv.td.

llvm-svn: 244889
2015-08-13 13:37:06 +00:00
John Brawn
ecc0ff6b14 [ARM] Reorganise and simplify thumb-1 load/store selection
Other than PC-relative loads/store the patterns that match the various
load/store addressing modes have the same complexity, so the order that they
are matched is the order that they appear in the .td file.

Rearrange the instruction definitions in ARMInstrThumb.td, and make use of
AddedComplexity for PC-relative loads, so that the instruction matching order
is the order that results in the simplest selection logic. This also makes
register-offset load/store be selected when it should, as previously it was
only selected for too-large immediate offsets.

Differential Revision: http://reviews.llvm.org/D11800

llvm-svn: 244882
2015-08-13 10:48:22 +00:00
Ahmed Bougacha
8e769a9269 [AArch64] Also custom-lowering mismatched vector/f16 FCOPYSIGN.
We can lower them using our cool tricks if we fpext/fptrunc the second
input, like we do for f32/f64.

Follow-up to r243924, r243926, and r244858.

llvm-svn: 244860
2015-08-13 01:13:56 +00:00
JF Bastien
9205d59c82 WebAssembly: floating-point comparisons
Summary:
D11924 implemented part of the floating-point comparisons, this patch implements the rest:
 * Tell ISelLowering that all booleans are either 0 or 1.
 * Expand the eq/ne/lt/le/gt/ge floating-point comparisons to the canonical ones (similar to what Mips32r6InstrInfo.td does).
 * Add tests for ord/uno.
 * Add tests for ueq/one/ult/ule/ugt/uge.
 * Fix existing comparison tests to remove the (res & 1) code, which setBooleanContents stops from generating.

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11970

llvm-svn: 244779
2015-08-12 17:53:29 +00:00
Sanjay Patel
a24f811c83 80-cols; NFC
llvm-svn: 244755
2015-08-12 15:12:25 +00:00
Sanjay Patel
463099751f fix typo; NFC
llvm-svn: 244753
2015-08-12 15:09:09 +00:00
Zoran Jovanovic
3c2a065d19 [mips][microMIPS] Create microMIPS64r6 subtarget and implement DALIGN, DAUI, DAHI, DATI, DEXT, DEXTM and DEXTU instructions
Differential Revision: http://reviews.llvm.org/D10923

llvm-svn: 244744
2015-08-12 12:45:16 +00:00
Michael Kuperstein
43bbce4282 [X86] Disable mul -> shl + lea combine when compiling for minsize
Differential Revision: http://reviews.llvm.org/D11904

llvm-svn: 244740
2015-08-12 11:27:26 +00:00
Michael Kuperstein
8c8a758faa [X86] Allow x86 call frame optimization to fold more loads into pushes
This abstracts away the test for "when can we fold across a MachineInstruction"
into the the MI interface, and changes call-frame optimization use the same test
the peephole optimizer users.

Differential Revision: http://reviews.llvm.org/D11945

llvm-svn: 244729
2015-08-12 10:14:58 +00:00
Matt Arsenault
ac50a3d981 AMDGPU: Fix assert on dbg_value instructions
llvm-svn: 244728
2015-08-12 09:04:44 +00:00
Simon Pilgrim
45d6ddee89 [InstCombine] Move SSE/AVX vector blend folding to instcombiner
As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely).

InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask).

I also moved all the relevant combine tests into InstCombine/blend_x86.ll

Differential Revision: http://reviews.llvm.org/D11934

llvm-svn: 244723
2015-08-12 08:08:56 +00:00
Saleem Abdulrasool
23546702ae X86: hoist a condition into a variable (NFC)
The same value is used multiple times through the function.  Hoist the condition
into a variable.  This should fix a silly static analysis warning where the
conditions flip around.  No functional change intended.

llvm-svn: 244713
2015-08-12 02:01:36 +00:00
Sanjay Patel
7b4cd645e8 [x86] enable machine combiner reassociations for 256-bit vector FP mul/add
llvm-svn: 244705
2015-08-12 00:29:10 +00:00
Alex Lorenz
ce2812bb8e PseudoSourceValue: Transform the mips subclass to target independent subclasses
This commit transforms the mips-specific 'MipsCallEntry' subclass of the
'PseudoSourceValue' class into two, target-independent subclasses named
'GlobalValuePseudoSourceValue' and 'ExternalSymbolPseudoSourceValue'.

This change makes it easier to serialize the pseudo source values by removing
target-specific pseudo source values.

Reviewers: Akira Hatanaka
llvm-svn: 244698
2015-08-11 23:23:17 +00:00
Alex Lorenz
7b1d22a17d PseudoSourceValue: Replace global manager with a manager in a machine function.
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.

This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.

This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.

This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.

Reviewers: Akira Hatanaka
llvm-svn: 244693
2015-08-11 23:09:45 +00:00
Alex Lorenz
4047ccf510 PseudoSourceValue: Introduce a 'PSVKind' enumerator.
This commit introduces a new enumerator named 'PSVKind' in the
'PseudoSourceValue' class. This enumerator is now used to distinguish between
the various kinds of pseudo source values.

This change is done in preparation for the changes to the pseudo source value
object management and to the PseudoSourceValue's class hierarchy - the next two
PseudoSourceValue commits will get rid of the global variable that manages the
pseudo source values and the mips specific MipsCallEntry subclass.

Reviewers: Akira Hatanaka
llvm-svn: 244687
2015-08-11 22:32:00 +00:00
Mark Heffernan
030525f5bf Use 32-bit divides instead of 64-bit divides where possible.
For NVPTX, try to use 32-bit division instead of 64-bit division when the dividend and divisor
fit in 32 bits. This speeds up some internal benchmarks significantly. The underlying reason
is that many index computations are carried out in 64-bits but never actually exceed the
capacity of a 32-bit word.

llvm-svn: 244684
2015-08-11 22:16:34 +00:00
JF Bastien
a0847105a1 WebAssembly: implement comparison.
Some of the FP comparisons (ueq, one, ult, ule, ugt, uge) are currently broken, I'll fix them in a follow-up.

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11924

llvm-svn: 244665
2015-08-11 21:02:46 +00:00
Sanjay Patel
39bab9e7a2 [x86] enable machine combiner reassociations for 128-bit vector single/double multiplies
llvm-svn: 244657
2015-08-11 20:19:23 +00:00
JF Bastien
aace81abf2 WebAssembly: implement WebAssemblyTargetLowering::getTargetNodeName
Summary: Implementation is the same as in AArch64.

Subscribers: aemerson, jfb, llvm-commits, sunfish

Differential Revision: http://reviews.llvm.org/D11956

llvm-svn: 244655
2015-08-11 20:13:18 +00:00
Rafael Espindola
574b6734d9 Use llvm::make_unique to fix the MSVC build.
llvm-svn: 244641
2015-08-11 18:11:17 +00:00
Michael Kuperstein
8ea9afb887 [X86] Allow merging of immediates within a basic block for code size savings
First step in preventing immediates that occur more than once within a single
basic block from being pulled into their users, in order to prevent unnecessary
large instruction encoding .Currently enabled only when optimizing for size.

Patch by: zia.ansari@intel.com
Differential Revision: http://reviews.llvm.org/D11363

llvm-svn: 244601
2015-08-11 14:10:58 +00:00
James Molloy
e0929cde28 [AArch64] Match fminnum/fmaxnum for vector fminnm/fmaxnm instead of an intrinsic.
Lower Intrinsic::aarch64_neon_fmin/fmax to fminnum/fmannum and match that instead. Minimal functional change:

  - Extra tests added because coverage of scalar fminnm/fmaxnm instructions was nonexistant.
  - f16 test updated because now we actually generate scalar fminnm/fmaxnm we no longer need to bail out to a libcall!

llvm-svn: 244595
2015-08-11 12:06:37 +00:00
James Molloy
655902e549 [AArch64] Replace the custom AArch64ISD::FMIN/MAX nodes with ISD::FMINNAN/MAXNAN
NFCI. This just removes custom ISDNodes that are no longer needed.

llvm-svn: 244594
2015-08-11 12:06:33 +00:00
James Molloy
d56d688228 [ARM] Match fminnan/fmaxnan for vector vmin/vmax instead of an intrinsic
Lower Intrinsic::arm_neon_vmins/vmaxs to fminnan/fmaxnan and match that instead. This is important because SDAG will soon be able to select FMINNAN itself, so we need a unified lowering path for intrinsics and SDAG.

NFCI.

llvm-svn: 244593
2015-08-11 12:06:28 +00:00
James Molloy
c131e948e8 [ARM] Match fminnum/fmaxnum for vector vminnm/vmaxnm instead of an intrinsic
Lower the intrinsic to a FMINNUM/FMAXNUM node and select that instead. This is important because soon SDAG will be able to select FMINNUM/FMAXNUM itself, so we need an integrated lowering path between SDAG and intrinsics.

NFCI.

llvm-svn: 244592
2015-08-11 12:06:25 +00:00
James Molloy
83cfd780e5 [ARM] Replace ARMISD::VMINNM/VMAXNM with ISD::FMINNUM/FMAXNUM
NFCI. This replaces another custom ISDNode with a generic equivalent.

llvm-svn: 244591
2015-08-11 12:06:22 +00:00
James Molloy
9564f7ade6 [ARM] Replace ARMISD::FMIN/FMAX with the shiny new ISD::FMINNAN/FMAXNAN.
NFCI. This removes a custom ISDNode.

llvm-svn: 244590
2015-08-11 12:06:15 +00:00
Marina Yatsina
a28fbe6a96 [X86] Add SAL mnemonics for Intel syntax
SAL and SHL instructions perform the same operation

Differential Revision: http://reviews.llvm.org/D11882

llvm-svn: 244588
2015-08-11 12:05:06 +00:00
Marina Yatsina
fc986c89c0 [X86] Fix REPE, REPZ, REPNZ for intel syntax
REPE, REPZ, REPNZ, REPNE should have mnemonics for Intel syntax as well.
Currently using these instructions causes compilation errors for Intel syntax.

Differential Revision: http://reviews.llvm.org/D11794

llvm-svn: 244584
2015-08-11 11:28:10 +00:00
Marina Yatsina
d8e14460d5 [X86] Fix imul alias for intel syntax
The "imul reg, imm" alias is not defined for intel syntax. 
In intel syntax there is no w/l/q suffix for the imul instruction.

Differential Revision: http://reviews.llvm.org/D11887

llvm-svn: 244582
2015-08-11 10:43:04 +00:00
Vasileios Kalintiris
761ce121c9 [mips] Remap move as or.
Summary:
This patch remaps the assembly idiom 'move' to 'or' instead of 'daddu' or
'addu'. The use of addu/daddu instead of or as move was highlighted as a
performance issue during the analysis of a recent 64bit design. Originally
move was encoded as 'or' by binutils but was changed for the r10k cpu family
due to their pipeline which had 2 arithmetic units and a single logical unit,
and so could issue multiple (d)addu based moves at the same time but only 1
logical move.

This patch preserves the disassembly behaviour so that disassembling a old style
(d)addu move still appears as move, but assembling move always gives an or

Patch by Simon Dardis.

Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11796

llvm-svn: 244579
2015-08-11 08:56:25 +00:00
Michael Kuperstein
ebd10e5c0e [X86] When optimizing for minsize, use POP for small post-call stack clean-up
When optimizing for size, replace "addl $4, %esp" and "addl $8, %esp"
following a call by one or two pops, respectively. We don't try to do it in
general, but only when the stack adjustment immediately follows a call - which
is the most common case.

That allows taking a short-cut when trying to find a free register to pop into,
instead of a full-blown liveness check. If the adjustment immediately follows a
call, then every register the call clobbers but doesn't define should be dead at
that point, and can be used.

Differential Revision: http://reviews.llvm.org/D11749

llvm-svn: 244578
2015-08-11 08:48:48 +00:00
JF Bastien
7a8b1de402 WebAssembly: NFC fix release build break, unused variable.
Summary: Caused by D11914, pointed out by blaikie.

Subscribers: llvm-commits, jfb, dblaikie

Differential Revision: http://reviews.llvm.org/D11929

llvm-svn: 244570
2015-08-11 04:52:24 +00:00
JF Bastien
b4d2511cd9 WebAssembly: add basic floating-point tests
Summary: I somehow forgot to add these when I added the basic floating-point opcodes. Also remove ceil/floor/trunc/nearestint for now, and add them only when properly tested.

Subscribers: llvm-commits, sunfish, jfb

Differential Revision: http://reviews.llvm.org/D11927

llvm-svn: 244562
2015-08-11 02:45:15 +00:00
Cameron Esfahani
0c35a7deea Explicitly clear the MI operand list when getInstruction() is called. Call MI.clear() within MCD::OPC_Decode case and inside of translateInstruction() for the X86 target. Remove now unnecessary MI.clear() from ARMDisassembler.
Summary: Explicitly clear the MI operand list when getInstruction() is called.

Reviewers: hfinkel, t.p.northover, hvarga, kparzysz, jyknight, qcolombet, uweigand

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11665

llvm-svn: 244557
2015-08-11 01:15:07 +00:00
JF Bastien
6198dac24e WebAssembly: simply assert on SNaN and NaNs with payloads
Summary: convertToHexString doesn't represent them correctly at this point in time. This is a follow-up to sunfish's suggestion in D11914.

Subscribers: llvm-commits, sunfish, jfb

Differential Revision: http://reviews.llvm.org/D11925

llvm-svn: 244551
2015-08-11 00:49:20 +00:00
Joerg Sonnenberger
547554aeba Add lduw and lwua aliases for SPARCv9.
llvm-svn: 244535
2015-08-10 23:47:22 +00:00
Joerg Sonnenberger
ed1bfffcbb Load/store for float registers from/to alternate space.
llvm-svn: 244532
2015-08-10 23:33:17 +00:00
JF Bastien
5df3b56df8 WebAssembly: print immediates
Summary:
For now output using C99's hexadecimal floating-point representation.

This patch also cleans up how machine operands are printed: instead of special-casing per type of machine instruction, the code now handles operands generically.

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11914

llvm-svn: 244520
2015-08-10 22:36:48 +00:00
Joerg Sonnenberger
d2510224dd Add support for the signx instrution alias of SPARCv9.
llvm-svn: 244519
2015-08-10 22:32:25 +00:00
JF Bastien
f82b3e73a6 x86: Emit LAHF/SAHF instead of PUSHF/POPF
NaCl's sandbox doesn't allow PUSHF/POPF out of security concerns (priviledged emulators have forgotten to mask system bits in the past, and EFLAGS's DF bit is a constant source of hilarity). Commit r220529 fixed PR20376 by saving cmpxchg's flags result using EFLAGS, this commit now generated LAHF/SAHF instead, for all of x86 (not just NaCl) because it leads to an overall performance gain over PUSHF/POPF.

As with the previous patch this code generation is pretty bad because it occurs very later, after register allocation, and in many cases it rematerializes flags which were already available (e.g. already in a register through SETE). Fortunately it's somewhat rare that this code needs to fire.

I did [[ https://github.com/jfbastien/benchmark-x86-flags | a bit of benchmarking ]], the results on an Intel Haswell E5-2690 CPU at 2.9GHz are:

| Time per call (ms)  | Runtime (ms) | Benchmark                      |
| 0.000012514         |      6257    | sete.i386                      |
| 0.000012810         |      6405    | sete.i386-fast                 |
| 0.000010456         |      5228    | sete.x86-64                    |
| 0.000010496         |      5248    | sete.x86-64-fast               |
| 0.000012906         |      6453    | lahf-sahf.i386                 |
| 0.000013236         |      6618    | lahf-sahf.i386-fast            |
| 0.000010580         |      5290    | lahf-sahf.x86-64               |
| 0.000010304         |      5152    | lahf-sahf.x86-64-fast          |
| 0.000028056         |     14028    | pushf-popf.i386                |
| 0.000027160         |     13580    | pushf-popf.i386-fast           |
| 0.000023810         |     11905    | pushf-popf.x86-64              |
| 0.000026468         |     13234    | pushf-popf.x86-64-fast         |

Clearly `PUSHF`/`POPF` are suboptimal. It doesn't really seems to be worth teaching LLVM about individual flags, at least not for this purpose.

Reviewers: rnk, jvoung, t.p.northover

Subscribers: llvm-commits

Differential revision: http://reviews.llvm.org/D6629

llvm-svn: 244503
2015-08-10 20:59:36 +00:00
Sanjay Patel
bea667f5ae fix minsize detection: minsize attribute implies optimizing for size
llvm-svn: 244499
2015-08-10 20:45:44 +00:00
Simon Pilgrim
65266a8e22 [InstCombine] Move SSE2/AVX2 arithmetic vector shift folding to instcombiner
As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations.

Differential Revision: http://reviews.llvm.org/D11886

llvm-svn: 244495
2015-08-10 20:21:15 +00:00
James Y Knight
2a6af41342 [Sparc] Implement i64 load/store support for 32-bit sparc.
The LDD/STD instructions can load/store a 64bit quantity from/to
memory to/from a consecutive even/odd pair of (32-bit) registers. They
are part of SparcV8, and also present in SparcV9. (Although deprecated
there, as you can store 64bits in one register).

As recommended on llvmdev in the thread "How to enable use of 64bit
load/store for 32bit architecture" from Apr 2015, I've modeled the
64-bit load/store operations as working on a v2i32 type, rather than
making i64 a legal type, but with few legal operations. The latter
does not (currently) work, as there is much code in llvm which assumes
that if i64 is legal, operations like "add" will actually work on it.

The same assumption does not hold for v2i32 -- for vector types, it is
workable to support only load/store, and expand everything else.

This patch:
- Adds a new register class, IntPair, for even/odd pairs of registers.

- Modifies the list of reserved registers, the stack spilling code,
  and register copying code to support the IntPair register class.

- Adds support in AsmParser. (note that in asm text, you write the
  name of the first register of the pair only. So the parser has to
  morph the single register into the equivalent paired register).

- Adds the new instructions themselves (LDD/STD/LDDA/STDA).

- Hooks up the instructions and registers as a vector type v2i32. Adds
  custom legalizer to transform i64 load/stores into v2i32 load/stores
  and bitcasts, so that the new instructions can actually be
  generated, and marks all operations other than load/store on v2i32
  as needing to be expanded.

- Copies the unfortunate SelectInlineAsm hack from ARMISelDAGToDAG.
  This hack undoes the transformation of i64 operands into two
  arbitrarily-allocated separate i32 registers in
  SelectionDAGBuilder. and instead passes them in a single
  IntPair. (Arbitrarily allocated registers are not useful, asm code
  expects to be receiving a pair, which can be passed to ldd/std.)

Also adds a bunch of test cases covering all the bugs I've added along
the way.

Differential Revision: http://reviews.llvm.org/D8713

llvm-svn: 244484
2015-08-10 19:11:39 +00:00
Chad Rosier
e027a2d1cc [AArch64] Convert a conditional check that will always be true to an assert. NFC.
llvm-svn: 244479
2015-08-10 18:42:45 +00:00
Chad Rosier
216eb1ed4b Typo. Move comment closer to relevant code. NFC.
llvm-svn: 244465
2015-08-10 17:17:19 +00:00
Sanjay Patel
d654d315bb fix minsize detection: minsize attribute implies optimizing for size
llvm-svn: 244464
2015-08-10 17:15:17 +00:00
Sanjay Patel
5fcdfefe10 fix minsize detection: minsize attribute implies optimizing for size
llvm-svn: 244463
2015-08-10 17:00:44 +00:00