1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00
Commit Graph

31039 Commits

Author SHA1 Message Date
Matt Arsenault
c9e7c7e638 R600/SI: Remove incorrect assertion
This can be a COPY to a physical register, such as VCC

llvm-svn: 223207
2014-12-03 05:22:38 +00:00
Matt Arsenault
0533e2261e R600/SI: Remove i1 pseudo VALU ops
Select i1 logical ops directly to 64-bit SALU instructions.
Vector i1 values are always really in SGPRs, with each
bit for each item in the wave. This saves about 4 instructions
when and/or/xoring any condition, and also helps write conditions
that need to be passed in vcc.

This should work correctly now that the SGPR live range
fixing pass works. More work is needed to eliminate the VReg_1
pseudo regclass and possibly the entire SILowerI1Copies pass.

llvm-svn: 223206
2014-12-03 05:22:35 +00:00
Matt Arsenault
aed413b578 R600/SI: Fix suspicious indexing
The loop is over the operands of an instruction, and checks the
register with the sub reg index of the dest register. This probably
meant to be checking the sub reg index of the same operand.

llvm-svn: 223205
2014-12-03 05:22:32 +00:00
Matt Arsenault
43aa2fe161 R600/SI: Fix running SILowerI1Copies a second time
llvm-svn: 223204
2014-12-03 05:22:30 +00:00
Matt Arsenault
10b6254c18 R600/SI: Fix live range error hidden by SIFoldOperands
m0 is treated as a virtual register class with a single register
rather than the physical register it really is. This was updating
the live range of the used virtual copy of m0 from the first ds_read
instruction, and leaving the unused copy unchanged. This resulted in a
"Live segment doesn't end at a valid instruction" verifier error because
the erased instructions. Update the live range of the second copy (which
should be dead).

No test since I'm not sure how to trigger this with SIFoldOperands
enabled.

llvm-svn: 223203
2014-12-03 05:22:29 +00:00
Duncan P. N. Exon Smith
93d49e04d2 NVPTX: Delete dead code
`MDNode` does not inherit from `User`, and it never has a name.

llvm-svn: 223198
2014-12-03 04:13:23 +00:00
Tom Stellard
a8af03e062 R600/SI: Enable inline assembly
We just needed to remove the assertion in
AMDGPURegisterInfo::getFrameRegister(), which is called when
initializing the parser for inline assembly.

llvm-svn: 223197
2014-12-03 04:08:00 +00:00
Matt Arsenault
a51749b87e R600/SI: Change mubuf offsets to print as decimal
This matches SC's behavior.

llvm-svn: 223194
2014-12-03 03:12:13 +00:00
Ahmed Bougacha
ac111e2226 [X86][MC] Intel syntax: accept implicit memory operand sizes larger than 80.
The X86AsmParser intel handling was refactored in r216481, making it
try each different memory operand size to see which one matches.
Operand sizes larger than 80 ("[xyz]mmword ptr") were forgotten, which
led to an "invalid operand" error for code such as:
  movdqa [rax], xmm0

llvm-svn: 223187
2014-12-03 02:03:26 +00:00
Hal Finkel
2b306926ed [PowerPC] Fix readcyclecounter to be custom expanded for all 32-bit targets
We need to use the custom expansion of readcyclecounter on all 32-bit targets
(even those with 64-bit registers). This should fix the ppc64 buildbot.

llvm-svn: 223182
2014-12-03 00:19:17 +00:00
Tim Northover
0c91cccef8 AArch64: strengthen Darwin ABI alignment assumptions
A global variable without an explicit alignment specified should be assumed to
be ABI-aligned according to its type, like on other platforms. This allows us
to use better memory operations when accessing it.

rdar://18533701

llvm-svn: 223180
2014-12-02 23:53:43 +00:00
Tim Northover
85149ea9e5 AArch64: don't be too greedy when folding :lo12: accesses into mem ops.
This frequently leads to cases like:
   ldr xD, [xN, :lo12:var]
   add xA, xN, :lo12:var
   ldr xD, [xA, #8]

where the ADD would have been needed anyway, and the two distinct addressing
modes can prevent the formation of an ldp. Because of how we handle ADRP
(aggressively forming an ADRP/ADD pseudo-inst at ISel time), this pattern also
results in duplicated ADRP instructions (one on its own to cover the ldr, and
one combined with the add).

llvm-svn: 223172
2014-12-02 23:13:39 +00:00
Simon Pilgrim
12f78b2c48 [X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets
4i32 shuffles for single insertions into zero vectors lowers to X86vzmovl which was using (v)blendps - causing domain switch stalls. This patch fixes this by using (v)pblendw instead.

The updated tests on test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps - I'm looking at fixing this in a future patch.

Differential Revision: http://reviews.llvm.org/D6458

llvm-svn: 223165
2014-12-02 22:31:23 +00:00
Hal Finkel
337f550328 [PowerPC] Implement readcyclecounter for PPC32
We've long supported readcyclecounter on PPC64, but it is easier there (the
read of the 64-bit time-base register can be accomplished via a single
instruction). This now provides an implementation for PPC32 as well. On PPC32,
the time-base register is still 64 bits, but can only be read 32 bits at a time
via two separate SPRs. The ISA manual explains how to do this properly (it
involves re-reading the upper bits and looping if the counter has wrapped while
being read).

This requires PPC to implement a custom integer splitting legalization for the
READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then
gets turned into a pseudo-instruction, which is then expanded to the necessary
sequence (which has three SPR reads, the comparison and the branch).

Thanks to Paul Hargrove for pointing out to me that this was still unimplemented.

llvm-svn: 223161
2014-12-02 22:01:00 +00:00
Tom Stellard
004db709b2 R600/SI: Emit amd_kernel_code_t header for AMDGPU environment
llvm-svn: 223160
2014-12-02 22:00:07 +00:00
Lang Hames
8e0b335b01 [AArch64][Stackmaps] Optimize stackmap shadows on AArch64.
Reduce the number of nops emitted for stackmap shadows on AArch64 by counting
non-stackmap instructions up to the next branch target towards the requested
shadow.

<rdar://problem/14959522>

llvm-svn: 223156
2014-12-02 21:36:24 +00:00
Tom Stellard
07557193b4 R600/SI: Move more information into SIProgramInfo struct
llvm-svn: 223154
2014-12-02 21:28:53 +00:00
Daniel Sanders
8a04298b22 [mips] Fix passing of small structures for big-endian O32.
Summary:
Like N32/N64, they must be passed in the upper bits of the register.

The new code could be merged with the existing if-statements but I've
refrained from doing this since it will make porting the O32 implementation
to tablegen harder later.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6463

llvm-svn: 223148
2014-12-02 20:40:27 +00:00
Roman Divacky
3adb65e8f7 Introduce CPUStringIsValid() into MCSubtargetInfo and use it for ARM .cpu parsing.
Previously .cpu directive in ARM assembler didnt switch to the new CPU and
therefore acted as a nop. This implemented real action for .cpu and eg. 
allows to assembler FreeBSD kernel with -integrated-as.

llvm-svn: 223147
2014-12-02 20:03:22 +00:00
Tom Stellard
aedc60a0a8 R600/SI: Refactor AMDGPUAsmPrinter::EmitProgramInfoSI()
llvm-svn: 223144
2014-12-02 19:45:05 +00:00
Philip Reames
a2fc1f8feb Remove unneccessary code introduced with 223101.
llvm-svn: 223132
2014-12-02 18:06:10 +00:00
Tom Stellard
98b309ed3d R600/SI: Set correct number of user sgprs for HSA runtime
We don't support scratch buffers yet with HSA.

llvm-svn: 223130
2014-12-02 17:41:43 +00:00
Sanjay Patel
f1f2d23cc4 fix typo in comment
llvm-svn: 223127
2014-12-02 17:25:27 +00:00
Tim Northover
53d419429f AArch64: make register block rules apply to vector types too.
The blocking code originated in ARM, which is more aggressive about casting
types to a canonical representative before doing anything else, so I missed out
most vector HFAs and broke the ABI. This should fix it.

llvm-svn: 223126
2014-12-02 17:15:22 +00:00
Tom Stellard
221c239920 R600/SI: Set the ATC bit on all resource descriptors for the HSA runtime
llvm-svn: 223125
2014-12-02 17:05:41 +00:00
Asiri Rathnayake
b299199464 Remove unused function.
Removing an unused function which is causing one of the build bots to fail.
This was introduced in the commit r223113. A proper cleanup of the so_imm
tblgen defintion (made redundant by the mod_imm definition) needs to happen
soon.

llvm-svn: 223115
2014-12-02 12:09:55 +00:00
Asiri Rathnayake
9fa2089501 Add support for ARM modified-immediate assembly syntax.
Certain ARM instructions accept 32-bit immediate operands encoded as a 8-bit
integer value (0-255) and a 4-bit rotation (0-30, even). Current ARM assembly
syntax support in LLVM allows the decoded (32-bit) immediate to be specified
as a single immediate operand for such instructions:

mov r0, #4278190080

The ARMARM defines an extended assembly syntax allowing the encoding to be made
more explicit, as in:

mov r0, #255, #8 ; (same 32-bit value as above)

The behaviour of the two instructions can be different w.r.t flags, which is
documented under "Modified immediate constants" in ARMARM. This patch enables
support for this extended syntax at the MC layer.

llvm-svn: 223113
2014-12-02 10:53:20 +00:00
Charlie Turner
e286c5ea3a Emit Tag_ABI_FP_denormal correctly in fast-math mode.
The default ARM floating-point mode does not support IEEE 754 mode exactly. Of
relevance to this patch is that input denormals are flushed to zero. The way in
which they're flushed to zero depends on the architecture,

  * For VFPv2, it is implementation defined as to whether the sign of zero is
    preserved.
  * For VFPv3 and above, the sign of zero is always preserved when a denormal
    is flushed to zero.

When FP support has been disabled, the strategy taken by this patch is to
assume the software support will mirror the behaviour of the hardware support
for the target *if it existed*. That is, for architectures which can only have
VFPv2, it is assumed the software will flush to positive zero. For later
architectures it is assumed the software will flush to zero preserving sign.

Change-Id: Icc5928633ba222a4ba3ca8c0df44a440445865fd
llvm-svn: 223110
2014-12-02 08:22:29 +00:00
Nick Lewycky
4c5a73bf78 Fix variable used only in assertion.
llvm-svn: 223101
2014-12-02 01:09:56 +00:00
Philip Reames
27b1ef6326 Try to fix a bot failure due to a variable used only in an assert.
Specifically, bot lld-x86_64-darwin13.  Resulting from change 223085.

llvm-svn: 223092
2014-12-01 23:27:45 +00:00
Philip Reames
d056135ae1 [Statepoints 2/4] Statepoint infrastructure for garbage collection: MI & x86-64 Backend
This is the second patch in a small series.  This patch contains the MachineInstruction and x86-64 backend pieces required to lower Statepoints.  It does not include the code to actually generate the STATEPOINT machine instruction and as a result, the entire patch is currently dead code.  I will be submitting the SelectionDAG parts within the next 24-48 hours.  Since those pieces are by far the most complicated, I wanted to minimize the size of that patch.  That patch will include the tests which exercise the functionality in this patch.  The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.

The STATEPOINT psuedo node is generated after all gc values are explicitly spilled to stack slots.  The purpose of this node is to wrap an actual call instruction while recording the spill locations of the meta arguments used for garbage collection and other purposes.  The STATEPOINT is modeled as modifing all of those locations to prevent backend optimizations from forwarding the value from before the STATEPOINT to after the STATEPOINT.  (Doing so would break relocation semantics for collectors which wish to relocate roots.)

The implementation of STATEPOINT is closely modeled on PATCHPOINT.  Eventually, much of the code in this patch will be removed.  The long term plan is to merge the functionality provided by statepoints and patchpoints.  Merging their implementations in the backend is likely to be a good starting point.

Reviewed by: atrick, ributzka

llvm-svn: 223085
2014-12-01 22:52:56 +00:00
Jingyue Wu
66d4df8027 [NVPTX] Do not emit .weak symbols for NVPTX
Summary:
".weak" symbols cannot be consumed by ptxas (PR21685). This patch makes the
weak directive in MCAsmPrinter customizable, and disables emitting ".weak"
symbols for NVPTX.

Test Plan: weak-linkage.ll

Reviewers: jholewinski

Reviewed By: jholewinski

Subscribers: majnemer, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D6455

llvm-svn: 223077
2014-12-01 21:16:17 +00:00
Ahmed Bougacha
17e4b2bbd7 [AArch64] Don't combine "select (setcc i1 LHS, RHS), vL, vR".
r208210 introduced an optimization that improves the vector select
codegen by doing the setcc on vectors directly.
This is a problem they the setcc operands are i1s, because the
optimization would create vectors of i1, which aren't legal.

Part of PR21549.

Differential Revision: http://reviews.llvm.org/D6308

llvm-svn: 223075
2014-12-01 20:59:00 +00:00
Ahmed Bougacha
e6c3a5f724 [AArch64] Fix v2i8->i16 bitcast legalization.
r213378 improved f16 bitcasts, so that they go directly through subregs,
instead of through the stack.  That code now causes an assertion failure
for bitcasts from other 16-bits types (most importantly v2i8).

Correct that by doing the custom lowering for i16 bitcasts only when the
input is an f16.

Part of PR21549.

Differential Revision: http://reviews.llvm.org/D6307

llvm-svn: 223074
2014-12-01 20:52:32 +00:00
Tim Northover
6096fde320 ARM: lower tail calls correctly when using GHC calling convention.
Patch by Ben Gamari.

llvm-svn: 223055
2014-12-01 17:46:39 +00:00
Matt Arsenault
2819063e41 R600/SI: Various instruction format bit test cleanups
- Fix missing SALU format bits
- Remove unused isSALUInstr
- Add isVALU
- Switch isDS to use a bit like the others
- Move SIInstrInfo::is* functions to header
- Reorder so they are approximately sorted by type (SALU, VALU, memory)

llvm-svn: 223038
2014-12-01 15:52:46 +00:00
Vladimir Medic
7915fd41c4 The andi16, addiusp and jraddiusp micromips instructions were missing dedicated decoder methods in MipsDisassembler.cpp to properly decode immediate operands. These methods are added together with corresponding tests.
llvm-svn: 223006
2014-12-01 11:12:04 +00:00
Jay Foad
1e2a95bd74 [PowerPC] Fix unwind info with dynamic stack realignment
Summary:
PowerPC DWARF unwind info defined CFA as SP + offset even in a function
where the stack had been dynamically realigned. This clearly doesn't
work because the offset from SP to CFA is not a constant. Fix it by
defining CFA as BP instead.

This was causing the AddressSanitizer null_deref test to fail 50% of
the time, depending on whether SP happened to be 32-byte aligned on
entry to a particular function or not.

Reviewers: willschm, uweigand, hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6410

llvm-svn: 222996
2014-12-01 09:42:32 +00:00
Charlie Turner
8386827fe8 Add post-decode checking of HVC instruction.
Add checkDecodedInstruction for post-decode checking of instructions, to catch
the corner cases like HVC that don't fit into the general pattern. Needed to
check for an invalid condition field in instruction encoding despite HVC not
taking a predicate.

Patch by Matthew Wahab.

Change-Id: I48e28de981d7a9e43569594da3c45fb478b4f795
llvm-svn: 222992
2014-12-01 08:50:27 +00:00
Charlie Turner
5186769a0c Add Thumb HVC and ERET virtualisation extension instructions.
Patch by Matthew Wahab.

Change-Id: I131f71c1150d5fa797066a18e09d526c19bf9016
llvm-svn: 222990
2014-12-01 08:39:19 +00:00
Charlie Turner
b03bbf42d6 Add ARM ERET and HVC virtualisation extension instructions.
Patch by Matthew Wahab.

Change-Id: Iad75f078fbaa4ecc7d7a4820ad9b3930679cbbbb
llvm-svn: 222989
2014-12-01 08:33:28 +00:00
Akira Hatanaka
881af53bf7 Fix capitalization. NFC.
llvm-svn: 222988
2014-12-01 06:14:52 +00:00
Hal Finkel
efd08ad91f [PowerPC] Add asm support for cache-inhibited ld/st instructions
Add assembler support for the fixed-point cache-inhibited load/store
instructions. These are hypervisor-level only, so don't get too excited ;)

Fixes PR21650.

llvm-svn: 222976
2014-11-30 10:15:56 +00:00
Simon Pilgrim
3d5273ea80 Target triple OS detection tidyup. NFC
Use Triple::isOS*() helpers where possible.

llvm-svn: 222960
2014-11-29 19:18:21 +00:00
Jozef Kolek
e3a600e129 [mips][microMIPS] Implement NOP aliases
This patch implements microMIPS 16-bit (MOVE16 $0, $0) and
32-bit (SLL $0, $0, 0) NOP aliases.

http://reviews.llvm.org/D6440

llvm-svn: 222953
2014-11-29 13:29:24 +00:00
Matt Arsenault
5c2fb0e261 R600/SI: Fix assertion on sign extend of 3 vectors
This was trying to create an MVT with 3x vectors which
created an invalid EVT

llvm-svn: 222942
2014-11-28 22:51:38 +00:00
Duncan P. N. Exon Smith
73ce6dbb2b Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

llvm-svn: 222936
2014-11-28 21:29:14 +00:00
Sanjay Patel
019be40c8c Enable FeatureFastUAMem for btver2
Allow unaligned 16-byte memop codegen for btver2. No functional changes for any other subtargets.

Replace the existing supposed small memcpy test with an actual test of a small memcpy. 
The previous test wasn't using FileCheck either.

This patch should allow us to close PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6360

llvm-svn: 222925
2014-11-28 18:40:18 +00:00
Charlie Turner
dc3f8f7176 Fix wrong encoding of MRSBanked.
Patch by Matthew Wahab.

Change-Id: Ia2a001ca2760028ea360fe77b56f203a219eefbc
llvm-svn: 222920
2014-11-28 15:01:06 +00:00
Craig Topper
c8e385cf52 Add missing 'override' keyword.
llvm-svn: 222911
2014-11-28 03:58:26 +00:00
Tim Northover
98dae1ec10 Stop using ArrayRef of a const type.
I *think* this is what the GCC bots are complaining about.

llvm-svn: 222905
2014-11-27 21:29:20 +00:00
Tim Northover
e8b34aaff0 AArch64: treat [N x Ty] as a block during procedure calls.
The AAPCS treats small structs and homogeneous floating (or vector) aggregates
specially, and guarantees they either get passed as a contiguous block of
registers, or prevent any future use of those registers and get passed on the
stack.

This concept can fit quite neatly into LLVM's own type system, mapping an HFA
to [N x float] and so on, and small structs to [N x i64]. Doing so allows
front-ends to emit AAPCS compliant code without having to duplicate the
register counting logic.

llvm-svn: 222903
2014-11-27 21:02:42 +00:00
Zoran Jovanovic
15712f82b0 [mips][microMIPS] Implement SWM16 and LWM16 instructions
Differential Revision: http://reviews.llvm.org/D5579

llvm-svn: 222901
2014-11-27 18:28:59 +00:00
Jozef Kolek
c85a4a5656 [mips][microMIPS] Implement BREAK16 and SDBBP16 instructions
Patch by Radovan Obradovic.

Differential Revision: http://reviews.llvm.org/D5048

llvm-svn: 222900
2014-11-27 18:18:42 +00:00
Daniel Sanders
d7d5d4cdb1 [mips] Add synci instruction.
Patch by Amaury Pouly

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6421

llvm-svn: 222899
2014-11-27 17:28:10 +00:00
Jozef Kolek
90243462d4 [mips][microMIPS] Implement disassembler support for 16-bit instructions LI16, ADDIUR1SP, ADDIUR2 and ADDIUS5
Differential Revision: http://reviews.llvm.org/D6419

llvm-svn: 222887
2014-11-27 14:41:44 +00:00
Charlie Turner
e4a6c7fd89 Stop uppercasing build attribute data.
The string data for string-valued build attributes were being unconditionally
uppercased. There is no mention in the ARM ABI addenda about case conventions,
so it's technically implementation defined as to whether the data are
capitialised in some way or not. However, there are good reasons not to
captialise the data.

  * It's less work.
  * Some vendors may legitimately have case-sensitive checks for these
    attributes which would fail on LLVM generated object files.
  * There could be locale issues with uppercasing.

The original reasons for uppercasing appear to have stemmed from an
old codesourcery toolchain behaviour, see

http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/87133

This patch makes the object file emitted no longer captialise string
data, it encodes as seen in the assembly source.

Change-Id: Ibe20dd6e60d2773d57ff72a78470839033aa5538
llvm-svn: 222882
2014-11-27 12:13:56 +00:00
Matt Arsenault
796e0c24e7 R600/SI: Use ZeroOrNegativeOneBooleanContent
This sort of doesn't matter since the setcc type is i1, but
this previously was using the default UndefinedBooleanContent. This
makes it more consistent with R600. This enables more optimizations
which typically give up on UndefinedBooleanContent. For example,
there is already a special case target DAG combine for
setcc + sext which can be eliminated in favor of what the generic
DAG combiner can do if it assumes boolean values are sign extended.
Since -1 is an inline immediate, using it is basically free and the
backend already uses it when a boolean value is needed in a wider type.

llvm-svn: 222850
2014-11-26 21:23:15 +00:00
Colin LeMahieu
0872710917 [Hexagon] Adding cmp* immediate form instructions.
llvm-svn: 222849
2014-11-26 19:43:12 +00:00
Jozef Kolek
ecfa20e7f7 [mips][microMIPS] Implement disassembler support for 16-bit instructions LBU16, LHU16, LW16, SB16, SH16 and SW16
Differential Revision: http://reviews.llvm.org/D6405

llvm-svn: 222847
2014-11-26 18:56:38 +00:00
Colin LeMahieu
1f8cdbb85d [Hexagon] Adding and64, or64, and xor64 instructions.
llvm-svn: 222846
2014-11-26 18:55:59 +00:00
Matt Arsenault
b4b33cf9de R600/SI: Create e64 versions of and/or/xor in SILowerI1Copies
This fixes moving boolean constants into registers before operating
on them. They get permuted and shrunk down to e32 anyway later. This
is a temporary fix until the patch that removes these pseudos is
committed.

llvm-svn: 222844
2014-11-26 18:18:28 +00:00
Will Newton
da953f13b9 Update AArch64 ELF relocations to ABI 1.0
This mostly entails adding relocations, however there are a couple of
changes to existing relocations:

1. R_AARCH64_NONE is defined to be zero rather than 256

R_AARCH64_NONE has been defined to be zero for a long time elsewhere
e.g. binutils and glibc since the submission of the AArch64 port in
2012 so this is required for compatibility.

2. R_AARCH64_TLSDESC_ADR_PAGE renamed to R_AARCH64_TLSDESC_ADR_PAGE21

I don't think there is any way for relocation names to leak out of LLVM
so this should not break anything.

Tested with check-all with no regressions.

llvm-svn: 222821
2014-11-26 10:49:18 +00:00
Elena Demikhovsky
868b76ae69 AVX-512: Scalar ERI intrinsics
including SAE mode and memory operand.
Added AVX512_maskable_scalar template, that should cover all scalar instructions in the future.

The main difference between AVX512_maskable_scalar<> and AVX512_maskable<> is using X86select instead of vselect.
I need it, because I can't create vselect node for MVT::i1 mask for scalar instruction.

http://reviews.llvm.org/D6378

llvm-svn: 222820
2014-11-26 10:46:49 +00:00
Craig Topper
0734168db8 Replace neverHasSideEffects=1 with hasSideEffects=0 in all .td files.
llvm-svn: 222801
2014-11-26 00:46:26 +00:00
Simon Pilgrim
0e1c44b939 [X86][SSE] Improvements to byte shift shuffle matching
Since (v)pslldq / (v)psrldq instructions resolve to a single input argument it is useful to match it much earlier than we currently do - this prevents more complicated shuffles (notably insertion into a zero vector) matching before it.

Differential Revision: http://reviews.llvm.org/D6409

llvm-svn: 222796
2014-11-25 22:34:59 +00:00
Colin LeMahieu
535f692186 [Hexagon] Adding add64 and sub64 instructions.
llvm-svn: 222795
2014-11-25 22:15:44 +00:00
Colin LeMahieu
732a0febbc Reverting 222792
llvm-svn: 222793
2014-11-25 21:39:57 +00:00
Colin LeMahieu
bf565f821c [Hexagon] Adding compare with immediate instructions.
llvm-svn: 222792
2014-11-25 21:30:28 +00:00
Colin LeMahieu
dbc4cca085 [Hexagon] Adding NOP encoding bits.
llvm-svn: 222791
2014-11-25 21:23:07 +00:00
Matt Arsenault
36581167b0 R600/SI: Only use one DEBUG()
llvm-svn: 222789
2014-11-25 21:03:22 +00:00
Cameron McInally
c32dadfa69 [AVX512] Add 512b integer shift by variable intrinsics and patterns.
llvm-svn: 222786
2014-11-25 20:41:51 +00:00
Colin LeMahieu
9b0719975d [Hexagon] Adding C2_mux instruction.
llvm-svn: 222784
2014-11-25 20:20:09 +00:00
Craig Topper
3ac6865792 Remove space before tab in all AVX512 mnemonic strings.
llvm-svn: 222778
2014-11-25 20:11:23 +00:00
Colin LeMahieu
4a42d4abd9 [Hexagon] Replacing cmp* instructions with ones that contain encoding bits.
llvm-svn: 222771
2014-11-25 18:20:52 +00:00
Chandler Carruth
5a24aaefeb Revert r222746: That commit did not update any tests and caused two R600
tests to start failing.

Original commit log: R600/SI: Disable commutativity for MIN/MAX_LEGACY

llvm-svn: 222753
2014-11-25 10:50:41 +00:00
Zoran Jovanovic
c3664f6f8a [mips][micromips] Use call instructions with short delay slots
Differential Revision: http://reviews.llvm.org/D6338

llvm-svn: 222752
2014-11-25 10:50:00 +00:00
Marek Olsak
c593ef1ab1 R600/SI: Disable commutativity for MIN/MAX_LEGACY
llvm-svn: 222746
2014-11-25 09:49:23 +00:00
Matt Arsenault
14d278bdec R600/SI: Fix allocating flat_scr_lo / flat_scr_hi
Only the super register flat_scr was marked as reserved,
so in some cases with high register usage it would still
try to allocate the subregisters.

llvm-svn: 222737
2014-11-25 07:53:06 +00:00
Juergen Ributzka
c90ddb75a2 [FastISel][AArch64] Fix and extend the tbz/tbnz pattern matching.
The pattern matching failed to recognize all instances of "-1", because when
comparing against "-1" we didn't use an APInt of the same bitwidth.

This commit fixes this and also adds inverse versions of the conditon to catch
more cases.

llvm-svn: 222722
2014-11-25 04:16:15 +00:00
Hal Finkel
d19aa4cdf8 [PowerPC] Add the 'attn' instruction
The attn instruction is not part of the Power ISA, but is documented in the A2
user manual, and is accepted by the GNU assembler for the A2 and the POWER4+.
Reported as part of PR21650.

llvm-svn: 222712
2014-11-25 00:30:11 +00:00
Hal Finkel
515f6e50f5 [PowerPC] Implement combineRepeatedFPDivisors
This does not matter on newer cores (where we can use reciprocal estimates in
fast-math mode anyway), but for older cores this allows us to generate better
fast-math code where we have multiple FDIVs with a common divisor.

llvm-svn: 222710
2014-11-24 23:45:21 +00:00
Chad Rosier
17cb0c630f [AArch64] Fix clobber computation in A57LoadBalancing pass.
Extremely difficult to reproduce, so no test case included.
PR21637

llvm-svn: 222677
2014-11-24 18:57:58 +00:00
Colin LeMahieu
c450f360d3 Removing unused variable.
llvm-svn: 222676
2014-11-24 18:55:32 +00:00
Ulrich Weigand
24b899d017 [PowerPC] Fix PR 21652 - copy st_other bits on symbol assignment
When processing an assignment in the integrated assembler that sets
a symbol to the value of another symbol, we need to copy the st_other
bits that encode the local entry point offset.

Modeled after MipsTargetELFStreamer::emitAssignment handling of the
ELF::STO_MIPS_MICROMIPS flag.

llvm-svn: 222672
2014-11-24 18:09:47 +00:00
Colin LeMahieu
1dc5a7ca7c [Hexagon] Adding asrh instruction, removing unused multiclasses.
llvm-svn: 222670
2014-11-24 18:04:42 +00:00
Colin LeMahieu
b3868ebb81 [Hexagon] Adding aslh instruction.
llvm-svn: 222668
2014-11-24 17:44:19 +00:00
Colin LeMahieu
e1cd9ff6b5 [Hexagon] Adding zxth instruction.
llvm-svn: 222662
2014-11-24 17:11:34 +00:00
Colin LeMahieu
80e59674e9 [Hexagon] Adding zxtb instruction.
llvm-svn: 222660
2014-11-24 16:48:43 +00:00
Jozef Kolek
a4e87d7a74 [mips][microMIPS] Fix JRADDIUSP instruction
Fix JRADDIUSP instruction, remove delay slot flag because this instruction
doesn't have delay slot.

Differential Revision: http://reviews.llvm.org/D6365

llvm-svn: 222658
2014-11-24 16:14:10 +00:00
Jozef Kolek
dd0dbf282b [mips][microMIPS] Implement LBU16, LHU16, LW16, SB16, SH16 and SW16 instructions
Differential Revision: http://reviews.llvm.org/D5122

llvm-svn: 222653
2014-11-24 14:39:13 +00:00
Jozef Kolek
95061e7de1 [mips][microMIPS] Implement 16-bit instructions registers including ZERO instead of S0
Implement microMIPS 16-bit instructions register set: $0, $2-$7 and $17.

Differential Revision: http://reviews.llvm.org/D5780

llvm-svn: 222652
2014-11-24 14:25:53 +00:00
Aaron Ballman
4f29ba5173 Removing a variable that is initialized but never read. The original author has been alerted to the warning, in case this variable is meant to be used. Fixes -Werror builds in the meantime.
llvm-svn: 222649
2014-11-24 14:03:16 +00:00
Jozef Kolek
0d926f8fc9 [mips][microMIPS] Implement disassembler support for 16-bit instructions
With the help of new method readInstruction16() two bytes are read and
decodeInstruction() is called with DecoderTableMicroMips16, if this fails
four bytes are read and decodeInstruction() is called with
DecoderTableMicroMips32.

Differential Revision: http://reviews.llvm.org/D6149

llvm-svn: 222648
2014-11-24 13:29:59 +00:00
Andrea Di Biagio
3646b17160 [X86] Improved target specific combine on VSELECT dag nodes.
This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to
convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1.
On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd.

Also, removed a target specific combine that performed a premature lowering of
VSELECT nodes to target specific MOVSS/MOVSD nodes.

llvm-svn: 222647
2014-11-24 12:23:15 +00:00
Michael Kuperstein
b12b19a24a [X86] Fixes bug in build_vector v4x32 lowering
r222375 made some improvements to build_vector lowering of v4x32 and v4xf32 into an insertps, but it missed a case where:

1. A single extracted element is used twice.
2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask.

This caused a crash, since the source value for the insertps ends-up uninitialized.

Differential Revision: http://reviews.llvm.org/D6377

llvm-svn: 222635
2014-11-23 13:09:06 +00:00
Craig Topper
22f2dfbc9f Add missing override keywords.
llvm-svn: 222634
2014-11-23 09:40:13 +00:00
Elena Demikhovsky
36a2243ab7 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 222632
2014-11-23 08:07:43 +00:00
Matt Arsenault
1b03538afe R600: Fix extloads of i1 on R600/Evergreen
llvm-svn: 222631
2014-11-23 02:57:54 +00:00
Matt Arsenault
417f5ceb20 R600: Fix assert on copy of an i1 on pre-SI
i1 is not a legal type on Evergreen, so this combine proceeded
and tried to produce a bitcast between i1 and i8.

llvm-svn: 222630
2014-11-23 02:57:52 +00:00
Simon Pilgrim
815bb3182b Tidied up target triple OS detection. NFC
Use Triple::isOS*() helper functions where possible.

llvm-svn: 222622
2014-11-22 19:12:10 +00:00
Chandler Carruth
3e70f9f348 [x86] Teach the vector shuffle yet another step of canonicalization.
No functionality changed yet, but this will prevent subsequent patches
from having to handle permutations of various interleaved shuffle
patterns.

llvm-svn: 222614
2014-11-22 09:18:53 +00:00
Joerg Sonnenberger
00a4fe60d0 Fix transformation of add with pc argument to adr for non-immediate
arguments.

llvm-svn: 222587
2014-11-21 22:39:34 +00:00
Tom Stellard
cfd2fce8a1 R600/SI: Add an s_mov_b32 to patterns which use the M0RegClass
We need to use a s_mov_b32 rather than a copy, so that CSE will
eliminate redundant moves to the m0 register.

llvm-svn: 222584
2014-11-21 22:31:46 +00:00
Tom Stellard
a112fe4e40 R600/SI: Emit s_mov_b32 m0, -1 before every DS instruction
This s_mov_b32 will write to a virtual register from the M0Reg
class and all the ds instructions now take an extra M0Reg explicit
argument.

This change is necessary to prevent issues with the scheduler
mixing together instructions that expect different values in the m0
registers.

llvm-svn: 222583
2014-11-21 22:31:44 +00:00
Tom Stellard
484f10138e R600/SI: Add SIFoldOperands pass
This pass attempts to fold the source operands of mov and copy
instructions into their uses.

llvm-svn: 222581
2014-11-21 22:06:37 +00:00
Jozef Kolek
52fa965cf8 [mips][microMIPS] This patch implements functionality in MIPS delay slot
filler such as if delay slot filler have to put NOP instruction into the
delay slot of microMIPS BEQ or BNE instruction which uses the register $0,
then instead of emitting NOP this instruction is replaced by the corresponding
microMIPS compact branch instruction, i.e. BEQZC or BNEZC.

Differential Revision: http://reviews.llvm.org/D3566

llvm-svn: 222580
2014-11-21 22:04:35 +00:00
Tom Stellard
b76305ec11 R600/SI: Mark s_mov_b32 and s_mov_b64 as rematerializable
llvm-svn: 222579
2014-11-21 22:00:16 +00:00
Colin LeMahieu
4986bc53c5 [Hexagon] Adding sxth instruction.
llvm-svn: 222577
2014-11-21 21:54:59 +00:00
Colin LeMahieu
9a7b747bf6 [Hexagon] Adding sxtb instruction. Renaming some identically named classes that will be removed after converting referencing defs.
llvm-svn: 222575
2014-11-21 21:35:52 +00:00
Colin LeMahieu
6e2ce8815f [Hexagon] Removing SUB_rr and replacing with A2_sub.
llvm-svn: 222571
2014-11-21 21:19:18 +00:00
Sanjay Patel
776e5485fb Add a feature flag for slow 32-byte unaligned memory accesses [x86].
This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for 
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.

Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6355

llvm-svn: 222544
2014-11-21 17:40:04 +00:00
Chandler Carruth
5e598c0342 [x86] Restructure the checking patterns for v16 and v32 avx2 vector
shuffle lowering to allow much better blend matching.

Specifically, with the new structure the code seems clearer to me and we
correctly can hit the cases where merging two 128-bit lanes is a clear
win and can be shuffled cheaply afterward.

llvm-svn: 222539
2014-11-21 14:53:03 +00:00
Chandler Carruth
7491f1f32f [x86] Make the previous logic significantly less conservative and get
a bunch more improvements.

Non-lane-crossing is fine, the key is that lane merging only makes sense
for single-input shuffles. Not sure why I got so turned around here. The
code all works, I was just using the wrong model for it.

This only updates v4 and v8 lowering. The v16 and v32 lowering requires
restructuring the entire check sequence.

llvm-svn: 222537
2014-11-21 14:33:24 +00:00
Chandler Carruth
8387bec088 [x86] Teach the x86 vector shuffle lowering to detect mergable 128-bit
lanes.

By special casing these we can often either reduce the total number of
shuffles significantly or reduce the number of (high latency on Haswell)
AVX2 shuffles that potentially cross 128-bit lanes. Even when these
don't actually cross lanes, they have much higher latency to support
that. Doing two of them and a blend is worse than doing a single insert
across the 128-bit lanes to blend and then doing a single interleaved
shuffle.

While this seems like a narrow case, it kept cropping up on me and the
difference is *huge* as you can see in many of the test cases. I first
hit this trying to perfectly fix the interleaving shuffle patterns used
by Halide for AVX2.

llvm-svn: 222533
2014-11-21 13:56:05 +00:00
Alexey Volkov
235268b4ed [X86] For Silvermont CPU use 16-bit division instead of 64-bit for small positive numbers
Differential Revision: http://reviews.llvm.org/D5938

llvm-svn: 222521
2014-11-21 11:19:34 +00:00
NAKAMURA Takumi
3cd455da60 Add LLVMScalarOpts to LLVMPowerPCCodeGen.
llvm-svn: 222516
2014-11-21 09:14:45 +00:00
Hao Liu
9cb82be410 DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)

A hook is added to allow the target to control whether it needs to do such combine.

Reviewed in http://reviews.llvm.org/D6334

llvm-svn: 222510
2014-11-21 06:39:58 +00:00
Craig Topper
45dffff5e4 Remove a bunch of unnecessary typecasts to 'const TargetRegisterClass *'
llvm-svn: 222509
2014-11-21 05:58:21 +00:00
Hal Finkel
ac26448a5c [PPC] Use SeparateConstOffsetFromGEP
This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in
the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on
SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM,
there is a store moved out of the inner loop) and a potential speedup on
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it
makes some code look cleaner, and synchronizing the backends in this regard
seems like a generally good thing.

llvm-svn: 222504
2014-11-21 04:35:51 +00:00
Quentin Colombet
8fea50c066 [X86] Do not custom lower UINT_TO_FP when the target type does not
match the custom lowering.

<rdar://problem/19026326>

llvm-svn: 222489
2014-11-21 00:47:19 +00:00
Reid Kleckner
dbf3d8a5a4 Fix more instances of -Wsentinel on Windows with s/NULL/nullptr/
Follow up to r221940, where I must not have caught em all. NFC

llvm-svn: 222481
2014-11-20 23:51:47 +00:00
Reid Kleckner
6a21619ebc Add out of line virtual destructors to all LLVMTargetMachine subclasses
These recently all grew a unique_ptr<TargetLoweringObjectFile> member in
r221878.  When anyone calls a virtual method of a class, clang-cl
requires all virtual methods to be semantically valid. This includes the
implicit virtual destructor, which triggers instantiation of the
unique_ptr destructor, which fails because the type being deleted is
incomplete.

This is just part of the ongoing saga of PR20337, which is affecting
Blink as well. Because the MSVC ABI doesn't have key functions, we end
up referencing the vtable and implicit destructor on any virtual call
through a class. We don't actually end up emitting the dtor, so it'd be
good if we could avoid this unneeded type completion work.

llvm-svn: 222480
2014-11-20 23:37:18 +00:00
Mehdi Amini
fe9410a47b Update Makefile following directory removal in r222466
llvm-svn: 222475
2014-11-20 22:48:24 +00:00
Colin LeMahieu
ff9ce82394 [Hexagon] [NFC] Merging InstPrinter directory in to MCTargetDesc since they have a circular dependency.
llvm-svn: 222458
2014-11-20 21:56:35 +00:00
Saleem Abdulrasool
0de13e90eb X86: use the correct alloca symbol for Windows Itanium
Windows itanium targets the MSVCRT, and the stack probe symbol is provided by
MSVCRT.  This corrects the emission of stack probes on i686-windows-itanium.

llvm-svn: 222439
2014-11-20 18:01:26 +00:00
Jyoti Allur
0aaf89456e [ELF] Prevent ARM ELF object writer from generating deprecated relocation code R_ARM_PLT32
llvm-svn: 222414
2014-11-20 05:58:11 +00:00
Craig Topper
0f032e3130 Fix a typo in a comment.
llvm-svn: 222412
2014-11-20 05:22:37 +00:00
Colin LeMahieu
384b462d47 [Hexagon] Adding A2_xor instruction with IR selection pattern and test.
llvm-svn: 222399
2014-11-19 23:22:23 +00:00
Colin LeMahieu
35e8a8aa73 [Hexagon] Adding A2_or instruction with IR selection pattern and test.
llvm-svn: 222396
2014-11-19 22:58:04 +00:00
Andrea Di Biagio
b770dd344e [X86] Improved lowering of v4x32 build_vector dag nodes.
This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes
that are known to have at least two non-zero elements.

With this patch, a build_vector that performs a blend with zero is 
converted into a shuffle. This is done to let the shuffle legalizer expand
the dag node in a optimal way. For example, if we know that a build_vector
performs a blend with zero, we can try to lower it as a movq/blend instead of
always selecting an insertps.

This patch also improves the logic that lowers a build_vector into a insertps
with zero masking. See for example the extra test cases added to test sse41.ll.

Differential Revision: http://reviews.llvm.org/D6311

llvm-svn: 222375
2014-11-19 19:34:29 +00:00
Tom Stellard
271c0a936e R600/SI: Make SIInstrInfo::isOperandLegal() more strict
A register operand that has a common sub-class with its instruction's
defined register class is not always legal.  For example,
SReg_32 and M0Reg both have a common sub-class, but we can't
use an SReg_32 in instructions that expect a M0Reg.

This prevents the llvm.SI.sendmsg.ll test from failing when the fold
operand pass is added.

llvm-svn: 222368
2014-11-19 16:58:49 +00:00
Zoran Jovanovic
ebf19d975c [mips][micromips] Implement SWM32 and LWM32 instructions
Differential Revision: http://reviews.llvm.org/D5519

llvm-svn: 222367
2014-11-19 16:44:02 +00:00
Jozef Kolek
2b6a42be6d [mips][microMIPS] Fix opcodes of MFHC1 and MTHC1 instructions.
Differential Revision: http://reviews.llvm.org/D6169

llvm-svn: 222355
2014-11-19 13:37:51 +00:00
Jozef Kolek
d19675f448 [mips][microMIPS] Implement CodeGen support for 16-bit instruction ADDIUR2.
Differential Revision: http://reviews.llvm.org/D5800

llvm-svn: 222352
2014-11-19 13:23:58 +00:00
Jozef Kolek
9fbf00198c [mips][microMIPS] Implement CodeGen support for ADDIUS5 instruction.
Differential Revision: http://reviews.llvm.org/D5799

llvm-svn: 222351
2014-11-19 13:11:09 +00:00
Jozef Kolek
0de52b5b97 [mips][microMIPS] Implement LWXS instruction.
Differential Revision: http://reviews.llvm.org/D5407

llvm-svn: 222348
2014-11-19 11:39:12 +00:00
Jozef Kolek
e466cd5b54 [mips][microMIPS] Implement SDBBP and RDHWR instructions.
Differential Revision: http://reviews.llvm.org/D5240

llvm-svn: 222347
2014-11-19 11:25:50 +00:00
Simon Pilgrim
e5f972f1c1 [X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2
This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.

I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.

Differential Revision: http://reviews.llvm.org/D5699

llvm-svn: 222340
2014-11-19 10:06:49 +00:00
David Blaikie
60e6c80905 Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
Hao Liu
f7e0bd2878 [AArch64] Disable useAA for Cortex-A57.
Using AA during CodeGen is very useful for in-order cores. It is less useful for ooo cores. Also I find
enabling useAA for Cortex-A57 may generate worse code for some test cases. If useAA in codegen is improved 
and benefical for ooo cores, we can enable it again.

llvm-svn: 222333
2014-11-19 06:48:56 +00:00
Hao Liu
00d285aca3 [AArch64] Enable SeparateConstOffsetFromGEP, EarlyCSE and LICM passes on AArch64 backend.
SeparateConstOffsetFromGEP can gives more optimizaiton opportunities related to GEPs, which benefits EarlyCSE
and LICM. By enabling these passes we can have better address calculations and generate a better addressing
mode. Some SPEC 2006 benchmarks (astar, gobmk, namd) have obvious improvements on Cortex-A57.

Reviewed in http://reviews.llvm.org/D5864.

llvm-svn: 222331
2014-11-19 06:39:53 +00:00
David Blaikie
7499cbae4c Remove StringMap::GetOrCreateValue in favor of StringMap::insert
Having two ways to do this doesn't seem terribly helpful and
consistently using the insert version (which we already has) seems like
it'll make the code easier to understand to anyone working with standard
data structures. (I also updated many references to the Entry's
key and value to use first() and second instead of getKey{Data,Length,}
and get/setValue - for similar consistency)

Also removes the GetOrCreateValue functions so there's less surface area
to StringMap to fix/improve/change/accommodate move semantics, etc.

llvm-svn: 222319
2014-11-19 05:49:42 +00:00
Weiming Zhao
c7ce2ee93f [Aarch64] Customer lowering of CTPOP to SIMD should check for NEON availability
llvm-svn: 222292
2014-11-19 00:29:14 +00:00
Matt Arsenault
73f4bd8758 R600/SI: Implement areMemAccessesTriviallyDisjoint
This partially makes up for not having address spaces
used for alias analysis in some simple cases.

This is not yet enabled by default so shouldn't change anything yet.

llvm-svn: 222286
2014-11-19 00:01:31 +00:00
Matt Arsenault
1843dcc2b6 R600/SI: Set hasSideEffects = 0 on load and store instructions.
Assuming unmodeled side effects interferes with some scheduling
opportunities.

Don't put it in the base class of DS instructions since there
are a few weird effecting, non load/store instructions there.

llvm-svn: 222285
2014-11-18 23:57:33 +00:00
Simon Pilgrim
daabed160f [X86][AVX] 256-bit vector stack unaligned load/stores identification
Under many circumstances the stack is not 32-byte aligned, resulting in the use of the vmovups/vmovupd/vmovdqu instructions when inserting ymm reloads/spills.

This minor patch adds these instructions to the isFrameLoadOpcode/isFrameStoreOpcode helpers so that they can be correctly identified and not be treated as folded reloads/spills.

This has also been noticed by http://llvm.org/bugs/show_bug.cgi?id=18846 where it was causing redundant spills - I've added a reduced test case at test/CodeGen/X86/pr18846.ll

Differential Revision: http://reviews.llvm.org/D6252

llvm-svn: 222281
2014-11-18 23:38:19 +00:00
Colin LeMahieu
f815ca1b0b [Hexagon] Adding A2_and instruction.
llvm-svn: 222274
2014-11-18 22:45:47 +00:00
Chad Rosier
ccf41a5c21 [FastISel][AArch64] Also allow folding of sign-/zero-extend and arithmetic
shift-right for booleans (i1).

Arithmetic shift-right immediate with sign-/zero-extensions also works for
boolean values.  Update the assert and the test cases to reflect that fact.

llvm-svn: 222272
2014-11-18 22:41:49 +00:00
Chad Rosier
7153154a79 [FastISel][AArch64] Also allow folding of sign-/zero-extend and logical
shift-right for booleans (i1).

Logical shift-right immediate with sign-/zero-extensions also works for boolean
values.  Update the assert and the test cases to reflect that fact.

llvm-svn: 222270
2014-11-18 22:38:42 +00:00
Colin LeMahieu
1d74e8f01b [Hexagon] Adding A2_sub instruction
Renaming test files.

llvm-svn: 222263
2014-11-18 21:51:51 +00:00
Juergen Ributzka
b3791ee3a7 [FastISel][AArch64] Follow-up fix for "Fix shift-immediate emission for "zero" shifts."
Shifts also perform sign-/zero-extends to larger types, which requires us to emit
an integer extend instead of a simple COPY.

Related to PR21594.

llvm-svn: 222257
2014-11-18 21:20:17 +00:00
Matt Arsenault
84d2214a94 R600/SI: Move SIFixSGPRCopies to inst selector passes
This should expose more of the actually used VALU
instructions to the machine optimization passes.

This also should help getting i1 handling into a better state.
For not entirly understood reasons, this fixes the split-scalar-i64-add.ll
test where a 64-bit add would only partially be moved to the VALU
resulting in use of undefined VCC.

llvm-svn: 222256
2014-11-18 21:06:58 +00:00
Juergen Ributzka
fe3cb34d8e [AArch64] Don't optimize all compare instructions.
"optimizeCompareInstr" converts compares (cmp/cmn) into plain sub/add
instructions when the flags are not used anymore. This conversion is valid for
most instructions, but not all. Some instructions that don't set the flags
(e.g. sub with immediate) can set the SP, whereas the flag setting version uses
the same encoding for the "zero" register.

Update the code to also check for the return register before performing the
optimization to make sure that a cmp doesn't suddenly turn into a sub that sets
the stack pointer.

I don't have a test case for this, because it isn't easy to trigger.

llvm-svn: 222255
2014-11-18 21:02:40 +00:00
Tom Stellard
962ccd7f85 R600/SI: Make sure resource descriptors are always stored in SGPRs
llvm-svn: 222253
2014-11-18 20:39:39 +00:00
Colin LeMahieu
f7ca7f6c70 [Hexagon] Converting from ADD_rr to A2_add which has encoding bits.
Adding test to show correct instruction selection and encoding.

llvm-svn: 222249
2014-11-18 20:28:11 +00:00
Juergen Ributzka
a2005be2b4 [FastISel][AArch64] Fix shift-immediate emission for "zero" shifts.
This change emits a COPY for a shift-immediate with a "zero" shift value.
This fixes PR21594 where we emitted a shift instruction with an incorrect
immediate operand.

llvm-svn: 222247
2014-11-18 19:58:59 +00:00
Jozef Kolek
d5cccacaae Test commit to verify that commit access works.
llvm-svn: 222244
2014-11-18 19:20:34 +00:00
Reid Kleckner
0ae02a3892 Revert "ADT: correctly report isMSVCEnvironment for windows itanium"
This reverts commit r222180.

llvm-svn: 222188
2014-11-17 22:55:59 +00:00
Saleem Abdulrasool
c8bc5eb9ab ADT: correctly report isMSVCEnvironment for windows itanium
The itanium environment on Windows uses MSVC and is a MSVC environment.  Report
this correctly.

llvm-svn: 222180
2014-11-17 22:13:26 +00:00
Matt Arsenault
0f208ea195 R600/SI: Don't copy flags when extracting subreg
This was resulting in use of a register after a kill.
For some reason this showed up as a problem in many tests
when moving the SIFixSGPRCopies pass closer to instruction
selection.

llvm-svn: 222175
2014-11-17 21:11:37 +00:00
Matt Arsenault
76c97dc14a R600/SI: Assume SIFixSGPRCopies makes changes
I'm not sure if this was breaking anything.

llvm-svn: 222174
2014-11-17 21:11:34 +00:00
Alexey Volkov
3cc8e8e28b [X86] Use ADD/SUB instead of INC/DEC for Haswell and Broadwell CPUs
Differential Revision: http://reviews.llvm.org/D5934

llvm-svn: 222141
2014-11-17 16:17:51 +00:00
Oliver Stannard
93a823bc7a [Thumb1] Re-write emitThumbRegPlusImmediate
This was motivated by a bug which caused code like this to be
miscompiled:
  declare void @take_ptr(i8*)
  define void @test() {
    %addr1.32 = alloca i8
    %addr2.32 = alloca i32, i32 1028
    call void @take_ptr(i8* %addr1)
    ret void
  }

This was emitting the following assembly to get the value of %addr1:
  add r0, sp, #1020
  add r0, r0, #8
However, "add r0, r0, #8" is not a valid Thumb1 instruction, and this
could not be assembled. The generated object file contained this,
resulting in r0 holding SP+8 rather tha SP+1028:
  add r0, sp, #1020
  add r0, sp, #8

This function looked like it could have caused miscompilations for
other combinations of registers and offsets (though I don't think it is
currently called with these), and the heuristic it used did not match
the emitted code in all cases.

llvm-svn: 222125
2014-11-17 11:18:10 +00:00
Craig Topper
5121b0369b Convert some EVTs to MVTs where only a SimpleValueType is needed.
llvm-svn: 222109
2014-11-16 21:17:18 +00:00
Craig Topper
eb88940e3d [x86] Remove two redundant isel patterns. They equivalent already exists in the instruction pattern.
llvm-svn: 222094
2014-11-16 09:24:16 +00:00
Simon Pilgrim
5e170d7652 [X86][SSE] Improve legal SHUFP and PSHUFD shuffle matching
Updated X86TargetLowering::isShuffleMaskLegal to match SHUFP masks with commuted inputs and PSHUFD masks that reference the second input.

As part of this I've refactored isPSHUFDMask to work in a more general manner and allow it to match against either the first or second input vector.

Differential Revision: http://reviews.llvm.org/D6287

llvm-svn: 222087
2014-11-15 21:13:05 +00:00
Matt Arsenault
1298bf6e9c R600: Permute operands when selecting legacy min/max
This gets the correct NaN behavior based on the compare type
the hardware uses. This now passes the new piglit test I have
for this on SI.

Add stricter tests for the operand order.

llvm-svn: 222079
2014-11-15 05:02:57 +00:00
Tom Stellard
d8a0a4cc2b R600: Fix 64-bit integer division
This fixes a failure in one of the oclconform tests.

Patch by: Jan Vesely

llvm-svn: 222073
2014-11-15 01:07:57 +00:00
Tom Stellard
573a5f6172 R600: Factor i64 UDIVREM lowering into its own fuction
This is so it could potentially be used by SI.  However, the current
implementation does not always produce correct results, so the
IntegerDivisionPass is being used instead.

llvm-svn: 222072
2014-11-15 01:07:53 +00:00
Reid Kleckner
1bf13cbc83 Rename EH related stuff to be more precise
Summary:
The current "WinEH" exception handling type is more about Itanium-style
LSDA tables layered on top of the Windows native unwind info format
instead of .eh_frame tables or EHABI unwind info. Use the name
"ItaniumWinEH" to better reflect the hybrid nature of the design.

Also rename isExceptionHandlingDWARF to usesItaniumLSDAForExceptions,
since the LSDA is part of the Itanium C++ ABI document, and not the
DWARF standard.

Reviewers: echristo

Subscribers: llvm-commits, compnerd

Differential Revision: http://reviews.llvm.org/D6279

llvm-svn: 222062
2014-11-14 23:31:07 +00:00
Tim Northover
f511e3a633 ARM: refactor .cfi_def_cfa_offset emission.
We use to track quite a few "adjusted" offsets through the FrameLowering code
to account for changes in the prologue instructions as we went and allow the
emission of correct CFA annotations. However, we were missing a couple of cases
and the code was almost impenetrable.

It's easier to just add any stack-adjusting instruction to a list and emit them
together.

llvm-svn: 222057
2014-11-14 22:45:33 +00:00
Tim Northover
5eff3fb39c ARM: correctly calculate the offset of FP in its push.
When we folded the DPR alignment gap into a push, we weren't noting the extra
distance from the beginning of the push to the FP, and so FP ended up pointing
at an incorrect offset.

The .cfi_def_cfa_offset directives are still wrong in this case, but I think
that can be improved by refactoring.

llvm-svn: 222056
2014-11-14 22:45:31 +00:00
Tom Stellard
fcbfc8a809 R600/SI: Mark s_movk_i32 as rematerializable
llvm-svn: 222037
2014-11-14 20:43:28 +00:00
Tom Stellard
c18e457d39 R600/SI: Fix spilling of m0 register
If we have spilled the value of the m0 register, then we need to restore
it with v_readlane_b32 to a regular sgpr, because v_readlane_b32 can't
write to m0.

v_readlane_b32 can't write to m0, so

llvm-svn: 222036
2014-11-14 20:43:26 +00:00
Matt Arsenault
633ce0ecd7 R600/SI: Combine min3/max3 instructions
llvm-svn: 222032
2014-11-14 20:08:52 +00:00
Matt Arsenault
d4196f855a R600/SI: Fix verifier error from a branch on IMPLICIT_DEF
SIILowerI1Copies wasn't correctly handling this case.

llvm-svn: 222020
2014-11-14 18:43:41 +00:00
Matt Arsenault
8de36ef4f7 Fix unused variable warning without asserts
llvm-svn: 222017
2014-11-14 18:40:49 +00:00
Matt Arsenault
346bdd92b1 R600/SI: Match integer min / max instructions
llvm-svn: 222015
2014-11-14 18:30:06 +00:00
Matt Arsenault
8ddb68217c R600/SI: Use S_BFE_I64 for 64-bit sext_inreg
llvm-svn: 222012
2014-11-14 18:18:16 +00:00
Cameron McInally
8882e35937 [AVX512] Add 512b masked integer shift by immediate patterns.
llvm-svn: 222002
2014-11-14 15:43:00 +00:00
Tom Stellard
6fe14a2b60 R600/SI: Fix assembly names for exec_hi and exec_lo
llvm-svn: 221995
2014-11-14 14:08:04 +00:00
Tom Stellard
a4aabd3acb R600/SI: Start implementing an assembler
This was done using the Sparc and PowerPC AsmParsers as guides.  So far it
is very simple and only supports sopp instructions.

llvm-svn: 221994
2014-11-14 14:08:00 +00:00
Bill Schmidt
80cd5e5dbb [PowerPC] Add VSX builtins for vec_div
This patch adds builtin support for xvdivdp and xvdivsp, along with a
test case.  Straightforward stuff.

There's a companion patch for Clang.

llvm-svn: 221983
2014-11-14 12:10:40 +00:00
Matt Arsenault
2fc74d5147 R600/SI: Make constant array static
llvm-svn: 221965
2014-11-14 02:21:58 +00:00
Tim Northover
3a30fb90f5 X86: use getConstant rather than getTargetConstant behind BUILD_VECTOR.
getTargetConstant should only be used when you can guarantee the instruction
selected will be able to cope with the raw value. BUILD_VECTOR is rather too
generic for this so we should use getConstant instead. In that case, an
instruction can still consume the constant, but if it doesn't it'll be
materialised through its own round of ISel.

Should fix PR21352.

llvm-svn: 221961
2014-11-14 01:30:14 +00:00
Reid Kleckner
2a9158eca3 Fix build of Mips code with MSVC by using our macro instead of __attribute__((unused)) directly
llvm-svn: 221956
2014-11-14 00:39:33 +00:00
Reed Kotler
664c8f453d First stage of call lowering for Mips fast-isel
Summary:
This has most of what is needed for mips fast-isel call lowering for O32.
What is missing I will add on the next patch because this patch is already too large.
It should not be doing anything wrong but it will punt on some cases that it is basically
capable of doing.

The mechanism is there for parameters to be passed on the stack but I have not enabled it because it serves as a way for now to prevent some of the strange cases of O32 register passing that I have not fully checked yet and have some issues.

The Mips O32 abi rules are very complicated as far how data is passed in floating and integer registers.

However there is a way to think about this all very simply and this implementation reflects that.

Basically, the ABI rules are written as if everything is passed on the stack and aligned as such.
Once that is conceptually done, it is nearly trivial to reassign those locations to registers and
then all the complexity disappears.

So I have told tablegen that all the data is passed on the stack and during the lowering I fix
this by assigning to registers as per the ABI doc.

This has been my approach and you can line up what I did with the ABI document and see 1 to 1 what
is going on.



Test Plan: callabi.ll

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: jholewinski, echristo, ahatanak, llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D5714

llvm-svn: 221948
2014-11-13 23:37:45 +00:00
Matt Arsenault
08233590d4 R600/SI: Fix fmin_legacy / fmax_legacy matching for SI
select_cc is expanded on SI, so this was never matched.

llvm-svn: 221941
2014-11-13 23:03:09 +00:00
Aditya Nandakumar
4d9c1ff994 We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed.
llvm-svn: 221926
2014-11-13 21:29:21 +00:00
Juergen Ributzka
fb67fff9cb [FastISel][AArch64] Don't bail during simple GEP instruction selection.
The generic FastISel code would bail, because it can't emit a sign-extend for
AArch64. This copies the code over and uses AArch64 specific emit functions.

This is not ideal and 'computeAddress' should handles this, so it can fold the
address computation into the memory operation.

I plan to clean up 'computeAddress' anyways, so I will add that in a future
commit.

Related to rdar://problem/18962471.

llvm-svn: 221923
2014-11-13 20:50:44 +00:00
Matt Arsenault
d222be0dd7 R600/SI: Use s_movk_i32
llvm-svn: 221922
2014-11-13 20:44:23 +00:00
Matt Arsenault
95ce2351ba R600/SI: Fix definition for s_cselect_b32
These were directly using the old base instruction
class, and specifying the wrong register classes
for operands. The operands can be the other special
inputs besides SGPRs. The op name was also being
directly used for the asm string, so this was printed
without any operands.

llvm-svn: 221921
2014-11-13 20:23:36 +00:00
Matt Arsenault
2ef28e2234 R600: Fix assert on empty function
If a function is just an unreachable, this would hit a
"this is not a MachO target" assertion because of setting
HasSubsectionViaSymbols.

llvm-svn: 221920
2014-11-13 20:07:40 +00:00
Matt Arsenault
8f277a520f R600: Error on initializer for LDS.
Also give a proper error for other address spaces.

llvm-svn: 221917
2014-11-13 19:56:13 +00:00
Matt Arsenault
5487619f0a R600/SI: Get rid of FCLAMP_SI pseudo
It's not necessary. Also use complex patterns to allow
src modifier usage.

llvm-svn: 221916
2014-11-13 19:49:04 +00:00
Matt Arsenault
41886925dc R600/SI: Allow commuting with src2_modifiers
llvm-svn: 221911
2014-11-13 19:26:50 +00:00
Matt Arsenault
a919d1ac8d R600/SI: Allow commuting some 3 op instructions
e.g. v_mad_f32 a, b, c -> v_mad_f32 b, a, c

This simplifies matching v_madmk_f32.

This looks somewhat surprising, but it appears to be
OK to do this. We can commute src0 and src1 in all
of these instructions, and that's all that appears
to matter.

llvm-svn: 221910
2014-11-13 19:26:47 +00:00
Tim Northover
8c9fca07bd ARM: allow constpool entry to be moved to the user's block in all cases.
Normally entries can only move to a lower address, but when that wasn't viable,
the user's block was considered anyway. Unfortunately, it went via
createNewWater which wasn't designed to handle the case where there's already
an island after the block.

Unfortunately, the test we have is slow and fragile, and I couldn't reduce it
to anything sane even with the @llvm.arm.space intrinsic. The test change here
is recreating the previous one after the change.

rdar://problem/18545506

llvm-svn: 221905
2014-11-13 17:58:53 +00:00
Tim Northover
a52b6a893f ARM: avoid duplicating branches during constant islands.
We were using a naive heuristic to determine whether a basic block already had
an unconditional branch at the end. This mostly corresponded to reality
(assuming branches got optimised) because there's not much point in a branch to
the next block, but could go wrong.

llvm-svn: 221904
2014-11-13 17:58:51 +00:00