1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-28 06:22:51 +01:00
Commit Graph

13374 Commits

Author SHA1 Message Date
Bill Schmidt
56989d09e1 Add missing test for r242296 (vec_sld)
llvm-svn: 242680
2015-07-20 15:43:21 +00:00
Tom Stellard
836373703a AMDGPU/SI: Add VI patterns to select FLAT instructions for global memory ops
Summary:
The MUBUF addr64 bit has been removed on VI, so we must use FLAT
instructions when the pointer is stored in VGPRs.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11067

llvm-svn: 242673
2015-07-20 14:28:41 +00:00
Simon Pilgrim
4477808227 [X86][SSE] Tidied up vector CTLZ/CTTZ. NFCI.
llvm-svn: 242645
2015-07-19 17:09:43 +00:00
Elena Demikhovsky
32daaef0aa AVX-512: Floating point conversions for SKX - DAG Lowering.
SKX supports conversion for all FP types. Integer types include doublewords and quardwords.
I added "Legal" status for these nodes and a bunch of tests.
I added "NoVLX" for AVX DAG selection to force VLX instructions selection when VLX is supported.

Differential Revision: http://reviews.llvm.org/D11255

llvm-svn: 242637
2015-07-19 10:17:33 +00:00
Simon Pilgrim
db674064f5 [X86][SSE] Added additional fp/int tests.
Demonstrates some shortfalls in subvector(cvt(x)) compared to cvt(subvector(x)) patterns - especially on AVX/AVX2 targets.

llvm-svn: 242614
2015-07-18 17:05:39 +00:00
Simon Pilgrim
f865bc7902 Refreshed tests.
llvm-svn: 242613
2015-07-18 16:53:51 +00:00
Simon Pilgrim
d4fa5c8283 Refreshed tests and reordered in descending integer size.
llvm-svn: 242610
2015-07-18 16:14:56 +00:00
Simon Pilgrim
6e2ee57005 Tidyup shufflevector calls - don't repeat inputs if you can avoid it.
llvm-svn: 242609
2015-07-18 15:56:33 +00:00
Matthias Braun
05a99347e9 ARM: Enable MachineScheduler and disable PostRAScheduler for swift.
Reapply r242500 now that the swift schedmodel includes LDRLIT.

This is mostly done to disable the PostRAScheduler which optimizes for
instruction latencies which isn't a good fit for out-of-order
architectures. This also allows to leave out the itinerary table in
swift in favor of the SchedModel ones.

This change leads to performance improvements/regressions by as much as
10% in some benchmarks, in fact we loose 0.4% performance over the
llvm-testsuite for reasons that appear to be unknown or out of the
compilers control. rdar://20803802 documents the investigation of
these effects.

While it is probably a good idea to perform the same switch for the
other ARM out-of-order CPUs, I limited this change to swift as I cannot
perform the benchmark verification on the other CPUs.

Differential Revision: http://reviews.llvm.org/D10513

llvm-svn: 242588
2015-07-17 23:18:30 +00:00
Matthias Braun
1102768a42 ARM: Add scheduling information for LDRLIT instructions to swift scheduling model
These pseudo instructions are only lowered after register allocation and
are therefore still present when the machine scheduler runs.
Add a run: line to a testcase that uses the uncommon flags necessary to
actually produce a LDRLIT instruction on swift.

llvm-svn: 242587
2015-07-17 23:18:26 +00:00
Quentin Colombet
186bbfc64e [RAGreedy] Add an experimental deferred spilling feature.
The idea of deferred spilling is to delay the insertion of spill code until the
very end of the allocation. A "candidate" to spill variable might not required
to be spilled because of other evictions that happened after this decision was
taken. The spirit is similar to the optimistic coloring strategy implemented in
Preston and Briggs graph coloring algorithm.

For now, this feature is highly experimental. Although correct, it would require
much more modification to properly model the effect of spilling.

Anyway, this early patch helps prototyping this feature.

Note: The test case cannot unfortunately be reduced and is probably fragile.
llvm-svn: 242585
2015-07-17 23:04:06 +00:00
Alex Lorenz
5ebdfda8d2 MIR Parser: Allow the dollar characters in all of the identifier tokens.
This commit modifies the machine instruction lexer so that it now accepts the
'$' characters in identifier tokens.

This change makes the syntax for unquoted global value tokens consistent with
the syntax for the global idenfitier tokens in the LLVM's assembly language.

llvm-svn: 242584
2015-07-17 22:48:04 +00:00
Adam Nemet
da30ff366a Revert "ARM: Enable MachineScheduler and disable PostRAScheduler for swift."
This reverts commit r242500.

It broke some internal tests and Matthias asked me to revert it while he
is investigating.

llvm-svn: 242553
2015-07-17 18:14:19 +00:00
Eli Bendersky
d69a59088c Use inbounds GEPs for memcpy and memset lowering
Follow-up on discussion in http://reviews.llvm.org/D11220

llvm-svn: 242542
2015-07-17 16:42:33 +00:00
John Brawn
1ef8168343 Make global aliases have symbol size equal to their type
This is mainly for the benefit of GlobalMerge, so that an alias into a
MergedGlobals variable has the same size as the original non-merged
variable.

Differential Revision: http://reviews.llvm.org/D10837

llvm-svn: 242520
2015-07-17 12:12:03 +00:00
Matthias Braun
27408a4dbe ARM: Enable MachineScheduler and disable PostRAScheduler for swift.
This is mostly done to disable the PostRAScheduler which optimizes for
instruction latencies which isn't a good fit for out-of-order
architectures. This also allows to leave out the itinerary table in
swift in favor of the SchedModel ones.

This change leads to performance improvements/regressions by as much as
10% in some benchmarks, in fact we loose 0.4% performance over the
llvm-testsuite for reasons that appear to be unknown or out of the
compilers control. rdar://20803802 documents the investigation of
these effects.

While it is probably a good idea to perform the same switch for the
other ARM out-of-order CPUs, I limited this change to swift as I cannot
perform the benchmark verification on the other CPUs.

Differential Revision: http://reviews.llvm.org/D10513

llvm-svn: 242500
2015-07-17 01:44:31 +00:00
Matt Arsenault
87f1d755ab Only do fmul (fadd x, x), c combine if the fadd only has one use
This was increasing the instruction count if the fadd has multiple uses.

llvm-svn: 242498
2015-07-17 01:14:35 +00:00
Rafael Espindola
35a7e47fa4 Use small encodings for constants when possible.
llvm-svn: 242493
2015-07-17 00:57:52 +00:00
Alex Lorenz
6fa497343d MIR Serialization: Serialize the frame setup machine instruction flag.
llvm-svn: 242491
2015-07-17 00:24:15 +00:00
Alex Lorenz
b3441a0f38 MIR Serialization: Serialize the frame index machine operands.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 242487
2015-07-16 23:37:45 +00:00
Matthias Braun
fbe0089d74 Arm: Don't define a label twice with two setjmps in a function.
Constructing a name based on the function name didn't give us a unique
symbol if we had more than one setjmp in a function. Using
MCContext::createTempSymbol() always gives us a unique name.

Differential Revision: http://reviews.llvm.org/D9314

llvm-svn: 242482
2015-07-16 22:34:20 +00:00
Matthias Braun
f00b5af3eb Fix __builtin_setjmp in combination with sjlj exception handling.
llvm.eh.sjlj.setjmp was used as part of the SjLj exception handling
style but is also used in clang to implement __builtin_setjmp.  The ARM
backend needs to output additional dispatch tables for the SjLj
exception handling style, these tables however can't be emitted if
llvm.eh.sjlj.setjmp is simply used for __builtin_setjmp and no actual
landing pad blocks exist.

To solve this issue a new llvm.eh.sjlj.setup_dispatch intrinsic is
introduced which is used instead of llvm.eh.sjlj.setjmp in the SjLj
exception handling lowering, so we can differentiate between the case
where we actually need to setup a dispatch table and the case where we
just need the __builtin_setjmp semantic.

Differential Revision: http://reviews.llvm.org/D9313

llvm-svn: 242481
2015-07-16 22:34:16 +00:00
Tim Northover
b07439e7ad AArch64: make inexact signalling on round Darwin-specific
C11 leaves the choice on whether round-to-integer operations set the inexact
flag implementation-defined. Darwin does expect it to be set, but this seems to
be against the intent of the IEEE document and slower to implement anyway. So
it should be opt-in.

llvm-svn: 242446
2015-07-16 21:30:21 +00:00
Simon Pilgrim
4c670f2794 [X86][SSE] Added nounwind attribute to vector shift tests.
Stop i686 codegen from generating cfi directives.

llvm-svn: 242443
2015-07-16 21:14:26 +00:00
Bill Schmidt
af026989c7 [PowerPC] v4i32 is a VSRCRegClass
I was looking at some vector code generation and kept seeing
unnecessary vector copies into the Altivec half of the VSX registers.
I discovered that we overlooked v4i32 when adding the register classes
for VSX; we only added v4f32 and v2f64.  This means that anything that
canonicalizes into v4i32 (which is a LOT of stuff) ends up being
forced into VRRC on its way to VSRC.

The fix is one line.  The rest of the patch is fixing up some test
cases whose code generation has changed as a result.

This seems like it would be a good candidate for backport to 3.7.

llvm-svn: 242442
2015-07-16 21:14:07 +00:00
Simon Pilgrim
466e395dc0 [X86][SSE] Updated vector conversion test names.
I'll be adding further tests shortly so need a more thorough naming convention.

llvm-svn: 242440
2015-07-16 21:00:57 +00:00
Matthias Braun
0a9f87ba79 AArch64: Implement conditional compare sequence matching.
This is a new iteration of the reverted r238793 /
http://reviews.llvm.org/D8232 which wrongly assumed that any and/or
trees can be represented by conditional compare sequences, however there
are some restrictions to that. This version fixes this and adds comments
that explain exactly what types of and/or trees can actually be
implemented as conditional compare sequences.

Related to http://llvm.org/PR20927, rdar://18326194

Differential Revision: http://reviews.llvm.org/D10579

llvm-svn: 242436
2015-07-16 20:02:37 +00:00
Tom Stellard
d96b39afde AMDPGU/SI: Negative offsets aren't allowed in MUBUF's vaddr operand
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11226

llvm-svn: 242434
2015-07-16 19:40:09 +00:00
Pete Cooper
d871542c52 Revert "Add missing load/store flags to thumb2 instructions."
This reverts commit r242300.

This is causing buildbot failures which we are investigating.
I'll reapply once we know whats going on, but for now want to
get the bots green.

llvm-svn: 242428
2015-07-16 18:38:13 +00:00
Eli Bendersky
901c8f80fc Correct lowering of memmove in NVPTX
This fixes https://llvm.org/bugs/show_bug.cgi?id=24056

Also a bit of refactoring along the way.

Differential Revision: http://reviews.llvm.org/D11220

llvm-svn: 242413
2015-07-16 16:27:19 +00:00
James Molloy
fbd5dd7f96 [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
This adds new intrinsics "*absdiff" for absolute difference ops to facilitate efficient code generation for "sum of absolute differences" operation.
The patch also contains the introduction of corresponding SDNodes and basic legalization support.Sanity of the generated code is tested on X86.

This is 1st of the three patches.

Patch by Shahid Asghar-ahmad!

llvm-svn: 242409
2015-07-16 15:22:46 +00:00
Michael Kuperstein
aeebf2eeed [X86] Test for r242395 (Fix emitPrologue() to make less assumptions about pushes)
llvm-svn: 242399
2015-07-16 13:55:39 +00:00
Michael Kuperstein
e1c68fa91b [X86] Reapply r240257 : "Allow more call sequences to use push instructions for argument passing"
This allows more call sequences to use pushes instead of movs when optimizing for size.
In particular, calling conventions that pass some parameters in registers (e.g. thiscall) are now supported.

This should no longer cause miscompiles, now that a bug in emitPrologue was fixed in r242395.

llvm-svn: 242398
2015-07-16 13:54:14 +00:00
Reid Kleckner
798cb07786 Revert "[X86] Allow more call sequences to use push instructions for argument passing"
It miscompiles some code and a reduced test case has been sent to the
author.

This reverts commit r240257.

llvm-svn: 242373
2015-07-16 01:30:00 +00:00
Alex Lorenz
445eea3a06 Fix broken testcase from r242358.
The testcase failed on non X86 targets, because I forgot to pass the
'-march=x86-64' option into llc for one of the X86 specific tests.

llvm-svn: 242370
2015-07-16 00:58:33 +00:00
Akira Hatanaka
deeb3667fb [ARM] Define a subtarget feature that is used to avoid using movt/movw
pairs for 32-bit immediates.

This change is needed to avoid emitting movt/movw pairs when doing LTO
and do so on a per-function basis.

Out-of-tree projects currently using cl::opt option -arm-use-movt=0 or
false to avoid emitting movt/movw pairs should make changes to add
subtarget feature "+no-movt" (see the changes made to clang in r242368).

rdar://problem/21529937

Differential Revision: http://reviews.llvm.org/D11026

llvm-svn: 242369
2015-07-16 00:58:23 +00:00
Alex Lorenz
5b9f1e6415 MIR Serialization: Serialize the jump table index operands.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 242358
2015-07-15 23:38:35 +00:00
Alex Lorenz
7146d12547 MIR Serialization: Serialize the jump table info.
The jump table info is serialized using a YAML mapping that contains its kind
and a YAML sequence of jump table entries. A jump table entry is a YAML mapping
that has an ID and an inline YAML sequence of machine basic block references.

The testcase 'CodeGen/MIR/X86/jump-table-info.mir' doesn't have any instructions
because one of them contains a jump table index operand. The jump table index
operands will be serialized in a follow up patch, and the appropriate
instructions will be added to this testcase.

Reviewers: Duncan P. N. Exon Smith
llvm-svn: 242357
2015-07-15 23:31:07 +00:00
Alex Lorenz
fc7a1b12f5 MIR Serialization: Serialize references from the stack objects to named allocas.
This commit serializes the references to the named LLVM alloca instructions from
the stack objects in the machine frame info. This commit adds a field 'Name' to
the struct 'yaml::MachineStackObject'. This new field is used to store the name
of the alloca instruction when the alloca is present and when it has a name.

llvm-svn: 242339
2015-07-15 22:14:49 +00:00
Bruno Cardoso Lopes
02108a873f Revert "Look through PHIs to find additional register sources"
Likely broke compilation on ARM:

http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/13054

This reverts commit 131ce4a838c081516cbfed039fc986b33e3979d6.

llvm-svn: 242310
2015-07-15 18:10:35 +00:00
Pete Cooper
885a149814 Add missing load/store flags to thumb2 instructions.
These were the cause of a verifier error when building 7zip with
-verify-machineinstrs.  Running 'make check' with the verifier
triggered the same error on the test here so i've updated the test
to run the verifier on one of its runs instead of adding a new one.

While looking at this code, there was a stale comment that these
instructions were only used for disassembly.  This probably used to
be the case, but they are now used in the 'ARM load / store optimization pass' too.

llvm-svn: 242300
2015-07-15 16:36:38 +00:00
Bruno Cardoso Lopes
3cf14dd30a Look through PHIs to find additional register sources
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.

With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:

A:
  psllq %mm1, %mm0
  movd  %mm0, %r9
  jmp C

B:
  por %mm1, %mm0
  movd  %mm0, %r9
  jmp C

C:
  movd  %r9, %mm0
  pshufw  $238, %mm0, %mm0

Becomes:

A:
  psllq %mm1, %mm0
  jmp C

B:
  por %mm1, %mm0
  jmp C

C:
  pshufw  $238, %mm0, %mm0

Differential Revision: http://reviews.llvm.org/D11197

rdar://problem/20404526

llvm-svn: 242295
2015-07-15 15:35:23 +00:00
Alexey Bataev
1b8ab2f134 [SDAG] Optimize unordered comparison in soft-float mode (patch by Anton Nadolskiy)
Current implementation handles unordered comparison poorly in soft-float mode. 
Consider (a ULE b) which is a <= b. It is lowered to (ledf2(a, b) <= 0 || unorddf2(a, b) != 0) (in general). We can do better job by lowering it to (__gtdf2(a, b) <= 0). 
Such replacement is true for other CMP's (ult, ugt, uge). In general, we just call same function as for ordered case but negate comparison against zero.
Differential Revision: http://reviews.llvm.org/D10804

llvm-svn: 242280
2015-07-15 08:39:35 +00:00
Hal Finkel
db4f85d64e [PowerPC] Use the MachineCombiner to reassociate fadd/fmul
This is a direct port of the code from the X86 backend (r239486/r240361), which
uses the MachineCombiner to reassociate (floating-point) adds/muls to increase
ILP, to the PowerPC backend. The rationale is the same.

There is a lot of copy-and-paste here between the X86 code and the PowerPC
code, and we should extract at least some of this into CodeGen somewhere.
However, I don't want to do that until this code is enhanced to handle FMAs as
well. After that, we'll be in a better position to extract the common parts.

llvm-svn: 242279
2015-07-15 08:23:05 +00:00
Simon Pilgrim
bcccac638f [X86][SSE] Added i686/SSE2 vector shift tests.
We were only testing on x86-64, but we should be ensuring decent code gen of i64 shifts on 32-bit targets.

llvm-svn: 242273
2015-07-15 08:04:07 +00:00
Igor Breger
7e12f17211 AVX : Fix ISA disabling in case AVX512VL , some instructions should be disabled only if AVX512BW present.
Tests added.

Differential Revision: http://reviews.llvm.org/D11122

llvm-svn: 242270
2015-07-15 07:08:10 +00:00
JF Bastien
24e01d0990 WebAssembly: fix build breakage.
Summary:
processFunctionBeforeCalleeSavedScan was renamed to determineCalleeSaves and now takes a BitVector parameter as of rL242165, reviewed in http://reviews.llvm.org/D10909

WebAssembly is still marked as experimental and therefore doesn't build by default. It does, however, grep by default! I notice that processFunctionBeforeCalleeSavedScan is still mentioned in a few comments and error messages, which I also fixed.

Reviewers: qcolombet, sunfish

Subscribers: jfb, dsanders, hfinkel, MatzeB, llvm-commits

Differential Revision: http://reviews.llvm.org/D11199

llvm-svn: 242242
2015-07-14 23:06:07 +00:00
Hal Finkel
f7a4ebfecb [PowerPC] Support symbolic targets in patchpoints
Follow-up r235483, with the corresponding support in PPC. We use a regular call
for symbolic targets (because they're much cheaper than indirect calls).

llvm-svn: 242239
2015-07-14 22:53:11 +00:00
Hal Finkel
2a88231c40 [PowerPC] Use the ABI indirect-call protocol for patchpoints
We used to take the address specified as the direct target of the patchpoint
and did no TOC-pointer handling.  This, however, as not all that useful,
because MCJIT tends to create a lot of modules, and they have their own TOC
sections. Thus, to call from the generated code to other generated code, you
really need to switch TOC pointers. Make this work as expected, and under
ELFv1, tread the address as the function descriptor address so that the correct
TOC pointer can be loaded.

llvm-svn: 242217
2015-07-14 22:26:06 +00:00
Alex Lorenz
427be4c561 MIR Serialization: Serialize the machine basic block live in registers.
llvm-svn: 242204
2015-07-14 21:24:41 +00:00
Hal Finkel
7783df1bb4 [PowerPC] Fix the PPCInstrInfo::getInstrLatency implementation
PowerPC uses itineraries to describe processor pipelines (and dispatch-group
restrictions for P7/P8 cores). Unfortunately, the target-independent
implementation of TII.getInstrLatency calls ItinData->getStageLatency, and that
looks for the largest cycle count in the pipeline for any given instruction.
This, however, yields the wrong answer for the PPC itineraries, because we
don't encode the full pipeline. Because the functional units are fully
pipelined, we only model the initial stages (there are no relevant hazards in
the later stages to model), and so the technique employed by getStageLatency
does not really work. Instead, we should take the maximum output operand
latency, and that's what PPCInstrInfo::getInstrLatency now does.

This caused some test-case churn, including two unfortunate side effects.
First, the new arrangement of copies we get from function parameters now
sometimes blocks VSX FMA mutation (a FIXME has been added to the code and the
test cases), and we have one significant test-suite regression:

SingleSource/Benchmarks/BenchmarkGame/spectral-norm
	56.4185% +/- 18.9398%

In this benchmark we have a loop with a vectorized FP divide, and it with the
new scheduling both divides end up in the same dispatch group (which in this
case seems to cause a problem, although why is not exactly clear). The grouping
structure is hard to predict from the bottom of the loop, and there may not be
much we can do to fix this.

Very few other test-suite performance effects were really significant, but
almost all weakly favor this change. However, in light of the issues
highlighted above, I've left the old behavior available via a
command-line flag.

llvm-svn: 242188
2015-07-14 20:02:02 +00:00
Krzysztof Parzyszek
8d49f92601 [Hexagon] Generate instructions for operations on predicate registers
Convert logical operations on general-purpose registers to the correspon-
ding operations on predicate registers.

llvm-svn: 242186
2015-07-14 19:30:21 +00:00
Keno Fischer
400c1358d0 [CodeGen] Force emission of personality directive if explicitly specified
Summary:
Before this change, personality directives were not emitted
if there was no invoke left in the function (of course until
recently this also meant that we couldn't know what
the personality actually was). This patch forces personality directives
to still be emitted, unless it is known to be a noop in the absence of
invokes, or the user explicitly specified `nounwind` (and not
`uwtable`) on the function.

Reviewers: majnemer, rnk

Subscribers: rnk, llvm-commits

Differential Revision: http://reviews.llvm.org/D10884

llvm-svn: 242185
2015-07-14 19:22:51 +00:00
Matt Arsenault
efaaa8cb7c AMDGPU: Avoid using 64-bit shift for i64 (shl x, 32)
This can be done only with moves which theoretically
will optimize better later.

Although this transform increases the instruction count,
it should be code size / cycle count neutral in the worst
VALU case. It also seems to slightly improve a couple
of testcases due to other DAG combines this exposes.

This is probably slightly worse for the SALU case, so
it might be better to handle this during moveToVALU,
although then you lose some simplifications like
the load width reducing in the simple testcase.

llvm-svn: 242177
2015-07-14 18:20:33 +00:00
Matt Arsenault
988dd980cf AMDGPU/SI: Fix read2 merging into a super register.
If the read2 produced was supposed to be writing into a
super register, it would use the wrong subregister indices.
Fix this by inserting copies, so we only ever write to a vreg_64.
Run the register coalescer again to clean this up, although this
isn't ideal and often does result in an extra move.

Also remove the assert that offset1 > offset0.

There isn't a real reason to not allow this other than a minor
convenience in the compiler, and it doesn't seem worth the effort
of avoiding it.

llvm-svn: 242174
2015-07-14 17:57:36 +00:00
Nemanja Ivanovic
a397908a1f Add missing builtins to the PPC back end for ABI compliance (vol. 4)
This patch corresponds to review:
http://reviews.llvm.org/D11183

Back end portion of the fourth round of additions to altivec.h.

llvm-svn: 242167
2015-07-14 17:25:20 +00:00
Tim Northover
b5aedb0d34 ARM: add at least one real test for r242123.
The ones committed were orthogonal to the change and would have passed before
that revision. What it *did* do was prevent an assertion failure when
generating object files.

llvm-svn: 242166
2015-07-14 17:23:55 +00:00
Matthias Braun
14b971e075 PrologEpilogInserter: Rewrite API to determine callee save regsiters.
This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan():

- Rename the function to determineCalleeSaves()
- Pass a bitset of callee saved registers by reference, thus avoiding
  the function-global PhysRegUsed bitset in MachineRegisterInfo.
- Without PhysRegUsed the implementation is fine tuned to not save
  physcial registers which are only read but never modified.

Related to rdar://21539507

Differential Revision: http://reviews.llvm.org/D10909

llvm-svn: 242165
2015-07-14 17:17:13 +00:00
Krzysztof Parzyszek
38a34d7651 [Hexagon] Generate "extract" instructions more aggressively
Generate extract instructions (via intrinsics) before the DAG combiner
folds shifts into unrecognizable forms.

llvm-svn: 242163
2015-07-14 17:07:24 +00:00
Tom Stellard
a3220fa789 AMDGPU/SI: Add support for shrinking v_cndmask_b32_e32 instructions
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11061

llvm-svn: 242146
2015-07-14 14:15:03 +00:00
Yaron Keren
a300c4d7e5 Generate correct asm info for mingw and cygwin ARM targets.
http://reviews.llvm.org/D11075

Patch by Martell Malone
Reviewed by Reid Kleckner

llvm-svn: 242123
2015-07-14 05:51:05 +00:00
NAKAMURA Takumi
aa66e39b4d Give an explicit triple to llvm/test/CodeGen/X86/pr13577.ll.
llvm-svn: 242111
2015-07-14 03:07:06 +00:00
Matthias Braun
e1bcc14506 Revert "LegalizeDAG: Fix and improve FCOPYSIGN/FABS legalization"
Accidental commit, needs review first.

This reverts commit r242107.

llvm-svn: 242108
2015-07-14 02:09:57 +00:00
Matthias Braun
e8ffab6ec5 LegalizeDAG: Fix and improve FCOPYSIGN/FABS legalization
- Factor out code to query and modify the sign bit of a floatingpoint
  value as an integer. This also works if none of the targets integer
  types is big enough to hold all bits of the floatingpoint value.

- Legalize FABS(x) as FCOPYSIGN(x, 0.0) if FCOPYSIGN is available,
  otherwise perform bit manipulation on the sign bit. The previous code
  used "x >u 0 ? x : -x" which is incorrect for x being -0.0! It also
  takes 34 instructions on ARM Cortex-M4. With this patch we only
  require 5:
    vldr d0, LCPI0_0
    vmov r2, r3, d0
    lsrs r2, r3, #31
    bfi r1, r2, #31, #1
    bx lr
  (This could be further improved if the compiler would recognize that
   r2, r3 is zero).

- Only lower FCOPYSIGN(x, y) = sign(x) ? -FABS(x) : FABS(x) if FABS is
  available otherwise perform bit manipulation on the sign bit.

- Perform the sign(x) test by masking out the sign bit and comparing
  with 0 rather than shifting the sign bit to the highest position and
  testing for "<s 0". For x86 copysignl (on 80bit values) this gets us:
    testl $32768, %eax
  rather than:
    shlq $48, %rax
    sets %al
    testb %al, %al

llvm-svn: 242107
2015-07-14 02:08:26 +00:00
Matthias Braun
cc6f791568 X86: Check output of x86 copysignl testcase.
This makes the changes in an upcoming patch visible.

llvm-svn: 242106
2015-07-14 02:08:23 +00:00
Alex Lorenz
262fad804e MIR Serialization: Serialize the variable sized stack objects.
llvm-svn: 242095
2015-07-14 00:26:26 +00:00
Alex Lorenz
cddac3ca8f MIR Serialization: Serialize the sub register indices.
This commit serializes the sub register indices from the register machine
operands.

Reviewers: Duncan P. N. Exon Smith
llvm-svn: 242084
2015-07-13 23:24:34 +00:00
Bill Schmidt
381b5e4a38 [PPC64LE] More improvements to VSX swap optimization
This patch allows VSX swap optimization to succeed more frequently.
Specifically, it is concerned with common code sequences that occur
when copying a scalar floating-point value to a vector register.  This
patch currently handles cases where the floating-point value is
already in a register, but does not yet handle loads (such as via an
LXSDX scalar floating-point VSX load).  That will be dealt with later.

A typical case is when a scalar value comes in as a floating-point
parameter.  The value is copied into a virtual VSFRC register, and
then a sequence of SUBREG_TO_REG and/or COPY operations will convert
it to a full vector register of the class required by the context.  If
this vector register is then used as part of a lane-permuted
computation, the original scalar value will be in the wrong lane.  We
can fix this by adding a swap operation following any widening
SUBREG_TO_REG operation.  Additional COPY operations may be needed
around the swap operation in order to keep register assignment happy,
but these are pro forma operations that will be removed by coalescing.

If a scalar value is otherwise directly referenced in a computation
(such as by one of the many XS* vector-scalar operations), we
currently disable swap optimization.  These operations are
lane-sensitive by definition.  A MentionsPartialVR flag is added for
use in each swap table entry that mentions a scalar floating-point
register without having special handling defined.

A common idiom for PPC64LE is to convert a double-precision scalar to
a vector by performing a splat operation.  This ensures that the value
can be referenced as V[0], as it would be for big endian, whereas just
converting the scalar to a vector with a SUBREG_TO_REG operation
leaves this value only in V[1].  A doubleword splat operation is one
form of an XXPERMDI instruction, which takes one doubleword from a
first operand and another doubleword from a second operand, with a
two-bit selector operand indicating which doublewords are chosen.  In
the general case, an XXPERMDI can be permitted in a lane-swapped
region provided that it is properly transformed to select the
corresponding swapped values.  This transformation is to reverse the
order of the two input operands, and to reverse and complement the
bits of the selector operand (derivation left as an exercise to the
reader ;).

A new test case that exercises the scalar-to-vector and generalized
XXPERMDI transformations is added as CodeGen/PowerPC/swaps-le-5.ll.
The patch also requires a change to CodeGen/PowerPC/swaps-le-3.ll to
use CHECK-DAG instead of CHECK for two independent instructions that
now appear in reverse order.

There are two small unrelated changes that are added with this patch.
First, the XXSLDWI instruction was incorrectly omitted from the list
of lane-sensitive instructions; this is now fixed.  Second, I observed
that the same webs were being rejected over and over again for
different reasons.  Since it's sufficient to reject a web only once, I
added a check for this to speed up the compilation time slightly.

llvm-svn: 242081
2015-07-13 22:58:19 +00:00
Reid Kleckner
e4008eaff3 [WinEH] Emit the LSDA even if no lpads remain but outlining occurred
The outlined funclets call intrinsics which reference labels from the
LSDA. This situation can easily arise in small functions with a single
cleanup at -O0, where Clang marks a definition as nounwind, and then
WinEHPrepare "discovers" that the landingpad is dead by accident and
deletes it.

We now need to ask the LLVM IR Function for it's personality directly,
rather than going through MachineModuleInfo.

Fixes PR23892.

llvm-svn: 242063
2015-07-13 20:41:46 +00:00
Alex Lorenz
f891b70313 MIR Serialization: Serialize the fixed stack objects.
This commit serializes the fixed stack objects, including fixed spill slots.
The fixed stack objects are serialized using a YAML sequence of YAML inline
mappings. Each mapping has the object's ID, type, size, offset, and alignment.
The objects that aren't spill slots also serialize the isImmutable and isAliased
flags.

The fixed stack objects are a part of the machine function's YAML mapping.

Reviewers: Duncan P. N. Exon Smith
llvm-svn: 242045
2015-07-13 18:07:26 +00:00
Reid Kleckner
20e3005c59 [WinEH] Strip the \01 character from the __CxxFrameHandler3 thunk name
Add another C++ 32-bit EH table test.

llvm-svn: 242044
2015-07-13 17:55:14 +00:00
James Y Knight
b1184c5419 Fix handling of the 'n' asm constraint with invalid operands.
It had accidently accepted a symbol+offset value (and emitted
incorrect code for it, keeping only the offset part) instead of
properly reporting the constraint as invalid.

Differential Revision: http://reviews.llvm.org/D11039

llvm-svn: 242040
2015-07-13 16:36:22 +00:00
Tom Stellard
24371a62d2 AMDGPU/SI: Select mad patterns to v_mac_f32
The two-address instruction pass will convert these back to v_mad_f32
if necessary.

Differential Revision: http://reviews.llvm.org/D11060

llvm-svn: 242038
2015-07-13 15:47:57 +00:00
Logan Chien
33021e9c0e ARM: Fix cttz expansion on vector types.
The 64/128-bit vector types are legal if NEON instructions are
available.  However, there was no matching patterns for @llvm.cttz.*()
intrinsics and result in fatal error.

This commit fixes the problem by lowering cttz to:
a. ctpop((x & -x) - 1)
b. width - ctlz(x & -x) - 1

llvm-svn: 242037
2015-07-13 15:37:30 +00:00
Rafael Espindola
122834dffd Print the visibility of available_externally functions.
We were already printing it for declarations, but not available_externally.

llvm-svn: 242027
2015-07-13 13:55:18 +00:00
Elena Demikhovsky
618bae6f38 AVX-512: Added all AVX-512 forms of Vector Convert for Float/Double/Int/Long types.
In this patch I have only encoding. Intrinsics and DAG lowering will be in the next patch.
I temporary removed the old intrinsics test (just to split this patch).
Half types are not covered here.

Differential Revision: http://reviews.llvm.org/D11134

llvm-svn: 242023
2015-07-13 13:26:20 +00:00
Renato Golin
943837bfb6 [ARM] Add support for nest attribute using r12
Register r12 ('ip') is used by GCC for this purpose
and hence is used here. As discussed on the GCC mailing
list, the register choice is an ABI issue and so
choosing the same register as GCC means
__builtin_call_with_static_chain is compatible.

A similar patch has just gone in the AArch64 backend,
so this is just the ARM counterpart, following the same
discussion.

Patch by Stephen Cross.

llvm-svn: 241996
2015-07-12 18:16:40 +00:00
Simon Pilgrim
e3726fba14 [X86][SSE] Tidied up vector extend/truncation tests. NFCI.
llvm-svn: 241995
2015-07-12 17:40:49 +00:00
Simon Pilgrim
88d91ba7dc [X86][SSE] Vectorized v4i32 non-uniform shifts.
While the v4i32 shl operation is already vectorized using a cvttps2dq/pmulld pattern, the lshr/ashr opeations are still scalarized.

This patch adds vectorization support for non-uniform v4i32 shift operations - it splats constant shift amounts to allow them to use the immediate sse shift instructions, or extracts/zero-extends non-constant shift amounts. The individual results are then blended together.

Differential Revision: http://reviews.llvm.org/D11063

llvm-svn: 241989
2015-07-12 11:15:19 +00:00
Hal Finkel
568c3a41af [PowerPC] Make use of the TargetRecip system
r238842 added the TargetRecip system for controlling use of reciprocal
estimates for sqrt and division using a set of parameters that can be set by
the frontend. Clang now supports a sophisticated -mrecip option, and this will
allow that option to effectively control the relevant code-generation
functionality of the PPC backend.

llvm-svn: 241985
2015-07-12 02:33:57 +00:00
Hal Finkel
f91045042a [PowerPC] Support the nest parameter attribute
This adds support for the 'nest' attribute, which allows the static chain
register to be set for functions calls under non-Darwin PPC/PPC64 targets. r11
is the chain register (which the PPC64 ELF ABI calls the "environment
pointer"). For indirect calls under PPC64 ELFv1, this would normally be loaded
from the function descriptor, but providing an explicit 'nest' parameter will
override that process and use the value provided.

This allows __builtin_call_with_static_chain to work as expected on PowerPC.

llvm-svn: 241984
2015-07-12 00:37:44 +00:00
Alex Lorenz
3307a878a0 MIR Serialization: Serialize the virtual register operands.
Reviewers: Duncan P. N. Exon Smith

Differential Revision: http://reviews.llvm.org/D11005

llvm-svn: 241959
2015-07-10 22:51:20 +00:00
Reid Kleckner
1acdc5b15e [SEH] Push reloads of the SEH code past phi nodes
This in turn would sometimes introduce new cleanupblocks that didn't
previously exist. The uses were being introduced by SSA value demotion.
We actually want to *promote* uses of EH pointers and selectors, so I
added some spcecial casing to avoid demoting such instructions.  This is
getting overly complicated, but hopefully we'll come along and delete it
in the new representation.

llvm-svn: 241950
2015-07-10 22:21:54 +00:00
Matt Arsenault
e1819eb4e3 DAGCombiner: Assume invariant load cannot alias a store
The motivation is to allow GatherAllAliases / FindBetterChain
to not give up on dependent loads of a pointer from constant memory.

This is important for AMDGPU, because most loads are pointers
derived from a load of a kernel argument from constant memory.

llvm-svn: 241948
2015-07-10 22:17:40 +00:00
Quentin Colombet
74a81286b5 [ShrinkWrap][PEI] Do not insert epilogue for unreachable blocks.
Although this is not incorrect to insert such code, it is useless
and it hurts the binary size.

llvm-svn: 241946
2015-07-10 22:09:55 +00:00
Evgeniy Stepanov
b389e3c7f6 Fix AArch64 prologue for empty frame with dynamic allocas.
Fixes PR23804: assertion failure in emitPrologue in the case of a
function with an empty frame and a dynamic alloca that needs stack
realignment. This is a typical case for AddressSanitizer.

llvm-svn: 241943
2015-07-10 21:24:07 +00:00
Matthias Braun
bab8638132 ARMLoadStoreOpt: Merge subs/adds into LDRD/STRD; Factor out common code
This commit factors out common code from MergeBaseUpdateLoadStore() and
MergeBaseUpdateLSMultiple() and introduces a new function
MergeBaseUpdateLSDouble() which merges adds/subs preceding/following a
strd/ldrd instruction into an strd/ldrd instruction with writeback where
possible.

Differential Revision: http://reviews.llvm.org/D10676

llvm-svn: 241928
2015-07-10 18:37:33 +00:00
Fiona Glaser
7d7d370b56 ComputeKnownBits: be a bit smarter about ADDs
If our two inputs have known top-zero bit counts M and N, we trivially
know that the output cannot have any bits set in the top (min(M, N)-1)
bits, since nothing could carry past that point.

llvm-svn: 241927
2015-07-10 18:29:02 +00:00
Matthias Braun
5c16e27f27 ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2
Differential Revision: http://reviews.llvm.org/D10623

llvm-svn: 241926
2015-07-10 18:28:49 +00:00
Alex Lorenz
e7f6afffde MIR Serialization: Initial serialization of stack objects.
This commit implements the initial serialization of stack objects from the
MachineFrameInfo class. It can only serialize the ordinary stack objects
(including ordinary spill slots), but it doesn't serialize variable sized or
fixed stack objects yet.

The stack objects are serialized using a YAML sequence of YAML inline mappings.
Each mapping has the object's ID, type, size, offset and alignment. The stack
objects are a part of machine function's YAML mapping.

Reviewers: Duncan P. N. Exon Smith
llvm-svn: 241922
2015-07-10 18:13:57 +00:00
Matthias Braun
699544cc62 ARMLoadStoreOptimizer: Rewrite LDM/STM matching logic.
This improves the logic in several ways and is a preparation for
followup patches:
- First perform an analysis and create a list of merge candidates, then
  transform. This simplifies the code in that you have don't have to
  care to much anymore that you may be holding iterators to
  MachineInstrs that get removed.
- Analyze/Transform basic blocks in reverse order. This allows to use
  LivePhysRegs to find free registers instead of the RegisterScavenger.
  The RegisterScavenger will become less precise in the future as it
  relies on the deprecated kill-flags.
- Return the newly created node in MergeOps so there's no need to look
  around in the schedule to find it.
- Rename some MBBI iterators to InsertBefore to make their role clear.
- General code cleanup.

Differential Revision: http://reviews.llvm.org/D10140

llvm-svn: 241920
2015-07-10 18:08:49 +00:00
Eli Bendersky
711d658342 Actually support volatile memcpys in NVPTX lowering
Differential Revision: http://reviews.llvm.org/D11091

llvm-svn: 241914
2015-07-10 15:40:33 +00:00
Jingyue Wu
7084152bd9 [NVPTX] declare no vector registers
Summary:
Without this patch, LoopVectorizer in certain cases (see loop-vectorize.ll)
produces code with complex control flow which hurts later optimizations. Since
NVPTX doesn't have vector registers in LLVM's sense
(NVPTXTTI::getRegisterBitWidth(true) == 32), we for now declare no vector
registers to effectively disable loop vectorization.

Reviewers: jholewinski

Subscribers: jingyue, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11089

llvm-svn: 241884
2015-07-10 04:31:56 +00:00
Reid Kleckner
b2ba390775 [WinEH] Make sure LSDA tables are 4 byte aligned
Apparently this is important, otherwise _except_handler3 assumes that
the registration node is corrupted and ignores it.

Also fix a bug in WinEHPrepare where we would insert code after a
terminator instruction.

llvm-svn: 241877
2015-07-10 00:08:49 +00:00
Sanjay Patel
76dbdc8f6e [x86] enable machine combiner reassociations for scalar double-precision multiplies
llvm-svn: 241873
2015-07-09 22:58:39 +00:00
Sanjay Patel
2245b46f4f [x86] enable machine combiner reassociations for scalar double-precision adds
llvm-svn: 241871
2015-07-09 22:48:54 +00:00
Alex Lorenz
5c443ee383 MIR Serialization: Serialize the virtual register definitions.
The virtual registers are serialized using a YAML sequence of YAML inline
mappings. Each mapping has the id of the virtual register and the register
class.

Reviewers: Duncan P. N. Exon Smith

Differential Revision: http://reviews.llvm.org/D10981

llvm-svn: 241868
2015-07-09 22:23:13 +00:00
Reid Kleckner
733508d4e8 [WinEH] Give up on using CSRs across 32-bit invokes for now
The runtime does not restore CSRs when transferring control back to the
function handling the exception. According to the experts on IRC, LLVM's
register allocator has no way to model register clobbers that only
happen on one edge of the CFG. For now, don't worry about trying to use
the meager three CSRs available on 32-bit X86 and just say that such
invokes preserve nothing.

llvm-svn: 241865
2015-07-09 22:09:41 +00:00
Alex Lorenz
e56a7b90eb MIR Parser: Report an error when parsing machine function with an empty body.
This commit adds a new error which is reported when the MIR Parser encounters
a machine function without any machine basic blocks. The machine verifier
expects that the machine functions have at least one MBB, and this error will
prevent machine functions without MBBs from reaching the machine verifier and
crashing with an assertion.

llvm-svn: 241862
2015-07-09 21:21:33 +00:00
Sanjoy Das
fefc8af84a [ImplicitNullChecks] Be smarter in picking the memory op.
Summary:
Before this change ImplicitNullChecks would only pick loads of the form:

```
   test Reg, Reg
   jz elsewhere
 fallthrough:
   movl 32(Reg), Reg2
```

but not (say)

```
   test Reg, Reg
   jz elsewhere
 fallthrough:
   inc Reg3
   movl 32(Reg), Reg2
```

This change teaches ImplicitNullChecks to look through "unrelated"
instructions like `inc Reg3` when searching for a load instruction
to convert to a trapping load.

Reviewers: atrick, JosephTremoulet, reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11044

llvm-svn: 241850
2015-07-09 20:13:25 +00:00
Alex Lorenz
60440f3a04 MIR Serialization: Serialize the simple MachineFrameInfo attributes.
This commit serializes the 13 scalar boolean and integer attributes from the
MachineFrameInfo class: IsFrameAddressTaken, IsReturnAddressTaken, HasStackMap,
HasPatchPoint, StackSize, OffsetAdjustment, MaxAlignment, AdjustsStack,
HasCalls, MaxCallFrameSize, HasOpaqueSPAdjustment, HasVAStart, and
HasMustTailInVarArgFunc. These attributes are serialized as part
of the frameInfo YAML mapping, which itself is a part of the machine function's
YAML mapping.

llvm-svn: 241844
2015-07-09 19:55:27 +00:00
Pat Gavlin
a6d3ba4544 Allow {e,r}bp as the target of {read,write}_register.
This patch allows the read_register and write_register intrinsics to
read/write the RBP/EBP registers on X86 iff the targeted register is
the frame pointer for the containing function.

Differential Revision: http://reviews.llvm.org/D10977

llvm-svn: 241827
2015-07-09 17:40:29 +00:00
Sanjay Patel
6ce96b0ff0 fix an invisible bug when combining repeated FP divisors
This patch fixes bugs that were exposed by the addition of fast-math-flags in the DAG:
r237046 ( http://reviews.llvm.org/rL237046 ):

1. When replacing a division node, it's not enough to RAUW.
   We should call CombineTo() to delete dead nodes and combine again.
2. Because we are changing the DAG, we can't return an empty SDValue
   after the transform. As the code comments say:

    Visitation implementation - Implement dag node combining for different node types.
    The semantics are as follows: Return Value:
      SDValue.getNode() == 0 - No change was made
      SDValue.getNode() == N - N was replaced, is dead and has been handled.
      otherwise - N should be replaced by the returned Operand.

The new test case shows no difference with or without this patch, but it will crash if
we re-apply r237046 or enable FMF via the current -enable-fmf-dag cl::opt.

Differential Revision: http://reviews.llvm.org/D9893

llvm-svn: 241826
2015-07-09 17:28:37 +00:00
Pawel Bylica
b5caea461d Reapply fixed r241790: Fix shift legalization and lowering for big constants.
Summary: If shift amount is a constant value > 64 bit it is handled incorrectly during type legalization and X86 lowering. This patch the type of shift amount argument in function DAGTypeLegalizer::ExpandShiftByConstant from unsigned to APInt.

Reviewers: nadav, majnemer, sanjoy, RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: http://reviews.llvm.org/D10767

llvm-svn: 241806
2015-07-09 14:58:04 +00:00
Krzysztof Parzyszek
cac9b5847a [Hexagon] Add support for atomic RMW operations
llvm-svn: 241804
2015-07-09 14:51:21 +00:00
Arnaud A. de Grandmaison
90c89b61da [AArch64] Select SBFIZ or UBFIZ instead of left + right shifts
And rename LSB to Immr / MSB to Imms to match the ARM ARM terminology.

llvm-svn: 241803
2015-07-09 14:33:38 +00:00
Renato Golin
fc38fe2618 Test for 241794 (nest attribute in AArch64)
Forgot to git add the test.

Patch by Stephen Cross.

llvm-svn: 241797
2015-07-09 13:29:35 +00:00
Pawel Bylica
7aa3d79c2c Revert r241790: Fix shift legalization and lowering for big constants.
llvm-svn: 241792
2015-07-09 09:50:54 +00:00
Pawel Bylica
f94083a5ee Fix shift legalization and lowering for big constants.
Summary: If shift amount is a constant value > 64 bit it is handled incorrectly during type legalization and X86 lowering. This patch the type of shift amount argument in function DAGTypeLegalizer::ExpandShiftByConstant from unsigned to APInt.

Reviewers: nadav, majnemer, sanjoy, RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: http://reviews.llvm.org/D10767

llvm-svn: 241790
2015-07-09 08:01:36 +00:00
Elena Demikhovsky
88c04dfc81 Extended syntax of vector version of getelementptr instruction.
The justification of this change is here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-March/082989.html

According to the current GEP syntax, vector GEP requires that each index must be a vector with the same number of elements.

%A = getelementptr i8, <4 x i8*> %ptrs, <4 x i64> %offsets

In this implementation I let each index be or vector or scalar. All vector indices must have the same number of elements. The scalar value will mean the splat vector value.

(1) %A = getelementptr i8, i8* %ptr, <4 x i64> %offsets
or
(2) %A = getelementptr i8, <4 x i8*> %ptrs, i64 %offset

In all cases the %A type is <4 x i8*>

In the case (2) we add the same offset to all pointers.

The case (1) covers C[B[i]] case, when we have the same base C and different offsets B[i].

The documentation is updated.

http://reviews.llvm.org/D10496

llvm-svn: 241788
2015-07-09 07:42:48 +00:00
Alex Lorenz
284c4b13e7 MIR Serialization: Serialize the 'undef' register machine operand flag.
llvm-svn: 241762
2015-07-08 23:58:31 +00:00
Sanjay Patel
806e80e796 [x86] enable machine combiner reassociations for scalar single-precision multiplies
llvm-svn: 241752
2015-07-08 22:35:20 +00:00
Eli Bendersky
374b7e43da Add tests for the NVPTXLowerAggrCopies pass.
Note: not testing memmove lowering for now, as it's broken
[see https://llvm.org/bugs/show_bug.cgi?id=24056]

llvm-svn: 241736
2015-07-08 21:29:28 +00:00
Alex Lorenz
1fa43c2c9d MIR Serialization: Serialize the 'killed' register machine operand flag.
llvm-svn: 241734
2015-07-08 21:23:34 +00:00
Simon Pilgrim
0c528a0762 [X86][SSE] Vector shift test cleanup. NFC.
llvm-svn: 241730
2015-07-08 21:11:17 +00:00
Reid Kleckner
f93836486a [Win64] Only treat some functions as having the Win64 convention
All the usual X86 target-specific conventions are collapsed to the
normal Win64 convention, but the custom conventions like GHC and webkit
should not be.

Previously we would assume that the caller allocated 32 bytes of shadow
space for us, which is not how webkit_jscc or other custom conventions
are supposed to work.

Based on a patch by peavo@outlook.com.

Fixes PR24051.

llvm-svn: 241725
2015-07-08 21:03:47 +00:00
Alex Lorenz
8b999fafb7 MIR Parser: Use source locations for MBB naming errors.
This commit changes the type of the field 'Name' in the struct
'yaml::MachineBasicBlock' from 'std::string' to 'yaml::StringValue'. This change
allows the MIR parser to report errors related to the MBB name with the proper
source locations.

llvm-svn: 241718
2015-07-08 20:22:20 +00:00
Krzysztof Parzyszek
ac54e4bbae [Hexagon] Implement commoning of GetElementPtr instructions
llvm-svn: 241714
2015-07-08 19:22:28 +00:00
Reid Kleckner
78c492e610 [SEH] Add missing test case from previous realignment commit
llvm-svn: 241700
2015-07-08 18:09:39 +00:00
Reid Kleckner
cde3a2cf79 [SEH] Ensure that empty __except blocks have their own BB
The 32-bit lowering assumed that WinEHPrepare had this invariant.
WinEHPrepare did it for C++, but not SEH. The result was that we would
insert calls to llvm.x86.seh.restoreframe in normal basic blocks, which
corrupted the frame pointer.

llvm-svn: 241699
2015-07-08 18:08:52 +00:00
James Y Knight
4f71b891ec [SPARC] Cleanup handling of the Y/ASR registers.
- Implement copying ASR to/from GPR regs.
- Mark ASRs as non-allocatable, so it won't try to arbitrarily use
  them inappropriately.
- Instead of inserting explicit WRASR/RDASR nodes in the MUL/DIV
  routines, just do normal register copies.
- Also...mark div as using Y, not just writing it.

Added a test case with some code which previously died with an
assertion failure (with -O0), or produced wrong code (otherwise).

(Third time's the charm?)

Differential Revision: http://reviews.llvm.org/D10401

llvm-svn: 241686
2015-07-08 16:25:12 +00:00
Krzysztof Parzyszek
1ff6cefced [Hexagon] Generate "insert" instructions more aggressively
llvm-svn: 241683
2015-07-08 14:47:34 +00:00
Krzysztof Parzyszek
0942a0a955 Revert 241681: causes Windows builds to fail
llvm-svn: 241682
2015-07-08 14:34:13 +00:00
Krzysztof Parzyszek
bba8c32fe9 [Hexagon] Generate "insert" instructions more aggressively
llvm-svn: 241681
2015-07-08 14:22:27 +00:00
Simon Pilgrim
5c44e5f75e [X86][SSE] Added (V)ROUNDSD + (V)ROUNDSS stack folding support
llvm-svn: 241671
2015-07-08 08:07:57 +00:00
Reid Kleckner
0138358834 [WinEH] Make llvm.x86.seh.restoreframe work for stack realignment prologues
The incoming EBP value points to the end of a local stack allocation, so
we can use that to restore ESI, the base pointer. Once we do that, we
can use local stack allocations. If we know we need stack realignment,
spill the original frame pointer in the prologue and reload it after
restoring ESI.

llvm-svn: 241648
2015-07-07 23:45:58 +00:00
Reid Kleckner
6207d850e4 [WinEH] Add localaddress intrinsic instead of using frameaddress
Clang uses this for SEH finally. The new intrinsic will produce the
right value when stack realignment is required.

llvm-svn: 241643
2015-07-07 23:23:03 +00:00
Arnold Schwaighofer
6d6b413fa3 Add more nvcasts
Tim Northover has told me that they can occur when the compiler cleverly
constructs constants - as demonstrated in the test case.

rdar://21703486

llvm-svn: 241641
2015-07-07 23:13:18 +00:00
Reid Kleckner
45072b933e Rename llvm.frameescape and llvm.framerecover to localescape and localrecover
Summary:
Initially, these intrinsics seemed like part of a family of "frame"
related intrinsics, but now I think that's more confusing than helpful.
Initially, the LangRef specified that this would create a new kind of
allocation that would be allocated at a fixed offset from the frame
pointer (EBP/RBP). We ended up dropping that design, and leaving the
stack frame layout alone.

These intrinsics are really about sharing local stack allocations, not
frame pointers. I intend to go further and add an `llvm.localaddress()`
intrinsic that returns whatever register (EBP, ESI, ESP, RBX) is being
used to address locals, which should not be confused with the frame
pointer.

Naming suggestions at this point are welcome, I'm happy to re-run sed.

Reviewers: majnemer, nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11011

llvm-svn: 241633
2015-07-07 22:25:32 +00:00
Alex Lorenz
19382e1c52 MIR Serialization: Serialize the 'dead' register machine operand flag.
llvm-svn: 241624
2015-07-07 20:34:53 +00:00
Arnold Schwaighofer
e78f06bc6f Add CHECK lines to test case
llvm-svn: 241619
2015-07-07 19:26:31 +00:00
Arnold Schwaighofer
a112326960 Add a pattern for a nvcast from v2f64 -> v4f32
Since the NvCast is generated by the selection process the concerns about
endianess and bit reversal don't apply.

rdar://21703486

llvm-svn: 241611
2015-07-07 18:31:55 +00:00
Akira Hatanaka
f2cd5836e0 Fix test case to unbreak build.
This commit changes the target arch to fix the test case commited in r241566
that was failing on ninja-x64-msvc-RA-centos6. Also add checks to make sure
the callee's address is loaded to blx's operand. 

llvm-svn: 241588
2015-07-07 14:45:12 +00:00
Akira Hatanaka
548fcd7ec7 [ARM] Define a subtarget feature and use it to decide whether long calls should
be emitted.

This is needed to enable ARM long calls for LTO and enable and disable it on a
per-function basis.

Out-of-tree projects currently using EnableARMLongCalls to emit long calls
should start passing "+long-calls" to the feature string (see the changes made
to clang in r241565).

rdar://problem/21529937

Differential Revision: http://reviews.llvm.org/D9364

llvm-svn: 241566
2015-07-07 06:54:42 +00:00
Alex Lorenz
813af3fadc MIR Parser: Verify the implicit machine register operands.
This commit verifies that the parsed machine instructions contain the implicit
register operands as specified by the MCInstrDesc. Variadic and call
instructions aren't verified.

Reviewers: Duncan P. N. Exon Smith

Differential Revision: http://reviews.llvm.org/D10781

llvm-svn: 241537
2015-07-07 02:08:46 +00:00
Dan Gohman
fc49082461 [WebAssembly] Create a CodeGen unittest directory.
llvm-svn: 241520
2015-07-06 23:14:57 +00:00
Alex Lorenz
583f921888 MIR Serialization: Serialize the implicit register flag.
This commit serializes the implicit flag for the register machine operands. It
introduces two new keywords into the machine instruction syntax: 'implicit' and
'implicit-def'. The 'implicit' keyword is used for the implicit register
operands, and the 'implicit-def' keyword is used for the register operands that
have both the implicit and the define flags set.

Reviewers: Duncan P. N. Exon Smith

Differential Revision: http://reviews.llvm.org/D10709

llvm-svn: 241519
2015-07-06 23:07:26 +00:00
Simon Pilgrim
3c101973b3 [X86][AVX] Add support for shuffle decoding of vperm2f128/vperm2i128 with zero'd lanes
The vperm2f128/vperm2i128 shuffle mask decoding was not attempting to deal with shuffles that give zero lanes. This patch fixes this so that the assembly printer can provide shuffle comments.

As this decoder is also used in X86ISelLowering for shuffle combining, I've added an early-out to match existing behaviour. The hope is that we can add zero support in the future, this would allow other ops' decodes (e.g. insertps) to be combined as well.

Differential Revision: http://reviews.llvm.org/D10593

llvm-svn: 241516
2015-07-06 22:46:46 +00:00
Sanjay Patel
6781c4029c [x86] extend machine combiner reassociation optimization to SSE scalar adds
Extend the reassociation optimization of http://reviews.llvm.org/rL240361 (D10460)
to SSE scalar FP SP adds in addition to AVX scalar FP SP adds.

With the 'switch' in place, we can trivially add other opcodes and test cases in
future patches.

Differential Revision: http://reviews.llvm.org/D10975

llvm-svn: 241515
2015-07-06 22:35:29 +00:00
Simon Pilgrim
a825efbf95 [X86][SSE] Vectorized i64 uniform constant SRA shifts
This patch adds vectorization support for uniform constant i64 arithmetic shift right operators.

Differential Revision: http://reviews.llvm.org/D9645

llvm-svn: 241514
2015-07-06 22:35:19 +00:00
Reid Kleckner
c447c449e6 [WinEH] Add some test cases I forgot to add to previous commits
llvm-svn: 241510
2015-07-06 21:13:53 +00:00
Reid Kleckner
80b41774f1 [WinEH] Insert the EH code load before the block terminator
The previous code put the load after the terminator, leading to invalid
IR and downstream crashes. This caused http://crbug.com/506446.

llvm-svn: 241509
2015-07-06 21:13:43 +00:00
Simon Pilgrim
385bee8c59 [X86][SSE4A] Shuffle lowering using SSE4A EXTRQ/INSERTQ instructions
This patch adds support for v8i16 and v16i8 shuffle lowering using the immediate versions of the SSE4A EXTRQ and INSERTQ instructions. Although rather limited (they can only act on the lower 64-bits of the source vectors, leave the upper 64-bits of the result vector undefined and don't have VEX encoded variants), the instructions are still useful for the zero extension of any lane (EXTRQ) or inserting a lane into another vector (INSERTQ). Testing demonstrated that it wasn't typically worth it to use these instructions for v2i64 or v4i32 vector shuffles although they are capable of it.

As well as adding specific pattern matching for the shuffles, the patch uses EXTRQ for zero extension cases where SSE41 isn't available and its more efficient than the SSE2 'unpack' default approach. It also adds shuffle decode support for the EXTRQ / INSERTQ cases when the instructions are handling full byte-sized extractions / insertions.

From this foundation, future patches will be able to make use of the instructions for situations that use their ability to extract/insert at the bit level.

Differential Revision: http://reviews.llvm.org/D10146

llvm-svn: 241508
2015-07-06 20:46:41 +00:00
Alex Lorenz
6945ffc637 llc: Add a 'run-pass' option.
This commit adds a 'run-pass' option to llc, which instructs the compiler to run
one specific code generation pass only.

Llc already has the 'start-after' and the 'stop-after' options, and this new
option complements the other two by making it easier to write tests that want
to invoke a single pass only.

Reviewers: Duncan P. N. Exon Smith

Differential Revision: http://reviews.llvm.org/D10776

llvm-svn: 241476
2015-07-06 17:44:26 +00:00
Matt Arsenault
36294af135 AMDGPU/SI: Add debugging subtarget feature for DS offsets
We don't have a good way to detect most situations where
DS offsets are usable on SI, so add an option to force using
them even if unsafe for debugging performance problems.

llvm-svn: 241462
2015-07-06 16:01:58 +00:00
Simon Pilgrim
e9f414f573 [X86][SSE] Added missing stack folding test for SQRTSD and SQRTSS instructions.
llvm-svn: 241445
2015-07-06 14:15:02 +00:00
Asaf Badouh
a51b8d0d5b [X86][AVX512] Multiply Packed Unsigned Integers with Round and Scale
pmulhrsw

review:
http://reviews.llvm.org/D10948

llvm-svn: 241443
2015-07-06 14:03:40 +00:00
Simon Pilgrim
99be799579 [X86][SSE3] Just use an explicit SSE3 target attribute - not a cpu type.
Merged arch/target into a specific triple - we had i686 and x86_64 targets overriding each other....

llvm-svn: 241410
2015-07-05 19:06:32 +00:00
Simon Pilgrim
7cc9f6e96f [X86][SSE2] Just use an explicit SSE2 target attribute - not a cpu type.
corei7 is capable of a lot more than just SSE2.... 

llvm-svn: 241409
2015-07-05 19:03:51 +00:00
Asaf Badouh
7e53a288e3 [x86][AVX512] add Multiply High Op
include encoding and intrinsics tests.

review
http://reviews.llvm.org/D10896

llvm-svn: 241406
2015-07-05 12:23:20 +00:00
Nemanja Ivanovic
4dede06034 Add missing builtins to the PPC back end for ABI compliance (vol. 2)
This patch corresponds to review:
http://reviews.llvm.org/D10874

Back end portion of the second round of additions to altivec.h.

llvm-svn: 241398
2015-07-05 06:03:51 +00:00
Simon Pilgrim
ae34c3f1f4 [X86][SSE] Improved i8/i16 to f64 uint2fp vector conversions
Followup to D10433 and D10589 that fixes i8/i16 uint2fp vector conversions by zero extending to i32 and using the sint2fp path (unless the target does actually support uint2fp).

llvm-svn: 241394
2015-07-04 15:33:34 +00:00
Simon Pilgrim
4be0188219 [X86] Added 32-bit builds to fp<->int tests.
Ensure that i686 x87/SSE/SSE2 targets all build.

llvm-svn: 241368
2015-07-03 20:07:57 +00:00
NAKAMURA Takumi
a114fd42ae llvm/test/CodeGen/ARM/fnattr-trap.ll: Add -mtriple, to appease targeting *-win32.
LLVM ERROR: CPU: 'generic' does not support ARM mode execution!

llvm-svn: 241329
2015-07-03 08:21:38 +00:00
Simon Pilgrim
9c5ee1cefc whitespace tidyup. NFC.
llvm-svn: 241326
2015-07-03 08:02:12 +00:00
Simon Pilgrim
555783cecd [X86][SSE] Sign extension for target vector sizes less than 128 bits (pt2)
Add support for v2i8/v2i16 to v2f64 by using a sign extension to v2i32 before conversion to v2f64.

Differential Revision: http://reviews.llvm.org/D10589

llvm-svn: 241325
2015-07-03 08:01:36 +00:00
Simon Pilgrim
438ea7ca28 [X86][SSE] Sign extension for target vector sizes less than 128 bits (pt1)
This patch adds support for sign extension for sub 128-bit vectors, such as to v2i32. It concatenates with UNDEF subvectors up to 128-bits, performs the sign extension (i.e. as v4i32) and then extracts the target subvector.

Patch 1/2 of D10589 - the second patch covers the conversion of v2i8/v2i16 to v2f64.

llvm-svn: 241323
2015-07-03 07:51:01 +00:00
Nadav Rotem
4e07210007 Fix an overly aggressive assertion in getCopyFromPartsVector.
The assertion in getCopyFromPartsVector assumed that the vector 'part' must
match the type of argument (arguments are potentially split into multiple
parts). However, in some cases the targets return a 'part' of the right size
but with a different type. We already handle this case correctly later on
and generate a bitcast. This commit just makes sure that we are actually
checking the property that we care about.

llvm-svn: 241312
2015-07-02 23:23:52 +00:00
Akira Hatanaka
09b63898c4 Use function attribute "trap-func-name" and remove TargetOptions::TrapFuncName.
This commit changes normal isel and fast isel to read the user-defined trap
function name from function attribute "trap-func-name" attached to llvm.trap or
llvm.debugtrap instead of from TargetOptions::TrapFuncName. This is needed to
use clang's command line option "-ftrap-function" for LTO and enable changing
the trap function name on a per-call-site basis.

Out-of-tree projects currently using TargetOptions::TrapFuncName to specify the
trap function name should attach attribute "trap-func-name" to the call sites
of llvm.trap and llvm.debugtrap instead.

rdar://problem/21225723

Differential Revision: http://reviews.llvm.org/D10832

llvm-svn: 241305
2015-07-02 22:13:27 +00:00
Bill Schmidt
8da856ce76 [PPC64LE] Remove implicit-subreg restriction from VSX swap removal
In r241285, I removed the SUBREG_TO_REG restriction from VSX swap
removal, determining that this was overly conservative.  We have
another form of the same restriction in that we check for the presence
of implicit subregs in vector operations.  As with SUBREG_TO_REG for
partial register conversions, an implicit subreg is safe in and of
itself, provided no other operation makes a lane-sensitive assumption
about the result.  This patch removes that restriction, by removing
the HasImplicitSubreg flag and all code that relies on it.

I've added a test case that fails to optimize before this patch is
applied, and optimizes properly with the patch.  Test based on a
report from Anton Blanchard.

llvm-svn: 241290
2015-07-02 19:01:22 +00:00
Bill Schmidt
0c245b6f0f [PPC64LE] Teach swap optimization about the doubleword splat idiom
With a previous patch, the VSX swap optimization is able to recognize
the doubleword load-splat idiom that can be implemented using lxvdsx.
However, that does not cover a doubleword splat where the source is a
register.  We can implement this using xxspltd (a special form of
xxpermdi).  This patch teaches the swap optimization pass about this
idiom.

As a prerequisite, it also permits swap optimization to succeed for
all forms of SUBREG_TO_REG.  Previously we were conservative and only
allowed SUBREG_TO_REG when it copied a full register.  However, on
reflection any form of SUBREG_TO_REG is safe in and of itself, so long
as an unsafe operation is not performed on its result.  In particular,
a widening SUBREG_TO_REG often occurs as an input to a doubleword
splat idiom, particularly in auto-vectorized code.

The doubleword splat idiom is an XXPERMDI operation where both source
registers are identical, and the selection mask is either 0 (splat the
first element) or 3 (splat the second element).  To determine whether
the registers are identical, we use the existing mechanism for looking
through "copy-like" operations.  That mechanism has a side effect of
marking the XXPERMDI operation as using a physical register, which
would invalidate its presence in a swap-optimized region.  This is
correct for the form of XXPERMDI that performs a swap and hence would
be removed, but is not what we want for a doubleword-splat variety of
XXPERMDI.  Therefore we reset the physical-register flag on the
XXPERMDI when it represents a splat.

A simple test case is added to verify that we generate the splat and
that we also remove the xxswapd instructions that would otherwise be
associated with the load and store of another operand.

llvm-svn: 241285
2015-07-02 17:03:06 +00:00
Pawel Bylica
4cfe4f6645 Reapply r240291: Fix shl folding in DAG combiner.
The code responsible for shl folding in the DAGCombiner was assuming incorrectly that all constants are less than 64 bits. This patch simply changes the way values are compared.

It has been reverted previously because of some problems with comparing APInt with raw uint64_t. That has been fixed/changed with r241204.

llvm-svn: 241254
2015-07-02 11:44:54 +00:00
Quentin Colombet
fe7967bf11 [TwoAddressInstructionPass] Try 3 Addr Conversion After Commuting.
TwoAddressInstructionPass stops after a successful commuting but 3 Addr
conversion might be good for some cases.
 
Consider:

int foo(int a, int b) {
  return a + b;
}

Before this commit, we emit:

addl	%esi, %edi
movl	%edi, %eax
ret

After this commit, we try 3 Addr conversion:

leal	(%rsi,%rdi), %eax
ret

Patch by Volkan Keles <vkeles@apple.com>!

Differential Revision: http://reviews.llvm.org/D10851

llvm-svn: 241206
2015-07-01 23:12:13 +00:00
Matthias Braun
e7b061b51a Test for specific output in lit test
llvm-svn: 241200
2015-07-01 22:34:59 +00:00
Jingyue Wu
5f36b4cd05 [NVPTX] expand extload/truncstore for vectors of floats
Summary:
According to PTX ISA:

For convenience, ld, st, and cvt instructions permit source and destination data operands to be wider than the instruction-type size, so that narrow values may be loaded, stored, and converted using regular-width registers. For example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit registers when being loaded, stored, or converted to other types and sizes. The operand type checking rules are relaxed for bit-size and integer (signed and unsigned) instruction types; floating-point instruction types still require that the operand type-size matches exactly, unless the operand is of bit-size type.

So, the ISA does not support load with extending/store with truncatation for floating numbers. This is reflected in setting the loadext/truncstore actions to expand in the code for floating numbers, but vectors of floating numbers are not taken care of.

As a result, loading a vector of floats followed by a fp_extend may be combined by DAGCombiner to a extload, and the extload may be lowered to NVPTXISD::LoadV2 with extending information. However, NVPTXISD::LoadV2 does not perform extending, and no extending instructions are inserted. Finally, PTX instructions with mismatched types are generated, like
ld.v2.f32 {%fd3, %fd4}, [%rd2]

This patch adds the correct actions for vectors of floats, so DAGCombiner would not create loads with extending, and correct code is generated.

Patched by Gang Hu. 

Test Plan: Test case attached.

Reviewers: jingyue

Reviewed By: jingyue

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D10876

llvm-svn: 241191
2015-07-01 21:32:42 +00:00
Jingyue Wu
bf15de754a [NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass
Summary:
Offset of frame index is calculated by NVPTXPrologEpilogPass. Before
that the correct offset of stack objects cannot be obtained, which
leads to wrong offset if there are more than 2 frame objects. This patch
move NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame index
is already replaced by %VRFrame in NVPTXPrologEpilogPass, we check
VRFrame register instead, and try to remove the VRFrame if there
is no usage after NVPTXPeephole pass.

Patched by Xuetian Weng. 

Test Plan:
Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the
offset calculation based on SP and SPL.

Reviewers: jholewinski, jingyue

Reviewed By: jingyue

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10853

llvm-svn: 241185
2015-07-01 20:08:06 +00:00
Bill Schmidt
2cbe4a84c5 [PPC64LE] Enable missing lxvdsx optimization, and related swap optimization
When adding little-endian vector support for PowerPC last year, I
inadvertently disabled an optimization that recognizes a load-splat
idiom and generates the lxvdsx instruction.  This patch moves the
offending logic so lxvdsx is once again generated.

This pattern is frequently generated by the vectorizer for scalar
loads of an effective constant.  Previously the lxvdsx instruction was
wrongly listed as lane-sensitive for the VSX swap optimization (since
both doublewords are identical, swaps are safe).  This patch fixes
this as well, so that vectorized code using lxvdsx can now have swaps
removed from the computation.

There is an existing test (@test50) in test/CodeGen/PowerPC/vsx.ll
that checks for the missing optimization.  However, vsx.ll was only
being tested for POWER7 with big-endian code generation.  I've added
a little-endian RUN statement and expected LE code generation for all
the tests in vsx.ll to give us a bit better VSX coverage, including
what's needed for this patch.

llvm-svn: 241183
2015-07-01 19:40:07 +00:00
Sanjay Patel
09ab09a461 add a cl::opt override for TargetLoweringBase's JumpIsExpensive
This patch is not intended to change existing codegen behavior for any target. 
It just exposes the JumpIsExpensive setting on the command-line to allow for
easier testing and emergency overrides.

Also, change the existing regression test to use FileCheck, explicitly specify
the jump-is-expensive option, and use more precise checks.

Differential Revision: http://reviews.llvm.org/D10846

llvm-svn: 241179
2015-07-01 18:10:20 +00:00
Reid Kleckner
ff8213aeba [SEH] Don't assert if the parent function lacks a personality
The EH code might have been deleted as unreachable and the personality
pruned while the filter is still present.  Currently I'm hitting this at
-O0 due to the clang bug PR24009.

llvm-svn: 241170
2015-07-01 16:45:47 +00:00
Igor Breger
cdff3524c0 AVX-512: Implemented missing encoding for FMA scalar instructions
Added tests for encoding

Differential Revision: http://reviews.llvm.org/D10865

llvm-svn: 241159
2015-07-01 13:24:28 +00:00
Reid Kleckner
8011644f25 [SEH] Add new intrinsics for recovering and restoring parent frames
The incoming EBP value established by the runtime is actually a pointer
to the end of the EH registration object, and not the true parent
function frame pointer. Clang doesn't need llvm.x86.seh.exceptioninfo
anymore because we know that the exception info pointer is at a fixed
offset from this incoming EBP.

The llvm.x86.seh.recoverfp intrinsic takes an EBP value provided by the
EH runtime and returns a pointer that is usable with llvm.framerecover.

The llvm.x86.seh.restoreframe intrinsic is inserted by the 32-bit
specific preparation pass in blocks targetted by the EH runtime. It
re-establishes any physical registers used by the parent function to
address the stack, such as the frame, base, and stack pointers.

Neither of these intrinsics correctly handle stack realignment prologues
yet, but it's possible to add that later.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D10848

llvm-svn: 241125
2015-06-30 22:46:59 +00:00
Sanjoy Das
d0dd6a0ba1 [FaultMaps] Let the frontend pre-select implicit null check candidates.
Summary:
This change introduces a !make.implicit metadata that allows the
frontend to pre-select the set of explicit null checks that will be
considered for transformation into implicit null checks.

The reason for not using profiling data instead of !make.implicit is
explained in the change to `FaultMaps.rst`.

Reviewers: atrick, reames, pgavlin, JosephTremoulet

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10824

llvm-svn: 241116
2015-06-30 21:22:32 +00:00
Nemanja Ivanovic
7d2f845a9d Fixes a bug with __builtin_vsx_lxvdw4x on Little Endian systems
llvm-svn: 241108
2015-06-30 19:45:45 +00:00
Peter Collingbourne
48749b3316 COFF: Do not assign linker-weak symbols to selectany comdat sections.
It is mandatory to specify a comdat in order to receive comdat semantics
for a symbol. We were previously getting this wrong in -function-sections
mode; linker-weak symbols were being emitted in a selectany comdat. This
change causes such symbols to use a noduplicates comdat instead, fixing
the inconsistency.

Also correct an inaccuracy in the docs.

Differential Revision: http://reviews.llvm.org/D10828

llvm-svn: 241103
2015-06-30 19:10:31 +00:00
Jingyue Wu
20fd92cbbc [NVPTX] Fix issue introduced in D10321
Summary:
Really check if %SP is not used in other places, instead of checking only exact
one non-dbg use.

Patched by Xuetian Weng. 

Test Plan:
@foo4 in test/CodeGen/NVPTX/local-stack-frame.ll, create a case that
SP will appear twice.

Reviewers: jholewinski, jingyue

Reviewed By: jingyue

Subscribers: llvm-commits, sfantao, jholewinski

Differential Revision: http://reviews.llvm.org/D10844

llvm-svn: 241099
2015-06-30 18:59:19 +00:00
Alex Lorenz
f829ecf101 MIR Serialization: Serialize MBB successors.
This commit implements serialization of the machine basic block successors. It
uses a YAML flow sequence that contains strings that have the MBB references.
The MBB references in those strings use the same syntax as the MBB machine
operands in the machine instruction strings.

Reviewers: Duncan P. N. Exon Smith

Differential Revision: http://reviews.llvm.org/D10699

llvm-svn: 241093
2015-06-30 18:16:42 +00:00
Samuel Antao
067b1d340e Force relocation mode to be default, regardless of what is passed to the backend.
llvm-svn: 241081
2015-06-30 17:18:00 +00:00
Michael Kuperstein
16ecf4031e [X86] Fix a bug in WIN_FTOL_32/64 handling.
Duplicating an FP register "as itself" is a bad idea, since it violates the
invariant that every FP register is mapped to at most one FPU stack slot.
Use the scratch FP register instead.

This fixes PR23957.

llvm-svn: 241069
2015-06-30 14:38:57 +00:00
Michael Kuperstein
7b4b71c924 [X86] Add FXSR intrinsics
Add intrinsics for the FXSR instructions (FXSAVE/FXSAVE64/FXRSTOR/FXRSTOR64)

llvm-svn: 241049
2015-06-30 08:49:35 +00:00
Matthias Braun
bb1a92ea1b RegisterCoalescer: Cleanup empty subranges after shrinkToUses()
A call to removeEmptySubranges() is necessary after every operation that
potentially removes all segments from a subregister range; this case in
the register coalescer was missing.

llvm-svn: 241027
2015-06-30 00:33:44 +00:00
Peter Collingbourne
d3c303721f Teach LTOModule to emit linker flags for dllexported symbols, plus interface cleanup.
This change unifies how LTOModule and the backend obtain linker flags
for globals: via a new TargetLoweringObjectFile member function named
emitLinkerFlagsForGlobal. A new function LTOModule::getLinkerOpts() returns
the list of linker flags as a single concatenated string.

This change affects the C libLTO API: the function lto_module_get_*deplibs now
exposes an empty list, and lto_module_get_*linkeropts exposes a single element
which combines the contents of all observed flags. libLTO should never have
tried to parse the linker flags; it is the linker's job to do so. Because
linkers will need to be able to parse flags in regular object files, it
makes little sense for libLTO to have a redundant mechanism for doing so.

The new API is compatible with the old one. It is valid for a user to specify
multiple linker flags in a single pragma directive like this:

 #pragma comment(linker, "/defaultlib:foo /defaultlib:bar")

The previous implementation would not have exposed
either flag via lto_module_get_*deplibs (as the test in
TargetLoweringObjectFileCOFF::getDepLibFromLinkerOpt was case sensitive)
and would have exposed "/defaultlib:foo /defaultlib:bar" as a single flag via
lto_module_get_*linkeropts. This may have been a bug in the implementation,
but it does give us a chance to fix the interface.

Differential Revision: http://reviews.llvm.org/D10548

llvm-svn: 241010
2015-06-29 22:04:09 +00:00
Tim Northover
f460862bbe ARM: add correct kill flags when combining stm instructions
When the store sequence being combined actually stores the base register, we
should not mark it as killed until the end.

rdar://21504262

llvm-svn: 241003
2015-06-29 21:42:16 +00:00
Matthias Braun
c4e357521f X86: Rework inline asm integer register specification.
This is a new version of http://reviews.llvm.org/D10260.

It turned out that when you specify an integer register in inline asm on
x86 you get the register of the required type size back. That means that
X86TargetLowering::getRegForInlineAsmConstraint() has to accept any of
the integer registers and adapt its size to the given target size which
may be any 8/16/32/64 bit sized type. Surprisingly that means given a
constraint of "{ax}" and a type of MVT::F32 we need to return X86::EAX.

This change makes this face explicit, the previous code seemed like
working by accident because there it never returned an error once a
register was found. On the other hand this rewrite allows to actually
return errors for invalid situations like requesting an integer register
for an i128 type.

Related to rdar://21042280

Differential Revision: http://reviews.llvm.org/D10813

llvm-svn: 241002
2015-06-29 21:35:51 +00:00
Sanjoy Das
d6c4d72807 [FaultMaps] Fix test case.
implicit-null-check-negative.ll had a missing 2>&1.  Fix this, and
remove an incorrect test case that this exposes.

llvm-svn: 240998
2015-06-29 21:27:36 +00:00
Pawel Bylica
7882eec9af [DAGCombiner] Fix & simplify constant folding of sext/zext.
Summary: This patch fixes the cases of sext/zext constant folding in DAG combiner where constans do not fit 64 bits. The fix simply removes un$

Test Plan: New regression test included.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: http://reviews.llvm.org/D10607

llvm-svn: 240991
2015-06-29 20:28:47 +00:00
Alex Lorenz
b12c23d4ef MIR Serialization: Serialize the register mask machine operands.
This commit implements serialization of the register mask machine
operands. This commit serializes only the call preserved register
masks that are defined by a target, it doesn't serialize arbitrary
register masks.

This commit also extends the TargetRegisterInfo class and TableGen so that
the users of TRI can get the list of all the call preserved register masks and
their names.

Reviewers: Duncan P. N. Exon Smith

Differential Revision: http://reviews.llvm.org/D10673

llvm-svn: 240966
2015-06-29 16:57:06 +00:00
Elena Demikhovsky
12bde41e5a AVX-512: all forms of SCATTER instruction on SKX,
encoding, intrinsics and tests.

llvm-svn: 240936
2015-06-29 12:14:24 +00:00
Javed Absar
013d6e555c [ARM]: Extend -mfpu options for half-precision and vfpv3xd
Some of the the permissible ARM -mfpu options, which are supported in GCC,
are currently not present in llvm/clang.This patch adds the options:
'neon-fp16', 'vfpv3-fp16', 'vfpv3-d16-fp16', 'vfpv3xd' and 'vfpv3xd-fp16.
These are related to half-precision floating-point and single precision.

Reviewers: rengolin, ranjeet.singh

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10645

llvm-svn: 240930
2015-06-29 09:32:29 +00:00
Igor Breger
7ca2ee2eb1 AVX-512: Implemented missing encoding and intrinsics for FMA instructions
Added tests for DAG lowering ,encoding and intrinsics

Differential Revision: http://reviews.llvm.org/D10796

llvm-svn: 240926
2015-06-29 09:10:00 +00:00
Matt Arsenault
0bf3df2409 AMDGPU/SI: Fix extra space when printing v_div_fmas_*
llvm-svn: 240911
2015-06-28 18:16:14 +00:00
Asaf Badouh
732e3b5425 [x86][AVX512]
Add vscalef support
include encoding and intrinsics


review:
http://reviews.llvm.org/D10730

llvm-svn: 240906
2015-06-28 14:30:39 +00:00
Elena Demikhovsky
02169f53d0 AVX-512: Added all SKX forms of GATHER instructions.
Added intrinsics.
Added encoding and tests.

llvm-svn: 240905
2015-06-28 10:53:29 +00:00
Benjamin Kramer
f266daf4f5 [SDAG] Now that we have a way to communicate the exact bit on sdiv use it to simplify sdiv by a constant.
We had a hack in SDAGBuilder in place to work around this but now we
can avoid that. Call BuildExactSDIV from BuildSDIV so DAGCombiner can
perform this trick automatically.

The added check in DAGCombiner is necessary to prevent exact sdiv by pow2
from regressing as the target-specific pow2 lowering is not aware of
exact bits yet.

This is mostly covered by existing tests. One side effect is that we
get the better lowering for exact vector sdivs now too :)

llvm-svn: 240891
2015-06-27 20:33:26 +00:00
NAKAMURA Takumi
3fc2c99ead llvm/test/CodeGen/X86/xor.ll: Appease Win32 targets since r240796.
%struct.ref_s = type { %union.v, i16, i16 }
  %union.v = type { i64 }

It seems %struct.ref_s is incompatible in tail padding.

llvm-svn: 240874
2015-06-27 03:46:58 +00:00
Alex Lorenz
4a93dcb918 MIR Serialization: Serialize global address machine operands.
This commit serializes the global address machine operands.
This commit doesn't serialize the operand's offset and target
flags, it serializes only the global value reference.

Reviewers: Duncan P. N. Exon Smith

Differential Revision: http://reviews.llvm.org/D10671

llvm-svn: 240851
2015-06-26 22:56:48 +00:00
Jingyue Wu
e89c324de3 [NVPTX] noop when kernel pointers are already global
Summary:
Some front ends make kernel pointers global already. In that case,
handlePointerParams does nothing.

Test Plan: more tests in lower-kernel-ptr-arg.ll

Reviewers: grosser

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10779

llvm-svn: 240849
2015-06-26 22:35:43 +00:00
Tom Stellard
a50ac5923b AMDPGU/SI: Use correct resource descriptors for VI on HSA
Summary: We need to set MTYPE = 2 for VI shaders when targeting the HSA runtime.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D10777

llvm-svn: 240841
2015-06-26 21:58:42 +00:00
Tom Stellard
ff6108f813 AMDGPU/SI: Update amd_kernel_code_t definition and add assembler support
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10772

llvm-svn: 240839
2015-06-26 21:58:31 +00:00
Philip Reames
4f4ed9b17b [Verifier] Verify invokes of intrinsics
We support invoking a subset of llvm's intrinsics, but the verifier didn't account for this.  We had previously added a special case to verify invokes of statepoints.  By generalizing the code in terms of CallSite, we can verify invokes of other intrinsics as well.  Interestingly, this found one test case which was invalid.

Note: I'm deliberately leaving the naming change from CI to CS to a follow up change.  That will happen shortly, I just wanted to reduce the diff to make it clear what was happening with this one.

Differential Revision: http://reviews.llvm.org/D10118

llvm-svn: 240836
2015-06-26 21:39:44 +00:00
Tom Stellard
ac2f277b1d AMDGPU/SI: Set ELF OS/ABI to ELFOSABI_AMDGPU_HSA
Reviewers: arsenm, rafael

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10708

llvm-svn: 240832
2015-06-26 21:15:11 +00:00