1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00
Commit Graph

27549 Commits

Author SHA1 Message Date
Colin LeMahieu
77e5a6b190 [Hexagon] Marking several instructions as isCodeGenOnly=0 and adding direct disassembly tests for many instructions.
llvm-svn: 223482
2014-12-05 17:27:39 +00:00
Asiri Rathnayake
4984b53708 Improvements to ARM assembler tests
No functional changes. Got myself bitten in r223113 when adding support for
modified immediate syntax (regressions reported by joerg@britannica.bec.de,
fixes in r223366 and r223381). Our assembler tests did not cover serveral
different syntax variants. This patch expands the test coverage to check for
the following cases:

1. Modified immediate operands may be expressed with expressions, as in #(4 * 2)
instead of #8.

2. Modified immediate operands may be _optionally_ prefixed by a '#' symbol or a
'$' symbol.

3. Certain instructions (e.g. ADD) support single input register variants;
[ADD r0, #mod_imm] is same as [ADD r0, r0, #mod_imm].

4. Certain instructions have aliases which convert plain immediates to modified
immediates. For an example, [ADD r0, -10] is not valid because -10 (in two's
complement) cannot be encoded as a modified immediate, but ARMInstrInfo.td
defines an alias which can transform this into a [SUB r0, 10].

llvm-svn: 223475
2014-12-05 16:33:56 +00:00
Evgeniy Stepanov
c900d55e15 [msan] Avoid extra origin address realignment.
Do not realign origin address if the corresponding application
address is at least 4-byte-aligned.

Saves 2.5% code size in track-origins mode.

llvm-svn: 223464
2014-12-05 14:34:03 +00:00
Andrea Di Biagio
549dad7c4c [X86] Avoid introducing extra shuffles when lowering packed vector shifts.
When lowering a vector shift node, the backend checks if the shift count is a
shuffle with a splat mask. If so, then it introduces an extra dag node to
extract the splat value from the shuffle. The splat value is then used
to generate a shift count of a target specific shift.

However, if we know that the shift count is a splat shuffle, we can use the
splat index 'I' to extract the I-th element from the first shuffle operand.
The advantage is that the splat shuffle may become dead since we no longer
use it.

Example:

;;
define <4 x i32> @example(<4 x i32> %a, <4 x i32> %b) {
  %c = shufflevector <4 x i32> %b, <4 x i32> undef, <4 x i32> zeroinitializer
  %shl = shl <4 x i32> %a, %c
  ret <4 x i32> %shl
}
;;

Before this patch, llc generated the following code (-mattr=+avx):
  vpshufd $0, %xmm1, %xmm1   # xmm1 = xmm1[0,0,0,0]
  vpxor  %xmm2, %xmm2
  vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
  vpslld %xmm1, %xmm0, %xmm0
  retq

With this patch, the redundant splat operation is removed from the code.
  vpxor  %xmm2, %xmm2
  vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
  vpslld %xmm1, %xmm0, %xmm0
  retq

llvm-svn: 223461
2014-12-05 12:13:30 +00:00
Charlie Turner
1fb529a753 Add missing FP build attribute tests.
The test file test/CodeGen/ARM/build-attributes.ll was missing several
floating-point build attribute tests. The intention of this commit is that for
each CPU / architecture currently tested, there are now tests that make sure
the following attributes are sufficiently checked,

  * Tag_ABI_FP_rounding
  * Tag_ABI_FP_denormal
  * Tag_ABI_FP_exceptions
  * Tag_ABI_FP_user_exceptions
  * Tag_ABI_FP_number_model

Also in this commit, the -unsafe-fp-math flag has been augmented with the full
suite of flags Clang sends to LLVM when you pass -ffast-math to Clang. That is,
`-unsafe-fp-math' has been changed to `-enable-unsafe-fp-math -disable-fp-elim
-enable-no-infs-fp-math -enable-no-nans-fp-math -fp-contract=fast'

Change-Id: I35d766076bcbbf09021021c0a534bf8bf9a32dfc
llvm-svn: 223454
2014-12-05 08:22:47 +00:00
Hal Finkel
bf9e347b44 Revert "r223440 - Consider subregs when calling MI::registerDefIsDead for phys deps"
Reverting this because, while it fixes the problem in the reduced test case, it
does not fix the problem in the full test case from the bug report.

llvm-svn: 223442
2014-12-05 02:07:35 +00:00
Hal Finkel
c4a8e3d745 Consider subregs when calling MI::registerDefIsDead for phys deps
The scheduling dependency graph is built bottom-up within each scheduling
region, and ScheduleDAGInstrs::addPhysRegDeps is called to add output/anti
dependencies, based on physical registers, to the SUs for instructions
based on those that come before them.

In the test case, we start before post-RA scheduling with a block that looks
like this:

...
	INLINEASM <...
andc $0,$0,$2
stdcx. $0,0,$3
bne- 1b
> [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef-ec:G8RC], %X6<earlyclobber,def,dead>, $1:[mem], %X3<kill>, $2:[reguse:G8RC], %X5<kill>, $3:[reguse:G8RC], %X3, $4:[mem], %X3, $5:[clobber], %CC<earlyclobber,imp-def,dead>, <<badref>>
	...
	%X4<def,dead> = ANDIo8 %X4<kill>, 1, %CR0<imp-def,dead>, %CR0GT<imp-def>
	...
	%R29<def> = ISEL %R3<undef>, %R4<kill>, %CR0GT<kill>

where it is relevant that %CC is an alias to %CR0, and that %CR0GT is a
subregister of %CR0. However, for post-RA scheduling, no dependency was added
to prevent the INLINEASM from being scheduled in between the ANDIo8 and the
ISEL (which communicate via the %CR0GT register).

In ScheduleDAGInstrs::addPhysRegDeps, when called for the %CC operand, we'd
iterate over all of its aliases (which include %CC itself and also %CR0), and
look for previously-encountered defs of those registers. We'd find the ANDIo8,
but decide not to add a dependency between the INLINEASM and the ANDIo8 because
both the INLINEASM's def of %CC is dead, and also the ANDIo8 def of %CR0 is
dead. This ignores, however, that ANDIo8 has a non-dead def of %CR0GT, a
subregister of %CR0, and thus a dependency still must exist.

To fix this problem, when calling registerDefIsDead on the SU with the def, we
also check all subregisters for possible non-dead defs, and add the dependency
if any are found.

Fixes PR21742.

llvm-svn: 223440
2014-12-05 01:57:22 +00:00
Adrian Prantl
039b9f8d4e Add a comment.
llvm-svn: 223427
2014-12-05 01:02:36 +00:00
Rafael Espindola
f1bc0a9da3 Add a few extra cases to the test. NFC.
llvm-svn: 223417
2014-12-05 00:02:42 +00:00
Kevin Enderby
291f346eff Re-add support to llvm-objdump for Mach-O universal files and archives with -macho
with fixes.  Includes the move of tests for llvm-objdump for universal files to an X86
directory.  And the fix where it was failing on linux Rafael tracked down with asan.
I had both Jim Grosbach and Adam Hemet look over the second fix since I could not
set up asan to reproduce with the old version but not with the fix.

llvm-svn: 223416
2014-12-04 23:56:27 +00:00
Rafael Espindola
89aebd8ef9 Convert test to use an extra Input file. NFC.
llvm-svn: 223414
2014-12-04 23:31:21 +00:00
Adrian Prantl
2a019cf6d0 Simplify implementation and testcase of r223401 based on feedback from dblaikie.
llvm-svn: 223405
2014-12-04 22:58:41 +00:00
Adrian Prantl
a0b1f27462 Debug info: If the RegisterCoalescer::reMaterializeTrivialDef() is
eliminating all uses of a vreg, update any DBG_VALUE describing that vreg
to point to the rematerialized register instead.

llvm-svn: 223401
2014-12-04 22:29:04 +00:00
Hans Wennborg
33a45a8666 Add some tests for SimplifyCFG's TurnSwitchRangeIntoICmp(). NFC.
llvm-svn: 223396
2014-12-04 22:19:28 +00:00
Hans Wennborg
e04e27debb Add some tests for SimplifyCFG's ConstantFoldTerminator(). NFC.
llvm-svn: 223395
2014-12-04 22:19:25 +00:00
Weiming Zhao
b889d65e01 [AArch64] Combining Load and IntToFp should check for neon availability
llvm-svn: 223382
2014-12-04 20:25:50 +00:00
Asiri Rathnayake
e5f983fb48 Fix yet another unseen regression caused by r223113
r223113 added support for ARM modified immediate assembly syntax. Which
assumes all immediate operands are prefixed with a '#'. This assumption
is wrong as per the ARMARM - which recommends that all '#' characters be
treated optional. The current patch fixes this regression and adds a test
case. A follow-up patch will expand the test coverage to other instructions.

llvm-svn: 223381
2014-12-04 19:34:59 +00:00
Jonathan Roelofs
348155e124 Fix thumbv4t indirect calls
So there are a couple of issues with indirect calls on thumbv4t. First, the most
'obvious' instruction, 'blx' isn't available until v5t. And secondly, the
next-most-obvious sequence: 'mov lr, pc; bx rN' doesn't DTRT in thumb code
because the saved off pc has its thumb bit cleared, so when the callee returns
we end up in ARM mode.... yuck.

The solution is to 'bl' to a nearby landing pad with a 'bx rN' in it.

We could cut down on code size by sharing the landing pads between call sites
that are close enough, but for the moment let's do correctness first and look at
performance later.


Patch by: Iain Sandoe

http://reviews.llvm.org/D6519

llvm-svn: 223380
2014-12-04 19:34:50 +00:00
Philip Reames
80180bc27f Add a test case for argument type coercion in an invoke of a vararg function
This would have caught the bug I fixed in 223370.  

llvm-svn: 223378
2014-12-04 19:13:45 +00:00
Hal Finkel
207573f122 Revert "r223364 - Revert r223347 which has caused crashes on bootstrap bots."
Reapply r223347, with a fix to not crash on uninserted instructions (or more
precisely, instructions in uninserted blocks). bugpoint was able to reduce the
test case somewhat, but it is still somewhat large (and relies on setting
things up to be simplified during inlining), so I've not included it here.
Nevertheless, it is clear what is going on and why.

Original commit message:

Restrict somewhat the memory-allocation pointer cmp opt from r223093

Based on review comments from Richard Smith, restrict this optimization from
applying to globals that might resolve lazily to other dynamically-loaded
modules, and also from dynamic allocas (which might be transformed into malloc
calls). In short, take extra care that the compared-to pointer is really
simultaneously live with the memory allocation.

llvm-svn: 223371
2014-12-04 17:45:19 +00:00
Asiri Rathnayake
7121b6d3a3 Fix a minor regression introduced in r223113
r223113 added support for ARM modified immediate assembly syntax. That patch
has broken support for immediate expressions, as in:
    add r0, #(4 * 4)
It wasn't caught because we don't have any tests for this feature. This patch
fixes this regression and adds test cases.

llvm-svn: 223366
2014-12-04 14:49:07 +00:00
Alexander Potapenko
a5817ca5e9 Revert r223347 which has caused crashes on bootstrap bots.
llvm-svn: 223364
2014-12-04 14:22:27 +00:00
Rafael Espindola
ffb0b8fbb1 Revert "[Thumb/Thumb2] Added restrictions on PC, LR, SP in the register list for PUSH/POP/LDM/STM. <Differential Revision: http://reviews.llvm.org/D6090>"
This reverts commit r223356.

It was failing check-all (MC/ARM/thumb.s in particular).

llvm-svn: 223363
2014-12-04 14:10:20 +00:00
Michael Kuperstein
7a925b41a3 [X86] Improve a dag-combine that handles a vector extract -> zext sequence.
The current DAG combine turns a sequence of extracts from <4 x i32> followed by zexts into a store followed by scalar loads.
According to measurements by Martin Krastev (see PR 21269) for x86-64, a sequence of an extract, movs and shifts gives better performance. However, for 32-bit x86, the previous sequence still seems better.

Differential Revision: http://reviews.llvm.org/D6501

llvm-svn: 223360
2014-12-04 13:49:51 +00:00
Jyoti Allur
a4f3d11768 [Thumb/Thumb2] Added restrictions on PC, LR, SP in the register list for PUSH/POP/LDM/STM. <Differential Revision: http://reviews.llvm.org/D6090>
llvm-svn: 223356
2014-12-04 11:52:49 +00:00
Patrik Hagglund
bd19c4a85b Use DomTree in MachineSink to sink over diamonds.
According to a previous FIXME comment we now not only look at MBB
successors, but also handle code sinking past them:

  x = computation
  if () {} else {}
  use x

The instruction could be sunk over the whole diamond for the
if/then/else (or loop, etc), allowing it to be sunk into other blocks
after that.

Modified test added in r204522, due to one spill less present.

Minor fixes in comments.

Patch provided by Jonas Paulsson. Reviewed by Hal Finkel.

llvm-svn: 223350
2014-12-04 10:36:42 +00:00
Simon Pilgrim
0b4002e32c [InstCombine] Minor optimization for bswap with binary ops
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:

OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )

Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well:

fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))

Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.

Differential Revision: http://reviews.llvm.org/D6407

llvm-svn: 223349
2014-12-04 09:44:01 +00:00
Elena Demikhovsky
befed29343 Masked Load / Store Intrinsics - the CodeGen part.
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.

Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 223348
2014-12-04 09:40:44 +00:00
Hal Finkel
a127d156af Restrict somewhat the memory-allocation pointer cmp opt from r223093
Based on review comments from Richard Smith, restrict this optimization from
applying to globals that might resolve lazily to other dynamically-loaded
modules, and also from dynamic allocas (which might be transformed into malloc
calls). In short, take extra care that the compared-to pointer is really
simultaneously live with the memory allocation.

llvm-svn: 223347
2014-12-04 09:22:28 +00:00
Jean-Daniel Dupas
22cb66e79b Add missing test file
llvm-svn: 223346
2014-12-04 09:20:13 +00:00
Jean-Daniel Dupas
d567300387 Add mach-o LC_RPATH support to llvm-objdump
Summary: Add rpath load command support in Mach-O object and update llvm-objdump to use it.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6512

llvm-svn: 223343
2014-12-04 07:37:02 +00:00
Rafael Espindola
eee0583686 Revert "Add missing test dependency and use a more canonical target name."
This reverts commit r223336.

NAKAMURA Takumi did the same thing in r223332!

Sorry about the noise.

llvm-svn: 223337
2014-12-04 04:33:32 +00:00
Rafael Espindola
6e72eb4cdf Add missing test dependency and use a more canonical target name.
llvm-svn: 223336
2014-12-04 04:30:56 +00:00
Colin LeMahieu
892cc4dcb1 [Hexagon] Adding lit exception if Hexagon isn't built.
llvm-svn: 223335
2014-12-04 04:28:38 +00:00
Colin LeMahieu
a0eacebee5 [Hexagon] Marking some instructions as CodeGenOnly=0 and adding disassembly tests.
llvm-svn: 223334
2014-12-04 03:41:21 +00:00
NAKAMURA Takumi
92f7403490 Introduce "llvm-ranlib" as a name of targets since Object/archive-symtab.test requires llvm-ranlib.
llvm-svn: 223332
2014-12-04 01:34:11 +00:00
NAKAMURA Takumi
6d3ff27369 Sort by alphabetical order.
llvm-svn: 223331
2014-12-04 01:27:53 +00:00
Michael Liao
f24ea6579d [X86] Restore X86 base pointer after call to llvm.eh.sjlj.setjmp
Commit on 

- This patch fixes the bug described in
  http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-May/062343.html

The fix allocates an extra slot just below the GPRs and stores the base pointer
there. This is done only for functions containing llvm.eh.sjlj.setjmp that also
need a base pointer. Because code containing llvm.eh.sjlj.setjmp saves all of
the callee-save GPRs in the prologue, the offset to the extra slot can be
computed before prologue generation runs.

Impact at run-time on affected functions is::

  - One extra store in the prologue, The store saves the base pointer.
  - One extra load after a llvm.eh.sjlj.setjmp. The load restores the base pointer.

Because the extra slot is just above a gap between frame-pointer-relative and
base-pointer-relative chunks of memory, there is no impact on other offset
calculations other than ensuring there is room for the extra slot.

http://reviews.llvm.org/D6388

Patch by Arch Robison <arch.robison@intel.com>

llvm-svn: 223329
2014-12-04 00:56:38 +00:00
Hal Finkel
84b0b8cc3b [PowerPC] 'cc' should be an alias only to 'cr0'
We had mistakenly believed that GCC's 'cc' referred to the entire
condition-code register (cr0 through cr7) -- and implemented this in r205630 to
fix PR19326, but 'cc' is actually an alias only to 'cr0'. This is causing LLVM
to clobber too much with legacy code with inline asm using the 'cc' clobber.

Fixes PR21451.

llvm-svn: 223328
2014-12-04 00:46:20 +00:00
Hal Finkel
bf150eceab [PowerPC] Fix inline asm memory operands not to use r0
On PowerPC, inline asm memory operands might be expanded as 0($r), where $r is
a register containing the address. As a result, this register cannot be r0, and
we need to enforce this register subclass constraint to prevent miscompiling
the code (we'd get this constraint for free with the usual instruction
definitions, but that scheme has no knowledge of how we end up printing inline
asm memory operands, and so here we need to do it 'by hand'). We can accomplish
this within the current address-mode selection framework by introducing an
explicit COPY_TO_REGCLASS node.

Fixes PR21443.

llvm-svn: 223318
2014-12-03 23:40:13 +00:00
Quentin Colombet
ce2758eff5 [RegAllocFast] Handle implicit definitions conservatively.
Prior to this commit, physical registers defined implicitly were considered free
right after their definition, i.e.. like dead definitions. Therefore, their uses
had to immediately follow their definitions, otherwise the related register may
be reused to allocate a virtual register.

This commit fixes this assumption by keeping implicit definitions alive until
they are actually used. The downside is that if the implicit definition was dead
(and not marked at such), we block an otherwise available register. This is
however conservatively correct and makes the fast register allocator much more
robust in particular regarding the scheduling of the instructions.

Fixes PR21700.

llvm-svn: 223317
2014-12-03 23:38:08 +00:00
Rafael Espindola
4f7ba285db This reverts commit r223306 and r223277.
The code is using uninitialized memory and failing on linux.

llvm-svn: 223315
2014-12-03 23:29:34 +00:00
Kostya Serebryany
78d5668556 [msan] allow -fsanitize-coverage=N together with -fsanitize=memory, llvm part
llvm-svn: 223312
2014-12-03 23:28:26 +00:00
Kevin Enderby
fc688e8cc2 Move tests for llvm-objdump for universal files to X86 directory to fix build bots.
llvm-svn: 223306
2014-12-03 23:00:16 +00:00
Rafael Espindola
3358db2d18 Split the set of identified struct types into opaque and non-opaque ones.
The non-opaque part can be structurally uniqued. To keep this to just
a hash lookup, we don't try to unique cyclic types.

Also change the type mapping algorithm to be optimistic about a type
not being recursive and only create a new type when proven to be wrong.
This is not as strong as trying to speculate that we can keep the source
type, but is simpler (no speculation to revert) and more powerfull
than what we had before (we don't copy non-recursive types at least).

I initially wrote this to try to replace the name based type merging.
It is not strong enough to replace it, but is is a useful addition.

With this patch the number of named struct types is a clang lto bootstrap goes
from 49674 to 15986.

llvm-svn: 223278
2014-12-03 22:36:37 +00:00
Kevin Enderby
b19e536332 Add support to llvm-objdump for Mach-O universal files and archives with -macho.
llvm-svn: 223277
2014-12-03 22:29:40 +00:00
Matthias Braun
8672c85e5e [SimplifyLibCalls] Improve double->float shrinking to consider constants
This allows cases like float x; fmin(1.0, x); to be optimized to fminf(1.0f, x);

rdar://19049359

Differential Revision: http://reviews.llvm.org/D6496

llvm-svn: 223270
2014-12-03 21:46:33 +00:00
Matthias Braun
3bca19851f [SimplifyLibCalls] Enable double to float shrinking for copysign
rdar://19049359

Differential Revision: http://reviews.llvm.org/D6495

llvm-svn: 223269
2014-12-03 21:46:29 +00:00
Tim Northover
24557369d4 AArch64: fix wrong-endian parameter passing.
The blocked arguments code didn't take account of the hacks needed to support
it.

llvm-svn: 223247
2014-12-03 17:49:26 +00:00
Nick Lewycky
e77a0e6d56 Fix test to use the right metadata node (reapply r223239 plus a fix) and also to use the correct path to the GCNO file.
llvm-svn: 223244
2014-12-03 17:32:44 +00:00
Alexander Potapenko
44a90df4d9 Revert r223239, which broke some bots.
llvm-svn: 223240
2014-12-03 16:03:08 +00:00
Alexander Potapenko
80e9b87af0 Fix the metadata number used by llvm.gcov to match the number of the inserted metadata node.
llvm-svn: 223239
2014-12-03 15:15:58 +00:00
Erik Eckstein
b61d06dbaf InstCombine: simplify signed range checks
Try to convert two compares of a signed range check into a single unsigned compare.
Examples:
(icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n
(icmp slt x, 0) | (icmp sgt x, n) --> icmp ugt x, n

llvm-svn: 223224
2014-12-03 10:39:15 +00:00
Hal Finkel
e95528845d [PowerPC] Print all inline-asm consts as signed numbers
Almost all immediates in PowerPC assembly (both 32-bit and 64-bit) are signed
numbers, and it is important that we print them as such. To make sure that
happens, we change PPCTargetLowering::LowerAsmOperandForConstraint so that it
does all intermediate checks on a signed-extended int64_t value, and then
creates the resulting target constant using MVT::i64. This will ensure that all
negative values are printed as negative values (mirroring what is done in other
backends to achieve the same sign-extension effect).

This came up in the context of inline assembly like this:
  "add%I2   %0,%0,%2", ..., "Ir"(-1ll)
where we used to print:
  addi   3,3,4294967295
and gcc would print:
  addi   3,3,-1
and gas accepts both forms, but our builtin assembler (correctly) does not. Now
we print -1 like gcc does.

While here, I replaced a bunch of custom integer checks with isInt<16> and
friends from MathExtras.h.

Thanks to Paul Hargrove for the bug report.

llvm-svn: 223220
2014-12-03 09:37:50 +00:00
Charlie Turner
ab73ef8264 Emit ABI_FP_rounding attribute.
LLVM understands a -enable-sign-dependent-rounding-fp-math codegen option. When
the user has specified this option, the Tag_ABI_FP_rounding attribute should be
emitted with value 1. This option currently does not appear to disable
transformations and optimizations that assume default floating point rounding
behavior, AFAICT, but the intention should be recorded in the build attributes,
regardless of what the compiler actually does with the intention.

Change-Id: If838578df3dc652b6f2796b8d152545674bcb30e
llvm-svn: 223218
2014-12-03 08:12:26 +00:00
Charlie Turner
7f4e539ba5 Add tests for default value of Tag_ABI_FP_rounding.
Change-Id: I051866d073fc6ce87ce3e693a3762da6d81f4393
llvm-svn: 223217
2014-12-03 07:59:50 +00:00
Rafael Espindola
02dc2705ae Ask the module for its the identified types.
When lazy reading a module, the types used in a function will not be visible to
a TypeFinder until the body is read.

This patch fixes that by asking the module for its identified struct types.
If a materializer is present, the module asks it. If not, it uses a TypeFinder.

This fixes pr21374.

I will be the first to say that this is ugly, but it was the best I could find.

Some of the options I looked at:

* Asking the LLVMContext. This could be made to work for gold, but not currently
  for ld64. ld64 will load multiple modules into a single context before merging
  them. This causes us to see types from future merges. Unfortunately,
  MappedTypes is not just a cache when it comes to opaque types. Once the
  mapping has been made, we have to remember it for as long as the key may
  be used. This would mean moving MappedTypes to the Linker class and having
  to drop the Linker::LinkModules static methods, which are visible from C.

* Adding an option to ignore function bodies in the TypeFinder. This would
  fix the PR by picking the worst result. It would work, but unfortunately
  we are currently quite dependent on the upfront type merging. I will
  try to reduce our dependency, but it is not clear that we will be able
  to get rid of it for now.

The only clean solution I could think of is making the Module own the types.
This would have other advantages, but it is a much bigger change. I will
propose it, but it is nice to have this fixed while that is discussed.

With the gold plugin, this patch takes the number of types in the LTO clang
binary from 52817 to 49669.

llvm-svn: 223215
2014-12-03 07:18:23 +00:00
Matt Arsenault
0533e2261e R600/SI: Remove i1 pseudo VALU ops
Select i1 logical ops directly to 64-bit SALU instructions.
Vector i1 values are always really in SGPRs, with each
bit for each item in the wave. This saves about 4 instructions
when and/or/xoring any condition, and also helps write conditions
that need to be passed in vcc.

This should work correctly now that the SGPR live range
fixing pass works. More work is needed to eliminate the VReg_1
pseudo regclass and possibly the entire SILowerI1Copies pass.

llvm-svn: 223206
2014-12-03 05:22:35 +00:00
Tom Stellard
213a8062a7 StructurizeCFG: Use LoopInfo analysis for better loop detection
We were assuming that each back-edge in a region represented a unique
loop, which is not always the case.  We need to use LoopInfo to
correctly determine which back-edges are loops.

llvm-svn: 223199
2014-12-03 04:28:32 +00:00
Tom Stellard
a8af03e062 R600/SI: Enable inline assembly
We just needed to remove the assertion in
AMDGPURegisterInfo::getFrameRegister(), which is called when
initializing the parser for inline assembly.

llvm-svn: 223197
2014-12-03 04:08:00 +00:00
Matt Arsenault
a51749b87e R600/SI: Change mubuf offsets to print as decimal
This matches SC's behavior.

llvm-svn: 223194
2014-12-03 03:12:13 +00:00
Nick Lewycky
be2ab4ddd3 Emit the entry block first and the exit block second, then all the blocks in between afterwards. This is what gcc always does, and some out of tree tools depend on that.
llvm-svn: 223193
2014-12-03 02:45:01 +00:00
Peter Collingbourne
837799f13b Prologue support
Patch by Ben Gamari!

This redefines the `prefix` attribute introduced previously and
introduces a `prologue` attribute.  There are a two primary usecases
that these attributes aim to serve,

  1. Function prologue sigils

  2. Function hot-patching: Enable the user to insert `nop` operations
     at the beginning of the function which can later be safely replaced
     with a call to some instrumentation facility

  3. Runtime metadata: Allow a compiler to insert data for use by the
     runtime during execution. GHC is one example of a compiler that
     needs this functionality for its tables-next-to-code functionality.

Previously `prefix` served cases (1) and (2) quite well by allowing the user
to introduce arbitrary data at the entrypoint but before the function
body. Case (3), however, was poorly handled by this approach as it
required that prefix data was valid executable code.

Here we redefine the notion of prefix data to instead be data which
occurs immediately before the function entrypoint (i.e. the symbol
address). Since prefix data now occurs before the function entrypoint,
there is no need for the data to be valid code.

The previous notion of prefix data now goes under the name "prologue
data" to emphasize its duality with the function epilogue.

The intention here is to handle cases (1) and (2) with prologue data and
case (3) with prefix data.

References
----------

This idea arose out of discussions[1] with Reid Kleckner in response to a
proposal to introduce the notion of symbol offsets to enable handling of
case (3).

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html

Test Plan: testsuite

Differential Revision: http://reviews.llvm.org/D6454

llvm-svn: 223189
2014-12-03 02:08:38 +00:00
Ahmed Bougacha
ac111e2226 [X86][MC] Intel syntax: accept implicit memory operand sizes larger than 80.
The X86AsmParser intel handling was refactored in r216481, making it
try each different memory operand size to see which one matches.
Operand sizes larger than 80 ("[xyz]mmword ptr") were forgotten, which
led to an "invalid operand" error for code such as:
  movdqa [rax], xmm0

llvm-svn: 223187
2014-12-03 02:03:26 +00:00
Hal Finkel
2b306926ed [PowerPC] Fix readcyclecounter to be custom expanded for all 32-bit targets
We need to use the custom expansion of readcyclecounter on all 32-bit targets
(even those with 64-bit registers). This should fix the ppc64 buildbot.

llvm-svn: 223182
2014-12-03 00:19:17 +00:00
Tim Northover
0c91cccef8 AArch64: strengthen Darwin ABI alignment assumptions
A global variable without an explicit alignment specified should be assumed to
be ABI-aligned according to its type, like on other platforms. This allows us
to use better memory operations when accessing it.

rdar://18533701

llvm-svn: 223180
2014-12-02 23:53:43 +00:00
Tim Northover
85149ea9e5 AArch64: don't be too greedy when folding :lo12: accesses into mem ops.
This frequently leads to cases like:
   ldr xD, [xN, :lo12:var]
   add xA, xN, :lo12:var
   ldr xD, [xA, #8]

where the ADD would have been needed anyway, and the two distinct addressing
modes can prevent the formation of an ldp. Because of how we handle ADRP
(aggressively forming an ADRP/ADD pseudo-inst at ISel time), this pattern also
results in duplicated ADRP instructions (one on its own to cover the ldr, and
one combined with the add).

llvm-svn: 223172
2014-12-02 23:13:39 +00:00
Michael Zolotukhin
37717c3cf1 PR21302. Vectorize only bottom-tested loops.
rdar://problem/18886083

llvm-svn: 223171
2014-12-02 22:59:06 +00:00
Michael Zolotukhin
ce3e203aab Apply loop-rotate to several vectorizer tests.
Such loops shouldn't be vectorized due to the loops form.
After applying loop-rotate (+simplifycfg) the tests again start to check
what they are intended to check.

llvm-svn: 223170
2014-12-02 22:59:02 +00:00
Simon Pilgrim
12f78b2c48 [X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets
4i32 shuffles for single insertions into zero vectors lowers to X86vzmovl which was using (v)blendps - causing domain switch stalls. This patch fixes this by using (v)pblendw instead.

The updated tests on test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps - I'm looking at fixing this in a future patch.

Differential Revision: http://reviews.llvm.org/D6458

llvm-svn: 223165
2014-12-02 22:31:23 +00:00
Hal Finkel
337f550328 [PowerPC] Implement readcyclecounter for PPC32
We've long supported readcyclecounter on PPC64, but it is easier there (the
read of the 64-bit time-base register can be accomplished via a single
instruction). This now provides an implementation for PPC32 as well. On PPC32,
the time-base register is still 64 bits, but can only be read 32 bits at a time
via two separate SPRs. The ISA manual explains how to do this properly (it
involves re-reading the upper bits and looping if the counter has wrapped while
being read).

This requires PPC to implement a custom integer splitting legalization for the
READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then
gets turned into a pseudo-instruction, which is then expanded to the necessary
sequence (which has three SPR reads, the comparison and the branch).

Thanks to Paul Hargrove for pointing out to me that this was still unimplemented.

llvm-svn: 223161
2014-12-02 22:01:00 +00:00
Lang Hames
8e0b335b01 [AArch64][Stackmaps] Optimize stackmap shadows on AArch64.
Reduce the number of nops emitted for stackmap shadows on AArch64 by counting
non-stackmap instructions up to the next branch target towards the requested
shadow.

<rdar://problem/14959522>

llvm-svn: 223156
2014-12-02 21:36:24 +00:00
Tom Stellard
07557193b4 R600/SI: Move more information into SIProgramInfo struct
llvm-svn: 223154
2014-12-02 21:28:53 +00:00
Matt Arsenault
0ddaaf1c85 R600: Cleanup some tests and add missing testcases
llvm-svn: 223151
2014-12-02 21:02:20 +00:00
Daniel Sanders
8a04298b22 [mips] Fix passing of small structures for big-endian O32.
Summary:
Like N32/N64, they must be passed in the upper bits of the register.

The new code could be merged with the existing if-statements but I've
refrained from doing this since it will make porting the O32 implementation
to tablegen harder later.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6463

llvm-svn: 223148
2014-12-02 20:40:27 +00:00
Roman Divacky
3adb65e8f7 Introduce CPUStringIsValid() into MCSubtargetInfo and use it for ARM .cpu parsing.
Previously .cpu directive in ARM assembler didnt switch to the new CPU and
therefore acted as a nop. This implemented real action for .cpu and eg. 
allows to assembler FreeBSD kernel with -integrated-as.

llvm-svn: 223147
2014-12-02 20:03:22 +00:00
Philip Reames
02104421ff [Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder
This is the third patch in a small series.  It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085).  The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them.  

With this change, gc.statepoints should be functionally complete.  The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now.

I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated.  The statepoint lowering code is split into it's own files and anyone not working on the statepoint support itself should be able to ignore it.  

During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straight forward, and matches closely the implementations of the patchpoint intrinsics.  Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints.  Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack.  The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases.  

In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator.  In principal, we shouldn't need to eagerly spill at all.  The register allocator should do any spilling required and the statepoint should simply record that fact.  Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stop gap measure.  

Reviewed by: atrick, ributzka

llvm-svn: 223137
2014-12-02 18:50:36 +00:00
Bruno Cardoso Lopes
fda1c647dd [SwitchLowering] Handle destinations on multiple phi instructions
Follow up from r222926. Also handle multiple destinations from merged
cases on multiple and subsequent phi instructions.

rdar://problem/19106978

llvm-svn: 223135
2014-12-02 18:31:53 +00:00
Ahmed Bougacha
3b0fbe2e87 [MachineCSE] Clear kill-flag on registers imp-def'd by the CSE'd instruction.
Go through implicit defs of CSMI and MI, and clear the kill flags on
their uses in all the instructions between CSMI and MI.
We might have made some of the kill flags redundant, consider:
  subs  ... %NZCV<imp-def>        <- CSMI
  csinc ... %NZCV<imp-use,kill>   <- this kill flag isn't valid anymore
  subs  ... %NZCV<imp-def>        <- MI, to be eliminated
  csinc ... %NZCV<imp-use,kill>
Since we eliminated MI, and reused a register imp-def'd by CSMI
(here %NZCV), that register, if it was killed before MI, should have
that kill flag removed, because it's lifetime was extended.

Also, add an exhaustive testcase for the motivating example.

Reviewed by: Juergen Ributzka <juergen@apple.com>

llvm-svn: 223133
2014-12-02 18:09:51 +00:00
Tim Northover
53d419429f AArch64: make register block rules apply to vector types too.
The blocking code originated in ARM, which is more aggressive about casting
types to a canonical representative before doing anything else, so I missed out
most vector HFAs and broke the ABI. This should fix it.

llvm-svn: 223126
2014-12-02 17:15:22 +00:00
Tom Stellard
221c239920 R600/SI: Set the ATC bit on all resource descriptors for the HSA runtime
llvm-svn: 223125
2014-12-02 17:05:41 +00:00
Bruno Cardoso Lopes
498cbaf867 [LICM] Avoind store sinking if no preheader is available
Load instructions are inserted into loop preheaders when sinking stores
and later removed if not used by the SSA updater. Avoid sinking if the
loop has no preheader and avoid crashes. This fixes one more side effect
of not handling indirectbr instructions properly on LoopSimplify.

llvm-svn: 223119
2014-12-02 14:22:34 +00:00
Asiri Rathnayake
9fa2089501 Add support for ARM modified-immediate assembly syntax.
Certain ARM instructions accept 32-bit immediate operands encoded as a 8-bit
integer value (0-255) and a 4-bit rotation (0-30, even). Current ARM assembly
syntax support in LLVM allows the decoded (32-bit) immediate to be specified
as a single immediate operand for such instructions:

mov r0, #4278190080

The ARMARM defines an extended assembly syntax allowing the encoding to be made
more explicit, as in:

mov r0, #255, #8 ; (same 32-bit value as above)

The behaviour of the two instructions can be different w.r.t flags, which is
documented under "Modified immediate constants" in ARMARM. This patch enables
support for this extended syntax at the MC layer.

llvm-svn: 223113
2014-12-02 10:53:20 +00:00
Charlie Turner
e286c5ea3a Emit Tag_ABI_FP_denormal correctly in fast-math mode.
The default ARM floating-point mode does not support IEEE 754 mode exactly. Of
relevance to this patch is that input denormals are flushed to zero. The way in
which they're flushed to zero depends on the architecture,

  * For VFPv2, it is implementation defined as to whether the sign of zero is
    preserved.
  * For VFPv3 and above, the sign of zero is always preserved when a denormal
    is flushed to zero.

When FP support has been disabled, the strategy taken by this patch is to
assume the software support will mirror the behaviour of the hardware support
for the target *if it existed*. That is, for architectures which can only have
VFPv2, it is assumed the software will flush to positive zero. For later
architectures it is assumed the software will flush to zero preserving sign.

Change-Id: Icc5928633ba222a4ba3ca8c0df44a440445865fd
llvm-svn: 223110
2014-12-02 08:22:29 +00:00
Sonam Kumari
c72e2ad442 [signext.ll] Removal Of Duplicate Test Cases
Removed the duplicate test case existing in signext.ll file.

llvm-svn: 223109
2014-12-02 05:29:47 +00:00
Hal Finkel
f84be383f3 Simplify pointer comparisons involving memory allocation functions
System memory allocation functions, which are identified at the IR level by the
noalias attribute on the return value, must return a pointer into a memory region
disjoint from any other memory accessible to the caller. We can use this
property to simplify pointer comparisons between allocated memory and local
stack addresses and the addresses of global variables. Neither the stack nor
global variables can overlap with the region used by the memory allocator.

Fixes PR21556.

llvm-svn: 223093
2014-12-01 23:38:06 +00:00
Philip Reames
cbcf6ea7d6 [Statepoints 1/4] Statepoint infrastructure for garbage collection: IR Intrinsics
The statepoint intrinsics are intended to enable precise root tracking through the compiler as to support garbage collectors of all types. The addition of the statepoint intrinsics to LLVM should have no impact on the compilation of any program which does not contain them. There are no side tables created, no extra metadata, and no inhibited optimizations.

A statepoint works by transforming a call site (or safepoint poll site) into an explicit relocation operation. It is the frontend's responsibility (or eventually the safepoint insertion pass we've developed, but that's not part of this patch series) to ensure that any live pointer to a GC object is correctly added to the statepoint and explicitly relocated. The relocated value is just a normal SSA value (as seen by the optimizer), so merges of relocated and unrelocated values are just normal phis. The explicit relocation operation, the fact the statepoint is assumed to clobber all memory, and the optimizers standard semantics ensure that the relocations flow through IR optimizations correctly.

This is the first patch in a small series.  This patch contains only the IR parts; the documentation and backend support will be following separately.  The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.

Reviewed by: atrick, ributzka

llvm-svn: 223078
2014-12-01 21:18:12 +00:00
Jingyue Wu
66d4df8027 [NVPTX] Do not emit .weak symbols for NVPTX
Summary:
".weak" symbols cannot be consumed by ptxas (PR21685). This patch makes the
weak directive in MCAsmPrinter customizable, and disables emitting ".weak"
symbols for NVPTX.

Test Plan: weak-linkage.ll

Reviewers: jholewinski

Reviewed By: jholewinski

Subscribers: majnemer, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D6455

llvm-svn: 223077
2014-12-01 21:16:17 +00:00
Reid Kleckner
1591491217 Parse 'ghccc' in .ll files as the GHC convention (cc 10)
Previously we just used "cc 10" in the .ll files, but that isn't very
human readable.

llvm-svn: 223076
2014-12-01 21:04:44 +00:00
Ahmed Bougacha
17e4b2bbd7 [AArch64] Don't combine "select (setcc i1 LHS, RHS), vL, vR".
r208210 introduced an optimization that improves the vector select
codegen by doing the setcc on vectors directly.
This is a problem they the setcc operands are i1s, because the
optimization would create vectors of i1, which aren't legal.

Part of PR21549.

Differential Revision: http://reviews.llvm.org/D6308

llvm-svn: 223075
2014-12-01 20:59:00 +00:00
Ahmed Bougacha
e6c3a5f724 [AArch64] Fix v2i8->i16 bitcast legalization.
r213378 improved f16 bitcasts, so that they go directly through subregs,
instead of through the stack.  That code now causes an assertion failure
for bitcasts from other 16-bits types (most importantly v2i8).

Correct that by doing the custom lowering for i16 bitcasts only when the
input is an f16.

Part of PR21549.

Differential Revision: http://reviews.llvm.org/D6307

llvm-svn: 223074
2014-12-01 20:52:32 +00:00
Peter Zotov
a7ec66eea2 [OCaml] Move Llvm.clone_module to its own Llvm_transform_utils module.
This way most code won't link this (substantially large) library,
if compiled statically with LLVM.

llvm-svn: 223072
2014-12-01 19:50:39 +00:00
Peter Zotov
a05f11b29b [OCaml] [cmake] Add CMake buildsystem for OCaml.
Closes PR15325.

llvm-svn: 223071
2014-12-01 19:50:23 +00:00
Ahmed Bougacha
d23753b378 [MachineVerifier] Accept a MBB with a single landing pad successor.
The MachineVerifier used to check that there was always exactly one
unconditional branch to a non-landingpad (normal) successor.
If that normal successor to an invoke BB is unreachable, it seems
reasonable to only have one successor, the landing pad.
On targets other than AArch64 (and on AArch64 with a different testcase),
the branch folder turns the branch to the landing pad into a fallthrough.
The MachineVerifier, which relies on AnalyzeBranch, is unable to check
the condition, and doesn't complain. However, it does in this specific
testcase, where the branch to the landing pad remained.
Make the MachineVerifier accept it.

llvm-svn: 223059
2014-12-01 18:43:53 +00:00
Tim Northover
6096fde320 ARM: lower tail calls correctly when using GHC calling convention.
Patch by Ben Gamari.

llvm-svn: 223055
2014-12-01 17:46:39 +00:00
Hans Wennborg
f8109e3ddf Revert r223049, r223050 and r223051 while investigating test failures.
I didn't foresee affecting the Clang test suite :/

llvm-svn: 223054
2014-12-01 17:36:43 +00:00
Hans Wennborg
75717ff698 SimplifyCFG: Omit range checks for switch lookup tables when default is unreachable
They would get optimized away later, but we might as well not emit them.

llvm-svn: 223051
2014-12-01 17:08:38 +00:00
Hans Wennborg
15017c6fe4 SimplifyCFG: don't remove unreachable default switch destinations
An unreachable default destination can be exploited by other optimizations, and
SDag lowering is now prepared to handle them efficiently.

For example, branches to the unreachable destination will be optimized away,
such as in the case of range checks for switch lookup tables.

On 64-bit Linux, this reduces the size of a clang bootstrap by 80 kB (and
Chromium by 30 kB).

llvm-svn: 223050
2014-12-01 17:08:35 +00:00
Hans Wennborg
09865c30b6 SelectionDAG switch lowering: Replace unreachable default with most popular case.
This can significantly reduce the size of the switch, allowing for more
efficient lowering.

I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-neglible binary size increase. It might be worth looking into some more.

llvm-svn: 223049
2014-12-01 17:08:32 +00:00
Rafael Espindola
0bd0e50fbc Partial revert of r222986.
The explicit set of destination types is not fully redundant when lazy loading
since the TypeFinder will not find types used only in function bodies.

This keeps the logic to drop the name of mapped types since it still helps
with avoiding further renaming.

llvm-svn: 223043
2014-12-01 16:32:20 +00:00
Vladimir Medic
7915fd41c4 The andi16, addiusp and jraddiusp micromips instructions were missing dedicated decoder methods in MipsDisassembler.cpp to properly decode immediate operands. These methods are added together with corresponding tests.
llvm-svn: 223006
2014-12-01 11:12:04 +00:00
Jay Foad
1e2a95bd74 [PowerPC] Fix unwind info with dynamic stack realignment
Summary:
PowerPC DWARF unwind info defined CFA as SP + offset even in a function
where the stack had been dynamically realigned. This clearly doesn't
work because the offset from SP to CFA is not a constant. Fix it by
defining CFA as BP instead.

This was causing the AddressSanitizer null_deref test to fail 50% of
the time, depending on whether SP happened to be 32-byte aligned on
entry to a particular function or not.

Reviewers: willschm, uweigand, hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6410

llvm-svn: 222996
2014-12-01 09:42:32 +00:00
Sonam Kumari
b1f24c8a5b Removed extra whitespace. (Testing commit access). NFC.
llvm-svn: 222994
2014-12-01 09:27:46 +00:00
Charlie Turner
8386827fe8 Add post-decode checking of HVC instruction.
Add checkDecodedInstruction for post-decode checking of instructions, to catch
the corner cases like HVC that don't fit into the general pattern. Needed to
check for an invalid condition field in instruction encoding despite HVC not
taking a predicate.

Patch by Matthew Wahab.

Change-Id: I48e28de981d7a9e43569594da3c45fb478b4f795
llvm-svn: 222992
2014-12-01 08:50:27 +00:00
Yury Gribov
cfbc9e88f4 [asan] Change dynamic alloca instrumentation to only consider allocas that are dominating all exits from function.
Reviewed in http://reviews.llvm.org/D6412

llvm-svn: 222991
2014-12-01 08:47:58 +00:00
Charlie Turner
5186769a0c Add Thumb HVC and ERET virtualisation extension instructions.
Patch by Matthew Wahab.

Change-Id: I131f71c1150d5fa797066a18e09d526c19bf9016
llvm-svn: 222990
2014-12-01 08:39:19 +00:00
Charlie Turner
b03bbf42d6 Add ARM ERET and HVC virtualisation extension instructions.
Patch by Matthew Wahab.

Change-Id: Iad75f078fbaa4ecc7d7a4820ad9b3930679cbbbb
llvm-svn: 222989
2014-12-01 08:33:28 +00:00
Akira Hatanaka
cad434d590 [stack protector] Set edge weights for newly created basic blocks.
This commit fixes a bug in stack protector pass where edge weights were not set
when new basic blocks were added to lists of successor basic blocks.

Differential Revision: http://reviews.llvm.org/D5766

llvm-svn: 222987
2014-12-01 04:27:03 +00:00
Rafael Espindola
0376bacc1a Change how we keep track of which types are in the dest module.
Instead of keeping an explicit set, just drop the names of types we choose
to map to some other type.

This has the advantage that the name of the unused will not cause the context
to rename types on module read.

llvm-svn: 222986
2014-12-01 04:15:59 +00:00
Rafael Espindola
85fbf2f916 Add a test showing what the linker IdentifiedStructTypes is for.
Without this it could just be deleted and all tests would pass.

llvm-svn: 222985
2014-12-01 03:20:57 +00:00
Rafael Espindola
c79482b0b3 Relax an assert a bit to avoid a crash on unreachable code.
Patch by Duncan Exon Smith with a small tweak by me.

llvm-svn: 222984
2014-12-01 02:55:24 +00:00
Hal Finkel
efd08ad91f [PowerPC] Add asm support for cache-inhibited ld/st instructions
Add assembler support for the fixed-point cache-inhibited load/store
instructions. These are hypervisor-level only, so don't get too excited ;)

Fixes PR21650.

llvm-svn: 222976
2014-11-30 10:15:56 +00:00
Hans Wennborg
22fda335d6 Switch lowering: Fix broken 'Figure out which block is next' code
This doesn't seem to have worked in a long time, but other optimizations
would clean it up.

llvm-svn: 222961
2014-11-29 21:17:05 +00:00
Jozef Kolek
e3a600e129 [mips][microMIPS] Implement NOP aliases
This patch implements microMIPS 16-bit (MOVE16 $0, $0) and
32-bit (SLL $0, $0, 0) NOP aliases.

http://reviews.llvm.org/D6440

llvm-svn: 222953
2014-11-29 13:29:24 +00:00
Duncan P. N. Exon Smith
57cead164b DebugIR: Delete -debug-ir
llvm-svn: 222945
2014-11-29 03:15:47 +00:00
Matt Arsenault
5c2fb0e261 R600/SI: Fix assertion on sign extend of 3 vectors
This was trying to create an MVT with 3x vectors which
created an invalid EVT

llvm-svn: 222942
2014-11-28 22:51:38 +00:00
Duncan P. N. Exon Smith
73ce6dbb2b Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

llvm-svn: 222936
2014-11-28 21:29:14 +00:00
David Majnemer
87a17a0975 InstCombine: FoldOrOfICmps harder
We may be in a situation where the icmps might not be near each other in
a tree of or instructions.  Try to dig out related compare instructions
and see if they combine.

N.B.  This won't fire on deep trees of compares because rewritting the
tree might end up creating a net increase of IR.  We may have to resort
to something more sophisticated if this is a real problem.

llvm-svn: 222928
2014-11-28 19:58:29 +00:00
Bruno Cardoso Lopes
a072cee758 [LICM] Store sink and indirectbr instructions
Loop simplify skips exit-block insertion when exits contain indirectbr
instructions. This leads to an assertion in LICM when trying to sink
stores out of non-dedicated loop exits containing indirectbr
instructions. This patch fix this issue by re-checking for dedicated
exits in LICM prior to store sink attempts.

Differential Revision: http://reviews.llvm.org/D6414

rdar://problem/18943047

llvm-svn: 222927
2014-11-28 19:47:46 +00:00
Bruno Cardoso Lopes
02319ec9b1 [SwitchLowering] Handle multiple destinations on condensed case stmts
Switch cases statements with sequential values that branch to the same
destination BB may often be handled together in a single new source BB.
In this scenario we need to remove remaining incoming values from PHI
instructions in the destination BB, as to match the number of source
branches.

Differential Revision: http://reviews.llvm.org/D6415

rdar://problem/19040894

llvm-svn: 222926
2014-11-28 19:47:33 +00:00
Sanjay Patel
019be40c8c Enable FeatureFastUAMem for btver2
Allow unaligned 16-byte memop codegen for btver2. No functional changes for any other subtargets.

Replace the existing supposed small memcpy test with an actual test of a small memcpy. 
The previous test wasn't using FileCheck either.

This patch should allow us to close PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6360

llvm-svn: 222925
2014-11-28 18:40:18 +00:00
Rafael Espindola
880240ae91 Add back r222727 with a fix.
The original patch would fail when:

* A dst opaque type (%A) is matched with a src type (%A).
* A src opaque (%E) type is then speculatively matched with %A and the
  speculation fails afterward.
* When rolling back the speculation we would cancel the source %A to dest
  %A mapping.

The fix is to keep an explicit list of which resolutions are speculative.

Original message:

Fix overly aggressive type merging.

If we find out that two types are *not* isomorphic, we learn nothing about
opaque sub types in both the source and destination.

llvm-svn: 222923
2014-11-28 16:41:24 +00:00
Rafael Espindola
3962e8e2eb Add a testcase reduced from clang lto bootstrap on OS X.
llvm-svn: 222921
2014-11-28 15:45:31 +00:00
Charlie Turner
dc3f8f7176 Fix wrong encoding of MRSBanked.
Patch by Matthew Wahab.

Change-Id: Ia2a001ca2760028ea360fe77b56f203a219eefbc
llvm-svn: 222920
2014-11-28 15:01:06 +00:00
Evgeniy Stepanov
118b7804cb [msan] Fix origin propagation for select of floats.
MSan does not assign origin for instrumentation temps (i.e. the ones that do
not come from the application code), but "select" instrumentation erroneously
tried to use one of those.

https://code.google.com/p/memory-sanitizer/issues/detail?id=78

llvm-svn: 222918
2014-11-28 11:17:58 +00:00
Charlie Turner
0795093430 Test all <build attribute, value> pairs.
Add more tests to make sure the encoding/decoding of build attributes works
correctly for all permissible values of build attributes. For cases where there
are an infinite number of such values, a representative subset has been settled
for.

Change-Id: I2643c9624c211b2d56405306e16eec2d487bc5d6
llvm-svn: 222917
2014-11-28 11:14:47 +00:00
Tim Northover
e8b34aaff0 AArch64: treat [N x Ty] as a block during procedure calls.
The AAPCS treats small structs and homogeneous floating (or vector) aggregates
specially, and guarantees they either get passed as a contiguous block of
registers, or prevent any future use of those registers and get passed on the
stack.

This concept can fit quite neatly into LLVM's own type system, mapping an HFA
to [N x float] and so on, and small structs to [N x i64]. Doing so allows
front-ends to emit AAPCS compliant code without having to duplicate the
register counting logic.

llvm-svn: 222903
2014-11-27 21:02:42 +00:00
Zoran Jovanovic
15712f82b0 [mips][microMIPS] Implement SWM16 and LWM16 instructions
Differential Revision: http://reviews.llvm.org/D5579

llvm-svn: 222901
2014-11-27 18:28:59 +00:00
Jozef Kolek
c85a4a5656 [mips][microMIPS] Implement BREAK16 and SDBBP16 instructions
Patch by Radovan Obradovic.

Differential Revision: http://reviews.llvm.org/D5048

llvm-svn: 222900
2014-11-27 18:18:42 +00:00
Daniel Sanders
d7d5d4cdb1 [mips] Add synci instruction.
Patch by Amaury Pouly

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6421

llvm-svn: 222899
2014-11-27 17:28:10 +00:00
Will Newton
a200eb1b54 Widen ELFYAML relocation type to 32 bits
The current 8 bits is sufficient for ELF32 targets but ELF64 requires
32 bits. Add a test for AArch64 that exposes the issue.

llvm-svn: 222898
2014-11-27 17:20:48 +00:00
Rafael Espindola
fe0669d419 Commit back the correct bits of r222760 (was r222538).
I also added a test.

Original message:

Allow FDE references outside the +/-2GB range supported by PC relative
offsets for code models other than small/medium. For JIT application,
memory layout is less controlled and can result in truncations
otherwise.

Patch from Akos Kiss.

Differential Revision: http://reviews.llvm.org/D6079

llvm-svn: 222897
2014-11-27 17:13:56 +00:00
Rafael Espindola
008818ce29 Revert "Reapply 222538 and update tests to explicitly request small code model and PIC:"
This reverts commit r222760.

It changed our behaviour on PIC so we don't match gas anymore. It also
included lots of unnecessary changes to tests.

If those changes are desirable, there should be an independent discussion
as they are out of scope for that patch.

I will recommit the other bits.

llvm-svn: 222896
2014-11-27 17:13:51 +00:00
Duncan P. N. Exon Smith
bcb91dbf3f Revert "Fix overly aggressive type merging."
This reverts commit r222727, which causes LTO bootstrap failures.

Last passing @ r222698:
http://lab.llvm.org:8080/green/job/clang-Rlto_master_build/532/

First failing @ r222843:
http://lab.llvm.org:8080/green/job/clang-Rlto_master_build/533/

Internal bootstraps pointed at a much narrower range: r222725 is
passing, and r222731 is failing.

LTO crashes while handling libclang.dylib:
http://lab.llvm.org:8080/green/job/clang-Rlto_master_build/533/consoleFull#-158682280549ba4694-19c4-4d7e-bec5-911270d8a58c

    GEP is not of right type for indices!
      %InfoObj.i.i = getelementptr inbounds %"class.llvm::OnDiskIterableChainedHashTable"* %.lcssa, i64 0, i32 0, i32 4, !dbg !123627
     %"class.clang::serialization::reader::ASTIdentifierLookupTrait" = type { %"class.clang::ASTReader.31859"*, %"class.clang::serialization::ModuleFile.31870"*, %"class.clang::IdentifierInfo"* }LLVM ERROR: Broken function found, compilation aborted!
    clang: error: linker command failed with exit code 1 (use -v to see invocation)

Looks like the new algorithm doesn't merge types aggressively enough.

llvm-svn: 222895
2014-11-27 17:01:10 +00:00
Erik Eckstein
b81721adb5 reinstate r222872: Peephole optimization in switch table lookup: reuse the guarding table comparison if possible.
Fixed missing dominance check.
Original commit message:

This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
   if (idx < tablesize)
      r = table[idx]; // table does not contain default_value
   else
      r = default_value;
   if (r != default_value)
      ...
Is optimized to:
   cond = idx < tablesize;
   if (cond)
      r = table[idx];
   else
      r = default_value;
   if (cond)
      ...
Jump threading will then eliminate the second if(cond).

llvm-svn: 222891
2014-11-27 15:13:14 +00:00
Evgeniy Stepanov
a93bac024f [msan] Remove indirect call wrapping code.
This functionality was only used in MSanDR, which is deprecated.

llvm-svn: 222889
2014-11-27 14:54:02 +00:00
Jozef Kolek
90243462d4 [mips][microMIPS] Implement disassembler support for 16-bit instructions LI16, ADDIUR1SP, ADDIUR2 and ADDIUS5
Differential Revision: http://reviews.llvm.org/D6419

llvm-svn: 222887
2014-11-27 14:41:44 +00:00
Charlie Turner
e4a6c7fd89 Stop uppercasing build attribute data.
The string data for string-valued build attributes were being unconditionally
uppercased. There is no mention in the ARM ABI addenda about case conventions,
so it's technically implementation defined as to whether the data are
capitialised in some way or not. However, there are good reasons not to
captialise the data.

  * It's less work.
  * Some vendors may legitimately have case-sensitive checks for these
    attributes which would fail on LLVM generated object files.
  * There could be locale issues with uppercasing.

The original reasons for uppercasing appear to have stemmed from an
old codesourcery toolchain behaviour, see

http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/87133

This patch makes the object file emitted no longer captialise string
data, it encodes as seen in the assembly source.

Change-Id: Ibe20dd6e60d2773d57ff72a78470839033aa5538
llvm-svn: 222882
2014-11-27 12:13:56 +00:00
Suyog Sarda
375e0af627 Use FileCheck instead of grep. Change by Ankur Garg.
Differential Revision: http://reviews.llvm.org/D6430

llvm-svn: 222879
2014-11-27 11:22:49 +00:00
Erik Eckstein
6a5635f3ac Revert "Peephole optimization in switch table lookup: reuse the guarding table comparison if possible."
It is breaking the clang bootstrag.

llvm-svn: 222877
2014-11-27 10:59:08 +00:00
Suyog Sarda
a17e6ed740 Use FileCheck instead of grep. Change by Sonam.
Differential Revision: http://reviews.llvm.org/D6432

llvm-svn: 222876
2014-11-27 10:57:24 +00:00
Erik Eckstein
15d20044f2 Peephole optimization in switch table lookup: reuse the guarding table comparison if possible.
This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
    if (idx < tablesize)
       r = table[idx]; // table does not contain default_value
    else
       r = default_value;
    if (r != default_value)
       ...
Is optimized to:
    cond = idx < tablesize;
    if (cond)
       r = table[idx];
    else
       r = default_value;
    if (cond)
       ...
\endcode
Jump threading will then eliminate the second if(cond).

llvm-svn: 222872
2014-11-27 08:33:51 +00:00
David Majnemer
9698d3487d InstCombine: Restore optimizations lost in r210006
This restores our ability to optimize:
(X & C) == 0 ? X ^ C : X  into  X | C
(X & C) != 0 ? X ^ C : X  into  X & ~C

llvm-svn: 222871
2014-11-27 07:25:21 +00:00
David Majnemer
d9ae958b9b InstSimplify: Restore optimizations lost in r210006
This restores our ability to optimize:
(X & C) ? X & ~C : X  into  X & ~C
(X & C) ? X : X & ~C  into  X
(X & C) ? X | C : X  into  X
(X & C) ? X : X | C  into  X | C

llvm-svn: 222868
2014-11-27 06:32:46 +00:00
David Majnemer
7e9b94486e Revert "Added inst combine transforms for single bit tests from Chris's note"
This reverts commit r210006, it miscompiled libapr which is used in who
knows how many projects.

A test has been added to ensure that we don't regress again.

I'll work on a rewrite of what the optimization was trying to do later.

llvm-svn: 222856
2014-11-26 23:00:38 +00:00
Colin LeMahieu
0872710917 [Hexagon] Adding cmp* immediate form instructions.
llvm-svn: 222849
2014-11-26 19:43:12 +00:00
Jozef Kolek
ecfa20e7f7 [mips][microMIPS] Implement disassembler support for 16-bit instructions LBU16, LHU16, LW16, SB16, SH16 and SW16
Differential Revision: http://reviews.llvm.org/D6405

llvm-svn: 222847
2014-11-26 18:56:38 +00:00
Colin LeMahieu
1f8cdbb85d [Hexagon] Adding and64, or64, and xor64 instructions.
llvm-svn: 222846
2014-11-26 18:55:59 +00:00
Will Newton
da953f13b9 Update AArch64 ELF relocations to ABI 1.0
This mostly entails adding relocations, however there are a couple of
changes to existing relocations:

1. R_AARCH64_NONE is defined to be zero rather than 256

R_AARCH64_NONE has been defined to be zero for a long time elsewhere
e.g. binutils and glibc since the submission of the AArch64 port in
2012 so this is required for compatibility.

2. R_AARCH64_TLSDESC_ADR_PAGE renamed to R_AARCH64_TLSDESC_ADR_PAGE21

I don't think there is any way for relocation names to leak out of LLVM
so this should not break anything.

Tested with check-all with no regressions.

llvm-svn: 222821
2014-11-26 10:49:18 +00:00
Elena Demikhovsky
868b76ae69 AVX-512: Scalar ERI intrinsics
including SAE mode and memory operand.
Added AVX512_maskable_scalar template, that should cover all scalar instructions in the future.

The main difference between AVX512_maskable_scalar<> and AVX512_maskable<> is using X86select instead of vselect.
I need it, because I can't create vselect node for MVT::i1 mask for scalar instruction.

http://reviews.llvm.org/D6378

llvm-svn: 222820
2014-11-26 10:46:49 +00:00
Will Newton
7576aeb371 Update ARM ELF relocations to ABI 2.09
Add R_ARM_IRELATIVE.

llvm-svn: 222817
2014-11-26 10:36:03 +00:00
Simon Pilgrim
0e1c44b939 [X86][SSE] Improvements to byte shift shuffle matching
Since (v)pslldq / (v)psrldq instructions resolve to a single input argument it is useful to match it much earlier than we currently do - this prevents more complicated shuffles (notably insertion into a zero vector) matching before it.

Differential Revision: http://reviews.llvm.org/D6409

llvm-svn: 222796
2014-11-25 22:34:59 +00:00
Colin LeMahieu
535f692186 [Hexagon] Adding add64 and sub64 instructions.
llvm-svn: 222795
2014-11-25 22:15:44 +00:00
Colin LeMahieu
732a0febbc Reverting 222792
llvm-svn: 222793
2014-11-25 21:39:57 +00:00
Colin LeMahieu
bf565f821c [Hexagon] Adding compare with immediate instructions.
llvm-svn: 222792
2014-11-25 21:30:28 +00:00
Rafael Espindola
3e2785424e This test requires asserts because of -stats.
Sorry about that.

llvm-svn: 222788
2014-11-25 20:56:56 +00:00
Rafael Espindola
4252588d37 gold plugin: call llvm_shutdown so that -stats works.
llvm-svn: 222787
2014-11-25 20:52:49 +00:00
Cameron McInally
c32dadfa69 [AVX512] Add 512b integer shift by variable intrinsics and patterns.
llvm-svn: 222786
2014-11-25 20:41:51 +00:00
Colin LeMahieu
4ed8312072 [Hexagon] [NFC] Adding trailing whitespace to test files.
llvm-svn: 222785
2014-11-25 20:22:24 +00:00
Colin LeMahieu
9b0719975d [Hexagon] Adding C2_mux instruction.
llvm-svn: 222784
2014-11-25 20:20:09 +00:00
Hans Wennborg
24f12c7a0b Remove useless rdar:// comment from switch_to_lookup_table.ll test.
llvm-svn: 222772
2014-11-25 18:45:23 +00:00
Colin LeMahieu
4a42d4abd9 [Hexagon] Replacing cmp* instructions with ones that contain encoding bits.
llvm-svn: 222771
2014-11-25 18:20:52 +00:00
Hans Wennborg
333795bf71 LazyValueInfo: Actually re-visit partially solved block-values in solveBlockValue()
If solveBlockValue() needs results from predecessors that are not already
computed, it returns false with the intention of resuming when the dependencies
have been resolved. However, the computation would never be resumed since an
'overdefined' result had been placed in the cache, preventing any further
computation.

The point of placing the 'overdefined' result in the cache seems to have been
to break cycles, but we can check for that when inserting work items in the
BlockValue stack instead. This makes the "stop and resume" mechanism of
solveBlockValue() work as intended, unlocking more analysis.

Using this patch shaves 120 KB off a 64-bit Chromium build on Linux.

I benchmarked compiling bzip2.c at -O2 but couldn't measure any difference in
compile time.

Tests by Jiangning Liu from r215343 / PR21238, Pete Cooper, and me.

Differential Revision: http://reviews.llvm.org/D6397

llvm-svn: 222768
2014-11-25 17:23:05 +00:00
Joerg Sonnenberger
0740338c61 Small model and JIT generally don't go well with each other.
On LP64 platforms, it will work or not depending on the choosen memory
layout, so neither PASS nor XFAIL is appropiate.
As UNSUPPORTED as per-test target doesn't exist (yet), remove the test
instead to unbreak the builds.

llvm-svn: 222767
2014-11-25 17:14:22 +00:00
Rafael Espindola
6791ceb3c6 Set the body of a new struct as soon as it is created.
This changes the order in which different types are passed to get, but
one order is not inherently better than the other.

The main motivation is that this simplifies linkDefinedTypeBodies now that
it is only linking "real" opaque types. It is also means that we only have to
call it once and that we don't need getImpl.

A small change in behavior is that we don't copy type names when resolving
opaque types. This is an improvement IMHO, but it can be added back if
desired. A test is included with the new behavior.

llvm-svn: 222764
2014-11-25 15:33:40 +00:00
Joerg Sonnenberger
829c958371 Reapply 222538 and update tests to explicitly request small code model
and PIC:

Allow FDE references outside the +/-2GB range supported by PC relative
offsets for code models other than small/medium. For JIT application,
memory layout is less controlled and can result in truncations
otherwise.

Patch from Akos Kiss.

Differential Revision: http://reviews.llvm.org/D6079

llvm-svn: 222760
2014-11-25 13:37:55 +00:00
Joerg Sonnenberger
6dd10513e3 Mark as explicit failing on x86-64 -- small memory model doesn't agree
with default address selections.

llvm-svn: 222759
2014-11-25 13:28:56 +00:00
Zoran Jovanovic
c3664f6f8a [mips][micromips] Use call instructions with short delay slots
Differential Revision: http://reviews.llvm.org/D6338

llvm-svn: 222752
2014-11-25 10:50:00 +00:00
Chandler Carruth
6264bc6537 [InstCombine] Change LLVM To canonicalize toward the value type being
stored rather than the pointer type.

This change is analogous to r220138 which changed the canonicalization
for loads. The rationale is the same: memory does not have a type,
operations (and thus the values they produce) have a type. We should
match that type as closely as possible rather than reading some form of
semantics into the pointer type.

With this change, loads and stores should no longer be made with
nonsensical types for the values that tehy load and store. This is
particularly important when trying to match specific loaded and stored
types in the process of doing other instcombines, which is what led me
down this twisty maze of miscanonicalization.

I've put quite some effort into looking through IR to find places where
LLVM's optimizer was being unreasonably conservative in the face of
mismatched load and store types, however it is possible (let's say,
likely!) I have missed some. If you see regressions here, or from
r220138, the likely cause is some part of LLVM failing to cope with load
and store types differing. Test cases appreciated, it is important that
we root all of these out of LLVM.

llvm-svn: 222748
2014-11-25 10:09:51 +00:00
Suyog Sarda
cbd1299080 Change the test case file to use FileCheck instead of grep. NFC.
Change by Ankur Garg.

Differential Revision: http://reviews.llvm.org/D6382

llvm-svn: 222740
2014-11-25 08:44:56 +00:00
Chandler Carruth
7feb19d89c Revert r220349 to re-instate r220277 with a fix for PR21330 -- quite
clearly only exactly equal width ptrtoint and inttoptr casts are no-op
casts, it says so right there in the langref. Make the code agree.

Original log from r220277:
Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.

To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.

These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.

I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.

llvm-svn: 222739
2014-11-25 08:20:27 +00:00
David Majnemer
a4a80e242b Forgot to add a file for r222734
llvm-svn: 222736
2014-11-25 07:45:56 +00:00
David Majnemer
705e1231b2 COFF: Add another test for r222124
llvm-svn: 222734
2014-11-25 07:42:36 +00:00
Rafael Espindola
34bccf8e6c Fix overly aggressive type merging.
If we find out that two types are *not* isomorphic, we learn nothing about
opaque sub types in both the source and destination.

llvm-svn: 222727
2014-11-25 05:59:24 +00:00
Simon Atanasyan
b1231365ea [Object][Mips] Return address of MIPS symbol with cleared microMIPS indicator bit
llvm-svn: 222726
2014-11-25 05:57:55 +00:00
Rafael Espindola
906b92c9d0 Link the type of aliases.
They are not more or less "well typed" than GlobalVariables.

llvm-svn: 222725
2014-11-25 04:43:59 +00:00
Juergen Ributzka
c90ddb75a2 [FastISel][AArch64] Fix and extend the tbz/tbnz pattern matching.
The pattern matching failed to recognize all instances of "-1", because when
comparing against "-1" we didn't use an APInt of the same bitwidth.

This commit fixes this and also adds inverse versions of the conditon to catch
more cases.

llvm-svn: 222722
2014-11-25 04:16:15 +00:00
Rafael Espindola
690ff4c264 Add an interesting test that we already get right. NFC.
llvm-svn: 222720
2014-11-25 03:47:57 +00:00
David Majnemer
4e5c0f46f5 InstSimplify: Handle some simple tautological comparisons
This handles cases where we are comparing a masked value against itself.
The analysis could be further improved by making it recursive but such
expense is not currently justified.

llvm-svn: 222716
2014-11-25 02:55:48 +00:00
Hal Finkel
d19aa4cdf8 [PowerPC] Add the 'attn' instruction
The attn instruction is not part of the Power ISA, but is documented in the A2
user manual, and is accepted by the GNU assembler for the A2 and the POWER4+.
Reported as part of PR21650.

llvm-svn: 222712
2014-11-25 00:30:11 +00:00
Hal Finkel
515f6e50f5 [PowerPC] Implement combineRepeatedFPDivisors
This does not matter on newer cores (where we can use reciprocal estimates in
fast-math mode anyway), but for older cores this allows us to generate better
fast-math code where we have multiple FDIVs with a common divisor.

llvm-svn: 222710
2014-11-24 23:45:21 +00:00
Matt Arsenault
454e837bd2 Bug 21610: Canonicalize min/max fcmp selects to use ordered comparisons
llvm-svn: 222705
2014-11-24 23:15:18 +00:00
Matt Arsenault
ae1a2b7c22 Convert test to FileCheck and use CHECK-LABEL
llvm-svn: 222704
2014-11-24 23:03:17 +00:00
Rafael Espindola
9584ebd5dc Add a disable-output option to the gold plugin.
This corresponds to the opt option and is handy for profiling.

llvm-svn: 222687
2014-11-24 21:18:14 +00:00
Rafael Espindola
c8934bcfb9 Pass the .ll files to llvm-link directly. NFC.
llvm-svn: 222681
2014-11-24 20:35:59 +00:00
Kostya Serebryany
cb8f8175c9 [asan/coverage] change the way asan coverage instrumentation is done: instead of setting the guard to 1 in the generated code, pass the pointer to guard to __sanitizer_cov and set it there. No user-visible functionality change expected
llvm-svn: 222675
2014-11-24 18:49:53 +00:00
Ulrich Weigand
24b899d017 [PowerPC] Fix PR 21652 - copy st_other bits on symbol assignment
When processing an assignment in the integrated assembler that sets
a symbol to the value of another symbol, we need to copy the st_other
bits that encode the local entry point offset.

Modeled after MipsTargetELFStreamer::emitAssignment handling of the
ELF::STO_MIPS_MICROMIPS flag.

llvm-svn: 222672
2014-11-24 18:09:47 +00:00
Colin LeMahieu
1dc5a7ca7c [Hexagon] Adding asrh instruction, removing unused multiclasses.
llvm-svn: 222670
2014-11-24 18:04:42 +00:00
Colin LeMahieu
b3868ebb81 [Hexagon] Adding aslh instruction.
llvm-svn: 222668
2014-11-24 17:44:19 +00:00
Colin LeMahieu
e1cd9ff6b5 [Hexagon] Adding zxth instruction.
llvm-svn: 222662
2014-11-24 17:11:34 +00:00
Colin LeMahieu
80e59674e9 [Hexagon] Adding zxtb instruction.
llvm-svn: 222660
2014-11-24 16:48:43 +00:00
David Majnemer
291966cd3b InstCombine: Don't create an unused instruction
We would create an instruction but not inserting it.
Not inserting the unused instruction would lead us to verification
failure.

This fixes PR21653.

llvm-svn: 222659
2014-11-24 16:41:13 +00:00
Jozef Kolek
a4e87d7a74 [mips][microMIPS] Fix JRADDIUSP instruction
Fix JRADDIUSP instruction, remove delay slot flag because this instruction
doesn't have delay slot.

Differential Revision: http://reviews.llvm.org/D6365

llvm-svn: 222658
2014-11-24 16:14:10 +00:00
Jozef Kolek
dd0dbf282b [mips][microMIPS] Implement LBU16, LHU16, LW16, SB16, SH16 and SW16 instructions
Differential Revision: http://reviews.llvm.org/D5122

llvm-svn: 222653
2014-11-24 14:39:13 +00:00
Jozef Kolek
0d926f8fc9 [mips][microMIPS] Implement disassembler support for 16-bit instructions
With the help of new method readInstruction16() two bytes are read and
decodeInstruction() is called with DecoderTableMicroMips16, if this fails
four bytes are read and decodeInstruction() is called with
DecoderTableMicroMips32.

Differential Revision: http://reviews.llvm.org/D6149

llvm-svn: 222648
2014-11-24 13:29:59 +00:00
Andrea Di Biagio
3646b17160 [X86] Improved target specific combine on VSELECT dag nodes.
This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to
convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1.
On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd.

Also, removed a target specific combine that performed a premature lowering of
VSELECT nodes to target specific MOVSS/MOVSD nodes.

llvm-svn: 222647
2014-11-24 12:23:15 +00:00
David Majnemer
2445c6caf5 InstCombine: Don't assume DataLayout is always available
We tried to get the result of DataLayout::getLargestLegalIntTypeSize but
we didn't have a DataLayout.  This resulted in opt crashing.

This fixes PR21651.

llvm-svn: 222645
2014-11-24 07:26:20 +00:00
Michael Kuperstein
b12b19a24a [X86] Fixes bug in build_vector v4x32 lowering
r222375 made some improvements to build_vector lowering of v4x32 and v4xf32 into an insertps, but it missed a case where:

1. A single extracted element is used twice.
2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask.

This caused a crash, since the source value for the insertps ends-up uninitialized.

Differential Revision: http://reviews.llvm.org/D6377

llvm-svn: 222635
2014-11-23 13:09:06 +00:00
Elena Demikhovsky
36a2243ab7 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 222632
2014-11-23 08:07:43 +00:00
Matt Arsenault
1b03538afe R600: Fix extloads of i1 on R600/Evergreen
llvm-svn: 222631
2014-11-23 02:57:54 +00:00
Matt Arsenault
5edd328299 R600/SI: Add additional tests for i1 loads
llvm-svn: 222629
2014-11-23 02:57:50 +00:00
Matt Arsenault
1d9ec67967 R600/SI: Fix broken check lines and modernize prefixes
Use -LABEL and remove -CHECK

llvm-svn: 222628
2014-11-23 02:57:49 +00:00
Matt Arsenault
a30225b261 R600/SI: Fix missing -verify-machineinstrs on a test
llvm-svn: 222627
2014-11-23 02:57:47 +00:00
David Majnemer
0b413925f3 InstCombine: Propagate exact for (sdiv X, Pow2) -> (udiv X, Pow2)
llvm-svn: 222625
2014-11-22 20:00:41 +00:00
David Majnemer
ba33e07fad InstCombine: Propagate exact for (sdiv X, Y) -> (udiv X, Y)
llvm-svn: 222624
2014-11-22 20:00:38 +00:00
David Majnemer
e3d9e29780 InstCombine: Propagate exact for (sdiv -X, C) -> (sdiv X, -C)
llvm-svn: 222623
2014-11-22 20:00:34 +00:00
David Majnemer
23e1540ef9 InstCombine: Propagate exact in (udiv (lshr X,C1),C2) -> (udiv x,C1<<C2)
llvm-svn: 222620
2014-11-22 18:16:54 +00:00
David Majnemer
1847177b9b InstCombine: Propagate NSW/NUW for X*(1<<Y) -> X<<Y
llvm-svn: 222613
2014-11-22 08:57:02 +00:00
David Majnemer
3c7153d5d6 InstCombine: Propagate NSW for -X * -Y -> X * Y
llvm-svn: 222612
2014-11-22 07:25:19 +00:00
David Majnemer
26583aff1f InstSimplify: Simplify (sub 0, X) -> X if it's NUW
This is a generalization of the X - (0 - Y) -> X transform.

llvm-svn: 222611
2014-11-22 07:15:16 +00:00
Chandler Carruth
4c0a2a8001 [x86] Add some tests for a common unpack pattern of vector shuffle that
has a remarkably unique and efficient lowering.

While we get this some of the time already, we miss a few cases and
there wasn't a principled reason we got it. We should at least test
this. v8 already has tests for this pattern.

llvm-svn: 222607
2014-11-22 05:44:43 +00:00
David Majnemer
c405b87f53 InstCombine: Preserve nsw when folding X*(2^C) -> X << C
llvm-svn: 222606
2014-11-22 04:52:55 +00:00
David Majnemer
96d9c67b69 InstCombine: Preserve nsw/nuw for ((X << C2)*C1) -> (X * (C1 << C2))
llvm-svn: 222605
2014-11-22 04:52:52 +00:00
David Majnemer
6191590b23 InstCombine: Preserve nsw for (mul %V, -1) -> (sub 0, %V)
llvm-svn: 222604
2014-11-22 04:52:38 +00:00
Gerolf Hoflehner
cb87bd4853 [InstCombine] Re-commit of r218721 (Optimize icmp-select-icmp sequence)
Fixes the self-host fail. Note that this commit activates dominator
analysis in the combiner by default (like the original commit did).

llvm-svn: 222590
2014-11-21 23:36:44 +00:00
Joerg Sonnenberger
00a4fe60d0 Fix transformation of add with pc argument to adr for non-immediate
arguments.

llvm-svn: 222587
2014-11-21 22:39:34 +00:00
Kostya Serebryany
ec6bd28ded [asan] remove old experimental code
llvm-svn: 222586
2014-11-21 22:34:29 +00:00
Tom Stellard
3929e69978 R600/SI: Add a failing test case for offset order in ds_read2 instructions
llvm-svn: 222585
2014-11-21 22:31:47 +00:00
Tom Stellard
a112fe4e40 R600/SI: Emit s_mov_b32 m0, -1 before every DS instruction
This s_mov_b32 will write to a virtual register from the M0Reg
class and all the ds instructions now take an extra M0Reg explicit
argument.

This change is necessary to prevent issues with the scheduler
mixing together instructions that expect different values in the m0
registers.

llvm-svn: 222583
2014-11-21 22:31:44 +00:00
Tom Stellard
484f10138e R600/SI: Add SIFoldOperands pass
This pass attempts to fold the source operands of mov and copy
instructions into their uses.

llvm-svn: 222581
2014-11-21 22:06:37 +00:00
Jozef Kolek
52fa965cf8 [mips][microMIPS] This patch implements functionality in MIPS delay slot
filler such as if delay slot filler have to put NOP instruction into the
delay slot of microMIPS BEQ or BNE instruction which uses the register $0,
then instead of emitting NOP this instruction is replaced by the corresponding
microMIPS compact branch instruction, i.e. BEQZC or BNEZC.

Differential Revision: http://reviews.llvm.org/D3566

llvm-svn: 222580
2014-11-21 22:04:35 +00:00
Tom Stellard
267a21d6d7 R600/SI: Use hex notation for constant in test
llvm-svn: 222578
2014-11-21 22:00:13 +00:00
Colin LeMahieu
4986bc53c5 [Hexagon] Adding sxth instruction.
llvm-svn: 222577
2014-11-21 21:54:59 +00:00
Colin LeMahieu
9a7b747bf6 [Hexagon] Adding sxtb instruction. Renaming some identically named classes that will be removed after converting referencing defs.
llvm-svn: 222575
2014-11-21 21:35:52 +00:00
Manman Ren
8be1069f3f Debug Info: revert r222195, r222210 and r222239.
This is no longer needed after David's fix at r222377 + r222485.
rdar://18958417

llvm-svn: 222563
2014-11-21 19:55:23 +00:00
Sanjay Patel
776e5485fb Add a feature flag for slow 32-byte unaligned memory accesses [x86].
This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for 
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.

Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6355

llvm-svn: 222544
2014-11-21 17:40:04 +00:00
Chandler Carruth
5e598c0342 [x86] Restructure the checking patterns for v16 and v32 avx2 vector
shuffle lowering to allow much better blend matching.

Specifically, with the new structure the code seems clearer to me and we
correctly can hit the cases where merging two 128-bit lanes is a clear
win and can be shuffled cheaply afterward.

llvm-svn: 222539
2014-11-21 14:53:03 +00:00
Chandler Carruth
7491f1f32f [x86] Make the previous logic significantly less conservative and get
a bunch more improvements.

Non-lane-crossing is fine, the key is that lane merging only makes sense
for single-input shuffles. Not sure why I got so turned around here. The
code all works, I was just using the wrong model for it.

This only updates v4 and v8 lowering. The v16 and v32 lowering requires
restructuring the entire check sequence.

llvm-svn: 222537
2014-11-21 14:33:24 +00:00
Andrea Di Biagio
0a8cf1ad5a [DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero.
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.

This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.

llvm-svn: 222536
2014-11-21 14:32:06 +00:00
Chandler Carruth
8387bec088 [x86] Teach the x86 vector shuffle lowering to detect mergable 128-bit
lanes.

By special casing these we can often either reduce the total number of
shuffles significantly or reduce the number of (high latency on Haswell)
AVX2 shuffles that potentially cross 128-bit lanes. Even when these
don't actually cross lanes, they have much higher latency to support
that. Doing two of them and a blend is worse than doing a single insert
across the 128-bit lanes to blend and then doing a single interleaved
shuffle.

While this seems like a narrow case, it kept cropping up on me and the
difference is *huge* as you can see in many of the test cases. I first
hit this trying to perfectly fix the interleaving shuffle patterns used
by Halide for AVX2.

llvm-svn: 222533
2014-11-21 13:56:05 +00:00
Chandler Carruth
5646862a2e [x86] Remove more windows line endings that slipped into this file...
llvm-svn: 222528
2014-11-21 12:33:46 +00:00
Chandler Carruth
2db7c4cf32 [x86] Add a bunch of test cases to 256-bit shuffles that exercise
merging 128-bit subvectors and also shuffling all the elements of those
subvectors. Currently we generate pretty bad code for many of these, but
I'm testing a patch that should dramatically improve this in addition to
making the shuffle lowering robust to other changes.

llvm-svn: 222525
2014-11-21 12:17:50 +00:00
Alexey Volkov
235268b4ed [X86] For Silvermont CPU use 16-bit division instead of 64-bit for small positive numbers
Differential Revision: http://reviews.llvm.org/D5938

llvm-svn: 222521
2014-11-21 11:19:34 +00:00
Yury Gribov
cb671c0b2c [asan] Add new hidden compile-time flag asan-instrument-allocas to sanitize variable-sized dynamic allocas. Patch by Max Ostapenko.
Reviewed at http://reviews.llvm.org/D6055

llvm-svn: 222519
2014-11-21 10:29:50 +00:00
Hao Liu
9cb82be410 DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)

A hook is added to allow the target to control whether it needs to do such combine.

Reviewed in http://reviews.llvm.org/D6334

llvm-svn: 222510
2014-11-21 06:39:58 +00:00
Hal Finkel
ac26448a5c [PPC] Use SeparateConstOffsetFromGEP
This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in
the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on
SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM,
there is a store moved out of the inner loop) and a potential speedup on
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it
makes some code look cleaner, and synchronizing the backends in this regard
seems like a generally good thing.

llvm-svn: 222504
2014-11-21 04:35:51 +00:00
David Majnemer
8a561be3da SROA: The alloca type isn't a candidate promotion type for vectors
The alloca's type is irrelevant, only those types which are used in a
load or store of the exact size of the slice should be considered.

This manifested as an assertion failure when we compared the various
types: we had a size mismatch.

This fixes PR21480.

llvm-svn: 222499
2014-11-21 02:34:55 +00:00
Quentin Colombet
8fea50c066 [X86] Do not custom lower UINT_TO_FP when the target type does not
match the custom lowering.

<rdar://problem/19026326>

llvm-svn: 222489
2014-11-21 00:47:19 +00:00
Michael Zolotukhin
7e8ae7cad7 Fix a trip-count overflow issue in LoopUnroll.
Currently LoopUnroll generates a prologue loop before the main loop
body to execute first N%UnrollFactor iterations. Also, this loop is
used if trip-count can overflow - it's determined by a runtime check.

However, we've been mistakenly optimizing this loop to a linear code for
UnrollFactor = 2, not taking into account that it also serves as a safe
version of the loop if its trip-count overflows.

llvm-svn: 222451
2014-11-20 20:19:55 +00:00
Saleem Abdulrasool
0de13e90eb X86: use the correct alloca symbol for Windows Itanium
Windows itanium targets the MSVCRT, and the stack probe symbol is provided by
MSVCRT.  This corrects the emission of stack probes on i686-windows-itanium.

llvm-svn: 222439
2014-11-20 18:01:26 +00:00
Renato Golin
daa4ba29b8 MCJIT tests passing on ARM after r222414 fixed the relocation
llvm-svn: 222430
2014-11-20 13:32:16 +00:00
Jyoti Allur
0aaf89456e [ELF] Prevent ARM ELF object writer from generating deprecated relocation code R_ARM_PLT32
llvm-svn: 222414
2014-11-20 05:58:11 +00:00
David Majnemer
44cf22795f Add a test for r221870
bad-relocs.obj.coff-i386 has a relocation whose symbol index is outside
the symbol table.

llvm-svn: 222413
2014-11-20 05:32:10 +00:00
Colin LeMahieu
384b462d47 [Hexagon] Adding A2_xor instruction with IR selection pattern and test.
llvm-svn: 222399
2014-11-19 23:22:23 +00:00
Chad Rosier
eafdf66096 Revert "[Reassociate] As the expression tree is rewritten make sure the operands are"
This reverts commit r222142.  This is causing/exposing an execution-time regression
in spec2006/gcc and coremark on AArch64/A57/Ofast.

Conflicts:

	test/Transforms/Reassociate/optional-flags.ll

llvm-svn: 222398
2014-11-19 23:21:20 +00:00
Colin LeMahieu
35e8a8aa73 [Hexagon] Adding A2_or instruction with IR selection pattern and test.
llvm-svn: 222396
2014-11-19 22:58:04 +00:00
Andrea Di Biagio
b770dd344e [X86] Improved lowering of v4x32 build_vector dag nodes.
This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes
that are known to have at least two non-zero elements.

With this patch, a build_vector that performs a blend with zero is 
converted into a shuffle. This is done to let the shuffle legalizer expand
the dag node in a optimal way. For example, if we know that a build_vector
performs a blend with zero, we can try to lower it as a movq/blend instead of
always selecting an insertps.

This patch also improves the logic that lowers a build_vector into a insertps
with zero masking. See for example the extra test cases added to test sse41.ll.

Differential Revision: http://reviews.llvm.org/D6311

llvm-svn: 222375
2014-11-19 19:34:29 +00:00
Tom Stellard
271c0a936e R600/SI: Make SIInstrInfo::isOperandLegal() more strict
A register operand that has a common sub-class with its instruction's
defined register class is not always legal.  For example,
SReg_32 and M0Reg both have a common sub-class, but we can't
use an SReg_32 in instructions that expect a M0Reg.

This prevents the llvm.SI.sendmsg.ll test from failing when the fold
operand pass is added.

llvm-svn: 222368
2014-11-19 16:58:49 +00:00
Zoran Jovanovic
ebf19d975c [mips][micromips] Implement SWM32 and LWM32 instructions
Differential Revision: http://reviews.llvm.org/D5519

llvm-svn: 222367
2014-11-19 16:44:02 +00:00
Suyog Sarda
e6a1f30c00 Vectorize a reduction chain feeding into a 'return' statement.
e.x 
return (a[0]+b[0]) + (a[1]+b[1])

Differential Revision: http://reviews.llvm.org/D6227

llvm-svn: 222364
2014-11-19 16:07:38 +00:00