1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00
Commit Graph

127972 Commits

Author SHA1 Message Date
Cong Hou
7046b71a1c Move test/CodeGen/Generic/pr26652.ll to test/CodeGen/X86/pr26652.ll and test it only on X86.
llvm-svn: 261807
2016-02-25 00:12:18 +00:00
Sanjay Patel
d0dcf18d3d fix typo
llvm-svn: 261805
2016-02-24 23:44:19 +00:00
Cong Hou
6af13323da Detecte vector reduction operations just before instruction selection.
(This is the second attemp to commit this patch, after fixing pr26652 & pr26653).

This patch detects vector reductions before instruction selection. Vector
reductions are vectorized reduction operations, and for such operations we have
freedom to reorganize the elements of the result as long as the reduction of them
stay unchanged. This will enable some reduction pattern recognition during
instruction combine such as SAD/dot-product on X86. A flag is added to
SDNodeFlags to mark those vector reduction nodes to be checked during instruction
combine.

To detect those vector reductions, we search def-use chains starting from the
given instruction, and check if all uses fall into two categories:

1. Reduction with another vector.
2. Reduction on all elements.

in which 2 is detected by recognizing the pattern that the loop vectorizer
generates to reduce all elements in the vector outside of the loop, which
includes several ShuffleVector and one ExtractElement instructions.


Differential revision: http://reviews.llvm.org/D15250

llvm-svn: 261804
2016-02-24 23:40:36 +00:00
Sanjay Patel
234a2e7455 add tests to show missing bitcasted logic transform
llvm-svn: 261799
2016-02-24 22:31:18 +00:00
Amaury Sechet
7446ce3b8d Add capability to push/pop DFI in MCStreamer. NFC
Summary: This is extracted from D17555

Reviewers: davidxl, reames, sanjoy, MatzeB, pete

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17579

llvm-svn: 261796
2016-02-24 22:25:18 +00:00
Anna Zaks
22dfbced29 [asan] Do not instrument globals in the special "LLVM" sections
llvm-svn: 261794
2016-02-24 22:12:18 +00:00
Matthias Braun
33cae27aaa MachineInstr: Respect register aliases in clearRegiserKills()
This fixes bugs in copy elimination code in llvm. It slightly changes the
semantics of clearRegisterKills(). This is appropriate because:
- Users in lib/CodeGen/MachineCopyPropagation.cpp and
  lib/Target/AArch64RedundantCopyElimination.cpp and
  lib/Target/SystemZ/SystemZElimCompare.cpp are incorrect without it
  (see included testcase).
- All other users in llvm are unaffected (they pass TRI==nullptr)
- (Kill flags are optional anyway so removing too many shouldn't hurt.)

Differential Revision: http://reviews.llvm.org/D17554

llvm-svn: 261763
2016-02-24 19:21:48 +00:00
Tim Northover
681f4f01b7 AArch64: remove CRC feature from Cyclone.
Turns out we don't actually support those instructions.

llvm-svn: 261759
2016-02-24 18:10:17 +00:00
Teresa Johnson
d364501c91 [ThinLTO] Add missing breaks when parsing summaries (NFC)
This wasn't causing a correctness issue, but was causing extra duplicate
entries to be added to the SummaryMap.

llvm-svn: 261757
2016-02-24 17:57:28 +00:00
David Majnemer
f106ac635e [SimplifyCFG] Use a more elegant solution than r261731
The cleanupret instruction has an invariant that it's 'from' operand be
a cleanuppad.  This invariant was violated when we removed a dead block
which removed a cleanuppad leaving behind a cleanupret with an undef
'from' operand.

This was solved in r261731 by staving off the removal of the dead block
to a later pass.

However, it occured to me that we do not need to do this.
Instead, we can simply avoid processing the cleanupret if it has an
undef 'from' operand because we know that it will be removed soon.

llvm-svn: 261754
2016-02-24 17:30:48 +00:00
Simon Pilgrim
25738c3f4f [X86][SSSE3] Added target shuffle combine tests for SSE3/SSSE3 specific shuffles.
Allows us to test SSSE3 PSHUFB intrinsic.

llvm-svn: 261753
2016-02-24 17:08:59 +00:00
Sanjay Patel
ade13ec1de remove fixme comment that was fixed with r261750
llvm-svn: 261752
2016-02-24 17:08:29 +00:00
Sanjay Patel
97801c676a [InstCombine] enable optimization of casted vector xor instructions
This is part of the payoff for the refactoring in:
http://reviews.llvm.org/rL261649
http://reviews.llvm.org/rL261707

In addition to removing a pile of duplicated code, the xor case was
missing the optimization for vector types because it checked
"SrcTy->isIntegerTy()" rather than "SrcTy->isIntOrIntVectorTy()"
like 'and' and 'or' were already doing.

This solves part of:
https://llvm.org/bugs/show_bug.cgi?id=26702

llvm-svn: 261750
2016-02-24 17:00:34 +00:00
Sanjay Patel
70d4e7a02d add test to show missing bitcasted vector xor fold
llvm-svn: 261748
2016-02-24 16:34:29 +00:00
Anton Korobeynikov
173f5d7e53 MSP430InstrInfo::loadRegFromStackSlot forgets to set register def.
Summary:
For instance, compiling the below results in a panic:

```
llc: ../lib/CodeGen/InlineSpiller.cpp:1140: bool (anonymous namespace)::InlineSpiller::foldMemoryOperand(ArrayRef<std::pair<MachineInstr *, unsigned int> >, llvm::MachineInstr *): Assertion `MO->isDead() && "Cannot fold physreg def"' failed.
#0 0x00007f50fbcf353e llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/h/3rd/llvm/build/../lib/Support/Unix/Signals.inc:321:15
#1 0x00007f50fbcf3929 PrintStackTraceSignalHandler(void*) /home/h/3rd/llvm/build/../lib/Support/Unix/Signals.inc:380:1
#2 0x00007f50fbcf22a3 llvm::sys::RunSignalHandlers() /home/h/3rd/llvm/build/../lib/Support/Signals.cpp:45:5
#3 0x00007f50fbcf3bb4 SignalHandler(int) /home/h/3rd/llvm/build/../lib/Support/Unix/Signals.inc:210:1
#4 0x00007f50fa87a180 (/lib/x86_64-linux-gnu/libc.so.6+0x35180)
#5 0x00007f50fa87a107 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x35107)
#6 0x00007f50fa87b4e8 abort (/lib/x86_64-linux-gnu/libc.so.6+0x364e8)
#7 0x00007f50fa873226 (/lib/x86_64-linux-gnu/libc.so.6+0x2e226)
#8 0x00007f50fa8732d2 (/lib/x86_64-linux-gnu/libc.so.6+0x2e2d2)
#9 0x00007f50fddd9287 (anonymous namespace)::InlineSpiller::foldMemoryOperand(llvm::ArrayRef<std::pair<llvm::MachineInstr*, unsigned int> >, llvm::MachineInstr*) /home/h/3rd/llvm/build/../lib/CodeGen/InlineSpiller.cpp:1141:21
#10 0x00007f50fddd9ee9 (anonymous namespace)::InlineSpiller::spillAroundUses(unsigned int) /home/h/3rd/llvm/build/../lib/CodeGen/InlineSpiller.cpp:1286:9
#11 0x00007f50fddd388b (anonymous namespace)::InlineSpiller::spillAll() /home/h/3rd/llvm/build/../lib/CodeGen/InlineSpiller.cpp:1338:21
#12 0x00007f50fddd221d (anonymous namespace)::InlineSpiller::spill(llvm::LiveRangeEdit&) /home/h/3rd/llvm/build/../lib/CodeGen/InlineSpiller.cpp:1391:3
#13 0x00007f50fdfd921b (anonymous namespace)::RAGreedy::selectOrSplitImpl(llvm::LiveInterval&, llvm::SmallVectorImpl<unsigned int>&, llvm::SmallSet<unsigned int, 16u, std::less<unsigned int> >&, unsigned int) /home/h/3rd/llvm/build/../lib/CodeGen/RegAllocGreedy.cpp:2555:5
#14 0x00007f50fdfd647b (anonymous namespace)::RAGreedy::selectOrSplit(llvm::LiveInterval&, llvm::SmallVectorImpl<unsigned int>&) /home/h/3rd/llvm/build/../lib/CodeGen/RegAllocGreedy.cpp:2221:12
#15 0x00007f50fdfc89f9 llvm::RegAllocBase::allocatePhysRegs() /home/h/3rd/llvm/build/../lib/CodeGen/RegAllocBase.cpp:110:14
#16 0x00007f50fdfd6337 (anonymous namespace)::RAGreedy::runOnMachineFunction(llvm::MachineFunction&) /home/h/3rd/llvm/build/../lib/CodeGen/RegAllocGreedy.cpp:2611:3
#17 0x00007f50fded33ee llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/h/3rd/llvm/build/../lib/CodeGen/MachineFunctionPass.cpp:43:3
#18 0x00007f50fd6cdc6f llvm::FPPassManager::runOnFunction(llvm::Function&) /home/h/3rd/llvm/build/../lib/IR/LegacyPassManager.cpp:1550:23
#19 0x00007f50fd6cdf85 llvm::FPPassManager::runOnModule(llvm::Module&) /home/h/3rd/llvm/build/../lib/IR/LegacyPassManager.cpp:1571:16
#20 0x00007f50fd6ce71a (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /home/h/3rd/llvm/build/../lib/IR/LegacyPassManager.cpp:1627:23
#21 0x00007f50fd6ce246 llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/h/3rd/llvm/build/../lib/IR/LegacyPassManager.cpp:1730:16
#22 0x00007f50fd6cec31 llvm::legacy::PassManager::run(llvm::Module&) /home/h/3rd/llvm/build/../lib/IR/LegacyPassManager.cpp:1761:3
#23 0x0000000000415bdc compileModule(char**, llvm::LLVMContext&) /home/h/3rd/llvm/build/../tools/llc/llc.cpp:405:5
#24 0x0000000000414571 main /home/h/3rd/llvm/build/../tools/llc/llc.cpp:211:13
#25 0x00007f50fa866b45 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b45)
#26 0x0000000000414296 _start (/home/h/3rd/llvm/build/bin/llc+0x414296)
Stack dump:
0.	Program arguments: ./bin/llc -mtriple msp430 loadstore.ll 
1.	Running pass 'Function Pass Manager' on module 'loadstore.ll'.
2.	Running pass 'Greedy Register Allocator' on function '@inc'
```

Original IR:

```llvm
%struct.VeryLarge = type { i8, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }

; Function Attrs: norecurse nounwind
define void @inc(%struct.VeryLarge* noalias nocapture sret %agg.result, %struct.VeryLarge* byval align 1 %s) #0 {
entry:
  %p0 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 0
  %0 = load i8, i8* %p0, align 1, !tbaa !1
  %p1 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 1
  %1 = load i32, i32* %p1, align 1, !tbaa !6
  %p2 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 2
  %2 = load i32, i32* %p2, align 1, !tbaa !7
  %p3 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 3
  %3 = load i32, i32* %p3, align 1, !tbaa !8
  %p4 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 4
  %4 = load i32, i32* %p4, align 1, !tbaa !9
  %p5 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 5
  %5 = load i32, i32* %p5, align 1, !tbaa !10
  %p6 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 6
  %6 = load i32, i32* %p6, align 1, !tbaa !11
  %p7 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 7
  %7 = load i32, i32* %p7, align 1, !tbaa !12
  %p8 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 8
  %8 = load i32, i32* %p8, align 1, !tbaa !13
  %p9 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 9
  %9 = load i32, i32* %p9, align 1, !tbaa !14
  %p10 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 10
  %10 = load i32, i32* %p10, align 1, !tbaa !15
  %p11 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 11
  %11 = load i32, i32* %p11, align 1, !tbaa !16
  %p12 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 12
  %12 = load i32, i32* %p12, align 1, !tbaa !17
  %p13 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 13
  %13 = load i32, i32* %p13, align 1, !tbaa !18
  %p14 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 14
  %14 = load i32, i32* %p14, align 1, !tbaa !19
  %p15 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 15
  %15 = load i32, i32* %p15, align 1, !tbaa !20
  %p16 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 16
  %16 = load i32, i32* %p16, align 1, !tbaa !21
  %p17 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 17
  %17 = load i32, i32* %p17, align 1, !tbaa !22
  %p18 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 18
  %18 = load i32, i32* %p18, align 1, !tbaa !23
  %p19 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 19
  %19 = load i32, i32* %p19, align 1, !tbaa !24
  %p20 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 20
  %20 = load i32, i32* %p20, align 1, !tbaa !25
  %p21 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 21
  %21 = load i32, i32* %p21, align 1, !tbaa !26
  %p22 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 22
  %22 = load i32, i32* %p22, align 1, !tbaa !27
  %p23 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 23
  %23 = load i32, i32* %p23, align 1, !tbaa !28
  %p24 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 24
  %24 = load i32, i32* %p24, align 1, !tbaa !29
  %p25 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 25
  %25 = load i32, i32* %p25, align 1, !tbaa !30
  %p26 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 26
  %26 = load i32, i32* %p26, align 1, !tbaa !31
  %p27 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 27
  %27 = load i32, i32* %p27, align 1, !tbaa !32
  %p28 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 28
  %28 = load i32, i32* %p28, align 1, !tbaa !33
  %p29 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 29
  %29 = load i32, i32* %p29, align 1, !tbaa !34
  %p30 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 30
  %30 = load i32, i32* %p30, align 1, !tbaa !35
  %p31 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 31
  %31 = load i32, i32* %p31, align 1, !tbaa !36
  %p32 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 32
  %32 = load i32, i32* %p32, align 1, !tbaa !37
  %add = add i8 %0, 1
  store i8 %add, i8* %p0, align 1, !tbaa !1
  %add2 = add i32 %1, 2
  store i32 %add2, i32* %p1, align 1, !tbaa !6
  %add3 = add i32 %2, 3
  store i32 %add3, i32* %p2, align 1, !tbaa !7
  %add4 = add i32 %3, 4
  store i32 %add4, i32* %p3, align 1, !tbaa !8
  %add5 = add i32 %4, 5
  store i32 %add5, i32* %p4, align 1, !tbaa !9
  %add6 = add i32 %5, 6
  store i32 %add6, i32* %p5, align 1, !tbaa !10
  %add7 = add i32 %6, 7
  store i32 %add7, i32* %p6, align 1, !tbaa !11
  %add8 = add i32 %7, 8
  store i32 %add8, i32* %p7, align 1, !tbaa !12
  %add9 = add i32 %8, 9
  store i32 %add9, i32* %p8, align 1, !tbaa !13
  %add10 = add i32 %9, 10
  store i32 %add10, i32* %p9, align 1, !tbaa !14
  %add11 = add i32 %10, 11
  store i32 %add11, i32* %p10, align 1, !tbaa !15
  %add12 = add i32 %11, 12
  store i32 %add12, i32* %p11, align 1, !tbaa !16
  %add13 = add i32 %12, 13
  store i32 %add13, i32* %p12, align 1, !tbaa !17
  %add14 = add i32 %13, 14
  store i32 %add14, i32* %p13, align 1, !tbaa !18
  %add15 = add i32 %14, 15
  store i32 %add15, i32* %p14, align 1, !tbaa !19
  %add16 = add i32 %15, 16
  store i32 %add16, i32* %p15, align 1, !tbaa !20
  %add17 = add i32 %16, 17
  store i32 %add17, i32* %p16, align 1, !tbaa !21
  %add18 = add i32 %17, 18
  store i32 %add18, i32* %p17, align 1, !tbaa !22
  %add19 = add i32 %18, 19
  store i32 %add19, i32* %p18, align 1, !tbaa !23
  %add20 = add i32 %19, 20
  store i32 %add20, i32* %p19, align 1, !tbaa !24
  %add21 = add i32 %20, 21
  store i32 %add21, i32* %p20, align 1, !tbaa !25
  %add22 = add i32 %21, 22
  store i32 %add22, i32* %p21, align 1, !tbaa !26
  %add23 = add i32 %22, 23
  store i32 %add23, i32* %p22, align 1, !tbaa !27
  %add24 = add i32 %23, 24
  store i32 %add24, i32* %p23, align 1, !tbaa !28
  %add25 = add i32 %24, 25
  store i32 %add25, i32* %p24, align 1, !tbaa !29
  %add26 = add i32 %25, 26
  store i32 %add26, i32* %p25, align 1, !tbaa !30
  %add27 = add i32 %26, 27
  store i32 %add27, i32* %p26, align 1, !tbaa !31
  %add28 = add i32 %27, 28
  store i32 %add28, i32* %p27, align 1, !tbaa !32
  %add29 = add i32 %28, 29
  store i32 %add29, i32* %p28, align 1, !tbaa !33
  %add30 = add i32 %29, 30
  store i32 %add30, i32* %p29, align 1, !tbaa !34
  %add31 = add i32 %30, 31
  store i32 %add31, i32* %p30, align 1, !tbaa !35
  %add32 = add i32 %31, 32
  store i32 %add32, i32* %p31, align 1, !tbaa !36
  %add33 = add i32 %32, 33
  store i32 %add33, i32* %p32, align 1, !tbaa !37
  %33 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %agg.result, i32 0, i32 0
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %33, i8* %p0, i32 129, i32 1, i1 false), !tbaa.struct !38
  ret void
}

; Function Attrs: argmemonly nounwind
declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture readonly, i32, i32, i1) #1

attributes #0 = { norecurse nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { argmemonly nounwind }

!llvm.ident = !{!0}

!0 = !{!"clang version 3.8.0 (git://github.com/llvm-mirror/clang 40ef2b7531472c41212c4719a9294aeb7bddebbc) (git://github.com/llvm-mirror/llvm c601eaf55606dfb9ad372b514b77aa00d1409be1)"}
!1 = !{!2, !3, i64 0}
!2 = !{!"", !3, i64 0, !5, i64 1, !5, i64 5, !5, i64 9, !5, i64 13, !5, i64 17, !5, i64 21, !5, i64 25, !5, i64 29, !5, i64 33, !5, i64 37, !5, i64 41, !5, i64 45, !5, i64 49, !5, i64 53, !5, i64 57, !5, i64 61, !5, i64 65, !5, i64 69, !5, i64 73, !5, i64 77, !5, i64 81, !5, i64 85, !5, i64 89, !5, i64 93, !5, i64 97, !5, i64 101, !5, i64 105, !5, i64 109, !5, i64 113, !5, i64 117, !5, i64 121, !5, i64 125}
!3 = !{!"omnipotent char", !4, i64 0}
!4 = !{!"Simple C/C++ TBAA"}
!5 = !{!"int", !3, i64 0}
!6 = !{!2, !5, i64 1}
!7 = !{!2, !5, i64 5}
!8 = !{!2, !5, i64 9}
!9 = !{!2, !5, i64 13}
!10 = !{!2, !5, i64 17}
!11 = !{!2, !5, i64 21}
!12 = !{!2, !5, i64 25}
!13 = !{!2, !5, i64 29}
!14 = !{!2, !5, i64 33}
!15 = !{!2, !5, i64 37}
!16 = !{!2, !5, i64 41}
!17 = !{!2, !5, i64 45}
!18 = !{!2, !5, i64 49}
!19 = !{!2, !5, i64 53}
!20 = !{!2, !5, i64 57}
!21 = !{!2, !5, i64 61}
!22 = !{!2, !5, i64 65}
!23 = !{!2, !5, i64 69}
!24 = !{!2, !5, i64 73}
!25 = !{!2, !5, i64 77}
!26 = !{!2, !5, i64 81}
!27 = !{!2, !5, i64 85}
!28 = !{!2, !5, i64 89}
!29 = !{!2, !5, i64 93}
!30 = !{!2, !5, i64 97}
!31 = !{!2, !5, i64 101}
!32 = !{!2, !5, i64 105}
!33 = !{!2, !5, i64 109}
!34 = !{!2, !5, i64 113}
!35 = !{!2, !5, i64 117}
!36 = !{!2, !5, i64 121}
!37 = !{!2, !5, i64 125}
!38 = !{i64 0, i64 1, !39, i64 1, i64 4, !40, i64 5, i64 4, !40, i64 9, i64 4, !40, i64 13, i64 4, !40, i64 17, i64 4, !40, i64 21, i64 4, !40, i64 25, i64 4, !40, i64 29, i64 4, !40, i64 33, i64 4, !40, i64 37, i64 4, !40, i64 41, i64 4, !40, i64 45, i64 4, !40, i64 49, i64 4, !40, i64 53, i64 4, !40, i64 57, i64 4, !40, i64 61, i64 4, !40, i64 65, i64 4, !40, i64 69, i64 4, !40, i64 73, i64 4, !40, i64 77, i64 4, !40, i64 81, i64 4, !40, i64 85, i64 4, !40, i64 89, i64 4, !40, i64 93, i64 4, !40, i64 97, i64 4, !40, i64 101, i64 4, !40, i64 105, i64 4, !40, i64 109, i64 4, !40, i64 113, i64 4, !40, i64 117, i64 4, !40, i64 121, i64 4, !40, i64 125, i64 4, !40}
!39 = !{!3, !3, i64 0}
!40 = !{!5, !5, i64 0}
```



Reviewers: asl

Subscribers: qcolombet

Differential Revision: http://reviews.llvm.org/D17441

llvm-svn: 261746
2016-02-24 15:15:02 +00:00
Simon Pilgrim
9dfc61c3a6 [X86][SSE41] Combine vector blends with zero
Part 2 of 2
This patch add support for combining target shuffles into blends-with-zero.

Differential Revision: http://reviews.llvm.org/D17483

llvm-svn: 261745
2016-02-24 15:14:21 +00:00
Simon Pilgrim
21285b3ce4 [X86][SSE41] Combine insertion of zero scalars into vector blends with zero
Part 1 of 2
This patch attempts to replace the insertion of zero scalars with a vector blend with zero, avoiding the use of the integer insertion instructions (which are particularly slow on many targets).
(Part 2 will add support for combining multiple blends-with-zero).

Differential Revision: http://reviews.llvm.org/D17483

llvm-svn: 261743
2016-02-24 14:53:27 +00:00
Nikolay Haustov
c13c21ef6e [AMDGPU] Assembler: Simplify handling of optional operands
Prepare to support DPP encodings.

For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands. However this means that when parsing instruction which has no mnemonic prefix, we cannot add both default values for VOP3 and for DPP optional operands to OperandVector - neither instructions would match. So add default values for optional operands to MCInst during conversion instead.

Mark more operands as IsOptional = 1 in .td files.
Do not add default values for optional operands to OperandVector in AMDGPUAsmParser.
Add default values for optional operands during conversion using new helper addOptionalImmOperand.
Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one.
Separate cvtFlat and cvtFlatAtomic.
Fix CNDMASK_B32 definition to have no modifiers.

Review: http://reviews.llvm.org/D17445

Reviewers: tstellarAMD
llvm-svn: 261742
2016-02-24 14:22:47 +00:00
Artur Pilipenko
5746dd289e NFC. Move isDereferenceable to Loads.h/cpp
This is a part of the refactoring to unify isSafeToLoadUnconditionally and isDereferenceablePointer functions. In subsequent change I'm going to eliminate isDerferenceableAndAlignedPointer from Loads API, leaving isSafeToLoadSpecualtively the only function to check is load instruction can be speculated.   

Reviewed By: hfinkel

Differential Revision: http://reviews.llvm.org/D16180

llvm-svn: 261736
2016-02-24 12:49:04 +00:00
Artur Pilipenko
e41e2d4c04 NFC. Move getAlignment helper function from ValueTracking to Value class.
Reviewed By: reames, hfinkel

Differential Revision: http://reviews.llvm.org/D16144

llvm-svn: 261735
2016-02-24 12:25:10 +00:00
Simon Pilgrim
306c90b833 [X86][SSE] Fixed vector rotation test name typo
Rotation of 16i6 vector not 8i16 vector - copy+paste is not your friend

llvm-svn: 261733
2016-02-24 11:39:13 +00:00
Nikolay Haustov
0caff3d99c [AMDGPU] fix amd_kernel_code_t bit field position as per spec (added missing reserved fields)
lit tests passed before and after because it doesn't test the binary representation of amd_kernel_code_t.

Patch by: Valery Pykhtin (Valery.Pykhtin@amd.com)

Reviewers: arsenm
llvm-svn: 261732
2016-02-24 10:54:25 +00:00
David Majnemer
f4887d90a6 [SimplifyCFG] Do not blindly remove unreachable blocks
DeleteDeadBlock was called indiscriminately, leading to cleanuprets with
undef cleanuppad references.

Instead, try to drain the BB of most of it's instructions if it is
unreachable.  We can then remove the BB if it solely consists of a
terminator (and maybe some phis).

llvm-svn: 261731
2016-02-24 10:02:16 +00:00
David Majnemer
83f0a59a00 [CodeView] Describe variables live in x87 registers
We didn't have a mapping from LLVM's x87 floating point registers to
CodeView's encoding.

llvm-svn: 261730
2016-02-24 10:01:24 +00:00
Simon Pilgrim
962ee76b8b [X86][SSE] Don't get target shuffle operands prematurely.
PerformShuffleCombine should be usable by unary and binary target shuffles, but was attempting to get the first two operands whatever the instruction type. Since these are only used for VECTOR_SHUFFLE instructions for one particular combine I've moved them inside the relevant if statement.

llvm-svn: 261727
2016-02-24 09:07:47 +00:00
Michael Zuckerman
d1c409a5af [LLVM][AVX512][PSHUFHW ][PSHUFLW ] Change imm8 to int
Differential Revision: http://reviews.llvm.org/D17538

llvm-svn: 261725
2016-02-24 08:39:05 +00:00
Igor Breger
8b9daa338d AVX512: Add vpmovzxbw/d/q ,vpmovzxw/d/q ,vpmovzxbdq lowering patterns that support 256bit inputs like AVX patterns ( that are disable in case HasVLX , see SS41I_pmovx_avx2_patterns).
Differential Revision: http://reviews.llvm.org/D17504

llvm-svn: 261724
2016-02-24 08:15:20 +00:00
Justin Bogner
c26c003b44 X86: Wrap a helper for an assert in #ifndef NDEBUG
This function is used in exactly one place, and only in asserts
builds. Move it a few lines up before the use and only define it when
asserts are enabled. Fixes the release build under -Werror.

Also remove the forward declaration and commentary that was basically
identical to the code itself.

llvm-svn: 261722
2016-02-24 07:58:02 +00:00
Matt Arsenault
6c6bd4573e AMDGPU: Check cheaper condition before SignBitIsZero
Don't do an expensive computeKnownBits call when we
can do the cheap check for legal offsets first.

llvm-svn: 261720
2016-02-24 04:55:29 +00:00
Sanjay Patel
4f966611aa [InstCombine] refactor visitOr() to use foldCastedBitwiseLogic()
Note: The 'and' case in foldCastedBitwiseLogic() is inheriting one extra
check from the nearly identical 'or' case:
  if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src))

But I'm not sure how to expose that difference in a regression test. 
Without that check, the 'or' path will infinite loop on:
test/Transforms/InstCombine/zext-or-icmp.ll
because the zext-or-icmp fold is attempting a reverse transform.

The refactoring should extend to the 'xor' case next to solve part of
PR26702.

llvm-svn: 261707
2016-02-23 23:56:23 +00:00
Jingyue Wu
e98e24a50c [doc] Obtaining help on LLVM's CUDA support.
llvm-svn: 261706
2016-02-23 23:34:49 +00:00
Derek Schuff
c81038d2b7 Revert "[WebAssembly] Stackify code emitted by eliminateFrameIndex"
This reverts r261685 due to wasm test breakage.

llvm-svn: 261702
2016-02-23 22:13:21 +00:00
Sanjay Patel
36b8418f0e minimize test and use FileCheck
llvm-svn: 261701
2016-02-23 22:03:44 +00:00
Tim Northover
9ff0bb755b AArch64: rename compact unwind forms back to UNWIND_ARM64_*. NFC.
Looks like the global rename last year was a bit over-zealous. These things
really are referred to with ARM64 elsewhere (ld64, libunwind, ...).

llvm-svn: 261698
2016-02-23 21:49:05 +00:00
Derek Schuff
74080913f6 [WebAssembly] Stackify code emitted by eliminateFrameIndex
llvm-svn: 261685
2016-02-23 21:25:17 +00:00
Chris Bieneman
c833f3d653 [CMake] Create an install-distribution target driven by LLVM_DISTRIBUTION_COMPONENTS
The idea here is to provide a customizable install target that only depends on building the things you actually want to install. It relies on each component being installed having an auto-generated install-${component}, which in turn depends only on the target being installed.

This is fundamentally a workaround for the fact that CMake generates build files which have their "install" target depend on the "all" target. This results in "ninja install" building a bunch of unneeded things.

llvm-svn: 261681
2016-02-23 20:33:53 +00:00
Tim Northover
ccd1c20321 ARM: fix handling of movw/movt relocations with addend.
We were emitting only one half of a the paired relocations needed for these
instructions because we decided that an offset needed a scattered relocation.
In fact, movw/movt relocations can be paired without being scattered.

llvm-svn: 261679
2016-02-23 20:20:23 +00:00
Geoff Berry
5377924081 [AArch64] Generate csinv instruction more often
Reviewers: t.p.northover, jmolloy

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17546

llvm-svn: 261675
2016-02-23 19:34:13 +00:00
Xinliang David Li
e8ec116138 Fix comment
llvm-svn: 261672
2016-02-23 19:18:21 +00:00
Hans Wennborg
1c98857e6a Revert r261633 "Supporting all entities declared in lexical scope in LLVM debug info."
This and the corresponding Clang change caused PR26715.

llvm-svn: 261671
2016-02-23 19:17:03 +00:00
Davide Italiano
715c1d0a06 [X86ISelLowering] Stop typing the same return over and over and over.
llvm-svn: 261666
2016-02-23 18:39:38 +00:00
Weiming Zhao
df8f1f1370 Fix PR25339: ARM Constant Island
Summary:
Currently, the ARM Constant Island may not converge (or not converge quickly).
This patch let it move to the closest water after the user if it doesn't converge after 15 iterations.

This address https://llvm.org/bugs/show_bug.cgi?id=25339

Reviewers: t.p.northover, srhines, kristof.beyls, aadg, rengolin

Subscribers: weimingz, aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D16890

llvm-svn: 261665
2016-02-23 18:39:19 +00:00
Derek Schuff
313b544b12 [WebAssembly] Add TODO comment to revisit red zone size
llvm-svn: 261664
2016-02-23 18:17:46 +00:00
Derek Schuff
7fa1e4425f [WebAssembly] Implement red zone for user stack
Implements a mostly-conventional redzone for the userspace
stack. Because we have unsigned load/store offsets we continue to use a
local SP subtracted from the incoming SP but do not write it back to
memory.

Differential Revision: http://reviews.llvm.org/D17525

llvm-svn: 261662
2016-02-23 18:13:07 +00:00
Sanjay Patel
e4ec3d519e [InstCombine] improve readability ; NFCI
Less indenting, named local variables, more descriptive names.

llvm-svn: 261659
2016-02-23 17:41:34 +00:00
David Majnemer
29ae1c3d76 [WinEH] Don't inline an 'unwinds to caller' cleanupret into funclets which locally unwind
It is problematic if the inlinee has a cleanupret which unwinds to
caller and we inline it into a call site which doesn't unwind.

If the funclet unwinds anywhere other than to the caller,
then we will give the funclet two unwind destinations.
This will result in a verifier failure.

Seeing as how the caller wasn't an invoke (which would locally unwind)
and that the funclet cannot unwind to caller, we must conclude that an
'unwind to caller' cleanupret is dynamically unreachable.

This fixes PR26698.

Differential Revision: http://reviews.llvm.org/D17536

llvm-svn: 261656
2016-02-23 17:11:04 +00:00
Sanjay Patel
4e682fcfe8 [InstCombine] less indenting; NFC
llvm-svn: 261652
2016-02-23 16:59:21 +00:00
Geoff Berry
a37df81842 [AArch64] Fix fastcc -tailcallopt epilog code generation.
Summary:
Fix a bug in epilog generation where the incoming stack arguments were
not being popped for fastcc functions when -tailcallopt was passed.

Reviewers: t.p.northover, mcrosier, jmolloy, rengolin

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16894

llvm-svn: 261650
2016-02-23 16:54:36 +00:00
Sanjay Patel
70dd21aa9d [InstCombine] add helper function to foldCastedBitwiseLogic() ; NFCI
This is a straight cut and paste of the existing code and is intended to
be the first step in solving part of PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702

We should be able to reuse most of this and delete the nearly identical 
existing code in visitOr(). Then, we can enhance visitXor() to use the
same code too.

llvm-svn: 261649
2016-02-23 16:36:07 +00:00
Aaron Ballman
96f0a702b2 Silencing a signed vs unsigned mismatch.
llvm-svn: 261640
2016-02-23 15:02:43 +00:00