mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 04:52:54 +02:00
llvm-mirror/test
Adhemerval Zanella 65794251d1 [AArch64] Add custom lowering for v4i8 trunc store
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is no v.4b register, the v4i8 is promoted to v4i16 (v.4h),
and the default action for v4i8 is to extract each element and issue 4
byte stores.

A better strategy would be to extend the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvector. The construction:

  define void @foo(<4 x i16> %x, i8* nocapture %p) {
    %0 = trunc <4 x i16> %x to <4 x i8>
    %1 = bitcast i8* %p to <4 x i8>*
    store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
    ret void
  }

Can be optimized from:

  umov    w8, v0.h[3]
  umov    w9, v0.h[2]
  umov    w10, v0.h[1]
  umov    w11, v0.h[0]
  strb    w8, [x0, #3]
  strb    w9, [x0, #2]
  strb    w10, [x0, #1]
  strb    w11, [x0]
  ret

To:

  xtn     v0.8b, v0.8h
  str     s0, [x0]
  ret
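
As a rough illustration of why the two sequences are interchangeable, the
following scalar C sketch models both lowerings. It is not code from the
patch: the function names are hypothetical, and the narrowing is modelled
with plain casts rather than NEON instructions. The first function mirrors
the four umov/strb pairs, the second mirrors xtn followed by a single 4-byte
store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the old lowering: one byte store per lane
   (umov + strb, four times). */
static void trunc_store_v4i8_bytewise(const uint16_t x[4], uint8_t *p)
{
    assert(x != NULL && p != NULL);
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)x[i];       /* keep the low 8 bits of each lane */
}

/* Hypothetical model of the new lowering: narrow all lanes first (xtn),
   then write the four resulting bytes with one 4-byte store (str s0). */
static void trunc_store_v4i8_word(const uint16_t x[4], uint8_t *p)
{
    assert(x != NULL && p != NULL);
    uint8_t narrowed[4];
    for (int i = 0; i < 4; i++)
        narrowed[i] = (uint8_t)x[i]; /* lane i ends up in byte i */
    memcpy(p, narrowed, sizeof narrowed); /* single 32-bit wide store */
}
```

Both functions leave the same four bytes at p; only the number of memory
operations differs.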

The patch also adjusts the memory cost for autovectorization, so the C
code:

  void foo (const int *src, int width, unsigned char *dst)
  {
    for (int i = 0; i < width; i++)
       *dst++ = *src++;
  }

can be vectorized to:

  .LBB0_4:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
        ldr     q0, [x0], #16
        subs    x12, x12, #4            // =4
        xtn     v0.4h, v0.4s
        xtn     v0.8b, v0.8h
        st1     { v0.s }[0], [x2], #4
        b.ne    .LBB0_4

instead of using byte operations.
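
The shape of that vectorized loop can be sketched in portable C as a loop
that handles four elements per iteration, followed by a scalar remainder.
This is only an assumed model of what the vectorizer produces: the name
narrow_copy and the explicit remainder loop are illustrative and do not
appear in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the vectorized loop: each main-loop
   iteration loads four 32-bit elements (ldr q0), keeps the low byte of
   each (the two xtn steps) and writes them with one 4-byte store (st1). */
void narrow_copy(const int32_t *src, int width, uint8_t *dst)
{
    assert(width >= 0);
    int i = 0;

    /* Main body: four elements per iteration. */
    for (; i + 4 <= width; i += 4) {
        uint8_t lanes[4];
        for (int l = 0; l < 4; l++)
            lanes[l] = (uint8_t)src[i + l]; /* truncate i32 -> i8 */
        memcpy(dst + i, lanes, sizeof lanes);
    }

    /* Remainder: leftover elements are stored one byte at a time. */
    for (; i < width; i++)
        dst[i] = (uint8_t)src[i];
}
```

With width = 6, for example, the first four elements go through the 4-wide
body and the last two through the remainder loop.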

llvm-svn: 335735
2018-06-27 13:58:46 +00:00
Analysis [AArch64] Add custom lowering for v4i8 trunc store 2018-06-27 13:58:46 +00:00
Assembler ConstantFold: Don't fold global address vs. null for addrspace != 0 2018-06-26 18:55:43 +00:00
Bindings
Bitcode [ThinLTO] Parse module summary index from assembly 2018-06-26 13:56:49 +00:00
BugPoint
CodeGen [AArch64] Add custom lowering for v4i8 trunc store 2018-06-27 13:58:46 +00:00
DebugInfo [Debugify] Diagnose mis-sized dbg.values 2018-06-26 22:46:41 +00:00
Examples
ExecutionEngine [ORC] Add LLJIT and LLLazyJIT, and replace OrcLazyJIT in LLI with LLLazyJIT. 2018-06-26 21:35:48 +00:00
Feature
FileCheck [FileCheck] Add CHECK-EMPTY directive for checking for blank lines 2018-06-26 15:15:45 +00:00
Instrumentation Revert "[asan] Instrument comdat globals on COFF targets" 2018-06-26 22:43:48 +00:00
Integer
JitListener
Linker
LTO [ThinLTO] Add per-module indexes to combined index consistently 2018-06-26 01:32:58 +00:00
MC Move REQUIRES: line to the top 2018-06-26 17:44:23 +00:00
Object [ELF] Change isSectionData to exclude SHF_EXECINSTR 2018-06-23 00:15:33 +00:00
ObjectYAML
Other Revert r335306 (and r335314) - the Call Graph Profile pass. 2018-06-22 05:33:57 +00:00
SafepointIRVerifier
SymbolRewriter
TableGen [IR] Split Intrinsics.inc into enums and implementations 2018-06-23 02:02:38 +00:00
ThinLTO/X86 [ThinLTO] Add string saver onto index for value names 2018-06-26 02:29:08 +00:00
tools Move REQUIRES: line to the top 2018-06-26 17:44:23 +00:00
Transforms [AArch64] Add custom lowering for v4i8 trunc store 2018-06-27 13:58:46 +00:00
Unit
Verifier Revert r335306 (and r335314) - the Call Graph Profile pass. 2018-06-22 05:33:57 +00:00
YAMLParser
.clang-format
CMakeLists.txt
lit.cfg.py [LIT] Enable testing of LLVM gold plugin on Mac OS X 2018-06-20 15:32:47 +00:00
lit.site.cfg.py.in
TestRunner.sh