[X86] AMD Zen 3: SSE XMM moves are zero-cycle

I've verified this with llvm-exegesis.
This is not limited to zero registers.

Refs:
AMD SOG 19h, 2.9.4 Zero Cycle Move
The processor is able to execute certain register to register mov operations with zero cycle delay.

Agner, 22.13 Instructions with no latency
Register-to-register move instructions are resolved at the register rename stage without using any execution units. These instructions have zero latency. It is possible to do six such register renamings per clock cycle, and it is even possible to rename the same register multiple times in one clock cycle.
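For readers unfamiliar with move elimination, here is a minimal toy sketch of the idea quoted above: a register-to-register move is resolved at rename by pointing the destination at the source's physical register, so it adds no latency to the dependency chain. The `Renamer` class, its methods, and the 3-cycle latency are all hypothetical and chosen for illustration; this is not how LLVM's scheduling model or the hardware is implemented.

```python
# Toy model of move elimination at the register rename stage.
# Everything here (class name, methods, latencies) is an illustrative assumption.

class Renamer:
    def __init__(self, arch_regs):
        # Each architectural register starts mapped to its own physical register.
        self.map = {reg: idx for idx, reg in enumerate(arch_regs)}
        self.next_phys = len(arch_regs)
        # Cycle at which each physical register's value becomes available.
        self.ready = {phys: 0 for phys in self.map.values()}

    def mov(self, dst, src):
        # Eliminated move: dst aliases src's physical register.
        # No execution unit is used and no latency is added.
        self.map[dst] = self.map[src]

    def op(self, dst, srcs, latency):
        # Ordinary instruction: allocate a fresh physical register whose value
        # is ready `latency` cycles after the last source becomes ready.
        start = max(self.ready[self.map[s]] for s in srcs)
        phys = self.next_phys
        self.next_phys += 1
        self.ready[phys] = start + latency
        self.map[dst] = phys

    def ready_at(self, reg):
        return self.ready[self.map[reg]]


r = Renamer(["xmm0", "xmm1", "xmm2"])
r.op("xmm0", ["xmm0", "xmm1"], latency=3)  # hypothetical 3-cycle vector op
r.mov("xmm2", "xmm0")                      # eliminated: costs 0 cycles
r.op("xmm1", ["xmm2", "xmm2"], latency=3)  # depends on the moved value
print(r.ready_at("xmm1"))                  # prints 6
```

In this model the chain finishes at cycle 6. If the move instead went through an execution unit with a 1-cycle latency, it would finish at cycle 7.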
2024-11-23 03:02:36 +01:00 · 2021-05-07 16:15:43 +03:00 · 2021-05-07 16:15:43 +03:00 · 8c8821fc73
commit 8c8821fc73
parent e720a8cc78
2 changed files with 1246 additions and 1220 deletions
--- a/lib/Target/X86/X86ScheduleZnver3.td
+++ b/lib/Target/X86/X86ScheduleZnver3.td
@@ -1464,7 +1464,7 @@ defm : Zn3WriteResYMM<WriteVecMoveY, [Zn3FPFMisc0123], 0, [1], 1>;
 def : IsOptimizableRegisterMove<[
  InstructionEquivalenceClass<[
    // GPR variants.
-    MOV32rr, MOV64rr
+    MOV32rr, MOV64rr,
    // FIXME: MOVSXD32rr, but it is only supported in disassembler.
    // FIXME: XCHG32rr/XCHG64rr after MCA is fixed

@@ -1472,7 +1472,9 @@ def : IsOptimizableRegisterMove<[
    // MMX moves are *NOT* eliminated.

    // SSE variants.
-    // FIXME
+    MOVAPSrr, MOVUPSrr,
+    MOVAPDrr, MOVUPDrr,
+    MOVDQArr, MOVDQUrr

    // AVX variants.
    // FIXME
--- a/test/tools/llvm-mca/X86/Znver3/reg-move-elimination-sse-xmm.s
+++ b/test/tools/llvm-mca/X86/Znver3/reg-move-elimination-sse-xmm.s