1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 12:43:36 +01:00
llvm-mirror/test/tools/llvm-mca/X86/intel-syntax.s
Andrea Di Biagio 38b78fd66c [X86][BtVer2] Fix latency/throughput of scalar integer MUL instructions.
Single operand MUL instructions that implicitly set EAX have the following
latency/throughput profile (see below):

imul %cl              # latency: 3cy - uOPs: 1 - 1 JMul
imul %cx              # latency: 3cy - uOPs: 3 - 3 JMul
imul %ecx             # latency: 3cy - uOPs: 2 - 2 JMul
imul %rcx             # latency: 6cy - uOPs: 2 - 4 JMul

mul %cl               # latency: 3cy - uOPs: 1 - 1 JMul
mul %cx               # latency: 3cy - uOPs: 3 - 3 JMul
mul %ecx              # latency: 3cy - uOPs: 2 - 2 JMul
mul %rcx              # latency: 6cy - uOPs: 2 - 4 JMul

Excluding the 64bit variant, which has a latency of 6cy, every other instruction
has a latency of 3cy. However, the number of decoded macro-opcodes (as well as
the resource cyles) depend on the MUL size.

The two operand MULs have a more predictable profile (see below):

imul %dx, %dx         # latency: 3cy - uOPs: 1 - 1 JMul
imul %edx, %edx       # latency: 3cy - uOPs: 1 - 1 JMul
imul %rdx, %rdx       # latency: 6cy - uOPs: 1 - 4 JMul

imul $3, %dx, %dx     # latency: 4cy - uOPs: 2 - 2 JMul
imul $3, %ecx, %ecx   # latency: 3cy - uOPs: 1 - 1 JMul
imul $3, %rdx, %rdx   # latency: 6cy - uOPs: 1 - 4 JMul

This patch updates the values in the Jaguar scheduling model and regenerates
llvm-mca tests.

Differential Revision: https://reviews.llvm.org/D66547

llvm-svn: 369661
2019-08-22 15:20:16 +00:00

41 lines
1.7 KiB
ArmAsm

# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,INTEL
# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -resource-pressure=false -output-asm-variant=0 < %s | FileCheck %s -check-prefixes=ALL,ATT
# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -resource-pressure=false -output-asm-variant=1 < %s | FileCheck %s -check-prefixes=ALL,INTEL
.intel_syntax noprefix
mov eax, 1
mov ebx, 0xff
imul esi, edi
lea eax, [rsi + rdi]
# ALL: Iterations: 100
# ALL-NEXT: Instructions: 400
# ALL-NEXT: Total Cycles: 306
# ALL-NEXT: Total uOps: 400
# ALL: Dispatch Width: 2
# ALL-NEXT: uOps Per Cycle: 1.31
# ALL-NEXT: IPC: 1.31
# ALL-NEXT: Block RThroughput: 2.0
# ALL: Instruction Info:
# ALL-NEXT: [1]: #uOps
# ALL-NEXT: [2]: Latency
# ALL-NEXT: [3]: RThroughput
# ALL-NEXT: [4]: MayLoad
# ALL-NEXT: [5]: MayStore
# ALL-NEXT: [6]: HasSideEffects (U)
# ALL: [1] [2] [3] [4] [5] [6] Instructions:
# ATT-NEXT: 1 1 0.50 movl $1, %eax
# ATT-NEXT: 1 1 0.50 movl $255, %ebx
# ATT-NEXT: 1 3 1.00 imull %edi, %esi
# ATT-NEXT: 1 1 0.50 leal (%rsi,%rdi), %eax
# INTEL-NEXT: 1 1 0.50 mov eax, 1
# INTEL-NEXT: 1 1 0.50 mov ebx, 255
# INTEL-NEXT: 1 3 1.00 imul esi, edi
# INTEL-NEXT: 1 1 0.50 lea eax, [rsi + rdi]