Advancing direct convolution using convolution slicing optimization and ISA extensions

V Ferrari, R Sousa, M Pereira… - ACM Transactions on …, 2023 - dl.acm.org
Convolution is one of the most computationally intensive operations that must be performed
for machine learning model inference. A traditional approach to computing convolutions is …
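
The snippet cuts off before describing the baseline it improves on, so for reference, here is a minimal direct-convolution loop nest in C. This is a naive sketch, not the paper's slicing-optimized, ISA-extended kernel; the NCHW layout, stride-1/no-padding assumptions, and all names are mine:

```c
#include <stddef.h>

/* Minimal direct convolution (stride 1, no padding), NCHW layout, batch
 * omitted.  out: [C_out][H_out][W_out], in: [C_in][H][W],
 * w: [C_out][C_in][KH][KW].  This six-deep loop nest is the baseline that
 * work like the paper above optimizes with slicing/tiling and ISA support. */
static void conv2d_direct(float *out, const float *in, const float *w,
                          int c_out, int c_in, int h, int wd, int kh, int kw)
{
    int h_out = h - kh + 1, w_out = wd - kw + 1;
    for (int oc = 0; oc < c_out; ++oc)
        for (int oy = 0; oy < h_out; ++oy)
            for (int ox = 0; ox < w_out; ++ox) {
                float acc = 0.0f;
                for (int ic = 0; ic < c_in; ++ic)
                    for (int ky = 0; ky < kh; ++ky)
                        for (int kx = 0; kx < kw; ++kx)
                            acc += in[(ic*h + oy+ky)*wd + ox+kx]
                                 * w[((oc*c_in + ic)*kh + ky)*kw + kx];
                out[(oc*h_out + oy)*w_out + ox] = acc;
            }
}
```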

Exact Dot Product Accumulate Operators for 8-bit Floating-Point Deep Learning

O Desrentes, BD de Dinechin… - 2023 26th Euromicro …, 2023 - ieeexplore.ieee.org
Low bit-width floating-point formats appear as the main alternative to 8-bit integers for
quantized deep learning applications. We propose an architecture for exact dot product …
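
The snippet truncates before the design details. One standard way to make a low-precision dot product exact, consistent with the title, is to accumulate in a wide fixed-point register (a Kulisch-style accumulator). Below is a C sketch for the FP8 E4M3 format; the format parameters are the standard ones, but the decoding helper, accumulator width, and function names are my assumptions, not the paper's architecture:

```c
#include <stdint.h>

/* Decode an FP8 E4M3 value (1 sign, 4 exponent, 3 mantissa bits, bias 7)
 * into an exact integer significand and a power-of-two exponent:
 * value = sign * sig * 2^exp.  NaN handling (e==15, m==7) omitted. */
static void fp8_e4m3_decode(uint8_t b, int *sign, int *sig, int *exp)
{
    int e = (b >> 3) & 0xF, m = b & 0x7;
    *sign = (b & 0x80) ? -1 : 1;
    if (e == 0) { *sig = m;     *exp = -6 - 3; }    /* subnormal: 0.m * 2^-6  */
    else        { *sig = 8 + m; *exp = e - 7 - 3; } /* normal: 1.m * 2^(e-7)  */
}

/* Exact dot product: every FP8xFP8 product is an integer times a power of
 * two, so shifting all products onto a common fixed point and summing them
 * in a wide integer accumulator loses nothing.  The smallest product
 * exponent is -18 and the largest +10, so each term fits in 36 bits and an
 * int64_t can absorb millions of terms before overflow. */
static int64_t fp8_dot_exact(const uint8_t *a, const uint8_t *b, int n)
{
    int64_t acc = 0;  /* fixed point: acc == true_sum * 2^18 */
    for (int i = 0; i < n; ++i) {
        int sa, ma, ea, sb, mb, eb;
        fp8_e4m3_decode(a[i], &sa, &ma, &ea);
        fp8_e4m3_decode(b[i], &sb, &mb, &eb);
        int64_t term = (int64_t)(ma * mb) << (ea + eb + 18);
        acc += (sa == sb) ? term : -term;
    }
    return acc;  /* caller converts once: (double)acc / (1 << 18) */
}
```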

Understanding Performance Implications of LLM Inference on CPUs

S Na, G Jeong, BH Ahn, J Young… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
The remarkable performance of LLMs has led to their application in a wide range of fields,
with data centers utilizing expensive accelerators such as GPUs and TPUs to support LLM …

Improving convolution via cache hierarchy tiling and reduced packing

V Ferrari, R Sousa, M Pereira… - Proceedings of the …, 2022 - dl.acm.org
Convolution is one of the most computationally intensive machine learning model
operations, usually solved by the well-known Im2Col+BLAS method. This work proposes a novel …
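
The Im2Col+BLAS approach mentioned in the abstract lowers convolution to matrix multiplication by copying each receptive field into a column of a scratch matrix. A minimal C sketch (stride 1, no padding, single input channel for brevity; the naive matmul stands in for the BLAS GEMM call, and all names are my assumptions):

```c
/* Im2Col: copy every KHxKW patch of an HxW image into one column of a
 * (KH*KW) x (H_out*W_out) matrix, so convolution becomes a GEMM between a
 * (C_out) x (KH*KW) filter matrix and this patch matrix. */
static void im2col(float *cols, const float *in, int h, int w, int kh, int kw)
{
    int h_out = h - kh + 1, w_out = w - kw + 1;
    for (int ky = 0; ky < kh; ++ky)
        for (int kx = 0; kx < kw; ++kx)
            for (int oy = 0; oy < h_out; ++oy)
                for (int ox = 0; ox < w_out; ++ox)
                    cols[(ky*kw + kx) * (h_out*w_out) + oy*w_out + ox] =
                        in[(oy + ky) * w + (ox + kx)];
}

/* Stand-in for the BLAS sgemm call: out[MxN] = filt[MxK] * cols[KxN]. */
static void gemm_naive(float *out, const float *filt, const float *cols,
                       int m, int n, int k)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += filt[i*k + p] * cols[p*n + j];
            out[i*n + j] = acc;
        }
}
```

Note that the patch matrix duplicates each input pixel up to KH*KW times; this copying cost is the packing overhead that the paper's title refers to reducing.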

Exploiting the New Power ISA™ Matrix Math Instructions Through Compiler Built-ins

JE Moreira, K Barton, P Bergner, P Bhat… - … on Languages and …, 2022 - Springer
Power ISA™ Version 3.1 has introduced a new family of matrix math assist
instructions, collectively known as the Matrix-Multiply Assist (MMA) facility. The instructions …
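
GCC and Clang expose the MMA facility through compiler built-ins of the kind the paper describes. Below is a minimal sketch of a 4x4 single-precision outer-product accumulation using GCC's documented built-in names (compile for POWER10, e.g. gcc -mcpu=power10); the surrounding function, data layout, and tile shapes are my assumptions:

```c
#include <altivec.h>

typedef vector unsigned char vec_t;  /* 16-byte MMA operand type */

/* Accumulate C += A * B for a 4x4 float tile using the MMA facility:
 * each xvf32gerpp performs a rank-1 update acc += outer(a_col, b_row)
 * into a 512-bit accumulator register. */
void mma_4x4_tile(float c[4][4], const float a_cols[4][4],
                  const float b_rows[4][4])
{
    __vector_quad acc;
    __builtin_mma_xxsetaccz(&acc);               /* zero the accumulator    */
    for (int k = 0; k < 4; ++k) {
        vec_t a = (vec_t)vec_xl(0, a_cols[k]);   /* column k of A (4 floats)*/
        vec_t b = (vec_t)vec_xl(0, b_rows[k]);   /* row k of B (4 floats)   */
        __builtin_mma_xvf32gerpp(&acc, a, b);    /* acc += a x b^T          */
    }
    __builtin_mma_disassemble_acc(c, &acc);      /* spill acc to memory     */
}
```

The row order produced by the disassemble built-in depends on endianness, and mapping accumulators onto the underlying VSX registers is exactly the kind of detail the built-ins are meant to hide from the programmer.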