Advancing direct convolution using convolution slicing optimization and ISA extensions
V Ferrari, R Sousa, M Pereira… - ACM Transactions on …, 2023 - dl.acm.org
Convolution is one of the most computationally intensive operations that must be performed
for machine learning model inference. A traditional approach to computing convolutions is …
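For context on this entry: direct convolution evaluates the output with a plain loop nest instead of lowering to a matrix multiply. Below is a minimal single-channel, stride-1, no-padding sketch in C; it is illustrative only, not the paper's code — the slicing optimization and ISA extensions the paper proposes are ways of tiling and vectorizing a nest like this.

/* Minimal direct 2-D convolution: one input channel, one K x K kernel,
 * stride 1, no padding. Production kernels tile these loops for the
 * cache hierarchy and map the inner products onto vector/matrix units. */
void conv2d_direct(const float *in, const float *ker, float *out,
                   int H, int W, int K) {
    int OH = H - K + 1, OW = W - K + 1;
    for (int oy = 0; oy < OH; oy++)
        for (int ox = 0; ox < OW; ox++) {
            float acc = 0.0f;
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    acc += in[(oy + ky) * W + (ox + kx)] * ker[ky * K + kx];
            out[oy * OW + ox] = acc;
        }
}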
Exact Dot Product Accumulate Operators for 8-bit Floating-Point Deep Learning
O Desrentes, BD de Dinechin… - 2023 26th Euromicro …, 2023 - ieeexplore.ieee.org
Low bit-width floating-point formats appear as the main alternative to 8-bit integers for
quantized deep learning applications. We propose an architecture for exact dot product …
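A software model of what "exact" means here: decode each FP8 value to an integer multiple of the format's smallest subnormal, so products and partial sums are ordinary integer arithmetic with no rounding until the final conversion — a small Kulisch-style accumulator. The C sketch below assumes the OCP E4M3 encoding (bias 7, smallest subnormal 2^-9) and omits NaN handling; the paper itself proposes hardware operators, not this software emulation.

#include <stdint.h>

/* Decode an OCP E4M3 byte to an integer in units of 2^-9, the smallest
 * subnormal, so every finite representable value maps exactly. */
static int32_t e4m3_to_fixed(uint8_t x) {
    int32_t sign = (x & 0x80) ? -1 : 1;
    int32_t exp  = (x >> 3) & 0xF;
    int32_t man  = x & 0x7;
    int32_t mag  = (exp == 0) ? man                 /* subnormal: m * 2^-9 */
                              : (8 + man) << (exp - 1); /* (8+m) * 2^(e-10) */
    return sign * mag;
}

/* Exact FP8 dot product: each product is an integer in units of 2^-18,
 * accumulated without rounding in 64 bits. The only rounding happens in
 * the final conversion to double. */
double fp8_dot_exact(const uint8_t *a, const uint8_t *b, int n) {
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)e4m3_to_fixed(a[i]) * e4m3_to_fixed(b[i]);
    return (double)acc * 0x1p-18;   /* rescale: 2^-9 * 2^-9 */
}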
Understanding Performance Implications of LLM Inference on CPUs
The remarkable performance of LLMs has led to their application in a wide range of fields,
with data centers utilizing expensive accelerators such as GPUs and TPUs to support LLM …
Improving convolution via cache hierarchy tiling and reduced packing
V Ferrari, R Sousa, M Pereira… - Proceedings of the …, 2022 - dl.acm.org
Convolution is one of the most computationally intensive machine learning model
operations, usually solved by the well-known Im2Col+BLAS method. This work proposes a novel …
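For reference, the Im2Col+BLAS baseline lowers convolution to a matrix product by copying every receptive field into a column of a patch matrix; the packing cost and extra memory traffic of that copy are what this paper's tiling and reduced-packing scheme attacks. A single-channel C sketch of the baseline follows, with a naive matrix product standing in for the BLAS sgemm call.

#include <stdlib.h>

/* im2col for a single-channel H x W input and a K x K kernel, stride 1,
 * no padding. Each output pixel becomes one column of K*K values, so
 * the convolution reduces to a (1 x K*K) * (K*K x OH*OW) product. */
static void im2col(const float *in, int H, int W, int K, float *col) {
    int OH = H - K + 1, OW = W - K + 1, P = OH * OW;
    for (int ky = 0; ky < K; ky++)
        for (int kx = 0; kx < K; kx++)
            for (int oy = 0; oy < OH; oy++)
                for (int ox = 0; ox < OW; ox++)
                    col[(ky * K + kx) * P + oy * OW + ox] =
                        in[(oy + ky) * W + (ox + kx)];
}

/* Convolution as im2col followed by a plain matrix product; a BLAS
 * sgemm would normally replace this inner loop. */
void conv_im2col(const float *in, const float *ker,
                 int H, int W, int K, float *out) {
    int OH = H - K + 1, OW = W - K + 1, P = OH * OW;
    float *col = malloc(sizeof(float) * K * K * P);
    im2col(in, H, W, K, col);
    for (int p = 0; p < P; p++) {
        float s = 0.0f;
        for (int k = 0; k < K * K; k++)
            s += ker[k] * col[k * P + p];
        out[p] = s;
    }
    free(col);
}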
Exploiting the New Power ISA™ Matrix Math Instructions Through Compiler Built-ins
JE Moreira, K Barton, P Bergner, P Bhat… - … on Languages and …, 2022 - Springer
Power ISA™ Version 3.1 has introduced a new family of matrix math assist
instructions, collectively known as the Matrix-Multiply Assist (MMA) facility. The instructions …
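The MMA built-ins are exposed by GCC and Clang when targeting Power10 (e.g. gcc -mcpu=power10). Below is a small example that accumulates one 4x4 fp32 tile of C = A*B with rank-1 "ger" updates; the data layout assumed here (each step k reads a contiguous 4-float column of A and row of B) is an assumption of this sketch, not something prescribed by the paper.

#include <altivec.h>

/* One 4x4 fp32 tile of C = A * B using the MMA accumulator.
 * A is laid out so column k is 4 contiguous floats; B so row k is
 * 4 contiguous floats. */
void sgemm_4x4(const float *A, const float *B, float *C, int K) {
    __vector_quad acc;
    __builtin_mma_xxsetaccz(&acc);              /* zero the accumulator */
    for (int k = 0; k < K; k++) {
        vector float a = vec_xl(0, &A[k * 4]);  /* column k of A */
        vector float b = vec_xl(0, &B[k * 4]);  /* row k of B */
        /* rank-1 update: acc += a * b^T */
        __builtin_mma_xvf32gerpp(&acc, (vector unsigned char)a,
                                       (vector unsigned char)b);
    }
    vector float rows[4];
    __builtin_mma_disassemble_acc(rows, &acc);  /* spill acc to vectors */
    for (int i = 0; i < 4; i++)
        vec_xst(rows[i], 0, &C[i * 4]);
}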