Advancing direct convolution using convolution slicing optimization and ISA extensions

V Ferrari, R Sousa, M Pereira… - ACM Transactions on …, 2023 - dl.acm.org
Convolution is one of the most computationally intensive operations that must be performed
for machine learning model inference. A traditional approach to computing convolutions is …
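
The snippet cuts off before describing the baseline it improves on, so for reference, here is a minimal direct-convolution loop nest in C. This is a naive sketch, not the paper's slicing-optimized, ISA-extended kernel; the NCHW layout, stride-1/no-padding assumptions, and all names are mine:

```c
#include <stddef.h>

/* Minimal direct convolution (stride 1, no padding), NCHW layout, batch
 * omitted.  out: [C_out][H_out][W_out], in: [C_in][H][W],
 * w: [C_out][C_in][KH][KW].  This six-deep loop nest is the baseline that
 * work like the paper above optimizes with slicing/tiling and ISA support. */
static void conv2d_direct(float *out, const float *in, const float *w,
                          int c_out, int c_in, int h, int wd, int kh, int kw)
{
    int h_out = h - kh + 1, w_out = wd - kw + 1;
    for (int oc = 0; oc < c_out; ++oc)
        for (int oy = 0; oy < h_out; ++oy)
            for (int ox = 0; ox < w_out; ++ox) {
                float acc = 0.0f;
                for (int ic = 0; ic < c_in; ++ic)
                    for (int ky = 0; ky < kh; ++ky)
                        for (int kx = 0; kx < kw; ++kx)
                            acc += in[(ic*h + oy+ky)*wd + ox+kx]
                                 * w[((oc*c_in + ic)*kh + ky)*kw + kx];
                out[(oc*h_out + oy)*w_out + ox] = acc;
            }
}
```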

Exact Dot Product Accumulate Operators for 8-bit Floating-Point Deep Learning

O Desrentes, BD de Dinechin… - 2023 26th Euromicro …, 2023 - ieeexplore.ieee.org
Low bit-width floating-point formats appear as the main alternative to 8-bit integers for
quantized deep learning applications. We propose an architecture for exact dot product …
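
The snippet truncates before the design details. One standard way to make a low-precision dot product exact, consistent with the title, is to accumulate in a wide fixed-point register (a Kulisch-style accumulator). Below is a C sketch for the FP8 E4M3 format; the format parameters are the standard ones, but the decoding helper, accumulator width, and function names are my assumptions, not the paper's architecture:

```c
#include <stdint.h>

/* Decode an FP8 E4M3 value (1 sign, 4 exponent, 3 mantissa bits, bias 7)
 * into an exact integer significand and a power-of-two exponent:
 * value = sign * sig * 2^exp.  NaN handling (e==15, m==7) omitted. */
static void fp8_e4m3_decode(uint8_t b, int *sign, int *sig, int *exp)
{
    int e = (b >> 3) & 0xF, m = b & 0x7;
    *sign = (b & 0x80) ? -1 : 1;
    if (e == 0) { *sig = m;     *exp = -6 - 3; }    /* subnormal: 0.m * 2^-6  */
    else        { *sig = 8 + m; *exp = e - 7 - 3; } /* normal: 1.m * 2^(e-7)  */
}

/* Exact dot product: every FP8xFP8 product is an integer times a power of
 * two, so shifting all products onto a common fixed point and summing them
 * in a wide integer accumulator loses nothing.  The smallest product
 * exponent is -18 and the largest +10, so each term fits in 36 bits and an
 * int64_t can absorb millions of terms before overflow. */
static int64_t fp8_dot_exact(const uint8_t *a, const uint8_t *b, int n)
{
    int64_t acc = 0;  /* fixed point: acc == true_sum * 2^18 */
    for (int i = 0; i < n; ++i) {
        int sa, ma, ea, sb, mb, eb;
        fp8_e4m3_decode(a[i], &sa, &ma, &ea);
        fp8_e4m3_decode(b[i], &sb, &mb, &eb);
        int64_t term = (int64_t)(ma * mb) << (ea + eb + 18);
        acc += (sa == sb) ? term : -term;
    }
    return acc;  /* caller converts once: (double)acc / (1 << 18) */
}
```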

Understanding Performance Implications of LLM Inference on CPUs

S Na, G Jeong, BH Ahn, J Young… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
The remarkable performance of LLMs has led to their application in a wide range of fields,
with data centers utilizing expensive accelerators such as GPUs and TPUs to support LLM …

Improving convolution via cache hierarchy tiling and reduced packing

V Ferrari, R Sousa, M Pereira… - Proceedings of the …, 2022 - dl.acm.org
Convolution is one of the most computationally intensive machine learning model
operations, usually solved by the well-known Im2Col+BLAS method. This work proposes a novel …
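
The Im2Col+BLAS approach mentioned in the abstract lowers convolution to matrix multiplication by copying each receptive field into a column of a scratch matrix. A minimal C sketch (stride 1, no padding, single input channel for brevity; the naive matmul stands in for the BLAS GEMM call, and all names are my assumptions):

```c
/* Im2Col: copy every KHxKW patch of an HxW image into one column of a
 * (KH*KW) x (H_out*W_out) matrix, so convolution becomes a GEMM between a
 * (C_out) x (KH*KW) filter matrix and this patch matrix. */
static void im2col(float *cols, const float *in, int h, int w, int kh, int kw)
{
    int h_out = h - kh + 1, w_out = w - kw + 1;
    for (int ky = 0; ky < kh; ++ky)
        for (int kx = 0; kx < kw; ++kx)
            for (int oy = 0; oy < h_out; ++oy)
                for (int ox = 0; ox < w_out; ++ox)
                    cols[(ky*kw + kx) * (h_out*w_out) + oy*w_out + ox] =
                        in[(oy + ky) * w + (ox + kx)];
}

/* Stand-in for the BLAS sgemm call: out[MxN] = filt[MxK] * cols[KxN]. */
static void gemm_naive(float *out, const float *filt, const float *cols,
                       int m, int n, int k)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += filt[i*k + p] * cols[p*n + j];
            out[i*n + j] = acc;
        }
}
```

Note that the patch matrix duplicates each input pixel up to KH*KW times; this copying cost is the packing overhead that the paper's title refers to reducing.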

Exploiting the New Power ISA™ Matrix Math Instructions Through Compiler Built-ins

JE Moreira, K Barton, P Bergner, P Bhat… - … on Languages and …, 2022 - Springer
Power ISA™ Version 3.1 has introduced a new family of matrix math assist
instructions, collectively known as the Matrix-Multiply Assist (MMA) facility. The instructions …
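
GCC and Clang expose the MMA facility through compiler built-ins of the kind the paper describes. Below is a minimal sketch of a 4x4 single-precision outer-product accumulation using GCC's documented built-in names (compile for POWER10, e.g. gcc -mcpu=power10); the surrounding function, data layout, and tile shapes are my assumptions:

```c
#include <altivec.h>

typedef vector unsigned char vec_t;  /* 16-byte MMA operand type */

/* Accumulate C += A * B for a 4x4 float tile using the MMA facility:
 * each xvf32gerpp performs a rank-1 update acc += outer(a_col, b_row)
 * into a 512-bit accumulator register. */
void mma_4x4_tile(float c[4][4], const float a_cols[4][4],
                  const float b_rows[4][4])
{
    __vector_quad acc;
    __builtin_mma_xxsetaccz(&acc);               /* zero the accumulator    */
    for (int k = 0; k < 4; ++k) {
        vec_t a = (vec_t)vec_xl(0, a_cols[k]);   /* column k of A (4 floats)*/
        vec_t b = (vec_t)vec_xl(0, b_rows[k]);   /* row k of B (4 floats)   */
        __builtin_mma_xvf32gerpp(&acc, a, b);    /* acc += a x b^T          */
    }
    __builtin_mma_disassemble_acc(c, &acc);      /* spill acc to memory     */
}
```

The row order produced by the disassemble built-in depends on endianness, and mapping accumulators onto the underlying VSX registers is exactly the kind of detail the built-ins are meant to hide from the programmer.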