PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks

M Neseem, C McCullough, R Hsin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Low-precision quantization is recognized for its efficacy in neural network optimization. Our
analysis reveals that non-quantized elementwise operations which are prevalent in layers …

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

H You, Y Fu, Z Wang, A Yazdanbakhsh - arXiv preprint arXiv …, 2024 - arxiv.org
Autoregressive Large Language Models (LLMs) have achieved impressive performance in
language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention …

NASA-F: FPGA-Oriented Search and Acceleration for Multiplication-Reduced Hybrid Networks

H Shi, Y Xu, Y Wang, W Mao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The costly multiplications challenge the deployment of modern deep neural networks
(DNNs) on resource-constrained devices. To promote hardware efficiency, prior works have …

PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer

PD Letourneau, MK Singh, HP Cheng, S Han… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Polynomial Attention Drop-in Replacement (PADRe), a novel and unifying
framework designed to replace the conventional self-attention mechanism in transformer …

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

H You, Y Guo, Y Fu, W Zhou, H Shi, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have shown impressive performance on language tasks but
face challenges when deployed on resource-constrained devices due to their extensive …

BETA: Binarized Energy-Efficient Transformer Accelerator at the Edge

Y Ji, C Fang, Z Wang - arXiv preprint arXiv:2401.11851, 2024 - arxiv.org
Existing binary Transformers are promising in edge deployment due to their compact model
size, low computational complexity, and considerable inference accuracy. However …

[PDF] AI at the Edge: Efficient Deep Learning for Resource-Constrained Environments

M Neseem - 2024 - scale-lab.github.io
Artificial Intelligence (AI) for edge applications represents a paradigm shift in how data is
processed and insights are generated in computing systems. Traditionally, machine learning …

WiP: Towards Light Adaptation of Large Language Models For Personal Hardware

L Wang, J Wang, D Wang - Proceedings of the Workshop on Edge and …, 2024 - dl.acm.org
The large language models (LLMs) in widespread use are not deployed locally. Users
must send relatively private and important data to an LLM when using it. Handing over …