PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
M Neseem, C McCullough, R Hsin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Low-precision quantization is recognized for its efficacy in neural network optimization. Our
analysis reveals that non-quantized elementwise operations, which are prevalent in layers …
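The inefficiency this abstract points to can be made concrete with a toy example. Below is a minimal sketch (my own illustration, not the paper's method or measurements): the matrix multiply runs in int8, but the elementwise batch-norm that follows stays in float32, so quantizing the matmul alone does not remove its cost.

```python
import numpy as np

def quantize_int8(x, scale):
    """Symmetric int8 quantization: round(x / scale), clipped to [-127, 127]."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

rng = np.random.default_rng(0)
x, w = rng.normal(size=(64, 128)), rng.normal(size=(128, 128))
sx, sw = np.abs(x).max() / 127, np.abs(w).max() / 127

# Quantized matmul: int8 inputs accumulated in int32, rescaled once at the end.
acc = quantize_int8(x, sx).astype(np.int32) @ quantize_int8(w, sw).astype(np.int32)
y = acc * (sx * sw)

# The elementwise batch-norm stays in float32: one multiply and one add per
# element, i.e. 2 * y.size float ops untouched by the int8 quantization above.
gamma, beta = rng.normal(size=128), rng.normal(size=128)
y_bn = gamma * y + beta
print("remaining float elementwise ops:", 2 * y.size)
```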
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Autoregressive Large Language Models (LLMs) have achieved impressive performance in
language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention …
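For context, the appeal of linearized attention in autoregressive decoding comes from its recurrent form. The sketch below shows generic kernel-based linear attention (a standard formulation, not necessarily this paper's specific method): a running state makes each decoding step O(d^2), independent of prefix length, where softmax attention must attend over the whole prefix.

```python
import numpy as np

def phi(x):
    # Simple positive feature map (an assumption; papers differ on this choice).
    return np.maximum(x, 0.0) + 1e-6

d = 64
S = np.zeros((d, d))   # running sum of outer(phi(k_t), v_t)
z = np.zeros(d)        # running sum of phi(k_t), used as the normalizer

def decode_step(q, k, v):
    """Consume one (q, k, v) triple; cost is O(d^2) regardless of sequence length."""
    global S, z
    S += np.outer(phi(k), v)
    z += phi(k)
    return phi(q) @ S / (phi(q) @ z + 1e-6)

rng = np.random.default_rng(0)
for _ in range(8):  # eight autoregressive steps
    q, k, v = rng.normal(size=(3, d))
    out = decode_step(q, k, v)
print(out.shape)  # (64,)
```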
NASA-F: FPGA-Oriented Search and Acceleration for Multiplication-Reduced Hybrid Networks
The costly multiplications challenge the deployment of modern deep neural networks
(DNNs) on resource-constrained devices. To promote hardware efficiency, prior works have …
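A hedged sketch of what "multiplication-reduced" can mean in practice: constraining weights to signed powers of two so that each multiply becomes a bit shift, which on FPGAs trades scarce DSP multipliers for cheap shifters. This is a generic illustration of the shift-based operator family, not NASA-F's searched hybrid architecture, and the function names are mine.

```python
import numpy as np

def to_power_of_two(w):
    """Round each weight to the nearest signed power of two: sign * 2^exp."""
    sign = np.sign(w)
    exp = np.round(np.log2(np.abs(w) + 1e-12)).astype(int)
    return sign, exp

def shift_matmul(x_int, sign, exp):
    """x @ W where W = sign * 2^exp: only shifts, negations, and adds."""
    out = np.zeros((x_int.shape[0], sign.shape[1]), dtype=np.int64)
    for i in range(sign.shape[0]):
        for j in range(sign.shape[1]):
            s = x_int[:, i] << exp[i, j] if exp[i, j] >= 0 else x_int[:, i] >> -exp[i, j]
            out[:, j] += np.where(sign[i, j] >= 0, s, -s)
    return out

rng = np.random.default_rng(0)
x = rng.integers(-128, 128, size=(4, 16)).astype(np.int64)
sign, exp = to_power_of_two(rng.normal(size=(16, 8)))
print(shift_matmul(x, sign, exp).shape)  # (4, 8)
```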
PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer
PD Letourneau, MK Singh, HP Cheng, S Han… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Polynomial Attention Drop-in Replacement (PADRe), a novel and unifying
framework designed to replace the conventional self-attention mechanism in transformer …
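As a rough illustration of replacing softmax attention with a polynomial (my own minimal variant; PADRe's actual construction is more elaborate than this), an even-degree elementwise polynomial of the score matrix yields nonnegative attention weights without the exponential.

```python
import numpy as np

def poly_attention(Q, K, V, degree=2):
    """Attention weights from an even-degree polynomial instead of softmax."""
    scores = (Q @ np.swapaxes(K, -1, -2)) / np.sqrt(Q.shape[-1])
    weights = scores ** degree                           # nonnegative for even degree
    weights /= weights.sum(axis=-1, keepdims=True) + 1e-6
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 16, 32))   # 16 tokens, width 32
print(poly_attention(Q, K, V).shape)     # (16, 32)
```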
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Large language models (LLMs) have shown impressive performance on language tasks but
face challenges when deployed on resource-constrained devices due to their extensive …
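In the same spirit, a post-training multiplication-less reparameterization can be sketched as a greedy fit of each pretrained weight by a sum of k signed powers of two, so inference needs only shifts and adds. This is an illustration of the general idea only, not ShiftAddLLM's actual algorithm.

```python
import numpy as np

def reparameterize(w, k=2):
    """Greedy residual fit: w ~= sum over i < k of sign_i * 2^exp_i per weight."""
    approx = np.zeros_like(w)
    residual = w.copy()
    for _ in range(k):
        sign = np.sign(residual)
        exp = np.round(np.log2(np.abs(residual) + 1e-12))
        term = sign * 2.0 ** exp        # one shift-and-sign term per weight
        approx += term
        residual -= term
    return approx

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))         # stands in for a pretrained layer
w_hat = reparameterize(w, k=2)
err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative reconstruction error: {err:.4f}")
```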
BETA: Binarized Energy-Efficient Transformer Accelerator at the Edge
Y Ji, C Fang, Z Wang - arXiv preprint arXiv:2401.11851, 2024 - arxiv.org
Existing binary Transformers are promising in edge deployment due to their compact model
size, low computational complexity, and considerable inference accuracy. However …
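The arithmetic that binary Transformer accelerators exploit is easy to demonstrate: with activations and weights constrained to {-1, +1}, a dot product reduces to XNOR followed by a popcount. The sketch below is a generic software illustration of that identity, not BETA's hardware design.

```python
import numpy as np

def binarize(x):
    """Encode +1 as bit 1 and -1 as bit 0."""
    return np.where(x >= 0, 1, 0).astype(np.uint8)

def xnor_popcount_dot(a_bits, w_bits):
    """dot(a, w) over {-1, +1} values = 2 * popcount(XNOR(a, w)) - n."""
    xnor = ~(a_bits ^ w_bits) & 1
    return 2 * int(xnor.sum()) - a_bits.size

rng = np.random.default_rng(0)
a, w = rng.normal(size=256), rng.normal(size=256)
ref = int(np.sign(a) @ np.sign(w))      # full-precision reference on the signs
fast = xnor_popcount_dot(binarize(a), binarize(w))
assert ref == fast
print(fast)
```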
[PDF] AI at the Edge: Efficient Deep Learning for Resource-Constrained Environments
M Neseem - 2024 - scale-lab.github.io
Artificial Intelligence (AI) for edge applications represents a paradigm shift in how data is
processed and insights are generated in computing systems. Traditionally, machine learning …
WiP: Towards Light Adaptation of Large Language Models For Personal Hardware
The large language models (LLMs) in everyday use are not deployed locally. Users
must send relatively private and important data to the LLM when using it. Handing over …