RAELLA: Reforming the arithmetic for efficient, low-resolution, and low-loss analog PIM: No retraining required!

T Andrulis, JS Emer, V Sze - … of the 50th Annual International Symposium …, 2023 - dl.acm.org
Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural
Network (DNN) inference by reducing costly data movement and by using resistive RAM …
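
The snippet ends before the method, but as context for what a low-resolution analog PIM accelerator computes, here is a minimal numpy sketch of a ReRAM-crossbar matrix-vector product where each column current is read through a low-bit ADC. The bit-width, ADC range, and function names are illustrative assumptions, not RAELLA's scheme.

```python
import numpy as np

def adc(analog_value, full_scale, bits=4):
    """Quantize an analog column current to a low-resolution digital code."""
    levels = 2 ** bits - 1
    code = np.round(np.clip(analog_value / full_scale, 0.0, 1.0) * levels)
    return code / levels * full_scale

def crossbar_matvec(W, x, adc_bits=4):
    """Dot products computed 'in-array'; one low-res ADC read per column."""
    full_scale = W.sum(axis=0) * x.max()   # assumed per-column ADC range
    analog = x @ W                          # currents summed on the bitlines
    return adc(analog, full_scale, adc_bits)

rng = np.random.default_rng(0)
W = rng.uniform(0.0, 1.0, (64, 8))          # conductances are non-negative
x = rng.uniform(0.0, 1.0, 64)
print(np.abs(crossbar_matvec(W, x) - x @ W).max())  # ADC quantization error
```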

Modular design automation of the morphologies, controllers, and vision systems for intelligent robots: a survey

W Li, Z Wang, R Mai, P Ren, Q Zhang, Y Zhou, N Xu… - Visual Intelligence, 2023 - Springer
Design automation is a core technology in industrial design software and an
important branch of knowledge-worker automation. For example, electronic design …

PELA: Learning parameter-efficient models with low-rank approximation

Y Guo, G Wang, M Kankanhalli - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Applying a pre-trained large model to downstream tasks is prohibitive under resource-
constrained conditions. Recent dominant approaches for addressing efficiency issues …
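
A minimal sketch of the generic idea in the title: replace a dense weight matrix W with a rank-r factorization A @ B obtained from a truncated SVD. The rank and shapes are illustrative assumptions; this is not PELA's training recipe.

```python
import numpy as np

def low_rank_factorize(W, r):
    """Best rank-r approximation of W (in Frobenius norm) via SVD."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]            # (out, r)
    B = Vt[:r, :]                   # (r, in)
    return A, B

rng = np.random.default_rng(0)
W = rng.normal(size=(768, 768))
A, B = low_rank_factorize(W, r=64)
# Parameter count drops from 768*768 to 2*768*64; error is the SVD tail.
print(W.size, A.size + B.size, np.linalg.norm(W - A @ B) / np.linalg.norm(W))
```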

FedACQ: adaptive clustering quantization of model parameters in federated learning

T Tian, H Shi, R Ma, Y Liu - International Journal of Web Information …, 2024 - emerald.com
For privacy protection, federated learning based on data separation allows
machine learning models to be trained on remote devices or in isolated data devices …
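
A hedged sketch of clustering-based parameter quantization in a federated setting: each client replaces its update with a small codebook plus per-weight cluster indices before upload. The 1-D k-means step and the fixed k are assumptions for illustration, not FedACQ's adaptive scheme.

```python
import numpy as np

def cluster_quantize(params, k=16, iters=20):
    """Compress a parameter tensor to a k-entry codebook + uint8 indices."""
    flat = params.ravel()
    centers = np.quantile(flat, np.linspace(0, 1, k))   # codebook init
    for _ in range(iters):                              # 1-D k-means
        idx = np.abs(flat[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(idx == j):
                centers[j] = flat[idx == j].mean()
    return centers, idx.astype(np.uint8).reshape(params.shape)

rng = np.random.default_rng(0)
update = rng.normal(scale=0.01, size=(256, 64))
centers, idx = cluster_quantize(update)
print(np.abs(centers[idx] - update).mean())   # reconstruction error
```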

Differentiable Neural Architecture, Mixed Precision and Accelerator Co-Search

KT Chitty-Venkata, Y Bian, M Emani… - IEEE …, 2023 - ieeexplore.ieee.org
Quantization, effective Neural Network architecture, and efficient accelerator hardware are
three important design paradigms to maximize accuracy and efficiency. Mixed Precision …
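
An illustrative sketch of the differentiable mixed-precision idea the snippet alludes to: keep one architecture parameter per candidate bit-width and mix the quantized outputs with a softmax, so the precision choice receives a gradient. The candidate widths and the uniform fake-quantizer are assumptions, not the paper's exact formulation.

```python
import torch

def fake_quant(x, bits):
    """Uniform symmetric quantizer (straight-through estimator in practice)."""
    scale = x.abs().max() / (2 ** (bits - 1) - 1)
    return torch.round(x / scale) * scale

bits_choices = [2, 4, 8]
alpha = torch.zeros(len(bits_choices), requires_grad=True)  # arch params

w = torch.randn(128, 128)
probs = torch.softmax(alpha, dim=0)
w_mixed = sum(p * fake_quant(w, b) for p, b in zip(probs, bits_choices))
loss = (w_mixed - w).pow(2).mean()   # stand-in for the task loss
loss.backward()
print(probs.detach(), alpha.grad)    # gradient flows to the precision choice
```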

Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

R Qin, D Liu, Z Yan, Z Tan, Z Pan, Z Jia, M Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
The scaling laws have become the de facto guidelines for designing large language models
(LLMs), but they were studied under the assumption of unlimited computing resources for …
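
For reference, the commonly used parametric form of the scaling law the snippet refers to, as fit by Hoffmann et al. (2022), with N the parameter count and D the number of training tokens. The snippet's point is that the constants E, A, B, alpha, beta were fit assuming unconstrained compute, not edge deployment:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad N = \text{parameters}, \quad D = \text{training tokens}
```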

IDKM: Memory Efficient Neural Network Quantization via Implicit, Differentiable k-Means

S Jaffe, AK Singh, F Bullo - arXiv preprint arXiv:2312.07759, 2023 - arxiv.org
Compressing large neural networks with minimal performance loss is crucial to enabling
their deployment on edge devices. Cho et al. (2022) proposed a weight quantization method …
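
A toy sketch of making the cluster assignment differentiable, in the spirit of the title: soft (softmax) assignments let gradients reach the codebook. IDKM's actual contribution is implicit differentiation through the k-means fixed point, which this simplified version does not implement.

```python
import torch

def soft_kmeans_quantize(w, centers, temperature=100.0):
    """Replace each weight by a softmax-weighted mix of cluster centers."""
    d = (w[:, None] - centers[None, :]).pow(2)      # squared distances
    a = torch.softmax(-temperature * d, dim=1)      # soft assignments
    return a @ centers                               # differentiable lookup

w = torch.randn(1000)
centers = torch.linspace(-2, 2, 8, requires_grad=True)
loss = (soft_kmeans_quantize(w, centers) - w).pow(2).mean()
loss.backward()
print(centers.grad)                                  # codebook gets gradients
```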

A Bag of Tricks for Scaling CPU-based Deep FFMs to more than 300m Predictions per Second

B Škrlj, B Ben-Shalom, G Gašperšič, A Schwartz… - arXiv preprint arXiv …, 2024 - arxiv.org
Field-aware Factorization Machines (FFMs) have emerged as a powerful model for click-
through rate prediction, particularly excelling in capturing complex feature interactions. In …
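
A small sketch of the field-aware factorization machine score the paper scales, following the standard FFM formulation (Juan et al., 2016): each feature i keeps one latent vector per field, and the pair (i, j) interacts through v[i, field(j)] . v[j, field(i)]. The shapes and data here are toy assumptions.

```python
import numpy as np

def ffm_score(x, fields, w0, w, V):
    """x: feature values, fields: field id per feature, V: (n, f, k)."""
    s = w0 + w @ x                     # bias + linear terms
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):      # field-aware pairwise interactions
            s += V[i, fields[j]] @ V[j, fields[i]] * x[i] * x[j]
    return s

rng = np.random.default_rng(0)
n, f, k = 6, 3, 4                      # features, fields, latent dim
x = rng.random(n)
fields = rng.integers(0, f, n)
w0, w, V = 0.0, rng.normal(size=n), rng.normal(size=(n, f, k))
print(ffm_score(x, fields, w0, w, V))
```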

DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference

B Khabbazan, M Riera… - 2023 IEEE 30th …, 2023 - ieeexplore.ieee.org
Quantization is commonly used in Deep Neural Networks (DNNs) to reduce the storage and
computational complexity by decreasing the arithmetical precision of activations and …
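
A hedged sketch of the generic idea named in the title: quantize tensor values to signed powers of a base, storing only the sign and a small integer exponent. The base and clipping range are illustrative; DNA-TEQ's adaptive per-tensor search for these parameters is not shown.

```python
import numpy as np

def exp_quantize(x, base=2.0, exp_bits=4, eps=1e-12):
    """Snap each value to the nearest signed power of `base`."""
    sign = np.sign(x)
    e = np.round(np.log(np.abs(x) + eps) / np.log(base))
    e = np.clip(e, -(2 ** (exp_bits - 1)), 2 ** (exp_bits - 1) - 1)
    return sign * base ** e

x = np.random.default_rng(0).normal(size=10_000)
xq = exp_quantize(x)
print(np.abs(x - xq).mean())           # error of the exponential grid
```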

A Survey on Image Classification of Lightweight Convolutional Neural Network

Y Liu, P Xiao, J Fang, D Zhang - 2023 19th International …, 2023 - ieeexplore.ieee.org
In recent years, deep neural networks have achieved tremendous success in image
classification in both academic and industrial settings. However, the high hardware …
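
One concrete example of the trade-off such surveys cover: the parameter count of a standard 3x3 convolution versus the depthwise-separable version popularized by MobileNet. The channel sizes below are arbitrary assumptions.

```python
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out                 # standard convolution

def dws_params(k, c_in, c_out):
    return k * k * c_in + c_in * c_out          # depthwise + 1x1 pointwise

k, c_in, c_out = 3, 128, 256
print(conv_params(k, c_in, c_out))              # 294912
print(dws_params(k, c_in, c_out))               # 33920 (~8.7x fewer)
```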