RAELLA: Reforming the arithmetic for efficient, low-resolution, and low-loss analog PIM: No retraining required!
Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural
Network (DNN) inference by reducing costly data movement and by using resistive RAM …
Modular design automation of the morphologies, controllers, and vision systems for intelligent robots: a survey
W Li, Z Wang, R Mai, P Ren, Q Zhang, Y Zhou, N Xu… - Visual Intelligence, 2023 - Springer
Design automation is a core technology in industrial design software and an
important branch of knowledge-worker automation. For example, electronic design …
Pela: Learning parameter-efficient models with low-rank approximation
Applying a pre-trained large model to downstream tasks is prohibitive under resource-
constrained conditions. Recent dominant approaches for addressing efficiency issues …
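The snippet only names low-rank approximation; as a rough, generic illustration (not Pela's actual procedure), a weight matrix can be replaced by two thin factors obtained from a truncated SVD, cutting the parameter count from m*n to r*(m+n):

# Generic low-rank factorization sketch, assuming a plain dense weight matrix.
import numpy as np

def low_rank_factorize(W: np.ndarray, r: int):
    """Return A (m x r) and B (r x n) such that A @ B approximates W."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]   # absorb the top-r singular values into the left factor
    B = Vt[:r, :]
    return A, B

W = np.random.randn(512, 512)
A, B = low_rank_factorize(W, r=32)
print(A.shape, B.shape)                               # (512, 32) (32, 512)
print(np.linalg.norm(W - A @ B) / np.linalg.norm(W))  # relative reconstruction error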
FedACQ: adaptive clustering quantization of model parameters in federated learning
Purpose: For privacy protection, federated learning based on data separation allows
machine learning models to be trained on remote devices or in isolated data devices …
Differentiable Neural Architecture, Mixed Precision and Accelerator Co-Search
Quantization, effective Neural Network architecture, and efficient accelerator hardware are
three important design paradigms to maximize accuracy and efficiency. Mixed Precision …
Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices
The scaling laws have become the de facto guidelines for designing large language models
(LLMs), but they were studied under the assumption of unlimited computing resources for …
IDKM: Memory Efficient Neural Network Quantization via Implicit, Differentiable k-Means
S Jaffe, AK Singh, F Bullo - arXiv preprint arXiv:2312.07759, 2023 - arxiv.org
Compressing large neural networks with minimal performance loss is crucial to enabling
their deployment on edge devices. (Cho et al., 2022) proposed a weight quantization method …
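As a generic illustration of the k-means weight quantization idea this line of work builds on (not the paper's implicit, differentiable formulation), the weights of a layer can be clustered and stored as a small codebook plus per-weight indices:

# Baseline k-means weight quantization sketch; k=16 gives 4-bit indices.
import numpy as np

def kmeans_quantize(weights: np.ndarray, k: int = 16, iters: int = 20):
    flat = weights.ravel()
    # Initialize centroids from quantiles of the weight distribution.
    centroids = np.quantile(flat, np.linspace(0, 1, k))
    for _ in range(iters):
        assign = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            members = flat[assign == j]
            if members.size:
                centroids[j] = members.mean()
    return assign.reshape(weights.shape), centroids

w = np.random.randn(256, 256).astype(np.float32)
idx, codebook = kmeans_quantize(w, k=16)   # 16-entry codebook + 4-bit index per weight
w_hat = codebook[idx]
print(np.abs(w - w_hat).mean())            # mean quantization error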
A Bag of Tricks for Scaling CPU-based Deep FFMs to more than 300m Predictions per Second
B Škrlj, B Ben-Shalom, G Gašperšič, A Schwartz… - arXiv preprint arXiv …, 2024 - arxiv.org
Field-aware Factorization Machines (FFMs) have emerged as a powerful model for click-
through rate prediction, particularly excelling in capturing complex feature interactions. In …
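The snippet concerns serving throughput, but for context the FFM interaction term itself is compact: each feature keeps one latent vector per field, and a pair (i, j) interacts through the vectors indexed by each other's field. A minimal sketch (hypothetical names, not the paper's CPU-optimized implementation):

import numpy as np

def ffm_score(x, fields, V, w0=0.0, w=None):
    """x: feature values (n,); fields: field id per feature (n,);
    V: latent vectors of shape (n_features, n_fields, k)."""
    n = len(x)
    score = w0 + (0.0 if w is None else float(w @ x))
    for i in range(n):
        for j in range(i + 1, n):
            # Field-aware pairwise term: <V[i, field(j)], V[j, field(i)]> * x_i * x_j
            score += float(V[i, fields[j]] @ V[j, fields[i]]) * x[i] * x[j]
    return score

rng = np.random.default_rng(0)
n_features, n_fields, k = 6, 3, 4
V = rng.normal(scale=0.1, size=(n_features, n_fields, k))
x = rng.random(n_features)
fields = np.array([0, 0, 1, 1, 2, 2])
print(ffm_score(x, fields, V))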
DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference
B Khabbazan, M Riera… - 2023 IEEE 30th …, 2023 - ieeexplore.ieee.org
Quantization is commonly used in Deep Neural Networks (DNNs) to reduce the storage and
computational complexity by decreasing the arithmetical precision of activations and …
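To make the exponential-quantization idea named in the title concrete, here is a minimal, non-adaptive power-of-two quantizer (DNA-TEQ's adaptive, per-tensor scheme differs): each value is mapped to a signed power of two, so only a sign bit and a small integer exponent need to be stored.

import numpy as np

def pow2_quantize(x: np.ndarray, exp_bits: int = 4):
    sign = np.sign(x)
    mag = np.maximum(np.abs(x), 1e-12)          # avoid log2(0)
    exp = np.round(np.log2(mag)).astype(np.int32)
    lo, hi = -(2 ** (exp_bits - 1)), 2 ** (exp_bits - 1) - 1
    exp = np.clip(exp, lo, hi)                  # clamp exponent to exp_bits
    return sign * (2.0 ** exp)

a = np.random.randn(1000).astype(np.float32)
a_hat = pow2_quantize(a)
print(np.abs(a - a_hat).mean())                 # mean quantization error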
A Survey on Image Classification of Lightweight Convolutional Neural Network
Y Liu, P Xiao, J Fang, D Zhang - 2023 19th International …, 2023 - ieeexplore.ieee.org
In recent years, deep neural networks have achieved tremendous success in image
classification in both academic and industrial settings. However, the high hardware …