Embedded deep learning accelerators: A survey on recent advances

G Akkad, A Mansour, E Inaty - IEEE Transactions on Artificial …, 2023 - ieeexplore.ieee.org
The exponential increase in generated data, as well as advances in high-performance
computing, has paved the way for the use of complex machine learning methods. Indeed, the …

MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition

N Menet, M Hersche, G Karunaratne… - Advances in …, 2024 - proceedings.neurips.cc
With the advent of deep learning, progressively larger neural networks have been designed
to solve complex tasks. We take advantage of these capacity-rich models to lower the cost of …

MIMMO: Multi-Input Massive Multi-Output Neural Network

M Ferianc, M Rodrigues - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Neural networks (NNs) have achieved superhuman accuracy in multiple tasks, but the
certainty of NNs' predictions is often debatable, especially when confronted with out of training …

RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs

S Chaudhari, P Aggarwal, V Murahari… - arXiv preprint arXiv …, 2024 - arxiv.org
State-of-the-art large language models (LLMs) have become indispensable tools for various
tasks. However, training LLMs to serve as effective assistants for humans requires careful …

TextMixer: Mixing Multiple Inputs for Privacy-Preserving Inference

X Zhou, Y Lu, R Ma, T Gui, Q Zhang… - Findings of the …, 2023 - aclanthology.org
Pre-trained language models (PLMs) are often deployed as cloud services, enabling users
to upload textual data and perform inference remotely. However, users' personal text often …

MOSEL: Inference Serving Using Dynamic Modality Selection

B Hu, L Xu, J Moon, NJ Yadwadkar, A Akella - arXiv preprint arXiv …, 2023 - arxiv.org
Rapid advancements over the years have helped machine learning models reach
previously hard-to-achieve goals, sometimes even exceeding human capabilities. However …

Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules

C Xiao, Y Luo, W Zhang, P Zhang, X Han, Y Lin… - arXiv preprint arXiv …, 2023 - arxiv.org
Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks but at
the expense of huge parameter sizes and the consequent computational costs. In this paper …

MUX-PLMs: Data Multiplexing for High-throughput Language Models

V Murahari, A Deshpande, CE Jimenez… - arXiv preprint arXiv …, 2023 - arxiv.org
The widespread adoption of large language models such as ChatGPT and Bard has led to
unprecedented demand for these technologies. The burgeoning cost of inference for ever …

PruMUX: Augmenting Data Multiplexing with Model Compression

Y Su, V Murahari, K Narasimhan, K Li - arXiv preprint arXiv:2305.14706, 2023 - arxiv.org
As language models increase in size by the day, methods for efficient inference are critical to
leveraging their capabilities for various applications. Prior work has investigated techniques …

SAE: Single Architecture Ensemble Neural Networks

M Ferianc, H Fan, M Rodrigues - arXiv preprint arXiv:2402.06580, 2024 - arxiv.org
Ensembles of separate neural networks (NNs) have shown superior accuracy and
confidence calibration over a single NN across tasks. Recent methods compress ensembles …